# Building a Statically Typed Forth-like DSL in Rust

## Dependencies


In [34]:
:dep static_assertions = "1.1"
:dep aligned-vec = "0.6"
// :dep smallbox = "0.8"

extern crate static_assertions;
extern crate aligned_vec;
// extern crate smallbox;

use aligned_vec::{ AVec, ConstAlign };

### [Experiment] Open question on AVec implementation


In [35]:
let v = AVec::<u8, ConstAlign<4096>>::new(128); // what is the alignment?

### [Experiment] Unused macros for multi-argument max.


In [36]:
/// Return a mutable reference to the maximum argument. If there are multiple maximum 
/// arguments, the last one is returned.
#[macro_export]
macro_rules! max_mut {
    ($x:expr) => (&mut $x);
    ($x:expr, $($rest:expr),+) => {
        {
            let max_rest = max_mut!($($rest),+);
            if *max_rest < $x {
                &mut $x
            } else {
                max_rest
            }
        }
    };
}

/// Return a reference to the maximum argument. If there are multiple maximum 
/// arguments, the last one is returned.
#[macro_export]
macro_rules! max {
    ($x:expr) => (&$x);
    ($x:expr, $($rest:expr),+) => {
        {
            let max_rest = max!($($rest),+);
            if *max_rest < $x {
                &$x
            } else {
                max_rest
            }
        }
    };
}

max!(1, 2, 3)

3

In [37]:

fn test() {
    let a = 10;
    let b = 1;
    let c = 10;

    let max_value = max!(a, b, c);
    assert_eq!(max_value, &c);
    println!("max!(1, 2, 3): {}", max!(1, 2, 3)); // This will print 3
    println!("The maximum value is: {}", max_value);

    let mut a = 10;
    let mut b = 1;
    let mut c = 10;

    *max_mut!(a, b, c) = 0;
    println!("c is now: {}", c); // This will print 0
}

test();


max!(1, 2, 3): 3
The maximum value is: 10
c is now: 0


### [Experiment] Unused const max_pos_usize and max_align() functions.


In [38]:
/// Returns the index of the maximum value in a slice of usize values. If the maximum
/// occurs more than once, the index of its last occurrence is returned.
const fn max_pos_usize(arr: &[usize]) -> usize {
    let mut max_index: usize = 0;
    let mut i = 0;
    while i < arr.len() {
        if !(arr[i] < arr[max_index]) {
            max_index = i;
        }
        i += 1;
    }
    max_index
}

/// Returns the maximum alignment of the primitive types. May not be the
/// maximum possible alignment of all types.
const fn max_align() -> usize {
    *max!(align_of::<usize>(), align_of::<u128>(), align_of::<f64>())
}

max_align()

16
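
The same maximum can also be computed without the `max!` macro. A minimal sketch, assuming a hypothetical `max_usize` helper and `MAX_PRIMITIVE_ALIGN` constant (neither name comes from the notebook), written with a `while` loop so that it stays usable in `const` contexts:

```rust
use std::mem::align_of;

/// Hypothetical helper: largest value in a slice (0 for an empty slice).
const fn max_usize(values: &[usize]) -> usize {
    let mut best = 0;
    let mut i = 0;
    // `for` loops are not allowed in `const fn`, so iterate by index.
    while i < values.len() {
        if values[i] > best {
            best = values[i];
        }
        i += 1;
    }
    best
}

/// Assumed stand-in for `max_align()`, evaluated at compile time.
const MAX_PRIMITIVE_ALIGN: usize =
    max_usize(&[align_of::<usize>(), align_of::<u128>(), align_of::<f64>()]);
```

The value depends on the target and compiler version, so the sketch does not hard-code an expected result.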

### [Experiment] Unused true\_!() macro to provide a message with static asserts.


In [39]:
#[macro_export]
macro_rules! true_ {
    ($_:expr) => {
        true
    };
}

true_!("Hello, world!")

true

### [Experiment] Static assert that a type's alignment requirement does not exceed `max_align()`


In [40]:


/// Static assert that the alignment of the type is less than or equal to the
/// maximum alignment of the primitive types.
#[macro_export]
macro_rules! assert_alignable {
    ($t:ty) => {
        const _: () = {
            static_assertions::const_assert!((std::mem::align_of::<$t>() <= max_align()) && true_!("Alignment of type is greater than maximum alignment of primitive types"));
        };
    };
}

assert_alignable!(String);
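
The `true_!` trick exists only to attach a message to the assertion. A possible alternative, assuming Rust 1.57 or later (where `assert!` with a message is allowed in `const` contexts), needs neither `true_!` nor the `static_assertions` crate. The names `ALIGN_LIMIT` and `assert_align_within_limit!` are hypothetical, and the limit is fixed at 16 purely for illustration:

```rust
/// Assumed limit for illustration; the notebook computes this with `max_align()`.
const ALIGN_LIMIT: usize = 16;

/// Fails compilation, with the given message, if `$t` needs more than `ALIGN_LIMIT`.
macro_rules! assert_align_within_limit {
    ($t:ty) => {
        const _: () = assert!(
            ::std::mem::align_of::<$t>() <= ALIGN_LIMIT,
            "alignment of type exceeds ALIGN_LIMIT"
        );
    };
}

// Both checks are evaluated at compile time and produce no runtime code.
assert_align_within_limit!(String);
assert_align_within_limit!(u64);
```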

### [Experiment] Checking the alignment of an empty type.


In [41]:
std::mem::align_of_val(&||{})

1
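
The closure above captures nothing, so it is a zero-sized value with alignment 1. A short sketch of how captures change that, using a hypothetical `closure_layouts` helper. Closure layout is not formally specified, so the sizes of the capturing closures are whatever the compiler chooses and the sketch only reports them:

```rust
use std::mem::{align_of_val, size_of_val};

/// Returns `(size, align)` for three closures: one without captures,
/// one capturing a `u32` by value, and one capturing a `String` by reference.
fn closure_layouts() -> [(usize, usize); 3] {
    let empty = || {};

    let x = 7u32;
    let by_value = move || x + 1;

    let s = String::from("hi");
    let by_ref = || s.len();

    [
        (size_of_val(&empty), align_of_val(&empty)),
        (size_of_val(&by_value), align_of_val(&by_value)),
        (size_of_val(&by_ref), align_of_val(&by_ref)),
    ]
}
```

This matters for `RawSequence`, which stores closures inline: a non-capturing closure occupies no bytes, while a capturing one needs room for its captured state.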

### [Experiment] Checking how Vec::capacity, reserve, and set_len work and interact.


In [42]:
let mut a = std::vec::Vec::<u32>::new();
a.push(1);
println!("{}, {}", a.len(), a.capacity());
a.push(2);
a.push(3);
a.push(4);
println!("{}, {}", a.len(), a.capacity());
a.reserve(1);
unsafe { a.set_len(5); }
println!("{}, {}", a.len(), a.capacity());


1, 4
4, 4
5, 8
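
In the cell above, `set_len(5)` exposes a fifth element that was never written, which violates the safety contract of `Vec::set_len`. A minimal sketch of the sound ordering (reserve, initialize, then `set_len`), using a hypothetical `push_manually` helper and the standard `Vec::spare_capacity_mut` method:

```rust
/// Appends `value` by writing into spare capacity before growing the length.
fn push_manually(v: &mut Vec<u32>, value: u32) {
    v.reserve(1);
    let len = v.len();
    // After `reserve(1)` there is at least one spare, uninitialized slot.
    v.spare_capacity_mut()[0].write(value);
    // SAFETY: the element at index `len` was initialized on the line above,
    // and `len + 1` does not exceed the capacity.
    unsafe { v.set_len(len + 1) };
}
```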


## RawStack

A stack that can hold values of any type as raw bytes. The stack is type-erased, so the caller must know a value's type in order to retrieve it.


In [43]:
use std::mem;

pub struct RawStack {
    buffer: Vec<u8>,
}

impl RawStack {
    pub fn new() -> Self {
        RawStack {
            buffer: Vec::with_capacity(4),
        }
    }

    // Push a value onto the stack.
    pub fn push<T>(&mut self, value: T) {
        
        let len = self.buffer.len();
        self.buffer.reserve(size_of::<T>());
        unsafe {
            self.buffer.set_len(len + size_of::<T>());
            std::ptr::write_unaligned(self.buffer.as_mut_ptr().add(len) as *mut T, value);
        }
    }

    /**
        Pop a value of type `T` from the stack.

        # Safety

        The type `T` must be the same type that is on top of the stack.
    */
    pub unsafe fn pop<T>(&mut self) -> T {
        let p : usize = self.buffer.len() - size_of::<T>();
        // The byte buffer has alignment 1, so the value may be misaligned for `T`.
        let result = unsafe { std::ptr::read_unaligned(self.buffer.as_ptr().add(p) as *const T) };
        self.buffer.truncate(p);
        result
    }
}

fn main() {
    // Example usage:
    let mut stack = RawStack::new();

    // Push two u32 values
    stack.push(100u32);
    stack.push(200u32);

    // Pop in LIFO order. The caller must know the type.
    let value2: u32 = unsafe { stack.pop() };
    println!("Popped value: {}", value2); // prints 200

    let value1: u32 = unsafe { stack.pop() };
    println!("Popped value: {}", value1); // prints 100
}

main();


Popped value: 200
Popped value: 100
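
A `Vec<u8>` only guarantees byte alignment, so values stored in it are generally misaligned for their type as soon as types of different sizes are mixed. A minimal sketch of the round trip on a bare byte buffer, using hypothetical free functions `push_bytes` and `pop_bytes` rather than the `RawStack` type, with the unaligned pointer operations on both sides:

```rust
use std::mem::size_of;
use std::ptr;

/// Appends the bytes of `value` to `buf`, taking ownership of the value.
fn push_bytes<T>(buf: &mut Vec<u8>, value: T) {
    let len = buf.len();
    buf.reserve(size_of::<T>());
    unsafe {
        // The destination may be misaligned for `T`, hence `write_unaligned`.
        ptr::write_unaligned(buf.as_mut_ptr().add(len) as *mut T, value);
        buf.set_len(len + size_of::<T>());
    }
}

/// Removes the top `size_of::<T>()` bytes and reinterprets them as a `T`.
///
/// # Safety
/// The top of `buf` must hold a value of type `T` written by `push_bytes`.
unsafe fn pop_bytes<T>(buf: &mut Vec<u8>) -> T {
    let start = buf
        .len()
        .checked_sub(size_of::<T>())
        .expect("stack underflow");
    let value = unsafe { ptr::read_unaligned(buf.as_ptr().add(start) as *const T) };
    buf.truncate(start);
    value
}
```

Pushing a `u8` followed by a `u64` places the `u64` at byte offset 1, which is exactly the case where an aligned read would be invalid.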


### [Experiment] Round a size up to the next multiple of `max_align()`.


In [44]:
// Helper function to round up size to the next multiple of align.
const fn padded_size(size: usize) -> usize {
    const ALIGN : usize = max_align();
    (size + ALIGN - 1) & !(ALIGN - 1)
}
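
The mask expression rounds up correctly only when the alignment is a power of two. A small worked sketch with the alignment passed explicitly; `round_up` is a hypothetical name, and the `assert!` inside a `const fn` assumes Rust 1.57 or later:

```rust
/// Rounds `size` up to the next multiple of `align` (a power of two).
const fn round_up(size: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "align must be a power of two");
    // Adding `align - 1` moves past the next boundary unless `size` is
    // already on one; clearing the low bits then snaps back to it.
    // Example with align = 16: 17 + 15 = 32, and 32 & !15 = 32.
    (size + align - 1) & !(align - 1)
}
```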

## RawSequence

A raw sequence stores values of arbitrary types back to back, each at an offset aligned for its type. New values can be pushed onto the sequence, and the stored values can be visited in order, either to read each one (as `&T`) or to drop it in place.
To read or drop a value, the caller must supply its type, because no type information is stored in the sequence.


In [45]:
use std::mem;
use aligned_vec::{ AVec, ConstAlign };

pub struct RawSequence {
    buffer: AVec<u8, ConstAlign<4096>>,
}

const fn truncate_index(align: usize, index: usize) -> usize {
    index & !(align - 1)
}

const fn align_index(align: usize, index: usize) -> usize {
    truncate_index(align, index + align - 1)
}

impl RawSequence {
    pub fn new() -> Self {
        RawSequence {
            buffer: AVec::new(4096),
        }
    }

    // Push a value onto the sequence. The value is stored at the next offset aligned to align_of::<T>().
    pub fn push<T>(&mut self, value: T) {
        assert!(mem::align_of::<T>() <= 4096);
        let len = self.buffer.len();
        let aligned : usize = align_index(mem::align_of::<T>(), len);
        let new_len = aligned + mem::size_of::<T>();

        self.buffer.reserve(new_len - len);
        unsafe {
            self.buffer.set_len(new_len);
            std::ptr::write(self.buffer.as_mut_ptr().add(aligned) as *mut T, value);
        }
    }

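    /// Drops the `T` stored at or after offset `p` and returns the offset just past it.
    /// The caller must pass the same type that was pushed at this position.
    pub unsafe fn drop_in_place<T>(&mut self, p: usize) -> usize {
        let aligned : usize = align_index(mem::align_of::<T>(), p);
        // Use the mutable pointer: dropping may write through it.
        unsafe { std::ptr::drop_in_place(self.buffer.as_mut_ptr().add(aligned) as *mut T) };
        aligned + mem::size_of::<T>()
    }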

    pub unsafe fn next<T>(&self, p: usize) -> (&T, usize) {
        let aligned : usize = align_index(mem::align_of::<T>(), p);
        let ptr = unsafe { self.buffer.as_ptr().add(aligned) as *const T };
        unsafe {(&*ptr, aligned + mem::size_of::<T>())}
    }
}

fn main() {
    // Example usage:
    let mut stack = RawSequence::new();

    stack.push(100u32);
    stack.push(200u32);
    stack.push(42.0f64);
    stack.push("Hello, world!");

    let (value, p) = unsafe { stack.next::<u32>(0) };
    println!("{}", value);
    let (value, p) = unsafe { stack.next::<u32>(p) };
    println!("{}", value);
    let (value, p) = unsafe { stack.next::<f64>(p) };
    println!("{}", value);
    let (value, _) = unsafe { stack.next::<&str>(p) };
    println!("{}", value);


    let p = unsafe { stack.drop_in_place::<u32>(0) };
    let p = unsafe { stack.drop_in_place::<u32>(p) };
    let p = unsafe { stack.drop_in_place::<f64>(p) };
    let _ = unsafe { stack.drop_in_place::<&str>(p) };
}

main();


100
200
42
Hello, world!
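
The offsets that `push` and `next` agree on can be computed without touching any memory. A minimal sketch that walks the same four types as the example above; `align_up`, `place`, and `demo_offsets` are hypothetical names, and the exact offsets of the last two values depend on the target's alignment rules:

```rust
use std::mem::{align_of, size_of};

/// Rounds `index` up to a multiple of `align` (a power of two).
const fn align_up(index: usize, align: usize) -> usize {
    (index + align - 1) & !(align - 1)
}

/// Returns `(start, end)` for a `T` placed at the first aligned offset >= `cursor`.
fn place<T>(cursor: usize) -> (usize, usize) {
    let start = align_up(cursor, align_of::<T>());
    (start, start + size_of::<T>())
}

/// Start offsets of a u32, u32, f64, and &str pushed in that order.
fn demo_offsets() -> [usize; 4] {
    let (a, p) = place::<u32>(0);
    let (b, p) = place::<u32>(p);
    let (c, p) = place::<f64>(p);
    let (d, _) = place::<&str>(p);
    [a, b, c, d]
}
```

Because reading and dropping recompute these offsets from the types alone, walking the sequence with a different type list than the one used for pushing yields wrong offsets.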


## Segment

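A segment is a sequence of operations that can be executed. Each `push_opN` call checks its argument types against a build-time stack of `TypeId`s, so a type mismatch panics while the segment is being built rather than while it runs.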


In [46]:
use std::any::TypeId;

pub type Operation = fn(&RawSequence, usize, &mut RawStack) -> usize;

pub struct Segment {
    ops: Vec<Operation>,
    storage: RawSequence,
    dropper: Vec<fn(&mut RawSequence, usize) -> usize>,
    type_ids: Vec<TypeId>,
}

impl Segment {
    pub fn new() -> Self {
        Segment {
            ops: Vec::new(),
            storage: RawSequence::new(),
            dropper: Vec::new(),
            type_ids: Vec::new(),
        }
    }

    fn pop_type<T>(&mut self)
    where
        T: 'static,
    {
        match self.type_ids.pop() {
            Some(tid) if tid == TypeId::of::<T>() => {}
            _ => {
                panic!(
                    "Type mismatch: expected {}", std::any::type_name::<T>());
            }
        }
    }

    fn push_storage<T>(&mut self, value: T)
    where
        T: 'static,
    {
        self.storage.push(value);
        self.dropper.push(|storage, p| {
            unsafe { storage.drop_in_place::<T>(p) }
        });
    }

    pub fn push_op0<R, F>(&mut self, op: F)
    where
        F: Fn() -> R + 'static,
        R: 'static,
    {
        self.push_storage(op);
        self.ops.push(|storage, p, stack| {
            let (f, r) = unsafe { storage.next::<F>(p) };
            stack.push(f());
            r
        });
        self.type_ids.push(TypeId::of::<R>());
    }

    pub fn push_op1<T, R, F>(&mut self, op: F)
    where
        F: Fn(T) -> R + 'static,
        T: 'static,
        R: 'static,
    {
        self.pop_type::<T>();
        self.push_storage(op);
        self.ops.push(|storage, p, stack| {
            let (f, r) = unsafe { storage.next::<F>(p) };
            let x: T = unsafe { stack.pop() };
            stack.push(f(x));
            r
        });
        self.type_ids.push(TypeId::of::<R>());
    }

    pub fn push_op2<T, U, R, F>(&mut self, op: F)
    where
        F: Fn(T, U) -> R + 'static,
        T: 'static,
        U: 'static,
        R: 'static,
    {
        self.pop_type::<U>();
        self.pop_type::<T>();
        self.push_storage(op);
        self.ops.push(|storage, p, stack| {
            let (f, r) = unsafe { storage.next::<F>(p) };
            let y: U = unsafe { stack.pop() };
            let x: T = unsafe { stack.pop() };
            stack.push(f(x, y));
            r
        });
        self.type_ids.push(TypeId::of::<R>());
    }

    pub fn push_op3<T, U, V, R, F>(&mut self, op: F)
    where
        F: Fn(T, U, V) -> R + 'static,
        T: 'static,
        U: 'static,
        V: 'static,
        R: 'static,
    {
        self.pop_type::<V>();
        self.pop_type::<U>();
        self.pop_type::<T>();
        self.push_storage(op);
        self.ops.push(|storage, p, stack| {
            let (f, r) = unsafe { storage.next::<F>(p) };
            let z: V = unsafe { stack.pop() };
            let y: U = unsafe { stack.pop() };
            let x: T = unsafe { stack.pop() };
            stack.push(f(x, y, z));
            r
        });
        self.type_ids.push(TypeId::of::<R>());
    }

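    pub fn drop(&mut self) {
        let mut p = 0;
        // Drain the droppers so that each stored value is dropped exactly once.
        for e in self.dropper.drain(..) {
            p = e(&mut self.storage, p);
        }
        assert!(p == self.storage.buffer.len(), "Storage not fully dropped");
        // The stored closures are gone, so the operations that read them must go too.
        self.ops.clear();
        self.type_ids.clear();
        // Every value has been dropped; mark the byte buffer as empty.
        unsafe { self.storage.buffer.set_len(0) };
    }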

    pub fn run<T>(&mut self) -> T
        where T: 'static 
    {
        self.pop_type::<T>();
        if self.type_ids.len() != 0 {
            panic!("Value(s) left on execution stack");
        }

        let mut stack = RawStack::new();
        let mut p = 0;
        for op in self.ops.iter() {
            p = op(&self.storage, p, &mut stack);
        }
        unsafe { stack.pop() }
    }
}

fn main() {
    // Create a segment to hold the stack operations.
    let mut operations = Segment::new();

    // Push two constants, then add them (binary operation).
    operations.push_op0(|| -> u32 { 30 });
    operations.push_op0(|| -> u32 { 12 });
    operations.push_op2(|x: u32, y: u32| -> u32 { x + y });
    // Push two more constants for the ternary operation.
    operations.push_op0(|| -> u32 { 100 });
    operations.push_op0(|| -> u32 { 10 });
    // Add a ternary operation (x + y - z): 42 + 100 - 10.
    operations.push_op3(|x: u32, y: u32, z: u32| -> u32 { x + y - z });
    // Convert the result to a String (unary operation).
    operations.push_op1(|x: u32| -> String { format!("result: {}", x) });

    let final_result: String = operations.run();
    println!("{}", final_result);
}

main();

result: 132
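
The type checking in `Segment` happens entirely through the `type_ids` vector, independently of the stored closures. A minimal sketch of that bookkeeping in isolation, using a hypothetical `TypeStack` type that returns `Result` instead of panicking:

```rust
use std::any::{type_name, TypeId};

/// Hypothetical build-time model of the value stack: types only, no values.
#[derive(Default)]
struct TypeStack {
    ids: Vec<TypeId>,
}

impl TypeStack {
    fn push<T: 'static>(&mut self) {
        self.ids.push(TypeId::of::<T>());
    }

    /// Pops the top type, reporting a mismatch instead of panicking.
    fn pop<T: 'static>(&mut self) -> Result<(), String> {
        match self.ids.pop() {
            Some(id) if id == TypeId::of::<T>() => Ok(()),
            Some(_) => Err(format!("type mismatch: expected {}", type_name::<T>())),
            None => Err(format!("stack underflow: expected {}", type_name::<T>())),
        }
    }

    /// Models `push_op2`: pops `U` then `T` (reverse argument order), pushes `R`.
    fn op2<T: 'static, U: 'static, R: 'static>(&mut self) -> Result<(), String> {
        self.pop::<U>()?;
        self.pop::<T>()?;
        self.push::<R>();
        Ok(())
    }
}
```

With two `u32` entries on the stack, `op2::<u32, u32, u32>()` succeeds and leaves one `u32`; asking for a `String` afterwards reports a mismatch.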


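A simple parser, in Rust, for the grammar below. The implementation that follows currently handles a number followed by at most one operator and a second number, not the full repetition: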

```ebnf
expression = number, {("+" | "-"), number};
number = digit, {digit};
```


In [47]:
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Token {
    Number(i32),
    Plus,
    Minus,
}

struct Lexer<'a> {
    input: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(expr: &'a str) -> Self {
        Lexer {
            input: expr.chars().peekable(),
        }
    }

    fn next_token(&mut self) -> Option<Token> {
        self.skip_whitespace();
        let ch = self.input.peek()?;
        if ch.is_digit(10) {
            return Some(Token::Number(self.next_number()));
        }
        match self.input.next()? {
            '+' => Some(Token::Plus),
            '-' => Some(Token::Minus),
            _   => None,
        }
    }

    fn next_number(&mut self) -> i32 {
        let mut num_str = String::new();
        while let Some(&ch) = self.input.peek() {
            if ch.is_digit(10) {
                num_str.push(ch);
                self.input.next();
            } else {
                break;
            }
        }
        num_str.parse().unwrap()
    }

    fn skip_whitespace(&mut self) {
        while let Some(&ch) = self.input.peek() {
            if ch.is_whitespace() {
                self.input.next();
            } else {
                break;
            }
        }
    }
}

struct Parser<'a> {
    lexer: Lexer<'a>,
    current_token: Option<Token>,
    operations: Segment
}

impl<'a> Parser<'a> {
    fn new(expr: &'a str) -> Self {
        let mut lexer = Lexer::new(expr);
        let current_token = lexer.next_token();
        Parser { lexer, current_token, operations: Segment::new() }
    }

    fn parse_expression(&mut self) {
        // Parse the left number.
        let left = match self.current_token.take() {
            Some(Token::Number(n)) => n,
            _ => panic!("Expected a number at the beginning"),
        };

        self.operations.push_op0(move || left);

        // Get the optional operator.
        let op = self.lexer.next_token();

        // If there is an operator, parse the second number.
        if let Some(tok) = op {
            let right = match self.lexer.next_token() {
                Some(Token::Number(n)) => n,
                _ => panic!("Expected a number after operator"),
            };
            self.operations.push_op0(move || right);

            match tok {
                Token::Plus => self.operations.push_op2(move |x: i32, y: i32| x + y),
                Token::Minus => self.operations.push_op2(move |x: i32, y: i32| x - y),
                _ => panic!("Unexpected token"),
            }
        } 
    }

    fn run<T>(&mut self) -> T
        where T: 'static 
    {
        self.operations.run()
    }
}

fn main() {
    let expr = r#"
        12 + 34
    "#;
    let mut parser = Parser::new(expr);
    parser.parse_expression();
    let result: i32 = parser.run();
    println!("Result of {} = {}", expr, result);
}

main();

Result of 
        12 + 34
     = 46
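
The grammar allows any number of `+`/`-` terms, while `parse_expression` above stops after one operator. A minimal sketch of the repetition as a loop, evaluating directly to an `i32` instead of emitting `Segment` operations. The functions `eval_expression`, `read_number`, and `skip_whitespace` are hypothetical and independent of the `Lexer` and `Parser` types:

```rust
use std::iter::Peekable;
use std::str::Chars;

fn skip_whitespace(chars: &mut Peekable<Chars<'_>>) {
    while chars.next_if(|c| c.is_whitespace()).is_some() {}
}

/// number = digit, {digit};
fn read_number(chars: &mut Peekable<Chars<'_>>) -> Result<i32, String> {
    skip_whitespace(chars);
    let mut digits = String::new();
    while let Some(c) = chars.next_if(|c| c.is_ascii_digit()) {
        digits.push(c);
    }
    // An empty or out-of-range digit string is reported as an error.
    digits
        .parse::<i32>()
        .map_err(|_| "expected a number".to_string())
}

/// expression = number, {("+" | "-"), number};
fn eval_expression(input: &str) -> Result<i32, String> {
    let mut chars = input.chars().peekable();
    let mut total = read_number(&mut chars)?;
    loop {
        skip_whitespace(&mut chars);
        let op = match chars.next() {
            None => return Ok(total),
            Some(c @ ('+' | '-')) => c,
            Some(c) => return Err(format!("unexpected character {c:?}")),
        };
        let rhs = read_number(&mut chars)?;
        let next = if op == '+' {
            total.checked_add(rhs)
        } else {
            total.checked_sub(rhs)
        };
        total = next.ok_or_else(|| "arithmetic overflow".to_string())?;
    }
}
```

In the `Segment`-based parser, the body of the loop would instead push the right-hand constant and then the matching `push_op2` closure.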


In [48]:
struct VTable<T> {
    value: T,
}

impl<T> VTable<T> {
    fn new(value: T) -> Self {
        VTable { value }
    }
    pub const INVOKE: fn () = || {
        println!("Hello, world! {}", std::any::type_name::<T>());
    };
}

fn main() {
    let fp: fn() = VTable::<i32>::INVOKE;
    fp();

    let captured_closure = || println!("Testing");

    // Measure the size of the closure’s capture state.
    let size = std::mem::size_of_val(&captured_closure);
    println!("Closure size = {}", size);
}

main();

Hello, world! i32
Closure size = 0
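
`INVOKE` works because a closure that captures no variables coerces to a plain `fn` pointer, even when its body mentions a generic type parameter. `Segment` relies on the same coercion when it stores its per-type closures in `Vec<Operation>`. A minimal sketch with a hypothetical `size_reporter` function:

```rust
use std::mem::size_of;

/// Returns a plain function pointer specialised for `T`.
fn size_reporter<T>() -> fn() -> usize {
    // No variables are captured (only the type parameter is used),
    // so the closure coerces to `fn() -> usize`.
    || size_of::<T>()
}
```

Each instantiation yields a pointer to code specialised for that `T`, which is how a type-erased table of operations can still act on concrete types.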
