Skip to content

Virtual machine interpreter for the Lox programming language written in C

License

Notifications You must be signed in to change notification settings

gustavodiasag/clox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

clox

clox is a C99 implementation of a bytecode virtual machine interpreter for the Lox programming language following the third part of Robert Nystrom's Crafting Interpreters.

Two versions of the interpreter are instructed by the author, the first one being written in Java, which uses an AST-based style of implementation. The main reason for writing another Lox interpreter using a different approach is that walking through an Abstract Syntax Tree is not memory-efficient, since each piece of syntax in the language becomes a heap-stored AST node. To avoid that, this implementation focuses on exploring and taking advantage of CPU caching by ways of representing code in memory in a denser and more ordered way.

Usage

Build

The project relies on CMake (3.16.3+), which is strictly required in order to build it. After installation, simply clone the project and run:

$ mkdir build && cd build/    # CMake workspace.
$ cmake .. && cmake --build .

A few customizable build settings can be set using CMake's CACHE entries:

cmake [-D <option>=1 ] ..

Where options are non-exclusive and consist of:

  • DEBUG: logs the bytecode compiled and the runtime stack state during the execution of each instruction.

  • LOG_GC: triggers garbage collection more frequently, logging its tracing and the amount of memory reclaimed.

  • OPTIMIZE: changes clox's representation of values, switching from tagged unions to NaN-boxing.

Run

Lox programs can be interpreted as source files or through a REPL interface, by just omitting the file path. A few example programs are provided.

$ ./clox [filepath]

Notes

Besides the main purpose of the book, which is the actual implementation of the interpreters, a bunch of concepts and theorems regarding computer science as a whole is also presented throughout its content. Considering that some of this information, if not all of it, is crucial for one's path becoming a somewhat decent computer scientist, a whole separate section is dedicated to it.

Lox

In a brief description, Lox is a high-level, dynamically typed, garbage-collected programming language that borrows ideas from functional, procedural and object-oriented programming to provide the flexibility expected of a simple scripting language.

It provides most of the functionalities from a basic language implementation, such as data types, expressions, statements, variables, control flow and functions. Besides that, features like classes and closures are also specified, whose complexity is deserving of a more thorough depiction.

Classes

A class and its methods can be declared in Lox like so:

class Example {
    foo() {
        print "Foo.";
    }

    bar(baz) {
        print "This is " + baz + ".";
    }
}

var example = Example();
print example // "Example instance".

When the declaration is executed, Lox creates a class object and stores it in a variable named after the class, which turns it to a first-class value in the language. Classes in Lox are also factory functions for instances, producing them when called.

Instantiation

Classes in Lox also hold state in the form of properties, which can be freely added onto objects. Assigning to a property creates it if it doesn't already exist.

example.foobar = "foobar";
example.qux = "qux";

The keyword this is used to access a field or method on the class object from within a method.

Initialization

If a class has a method named init(), it is called automatically when the object is constructed.

class Example {
    init(foobar, qux) {
        this.foobar = foobar;
        this.qux = qux;
    }
}

var example = Example("foobar", "qux");

Inheritance

Lox supports single inheritance. When a class is declared, its inheriting behaviour can be specified with <. Every method defined in the superclass is also available to its subclasses, including init().

class Text < Example {
    init(foobar, qux, quux) {
        super.init(foobar, qux);
        this.quux = quux;
    }
}

Closures

Considering that functions are defined as first-class values in Lox and that function declarations are statements, the combination of those along with block scope can lead to a behaviour whose semantics are specified by closures.

fun outer() {
    var outside = "outside";

    fun inner() {
        print outside;
    }

    return inner;
}

var fn = outer();
// ...
fn();

In the above example, inner() must keep a reference to any surrounding variables that it uses so that they still exist after the outer function has returned. For that, the function closes over the variables' values at the moment of the call.

Grammar

The syntax rules in the Extended Backus-Naur Form (EBNF) notation for Lox are presented below.

program         = declaration * EOF ;

(* Declarations. *)

declaration     = class_decl | fun_decl | var_decl | statement ;

class_decl      = "class" IDENTIFIER ( "<" IDENTIFIER ) ? "{" function * "}"

fun_decl        = "fun" function ;

var_decl        = "var" IDENTIFIER ( "=" expression ) ? ";" ;

(* Statements. *)

statement       = expr_stmt
                | for_stmt
                | if_stmt
                | print_stmt
                | return_stmt
                | while_stmt
                | block
                ;

expr_stmt       = expression ";" ;

for_stmt        = "for" "(" ( var_decl | expr_stmt | ";" )
                    expression ? ";"
                    expression ? ")" statement

if_stmt         = "if" "(" expression ")" statement ("else" statement) ? ;

print_stmt      = "print" expression ";" ;

return_stmt     = "return" expression ? ";" ;

whle_stmt       = "while" "(" expression ")" statement ;

block           = "{" declaration * "}" ;

(* Expressions. *)

expression      = assignment ;

assignment      = ( call "." ) ? IDENTIFIER "=" assignment | logic_or ;

logic_or        = logic_and ( "or" logic_and ) * ;

logic_and       = equality ( "and" equality) * ;

equality        = comparison ( ( "!=" | "==" ) comparison ) * ;

comparison      = term ( ( ">" | ">=" | "<" | "<=" ) term ) * ;

term            = factor ( ( "-" | "+" ) factor ) * ;

factor          = unary ( ( "/" | "*" ) unary ) * ;

unary           = ( "!" | "-" ) unary | call ;

call            = primary ( "(" arguments ? ")" | "." IDENTIFIER ) * ;

primary         = "true"
                | "false"
                | "nil"
                | "this"
                | NUMBER
                | STRING
                | IDENTIFIER
                | "(" expression ")"
                | "super" "." IDENTIFIER
                ;

(* Utility rules. *)

function        = IDENTIFIER "(" parameters ? ")" block ;

parameters      = IDENTIFIER ( "," IDENTIFIER ) * ;

arguments       = expression ( "," expression ) * ;

(* Lexical grammar. *)

NUMBER          = DIGIT + ( "." DIGIT ) ? ;

STRING          = '"' ( ? all characters ? - '"' ) * '"' ;

IDENTIFIER      = ALPHA ( ALPHA | DIGIT ) * ;

ALPHA           = "a" ... "z" | "A" ... "Z" | "_" ;

DIGIT           = "0" ... "9" ;

License

This project is licensed under the MIT License.

Releases

No releases published

Packages

No packages published