Skip to content

Lexing and Parsing

Nat edited this page May 15, 2026 · 2 revisions

Lexing

Lexing is the process of dividing and input string of characters into more meaningful sub chunks called tokens.

For this compiler lexing is handled by the Lexer class.

We scan the input stream sequentially, deciding how to categorize each character and then creating a list of tokens. Each Token stores a TokenType and a copy of the substring itself. This step is also where whitespace is removed and comments are ignored.

Parsing

Parsing is the process of deriving syntactic meaning from the list of tokens and constructing and Abstract Syntax Tree (AST) data structure to perform further analysis on.

The parsing module is based on a EBNF grammar defined in ./src/parsing/grammar.ebnf. AST creation is handled by a handcrafted recursive descent parser with backtracking. Each function in ./src/parsing/parse.cpp produces a node in the AST which is eventually returned from the top level parse function. The AST structure is defined in ./src/parsing/ast.hpp where each node type has its own unique structure which eases the burden of analysis in future steps.

Clone this wiki locally