
Semantic Analysis

Nat edited this page May 15, 2026 · 1 revision

Semantic analysis takes the syntactic structure defined in the abstract syntax tree (AST) and records information about variables, types, and scope. This step also checks that the program is semantically valid, for example by requiring that variables be declared before use and that incompatible types are not combined improperly.

The semantic analysis module of this compiler lives in ./src/sema, primarily in sema.cpp. In general, semantic analysis is performed by "walking" each node of the AST in pre-order and storing type and scope information in a symbol table.
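A minimal sketch of a pre-order walk, to make the traversal order concrete. The `Node` struct, `make_node`, and `walk` are hypothetical names chosen for illustration and are not taken from sema.cpp; the walk here only records the order of visits, where a real pass would check the node and update the symbol table.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node; the node classes in this compiler will differ.
struct Node {
    std::string kind;
    std::vector<std::unique_ptr<Node>> children;
};

// Convenience helper for building small trees in examples.
std::unique_ptr<Node> make_node(std::string kind) {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    return n;
}

// Pre-order: handle the node itself first, then its children left to right.
void walk(const Node& node, std::vector<std::string>& visited) {
    visited.push_back(node.kind);  // a real pass would check and record symbols here
    for (const auto& child : node.children) {
        assert(child != nullptr);
        walk(*child, visited);
    }
}
```

Visiting a declaration node before its children means the declaration can be recorded before anything nested beneath it is checked.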

Symbol Table

To incorporate scope information, the symbol table is structured as a linked list in which each node represents one scope and contains a map from an identifier to its corresponding symbol entry. Each symbol entry records the symbol's type, name, and completeness (used for forward declarations). C keeps struct tags in a separate namespace from ordinary identifiers such as variables and functions, so when looking up an identifier it is important to know whether it names a struct or an ordinary declaration. Luckily, this information is available directly from the AST.
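The structure described above might be sketched as follows. This is an illustration under assumptions, not the code in ./src/sema: `Symbol`, `Namespace`, and `Scope` are hypothetical names, the type is stored as a plain string for brevity, and `declare` simply rejects any redeclaration in the same scope, which is stricter than C's actual rules for forward declarations.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical symbol entry; the type is a placeholder string here.
struct Symbol {
    std::string name;
    std::string type;
    bool complete;  // false for a forward declaration
};

// The two namespaces discussed above: ordinary identifiers and struct tags.
enum class Namespace { Ordinary, Tag };

// One node of the linked list: a scope with a pointer to its enclosing scope.
class Scope {
public:
    explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

    // Adds a symbol to this scope only. Returns false if the name is already
    // declared in this scope and namespace (a simplification of C's rules).
    bool declare(Namespace ns, const Symbol& sym) {
        assert(!sym.name.empty());
        return table(ns).emplace(sym.name, sym).second;
    }

    // Searches this scope first, then each enclosing scope in turn.
    const Symbol* lookup(Namespace ns, const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            const auto& t = s->table(ns);
            auto it = t.find(name);
            if (it != t.end()) return &it->second;
        }
        return nullptr;
    }

private:
    using Table = std::unordered_map<std::string, Symbol>;
    Table& table(Namespace ns) { return ns == Namespace::Ordinary ? ordinary_ : tags_; }
    const Table& table(Namespace ns) const { return ns == Namespace::Ordinary ? ordinary_ : tags_; }

    Scope* parent_;
    Table ordinary_;
    Table tags_;
};
```

Because the walk from inner to outer scope stops at the first match, an inner declaration shadows an outer one, and a variable named `point` never collides with `struct point`.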

Type Information

C has a fairly complex type system: a declaration can have a struct type or one of many primitive types (char, short, int, etc.), carry various qualifiers (unsigned, const, etc.), and be declared as a pointer, array, or function. To accommodate all this, ./src/sema/types.hpp defines a series of classes that can be composed to represent any symbol's type. Comparisons between complex types are fairly common, so when storing type information for a symbol entry or an intermediate operation we use a pointer to a single global instance of the type, stored in a hash set. This way, checking two types for equality is a direct pointer comparison and avoids the cost and complexity of a deep comparison.
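The idea of keeping one shared instance per type can be sketched like this. The `Type`, `TypeHash`, and `TypeContext` names are assumptions made for the example and do not reflect the classes in types.hpp; only three kinds of type are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical, heavily reduced type representation.
struct Type {
    enum class Kind { Int, Char, Pointer } kind;
    bool is_const;
    const Type* pointee;  // already-interned pointee for Pointer, otherwise nullptr

    // Shallow comparison is enough: the pointee is itself interned, so
    // comparing its address already compares the whole nested type.
    bool operator==(const Type& o) const {
        return kind == o.kind && is_const == o.is_const && pointee == o.pointee;
    }
};

struct TypeHash {
    std::size_t operator()(const Type& t) const {
        std::size_t h = std::hash<int>()(static_cast<int>(t.kind));
        h = h * 31 + std::hash<bool>()(t.is_const);
        h = h * 31 + std::hash<const Type*>()(t.pointee);
        return h;
    }
};

class TypeContext {
public:
    // Returns the unique stored instance equal to t, inserting it if needed.
    // std::unordered_set keeps element addresses stable across rehashing.
    const Type* intern(const Type& t) {
        assert((t.kind == Type::Kind::Pointer) == (t.pointee != nullptr));
        return &*pool_.insert(t).first;
    }

private:
    std::unordered_set<Type, TypeHash> pool_;
};
```

With this arrangement, two requests for "pointer to int" yield the same address, so `a == b` on the returned pointers answers the equality question without walking either type.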

Expression Type Information

Type information remains useful after semantic analysis, for IR lowering, optimization, and code generation. So while checking each expression for validity, we also attach its intermediate type to the AST data structure for use in later steps.
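A rough sketch of attaching types to expression nodes during checking. `Ty`, `Expr`, `check`, and `make_expr` are hypothetical, and the rule used for `Add` is a deliberately simplified stand-in for C's usual arithmetic conversions. Operand types are computed before the parent's type, since the parent's type depends on them.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Placeholder for a real type representation.
enum class Ty { Unknown, Int, Double };

// Hypothetical expression node with a slot for the type found by sema.
struct Expr {
    enum class Kind { IntLit, DoubleLit, Add } kind;
    std::vector<std::unique_ptr<Expr>> operands;
    Ty type = Ty::Unknown;  // filled in by check()
};

std::unique_ptr<Expr> make_expr(Expr::Kind kind) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    return e;
}

// Computes the expression's type, stores it on the node, and returns it.
Ty check(Expr& e) {
    switch (e.kind) {
    case Expr::Kind::IntLit:
        e.type = Ty::Int;
        break;
    case Expr::Kind::DoubleLit:
        e.type = Ty::Double;
        break;
    case Expr::Kind::Add: {
        assert(e.operands.size() == 2);
        Ty lhs = check(*e.operands[0]);
        Ty rhs = check(*e.operands[1]);
        // Simplified: the result is double if either operand is double.
        e.type = (lhs == Ty::Double || rhs == Ty::Double) ? Ty::Double : Ty::Int;
        break;
    }
    }
    return e.type;
}
```

Later passes can then read `type` straight off each node instead of recomputing it.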
