This project serves as the abstract syntax tree for scratch-code
. It is able to represent basic structures of imperative programming languages, mainly focused on C.
scratch-code-ast
makes heavy use of object-orientation. This makes it possible to generalize instances easily. To specialize them again, while avoiding object slicing, the project uses std::shared_ptr
in almost every case. This also has the advantage that objects, which often need to be referenced multiple times, do not need to be copied expensively.
When you look at the file include/AST.hpp
, you can see the class hierarchy of scratch-code-ast
. In the following, there is a inheritance graph kindly generated by Doxygen and Graphviz DOT.
As you can see, the ast::Node
class is the ancestor of all other classes. Descriptions all of the classes are:
Class name | Description |
---|---|
Node | is the ancestor of all other classes and contains the parent and id |
Statement | bundles class instances which are able to appear in the parsed source code |
VariableDefinition | defines a variable, consisting of a type and a name |
FunctionDefinition | defines a function, consisting of a return type, name, argument names and types and statement list |
Value | bundles class instances which can be used as a single value |
LValue | means locatable value, i.e. references a ast::VariableDefinition which has a specific and user-defined point in memory |
RValue | is the opposite of ast::LValue , i.e. you do not really know where this value is actually stored as it is temporary |
RValueValue | represents a simple literal, either bool (true or false ), int (e.g. 42 ), real (e.g. 3.14159 ) or string (e.g. "Example string" ) |
FunctionCall | represents calling a function previously defined by FunctionDefinition , is an RValue , as a returned value has this property |
Operation | bundles unary and binary operations |
UnaryOperation | combines a ast::Value with an unary operator (see below) |
BinaryOperation | combines two instances of ast::Value , left hand side and right hand side, with a binary operator (see below) |
ControlFlowStatement | bundles statements which are able to change the flow of execution, i.e. make it possible for it to have multiple paths |
Conditional | represents if [else if]* [else]? statements, i.e. conditional objects which consist of multiple conditions (if and else if) and a final alternative (else) |
ControllableLoop | bundles loops which are controllable by a ast::LoopControlStatement |
ForLoop | a classic for loop, consisting of an initialization, condition, afterthought and statements |
WhileLoop | a classic while loop, consisting of a condition and statements |
LoopControlStatement | can control a ast::ControllableLoop with break or continue |
ReturnStatement | belongs to a ast::FunctionDefintion and makes this function return the specified value, or nothing for void |
StatementList | is essentially just an array of ast::Statement , used to integrate it better into the ast namespace |
VariableDefinitionList | an array of ast::VariableDefinition |
ValueList | an array of ast::Value |
In the strongly-typed enumeration ast::Lexer::ParsedVariableType
in include/LexerTokenDefinitions.hpp
, there are multiple variable types defined. scratch-code-ast
has very strict typing, so these are the only types available.
Type name | Default value | Description |
---|---|---|
void | (none) | can only be used for function return types, not for variables. Indicates that the function returns no value |
bool | false |
a Boolean value, i.e. has only two states: true or false |
int | 0 |
an integer, which has no specific value range and may be positive or negative |
real | 0.0 |
a real value, i.e. a floating point value |
string | "" |
a list of characters which forms a text |
scratch-code-ast
relies on the concept of lvalues and rvalues. Every ast::Value
results in one of these types, which are called value categories in the code. The difference between these two value categories may be hard to grasp at the beginning, but you will soon notice the necessity of this concept.
In general, you can imagine lvalues as values that have a name. When you define a variable and call it by its name later in your code, you get an lvalue. This was already mentioned in the short ast::LValue
class description above.
On the other hand, an rvalue is a temporary value. It cannot be accessed again after its statement is finished.
For another explanation, look at a simple assignment like a = 42;
(of course a
has been defined before). Here, a
is an lvalue which references the previously defined variable a
. Thus, you can assign something to it, in this case 42
. Next, look at the incorrect statement a+3 = 123;
. This makes no sense, as the left hand side of the assignment operator is now an rvalue, because a+3
is just temporary. Thus, you can only assign to lvalues. That's why this concept is essential for scratch-code-ast
.
In the file include/AST.hpp
you can see different hexadecimal numbers as a comment besides the class predeclarations. These are the ids used by scratch-code-ast
. When you look at any class definition (let's take ast::LValue
as an example), you always have the static member static const int uniqueId;
and something like const int LValue::uniqueId = 0x00001311;
in the final implementation. These ids are, of course, unique for a class and follow a specific pattern.
Reading the ids is done from right to left. The left-most hexadecimal digit shows the class's index on the very first level. As ast::Node
is the ancestor of all other classes, it has the id 0x00000001
. On the next level, there are the classes ast::Statement
with 0x00000011
, ast::StatementList
with 0x00000021
, ast::VariableDefinitionList
with 0x00000031
and ast::ValueList
with 0x00000041
. As you can see, ast::Node
has exactly four direct children and their ids are set by their index. As they all are on the second level, the changed digit is the second one from the right.
It's the same with the children of ast::Statement
. As they are on the third level, only the third digit from the right is changed.
As you can see, this way it is possible to directly get the ancestors of a class when you only have the id. When you get an id like 0x00022411
, you directly know that the class that has it is derived from ast::Node
, ast::Statement
, ast::ControlFlowStatement
, ast::ControllableLoop
and finally is an ast::WhileLoop
.
In scratch-code-ast
, you may convert any type to any other type. The following table shows what happens during the different kinds of typecasts.
To | |||||
bool | int | real | string | ||
From | bool | (none) | 1 when true, 0 when false | 1.0 when true, 0.0 when false | "true" when true, "false" when false |
int | false when 0, true otherwise | (none) | value is kept | decimal value as text | |
float | false when 0.0, true otherwise | value is floored to the next integer | (none) | decimal value as text | |
string | true when "true", false otherwise | try to parse as int, 0 on failure | try to parse as real, 0 on failure | (none) |
There are some things to say about different kinds of operators.
- First of all, the logical ones:
LogicalNot
,LogicalAnd
andLogicalOr
. When you apply these to a value, this value is always implicitly casted tobool
beforehand, i.e.if(!a)
meansif(!((bool)a))
. As you can see in the Allowed input types for unary operators below, a typecast to bool is possible for all types different fromvoid
, i.e. all types that imply a value. - Next, comparison operators like
LessThan
,LessThanOrEqual
,GreaterThan
,GreaterThanOrEqual
are evaluated by implicitly casting the values toint
, except thatreal
is kept as it is. - The binary
Add
operator is not only applicable to numeric types (i.e.int
andreal
, but also tostring
. In this case, it means string concatenation. - The difference between the prefix and postfix increment and decrement operators is rather hard to get, but if you would really like to understand it, look at this Stack Overflow answer (it is not by me though).
To decrease the width of the following tables, valcat is used instead of value category, lhs instead of left hand side and rhs instead of right hand side.
Another strongly-typed enumeration in include/LexerTokenDefinitions.hpp
is ParsedUnaryOperation
. It contains the operators used for the ast::UnaryOperation
class. They all have almost the same meaning in C, making a separate explanation not necessary. The input valcat and output valcat column contents were extracted from the functions ast::Lexer::getRequiredValueCategory
and ast::Lexer::getResultingValueCategory
defined in src/LexerTokenDefinitions.cpp
. The contents of the output types column were taken from the ast::Lexer::getResultingType
function from the same file. The position is given relative to the input value, e.g. for a LogicalNot, ~a
is correct, while a~
is not.
Name | Symbol | Position | Allowed input types | Input valcat | Output valcat | Output type |
---|---|---|---|---|---|---|
LogicalNot | ! |
before | bool, int, real, string | (any) | rvalue | bool |
BitwiseNot | ~ |
before | bool, int | (any) | rvalue | (input type kept) |
PrefixIncrement | ++ |
before | int, real | lvalue | lvalue | (input type kept) |
PrefixDecrement | -- |
before | int, real | lvalue | lvalue | (input type kept) |
PostfixIncrement | ++ |
after | int, real | lvalue | rvalue | (input type kept) |
PostfixDecrement | -- |
after | int, real | lvalue | rvalue | (input type kept) |
UnaryPlus | + |
before | int, real | (any) | rvalue | (input type kept) |
UnaryMinus | - |
before | int, real | (any) | rvalue | (input type kept) |
TypecastBool | (bool) |
before | bool, int, real, string | (any) | rvalue | bool |
TypecastInt | (int) |
before | bool, int, real, string | (any) | rvalue | int |
TypecastReal | (real) |
before | bool, int, real, string | (any) | rvalue | real |
TypecastString | (string) |
before | bool, int, real, string | (any) | rvalue | string |
Similarly to unary operators, binary operators are defined in the strongly-typed enumeration ParsedBinaryOperation
in include/LexerTokenDefinitions.hpp
as well. They are used for the ast::BinaryOperation
class. Again, there is no difference to C. The value category and type information also comes from the same file. Note that, for an ast::BinaryOperation
, the types of both values need to be the same. If they are not, that's what typecasts are for.
Name | Symbol | Allowed input types | Lhs input valcat | Rhs input valcat | Output valcat | Output type |
---|---|---|---|---|---|---|
Add | + |
int, real, string | (any) | (any) | rvalue | (input type kept) |
Subtract | - |
int, real | (any) | (any) | rvalue | (input type kept) |
Multiply | * |
int, real | (any) | (any) | rvalue | (input type kept) |
Divide | / |
int, real | (any) | (any) | rvalue | (input type kept) |
Modulo | % |
int, real | (any) | (any) | rvalue | (input type kept) |
BitwiseAnd | & |
bool, int | (any) | (any) | rvalue | (input type kept) |
BitwiseOr | ` | ` | bool, int | (any) | (any) | rvalue |
BitwiseXor | ^ |
bool, int | (any) | (any) | rvalue | (input type kept) |
BitshiftLeft | << |
int | (any) | (any) | rvalue | (input type kept) |
BitshiftRight | >> |
int | (any) | (any) | rvalue | (input type kept) |
Assignment | = |
bool, int, real, string | lvalue | (any) | lvalue | (input type kept) |
AddAssignment | += |
int, real, string | lvalue | (any) | lvalue | (input type kept) |
SubtractAssignment | -= |
int, real | lvalue | (any) | lvalue | (input type kept) |
MultiplyAssignment | *= |
int, real | lvalue | (any) | lvalue | (input type kept) |
DivideAssignment | /= |
int, real | lvalue | (any) | lvalue | (input type kept) |
ModuloAssignment | %= |
int, real | lvalue | (any) | lvalue | (input type kept) |
BitwiseAndAssignment | &= |
bool, int | lvalue | (any) | lvalue | (input type kept) |
BitwiseOrAssignment | ` | =` | bool, int | lvalue | (any) | lvalue |
BitwiseXorAssignment | ^= |
bool, int | lvalue | (any) | lvalue | (input type kept) |
BitshiftLeftAssignment | <<= |
int | lvalue | (any) | lvalue | (input type kept) |
BitshiftRightAssignment | >>= |
int | lvalue | (any) | lvalue | (input type kept) |
LogicalAnd | && |
bool, int, real, string | (any) | (any) | rvalue | bool |
LogicalOr | ` | ` | bool, int, real, string | (any) | (any) | |
LessThan | < |
bool, int, real, string | (any) | (any) | rvalue | bool |
LessThanOrEqual | <= |
bool, int, real, string | (any) | (any) | rvalue | bool |
GreaterThan | > |
bool, int, real, string | (any) | (any) | rvalue | bool |
GreaterThanOrEqual | >= |
bool, int, real, string | (any) | (any) | rvalue | bool |
Equal | == |
bool, int, real, string | (any) | (any) | rvalue | bool |
NotEqual | != |
bool, int, real, string | (any) | (any) | rvalue | bool |
Just like in C, operators in scratch-code-ast
have precedence. For example, because multiplication has a higher precedence (i.e. a lower number in the following table) than addition, 3+4*5
is evaluated like 3+(4*5)
and not (3+4)*5
. This project's precendence is mainly taken from this cppreference page.
Precedence | Symbols | Names | Associativity |
---|---|---|---|
1 | ++ -- |
PostfixIncrement, PostfixDecrement | left-to-right |
2 | ++ -- + - ! ~ (bool) (int) (real) (string) |
PrefixIncrement, PrefixDecrement UnaryPlus, UnaryMinus LogicalNot, BitwiseNot TypecastBool, TypecastInt, TypecastReal, TypecastString |
right-to-left |
3 | * / % |
Multiply, Divide, Modulo | left-to-right |
4 | + - |
Add, Subtract | left-to-right |
5 | << >> |
BitshiftLeft, BitshiftRight | left-to-right |
6 | < <= > >= |
LessThan, LessThanOrEqual GreaterThan, GreaterThanOrEqual |
left-to-right |
7 | == != |
Equal, NotEqual | left-to-right |
8 | & |
BitwiseAnd | left-to-right |
9 | ^ |
BitwiseXor | left-to-right |
10 | ` | ` | BitwiseOr |
11 | && |
LogicalAnd | left-to-right |
12 | ` | ` | |
13 | = += -= *= /= %= <<= >>= &= ^= ` |
=` | Assignment AddAssignment, SubtractAssignment MultiplyAssignment, DivideAssignment, ModuloAssignment BitshiftLeftAssignment, BitshiftRightAssignment BitwiseAndAssignment, BitwiseXorAssignment, BitwiseOrAssignment |