THIS COMPILER IS A WORK IN PROGRESS! ANYTHING CAN CHANGE AT ANY MOMENT WITHOUT ANY NOTICE! USE THIS COMPILER AT YOUR OWN RISK!
- - - - - - - - - - - - - - - - - - - - - - - - C o o o o o
Reinventing the wheel (idiom): "Waste a great deal of time or effort in creating something that already exists."
A small, self-contained C compiler written from scratch in C++ for x86-64 GNU/Linux platforms.
The wheelcc C compiler supports a large subset of C17 (International Standard ISO/IEC 9899:2018), for which it has it's own built-in preprocessor, frontend, IR and backend. It emits x86-64 AT&T assembly for GNU/Linux, which is then linked with gcc/ld. The project is entirely written in C++17, and builds to a standalone executable plus a driver in bash. wheelcc is overall designed after Nora Sandler's Writing a C Compiler, and was tested against it's test suite.
- Get the repo, cd to the bin directory
$ git clone --depth 1 --branch master https://github.com/romainducrocq/wheelcc.git
$ cd wheelcc/bin/
- Configure the repo and install the build/runtime dependencies:
gcc g++ make cmake
$ ./configure.sh
- Build the compiler in Release mode
$ ./make.sh
- Install the
wheelcc
command system-wide (creates a symlink to the driver in/usr/local/bin/
)Or as an alternative, do not install and use
bin/driver.sh
instead
$ ./install.sh
$ . ~/.bashrc
With file main.c
int puts(char* c);
int main(void) {
puts("Hello, world!");
return 0;
}
- Compile and run
$ wheelcc main.c
$ ./main
Hello, World!
- Usage
Note: Except for one source file to compile, all other command-line arguments are optional.
However, the order of arguments passed matters: they are parsed only in this order, any other order will fail!
$ wheelcc --help
Usage: wheelcc [Help] [Debug] [Preprocess] [Include] [Link] [Linkdir] [Linklib] [Output] FILES
[Help]:
--help print help and exit
[Debug]:
-v enable verbose mode
(Test/Debug build only):
--lex print lexing stage and exit
--parse print parsing stage and exit
--validate print semantic stage and exit
--tacky print interm stage and exit
--codegen print assembly stage and exit
--codeemit print emission stage and exit
[Preprocess]:
-E enable macro expansion with gcc
[Include]:
-I<includedir> add a list of paths to include path
[Link]:
-S compile, but do not assemble and link
-c compile and assemble, but do not link
[Linkdir]:
-L<linkdir> add a list of paths to link path
[Linklib]:
-l<libname> link with a list of library files
[Output]:
-o <file> write the output into <file>
FILES: list of .c files to compile
Compile errors output messages with file, line and explanation to stderr:
int main(void) {
int i = { 1 };
return 0;
}
$ wheelcc main.c
/home/user/wheelcc/main.c:2:
error: (no. 547) cannot initialize scalar type ‘int’ with compound initializer
at line 2: int i = { 1 };
wheelcc: error: compilation failed
- cd to the test directory, get the testtime dependencies:
diffutils valgrind
$ cd test/
$ ./get-dependencies.sh
- Test the compiler
$ ./test-compiler.sh
- Test the preprocessor
$ ./test-preprocessor.sh
- Test memory leaks
$ ./test-memory.sh
The latest master branch of wheelcc is tested on these distributions (x86-64):
Debian GNU/Linux | Linux Mint | Ubuntu | openSUSE Leap | Rocky Linux | Arch Linux | EndeavourOS |
---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
wheelcc has a minimal built-in preprocessor that supports include
header directives and comments (singleline and multiline). By default, included files are searched in the same directory as the source file currently being compiled, but other directories to search for can be added to the include path with the -I
option. Other directives, like pragmas, are ignored and stripped out.
The preprocessor does not natively support macros, but macro expansion can be enabled with the -E
command-line option, which falls back on preprocessing with gcc.
wheelcc compiles a list of C source files to x86-64 AT&T GNU/Linux assembly (see Implementation Reference section for a list of supported C language features). (TBD, it is planned to support fasm x86-64 Intel GNU/Linux assembly as an alternative backend output.)
The -S
command-line option can be used to output the assembly without linking, and the -c
option to create an object file instead of an executable. Otherwise, it creates an executable located next to the first source file and with the same name without the extension, or with the name set with the -o
command-line option.
wheelcc also has comprehensive compile error handling, and outputs error messages with the file, line and explanation for the compile error to stderr.
(TBD)
wheelcc has no built-in linker: gcc/ld is used to link the assembly outputed by the compiler. It complies with the System-V ABI, such that libraries already compiled with gcc or other compilers can be linked with the -L
and -l
command-line options and used at runtime in a program compiled by wheelcc. This also allows to link the C standard library APIs which signatures are compatible with the current implementation of wheelcc.
(TBD, it is planned to support fasm as an alternative linker to produce very small executables.)
(TBD, an experimental standard library is planned in the future, with at least support for the compiler tests.)
wheelcc is mostly self-contained and aims to be as less bloated as possible. It simply depends on the C and C++ standard libraries and some header-only dependencies (ctre, boost::regex, tinydir). The build/runtime only requires bash, gcc/g++ and cmake, which makes the compiler easy to build and use on any x86-64 GNU/Linux platform.
wheelcc is implemented entirely in a restricted subset of C++17 with a C-like procedural design/style. Each compilation stage is a single translation unit, with state context data grouped into structures and modified by local functions. The code effectively resembles mostly C plus some C++ sugar for:
- smart pointers with reference counting to manage the lifetime of AST datatypes.
- single inheritance/polymorphism to emulate pattern matching on algebraic datatypes.
- standard containers, collections, string manipulation and move semantics.
Very few other C++ features are used, and only when doing so provides a real advantage (template, constexpr, ...). This is to make the code cleaner without adding too much complexity.
wheelcc supports a large subset of the C17 language, but many features of the language are still not implemented. These include, but are not limited to: enum data structures, variable-length arrays, const types, typedefs, function pointers, non-ascii characters, and float, short, auto, volatile, inline, register and restrict keywords. Any of these may or may not be implemented in the future. As such, wheelcc can not compile the C standard library and is not intended to be used as a production C compiler.
This is what the wheelcc compiler supports of C17 so far. For code examples, see:
test/tests/compiler/
test/tests/preprocessor/
Preprocessor
Language features
- Return, integer constants
- Unary arithmetic operators
- Binary arithmetic operators
- Bitwise arithmetic operators
- Logical and relational operators
- Local variables, assignments
- Compound assignments, increment operators
- If statements, conditional expressions
- Goto statements, labels
- If if-else else compound statements
- While, do while and for loops
- Switch statements
- Functions
- File-scope variables, static and extern storage-class specifiers
Types
- Integers, long integers
- Signed and unsigned integers
- Double floating-point numbers, nan
- Pointers
- Fixed-sized arrays, pointer arithmetic
- Characters, strings literals, ascii
- Void, support dynamic memory allocation
- Structures
- Unions
<param> ::= { <type-specifier> }+ <declarator>
<simple-declarator> ::= <identifier> | "(" <declarator> ")"
<type-specifier> ::= "int" | "long" | "unsigned" | "signed" | "double" | "char" | "void"
| ("struct" | "union") <identifier>
<specifier> ::= <type-specifier> | "static" | "extern"
<block> ::= "{" { <block-item> } "}"
<block-item> ::= <statement> | <declaration>
<initializer> ::= <exp> | "{" <initializer> { "," <initializer> } [ "," ] "}"
<for-init> ::= <variable-declaration> | [ <exp> ] ";"
<statement> ::= "return" [ <exp> ] ";" | <exp> ";"
| "if" "(" <exp> ")" <statement> [ "else" <statement> ] | "goto" <identifier> ";"
| <identifier> ":" | <block> | "break" ";" | "continue" ";"
| "while" "(" <exp> ")" <statement> | "do" <statement> "while" "(" <exp> ")" ";"
| "for" "(" <for-init> [ <exp> ] ";" [ <exp> ] ")" <statement>
| "switch" "(" <exp> ")" <statement> | "case" <const> ":" <statement>
| "default" ":" <statement> | ";"
<exp> ::= <cast-exp> | <exp> <binop> <exp> | <exp> "?" <exp> ":" <exp>
<cast-exp> ::= "(" <type-name> ")" <cast-exp> | <unary-exp>
<unary-exp> ::= <unop> <cast-exp> | "sizeof" <unary-exp> | "sizeof" "(" <type-name> ")"
| <postfix-exp>
<type-name> ::= { <type-specifier> }+ [ <abstract-declarator> ]
<postfix-exp> ::= <primary-exp> { <postfix-op> }
<postfix-op> ::= "[" <exp> "]" | "." <identifier> | "->" <identifier>
<primary-exp> ::= <const> | <identifier> | "(" <exp> ")" | { <string> }+
| <identifier> "(" [ <argument-list> ] ")"
<argument-list> ::= <exp> { "," <exp> }
<abstract-declarator> ::= "*" [ <abstract-declarator> ] | <direct-abstract-declarator>
<direct-abstract-declarator> ::= "(" <abstract-declarator> ")" { "[" <const> "]" }
| { "[" <const> "]" }+
<unop> ::= "-" | "~" | "!" | "*" | "&" | "++" | "--"
<binop> ::= "-" | "+" | "*" | "/" | "%" | "&" | "|" | "^" | "<<" | ">>" | "&&" | "||" | "=="
| "!=" | "<" | "<=" | ">" | ">=" | "=" | "-=" | "+=" | "*=" | "/=" | "%=" | "&="
| "|=" | "^=" | "<<=" | ">>="
<const> ::= <int> | <long> | <uint> | <ulong> | <double> | <char>
<identifier> ::= ? An identifier token ?
<string> ::= ? A string token ?
<int> ::= ? An int token ?
<char> ::= ? A char token ?
<long> ::= ? An int or long token ?
<uint> ::= ? An unsigned int token ?
<ulong> ::= ? An unsigned int or unsigned long token ?
<double> ::= ? A floating-point constant token ?
@romainducrocq