This project is an implementation of the full C89/C90 C specification for x86-64
linux. It'sis mostly based on Engineering a Compiler 2nd Edition. The compiler is self hosting. It's just a hobby, it won't be big and professional like gcc. All of the C89/C90 specification has been implemented. All tests in SQLite's tcltest
pass. The was assembler can be used to produce object files by setting the AS
environment variable.
$ make wcc
$ ./wcc test.c -o test
The compiler goes through the following phases:
- C Preprocessor
- Hand rolled lexer
- Precedence climbing parser
- Simple arithmetic transformations
- Transformation into SSA
- Data flow analysis and replacement of variables with live ranges
- Live range coalescing
- Some more arithmetic optimizations while converting out of a linear IR into a tree IR
- Instruction selection using tree pattern matching
- Live range coalescing (again)
- Register allocation using top-down graph coloring
- Code generation using tree tiling
Run all tests
$ make test
Test 3 stage self compilation
$ make test-self-compilation
Run benchmarks
$ make run-benchmark
Compile and test sqlite3
$ CC=.../wcc ./configure
$ make tcltest
#include <stdio.h>
typedef struct s1 {
int i, j;
} S1;
typedef struct s2 {
S1 *s1;
} S2;
void main() {
S2 *s2;
s2 = malloc(sizeof(S2));
s2->s1 = malloc(sizeof(S1));
s2->s1->j = 1;
printf("%d\n", s2->s1->j);
}
main:
push %rbp # Function prologue
mov %rsp, %rbp
push %rbx
subq $8, %rsp
movq $8, %rdi # s2 = malloc(sizeof(S2));
callq malloc@PLT
movq %rax, %rbx # rbx = s2
movq $8, %rdi # s2->s1 = malloc(sizeof(S1));
callq malloc@PLT
movq %rax, %rcx
addq $8, %rsp
movq %rcx, %rax
movq %rax, (%rbx)
movq %rbx, %rax # s2->s1->j = 1;
movq (%rax), %rax
movl $1, 4(%rax)
subq $8, %rsp # printf("%d\n", s2->s1->j);
movq %rbx, %rax
movq (%rax), %rax
movl 4(%rax), %esi
leaq .SL0(%rip), %rdi
movb $0, %al # printf is variadic, 0 is the number of floating point arguments
callq printf@PLT
addq $8, %rsp
movq $0, %rax # Function exit code zero
popq %rbx # Function epilogue
leaveq
retq
The project started out as a clone of c4 to teach myself to write it from scratch. I then went down a route based on TCC where I wrote a code generator that outputted an object (.o
) file. It then quickly became clear that generating object code without using an assembler is a waste of time, so I adapted it to produce .s
files and compiled them with gcc. I then proceeded implemeting Sebastian Falbesoner's approach to register allocation. At this point I went academic and started reading Engineering a Compiler 2nd Edition. First SSA transformation, then graph coloring register allocation, then finally instruction selection using tree pattern matching.
After a hiatus I resumed work and fixed a ton of bugs in the instruction selection. I then decided to implement the full C89/C90 specification, starting with the non-preprocessor parts, then the preprocessor.
To validate the compiler can compile something other than itself correctly, I fixed enough bugs to get sqlite3
to compile and pass the tcltest
tests.