### A modern C compiler implementation written in Python

This is a complete C compiler written in Python 3 that supports a significant subset of the C11 standard. It generates x86-64 assembly code, allocates registers by graph coloring, and reports compile-time errors with line numbers and context.
## Features

- **Complete compilation pipeline**: Lexical analysis → Parsing → Intermediate representation → Assembly generation
- **C11 subset support**: Variables, functions, control flow, arrays, pointers, structs, and more
- **Register allocation**: Graph-coloring register allocation (George and Appel's iterated register coalescing)
- **Error reporting**: Detailed compile-time error messages with line numbers and context
- **Platform**: Runs on x86-64 Linux systems

## Requirements

- Python 3.6 or later
- GNU binutils (`as`, `ld`)
- glibc
## Usage

1. **Clone the repository:**

   ```bash
   git clone https://github.com/linkl0n-B/C-Compiler-Python.git
   cd C-Compiler-Python
   ```

2. **Compile a C program:**

   ```bash
   python3 -m lin.main your_program.c
   ./out
   ```

3. **Specify the output filename:**

   ```bash
   python3 -m lin.main your_program.c -o my_program
   ./my_program
   ```
### Example

Create a simple C program:

```c
// hello.c
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```

Compile and run:

```bash
python3 -m lin.main hello.c -o hello
./hello
```
## Architecture

### Compiler Stages

1. **Preprocessor**: Handles `#include` directives and comment removal
2. **Lexer**: Tokenizes the input source code
3. **Parser**: Recursive descent parser that builds an Abstract Syntax Tree (AST)
4. **IL Generation**: Converts the AST to a custom Intermediate Language (IL)
5. **Assembly Generation**: Generates x86-64 assembly with register allocation

### Directory Structure

```
lin/                  # Main compiler package
├── main.py           # Entry point
├── lexer.py          # Lexical analyzer
├── parser/           # Parsing logic
├── tree/             # AST node definitions
├── il_cmds/          # Intermediate language commands
├── il_gen.py         # IL generation
├── asm_gen.py        # Assembly generation
└── asm_cmds.py       # Assembly commands

tests/                # Comprehensive test suite
├── feature_tests/    # Feature-specific tests
├── frontend_tests/   # Parser/lexer tests
└── general_tests/    # Integration tests
```

## Implementation Overview

### Preprocessor

lin currently has a very limited preprocessor that strips comments and expands `#include` directives. These features are split between [`lexer.py`](lin/lexer.py) and [`preproc.py`](lin/preproc.py).

### Lexer

The lin lexer is implemented primarily in [`lexer.py`](lin/lexer.py). Additionally, [`tokens.py`](lin/tokens.py) defines the token classes used by the lexer, and [`token_kinds.py`](lin/token_kinds.py) contains instances of the recognized keyword and symbol tokens.

### Parser

The lin parser uses recursive descent for all parsing. It is implemented in [`parser/*.py`](lin/parser/) and builds a parse tree of the nodes defined in [`tree/*.py`](lin/tree/).

### IL generation

lin traverses the parse tree to generate a flat, custom IL (intermediate language). The IL commands are defined in [`il_cmds/*.py`](lin/il_cmds/). Objects used for IL generation are in [`il_gen.py`](lin/il_gen.py), but most of the IL-generating code is in the `make_code` function of each tree node in [`tree/*.py`](lin/tree/).

### ASM generation

lin reads the IL commands sequentially, converting each into Intel-format x86-64 assembly code. It performs register allocation using George and Appel's iterated register coalescing algorithm (see References below). The general ASM generation functionality is in [`asm_gen.py`](lin/asm_gen.py), but much of the ASM-generating code is in the `make_asm` function of each IL command in [`il_cmds/*.py`](lin/il_cmds/).

## References

- [ShivC](https://github.com/ShivamSarodia/ShivC) - lin is a from-scratch rewrite of the older ShivC compiler, with much more emphasis on feature completeness and code quality. See the ShivC README for more details.
- C11 Specification - http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- x86-64 ABI - https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf
- Iterated Register Coalescing (George and Appel) - https://www.cs.purdue.edu/homes/hosking/502/george.pdf
## Supported C Features
- **Data Types**: int, char, long, short, unsigned variants, pointers, arrays
- **Control Flow**: if/else, while, for, break, continue, return
- **Functions**: Definition, calls, parameters, recursion
- **Operators**: Arithmetic, logical, comparison, bitwise, assignment
- **Memory**: Dynamic allocation concepts, pointer arithmetic
- **Structures**: Basic struct support
## Testing
Run the comprehensive test suite:
```bash
python3 -m unittest discover tests/ -v
```
## Contributing
This project welcomes contributions! Areas for improvement:
- Additional C language features
- Optimization passes
- Better error messages
- Performance improvements
- Documentation
## Technical Details
- **Target Architecture**: x86-64 Linux
- **Calling Convention**: System V AMD64 ABI
- **Register Allocation**: Graph coloring with coalescing
- **Intermediate Representation**: Custom IL optimized for code generation
## License
This project is released under the MIT License. See the LICENSE file for details.
## Contact
For questions, suggestions, or support:
- 📧 Email: lincolncommercialsolutions@gmail.com
- 📱 Telegram: linkl0n
---
*Built with Python • Targets x86-64 Linux • MIT Licensed*