# C Compiler in Python

### A modern C compiler implementation written in Python


## Overview

This is a complete C compiler implementation written in Python 3 that supports a significant subset of the C11 standard. The compiler generates efficient x86-64 assembly code and includes various optimizations and comprehensive error reporting.

## Features

- **Complete Compilation Pipeline**: Lexical analysis → Parsing → Intermediate representation → Assembly generation
- **C11 Subset Support**: Variables, functions, control flow, arrays, pointers, structs, and more
- **Register Allocation**: Advanced register allocation using graph coloring algorithms
- **Error Reporting**: Detailed compile-time error messages with line numbers and context
- **Cross-platform**: Works on any x86-64 Linux system
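
To make the pipeline in the first bullet concrete, here is a minimal sketch of its first three stages for a toy arithmetic grammar. It is not lin's code: the grammar, the `Parser` class, and the names `tokenize`, `make_il`, and `compile_expr` are all assumptions chosen for illustration, and assembly generation is left out.

```python
import re
from itertools import count

# Hypothetical token pattern for a toy expression language (not lin's lexer).
TOKEN_RE = re.compile(r"\d+|\S")


def tokenize(src):
    """Lexical analysis: turn source text into (kind, value) tokens."""
    tokens = []
    for text in TOKEN_RE.findall(src):
        if text.isdecimal():
            tokens.append(("num", int(text)))
        elif text in "+*()":
            tokens.append(("sym", text))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule.

    expr   := term ('+' term)*
    term   := factor ('*' factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() == ("sym", "+"):
            self.advance()
            node = ("add", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("sym", "*"):
            self.advance()
            node = ("mul", node, self.factor())
        return node

    def factor(self):
        kind, value = self.advance()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("sym", "("):
            node = self.expr()
            if self.advance() != ("sym", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def make_il(node, il, temps):
    """Flatten the tree into three-address IL commands; return the result name."""
    if node[0] == "num":
        return str(node[1])
    op, left, right = node
    lhs = make_il(left, il, temps)
    rhs = make_il(right, il, temps)
    dest = f"t{next(temps)}"
    il.append((op, dest, lhs, rhs))
    return dest


def compile_expr(src):
    """Run the toy pipeline: source text -> tokens -> tree -> flat IL."""
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing tokens")
    il = []
    result = make_il(tree, il, count())
    return il, result
```

Calling `compile_expr("1 + 2 * (3 + 4)")` yields three IL commands, with the parenthesized addition emitted first and the final value held in the temporary `t2`.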

## Quick Start

### Prerequisites

- Python 3.6 or later
- GNU binutils (as, ld)
- glibc

### Installation & Usage

1. **Clone the repository:**

```bash
git clone https://github.com/linkl0n-B/C-Compiler-Python.git
cd C-Compiler-Python
```

2. **Compile a C program:**

```bash
python3 -m lin.main your_program.c
./out
```

3. **Specify output filename:**

```bash
python3 -m lin.main your_program.c -o my_program
./my_program
```

### Example

Create a simple C program:

```c
// hello.c
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
```

Compile and run:

```bash
python3 -m lin.main hello.c -o hello
./hello
```

## Architecture

### Compiler Stages

1. **Preprocessor**: Handles `#include` directives and comment removal
2. **Lexer**: Tokenizes the input source code
3. **Parser**: Recursive descent parser that builds an Abstract Syntax Tree (AST)
4. **IL Generation**: Converts AST to custom Intermediate Language
5. **Assembly Generation**: Generates x86-64 assembly with register allocation

### Directory Structure

```
lin/                     # Main compiler package
├── main.py             # Entry point
├── lexer.py            # Lexical analyzer
├── parser/             # Parsing logic
├── tree/               # AST node definitions
├── il_cmds/            # Intermediate language commands
├── il_gen.py           # IL generation
├── asm_gen.py          # Assembly generation
└── asm_cmds.py         # Assembly commands

tests/                   # Comprehensive test suite
├── feature_tests/      # Feature-specific tests
├── frontend_tests/     # Parser/lexer tests
└── general_tests/      # Integration tests
```

## Implementation Overview

#### Preprocessor

lin currently has a very limited preprocessor that strips comments and expands `#include` directives. These features are implemented between [`lexer.py`](lin/lexer.py) and [`preproc.py`](lin/preproc.py).

#### Lexer

The lin lexer is implemented primarily in [`lexer.py`](lin/lexer.py). Additionally, [`tokens.py`](lin/tokens.py) contains definitions of the token classes used in the lexer, and [`token_kinds.py`](lin/token_kinds.py) contains instances of recognized keyword and symbol tokens.

#### Parser

The lin parser uses recursive descent techniques for all parsing. It is implemented in [`parser/*.py`](lin/parser/) and creates a parse tree of nodes defined in [`tree/*.py`](lin/tree/).

#### IL generation

lin traverses the parse tree to generate a flat custom IL (intermediate language). The commands for this IL are in [`il_cmds/*.py`](lin/il_cmds/). Objects used for IL generation are in [`il_gen.py`](lin/il_gen.py), but most of the IL generating code is in the `make_code` function of each tree node in [`tree/*.py`](lin/tree/).

#### ASM generation

lin sequentially reads the IL commands, converting each into Intel-format x86-64 assembly code. lin performs register allocation using George and Appel's iterated register coalescing algorithm (see References below). The general ASM generation functionality is in [`asm_gen.py`](lin/asm_gen.py), but much of the ASM generating code is in the `make_asm` function of each IL command in [`il_cmds/*.py`](lin/il_cmds/).

## Contributing

This project is no longer under active development and I'm unlikely to review non-trivial PRs. However:

- If you have a question about lin, the best way to ask is via GitHub Issues. I'll answer when I can, but my response may not be very helpful because it's been a while since I've had time to think about this project.
- If you have a perspective on how lin can be made practically helpful to a group, please open an Issue. I'd love to hear from you, although unfortunately I may not be in a position to implement any changes depending on my other interests and obligations.

## References

- [ShivC](https://github.com/ShivamSarodia/ShivC) - lin is a rewrite from scratch of my old C compiler, ShivC, with much more emphasis on feature completeness and code quality. See the ShivC README for more details.
- C11 Specification - http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- x86_64 ABI - https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf
- Iterated Register Coalescing (George and Appel) - https://www.cs.purdue.edu/homes/hosking/502/george.pdf

## Supported C Features

- **Data Types**: int, char, long, short, unsigned variants, pointers, arrays
- **Control Flow**: if/else, while, for, break, continue, return
- **Functions**: Definition, calls, parameters, recursion
- **Operators**: Arithmetic, logical, comparison, bitwise, assignment
- **Memory**: Dynamic allocation concepts, pointer arithmetic
- **Structures**: Basic struct support

## Testing

Run the comprehensive test suite:
```bash
python3 -m unittest discover tests/ -v
```
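
For orientation, an end-to-end test that `unittest discover` could pick up might look roughly like the sketch below. This is an assumption about shape only, not a copy of anything in `tests/`: the helper `build_compile_command`, the class name, and the sample program are hypothetical. The command line itself is the one from Quick Start, so the test would need to run from the repository root on x86-64 Linux.

```python
import subprocess
import sys
import tempfile
import unittest
from pathlib import Path


def build_compile_command(source, output):
    """Build the Quick Start command line for one source file (hypothetical helper)."""
    return [sys.executable, "-m", "lin.main", str(source), "-o", str(output)]


class ReturnCodeTest(unittest.TestCase):
    """Hypothetical test: compile a tiny program, run it, and check its exit code."""

    C_SOURCE = "int main() { return 3 + 4; }\n"
    EXPECTED_EXIT_CODE = 7

    def test_exit_code(self):
        with tempfile.TemporaryDirectory() as tmp:
            source = Path(tmp) / "prog.c"
            binary = Path(tmp) / "prog"
            source.write_text(self.C_SOURCE)
            # Compile with lin; check=True fails the test if compilation fails.
            subprocess.run(build_compile_command(source, binary), check=True)
            # Run the produced executable and compare its exit status.
            result = subprocess.run([str(binary)])
            self.assertEqual(result.returncode, self.EXPECTED_EXIT_CODE)
```

Checking the exit code keeps the test independent of standard output, so it exercises only arithmetic and `return`.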

## Contributing

This project welcomes contributions! Areas for improvement:

- Additional C language features
- Optimization passes
- Better error messages
- Performance improvements
- Documentation

## Technical Details

- **Target Architecture**: x86-64 Linux
- **Calling Convention**: System V AMD64 ABI
- **Register Allocation**: Graph coloring with coalescing
- **Intermediate Representation**: Custom IL optimized for code generation
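
As a rough illustration of what graph coloring means for register allocation, the sketch below assigns registers with a simple greedy coloring. It is deliberately simplified and is not the allocator used here: it does no coalescing, it only marks spills rather than handling them, and the function name, the interference graph, and the register list are assumptions made for the example.

```python
def color_registers(interference, registers):
    """Greedy graph coloring.

    Each variable gets the first register not used by an interfering
    neighbor, or None (a spill) when no register is free.
    """
    # Color the most constrained (highest-degree) variables first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    assignment = {}
    for var in order:
        # Registers already taken by neighbors that have been colored.
        taken = {assignment.get(neighbor) for neighbor in interference[var]}
        free = [reg for reg in registers if reg not in taken]
        assignment[var] = free[0] if free else None
    return assignment


# Hypothetical interference graph: an edge means two values are live at once.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(color_registers(graph, ["rax", "rcx", "rdx"]))
```

With three registers every variable is colored, and `d` can share a register with `b` because the two never interfere. With only two registers, `c` is left as `None`, which a real allocator would handle by spilling to the stack.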

## License

This project is released under the MIT License. See LICENSE file for details.

## Contact

For questions, suggestions, or support:
- 📧 Email: lincolncommercialsolutions@gmail.com
- 📱 Telegram: linkl0n

---

*Built with Python • Targets x86-64 Linux • MIT Licensed*
