A compiled Python dialect that eliminates core sources of CPython overhead — without sacrificing Python's syntax or readability.
This is a work in progress. The syntax may still change as we find more things to improve.
Many projects try to make Python faster. This one is unique in two ways: it extends Python with many new features that improve speed, and it is a drop-in replacement for Python. You can replace python with tython and start optimizing the places where you know speed is an issue, while keeping the Python syntax. This makes migration a breeze.
TurboPython addresses a number of independent axes of CPython overhead. Each has its own opt-in syntax. See docs/turbopython_syntax.md for the full language reference.
Type annotations are enforced at compile time, not ignored like Python hints. The compiler emits unboxed native arithmetic — no object headers, no dynamic dispatch.
@native
def distance(x: float, y: float) -> float:
    return (x**2 + y**2) ** 0.5
# Emits: double distance(double x, double y) { return sqrt(x*x + y*y); }

Unannotated functions fall back to normal CPython; typing is opt-in.
A new struct keyword creates stack-allocated, contiguous-memory types with no heap allocation, no refcount, no GC overhead.
struct Vec3:
    x: float
    y: float
    z: float

points: array[Vec3, 1000]  # 24,000 bytes, one contiguous block
# vs. Python list: 1000 pointers + 1000 heap objects

Memory comparison: Vec3 = 24 bytes. Equivalent Python object ≈ 200+ bytes.
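The 24-byte figure follows from three 64-bit doubles laid out back to back. As a rough cross-check from plain CPython, the sketch below models the same layout with ctypes. It is only an analogy for the memory arithmetic, not how TurboPython implements struct; the Vec3 class and Points type here are ctypes stand-ins introduced for illustration.

```python
import ctypes


class Vec3(ctypes.Structure):
    # Three C doubles, mirroring the x/y/z float fields of the struct above.
    _fields_ = [
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
        ("z", ctypes.c_double),
    ]


# A fixed-length array type: 1000 Vec3 values in one contiguous block.
Points = Vec3 * 1000

vec3_size = ctypes.sizeof(Vec3)      # 3 fields * 8 bytes each
points_size = ctypes.sizeof(Points)  # 1000 elements * 24 bytes each

print(vec3_size, points_size)  # 24 24000
```

A Python list of 1000 ordinary objects stores only pointers in the list itself, with each object allocated separately, which is where the per-object overhead comes from.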
Rust-inspired ownership eliminates refcount overhead on the hot path. Three modes, all opt-in:
def consume(data: owned list[int]) -> int:  # caller transfers ownership; source invalidated
    return sum(data)

def analyze(data: ref list[int]) -> float:  # immutable borrow: zero-cost, no refcount
    return sum(data) / len(data)

def normalize(data: mut ref list[float]):  # exclusive mutable borrow: no data races
    total = sum(data)
    for i in range(len(data)):
        data[i] /= total

The compiler enforces: multiple ref borrows are fine; mut ref is exclusive, and any overlap is a compile error.
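To make the borrow rule concrete, here is a small plain-Python sketch that checks the same rule at runtime. This is an illustration only: TurboPython is described as enforcing the rule statically at compile time, and the BorrowTracker class and its method names are hypothetical, not part of the project.

```python
class BorrowTracker:
    """Runtime stand-in for the rule: many `ref` borrows OR one `mut ref`."""

    def __init__(self):
        self.shared = 0          # number of live `ref` borrows
        self.exclusive = False   # True while a `mut ref` borrow is live

    def borrow_ref(self):
        # Shared borrows may coexist, but not with an exclusive one.
        if self.exclusive:
            raise RuntimeError("cannot take ref while mut ref is live")
        self.shared += 1

    def borrow_mut(self):
        # An exclusive borrow must not overlap with any other borrow.
        if self.exclusive or self.shared:
            raise RuntimeError("mut ref must be exclusive")
        self.exclusive = True

    def release_ref(self):
        self.shared -= 1

    def release_mut(self):
        self.exclusive = False


tracker = BorrowTracker()
tracker.borrow_ref()
tracker.borrow_ref()      # fine: two shared borrows overlap
try:
    tracker.borrow_mut()  # rejected: shared borrows are still live
except RuntimeError as exc:
    print("rejected:", exc)
```

A static checker reaches the same verdict by analysing the source, so the rejected case never makes it into a compiled program.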
Explicit control over what compiles to native code:
@native  # AOT compile: all types resolved at compile time
def fib(n: int) -> int: ...

@inline  # static inline in C: zero-overhead small helpers
def clamp(val: float, lo: float, hi: float) -> float:
    return min(max(val, lo), hi)

@jit  # JIT compile on first call, specialize on observed types
def flexible_sum(items): ...

const MAX_ITER: int = 256  # compile-time constant, embedded in binary

parallel for partitions loop iterations across all cores via OpenMP. The ownership checker statically guarantees no data races, so no locks are needed.
parallel for y in range(height):
    for x in range(width):
        pixels[y * width + x] = mandelbrot(cx, cy, 256)
# Emits: #pragma omp parallel for

spawn launches concurrent tasks with typed channels as the only communication mechanism:

spawn filter_stage(data, chan_filtered)
spawn transform_stage(chan_filtered, chan_results)

int128 maps to GCC's __int128, giving a range of ±1.7 × 10³⁸, suitable for large combinatorics and cryptographic primitives without external dependencies.
@native
def fib(n: int) -> int128:
    a: int128 = 0
    b: int128 = 1
    for i in range(n):
        tmp: int128 = a + b
        a = b
        b = tmp
    return a

| Concept | Standard Python | TurboPython |
|---|---|---|
| Type enforcement | Advisory hints | Compile-time enforced |
| Memory layout | class (heap, dict-backed) | struct (stack, packed, no GC) |
| Integer range | Arbitrary precision (slow) | int (64-bit) or int128 (128-bit) |
| Ownership | Refcounted, implicit sharing | owned, ref, mut ref |
| Compilation | Interpreted bytecode | @native, @jit, @inline |
| Constants | Convention (UPPER_CASE) | const (compile-time evaluated) |
| Parallelism | threading (GIL-bound) | parallel for, spawn, Channel |
| Typed arrays | list (boxed, pointer array) | array[T, N] (contiguous, unboxed) |
Requirements: Python 3.10+, GCC with OpenMP support.
git clone https://github.com/ribalba/TurboPython.git
cd TurboPython
python -m turbopython.cli --help

No pip install needed; just run from the repo root.
# benchmarks/hello.tpy
@native
def fib(n: int) -> int:
    a: int = 0
    b: int = 1
    for i in range(n):
        tmp: int = a + b
        a = b
        b = tmp
    return a

@native
def main() -> int:
    return fib(40)

python -m turbopython.cli compile benchmarks/hello.tpy --exe

Output:
✓ Compilation successful
C source: benchmarks/hello.c
Executable: benchmarks/hello
Run it like any native binary:
./benchmarks/hello
time ./benchmarks/hello

python -m turbopython.cli compile benchmarks/hello.tpy

import ctypes
lib = ctypes.CDLL("./benchmarks/hello.so")
lib.fib.argtypes = [ctypes.c_int64]
lib.fib.restype = ctypes.c_int64
print(lib.fib(40))  # 102334155, computed in native code

./tython benchmarks/hello.tpy # compiles + runs main()
./tython myscript.py # runs .py with import hook active

The name has two references. Turbo comes from Turbo Pascal, Borland's legendary 1980s compiler that made Pascal fast enough to write real software on a home computer, in part by making compilation itself instant. The parallel is intentional: TurboPython aims to make Python fast enough for systems-level work without leaving the language behind. Python is there because it stays Python: same syntax, same stdlib, same feel.
The short name is tython, which is also the name of the drop-in runner. It is also the name of a planet (#lightsaber).
python benchmarks/bench_mandelbrot.py
python benchmarks/bench_hello.py

Expected output (numbers vary by machine):
Mandelbrot (400×300, 256 iterations):
Pure Python : 0.263s (checksum: 3303274)
Compiling mandelbrot.tpy... done
TurboPython : 0.007s (checksum: 3303274)
Speedup : 37.1x faster
Fibonacci (fib_sum(150) × 5000 reps, int128):
Pure Python : 1.842s (result: ...)
Compiling hello.tpy... done
TurboPython : 0.031s (result: ...)
Speedup : 59.4x faster
All types must be fully resolved at compile time. Emitted as a C symbol with unboxed arithmetic.
@native
def mandelbrot(cx: float, cy: float, max_iter: int) -> int:
    zx: float = 0.0
    zy: float = 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return i
        tx: float = zx * zx - zy * zy + cx
        zy = 2.0 * zx * zy + cy
        zx = tx
    return max_iter

Emitted as static inline in C. Best for small math helpers.
@inline
def vec3_dot(a: Vec3, b: Vec3) -> float:
    return a.x * b.x + a.y * b.y + a.z * b.z

Stack-allocated, copied on assignment, no GC overhead. A Vec3 is exactly 24 bytes, vs 200+ bytes for an equivalent Python object.
struct Vec3:
    x: float
    y: float
    z: float

@native
def vec3_length(v: Vec3) -> float:
    return (v.x * v.x + v.y * v.y + v.z * v.z) ** 0.5

Emits #pragma omp parallel for. The ownership checker enforces that loop bodies do not share mutable state.
parallel for y in range(height):
    for x in range(width):
        cx: float = (x - width / 2.0) / (width / 4.0)
        cy: float = (y - height / 2.0) / (height / 4.0)
        total = total + mandelbrot(cx, cy, max_iter)

Rust-inspired, opt-in. Eliminates refcount overhead on the hot path.
def consume(data: owned list[int]) -> int:  # caller's variable is invalidated
    return sum(data)

def analyze(data: ref list[int]) -> float:  # immutable borrow, zero-cost
    return sum(data) / len(data)

def normalize(data: mut ref list[float]):  # exclusive mutable borrow
    total = sum(data)
    for i in range(len(data)):
        data[i] /= total

const MAX_ITER: int = 1000
const PI: float = 3.14159265358979

# Compile to a .so shared library
python -m turbopython.cli compile examples/mandelbrot.tpy
# Compile with verbose output (shows generated C)
python -m turbopython.cli compile examples/mandelbrot.tpy --verbose
# Compile to a specific output directory
python -m turbopython.cli compile examples/mandelbrot.tpy -o build/
# Produce a standalone executable (requires @native def main() -> int)
python -m turbopython.cli compile benchmarks/hello.tpy --exe
# Specify a custom entry-point function name
python -m turbopython.cli compile benchmarks/hello.tpy --exe --entry run
# Inspect all compilation stages without producing output
python -m turbopython.cli inspect examples/vectors.tpy

inspect prints: original source, preprocessed Python, struct layouts, function signatures with inferred C types, and the full type environment. Useful for debugging codegen.
When --exe is passed:
- Validates that the entry-point function (default: main) exists
- Renames it to __tp_main in the generated C to avoid clashing with C's main
- Appends an int main(int argc, char** argv) wrapper
- Compiles without -shared -fPIC, producing a native executable
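The first three steps amount to a small text transformation on the generated C. The sketch below shows one way this could look, under stated assumptions: the function name add_exe_wrapper, the regex-based rename, and the exact wrapper body are hypothetical and not taken from codegen.py. The final GCC invocation is left out.

```python
import re


def add_exe_wrapper(c_source: str, entry: str = "main") -> str:
    """Hypothetical sketch of the --exe post-processing on generated C text."""
    # Step 1: validate that the entry-point function appears in the source.
    pattern = re.compile(r"\b%s\s*\(" % re.escape(entry))
    if not pattern.search(c_source):
        raise ValueError(f"entry point {entry!r} not found")
    # Step 2: rename it so it cannot clash with C's own main.
    renamed = pattern.sub("__tp_main(", c_source)
    # Step 3: append a C main that forwards to the renamed function
    # (assumed here to use its return value as the process exit code).
    wrapper = (
        "\nint main(int argc, char** argv) {\n"
        "    (void)argc; (void)argv;\n"
        "    return (int)__tp_main();\n"
        "}\n"
    )
    return renamed + wrapper


generated = "int64_t main(void) { return fib(40); }\n"
print(add_exe_wrapper(generated))
```

Renaming before appending the wrapper is what lets a TurboPython function called main coexist with the C runtime's own entry point.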
tython is an executable at the repo root that acts as a Python-aware interpreter for both .py and .tpy files, with the import hook pre-installed.
# Compile and run a .tpy file — calls main() and uses its return as exit code
./tython examples/vectors.tpy
./tython benchmarks/hello.tpy
# Run a .py script — .tpy files on sys.path are importable by name
./tython myscript.py
# Inline command
./tython -c "import vectors; print(vectors.compute_total_distance(100))"
# Interactive REPL with import hook active
./tython

Inside a .py script run via tython, any .tpy file on sys.path imports transparently:
# myscript.py — no special setup needed when run via tython
import vectors
print(vectors.compute_total_distance(1000))

The import hook can also be used in any regular Python script without the tython runner:
from turbopython.importer import install
install()
import vectors # finds vectors.tpy on sys.path, compiles to vectors.so
print(vectors.compute_total_distance(1000))

install() inserts a sys.meta_path finder that:
- Searches sys.path for <module>.tpy when an import cannot find a .py/.pyc
- Compiles the .tpy with the full TurboPython pipeline
- Wraps the resulting .so in a module object with argtypes/restype set automatically from the compiled type signatures
- Returns the module, which the caller uses like any normal Python module
The .so is written next to the .tpy file and reused on subsequent runs.
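For readers unfamiliar with sys.meta_path, the following is a minimal sketch of the finder mechanism using only the standard importlib API. It is not the code in turbopython/importer.py: the names TpyFinder, TpyLoader, and install_sketch are hypothetical, and the compile-and-wrap step is replaced by a placeholder that only records which .tpy file was found.

```python
import importlib.abc
import importlib.machinery
import os
import sys


class TpyLoader(importlib.abc.Loader):
    """Placeholder loader: stands in for 'compile the .tpy and wrap the .so'."""

    def __init__(self, tpy_path):
        self.tpy_path = tpy_path

    def exec_module(self, module):
        # A real hook would compile here and attach ctypes-wrapped functions.
        # This sketch only records the source file that was located.
        module.tpy_source = self.tpy_path


class TpyFinder(importlib.abc.MetaPathFinder):
    """Looks for <module>.tpy in each sys.path entry."""

    def find_spec(self, fullname, path=None, target=None):
        filename = fullname.rpartition(".")[2] + ".tpy"
        for entry in sys.path:
            candidate = os.path.join(entry or os.getcwd(), filename)
            if os.path.isfile(candidate):
                loader = TpyLoader(candidate)
                return importlib.machinery.ModuleSpec(fullname, loader)
        return None  # let the import fail normally


def install_sketch():
    # Appending (rather than prepending) means this finder is consulted
    # only after the standard finders have failed to locate a .py/.pyc.
    if not any(isinstance(f, TpyFinder) for f in sys.meta_path):
        sys.meta_path.append(TpyFinder())
```

After install_sketch() runs, importing a module that exists only as a .tpy file on sys.path yields a module object whose tpy_source attribute holds the path that was found.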
| File | Demonstrates |
|---|---|
| examples/mandelbrot.tpy | @native, typed arithmetic, struct |
| examples/vectors.tpy | struct value types, @inline, @native, main |
| examples/nbody.tpy | parallel for, struct arrays, main |
| benchmarks/hello.tpy | int128, fibonacci, main |
| benchmarks/hello.py | Pure Python equivalent of hello.tpy |
| benchmarks/bench_hello.py | Fibonacci benchmark vs pure Python |
| benchmarks/bench_mandelbrot.py | Mandelbrot benchmark vs pure Python |
TurboPython/
├── README.md
├── tython # Drop-in runner / interpreter
├── turbopython/
│ ├── __init__.py
│ ├── cli.py # Command-line interface
│ ├── compiler.py # Pipeline driver
│ ├── preprocessor.py # Stage 1: syntax → valid Python + metadata
│ ├── type_checker.py # Stage 2: type resolution and validation
│ ├── ownership.py # Stage 3: move/borrow checking
│ ├── codegen.py # Stage 4: C code generation + GCC invocation
│ ├── importer.py # sys.meta_path hook for transparent .tpy imports
│ └── test_compiler.py # Test suite
├── examples/
│ ├── mandelbrot.tpy # Mandelbrot fractal
│ ├── vectors.tpy # 3D vector math
│ └── nbody.tpy # N-body gravitational simulation
├── benchmarks/
│ ├── hello.tpy # int128 fibonacci (compile to .so or executable)
│ ├── hello.py # Pure Python equivalent
│ ├── bench_hello.py # Side-by-side benchmark
│ └── bench_mandelbrot.py # Mandelbrot benchmark
└── docs/
    ├── ARCHITECTURE.md # Detailed pipeline design
    └── turbopython_syntax.md # Full language reference
| TurboPython | C type | Range |
|---|---|---|
| int | int64_t | ±9.2 × 10¹⁸ |
| int128 | __int128 | ±1.7 × 10³⁸ |
| float | double | 64-bit IEEE 754 |
| bool | int | 0 / 1 |
| str | const char* | read-only C string |
| array[float, N] | double* | contiguous heap/stack |
| struct Foo | Foo (typedef'd struct) | stack-allocated value type |
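The integer ranges in the table follow directly from the bit widths, and they set a practical limit on examples such as the Fibonacci benchmark. The plain-Python sketch below checks the arithmetic using Python's arbitrary-precision integers; the helper name largest_fib_index is introduced here for illustration only.

```python
INT64_MAX = 2**63 - 1    # largest value of a signed 64-bit integer
INT128_MAX = 2**127 - 1  # largest value of a signed 128-bit integer


def largest_fib_index(limit: int) -> int:
    """Return the largest n with fib(n) <= limit, where fib(0)=0, fib(1)=1."""
    a, b, n = 0, 1, 0  # invariant: a == fib(n), b == fib(n + 1)
    while b <= limit:
        a, b = b, a + b
        n += 1
    return n


print(f"{INT64_MAX:.1e}")             # 9.2e+18
print(f"{INT128_MAX:.1e}")            # 1.7e+38
print(largest_fib_index(INT64_MAX))   # 92
print(largest_fib_index(INT128_MAX))  # 184
```

In other words, a 64-bit int holds Fibonacci numbers up to index 92 and int128 extends that to index 184; beyond those indices a fixed-width result would overflow, whereas standard Python would keep growing the integer.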
- Indentation-based blocks
- def, class, for, if, while, with, return, yield
- List/dict/set comprehensions
- Standard library imports
- Unannotated functions run as normal CPython
The philosophy: opt in to performance where it matters, keep everything else as dynamic and expressive as Python.
- docs/ARCHITECTURE.md — Pipeline internals, stage-by-stage design, production gaps
- docs/turbopython_syntax.md — Full language reference with syntax tables