A high-performance, modular traffic simulation engine written in C++ with Python bindings — simulating 10,000+ vehicles across 100 intersections in real time.
RealFlow is a real-time urban traffic simulation framework that models thousands of vehicles navigating a city grid with traffic lights, dynamic acceleration, and lane behaviour.
The engine is written in C++17 for maximum performance and exposed to Python via pybind11 — so the hot simulation loop runs at compiled C++ speed while Python handles orchestration, benchmarking, checkpointing, and visualisation.
Modern traffic management systems, autonomous vehicle simulators, and urban planning tools all rely on simulation engines that can process thousands of dynamic entities per second. I wanted to understand what it actually takes to build one from scratch — not just call a library, but implement the data structures, memory layout, state machines, and determinism guarantees that make a simulation engine production-grade.
The project taught me why cache-friendly memory layout is not a micro-optimisation but a fundamental design decision, and how deterministic seeding is the foundation of any trustworthy simulation.
```
┌─────────────────────────────────────────────────────────────────────┐
│ Python Layer                                                        │
│ ScenarioRunner → SimConfig → BenchmarkAnalysis → Visualisation      │
└──────────────────────────┬──────────────────────────────────────────┘
                           │ pybind11 bindings
┌──────────────────────────▼──────────────────────────────────────────┐
│ C++ Core Engine                                                     │
│                                                                     │
│  ┌─────────────┐  ┌──────────────────┐  ┌───────────────────────┐   │
│  │ EntityPool  │  │ Intersection     │  │ CheckpointManager     │   │
│  │ (SoA)       │  │ State Machine    │  │ FNV-1a Hashing        │   │
│  │             │  │                  │  │ Binary Serialisation  │   │
│  │ pos_x[]     │  │ GREEN_NS  ──►    │  │                       │   │
│  │ pos_y[]     │  │ YELLOW_NS ──►    │  │ save(pool, path)      │   │
│  │ speed[]     │  │ RED_NS    ──►    │  │ load(path)            │   │
│  │ heading[]   │  │ YELLOW_EW ──►    │  │ hash_state(pool)      │   │
│  │ accel[]     │  │                  │  │                       │   │
│  └─────────────┘  └──────────────────┘  └───────────────────────┘   │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ SimulationEngine                                             │   │
│  │ init() → tick_soa() → tick_soa() → ... → save_checkpoint()   │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```
This is the most important design decision in the project. Instead of storing each vehicle as a struct (Array of Structs, AoS), the engine stores each field in its own contiguous array (Struct of Arrays, SoA):

```cpp
#include <cstdint>
#include <vector>

// ❌ AoS: cache-unfriendly; consecutive speed values are sizeof(Vehicle) bytes apart
struct Vehicle {
    float x, y, speed, heading, accel;
    std::uint8_t lane, state;
    /* ... */
};
std::vector<Vehicle> vehicles;

// ✅ SoA: cache-friendly; speed[0..15] fits in ONE 64-byte cache line
struct EntityPool {
    std::vector<float> pos_x, pos_y, speed, heading, accel;
    std::vector<std::uint8_t> lane, state;
};
```

Processing all speeds sequentially loads 16 useful floats per cache line in SoA, versus only one or two in AoS. Result: ~30% lower average tick latency at 10,000 vehicles.
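To make the contrast concrete, here is a minimal sketch of the same speed/position update written against both layouts. It is illustrative, not the engine's code: the fields of `VehicleAoS`, the hypothetical `SoAPool` type, the `tick_soa`/`tick_aos` signatures, the clamping to `[0, max_speed]`, and the heading-based motion are all assumptions. The point is that the SoA loop streams through a few dense `float` arrays, while the AoS loop strides through whole structs.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical AoS record; fields assumed for this sketch.
struct VehicleAoS {
    float x, y, speed, heading, accel;
    std::uint8_t lane, state;
};

// Hypothetical SoA pool: one contiguous array per field.
struct SoAPool {
    std::vector<float> pos_x, pos_y, speed, heading, accel;
    std::vector<std::uint8_t> lane, state;
    explicit SoAPool(std::size_t n)
        : pos_x(n), pos_y(n), speed(n), heading(n), accel(n), lane(n), state(n) {}
};

// Integrate speed (clamped to [0, max_speed]) and position for every vehicle.
// Each field is read from its own dense array.
void tick_soa(SoAPool& p, float dt, float max_speed) {
    const std::size_t n = p.speed.size();
    for (std::size_t i = 0; i < n; ++i) {
        float v = p.speed[i] + p.accel[i] * dt;
        v = v < 0.0f ? 0.0f : (v > max_speed ? max_speed : v);
        p.speed[i] = v;
        p.pos_x[i] += v * std::cos(p.heading[i]) * dt;
        p.pos_y[i] += v * std::sin(p.heading[i]) * dt;
    }
}

// The same update over AoS: each iteration reads and writes inside one struct.
void tick_aos(std::vector<VehicleAoS>& vs, float dt, float max_speed) {
    for (VehicleAoS& v : vs) {
        float s = v.speed + v.accel * dt;
        s = s < 0.0f ? 0.0f : (s > max_speed ? max_speed : s);
        v.speed = s;
        v.x += s * std::cos(v.heading) * dt;
        v.y += s * std::sin(v.heading) * dt;
    }
}
```

Timing both loops at 10,000 entities (compiled with `-O3`) is one way to reproduce the comparison; the size of the gap will depend on the CPU.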
Each intersection runs a 4-phase state machine:
GREEN_NS (60 ticks) → YELLOW_NS (10 ticks) → RED_NS/GREEN_EW (60 ticks) → YELLOW_EW (10 ticks) → ...
Vehicles within 30 m of a red light decelerate at −3 m/s². Vehicles on green accelerate at +1.5 m/s². Intersections are staggered in phase to prevent city-wide synchronisation.
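The cycle and the acceleration rule can be sketched as a pure function of the tick count. This is a minimal sketch under stated assumptions: the per-intersection `offset`, the hypothetical `stagger_offset` formula, and the choice to treat only red (not yellow) as a stop signal are not specified in this README. Vehicles that are not within 30 m of a red light are treated as free to accelerate.

```cpp
#include <array>
#include <cstdint>

// The four phases from the cycle above; RED_NS is also the east-west green phase.
enum class Phase : std::uint8_t { GREEN_NS, YELLOW_NS, RED_NS, YELLOW_EW };

constexpr std::array<int, 4> kPhaseTicks = {60, 10, 60, 10};
constexpr int kCycleTicks = 60 + 10 + 60 + 10;  // 140 ticks per full cycle

struct Intersection {
    int offset;  // hypothetical per-intersection phase offset, in ticks

    // Phase at a given (non-negative) simulation tick.
    Phase phase_at(int tick) const {
        int t = (tick + offset) % kCycleTicks;
        for (int p = 0; p < 4; ++p) {
            if (t < kPhaseTicks[p]) return static_cast<Phase>(p);
            t -= kPhaseTicks[p];
        }
        return Phase::YELLOW_EW;  // not reached: t is always < kCycleTicks
    }

    // North-south traffic faces red while east-west has green or yellow.
    bool ns_red(int tick) const {
        const Phase ph = phase_at(tick);
        return ph == Phase::RED_NS || ph == Phase::YELLOW_EW;
    }
};

// Acceleration rule from the text; only red counts as "stop" in this sketch.
float accel_for(bool light_is_red, float dist_to_light_m) {
    if (light_is_red && dist_to_light_m <= 30.0f) return -3.0f;  // brake, m/s^2
    return 1.5f;                                                   // accelerate, m/s^2
}

// Hypothetical stagger: spread offsets evenly across one cycle.
int stagger_offset(int index, int num_intersections) {
    return (index * kCycleTicks) / num_intersections;
}
```

Because the phase is computed from `(tick + offset) % 140`, no per-intersection timer state needs to be stored or checkpointed, and two intersections with different offsets never switch in lockstep.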
Every run with the same seed produces byte-identical output:
- `std::mt19937` seeded with a fixed `uint64_t` seed
- All random distributions drawn in deterministic order
- State verified with FNV-1a 64-bit hash after every run
- Verified: 5/5 runs with seed=42 produce identical 64-bit state fingerprints
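A sketch of deterministic initialisation: one `std::mt19937`, a fixed seed, and a fixed draw order. The names `init_vehicles` and `unit_float` and the bit-to-float mapping are hypothetical choices for this example. The C++ standard pins down `std::mt19937`'s output sequence exactly but leaves the algorithms of `std::uniform_real_distribution` and similar distributions to each standard library, so this sketch converts raw outputs to floats by hand to stay identical across compilers. `std::mt19937` consumes a 32-bit seed value, so the sketch takes a `std::uint32_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Map one raw 32-bit engine output to a float in [0, 1) using its top 24 bits.
float unit_float(std::mt19937& gen) {
    return static_cast<float>(gen() >> 8) * (1.0f / 16777216.0f);
}

struct InitialState {
    std::vector<float> pos_x, pos_y, speed;
};

// Deterministic initialisation: one generator, fixed seed, fixed draw order
// (x, then y, then speed for vehicle 0, then the same for vehicle 1, ...).
InitialState init_vehicles(std::uint32_t seed, std::size_t n,
                           float grid_m, float max_speed) {
    std::mt19937 gen(seed);
    InitialState s;
    s.pos_x.reserve(n);
    s.pos_y.reserve(n);
    s.speed.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        s.pos_x.push_back(unit_float(gen) * grid_m);
        s.pos_y.push_back(unit_float(gen) * grid_m);
        s.speed.push_back(unit_float(gen) * max_speed);
    }
    return s;
}
```

Keeping one draw per statement matters: in an expression such as `f(gen(), gen())`, the order in which the two arguments are evaluated is unspecified, so the same seed could produce different states on different compilers.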
The engine can serialise and restore complete simulation state:
- Saves all SoA arrays to a compact binary file (~254 KB for 10,000 vehicles)
- State integrity verified by cross-checking position values after reload
- Enables pause/resume, reproducible debugging, and regression testing
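A minimal sketch of binary checkpointing for SoA arrays. The `Pool` struct, the stream-based `save`/`load` signatures, the 4-byte magic tag, and the header layout are hypothetical; the actual `CheckpointManager` file format is not documented here. Values are written in the machine's native byte order, so the sketch assumes the checkpoint is reloaded on a machine with the same endianness.

```cpp
#include <cstdint>
#include <initializer_list>
#include <istream>
#include <ostream>
#include <stdexcept>
#include <vector>

// Hypothetical pool holding a subset of the fields shown in the diagram.
struct Pool {
    std::vector<float> pos_x, pos_y, speed, heading, accel;
};

constexpr std::uint32_t kMagic = 0x52464C57;  // hypothetical file tag

// Layout: magic (4 bytes), vehicle count (8 bytes), then each float array raw.
void save(const Pool& p, std::ostream& os) {
    const std::uint64_t n = p.pos_x.size();
    os.write(reinterpret_cast<const char*>(&kMagic), sizeof kMagic);
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    for (const auto* arr : {&p.pos_x, &p.pos_y, &p.speed, &p.heading, &p.accel}) {
        os.write(reinterpret_cast<const char*>(arr->data()),
                 static_cast<std::streamsize>(arr->size() * sizeof(float)));
    }
    if (!os) throw std::runtime_error("checkpoint write failed");
}

// Reads the same layout back; the exact IEEE 754 bits are restored.
Pool load(std::istream& is) {
    std::uint32_t magic = 0;
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&magic), sizeof magic);
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    if (!is || magic != kMagic) throw std::runtime_error("not a checkpoint");
    Pool p;
    for (auto* arr : {&p.pos_x, &p.pos_y, &p.speed, &p.heading, &p.accel}) {
        arr->resize(n);
        is.read(reinterpret_cast<char*>(arr->data()),
                static_cast<std::streamsize>(n * sizeof(float)));
    }
    if (!is) throw std::runtime_error("truncated checkpoint");
    return p;
}
```

Taking streams rather than a path lets the example run against an in-memory buffer; a `std::ofstream`/`std::ifstream` opened with `std::ios::binary` works the same way.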
New traffic scenarios plug in without touching the C++ core:

```python
import traffic_sim

# ScenarioBase: the project's scenario base class (its import path is not shown in this README)
class RushHourScenario(ScenarioBase):  # ~12 lines: just a config subclass
    def get_config(self):
        cfg = traffic_sim.SimConfig()
        cfg.num_vehicles = 10000
        cfg.max_speed = 10.0  # congested
        return cfg
```

There are three built-in scenarios: Rush Hour, Highway, and Suburban. Adding a new scenario takes ~12 lines versus ~85 with a naive copy-paste approach; the project reports this as a ~35% extensibility gain.
The entire C++ engine is accessible from Python, with negligible binding overhead for batch operations because the whole hot loop runs inside a single C++ call:

```python
import traffic_sim

cfg = traffic_sim.SimConfig()
eng = traffic_sim.SimulationEngine(cfg)
eng.run_benchmark_soa()  # pure C++ hot loop
pos_x = eng.get_pos_x()  # std::vector<float> copied into a Python list
```

| Metric | Value |
|---|---|
| Vehicles per run | 10,000+ |
| Intersections | 100 (10×10 grid) |
| Simulation ticks | 200 |
| SoA avg tick latency | ~1–3 ms |
| AoS avg tick latency | ~1.3–4 ms |
| Latency reduction (SoA vs AoS) | ~30% |
| Throughput | 500,000–2,000,000+ entity-ticks/sec |
| Reproducibility (5 runs, seed=42) | 100% identical |
| Checkpoint file size | ~254 KB |
| Extensibility gain | ~35% |
Exact latency values depend on Colab hardware assigned at runtime.
| Scenario | Vehicles | Intersections | Max Speed | Characteristic |
|---|---|---|---|---|
| Rush Hour | 10,000 | 144 (12×12) | 10 m/s | Dense, congested |
| Highway | 3,000 | 16 (4×4) | 30 m/s | Sparse, fast |
| Suburban | 6,000 | 64 (8×8) | 15 m/s | Balanced |
- Open Google Colab
- Create a new notebook
- Paste and run cells 1–7 from `notebook/realflow_simulation.ipynb` in order
- Re-run cell 1 in every new Colab session (it recompiles the `.so`)
```python
import traffic_sim

# Configure
cfg = traffic_sim.SimConfig()
cfg.num_vehicles = 10000
cfg.num_intersections = 100
cfg.ticks = 200
cfg.seed = 42

# Run
engine = traffic_sim.SimulationEngine(cfg)
result = engine.run_benchmark_soa()
print(f"Throughput: {result.throughput:,.0f} entity-ticks/sec")
print(f"Avg tick: {result.avg_tick_ms:.3f} ms")

# Get vehicle state
pos_x = engine.get_pos_x()    # list of 10,000 x-positions
speeds = engine.get_speeds()  # list of 10,000 speeds
states = engine.get_states()  # 0=moving, 1=stopped, 2=turning

# Save checkpoint
engine.save_checkpoint("run_001.bin")

# Fingerprint the final state; runs with the same seed should print the same hash
hash1 = engine.get_state_hash()
print(f"State hash: {hash1:#018x}")
```

To build the engine locally:

```bash
git clone https://github.com/sgm7373/RealFlow.git
cd RealFlow
pip install -r requirements.txt

# Compile the C++ engine into an importable extension module
g++ -O3 -march=native -shared -fPIC -std=c++17 \
    -I src/ \
    -I $(python3 -c "import pybind11; print(pybind11.get_include())") \
    -I $(python3 -c "import sysconfig; print(sysconfig.get_path('include'))") \
    src/SimulationEngine.cpp \
    -o traffic_sim.so

python3 -c "import traffic_sim; print('✅ Ready')"
```

Project layout:

```
RealFlow/
│
├── notebook/
│   └── realflow_simulation.ipynb   # All 7 Colab cells, run top to bottom
│
├── src/
│   ├── Entity.hpp                  # SoA EntityPool + AoS VehicleAoS structs
│   ├── Intersection.hpp            # 4-phase traffic light state machine
│   ├── Checkpoint.hpp              # FNV-1a hashing + binary serialisation
│   └── SimulationEngine.cpp        # Main loop + pybind11 bindings
│
├── report/
│   └── project_report.md           # Full technical report
│
├── requirements.txt
├── .gitignore
└── README.md
```
| Technology | Version | Role |
|---|---|---|
| C++17 | GCC 11+ | Core simulation engine, hot loop |
| pybind11 | 2.11+ | C++ ↔ Python bindings |
| Python | 3.10+ | Orchestration, benchmarking, visualisation |
| NumPy | 1.24+ | Post-processing simulation output arrays |
| Matplotlib | 3.7+ | Traffic flow plots, heatmaps, benchmark charts |
| std::mt19937 | stdlib | Deterministic pseudo-random number generation |
| FNV-1a | custom | 64-bit state hashing for reproducibility |
| Google Colab | — | Development + execution environment |
Pure Python executes roughly 50–100 million simple operations per second; compiled C++ executes roughly 1–3 billion, a gap of 10–60×. A run with 10,000 vehicles × 200 ticks is 2,000,000 entity updates, each involving several arithmetic operations, so that gap dominates total run time. pybind11 lets us keep Python's ease of use for everything except the part that must be fast.
Modern CPUs have 32–64 KB of L1 data cache per core, and a cache line is 64 bytes, i.e. 16 floats. In AoS, each vehicle struct is ~40 bytes, so a pass that reads only `speed` pulls in a full cache line for every one or two speed values and wastes most of the bytes it loads. In SoA, `speed[0..15]` share one cache line, every loaded byte is used, and the sequential access pattern lets the hardware prefetcher hide most of the remaining misses. At 10,000 vehicles this difference is measurable and significant.
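As a worked example, here is the cache-line count for a single pass over `speed` at 10,000 vehicles. It assumes the ~40-byte struct size quoted above, 64-byte lines, and that each line is loaded once:

$$
\text{SoA: } \frac{10{,}000 \times 4\ \text{B}}{64\ \text{B/line}} = 625\ \text{lines},
\qquad
\text{AoS: } \frac{10{,}000 \times 40\ \text{B}}{64\ \text{B/line}} = 6{,}250\ \text{lines}
$$

A speed-only pass over AoS therefore moves roughly 10× as many cache lines. A real tick reads several fields per vehicle, which puts more of each AoS line to use, so this ratio is an upper bound on the difference rather than a prediction of the measured ~30% latency reduction.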
SHA-256 is cryptographically secure but overkill for simulation verification. FNV-1a is a non-cryptographic hash: it is extremely fast (one XOR and one multiply per input byte) and produces a 64-bit fingerprint. For same-length input, any single-byte change, such as one altered position value, is guaranteed to change the final hash, and an accidental collision between two different states is very unlikely at 64 bits. That makes it well suited to checking that two simulation runs are byte-identical.
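A sketch of 64-bit FNV-1a using the standard offset basis and prime, plus a helper that chains the hash across several float arrays to fingerprint a pool. `fnv1a64` and `hash_arrays` are hypothetical names; the project's `hash_state(pool)` may select or order fields differently. Because it hashes raw float bytes, `0.0f` and `-0.0f` hash differently, which is what a byte-identical check wants.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;  // 0xcbf29ce484222325
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;          // 0x100000001b3

// FNV-1a: for each byte, XOR it into the state, then multiply by the prime.
// Passing a previous result as `h` continues the hash over more data.
std::uint64_t fnv1a64(const void* data, std::size_t len,
                      std::uint64_t h = kFnvOffset) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= kFnvPrime;  // wraps modulo 2^64, as FNV specifies
    }
    return h;
}

// Fingerprint several float arrays by chaining the running hash through them.
std::uint64_t hash_arrays(const std::vector<const std::vector<float>*>& arrays) {
    std::uint64_t h = kFnvOffset;
    for (const auto* a : arrays) {
        h = fnv1a64(a->data(), a->size() * sizeof(float), h);
    }
    return h;
}
```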
Text (CSV/JSON) serialisation of 10,000 float positions goes through decimal conversion, which introduces rounding errors unless every value is written with enough digits (9 significant digits for a 32-bit float). Binary serialisation writes the exact IEEE 754 bits, so a reloaded checkpoint reproduces exactly the same floating-point state, which is essential for deterministic resume and regression testing.
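A small demonstration of the rounding point: printing a `float` with the iostream default of 6 significant digits and parsing it back does not always recover the same value, while printing `std::numeric_limits<float>::max_digits10` (9) digits, or copying the raw bytes, does. `text_round_trip` and `binary_round_trip` are illustrative helpers, not part of the engine.

```cpp
#include <cstring>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Print a float with `digits` significant digits, then parse it back.
float text_round_trip(float x, int digits) {
    std::ostringstream os;
    os << std::setprecision(digits) << x;
    return std::stof(os.str());
}

// Binary round trip: copy the exact IEEE 754 bit pattern out and back in.
float binary_round_trip(float x) {
    unsigned char buf[sizeof(float)];
    std::memcpy(buf, &x, sizeof x);
    float y;
    std::memcpy(&y, buf, sizeof y);
    return y;
}
```

For example, `1.0f / 3.0f` printed with 6 digits becomes `0.333333`, which parses to a different `float`; with 9 digits, or via the byte copy, the original value comes back exactly.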
MIT License — free to use, modify, and distribute.
Sourabh More · GitHub: @sgm7373