Real-time options pricing and risk engine: a C++17 Black-Scholes / Monte Carlo core with analytic Greeks, exposed to Python via pybind11, GPU-accelerated on Apple Metal, and streamed over WebSocket to a live Next.js dashboard.
```
            ┌──────────────────────────────┐
            │      Next.js dashboard       │
            │  (live prices, Greeks, P&L)  │
            └───────────────┬──────────────┘
                            │ WebSocket (p99 < 5 ms, localhost)
            ┌───────────────┴──────────────┐
            │   FastAPI + uvicorn server   │
            │    (server/ws_server.py)     │
            └───────────────┬──────────────┘
                            │ pybind11 (GIL released around C++ calls)
┌───────────────────────────┴───────────────────────────┐
│              C++17 pricing core (core/)               │
│  Black-Scholes · analytic Greeks · Monte Carlo (GBM)  │
├───────────────────────────┬───────────────────────────┤
│ CPU: SIMD (Accelerate     │ GPU: Apple Metal          │
│ vForce) + 8-thread MC     │ Philox 4x32-10 PRNG,      │
│ 3.8–4.1x vs NumPy         │ up to 69x vs NumPy        │
└───────────────────────────┴───────────────────────────┘
```
The Python layer (python/) holds the benchmark harnesses, market-data validation, and the VaR backtest.
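For reference, the closed-form pricing the core implements can be sketched in pure Python. This is not the C++ code in core/; it is a minimal, dividend-free sketch assuming the textbook Black-Scholes formulas, and the function names (`black_scholes`, `fd_delta`) are hypothetical. It mirrors the two checks the acceptance gate describes: prices against a Hull textbook example, and analytic delta against a central finite difference.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_scholes(S, K, r, sigma, T, call=True):
    """European option price plus analytic delta, gamma, vega (no dividends)."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    disc = math.exp(-r * T)
    if call:
        price = S * norm_cdf(d1) - K * disc * norm_cdf(d2)
        delta = norm_cdf(d1)
    else:
        price = K * disc * norm_cdf(-d2) - S * norm_cdf(-d1)
        delta = norm_cdf(d1) - 1.0
    gamma = norm_pdf(d1) / (S * sigma * sqrt_t)   # same for calls and puts
    vega = S * norm_pdf(d1) * sqrt_t              # per 1.00 change in sigma
    return {"price": price, "delta": delta, "gamma": gamma, "vega": vega}


def fd_delta(S, K, r, sigma, T, call=True, h=1e-4):
    """Central finite-difference delta, used to cross-check the analytic one."""
    up = black_scholes(S + h, K, r, sigma, T, call)["price"]
    down = black_scholes(S - h, K, r, sigma, T, call)["price"]
    return (up - down) / (2.0 * h)


# Hull-style example: S=42, K=40, r=10%, sigma=20%, T=0.5 years
c = black_scholes(42.0, 40.0, 0.10, 0.20, 0.5, call=True)
p = black_scholes(42.0, 40.0, 0.10, 0.20, 0.5, call=False)
print(round(c["price"], 2), round(p["price"], 2))  # → 4.76 0.81
```

Put-call parity (call minus put equals S minus the discounted strike) is a cheap extra check on the two branches.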
All numbers measured on an Apple M3 MacBook Air. The baseline is vectorized NumPy (PCG64 generator, fully vectorized batch pricing) — not a Python for-loop, so the speedups are against a competent baseline. GPU timings include the full round trip: host parameter write, Metal command encoding, GPU dispatch, and host readback/reduction — transfer overhead is not excluded.
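The exact harness in python/benchmark_v2.py is not reproduced here. As an assumed illustration of what a "fully vectorized NumPy" Monte Carlo baseline looks like, the sketch below prices a European call from one terminal GBM draw per path, using `np.random.default_rng` (whose default bit generator is PCG64). The single-step terminal draw, the best-of-N timing, and the function names are assumptions, not the repository's method.

```python
import math
import time

import numpy as np


def numpy_mc_call(S, K, r, sigma, T, n_paths, seed=0):
    """Vectorized European-call MC from one terminal GBM draw per path."""
    rng = np.random.default_rng(seed)   # default bit generator is PCG64
    z = rng.standard_normal(n_paths)
    s_t = S * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff = math.exp(-r * T) * np.maximum(s_t - K, 0.0)
    # Price estimate and its standard error.
    return float(payoff.mean()), float(payoff.std(ddof=1) / math.sqrt(n_paths))


def best_of(fn, *args, repeats=5):
    """Run fn several times; return the last result and the best wall time."""
    best, result = float("inf"), None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return result, best


(price, stderr), seconds = best_of(numpy_mc_call, 100.0, 100.0, 0.05, 0.2, 1.0, 1_000_000)
print(f"price {price:.4f} +/- {stderr:.4f}, best of 5: {seconds * 1e3:.1f} ms")
```

A speedup figure is then the baseline's best time divided by the accelerated path's best time, measured the same way.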
| Workload | Paths | Speedup vs vectorized NumPy |
|---|---|---|
| Monte Carlo, Apple Metal GPU | 10,000,000 | 69x |
| Monte Carlo, Apple Metal GPU | 1,000,000 | 23.5x |
| Monte Carlo, CPU (8 threads + SIMD) | — | 4.1x |
| Black-Scholes batch, CPU (Accelerate vForce SIMD) | — | 3.8x |
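One way to read the two GPU rows together is a simple cost model, which is an assumption rather than anything the benchmarks measure: NumPy time grows linearly with paths (t = b·n), while the GPU pays a fixed dispatch cost plus a per-path cost (t = c + g·n). Then 1/speedup = (c/b)/n + g/b, and the 1M and 10M rows pin down both constants. The function names are hypothetical.

```python
def fit_dispatch_model(n1, s1, n2, s2):
    """Fit a fixed-plus-linear GPU cost model to two (paths, speedup) points.

    Assumed model: t_numpy(n) = b*n and t_gpu(n) = c + g*n, so
    1/speedup(n) = (c/b)/n + g/b, which is linear in 1/n.
    Returns (c/b, g/b): the fixed cost in NumPy path-equivalents and the
    per-path cost ratio.
    """
    inv1, inv2 = 1.0 / s1, 1.0 / s2
    fixed = (inv1 - inv2) / (1.0 / n1 - 1.0 / n2)
    per_path = inv1 - fixed / n1
    return fixed, per_path


def predicted_speedup(n, fixed, per_path):
    """Speedup vs NumPy implied by the fitted model at n paths."""
    return 1.0 / (fixed / n + per_path)


# Fit to the two GPU rows of the table (1M -> 23.5x, 10M -> 69x).
fixed, per_path = fit_dispatch_model(1_000_000, 23.5, 10_000_000, 69.0)
print(f"fixed cost ~{fixed:,.0f} NumPy path-equivalents")
print(f"asymptotic speedup ~{1.0 / per_path:.0f}x")
print(f"implied speedup at 100k paths ~{predicted_speedup(100_000, fixed, per_path):.1f}x")
```

With the table's numbers, the fit gives a fixed cost of roughly 31,000 NumPy path-equivalents and an asymptotic speedup near 88x, and it implies about 3.1x at 100k paths. That is below the 4.1x CPU figure, consistent with preferring the CPU path at small workloads, although the table does not state the CPU path count and a two-point fit is only a rough consistency check.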
The GPU advantage grows with path count; at small workloads (100k paths) the fixed dispatch cost dominates and the CPU path is the right choice. The GPU kernel uses the Philox 4x32-10 counter-based PRNG: every random number is a pure function of a (key, counter) pair, so giving each GPU thread a distinct counter range yields non-overlapping streams with no per-thread generator state. Sequential PRNGs such as mt19937 give no such guarantee when split across threads by seeding each copy differently. The kernel computes in float32; the host reduces the partial sums in float64 to limit accumulation error.
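A pure-Python sketch of the counter-based idea, assuming a counter layout of (path index, 0, 0, 0) and a key taken from a 64-bit seed. The repository's actual Metal kernel, counter layout, and normal-sampling method are not shown here; Box-Muller and the helper names are illustrative choices. Payoffs and per-thread running sums are rounded to float32 to mimic a float32 kernel, and the "host" sums the partials in float64.

```python
import math
import struct

M0, M1 = 0xD2511F53, 0xCD9E8D57   # Philox 4x32 round multipliers
W0, W1 = 0x9E3779B9, 0xBB67AE85   # Weyl constants added to the key each round
MASK = 0xFFFFFFFF


def philox4x32_10(ctr, key):
    """Philox 4x32-10: a keyed bijection from a 128-bit counter to 4 words."""
    c0, c1, c2, c3 = ctr
    k0, k1 = key
    for rnd in range(10):
        if rnd:  # bump the key before every round after the first
            k0, k1 = (k0 + W0) & MASK, (k1 + W1) & MASK
        p0, p1 = M0 * c0, M1 * c2   # full 64-bit products
        c0, c1, c2, c3 = ((p1 >> 32) ^ c1 ^ k0, p1 & MASK,
                          (p0 >> 32) ^ c3 ^ k1, p0 & MASK)
    return c0, c1, c2, c3


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def path_normal(path, seed):
    """Standard normal for one path, computed statelessly from (seed, path)."""
    x0, x1, _, _ = philox4x32_10((path & MASK, (path >> 32) & MASK, 0, 0),
                                 (seed & MASK, (seed >> 32) & MASK))
    u1 = ((x0 >> 8) + 1) * 2.0**-24   # (0, 1], safe for log
    u2 = (x1 >> 8) * 2.0**-24         # [0, 1)
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)  # Box-Muller


def mc_call(S, K, r, sigma, T, n_paths, n_threads, seed=2024):
    """European call; each simulated thread takes a strided slice of paths."""
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    partials = []
    for tid in range(n_threads):
        acc = 0.0
        for path in range(tid, n_paths, n_threads):
            s_t = S * math.exp(drift + vol * path_normal(path, seed))
            acc = to_f32(acc + to_f32(max(s_t - K, 0.0)))   # float32 running sum
        partials.append(acc)
    total = sum(partials)   # host-side reduction in float64
    return math.exp(-r * T) * total / n_paths


print(mc_call(100.0, 100.0, 0.05, 0.2, 1.0, 16_000, n_threads=4))
```

Because the stream is addressed by path index rather than by thread, rerunning with a different thread count changes only the float32 rounding of the partial sums, not the random numbers. The round function, multipliers, and key increments follow the published Philox 4x32 definition from Random123.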
Streaming latency: p99 under 5 ms (measured 4.4 ms) end-to-end through the FastAPI WebSocket layer — localhost loopback, 5 concurrent clients, measured by server/latency_harness.py.
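The section does not say how the harness turns per-message timings into p50/p95/p99. One common convention is the nearest-rank percentile over all samples pooled across clients; the sketch below assumes that convention, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples, p):
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """p50/p95/p99 over round-trip times pooled from all clients."""
    return {f"p{p}": percentile_nearest_rank(latencies_ms, p) for p in (50, 95, 99)}


print(latency_summary([float(ms) for ms in range(1, 101)]))
# → {'p50': 50.0, 'p95': 95.0, 'p99': 99.0}
```

With nearest rank, p99 is simply the maximum until there are at least 100 samples, so a p99 figure is only meaningful over a run with many more messages than that.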
VaR backtest: historical 95% VaR, backtested on 851 trading days of real multi-asset market data. Observed breach rate: 4.5%, against the 5% expected for a correctly calibrated 95% VaR. The goal is a breach rate near the nominal 5%, not far below it: materially higher would mean the model understates risk, and materially lower would mean it overstates risk and ties up capital. Over 851 days, a 5% rate implies about 42.6 expected breaches with a binomial standard deviation of about 6.4; 4.5% is roughly 38 breaches, less than one standard deviation below expectation, so the gap is within sampling error.
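A stdlib-only sketch of the backtest mechanics, under stated assumptions: a 250-day trailing window, a simple lower-order-statistic quantile, and a breach defined as a next-day return below minus the VaR. The repository's window length, quantile convention, and portfolio aggregation are not given here, and all names are hypothetical. It adds the Kupiec proportion-of-failures test, a standard way to turn "within sampling error" into a likelihood-ratio statistic compared against the chi-square(1) critical value 3.841.

```python
import math
import random

CHI2_1DF_95 = 3.841   # 95% critical value of chi-square with 1 degree of freedom


def historical_var(window, alpha=0.95):
    """One-day historical-simulation VaR as a positive loss (simple quantile)."""
    ordered = sorted(window)
    k = int((1.0 - alpha) * len(ordered))   # lower-tail order statistic
    return -ordered[k]


def backtest(returns, window=250, alpha=0.95):
    """Count days whose return fell below minus the VaR from the prior window."""
    breaches = days = 0
    for t in range(window, len(returns)):
        var = historical_var(returns[t - window:t], alpha)
        days += 1
        if returns[t] < -var:
            breaches += 1
    return breaches, days


def kupiec_pof(breaches, days, p=0.05):
    """Kupiec proportion-of-failures likelihood ratio (~chi-square(1) under H0)."""
    x, n = breaches, days

    def loglik(q):
        total = 0.0
        if x:
            total += x * math.log(q)
        if n - x:
            total += (n - x) * math.log(1.0 - q)
        return total

    return -2.0 * (loglik(p) - loglik(x / n))


# Synthetic i.i.d. returns just to exercise the code path (not market data).
rng = random.Random(7)
rets = [rng.gauss(0.0, 0.01) for _ in range(1_250)]
b, d = backtest(rets)
print(f"{b} breaches in {d} days, LR = {kupiec_pof(b, d):.2f}")

# Illustrative count: ~38 breaches is 4.5% of 851 days.
lr = kupiec_pof(38, 851)
print(f"LR = {lr:.2f}, reject at 95%: {lr > CHI2_1DF_95}")
```

The exact breach count is not stated; plugging in 38 as an illustrative count gives an LR of about 0.53, well below 3.841, so the test would not reject a correctly calibrated 95% VaR.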
Reproduce:

```bash
python python/benchmark_v2.py      # CPU SIMD + multithreaded MC vs NumPy
python python/phase6_gate.py       # GPU MC at 100k / 1M / 10M paths
python python/phase3_gate.py       # VaR backtest + market-data validation
python server/latency_harness.py   # WebSocket p50/p95/p99 latency
```

Requirements: Apple Silicon Mac (Accelerate/NEON for SIMD, Metal for GPU), CMake >= 3.21, a C++17 compiler, Python 3.9+, Node.js >= 18.17.
```bash
# 1. Build the C++ core, tests, and Python bindings
python3 -m pip install pybind11 numpy scipy yfinance fastapi uvicorn websockets
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DPython3_EXECUTABLE="$(which python3)"
cmake --build build --parallel

# 2. Run the C++ acceptance gate (BS prices vs Hull, Greeks analytic-vs-FD,
#    MC convergence)
./build/tests/phase1_validation

# 3. Start the WebSocket server (runs in the foreground)
python server/ws_server.py

# 4. Start the dashboard (in a second terminal, from the repo root)
cd dashboard && npm install && npm run dev

# 5. End-to-end tests (from the repo root; Playwright starts the server +
#    dashboard itself)
cd dashboard && npx playwright test
```

Layout:

```
core/        C++17 pricing library: Black-Scholes, Monte Carlo, Greeks;
             Metal GPU kernel in core/src/monte_carlo_gpu.mm
bindings/    pybind11 bindings (GIL released around C++ compute)
python/      Benchmarks, market-data validation, VaR backtest
server/      FastAPI + uvicorn WebSocket server, latency harness
dashboard/   Next.js dashboard + Playwright end-to-end tests
tests/       C++ acceptance gate (BS prices, Greeks, MC convergence)
```
MIT — see LICENSE.