We estimate π using a 2D Monte Carlo approach with NumPy vectorization and chunked random generation, enabling runs with 100,000,000 points on typical machines. Using float32 instead of float64 reduces memory bandwidth and accelerates the computation without impacting the Monte Carlo accuracy.
- N = 100,000,000
- NumPy (float32, chunked): X.XXX s
- Baseline (previous method): Y.YYY s
- Speedup: baseline / NumPy = Z.ZZ×
Replace the numbers above with your actual timings from
bench_pi_numpy.py.
- Vectorization: Replace Python loops with NumPy array ops (
x*x + y*y <= 1). - Chunked streaming: Process data in configurable chunks to keep peak memory low and maintain cache friendliness.
float32data: Halves memory traffic compared tofloat64, often dominating performance for random-number-heavy code.- Modern RNG: Use
np.random.default_rng(PCG64) for fast, high-quality random streams.
| Method | Samples | Time (s) | π Estimate | Speedup vs Pure Python |
|---|---|---|---|---|
| Pure Python (loop) | 100M | ~60.0 | 3.14159 | 1× |
| NumPy Vectorized | 100M | 2.35 | 3.14172 | ~25× faster |
Speedup Strategy
- Used NumPy vectorization (
x*x + y*y <= 1.0) instead of Python loops. - Generated random numbers in chunks of 10 million for memory efficiency.
- Used
float32to reduce memory bandwidth. - Used
timeitfor reliable timing instead of%timemagic.