Empirical analysis of how Polymarket's multi-market architecture distributes volume, creates ghost markets, and impacts information precision.
Dataset: 36,777 events containing 190,783 individual binary contracts across 6 categories (Politics, Crypto, Sports, Finance, Culture, Weather). Full Polymarket history July 2022 - March 2026 via the Gamma API.
- Top 5 markets capture 90% of event volume regardless of how many markets exist. The top 3 capture 94% at the median across 16,856 events.
- Ghost market rate scales with N, the number of markets in an event. Events with 11-20 markets: 36% ghosts. Events with 51+: 72%. The functional market count plateaus at 10-15 regardless of total N.
- The fixed $0.01 tick creates a structural friction gradient. Tail contracts (price <$0.10) face 111% median rounding tax vs 1.6% for contracts near $0.50.
- N is growing over time, but the gap between claimed resolution and used resolution widens with N.
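
As a rough illustration of how the three headline metrics above could be computed for a single event, here is a minimal sketch. It is not the code from notebook 02. The definitions are assumptions: a "ghost" is taken to be a market whose volume is at or below a hypothetical `ghost_threshold`, and the rounding tax is taken to be one tick divided by the contract price. The notebook's own definitions may differ, so the numbers below will not reproduce the reported medians. The function names and the sample volumes are made up for the example.

```python
from typing import Sequence

TICK = 0.01  # fixed tick size in dollars, as stated in the findings


def top_k_share(volumes: Sequence[float], k: int) -> float:
    """Fraction of an event's total volume captured by its k largest markets."""
    total = sum(volumes)
    if total <= 0:
        return 0.0
    return sum(sorted(volumes, reverse=True)[:k]) / total


def ghost_rate(volumes: Sequence[float], ghost_threshold: float = 0.0) -> float:
    """Share of markets with volume at or below ghost_threshold (assumed definition)."""
    if not volumes:
        return 0.0
    ghosts = sum(1 for v in volumes if v <= ghost_threshold)
    return ghosts / len(volumes)


def rounding_tax(price: float, tick: float = TICK) -> float:
    """One tick expressed as a fraction of the contract price (assumed definition)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return tick / price


# Hypothetical event with 8 markets (volumes in dollars), 3 of them untraded.
event_volumes = [50_000, 30_000, 12_000, 5_000, 3_000, 0, 0, 0]

print(f"top-3 share:  {top_k_share(event_volumes, 3):.2%}")
print(f"ghost rate:   {ghost_rate(event_volumes):.2%}")
print(f"tax at $0.05: {rounding_tax(0.05):.1%}")
print(f"tax at $0.50: {rounding_tax(0.50):.1%}")
```

Under these assumed definitions the tax falls as price rises because the tick is fixed while the price it is measured against grows, which is the friction gradient described above.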
pip install -r requirements.txt
- notebooks/01-data-collection.ipynb - Fetches all event/market metadata from the Gamma API. First run takes ~5-10 minutes (API pagination). Subsequent runs load from cache (data/events_metadata.parquet).
- notebooks/02-discretisation-cost-analysis.ipynb - All analysis: N distribution, concentration ratios, cumulative volume curves, ghost market rates, rounding tax, and table/chart generation.
Run in order. Notebook 02 depends on the cached data from notebook 01.
The data/ directory is gitignored (parquet files are large). Running notebook 01 regenerates it from the Gamma API. No API keys required.
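
The fetch-once, then load-from-cache behaviour described above can be sketched as follows. This is an illustration only, not the code in src/. `fetch_page`, `fetch_all`, and `load_or_fetch` are hypothetical names, the page fetcher is a stand-in that reads from an in-memory list rather than calling the Gamma API, and the cache is written as JSON instead of parquet so the example needs only the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Page = List[Dict]


def fetch_all(fetch_page: Callable[[int, int], Page], page_size: int = 100) -> Page:
    """Collect all records by advancing an offset until a short or empty page arrives."""
    records: Page = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # last page reached
            break
        offset += page_size
    return records


def load_or_fetch(cache_path: Path, fetch_page: Callable[[int, int], Page],
                  page_size: int = 100) -> Page:
    """Return cached records if the cache file exists, else fetch and write the cache."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    records = fetch_all(fetch_page, page_size)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(records))
    return records


# Stand-in for the remote API: 250 fake events served in pages.
FAKE_EVENTS = [{"id": i} for i in range(250)]
calls: List[int] = []  # offsets requested, to show when the "API" is hit


def fake_fetch_page(offset: int, limit: int) -> Page:
    calls.append(offset)
    return FAKE_EVENTS[offset:offset + limit]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data" / "events_metadata.json"
    first = load_or_fetch(cache, fake_fetch_page)   # paginates and writes the cache
    second = load_or_fetch(cache, fake_fetch_page)  # reads the cache, no fetch

print(len(first), len(second), calls)
```

Deleting the cache file forces a fresh fetch on the next run, which mirrors how removing data/ causes notebook 01 to regenerate it.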
notebooks/ Jupyter notebooks (run in order)
src/ Python modules (fetch, config)
output/ Charts and table PNGs (committed)
data/ Cached parquet files (gitignored, regenerated by notebook 01)
- Saguillo et al. (2025), "Unravelling the Probabilistic Forest" - $40M arbitrage from Polymarket's fragmented binary structure
- Capponi et al. (2025), "Semantic Trading Clusters" - leader-follower trading across isolated correlated markets
Analysis by functionSPACE. We're building a prediction market primitive where traders express beliefs as continuous probability distributions rather than binary positions.