We predict game results using models trained on historical data, then trade on Polymarket (NHL, NBA, Tennis, etc.) based on those predictions.
Main idea: Collect game data (outcomes, scores, in-game prices) → train models on that data → use the trained models to predict game outcomes and price moves → trade when our predictions disagree with the market.
Current implementation: NHL (Polymarket + ESPN). The same flow—data → train → predict → trade—extends to other sports by adding sport-specific data and market discovery.
- Data — We gather historical game data: who won, scores, period/clock, and Polymarket token prices over time. This is the training set.
- Training — We train ML models on that data: a pre-game model predicts who wins (P(home wins)); an in-game model predicts reward (will price go up if we buy now?) or min/max price in a window.
- Prediction — At prediction time we feed current game state (and prices) into the trained models and get probabilities or price targets.
- Trading — We compare our predictions to Polymarket’s prices and trade when there is enough edge (pre-game), or when the in-game model signals buy low / sell high.
So: all trading is driven by predictions from models that were trained on that same kind of data.
| Mode | Description |
|---|---|
| Pre-game | Trained outcome model predicts P(home wins). We compare to Polymarket prices and place orders when our predicted probability implies an edge. |
| In-game | Trained reward/price model, fed with score, time left, period, and prices, predicts whether buying now will be profitable. We trade when the model signals buy low / sell high. |
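The pre-game comparison above can be sketched in a few lines. This is an illustrative sketch, not the project's actual API: the function name, the `min_edge` default, and treating the token price directly as an implied probability are assumptions.

```python
# Sketch of the pre-game edge check: compare the model's win probability to
# the market's implied probability and act only when the gap exceeds a
# threshold. Names here are illustrative, not the project's real interface.

def pregame_signal(p_home_win: float, market_price: float, min_edge: float = 0.05) -> str:
    """Return 'buy', 'sell', or 'hold' for the home-team YES token.

    market_price is the Polymarket token price in [0, 1], which doubles as
    the market's implied probability of a home win.
    """
    edge = p_home_win - market_price
    if edge > min_edge:
        return "buy"    # model thinks the token is underpriced
    if edge < -min_edge:
        return "sell"   # model thinks the token is overpriced
    return "hold"
```

For example, a model probability of 0.62 against a market price of 0.50 clears a 0.05 edge threshold and signals a buy.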
Features:
- Data pipeline: Game outcomes (ESPN), game records (outcome + events + price series), optional 1s price cache during live games. This data is what we train on.
- Trained models: Pre-game outcome model (P(home wins)); in-game reward model (P(reward), P(loss), optional min/max price and buy/sell-opportunity). All trained on the collected data.
- Strategy: Use model predictions with thresholds or Kelly sizing; storm-loss and oversold rules.
- Execution: Paper trading by default; live orders via Polymarket CLOB when credentials are set.
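The Kelly sizing mentioned in the strategy bullet could look like the following fractional-Kelly sketch for a binary contract. The function name, defaults, and cap are illustrative assumptions, not the project's actual implementation.

```python
# Fractional Kelly sizing for a binary market: a contract bought at `price`
# pays 1 on a win. Full Kelly reduces to f* = (p - price) / (1 - price).
# kelly_fraction scales the bet down; max_position caps exposure.
# All names and defaults here are illustrative.

def kelly_stake(p_win: float, price: float, bankroll: float,
                kelly_fraction: float = 0.25, max_position: float = 100.0) -> float:
    """Dollar stake for buying a YES token at `price` given model P(win)."""
    if price <= 0.0 or price >= 1.0:
        return 0.0
    edge = p_win - price
    if edge <= 0.0:
        return 0.0            # no edge, no bet
    full_kelly = edge / (1.0 - price)
    return min(bankroll * full_kelly * kelly_fraction, max_position)
```

With p_win = 0.60 at price 0.50, full Kelly is 0.20 of bankroll; quarter-Kelly on a $1000 bankroll stakes $50.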
| Layer | Technology |
|---|---|
| Language | Python 3 (type hints, dataclasses) |
| APIs | Polymarket (Gamma API + CLOB via py-clob-client), ESPN (scoreboard, summary, team schedule) |
| Auth / config | python-dotenv (.env), optional Polymarket private key + API creds for live orders |
| ML / training | scikit-learn (LogisticRegression, MLPClassifier, GradientBoosting), numpy, joblib (model serialization) |
| HTTP | requests |
| Other | beautifulsoup4 (Harvitronix Elo scraping), pathlib, concurrent.futures (parallel backtest / data collection) |
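As a minimal example of the scikit-learn + joblib pattern the stack table describes, here is a toy train-and-serialize round trip. The feature columns and file name are made up for the demo and are not the project's actual schema.

```python
# Minimal train-and-serialize round trip with scikit-learn + joblib.
# The feature layout [elo_diff, rest_day_diff] is invented for illustration.
import numpy as np
import joblib
from sklearn.linear_model import LogisticRegression

# Toy pre-game features and labels (1 = home team won)
X = np.array([[120, 1], [-80, 0], [45, -1], [-150, 2], [60, 0], [-30, 1]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
joblib.dump(model, "outcome_model.pkl")      # serialize the trained model

loaded = joblib.load("outcome_model.pkl")    # load it back at prediction time
p_home = loaded.predict_proba([[100, 0]])[0, 1]  # P(home wins)
```

The same dump/load pattern applies to the MLP and gradient-boosting variants; only the estimator class changes.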
- `config.py` — ESPN/Polymarket URLs, thresholds (min edge, Kelly fraction, max position), paper vs live, data collector settings.
- `polymarket_client.py` — CLOB client (public + authenticated), order books, market discovery (e.g. NHL via Gamma `tag_id`).
- `espn_client.py` — Scoreboard, game info, live game state (score, period, clock).
- `model.py` — Elo store, outcome model (P(home wins)); used for pre-game fair price.
- `in_game_strategy.py` — Signal functions: fixed targets, reward-model thresholds, price-range (predicted min/max), dual-token.
- `train_in_game_model.py` — Train reward / loss / buy-sell / price-range models from `in_game_dataset.jsonl`.
- `execution.py` — Paper vs live, position tracking, order validation against order book.
- Python 3.10+
- Dependencies in `requirements.txt`:

```text
requests>=2.28.0
beautifulsoup4>=4.12.0
py-clob-client>=0.20.0
python-dotenv>=1.0.0
scikit-learn>=1.0.0
numpy>=1.20.0
joblib>=1.0.0
```
Install:

```shell
cd polymarket_nhl_bot
pip install -r requirements.txt
```
- Clone / open the project and install dependencies (above).
- Environment (optional but recommended): copy `.env.example` to `.env` and set `PAPER_TRADING=true` (default) or `false` for live orders. For live orders, set `POLYMARKET_PRIVATE_KEY` and optionally `POLYMARKET_FUNDER`, `POLYMARKET_API_KEY`, `POLYMARKET_API_SECRET`.
- Data directories: scripts create `data/`, `data/game_records/`, and `data/polymarket_snapshots/` as needed.
Flow: build the dataset → train the models on that data → backtest and run. Below are the concrete steps.
```shell
# Fetch game outcomes (ESPN)
python -m polymarket_nhl_bot.fetch_game_outcomes --from 2025-10-01 --to 2026-02-26

# Build game records (outcome + events + price series per game)
python -m polymarket_nhl_bot.build_game_records --from 2026-02-01 --to 2026-02-26
```

Optional: run the data collector during live games for dense price history, or `cache_live_prices.py` for 1s snapshots; then build records with `--snapshots-dir`.
```shell
python harvitronix_elo.py  # writes data/elo_ratings.json
```

Pre-game outcome model (predicts P(home wins)):

```shell
python -m polymarket_nhl_bot.train_model --dir data/game_records --model-out data/outcome_model.pkl
```

In-game reward model (trained on in-game data):
```shell
# Build in-game dataset from game records (reward/loss and optional buy/sell/price-range targets)
python -m polymarket_nhl_bot.build_in_game_dataset --dir data/game_records --out data/in_game_dataset.jsonl --window-sec 600 --fee 0.02
```
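Conceptually, the reward label behind `--window-sec`/`--fee` asks: does the token price rise enough within the lookahead window to beat the fee? A simplified sketch follows; the actual labeling logic in `build_in_game_dataset` may differ.

```python
# Simplified sketch of reward labeling: label a snapshot 1 if the token's
# best price within the lookahead window exceeds the entry price by more
# than the fee, else 0. The real builder's rules may differ.

def label_reward(prices: list[tuple[float, float]], t0: float,
                 window_sec: float = 600.0, fee: float = 0.02) -> int:
    """prices: (timestamp, price) pairs sorted by time; t0: snapshot time."""
    entry = None
    best = None
    for ts, px in prices:
        if entry is None and ts >= t0:
            entry = px                       # price at (or just after) t0
        if entry is not None and t0 <= ts <= t0 + window_sec:
            best = px if best is None else max(best, px)
    if entry is None or best is None:
        return 0
    return 1 if best - entry > fee else 0
```

A 0.50 → 0.60 move inside the window clears a 0.02 fee and labels the snapshot 1; a 0.50 → 0.51 move does not.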
```shell
# Train the model on that data (logistic default; optional: --model mlp, --model gb, --train-loss, --train-price-range, --train-buy-sell)
python train_in_game_model.py --data data/in_game_dataset.jsonl --model-out data/reward_model.pkl --epochs 500
```

Pre-game (paper):
```shell
python -m polymarket_nhl_bot.backtest_paper --days 14 --threshold 0.05 --stake 1.0
```

In-game (reward model on historical game records):

```shell
python backtest_in_game.py --from-records --use-reward-model --capital 1000 --month 2026-01
```

Optional: `--use-price-range`, `--use-dual`, `--sizing kelly`, `--stake-pct`, etc. See COMMANDS.md.
```shell
# One-shot: fetch prices + game state, run model, print signals
python live_test.py --model data/reward_model.pkl

# Loop every 60s
python live_test.py --model data/reward_model.pkl --loop 60
```

Pre-game bot:

```shell
python main.py  # pre-game predictions, optional paper/live orders
```

Data collector (poll Polymarket during games):

```shell
python -m polymarket_nhl_bot.data_collector
python -m polymarket_nhl_bot.data_collector --fast  # 5s poll for finer data
```

The architecture is sport-agnostic; only data and market discovery are sport-specific.
| Step | NHL (current) | NBA / Tennis / others |
|---|---|---|
| Game data | `espn_client.py` (ESPN NHL scoreboard/summary) | Add or swap client: ESPN NBA, ATP/WTA, etc. Same idea: score/period/clock → game state. |
| Markets | `discover_nhl_markets()` (Gamma API `tag_id=899`) | Use Gamma (or CLOB) with the right tags/filters for NBA, Tennis, etc. |
| Game records | `build_game_records` uses NHL events + Polymarket prices | Reuse pipeline; feed sport-specific event/score/clock and the same price format. |
| In-game features | Score, period, time remaining, Elo, price deltas | Keep the same feature set where applicable; add sport-specific fields (e.g. sets for Tennis) if needed. |
| Models / backtest / execution | No change | Same `train_in_game_model`, `backtest_in_game`, `live_test`, `execution`. |
So: implement or plug in a sport-specific client (like espn_client for NHL) and a market discovery function (like discover_nhl_markets) for that sport; keep the rest of the pipeline (dataset builder, training, strategy, execution) as is.
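One way to express that plug-in point is a small protocol that every sport client satisfies, so the rest of the pipeline stays sport-agnostic. The class and method names below are illustrative assumptions, not the project's actual interfaces.

```python
# Sketch of a sport-agnostic client interface: each sport implements the same
# two methods, and the dataset builder / models / execution stay unchanged.
# Names here are illustrative, not the project's real API.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class GameState:
    home_score: int
    away_score: int
    period: int               # period / quarter / set, depending on sport
    seconds_remaining: int
    extra: dict = field(default_factory=dict)  # e.g. {"sets_won_home": 1}

class SportClient(Protocol):
    def scoreboard(self, date: str) -> list[str]:
        """Return game ids scheduled on a date (YYYY-MM-DD)."""
        ...

    def game_state(self, game_id: str) -> GameState:
        """Return the current live state of one game."""
        ...

# An NBA or Tennis client would implement these against its own data source;
# market discovery plugs in separately via the appropriate Gamma tags.
```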
```text
polymarket_nhl_bot/
├── config.py               # URLs, thresholds, env
├── polymarket_client.py    # CLOB + Gamma, order books, market discovery
├── espn_client.py          # ESPN NHL scoreboard / game state
├── model.py                # Elo + outcome model (pre-game)
├── in_game_strategy.py     # Signal functions (reward, price-range, dual)
├── train_in_game_model.py  # Train reward/loss/price-range models
├── build_in_game_dataset.py
├── build_game_records.py
├── backtest_in_game.py     # In-game backtest from records or snapshots
├── live_test.py            # Live prices + game state → signals
├── execution.py            # Paper/live, positions, order checks
├── main.py                 # Pre-game bot entry
├── data_collector.py       # Poll Polymarket during games
├── replay_game.py          # Replay a past game with model
├── COMMANDS.md             # Full command reference
├── requirements.txt
└── data/                   # game_records, snapshots, models, elo, etc.
```
- COMMANDS.md — Full command reference (data, validate, train, backtest, run).
- Polymarket: CLOB API, Gamma API.
- ESPN: Public scoreboard/summary APIs used for NHL game state.
Trading on prediction markets involves risk. This project is for research and education. Use paper trading first; only deploy real funds if you understand the risks and comply with your jurisdiction’s laws.