MLB betting intelligence system with ML-driven predictions, live execution on Kalshi, and transparent performance analytics.
Sports betting (especially MLB) has a few persistent problems:
- Too much data, too little time — schedule context, pitcher changes, weather, team form, and live odds all move quickly.
- Execution is the hard part — even good models fail if betting decisions aren’t consistent, timed correctly, and recorded.
- Post-hoc analysis is usually missing — without clean history, you can’t answer “did this strategy work?” with confidence.
MLB Intelligence turns the day-to-day work into a repeatable system:
- Ingest schedules, odds, weather, pitcher stats, and team stats
- Engineer features (80+ game-level features)
- Predict win probabilities (LR default; optional LightGBM/XGBoost/MLP/ensemble)
- Compare to the market to identify edge
- Execute on Kalshi (live or dry-run)
- Settle + audit results and ROI in a single database
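The "compare to the market" step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a binary contract quoted in cents that pays $1 on a win, ignores fees, and uses hypothetical names (`implied_probability`, `edge`, `should_bet`) and a hypothetical `min_edge` threshold.

```python
def implied_probability(yes_price_cents: int) -> float:
    """Convert a YES price in cents (1-99) into a market-implied probability.

    Assumes a binary contract that pays 100 cents on a win; fees are ignored.
    """
    if not 1 <= yes_price_cents <= 99:
        raise ValueError("price must be between 1 and 99 cents")
    return yes_price_cents / 100.0


def edge(model_prob: float, yes_price_cents: int) -> float:
    """Edge = model win probability minus market-implied probability."""
    return model_prob - implied_probability(yes_price_cents)


def should_bet(model_prob: float, yes_price_cents: int, min_edge: float = 0.03) -> bool:
    """Bet only when the edge clears a (hypothetical) minimum threshold."""
    return edge(model_prob, yes_price_cents) >= min_edge


if __name__ == "__main__":
    # Model says 58%, market asks 52 cents: roughly a 6-point edge.
    print(round(edge(0.58, 52), 4), should_bet(0.58, 52))
```

A threshold above zero leaves room for fees and model error; the right value depends on the calibration quality of the model in use.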
MLB Intelligence is an end-to-end stack for MLB trading:
- One orchestrator that runs the full workflow (`run_pipeline.py`)
- A dashboard (FastAPI) for transparent performance analytics + operator controls
- A Postgres-backed history for bets, balances, orders, and model artifacts
- A public modeling snapshot (`data/master_mlb.csv`) that can be updated periodically (not continuously)
- Dashboard — FastAPI app serving public analytics + private operator controls
- Pipeline — runs ahead of scheduled first pitch; predicts and places bets in parallel
- Model — calibrated classifiers + walk-forward validation; Kelly stake sizing
- DB — PostgreSQL source of truth for games, bets, and run history
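Kelly stake sizing, mentioned in the Model bullet above, can be illustrated with a short sketch. It assumes a binary contract bought at `price` (as a fraction of $1) that pays $1 on a win, for which the full-Kelly fraction reduces to `(p - price) / (1 - price)`. The function names, the fractional-Kelly multiplier, and the per-bet cap are illustrative assumptions, not the repository's actual settings.

```python
def kelly_fraction(p: float, price: float) -> float:
    """Full-Kelly bankroll fraction for a binary contract.

    p     : model win probability (0-1)
    price : cost of the contract as a fraction of its $1 payout (0-1, exclusive)
    Returns 0 when there is no positive edge.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not 0.0 < price < 1.0:
        raise ValueError("price must be in (0, 1)")
    return max(0.0, (p - price) / (1.0 - price))


def stake(bankroll: float, p: float, price: float,
          kelly_multiplier: float = 0.25, max_fraction: float = 0.05) -> float:
    """Dollar stake using fractional Kelly with a hard cap (both hypothetical)."""
    fraction = kelly_fraction(p, price) * kelly_multiplier
    return round(bankroll * min(fraction, max_fraction), 2)


if __name__ == "__main__":
    # 58% model probability at a 52-cent price on a $1,000 bankroll.
    print(kelly_fraction(0.58, 0.52), stake(1000.0, 0.58, 0.52))
```

Scaling full Kelly down and capping the per-bet fraction is a common way to limit the damage from overconfident probabilities.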
Key components:
run_pipeline.py # Main orchestrator
run_pipeline_v2.py # Parallel V2 orchestrator (LightGBM; dry-run by default)
fetch/ # Data ingestion — odds, weather, stats
models/model_v1/train.py # Model training + walk-forward evaluation
models/model_v1/predict.py # Inference — win probabilities + sizing
models/model_v2/ # V2 LightGBM predict + nightly deterministic eval (`eval.py`)
bet/place_bet.py # Kalshi execution
settle_games.py # Post-game settlement
dashboard/app.py # Dashboard API
homelab.py # SSH helper for app/db LXCs
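For readers unfamiliar with the walk-forward evaluation that `models/model_v1/train.py` is described as performing, the general idea is sketched below with an expanding training window. This is a generic illustration under assumed parameters (`initial_train`, `test_size`), not the logic in that file; it assumes the samples are already sorted chronologically.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for an expanding-window walk-forward.

    Each fold trains on every sample before the test block, so the model is
    never evaluated on games that precede its training data.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    start = initial_train
    while start + test_size <= n_samples:
        yield list(range(start)), list(range(start, start + test_size))
        start += test_size


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, initial_train=4, test_size=3):
        print(len(train_idx), test_idx)
```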
- Postgres is the source of truth.
- Files under `data/` are primarily caches/backups and may be stale. `data/master_mlb.csv` is intended as a public snapshot of modeling data and can be refreshed on a controlled cadence.
See CONTRIBUTING.md for local setup, testing, and dev commands.
Push to GitHub → GitHub Actions runner auto-deploys to the app LXC.
Direct LXC SSH debugging via homelab.py:
python3 homelab.py app "systemctl status mlb-dashboard --no-pager -l | tail -40"
python3 homelab.py app "journalctl -u mlb-dashboard -n 100 --no-pager"

BUSL-1.1 (Business Source License 1.1)
- See LICENSE
- See LICENSE-FAQ.md
