jeverett32/mlb-intelligence

MLB Intelligence

MLB betting intelligence system with ML-driven predictions, live execution on Kalshi, and transparent performance analytics.

Tests BUSL-1.1 Python 3.10+

Problem

Sports betting (especially MLB) has a few persistent problems:

  • Too much data, too little time — schedule context, pitcher changes, weather, team form, and live odds all move quickly.
  • Execution is the hard part — even good models fail if betting decisions aren’t consistent, timed correctly, and recorded.
  • Post-hoc analysis is usually missing — without clean history, you can’t answer “did this strategy work?” with confidence.

Action

MLB Intelligence turns the day-to-day work into a repeatable system:

  1. Ingest schedules, odds, weather, pitcher stats, and team stats
  2. Engineer features (80+ game-level features)
  3. Predict win probabilities (logistic regression by default; optional LightGBM, XGBoost, MLP, or ensemble)
  4. Compare to the market to identify edge
  5. Execute on Kalshi (live or dry-run)
  6. Settle + audit results and ROI in a single database
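
Step 4 above (comparing the model to the market) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Quote` container, the `pick_side` function, and the 3-point `min_edge` threshold are all hypothetical, and fees are ignored. It assumes a binary contract quoted in cents that pays $1 if it resolves in your favor, so the ask price divided by 100 is treated as the market-implied probability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    """Hypothetical container for one binary market quote (prices in cents)."""
    yes_ask_cents: int  # cost to buy YES (e.g. home team wins), 1..99
    no_ask_cents: int   # cost to buy NO (e.g. away team wins), 1..99


def pick_side(model_prob: float, quote: Quote,
              min_edge: float = 0.03) -> Optional[tuple[str, float]]:
    """Return (side, edge) for the better side, or None if no side clears min_edge.

    Edge is the model probability minus the market-implied probability,
    where the implied probability is the ask price in dollars.
    """
    yes_edge = model_prob - quote.yes_ask_cents / 100
    no_edge = (1.0 - model_prob) - quote.no_ask_cents / 100
    side, edge = max((("yes", yes_edge), ("no", no_edge)), key=lambda t: t[1])
    return (side, edge) if edge >= min_edge else None


# Model says 60% for YES while YES costs 52 cents: roughly 8 points of edge.
decision = pick_side(0.60, Quote(yes_ask_cents=52, no_ask_cents=50))
```

Checking both sides matters because the YES and NO asks usually sum to more than 100 cents, so a game can show no tradable edge on either side.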

Solution

MLB Intelligence is an end-to-end stack for MLB trading:

  • One orchestrator that runs the full workflow (run_pipeline.py)
  • A dashboard (FastAPI) for transparent performance analytics + operator controls
  • A Postgres-backed history for bets, balances, orders, and model artifacts
  • A public modeling snapshot (data/master_mlb.csv) that is refreshed periodically rather than continuously

What’s inside

  • Dashboard — FastAPI app serving public analytics + private operator controls
  • Pipeline — runs ahead of scheduled first pitch; predicts and places bets in parallel
  • Model — calibrated classifiers + walk-forward validation; Kelly stake sizing
  • DB — PostgreSQL source of truth for games, bets, and run history
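
The Kelly stake sizing mentioned above can be illustrated for a binary contract. This is a sketch under stated assumptions rather than the project's actual sizing code: `kelly_fraction`, the quarter-Kelly `multiplier`, and the 5% `cap` are hypothetical. For a contract bought at price c (in dollars) that pays $1 on a win, the net odds are b = (1 - c) / c, and the full Kelly fraction (b·p - q) / b simplifies to (p - c) / (1 - c).

```python
def kelly_fraction(p_win: float, price: float,
                   multiplier: float = 0.25, cap: float = 0.05) -> float:
    """Fraction of bankroll to stake on a binary contract.

    p_win      -- model probability that the contract pays out
    price      -- contract cost in dollars, strictly between 0 and 1
    multiplier -- fractional-Kelly scaling (assumed default, not from the repo)
    cap        -- hard ceiling on the stake fraction (assumed default)
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    full_kelly = (p_win - price) / (1.0 - price)
    # Negative Kelly means no edge: stake nothing rather than bet the other way.
    return max(0.0, min(full_kelly * multiplier, cap))


# Full Kelly for a 60% model probability at a 50-cent price is about 20%;
# the fractional multiplier and cap shrink that to a smaller stake.
stake = kelly_fraction(0.60, 0.50)
```

Scaling Kelly down is a common hedge against overconfident probabilities; whether this project does so is not stated in the README.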

Key components:

run_pipeline.py             # Main orchestrator
run_pipeline_v2.py          # Parallel V2 orchestrator (LightGBM; dry-run by default)
fetch/                      # Data ingestion: odds, weather, stats
models/model_v1/train.py    # Model training + walk-forward evaluation
models/model_v1/predict.py  # Inference: win probabilities + sizing
models/model_v2/            # V2 LightGBM predict + nightly deterministic eval (eval.py)
bet/place_bet.py            # Kalshi execution
settle_games.py             # Post-game settlement
dashboard/app.py            # Dashboard API
homelab.py                  # SSH helper for app/db LXCs
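
The walk-forward evaluation listed for models/model_v1/train.py can be pictured with an expanding-window splitter. The sketch below is an assumption about the general technique, not a copy of what train.py does: `walk_forward_splits` and its parameters are hypothetical, and rows are assumed to be sorted by game date so that each fold trains only on games that precede its test window.

```python
from typing import Iterator


def walk_forward_splits(n_samples: int, n_folds: int,
                        min_train: int) -> Iterator[tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs over time-ordered rows.

    The training window always starts at row 0 and grows each fold;
    the test window is the block of rows immediately after it.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(n_folds):
        start = min_train + k * test_size
        # The last fold absorbs any remainder rows.
        stop = n_samples if k == n_folds - 1 else start + test_size
        yield range(0, start), range(start, stop)


for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
    print(list(train_idx)[-1], list(test_idx))
```

Unlike shuffled cross-validation, no fold ever trains on a game played after the games it is scored on, which is the property that matters for a betting model.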

Data model philosophy

  • Postgres is the source of truth.
  • Files under data/ are primarily caches/backups and may be stale.
  • data/master_mlb.csv is intended as a public snapshot of modeling data and can be refreshed on a controlled cadence.

Docs

Contributing / local development

See CONTRIBUTING.md for local setup, testing, and dev commands.

Deployment

Push to GitHub → GitHub Actions runner auto-deploys to the app LXC.

Direct LXC SSH debugging via homelab.py:

python3 homelab.py app "systemctl status mlb-dashboard --no-pager -l | tail -40"
python3 homelab.py app "journalctl -u mlb-dashboard -n 100 --no-pager"

Data sources

License

BUSL-1.1 (Business Source License 1.1)

About

Machine learning pipeline for predicting MLB games, deployed as a web app that makes live predictions and places bets on Kalshi.
