llukehanna/BT-docs

BT — Kalshi Edge Bot (public engineering docs)

This repo contains the public documentation for BT, a Python trading bot I run against Kalshi prediction markets. The source is kept in a private repo to protect the edge. The goal here is to describe the engineering — architecture, math, safety, stack — with enough detail that it's verifiable as a real system, without giving away the strategy itself.

What it does

BT identifies systematically mispriced contracts on Kalshi and places automated orders on them. Two market families currently:

  1. Weather markets (KXHIGH* — daily high-temperature at US airports). Compares 31-member GFS ensemble forecasts from Open-Meteo against live Kalshi contract prices.
  2. Sports markets (KX* — NBA/NFL moneylines, spreads, totals). Compares sharp consensus odds (de-vigged, from multiple books via a public aggregator) against Kalshi pricing.
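
For the weather family, one simple way to turn an ensemble into a bracket probability is to count the members that land inside the bracket. The sketch below assumes that approach; the document does not say how BT builds its distribution, and the function name, the inclusive bounds, and the sample temperatures are all made up for illustration.

```python
from typing import Sequence


def bracket_probability(member_highs_f: Sequence[float], low_f: float, high_f: float) -> float:
    """Fraction of ensemble members whose forecast high lies in [low_f, high_f]."""
    if not member_highs_f:
        raise ValueError("ensemble is empty")
    hits = sum(1 for temp in member_highs_f if low_f <= temp <= high_f)
    return hits / len(member_highs_f)


# Hypothetical 31-member ensemble of forecast daily highs in deg F (invented values).
members = [78.0 + 0.25 * i for i in range(31)]  # 78.00, 78.25, ..., 85.50

# Probability that the high lands in a hypothetical 80-81.99 bracket.
p_model = bracket_probability(members, 80.0, 81.99)
print(f"p_model = {p_model:.3f}")
```

Raw member counting gives coarse steps of 1/31, which is one reason a real pipeline might smooth or recalibrate the distribution before comparing it to market prices.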

In both cases the pipeline is the same shape: predict → compare → threshold → size → place → settle → measure.

Why these markets

Prediction markets have structural features that reward engineering discipline over information asymmetry:

  • Objective settlement. Weather settles off the NWS Daily Climate Report; sports settles off the final score. There's nothing to argue about — calibration either works or it doesn't.
  • Retail-dominated flow. Pricing inefficiencies persist longer than in equities/FX.
  • Public data sources. GFS ensemble forecasts and consensus sports odds are free and high-quality. No data moat required — the edge is in the pipeline, not the data.
  • Quadratic fee structure (0.07 * P * (1−P) per-fill on Kalshi taker orders) makes edge math sharp and easy to reason about.

Math, briefly

Fees

Kalshi's taker fee is quadratic in contract price:

fee = 0.07 * P * (1 - P)

This peaks at $0.0175 per contract at P = 0.50 and falls to zero at the boundaries. Every edge calculation deducts fees per fill before deciding to trade: a gross edge that does not clear the fee curve is not an edge. Maker (limit) orders avoid the fee entirely when filled passively.
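
A minimal sketch of the fee curve as stated above. The function name and the default `rate` argument are illustrative, and any exchange-side rounding of the fee is not modeled here.

```python
def taker_fee(price: float, rate: float = 0.07) -> float:
    """Per-contract taker fee in dollars for a contract priced in [0, 1]."""
    if not 0.0 <= price <= 1.0:
        raise ValueError(f"price must be in [0, 1], got {price}")
    return rate * price * (1.0 - price)


# The curve is symmetric around 0.50 and vanishes at the boundaries.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"P={p:.2f}  fee=${taker_fee(p):.5f}")
```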

Edge

edge = p_model - p_market - fee(p_market)

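Trade only when the edge clears a calibrated threshold. The expression above is the YES-side edge; because the fee is a cost on either side, the NO side is evaluated with the complementary probability (1 − p_model against the NO price) rather than by taking |edge| of the YES-side number.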
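
The edge formula can be sketched as follows. This is one reading of it, not BT's actual code: the helper names, the idea of scoring the NO side with the complementary probability, and the 0.03 threshold are all assumptions made for the example.

```python
from typing import Optional, Tuple

THRESHOLD = 0.03  # hypothetical value; the real threshold is calibrated and private


def taker_fee(price: float, rate: float = 0.07) -> float:
    """Quadratic taker fee from the Fees section."""
    return rate * price * (1.0 - price)


def net_edge(p_model: float, p_market: float) -> float:
    """edge = p_model - p_market - fee(p_market)."""
    return p_model - p_market - taker_fee(p_market)


def pick_side(p_model: float, yes_price: float, no_price: float,
              threshold: float = THRESHOLD) -> Optional[Tuple[str, float]]:
    """Return (side, edge) for the better side if it clears the threshold, else None."""
    yes_edge = net_edge(p_model, yes_price)
    no_edge = net_edge(1.0 - p_model, no_price)  # assumption: NO uses 1 - p_model
    side, edge = max((("yes", yes_edge), ("no", no_edge)), key=lambda pair: pair[1])
    return (side, edge) if edge > threshold else None


# Model says 62%, YES is offered at 0.55 and NO at 0.47 (invented prices).
decision = pick_side(0.62, 0.55, 0.47)
print(decision)
```

With these inputs the gross YES edge is 0.07 and the fee at 0.55 is 0.017325, so the net edge of about 0.0527 clears the example threshold. A model probability of 0.57 at the same prices would not.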

Sizing — fractional Kelly

Full Kelly maximizes long-run log-wealth but drawdowns are brutal and miscalibration destroys the strategy. Fractional Kelly (start small, ramp only after calibration proves out) trades expected growth for survivability:

kelly_fraction = f * (edge / (price * (1 - price)))
position_usd = bankroll * min(kelly_fraction, max_position_pct)

Both f and max_position_pct are hard-capped in config and can't be raised without tripping the risk gate.
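
A sketch of the sizing rule exactly as written above. The function name, the example numbers, and the choice to return zero for non-positive edge are assumptions; the real values of f and max_position_pct live in private config.

```python
def position_usd(bankroll: float, edge: float, price: float,
                 f: float, max_position_pct: float) -> float:
    """Fractional-Kelly position size in USD, capped at a share of bankroll."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if edge <= 0.0:
        return 0.0  # assumption: no position when there is no positive edge
    kelly_fraction = f * (edge / (price * (1.0 - price)))
    return bankroll * min(kelly_fraction, max_position_pct)


# Hypothetical inputs: $1,000 bankroll, f = 0.10, 2% per-position cap.
capped = position_usd(1000.0, edge=0.05, price=0.55, f=0.10, max_position_pct=0.02)
uncapped = position_usd(1000.0, edge=0.03, price=0.55, f=0.10, max_position_pct=0.02)
print(f"capped=${capped:.2f}  uncapped=${uncapped:.2f}")
```

In the first call the scaled Kelly fraction (about 2.02%) exceeds the cap, so the cap binds at $20. The sketch keeps the document's expression rather than substituting the textbook binary-bet form (q − p) / (1 − p).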

Calibration

A model that says "70%" and wins 55% of the time is broken; that gap is not an edge. BT ships a calibration pipeline (kwx calibrate) that buckets historical signals by predicted probability and compares the realized hit rate in each bucket against the predicted one, using reliability curves and the Brier score. Live trading is gated on the system passing calibration, not on any backtest P&L number.
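
The calibration check can be illustrated with a small standard-library sketch. It is not the kwx calibrate implementation: the ten equal-width buckets, the 0.05 tolerance, and the function names are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Per non-empty bucket: (mean predicted probability, realized hit rate, count)."""
    buckets = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of p, wins, count]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bucket
        buckets[idx][0] += p
        buckets[idx][1] += y
        buckets[idx][2] += 1
    return [(s / n, w / n, n) for s, w, n in buckets if n]


def passes_calibration(table: List[Tuple[float, float, int]], tolerance: float) -> bool:
    """True when every bucket's realized rate is within tolerance of its prediction."""
    return all(abs(pred - real) <= tolerance for pred, real, _ in table)


# The failure case from the text: twenty "70%" signals that win 11 times (55%).
probs = [0.7] * 20
outcomes = [1] * 11 + [0] * 9
table = reliability_table(probs, outcomes)
score = brier_score(probs, outcomes)
ok = passes_calibration(table, tolerance=0.05)  # hypothetical tolerance
print(table, round(score, 4), ok)
```

Here the single occupied bucket shows a 15-point gap between predicted and realized rates, so the gate stays closed regardless of what a backtest P&L might say.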

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         scheduled scans                          │
│           (systemd timers on Linux, launchd on macOS)            │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
 ┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────┐
 │  forecast adapters  │     │   market adapters    │     │   odds adapter      │
 │                     │     │                      │     │                     │
 │ Open-Meteo (GFS/    │     │ Kalshi — REST +      │     │ sharp consensus     │
 │ HRRR/NBM), METAR    │     │ WebSocket auth via   │     │ (de-vigged)         │
 │                     │     │ RSA-PSS signing      │     │                     │
 └──────────┬──────────┘     └──────────┬───────────┘     └──────────┬──────────┘
            │                           │                            │
            └───────────────┬───────────┴────────────────┬───────────┘
                            │                            │
                            ▼                            ▼
                   ┌──────────────────┐        ┌──────────────────┐
                   │  signal engine   │        │  odds engine     │
                   │                  │        │                  │
                   │ bracket parsing, │        │ de-vig, average, │
                   │ prob distribution│        │ consensus line   │
                   └────────┬─────────┘        └────────┬─────────┘
                            │                           │
                            └─────────────┬─────────────┘
                                          ▼
                            ┌──────────────────────────┐
                            │      edge detector       │
                            │ edge = p_model − p_mkt   │
                            │          − fee(p_mkt)    │
                            └────────────┬─────────────┘
                                         ▼
                           ┌───────────────────────────┐
                           │ risk gate + kill switch   │
                           │ • hard caps (pos, day)    │
                           │ • calibration pass check  │
                           │ • .kill file check        │
                           └────────────┬──────────────┘
                                        ▼
                          ┌────────────────────────────┐
                          │   execution (maker first)  │
                          │   order lifecycle + fills  │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  SQLite + JSONL journal    │
                          │  (ACID state + audit log)  │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  settlement + calibration  │
                          │  (NWS / final score)       │
                          └────────────────────────────┘
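
The odds engine box above (de-vig, average, consensus line) can be sketched for a two-way moneyline. This assumes proportional (multiplicative) de-vigging and a plain mean across books; the document does not say which de-vig method or weighting BT uses, and the function names and odds are invented.

```python
from typing import List, Tuple


def american_to_implied(odds: int) -> float:
    """Implied probability (still including vig) from American odds."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def devig_two_way(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Proportional de-vig: rescale both implied probabilities to sum to 1."""
    p_a, p_b = american_to_implied(odds_a), american_to_implied(odds_b)
    total = p_a + p_b  # exceeds 1.0 by the bookmaker's margin
    return p_a / total, p_b / total


def consensus_probability(books: List[Tuple[int, int]]) -> float:
    """Average the de-vigged side-A probability across books."""
    if not books:
        raise ValueError("no books supplied")
    fair = [devig_two_way(a, b)[0] for a, b in books]
    return sum(fair) / len(fair)


# Hypothetical quotes from two books for the same game: (side A, side B).
quotes = [(-150, 130), (-145, 125)]
p_consensus = consensus_probability(quotes)
print(f"consensus P(side A) = {p_consensus:.4f}")
```

The resulting consensus probability plays the role of p_model in the edge detector for sports markets.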

Repo layout (private)

kwx/                      # main package (~12.6K LOC Python)
  auth/                   # Kalshi RSA-PSS auth signing
  forecast/               # Open-Meteo GFS/HRRR adapters, METAR pull
  markets/                # Kalshi REST + WS market data
  sports_markets/         # Kalshi KX* sports adapters (at-format, spread, total)
  odds/                   # sharp consensus odds client, de-vig engine
  signals/                # bracket parsing + probability distribution
  sports_edge/            # sports-specific edge compute
  matching/               # signal ↔ market matching
  risk/                   # hard caps, kill switch, gates
  exec/                   # order placement, lifecycle, cancel
  settle/                 # settlement fetch + resolution
  calib/                  # reliability curves, Brier, auto-calibration
  backtest/               # historical replay harness
  storage/                # SQLite schema + migrations + JSONL journal
  ops/                    # ops scripts, daily summary, heartbeat
  scan.py                 # top-level scan orchestrator
  cli.py, __main__.py     # operator CLI

scripts/                  # ops scripts (sports-scan, heartbeat, daily summary, CI gate)
deploy/                   # systemd units (Linux) / launchd plists (macOS)
tests/                    # 88 test files — pytest + respx (httpx mocks)
docs/                     # runbook, key rotation, operator handbook

Stack + why

| Concern | Choice | Rationale |
| --- | --- | --- |
| Runtime | Python 3.11+ | Stdlib is strong enough for most of this, and cryptography + httpx + numpy + scikit-learn cover the gaps. |
| Package manager | uv + pyproject.toml | Fast, deterministic, handles both venv and lockfile. |
| HTTP | httpx (sync) | Modern successor to requests. No async complexity needed: scan frequency is minutes, not milliseconds. |
| Kalshi client | Hand-rolled thin wrapper | Community SDKs are auto-generated, opaque, and churn. Auditability matters when orders are automatic. |
| Auth | cryptography (RSA-PSS / SHA-256 / MGF1) | Per Kalshi's 2025 auth spec: per-request signed header, salt = digest length. |
| Forecasts | Raw HTTP to Open-Meteo | Skips openmeteo-requests SDK; cuts a dependency. |
| Storage | SQLite (stdlib) + append-only JSONL journal | ACID single-file DB for state; JSONL for audit replay. No Postgres required for single-operator scale. |
| Calibration | numpy + scikit-learn IsotonicRegression | Standard, proven, reversible. |
| Scheduling | systemd timers (Linux) / launchd (macOS) | Process supervision belongs to the OS, not the app. No in-process while True: sleep(). |
| Testing | pytest + respx | respx for httpx mocking; fixture-based replay of real API responses so the bracket parser is tested against actual yes_sub_title formats without burning rate limits. |
| Lint / types | ruff + mypy --strict | Strict mode on all guarded subpackages. CI gate (scripts/ci.sh) must pass before every commit. |
| Config | pydantic-settings + config.toml | Fail-fast validation on startup. Invalid config → exit code 2. |
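
The storage choice (SQLite for state plus an append-only JSONL journal) might look roughly like this. The table schema, field names, ticker string, and the journal-before-database write order are all assumptions for illustration; the real schema is in the private repo.

```python
import json
import sqlite3
import tempfile
import time
from pathlib import Path

# Hypothetical schema; only the idea of a single-file ACID store is from the document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS signals (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL,
    p_model REAL NOT NULL,
    p_market REAL NOT NULL
)
"""


def record_signal(db: sqlite3.Connection, journal_path: Path, signal: dict) -> None:
    """Append the signal to the audit journal, then persist it in SQLite."""
    entry = {"ts": time.time(), "event": "signal", **signal}
    with journal_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")  # one JSON object per line
    with db:  # commits on success, rolls back if the INSERT raises
        db.execute(
            "INSERT INTO signals (ticker, p_model, p_market) VALUES (?, ?, ?)",
            (signal["ticker"], signal["p_model"], signal["p_market"]),
        )


with tempfile.TemporaryDirectory() as tmp:
    journal = Path(tmp) / "journal.jsonl"
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    record_signal(db, journal, {"ticker": "KXHIGH-EXAMPLE", "p_model": 0.62, "p_market": 0.55})
    row_count = db.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
    journal_lines = journal.read_text(encoding="utf-8").splitlines()
    db.close()

print(row_count, journal_lines)
```

Writing the journal line first means the audit trail records the intent even if the database write fails, which is one reasonable ordering among several.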

What's explicitly NOT used

| Tech | Why not |
| --- | --- |
| kalshi-python (deprecated) | Auto-generated, opaque, churns |
| requests | Maintenance mode; httpx is the modern default |
| pandas | Overkill; numpy + SQL is enough at this scale |
| DuckDB | ~11k signal rows/year; SQLite is already instant |
| asyncio / async httpx | Zero concurrency benefit for this workload; pure complexity tax |
| APScheduler | OS-level schedulers are more reliable and observable |
| python-dotenv alone | Loses validation; pydantic-settings is a small upgrade |

Safety

This is the part I spent the most time getting right.

  • No live trading until paper-trading calibration passes. The code ships gated. env = "demo" by default; flipping to prod requires the calibration report to show realized hit rate within tolerance of predicted hit rate, across a minimum sample window.
  • Hard caps (per-position USD, per-position % of bankroll, per-day loss) enforced at the risk gate — not as soft checks inside the strategy. The gate is a separate module; the strategy cannot bypass it.
  • Kill switch. A .kill file in the runtime directory halts all order placement immediately. Heartbeat checks the file every scan. Operator can flip this in <2 seconds.
  • Fee-aware everywhere. The quadratic fee is deducted in every edge calculation — a signal that doesn't pay for itself after fees is not a signal.
  • No real-money trading in automated tests. Test suites hit a recorded-response fixture layer (respx); they cannot reach the real Kalshi API.
  • Settlement is authoritative. PnL comes from resolved contracts, not from end-of-day marks. Settlement source (NWS / final score) is the only source of truth.
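
The gate described in the list above can be sketched as a single function that every order must pass through. The dataclass, function signature, reason strings, and limit values are hypothetical; only the checks themselves (kill file, calibration pass, hard caps) come from the document.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class RiskLimits:
    max_position_usd: float
    max_position_pct: float   # share of bankroll per position
    max_daily_loss_usd: float


def gate_order(order_usd: float, bankroll_usd: float, loss_today_usd: float,
               calibration_passed: bool, limits: RiskLimits,
               runtime_dir: Path) -> Tuple[bool, str]:
    """Return (allowed, reason). The kill file is checked first, on every call."""
    if (runtime_dir / ".kill").exists():
        return False, "kill switch engaged"
    if not calibration_passed:
        return False, "calibration has not passed"
    if order_usd > limits.max_position_usd:
        return False, "exceeds per-position USD cap"
    if order_usd > bankroll_usd * limits.max_position_pct:
        return False, "exceeds per-position bankroll percentage cap"
    if loss_today_usd >= limits.max_daily_loss_usd:
        return False, "daily loss cap reached"
    return True, "ok"


limits = RiskLimits(max_position_usd=25.0, max_position_pct=0.02, max_daily_loss_usd=50.0)
with tempfile.TemporaryDirectory() as tmp:
    runtime = Path(tmp)
    before = gate_order(15.0, 1000.0, 0.0, True, limits, runtime)
    oversize = gate_order(30.0, 1000.0, 0.0, True, limits, runtime)
    (runtime / ".kill").touch()  # operator flips the kill switch
    after = gate_order(15.0, 1000.0, 0.0, True, limits, runtime)

print(before, oversize, after)
```

Because the gate takes plain numbers and returns a verdict, the strategy code has no handle it could use to skip a check.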

What I learned

A few things that weren't obvious at the start:

  • Time zones are the #1 bug source. NWS reports on local standard time (LST), which never shifts for DST. Kalshi tickers bracket dates in the market's nominal TZ. Every settlement bug I've hit came from a silent TZ assumption. The codebase eventually converged on "store UTC, render local, settle off LST, assert at every boundary."
  • The RSA-PSS signing is exact. salt_length = digest_length (SHA-256 = 32 bytes). Getting this wrong fails silently for hours until you look at response codes. Now tested explicitly.
  • "Data quality" as a first-class concept. The system writes a snapshot per scan with provenance. If a forecast adapter changes its response shape, the signal generator doesn't silently produce nonsense — it raises.
  • Backoff and rate-limit pacing. Open-Meteo free tier is 10K req/day. Kalshi has tiered limits. The OddsPapi client in particular needed proactive pacing between requests (tested with pytest) to avoid 429s.
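
The time-zone lesson above can be made concrete with a small helper that maps a UTC timestamp to a settlement date on local standard time. The station table, offsets, and function name are hypothetical; the sketch uses fixed offsets precisely so that DST never enters the calculation.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical station table: standard-time UTC offsets in hours, with no DST applied.
STANDARD_UTC_OFFSET_HOURS = {"KLAX": -8, "KORD": -6}


def lst_climate_date(observed_utc: datetime, station: str) -> date:
    """Calendar date of an observation on the station's local standard time."""
    if observed_utc.tzinfo is None or observed_utc.utcoffset() is None:
        raise ValueError("timestamp must be timezone-aware (store UTC)")
    lst = timezone(timedelta(hours=STANDARD_UTC_OFFSET_HOURS[station]))
    return observed_utc.astimezone(lst).date()


# 07:30 UTC on 1 July is 00:30 on the wall clock in Los Angeles (daylight time),
# but 23:30 the previous evening on local standard time.
obs = datetime(2025, 7, 1, 7, 30, tzinfo=timezone.utc)
settle_date = lst_climate_date(obs, "KLAX")
print(settle_date)
```

Rejecting naive datetimes at the function boundary is one way to apply the "assert at every boundary" rule.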

Status

  • 17 phases shipped
  • 88 test files
  • Weather pipeline operational (demo env, running on a Pi via systemd timers)
  • Sports pipeline operational (v2.0), NBA + NFL at-format, spreads, totals
  • Calibration reporting integrated into daily summary
  • Running demo-env only — not live with real capital

What's in the private repo

Full source (12.6K LOC Python), strategy-specific parameters, calibrated thresholds, the backtest harness, market-specific adapters, and the complete .planning/ trail (17 phases of discuss → plan → execute → verify).

Happy to walk through it on request.


Author: Luke Hanna · lllukehanna@gmail.com · Los Angeles
