BT — Kalshi Edge Bot (public engineering docs)

This repo contains the public documentation for BT, a Python trading bot I run against Kalshi prediction markets. The source is kept in a private repo to protect the edge. The goal here is to describe the engineering — architecture, math, safety, stack — with enough detail that it's verifiable as a real system, without giving away the strategy itself.

What it does

BT identifies systematically mispriced contracts on Kalshi and places automated orders on them. Two market families are currently supported:

Weather markets (KXHIGH* — daily high-temperature at US airports). Compares 31-member GFS ensemble forecasts from Open-Meteo against live Kalshi contract prices.
Sports markets (KX* — NBA/NFL moneylines, spreads, totals). Compares sharp consensus odds (de-vigged, from multiple books via a public aggregator) against Kalshi pricing.

In both cases the pipeline is the same shape: predict → compare → threshold → size → place → settle → measure.
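For the weather family, the simplest way to turn a 31-member ensemble into a contract probability is to count members. The sketch below assumes that estimator. It rounds each member's forecast daily high to a whole degree, since settlement values are whole °F. It then takes the fraction of members that land inside the bracket, with `None` marking an open-ended bracket. `bracket_probability` and the member values are illustrative. BT's actual signal engine and bracket parsing are private and may smooth or weight members differently.

```python
import math
from typing import Optional, Sequence


def bracket_probability(
    member_highs_f: Sequence[float],
    low_f: Optional[int],
    high_f: Optional[int],
) -> float:
    """Fraction of ensemble members whose rounded daily high falls in [low_f, high_f].

    None on either side means the bracket is open-ended ("X or above" / "X or below").
    """
    if not member_highs_f:
        raise ValueError("empty ensemble")
    hits = 0
    for t in member_highs_f:
        # Round half-up to a whole degree (avoids Python's round-half-to-even).
        t_int = math.floor(t + 0.5)
        if (low_f is None or t_int >= low_f) and (high_f is None or t_int <= high_f):
            hits += 1
    return hits / len(member_highs_f)
```

With 31 members the raw estimate moves in steps of 1/31 (about 3.2 percentage points). That granularity is one reason the calibration step matters before these numbers size any trade.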

Why these markets

Prediction markets have structural features that reward engineering discipline over information asymmetry:

Objective settlement. Weather settles off the NWS Daily Climate Report; sports settles off the final score. There's nothing to argue about — calibration either works or it doesn't.
Retail-dominated flow. Pricing inefficiencies persist longer than in equities/FX.
Public data sources. GFS ensemble forecasts and consensus sports odds are free and high-quality. No data moat required — the edge is in the pipeline, not the data.
Quadratic fee structure (0.07 * P * (1 − P) per contract on Kalshi taker fills) makes edge math sharp and easy to reason about.

Math, briefly

Fees

Kalshi's taker fee is quadratic in contract price:

fee = 0.07 * P * (1 - P)

This peaks at $0.0175/contract at P=0.50 and falls to zero at the boundaries. Every edge calculation deducts fees per-fill before deciding to trade — a net edge below the fee curve is not an edge. Maker (limit) orders avoid the fee entirely when filled passively.
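The curve is easy to sanity-check in code. `taker_fee_per_contract` implements the formula above. `fee_for_fill_cents` adds an assumption: Kalshi rounds the total fee for a fill up to the next whole cent. It computes in integer cents to avoid float rounding. Verify that rounding rule against the current fee schedule before relying on it.

```python
def taker_fee_per_contract(p: float) -> float:
    """Per-contract taker fee in dollars: 0.07 * P * (1 - P), with P in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("price must be in [0, 1]")
    return 0.07 * p * (1.0 - p)


def fee_for_fill_cents(contracts: int, price_cents: int) -> int:
    """Total fee in whole cents for one fill, assuming round-up to the next cent."""
    if contracts < 0 or not 0 <= price_cents <= 100:
        raise ValueError("invalid fill")
    # 0.07 * C * (p/100) * (1 - p/100) dollars == 7 * C * p * (100 - p) / 10_000 cents
    numerator = 7 * contracts * price_cents * (100 - price_cents)
    return -(-numerator // 10_000)  # integer ceiling division
```

Under that assumption, a single contract at 50¢ pays 2¢ rather than 1.75¢. Very small fills therefore carry proportionally more fee than the curve alone suggests.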

Edge

edge = p_model - p_market - fee(p_market)

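Trade the YES side only when edge exceeds a calibrated threshold. The NO side uses the mirror quantity (p_market − p_model) − fee(p_market). The fee is unchanged because P * (1 − P) is symmetric in P, and it is subtracted on whichever side is traded. Taking |edge| of the YES formula would add the fee instead.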
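A sketch of the comparison for both sides of one contract. It assumes the NO side is priced at exactly 1 − p_market, which ignores the bid/ask spread; in practice each side has its own ask. The fee is subtracted on whichever side is traded. Taking |edge| of the YES-side formula would add the fee rather than subtract it. `best_side` and the threshold values in the checks are hypothetical placeholders, not BT's calibrated thresholds.

```python
from typing import Optional, Tuple


def fee(p: float) -> float:
    """Per-contract taker fee in dollars: 0.07 * P * (1 - P)."""
    return 0.07 * p * (1.0 - p)


def best_side(p_model: float, p_market: float, threshold: float) -> Optional[Tuple[str, float]]:
    """Return ("yes" or "no", edge) for the better side if it clears threshold after fees."""
    # YES: pay p_market, collect $1 with probability p_model.
    yes_edge = p_model - p_market - fee(p_market)
    # NO: pay 1 - p_market, collect $1 with probability 1 - p_model; fee(1 - p) == fee(p).
    no_edge = (1.0 - p_model) - (1.0 - p_market) - fee(1.0 - p_market)
    side, edge = ("yes", yes_edge) if yes_edge >= no_edge else ("no", no_edge)
    return (side, edge) if edge > threshold else None
```

A 1-point model advantage at P = 0.50 does not survive the 1.75¢ fee, so `best_side(0.51, 0.50, 0.0)` returns `None`.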

Sizing — fractional Kelly

Full Kelly maximizes long-run log-wealth but drawdowns are brutal and miscalibration destroys the strategy. Fractional Kelly (start small, ramp only after calibration proves out) trades expected growth for survivability:

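kelly_fraction = f * edge / (1 - price - fee(price))
position_usd = bankroll * min(kelly_fraction, max_position_pct)

Here price + fee(price) is the all-in cost c of a contract that pays $1. So edge / (1 − c) equals (p_model − c) / (1 − c), the standard full-Kelly fraction for a binary bet, and f scales it down.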

Both f and max_position_pct are hard-capped in config and can't be raised without tripping the risk gate.
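For reference, the textbook full-Kelly fraction for a contract that costs c all-in and pays $1 on a win is (p_model − c) / (1 − c). With c = price + fee(price), the numerator is exactly the edge defined above. The sketch below applies the fractional multiplier f and then the hard percentage cap. The defaults for `f` and `max_position_pct` are placeholders, not BT's configured values. Returning zero for non-positive edge is an assumption.

```python
def fee(p: float) -> float:
    """Per-contract taker fee in dollars: 0.07 * P * (1 - P)."""
    return 0.07 * p * (1.0 - p)


def position_usd(
    p_model: float,
    price: float,
    bankroll: float,
    f: float = 0.25,
    max_position_pct: float = 0.02,
) -> float:
    """Dollar stake on a YES contract via fractional Kelly, capped at max_position_pct of bankroll."""
    cost = price + fee(price)          # all-in cost of a contract that pays $1
    edge = p_model - cost              # same as p_model - price - fee(price)
    if edge <= 0.0 or cost >= 1.0:
        return 0.0
    full_kelly = edge / (1.0 - cost)   # Kelly for a binary bet at net odds (1 - cost) / cost
    return bankroll * min(f * full_kelly, max_position_pct)
```

In the default configuration the cap usually binds. At p_model = 0.70 and price = 0.60, quarter-Kelly asks for about 5.4% of bankroll, but the 2% cap limits the stake to $20 on a $1,000 bankroll.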

Calibration

A model that says "70%" and wins 55% of the time is miscalibrated, and that gap is not edge. BT ships a calibration pipeline (kwx calibrate) that buckets historical signals by predicted probability. It then compares each bucket's realized hit rate against its predicted probability via reliability curves + Brier score. Live trading is gated on the system passing calibration, not on any backtest P&L number.
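A minimal sketch of the two diagnostics named above, assuming signals are stored as (predicted probability, 0/1 outcome) pairs. Ten equal-width buckets is an illustrative choice, and the function names are hypothetical. The real pipeline's bucketing, minimum sample window, and tolerances aren't public. In a well-calibrated model, each bucket's realized hit rate sits close to its mean predicted probability.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """For each non-empty bucket: (mean predicted probability, realized hit rate, count)."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bucket
        bins[idx].append((p, o))
    table: List[Tuple[float, float, int]] = []
    for bucket in bins:
        if bucket:
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            hit_rate = sum(o for _, o in bucket) / len(bucket)
            table.append((mean_p, hit_rate, len(bucket)))
    return table
```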

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         scheduled scans                          │
│           (systemd timers on Linux / launchd on macOS)           │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
 ┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────┐
 │  forecast adapters  │     │   market adapters    │     │   odds adapter      │
 │                     │     │                      │     │                     │
 │ Open-Meteo (GFS/    │     │ Kalshi — REST +      │     │ sharp consensus     │
 │ HRRR/NBM), METAR    │     │ WebSocket auth via   │     │ (de-vigged)         │
 │                     │     │ RSA-PSS signing      │     │                     │
 └──────────┬──────────┘     └──────────┬───────────┘     └──────────┬──────────┘
            │                           │                            │
            └───────────────┬───────────┴────────────────┬───────────┘
                            │                            │
                            ▼                            ▼
                   ┌──────────────────┐        ┌──────────────────┐
                   │  signal engine   │        │  odds engine     │
                   │                  │        │                  │
                   │ bracket parsing, │        │ de-vig, average, │
                   │ prob distribution│        │ consensus line   │
                   └────────┬─────────┘        └────────┬─────────┘
                            │                           │
                            └─────────────┬─────────────┘
                                          ▼
                            ┌──────────────────────────┐
                            │      edge detector       │
                            │ edge = p_model − p_mkt   │
                            │          − fee(p_mkt)    │
                            └────────────┬─────────────┘
                                         ▼
                           ┌───────────────────────────┐
                           │ risk gate + kill switch   │
                           │ • hard caps (pos, day)    │
                           │ • calibration pass check  │
                           │ • .kill file check        │
                           └────────────┬──────────────┘
                                        ▼
                          ┌────────────────────────────┐
                          │   execution (maker first)  │
                          │   order lifecycle + fills  │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  SQLite + JSONL journal    │
                          │  (ACID state + audit log)  │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  settlement + calibration  │
                          │  (NWS / final score)       │
                          └────────────────────────────┘
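The odds engine's "de-vig, average" step can be illustrated with the simplest de-vig method, proportional (multiplicative) normalization. Each book's American odds are converted to implied probabilities. The two sides are rescaled so they sum to 1, and the results are averaged across books. The method, function names, and example prices are assumptions. The README doesn't say which de-vig method BT uses, and alternatives such as power or Shin de-vig exist.

```python
from statistics import mean
from typing import Sequence, Tuple


def implied_prob(american: int) -> float:
    """Implied win probability (vig included) from American odds."""
    if -100 < american < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american < 0:
        return -american / (-american + 100)   # favorite, e.g. -150 -> 0.6
    return 100 / (american + 100)              # underdog, e.g. +130 -> ~0.435


def devig_two_way(side_a: int, side_b: int) -> Tuple[float, float]:
    """Proportional de-vig: rescale both implied probabilities so they sum to 1."""
    pa, pb = implied_prob(side_a), implied_prob(side_b)
    overround = pa + pb  # > 1 whenever the book charges vig
    return pa / overround, pb / overround


def consensus_prob(books: Sequence[Tuple[int, int]]) -> float:
    """Simple average of side A's de-vigged probability across books."""
    if not books:
        raise ValueError("need at least one book")
    return mean(devig_two_way(a, b)[0] for a, b in books)
```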

Repo layout (private)

kwx/                      # main package (~12.6K LOC Python)
  auth/                   # Kalshi RSA-PSS auth signing
  forecast/               # Open-Meteo GFS/HRRR adapters, METAR pull
  markets/                # Kalshi REST + WS market data
  sports_markets/         # Kalshi KX* sports adapters (at-format, spread, total)
  odds/                   # sharp consensus odds client, de-vig engine
  signals/                # bracket parsing + probability distribution
  sports_edge/            # sports-specific edge compute
  matching/               # signal ↔ market matching
  risk/                   # hard caps, kill switch, gates
  exec/                   # order placement, lifecycle, cancel
  settle/                 # settlement fetch + resolution
  calib/                  # reliability curves, Brier, auto-calibration
  backtest/               # historical replay harness
  storage/                # SQLite schema + migrations + JSONL journal
  ops/                    # ops scripts, daily summary, heartbeat
  scan.py                 # top-level scan orchestrator
  cli.py, __main__.py     # operator CLI

scripts/                  # ops scripts (sports-scan, heartbeat, daily summary, CI gate)
deploy/                   # systemd units (Linux) / launchd plists (macOS)
tests/                    # 88 test files — pytest + respx (httpx mocks)
docs/                     # runbook, key rotation, operator handbook
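A minimal stdlib-only sketch of the "ACID state + append-only audit log" split that `storage/` provides. The table schema, file names, and ordering are assumptions for illustration; the real schema and migrations are private. In this sketch the journal line is written and fsynced first, and the state change is then applied inside a transaction. A crash between the two steps therefore leaves an extra audit line rather than an unrecorded order.

```python
import json
import os
import sqlite3
import time
from pathlib import Path


def open_state(db_path: Path) -> sqlite3.Connection:
    """Open (or create) the state DB with a hypothetical minimal orders table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        " client_id TEXT PRIMARY KEY,"  # idempotency key: an order is never inserted twice
        " ticker TEXT NOT NULL, side TEXT NOT NULL,"
        " price_cents INTEGER NOT NULL, qty INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_order(
    conn: sqlite3.Connection, journal_path: Path, client_id: str,
    ticker: str, side: str, price_cents: int, qty: int,
) -> None:
    event = {
        "ts": time.time(), "type": "order_submitted", "client_id": client_id,
        "ticker": ticker, "side": side, "price_cents": price_cents, "qty": qty,
    }
    # 1. Append one JSON object per line and fsync, so the audit trail survives a crash.
    with open(journal_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    # 2. Apply the state change atomically: `with conn` commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (client_id, ticker, side, price_cents, qty, status)"
            " VALUES (?, ?, ?, ?, ?, 'submitted')",
            (client_id, ticker, side, price_cents, qty),
        )
```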

Stack + why

Concern	Choice	Rationale
Runtime	Python 3.11+	Stdlib is strong enough for most of this, and `cryptography` + `httpx` + `numpy` + `scikit-learn` cover the gaps.
Package manager	uv + `pyproject.toml`	Fast, deterministic, handles both venv and lockfile.
HTTP	httpx (sync)	Modern successor to `requests`. No async complexity needed — scan frequency is minutes, not milliseconds.
Kalshi client	Hand-rolled thin wrapper	Community SDKs are auto-generated, opaque, and churn. Auditability matters when orders are automatic.
Auth	`cryptography` (RSA-PSS / SHA-256 / MGF1)	Per Kalshi's 2025 auth spec: per-request signed header, salt = digest length.
Forecasts	Raw HTTP to Open-Meteo	Skips `openmeteo-requests` SDK; cuts a dependency.
Storage	SQLite (stdlib) + append-only JSONL journal	ACID single-file DB for state; JSONL for audit replay. No Postgres required for single-operator scale.
Calibration	numpy + scikit-learn IsotonicRegression	Standard, proven, reversible.
Scheduling	systemd timers (Linux) / launchd (macOS)	Process supervision belongs to the OS, not the app. No in-process `while True: sleep()`.
Testing	pytest + respx	respx for httpx mocking; fixture-based replay of real API responses so the bracket parser is tested against actual `yes_sub_title` formats without burning rate limits.
Lint / types	ruff + mypy --strict	Strict mode on all guarded subpackages. CI gate (`scripts/ci.sh`) must pass before every commit.
Config	pydantic-settings + `config.toml`	Fail-fast validation on startup. Invalid config → exit code 2.

What's explicitly NOT used

Tech	Why not
`kalshi-python` (deprecated)	Auto-generated, opaque, churns
`requests`	Maintenance mode; httpx is the modern default
`pandas`	Overkill; numpy + SQL is enough at this scale
DuckDB	~11k signal rows/year — SQLite is already instant
`asyncio` / async httpx	Zero concurrency benefit for this workload — pure complexity tax
APScheduler	OS-level schedulers are more reliable and observable
`python-dotenv` alone	Loses validation — pydantic-settings is a small upgrade

Safety

This is the part I spent the most time getting right.

No live trading until paper-trading calibration passes. The code ships gated. env = "demo" by default; flipping to prod requires the calibration report to show realized hit rate within tolerance of predicted hit rate, across a minimum sample window.
Hard caps (per-position USD, per-position % of bankroll, per-day loss) enforced at the risk gate — not as soft checks inside the strategy. The gate is a separate module; the strategy cannot bypass it.
Kill switch. A .kill file in the runtime directory halts all order placement: the risk gate checks for it before orders go out, and the heartbeat checks it every scan. Operator can flip this in <2 seconds.
Fee-aware everywhere. The quadratic fee is deducted in every edge calculation — a signal that doesn't pay for itself after fees is not a signal.
No real-money trading in automated tests. Test suites hit a recorded-response fixture layer (respx); they cannot reach the real Kalshi API.
Settlement is authoritative. PnL comes from resolved contracts, not from end-of-day marks. Settlement source (NWS / final score) is the only source of truth.
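A sketch of a gate that sits between the edge detector and execution and combines the checks listed above: the `.kill` file, the calibration requirement for prod, and the hard caps. Every name here is hypothetical, including `RiskLimits`, `RiskGate`, the field names, and the cap values in the checks. The point is the shape: the strategy asks, the gate answers, and an order goes out only when `check()` returns `None`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RiskLimits:
    max_position_usd: float
    max_position_pct: float    # fraction of bankroll, e.g. 0.02
    max_daily_loss_usd: float


class RiskGate:
    """Hypothetical gate: execution only proceeds when check() returns None."""

    def __init__(self, runtime_dir: Path, limits: RiskLimits, env: str, calibration_passed: bool) -> None:
        self.kill_file = runtime_dir / ".kill"
        self.limits = limits
        self.env = env
        self.calibration_passed = calibration_passed

    def check(self, order_usd: float, bankroll: float, realized_loss_today: float) -> Optional[str]:
        """Return None if the order may proceed, otherwise the reason it was blocked.

        realized_loss_today is a positive number of dollars lost so far today.
        """
        # Checked on every call, so touching the file blocks the very next order.
        if self.kill_file.exists():
            return "kill switch engaged"
        # Real-money trading requires a passing calibration report; demo does not.
        if self.env == "prod" and not self.calibration_passed:
            return "calibration gate not passed"
        if order_usd > self.limits.max_position_usd:
            return "exceeds per-position USD cap"
        if order_usd > bankroll * self.limits.max_position_pct:
            return "exceeds per-position % of bankroll cap"
        if realized_loss_today >= self.limits.max_daily_loss_usd:
            return "daily loss limit reached"
        return None
```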

What I learned

A few things that weren't obvious at the start:

Time zones are the #1 bug source. NWS climate reports use local standard time (LST) year-round, with no DST shift. Kalshi tickers bracket dates in the market's nominal TZ. Every settlement bug I've hit came from a silent TZ assumption. The codebase eventually converged on "store UTC, render local, settle off LST, assert at every boundary."
The RSA-PSS signing must be exact: salt_length = digest_length (32 bytes for SHA-256). Getting this wrong doesn't crash anything. It just produces rejected requests that can go unnoticed for hours unless you watch response codes. Now tested explicitly.
"Data quality" as a first-class concept. The system writes a snapshot per scan with provenance. If a forecast adapter changes its response shape, the signal generator doesn't silently produce nonsense — it raises.
Backoff and rate-limit pacing. Open-Meteo free tier is 10K req/day. Kalshi has tiered limits. The OddsPapi client in particular needed proactive pacing between requests (tested with pytest) to avoid 429s.
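Because NWS climate days run midnight to midnight in local standard time all year, the settlement date for a timestamp has to be computed with a fixed standard offset. An IANA zone would apply DST and shift the boundary by an hour for half the year. A minimal sketch, assuming a hand-maintained table of standard-time UTC offsets per station. The station codes, the table, and `climate_date` are illustrative, not BT's.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical lookup: standard-time UTC offsets (hours) for a few example stations.
STANDARD_OFFSET_HOURS = {"KNYC": -5, "KMDW": -6, "KDEN": -7, "KLAX": -8}


def climate_date(ts_utc: datetime, station: str) -> date:
    """Climate-report date for a UTC timestamp, using the station's fixed standard-time offset."""
    # Boundary check: reject naive or non-UTC datetimes instead of guessing their zone.
    if ts_utc.tzinfo is None or ts_utc.utcoffset() != timedelta(0):
        raise ValueError("expected a timezone-aware UTC datetime")
    lst = timezone(timedelta(hours=STANDARD_OFFSET_HOURS[station]))
    # A fixed offset never observes DST, so the day boundary stays at midnight LST all year.
    return ts_utc.astimezone(lst).date()
```

At 04:30 UTC on July 15, New York clocks read 00:30 EDT, but the climate day is still July 14 (23:30 EST).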

Status

17 phases shipped
88 test files
Weather pipeline operational (demo env, running on a Pi via systemd timers)
Sports pipeline operational (v2.0), NBA + NFL at-format, spreads, totals
Calibration reporting integrated into daily summary
Running demo-env only — not live with real capital

What's in the private repo

Full source (12.6K LOC Python), strategy-specific parameters, calibrated thresholds, the backtest harness, market-specific adapters, and the complete .planning/ trail (17 phases of discuss → plan → execute → verify).

Happy to walk through it on request.

Author: Luke Hanna · lllukehanna@gmail.com · Los Angeles
