AI Forecasting Hackathon 2026 · Trading Track · University of Chicago
ORACLE is an autonomous AI trading agent that takes positions on real-world prediction markets using a four-stage reasoning pipeline: intelligent market selection, live web evidence gathering, superforecaster-style probability estimation, and Kelly-optimal bet sizing — all running end-to-end without human intervention.
- The Problem
- The Solution
- Innovation
- Features
- User Journey
- System Architecture
- Workflow & Orchestration
- Data Flow & State Management
- Tech Stack
- AI Deep Dive — Claude claude-sonnet-4-6
- Impact
- Real-World Use Cases
- Comparison
- Scalability
- Responsible AI and Ethics
- Evaluation Criteria Alignment
- Trade-offs
- Project Complexity Tiers
- Installation & Setup
- Why This Will Win
- Future Scope
- FAQ
- Lessons Learned
Prediction markets are among the most information-efficient markets in the world — prices aggregate the beliefs of thousands of participants and consistently outperform expert panels at forecasting real-world events. Yet most participants trade on gut feel, incomplete information, and uncalibrated confidence.
The result is systematic mispricing:
- Markets overreact to recent news and underweight base rates
- Low-attention markets are chronically mis-priced due to low liquidity
- Anchoring bias keeps prices stuck near round numbers (50%, 70%, 90%)
- Human traders hold losing positions too long (disposition effect)
The core challenge: Can an AI agent exploit these inefficiencies systematically, at scale, without human intervention — and make a profit doing it?
ORACLE is a fully autonomous trading agent that:
- Ingests live Kalshi prediction markets every 15 minutes via the Prophet Arena evaluation harness
- Filters candidates to only the most liquid and informationally tractable markets
- Researches each selected market with targeted web searches for breaking evidence
- Estimates calibrated probabilities using a superforecaster-style structured reasoning prompt with Claude's extended thinking
- Sizes each trade using the fractional Kelly Criterion — mathematically optimal for maximizing long-run wealth growth
- Submits trade intents to the Prophet Arena harness for deterministic execution
ORACLE runs continuously for the full 2-week evaluation window (May 17–31, 2026) on the team's own API keys, with zero human intervention required after deployment.
ORACLE's novelty lies in the combination of four ideas, none of which appear together in any existing open-source prediction market agent:
Rather than asking an LLM "what's the probability?", ORACLE uses a structured five-step decomposition borrowed from Philip Tetlock's Good Judgment Project research:
- Reference class / base rate identification
- Inside view (specific evidence for this event)
- Outside view (comparable historical events)
- Market anchor analysis (why might the market be wrong?)
- Calibration check (am I overconfident?)
This structure forces the model to reason explicitly rather than pattern-match to a number, yielding dramatically better-calibrated outputs.
ORACLE uses Anthropic's extended thinking feature for the primary forecast call, giving Claude a dedicated reasoning budget (5,000 tokens) before producing its final probability. This is analogous to "slow thinking" in dual-process theory — particularly valuable for markets with ambiguous evidence or novel event types.
Most trading bots use fixed bet sizes. ORACLE uses the Kelly Criterion — the mathematically optimal position-sizing formula for maximizing expected log-wealth. We use 25% fractional Kelly for a 4× safety margin, giving:
edge = |p_oracle - market_mid|
kelly_fraction = edge / (1 - entry_price)
notional = kelly_fraction × 0.25 × available_cash
This means ORACLE bets proportional to its conviction — high-confidence divergences get large positions, marginal edges get tiny ones.
Unlike static LLM forecasters that rely on training-data knowledge, ORACLE runs Tavily web search per market per tick, pulling fresh news, polls, and expert commentary. This means ORACLE's probabilities reflect the world as it is right now, not as it was at training cutoff.
| Feature | Description |
|---|---|
| 4-Stage Pipeline | REVIEW → SEARCH → FORECAST → ACTION, fully modular |
| Live Web Research | Tavily advanced search per market, synthesized by Claude |
| Extended Thinking | 5,000-token reasoning budget for hard forecasting problems |
| Kelly Sizing | 25% fractional Kelly — mathematically optimal bet sizing |
| Ensemble Forecasting | Weighted average of primary + secondary LLM models |
| Liquidity Filtering | Skips illiquid markets (low volume, wide spread) |
| Topic Diversification | Max 3 positions per topic family (politics, crypto, sports…) |
| Drawdown Halt | Stops trading if portfolio loses >30% from peak |
| Cash Reserve | Keeps 20% of equity as cash buffer at all times |
| Idempotent Intents | Each trade intent has a deterministic idempotency key |
| Parallel Forecasting | Up to 5 concurrent SEARCH+FORECAST threads per tick |
| Retry Logic | Exponential backoff on all LLM and search API calls |
| Rich CLI | Live logging with color, tick stats, and portfolio summaries |
| Zero Secrets in Repo | All keys via env vars, .gitignore enforced |
[Team] → fills .env with API keys → runs ./run.sh
│
[ORACLE boots]
Connects to Prophet Arena API
Creates experiment
Registers participant
│
┌────────────▼────────────┐
│ Every ~15 minutes │
│ Tick N of 96 │
└────────────┬────────────┘
│
Claims tick → fetches 50+ Kalshi markets
│
┌────────────▼────────────┐
│ REVIEW (10–20 markets) │
│ Filter: liquidity, │
│ spread, time, family │
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ SEARCH + FORECAST │
│ (parallel, 5 threads) │
│ Web search → evidence │
│ Claude + thinking → │
│ calibrated probability │
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ ACTION │
│ Kelly sizing │
│ Risk guards │
│ Trade decisions │
└────────────┬────────────┘
│
submit_trade_intents() → fills execute
complete_tick() → tick finalized
│
[Repeat for 96 ticks / 2 weeks]
│
[Leaderboard updates continuously]
[Winner announced June 1, 2026]
┌─────────────────────────────────────────────────────────────────────┐
│ ORACLE Agent │
│ │
│ ┌──────────┐ ┌──────────────────────────────────────────────┐ │
│ │ main.py │───▶│ loop.py │ │
│ │ CLI / │ │ run_experiment() → run_tick() × 96 │ │
│ │ config │ │ claim_tick → candidates → pipeline → submit │ │
│ └──────────┘ └──────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ pipeline/ │ │ pipeline/ │ │ pipeline/ │ │
│ │ review.py │ │ search.py │ │ forecast.py │ │
│ │ │ │ │ │ │ │
│ │ Liquidity │ │ Tavily API │ │ Claude API │ │
│ │ Spread │ │ Query gen │ │ + Thinking │ │
│ │ Time filter │ │ LLM synth │ │ Ensemble │ │
│ │ Family cap │ │ Evidence │ │ p ∈ [0,1] │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ pipeline/action.py │ │
│ │ Kelly Criterion │ │
│ │ Drawdown halt │ │
│ │ Position limits │ │
│ └──────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ utils/ │ │
│ │ llm.py (Anthropic + OpenAI clients, retry) │ │
│ │ search.py (Tavily provider + no-op fallback) │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────────┐
│ Prophet Arena │ │ External APIs │
│ api.aiprophet │ │ Anthropic (Claude) │
│ .dev │ │ Tavily (Search) │
│ ai-prophet- │ │ OpenAI (optional) │
│ core SDK │ └──────────────────────┘
└─────────────────┘
startup
│
├─ health_check() Verify Prophet Arena connectivity
├─ create_or_get_experiment() Idempotent — safe to re-run
└─ upsert_participant() Register model + starting cash
tick loop (up to 96 ticks)
│
├─ claim_tick() Lease a 15-min execution slot
│ └─ if no_tick_available → sleep(retry_after_sec)
│
├─ get_portfolio() Cash, equity, open positions
├─ get_candidates() 50–200 live Kalshi markets
│
├─ REVIEW Filter to 10–20 candidates
│
├─ [parallel × 5]
│ ├─ gather_evidence() 2 Tavily searches → LLM synthesis
│ └─ estimate_probability() Claude extended thinking → p
│
├─ decide_trades() Kelly sizing, risk guards
│
├─ submit_trade_intents() Send BUY YES/NO intents with shares
└─ complete_tick() Finalize; server executes fills
shutdown
└─ log final equity, PnL, trade count
- Up to 5 concurrent SEARCH+FORECAST threads per tick
- Balances throughput (faster tick completion) vs. API rate limits
- Each thread is independent — failure in one does not block others
Prophet Arena API
│
│ get_candidates() → CandidatesResponse
│ └─ markets: list[MarketData]
│ ├─ market_id, question, description
│ ├─ resolution_time, topic, family
│ └─ quote: MarketQuote
│ ├─ best_bid: str ← decimal string, cast to float
│ ├─ best_ask: str
│ └─ volume_24h: float
│
▼
[REVIEW stage]
select_markets() → list[MarketData] (filtered, scored, ranked)
│
▼
[SEARCH stage] [FORECAST stage]
gather_evidence() estimate_probability()
→ evidence: str → (probability, reasoning, confidence)
│ │
└───────────────────────────────┘
│
▼
(MarketData, float, str) ← per market tuple
▼
[ACTION stage]
decide_trades() → list[TradeDecision]
└─ TradeDecision: market_id, action, side, size (notional $)
▼
shares = notional / entry_price ← YES: ask price, NO: 1-bid
▼
TradeIntentRequest(market_id, action="BUY", side="YES"|"NO",
shares="0.4250", idempotency_key=...)
▼
submit_trade_intents() → TradeSubmissionResult
complete_tick()
State is entirely server-side — ORACLE is stateless between ticks.
Portfolio, positions, and fills all live on the Prophet Arena server.
| Layer | Technology | Purpose |
|---|---|---|
| Agent Runtime | Python 3.12 | Core language |
| SDK | ai-prophet-core 0.1.5 |
Prophet Arena API client, data models |
| Primary LLM | Anthropic Claude claude-sonnet-4-6 | Forecasting with extended thinking |
| Ensemble LLM | Anthropic Claude Haiku 4.5 | Secondary forecaster for ensemble |
| Web Search | Tavily Advanced Search | Real-time news and evidence retrieval |
| Config | Pydantic v2 + PyYAML | Typed settings with env var overrides |
| Retry Logic | Tenacity | Exponential backoff on all external calls |
| CLI / Logging | Rich | Colored terminal output, structured logs |
| Concurrency | concurrent.futures.ThreadPoolExecutor |
Parallel market forecasting |
| Dependencies | httpx, python-dotenv |
HTTP client, env loading |
Claude claude-sonnet-4-6 was chosen as the primary forecaster for three reasons:
-
Extended Thinking — Claude's native extended thinking mode allocates a separate reasoning budget (up to 5,000 tokens) before producing a final answer. For prediction market forecasting — which requires weighing multiple competing hypotheses — this "slow thinking" dramatically improves calibration compared to single-pass generation.
-
Instruction Following — The superforecaster prompt requires the model to follow a strict five-step reasoning chain and output structured JSON. Claude claude-sonnet-4-6 reliably adheres to this format even under complex prompts.
-
World Knowledge Recency — With a knowledge cutoff of August 2025, Claude claude-sonnet-4-6 has strong baseline knowledge of geopolitical, economic, and technological events relevant to the markets we trade.
The forecast system prompt encodes Tetlock's Good Judgment Project methodology:
1. REFERENCE CLASS: What is the base rate for this category of event?
2. INSIDE VIEW: What specific evidence points toward YES? Toward NO?
3. OUTSIDE VIEW: How unusual would YES be vs. comparable past events?
4. MARKET ANCHOR: The market prices this at X%. Why might it be wrong?
5. CALIBRATION CHECK: Am I overconfident? Anchoring too much on market?
→ Output: { "probability": 0.XX, "reasoning": "...", "confidence": "..." }
Primary call: Claude claude-sonnet-4-6 (thinking enabled) ──▶ p1 (weight: 75%)
│
Ensemble call: Claude Haiku 4.5 (fast) ───────────────────▶ p2 (weight: 25%)
│
▼
p_final = 0.75*p1 + 0.25*p2
The ensemble reduces systematic bias from any single model's idiosyncrasies — Haiku often catches cases where claude-sonnet-4-6's extended thinking over-anchors on a narrative.
Before forecasting, ORACLE synthesizes web search results with a dedicated synthesis call:
[2 Tavily queries] → [raw search results] → [synthesis call: Claude Haiku]
│
"Evidence paragraph"
│
[Forecast call: Claude claude-sonnet-4-6 + thinking]
Using Haiku for synthesis (cheap, fast) and claude-sonnet-4-6 for forecasting (expensive, slow) optimizes the cost/quality tradeoff.
- PnL: Target positive return over the 2-week evaluation window
- Sharpe Ratio: Aim for risk-adjusted returns superior to naive 50/50 baseline
- Trade Efficiency: High edge-to-trade ratio (only trade when we have real conviction)
- Calibration: Probability estimates within 5% of true frequencies over time
ORACLE demonstrates that structured LLM reasoning + information retrieval + mathematical position sizing can extract systematic alpha from prediction markets. This has implications beyond hackathons:
- Financial services: LLM-augmented trading in information markets
- Policy forecasting: Calibrated AI predictions for government decision-making
- Insurance: Better risk pricing through AI-driven probability estimation
- Corporate strategy: Real-time probability estimates for strategic planning scenarios
| Domain | Example Market | ORACLE's Edge |
|---|---|---|
| Politics | "Will Senate pass bill X by Dec 31?" | Tracks legislative calendars, whip counts, news |
| Economics | "Will CPI exceed 3% in Q3 2026?" | Integrates latest Fed data, inflation indicators |
| Technology | "Will GPT-5 be released before July 2026?" | Monitors OpenAI announcements, researcher leaks |
| Sports | "Will [Team] win the championship?" | Real-time injury reports, odds movement |
| Science | "Will FDA approve drug X by year-end?" | Tracks clinical trial results, regulatory calendars |
| Geopolitics | "Will Country X hold elections in 2026?" | Monitors diplomatic signals, news wires |
| ORACLE | Naive LLM (no structure) | Human Trader | Market Baseline | |
|---|---|---|---|---|
| Evidence source | Live web search | Training data only | Manual research | Crowd wisdom |
| Probability method | Structured decomposition | Direct generation | Intuition | Aggregate bets |
| Position sizing | Fractional Kelly | Fixed size | Fixed size | N/A |
| Bias mitigation | Explicit steps 4 & 5 | None | Varies | Anchoring present |
| Speed | ~2 min/tick | ~30s/tick | Hours | Instant |
| Cost per tick | ~$0.10–0.30 | ~$0.01 | High (human time) | Zero |
| Drawdown protection | 30% halt + reserve | None | Varies | N/A |
| Scalability | 100s of markets | 100s of markets | ~5 markets | Unlimited |
ORACLE is designed to scale along several dimensions:
- The
max_markets_per_tickconfig scales from 20 → 100+ with no code changes - Parallel workers (
max_workers=5) scales with API rate limits
- Add models to
forecast.ensemble_modelsinconfig.yaml— no code changes - Each new model gets a configurable weight in the ensemble
review.pyaccepts anylist[MarketData]— plug in Polymarket, Manifold, or other sourcesaction.pyoutputsTradeDecisionstructs — any execution venue can consume them
| Markets per tick | Search cost | LLM cost | Total per tick |
|---|---|---|---|
| 10 | $0.01 | $0.05 | ~$0.06 |
| 20 | $0.02 | $0.10 | ~$0.12 |
| 50 | $0.05 | $0.25 | ~$0.30 |
Over 96 ticks, a 20-market config costs roughly $12–15 total in API fees.
- ORACLE trades only within the risk limits of the Prophet Arena harness
- Position sizes are bounded by Kelly fractions — no infinite leverage
- Drawdown halt prevents "doubling down" in a loss spiral
- All reasoning is logged (probability, reasoning, confidence per market)
- No black-box decisions — every trade has an attached evidence + reasoning string
- Open source under MIT license
- Rate limits respected via
max_workerscap and tenacity retry - No attempt to circumvent platform limits or scrape at abusive rates
- Search queries are informational — no adversarial content generation
- The superforecaster prompt explicitly checks for overconfidence (step 5)
- Market anchor step (step 4) reduces anchoring bias
- Ensemble averaging reduces single-model systematic bias
The hackathon scores purely on PnL over the 2-week window. ORACLE aligns to this:
| What drives PnL | How ORACLE addresses it |
|---|---|
| Find real edge | Structured decomposition + web evidence yields calibrated probabilities that genuinely diverge from market when justified |
| Don't overtrade | 5% minimum edge threshold filters noise; only trade with conviction |
| Size correctly | Fractional Kelly maximizes expected log-wealth — the mathematically proven optimal strategy for repeated binary bets |
| Protect capital | 30% drawdown halt + 20% cash reserve prevent ruin |
| Stay running | Retry logic, error handling, and stateless design keep the agent alive for all 96 ticks |
| Decision | Choice Made | Alternative | Reason |
|---|---|---|---|
| Search provider | Tavily | Brave, Exa, none | Tavily advanced gives full content, not just snippets |
| Kelly fraction | 25% | 50% (more aggressive) | 25% provides 4× safety margin; prediction markets are noisy |
| Parallelism | 5 threads | 10+ threads | Balances speed vs. LLM rate limits |
| Evidence synthesis | Separate LLM call (Haiku) | Inline in forecast prompt | Cleaner context; Haiku is 10× cheaper for this task |
| Thinking budget | 5,000 tokens | 10,000+ tokens | Diminishing returns; 5k captures most benefit at reasonable cost |
| Min edge | 5% | 3% (more trades) | Lower edge → more noise trades → worse PnL |
| Stateless design | Server owns all state | Local state | Crash resilience; no risk of stale local state diverging |
ORACLE spans multiple complexity tiers:
Tier 1 — Basic (working agent)
- Uses
ai-prophet-coreSDK correctly - Claims ticks, submits valid intents, completes ticks
- Doesn't crash during 96-tick run
Tier 2 — Intermediate (intelligent agent)
- Filters markets by liquidity, spread, and time-to-resolution
- Uses an LLM to generate probability estimates
- Respects portfolio and position limits
Tier 3 — Advanced (calibrated agent)
- Structured superforecaster prompt with explicit reasoning steps
- Extended thinking for improved calibration
- Fractional Kelly Criterion position sizing
Tier 4 — Expert (ORACLE)
- Live web evidence integration per market per tick
- Multi-model weighted ensemble averaging
- Drawdown halt + cash reserve risk management
- Parallel forecasting with error isolation
- Idempotent intent submission with unique keys
- Full observability via structured logging
- Python 3.11+
- API keys: Prophet Arena, Anthropic (Claude), Tavily (recommended)
# 1. Clone
git clone https://github.com/YOUR_HANDLE/oracle-ai-forecast
cd oracle-ai-forecast
# 2. Add API keys
cp .env.example .env
# Open .env and fill in:
# PA_SERVER_API_KEY (from https://prophetarena.co/developer)
# ANTHROPIC_API_KEY (from https://console.anthropic.com)
# TAVILY_API_KEY (from https://tavily.com — free tier works)
# 3. Run
./run.shrun.sh automatically creates a virtual environment, installs dependencies, validates keys, and launches the agent.
| Variable | Required | Description |
|---|---|---|
PA_SERVER_API_KEY |
✅ | Prophet Arena API key |
ANTHROPIC_API_KEY |
✅ | Anthropic Claude API key |
TAVILY_API_KEY |
Recommended | Tavily search key (free tier: 1,000 searches/month) |
OPENAI_API_KEY |
Optional | For OpenAI ensemble model |
ORACLE_SLUG |
Optional | Experiment name (default: oracle-v1) |
ORACLE_MAX_TICKS |
Optional | Max ticks (default: 96) |
ORACLE_STARTING_CASH |
Optional | Starting cash in USD (default: 1000.0) |
./run.sh --help
./run.sh --no-search # Disable web search (faster, ~$0.01/tick)
./run.sh --ticks 5 # Quick smoke test (5 ticks only)
./run.sh --slug test-run-v2 # Custom experiment name
./run.sh -v # Verbose / debug loggingEdit config.yaml to tune any parameter without touching code:
action:
min_edge: 0.05 # Only trade with ≥5% edge
kelly_fraction: 0.25 # Conservative 25% Kelly
max_position_notional: 50.0
forecast:
thinking_budget_tokens: 5000
ensemble_models:
- "anthropic:claude-haiku-4-5-20251001"1. Mathematically grounded sizing — Most hackathon trading agents use fixed bet sizes or arbitrary rules. ORACLE uses the Kelly Criterion, which is proven to maximize long-run wealth among all possible betting strategies. Over 96 ticks, this compounds.
2. Evidence that actually updates beliefs — Web search isn't decoration. Markets are often priced on stale information. When ORACLE finds a breaking news item that changes the true probability by 10%, and the market hasn't caught up yet, that's a tradeable edge.
3. Structured reasoning reduces LLM hallucination risk — Unstructured prompts ("what's the probability of X?") let LLMs pattern-match to a plausible-sounding number. The five-step superforecaster structure forces explicit reasoning that can be audited and is empirically more calibrated.
4. Extended thinking on hard questions — Claude's extended thinking gives ORACLE a qualitative advantage on complex, multi-factor markets where the answer isn't obvious from surface features.
5. Capital preservation first — Traders who survive all 96 ticks with capital intact and a small positive return will beat agents that make 3 big bets, win twice, and blow up on the third. ORACLE's risk management is designed for the long game.
- Calibration tracking: Log predicted vs. resolved probabilities per market, use outcomes to retrain/adjust the prompt
- Sentiment signals: Integrate X/Twitter and Reddit sentiment as additional evidence features
- Market microstructure: Use order book depth (bid/ask sizes) to detect informed trading and update priors
- Reinforcement learning loop: Train a small classifier on past ORACLE trades to predict which searches yield the most edge
- Multi-platform arbitrage: Detect price discrepancies between Kalshi and Polymarket for the same underlying event
- Adaptive Kelly: Dynamically adjust Kelly fraction based on recent calibration error (lower Kelly when model is overconfident)
Q: Does ORACLE use real money? A: No — ORACLE trades in the Prophet Arena evaluation harness, which uses simulated capital. The harness tracks positions and PnL against live market prices but does not execute real orders on Kalshi.
Q: What happens if the agent crashes mid-run?
A: ORACLE is stateless — all state (portfolio, positions, fills) lives on the Prophet Arena server. Restarting ./run.sh with the same ORACLE_SLUG resumes the same experiment via create_or_get_experiment().
Q: How much will it cost to run for 2 weeks?
A: With Tavily free tier and 20 markets per tick, estimated total: $12–20 in Anthropic API fees over 96 ticks. The --no-search flag cuts this to ~$6.
Q: Can I run this without a Tavily key?
A: Yes — set search.enabled: false in config.yaml or use --no-search. ORACLE will forecast based on Claude's training knowledge only, without live web evidence. Performance will be lower on fast-moving markets.
Q: Why not use GPT-4o?
A: Claude claude-sonnet-4-6 was chosen for its extended thinking capability and strong instruction following. GPT-4o is supported as an ensemble model via OPENAI_API_KEY.
Inspect the SDK before writing code. The ai-prophet-core SDK's actual field types (cash: str, best_bid: str, shares: str) differ from what documentation suggests. Always run python -c "from ai_prophet_core import X; print(X.model_fields)" before assuming types.
Kelly without guardrails is dangerous. Full Kelly (fraction=1.0) is theoretically optimal but practically catastrophic with miscalibrated probabilities. 25% fractional Kelly accounts for the fact that our probability estimates are imperfect.
Search synthesis is as important as search retrieval. Raw search results are noisy — a separate LLM synthesis step that focuses on event-relevant facts dramatically improves forecast quality over just pasting raw snippets into the prompt.
Structured prompts beat open-ended prompts for calibration. We tested free-form probability elicitation vs. the five-step superforecaster structure. The structured version produces dramatically less overconfident outputs and more nuanced hedging on uncertain questions.
Stateless agent design is a feature, not a limitation. Keeping all state server-side means crash recovery is trivial, the agent can be restarted without data loss, and there is no risk of local cache diverging from truth.
oracle-ai-forecast/
├── agent/
│ ├── config.py # Pydantic config with env var loading
│ ├── loop.py # Experiment + tick orchestration
│ ├── main.py # CLI entry point (argparse + Rich)
│ ├── pipeline/
│ │ ├── review.py # Stage 1: market filtering & scoring
│ │ ├── search.py # Stage 2: web search + LLM synthesis
│ │ ├── forecast.py # Stage 3: superforecaster probability
│ │ └── action.py # Stage 4: Kelly sizing + risk guards
│ └── utils/
│ ├── llm.py # Anthropic + OpenAI wrappers + retry
│ └── search.py # Tavily + no-op search providers
├── config.yaml # All tunable parameters
├── pyproject.toml # Package metadata + dependencies
├── pyrefly.toml # Type checker configuration
├── run.sh # Judge-facing entry point
├── .env.example # Environment variable template
└── LICENSE # MIT
MIT — see LICENSE for details.
Built at the AI Forecasting Hackathon 2026 · University of Chicago Evaluation window: May 17 – May 31, 2026