# AI Stock Forecaster  
**(FMP + Kronos + FinText-TSFM | Signal-Only, Point-in-Time Safe)**

---

## 0) Project Explanation & Philosophy

### What this project is

This project builds a **decision-support forecasting model** that answers one core question:

> **Which AI stocks are most attractive to buy today, on a risk-adjusted basis, over the next 20 / 60 / 90 trading days?**

The system outputs **ranked stock recommendations and return distributions**, not trades.  
Its purpose is to generate **credible alpha signals** that survive realistic financial constraints.

The design explicitly accounts for:
- non-stationary market behavior,
- weak and noisy financial signals,
- transaction costs and liquidity effects,
- and strict point-in-time (PIT) correctness.

---

### What this project is NOT

This project does **not**:
- place trades,
- connect to brokers,
- optimize execution,
- or manage live capital.

Any portfolio-related logic exists **only to validate signal realism**, not to implement trading.

---

### Core modeling philosophy

1. **Ranking beats regression**  
   Relative ordering of stocks is more stable and economically useful than exact price prediction.

2. **Point-in-time correctness is non-negotiable**  
   Any signal unavailable at time *T* must not influence predictions at time *T*.

3. **Economic validity > statistical fit**  
   Signals must survive transaction costs, turnover, and regime shifts.

4. **Multiple weak signals > single strong model**  
   Combine complementary views:
   - price dynamics (Kronos),
   - return structure (FinText-TSFM),
   - fundamentals and context (tabular models).

---

## 1) System Outputs (Signal-Only)

At each rebalance date **T**, for each stock and horizon (20 / 60 / 90 trading days):

### Per-stock outputs
- **Expected excess return** vs benchmark (QQQ default; XLK/SMH optional)
- **Return distribution** (5th / 50th / 95th percentiles)
- **Alpha ranking score** (cross-sectional)
- **Confidence score** (calibrated uncertainty)
- **Key drivers** (feature blocks influencing the rank)

### Cross-sectional outputs
- Ranked list: **Top buys / neutral / avoid**
- Optional confidence buckets (high vs low confidence)

---

## 2) Scope & Validation Philosophy (Signal-Only)

### Scope
- The system produces **signals**, not trades.
- No execution or order placement logic is implemented.

### Why portfolio concepts still appear
Portfolio concepts (turnover, costs, constraints) are used **only for evaluation realism**, to answer:
> *Would these signals remain economically meaningful if followed by an investor?*

### Optional realism check
- Paper trading (e.g., Alpaca paper) may be used **post-hoc** to validate:
  - timestamp integrity,
  - universe construction,
  - signal stability.
- Paper trading results are **never** used for training or model selection.

---

## 3) Data & Point-in-Time Infrastructure (FMP-First)

### 3.1 Data sources
- **Market**: Daily OHLCV, splits, dividends
- **Fundamentals**: Income, balance sheet, cash flow (quarterly)
- **Metadata**: Sector, industry, shares outstanding, market cap
- **Events**: Earnings dates with announcement time
- **Benchmarks**: QQQ (default), optional XLK / SMH
- **Regime proxies**: VIX, market breadth, rate proxies

---

### 3.2 Point-in-time (PIT) rules

Each datapoint stores:
- `value`
- `observed_at` (first public release timestamp)
- `effective_from`
- `source`

Rules:
- Fundamentals are stored **as-reported**; later restatements never overwrite historical values
- Forward-fill allowed **only after** `observed_at`
- No feature may use information released after the cutoff time
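
The rules above can be sketched as a small record type plus an as-of lookup. This is a minimal illustration, not the project's actual store: `PITRecord`, `as_of`, and the sample values are hypothetical, and `effective_from` is interpreted here as the period end the value describes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class PITRecord:
    """One stored datapoint; field names mirror the list above."""
    value: float
    observed_at: datetime     # first public release timestamp
    effective_from: datetime  # assumption: period end the value describes
    source: str


def as_of(records: Sequence[PITRecord], cutoff: datetime) -> Optional[PITRecord]:
    """Return the most recently released record visible at `cutoff`.

    Records released after the cutoff are ignored, so forward-filling
    can only begin once `observed_at` has passed.
    """
    visible = [r for r in records if r.observed_at <= cutoff]
    if not visible:
        return None
    return max(visible, key=lambda r: r.observed_at)


# Hypothetical quarterly values, for illustration only.
history = [
    PITRecord(1.10, datetime(2023, 2, 1, 16, 30), datetime(2022, 12, 31), "FMP"),
    PITRecord(1.25, datetime(2023, 5, 3, 16, 30), datetime(2023, 3, 31), "FMP"),
]
```

With this lookup, a feature computed at 4:00pm on 2023-05-03 still sees the older value, because the newer one was released 30 minutes after the cutoff.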

---

### 3.3 Daily cutoff policy (anti-lookahead)

- Fixed cutoff time (e.g., 4:00pm ET)
- Features for date *T* may only use data with timestamps ≤ cutoff(T)
- Earnings handling distinguishes pre-market vs after-close announcements
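
As an illustration of the pre-market vs after-close distinction, the sketch below maps an announcement timestamp to the first feature date allowed to use it. The function name is hypothetical, timestamps are assumed to already be in ET, and weekends and holidays are ignored; a real version would roll forward on a trading calendar.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(16, 0)  # 4:00pm ET, the example cutoff from the text


def first_usable_date(announced_at: datetime) -> date:
    """First feature date T with cutoff(T) at or after the announcement.

    Assumes `announced_at` is expressed in ET. Calendar days are used
    instead of trading days to keep the sketch short.
    """
    if announced_at.time() <= CUTOFF:
        return announced_at.date()                  # pre-market or intraday
    return announced_at.date() + timedelta(days=1)  # after-close
```

An after-close announcement at 4:05pm therefore first enters features on the following date, while a 7:00am announcement is usable the same day.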

---

### 3.4 Data audits & bias detection

Automated checks:
- PIT violation scanner
- Survivorship reconstruction audit
- Corporate action sanity checks
- Missingness and outlier detection

**Success criteria**
- < 0.1% PIT violations
- Universe reproducible for any historical date
- All datasets auditable and replayable

---

## 4) Survivorship-Safe Dynamic Universe

### 4.1 Universe construction (critical)

At each rebalance date **T**:
- Start with all U.S. equities meeting liquidity and price thresholds
- Filter by AI-relevant sector / industry tags
- Select **Top N by market cap as-of T**
- Persist constituents with timestamp

Hardcoded “today’s winners” are explicitly disallowed.
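
The construction steps can be sketched as a pure function over as-of inputs. `Candidate`, `build_universe`, the tickers, and all thresholds below are hypothetical and exist only to show the filter-then-rank order; they are not the project's `UniverseBuilder` API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Sequence


@dataclass(frozen=True)
class Candidate:
    ticker: str
    market_cap: float   # as-of T, not today's value
    adv_usd: float      # average daily dollar volume as-of T
    price: float
    is_ai_tagged: bool  # result of the sector / industry filter


def build_universe(candidates: Sequence[Candidate], as_of: date, top_n: int,
                   min_adv_usd: float, min_price: float) -> Dict[str, object]:
    """Apply liquidity, price, and AI-tag filters, then keep the top N by cap."""
    eligible = [
        c for c in candidates
        if c.adv_usd >= min_adv_usd and c.price >= min_price and c.is_ai_tagged
    ]
    ranked = sorted(eligible, key=lambda c: c.market_cap, reverse=True)
    # The snapshot carries its own date so it can be persisted and replayed.
    return {"as_of": as_of.isoformat(), "tickers": [c.ticker for c in ranked[:top_n]]}


snapshot = build_universe(
    [
        Candidate("AAA", 900e9, 5e9, 410.0, True),
        Candidate("BBB", 50e9, 2e8, 35.0, True),
        Candidate("CCC", 700e9, 4e9, 150.0, False),  # large but not AI-tagged
        Candidate("DDD", 5e9, 1e5, 0.8, True),       # fails liquidity and price
        Candidate("EEE", 120e9, 8e8, 95.0, True),
    ],
    as_of=date(2020, 6, 30), top_n=2, min_adv_usd=1e6, min_price=5.0,
)
```

Because every input is as-of T, rerunning the function for a past date reproduces that date's constituents rather than today's.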

---

### 4.2 Delistings & mergers
- Delisted stocks remain in historical universes where data exists
- Missing data is explicitly modeled rather than silently dropped

**Success criteria**
- Constituents vary meaningfully through time
- Backtests include both winners and failures

---

## 5) Feature Engineering (Bias-Safe)

### 5.0 Readiness Checklist & Implementation Plan

#### Infrastructure Available (from Chapters 3-4) ✅
| Component | Module | What It Provides |
|-----------|--------|------------------|
| Prices | `FMPClient.get_historical_prices()` | Split-adjusted OHLCV with `observed_at` |
| Fundamentals | `FMPClient.get_income_statement()` etc. | With `fillingDate` (FMP's own spelling) for PIT |
| Volume/ADV | `DuckDBPITStore.get_avg_volume()` | Computed from OHLCV |
| Events | `EventStore` | EARNINGS, FILING, SENTIMENT with PIT |
| Earnings | `AlphaVantageClient` + `ExpectationsClient` | BMO/AMC timing, surprises |
| Regime/VIX | `FMPClient.get_index_historical()` | SPY, VIX for regime detection |
| Universe | `UniverseBuilder` | Survivorship-safe universe via Polygon |
| ID Mapping | `SecurityMaster` | Stable IDs, ticker changes |
| Calendar | `TradingCalendarImpl` | NYSE holidays, cutoffs |
| Caching | All clients | `data/cache/*` directories |

#### API Keys Available ✅
- `FMP_KEYS` - Prices, fundamentals, profiles (free tier: 250 requests/day)
- `POLYGON_KEYS` - Symbol master, universe (free tier: 5 requests/min)
- `ALPHAVANTAGE_KEYS` - Earnings calendar (free tier: 25 requests/day)

---

#### Chapter 5 TODO List

**5.1 Targets (Labels)**
- [ ] Implement forward excess return calculation vs QQQ benchmark
- [ ] Create label generator for 20/60/90 trading day horizons
- [ ] Ensure labels are strictly PIT-safe (no future leakage)

**5.2 Price & Volume Features**
- [ ] Momentum features (1m, 3m, 6m, 12m returns)
- [ ] Volatility (realized vol, vol-of-vol)
- [ ] Drawdown (max drawdown, current vs high)
- [ ] Relative strength vs universe median
- [ ] Beta vs benchmark (rolling window)
- [ ] ADV and volatility-adjusted ADV

**5.3 Fundamental Features (Relative)**
- [ ] P/E vs own 3-year history (z-score)
- [ ] P/S vs sector median
- [ ] Margins vs sector peers
- [ ] Revenue/earnings growth vs sector
- [ ] All ratios rank-transformed cross-sectionally

**5.4 Event & Calendar Features**
- [ ] Days to next earnings
- [ ] Days since last earnings
- [ ] Post-earnings drift window indicator
- [ ] Surprise magnitude (last N quarters)
- [ ] Filing recency (days since last 10-Q/10-K)

**5.5 Regime & Macro Features**
- [ ] VIX level and percentile
- [ ] Market trend regime (bull/bear/neutral)
- [ ] Sector rotation indicators
- [ ] All features timestamped with cutoff enforcement

**5.6 Missingness Masks**
- [ ] Create explicit "known at time T" indicators
- [ ] Missingness as first-class feature (not just imputation)
- [ ] Track data coverage statistics

**5.7 Feature Hygiene & Redundancy**
- [ ] Cross-sectional z-score/rank standardization
- [ ] Rolling Spearman correlation matrix
- [ ] Feature clustering (identify blocks)
- [ ] VIF diagnostics (tabular features)
- [ ] Rolling IC stability checks
- [ ] Sign consistency analysis

**5.8 Feature Neutralization (Diagnostics)**
- [ ] Sector-neutral IC computation
- [ ] Beta-neutral IC computation
- [ ] Market-neutral IC computation

**Testing & Validation**
- [ ] Unit tests for each feature block
- [ ] PIT violation scanner on all features
- [ ] Univariate IC ≥ 0.03 check for strong signals
- [ ] IC stability across rolling windows
- [ ] Feature coverage > 95% (post-masking)

---

#### Rate Limit Strategy
1. Cache universe snapshots by rebalance date (Polygon: 5 requests/min)
2. Batch FMP requests where possible (profiles, quotes)
3. Use Alpha Vantage sparingly (25 requests/day limit)
4. Store computed features in DuckDB for reuse

---

### 5.1 Targets
- Forward **excess returns** vs benchmark
- Horizons: 20 / 60 / 90 trading days
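
A minimal sketch of the label, assuming simple (not log) returns and two price series aligned on the same trading-day index. The function name is hypothetical.

```python
from typing import Optional, Sequence


def forward_excess_return(stock_px: Sequence[float], bench_px: Sequence[float],
                          t: int, horizon: int) -> Optional[float]:
    """Stock minus benchmark simple return from index t to t + horizon.

    Indices are positions in a shared trading-day series. Returns None
    when the label has not matured yet (t + horizon is past the data).
    """
    end = t + horizon
    if end >= len(stock_px) or end >= len(bench_px):
        return None
    stock_ret = stock_px[end] / stock_px[t] - 1.0
    bench_ret = bench_px[end] / bench_px[t] - 1.0
    return stock_ret - bench_ret
```

The label itself looks forward by design. It stays PIT-safe only if it is attached to features computed at `t` and is never used as an input for any date before `t + horizon`.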

---

### 5.2 Price & volume features ✅ COMPLETE
**Implemented in `src/features/price_features.py`**

| Feature | Description |
|---------|-------------|
| `mom_1m/3m/6m/12m` | Returns over 21/63/126/252 trading days |
| `vol_20d/60d` | Annualized realized volatility over 20/60 trading days |
| `vol_of_vol` | Volatility of rolling volatility |
| `max_drawdown_60d` | Maximum drawdown over the trailing 60 trading days |
| `rel_strength_1m/3m` | Z-score vs universe |
| `beta_252d` | Beta vs QQQ benchmark |
| `adv_20d/60d` | Average daily dollar volume |
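
For orientation, three of the features in the table can be sketched as follows. This is not the code in `src/features/price_features.py`; the function names, the use of log returns for volatility, and the 252-day annualization factor are assumptions.

```python
import math
import statistics
from typing import Optional, Sequence

TRADING_DAYS_PER_YEAR = 252  # assumed annualization factor


def momentum(closes: Sequence[float], lookback: int) -> Optional[float]:
    """Simple return over the last `lookback` trading days."""
    if len(closes) <= lookback:
        return None
    return closes[-1] / closes[-1 - lookback] - 1.0


def annualized_vol(closes: Sequence[float], window: int) -> Optional[float]:
    """Sample std of the last `window` daily log returns, annualized."""
    if window < 2 or len(closes) <= window:
        return None
    recent = closes[-(window + 1):]
    rets = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return statistics.stdev(rets) * math.sqrt(TRADING_DAYS_PER_YEAR)


def max_drawdown(closes: Sequence[float], window: int) -> float:
    """Worst peak-to-trough decline within the trailing window (<= 0)."""
    recent = closes[-window:]
    peak, worst = recent[0], 0.0
    for px in recent:
        peak = max(peak, px)
        worst = min(worst, px / peak - 1.0)
    return worst
```

All three read only closes up to and including the last element, so passing a series truncated at the cutoff for T keeps them lookahead-free.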

---

### 5.3 Fundamentals (relative, normalized) ✅ COMPLETE
**Implemented in `src/features/fundamental_features.py`**

Raw ratios are avoided — all features are RELATIVE:
- `pe_zscore_3y`: P/E vs own 3-year history
- `pe_vs_sector`: P/E relative to sector median
- `ps_vs_sector`: P/S relative to sector median
- `gross_margin_vs_sector`: Margins vs sector
- `revenue_growth_vs_sector`: Growth vs sector peers
- `roe_zscore`, `roa_zscore`: Quality metrics z-scored
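
The two kinds of relative feature can be sketched as below. How `pe_vs_sector` is actually defined (ratio, difference, or rank) is not stated here, so the ratio-minus-one form is an assumption, as are the function names.

```python
import statistics
from collections import defaultdict
from typing import Dict, Mapping, Optional, Sequence


def zscore_vs_history(current: float, history: Sequence[float]) -> Optional[float]:
    """Z-score of the current value against the stock's own past values."""
    if len(history) < 2:
        return None
    sd = statistics.stdev(history)
    if sd == 0:
        return None
    return (current - statistics.mean(history)) / sd


def relative_to_sector_median(values: Mapping[str, float],
                              sectors: Mapping[str, str]) -> Dict[str, float]:
    """value / sector median - 1 per ticker (assumes non-zero medians)."""
    by_sector = defaultdict(list)
    for ticker, value in values.items():
        by_sector[sectors[ticker]].append(value)
    medians = {s: statistics.median(vs) for s, vs in by_sector.items()}
    return {t: v / medians[sectors[t]] - 1.0 for t, v in values.items()}
```

Both helpers should be fed only values whose `observed_at` precedes the cutoff, so the sector median at time T never includes a filing released later.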

---

### 5.4 Events & calendars
- Days to earnings
- Post-earnings window indicators
- Surprise history
- Announcement-time-aware alignment

---

### 5.5 Regime & macro
- Volatility regimes
- Market trend regimes
- All features timestamped and cutoff-safe

---

### 5.6 Availability & missingness masks
- Explicit indicators for “known at time T”

---

### 5.7 Feature Hygiene & Redundancy Control
**Critical for finance-grade credibility:**

- **Cross-sectional standardization**: z-score or rank-transform within universe at time T
- **Correlation screening**: Rolling Spearman correlations to identify feature clusters
- **Block aggregation**: Group redundant features (e.g., `mom_6m` + `mom_12m`) into blocks rather than dropping individual features
- **VIF diagnostics**: Variance Inflation Factor for tabular features (diagnostic, not hard filter)
- **Stability checks** (more important than VIF):
  - Rolling IC decay
  - Regime-conditional IC
  - Sign consistency across time
- **Rank-space monotonicity** enforcement where applicable

> **Principle**: A feature with IC 0.04 in one period and −0.01 in the next is worse than a feature with a stable IC of 0.02.
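
Two of these checks are easy to sketch: cross-sectional standardization at a single date, and sign consistency across rolling IC windows. The function names are hypothetical, and treating zero ICs as uninformative is an assumption.

```python
import statistics
from typing import Dict, Mapping, Sequence


def cross_sectional_zscore(values: Mapping[str, float]) -> Dict[str, float]:
    """Z-score each stock's feature within the universe at one date T."""
    mu = statistics.fmean(values.values())
    sd = statistics.pstdev(values.values())
    if sd == 0:
        return {ticker: 0.0 for ticker in values}
    return {ticker: (v - mu) / sd for ticker, v in values.items()}


def sign_consistency(rolling_ic: Sequence[float]) -> float:
    """Share of rolling windows whose IC has the dominant sign.

    Windows with IC exactly zero are skipped (assumption).
    """
    nonzero = [ic for ic in rolling_ic if ic != 0]
    if not nonzero:
        return 0.0
    positive = sum(ic > 0 for ic in nonzero)
    return max(positive, len(nonzero) - positive) / len(nonzero)
```

For example, rolling ICs of `[0.04, -0.01, 0.03, 0.02]` give a consistency of 0.75, while a feature that holds 0.02 in every window scores 1.0.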

---

### 5.8 Feature Neutralization (Evaluation-Only, Optional)
**For diagnostics, not training:**

- **Sector-neutral IC**: Alpha after removing sector effects
- **Beta-neutral IC**: Alpha independent of market exposure
- **Market-neutral IC**: Pure stock-specific alpha

This reveals **where alpha comes from** — used for interpretation, not model input.
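
One way to compute a sector-neutral IC is to demean both the signal and the forward returns within each sector before taking a rank correlation. The sketch below does exactly that; group demeaning is a swapped-in simplification (a regression on sector dummies would be an alternative), and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks with ties sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def rank_ic(signal: Sequence[float], fwd: Sequence[float]) -> float:
    """Spearman correlation between signal and forward returns."""
    return pearson(average_ranks(signal), average_ranks(fwd))


def demean_by_group(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in members.items()}
    return [v - means[g] for v, g in zip(values, groups)]


def sector_neutral_ic(signal, fwd, sectors) -> float:
    """Rank IC after removing sector means from both inputs."""
    return rank_ic(demean_by_group(signal, sectors), demean_by_group(fwd, sectors))
```

A signal whose raw IC is positive but whose sector-neutral IC is negative is mostly picking sectors, not stocks within them.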

---

**Feature success criteria**
- > 95% completeness (post-masking)
- Strong univariate signals show IC ≳ 0.03
- No feature introduces PIT violations
- **Stability**: IC sign consistent across ≥70% of rolling windows
- **Redundancy understood**: Feature blocks documented, correlation matrix computed

---

## 6) Evaluation Framework (Core Credibility Layer)

### 6.1 Walk-forward evaluation
- Expanding window
- Multiple market regimes covered

### 6.2 Purging & embargo
- Overlapping label windows purged
- Embargo applied for multi-horizon labels
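
A minimal sketch of purging for an expanding-window fold, assuming samples are indexed by trading day and a label at index `i` uses prices through `i + horizon`. The embargo is modeled here as an extra gap before the test block, which is one possible reading of the rule above; the function name is hypothetical.

```python
def purged_train_range(test_start: int, horizon: int, embargo: int = 0) -> range:
    """Training indices for a fold whose test block begins at `test_start`.

    A sample at index i is kept only if its label window, which ends at
    i + horizon, closes at least `embargo` days before the test block starts.
    """
    stop = max(0, test_start - horizon - embargo)
    return range(0, stop)
```

With `test_start=100`, `horizon=20`, and `embargo=5`, the last training index is 74 and its label matures at index 94, leaving a five-day gap before the first test sample.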

---

### 6.3 Metrics (ranking-first)
- RankIC / IC by horizon and regime
- Top-minus-bottom quintile spread
- Net-of-cost Sharpe (diagnostic only)
- Turnover, drawdown, hit-rate
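
The top-minus-bottom quintile spread can be sketched as below for a single date. The function name is hypothetical, and equal weighting within each bucket is an assumption.

```python
from typing import Sequence


def top_minus_bottom_spread(scores: Sequence[float], fwd_returns: Sequence[float],
                            n_buckets: int = 5) -> float:
    """Mean forward return of the top bucket minus that of the bottom bucket.

    Stocks are sorted by score; each bucket holds len(scores) // n_buckets
    names, equally weighted.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_buckets
    if size == 0:
        raise ValueError("need at least n_buckets stocks")
    bottom = [fwd_returns[i] for i in order[:size]]
    top = [fwd_returns[i] for i in order[-size:]]
    return sum(top) / size - sum(bottom) / size
```

Averaging this spread across rebalance dates, and splitting it by regime, gives a ranking metric that is easier to interpret economically than IC alone.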

---

### 6.4 Cost realism
- 20 bps base round-trip
- Liquidity-scaled slippage
- Signals rejected if performance vanishes post-cost
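
A sketch of how the cost assumptions might be applied to a period return. The 20 bps figure comes from the list above; the linear slippage model, its `bps_per_pct_adv` coefficient, and the convention that turnover is the fraction of the portfolio replaced are all assumptions.

```python
BASE_ROUND_TRIP_BPS = 20.0  # base round-trip cost from the list above


def liquidity_slippage_bps(trade_usd: float, adv_usd: float,
                           bps_per_pct_adv: float = 10.0) -> float:
    """Hypothetical linear impact: extra bps per 1% of ADV traded."""
    participation_pct = 100.0 * trade_usd / adv_usd
    return bps_per_pct_adv * participation_pct


def net_return(gross_return: float, turnover: float, slippage_bps: float = 0.0,
               round_trip_bps: float = BASE_ROUND_TRIP_BPS) -> float:
    """Gross period return minus costs on the replaced fraction of the book."""
    cost = turnover * (round_trip_bps + slippage_bps) / 10_000.0
    return gross_return - cost
```

Under these assumptions, a signal earning 5 bps gross per period with 50% turnover nets out negative and would be rejected.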

**Evaluation success criteria**
- Results stable across folds
- No single fold dominates performance
- Performance survives realistic costs

---

## 7) Baseline Models (Models to Beat)

1. Naive (random / benchmark mean)
2. Factor baselines (momentum, low-vol, quality)
3. Tabular ML (LightGBM / CatBoost)

**Baseline gates**
- Factor IC > 0.02
- ML IC > 0.05
- TSFM models must beat tuned ML baseline on **median OOS IC**

---

## 8) Kronos Module (Price Dynamics)

- Input: OHLCV sequences
- Rolling / RevIN-style (reversible instance) normalization
- Outputs: embeddings and horizon-aware signals
- Fine-tuning via walk-forward only
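
The normalization step can be illustrated with a per-window standardization that keeps its statistics so outputs can be mapped back to price space. This is a simplified sketch of the reversible instance normalization idea only: it omits the learnable affine parameters, is not Kronos code, and uses hypothetical function names.

```python
import statistics
from typing import List, Sequence, Tuple


def normalize_window(window: Sequence[float],
                     eps: float = 1e-8) -> Tuple[List[float], Tuple[float, float]]:
    """Standardize one input window by its own mean and std.

    Returns the normalized values plus the (mean, scale) needed to undo it.
    """
    mu = statistics.fmean(window)
    scale = statistics.pstdev(window) + eps  # eps guards flat windows
    return [(x - mu) / scale for x in window], (mu, scale)


def denormalize(values: Sequence[float], stats: Tuple[float, float]) -> List[float]:
    """Map model outputs back to the original price scale."""
    mu, scale = stats
    return [v * scale + mu for v in values]
```

Two windows with the same shape at different price levels, such as `[10, 11, 12]` and `[1000, 1100, 1200]`, normalize to nearly identical inputs, which is the property the price-level stability criterion below relies on.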

**Kronos success criteria**
- Zero-shot IC measured
- Fine-tuning improves IC by ≥ 0.01
- Stable behavior across price level shifts

---

## 9) FinText-TSFM Module (Return Structure)

- Input: historical excess returns
- Year-specific checkpoints to reduce pretraining leakage
- Outputs: return distributions and embeddings

**FinText success criteria**
- Adds independent signal (low correlation with Kronos)
- Improves fusion IC consistently across regimes

---

## 10) NLP Sentiment (Separate)

- Finance-specific NLP model (news / transcripts)
- Strict cutoff-time enforcement

Sentiment is optional and never required.

---

## 11) Fusion Model (Ranking-First)

- Gated fusion of:
  - Kronos embeddings
  - FinText-TSFM embeddings
  - Tabular context features

### Objectives
- Primary: pairwise / listwise ranking loss
- Secondary: distribution calibration loss
- No pure MSE price regression
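
As one concrete instance of a pairwise ranking objective, the sketch below implements a RankNet-style logistic loss over all ordered pairs at a single date. Which pairwise or listwise loss the fusion model actually uses is not specified here, so this choice and the function name are assumptions.

```python
import math
from typing import Sequence


def pairwise_logistic_loss(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Mean of log(1 + exp(-(s_i - s_j))) over pairs where target_i > target_j.

    The loss falls as the model scores the better-performing stock of each
    pair higher; pairs with tied targets contribute nothing.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

Only score differences enter the loss, so it is indifferent to the absolute level of predicted returns, which matches the ranking-first objective.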

### Ablation gates
- Feature blocks removed if unstable
- Fusion must beat best single model

---

## 12) Regime-Aware Ensembling

- Components: Fusion, ML baseline, simple factor
- Regime detector (volatility / trend)
- Smooth, regularized ensemble weights

**Success criteria**
- Ensemble improves median IC
- Reduces variance across regimes

---


## 13) Calibration & Confidence

- Quantile calibration
- Confidence stratification

**Success criteria**
- Quantile coverage error < 5%
- High-confidence bucket materially outperforms
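
Quantile coverage error can be sketched for the 5th to 95th percentile band, which should contain about 90% of realized returns. Reading the 5% threshold as an absolute gap between empirical and nominal coverage is an assumption, as is the function name.

```python
from typing import Sequence


def coverage_error(lower: Sequence[float], upper: Sequence[float],
                   realized: Sequence[float], nominal: float = 0.90) -> float:
    """Absolute gap between empirical and nominal interval coverage.

    `lower` and `upper` are the predicted 5th and 95th percentiles for each
    matured prediction; `nominal` is the coverage they should achieve.
    """
    inside = sum(lo <= r <= hi for lo, hi, r in zip(lower, upper, realized))
    return abs(inside / len(realized) - nominal)
```

Computing the same quantity separately for the high-confidence and low-confidence buckets shows whether the confidence score is informative or just noise.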

---

## 14) Monitoring & Research Ops

- Prediction logging with timestamps
- Matured-label scoring
- Feature and performance drift detection

Alerts:
- RankIC decay
- Calibration breakdown
- Ranking instability

---

## 15) Outputs & Interfaces

- Ranked stock lists
- Per-stock explanation summaries
- Batch scoring interface
- Full traceability of inputs and decisions

---

## 16) Global Research Acceptance Criteria

A model is considered **valid** if:

- Median walk-forward RankIC exceeds baseline by ≥ 0.02
- Net-of-cost performance positive in ≥ 70% of folds
- Top-10 ranking churn < 30% month-over-month
- Performance degrades gracefully under regime shifts
- No PIT or survivorship violations detected
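
Two of these criteria reduce to simple counts. The sketch below defines churn as the share of last month's top-k names that are no longer in this month's top-k, and checks the share of folds with positive net performance; both definitions and the function names are assumptions.

```python
from typing import Sequence


def top_k_churn(prev_ranked: Sequence[str], curr_ranked: Sequence[str],
                k: int = 10) -> float:
    """Fraction of the previous top-k that dropped out of the current top-k."""
    prev_top, curr_top = set(prev_ranked[:k]), set(curr_ranked[:k])
    return 1.0 - len(prev_top & curr_top) / k


def share_of_positive_folds(net_fold_returns: Sequence[float]) -> float:
    """Fraction of walk-forward folds with positive net-of-cost performance."""
    return sum(r > 0 for r in net_fold_returns) / len(net_fold_returns)
```

Under these definitions, a model passes the churn gate when `top_k_churn(...) < 0.30` and the cost gate when `share_of_positive_folds(...) >= 0.70`.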
