# ETF Analysis Pipeline

Runs the full analysis pipeline via thin wrappers over the `s1` to `s6` modules in `notebooks/scripts/`.

| Section | Script | Description |
|---------|--------|-------------|
| 0 | - | Setup & Configuration |
| 1 | s1_universe | Universe Discovery (~5,000 ETFs) |
| 2 | s2_collect | Historical Data Collection (IB + yfinance) |
| 3 | s3_factors | Factor Scoring (Mom/Qual/Val/Vol) |
| 4 | s4_optimize | Portfolio Construction |
| 5 | s5_backtest | Backtesting |
| 6 | s6_trades | Trade Recommendations |

Output: `~/trading/live_portfolio/trade_plan.csv` → used by `02_execute_trades.ipynb`

---
## Section 0: Setup

In [1]:
import sys
import warnings
from pathlib import Path

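import nest_asyncio
nest_asyncio.apply()  # allow nested asyncio event loops inside Jupyter
warnings.filterwarnings("ignore")  # silences ALL warnings, including useful ones

# Assumes the working directory is one level below the project root (e.g. notebooks/)
PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT / "src"))
sys.path.insert(0, str(PROJECT_ROOT / "notebooks"))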

DATA_DIR = Path.home() / "trade_data" / "ETFTrader"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"
IB_CACHE_DIR = DATA_DIR / "ib_historical"
LIVE_DIR = Path.home() / "trading" / "live_portfolio"

for d in [PROCESSED_DIR, LIVE_DIR, IB_CACHE_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# ── Strategy parameters ──────────────────────────────────
# Factor weights (value auto-skipped if no expense data)
FACTOR_WEIGHTS = {
    "momentum": 0.35, "quality": 0.30,
    "value": 0.15, "volatility": 0.20,
}
NUM_POSITIONS = 30           # portfolio size; 30 was the backtest recommendation
OPTIMIZER_TYPE = "mvo"       # "mvo" (recommended, robust) | "rankbased" | "minvar"
REBALANCE_FREQ = "quarterly" # "quarterly" | "bimonthly" | "monthly"

# IB Gateway connection (4001 is the Gateway default port for live accounts, 4002 for paper)
IB_HOST = "127.0.0.1"
IB_PORT = 4001
IB_CLIENT_ID = 5

print(f"Data:       {DATA_DIR}")
print(f"Output:     {LIVE_DIR}")
print(f"Positions:  {NUM_POSITIONS}")
print(f"Optimizer:  {OPTIMIZER_TYPE}")
print(f"Rebalance:  {REBALANCE_FREQ}")

Data:       /home/stuar/trade_data/ETFTrader
Output:     /home/stuar/trading/live_portfolio
Positions:  30
Optimizer:  mvo
Rebalance:  quarterly
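
The comment on `FACTOR_WEIGHTS` says the value factor is skipped when expense data is missing. How `s3_factors` redistributes that weight is not shown in this notebook; one common choice is to rescale the remaining weights so they sum to 1 again. A minimal sketch under that assumption (`renormalize_weights` is a hypothetical helper, not part of the pipeline):

```python
def renormalize_weights(weights, available):
    """Drop factors with no data and rescale the rest to sum to 1.

    Hypothetical helper: the real handling inside s3_factors may differ.
    """
    kept = {name: w for name, w in weights.items() if name in available}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no usable factors with positive weight")
    return {name: w / total for name, w in kept.items()}


FACTOR_WEIGHTS = {
    "momentum": 0.35, "quality": 0.30,
    "value": 0.15, "volatility": 0.20,
}

# Assumed scenario: expense data is missing, so "value" cannot be scored.
adjusted = renormalize_weights(
    FACTOR_WEIGHTS, available={"momentum", "quality", "volatility"}
)
for name, w in adjusted.items():
    print(f"{name:<11}{w:.4f}")
```

Rescaling keeps the relative importance of the surviving factors unchanged, so momentum still outweighs quality by the same 0.35 : 0.30 ratio.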


---
## Section 1: Universe Discovery

In [2]:
from scripts.s1_universe import discover_universe

all_tickers, categories, universe_df = discover_universe(PROJECT_ROOT)
print(f"Universe: {len(all_tickers)} ETFs")

Full universe: 5042 ETFs
  Categorized (curated):  792
  Uncategorized (NASDAQ): 4250
  Leveraged/inverse:      75 (kept for data collection)
Universe: 5042 ETFs


---
## Section 2: Data Collection

Set `RUN_COLLECTION = True` to connect to IB Gateway and update the price cache; the default `False` skips the connection.

In [None]:
RUN_COLLECTION = False  # Set True to connect to IB

from scripts.s2_collect import collect_data, apply_quality_filter

prices = collect_data(
    tickers=all_tickers,
    ib_cache_dir=IB_CACHE_DIR,
    processed_dir=PROCESSED_DIR,
    ib_host=IB_HOST, ib_port=IB_PORT, ib_client_id=IB_CLIENT_ID,
    run_collection=RUN_COLLECTION,
)

if prices is not None:
    prices = apply_quality_filter(prices)
    print(f"Prices: {prices.shape[1]} tickers x {prices.shape[0]} days")

Loaded IB prices: 3801 tickers x 1256 days
  Range: 2021-02-16 to 2026-02-13
Connected to IB: U9544585

  DATA COLLECTION — scanning cache...
  
  Universe: 5042 tickers
    CURRENT (up-to-date, skip):    0
    STALE (incremental update):     4953
    MISSING (full 5Y download):     89
  
  Stale tickers to update:
      AAAA   last: 2026-02-13  gap: 2d
      AAAC   last: 2026-02-13  gap: 2d
      AAAU   last: 2026-02-13  gap: 2d
      AADR   last: 2026-02-13  gap: 2d
      AAEQ   last: 2026-02-13  gap: 2d
      AALG   last: 2026-02-13  gap: 2d
      AAPB   last: 2026-02-13  gap: 2d
      AAPD   last: 2026-02-13  gap: 2d
      AAPR   last: 2026-02-13  gap: 2d
      AAPU   last: 2026-02-13  gap: 2d
      AAPW   last: 2026-02-13  gap: 2d
      AAPX   last: 2026-02-13  gap: 2d
      AAPY   last: 2026-02-13  gap: 2d
      AAUM   last: 2026-02-13  gap: 2d
      AAUS   last: 2026-02-13  gap: 2d
      AAVM   last: 2026-02-13  gap: 2d
      AAXJ   last: 2026-02-13  gap: 2d
      ABCS   last: 2

KeyboardInterrupt: 
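
The cache scan in the log above sorts each ticker into CURRENT, STALE, or MISSING. The actual rule lives in `s2_collect` and is not shown here; the sketch below is one plausible version that compares the last cached bar against today's date using a calendar-day gap. `classify_cache`, `max_gap_days`, and the ticker `NEWX` are hypothetical names used only for illustration.

```python
from datetime import date


def classify_cache(last_dates, universe, today, max_gap_days=1):
    """Bucket tickers by cache state.

    last_dates maps ticker -> date of the last cached bar; tickers that
    were never downloaded are simply absent. The threshold is an assumption.
    """
    buckets = {"CURRENT": [], "STALE": [], "MISSING": []}
    for ticker in universe:
        last = last_dates.get(ticker)
        if last is None:
            buckets["MISSING"].append(ticker)      # needs a full download
        elif (today - last).days <= max_gap_days:
            buckets["CURRENT"].append(ticker)      # up to date, skip
        else:
            buckets["STALE"].append(ticker)        # incremental update
    return buckets


last_dates = {"AAAU": date(2026, 2, 13), "AADR": date(2026, 2, 15)}
buckets = classify_cache(
    last_dates, ["AAAU", "AADR", "NEWX"], today=date(2026, 2, 15)
)
for state, tickers in buckets.items():
    print(f"{state:<8}{tickers}")
```

A calendar-day gap treats Friday's bars as stale over a weekend. 2026-02-13 falls on a Friday, so the `gap: 2d` entries in the log are consistent with a weekend run; a trading-calendar-aware check would avoid those requests.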

---
## Section 3: Factor Scoring

In [None]:
from scripts.s3_factors import score_factors

combined_scores, prices_basic, factor_detail = score_factors(
    prices, factor_weights=FACTOR_WEIGHTS,
    categories=categories, raw_dir=RAW_DIR,
)
print(f"Scores: {len(combined_scores)} tickers")
combined_scores.nlargest(20)

---
## Section 4: Portfolio Construction

In [None]:
from scripts.s4_optimize import build_portfolio

target_weights = build_portfolio(
    combined_scores, prices_basic,
    num_positions=NUM_POSITIONS,
    optimizer_type=OPTIMIZER_TYPE,
    factor_detail=factor_detail,
)

target_weights.to_csv(LIVE_DIR / "target_portfolio_latest.csv", header=True)
print(f"\nPortfolio: {len(target_weights)} positions")
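
`OPTIMIZER_TYPE` accepts `"rankbased"` as an alternative to `"mvo"`. The notebook does not show how `s4_optimize` implements it, so the following is only a generic illustration of rank-based weighting: keep the top `num_positions` tickers by score and weight them linearly by rank. `rank_based_weights` and the tickers are hypothetical.

```python
def rank_based_weights(scores, num_positions):
    """Weight the top-N tickers linearly by rank (best gets the most).

    Illustrative only; not the s4_optimize implementation.
    """
    top = sorted(scores, key=scores.get, reverse=True)[:num_positions]
    n = len(top)
    raw = {ticker: n - i for i, ticker in enumerate(top)}  # n, n-1, ..., 1
    total = sum(raw.values())
    return {ticker: r / total for ticker, r in raw.items()}


# Made-up composite scores for four tickers.
scores = {"AAA": 1.2, "BBB": 0.4, "CCC": -0.3, "DDD": 0.9}
weights = rank_based_weights(scores, num_positions=3)
for ticker, w in weights.items():
    print(f"{ticker}  {w:.4f}")
```

Unlike mean-variance optimization, this scheme ignores covariances entirely, which makes it insensitive to estimation error at the cost of ignoring diversification.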

---
## Section 5: Backtesting

In [None]:
from scripts.s5_backtest import run_backtest

results = run_backtest(
    prices_basic, combined_scores,
    rebalance_frequency=REBALANCE_FREQ,
    num_positions=NUM_POSITIONS,
)
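
`REBALANCE_FREQ` can be `"quarterly"`, `"bimonthly"`, or `"monthly"`. How `run_backtest` turns that setting into dates is not shown; a simple convention, assumed here, is to rebalance on the last trading day of each period. `rebalance_dates` is a hypothetical helper and the calendar below is a toy one.

```python
from datetime import date, timedelta

MONTHS_PER_PERIOD = {"monthly": 1, "bimonthly": 2, "quarterly": 3}


def rebalance_dates(trading_days, frequency):
    """Return the last trading day of each period (assumed convention)."""
    step = MONTHS_PER_PERIOD[frequency]
    last_in_period = {}
    for day in sorted(trading_days):
        key = (day.year, (day.month - 1) // step)
        last_in_period[key] = day  # later days overwrite earlier ones
    return sorted(last_in_period.values())


# Toy calendar: every weekday in 2025 (market holidays are ignored).
start = date(2025, 1, 1)
weekdays = [start + timedelta(days=i) for i in range(365)]
weekdays = [d for d in weekdays if d.weekday() < 5]

quarterly = rebalance_dates(weekdays, "quarterly")
bimonthly = rebalance_dates(weekdays, "bimonthly")
monthly = rebalance_dates(weekdays, "monthly")
print(len(quarterly), len(bimonthly), len(monthly))
```

If the price history ends mid-period, the final date returned is just the last available day, so a real backtest may want to drop or flag that partial period.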

---
## Section 6: Trade Recommendations

Connects to IB, pulls live positions, and generates a trade plan that keeps a $70k cash reserve.

In [None]:
from scripts.s6_trades import generate_trades

trades = generate_trades(
    target_weights, LIVE_DIR,
    combined_scores=combined_scores,
    ib_host=IB_HOST, ib_port=IB_PORT,
)

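if trades:
    import pandas as pd
    from IPython.display import display

    # A bare expression inside an `if` block is not auto-rendered by Jupyter
    display(pd.DataFrame(trades))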
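
To make the cash reserve concrete: one way to size trades is to subtract the reserve from net liquidation value, apply the target weights to what remains, and trade the difference against current holdings. The internals of `generate_trades` are not shown, so this is a sketch under that assumption; `plan_trades`, `min_trade`, and the account figures are hypothetical.

```python
def plan_trades(target_weights, current_values, net_liquidation,
                cash_reserve=70_000.0, min_trade=100.0):
    """Return dollar trades per ticker (positive = buy, negative = sell).

    Illustrative only; not the s6_trades implementation.
    """
    investable = max(net_liquidation - cash_reserve, 0.0)
    plan = {}
    for ticker in sorted(set(target_weights) | set(current_values)):
        target = investable * target_weights.get(ticker, 0.0)
        delta = target - current_values.get(ticker, 0.0)
        if abs(delta) >= min_trade:  # skip negligible adjustments
            plan[ticker] = round(delta, 2)
    return plan


# Made-up account: $270k net liquidation, two existing holdings.
plan = plan_trades(
    target_weights={"AAA": 0.5, "BBB": 0.5},
    current_values={"AAA": 80_000.0, "CCC": 20_000.0},
    net_liquidation=270_000.0,
)
for ticker, dollars in plan.items():
    print(f"{ticker}  {dollars:+,.2f}")
```

Holdings that are absent from the target weights (here `CCC`) get a target of zero and are sold in full, while the net of buys and sells leaves the reserve untouched.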

---

**Next step:** Open `02_execute_trades.ipynb` to review, edit, and execute the trade plan.