# ETF Analysis Pipeline

This notebook runs the full analysis pipeline through thin wrappers around the scripts in `notebooks/scripts/` (`s1_universe` through `s6_trades`).

| Section | Script | Description |
|---------|--------|-------------|
| 0 | - | Setup & Configuration |
| 1 | s1_universe | Universe Discovery (~5,000 ETFs) |
| 2 | s2_collect | Historical Data Collection (IB + yfinance) |
| 3 | s3_factors | Factor Scoring (Mom/Qual/Val/Vol) |
| 4 | s4_optimize | Portfolio Construction |
| 5 | s5_backtest | Backtesting |
| 6 | s6_trades | Trade Recommendations |

Output: `~/trading/live_portfolio/trade_plan.csv` → used by `02_execute_trades.ipynb`

---
## Section 0: Setup

In [8]:
import sys
import warnings
from pathlib import Path

import nest_asyncio
nest_asyncio.apply()
warnings.filterwarnings("ignore")

PROJECT_ROOT = Path.cwd().parent  # assumes the kernel's working directory is notebooks/
sys.path.insert(0, str(PROJECT_ROOT / "src"))
sys.path.insert(0, str(PROJECT_ROOT / "notebooks"))

DATA_DIR = Path.home() / "trade_data" / "ETFTrader"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"
IB_CACHE_DIR = DATA_DIR / "ib_historical"
LIVE_DIR = Path.home() / "trading" / "live_portfolio"

for d in [PROCESSED_DIR, LIVE_DIR, IB_CACHE_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# Strategy parameters
FACTOR_WEIGHTS = {"momentum": 0.35, "quality": 0.30, "value": 0.15, "volatility": 0.20}
NUM_POSITIONS = 20
OPTIMIZER_TYPE = "rankbased"
REBALANCE_FREQ = "bimonthly"

# IB settings
IB_HOST = "127.0.0.1"
IB_PORT = 4001
IB_CLIENT_ID = 5

print(f"Data:       {DATA_DIR}")
print(f"Output:     {LIVE_DIR}")
print(f"Positions:  {NUM_POSITIONS}")
print(f"Rebalance:  {REBALANCE_FREQ}")

Data:       /home/stuar/trade_data/ETFTrader
Output:     /home/stuar/trading/live_portfolio
Positions:  20
Rebalance:  bimonthly
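
Because the four factor weights are blended into one score, it can be worth confirming that they are non-negative and sum to 1 before running the later sections. The snippet below is a minimal sketch of such a check; `check_factor_weights` is a hypothetical helper, not part of the pipeline scripts.

```python
import math

# Copied from the setup cell above.
FACTOR_WEIGHTS = {"momentum": 0.35, "quality": 0.30, "value": 0.15, "volatility": 0.20}


def check_factor_weights(weights, tol=1e-9):
    """Hypothetical helper: raise if any weight is negative or the total is not 1."""
    negative = [name for name, w in weights.items() if w < 0]
    if negative:
        raise ValueError(f"Negative factor weights: {negative}")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"Factor weights sum to {total:.4f}, expected 1.0")
    return total


total = check_factor_weights(FACTOR_WEIGHTS)
print(f"Factor weights OK (sum = {total:.2f})")
```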


---
## Section 1: Universe Discovery

In [9]:
from scripts.s1_universe import discover_universe

all_tickers, categories, universe_df = discover_universe(PROJECT_ROOT)
print(f"Universe: {len(all_tickers)} ETFs")

Full universe: 5042 ETFs
  Categorized (curated):  792
  Uncategorized (NASDAQ): 4250
  Leveraged/inverse:      75 (kept for data collection)
Universe: 5042 ETFs


---
## Section 2: Data Collection

Set `RUN_COLLECTION = True` to connect to IB Gateway.

In [None]:
RUN_COLLECTION = True  # Set True to connect to IB

from scripts.s2_collect import collect_data, apply_quality_filter

prices = collect_data(
    tickers=all_tickers,
    ib_cache_dir=IB_CACHE_DIR,
    processed_dir=PROCESSED_DIR,
    ib_host=IB_HOST, ib_port=IB_PORT, ib_client_id=IB_CLIENT_ID,
    run_collection=RUN_COLLECTION,
)

if prices is not None:
    prices = apply_quality_filter(prices)
    print(f"Prices: {prices.shape[1]} tickers x {prices.shape[0]} days")

Loaded IB prices: 1478 tickers x 1256 days
  Range: 2021-02-16 to 2026-02-13
Quality filter: 1478 -> 1478 tickers
Date range: 2021-02-16 to 2026-02-13
Trading days: 1256
Prices: 1478 tickers x 1256 days


---
## Section 3: Factor Scoring

In [11]:
from scripts.s3_factors import score_factors

combined_scores, prices_basic = score_factors(
    prices, factor_weights=FACTOR_WEIGHTS,
    categories=categories, raw_dir=RAW_DIR,
)
print(f"Scores: {len(combined_scores)} tickers")
combined_scores.head(20)

Filtered out 19 leveraged ETFs: AGQ, BOIL, CURE, DDM, DRV, DUST, DXD, EDZ, ERX, ERY and 9 more
Basic model: 1459 tickers (19 leveraged/inverse excluded)
Calculating factors...
  Momentum:   1459 tickers


simplified_value: All scores are NaN!


  Quality:    1458 tickers
  Value:      0 tickers
  Volatility: 1459 tickers

Integrated scores: 1459 tickers
Scores: 1459 tickers


EMCB    0.349119
SCHR    0.474029
SPHD    0.280822
IMOM    0.654294
KWT     0.357272
FAN     0.664794
BJK     0.085885
IWR     0.426866
HYUP    0.439281
IPOS    0.604708
NOBL    0.463148
KJUL    0.413910
FSGS    0.225193
DMAY    0.426194
ISRA    0.630240
IVOO    0.422340
JSMD    0.340604
JOJO    0.566422
GOLY    0.407422
GQRE    0.510983
dtype: float64
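
The log above reports `Value: 0 tickers` because every `simplified_value` score is NaN, so the configured 0.15 value weight has no data to act on. How `score_factors` handles this case is not shown here. One common approach, sketched below as an assumption rather than the pipeline's actual behavior, is to drop factors with no usable scores and rescale the remaining weights so they still sum to 1. The helper `effective_weights`, the tickers `AAA`/`BBB`, and the toy scores are all illustrative.

```python
import math

# Copied from the setup cell.
FACTOR_WEIGHTS = {"momentum": 0.35, "quality": 0.30, "value": 0.15, "volatility": 0.20}


def effective_weights(factor_weights, factor_scores):
    """Hypothetical helper: keep only factors with at least one non-NaN score,
    then rescale their weights to sum to 1."""
    usable = {
        name: w
        for name, w in factor_weights.items()
        if any(not math.isnan(s) for s in factor_scores.get(name, {}).values())
    }
    total = sum(usable.values())
    if total == 0:
        raise ValueError("No factor has usable scores")
    return {name: w / total for name, w in usable.items()}


# Toy scores for two placeholder tickers; illustrative values, not pipeline output.
nan = float("nan")
toy_scores = {
    "momentum": {"AAA": 0.7, "BBB": 0.4},
    "quality": {"AAA": 0.5, "BBB": 0.6},
    "value": {"AAA": nan, "BBB": nan},  # mirrors the all-NaN value factor
    "volatility": {"AAA": 0.3, "BBB": 0.8},
}

weights = effective_weights(FACTOR_WEIGHTS, toy_scores)
for name, w in weights.items():
    print(f"{name:<11}{w:.3f}")
```

With the value factor dropped, the remaining weights are each divided by 0.85, so momentum rises from 0.35 to roughly 0.41.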

---
## Section 4: Portfolio Construction

In [12]:
from scripts.s4_optimize import build_portfolio

target_weights = build_portfolio(
    combined_scores, prices_basic,
    num_positions=NUM_POSITIONS,
    optimizer_type=OPTIMIZER_TYPE,
)

target_weights.to_csv(LIVE_DIR / "target_portfolio_latest.csv", header=True)
print(f"Portfolio: {len(target_weights)} positions")
target_weights

Optimizer: RANKBASED, Positions: 20

Portfolio: 20 positions
Max weight: 7.7%
Min weight: 3.0%
HHI: 0.0541
Expected vol: 13.2%
Portfolio: 20 positions


EFAS    0.077154
GLDI    0.073391
FID     0.069812
FGD     0.066407
EWK     0.063168
INEQ    0.060088
DTH     0.057157
ISVL    0.054369
WDIV    0.051718
VYMI    0.049196
IDV     0.046796
DWX     0.044514
FYLD    0.042343
FNDC    0.040278
FIDI    0.038314
ECOW    0.036445
DIM     0.034667
EYLD    0.032977
DLS     0.031368
IQDF    0.029839
dtype: float64
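
The HHI figure in the optimizer output is the Herfindahl-Hirschman index, the sum of squared portfolio weights. As a cross-check, the sketch below recomputes it from the weights printed above. The reciprocal, often called the effective number of positions, is an extra diagnostic added here and is not something the optimizer reports.

```python
# Target weights copied from the output above.
target = {
    "EFAS": 0.077154, "GLDI": 0.073391, "FID": 0.069812, "FGD": 0.066407,
    "EWK": 0.063168, "INEQ": 0.060088, "DTH": 0.057157, "ISVL": 0.054369,
    "WDIV": 0.051718, "VYMI": 0.049196, "IDV": 0.046796, "DWX": 0.044514,
    "FYLD": 0.042343, "FNDC": 0.040278, "FIDI": 0.038314, "ECOW": 0.036445,
    "DIM": 0.034667, "EYLD": 0.032977, "DLS": 0.031368, "IQDF": 0.029839,
}

# HHI: sum of squared weights. Equal weights across 20 names would give 1/20 = 0.05.
hhi = sum(w * w for w in target.values())

# Effective number of positions: how many equal-weight holdings give the same HHI.
effective_n = 1.0 / hhi

print(f"Weight sum:  {sum(target.values()):.4f}")
print(f"HHI:         {hhi:.4f}")
print(f"Effective N: {effective_n:.1f}")
```

The recomputed HHI agrees with the reported 0.0541, only slightly above the equal-weight floor of 0.05, which indicates that the rank-based tilt is mild.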

---
## Section 5: Backtesting

In [13]:
from scripts.s5_backtest import run_backtest

results = run_backtest(
    prices_basic, combined_scores,
    rebalance_frequency=REBALANCE_FREQ,
    num_positions=NUM_POSITIONS,
)

m = results["metrics"]
print(f"CAGR:     {m.get('cagr', 0):.1%}")
print(f"Sharpe:   {m.get('sharpe_ratio', 0):.2f}")
print(f"Max DD:   {m.get('max_drawdown', 0):.1%}")
print(f"Rebal/yr: {m.get('num_rebalances', 0) / max(1, len(prices_basic) / 252):.1f}")

Running backtest...
  Rebalance frequency: bimonthly
  Stop-loss: 12%
  Drift threshold: 5%
  Capital: $1,000,000


Stop-loss triggered for EYLD on 2022-03-07: sold 1200.90 shares @ $23.79
Stop-loss triggered for ECOW on 2022-04-26: sold 1902.14 shares @ $16.75
Stop-loss triggered for FNDC on 2022-05-09: sold 1257.50 shares @ $28.12
Stop-loss triggered for IQDF on 2022-05-09: sold 1505.48 shares @ $17.34
Stop-loss triggered for DLS on 2022-06-13: sold 542.33 shares @ $50.25
Stop-loss triggered for DIM on 2022-06-14: sold 627.58 shares @ $48.61
Stop-loss triggered for ISVL on 2022-06-16: sold 1865.16 shares @ $25.54
Stop-loss triggered for EWK on 2022-07-12: sold 3816.27 shares @ $15.46
Stop-loss triggered for INEQ on 2022-09-20: sold 2809.99 shares @ $18.81
Stop-loss triggered for IDV on 2022-09-21: sold 1973.59 shares @ $20.35
Stop-loss triggered for EFAS on 2022-09-23: sold 6694.74 shares @ $9.57
Stop-loss triggered for EWK on 2022-09-23: sold 3757.56 shares @ $13.72
Stop-loss triggered for DTH on 2022-09-23: sold 1868.44 shares @ $26.41
Stop-loss triggered for DWX on 2022-09-23: sold 1439.56 shar


Results:
  CAGR:              12.3%
  Sharpe:            0.66
  Sortino:           0.95
  Max Drawdown:      -23.4%
  Volatility:        12.8%
  Total Return:      78.7%
  Rebalances:        4 (0.8/year)
  Win Rate:          55%
CAGR:     12.3%
Sharpe:   0.66
Max DD:   -23.4%
Rebal/yr: 0.8
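
The reported CAGR can be sanity-checked against the total return. The sketch below annualizes the 78.7% total return over the 1256 trading days loaded in Section 2, using the same 252-days-per-year convention as the `Rebal/yr` line. This is an assumption: the backtest may annualize over a different window or day count, so only approximate agreement should be expected. `annualized_return` is a hypothetical helper.

```python
TRADING_DAYS_PER_YEAR = 252  # same convention as the Rebal/yr calculation above


def annualized_return(total_return, trading_days, days_per_year=TRADING_DAYS_PER_YEAR):
    """Hypothetical helper: convert a cumulative return into a compound annual rate."""
    years = trading_days / days_per_year
    return (1.0 + total_return) ** (1.0 / years) - 1.0


# Inputs taken from the outputs above: Total Return 78.7%, 1256 trading days.
cagr = annualized_return(0.787, 1256)
print(f"Implied CAGR: {cagr:.2%}")
```

The implied rate lands within about a tenth of a percentage point of the reported 12.3%, which is consistent with rounding in the printed total return and small differences in the day-count convention.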


---
## Section 6: Trade Recommendations

Connects to IB, pulls live positions, and generates a trade plan that keeps a $70k cash reserve.

In [14]:
from scripts.s6_trades import generate_trades

trades = generate_trades(
    target_weights, LIVE_DIR,
    ib_host=IB_HOST, ib_port=IB_PORT,
)

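if trades:
    import pandas as pd
    from IPython.display import display

    # A bare expression inside an `if` block is not rendered by Jupyter,
    # so display the table explicitly.
    display(pd.DataFrame(trades))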

Connected: U9544585

Account: NLV=$188,548  Cash=$95,442  Reserve=$70,000  Deployable=$25,442
Positions: 21  |  Live prices: 38

Trade plan: 25 trades
  Buys: $109,346  |  Sells: $83,931
  Cash after: $70,026 (reserve: $70,000)
Saved: /home/stuar/trading/live_portfolio/trade_plan.csv
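
The cash figures in this output can be reconciled by hand: deployable cash is current cash minus the reserve, and cash after trading is current cash plus sell proceeds minus buy cost. The sketch below reproduces that arithmetic from the rounded numbers printed above. `reconcile_cash` is a hypothetical helper, and the formulas are an assumption about how `generate_trades` arrives at its summary, not a copy of its logic.

```python
def reconcile_cash(cash, reserve, buys, sells):
    """Hypothetical helper: return (deployable, cash_after) for a trade plan."""
    deployable = cash - reserve
    cash_after = cash + sells - buys
    return deployable, cash_after


# Rounded dollar figures copied from the output above.
deployable, cash_after = reconcile_cash(
    cash=95_442, reserve=70_000, buys=109_346, sells=83_931
)

print(f"Deployable: ${deployable:,}")
print(f"Cash after: ${cash_after:,}")
print(f"Reserve respected: {cash_after >= 70_000}")
```

The result differs from the logged `Cash after` by $1, which is what rounding each printed figure to whole dollars would produce.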


---

**Next step:** Open `02_execute_trades.ipynb` to review, edit, and execute the trade plan.