# ETF Analysis Pipeline

Runs the full analysis pipeline via thin wrappers over the scripts in `notebooks/scripts/` (`s1_universe` through `s6_trades`).

| Section | Script | Description |
|---------|--------|-------------|
| 0 | - | Setup & Configuration |
| 1 | s1_universe | Universe Discovery (~5,000 ETFs) |
| 2 | s2_collect | Historical Data Collection (IB + yfinance) |
| 3 | s3_factors | Factor Scoring (Mom/Qual/Val/Vol) |
| 4 | s4_optimize | Portfolio Construction |
| 5 | s5_backtest | Backtesting |
| 6 | s6_trades | Trade Recommendations |

Output: `~/trading/live_portfolio/trade_plan.csv` → used by `02_execute_trades.ipynb`

---
## Section 0: Setup

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import warnings
from pathlib import Path

import nest_asyncio
nest_asyncio.apply()
warnings.filterwarnings("ignore")

PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT / "src"))
sys.path.insert(0, str(PROJECT_ROOT / "notebooks"))

DATA_DIR = Path.home() / "trade_data" / "ETFTrader"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"
IB_CACHE_DIR = DATA_DIR / "ib_historical"
LIVE_DIR = Path.home() / "trading" / "live_portfolio"

for d in [PROCESSED_DIR, LIVE_DIR, IB_CACHE_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# ── Strategy parameters ──────────────────────────────────
# Factor weights (value auto-skipped if no expense data)
FACTOR_WEIGHTS = {
    "momentum": 0.35, "quality": 0.30,
    "value": 0.15, "volatility": 0.20,
}
NUM_POSITIONS = 30           # 30 positions per backtest recommendation
OPTIMIZER_TYPE = "mvo"       # "mvo" (recommended, robust) | "rankbased" | "minvar"
REBALANCE_FREQ = "quarterly" # "quarterly" | "bimonthly" | "monthly"

# ── Cash deployment ──────────────────────────────────────
# Set DEPLOY_CASH to a dollar amount to deploy that much additional cash.
# Set it to None to use the default cash_reserve mode ($70k held back).
# The outputs recorded in this notebook were produced with DEPLOY_CASH = 50_000.
DEPLOY_CASH = None  # e.g. 50_000 to deploy $50k more

# IB settings
IB_HOST = "127.0.0.1"
IB_PORT = 4001
IB_CLIENT_ID = 5

print(f"Data:       {DATA_DIR}")
print(f"Output:     {LIVE_DIR}")
print(f"Positions:  {NUM_POSITIONS}")
print(f"Optimizer:  {OPTIMIZER_TYPE}")
print(f"Rebalance:  {REBALANCE_FREQ}")
if DEPLOY_CASH is not None:
    print(f"Deploy:     ${DEPLOY_CASH:,.0f} additional cash")
else:
    print(f"Deploy:     auto (NLV-based, $70k reserve)")

Data:       /home/stuar/trade_data/ETFTrader
Output:     /home/stuar/trading/live_portfolio
Positions:  30
Optimizer:  mvo
Rebalance:  quarterly
Deploy:     $50,000 additional cash


---
## Section 1: Universe Discovery

In [16]:
from scripts.s1_universe import discover_universe

all_tickers, categories, universe_df = discover_universe(PROJECT_ROOT)
print(f"Universe: {len(all_tickers)} ETFs")

Full universe: 5042 ETFs
  Categorized (curated):  792
  Uncategorized (NASDAQ): 4250
  Leveraged/inverse:      75 (kept for data collection)
Universe: 5042 ETFs


---
## Section 2: Data Collection

Set `RUN_COLLECTION = True` to connect to IB Gateway.

In [None]:
RUN_COLLECTION = True  # Set True to connect to IB

from scripts.s2_collect import collect_data, apply_quality_filter

prices = collect_data(
    tickers=all_tickers,
    ib_cache_dir=IB_CACHE_DIR,
    processed_dir=PROCESSED_DIR,
    ib_host=IB_HOST, ib_port=IB_PORT, ib_client_id=IB_CLIENT_ID,
    run_collection=RUN_COLLECTION,
)

if prices is not None:
    prices = apply_quality_filter(prices)
    print(f"Prices: {prices.shape[1]} tickers x {prices.shape[0]} days")

Loaded IB prices: 3801 tickers x 1256 days
  Range: 2021-02-16 to 2026-02-13
Quality filter: 3801 -> 2080 tickers
Date range: 2021-02-16 to 2026-02-13
Trading days: 1256
Prices: 2080 tickers x 1256 days


---
## Section 3: Factor Scoring

In [18]:
from scripts.s3_factors import score_factors

combined_scores, prices_basic, factor_detail = score_factors(
    prices, factor_weights=FACTOR_WEIGHTS,
    categories=categories, raw_dir=RAW_DIR,
)
print(f"Scores: {len(combined_scores)} tickers")
combined_scores.nlargest(20)

Filtered out 69 leveraged ETFs: AGQ, BOIL, CURE, DDM, DRV, DUST, DXD, EDZ, ERX, ERY and 59 more
Basic model: 2011 tickers (69 leveraged/inverse excluded)
Calculating factors...
  Momentum:   2011 tickers
  Quality:    2010 tickers
  Value:      SKIPPED (no expense ratio data available)
              Redistributing weight to remaining factors
              Adjusted weights: momentum=41%, quality=35%, volatility=24%
  Volatility: 2011 tickers

Integrated scores: 2011 tickers
Active factors: momentum, quality, volatility (weights: 41%, 35%, 24%)
Scores: 2011 tickers


EFAS    0.892491
GLDI    0.882638
FID     0.881544
FGD     0.869354
EWK     0.868652
UIVM    0.868582
ROAM    0.865994
INEQ    0.864719
VIDI    0.864623
DTH     0.863094
WDIV    0.861654
ISVL    0.858801
VYMI    0.857895
RODM    0.857143
DWX     0.855511
FYLD    0.848928
IDV     0.848671
ECOW    0.848187
FNDC    0.848174
FIDI    0.846845
dtype: float64
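
The adjusted weights in the log above (41% / 35% / 24%) are consistent with dropping the skipped factor and rescaling the remaining weights proportionally so they sum to 1. The sketch below reproduces that arithmetic. `redistribute_weights` is a hypothetical helper written for illustration, not the function used inside `s3_factors`, whose implementation is not shown here.

```python
def redistribute_weights(weights, skipped):
    """Drop skipped factors and rescale the rest so they sum to 1.

    Hypothetical helper: assumes proportional rescaling, which matches
    the percentages printed in the factor-scoring log.
    """
    active = {name: w for name, w in weights.items() if name not in skipped}
    total = sum(active.values())
    if total <= 0:
        raise ValueError("no active factors left after skipping")
    return {name: w / total for name, w in active.items()}


FACTOR_WEIGHTS = {
    "momentum": 0.35, "quality": 0.30,
    "value": 0.15, "volatility": 0.20,
}

adjusted = redistribute_weights(FACTOR_WEIGHTS, skipped={"value"})
print({name: f"{w:.0%}" for name, w in adjusted.items()})
# {'momentum': '41%', 'quality': '35%', 'volatility': '24%'}
```

Each remaining weight is divided by 0.85 (the sum left after removing the 0.15 value weight), so the relative importance of momentum, quality, and volatility is unchanged.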

---
## Section 4: Portfolio Construction

In [19]:
from scripts.s4_optimize import build_portfolio

target_weights = build_portfolio(
    combined_scores, prices_basic,
    num_positions=NUM_POSITIONS,
    optimizer_type=OPTIMIZER_TYPE,
    factor_detail=factor_detail,
)

target_weights.to_csv(LIVE_DIR / "target_portfolio_latest.csv", header=True)
print(f"\nPortfolio: {len(target_weights)} positions")

Optimizer: MVO, Positions: 30

Portfolio: 30 positions
Max weight: 7.5%
Min weight: 3.0%
HHI: 0.0361
Expected vol: 13.1%

──────────────────────────────────────────────────────────────────────────────────────────
 #  Ticker  Weight  Name                                        Momen  Quali  Volat
──────────────────────────────────────────────────────────────────────────────────────────
 1  EFAS     7.5%  Global X MSCI SuperDividend EAFE ETF         93%   100%    70% 
 2  GLDI     5.1%  UBS ETRACS Gold Shares Covered Call ETN      88%    99%    74% 
 3  FID      5.4%  First Trust S&P Intl Dividend Aristocra...   88%   100%    74% 
 4  FGD      3.3%  First Trust Dow Jones Global Select Div...   93%   100%    63% 
 5  EWK      3.0%  iShares MSCI Belgium ETF                     90%    98%    68% 
 6  UIVM     3.3%  VictoryShares Intl Value Momentum ETF        93%   100%    63% 
 7  ROAM     3.0%  Hartford Multifactor Emerging Markets ETF    90%    99%    66% 
 8  INEQ     3.1%  Columbia Int
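
For reading the `HHI` figure in the summary above: the Herfindahl-Hirschman index is conventionally the sum of squared portfolio weights, and its reciprocal is often read as an "effective number" of positions. The sketch below assumes `s4_optimize` uses that standard definition (its code is not shown here); `concentration_stats` and the `T00`..`T29` tickers are placeholders for illustration.

```python
def concentration_stats(weights):
    """Return HHI and related concentration figures for a weight mapping.

    Assumes the standard definition HHI = sum(w_i ** 2) on weights
    normalized to sum to 1.
    """
    total = sum(weights.values())
    w = [v / total for v in weights.values()]
    hhi = sum(x * x for x in w)
    return {
        "hhi": hhi,
        "effective_n": 1.0 / hhi,
        "max_weight": max(w),
        "min_weight": min(w),
    }


# Equal-weight baseline with 30 placeholder tickers.
equal = {f"T{i:02d}": 1 / 30 for i in range(30)}
stats = concentration_stats(equal)
print(f"HHI: {stats['hhi']:.4f}  effective N: {stats['effective_n']:.1f}")
# HHI: 0.0333  effective N: 30.0
```

Under this definition, the reported HHI of 0.0361 corresponds to roughly 1 / 0.0361 ≈ 27.7 effective positions, only slightly more concentrated than the equal-weight baseline of 0.0333.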

---
## Section 5: Backtesting

In [20]:
from scripts.s5_backtest import run_backtest

results = run_backtest(
    prices_basic, combined_scores,
    rebalance_frequency=REBALANCE_FREQ,
    num_positions=NUM_POSITIONS,
)

Running backtest (HISTORICAL SIMULATION)...
  Period:     2021-02-16 to 2026-02-13
  Rebalance:  quarterly
  Stop-loss:  12%
  Drift:      5%
  Capital:    $1,000,000



Stop-loss triggered for SDIV on 2022-03-02: sold 1037.06 shares @ $21.11
Stop-loss triggered for EYLD on 2022-03-07: sold 937.89 shares @ $23.79
Stop-loss triggered for ECOW on 2022-04-26: sold 1535.91 shares @ $16.75
Stop-loss triggered for FNDC on 2022-05-09: sold 888.64 shares @ $28.12
Stop-loss triggered for IQDF on 2022-05-09: sold 1099.94 shares @ $17.34
Stop-loss triggered for ROAM on 2022-06-13: sold 2149.98 shares @ $17.14
Stop-loss triggered for DLS on 2022-06-13: sold 416.56 shares @ $50.25
Stop-loss triggered for DEEF on 2022-06-13: sold 900.45 shares @ $22.45
Stop-loss triggered for JPIN on 2022-06-13: sold 463.68 shares @ $41.86
Stop-loss triggered for DIM on 2022-06-14: sold 482.03 shares @ $48.61
Stop-loss triggered for ISVL on 2022-06-16: sold 1233.05 shares @ $25.54
Stop-loss triggered for UIVM on 2022-06-23: sold 1119.65 shares @ $34.46
Stop-loss triggered for VIDI on 2022-07-05: sold 1899.74 shares @ $18.35
Stop-loss triggered for EWK on 2022-07-12: sold 2580.41 sha


──────────────────────────────────────────────────
BACKTEST RESULTS (historical simulation)
──────────────────────────────────────────────────
  CAGR:              10.8%
  Sharpe:            0.59
  Sortino:           0.87
  Max Drawdown:      -21.7%
  Volatility:        11.5%
  Total Return:      66.4%
  Rebalances:        2 (0.4/year)
  Win Rate:          55%

  Stop-loss events:  27 (historical)
──────────────────────────────────────────────────

  NOTE: 27 stop-loss events occurred DURING
  the backtest (mostly the 2022 drawdown).
  They show how the strategy protects capital.
  They do NOT affect today's target portfolio.


---
## Section 6: Trade Recommendations

**Three-phase process:**
1. **Cleanup** — Cancel orphan orders, cover shorts, put account in clean state
2. **Generate** — Compare target vs live positions, produce trade plan
3. **Snapshot** — Write `portfolio_state.json` as the contract for `02_execute_trades.ipynb`

**Cash deployment modes:**
- `DEPLOY_CASH = None` — sizes positions to full NLV, caps buys by $70k cash reserve
- `DEPLOY_CASH = 50_000` — deploys exactly $50k more, sizes to (invested + $50k)
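
The two modes can be sketched as a small sizing function. This is an illustration under stated assumptions, not the logic inside `generate_trades`: `sizing_plan` and its `buy_budget` field are hypothetical, "caps buys by $70k cash reserve" is read as "buys are limited to cash above the reserve", and capping the deploy amount at available cash is an added assumption.

```python
CASH_RESERVE = 70_000  # reserve amount quoted in this notebook


def sizing_plan(nlv, cash, invested, deploy_cash=None, cash_reserve=CASH_RESERVE):
    """Return the sizing basis and buy budget for the chosen deployment mode.

    Hypothetical helper for illustration only.
    """
    if deploy_cash is None:
        # Reserve mode: size to full NLV, spend only cash above the reserve.
        return {"sizing_basis": nlv, "buy_budget": max(cash - cash_reserve, 0)}
    # Deploy mode: size to what is already invested plus the new cash.
    # Capping at available cash is an assumption, not confirmed by the source.
    return {
        "sizing_basis": invested + deploy_cash,
        "buy_budget": min(deploy_cash, cash),
    }


# Account figures taken from the Phase 2 output of this run.
deploy = sizing_plan(nlv=187_213, cash=98_572, invested=88_481, deploy_cash=50_000)
reserve = sizing_plan(nlv=187_213, cash=98_572, invested=88_481, deploy_cash=None)
print(deploy)   # {'sizing_basis': 138481, 'buy_budget': 50000}
print(reserve)  # {'sizing_basis': 187213, 'buy_budget': 28572}
```

The deploy-mode basis of $138,481 matches the `Sizing basis` line printed by Phase 2 for this run.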

In [21]:
from scripts.s6_trades import cleanup_account, generate_trades, write_portfolio_state

# Phase 1: Clean up the account (cancel orders, cover shorts)
print("=" * 60)
print("PHASE 1: Account Cleanup")
print("=" * 60)
cleanup = cleanup_account(
    ib_host=IB_HOST, ib_port=IB_PORT,
    ib_client_id=IB_CLIENT_ID + 1,  # client 6 for cleanup (read-write)
)

# Phase 2: Generate trade plan
print("\n" + "=" * 60)
print("PHASE 2: Generate Trade Plan")
print("=" * 60)
trades, context = generate_trades(
    target_weights, LIVE_DIR,
    combined_scores=combined_scores,
    ib_host=IB_HOST, ib_port=IB_PORT,
    ib_client_id=IB_CLIENT_ID,  # client 5 (read-only)
    deploy_cash=DEPLOY_CASH,
    return_context=True,
)

# Phase 3: Write state file (contract for notebook 2)
if trades:
    print("\n" + "=" * 60)
    print("PHASE 3: Write Portfolio State")
    print("=" * 60)
    state_file = write_portfolio_state(
        cleanup_result=cleanup,
        trades=trades,
        target_weights=target_weights,
        live_dir=LIVE_DIR,
        deploy_cash=DEPLOY_CASH,
        sizing_basis=context.get("sizing_basis"),
        ib_positions=context.get("ib_positions"),
        live_prices=context.get("live_prices"),
        cash=context.get("cash"),
        nlv=context.get("nlv"),
        account=context.get("account"),
    )
    print(f"\nNext step: Open 02_execute_trades.ipynb")

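    import pandas as pd
    from IPython.display import display

    # A bare expression inside an `if` block is not auto-rendered by Jupyter.
    display(pd.DataFrame(trades))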
else:
    print("\nNo trades generated. Nothing to execute.")

PHASE 1: Account Cleanup
Connected for cleanup: U9544585
  No open orders to cancel

──────────────────────────────────────────────────
CLEANUP COMPLETE
  Orders cancelled: 0
  Shorts covered:   0
  Remaining orders: 0
  Remaining shorts: 0
  Cash after:       $98,572
  NLV after:        $187,213
──────────────────────────────────────────────────

PHASE 2: Generate Trade Plan
Connected: U9544585

Account: NLV=$187,213  Cash=$98,572  Invested=$88,481
Deploy mode: deploying $50,000 additional cash
Sizing basis: $138,481 (invested + deploy_cash)
Positions: 31  |  Live prices: 31

Trade plan: 30 trades
  Buys: $49,984  |  Sells: $0
  Cash after: $48,588 (deployed: $50,000)
Saved: /home/stuar/trading/live_portfolio/trade_plan.csv

──────────────────────────────────────────────────────────────────────
PORTFOLIO TRANSITION SUMMARY
──────────────────────────────────────────────────────────────────────
  Retained:  30 positions (already held & still in target)
  Exiting:   0 positions (held but

---

**Next step:** Open `02_execute_trades.ipynb` to verify state, execute, and confirm fills.

The `portfolio_state.json` file ensures notebook 2 can detect if anything
changed between analysis and execution (positions, cash, or file edits).
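
One way such a check could work is to record a content hash of `trade_plan.csv` alongside the positions and cash seen at analysis time, then compare against a fresh snapshot before executing. The sketch below is a minimal illustration under that assumption. The actual schema of `portfolio_state.json` is not shown in this notebook, so the field names, `build_state`, `detect_changes`, and the sample CSV rows are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_state(trade_plan_path, positions, cash):
    """Snapshot the trade plan hash, positions, and cash (hypothetical schema)."""
    digest = hashlib.sha256(Path(trade_plan_path).read_bytes()).hexdigest()
    return {
        "trade_plan_sha256": digest,
        "positions": dict(sorted(positions.items())),
        "cash": round(cash, 2),
    }


def detect_changes(saved, current, cash_tolerance=1.0):
    """List what differs between the saved and the current snapshot."""
    changes = []
    if saved["trade_plan_sha256"] != current["trade_plan_sha256"]:
        changes.append("trade_plan.csv was edited")
    if saved["positions"] != current["positions"]:
        changes.append("positions changed")
    if abs(saved["cash"] - current["cash"]) > cash_tolerance:
        changes.append("cash changed")
    return changes


# Demo with made-up rows: edit the plan after the snapshot was taken.
with tempfile.TemporaryDirectory() as tmp:
    plan = Path(tmp) / "trade_plan.csv"
    plan.write_text("ticker,action,shares\nEFAS,BUY,10\n")
    saved = build_state(plan, {"EFAS": 100}, 98_572.0)
    plan.write_text("ticker,action,shares\nEFAS,BUY,999\n")  # manual edit
    current = build_state(plan, {"EFAS": 100}, 98_572.0)
    changes = detect_changes(saved, current)

print(changes)  # ['trade_plan.csv was edited']
```

Hashing the file bytes catches any edit to the plan, while the cash tolerance avoids false alarms from small balance movements such as accrued interest.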