<br><br><br><br>
<h1>02. Evaluating the Buy-the-dip trading strategy</h1>

<p>Modeling "dips" with a Poisson process, and their dependence with copulas, to evaluate the optimal time to free up capital to "buy the dip".</p>

<br><br><br><br>
<h2>0. Document setup</h2>

In [3]:
# Standard library
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Sequence, Tuple, Union

# Numerics and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

# Statistics
import scipy.stats as stats
from scipy.optimize import minimize, minimize_scalar
from scipy.special import gammaln
from scipy.stats import kendalltau, kstest, norm, spearmanr, uniform
from scipy.stats import multivariate_normal, multivariate_t
from scipy.stats import t as tdist
from statsmodels.distributions.empirical_distribution import ECDF
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from arch import arch_model


# Add project root so we can import from src/
project_root = Path("..").resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))



from src.data_handlers.yfinance_data_loader import load_prices, load_prices_many






<br><br><br><br>
<h2>1. Data preparation</h2>

In [4]:
tickers = ["AAPL", "MSFT", "GE", "AMZN", "TSLA"]
raw_prices = load_prices_many(tickers, "2015-01-01", "2024-01-01")
list(raw_prices.keys()), {k: v.shape for k, v in raw_prices.items()}

[1/5] Loading AAPL...
  -> ok: 2264 rows, 5 cols
[2/5] Loading MSFT...
  -> ok: 2264 rows, 5 cols
[3/5] Loading GE...
  -> ok: 2264 rows, 5 cols
[4/5] Loading AMZN...
  -> ok: 2264 rows, 5 cols
[5/5] Loading TSLA...
  -> ok: 2264 rows, 5 cols


(['AAPL', 'MSFT', 'GE', 'AMZN', 'TSLA'],
 {'AAPL': (2264, 5),
  'MSFT': (2264, 5),
  'GE': (2264, 5),
  'AMZN': (2264, 5),
  'TSLA': (2264, 5)})

<br><br><br><br>
## Buy-the-Dip (Cox / EVT / Multivariate) - Project To-Do

### 0) Problem framing (decision rule)
- Define action: “prepare capital” vs “deploy”
- Fix horizon `Δ` (e.g. 5/10/20 trading days) and threshold `θ`
- Target signal:  \( P(\ge 1\ \text{dip in } [t,t+\Delta] \mid \mathcal{F}_t) > \theta \)

### 1) Data + preprocessing
- Load adjusted prices for all assets (same calendar, handle missing data)
- Compute returns (log or simple) + any needed features (vol, momentum, drawdown)
- Choose evaluation split scheme (walk-forward / expanding window)
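
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: `log_returns` and `expanding_splits` are hypothetical helper names, the input is assumed to be a wide DataFrame of adjusted prices (one column per ticker), and dropping every date with a missing price is only one of several possible ways to handle gaps.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Log returns on a shared calendar.

    Assumption: dates where any asset has a missing price are dropped,
    so a return spanning a gap covers more than one trading day.
    """
    aligned = prices.sort_index().dropna(how="any")
    return np.log(aligned).diff().dropna()


def expanding_splits(n_obs: int, min_train: int, test_size: int):
    """Yield (train_idx, test_idx) pairs for an expanding-window walk-forward scheme.

    The training window always starts at 0 and grows by `test_size` each step.
    """
    start = min_train
    while start + test_size <= n_obs:
        yield np.arange(0, start), np.arange(start, start + test_size)
        start += test_size
```

With `n_obs=10`, `min_train=4`, and `test_size=3`, the generator yields two folds: train on observations 0-3 and test on 4-6, then train on 0-6 and test on 7-9.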

### 2) Define “dip” events (label construction)
- Choose `dip_length = L`
- Define loss window:  \( Y_t = -\sum_{k=1}^{L} r_{t+k} \)
- Optional rebound filter (avoid counting crash continuation as “dip opportunity”)
- Decluster event times to avoid overlapping-window double counts
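
A possible sketch of the label construction above, assuming daily returns in a pandas Series. `forward_loss` and `decluster` are hypothetical names, and the minimum-gap rule is just one simple declustering choice. Because \( Y_t \) uses returns after time \( t \), it can serve only as a label, never as a model input.

```python
import numpy as np
import pandas as pd


def forward_loss(returns: pd.Series, L: int) -> pd.Series:
    """Y_t = -(r_{t+1} + ... + r_{t+L}); the last L entries are NaN."""
    # rolling(L).sum() at position t covers r_{t-L+1..t}; shifting by -L
    # moves the window so that position t covers r_{t+1..t+L}.
    return -returns.rolling(L).sum().shift(-L)


def decluster(flags, min_gap: int) -> np.ndarray:
    """Keep an event only if at least `min_gap` steps have passed since the last kept event."""
    flags = np.asarray(flags, dtype=bool)
    kept = np.zeros(flags.size, dtype=bool)
    last = -min_gap  # lets the very first event through
    for i in np.flatnonzero(flags):
        if i - last >= min_gap:
            kept[i] = True
            last = i
    return kept
```

Setting `min_gap = L` guarantees that no two kept events have overlapping loss windows.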

### 3) EVT severity layer (define extreme threshold)
- Standardize losses by regime/vol:  \( \tilde{Y}_t = Y_t / \hat{\sigma}_{t,L} \)
- Fit POT/GPD to exceedances above a high threshold `u`
- Define the dip threshold as the tail quantile `u_q` (e.g. `q` = 0.95 or 0.99) of \( \tilde{Y}_t \)
- Output: binary dip events + severity distribution (tail risk)
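
The peaks-over-threshold step could look roughly like this, using `scipy.stats.genpareto`. The helper names `fit_pot` and `tail_quantile` are hypothetical, the threshold is taken as an empirical quantile for simplicity, and the losses are assumed to be already standardized and roughly independent (e.g. after declustering).

```python
import numpy as np
from scipy.stats import genpareto


def fit_pot(losses, q: float = 0.95) -> dict:
    """Fit a GPD to the excesses over the empirical q-quantile of the losses."""
    losses = np.asarray(losses, dtype=float)
    losses = losses[np.isfinite(losses)]
    u = np.quantile(losses, q)
    excess = losses[losses > u] - u
    # Location is fixed at 0 because the excesses are measured from u.
    xi, _, beta = genpareto.fit(excess, floc=0.0)
    return {"u": u, "xi": xi, "beta": beta, "n_exceed": excess.size, "n": losses.size}


def tail_quantile(fit: dict, p: float) -> float:
    """Loss level exceeded with probability 1 - p (valid for p above the threshold quantile).

    Uses P(Y > y) = (n_exceed / n) * (1 - G(y - u)), with G the fitted GPD cdf.
    """
    rate = fit["n_exceed"] / fit["n"]
    g_level = 1.0 - (1.0 - p) / rate
    return fit["u"] + genpareto.ppf(g_level, fit["xi"], loc=0.0, scale=fit["beta"])
```

Calling `tail_quantile(fit, 0.99)` then gives a candidate `u_q` for `q` = 0.99 that extrapolates from the fitted tail rather than relying on the few largest observations.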

### 4) Cox (doubly-stochastic Poisson) timing layer
- Build state vector `X_t` (realized vol, drawdown, trend, cross-asset stress, etc.)
- Fit intensity per asset:  \( \lambda_i(t) = \exp(\beta_i^\top X_t) \)
  - (optional) include latent shared factor \( Z_t \): \( \lambda_i(t) = \exp(\beta_i^\top X_t + \gamma_i Z_t) \)
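- Compute forward probability (conditional on the intensity path; given only \( \mathcal{F}_t \), take the expectation over future \( X_s \)):
  \( P_i(\ge 1\ \text{dip in } [t,t+\Delta]) = 1-\exp\left(-\int_t^{t+\Delta}\lambda_i(s)\,ds\right) \)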
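
As a rough sketch of the timing layer, the functions below evaluate the log-linear intensity and the forward probability for a given intensity path. The names `intensity` and `forward_dip_probability` are hypothetical, the intensity is assumed piecewise constant per trading day, and the coefficients are taken as given rather than estimated here.

```python
import numpy as np


def intensity(X, beta) -> np.ndarray:
    """Log-linear intensity lambda(t) = exp(beta' X_t), one value per row of X."""
    return np.exp(np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float))


def forward_dip_probability(lam_path, dt: float = 1.0) -> float:
    """P(at least one event) over the path, treating lambda as constant within each step.

    The integral of lambda is approximated by sum(lambda_k * dt).
    """
    integrated = float(np.sum(np.asarray(lam_path, dtype=float) * dt))
    return 1.0 - np.exp(-integrated)


# Simplest forecast (an assumption): hold today's intensity fixed over the horizon.
lam_today = 0.02   # expected dips per trading day
horizon = 10       # trading days
p_dip = forward_dip_probability(np.full(horizon, lam_today))
```

Holding the intensity fixed ignores how the state vector may evolve over the horizon; simulating future paths of `X_t` and averaging the resulting probabilities would be the fuller treatment.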

### 5) Strategy logic (capital readiness → trades)
- “Prepare” when \( P_i > \theta_1 \); define “deploy” rules (size, laddering, risk limits)
- Add cost model (carry/opportunity cost) + constraints (max cash, turnover)

### 6) Multivariate layer (joint stress / dependence)
- Build multivariate “stress” measures (market factor, vol factor, correlation spike)
- Choose a dependence model for extremes / co-dips (copula or shared-intensity factor)
- Produce portfolio-level probability of “≥1 dip somewhere” and/or “co-dip” risk
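
One way to turn per-asset dip probabilities into portfolio-level numbers is to couple the dip indicators through a Gaussian copula, sketched below. The function names are hypothetical and the correlation matrix is assumed to be given. A Gaussian copula has no tail dependence, so it may understate co-dip risk; a t copula or a shared-intensity factor would be alternatives.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def prob_any_dip(p, corr) -> float:
    """P(at least one asset dips), with marginals p_i and a Gaussian copula."""
    p = np.asarray(p, dtype=float)
    # Asset i does not dip when its latent normal falls below Phi^{-1}(1 - p_i).
    z = norm.ppf(1.0 - p)
    p_none = multivariate_normal(mean=np.zeros(p.size), cov=corr).cdf(z)
    return float(1.0 - p_none)


def prob_all_dip(p, corr) -> float:
    """P(every asset dips); by symmetry of the normal this is Phi_R(Phi^{-1}(p))."""
    p = np.asarray(p, dtype=float)
    z = norm.ppf(p)
    return float(multivariate_normal(mean=np.zeros(p.size), cov=corr).cdf(z))
```

With an identity correlation matrix, `prob_any_dip` reduces to the independence formula \( 1 - \prod_i (1 - p_i) \), which makes a convenient sanity check.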

### 7) Backtest + diagnostics
- Compare against baselines (homogeneous Poisson, vol trigger, simple drawdown rule)
- Metrics: hit rate, lead time, drawdown, turnover, opportunity cost, PnL
- Calibration: do predicted dip probabilities match realized frequencies?
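
The calibration question can be checked with a reliability table and a Brier score, as in this minimal sketch. `reliability_table` and `brier_score` are hypothetical helper names, and equal-width probability bins are an arbitrary choice.

```python
import numpy as np
import pandas as pd


def reliability_table(p_pred, y_true, n_bins: int = 10) -> pd.DataFrame:
    """Mean predicted probability vs realized dip frequency per probability bin."""
    df = pd.DataFrame({
        "p": np.asarray(p_pred, dtype=float),
        "y": np.asarray(y_true, dtype=float),
    })
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    df["bin"] = pd.cut(df["p"], bins=edges, include_lowest=True)
    # observed=True drops bins that contain no predictions.
    return df.groupby("bin", observed=True).agg(
        mean_pred=("p", "mean"),
        realized=("y", "mean"),
        count=("y", "size"),
    )


def brier_score(p_pred, y_true) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    p = np.asarray(p_pred, dtype=float)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean((p - y) ** 2))
```

A well-calibrated model has `mean_pred` close to `realized` in every bin with a reasonable `count`; the same table computed for a baseline gives a direct comparison.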
