CQF Final project

Before modelling, we need to test whether each time series is stationary. A series is weakly stationary, or integrated of order zero (I(0)), if its statistical properties (specifically its mean, variance, and autocovariance) are invariant with respect to time. Most financial price series do not have this property: they are typically non-stationary and contain a unit root, meaning they are integrated of order one (I(1)). A critical issue arises when standard regression techniques are applied to I(1) series. Regressing one I(1) series on another can produce a "spurious regression", in which high R-squared values and statistically significant coefficients appear even though no genuine economic relationship exists between the variables. This necessitates formal testing for stationarity.

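The spurious-regression problem can be illustrated with a small Monte Carlo experiment. The sketch below is not part of the project code: the function names, the sample size `T=250`, and the number of simulations are illustrative assumptions. It regresses one simulated series on another, independent one, and counts how often the slope looks "significant" at the nominal 5% level.

```python
import numpy as np

def slope_t_stat(y, x):
    """OLS of y on a constant and x; return (t-statistic of the slope, R-squared)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # classical OLS covariance
    t_slope = beta[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return t_slope, r2

def rejection_rate(integrated, n_sims=500, T=250, seed=0):
    """Share of simulations in which |t| > 1.96 for two INDEPENDENT series."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        e1 = rng.standard_normal(T)
        e2 = rng.standard_normal(T)
        # I(1): cumulate the shocks into random walks; I(0): use the shocks as-is.
        y, x = (np.cumsum(e1), np.cumsum(e2)) if integrated else (e1, e2)
        t_stat, _ = slope_t_stat(y, x)
        rejections += int(abs(t_stat) > 1.96)
    return rejections / n_sims

rate_i1 = rejection_rate(integrated=True)    # independent random walks
rate_i0 = rejection_rate(integrated=False)   # independent white noise
print(f"rejection rate, I(1) series: {rate_i1:.2f}")
print(f"rejection rate, I(0) series: {rate_i0:.2f}")
```

For the stationary pair the rejection rate should sit near the nominal 5%, while for the random walks it should be far higher, even though the two series are unrelated by construction.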
The augmented Dickey–Fuller specification is:

$$
\Delta y_t = \alpha + \beta\,t + \gamma\,y_{t-1}
        + \sum_{i=1}^{p} \delta_i\,\Delta y_{t-i} + \varepsilon_t
$$

The hypotheses are:

- Null hypothesis: $H_0: \gamma = 0$
  (implying a unit root; the series is non-stationary)

- Alternative hypothesis: $H_1: \gamma < 0$
  (implying stationarity, here around the deterministic trend $\alpha + \beta\,t$)
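To make the regression concrete, here is a minimal NumPy sketch that builds the ADF design matrix term by term and returns the t-statistic on $\gamma$. The function name `adf_t_stat` and the fixed lag order `p=1` are assumptions for illustration. In practice a library routine such as `statsmodels.tsa.stattools.adfuller` would also select the lag order and supply p-values.

```python
import numpy as np

def adf_t_stat(y, p=1):
    """t-statistic on gamma in the ADF regression with constant and trend."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                          # usable observations
    target = dy[p:]                          # Delta y_t
    cols = [
        np.ones(n),                          # alpha
        np.arange(1, n + 1, dtype=float),    # beta * t
        y[p:-1],                             # gamma * y_{t-1}
    ]
    for i in range(1, p + 1):
        cols.append(dy[p - i:len(dy) - i])   # delta_i * Delta y_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(42)
shocks = rng.standard_normal(500)

random_walk = np.cumsum(shocks)              # unit root: gamma = 0
ar1 = np.zeros(500)                          # stationary AR(1), phi = 0.5
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

t_rw = adf_t_stat(random_walk)
t_ar1 = adf_t_stat(ar1)
# The statistic is compared with Dickey-Fuller critical values, not Student-t ones;
# for the constant-plus-trend case the asymptotic 5% value is roughly -3.41.
print(f"random walk: {t_rw:.2f}, AR(1): {t_ar1:.2f}")
```

A value well below the critical value rejects $H_0$. The stationary AR(1) series should produce a strongly negative statistic, whereas the random walk typically does not.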


The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test statistic for trend‐stationarity is given by

$$
\mathrm{KPSS} \;=\;
\frac{1}{T^2} \sum_{t=1}^T S_t^2 \;\bigg/\; \widehat{\sigma}^2
$$

where

$S_t = \sum_{i=1}^t \widehat{u}_i$

$\widehat{u}_i = y_i - \widehat{\beta}_0 - \widehat{\beta}_1\,i$
are the residuals from the OLS regression of $y_t$ on an intercept and time trend.  
$\widehat{\sigma}^2$ is a consistent estimate of the long‐run variance of $\widehat{u}_t$, often computed via a Newey–West estimator:

$$
\widehat{\sigma}^2
= \frac{1}{T}\sum_{t=1}^T \widehat{u}_t^2
  \;+\; 2 \sum_{\ell=1}^L w\bigl(\ell,L\bigr)\,
  \frac{1}{T}\sum_{t=\ell+1}^T \widehat{u}_t\,\widehat{u}_{t-\ell},
$$

with Bartlett weights $w(\ell,L)=1-\ell/(L+1)$, where $L$ is the lag‐truncation parameter.
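The formulas above translate almost line by line into NumPy. The following is a sketch rather than the project's implementation: the name `kpss_trend_stat` and the default lag rule `int(4 * (T / 100) ** 0.25)` are assumptions chosen for illustration.

```python
import numpy as np

def kpss_trend_stat(y, lags=None):
    """KPSS statistic for trend-stationarity with a Bartlett-weighted long-run variance."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if lags is None:
        lags = int(4 * (T / 100.0) ** 0.25)      # assumed rule of thumb for L

    # Residuals u_t from OLS of y_t on an intercept and a linear time trend.
    trend = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), trend])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta

    S = np.cumsum(u)                             # partial sums S_t

    # Newey-West long-run variance with weights w(l, L) = 1 - l / (L + 1).
    lrv = u @ u / T
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)
        lrv += 2.0 * w * (u[lag:] @ u[:-lag]) / T

    return (S @ S) / T**2 / lrv

rng = np.random.default_rng(7)
noise = rng.standard_normal(2000)

stat_wn = kpss_trend_stat(noise)                 # stationary white noise
stat_rw = kpss_trend_stat(np.cumsum(noise))      # random walk built from the same shocks
# For the trend case the 5% critical value tabulated by KPSS is 0.146.
print(f"white noise: {stat_wn:.3f}, random walk: {stat_rw:.3f}")
```

The random walk should give a much larger statistic than the white-noise series, because its partial sums $S_t$ grow without being pulled back toward zero.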

The hypotheses reverse those of the ADF:

- **Null hypothesis** (stationarity around a deterministic trend):  
  $$H_0:\; \{y_t\}\text{ is trend‐stationary}$$

- **Alternative hypothesis** (presence of a unit root):  
  $$H_1:\; \{y_t\}\text{ has a unit root (non‐stationary)}$$

**Interpretation:**  
- A large KPSS statistic leads to rejection of $H_0$, suggesting non‐stationarity.  
- Used alongside the ADF:  
  - **Fail to reject ADF null** (evidence of unit root) **and** **reject KPSS null** (evidence against stationarity) ⇒ the two tests agree that $y_t$ has a unit root; if the first difference $\Delta y_t$ is in turn found to be stationary, $y_t$ is $I(1)$.
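The joint reading of the two tests can be written as a small decision function. This is an illustrative sketch: the function name, the default significance level of 5%, and the labels for the three cases not covered in the list above are assumptions, not part of the project code.

```python
def joint_verdict(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationarity) p-values."""
    adf_rejects = adf_pvalue < alpha      # evidence against a unit root
    kpss_rejects = kpss_pvalue < alpha    # evidence against stationarity

    if not adf_rejects and kpss_rejects:
        return "unit root"                # both tests point to non-stationarity
    if adf_rejects and not kpss_rejects:
        return "stationary"               # both tests point to stationarity
    if adf_rejects and kpss_rejects:
        return "conflicting"              # both nulls rejected
    return "inconclusive"                 # neither null rejected

# Hypothetical p-values, for illustration only.
print(joint_verdict(adf_pvalue=0.62, kpss_pvalue=0.01))   # unit root
print(joint_verdict(adf_pvalue=0.01, kpss_pvalue=0.10))   # stationary
```

Applying the same function to the first difference of a series flagged as "unit root" is one way to check that it is $I(1)$ rather than integrated of a higher order.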


## Statistical Analysis and Threshold Optimization

This section performs the statistical tests and threshold optimization for the downloaded asset groups.

In [None]:
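from datetime import datetime, timedelta

import pandas as pd            # used below to build summary_df
import statsmodels.api as sm   # used below for sm.add_constant

from data_download import fetch_asset_data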

In [20]:
end_date   = datetime.now()
start_date = end_date - timedelta(days=5*365)

asset_groups = {
    # Commodities
    "precious_metals_triple": ["GC=F", "SI=F", "PL=F"], # Gold, Silver, Platinum Futures
    "oil_pair": ["CL=F", "BZ=F"],                     # WTI, Brent Crude Futures
    "agri_pair": ["ZC=F", "ZS=F"],                    # Corn, Soybean Futures

    # Fixed Income & Currency
    "yield_pair": ["^TNX", "IGLT.L"],                 # US 10Y Yield, iShares UK Gilts ETF
    "currency_pair": ["AUDUSD=X", "CADUSD=X"],        # AUD/USD, CAD/USD

    # Volatility
    "volatility_pair": ["^VIX", "VIXY"],            # US VIX Index vs. Short-Term VIX Futures ETF*

    # Country Indices
    "eu_index_pair_1": ["^FCHI", "^GDAXI"],           # CAC 40, DAX
    "eu_index_pair_2": ["^IBEX", "FTSEMIB.MI"],      # IBEX 35, FTSE MIB

    # Equities
    "fr_banking_pair": ["BNP.PA", "GLE.PA"],          # BNP Paribas, Societe Generale
    "fast_fashion_pair": ["ITX.MC", "HM-B.ST"],       # Inditex, H&M
    "german_auto_triple": ["VOW3.DE", "MBG.DE", "BMW.DE"], # VW, Mercedes, BMW
    "investor_ab_pair": ["INVE-A.ST", "INVE-B.ST"],    # Investor A, Investor B
    "vw_porsche_pair": ["VOW3.DE", "P911.DE"],        # VW, Porsche AG
    "semiconductor_pair": ["ASML.AS", "IFX.DE"],      # ASML, Infineon

    # ETFs
    "sector_etf_pair": ["XLRE", "XLU"]                # Real Estate ETF, Utilities ETF
}

all_data = fetch_asset_data(asset_groups, start_date, end_date)


In [None]:
records = []
for group, df in all_data.items():
    # Univariate tests
    for col in df.columns:
        records.append({'group': group, 'asset': col, **adf_results(df[col]), **kpss_results(df[col])})

    # Pair vs. Triple logic
    n_assets = len(df.columns)
    if n_assets == 2:
        y, x = df.columns
        # Engle–Granger
        eg = engle_granger(df, y, x)
        records.append({'group': group, 'test': 'Engle-Granger', 'beta': eg['beta'], 'eg_pvalue': eg['eg_pvalue']})

        # Matrix OLS
        X0 = sm.add_constant(df[x])
        mbeta = matrix_ols_regression(df[y].values, X0.values)
        if mbeta is not None:
            records.append({
                'group': group,
                'test': 'Matrix-OLS',
                'const': mbeta[0],
                'slope': mbeta[1]
            })

        # If cointegrated, OU & ECM
        if eg['spread'] is not None:
            ou = ou_params(eg['spread'])
            records.append({'group': group, 'test': 'OU', **ou})

            ecm = analyze_error_correction_model(df[y], df[x], eg['spread'])
            records.append({'group': group, 'test': 'ECM', **ecm})

        # Kalman summary
        kf = kalman_hedge(df, y, x)
        records.append({
            'group': group,
            'test': 'Kalman',
            'kf_beta_mean': kf['kf_beta'].mean()
        })

    elif n_assets == 3:
        # Johansen for triples
        jres = johansen(df)
        records.append({'group': group, 'test': 'Johansen', **jres})

summary_df = pd.DataFrame(records)
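The loop above calls `ou_params(eg['spread'])`, whose implementation is not shown in this section. One common way to estimate Ornstein–Uhlenbeck parameters from a spread is to fit an AR(1) regression $s_{t+1} = a + b\,s_t + \varepsilon_{t+1}$ and map the coefficients to a long-run mean and a half-life. The sketch below takes that route. The function name `ou_params_sketch`, the returned keys, and the choice to measure time in observation steps are all assumptions, and the project's own helper may differ.

```python
import numpy as np

def ou_params_sketch(spread):
    """Estimate mean-reversion parameters from an AR(1) fit to the spread."""
    s = np.asarray(spread, dtype=float)
    x, y = s[:-1], s[1:]                          # s_t and s_{t+1}
    X = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    if not 0.0 < b < 1.0:
        raise ValueError("AR(1) slope outside (0, 1): no mean reversion to fit")
    resid = y - (a + b * x)
    return {
        "mu": a / (1.0 - b),                          # long-run mean of the spread
        "speed_per_step": -np.log(b),                 # mean-reversion speed per observation
        "half_life_steps": np.log(2.0) / -np.log(b),  # steps to close half of a deviation
        "sigma_eq": resid.std(ddof=2) / np.sqrt(1.0 - b**2),  # stationary std. dev.
    }

# Simulated spread with known parameters (mean 1.0, AR coefficient 0.95).
rng = np.random.default_rng(1)
spread_sim = np.empty(5000)
spread_sim[0] = 1.0
for t in range(1, 5000):
    spread_sim[t] = 1.0 + 0.95 * (spread_sim[t - 1] - 1.0) + 0.1 * rng.standard_normal()

est = ou_params_sketch(spread_sim)
print({k: round(float(v), 3) for k, v in est.items()})
```

With an AR coefficient of 0.95 the theoretical half-life is $\ln 2 / (-\ln 0.95) \approx 13.5$ steps, so the estimate should land in that neighbourhood. Entry and exit thresholds are often expressed as multiples of `sigma_eq` around `mu`.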