 # **Semivariances → MHAR‑ReCov Spillovers (STEP‑BY‑STEP PIPELINE)**



 End‑to‑end, reproducible code that:



 1. Loads intraday day‑ahead electricity prices *(Europe / Australia)*

 2. Computes hourly **simple returns**

 3. Aggregates to daily **positive** and **negative** realised semicovariance matrices (ReCov⁺, ReCov⁻)

 4. Converts each matrix to its *vech* vector

 5. Applies a **Probability‑Integral Transform (PIT)** element‑wise to stabilise variances

 6. Saves intermediary Parquet artefacts

 7. Runs two MHAR‑ReCov LASSO spillover analyses – one for ReCov⁺, one for ReCov⁻ –

    and prints previews (`head()`) plus Total Spillover Indices.



 Run each cell sequentially; every major step prints a preview so you can inspect

 the data as it flows through the pipeline. No single “driver” function is used –

 execution is fully transparent and incremental.



 ---

 **Usage**



 * Open this file in Jupyter / VS Code. Each `# %%` marker denotes a cell.

 * Execute cells from top to bottom. When prompted, pick *europe* or *australia*.

 * Scroll to follow the transformation – outputs appear inline after each cell.

 ------------------------------------------------------------------------------

In [70]:
# Imports & global config
import os, json, random, warnings
from scipy.stats import norm, rankdata
import numpy as np, pandas as pd
from tqdm.notebook import tqdm

# ML / stats
from sklearn.model_selection import KFold
from sklearn.linear_model import MultiTaskLassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from utils.mhar_utils import FAST_LASSO_OPTS, create_mhar_lags, gvd, spillover_metrics

# Display helpers (Jupyter)
import ipywidgets as widgets
from IPython.display import display, Markdown

warnings.filterwarnings('ignore')
SEED = 12345
np.random.seed(SEED); random.seed(SEED)


 ## 0 · Choose study area

 Pick between the European or Australian dataset. Subsequent cells read

 `area_dd.value`, so re‑running a cell after changing the dropdown automatically

 recomputes that step with the new selection.

In [71]:
area_dd = widgets.Dropdown(options=["europe", "australia"], value="europe", description="Dataset:")
display(area_dd)

# Convenience alias used by subsequent cells -----------------------------------
get_area = lambda: area_dd.value


Dropdown(description='Dataset:', options=('europe', 'australia'), value='europe')

 ## 1 · Helpers: data loading, returns, display utilities

In [54]:
PRICES_PATHS = {
    "europe":    "parquet_files/filtered_data.parquet",
    "australia": "parquet_files/filtered_data_australia.parquet",
}

# ---------------------------------------------------------------------------
# Generic preview printer
# ---------------------------------------------------------------------------

def show(title, obj, n=5):
    """Pretty‑print a markdown title + DataFrame/Series preview."""
    display(Markdown(f"### {title}"))
    if isinstance(obj, dict):      # show first item when dict of DFs
        k, v = next(iter(obj.items()))
        display(Markdown(f"*First key:* **{k}**"))
        display(v.head(n))
    else:
        display(obj.head(n))

# ---------------------------------------------------------------------------
# ETL helpers
# ---------------------------------------------------------------------------

def load_prices(area: str) -> pd.DataFrame:
    """Return wide price DataFrame (index = timestamp, columns = areas)."""
    df = (
        pd.read_parquet(PRICES_PATHS[area])
          .sort_values(["Area", "Start DateTime"])
    )
    wide = (
        df.pivot(index="Start DateTime", columns="Area", values="Day-ahead Price (EUR/MWh)")
          .sort_index()
    )
    return wide


def simple_returns(prices):
    ret = prices.diff().dropna()
    ret = ret.replace([np.inf, -np.inf], np.nan).dropna(how="any")
    ret["Date"] = ret.index.date
    return ret




 ## 2 · Helpers: semicovariances, vectorisation, transforms

In [91]:
# ---------------------------------------------------------------------------
# Daily realised semicovariances (ReCov⁺ / ReCov⁻)
# ---------------------------------------------------------------------------

def daily_semicov(ret: pd.DataFrame):
    pos, neg = {}, {}
    for day, grp in tqdm(ret.groupby("Date"), desc="daily semicov"):
        r = grp.drop(columns="Date")
        T, N = r.shape
        cov_p = np.zeros((N, N))
        cov_n = np.zeros((N, N))
        m_plus = np.zeros((N, N))
        m_minus = np.zeros((N, N))

        for row in r.values:
            rp = np.clip(row, 0, None)
            rn = np.clip(row, None, 0)
            cov_p += np.outer(rp, rp)
            cov_n += np.outer(rn, rn)
            m_plus += np.outer(rp, rn)
            m_minus += np.outer(rn, rp)

        # Optionally, divide by T or just leave as sums (PIT will normalize scale)
        # cov_p /= T; cov_n /= T; m_plus /= T; m_minus /= T

        # For ReCov+ (positive semi), take cov_p + m_plus
        # For ReCov- (negative semi), take cov_n + m_minus

        cols = r.columns
        pos[day] = pd.DataFrame(cov_p + m_plus, index=cols, columns=cols)
        neg[day] = pd.DataFrame(cov_n + m_minus, index=cols, columns=cols)

    return pos, neg


# ---------------------------------------------------------------------------
# vech vectorisation 
# ---------------------------------------------------------------------------

def build_vech_dataframe(cov_dict):
    example = next(iter(cov_dict.values()))
    areas   = example.columns.tolist()
    labels  = [
        f"{areas[i]}" if i == j else f"{areas[i]}-{areas[j]}"
        for i in range(len(areas))
        for j in range(i, len(areas))
    ]

    records, dates = [], []
    for day in sorted(cov_dict.keys()):
        mat = cov_dict[day].values
        records.append(mat[np.tril_indices(len(mat))])
        dates.append(day)

    df = pd.DataFrame(records, index=pd.to_datetime(dates), columns=labels)
    return df, labels


def pit_transform(df):
    ranks = df.rank(axis=0, method="average")
    U     = ranks.div(len(df) + 1)
    return pd.DataFrame(norm.ppf(U), index=df.index, columns=df.columns)


# ---------------------------------------------------------------------------
# MHAR‑ReCov spillover estimation (PIT + estandarización)  — H = 1
# ---------------------------------------------------------------------------
def mhar_spillover(pit_path: str, area: str):
    """
    Ajusta un MHAR‑ReCov con LASSO multirrespuesta sobre datos PIT ya
    centrados/estandarizados (igual que glmnet en R) y devuelve la tabla
    GFEVD en % y el TSI.
    """
    pit_df = pd.read_parquet(pit_path)
    with open(f"parquet_files/vech_labels_{area}.json") as f:
        pit_df.columns = json.load(f)

    # 1 · Regresores HAR centrados
    X_full = create_mhar_lags(pit_df)
    X      = X_full.drop(columns=pit_df.columns)
    Y = pit_df.loc[X.index]

    # 2 · Pipeline: StandardScaler → MultiTaskLassoCV
    pipe = make_pipeline(
        StandardScaler(with_mean=True, with_std=True),
        MultiTaskLassoCV(**FAST_LASSO_OPTS)
    )
    pipe.fit(X, Y)
    mtl = pipe.named_steps['multitasklassocv']

    # 3 · Betas y Φ₁
    B, ints = mtl.coef_, mtl.intercept_          # (K, 3K)  y (K,)
    Bd, Bw, Bm = np.split(B, 3, axis=1)
    Phi1 = Bd + Bw/7 + Bm/30

    # 4 · Residuales y Σ
    Y_hat = pipe.predict(X)
    resid = Y.values - Y_hat
    Sigma = resid.T @ resid / resid.shape[0]

    # 5 · FEVD (Pesaran–Shin) via util gvd
    K        = Y.shape[1]
    theta, θ = gvd([np.eye(K), Phi1], Sigma)      # θ = row-normalised

    # 6 · TSI
    tsi, *_  = spillover_metrics(θ)

    fevd_pct = pd.DataFrame(θ*100, index=Y.columns, columns=Y.columns)
    return fevd_pct, tsi



# ---------------------------------------------------------------------------
# Optional: save daily semicovariances in long format ------------------------
# ---------------------------------------------------------------------------

def flat_save(cov_dict, tag, area):
    rows = []
    for d, mat in cov_dict.items():
        for i, a in enumerate(mat.index):
            for j, b in enumerate(mat.columns):
                rows.append({"Date": d, "Market1": a, "Market2": b, "Value": mat.iloc[i, j]})
    pd.DataFrame(rows).to_parquet(f"parquet_files/daily_semicov_{tag}_{area}.parquet")

# Ensure output directory exists -------------------------------------------
os.makedirs("parquet_files", exist_ok=True)


 ## 3 · Load intraday prices → hourly simple returns

 Run this cell to ingest the raw day‑ahead price data and compute hourly

 log‑returns.

In [92]:
# 1 · Load prices ------------------------------------------------------------
prices = load_prices(get_area())
show("Intraday prices", prices)

# 2 · Hourly log‑returns -----------------------------------------------------
rets = simple_returns(prices)
show("Hourly simple returns", rets)


### Intraday prices

Area,nsw,qld,sa,tas,vic
Start DateTime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2009-07-01 00:00:00,16.941263,17.65,16.73028,15.67154,15.5
2009-07-01 00:05:00,17.709524,18.810089,17.82049,16.057039,15.5
2009-07-01 00:10:00,17.678644,18.617599,18.123159,15.90246,15.39
2009-07-01 00:15:00,16.736212,18.6113,17.623659,14.27313,12.81297
2009-07-01 00:20:00,15.63884,17.65,16.334089,13.24149,11.8


### Hourly simple returns

Area,nsw,qld,sa,tas,vic,Date
Start DateTime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2009-07-01 00:05:00,0.768261,1.160089,1.09021,0.385499,0.0,2009-07-01
2009-07-01 00:10:00,-0.03088,-0.19249,0.302669,-0.154579,-0.11,2009-07-01
2009-07-01 00:15:00,-0.942432,-0.006299,-0.4995,-1.62933,-2.57703,2009-07-01
2009-07-01 00:20:00,-1.097372,-0.9613,-1.28957,-1.03164,-1.01297,2009-07-01
2009-07-01 00:25:00,-1.910073,-1.92397,-4.372559,-2.11276,1.06755,2009-07-01


 ## 4 · Intraday → daily ReCov⁺ / ReCov⁻ semicovariances

In [93]:
pos, neg = daily_semicov(rets)
show("ReCov⁺ (first day)", pos)
show("ReCov⁻ (first day)", neg)

# Optional long‑format save (uncomment if needed) ---------------------------
# flat_save(pos, "pos", get_area())
# flat_save(neg, "neg", get_area())


daily semicov:   0%|          | 0/3530 [00:00<?, ?it/s]

### ReCov⁺ (first day)

*First key:* **2009-07-01**

Area,nsw,qld,sa,tas,vic
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
nsw,610.588825,243.949303,896.602764,1313.744451,321.916446
qld,231.466893,239.13008,89.983677,357.120262,213.808685
sa,903.21001,186.179659,3236.679679,543.797443,325.664518
tas,1320.259527,375.179927,530.008942,4717.445965,549.077156
vic,312.496366,217.7929,268.960987,517.108733,312.548439


### ReCov⁻ (first day)

*First key:* **2009-07-01**

Area,nsw,qld,sa,tas,vic
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
nsw,625.114683,231.666146,816.005548,1513.349879,302.360545
qld,244.148556,236.393949,195.057745,294.674944,234.477808
sa,809.398303,98.861764,3124.609033,490.975,220.295498
tas,1506.834803,276.615279,504.763501,5978.828806,449.009071
vic,311.780626,230.493593,276.99903,480.977493,307.29257


 ## 5 · vech vectorisation + PIT transform

In [94]:
# vech vectorisation ---------------------------------------------------------
v_pos, labels = build_vech_dataframe(pos)
v_neg, _      = build_vech_dataframe(neg)
show("vech ReCov⁺", v_pos)
show("vech ReCov⁻", v_neg)

# PIT transform -------------------------------------------------------------
pit_pos = pit_transform(v_pos)
pit_neg = pit_transform(v_neg)
show("PIT ReCov⁺", pit_pos)
show("PIT ReCov⁻", pit_neg)

# Save Parquet artefacts -----------------------------------------------------
path_pos = f"parquet_files/pit_vech_pos_{get_area()}.parquet"
path_neg = f"parquet_files/pit_vech_neg_{get_area()}.parquet"

pit_pos.to_parquet(path_pos)
pit_neg.to_parquet(path_neg)
with open(f"parquet_files/vech_labels_{get_area()}.json", "w") as f:
    json.dump(labels, f)


### vech ReCov⁺

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
2009-07-01,610.588825,231.466893,239.13008,903.21001,186.179659,3236.679679,1320.259527,375.179927,530.008942,4717.445965,312.496366,217.7929,268.960987,517.108733,312.548439
2009-07-02,23178.296516,44783.58836,165524.863562,4367.980238,321.865695,6257.88067,3573.303725,732.828114,3090.275304,5279.371343,5882.775382,802.140581,6244.893133,4448.650614,8985.550303
2009-07-03,824.975591,807.558367,882.421731,845.337319,830.517949,945.697272,610.502208,414.865828,497.57405,1130.34511,892.964526,923.596354,948.325121,467.459047,1037.074145
2009-07-04,697.208237,476.450985,453.017406,574.106135,471.88754,712.193199,1291.15805,465.309879,471.08519,4450.235225,567.721131,516.881465,574.154649,527.81016,622.702592
2009-07-05,2748.9895,2634.620823,2569.863281,2691.936643,2591.34577,2667.054198,2748.063568,2523.036917,2604.73528,3160.411234,2714.500781,2617.063303,2672.181195,2625.622226,2725.962627


### vech ReCov⁻

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
2009-07-01,625.114683,244.148556,236.393949,809.398303,98.861764,3124.609033,1506.834803,276.615279,504.763501,5978.828806,311.780626,230.493593,276.99903,480.977493,307.29257
2009-07-02,13330.223976,34116.756993,148754.6135,4316.262084,-276.121781,5192.20025,2731.666351,1141.703403,1283.404625,5123.733804,6358.957223,45.825421,5815.927519,2954.557308,8673.800045
2009-07-03,521.716314,488.252898,628.928107,480.989498,380.381578,530.292419,540.108369,383.701082,460.525712,900.063547,530.308122,476.246658,508.829479,476.496332,585.900004
2009-07-04,486.142863,283.287976,290.400972,347.329106,285.959202,452.610544,1135.679674,149.39281,344.790233,4728.454849,341.15041,325.541167,369.151925,231.153361,402.414935
2009-07-05,2463.089792,2381.9762,2368.154441,2443.413317,2388.332758,2502.35776,2447.162618,2198.93736,2280.993486,3039.265622,2425.992895,2367.90281,2429.071409,2292.633978,2423.627253


### PIT ReCov⁺

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
2009-07-01,-0.715846,-0.929028,-1.288196,-0.344233,-0.62949,-0.092061,-0.157176,-0.008164,-0.074252,-0.05149,-0.644271,-0.592696,-0.705792,-0.079236,-0.905245
2009-07-02,0.34348,0.806923,1.056874,0.112042,-0.377571,0.067135,0.209158,0.21424,0.458942,-0.015973,0.382149,-0.031595,0.386735,0.583414,0.444007
2009-07-03,-0.579211,-0.304573,-0.486728,-0.37224,-0.018103,-0.481938,-0.488326,0.025204,-0.101333,-0.630355,-0.145687,0.021653,-0.124195,-0.111328,-0.21424
2009-07-04,-0.652154,-0.514894,-0.839802,-0.518139,-0.217874,-0.622581,-0.162928,0.067135,-0.121334,-0.083509,-0.329956,-0.187438,-0.310528,-0.072116,-0.451855
2009-07-05,-0.206982,0.020943,-0.104901,-0.016683,0.318733,-0.139949,0.119903,0.59016,0.417523,-0.208433,0.164367,0.331456,0.17517,0.438529,0.087784


### PIT ReCov⁻

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
2009-07-01,-0.685897,-0.876738,-1.239294,-0.369199,-0.926844,-0.093487,-0.089922,-0.066424,-0.04012,0.038699,-0.610562,-0.558347,-0.632955,-0.091348,-0.890374
2009-07-02,0.207707,0.70944,1.029957,0.101333,-1.46776,0.030885,0.115614,0.409018,0.249244,-0.025914,0.401312,-1.307978,0.376809,0.481938,0.436186
2009-07-03,-0.77011,-0.471599,-0.641652,-0.585098,-0.283078,-0.752091,-0.530356,0.060733,-0.076388,-0.728746,-0.338965,-0.218601,-0.313509,-0.0942,-0.462889
2009-07-04,-0.809875,-0.779695,-1.097452,-0.766296,-0.408247,-0.833757,-0.206982,-0.294179,-0.212062,-0.055755,-0.557518,-0.37148,-0.47398,-0.415201,-0.677836
2009-07-05,-0.220782,0.005324,-0.12491,-0.055755,0.290475,-0.15502,0.087784,0.613987,0.406704,-0.217147,0.142817,0.300858,0.170126,0.409018,0.07354


 ## 6 · MHAR‑ReCov spillover analyses

 Run the next cell to estimate the MHAR‑ReCov models for the positive and

 negative semicovariance series, display the spillover tables, and report the

 Total Spillover Indices (TSI).

In [95]:
# Spillovers: ReCov⁺ ---------------------------------------------------------
spill_p, tsi_p = mhar_spillover(path_pos, get_area())
show("Spillover table – ReCov⁺", spill_p.round(2))
show("Total Spillover Index (TSI⁺)", pd.Series({"TSI+": tsi_p}))

# Spillovers: ReCov⁻ ---------------------------------------------------------
spill_n, tsi_n = mhar_spillover(path_neg, get_area())
show("Spillover table – ReCov⁻", spill_n.round(2))
show("Total Spillover Index (TSI⁻)", pd.Series({"TSI-": tsi_n}))

### Spillover table – ReCov⁺

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
nsw,26.18,8.43,12.13,9.79,0.77,11.81,3.15,0.81,1.43,5.75,4.11,1.21,3.36,2.42,8.64
nsw-qld,11.59,35.58,25.42,1.64,5.37,1.35,0.53,1.9,1.09,0.56,3.41,4.13,2.38,1.36,3.68
nsw-sa,15.82,23.74,35.23,1.93,2.17,2.06,0.73,1.03,0.97,1.17,3.29,2.49,2.59,1.38,5.4
nsw-tas,10.4,1.16,1.4,27.99,2.37,20.99,1.57,1.16,1.48,1.98,7.82,3.05,4.3,3.98,10.35
nsw-vic,1.38,7.01,2.9,4.12,45.59,0.91,1.44,5.68,0.9,0.55,5.14,17.92,2.23,0.98,3.26


### Total Spillover Index (TSI⁺)

TSI+    66.595312
dtype: float64

### Spillover table – ReCov⁻

Unnamed: 0,nsw,nsw-qld,nsw-sa,nsw-tas,nsw-vic,qld,qld-sa,qld-tas,qld-vic,sa,sa-tas,sa-vic,tas,tas-vic,vic
nsw,25.03,8.18,11.45,10.81,2.18,11.34,3.32,0.86,1.35,5.53,4.24,2.09,2.59,2.64,8.39
nsw-qld,11.15,33.96,23.99,1.54,5.46,1.1,1.49,2.19,1.93,0.77,4.13,4.79,2.83,1.14,3.54
nsw-sa,15.11,22.83,34.01,2.32,2.71,2.0,1.07,1.29,1.29,1.1,3.87,3.57,2.27,1.29,5.27
nsw-tas,11.47,1.17,1.76,26.34,4.53,20.22,2.11,1.7,1.54,2.42,6.16,3.05,3.45,3.85,10.24
nsw-vic,3.37,6.36,3.13,6.86,38.54,2.18,2.03,6.14,1.39,0.57,5.81,16.02,2.65,0.98,3.96


### Total Spillover Index (TSI⁻)

TSI-    67.664098
dtype: float64