# Digital Twin v3.0: Hierarchical Stacking Ensemble

**Author:** Kian Mansouri Jamshidi
**Project Director:** Kian Mansouri Jamshidi
**Date:** 2025-09-27

## Objective
This is the ultimate experiment of Sprint 5. We will construct a multi-layered, hierarchical stacking ensemble—a true 'network of AIs'—to achieve the maximum possible performance. This architecture synthesizes the predictions of our best models through multiple layers of abstraction, culminating in a single 'Superior AI' prediction. This is our definitive attempt to break the 0.85 R² barrier.

### 1. Imports and Data Preparation

We load all necessary libraries and our three diverse champion models.

In [1]:
import pandas as pd
import numpy as np
import glob
import joblib
from pathlib import Path

from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import r2_score
from sklearn.linear_model import RidgeCV
import lightgbm as lgb

# --- Load Data ---
PROJECT_ROOT = Path('.').resolve().parent
TELEMETRY_DIR = PROJECT_ROOT / 'data' / 'telemetry_v2'
ARTIFACT_DIR = PROJECT_ROOT / 'artifacts' / 'phase2'
df_list = [pd.read_parquet(file) for file in glob.glob(str(TELEMETRY_DIR / "*.parquet"))]
df = pd.concat(df_list, ignore_index=True).sort_values(by='timestamp').reset_index(drop=True)
if 'cpu_temp_celsius_avg' in df.columns:
    df = df.drop('cpu_temp_celsius_avg', axis=1)
print("Data loaded.")

# --- Load Pre-trained Base Models (Layer 1) ---
model_v2_0 = joblib.load(ARTIFACT_DIR / 'digital_twin_v2.0.joblib')
model_v2_1 = joblib.load(ARTIFACT_DIR / 'digital_twin_v2.1_tuned.joblib')
model_v2_3 = joblib.load(ARTIFACT_DIR / 'digital_twin_v2.3_deep.joblib')
print("All three base models loaded successfully.")

Data loaded.
All three base models loaded successfully.


### 2. Full Feature Generation and Alignment

We must create the three distinct feature sets for our base models and ensure they are perfectly aligned to the same set of data points.

In [2]:
# --- Feature Engineering Functions --- #
def create_v2_0_features(df):
    # This function creates features for v2.0 and v2.1 models
    df_f = df.copy()
    workload_dummies = pd.get_dummies(df_f['workload_type'], prefix='workload')
    df_f = pd.concat([df_f, workload_dummies], axis=1)
    df_f['overall_util_rolling_mean'] = df_f['cpu_util_overall'].rolling(window=10).mean()
    df_f['overall_util_rolling_std'] = df_f['cpu_util_overall'].rolling(window=10).std()
    other_cores = [c for c in df.columns if 'cpu_util_core' in c and c != 'cpu_util_core_0']
    for i in range(1, 6):
        df_f[f'overall_util_lag_{i}'] = df_f['cpu_util_overall'].shift(i)
        for core in other_cores:
            df_f[f'{core}_lag_{i}'] = df_f[core].shift(i)
    return df_f.drop('workload_type', axis=1).dropna().reset_index(drop=True)

def create_v2_3_features(df):
    df_f = df.copy()
    workload_dummies = pd.get_dummies(df_f['workload_type'], prefix='workload')
    df_f = pd.concat([df_f, workload_dummies], axis=1)
    df_f['overall_util_rolling_mean'] = df_f['cpu_util_overall'].rolling(window=30).mean()
    df_f['overall_util_rolling_std'] = df_f['cpu_util_overall'].rolling(window=30).std()
    other_cores = [c for c in df.columns if 'cpu_util_core' in c and c != 'cpu_util_core_0']
    for i in range(1, 21):
        df_f[f'overall_util_lag_{i}'] = df_f['cpu_util_overall'].shift(i)
        if i <= 10:
            for core in other_cores:
                df_f[f'{core}_lag_{i}'] = df_f[core].shift(i)
    return df_f.drop('workload_type', axis=1).dropna().reset_index(drop=True)

# --- Create and Align --- #
df_v2_0 = create_v2_0_features(df)
df_v2_3 = create_v2_3_features(df)
target = 'cpu_util_core_0'

common_indices = df_v2_0.index.intersection(df_v2_3.index)
df_aligned_v2_0 = df_v2_0.loc[common_indices]
df_aligned_v2_3 = df_v2_3.loc[common_indices]

features_v2_0_cols = [c for c in df_aligned_v2_0.columns if ('cpu_util' in c and c != target) or 'workload_' in c]
features_v2_3_cols = [c for c in df_aligned_v2_3.columns if ('cpu_util' in c and c != target) or 'workload_' in c]

X_v2_0 = df_aligned_v2_0[features_v2_0_cols]
X_v2_3 = df_aligned_v2_3[features_v2_3_cols]
y = df_aligned_v2_0[target]

print(f"Data aligned. Final dataset size: {len(y)} rows.")

Data aligned. Final dataset size: 6899 rows.


### 3. Hierarchical Training

This is the core of the process. We will use cross-validation to generate out-of-fold predictions to train each layer of the hierarchy, preventing data leakage.

In [3]:
# Split data into a main training set and a final, unseen hold-out test set
indices = np.arange(len(y))
train_indices, test_indices = train_test_split(indices, test_size=0.2, random_state=42)

print("--- Stage 1: Generating Layer 1 Predictions ---")
# Use cross_val_predict to get out-of-fold predictions for the training set
layer1_preds_train_v2_0 = cross_val_predict(model_v2_0, X_v2_0.iloc[train_indices], y.iloc[train_indices], cv=5, n_jobs=-1)
layer1_preds_train_v2_1 = cross_val_predict(model_v2_1, X_v2_0.iloc[train_indices], y.iloc[train_indices], cv=5, n_jobs=-1)
layer1_preds_train_v2_3 = cross_val_predict(model_v2_3, X_v2_3.iloc[train_indices], y.iloc[train_indices], cv=5, n_jobs=-1)

# Create the training set for Layer 2
X_meta_train_L2 = pd.DataFrame({
    'pred_v2_0': layer1_preds_train_v2_0,
    'pred_v2_1': layer1_preds_train_v2_1,
    'pred_v2_3': layer1_preds_train_v2_3
})
y_train_L2 = y.iloc[train_indices]

print("--- Stage 2: Training Layer 2 Meta-Models ---")
meta_model_X = RidgeCV()
meta_model_Y = lgb.LGBMRegressor(random_state=42, n_jobs=-1)

# Get out-of-fold predictions for Layer 2 to train Layer 3
layer2_preds_train_X = cross_val_predict(meta_model_X, X_meta_train_L2, y_train_L2, cv=5, n_jobs=-1)
layer2_preds_train_Y = cross_val_predict(meta_model_Y, X_meta_train_L2, y_train_L2, cv=5, n_jobs=-1)

# Create the training set for Layer 3
X_meta_train_L3 = pd.DataFrame({
    'pred_meta_X': layer2_preds_train_X,
    'pred_meta_Y': layer2_preds_train_Y
})
y_train_L3 = y_train_L2

print("--- Stage 3: Training Layer 3 Unifying Model ---")
unifying_model = lgb.LGBMRegressor(**model_v2_3.get_params()) # Use the powerful deep history params
unifying_model.fit(X_meta_train_L3, y_train_L3)

print("Hierarchical training complete.")

--- Stage 1: Generating Layer 1 Predictions ---
--- Stage 2: Training Layer 2 Meta-Models ---
--- Stage 3: Training Layer 3 Unifying Model ---
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000081 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 5519, number of used features: 2
Hierarchical training complete.


### 4. Final Evaluation of the Full Hierarchy

Now we will pass our unseen hold-out test set through the entire trained pipeline to get our final prediction and score.

In [4]:
print("--- Evaluating on hold-out test set ---")

# Layer 1 Predictions on Test Set
layer1_preds_test_v2_0 = model_v2_0.predict(X_v2_0.iloc[test_indices])
layer1_preds_test_v2_1 = model_v2_1.predict(X_v2_0.iloc[test_indices])
layer1_preds_test_v2_3 = model_v2_3.predict(X_v2_3.iloc[test_indices])
X_meta_test_L2 = pd.DataFrame({
    'pred_v2_0': layer1_preds_test_v2_0,
    'pred_v2_1': layer1_preds_test_v2_1,
    'pred_v2_3': layer1_preds_test_v2_3
})

# Layer 2 Predictions on Test Set (Need to train the L2 models first on FULL L1 training data)
meta_model_X.fit(X_meta_train_L2, y_train_L2)
meta_model_Y.fit(X_meta_train_L2, y_train_L2)
layer2_preds_test_X = meta_model_X.predict(X_meta_test_L2)
layer2_preds_test_Y = meta_model_Y.predict(X_meta_test_L2)
X_meta_test_L3 = pd.DataFrame({
    'pred_meta_X': layer2_preds_test_X,
    'pred_meta_Y': layer2_preds_test_Y
})

# Layer 3 (Final) Prediction
final_predictions = unifying_model.predict(X_meta_test_L3)

# The true values for our test set
y_test = y.iloc[test_indices]

r2 = r2_score(y_test, final_predictions)

print(f"\n--- Final Hierarchical Ensemble Performance ---")
print(f"R-squared (R²): {r2:.4f}")

if r2 >= 0.85:
    print("\nMISSION ACCOMPLISHED: The Hierarchical Ensemble has broken the 85% barrier!")
elif r2 > 0.7436:
    print("\nULTIMATE BREAKTHROUGH: The hierarchy is superior to all previous models.")
else:
    print("\nLIMIT REACHED: The hierarchy did not improve upon the simpler ensemble. The V2.5 result remains the peak.")

--- Evaluating on hold-out test set ---
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000109 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 765
[LightGBM] [Info] Number of data points in the train set: 5519, number of used features: 3
[LightGBM] [Info] Start training from score 4.857456

--- Final Hierarchical Ensemble Performance ---
R-squared (R²): 0.7183

LIMIT REACHED: The hierarchy did not improve upon the simpler ensemble. The V2.5 result remains the peak.
