# Digital Twin v2.5: Stacking Ensemble

**Author:** Kian Mansouri Jamshidi
**Project Director:** Kian Mansouri Jamshidi
**Date:** 2025-09-27

## Objective
This is the final and most advanced modeling experiment of Sprint 5. We construct a **Stacking Ensemble**, a 'network of AIs', that learns how best to combine the predictions of our two strongest models: the robust **V2.0** and the more complex **V2.3 (Deep History)**.

Instead of a weighted average with hand-picked weights, a final 'meta-model' is trained on the outputs of the base models and learns the weights from the data. This is our definitive attempt to synthesize their strengths and achieve the highest possible R² score.

### 1. Imports and Full Data Preparation

We begin by loading the data and the two pre-trained champion models.

In [1]:
import pandas as pd
import numpy as np
import glob
import joblib
from pathlib import Path

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV # A good choice for a meta-model
import lightgbm as lgb

# --- Load Data ---
PROJECT_ROOT = Path('.').resolve().parent
TELEMETRY_DIR = PROJECT_ROOT / 'data' / 'telemetry_v2'
ARTIFACT_DIR = PROJECT_ROOT / 'artifacts' / 'phase2'
df_list = [pd.read_parquet(file) for file in glob.glob(str(TELEMETRY_DIR / "*.parquet"))]
df = pd.concat(df_list, ignore_index=True).sort_values(by='timestamp').reset_index(drop=True)
if 'cpu_temp_celsius_avg' in df.columns:
    df = df.drop('cpu_temp_celsius_avg', axis=1)
print("Data loaded.")

# --- Load Pre-trained Models ---
model_v2_0 = joblib.load(ARTIFACT_DIR / 'digital_twin_v2.0.joblib')
model_v2_3 = joblib.load(ARTIFACT_DIR / 'digital_twin_v2.3_deep.joblib')
print("Base models v2.0 and v2.3 loaded.")

Data loaded.
Base models v2.0 and v2.3 loaded.


### 2. Feature and Target Preparation

We need to create the two different feature sets required by our two base models. Crucially, they must be perfectly aligned on the same set of rows.

In [2]:
# --- Feature Engineering Functions (from previous notebooks) --- #
def create_v2_features(input_df):
    df_featured = input_df.copy()
    workload_dummies = pd.get_dummies(df_featured['workload_type'], prefix='workload')
    df_featured = pd.concat([df_featured, workload_dummies], axis=1)
    df_featured['overall_util_rolling_mean'] = df_featured['cpu_util_overall'].rolling(window=10).mean()
    df_featured['overall_util_rolling_std'] = df_featured['cpu_util_overall'].rolling(window=10).std()
    other_core_cols = [c for c in input_df.columns if 'cpu_util_core' in c and c != 'cpu_util_core_0']
    for i in range(1, 6):
        df_featured[f'overall_util_lag_{i}'] = df_featured['cpu_util_overall'].shift(i)
        for core_col in other_core_cols:
            df_featured[f'{core_col}_lag_{i}'] = df_featured[core_col].shift(i)
    df_model = df_featured.drop('workload_type', axis=1).dropna().reset_index(drop=True)
    return df_model

def create_v2_3_features(input_df):
    df_featured = input_df.copy()
    workload_dummies = pd.get_dummies(df_featured['workload_type'], prefix='workload')
    df_featured = pd.concat([df_featured, workload_dummies], axis=1)
    df_featured['overall_util_rolling_mean'] = df_featured['cpu_util_overall'].rolling(window=30).mean()
    df_featured['overall_util_rolling_std'] = df_featured['cpu_util_overall'].rolling(window=30).std()
    other_core_cols = [c for c in input_df.columns if 'cpu_util_core' in c and c != 'cpu_util_core_0']
    for i in range(1, 21):
        df_featured[f'overall_util_lag_{i}'] = df_featured['cpu_util_overall'].shift(i)
        if i <= 10:
            for core_col in other_core_cols:
                df_featured[f'{core_col}_lag_{i}'] = df_featured[core_col].shift(i)
    df_model = df_featured.drop('workload_type', axis=1).dropna().reset_index(drop=True)
    return df_model

# --- Create and Align the Two Feature Sets ---
df_v2_0 = create_v2_features(df)
df_v2_3 = create_v2_3_features(df)
target = 'cpu_util_core_0'

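# NOTE: both feature builders call reset_index(drop=True) after dropna(), so these
# indices are positional. V2.0 drops fewer warm-up rows than V2.3 (10-row vs 30-row
# rolling window), so equal index values can refer to different timestamps.
common_indices = df_v2_0.index.intersection(df_v2_3.index)
df_v2_0_aligned = df_v2_0.loc[common_indices]
df_v2_3_aligned = df_v2_3.loc[common_indices]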

features_v2_0_cols = [c for c in df_v2_0_aligned.columns if ('cpu_util' in c and c != target) or 'workload_' in c]
features_v2_3_cols = [c for c in df_v2_3_aligned.columns if ('cpu_util' in c and c != target) or 'workload_' in c]

X = df_v2_0_aligned[features_v2_0_cols] # Use one dataframe's features as the base
y = df_v2_0_aligned[target]

print(f"Data aligned. Final dataset size for stacking: {X.shape[0]} rows.")

Data aligned. Final dataset size for stacking: 6899 rows.
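
Because both feature builders reset their index after `dropna()`, matching index values do not guarantee matching source rows. The following minimal sketch on toy data shows the offset and one possible way to align on `timestamp` instead. It assumes timestamps are unique; the helper `build` and the columns `value` and `roll` are hypothetical and only mimic the shape of the real feature functions.

```python
import pandas as pd

# Toy telemetry: 10 rows with unique timestamps (hypothetical data).
raw = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=10, freq="s"),
    "value": range(10),
})

def build(frame, window):
    """Add a rolling mean, drop the warm-up rows, and reset the index."""
    out = frame.copy()
    out["roll"] = out["value"].rolling(window=window).mean()
    return out.dropna().reset_index(drop=True)

shallow = build(raw, window=3)  # drops the first 2 rows
deep = build(raw, window=5)     # drops the first 4 rows

# Positional alignment: the same index label points at different timestamps.
common = shallow.index.intersection(deep.index)
positional_match = (
    shallow.loc[common, "timestamp"].to_numpy()
    == deep.loc[common, "timestamp"].to_numpy()
).all()

# Timestamp alignment: an inner join keeps only rows present in both frames.
aligned = shallow.merge(deep, on="timestamp", suffixes=("_shallow", "_deep"))
timestamp_match = (aligned["value_shallow"] == aligned["value_deep"]).all()

print(positional_match, timestamp_match, len(aligned))
```

In this toy case the positional intersection pairs rows that are two time steps apart, while the join on `timestamp` pairs identical source rows.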


### 3. Build and Train the Stacking Ensemble

We will now define the ensemble. Our two pre-trained LightGBM models will be the 'base estimators'. A simple but powerful regularized linear regression (`RidgeCV`) will be the 'final estimator' or 'meta-model'. It will learn the optimal way to combine the outputs of the base models.
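
With a linear meta-model, "learning how to combine" has a concrete form. Assuming `RidgeCV` is used with its defaults, as in the cell below, the ensemble prediction is a weighted sum of the two base predictions plus an intercept:

$$
\hat{y} = w_1\,\hat{y}_{\mathrm{V2.0}} + w_2\,\hat{y}_{\mathrm{V2.3}} + b
$$

The weights are chosen to minimize the squared error plus an L2 penalty on the weights:

$$
\min_{w,\,b}\ \sum_{i}\bigl(y_i - w_1\,\hat{y}_{\mathrm{V2.0},i} - w_2\,\hat{y}_{\mathrm{V2.3},i} - b\bigr)^2 + \alpha\,\lVert w \rVert_2^2
$$

Here `RidgeCV` selects $\alpha$ by cross-validation from its default candidates (0.1, 1.0, 10.0). The symbols $w_1$, $w_2$, $b$, and $\alpha$ are notation introduced for this explanation, not names from the notebook.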

In [3]:
# StackingRegressor passes the same feature matrix to every base estimator and,
# by default, refits them. Our pre-trained models expect different feature sets,
# so we perform the stacking manually, which also gives us full control.

print("Performing manual stacking...")

# Split both feature sets and the target in one call so every array shares the same row split
X_train_v2_0, X_test_v2_0, X_train_v2_3, X_test_v2_3, y_train, y_test = train_test_split(
    X[features_v2_0_cols], df_v2_3_aligned[features_v2_3_cols], y, test_size=0.2, random_state=42
)

# 1. Get predictions from base models on the test set
print("Generating predictions from base models...")
pred_test_v2_0 = model_v2_0.predict(X_test_v2_0)
pred_test_v2_3 = model_v2_3.predict(X_test_v2_3)

# 2. Create the training set for the meta-model
# The features for the meta-model are the predictions of the base models
pred_train_v2_0 = model_v2_0.predict(X_train_v2_0)
pred_train_v2_3 = model_v2_3.predict(X_train_v2_3)

X_meta_train = pd.DataFrame({
    'pred_v2_0': pred_train_v2_0,
    'pred_v2_3': pred_train_v2_3
})

# 3. Train the meta-model
print("Training the meta-model...")
meta_model = RidgeCV()
meta_model.fit(X_meta_train, y_train)

# 4. Create the test set for the meta-model and make final predictions
X_meta_test = pd.DataFrame({
    'pred_v2_0': pred_test_v2_0,
    'pred_v2_3': pred_test_v2_3
})

final_predictions = meta_model.predict(X_meta_test)
print("Stacking complete.")

Performing manual stacking...
Generating predictions from base models...
Training the meta-model...
Stacking complete.
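
The cell above fits the meta-model on predictions that the base models make for the training rows. If the pre-trained models saw some of those rows during their own training, those predictions are optimistic and the learned weights can be skewed. A common alternative is to fit the meta-model on out-of-fold predictions. The sketch below illustrates the idea on synthetic data with stand-in scikit-learn regressors; it is not the notebook's pipeline, and every name and value in it is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic regression problem (hypothetical stand-in for the telemetry data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = {
    "gbr": GradientBoostingRegressor(random_state=0),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Out-of-fold predictions: each training row is predicted by a model that never saw it.
oof = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5) for model in base_models.values()
])
meta = RidgeCV().fit(oof, y_train)

# Refit each base model on the full training split, then stack their test predictions.
test_preds = np.column_stack([
    model.fit(X_train, y_train).predict(X_test) for model in base_models.values()
])
stacked_r2 = r2_score(y_test, meta.predict(test_preds))
print(f"Stacked R²: {stacked_r2:.4f}")
```

When base models use different feature sets, the same unshuffled `cv` setting applied to each feature matrix yields matching folds as long as the rows are aligned. For time-ordered data with lag features, a shuffled split can also place near-duplicate neighbors on both sides of the split, so a chronological hold-out may be worth considering.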


### 4. Final Evaluation of the Stacking Ensemble

In [4]:
r2 = r2_score(y_test, final_predictions)

print("--- Final Stacking Ensemble Performance ---")
print(f"R-squared (R²): {r2:.4f}")

if r2 >= 0.85:
    print("\nMISSION ACCOMPLISHED: The stacking ensemble has broken the 85% barrier!")
elif r2 > 0.71:
    print("\nBREAKTHROUGH: The stacking ensemble provided a significant performance boost over any single model.")
else:
    print("\nLIMIT REACHED: Even the stacking ensemble could not improve performance. The V2.0 model remains the champion.")

# Get the coefficients (weights) learned by the meta-model
meta_weights = meta_model.coef_
print("\nLearned Meta-Model Weights:")
print(f"Weight for V2.0 Model: {meta_weights[0]:.4f}")
print(f"Weight for V2.3 Model: {meta_weights[1]:.4f}")

--- Final Stacking Ensemble Performance ---
R-squared (R²): 0.7436

BREAKTHROUGH: The stacking ensemble provided a significant performance boost over any single model.

Learned Meta-Model Weights:
Weight for V2.0 Model: 1.0579
Weight for V2.3 Model: 0.0300
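
The weights above are the coefficients of a linear model, so the ensemble prediction is the weighted sum of the two base predictions plus `meta_model.intercept_`, which the cell does not print. To back the comparison against single models, each base model's R² can be computed on the same test rows. The sketch below shows both steps with synthetic predictions; all arrays are hypothetical stand-ins for `pred_test_v2_0`, `pred_test_v2_3`, and `y_test`, and the noise levels are arbitrary.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_true = rng.uniform(0, 100, size=500)             # stand-in for the target
pred_a = y_true + rng.normal(scale=10, size=500)   # stand-in for a stronger base model
pred_b = y_true + rng.normal(scale=25, size=500)   # stand-in for a weaker base model

meta_X = np.column_stack([pred_a, pred_b])
meta = RidgeCV().fit(meta_X[:400], y_true[:400])   # fit on the first 400 rows

# Reconstruct the ensemble prediction by hand from the learned parameters.
manual = meta_X[400:] @ meta.coef_ + meta.intercept_
assert np.allclose(manual, meta.predict(meta_X[400:]))

# Compare the ensemble with each base model on the same held-out rows.
scores = {
    "base_a": r2_score(y_true[400:], pred_a[400:]),
    "base_b": r2_score(y_true[400:], pred_b[400:]),
    "stacked": r2_score(y_true[400:], meta.predict(meta_X[400:])),
}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Printing the intercept alongside the weights makes the combination fully reproducible, and the side-by-side scores show whether the ensemble actually beats its best component on identical rows.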
