# Digital Twin — Jet Engine Health & RUL (CMAPSS FD001–FD004)

**Author:** Shreyas Gowda B  
**Purpose:** A production‑ready notebook that loads NASA **CMAPSS** turbofan datasets (or synthesizes data),
builds online features, trains an RUL model with uncertainty, and provides a live streaming simulator.

This version is hardened to work with file names such as `train_FD001`, `test_FD001`, and `RUL_FD001` (with or without `.txt`), as well as `FD001_train`-style variants.


## 0. Environment Setup

### What this cell does

- Imports `sys`, `subprocess`, and `importlib`.
- Defines `ensure(pkg, name=None)`, which tries to import module `name` (default: `pkg`) and, if that fails, pip-installs `pkg` into the running interpreter.
- Runs `ensure` for every dependency of the notebook: pandas, NumPy, scikit-learn, Plotly, ipywidgets, tqdm, joblib, and PyArrow.

In [None]:
import sys, subprocess, importlib
def ensure(pkg, name=None):
    try:
        importlib.import_module(name or pkg)
    except Exception:
        print(f"Installing {pkg}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])

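# pip name -> import name (scikit-learn is imported as sklearn, so checking "scikit-learn" always fails)
for pkg, mod in [("pandas", "pandas"), ("numpy", "numpy"), ("scikit-learn", "sklearn"),
                 ("plotly", "plotly"), ("ipywidgets", "ipywidgets"), ("tqdm", "tqdm"),
                 ("joblib", "joblib"), ("pyarrow", "pyarrow")]:
    ensure(pkg, mod)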
print("Environment ready.")
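Because a pip distribution name and its import name can differ (`scikit-learn` is imported as `sklearn`), a check keyed on the pip name alone can reinstall a package on every run. A minimal sketch of a side-effect-free check, assuming a hypothetical `missing_packages` helper and a hand-maintained pip-to-module mapping, uses `importlib.util.find_spec` so nothing is actually imported:

```python
import importlib.util

# Hypothetical mapping: pip distribution name -> importable top-level module name
REQUIREMENTS = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "plotly": "plotly",
    "ipywidgets": "ipywidgets",
    "tqdm": "tqdm",
    "joblib": "joblib",
    "pyarrow": "pyarrow",
}

def missing_packages(requirements):
    """Return the pip names whose module cannot be located (find_spec does not import it)."""
    return [pkg for pkg, mod in requirements.items()
            if importlib.util.find_spec(mod) is None]

if __name__ == "__main__":
    todo = missing_packages(REQUIREMENTS)
    print("To install:", todo or "nothing")
```

The resulting list could then go to a single `pip install` call instead of one subprocess per package.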

## 1. Imports & Global Config

### What this cell does

- Imports the standard-library modules, NumPy, pandas, Plotly, tqdm, scikit-learn, and joblib used throughout the notebook, and suppresses Python warnings.
- Seeds NumPy and Python's `random` with `RANDOM_SEED = 42` for reproducibility.
- Reads `DATA_ROOT` and `DATASET` from the `CMAPSS_DIR` and `CMAPSS_SUBSET` environment variables (defaults: `./CMAPSS` and `FD001`).
- Creates the artifact folder `./artifacts_v4` and sets the modeling options: robust scaling, an RUL cap of 125 cycles, a 30-cycle rolling window, and a fallback to synthetic data when files are missing.

In [None]:
import os, glob, math, json, time, random, warnings
from datetime import datetime
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from tqdm import tqdm
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from joblib import dump
warnings.filterwarnings('ignore')

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED); random.seed(RANDOM_SEED)

# --- Set these for your machine ---
DATA_ROOT = os.environ.get('CMAPSS_DIR', './CMAPSS')  # folder containing train_FD001, test_FD001, RUL_FD001
DATASET   = os.environ.get('CMAPSS_SUBSET', 'FD001')  # FD001 | FD002 | FD003 | FD004
# ----------------------------------

ARTIFACT_DIR = './artifacts_v4'
os.makedirs(ARTIFACT_DIR, exist_ok=True)

USE_ROBUST_SCALER = True
RUL_CAP = 125
SEQ_WINDOW = 30
USE_SYNTH_IF_MISSING = True

print({'DATA_ROOT': DATA_ROOT, 'DATASET': DATASET, 'RUL_CAP': RUL_CAP, 'SEQ_WINDOW': SEQ_WINDOW})
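Since `USE_SYNTH_IF_MISSING` is on, a typo in `CMAPSS_SUBSET` silently falls through to synthetic data. A small guard, sketched here as a hypothetical `resolve_subset` function (not part of the notebook), normalizes the value and fails fast on unknown subset names:

```python
import os

VALID_SUBSETS = ("FD001", "FD002", "FD003", "FD004")

def resolve_subset(env=None, default="FD001"):
    """Read CMAPSS_SUBSET from `env` (os.environ by default), normalize it, and validate it."""
    env = os.environ if env is None else env
    subset = env.get("CMAPSS_SUBSET", default).strip().upper()
    if subset not in VALID_SUBSETS:
        raise ValueError(f"Unknown CMAPSS subset {subset!r}; expected one of {VALID_SUBSETS}")
    return subset
```

Calling `DATASET = resolve_subset()` in place of the plain `os.environ.get` would keep the same default while rejecting values such as `FD005`.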

## 2. Loader (handles `train_FD001` / `test_FD001` / `RUL_FD001`)

### What this cell does

- Searches `DATA_ROOT` for the train, test, and RUL files of the chosen subset, trying `train_FD001.txt`, `train_FD001`, `FD001_train.txt`, and `FD001_train` (and the same patterns for test and RUL).
- Reads the header-less, whitespace-separated files into pandas DataFrames and names the columns `engine_id`, `cycle`, `setting1`..`setting3`, `s1`..`sK`, asserting that train and test share the same sensor schema.
- If any file is missing and `USE_SYNTH_IF_MISSING` is `True`, generates a synthetic CMAPSS-like training set with drifting sensors instead; no test set or RUL vector exists in that case.

In [None]:
def _pick_one(candidates):
    for p in candidates:
        if os.path.exists(p):
            return p
        hits = glob.glob(p)
        if hits:
            return hits[0]
    return None

def _resolve_paths(data_root: str, subset: str):
    train_candidates = [
        os.path.join(data_root, f"train_{subset}.txt"),
        os.path.join(data_root, f"train_{subset}"),
        os.path.join(data_root, f"{subset}_train.txt"),
        os.path.join(data_root, f"{subset}_train"),
    ]
    test_candidates = [
        os.path.join(data_root, f"test_{subset}.txt"),
        os.path.join(data_root, f"test_{subset}"),
        os.path.join(data_root, f"{subset}_test.txt"),
        os.path.join(data_root, f"{subset}_test"),
    ]
    rul_candidates = [
        os.path.join(data_root, f"RUL_{subset}.txt"),
        os.path.join(data_root, f"RUL_{subset}"),
        os.path.join(data_root, f"{subset}_RUL.txt"),
        os.path.join(data_root, f"{subset}_RUL"),
    ]
    tr = _pick_one(train_candidates)
    te = _pick_one(test_candidates)
    ru = _pick_one(rul_candidates)
    return tr, te, ru

def _read_cmapss(path: str) -> pd.DataFrame:
    # CMAPSS uses variable spaces -> \s+ ; header absent
    return pd.read_csv(path, sep=r"\s+", header=None)

def load_cmapss_flexible(data_root: str, subset: str):
    tr, te, ru = _resolve_paths(data_root, subset)
    print("Resolved paths:")
    print("  train ->", tr)
    print("  test  ->", te)
    print("  RUL   ->", ru)
    if not (tr and te and ru):
        return None, None, None
    df_tr = _read_cmapss(tr)
    df_te = _read_cmapss(te)
    rul   = _read_cmapss(ru).iloc[:,0]
    # Map columns: engine_id, cycle, setting1..3, s1..sK (K = 21 in the standard files, 26 columns in total)
    def _map_cols(df):
        n = df.shape[1]
        base = ["engine_id","cycle"] + [f"setting{i}" for i in range(1,4)]
        k = n - len(base)
        sensors = [f"s{i}" for i in range(1, k+1)]
        m = df.copy(); m.columns = base + sensors
        return m, sensors
    df_tr, s_tr = _map_cols(df_tr)
    df_te, s_te = _map_cols(df_te)
    assert s_tr == s_te, "Train/Test sensor schema mismatch"
    return df_tr, df_te, rul

# Try load; else synthesize
train_df, test_df, rul_series = load_cmapss_flexible(DATA_ROOT, DATASET)
if train_df is None:
    if not USE_SYNTH_IF_MISSING:
        raise FileNotFoundError("CMAPSS files not found. Set DATA_ROOT correctly or enable synthesis.")
    print(f"⚠️ {DATASET} not found in {os.path.abspath(DATA_ROOT)} — generating synthetic data...")
    # Synthetic CMAPSS-like generator
    def synth_cmapss_like(n_engines=120, max_cycles=(180, 320), n_sensors=21):  # 21 sensors, like the real files
        rows = []
        for eid in range(1, n_engines+1):
            T = int(np.random.randint(*max_cycles))
            s1 = np.clip(np.cumsum(np.random.randn(T)/100), -1, 1)
            s2 = np.clip(np.cumsum(np.random.randn(T)/150), -1, 1)
            s3 = np.clip(np.cumsum(np.random.randn(T)/200), -1, 1)
            base = np.random.uniform(0.2, 0.8, size=n_sensors)
            slope = np.random.uniform(-0.003, 0.003, size=n_sensors)
            slope[1] = abs(slope[1]) + 0.004   # temp-like rises
            slope[9] = -abs(slope[9]) - 0.004  # pressure-like falls
            for t in range(1, T+1):
                sensors = base + slope*t + 0.02*np.random.randn(n_sensors)
                rows.append([eid,t,s1[t-1],s2[t-1],s3[t-1],*sensors])
        cols = ["engine_id","cycle"]+[f"setting{i}" for i in range(1,4)]+[f"s{i}" for i in range(1,n_sensors+1)]
        return pd.DataFrame(rows, columns=cols)
    train_df = synth_cmapss_like()
    test_df = None; rul_series = None

print("✅ train_df:", train_df.shape)
train_df.head()
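The standard files have 26 whitespace-separated columns per row (engine, cycle, three settings, 21 sensors), and lines often end with trailing spaces. A minimal sketch, using an inline two-row sample with placeholder values in place of a real file and a hypothetical `read_cmapss_text` helper, shows that the `\s+` separator absorbs runs of spaces, including trailing ones, so no empty columns appear:

```python
import io
import pandas as pd

N_SETTINGS, N_SENSORS = 3, 21  # standard CMAPSS layout: 2 + 3 + 21 = 26 columns
COLUMNS = (["engine_id", "cycle"]
           + [f"setting{i}" for i in range(1, N_SETTINGS + 1)]
           + [f"s{i}" for i in range(1, N_SENSORS + 1)])

def read_cmapss_text(text):
    """Parse CMAPSS-formatted text: no header, runs of spaces, possible trailing spaces."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
    if df.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {df.shape[1]}")
    df.columns = COLUMNS
    return df

# Two illustrative rows (placeholder values, not real measurements), each with trailing spaces
row_values = " ".join(["0.5"] * 24)
sample = f"1 1 {row_values}  \n1 2 {row_values}  \n"
demo_df = read_cmapss_text(sample)
print(demo_df[["engine_id", "cycle", "s21"]])
```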

## 3. RUL Labeling & Rolling Features

### What this cell does

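- `add_rul` labels every row with its remaining useful life, `RUL = max_cycle - cycle` per engine, clipped at `RUL_CAP` to give the usual piecewise-linear target.
- Selects the sensor columns and keeps the first ten as `KEY_SENSORS`.
- `rolling_features` adds, per engine and per key sensor, a rolling mean (`*_ma`), rolling standard deviation (`*_std`), and first difference (`*_diff`), plus a Health Index (`HI`): the negated mean z-score across key sensors, normalized per engine.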

In [None]:
def add_rul(df: pd.DataFrame, cap: int = RUL_CAP):
    df = df.copy()
    mx = df.groupby('engine_id')['cycle'].max().rename('max_cycle')
    df = df.merge(mx, on='engine_id', how='left')
    df['RUL'] = df['max_cycle'] - df['cycle']
    if cap is not None:
        df['RUL'] = df['RUL'].clip(upper=cap)
    return df.drop(columns=['max_cycle'])

train_df = add_rul(train_df, RUL_CAP)
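# Sensor columns are s1..sK; a bare startswith('s') would also match setting1..setting3
sensor_cols = [c for c in train_df.columns if c.startswith('s') and c[1:].isdigit()]
KEY_SENSORS = sensor_cols[:10]  # first ten sensors; use sensor_cols to include all of them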

def rolling_features(df: pd.DataFrame, window:int=SEQ_WINDOW):
    df = df.sort_values(['engine_id','cycle']).copy()
    for c in KEY_SENSORS:
        df[f'{c}_ma']   = df.groupby('engine_id')[c].transform(lambda x: x.rolling(window, min_periods=3).mean())
        df[f'{c}_std']  = df.groupby('engine_id')[c].transform(lambda x: x.rolling(window, min_periods=3).std())
        df[f'{c}_diff'] = df.groupby('engine_id')[c].diff()
    # Health Index: z-normalize each key sensor over the engine's full history, then negate the
    # row-wise mean z-score. The full-history statistics make this feature non-causal.
    def _hi(g):
        arr = g[KEY_SENSORS].values
        if arr.shape[0] < 5:
            return pd.Series([np.nan]*len(g), index=g.index)
        z = (arr - np.nanmean(arr, axis=0)) / (np.nanstd(arr, axis=0)+1e-6)
        return pd.Series(-np.nanmean(z, axis=1), index=g.index)
    df['HI'] = df.groupby('engine_id', group_keys=False).apply(_hi)
    return df

feat_df = rolling_features(train_df, window=SEQ_WINDOW)
feat_df.head(3)
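The `HI` above z-normalizes each sensor over the engine's entire life, so the value at cycle t already depends on cycles after t, which a live system never sees. A minimal causal alternative, sketched under the assumption that an expanding (all past cycles) baseline is acceptable, normalizes with only the cycles observed so far. The name `causal_hi` and the `min_periods=5` warm-up are illustrative choices:

```python
import numpy as np
import pandas as pd

def causal_hi(df, sensors, min_periods=5):
    """Per-engine Health Index that uses only past and current cycles.

    For each engine and sensor: z_t = (x_t - mean(x_1..x_t)) / std(x_1..x_t).
    HI_t is the negated mean of z_t across sensors (same sign convention as above).
    """
    df = df.sort_values(["engine_id", "cycle"])
    g = df.groupby("engine_id")[sensors]
    mu = g.transform(lambda x: x.expanding(min_periods=min_periods).mean())
    sd = g.transform(lambda x: x.expanding(min_periods=min_periods).std(ddof=0))
    z = (df[sensors] - mu) / (sd + 1e-6)
    return -z.mean(axis=1, skipna=False)  # NaN until min_periods cycles are available
```

Inside `rolling_features`, `df['HI'] = causal_hi(df, KEY_SENSORS)` would be the drop-in form; the streaming `OnlineState` would then need the same expanding baseline instead of a window-local one.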

## 4. Engine-wise Split & Scaling

### What this cell does

- Splits engine IDs (not rows) 75/25 into training and validation sets, so no engine contributes cycles to both.
- Builds `feature_cols`: for each key sensor, the raw value plus `_ma`, `_std`, and `_diff`, followed by `HI`.
- Fills the NaNs left by the rolling-window warm-up (forward fill, backward fill, then zeros).
- Fits the scaler (`RobustScaler` or `StandardScaler`) on the training engines only, applies it to validation, and saves it to `scaler.joblib`.

In [None]:
engines = feat_df['engine_id'].unique()
eng_tr, eng_va = train_test_split(engines, test_size=0.25, random_state=RANDOM_SEED)
df_tr = feat_df[feat_df.engine_id.isin(eng_tr)].copy()
df_va = feat_df[feat_df.engine_id.isin(eng_va)].copy()

feature_cols = []
for c in KEY_SENSORS:
    feature_cols += [c, f'{c}_ma', f'{c}_std', f'{c}_diff']
feature_cols += ['HI']

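def _fill_per_engine(df):
    # Forward-fill, then backward-fill, inside each engine only (never across engine
    # boundaries), then zero-fill; this also avoids the deprecated fillna(method=...) form
    filled = df.groupby('engine_id')[feature_cols].ffill()
    return filled.groupby(df['engine_id']).bfill().fillna(0)

X_tr = _fill_per_engine(df_tr)
y_tr = df_tr['RUL']
X_va = _fill_per_engine(df_va)
y_va = df_va['RUL']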

scaler = RobustScaler() if USE_ROBUST_SCALER else StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_va_s = scaler.transform(X_va)
dump(scaler, os.path.join(ARTIFACT_DIR,'scaler.joblib'))
print('Shapes:', X_tr.shape, X_va.shape)
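On a frame that stacks several engines, a plain `ffill()` can copy one engine's last value into the next engine's warm-up rows. A small illustration on a toy frame (the column `f` is hypothetical) contrasts that with filling inside each engine via `groupby`:

```python
import numpy as np
import pandas as pd

# Toy stacked frame: engine 2's first row is NaN (e.g. a rolling-window warm-up)
toy = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "f":         [0.1, 0.2, 0.9, np.nan, 0.3],
})

# Global forward fill: engine 2 inherits engine 1's final value (0.9)
global_fill = toy["f"].ffill()

# Per-engine fill: ffill then bfill within each engine, zeros as a last resort
per_engine = (toy.groupby("engine_id")["f"].ffill()
                 .groupby(toy["engine_id"]).bfill()
                 .fillna(0))

print(pd.DataFrame({"global": global_fill, "per_engine": per_engine}))
```

The backward fill still borrows a value from later in the same engine, which is a mild look-ahead limited to the first few warm-up cycles.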

## 5. Train Gradient Boosting (point + quantiles)

### What this cell does

- Trains a `GradientBoostingRegressor` with the default squared-error loss as the point RUL model.
- Trains two quantile-loss models (`alpha=0.1` and `alpha=0.9`) whose predictions bound a P10 to P90 uncertainty band.
- Saves all three models to `ARTIFACT_DIR`.
- Reports validation RMSE and MAE for the point model.

In [None]:
gb_point = GradientBoostingRegressor(random_state=RANDOM_SEED, n_estimators=300, learning_rate=0.05, max_depth=3)
gb_point.fit(X_tr_s, y_tr)
dump(gb_point, os.path.join(ARTIFACT_DIR,'gb_point.joblib'))

gb_lo = GradientBoostingRegressor(loss='quantile', alpha=0.1, random_state=RANDOM_SEED, n_estimators=300, learning_rate=0.05, max_depth=3)
gb_hi = GradientBoostingRegressor(loss='quantile', alpha=0.9, random_state=RANDOM_SEED, n_estimators=300, learning_rate=0.05, max_depth=3)
gb_lo.fit(X_tr_s, y_tr); gb_hi.fit(X_tr_s, y_tr)
dump(gb_lo, os.path.join(ARTIFACT_DIR,'gb_lo_p10.joblib'))
dump(gb_hi, os.path.join(ARTIFACT_DIR,'gb_hi_p90.joblib'))

pred_va = gb_point.predict(X_va_s)
rmse = math.sqrt(mean_squared_error(y_va, pred_va))
mae  = mean_absolute_error(y_va, pred_va)
print({'val_RMSE': round(rmse,2), 'val_MAE': round(mae,2)})
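Because the P10 and P90 models are fit independently, it is worth checking how often the validation truth actually falls inside the band (ideally about 80%) and whether the two quantiles ever cross. A minimal sketch with a hypothetical `interval_report` helper, which could be called as `interval_report(y_va.values, gb_lo.predict(X_va_s), gb_hi.predict(X_va_s))`:

```python
import numpy as np

def interval_report(y_true, lo, hi):
    """Empirical coverage, mean width, and crossing rate of a [lo, hi] prediction band."""
    y_true, lo, hi = (np.asarray(a, dtype=float) for a in (y_true, lo, hi))
    crossed = float(np.mean(lo > hi))                        # share of rows where quantiles cross
    lo_fix, hi_fix = np.minimum(lo, hi), np.maximum(lo, hi)  # order the bounds per row
    covered = (y_true >= lo_fix) & (y_true <= hi_fix)
    return {
        "coverage": float(covered.mean()),                   # about 0.8 expected for P10 to P90
        "mean_width": float(np.mean(hi_fix - lo_fix)),
        "crossed_frac": crossed,
    }
```

Coverage well below 0.8 suggests the band is too narrow for unseen engines; a large crossing rate suggests the quantile models need more regularization or a shared fit.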

### Validation Plot: True vs Predicted RUL

### What this cell does

- Plots true vs predicted validation RUL against the ideal `y = x` line. Points below the line are under-predictions (conservative); points above are over-predictions (late warnings).

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_va.values, y=pred_va, mode='markers', name='Val', opacity=0.55))
mx = max(y_va.max(), pred_va.max())
fig.add_trace(go.Scatter(x=[0,mx], y=[0,mx], mode='lines', name='Ideal', line=dict(dash='dash')))
fig.update_layout(template='plotly_dark', title=f'{DATASET}: True vs Pred RUL', xaxis_title='True RUL', yaxis_title='Pred RUL')
fig.show()
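The scatter does not show how error varies with true RUL, and errors near end of life usually matter most for maintenance decisions. A sketch that bins validation residuals by true RUL, usable as `error_by_rul_bin(y_va.values, pred_va)`; the bin edges are an arbitrary illustrative choice matched to `RUL_CAP = 125`, and bins with no rows are omitted:

```python
import numpy as np
import pandas as pd

def error_by_rul_bin(y_true, y_pred, edges=(0, 25, 50, 75, 100, 125)):
    """MAE, mean signed error (pred - true), and row count per true-RUL bin."""
    d = pd.DataFrame({"true": np.asarray(y_true, float), "pred": np.asarray(y_pred, float)})
    d["err"] = d["pred"] - d["true"]
    d["bin"] = pd.cut(d["true"], bins=list(edges), include_lowest=True)
    return d.groupby("bin", observed=True)["err"].agg(
        mae=lambda e: e.abs().mean(), bias="mean", n="count")
```

A positive `bias` in the lowest bin would mean the model tends to over-estimate the life left just before failure.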

## 6. Test-Set Evaluation (uses RUL vector if available)

### What this cell does

- Builds the same rolling features and Health Index on the test trajectories, which stop an unknown number of cycles before failure.
- Scores only the last observed cycle of each engine: scales it with the training scaler and predicts RUL with the point model.
- Compares the predictions with the ground-truth `RUL_FDxxx` vector and reports test RMSE and MAE; the cell is skipped when running on synthetic data.

In [None]:
if test_df is not None and rul_series is not None:
    # Build features on the full test history with the same function used for training,
    # then score only the last observed cycle of each engine
    test_feat = rolling_features(test_df, window=SEQ_WINDOW)
    # Fill gaps within each engine (never across engine boundaries)
    test_feat[feature_cols] = test_feat.groupby('engine_id')[feature_cols].ffill().fillna(0)
    last = test_feat.groupby('engine_id').tail(1)   # rows are already sorted by engine_id, cycle
    Xts = scaler.transform(last[feature_cols])
    y_pred_test = gb_point.predict(Xts)
    y_true_test = rul_series.values.astype(float)   # row i of RUL_FDxxx belongs to test engine i
    if len(y_true_test) == len(y_pred_test):
        trmse = math.sqrt(mean_squared_error(y_true_test, y_pred_test))
        tmae  = mean_absolute_error(y_true_test, y_pred_test)
        print({'test_RMSE': round(trmse,2), 'test_MAE': round(tmae,2)})
    else:
        print("⚠️ Length mismatch: cannot compute test metrics.")
else:
    print("(No test set available — skipped)")
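RMSE treats early and late predictions alike, but CMAPSS results are often also reported with the asymmetric scoring function from the PHM 2008 challenge, which penalizes late (over-estimated) RUL more heavily. With d = predicted RUL minus true RUL, each engine contributes exp(-d/13) - 1 when d < 0 and exp(d/10) - 1 when d >= 0. A minimal sketch (`phm08_score` is a hypothetical name), usable as `phm08_score(y_true_test, y_pred_test)`:

```python
import numpy as np

def phm08_score(y_true, y_pred):
    """Asymmetric PHM08 score summed over engines (lower is better; late predictions cost more)."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    per_engine = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(per_engine.sum())
```

Because the score grows exponentially, a handful of badly late engines can dominate it, so it complements rather than replaces RMSE.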

## 7. Streaming Simulator (Plotly + ipywidgets)

### What this cell does

- Consists of two code cells: the first defines the controls and the streaming logic, the second builds a live Plotly `FigureWidget` and registers its plotting hooks. Run them in order, then press **Start**.
- `OnlineState` keeps the last `SEQ_WINDOW` cycles in a `deque` and recomputes the training feature set (raw value, mean, std, and difference per key sensor, plus HI).
- `predict_with_interval` scales one feature row and returns the point prediction with its P10 and P90 bounds.
- `stream_series` replays one training engine cycle by cycle at the chosen speed and hands each step to the plot through the `PLOT_HOOKS` callbacks; **Start** and **Stop** control the replay.

In [None]:
# --- Cell 1: Start/Stop controls + streaming logic (no plots) ---

import time
import numpy as np
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Safety: required globals must exist
_needed = ["train_df","scaler","gb_point","gb_lo","gb_hi","feature_cols","KEY_SENSORS","SEQ_WINDOW"]
_missing = [n for n in _needed if n not in globals()]
assert not _missing, f"Missing in notebook: {', '.join(_missing)}"

# ---------- Prediction + features ----------
def predict_with_interval(feat_row: dict):
    x  = pd.DataFrame([feat_row])[feature_cols].fillna(0)
    xs = scaler.transform(x)
    p  = float(gb_point.predict(xs)[0])
    lo = float(gb_lo.predict(xs)[0])
    hi = float(gb_hi.predict(xs)[0])
    return p, lo, hi

class OnlineState:
    def __init__(self, window:int=SEQ_WINDOW):
        from collections import deque
        self.hist = deque(maxlen=window)
        self.window = window
    def update(self, row):
        self.hist.append(row)
        return self.features()
    def features(self):
        if len(self.hist) < 2:
            return None
        h = pd.DataFrame(list(self.hist))
        feats = {}
        for c in KEY_SENSORS:
            s = h[c].astype(float)
            feats[c] = float(s.iloc[-1])
            feats[f"{c}_ma"]  = float(s.mean())
            std_val = float(s.std())
            feats[f"{c}_std"] = std_val if std_val == std_val else 0.0
            feats[f"{c}_diff"] = float(s.iloc[-1] - s.iloc[-2]) if len(s) > 1 else 0.0
        arr = h[KEY_SENSORS].to_numpy(dtype=float)
        mu  = np.nanmean(arr, axis=0)
        sd  = np.nanstd(arr, axis=0) + 1e-6
        z   = (arr - mu) / sd
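        # HI of the latest cycle, matching the offline per-row definition. Averaging z over the
        # whole window would always give ~0, since each column's z-scores have zero mean.
        # (Offline, z uses the engine's full history rather than this window.)
        feats["HI"] = -float(np.nanmean(z[-1]))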
        return feats

def build_series_for_engine(engine_id:int) -> pd.DataFrame:
    return (train_df.loc[train_df.engine_id == engine_id]
            .sort_values('cycle')
            .reset_index(drop=True))

# ---------- Hooks that the plot cell will define ----------
# The plot cell must assign these symbols to real functions/objects
PLOT_HOOKS = {
    "on_status": None,   # fn(str) -> None
    "on_step":   None,   # fn(cycle, s2, hi_index, p, lo, hi, lastn) -> None  (hi_index = Health Index, hi = P90 bound)
    "on_reset":  None,   # fn() -> None      (clear traces/buffers)
    "set_max":   None,   # fn(n:int) -> None (progress max)
    "set_val":   None,   # fn(v:int) -> None (progress value)
}

# ---------- Widgets (controls are displayed before the plot cell runs) ----------
default_engine = int(train_df['engine_id'].value_counts().idxmax())
engine_id_w = widgets.BoundedIntText(
    value=default_engine,
    min=int(train_df.engine_id.min()),
    max=int(train_df.engine_id.max()),
    description='Engine:',
    layout=widgets.Layout(width="200px")
)
speed_w  = widgets.Dropdown(options=[('1×',1),('5×',5),('20×',20)], value=5, description='Speed:')
lastn_w  = widgets.IntSlider(min=50, max=500, step=10, value=200, description='Last N:')
start_b  = widgets.Button(description='Start', button_style='success', icon='play')
stop_b   = widgets.Button(description='Stop',  button_style='warning', icon='stop')
status_o = widgets.HTML("<b>Status:</b> idle")
progress = widgets.IntProgress(min=0, max=100, value=0, bar_style='')

controls = widgets.HBox([engine_id_w, speed_w, lastn_w, start_b, stop_b])
display(controls, widgets.HBox([status_o, widgets.Label("  "), progress]))

# ---------- Streaming loop (no plotting here) ----------
running = {"flag": False}

def _emit_status(msg: str):
    status_o.value = f"<b>Status:</b> {msg}"
    if callable(PLOT_HOOKS["on_status"]):
        PLOT_HOOKS["on_status"](msg)

def stream_series(edf: pd.DataFrame, window:int, speed:int, lastn:int):
    nrows = len(edf)
    if nrows < 3:
        _emit_status(f"Not enough cycles to stream (n={nrows}).")
        return
    if callable(PLOT_HOOKS["set_max"]):
        PLOT_HOOKS["set_max"](nrows)
    if callable(PLOT_HOOKS["on_reset"]):
        PLOT_HOOKS["on_reset"]()

    state = OnlineState(window=window)
    running['flag'] = True
    _emit_status(f"streaming ({nrows} cycles)")

    step_count = 0
    for _, row in edf.iterrows():
        if not running['flag']:
            _emit_status("stopped.")
            break
        feats = state.update(row)
        step_count += 1
        if callable(PLOT_HOOKS["set_val"]):
            PLOT_HOOKS["set_val"](step_count)
        if feats is None:
            time.sleep(max(0.01, 0.2/float(speed)))
            continue

        p, lo, hi = predict_with_interval(feats)
        s2_val = float(row.get('s2', np.nan))  # NaN if the schema has no s2
        cycle  = int(row['cycle'])

        if callable(PLOT_HOOKS["on_step"]):
            PLOT_HOOKS["on_step"](cycle, s2_val, float(feats["HI"]), float(p), float(lo), float(hi), lastn)

        time.sleep(max(0.01, 0.2/float(speed)))

    if running['flag']:
        _emit_status("finished.")
    running['flag'] = False

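def _on_start(_):
    if running['flag']:
        return
    eid   = int(engine_id_w.value)
    speed = int(speed_w.value)
    lastn = int(lastn_w.value)
    edf   = build_series_for_engine(eid)

    def _worker():
        try:
            stream_series(edf=edf, window=int(SEQ_WINDOW), speed=speed, lastn=lastn)
        except Exception as e:
            running['flag'] = False
            _emit_status(f"error: {e}")

    # Run the loop on a background thread so the kernel stays free to process the Stop click
    import threading
    threading.Thread(target=_worker, daemon=True).start()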

def _on_stop(_):
    running['flag'] = False
    _emit_status("stop requested (will stop at next step).")

start_b.on_click(_on_start)
stop_b.on_click(_on_stop)
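A Jupyter kernel handles widget events one at a time, so a loop running inside the Start callback keeps the Stop click queued until the loop ends. One common pattern, sketched here without widgets (the names `Replayer`, `step`, and `delay_s` are hypothetical), runs the loop on a background thread and signals it with a `threading.Event`, which also serves as an interruptible sleep:

```python
import threading
import time

class Replayer:
    """Runs step(i) for i in range(n) on a background thread until stopped."""

    def __init__(self, step, n, delay_s=0.01):
        self.step, self.n, self.delay_s = step, n, delay_s
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()  # the loop exits before its next step

    def join(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)

    def _run(self):
        for i in range(self.n):
            if self._stop.is_set():
                break
            self.step(i)
            # wait() returns True as soon as stop() is called, cutting the sleep short
            if self._stop.wait(self.delay_s):
                break
```

In the notebook, `start_b.on_click` could call `start()` and `stop_b.on_click` could call `stop()`, with `step` wrapping one iteration of `stream_series`; that wiring is an assumption, not the cell above.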


In [None]:
# --- Cell 2: Plot creation + hook wiring (run after Cell 1) ---

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display

# Create the figure
fig = make_subplots(
    rows=3, cols=1, shared_xaxes=True, vertical_spacing=0.07,
    subplot_titles=("Sensor s2", "Health Index (HI)", "Predicted RUL (10–90%)")
)
fig.add_trace(go.Scatter(mode='lines', name='s2'), row=1, col=1)
fig.add_trace(go.Scatter(mode='lines', name='HI'), row=2, col=1)
fig.add_trace(go.Scatter(mode='lines', name='RUL'), row=3, col=1)
fig.add_trace(go.Scatter(mode='lines', name='RUL lo', line=dict(dash='dot')), row=3, col=1)
fig.add_trace(go.Scatter(mode='lines', name='RUL hi', line=dict(dash='dot')), row=3, col=1)
fig.update_layout(height=700, template='plotly_dark', showlegend=True)
fw = go.FigureWidget(fig)
display(fw)

# Local buffers
xs, s2s, his, pr, prl, prh = [], [], [], [], [], []

# Progress and status come from Cell 1's widgets if present
# Otherwise, define no-op fallbacks
def _noop(*args, **kwargs):
    pass

# Hook implementations
def _on_reset():
    xs.clear(); s2s.clear(); his.clear(); pr.clear(); prl.clear(); prh.clear()
    with fw.batch_update():
        for tr in fw.data:
            tr.x = []; tr.y = []

def _on_status(msg: str):
    # If Cell 1's status_o exists, it already shows. No need to duplicate.
    _ = msg  # placeholder, customize if you want a second status display here.

def _on_step(cycle, s2_val, hi_val, p, lo, hi, lastn):
    xs.append(cycle); s2s.append(s2_val); his.append(hi_val); pr.append(p); prl.append(lo); prh.append(hi)
    def tail(a): return a[-lastn:]
    with fw.batch_update():
        fw.data[0].x = tail(xs); fw.data[0].y = tail(s2s)
        fw.data[1].x = tail(xs); fw.data[1].y = tail(his)
        fw.data[2].x = tail(xs); fw.data[2].y = tail(pr)
        fw.data[3].x = tail(xs); fw.data[3].y = tail(prl)
        fw.data[4].x = tail(xs); fw.data[4].y = tail(prh)

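# Progress wiring: use Cell 1's `progress` widget if it exists, otherwise fall back to no-ops.
# (Defining a function never raises NameError, so a try/except here could not detect it.)
if 'progress' in globals():
    def _set_max(n:int):
        progress.max = n
    def _set_val(v:int):
        progress.value = v
else:
    _set_max = _noop
    _set_val = _noop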

# Register hooks
PLOT_HOOKS["on_reset"] = _on_reset
PLOT_HOOKS["on_status"] = _on_status
PLOT_HOOKS["on_step"] = _on_step
PLOT_HOOKS["set_max"] = _set_max
PLOT_HOOKS["set_val"] = _set_val
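Streamed predictions are only as trustworthy as the agreement between the online features and the offline ones the model was trained on. A quick parity check, sketched for one sensor with a hypothetical `window_features` helper that mirrors `OnlineState.features`, compares the last-row mean, standard deviation, and difference computed from a `deque` against pandas `rolling` on the same random-walk series:

```python
from collections import deque
import numpy as np
import pandas as pd

def window_features(values):
    """Online-style features from a window of recent values (oldest first)."""
    s = pd.Series(list(values), dtype=float)
    std = s.std()  # sample std (ddof=1), as in pandas rolling().std()
    return {"ma": s.mean(), "std": 0.0 if np.isnan(std) else std, "diff": s.iloc[-1] - s.iloc[-2]}

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=80).cumsum())
window = 30

buf = deque(maxlen=window)
for v in series:
    buf.append(v)
online = window_features(buf)

offline = {
    "ma": series.rolling(window, min_periods=3).mean().iloc[-1],
    "std": series.rolling(window, min_periods=3).std().iloc[-1],
    "diff": series.diff().iloc[-1],
}
print({k: (round(online[k], 6), round(offline[k], 6)) for k in online})
```

The same check would not pass for `HI`, because the online version normalizes within the window while the offline one uses each engine's full history.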


## 8. Save Artifacts & Model Card

### What this cell does

- Writes `model_card.json` to `ARTIFACT_DIR` with the subset, a UTC timestamp, the input schema, window length, RUL cap, feature list, and validation RMSE/MAE.
- Lists the artifact folder: the scaler, the three gradient-boosting models, and the model card.

In [None]:
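from datetime import timezone   # timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
card = {
  'project': 'Digital Twin — CMAPSS RUL',
  'subset': DATASET,
  'timestamp': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),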
  'schema': 'engine_id, cycle, settings(3), sensors(s1..sK)',
  'window': SEQ_WINDOW,
  'rul_cap': RUL_CAP,
  'features_used': feature_cols,
  'metrics': {'val_RMSE': float(round(rmse,3)), 'val_MAE': float(round(mae,3))}
}
with open(os.path.join(ARTIFACT_DIR,'model_card.json'),'w') as f:
    json.dump(card, f, indent=2)
print('Artifacts in', ARTIFACT_DIR, ':', os.listdir(ARTIFACT_DIR))
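A deployed twin would load the saved scaler and models rather than reuse notebook globals. A self-contained sketch of that round trip trains a tiny stand-in model on random data in a temporary folder. The file names mirror the notebook's `scaler.joblib` and `gb_point.joblib`; the data, sizes, and `demo_*` variable names are illustrative and avoid overwriting the notebook's `scaler`. It then checks that the reloaded artifacts reproduce the in-memory predictions:

```python
import os
import tempfile
import numpy as np
from joblib import dump, load
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = 100 - 20 * X_demo[:, 0] + rng.normal(scale=2, size=200)  # synthetic stand-in target

demo_scaler = RobustScaler().fit(X_demo)
demo_model = GradientBoostingRegressor(n_estimators=50, random_state=42)
demo_model.fit(demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as art_dir:
    dump(demo_scaler, os.path.join(art_dir, "scaler.joblib"))
    dump(demo_model, os.path.join(art_dir, "gb_point.joblib"))
    # Reload as a separate inference process would
    scaler_disk = load(os.path.join(art_dir, "scaler.joblib"))
    model_disk = load(os.path.join(art_dir, "gb_point.joblib"))

X_new = rng.normal(size=(10, 5))
pred_mem = demo_model.predict(demo_scaler.transform(X_new))
pred_disk = model_disk.predict(scaler_disk.transform(X_new))
print("max abs diff:", float(np.max(np.abs(pred_mem - pred_disk))))
```

joblib pickles are tied to the library versions that wrote them, so recording the scikit-learn version alongside the model card is a sensible companion step.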


---

**Made with ❤️ by Shreyas Gowda.**  
Explanatory comments were added with the help of GPT to provide detailed, teacher-style documentation.
