# 03 — Risk Scoring & Dashboard Exports

This notebook turns the engineered datasets and baseline models produced in **02_feature_engineering_and_labels.ipynb** into **business-ready outputs**:

- **Risk scoring** at the appropriate operational level (asset-hour and asset-day)
- A **Top-N daily alert feed** (budgeted triage: *top 5 assets/day*)
- **Interpretability artifacts** that explain *why* an alert fired (top coefficient contributions)
- Clean **dashboard export tables** suitable for BI tools (Power BI/Tableau/Looker) and API serving
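
As a rough illustration of the "top coefficient contributions" idea in the list above: for a linear model such as logistic regression, the log-odds of a row is the intercept plus the sum of `coef_j * x_j`, so each term can be read as one feature's contribution. The sketch below assumes that definition. The helper name `top_contributions`, the feature names, and the values are all hypothetical, and the notebook's own driver extraction may differ in detail.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def top_contributions(x_row, coef, feature_names, top_n=3):
    """Rank features by |coef_j * x_j| for a single sparse row (hypothetical helper)."""
    x_row = sp.csr_matrix(x_row)
    # Only stored (non-zero) entries can contribute, so work on those indices.
    idx = x_row.indices
    out = pd.DataFrame({
        "feature": [feature_names[j] for j in idx],
        "value": x_row.data,
        "coef": coef[idx],
        "contribution": x_row.data * coef[idx],
    })
    order = out["contribution"].abs().sort_values(ascending=False).index
    return out.loc[order].head(top_n).reset_index(drop=True)


# Toy example: 4 made-up features, one row with two non-zero entries.
coef = np.array([0.5, -2.0, 1.5, 0.0])
names = ["temp_mean", "uptime_hours", "alarm_count", "unused"]
row = sp.csr_matrix(np.array([[2.0, 0.0, 1.0, 0.0]]))

drivers = top_contributions(row, coef, names, top_n=2)
print(drivers)
```

Ranking by absolute contribution keeps strongly negative drivers visible; a dashboard that only wants risk-increasing drivers could filter on positive contributions instead.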

## Inputs (from Notebook 02)

We expect one or more run folders under:

`data/processed/feature_engineering/<RUN_ID>/`

Where `<RUN_ID>` looks like `YYYYMMDDThhmmssZ`.  
This notebook will automatically locate the **most recent run folder** that contains the required artifacts.

Key artifacts used (when present):
- `panel_asset_hour_future_incident.parquet`
- `panel_preprocess.joblib`
- `panel_feature_names.csv`
- `panel_baseline_logreg_saga.joblib`
- `panel_X_test.npz`, `panel_y_test.npy`, `panel_ids_test.parquet` (for test scoring and actionability exports)
- (Optional) time-split / event-derived artifacts for asset-level context

## Outputs (this notebook)

Exports are written to a new folder:

`data/processed/risk_scoring/<EXPORT_RUN_ID>/`

Core outputs:
- `alerts_top5_assets_per_day.csv` — daily triage list (top 5 assets/day)
- `alerts_top5_assets_per_day_drivers_long.csv` — “why” for each alert (top contributions at peak hour)
- `asset_day_scores.csv` — asset-day risk rollup
- `asset_risk_scores.csv` — overall asset risk ranking
- `DASHBOARD_EXPORTS.json` — machine-readable manifest of generated export files
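
A downstream consumer might use the manifest roughly as follows. This is a sketch only: the function name `list_export_files` is hypothetical, and it assumes the manifest carries an `exports` key that is either a list of filenames or a dict whose values are filenames.

```python
import json
import tempfile
from pathlib import Path


def list_export_files(export_dir: Path) -> list[Path]:
    """Return paths of exports named in DASHBOARD_EXPORTS.json that exist on disk."""
    manifest = json.loads((export_dir / "DASHBOARD_EXPORTS.json").read_text())
    exports = manifest.get("exports", [])
    # Assumption: a dict-shaped "exports" maps labels to filenames.
    names = list(exports.values()) if isinstance(exports, dict) else list(exports)
    return [export_dir / n for n in names if (export_dir / n).exists()]


# Demo against a throwaway directory (not the real export folder).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "asset_day_scores.csv").write_text("date_utc,asset_id,p_hat\n")
    (demo_dir / "DASHBOARD_EXPORTS.json").write_text(
        json.dumps({"exports": ["asset_day_scores.csv", "missing.csv"]})
    )
    found = [p.name for p in list_export_files(demo_dir)]
    print(found)
```

Skipping entries that are missing on disk keeps a publisher from failing on a partially written export run; a stricter consumer could raise instead.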


In [1]:
#============================================================
# Cell 1 — Setup: imports, paths, run auto-discovery, and output directory
#
# Purpose:
#   - Define project root and standard folders
#   - Auto-locate the MOST RECENT feature-engineering RUN folder that contains
#     the panel artifacts we need (robust to kernel restarts and OUT_DIR drift)
#   - Create a fresh export run directory for this notebook's outputs
#   - Establish a small set of helper utilities used throughout the notebook
#
# Expected:
#   - Repo structure similar to:
#       gmp-packaging-risk-analytics/
#         src/
#         data/
#           processed/feature_engineering/<RUN_ID>/
#         *.ipynb
#
# Outputs created:
#   - EXPORT_DIR: data/processed/risk_scoring/<EXPORT_RUN_ID>/
#============================================================

from __future__ import annotations

import json
import os
from pathlib import Path
from datetime import datetime, timezone

import numpy as np
import pandas as pd
import scipy.sparse as sp
import joblib


# -----------------------------
# 1) Project root + canonical data folders
# -----------------------------
# Notebook lives at repo root in your current layout (00_, 01_, 02_, 03_ sit together).
# If you move notebooks later, adjust ROOT detection.
ROOT = Path.cwd()

DATA_DIR = ROOT / "data"
FE_DIR = DATA_DIR / "processed" / "feature_engineering"     # where notebook 02 wrote run folders
RS_DIR = DATA_DIR / "processed" / "risk_scoring"            # where *this* notebook writes exports

print("ROOT:", ROOT)
print("FE_DIR:", FE_DIR)
print("RS_DIR:", RS_DIR)

assert FE_DIR.exists(), f"Missing feature engineering folder: {FE_DIR}. Did you run notebook 02?"


# -----------------------------
# 2) Helper: find most recent run folder that contains required artifacts
# -----------------------------
# We look for a run folder that contains panel model artifacts.
# This avoids brittle dependence on a prior variable like OUT_DIR, and survives kernel restarts.
REQUIRED_PANEL_FILES = [
    "panel_baseline_logreg_saga.joblib",
    "panel_feature_names.csv",
]

OPTIONAL_BUT_COMMON_PANEL_FILES = [
    "panel_asset_hour_future_incident.parquet",
    "panel_ids_test.parquet",
    "panel_X_test.npz",
    "panel_y_test.npy",
]

def find_latest_run_dir(base_dir: Path, required_files: list[str]) -> Path:
    """
    Return the newest run directory under base_dir that contains all required_files.
    Run dirs look like 'YYYYMMDDThhmmssZ'. We sort lexicographically which works for that format.
    """
    if not base_dir.exists():
        raise FileNotFoundError(f"Base directory does not exist: {base_dir}")

    candidates = []
    for p in sorted(base_dir.glob("20*T*Z"), reverse=True):
        if not p.is_dir():
            continue
        if all((p / f).exists() for f in required_files):
            candidates.append(p)

    if not candidates:
        # Show nearby directories to help debugging
        nearby = [x.name for x in sorted(base_dir.glob("*"))[:10]]
        raise FileNotFoundError(
            "Could not find any feature-engineering run folder containing required panel artifacts.\n"
            f"Base dir: {base_dir}\n"
            f"Required: {required_files}\n"
            f"Example entries in base dir: {nearby}"
        )

    return candidates[0]

PANEL_RUN_DIR = find_latest_run_dir(FE_DIR, REQUIRED_PANEL_FILES)

print("\nSelected PANEL_RUN_DIR:", PANEL_RUN_DIR)

# Quick visibility for the human
print("\nPanel run dir contents (selected key files):")
for fn in REQUIRED_PANEL_FILES + OPTIONAL_BUT_COMMON_PANEL_FILES:
    p = PANEL_RUN_DIR / fn
    print(f"  {'✅' if p.exists() else '— '} {fn}")


# -----------------------------
# 3) Create a new export run directory for this notebook
# -----------------------------
EXPORT_RUN_ID = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
EXPORT_DIR = RS_DIR / EXPORT_RUN_ID
EXPORT_DIR.mkdir(parents=True, exist_ok=True)

print("\nEXPORT_RUN_ID:", EXPORT_RUN_ID)
print("EXPORT_DIR:", EXPORT_DIR)


# -----------------------------
# 4) Mini utility: write a JSON manifest of outputs (updated later)
# -----------------------------
EXPORT_MANIFEST_PATH = EXPORT_DIR / "DASHBOARD_EXPORTS.json"

def update_manifest(manifest_path: Path, new_items: dict) -> None:
    """
    Merge new_items into an on-disk JSON manifest (create if absent).
    Values should be JSON-serializable.
    """
    if manifest_path.exists():
        payload = json.loads(manifest_path.read_text())
    else:
        payload = {}

    payload.update(new_items)
    manifest_path.write_text(json.dumps(payload, indent=2))

# Initialize manifest with provenance
update_manifest(EXPORT_MANIFEST_PATH, {
    "export_run_id": EXPORT_RUN_ID,
    "created_utc": datetime.now(timezone.utc).isoformat(),
    "panel_run_dir": str(PANEL_RUN_DIR),
})

print("\nInitialized manifest:", EXPORT_MANIFEST_PATH)
print(EXPORT_MANIFEST_PATH.read_text())

ROOT: /home/parallels/projects/gmp-packaging-risk-analytics
FE_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering
RS_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring

Selected PANEL_RUN_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering/20251220T211621Z

Panel run dir contents (selected key files):
  ✅ panel_baseline_logreg_saga.joblib
  ✅ panel_feature_names.csv
  ✅ panel_asset_hour_future_incident.parquet
  ✅ panel_ids_test.parquet
  ✅ panel_X_test.npz
  ✅ panel_y_test.npy

EXPORT_RUN_ID: 20251220T223649Z
EXPORT_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z

Initialized manifest: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json
{
  "export_run_id": "20251220T223649Z",
  "created_utc": "2025-12-20T22:36:49.011693+00:00",
  "panel_run_

### What Cell 1 Just Did: Setup, Run Discovery, and Export Run Initialization

This cell initialized the notebook’s working context and made the pipeline **restart-safe** by locating the correct upstream artifacts automatically.

**Project + folder setup**
- Confirmed the repository root as:  
  `/home/parallels/projects/gmp-packaging-risk-analytics`
- Set the feature-engineering runs directory to:  
  `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering`
- Set the risk-scoring export base directory to:  
  `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring`

**Auto-discovered the most recent valid panel run**
- Selected the latest feature-engineering run folder containing the required panel artifacts:  
  `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering/20251220T211621Z`
- Verified the key inputs exist for downstream scoring and actionability:
  - `panel_baseline_logreg_saga.joblib`
  - `panel_feature_names.csv`
  - `panel_asset_hour_future_incident.parquet`
  - `panel_ids_test.parquet`
  - `panel_X_test.npz`
  - `panel_y_test.npy`
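
The "most recent run" selection relies on run IDs of the form `YYYYMMDDThhmmssZ` sorting lexicographically in chronological order. A quick sketch of that property, using made-up run IDs and a hypothetical `parse_run_id` helper:

```python
from datetime import datetime, timezone

# Made-up run IDs in the YYYYMMDDThhmmssZ format (illustrative only).
run_ids = ["20251220T211621Z", "20250101T000000Z", "20251220T090000Z"]


def parse_run_id(run_id: str) -> datetime:
    """Parse a run ID into a tz-aware UTC datetime."""
    return datetime.strptime(run_id, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


by_string = sorted(run_ids, reverse=True)
by_time = sorted(run_ids, key=parse_run_id, reverse=True)

# Fixed-width, zero-padded fields make string order match time order.
print(by_string[0], by_string == by_time)
```

The equivalence holds only while every folder name follows the same fixed-width format, which is why the discovery helper restricts itself to names matching the run-ID pattern.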

**Created a new export run folder for this notebook**
- Created a new export run id: `20251220T223649Z`
- Created the export output folder:  
  `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z`

**Initialized the export manifest**
- Wrote `DASHBOARD_EXPORTS.json` into the export folder with provenance linking this export run back to the selected panel run directory:
  ```json
  {
    "export_run_id": "20251220T223649Z",
    "created_utc": "2025-12-20T22:36:49.011693+00:00",
    "panel_run_dir": "/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering/20251220T211621Z"
  }
  ```


In [2]:
#============================================================
# Cell 2 — Load panel artifacts + score the TEST set
#   Goal:
#     - Load the trained panel baseline model + the TEST split artifacts
#     - Produce a traceable scored table at the asset-hour level (with IDs)
#     - Save the scored table as a canonical input for downstream rollups/exports
#
#   Inputs (from PANEL_RUN_DIR):
#     - panel_X_test.npz
#     - panel_y_test.npy
#     - panel_ids_test.parquet
#     - panel_feature_names.csv
#     - panel_baseline_logreg_saga.joblib
#
#   Outputs (to EXPORT_DIR):
#     - panel_test_scores_asset_hour.parquet
#     - panel_test_scores_asset_hour.csv   (optional, smaller “business friendly” version)
#============================================================

import numpy as np
import pandas as pd
import scipy.sparse as sp
import joblib

from pathlib import Path

# -----------------------------
# 1) Resolve input paths (panel run artifacts)
# -----------------------------
X_test_path   = Path(PANEL_RUN_DIR) / "panel_X_test.npz"
y_test_path   = Path(PANEL_RUN_DIR) / "panel_y_test.npy"
ids_test_path = Path(PANEL_RUN_DIR) / "panel_ids_test.parquet"
feat_path     = Path(PANEL_RUN_DIR) / "panel_feature_names.csv"
model_path    = Path(PANEL_RUN_DIR) / "panel_baseline_logreg_saga.joblib"

for p in [X_test_path, y_test_path, ids_test_path, feat_path, model_path]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required panel artifact: {p}")

# -----------------------------
# 2) Load model + test matrices
# -----------------------------
X_test_tx = sp.load_npz(X_test_path)
if not sp.issparse(X_test_tx):
    # Defensive: should not happen, but keep robust if format changes.
    X_test_tx = sp.csr_matrix(X_test_tx)
else:
    X_test_tx = X_test_tx.tocsr()

y_test = np.load(y_test_path).astype("int8")
ids_test = pd.read_parquet(ids_test_path).copy()

clf = joblib.load(model_path)

# -----------------------------
# 3) Load feature names (robust to schema)
# -----------------------------
feat_df = pd.read_csv(feat_path)
# Prefer a known column name; otherwise fall back to the first column.
name_col = next(
    (c for c in ("feature", "feature_name") if c in feat_df.columns),
    feat_df.columns[0],
)
feature_names = feat_df[name_col].astype(str).tolist()

# Sanity checks (prevents silent mismatches)
nX = X_test_tx.shape[1]
if len(feature_names) != nX:
    raise ValueError(
        "Feature dimension mismatch:\n"
        f"  X_test features   : {nX}\n"
        f"  feature_names len : {len(feature_names)}\n"
        f"  feature file      : {feat_path}"
    )

n_model = getattr(clf, "n_features_in_", None)
if n_model is not None and n_model != nX:
    raise ValueError(
        "Model feature dimension mismatch:\n"
        f"  X_test features : {nX}\n"
        f"  model expects   : {n_model}\n"
        f"  model file      : {model_path}"
    )

# -----------------------------
# 4) Ensure key ID columns exist (traceability)
# -----------------------------
# We’ll treat these as “must-have” for dashboard exports.
required_id_cols = ["asset_id"]
missing = [c for c in required_id_cols if c not in ids_test.columns]
if missing:
    raise KeyError(
        "ids_test is missing required identifier columns:\n"
        f"  missing: {missing}\n"
        f"  columns: {ids_test.columns.tolist()}"
    )

# Timestamp column is expected for “when” rollups.
ts_col = None
for c in ["ts_hour_utc", "ts_hour", "timestamp_utc", "ts_utc"]:
    if c in ids_test.columns:
        ts_col = c
        break
if ts_col is None:
    raise KeyError(
        "Could not find a timestamp column in ids_test.\n"
        f"Available columns: {ids_test.columns.tolist()}"
    )

# Parse timestamp consistently as UTC tz-aware
ids_test[ts_col] = pd.to_datetime(ids_test[ts_col], utc=True, errors="coerce")
if ids_test[ts_col].isna().any():
    bad = int(ids_test[ts_col].isna().sum())
    raise ValueError(f"{ts_col} has {bad} NaT values after parsing; check ids_test parquet.")

# -----------------------------
# 5) Score the TEST set (hour-level)
# -----------------------------
# LogisticRegression supports predict_proba on sparse matrices.
p_hat = clf.predict_proba(X_test_tx)[:, 1].astype(float)

scores = ids_test.copy()
scores["y_true"] = y_test
scores["p_hat"] = p_hat
scores["ts_utc"] = scores[ts_col]  # canonical alias for downstream use
scores["date_utc"] = scores["ts_utc"].dt.date

# Quick sanity snapshot
print("Loaded:")
print(f"  PANEL_RUN_DIR : {PANEL_RUN_DIR}")
print(f"  X_test_tx     : {X_test_tx.shape} | sparse: {sp.issparse(X_test_tx)}")
print(f"  y_test        : {y_test.shape} | pos_rate: {float(y_test.mean()):.6f}")
print(f"  ids_test      : {ids_test.shape}")
print(f"  ts_col        : {ts_col}")
print("\nScored table snapshot:")
print(f"  rows          : {len(scores):,}")
print(f"  unique assets : {scores['asset_id'].nunique():,}")
print(f"  date range    : {scores['ts_utc'].min()} → {scores['ts_utc'].max()}")

# -----------------------------
# 6) Persist scored table (canonical downstream input)
# -----------------------------
out_parquet = Path(EXPORT_DIR) / "panel_test_scores_asset_hour.parquet"
scores.to_parquet(out_parquet, index=False)
print("\nSaved:", out_parquet)

# Optional: export a smaller CSV (keeps it business-friendly)
csv_cols = [c for c in ["asset_id", "ts_utc", "date_utc", "p_hat", "y_true", "site_id", "line_id", "asset_type", "is_legacy"] if c in scores.columns]
out_csv = Path(EXPORT_DIR) / "panel_test_scores_asset_hour.csv"
scores[csv_cols].to_csv(out_csv, index=False)
print("Saved:", out_csv)

# Show a small preview
display(scores.head(10))

Loaded:
  PANEL_RUN_DIR : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/feature_engineering/20251220T211621Z
  X_test_tx     : (8071, 76) | sparse: True
  y_test        : (8071,) | pos_rate: 0.075208
  ids_test      : (8071, 6)
  ts_col        : ts_hour_utc

Scored table snapshot:
  rows          : 8,071
  unique assets : 24
  date range    : 2025-11-27 00:00:00+00:00 → 2025-12-11 00:00:00+00:00

Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_test_scores_asset_hour.parquet
Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_test_scores_asset_hour.csv


Unnamed: 0,asset_id,ts_hour_utc,site_id,line_id,asset_type,is_legacy,y_true,p_hat,ts_utc,date_utc
0,A0001,2025-11-27 00:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.131399,2025-11-27 00:00:00+00:00,2025-11-27
1,A0001,2025-11-27 01:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.098103,2025-11-27 01:00:00+00:00,2025-11-27
2,A0001,2025-11-27 02:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.108845,2025-11-27 02:00:00+00:00,2025-11-27
3,A0001,2025-11-27 03:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.122167,2025-11-27 03:00:00+00:00,2025-11-27
4,A0001,2025-11-27 04:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.126038,2025-11-27 04:00:00+00:00,2025-11-27
5,A0001,2025-11-27 05:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.122089,2025-11-27 05:00:00+00:00,2025-11-27
6,A0001,2025-11-27 06:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.111483,2025-11-27 06:00:00+00:00,2025-11-27
7,A0001,2025-11-27 07:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.137937,2025-11-27 07:00:00+00:00,2025-11-27
8,A0001,2025-11-27 08:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.114635,2025-11-27 08:00:00+00:00,2025-11-27
9,A0001,2025-11-27 09:00:00+00:00,S1,S1-L2,blister_packer,False,0,0.108846,2025-11-27 09:00:00+00:00,2025-11-27


### What Cell 2 Just Did

This cell loaded the **panel model artifacts** from the selected feature-engineering run folder (`PANEL_RUN_DIR`) and produced a **traceable, hour-level risk scoring table** for the TEST split. Concretely, it loaded the sparse feature matrix `panel_X_test.npz` (shape **(8071, 76)**), the labels `panel_y_test.npy` (positive rate **0.075208**), the identifier table `panel_ids_test.parquet` (which includes `asset_id` and `ts_hour_utc`), the feature name list, and the trained baseline model `panel_baseline_logreg_saga.joblib`. It then computed **predicted probabilities** (`p_hat = P(target_future_incident=1)`) for every asset-hour in the TEST set, standardized the timestamp into a canonical `ts_utc` column, and added a daily bucket `date_utc` for downstream dashboard rollups.

**Key outputs created in this run (`EXPORT_DIR`):**
- `panel_test_scores_asset_hour.parquet` — the canonical scored TEST table (all columns, analysis-friendly)
- `panel_test_scores_asset_hour.csv` — a smaller, business-friendly view (IDs + `p_hat` + label)

**Sanity snapshot from the run output:**
- Rows scored: **8,071** (asset-hours)
- Unique test assets: **24**
- UTC date range covered: **2025-11-27** through **2025-12-11**
- The preview shows expected columns like: `asset_id`, `ts_hour_utc`/`ts_utc`, `site_id`, `line_id`, `asset_type`, `is_legacy`, `y_true`, and `p_hat`.

These scored tables are the primary inputs for the next cells, where we’ll create **dashboard-ready rollups** (e.g., top risky assets, risk by day/site/line) and enforce an **alerts budget** such as “top 5 assets/day.”
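
Because the scored table is built by attaching `p_hat` and `y_true` to `ids_test` by position, the three TEST artifacts need the same number of rows. A minimal sketch of an explicit row-count check that could run before scoring follows; `check_row_alignment` is a hypothetical helper, not part of the notebook, and it cannot verify that the rows are in the same order, only that the counts agree.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def check_row_alignment(X, y, ids: pd.DataFrame) -> int:
    """Raise if X, y, and ids disagree on row count; return the shared count."""
    n_x, n_y, n_ids = X.shape[0], len(y), len(ids)
    if not (n_x == n_y == n_ids):
        raise ValueError(f"Row count mismatch: X={n_x}, y={n_y}, ids={n_ids}")
    return n_x


# Toy stand-ins for panel_X_test.npz, panel_y_test.npy, panel_ids_test.parquet.
X_demo = sp.csr_matrix(np.eye(3))
y_demo = np.array([0, 1, 0], dtype="int8")
ids_demo = pd.DataFrame({"asset_id": ["A0001", "A0001", "A0002"]})

n_rows = check_row_alignment(X_demo, y_demo, ids_demo)

# A truncated label vector should be rejected with a readable message.
try:
    check_row_alignment(X_demo, y_demo[:2], ids_demo)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(n_rows, mismatch_detected)
```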


In [3]:
#============================================================
# Cell 3 — Dashboard rollups + alert budget (Top-5 assets/day on TEST)
#   Build business-friendly summaries from the hour-level scored table:
#     • Asset-day risk (max p_hat) + “when” (ts_peak) + asset-day truth label
#     • Top-K assets/day alerts (K=5) + precision@K (asset-day)
#     • Daily / Site-day / Line-day rollups for dashboard tiles
#   Persist all outputs into EXPORT_DIR and register them in DASHBOARD_EXPORTS.json
#============================================================

import json
import numpy as np
import pandas as pd
from pathlib import Path

# -----------------------------
# 0) Load the scored TEST table produced in Cell 2
# -----------------------------
scores_path = EXPORT_DIR / "panel_test_scores_asset_hour.parquet"
assert scores_path.exists(), f"Missing {scores_path} (run Cell 2 first)"

scores = pd.read_parquet(scores_path).copy()

required_cols = ["asset_id", "p_hat", "y_true"]
for c in required_cols:
    assert c in scores.columns, f"Expected column '{c}' in scores table"

# Timestamp column: prefer ts_utc (canonical), fallback to ts_hour_utc if needed
ts_col = "ts_utc" if "ts_utc" in scores.columns else ("ts_hour_utc" if "ts_hour_utc" in scores.columns else None)
assert ts_col is not None, f"Could not find timestamp column (ts_utc or ts_hour_utc). Columns: {scores.columns.tolist()}"

# Ensure tz-aware UTC timestamps
scores[ts_col] = pd.to_datetime(scores[ts_col], utc=True, errors="coerce")
assert scores[ts_col].notna().all(), f"{ts_col} has NaT after parsing; check upstream scoring."

# Ensure date bucket exists (UTC)
if "date_utc" not in scores.columns:
    scores["date_utc"] = scores[ts_col].dt.date

# Optional context columns (present in your ids tables)
context_cols = [c for c in ["site_id", "line_id", "asset_type", "is_legacy"] if c in scores.columns]

print("Loaded scored TEST table:")
print(" ", scores_path)
print("  rows         :", len(scores))
print("  unique assets:", scores["asset_id"].nunique())
print("  date range   :", scores[ts_col].min(), "→", scores[ts_col].max())
print("  pos rate (hour-level):", float(scores["y_true"].mean()))

# -----------------------------
# 1) Asset-day risk table (max p_hat per asset per day)
#    This gives a very business-friendly “top risky assets today” view.
# -----------------------------
# Keep a stable row index to track the “peak hour” row if we need it later
scores = scores.reset_index(drop=True)
scores["row_ix"] = np.arange(len(scores), dtype=int)

g_keys = ["date_utc", "asset_id"]

# Find, for each asset-day: the row with the highest p_hat (peak hour).
# idxmax keeps whole rows intact (groupby().first() takes the first non-null
# value per column, which can mix rows) and breaks ties by earliest row.
asset_day_peak = (
    scores.loc[scores.groupby(g_keys)["p_hat"].idxmax()]
          .rename(columns={ts_col: "ts_peak"})
          .reset_index(drop=True)
)

# Asset-day truth label: did the asset have ANY positive hour that day?
asset_day_truth = (
    scores.groupby(g_keys, as_index=False)["y_true"]
          .max()
          .rename(columns={"y_true": "y_asset_day"})
)

asset_day = asset_day_peak.merge(asset_day_truth, on=g_keys, how="left")

# Keep only the columns we want to expose on dashboards (plus useful context)
keep_cols = (
    ["date_utc", "asset_id", "ts_peak", "p_hat", "y_asset_day", "row_ix"]
    + context_cols
)
asset_day = asset_day[[c for c in keep_cols if c in asset_day.columns]].copy()

# -----------------------------
# 2) Enforce alerts budget: Top-5 assets/day (TEST)
#    This is “precision@K” in an operational form: we only alert on 5 assets/day.
# -----------------------------
K = 5

alerts = (
    asset_day.sort_values(["date_utc", "p_hat"], ascending=[True, False])
             .groupby("date_utc", as_index=False)
             .head(K)
             .reset_index(drop=True)
)

# Precision@K (asset-day): fraction of alerted asset-days with ≥1 positive hour
precision_at_k = float(alerts["y_asset_day"].mean()) if len(alerts) else np.nan
avg_assets_per_day = float(alerts.groupby("date_utc")["asset_id"].nunique().mean()) if len(alerts) else 0.0

print("\nAlerts budget (TEST):")
print(f"  K (assets/day)          : {K}")
print(f"  Avg assets/day selected : {avg_assets_per_day:.2f}")
print(f"  Precision@K (asset-day) : {precision_at_k:.3f}")

# -----------------------------
# 3) Dashboard rollups (simple aggregations)
#    These become easy KPI tiles / tables on a dashboard.
# -----------------------------
# Daily rollup: per-day volume, observed positives, and p_hat distribution summary
daily = (
    scores.groupby("date_utc", as_index=False)
          .agg(
              n_hours=("p_hat", "size"),
              n_pos_hours=("y_true", "sum"),
              pos_rate_hours=("y_true", "mean"),
              p_hat_mean=("p_hat", "mean"),
              p_hat_p95=("p_hat", lambda s: float(np.quantile(s, 0.95))),
              p_hat_max=("p_hat", "max"),
              n_assets=("asset_id", "nunique"),
          )
)

# Add operational alert metrics (Top-5 assets/day)
alerts_by_day = (
    alerts.groupby("date_utc", as_index=False)
          .agg(
              n_alert_assets=("asset_id", "nunique"),
              n_alert_asset_days_positive=("y_asset_day", "sum"),
              precision_at_k=("y_asset_day", "mean"),
              max_alert_p_hat=("p_hat", "max"),
          )
)
daily = daily.merge(alerts_by_day, on="date_utc", how="left")

# Site-day rollup
if "site_id" in scores.columns:
    site_day = (
        scores.groupby(["date_utc", "site_id"], as_index=False)
              .agg(
                  n_hours=("p_hat", "size"),
                  n_pos_hours=("y_true", "sum"),
                  pos_rate_hours=("y_true", "mean"),
                  p_hat_mean=("p_hat", "mean"),
                  p_hat_max=("p_hat", "max"),
                  n_assets=("asset_id", "nunique"),
              )
    )
else:
    site_day = pd.DataFrame()

# Line-day rollup
if "line_id" in scores.columns:
    line_day = (
        scores.groupby(["date_utc", "line_id"], as_index=False)
              .agg(
                  n_hours=("p_hat", "size"),
                  n_pos_hours=("y_true", "sum"),
                  pos_rate_hours=("y_true", "mean"),
                  p_hat_mean=("p_hat", "mean"),
                  p_hat_max=("p_hat", "max"),
                  n_assets=("asset_id", "nunique"),
              )
    )
else:
    line_day = pd.DataFrame()

# -----------------------------
# 4) Persist outputs + register in manifest
# -----------------------------
out_asset_day = EXPORT_DIR / "panel_asset_day_scores_test.csv"
out_alerts    = EXPORT_DIR / "panel_alerts_top5_assets_per_day_test.csv"
out_daily     = EXPORT_DIR / "panel_daily_rollup_test.csv"
out_site_day  = EXPORT_DIR / "panel_site_day_rollup_test.csv"
out_line_day  = EXPORT_DIR / "panel_line_day_rollup_test.csv"
out_budget    = EXPORT_DIR / "panel_alert_budget_top5_metrics_test.json"

asset_day.to_csv(out_asset_day, index=False)
alerts.to_csv(out_alerts, index=False)
daily.to_csv(out_daily, index=False)

print("\nSaved:")
print(" ", out_asset_day)
print(" ", out_alerts)
print(" ", out_daily)

if not site_day.empty:
    site_day.to_csv(out_site_day, index=False)
    print(" ", out_site_day)
if not line_day.empty:
    line_day.to_csv(out_line_day, index=False)
    print(" ", out_line_day)

budget_payload = {
    "policy": "top_k_assets_per_day",
    "k": int(K),
    "precision_at_k_asset_day_test": None if np.isnan(precision_at_k) else float(precision_at_k),
    "avg_assets_per_day_test": float(avg_assets_per_day),
    "note": "precision@K computed on asset-day label y_asset_day=max(y_true) over hours for that asset on that UTC date (TEST only).",
}
out_budget.write_text(json.dumps(budget_payload, indent=2))
print(" ", out_budget)

# Update DASHBOARD_EXPORTS.json (append-only style)
manifest_path = EXPORT_DIR / "DASHBOARD_EXPORTS.json"
manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
manifest.setdefault("exports", [])

new_exports = [
    out_asset_day.name,
    out_alerts.name,
    out_daily.name,
    out_budget.name,
]
if not site_day.empty:
    new_exports.append(out_site_day.name)
if not line_day.empty:
    new_exports.append(out_line_day.name)

# De-dup while preserving order
seen = set(manifest["exports"])
for f in new_exports:
    if f not in seen:
        manifest["exports"].append(f)
        seen.add(f)

manifest_path.write_text(json.dumps(manifest, indent=2))
print("\nUpdated manifest:", manifest_path)

# -----------------------------
# 5) Quick previews
# -----------------------------
print("\nTop alerts (first 15 rows):")
display(alerts.head(15))

print("\nDaily rollup (preview):")
display(daily.head(10))

Loaded scored TEST table:
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_test_scores_asset_hour.parquet
  rows         : 8071
  unique assets: 24
  date range   : 2025-11-27 00:00:00+00:00 → 2025-12-11 00:00:00+00:00
  pos rate (hour-level): 0.07520753314335274

Alerts budget (TEST):
  K (assets/day)          : 5
  Avg assets/day selected : 5.00
  Precision@K (asset-day) : 0.400

Saved:
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_asset_day_scores_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_alerts_top5_assets_per_day_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_daily_rollup_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_site_day_rollup_test.csv
  /home/paral

Unnamed: 0,date_utc,asset_id,ts_peak,p_hat,y_asset_day,row_ix,site_id,line_id,asset_type,is_legacy
0,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,S2,S2-L4,case_packer,True
1,2025-11-27,A0046,2025-11-27 18:00:00+00:00,0.731878,0,3383,S4,S4-L4,blister_packer,True
2,2025-11-27,A0056,2025-11-27 19:00:00+00:00,0.693739,0,4056,S3,S3-L1,cartoner,True
3,2025-11-27,A0063,2025-11-27 16:00:00+00:00,0.692411,0,4389,S3,S3-L1,sterilizer,True
4,2025-11-27,A0037,2025-11-27 22:00:00+00:00,0.682174,0,2378,S1,S1-L3,capper,True
5,2025-11-28,A0046,2025-11-28 11:00:00+00:00,0.999991,1,3400,S4,S4-L4,blister_packer,True
6,2025-11-28,A0056,2025-11-28 13:00:00+00:00,0.999987,1,4074,S3,S3-L1,cartoner,True
7,2025-11-28,A0110,2025-11-28 04:00:00+00:00,0.999983,1,7763,S4,S4-L3,capper,True
8,2025-11-28,A0019,2025-11-28 05:00:00+00:00,0.99998,1,1376,S3,S3-L5,labeler,True
9,2025-11-28,A0045,2025-11-28 14:00:00+00:00,0.999951,1,3067,S2,S2-L2,labeler,True



Daily rollup (preview):


Unnamed: 0,date_utc,n_hours,n_pos_hours,pos_rate_hours,p_hat_mean,p_hat_p95,p_hat_max,n_assets,n_alert_assets,n_alert_asset_days_positive,precision_at_k,max_alert_p_hat
0,2025-11-27,576,0,0.0,0.393552,0.700848,0.753193,24,5,0,0.0,0.753193
1,2025-11-28,576,78,0.135417,0.499999,0.797957,0.999991,24,5,5,1.0,0.999991
2,2025-11-29,576,81,0.140625,0.427053,0.734381,0.999956,24,5,3,0.6,0.999956
3,2025-11-30,576,50,0.086806,0.544726,0.82649,0.999972,24,5,2,0.4,0.999972
4,2025-12-01,576,7,0.012153,0.551168,0.833077,0.864172,24,5,0,0.0,0.864172
5,2025-12-02,576,9,0.015625,0.479401,0.78132,0.999972,24,5,1,0.2,0.999972
6,2025-12-03,576,41,0.071181,0.398783,0.708626,0.99998,24,5,3,0.6,0.99998
7,2025-12-04,576,46,0.079861,0.393049,0.712068,0.757414,24,5,1,0.2,0.757414
8,2025-12-05,576,13,0.022569,0.49991,0.799057,0.999978,24,5,1,0.2,0.999978
9,2025-12-06,576,11,0.019097,0.422288,0.737457,0.777275,24,5,0,0.0,0.777275


### What Cell 3 Just Did

This cell turns the hour-level panel test scores into **dashboard-ready rollups** and an **operational alert list** that respects our budget of **Top 5 assets per day**.

**Inputs used**
- Loaded the scored test set from Cell 2:  
  `panel_test_scores_asset_hour.parquet` (8,071 rows, 24 assets)  
- Confirmed the scoring time window covers **2025-11-27 → 2025-12-11 (UTC)** and the hour-level positive rate is **~0.0752**.

**Outputs produced**
1. **Asset-day scoring table (`panel_asset_day_scores_test.csv`)**
   - Aggregated to one row per `(date_utc, asset_id)`
   - Uses **max p_hat** across that day’s hours as the asset-day “risk”
   - Captures **when** risk peaks via `ts_peak` (the hour with max p_hat)
   - Adds an “asset-day truth” label `y_asset_day` = 1 if the asset had **any** positive hour that day

2. **Alerts table with strict budget (`panel_alerts_top5_assets_per_day_test.csv`)**
   - Enforces **K = 5 assets/day**
   - For each day, selects the **top 5 assets** by asset-day max probability (`p_hat`)
   - Computes **Precision@K (asset-day)** = fraction of alerted asset-days that had ≥1 positive hour

3. **Dashboard KPI rollups**
   - `panel_daily_rollup_test.csv`: day-level metrics (volume, pos rate, mean/p95/max p_hat, unique assets) plus the alert-budget performance per day
   - `panel_site_day_rollup_test.csv`: same style rollup grouped by `(date_utc, site_id)`
   - `panel_line_day_rollup_test.csv`: same style rollup grouped by `(date_utc, line_id)`

4. **Budget policy metrics (`panel_alert_budget_top5_metrics_test.json`)**
   - Records our alert policy (Top-K assets/day) and performance summary

5. **Manifest update**
   - Appended the new export filenames into:  
     `DASHBOARD_EXPORTS.json`  
   so downstream steps can discover and publish the latest export set automatically.

**Key results from this run**
- Alerts budget: **K = 5 assets/day**
- Average assets selected per day: **5.00**
- **Precision@K (asset-day) = 0.400**, meaning **~40% of the alerted asset-days** had at least one truly positive hour in the test labels.

**Preview sanity check**
- The “Top alerts” preview shows the expected daily Top-5 structure with:
  `date_utc, asset_id, ts_peak, p_hat, y_asset_day` plus context (`site_id`, `line_id`, `asset_type`, `is_legacy`).
- The daily rollup preview confirms day-level aggregation columns are populated and aligns with the same test window.
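
The rollup and budget logic summarized above can be sketched on a toy frame. This is a minimal illustration, not the Cell 3 code itself: the hour-level column names `ts` and `y` are assumptions, the values are made up, and `K = 2` is used only to keep the example small (the notebook uses K = 5).

```python
import pandas as pd

# Toy hour-level scores (hypothetical values; `ts` and `y` are assumed column names)
scores = pd.DataFrame({
    "date_utc": ["2025-11-27"] * 6,
    "asset_id": ["A1", "A1", "A2", "A2", "A3", "A3"],
    "ts": pd.to_datetime([
        "2025-11-27 01:00", "2025-11-27 02:00",
        "2025-11-27 01:00", "2025-11-27 02:00",
        "2025-11-27 01:00", "2025-11-27 02:00",
    ], utc=True),
    "p_hat": [0.10, 0.80, 0.30, 0.20, 0.05, 0.60],
    "y": [0, 1, 0, 0, 0, 0],
})

K = 2  # toy budget for illustration only

keys = ["date_utc", "asset_id"]

# The row holding the max p_hat per (date, asset) gives both the risk and its peak hour
peak_ix = scores.groupby(keys)["p_hat"].idxmax()
asset_day = scores.loc[peak_ix, keys + ["ts", "p_hat"]].rename(columns={"ts": "ts_peak"})

# Asset-day truth: 1 if any hour that day was positive
y_day = scores.groupby(keys)["y"].max().rename("y_asset_day").reset_index()
asset_day = asset_day.merge(y_day, on=keys)

# Top-K assets per day by asset-day risk
alerts = (
    asset_day.sort_values(["date_utc", "p_hat"], ascending=[True, False])
    .groupby("date_utc")
    .head(K)
    .reset_index(drop=True)
)

# Precision@K (asset-day): share of alerted asset-days with at least one positive hour
precision_at_k = alerts["y_asset_day"].mean()
print(alerts)
print("precision@K:", precision_at_k)  # 0.5 here: one of the two alerted asset-days is positive
```

Taking the full peak row via `idxmax` keeps `p_hat` and `ts_peak` consistent with each other, which matters because the "why" layer later explains exactly that hour.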


In [5]:
#============================================================
# Cell 4 — Alert “WHY” layer: top drivers per alerted asset-day (TEST)
#   Uses the Top-5 assets/day alerts from Cell 3 and explains each alert
#   by extracting top coefficient contributions from the sparse design row
#   Robust to DASHBOARD_EXPORTS.json schema (exports may be a list OR dict)
#============================================================

import json
import numpy as np
import pandas as pd
import scipy.sparse as sp
import joblib

# -----------------------------
# 0) Locate required inputs (from prior cells)
# -----------------------------
# Expect these to exist from Cell 1:
#   PANEL_RUN_DIR, EXPORT_DIR
alerts_path = EXPORT_DIR / "panel_alerts_top5_assets_per_day_test.csv"
X_test_path = PANEL_RUN_DIR / "panel_X_test.npz"
feat_path   = PANEL_RUN_DIR / "panel_feature_names.csv"
model_path  = PANEL_RUN_DIR / "panel_baseline_logreg_saga.joblib"

for p in [alerts_path, X_test_path, feat_path, model_path]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required file: {p}")

alerts = pd.read_csv(alerts_path)
X_test_tx = sp.load_npz(X_test_path)
clf = joblib.load(model_path)

# Force CSR for fast row slicing
if not sp.issparse(X_test_tx):
    X_test_tx = sp.csr_matrix(X_test_tx)
else:
    X_test_tx = X_test_tx.tocsr()

# -----------------------------
# 1) Load feature names robustly (column schema may vary)
# -----------------------------
feat_df = pd.read_csv(feat_path)

if "feature_name" in feat_df.columns:
    feature_names = feat_df["feature_name"].astype(str).tolist()
elif "feature" in feat_df.columns:
    feature_names = feat_df["feature"].astype(str).tolist()
else:
    # Fallback: assume the first column holds the feature names
    feature_names = feat_df.iloc[:, 0].astype(str).tolist()

# Sanity check: features align with matrix columns
if X_test_tx.shape[1] != len(feature_names):
    raise ValueError(
        "Feature dimension mismatch:\n"
        f"  X_test_tx.shape[1] = {X_test_tx.shape[1]}\n"
        f"  len(feature_names) = {len(feature_names)}\n"
        f"  feature_names file = {feat_path}"
    )

# Model coefficient vector
coefs = clf.coef_.ravel()
if len(coefs) != X_test_tx.shape[1]:
    raise ValueError(
        "Model coefficient dimension mismatch:\n"
        f"  len(coefs)          = {len(coefs)}\n"
        f"  X_test_tx.shape[1]  = {X_test_tx.shape[1]}\n"
        f"  model file          = {model_path}"
    )

# -----------------------------
# 2) Ensure required columns exist in alerts
# -----------------------------
required_cols = ["date_utc", "asset_id", "ts_peak", "p_hat", "y_asset_day", "row_ix"]
missing = [c for c in required_cols if c not in alerts.columns]
if missing:
    raise KeyError(
        "Alerts file is missing required columns.\n"
        f"  Missing: {missing}\n"
        f"  Found: {alerts.columns.tolist()}\n"
        f"  File: {alerts_path}"
    )

alerts["ts_peak"] = pd.to_datetime(alerts["ts_peak"], utc=True, errors="coerce")
if alerts["ts_peak"].isna().any():
    raise ValueError("alerts['ts_peak'] contains NaT after parsing; check the alerts export.")

# -----------------------------
# 3) Per-alert driver extraction (top contributions)
#    contribution_i = x_i * coef_i  (in transformed feature space)
# -----------------------------
def top_contribs_for_row(row_ix: int, top_n: int = 10) -> pd.DataFrame:
    row = X_test_tx.getrow(int(row_ix))  # 1 x n_features CSR
    if row.nnz == 0:
        return pd.DataFrame(columns=["feature", "x", "coef", "contrib", "direction"])

    idx = row.indices
    x = row.data
    c = coefs[idx]
    contrib = x * c

    dfc = pd.DataFrame({
        "feature": [feature_names[i] for i in idx],
        "x": x,
        "coef": c,
        "contrib": contrib,
    })
    dfc["abs_contrib"] = np.abs(dfc["contrib"])
    dfc["direction"] = np.where(dfc["contrib"] >= 0, "push_to_1", "push_to_0")

    return dfc.sort_values("abs_contrib", ascending=False).drop(columns=["abs_contrib"]).head(top_n)

TOP_N = 10
drivers_long = []

for _, r in alerts.iterrows():
    row_ix = int(r["row_ix"])
    d = top_contribs_for_row(row_ix=row_ix, top_n=TOP_N).copy()

    d.insert(0, "date_utc", r["date_utc"])
    d.insert(1, "asset_id", r["asset_id"])
    d.insert(2, "ts_peak", r["ts_peak"])
    d.insert(3, "p_hat_peak", float(r["p_hat"]))
    d.insert(4, "y_asset_day", int(r["y_asset_day"]))
    d.insert(5, "row_ix", row_ix)

    drivers_long.append(d)

drivers_long_df = pd.concat(drivers_long, ignore_index=True) if drivers_long else pd.DataFrame()

# -----------------------------
# 4) Compact “why” strings per alert row (dashboard friendly)
# -----------------------------
def _summarize_why(dsub: pd.DataFrame, n_each: int = 3) -> dict:
    if dsub.empty:
        return {"why_push_to_1": "", "why_push_to_0": ""}

    pos = dsub[dsub["contrib"] > 0].copy().sort_values("contrib", ascending=False).head(n_each)
    neg = dsub[dsub["contrib"] < 0].copy().sort_values("contrib", ascending=True).head(n_each)

    def fmt(df):
        return "; ".join([f"{f}({v:+.3f})" for f, v in zip(df["feature"], df["contrib"])])

    return {"why_push_to_1": fmt(pos), "why_push_to_0": fmt(neg)}

why_rows = []
if not drivers_long_df.empty:
    for (date_utc, asset_id, ts_peak, row_ix), dsub in drivers_long_df.groupby(
        ["date_utc", "asset_id", "ts_peak", "row_ix"], as_index=False
    ):
        s = _summarize_why(dsub, n_each=3)
        why_rows.append({
            "date_utc": date_utc,
            "asset_id": asset_id,
            "ts_peak": ts_peak,
            "row_ix": row_ix,
            **s
        })

why_df = pd.DataFrame(why_rows)
alerts_why = alerts.merge(why_df, on=["date_utc", "asset_id", "ts_peak", "row_ix"], how="left")

# -----------------------------
# 5) Persist outputs + update manifest (robust to exports being list or dict)
# -----------------------------
drivers_long_out = EXPORT_DIR / "panel_alerts_top5_assets_per_day_test_drivers_long.csv"
alerts_why_out   = EXPORT_DIR / "panel_alerts_top5_assets_per_day_test_with_why.csv"

drivers_long_df.to_csv(drivers_long_out, index=False)
alerts_why.to_csv(alerts_why_out, index=False)

print("Saved:")
print(" ", drivers_long_out)
print(" ", alerts_why_out)

def _upsert_export(manifest_obj: dict, key: str, path: str) -> dict:
    """
    Supports either:
      - manifest["exports"] as dict  -> set exports[key] = path
      - manifest["exports"] as list  -> append/replace {"name": key, "path": path}
    """
    exports = manifest_obj.get("exports", None)

    # If missing, default to dict (simple, stable)
    if exports is None:
        manifest_obj["exports"] = {key: path}
        return manifest_obj

    # If dict, write directly
    if isinstance(exports, dict):
        exports[key] = path
        manifest_obj["exports"] = exports
        return manifest_obj

    # If list, store as list-of-records
    if isinstance(exports, list):
        replaced = False
        new_list = []
        for item in exports:
            if isinstance(item, dict) and item.get("name") == key:
                new_list.append({"name": key, "path": path})
                replaced = True
            else:
                new_list.append(item)
        if not replaced:
            new_list.append({"name": key, "path": path})
        manifest_obj["exports"] = new_list
        return manifest_obj

    # Otherwise (unexpected type), coerce to dict
    manifest_obj["exports"] = {key: path}
    return manifest_obj

manifest_path = EXPORT_DIR / "DASHBOARD_EXPORTS.json"
manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}

manifest = _upsert_export(manifest, "panel_alerts_top5_assets_per_day_test_drivers_long_csv", str(drivers_long_out))
manifest = _upsert_export(manifest, "panel_alerts_top5_assets_per_day_test_with_why_csv", str(alerts_why_out))

manifest_path.write_text(json.dumps(manifest, indent=2))
print("Updated manifest:", manifest_path)

# -----------------------------
# 6) Quick preview
# -----------------------------
print("\nAlerts + WHY (preview):")
display(alerts_why.head(15))

print("\nDrivers (long form, preview):")
display(drivers_long_df.head(30))


Saved:
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_alerts_top5_assets_per_day_test_drivers_long.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/panel_alerts_top5_assets_per_day_test_with_why.csv
Updated manifest: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json

Alerts + WHY (preview):


Unnamed: 0,date_utc,asset_id,ts_peak,p_hat,y_asset_day,row_ix,site_id,line_id,asset_type,is_legacy,why_push_to_1,why_push_to_0
0,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,S2,S2-L4,case_packer,True,is_legacy(+0.631); asset_type_case_packer(+0.5...,dow_utc_cos(-0.508); site_id_S2(-0.212); dow_u...
1,2025-11-27,A0046,2025-11-27 18:00:00+00:00,0.731878,0,3383,S4,S4-L4,blister_packer,True,is_legacy(+0.631); line_id_S4-L4(+0.588); is_w...,dow_utc_cos(-0.508); vibration_mm_s_tele_mean(...
2,2025-11-27,A0056,2025-11-27 19:00:00+00:00,0.693739,0,4056,S3,S3-L1,cartoner,True,is_legacy(+0.631); humidity_rh_tele_mean(+0.47...,dow_utc_cos(-0.508); line_speed_u_min_tele_mea...
3,2025-11-27,A0063,2025-11-27 16:00:00+00:00,0.692411,0,4389,S3,S3-L1,sterilizer,True,is_legacy(+0.631); humidity_rh_tele_max(+0.360...,dow_utc_cos(-0.508); humidity_rh_tele_mean(-0....
4,2025-11-27,A0037,2025-11-27 22:00:00+00:00,0.682174,0,2378,S1,S1-L3,capper,True,is_legacy(+0.631); asset_type_capper(+0.399); ...,dow_utc_cos(-0.508); site_id_S1(-0.332); humid...
5,2025-11-28,A0046,2025-11-28 11:00:00+00:00,0.999991,1,3400,S4,S4-L4,blister_packer,True,inc_count(+10.214); is_legacy(+0.631); line_id...,humidity_rh_tele_mean(-0.810); dow_utc_cos(-0....
6,2025-11-28,A0056,2025-11-28 13:00:00+00:00,0.999987,1,4074,S3,S3-L1,cartoner,True,inc_count(+10.214); is_legacy(+0.631); is_week...,dow_utc_cos(-0.508); site_id_S3(-0.301); temp_...
7,2025-11-28,A0110,2025-11-28 04:00:00+00:00,0.999983,1,7763,S4,S4-L3,capper,True,inc_count(+10.214); is_legacy(+0.631); humidit...,dow_utc_cos(-0.508); vibration_mm_s_tele_mean(...
8,2025-11-28,A0019,2025-11-28 05:00:00+00:00,0.99998,1,1376,S3,S3-L5,labeler,True,inc_count(+10.214); is_legacy(+0.631); line_id...,dow_utc_cos(-0.508); asset_type_labeler(-0.409...
9,2025-11-28,A0045,2025-11-28 14:00:00+00:00,0.999951,1,3067,S2,S2-L2,labeler,True,inc_count(+10.214); is_legacy(+0.631); is_week...,dow_utc_cos(-0.508); asset_type_labeler(-0.409...



Drivers (long form, preview):


Unnamed: 0,date_utc,asset_id,ts_peak,p_hat_peak,y_asset_day,row_ix,feature,x,coef,contrib,direction
0,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,is_legacy,0.960621,0.656359,0.630512,push_to_1
1,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,asset_type_case_packer,1.0,0.531498,0.531498,push_to_1
2,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,dow_utc_cos,-1.271777,0.399179,-0.507667,push_to_0
3,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,line_id_S2-L4,1.0,0.363836,0.363836,push_to_1
4,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,is_weekend_utc,-0.631791,-0.496756,0.313846,push_to_1
5,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,site_id_S2,1.0,-0.212079,-0.212079,push_to_0
6,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,dow_utc_sin,0.612969,-0.267724,-0.164107,push_to_0
7,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,telemetry_rows_hour,-1.202494,-0.110827,0.133269,push_to_1
8,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,humidity_rh_tele_max,-0.655394,-0.171087,0.11213,push_to_1
9,2025-11-27,A0105,2025-11-27 20:00:00+00:00,0.753193,0,7082,humidity_rh_tele_min,0.584919,-0.138792,-0.081182,push_to_0


### What Cell 4 Just Did — “Why” Layer for Top-5 Alerts/Day (Explainability)

This cell adds an **actionability layer** on top of the **Top-5 assets/day alert budget** produced in Cell 3 by generating a human-readable *why* for each alert.

**Inputs used**
- The daily alert list from the scoring run: `panel_alerts_top5_assets_per_day_test.csv`
- The corresponding TEST design matrix and model artifacts from the selected panel run:
  - `panel_X_test.npz` (sparse transformed features)
  - `panel_feature_names.csv` (feature name mapping)
  - `panel_baseline_logreg_saga.joblib` (trained logistic regression)

**What it computed**
- For every alerted **asset-day**, it took the **peak-risk hour** (`ts_peak`) and used the stored `row_ix` (the row index of that hour in the sparse `X_test` matrix) to compute **feature contribution values** in the transformed feature space:

  \[
  \text{contrib}_i = x_i \times \beta_i
  \]

  Where:
  - \(x_i\) is the transformed feature value for that hour
  - \(\beta_i\) is the model coefficient for that feature

- It then produced two explainability views:
  1. **Long-form drivers**: top per-row contributions with sign/direction (`push_to_1` vs `push_to_0`)
  2. **Compact “why” strings** per alert row:
     - `why_push_to_1`: the strongest contributors pushing risk upward
     - `why_push_to_0`: the strongest contributors pushing risk downward
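
Because the model is a logistic regression, these contributions are in log-odds units: summing every contribution and adding the intercept gives the logit, and the sigmoid of the logit gives the probability. A minimal sketch with made-up numbers follows. The row, coefficients, and intercept are hypothetical, not values from this run, and it assumes `p_hat` comes directly from the model's uncalibrated probability output.

```python
import numpy as np
from scipy import sparse as sp

# Hypothetical 1 x 4 transformed feature row and model parameters
row = sp.csr_matrix(np.array([[0.0, 1.0, -1.5, 0.5]]))
coefs = np.array([0.3, 0.6, 0.4, -0.2])
intercept = -0.1

# Same per-feature computation as the driver extraction: x_i * beta_i over stored entries
idx = row.indices
contrib = row.data * coefs[idx]

# Contributions plus the intercept reconstruct the logit (log-odds)
logit = intercept + contrib.sum()
p_hat = 1.0 / (1.0 + np.exp(-logit))

# Cross-check against the plain matrix-vector product
logit_check = (row @ coefs)[0] + intercept

print(dict(zip(idx.tolist(), contrib.round(3).tolist())))
print("logit:", round(float(logit), 4), "p_hat:", round(float(p_hat), 4))
```

Since the exported top-N lists leave out the intercept and the smaller terms, the listed contributions for an alert will generally not sum exactly to that alert's logit.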

**What the preview shows**
- The alert table now includes “why” text. For example:
  - Many high-risk alerts include `inc_count(+10.214)` as the dominant push-to-1 driver. A contribution of that size is far larger than any other driver shown in the preview, which is consistent with the model treating recent incident frequency as a strong signal of elevated risk.
  - Some alerts are also influenced by static/context features like `is_legacy(+0.631)` or line/site indicators (e.g., `line_id_S4-L4(+0.588)`), while time features like `dow_utc_cos(-0.508)` can push predictions down depending on the hour/day position.

**Files created**
- `panel_alerts_top5_assets_per_day_test_drivers_long.csv`  
  Long-form driver table: one row per `(alert × feature)` with `x`, `coef`, `contrib`, and direction.
- `panel_alerts_top5_assets_per_day_test_with_why.csv`  
  Business-friendly alert table with compact `why_push_to_1` / `why_push_to_0` summaries.

**Provenance**
- `DASHBOARD_EXPORTS.json` was updated successfully to register these new export artifacts, and the update is robust to the manifest’s `exports` structure (list or dict).


In [6]:
#============================================================
# Cell 5 — Dashboard bundle exports (small + app-friendly):
#   Build lightweight KPI JSON + “latest day” alert queue + top sites/lines
#   Update DASHBOARD_EXPORTS.json with the new bundle artifacts
#============================================================

import json
from pathlib import Path

import numpy as np
import pandas as pd

# -----------------------------
# 0) Locate the active EXPORT_DIR + manifest (created in Cell 1)
# -----------------------------
if "EXPORT_DIR" not in globals():
    raise NameError("EXPORT_DIR is not defined. Please run Cell 1 first.")

EXPORT_DIR = Path(EXPORT_DIR)
manifest_path = EXPORT_DIR / "DASHBOARD_EXPORTS.json"
if not manifest_path.exists():
    raise FileNotFoundError(f"Missing manifest: {manifest_path}")

manifest = json.loads(manifest_path.read_text())

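# Ensure manifest["exports"] is a dict (older cells may have initialized it differently)
exports_obj = manifest.get("exports", {})
if isinstance(exports_obj, list):
    # Convert list -> dict (best-effort). Handles both plain path strings and the
    # {"name": ..., "path": ...} records that Cell 4's _upsert_export writes.
    converted = {}
    for item in exports_obj:
        if isinstance(item, str):
            converted[Path(item).name] = item
        elif isinstance(item, dict) and "name" in item and "path" in item:
            converted[str(item["name"])] = item["path"]
    exports_obj = converted
elif not isinstance(exports_obj, dict):
    exports_obj = {}
manifest["exports"] = exports_obj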

# -----------------------------
# 1) Inputs produced by earlier cells in this notebook
# -----------------------------
alerts_with_why_path = EXPORT_DIR / "panel_alerts_top5_assets_per_day_test_with_why.csv"
daily_rollup_path    = EXPORT_DIR / "panel_daily_rollup_test.csv"
site_day_rollup_path = EXPORT_DIR / "panel_site_day_rollup_test.csv"
line_day_rollup_path = EXPORT_DIR / "panel_line_day_rollup_test.csv"
budget_metrics_path  = EXPORT_DIR / "panel_alert_budget_top5_metrics_test.json"

for p in [alerts_with_why_path, daily_rollup_path, site_day_rollup_path, line_day_rollup_path, budget_metrics_path]:
    if not p.exists():
        raise FileNotFoundError(f"Missing expected input from earlier cells: {p}")

alerts = pd.read_csv(alerts_with_why_path)
daily  = pd.read_csv(daily_rollup_path)
site_d = pd.read_csv(site_day_rollup_path)
line_d = pd.read_csv(line_day_rollup_path)
budget_metrics = json.loads(budget_metrics_path.read_text())

# Normalize dates/timestamps (robust)
if "date_utc" in alerts.columns:
    alerts["date_utc"] = pd.to_datetime(alerts["date_utc"], errors="coerce").dt.date
if "ts_peak" in alerts.columns:
    alerts["ts_peak"] = pd.to_datetime(alerts["ts_peak"], utc=True, errors="coerce")

if "date_utc" in daily.columns:
    daily["date_utc"] = pd.to_datetime(daily["date_utc"], errors="coerce").dt.date
if "date_utc" in site_d.columns:
    site_d["date_utc"] = pd.to_datetime(site_d["date_utc"], errors="coerce").dt.date
if "date_utc" in line_d.columns:
    line_d["date_utc"] = pd.to_datetime(line_d["date_utc"], errors="coerce").dt.date

# -----------------------------
# 2) Determine “latest day” in TEST exports and create app-friendly slices
# -----------------------------
if alerts.empty:
    raise ValueError("alerts_with_why is empty; cannot build dashboard bundle.")

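valid_days = alerts["date_utc"].dropna()
if valid_days.empty:
    raise ValueError("alerts_with_why has no parseable date_utc values; cannot determine the latest day.")
latest_day = valid_days.max()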
alerts_latest = alerts.loc[alerts["date_utc"] == latest_day].copy()

# Keep UI-friendly columns and order by risk (the Top-5 assets/day budget was already applied in Cell 3)
alert_cols = [
    "date_utc", "asset_id", "ts_peak", "p_hat", "y_asset_day",
    "site_id", "line_id", "asset_type", "is_legacy",
    "why_push_to_1", "why_push_to_0",
]
alert_cols = [c for c in alert_cols if c in alerts_latest.columns]
alerts_latest = alerts_latest[alert_cols].sort_values("p_hat", ascending=False).reset_index(drop=True)

# Top sites/lines for latest day (choose a stable “risk” sort key if present)
def _topn(df: pd.DataFrame, group_cols: list, n: int = 10) -> pd.DataFrame:
    if df.empty:
        return df
    # prefer max risk, else p95, else mean
    sort_key = None
    for k in ["p_hat_max", "p_hat_p95", "p_hat_mean"]:
        if k in df.columns:
            sort_key = k
            break
    if sort_key is None:
        sort_key = df.columns[-1]  # last resort
    out = df.sort_values(sort_key, ascending=False).head(n).reset_index(drop=True)
    return out

site_latest = site_d.loc[site_d["date_utc"] == latest_day].copy()
line_latest = line_d.loc[line_d["date_utc"] == latest_day].copy()

top_sites_latest = _topn(site_latest, group_cols=["site_id"], n=10)
top_lines_latest = _topn(line_latest, group_cols=["line_id"], n=10)

# Trend window (last 14 days available in TEST range)
daily_sorted = daily.dropna(subset=["date_utc"]).sort_values("date_utc").copy()
trend_tail = daily_sorted.tail(14).copy()

# -----------------------------
# 3) KPI summary (small JSON) for the dashboard header
# -----------------------------
date_min = daily_sorted["date_utc"].min()
date_max = daily_sorted["date_utc"].max()

kpis = {
    "export_run_id": manifest.get("export_run_id"),
    "panel_run_dir": manifest.get("panel_run_dir"),
    "scope": "TEST",
    "date_min_utc": str(date_min) if pd.notna(date_min) else None,
    "date_max_utc": str(date_max) if pd.notna(date_max) else None,
    "n_days": int(daily_sorted["date_utc"].nunique()) if not daily_sorted.empty else 0,
    # NOTE: computed from the alerts table, so this counts assets that were alerted
    # at least once, which can be fewer than the assets present in the TEST panel.
    "n_assets_test": int(alerts["asset_id"].nunique()) if "asset_id" in alerts.columns else None,
    "alerts_budget": {
        "policy": budget_metrics.get("policy", "top_k_assets_per_day"),
        "k_assets_per_day": budget_metrics.get("k"),
        "precision_at_k_asset_day": budget_metrics.get("precision_at_k_asset_day"),
        "note": budget_metrics.get("note"),
    },
}

# -----------------------------
# 4) Save “dashboard bundle” artifacts
# -----------------------------
alerts_latest_out = EXPORT_DIR / "dashboard_alert_queue_latest_day_test.csv"
top_sites_out     = EXPORT_DIR / "dashboard_top_sites_latest_day_test.csv"
top_lines_out     = EXPORT_DIR / "dashboard_top_lines_latest_day_test.csv"
trend_out         = EXPORT_DIR / "dashboard_daily_trend_last14_test.csv"
kpis_out          = EXPORT_DIR / "dashboard_kpis_test.json"

alerts_latest.to_csv(alerts_latest_out, index=False)
top_sites_latest.to_csv(top_sites_out, index=False)
top_lines_latest.to_csv(top_lines_out, index=False)
trend_tail.to_csv(trend_out, index=False)
kpis_out.write_text(json.dumps(kpis, indent=2))

# Update manifest with these paths
manifest["exports"]["dashboard_alert_queue_latest_day_test_csv"] = str(alerts_latest_out)
manifest["exports"]["dashboard_top_sites_latest_day_test_csv"]   = str(top_sites_out)
manifest["exports"]["dashboard_top_lines_latest_day_test_csv"]   = str(top_lines_out)
manifest["exports"]["dashboard_daily_trend_last14_test_csv"]     = str(trend_out)
manifest["exports"]["dashboard_kpis_test_json"]                  = str(kpis_out)

manifest_path.write_text(json.dumps(manifest, indent=2))

print("Latest day (UTC):", latest_day)
print("\nSaved dashboard bundle:")
print(" ", alerts_latest_out)
print(" ", top_sites_out)
print(" ", top_lines_out)
print(" ", trend_out)
print(" ", kpis_out)
print("\nUpdated manifest:", manifest_path)

print("\nAlert queue (latest day) preview:")
display(alerts_latest)

print("\nTop sites (latest day) preview:")
display(top_sites_latest.head(10))

print("\nTop lines (latest day) preview:")
display(top_lines_latest.head(10))

print("\nDaily trend (last 14 days) preview:")
display(trend_tail)

Latest day (UTC): 2025-12-11

Saved dashboard bundle:
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_alert_queue_latest_day_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_top_sites_latest_day_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_top_lines_latest_day_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_daily_trend_last14_test.csv
  /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_kpis_test.json

Updated manifest: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json

Alert queue (latest day) preview:


Unnamed: 0,date_utc,asset_id,ts_peak,p_hat,y_asset_day,site_id,line_id,asset_type,is_legacy,why_push_to_1,why_push_to_0
0,2025-12-11,A0011,2025-12-11 00:00:00+00:00,0.29737,0,S1,S1-L3,vision_inspection,False,is_weekend_utc(+0.314); asset_type_vision_insp...,is_legacy(-0.683); dow_utc_cos(-0.508); site_i...
1,2025-12-11,A0032,2025-12-11 00:00:00+00:00,0.275497,0,S4,S4-L3,vision_inspection,False,line_speed_u_min_tele_mean(+0.452); is_weekend...,is_legacy(-0.683); dow_utc_cos(-0.508); line_i...
2,2025-12-11,A0065,2025-12-11 00:00:00+00:00,0.251767,0,S3,S3-L5,labeler,False,line_id_S3-L5(+0.553); is_weekend_utc(+0.314);...,is_legacy(-0.683); dow_utc_cos(-0.508); asset_...
3,2025-12-11,A0108,2025-12-11 00:00:00+00:00,0.250631,0,S1,S1-L4,cartoner,False,is_weekend_utc(+0.314); asset_type_cartoner(+0...,is_legacy(-0.683); dow_utc_cos(-0.508); site_i...
4,2025-12-11,A0012,2025-12-11 00:00:00+00:00,0.16361,0,S4,S4-L3,weigh_check,False,is_weekend_utc(+0.314); telemetry_rows_hour(+0...,is_legacy(-0.683); dow_utc_cos(-0.508); asset_...



Top sites (latest day) preview:


Unnamed: 0,date_utc,site_id,n_hours,n_pos_hours,pos_rate_hours,p_hat_mean,p_hat_max,n_assets
0,2025-12-11,S1,4,0,0.0,0.190104,0.29737,4
1,2025-12-11,S4,2,0,0.0,0.219553,0.275497,2
2,2025-12-11,S3,1,0,0.0,0.251767,0.251767,1



Top lines (latest day) preview:


Unnamed: 0,date_utc,line_id,n_hours,n_pos_hours,pos_rate_hours,p_hat_mean,p_hat_max,n_assets
0,2025-12-11,S1-L3,1,0,0.0,0.29737,0.29737,1
1,2025-12-11,S4-L3,2,0,0.0,0.219553,0.275497,2
2,2025-12-11,S3-L5,1,0,0.0,0.251767,0.251767,1
3,2025-12-11,S1-L4,1,0,0.0,0.250631,0.250631,1
4,2025-12-11,S1-L2,2,0,0.0,0.106208,0.132054,2



Daily trend (last 14 days) preview:


Unnamed: 0,date_utc,n_hours,n_pos_hours,pos_rate_hours,p_hat_mean,p_hat_p95,p_hat_max,n_assets,n_alert_assets,n_alert_asset_days_positive,precision_at_k,max_alert_p_hat
1,2025-11-28,576,78,0.135417,0.499999,0.797957,0.999991,24,5,5,1.0,0.999991
2,2025-11-29,576,81,0.140625,0.427053,0.734381,0.999956,24,5,3,0.6,0.999956
3,2025-11-30,576,50,0.086806,0.544726,0.82649,0.999972,24,5,2,0.4,0.999972
4,2025-12-01,576,7,0.012153,0.551168,0.833077,0.864172,24,5,0,0.0,0.864172
5,2025-12-02,576,9,0.015625,0.479401,0.78132,0.999972,24,5,1,0.2,0.999972
6,2025-12-03,576,41,0.071181,0.398783,0.708626,0.99998,24,5,3,0.6,0.99998
7,2025-12-04,576,46,0.079861,0.393049,0.712068,0.757414,24,5,1,0.2,0.757414
8,2025-12-05,576,13,0.022569,0.49991,0.799057,0.999978,24,5,1,0.2,0.999978
9,2025-12-06,576,11,0.019097,0.422288,0.737457,0.777275,24,5,0,0.0,0.777275
10,2025-12-07,576,87,0.151042,0.547525,0.83096,0.999993,24,5,5,1.0,0.999993


### What Cell 5 Just Did — Dashboard bundle exports (small + app-friendly)

This cell takes the **actionability outputs from Cells 3–4** and packages them into a **lightweight “dashboard bundle”** that’s easy for an API/UI to consume without loading large parquet tables.

**Inputs used (from this export run directory):**
- `panel_alerts_top5_assets_per_day_test_with_why.csv` (Top-5 assets/day with “why” strings)
- `panel_daily_rollup_test.csv`, `panel_site_day_rollup_test.csv`, `panel_line_day_rollup_test.csv` (rollups)
- `panel_alert_budget_top5_metrics_test.json` (alerts budget metrics)

**What it produced:**
1. **Latest-day alert queue (Top-5 assets/day)**  
   - Picked the **latest available TEST day**: **2025-12-11 (UTC)**  
   - Built an alert queue containing **exactly 5 assets** for that day, ordered by `p_hat`, and kept the “why” fields:
     - `why_push_to_1` (top positive contributions driving risk up)
     - `why_push_to_0` (top negative contributions pushing risk down)
   - In this run, the **Top-5 queue is relatively low-risk** (max `p_hat` ≈ **0.297**) and **none of the alerted asset-days had a positive hour** (`y_asset_day = 0` for all 5).

2. **Latest-day top Sites and Lines (KPI slices)**  
   - Generated “latest day” slices for:
     - Sites: **S1, S4, S3**
     - Lines: **S1-L3, S4-L3, S3-L5, S1-L4, S1-L2**
   - These tables are sized for UI cards (small and fast).

3. **Last-14-day daily trend window**  
   - Exported the **last 14 available TEST days** from the daily rollup for plotting trend lines (volume, positivity, risk percentiles, max alert risk, etc.).
   - The trend table preview shows that most earlier days have **max-alert probabilities** near 1.0 (a few peak at roughly 0.76 to 0.86), while **2025-12-11** is a partial day (only **7 asset-hours** across 7 assets) with a much lower max risk (~0.297).

4. **KPI JSON for a dashboard header**  
   - Wrote a compact JSON summary with:
     - export run id + source panel run dir
     - TEST date range
     - alerts budget policy (`top_k_assets_per_day`, **k=5**)
     - `precision_at_k_asset_day` (from the saved budget metrics)
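
The partial-day observation in item 3 can also be checked mechanically rather than by eye. The sketch below flags days whose hour coverage is low relative to the assets seen that day. The rows are toy values shaped like the daily rollup, and both the 24-hours-per-asset expectation and the 0.9 threshold are assumptions, not settings taken from this notebook.

```python
import pandas as pd

# Toy daily rollup (illustrative rows with the same column names as the daily rollup)
daily = pd.DataFrame({
    "date_utc": ["2025-12-09", "2025-12-10", "2025-12-11"],
    "n_hours": [576, 576, 7],
    "n_assets": [24, 24, 7],
})

HOURS_PER_DAY = 24          # assumed: each asset reports one row per hour
COVERAGE_THRESHOLD = 0.9    # assumed cutoff for calling a day "partial"

# Expected asset-hours if every asset seen that day reported all 24 hours
daily["expected_hours"] = daily["n_assets"] * HOURS_PER_DAY
daily["coverage"] = daily["n_hours"] / daily["expected_hours"]
daily["is_partial_day"] = daily["coverage"] < COVERAGE_THRESHOLD

print(daily[["date_utc", "n_hours", "coverage", "is_partial_day"]])
```

A flag like this lets a dashboard grey out or annotate the latest day instead of presenting a low-coverage Top-5 queue as if it were comparable to full days.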

**Saved artifacts:**
- `dashboard_alert_queue_latest_day_test.csv`
- `dashboard_top_sites_latest_day_test.csv`
- `dashboard_top_lines_latest_day_test.csv`
- `dashboard_daily_trend_last14_test.csv`
- `dashboard_kpis_test.json`

**Provenance tracking:**
- Updated `DASHBOARD_EXPORTS.json` so downstream steps (API/dashboard code) can reliably discover these exported files by name.
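
A downstream consumer might look up an export by name along these lines. This is a sketch only: `resolve_export` is a hypothetical helper (not part of this notebook), the paths are placeholders, and it assumes the two `exports` layouts seen in Cell 4 (a dict of name to path, or a list of `{"name", "path"}` records).

```python
from typing import Optional


def resolve_export(manifest: dict, key: str) -> Optional[str]:
    """Return the path registered under `key`, or None if it is not present.

    Hypothetical helper: supports `exports` as a dict or as a list of records.
    """
    exports = manifest.get("exports")
    if isinstance(exports, dict):
        return exports.get(key)
    if isinstance(exports, list):
        for item in exports:
            if isinstance(item, dict) and item.get("name") == key:
                return item.get("path")
    return None


# Placeholder manifests in both layouts (paths are illustrative)
manifest_dict = {"exports": {"dashboard_kpis_test_json": "exports/dashboard_kpis_test.json"}}
manifest_list = {"exports": [{"name": "dashboard_kpis_test_json",
                              "path": "exports/dashboard_kpis_test.json"}]}

print(resolve_export(manifest_dict, "dashboard_kpis_test_json"))
print(resolve_export(manifest_list, "dashboard_kpis_test_json"))
print(resolve_export(manifest_dict, "not_registered"))
```

Returning `None` for an unknown key leaves it to the caller to decide whether a missing export is an error or simply an optional artifact.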


In [9]:
#============================================================
# Cell 6 — Dashboard visuals + packaged bundle (TEST exports)
#   - Create a small set of PNG charts for quick inspection / UI prototyping
#   - Zip the dashboard bundle (CSVs + JSON + figures) for easy sharing
#   - Update DASHBOARD_EXPORTS.json with figure + bundle paths
#============================================================

import json
from pathlib import Path
import zipfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# -----------------------------
# 0) Locate the current EXPORT_DIR + manifest
# -----------------------------
manifest_path = EXPORT_DIR / "DASHBOARD_EXPORTS.json"
if not manifest_path.exists():
    raise FileNotFoundError(f"Missing manifest: {manifest_path}")

manifest = json.loads(manifest_path.read_text())

# Defensive: ensure manifest["exports"] exists and is a dict
# (some earlier versions accidentally wrote a list; Cell 5 normally converts it)
if not isinstance(manifest.get("exports"), dict):
    manifest["exports"] = {}

# Figures directory inside this export run
FIG_DIR = EXPORT_DIR / "figures"
FIG_DIR.mkdir(parents=True, exist_ok=True)

print("Using EXPORT_DIR:", EXPORT_DIR)
print("Figures dir:", FIG_DIR)
print("Manifest:", manifest_path)


# -----------------------------
# 1) Load the “dashboard bundle” tables created in Cell 5
# -----------------------------
alert_queue_path = EXPORT_DIR / "dashboard_alert_queue_latest_day_test.csv"
top_sites_path   = EXPORT_DIR / "dashboard_top_sites_latest_day_test.csv"
top_lines_path   = EXPORT_DIR / "dashboard_top_lines_latest_day_test.csv"
trend_path       = EXPORT_DIR / "dashboard_daily_trend_last14_test.csv"
kpis_path        = EXPORT_DIR / "dashboard_kpis_test.json"

for p in [alert_queue_path, top_sites_path, top_lines_path, trend_path, kpis_path]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required dashboard export: {p}")

alert_q  = pd.read_csv(alert_queue_path)
top_sites = pd.read_csv(top_sites_path)
top_lines = pd.read_csv(top_lines_path)
trend = pd.read_csv(trend_path)
kpis = json.loads(Path(kpis_path).read_text())

# Parse dates for plotting
if "date_utc" in trend.columns:
    trend["date_utc"] = pd.to_datetime(trend["date_utc"], errors="coerce")
if "date_utc" in alert_q.columns:
    alert_q["date_utc"] = pd.to_datetime(alert_q["date_utc"], errors="coerce")
if "date_utc" in top_sites.columns:
    top_sites["date_utc"] = pd.to_datetime(top_sites["date_utc"], errors="coerce")
if "date_utc" in top_lines.columns:
    top_lines["date_utc"] = pd.to_datetime(top_lines["date_utc"], errors="coerce")

# Lightweight sanity prints (helps confirm we’re charting the intended window)
print("\nLoaded dashboard tables:")
print("  alert_queue rows:", len(alert_q))
print("  top_sites rows  :", len(top_sites))
print("  top_lines rows  :", len(top_lines))
print("  trend rows      :", len(trend))
print("  trend date range:", trend["date_utc"].min(), "→", trend["date_utc"].max())


# -----------------------------
# 2) Plot A — Daily max alert probability (and mean risk) over time
# -----------------------------
fig_a = FIG_DIR / "daily_risk_trend_test.png"

plt.figure()
# Always sort by date for time series plots
trend_sorted = trend.sort_values("date_utc").copy()

# Use columns if present (robust to minor schema changes)
x = trend_sorted["date_utc"]
if "max_alert_p_hat" in trend_sorted.columns:
    plt.plot(x, trend_sorted["max_alert_p_hat"], label="Max alert p_hat")
if "p_hat_mean" in trend_sorted.columns:
    plt.plot(x, trend_sorted["p_hat_mean"], label="Mean p_hat")

plt.title("TEST — Daily risk trend (max alert + mean)")
plt.xlabel("UTC date")
plt.ylabel("Probability")
plt.xticks(rotation=45)
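# Only draw a legend when a labeled series was plotted (avoids a matplotlib warning)
if plt.gca().get_legend_handles_labels()[0]:
    plt.legend()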
plt.tight_layout()
plt.savefig(fig_a, dpi=150)
plt.close()
print("Saved:", fig_a)


# -----------------------------
# 3) Plot B — Daily incident activity (positive hours vs total hours)
# -----------------------------
fig_b = FIG_DIR / "daily_positive_hours_test.png"

plt.figure()
x = trend_sorted["date_utc"]
ylabel_b = "Count (hours)"
if "n_pos_hours" in trend_sorted.columns and "n_hours" in trend_sorted.columns:
    plt.plot(x, trend_sorted["n_pos_hours"], label="Positive hours")
    plt.plot(x, trend_sorted["n_hours"], label="Total hours")
elif "pos_rate_hours" in trend_sorted.columns:
    plt.plot(x, trend_sorted["pos_rate_hours"], label="Positive rate (hours)")
    # The fallback series is a rate, not a count, so label the axis accordingly
    ylabel_b = "Positive rate (share of hours)"
else:
    # Fallback: nothing to plot (should be rare), but don’t hard-fail
    plt.text(0.5, 0.5, "No activity columns found", ha="center", va="center")
plt.title("TEST — Daily activity (positive hours vs total)")
plt.xlabel("UTC date")
plt.ylabel(ylabel_b)
plt.xticks(rotation=45)
# Only draw a legend when a labeled series was plotted (avoids a matplotlib warning)
if plt.gca().get_legend_handles_labels()[0]:
    plt.legend()
plt.tight_layout()
plt.savefig(fig_b, dpi=150)
plt.close()
print("Saved:", fig_b)


# -----------------------------
# 4) Plot C — Latest-day alert queue (Top-5 assets) as a bar chart
# -----------------------------
fig_c = FIG_DIR / "alert_queue_latest_day_test.png"

plt.figure()
# Sort descending so the highest-risk asset is the leftmost bar
aq = alert_q.sort_values("p_hat", ascending=False).copy()
labels = aq["asset_id"].astype(str).tolist()
vals = aq["p_hat"].astype(float).tolist()

plt.bar(labels, vals)
plt.title("TEST — Alert queue (latest day, Top-5 assets)")
plt.xlabel("Asset ID")
plt.ylabel("p_hat (peak hour)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig(fig_c, dpi=150)
plt.close()
print("Saved:", fig_c)


# -----------------------------
# 5) Plot D — Latest-day top sites (max probability by site)
# -----------------------------
fig_d = FIG_DIR / "top_sites_latest_day_test.png"

plt.figure()
# Prefer the max-probability column; fall back to the mean if max is unavailable
site_col = "p_hat_max" if "p_hat_max" in top_sites.columns else "p_hat_mean"
ts = top_sites.sort_values(site_col, ascending=False).copy()
site_labels = ts["site_id"].astype(str).tolist()
site_vals = ts[site_col].astype(float).tolist()

plt.bar(site_labels, site_vals)
plt.title("TEST — Top sites (latest day)")
plt.xlabel("Site")
plt.ylabel("Risk (max p_hat)" if "p_hat_max" in ts.columns else "Risk (mean p_hat)")
plt.xticks(rotation=0)
plt.tight_layout()
plt.savefig(fig_d, dpi=150)
plt.close()
print("Saved:", fig_d)


# -----------------------------
# 6) Update manifest with figure paths + create a zip bundle
# -----------------------------
# Register figures in the manifest
manifest["exports"]["fig_daily_risk_trend_test_png"] = str(fig_a)
manifest["exports"]["fig_daily_positive_hours_test_png"] = str(fig_b)
manifest["exports"]["fig_alert_queue_latest_day_test_png"] = str(fig_c)
manifest["exports"]["fig_top_sites_latest_day_test_png"] = str(fig_d)

# Create a small zip bundle with:
# - the “dashboard bundle” CSVs + KPI JSON
# - the figures
# - the manifest itself
zip_out = EXPORT_DIR / "dashboard_bundle_test.zip"

bundle_files = [
    alert_queue_path,
    top_sites_path,
    top_lines_path,
    trend_path,
    kpis_path,
    fig_a,
    fig_b,
    fig_c,
    fig_d,
    manifest_path,
]

# Register the bundle and persist the manifest BEFORE zipping, so the manifest copy
# stored inside the bundle already lists the figures and the bundle path
manifest["exports"]["dashboard_bundle_test_zip"] = str(zip_out)
manifest_path.write_text(json.dumps(manifest, indent=2))

with zipfile.ZipFile(zip_out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fp in bundle_files:
        fp = Path(fp)
        if fp.exists():
            # Store inside the zip relative to EXPORT_DIR (clean paths)
            zf.write(fp, arcname=str(fp.relative_to(EXPORT_DIR)))
print("\nSaved bundle:", zip_out)
print("Updated manifest:", manifest_path)

# Quick peek: show what we registered
print("\nManifest exports (new/updated):")
for k in [
    "fig_daily_risk_trend_test_png",
    "fig_daily_positive_hours_test_png",
    "fig_alert_queue_latest_day_test_png",
    "fig_top_sites_latest_day_test_png",
    "dashboard_bundle_test_zip",
]:
    print(f"  - {k}: {manifest['exports'].get(k)}")


Using EXPORT_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z
Figures dir: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/figures
Manifest: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json

Loaded dashboard tables:
  alert_queue rows: 5
  top_sites rows  : 3
  top_lines rows  : 5
  trend rows      : 14
  trend date range: 2025-11-28 00:00:00 → 2025-12-11 00:00:00
Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/figures/daily_risk_trend_test.png
Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/figures/daily_positive_hours_test.png
Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/figures/alert_queue_latest_day_test.png
Saved: /home/parallels/p

### What Cell 6 Just Did — Dashboard visuals + packaged export bundle (TEST)

This cell turned the “dashboard-ready” tables from **Cell 5** into a lightweight, shareable dashboard export package.

**Inputs loaded from the current export run (`EXPORT_DIR`):**
- `dashboard_alert_queue_latest_day_test.csv` (5 rows — the Top-5 assets for the latest day)
- `dashboard_top_sites_latest_day_test.csv` (3 rows)
- `dashboard_top_lines_latest_day_test.csv` (5 rows)
- `dashboard_daily_trend_last14_test.csv` (14 rows — **2025-11-28 → 2025-12-11 UTC**)
- `dashboard_kpis_test.json`
- Existing manifest: `DASHBOARD_EXPORTS.json`

**Visuals generated (saved to `figures/`):**
- `daily_risk_trend_test.png`  
  Plots the daily risk trend using:
  - `max_alert_p_hat` (max peak-hour alert probability that day), and
  - `p_hat_mean` (mean predicted risk across all scored hours that day).
- `daily_positive_hours_test.png`  
  Plots daily label activity using:
  - `n_pos_hours` (hours labeled positive), and
  - `n_hours` (total scored hours).
- `alert_queue_latest_day_test.png`  
  Bar chart of the **latest day’s Top-5 assets** by `p_hat` (peak-hour risk).
- `top_sites_latest_day_test.png`  
  Bar chart of **latest day’s top sites** using `p_hat_max` (or `p_hat_mean` if max is unavailable).

**Packaging + provenance:**
- Created a single zip bundle for easy sharing and downstream ingestion:
  - `dashboard_bundle_test.zip`  
  This includes the dashboard CSVs, KPI JSON, the PNG figures, and the manifest itself.
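
A consumer of the bundle may want to confirm what it contains before ingesting it. The following is a minimal sketch using only the standard library; `list_bundle_members` is a hypothetical helper name (not part of this notebook), and the demo builds a throwaway archive rather than touching the real `dashboard_bundle_test.zip`:

```python
import tempfile
import zipfile
from pathlib import Path


def list_bundle_members(zip_path: Path) -> list[str]:
    """Return the archive member names, sorted for stable comparison."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all pass
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"Corrupt member in bundle: {bad}")
        return sorted(zf.namelist())


# Demo on a throwaway archive (file names mirror the bundle layout; contents are dummies)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "figures").mkdir()
    (tmp / "dashboard_kpis_test.json").write_text("{}")
    (tmp / "figures" / "daily_risk_trend_test.png").write_bytes(b"placeholder")

    demo_zip = tmp / "dashboard_bundle_test.zip"
    with zipfile.ZipFile(demo_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for fp in [tmp / "dashboard_kpis_test.json",
                   tmp / "figures" / "daily_risk_trend_test.png"]:
            # Same idea as Cell 6: store paths relative to the export directory
            zf.write(fp, arcname=fp.relative_to(tmp).as_posix())

    members = list_bundle_members(demo_zip)
    print(members)
```

Because members are stored relative to the export directory, figures appear under `figures/` inside the archive.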

**Manifest update:**
- Updated `DASHBOARD_EXPORTS.json` with the new figure paths and the zip bundle path, so downstream code (API/UI) can reliably discover the latest export artifacts.
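
As an illustration of that discovery step, a downstream script could read the manifest's `exports` mapping and keep only the entries whose files still exist. This is a sketch, not the notebook's own code: `resolve_exports` is a hypothetical helper, and the demo writes a throwaway manifest. Note that the paths registered in the run above are absolute, so a manifest copied to another machine would likely need its paths rewritten.

```python
import json
import tempfile
from pathlib import Path


def resolve_exports(manifest_path: Path) -> dict[str, Path]:
    """Map each manifest export key to its path, keeping only files that exist."""
    manifest = json.loads(Path(manifest_path).read_text())
    exports = manifest.get("exports", {})
    return {key: Path(p) for key, p in exports.items() if Path(p).exists()}


# Demo with a throwaway manifest: one registered file exists, the other does not
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    fig = tmp / "daily_risk_trend_test.png"
    fig.write_bytes(b"placeholder")

    demo_manifest = tmp / "DASHBOARD_EXPORTS.json"
    demo_manifest.write_text(json.dumps({
        "exports": {
            "fig_daily_risk_trend_test_png": str(fig),
            "dashboard_bundle_test_zip": str(tmp / "missing.zip"),
        }
    }))

    found = resolve_exports(demo_manifest)
    print(sorted(found))
```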

**Key outputs produced:**
- Figures:
  - `figures/daily_risk_trend_test.png`
  - `figures/daily_positive_hours_test.png`
  - `figures/alert_queue_latest_day_test.png`
  - `figures/top_sites_latest_day_test.png`
- Bundle:
  - `dashboard_bundle_test.zip`
- Updated manifest:
  - `DASHBOARD_EXPORTS.json`


In [10]:
#============================================================
# Cell 7 — Publish “latest” TEST dashboard artifacts + update repo EXPORTS.json
#   Goal: create stable, API-friendly paths (risk_scoring/latest_test/...) that always point to the most recent export run
#============================================================

import json
import shutil
from pathlib import Path

# -----------------------------
# 0) Preconditions (from Cell 1)
# -----------------------------
assert "ROOT" in globals(), "Expected ROOT from Cell 1"
assert "RS_DIR" in globals(), "Expected RS_DIR from Cell 1"
assert "EXPORT_DIR" in globals(), "Expected EXPORT_DIR from Cell 1"
assert "EXPORT_RUN_ID" in globals(), "Expected EXPORT_RUN_ID from Cell 1"

ROOT = Path(ROOT)
RS_DIR = Path(RS_DIR)
EXPORT_DIR = Path(EXPORT_DIR)
EXPORT_RUN_ID = str(EXPORT_RUN_ID)

assert EXPORT_DIR.exists(), f"Missing EXPORT_DIR: {EXPORT_DIR}"

manifest_path = EXPORT_DIR / "DASHBOARD_EXPORTS.json"
assert manifest_path.exists(), f"Missing manifest: {manifest_path}"

manifest = json.loads(manifest_path.read_text())
print("Loaded manifest:", manifest_path)

# -----------------------------
# 1) Create a stable “latest” directory for TEST artifacts
# -----------------------------
LATEST_DIR = RS_DIR / "latest_test"
LATEST_DIR.mkdir(parents=True, exist_ok=True)

# Optional: clean it so it truly represents "latest"
for p in LATEST_DIR.glob("*"):
    if p.is_file():
        p.unlink()
    elif p.is_dir():
        shutil.rmtree(p)

print("\nPublishing to stable directory:")
print("  LATEST_DIR:", LATEST_DIR)

# -----------------------------
# 2) Copy the key deliverables we want the API/UI to consume
#    Keep this *small* (CSV/JSON/PNG/ZIP), no parquet, no big matrices.
# -----------------------------
to_publish = [
    # “Operational” tables
    "dashboard_alert_queue_latest_day_test.csv",
    "dashboard_top_sites_latest_day_test.csv",
    "dashboard_top_lines_latest_day_test.csv",
    "dashboard_daily_trend_last14_test.csv",
    "dashboard_kpis_test.json",

    # Figures
    "figures/daily_risk_trend_test.png",
    "figures/daily_positive_hours_test.png",
    "figures/alert_queue_latest_day_test.png",
    "figures/top_sites_latest_day_test.png",

    # Bundle + manifest
    "dashboard_bundle_test.zip",
    "DASHBOARD_EXPORTS.json",
]

published = []
missing = []

for rel in to_publish:
    src = EXPORT_DIR / rel
    if not src.exists():
        missing.append(rel)
        continue

    dst = LATEST_DIR / Path(rel).name  # flatten into latest_test/
    shutil.copy2(src, dst)
    published.append((rel, dst.name))

print("\nPublished files:")
for rel, name in published:
    print(f"  ✅ {rel}  →  {name}")

if missing:
    print("\nMissing (not published):")
    for rel in missing:
        print("  ⚠️", rel)

# -----------------------------
# 3) Write a small pointer file so humans & scripts can see what “latest” means
# -----------------------------
latest_pointer = {
    "latest_run_id": EXPORT_RUN_ID,
    "latest_export_dir": str(EXPORT_DIR),
    "latest_published_dir": str(LATEST_DIR),
}
(LATEST_DIR / "LATEST_TEST.json").write_text(json.dumps(latest_pointer, indent=2))
print("\nWrote pointer:", LATEST_DIR / "LATEST_TEST.json")

# -----------------------------
# 4) Update a repo-level EXPORTS.json (small, git-safe pointers only)
#    This is a “directory of outputs” for reviewers and your API.
# -----------------------------
exports_path = ROOT / "EXPORTS.json"
if exports_path.exists():
    try:
        exports = json.loads(exports_path.read_text())
        if not isinstance(exports, dict):
            exports = {}
    except Exception:
        exports = {}
else:
    exports = {}

exports.setdefault("risk_scoring", {})
exports["risk_scoring"]["latest_test"] = {
    "run_id": EXPORT_RUN_ID,
    "export_dir": str(EXPORT_DIR),
    "published_dir": str(LATEST_DIR),
    "files": {name: str(LATEST_DIR / name) for _, name in published},
    "pointer": str(LATEST_DIR / "LATEST_TEST.json"),
}

exports_path.write_text(json.dumps(exports, indent=2))
print("Updated repo exports index:", exports_path)

# -----------------------------
# 5) Also append a convenience block into the run manifest itself
# -----------------------------
manifest.setdefault("published_latest", {})
manifest["published_latest"]["latest_test_dir"] = str(LATEST_DIR)
manifest["published_latest"]["latest_test_pointer"] = str(LATEST_DIR / "LATEST_TEST.json")
manifest["published_latest"]["published_files"] = [name for _, name in published]

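manifest_path.write_text(json.dumps(manifest, indent=2))
print("Updated run manifest with published_latest:", manifest_path)

# Refresh the published copy: the one copied in step 2 predates the published_latest block
shutil.copy2(manifest_path, LATEST_DIR / manifest_path.name)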

# -----------------------------
# 6) Quick preview of what the API would serve
# -----------------------------
print("\nLATEST_DIR listing:")
for p in sorted(LATEST_DIR.glob("*")):
    print(" ", p.name)


Loaded manifest: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json

Publishing to stable directory:
  LATEST_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test

Published files:
  ✅ dashboard_alert_queue_latest_day_test.csv  →  dashboard_alert_queue_latest_day_test.csv
  ✅ dashboard_top_sites_latest_day_test.csv  →  dashboard_top_sites_latest_day_test.csv
  ✅ dashboard_top_lines_latest_day_test.csv  →  dashboard_top_lines_latest_day_test.csv
  ✅ dashboard_daily_trend_last14_test.csv  →  dashboard_daily_trend_last14_test.csv
  ✅ dashboard_kpis_test.json  →  dashboard_kpis_test.json
  ✅ figures/daily_risk_trend_test.png  →  daily_risk_trend_test.png
  ✅ figures/daily_positive_hours_test.png  →  daily_positive_hours_test.png
  ✅ figures/alert_queue_latest_day_test.png  →  alert_queue_latest_day_test.png
  ✅ figures/top_sites_latest_day_test.png  →  top_sites_latest_day_t

### What Cell 7 Just Did — Publish “latest” TEST dashboard exports + update `EXPORTS.json`

This cell takes the **current export run** you created earlier (`EXPORT_DIR = .../risk_scoring/20251220T223649Z`) and “publishes” a **stable, always-current** TEST dashboard bundle under:

- `data/processed/risk_scoring/latest_test/`

It then **copies a curated set of small, API/dashboard-friendly deliverables** into that folder (CSV tables, the KPI JSON, PNG figures, the ZIP bundle, and the run manifest), flattening `figures/*.png` into the top level of `latest_test/`, so downstream consumers don’t need to know the timestamped run folder name.

Based on your output, the following were successfully published into `latest_test/`:

- Core tables:
  - `dashboard_alert_queue_latest_day_test.csv`
  - `dashboard_top_sites_latest_day_test.csv`
  - `dashboard_top_lines_latest_day_test.csv`
  - `dashboard_daily_trend_last14_test.csv`
  - `dashboard_kpis_test.json`
- Key figures:
  - `daily_risk_trend_test.png`
  - `daily_positive_hours_test.png`
  - `alert_queue_latest_day_test.png`
  - `top_sites_latest_day_test.png`
- Bundle + provenance:
  - `dashboard_bundle_test.zip`
  - `DASHBOARD_EXPORTS.json`

To make “latest” traceable, it also wrote a pointer file:

- `data/processed/risk_scoring/latest_test/LATEST_TEST.json`

Finally, it updated two metadata indexes so the repo and the run both “know” where the latest published outputs live:

- Repo-level pointer index:
  - `/home/parallels/projects/gmp-packaging-risk-analytics/EXPORTS.json`
- Run-level manifest (adds a `published_latest` section):
  - `.../risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json`

Result: your dashboard/API can now read from **one stable location** (`risk_scoring/latest_test/`) instead of hunting for the newest timestamped run directory.
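
To show how a consumer might use the pointer file, here is a minimal sketch that loads `LATEST_TEST.json` and checks for the three keys Cell 7 writes. `read_latest_pointer` is a hypothetical helper, and the demo uses a temporary directory with a placeholder export path instead of the real `latest_test/` folder:

```python
import json
import tempfile
from pathlib import Path


def read_latest_pointer(latest_dir: Path) -> dict:
    """Load LATEST_TEST.json and verify it carries the keys written by Cell 7."""
    pointer = json.loads((Path(latest_dir) / "LATEST_TEST.json").read_text())
    required = {"latest_run_id", "latest_export_dir", "latest_published_dir"}
    missing = required - pointer.keys()
    if missing:
        raise KeyError(f"Pointer file is missing keys: {sorted(missing)}")
    return pointer


# Demo: write a pointer shaped like the one from Cell 7, then read it back
with tempfile.TemporaryDirectory() as tmp:
    latest_dir = Path(tmp) / "latest_test"
    latest_dir.mkdir()
    (latest_dir / "LATEST_TEST.json").write_text(json.dumps({
        "latest_run_id": "20251220T223649Z",
        "latest_export_dir": str(Path(tmp) / "20251220T223649Z"),  # placeholder path
        "latest_published_dir": str(latest_dir),
    }, indent=2))

    pointer = read_latest_pointer(latest_dir)
    print("Latest run:", pointer["latest_run_id"])
```

Failing loudly on a malformed pointer is a design choice for this sketch; a service might prefer to fall back to the previous known-good run instead.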


In [11]:
#============================================================
# Cell 8 — Build an API-ready dashboard payload (TEST) from the published “latest” exports
#   Goal:
#     Create ONE compact JSON file that a FastAPI endpoint can return directly:
#       - KPIs (json)
#       - Alert queue (top 5 assets for the latest day)
#       - Top sites/lines for the latest day
#       - Daily trend (last 14 days)
#     Keep it small + business-friendly (no huge tables, no parquet).
#
#   Inputs (from Cell 7 publish step):
#     data/processed/risk_scoring/latest_test/
#       - dashboard_kpis_test.json
#       - dashboard_alert_queue_latest_day_test.csv
#       - dashboard_top_sites_latest_day_test.csv
#       - dashboard_top_lines_latest_day_test.csv
#       - dashboard_daily_trend_last14_test.csv
#       - DASHBOARD_EXPORTS.json
#
#   Outputs:
#     - latest_test/dashboard_payload_test.json
#     - (optional) also copy into EXPORT_DIR for provenance
#============================================================

import json
from datetime import datetime, timezone
from pathlib import Path

import numpy as np
import pandas as pd

# -----------------------------
# 0) Locate the published "latest" folder (TEST)
# -----------------------------
LATEST_DIR = RS_DIR / "latest_test"
assert LATEST_DIR.exists(), f"Missing published latest directory: {LATEST_DIR}"

manifest_latest_path = LATEST_DIR / "DASHBOARD_EXPORTS.json"
kpis_path  = LATEST_DIR / "dashboard_kpis_test.json"
aq_path    = LATEST_DIR / "dashboard_alert_queue_latest_day_test.csv"
sites_path = LATEST_DIR / "dashboard_top_sites_latest_day_test.csv"
lines_path = LATEST_DIR / "dashboard_top_lines_latest_day_test.csv"
trend_path = LATEST_DIR / "dashboard_daily_trend_last14_test.csv"

for p in [kpis_path, aq_path, sites_path, lines_path, trend_path]:
    assert p.exists(), f"Missing required published file: {p}"

# -----------------------------
# 1) Load the published tables + KPIs
# -----------------------------
kpis = json.loads(kpis_path.read_text())

alert_queue = pd.read_csv(aq_path)
top_sites   = pd.read_csv(sites_path)
top_lines   = pd.read_csv(lines_path)
trend14     = pd.read_csv(trend_path)

# -----------------------------
# 2) Helper: make JSON-safe records (handles NaN, timestamps, numpy types)
# -----------------------------
def _clean_value(v):
    """Convert values into JSON-safe python primitives."""
    if pd.isna(v):
        return None
    # numpy scalar -> python scalar
    if isinstance(v, (np.generic,)):
        return v.item()
    return v

def df_to_records(df: pd.DataFrame, date_cols=None) -> list[dict]:
    """Convert df rows to list-of-dicts; optionally coerce specific columns to ISO strings."""
    out = df.copy()
    date_cols = date_cols or []
    for c in date_cols:
        if c in out.columns:
            # NOTE: the literal "Z" suffix assumes the parsed timestamps are already UTC
            out[c] = pd.to_datetime(out[c], errors="coerce")
            out[c] = out[c].dt.strftime("%Y-%m-%dT%H:%M:%SZ").where(out[c].notna(), None)
    # Replace NaN -> None and ensure python primitives
    records = []
    for row in out.to_dict(orient="records"):
        records.append({k: _clean_value(v) for k, v in row.items()})
    return records

# Coerce known date columns if present
alert_queue_records = df_to_records(alert_queue, date_cols=["ts_peak"])
top_sites_records   = df_to_records(top_sites)
top_lines_records   = df_to_records(top_lines)
trend14_records     = df_to_records(trend14, date_cols=["date_utc"])

# -----------------------------
# 3) Load published manifest (if present) for provenance (non-fatal)
# -----------------------------
manifest_latest = {}
if manifest_latest_path.exists():
    try:
        manifest_latest = json.loads(manifest_latest_path.read_text())
    except Exception:
        manifest_latest = {}

# -----------------------------
# 4) Assemble payload
# -----------------------------
created_utc = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

payload = {
    "kind": "dashboard_payload_test",
    "created_utc": created_utc,
    "published_latest_dir": str(LATEST_DIR),
    "source_export_run_id": manifest_latest.get("export_run_id"),
    "source_panel_run_dir": manifest_latest.get("panel_run_dir"),
    "kpis": kpis,
    "latest_day": alert_queue["date_utc"].iloc[0] if ("date_utc" in alert_queue.columns and len(alert_queue)) else None,
    "alert_queue_latest_day": alert_queue_records,     # top-5 assets (budget policy)
    "top_sites_latest_day": top_sites_records,         # site rollup for latest day
    "top_lines_latest_day": top_lines_records,         # line rollup for latest day
    "daily_trend_last14": trend14_records,             # 14-day trend window
    "notes": {
        "scope": "TEST split exports only (panel model).",
        "alert_budget": "Top-5 assets per day, selected by max p_hat within the day (asset-day risk).",
        "why_fields": "If present, why_push_to_1 / why_push_to_0 summarize top coefficient contributions at ts_peak.",
    },
}

# -----------------------------
# 5) Save payload (stable location + provenance copy)
# -----------------------------
payload_latest_path = LATEST_DIR / "dashboard_payload_test.json"
payload_latest_path.write_text(json.dumps(payload, indent=2))
print("Saved:", payload_latest_path)

# Also copy into this run's EXPORT_DIR (if defined in notebook state) for provenance.
# Only a missing EXPORT_DIR is treated as non-fatal; real write errors still surface.
try:
    payload_run_path = EXPORT_DIR / "dashboard_payload_test.json"  # noqa: F821
except NameError:
    print("Note: EXPORT_DIR not available in this cell context; saved only to latest_test/.")
else:
    payload_run_path.write_text(json.dumps(payload, indent=2))
    print("Saved:", payload_run_path)

# -----------------------------
# 6) Quick sanity preview
# -----------------------------
print("\nPayload sanity checks:")
print("  latest_day:", payload.get("latest_day"))
print("  alert_queue rows:", len(payload["alert_queue_latest_day"]))
print("  trend days:", len(payload["daily_trend_last14"]))
print("  kpis keys:", list(payload["kpis"].keys())[:10], "...")

Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test/dashboard_payload_test.json
Saved: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/dashboard_payload_test.json

Payload sanity checks:
  latest_day: 2025-12-11
  alert_queue rows: 5
  trend days: 14
  kpis keys: ['export_run_id', 'panel_run_dir', 'scope', 'date_min_utc', 'date_max_utc', 'n_days', 'n_assets_test', 'alerts_budget'] ...


### What Cell 8 Just Did — API-ready dashboard payload (TEST)

This cell packaged the “latest_test” dashboard exports into a single, API-friendly JSON payload that your FastAPI app can return directly.

**Inputs (published stable exports from Cell 7):**
- `data/processed/risk_scoring/latest_test/dashboard_kpis_test.json`
- `data/processed/risk_scoring/latest_test/dashboard_alert_queue_latest_day_test.csv`
- `data/processed/risk_scoring/latest_test/dashboard_top_sites_latest_day_test.csv`
- `data/processed/risk_scoring/latest_test/dashboard_top_lines_latest_day_test.csv`
- `data/processed/risk_scoring/latest_test/dashboard_daily_trend_last14_test.csv`
- (optionally) `data/processed/risk_scoring/latest_test/DASHBOARD_EXPORTS.json` for provenance fields

**What it built:**
- A compact JSON object containing:
  - **KPIs** (as-is from the KPI file)
  - **latest_day** inferred from the alert queue table
  - **alert_queue_latest_day** (Top-5 assets/day, with `ts_peak`, `p_hat`, and “why” fields if present)
  - **top_sites_latest_day** and **top_lines_latest_day**
  - **daily_trend_last14** (14-day trend table as JSON records)
  - **provenance** fields like `source_export_run_id` and `source_panel_run_dir` when available
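
The table-to-records step matters because Cell 8 writes the payload with `json.dumps` at its default settings, which serialize a float NaN as the bare token `NaN` (not valid strict JSON). The sketch below illustrates the problem and one way to guard against it using only the standard library. `to_json_safe` is a hypothetical helper and `A-001` a made-up asset ID; unlike the notebook's `_clean_value`, it also maps infinities to `None`.

```python
import json
import math


def to_json_safe(value):
    """Recursively replace NaN/inf floats with None so the output is strict JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    return value


# Hypothetical alert-queue record with a missing score
record = {"asset_id": "A-001", "p_hat": float("nan"), "why_push_to_1": None}

# Default behavior: the non-standard token NaN is emitted without complaint
print(json.dumps(record))

# allow_nan=False turns the silent problem into an explicit error
try:
    json.dumps(record, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)

# After cleaning, strict serialization succeeds
clean = to_json_safe(record)
print(json.dumps(clean, allow_nan=False))
```

Passing `allow_nan=False` when writing the payload would be one way to catch any value that slips past the cleaning step.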

**Key output artifacts created:**
- Stable location for the API to read:
  - `data/processed/risk_scoring/latest_test/dashboard_payload_test.json`
- Provenance copy tied to the export run:
  - `data/processed/risk_scoring/20251220T223649Z/dashboard_payload_test.json`

**Run output recap (from this execution):**
- Saved payload to both paths above.
- `latest_day` resolved to **2025-12-11**
- Alert queue contains **5 rows** (matching the Top-5 assets/day budget)
- Daily trend contains **14 days**
- KPI keys were confirmed loaded (e.g., `export_run_id`, `panel_run_dir`, `scope`, `date_min_utc`, `date_max_utc`, etc.)

**Why this matters:**
With this payload, your dashboard/API layer no longer needs to read and assemble multiple CSVs at request time. The endpoint can load and return one JSON file, which keeps responses fast and deterministic.
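
As a rough sketch of the loading side, an endpoint could call a small function like the one below and return its result. `load_dashboard_payload` and `EXPECTED_KEYS` are hypothetical names chosen for illustration; the key set is a subset of what Cell 8 writes, and the demo uses a temporary file rather than the real payload.

```python
import json
import tempfile
from pathlib import Path

# Subset of the top-level keys assembled in Cell 8 (assumed stable for this sketch)
EXPECTED_KEYS = {
    "kind",
    "created_utc",
    "kpis",
    "latest_day",
    "alert_queue_latest_day",
    "top_sites_latest_day",
    "top_lines_latest_day",
    "daily_trend_last14",
}


def load_dashboard_payload(payload_path: Path) -> dict:
    """Read the payload file and fail fast if expected top-level keys are absent."""
    payload = json.loads(Path(payload_path).read_text())
    missing = EXPECTED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"Payload is missing keys: {sorted(missing)}")
    return payload


# Demo with a minimal stand-in payload
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "dashboard_payload_test.json"
    demo_path.write_text(json.dumps({
        "kind": "dashboard_payload_test",
        "created_utc": "2025-12-20T22:36:49Z",
        "kpis": {},
        "latest_day": "2025-12-11",
        "alert_queue_latest_day": [],
        "top_sites_latest_day": [],
        "top_lines_latest_day": [],
        "daily_trend_last14": [],
    }))

    demo_payload = load_dashboard_payload(demo_path)
    print(demo_payload["kind"], demo_payload["latest_day"])
```

In a FastAPI app, a route handler can return the resulting `dict` directly, since FastAPI serializes dictionary return values to JSON.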


In [14]:
#============================================================
# Cell 9 — Publish a GitHub-friendly “report snapshot” (TEST)
#   Goal:
#     - Copy a *small*, human-readable bundle from data/processed → ./reports/
#       so GitHub can show results without committing the full data/ directory.
#     - Create a Markdown report that links the key CSV/JSON + figures.
#
#   Inputs (from Cell 7/8 “latest_test”):
#     data/processed/risk_scoring/latest_test/*
#
#   Outputs (git-friendly, small artifacts):
#     reports/risk_scoring/latest_test/
#       - DASHBOARD_REPORT_TEST.md
#       - dashboard_payload_test.json
#       - dashboard_kpis_test.json
#       - dashboard_alert_queue_latest_day_test.csv
#       - dashboard_top_sites_latest_day_test.csv
#       - dashboard_top_lines_latest_day_test.csv
#       - dashboard_daily_trend_last14_test.csv
#       - figures/*.png
#============================================================

import json
import shutil
from pathlib import Path
import pandas as pd

# -----------------------------
# 0) Define source (stable) + destination (git-friendly) dirs
# -----------------------------
LATEST_DIR = ROOT / "data" / "processed" / "risk_scoring" / "latest_test"
assert LATEST_DIR.exists(), f"Missing LATEST_DIR: {LATEST_DIR}"

REPORTS_DIR = ROOT / "reports" / "risk_scoring" / "latest_test"
FIG_DST_DIR = REPORTS_DIR / "figures"
REPORTS_DIR.mkdir(parents=True, exist_ok=True)
FIG_DST_DIR.mkdir(parents=True, exist_ok=True)

print("Source (stable exports):", LATEST_DIR)
print("Destination (git-friendly):", REPORTS_DIR)

# -----------------------------
# 1) Define the small files we want to publish to GitHub
# -----------------------------
files_to_copy = [
    "dashboard_payload_test.json",
    "dashboard_kpis_test.json",
    "dashboard_alert_queue_latest_day_test.csv",
    "dashboard_top_sites_latest_day_test.csv",
    "dashboard_top_lines_latest_day_test.csv",
    "dashboard_daily_trend_last14_test.csv",
    "DASHBOARD_EXPORTS.json",
    "LATEST_TEST.json",
]

# Copy core tables/json
copied = []
missing = []
for fname in files_to_copy:
    src = LATEST_DIR / fname
    dst = REPORTS_DIR / fname
    if src.exists():
        shutil.copy2(src, dst)
        copied.append(dst)
    else:
        missing.append(src)

# Copy figures (PNG)
fig_src_candidates = [
    "daily_risk_trend_test.png",
    "daily_positive_hours_test.png",
    "alert_queue_latest_day_test.png",
    "top_sites_latest_day_test.png",
]
for f in fig_src_candidates:
    src = LATEST_DIR / f
    dst = FIG_DST_DIR / f
    if src.exists():
        shutil.copy2(src, dst)
        copied.append(dst)
    else:
        missing.append(src)

print("\nCopied files:")
for p in copied:
    print("  ✅", p)

if missing:
    print("\nMissing (not fatal):")
    for p in missing:
        print("  ⚠️", p)

# -----------------------------
# 2) Load a few items to populate a clean Markdown report
# -----------------------------
payload_path = REPORTS_DIR / "dashboard_payload_test.json"
kpis_path = REPORTS_DIR / "dashboard_kpis_test.json"
trend_path = REPORTS_DIR / "dashboard_daily_trend_last14_test.csv"
aq_path = REPORTS_DIR / "dashboard_alert_queue_latest_day_test.csv"

payload = json.loads(payload_path.read_text()) if payload_path.exists() else {}
kpis = json.loads(kpis_path.read_text()) if kpis_path.exists() else {}

trend_df = pd.read_csv(trend_path) if trend_path.exists() else pd.DataFrame()
aq_df = pd.read_csv(aq_path) if aq_path.exists() else pd.DataFrame()

latest_day = payload.get("latest_day") or (str(aq_df["date_utc"].iloc[0]) if (not aq_df.empty and "date_utc" in aq_df.columns) else "unknown")

# Basic quick stats for the report
n_alerts = int(len(aq_df)) if not aq_df.empty else 0
max_p = float(aq_df["p_hat"].max()) if (not aq_df.empty and "p_hat" in aq_df.columns) else None
mean_p = float(aq_df["p_hat"].mean()) if (not aq_df.empty and "p_hat" in aq_df.columns) else None

trend_days = int(trend_df["date_utc"].nunique()) if (not trend_df.empty and "date_utc" in trend_df.columns) else 0
trend_min = str(trend_df["date_utc"].min()) if (not trend_df.empty and "date_utc" in trend_df.columns) else None
trend_max = str(trend_df["date_utc"].max()) if (not trend_df.empty and "date_utc" in trend_df.columns) else None

# -----------------------------
# 3) Write a GitHub-readable Markdown report
# -----------------------------
report_md = []
report_md.append("# Risk Scoring Dashboard Exports (TEST)")
report_md.append("")
report_md.append(f"- **Latest day:** `{latest_day}`")
report_md.append(f"- **Alerts budget:** Top-5 assets/day (queue rows = `{n_alerts}`)")
if mean_p is not None:
    report_md.append(f"- **Mean alert score (p̂):** `{mean_p:.3f}`")
if max_p is not None:
    report_md.append(f"- **Max alert score (p̂):** `{max_p:.3f}`")
if trend_days:
    report_md.append(f"- **Trend window:** `{trend_days}` days (`{trend_min}` → `{trend_max}`)")
report_md.append("")
report_md.append("## Files (API-ready)")
report_md.append("- `dashboard_payload_test.json` — single JSON payload for API responses")
report_md.append("- `dashboard_kpis_test.json` — KPI summary + provenance")
report_md.append("")
report_md.append("## Tables (CSV)")
report_md.append("- `dashboard_alert_queue_latest_day_test.csv` — Top-5 asset alerts for the latest day (+ “why”)")
report_md.append("- `dashboard_top_sites_latest_day_test.csv` — Site rollup for the latest day")
report_md.append("- `dashboard_top_lines_latest_day_test.csv` — Line rollup for the latest day")
report_md.append("- `dashboard_daily_trend_last14_test.csv` — 14-day trend table")
report_md.append("")
report_md.append("## Figures")
report_md.append("- `figures/daily_risk_trend_test.png`")
report_md.append("- `figures/daily_positive_hours_test.png`")
report_md.append("- `figures/alert_queue_latest_day_test.png`")
report_md.append("- `figures/top_sites_latest_day_test.png`")
report_md.append("")
report_md.append("## Notes")
report_md.append("- These artifacts are copied from `data/processed/risk_scoring/latest_test/` into `reports/` so they can be committed to GitHub safely.")
report_md.append("- The underlying full datasets and large intermediate artifacts remain under `data/` and stay out of git by design.")
report_md.append("")
report_md.append("## Quick Preview")
if not aq_df.empty:
    preview_cols = [c for c in ["date_utc","asset_id","ts_peak","p_hat","site_id","line_id","asset_type","is_legacy","why_push_to_1","why_push_to_0"] if c in aq_df.columns]
    report_md.append("")
    report_md.append("### Alert Queue (latest day, first 5 rows)")
    report_md.append("")
    # DataFrame.to_markdown requires the optional `tabulate` package
    report_md.append(aq_df[preview_cols].head(5).to_markdown(index=False))
else:
    report_md.append("")
    report_md.append("_Alert queue preview not available (CSV missing or empty)._")

report_path = REPORTS_DIR / "DASHBOARD_REPORT_TEST.md"
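# The report contains non-ASCII characters (p̂, →, curly quotes), so pin the encoding
report_path.write_text("\n".join(report_md), encoding="utf-8")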
print("\nWrote report:", report_path)

print("\nDestination listing (reports/risk_scoring/latest_test):")
for p in sorted(REPORTS_DIR.glob("*")):
    if p.is_file():
        print("  -", p.name)
print("\nFigures listing:")
for p in sorted(FIG_DST_DIR.glob("*.png")):
    print("  - figures/" + p.name)


Source (stable exports): /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test
Destination (git-friendly): /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test

Copied files:
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_payload_test.json
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_kpis_test.json
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_alert_queue_latest_day_test.csv
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_top_sites_latest_day_test.csv
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_top_lines_latest_day_test.csv
  ✅ /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/dashboard_daily_trend_last14_t

### What Cell 9 Just Did — Publish a GitHub-friendly “report snapshot” (TEST)

This cell created a **git-safe reporting snapshot** of the latest TEST dashboard exports by copying a small, human-readable bundle from the stable processed output directory into the repo’s `reports/` folder (so results can be viewed on GitHub without committing the large `data/` tree).

**Inputs used (stable “latest” exports):**
- `data/processed/risk_scoring/latest_test/` (the published outputs from earlier cells)

**What it generated:**
- A new GitHub-friendly destination folder:
  - `reports/risk_scoring/latest_test/`
- A Markdown report summarizing the latest run:
  - `reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md`
- Copies of the core tables/JSON artifacts needed for review and light sharing:
  - `dashboard_payload_test.json` (API-ready payload)
  - `dashboard_kpis_test.json` (KPI/provenance summary)
  - `dashboard_alert_queue_latest_day_test.csv`
  - `dashboard_top_sites_latest_day_test.csv`
  - `dashboard_top_lines_latest_day_test.csv`
  - `dashboard_daily_trend_last14_test.csv`
  - `DASHBOARD_EXPORTS.json`, `LATEST_TEST.json` (provenance pointers)
- Copies of the key PNG figures into:
  - `reports/risk_scoring/latest_test/figures/`

**Confirmed outputs from your run:**
- ✅ Wrote the report:
  - `reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md`
- ✅ Snapshot folder contents now include:
  - `DASHBOARD_EXPORTS.json`
  - `DASHBOARD_REPORT_TEST.md`
  - `LATEST_TEST.json`
  - `dashboard_alert_queue_latest_day_test.csv`
  - `dashboard_daily_trend_last14_test.csv`
  - `dashboard_kpis_test.json`
  - `dashboard_payload_test.json`
  - `dashboard_top_lines_latest_day_test.csv`
  - `dashboard_top_sites_latest_day_test.csv`
- ✅ Figures copied:
  - `figures/alert_queue_latest_day_test.png`
  - `figures/daily_positive_hours_test.png`
  - `figures/daily_risk_trend_test.png`
  - `figures/top_sites_latest_day_test.png`

**Why this matters:**
- This creates a **clean “report artifact” set** suitable for GitHub review, hiring managers, or stakeholders, while keeping the large raw/processed datasets out of version control (consistent with your `.gitignore`).
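
The copy step described above can be sketched as an allow-list copy. This is a minimal illustration, not the cell's actual code: `publish_snapshot` is a hypothetical helper name, and the demo runs on throwaway temporary folders rather than the real `data/` and `reports/` trees.

```python
import shutil
import tempfile
from pathlib import Path


def publish_snapshot(src_dir: Path, dst_dir: Path, names: list) -> tuple:
    """Copy only allow-listed files that exist in src_dir; report the rest as skipped.

    Hypothetical helper: the name and return shape are assumptions for illustration.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    for name in names:
        src = src_dir / name
        if src.is_file():
            # copy2 preserves timestamps, which keeps the snapshot traceable
            copied.append(Path(shutil.copy2(src, dst_dir / name)))
        else:
            skipped.append(name)
    return copied, skipped


# Demo on temporary folders (file contents are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "latest_test"
    dst = Path(tmp) / "reports" / "latest_test"
    src.mkdir()
    (src / "dashboard_kpis_test.json").write_text("{}")
    copied, skipped = publish_snapshot(
        src, dst, ["dashboard_kpis_test.json", "dashboard_payload_test.json"]
    )
    demo_copied = [p.name for p in copied]
    demo_skipped = list(skipped)

print("copied :", demo_copied)
print("skipped:", demo_skipped)
```

Copying from an explicit list of names, rather than globbing the whole source folder, is what keeps large artifacts from leaking into the committed snapshot by accident.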


In [15]:
#============================================================
# Cell 10 — GitHub-friendly reports index + EXPORTS pointers (TEST)
#   - Create a small README index under reports/ so GitHub viewers can navigate outputs
#   - Update EXPORTS.json (repo root) with links to the latest TEST dashboard report snapshot
#   - Update the run manifest (DASHBOARD_EXPORTS.json) with the report locations
#============================================================

import json
from pathlib import Path
from datetime import datetime, timezone

# -----------------------------
# 0) Resolve paths robustly (works after kernel restart)
# -----------------------------
ROOT = Path(globals().get("ROOT", Path.cwd())).resolve()

# Preferred: variables created in earlier cells
EXPORT_DIR = Path(globals().get("EXPORT_DIR", "")) if globals().get("EXPORT_DIR") else None
LATEST_DIR = Path(globals().get("LATEST_DIR", "")) if globals().get("LATEST_DIR") else None

# Fallbacks if the kernel was restarted and variables are missing
RS_DIR = Path(globals().get("RS_DIR", ROOT / "data" / "processed" / "risk_scoring")).resolve()

if LATEST_DIR is None or not LATEST_DIR.exists():
    LATEST_DIR = (RS_DIR / "latest_test").resolve()

if EXPORT_DIR is None or not EXPORT_DIR.exists():
    # Use pointer JSON in latest_test to locate the last export_run_id
    pointer = LATEST_DIR / "LATEST_TEST.json"
    if pointer.exists():
        try:
            ptr = json.loads(pointer.read_text())
            # common patterns: ptr["export_run_id"] or ptr["export_dir"]
            if isinstance(ptr, dict) and "export_dir" in ptr:
                EXPORT_DIR = Path(ptr["export_dir"]).resolve()
            elif isinstance(ptr, dict) and "export_run_id" in ptr:
                EXPORT_DIR = (RS_DIR / str(ptr["export_run_id"])).resolve()
        except Exception:
            EXPORT_DIR = None

# Final sanity checks
REPORTS_DIR = (ROOT / "reports" / "risk_scoring" / "latest_test").resolve()
REPORT_MD = REPORTS_DIR / "DASHBOARD_REPORT_TEST.md"
RUN_MANIFEST = (EXPORT_DIR / "DASHBOARD_EXPORTS.json").resolve() if EXPORT_DIR else None
ROOT_EXPORTS = (ROOT / "EXPORTS.json").resolve()

assert REPORTS_DIR.exists(), f"Missing reports snapshot dir: {REPORTS_DIR}"
assert REPORT_MD.exists(), f"Missing report markdown: {REPORT_MD}"
assert LATEST_DIR.exists(), f"Missing latest_test dir: {LATEST_DIR}"
if RUN_MANIFEST:
    assert RUN_MANIFEST.exists(), f"Missing run manifest: {RUN_MANIFEST}"

print("Resolved paths:")
print("  ROOT       :", ROOT)
print("  RS_DIR      :", RS_DIR)
print("  EXPORT_DIR  :", EXPORT_DIR if EXPORT_DIR else "(unknown)")
print("  LATEST_DIR  :", LATEST_DIR)
print("  REPORTS_DIR :", REPORTS_DIR)
print("  REPORT_MD   :", REPORT_MD)

# -----------------------------
# 1) Write a lightweight reports index README (GitHub navigation helper)
# -----------------------------
reports_index = REPORTS_DIR / "README.md"

# Keep it short and “GitHub-first”: explain what’s inside + link files relative to this folder.
index_lines = [
    "# Risk Scoring Dashboard — Latest TEST Snapshot",
    "",
    "This folder contains a **git-friendly snapshot** of the latest TEST exports from the panel risk-scoring pipeline.",
    "It is intentionally small so it can be reviewed on GitHub without committing the full processed dataset.",
    "",
    "## Quick links",
    f"- **Dashboard Report (Markdown):** `{REPORT_MD.name}`",
    # The remaining entries have no placeholders, so plain strings are enough.
    "- **Alert queue (latest day):** `dashboard_alert_queue_latest_day_test.csv`",
    "- **Top sites (latest day):** `dashboard_top_sites_latest_day_test.csv`",
    "- **Top lines (latest day):** `dashboard_top_lines_latest_day_test.csv`",
    "- **Daily trend (last 14 days):** `dashboard_daily_trend_last14_test.csv`",
    "- **KPIs / provenance:** `dashboard_kpis_test.json`",
    "- **API-ready payload:** `dashboard_payload_test.json`",
    "",
    "## Figures",
    "- `figures/daily_risk_trend_test.png`",
    "- `figures/daily_positive_hours_test.png`",
    "- `figures/alert_queue_latest_day_test.png`",
    "- `figures/top_sites_latest_day_test.png`",
    "",
    "## Notes",
    "- TEST outputs are produced from a grouped-by-asset split and a Top-5 assets/day alert budget.",
    "- For full raw/processed artifacts, see the run directory under `data/processed/risk_scoring/<export_run_id>/` (not committed).",
]
reports_index.write_text("\n".join(index_lines))
print("\nWrote reports index:", reports_index)

# -----------------------------
# 2) Utility: safe JSON loader that tolerates old/empty files
# -----------------------------
def safe_load_json(path: Path):
    if not path.exists():
        return {}
    try:
        obj = json.loads(path.read_text())
        # We expect dicts for manifests/exports; if something else, wrap it safely.
        if isinstance(obj, dict):
            return obj
        return {"_raw": obj}
    except Exception:
        return {}

def safe_write_json(path: Path, obj: dict):
    path.write_text(json.dumps(obj, indent=2, sort_keys=False))

# -----------------------------
# 3) Update repo-level EXPORTS.json with report pointers (GitHub-safe)
# -----------------------------
exports = safe_load_json(ROOT_EXPORTS)
exports.setdefault("risk_scoring", {})
exports["risk_scoring"].setdefault("latest_test", {})

# Store paths relative to repo root when possible (reads better on GitHub)
def rel_to_root(p: Path) -> str:
    try:
        return str(p.resolve().relative_to(ROOT))
    except Exception:
        return str(p)

exports["risk_scoring"]["latest_test"].update({
    "reports_dir": rel_to_root(REPORTS_DIR),
    "report_md": rel_to_root(REPORT_MD),
    "reports_index_md": rel_to_root(reports_index),
    "latest_published_dir": rel_to_root(LATEST_DIR),
    "updated_utc": datetime.now(timezone.utc).isoformat(),
})

safe_write_json(ROOT_EXPORTS, exports)
print("Updated repo exports index:", ROOT_EXPORTS)

# -----------------------------
# 4) Update run manifest with report snapshot pointers (nice provenance)
# -----------------------------
if RUN_MANIFEST:
    manifest = safe_load_json(RUN_MANIFEST)

    # safe_load_json() always returns a dict, so no extra type check is needed here.

    manifest.setdefault("reports", {})
    manifest["reports"].update({
        "report_snapshot_dir": str(REPORTS_DIR),
        "dashboard_report_test_md": str(REPORT_MD),
        "report_index_md": str(reports_index),
        "published_latest_dir": str(LATEST_DIR),
        "updated_utc": datetime.now(timezone.utc).isoformat(),
    })

    safe_write_json(RUN_MANIFEST, manifest)
    print("Updated run manifest with report pointers:", RUN_MANIFEST)

# -----------------------------
# 5) Quick tail preview (helps confirm in-notebook)
# -----------------------------
print("\nPreview: reports index (tail)\n")
print("\n".join(reports_index.read_text().splitlines()[-15:]))


Resolved paths:
  ROOT       : /home/parallels/projects/gmp-packaging-risk-analytics
  RS_DIR      : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring
  EXPORT_DIR  : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z
  LATEST_DIR  : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test
  REPORTS_DIR : /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test
  REPORT_MD   : /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md

Wrote reports index: /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test/README.md
Updated repo exports index: /home/parallels/projects/gmp-packaging-risk-analytics/EXPORTS.json
Updated run manifest with report pointers: /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z/DASHB

### What Cell 10 Just Did — GitHub-friendly report index + export pointers (TEST)

This cell **finalized the “human-readable / GitHub-readable” layer** of the risk-scoring outputs by creating a stable navigation entrypoint and wiring it into the repo-level export index.

**Key actions and outputs:**
- **Resolved all key paths** (repo root, `risk_scoring` folder, run export directory, latest published directory, and report snapshot directory), confirming that the run being indexed is:
  - `EXPORT_DIR`: `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/20251220T223649Z`
  - `LATEST_DIR`: `/home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test`
  - `REPORTS_DIR`: `/home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test`
  - `REPORT_MD`: `.../reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md`

- **Wrote a GitHub-friendly README index** inside the report snapshot folder:
  - `reports/risk_scoring/latest_test/README.md`
  This README provides quick links to the report, CSV outputs, and figure files so someone browsing the repo can understand and navigate the deliverables without needing the full processed dataset.

- **Updated the repo-wide exports registry** to include pointers for the “latest TEST” dashboard snapshot:
  - `EXPORTS.json`
  This makes the dashboard report discoverable from a single canonical index at the repo root.

- **Updated the run manifest** to include report snapshot locations and publication pointers (provenance / traceability):
  - `data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json`

**Why this matters:**
- We now have a clean separation between:
  - **large, non-committed artifacts** in `data/processed/...`, and
  - **small, GitHub-friendly “report snapshots”** in `reports/...` that are easy to review and share.
- Anyone reviewing the repo can start at `reports/risk_scoring/latest_test/README.md` and follow links to the report, figures, and tables.

**Saved/Updated artifacts (confirmed by the cell output):**
- Created: `reports/risk_scoring/latest_test/README.md`
- Updated: `EXPORTS.json`
- Updated: `data/processed/risk_scoring/20251220T223649Z/DASHBOARD_EXPORTS.json`
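
To show how a consumer might follow these pointers, here is a minimal sketch that resolves the report path from `EXPORTS.json`. The keys `risk_scoring`, `latest_test`, and `report_md` are the ones written by Cell 10; `resolve_report` is a hypothetical helper name, and the demo uses a temporary folder in place of the real repo.

```python
import json
import tempfile
from pathlib import Path


def resolve_report(root: Path, section: str = "risk_scoring", slot: str = "latest_test"):
    """Return the report path recorded in EXPORTS.json, or None if it is absent.

    Hypothetical helper: it assumes the nested layout written by Cell 10.
    """
    exports_path = root / "EXPORTS.json"
    if not exports_path.exists():
        return None
    entry = json.loads(exports_path.read_text()).get(section, {}).get(slot, {})
    rel = entry.get("report_md")
    # Joining with an absolute path simply yields that absolute path, so the
    # fallback branch of rel_to_root() is handled as well.
    return (root / rel) if rel else None


# Demo with a throwaway repo root
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pointer = {
        "risk_scoring": {
            "latest_test": {
                "report_md": "reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md"
            }
        }
    }
    (root / "EXPORTS.json").write_text(json.dumps(pointer, indent=2))
    resolved = resolve_report(root)
    demo_rel = resolved.relative_to(root).as_posix()
    demo_missing = resolve_report(root, slot="latest_prod")  # slot not present

print(demo_rel)
print(demo_missing)
```

Storing the paths relative to the repo root is what makes the index portable across machines and clones.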


In [16]:
#============================================================
# Cell 11 — Wrap-up checklist: verify deliverables + print “what to commit” (GitHub-safe pointers)
#============================================================

from pathlib import Path
import json

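# Reuse ROOT from earlier cells when available; the working directory is only a fallback
ROOT = Path(globals().get("ROOT", Path.cwd())).resolve()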
RS_DIR = ROOT / "data" / "processed" / "risk_scoring"
REPORTS_DIR = ROOT / "reports" / "risk_scoring" / "latest_test"
LATEST_DIR = RS_DIR / "latest_test"
EXPORTS_JSON = ROOT / "EXPORTS.json"

REPORT_MD = REPORTS_DIR / "DASHBOARD_REPORT_TEST.md"
REPORT_INDEX = REPORTS_DIR / "README.md"
PAYLOAD = LATEST_DIR / "dashboard_payload_test.json"
KPIS = LATEST_DIR / "dashboard_kpis_test.json"
LATEST_POINTER = LATEST_DIR / "LATEST_TEST.json"
RUN_MANIFEST = None  # best-effort: discover from LATEST_TEST.json if present

print("Resolved paths:")
print("  ROOT       :", ROOT)
print("  RS_DIR     :", RS_DIR)
print("  LATEST_DIR :", LATEST_DIR)
print("  REPORTS_DIR:", REPORTS_DIR)

# -----------------------------
# 1) Existence checks (fail fast if something important is missing)
# -----------------------------
must_exist = [
    REPORTS_DIR,
    REPORT_MD,
    REPORT_INDEX,
    LATEST_DIR,
    PAYLOAD,
    KPIS,
    LATEST_POINTER,
    EXPORTS_JSON,
]

missing = [p for p in must_exist if not p.exists()]
if missing:
    raise FileNotFoundError("Missing expected deliverables:\n" + "\n".join([f"- {p}" for p in missing]))

# Try to locate the specific run manifest from the LATEST pointer (best-effort)
try:
    latest_meta = json.loads(LATEST_POINTER.read_text())
    run_manifest_path = latest_meta.get("run_manifest_path") or latest_meta.get("run_manifest")  # tolerate key changes
    if run_manifest_path:
        RUN_MANIFEST = Path(run_manifest_path)
except Exception:
    RUN_MANIFEST = None

# -----------------------------
# 2) Size + quick preview (helps ensure we’re not committing huge binaries)
# -----------------------------
def file_size_kb(p: Path) -> float:
    return round(p.stat().st_size / 1024.0, 2)

print("\nDeliverables (sizes):")
for p in [REPORT_MD, REPORT_INDEX, PAYLOAD, KPIS, LATEST_POINTER, EXPORTS_JSON]:
    print(f"  - {p.relative_to(ROOT)}  ({file_size_kb(p)} KB)")

# Show a tiny preview of the payload (API wiring sanity)
payload_obj = json.loads(PAYLOAD.read_text())
print("\nPayload sanity preview:")
print("  keys:", sorted(list(payload_obj.keys()))[:20], "...")
print("  latest_day:", payload_obj.get("latest_day_utc"))
print("  alert_queue rows:", len(payload_obj.get("alert_queue", [])) if isinstance(payload_obj.get("alert_queue"), list) else "n/a")

# -----------------------------
# 3) “What to commit” guidance (print only; you run git commands in terminal)
# -----------------------------
print("\nWhat to commit (recommended):")
print("  - 03_risk_scoring_and_dashboard_exports.ipynb")
print("  - reports/risk_scoring/latest_test/ (README + DASHBOARD_REPORT_TEST.md + figures + CSVs copied for reporting)")
print("  - EXPORTS.json")
print("\nWhat NOT to commit:")
print("  - data/processed/** (large run artifacts; ignored by .gitignore)")

print("\nSuggested git commands:")
print("  git status --porcelain")
print("  git add 03_risk_scoring_and_dashboard_exports.ipynb EXPORTS.json reports/risk_scoring/latest_test/")
print("  git commit -m \"Add risk scoring dashboard exports (TEST) + report index\"")
print("  git push")

# -----------------------------
# 4) Optional: write a small commit-helper pointer file in reports/ (GitHub readability)
# -----------------------------
commit_note_path = REPORTS_DIR / "COMMIT_NOTES.md"
commit_note = f"""# Commit Notes — Risk Scoring Dashboard (TEST)

This folder contains the **GitHub-friendly snapshot** of the latest TEST dashboard exports.

## Key files
- `README.md` — index of outputs
- `DASHBOARD_REPORT_TEST.md` — narrative report
- `dashboard_payload_test.json` — API-ready payload (also mirrored under `data/processed/risk_scoring/latest_test/`)
- Figures are under `figures/`

## Provenance
- Latest pointer: `{LATEST_POINTER}`
- Panel run dir referenced in payload/KPIs: `{payload_obj.get("panel_run_dir")}`
"""
commit_note_path.write_text(commit_note)
print("\nWrote:", commit_note_path.relative_to(ROOT))


Resolved paths:
  ROOT       : /home/parallels/projects/gmp-packaging-risk-analytics
  RS_DIR     : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring
  LATEST_DIR : /home/parallels/projects/gmp-packaging-risk-analytics/data/processed/risk_scoring/latest_test
  REPORTS_DIR: /home/parallels/projects/gmp-packaging-risk-analytics/reports/risk_scoring/latest_test

Deliverables (sizes):
  - reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md  (3.26 KB)
  - reports/risk_scoring/latest_test/README.md  (1.13 KB)
  - data/processed/risk_scoring/latest_test/dashboard_payload_test.json  (11.41 KB)
  - data/processed/risk_scoring/latest_test/dashboard_kpis_test.json  (0.54 KB)
  - data/processed/risk_scoring/latest_test/LATEST_TEST.json  (0.28 KB)
  - EXPORTS.json  (2.65 KB)

Payload sanity preview:
  keys: ['alert_queue_latest_day', 'created_utc', 'daily_trend_last14', 'kind', 'kpis', 'latest_day', 'notes', 'published_latest_dir', 'source_export_run_id', 'sour

### What Cell 11 Just Did

**Purpose**
- Performed a final “wrap-up” verification that the **risk scoring TEST dashboard deliverables** exist in the expected locations.
- Printed **GitHub-safe commit guidance** (what to commit vs. what to keep out of git).
- Wrote a small helper note file: `reports/risk_scoring/latest_test/COMMIT_NOTES.md`.

**Key paths resolved**
- **Repo root:** `/home/parallels/projects/gmp-packaging-risk-analytics`
- **Latest TEST exports (data, not committed):** `data/processed/risk_scoring/latest_test/`
- **Latest TEST reports (GitHub-friendly):** `reports/risk_scoring/latest_test/`

**Verified deliverables + sizes**
- `reports/risk_scoring/latest_test/DASHBOARD_REPORT_TEST.md` (~3.26 KB)
- `reports/risk_scoring/latest_test/README.md` (~1.13 KB)
- `data/processed/risk_scoring/latest_test/dashboard_payload_test.json` (~11.41 KB)
- `data/processed/risk_scoring/latest_test/dashboard_kpis_test.json` (~0.54 KB)
- `data/processed/risk_scoring/latest_test/LATEST_TEST.json` (~0.28 KB)
- `EXPORTS.json` (~2.65 KB)

**Important note (payload preview mismatch)**
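- The payload keys show that the schema uses:
  - `latest_day` (not `latest_day_utc`)
  - `alert_queue_latest_day` (not `alert_queue`)
- The preview printed:
  - `latest_day: None`
  - `alert_queue rows: n/a`
- Because `dict.get()` returns `None` for a missing key, both values come from the preview code looking up keys that are not in the payload. They say nothing about the payload contents:
  1) The preview code needs to read `latest_day` and `alert_queue_latest_day` instead.
  2) Whether the payload generator actually populated `latest_day` is still unverified (it needs a small fix if the value turns out to be `None` and you plan to serve this via API).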

A quick sanity check you can run right now:
- `payload["latest_day"]`
- `len(payload["alert_queue_latest_day"])`
- `len(payload["daily_trend_last14"])`
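
The same check can be wrapped in a small function that reads the keys shown in the cell output. This is a sketch: `preview_payload` is a hypothetical name, and the toy payload below is made-up data used only to exercise the function.

```python
def preview_payload(payload: dict) -> dict:
    """Summarize the payload using the key names that appear in the printed key list."""
    queue = payload.get("alert_queue_latest_day")
    trend = payload.get("daily_trend_last14")
    return {
        "latest_day": payload.get("latest_day"),
        # Report None rather than raising when a section is missing or not a list
        "alert_queue_rows": len(queue) if isinstance(queue, list) else None,
        "trend_rows": len(trend) if isinstance(trend, list) else None,
    }


# Toy payload (illustrative values, not results from the run)
toy_payload = {
    "latest_day": "2025-12-11",
    "alert_queue_latest_day": [{"asset_id": "A"}, {"asset_id": "B"}],
    "daily_trend_last14": [{"date_utc": "2025-12-11"}],
}
summary = preview_payload(toy_payload)
empty_summary = preview_payload({})
print(summary)
print(empty_summary)
```

In the notebook, the real file could be checked with `preview_payload(json.loads(PAYLOAD.read_text()))`, where `PAYLOAD` is the path defined in Cell 11.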

**GitHub commit guidance (printed)**
- **Commit:**
  - `03_risk_scoring_and_dashboard_exports.ipynb`
  - `EXPORTS.json`
  - `reports/risk_scoring/latest_test/` (index + report + figures + small CSV copies)
- **Do NOT commit:**
  - `data/processed/**` (large run artifacts; intentionally ignored)

**Suggested terminal commands**
- `git status --porcelain`
- `git add 03_risk_scoring_and_dashboard_exports.ipynb EXPORTS.json reports/risk_scoring/latest_test/`
- `git commit -m "Add risk scoring dashboard exports (TEST) + report index"`
- `git push`


## Notebook Wrap-Up — `03_risk_scoring_and_dashboard_exports.ipynb`

This notebook takes the **Panel (asset-hour) TEST scoring outputs** and converts them into **business-ready dashboard exports** under a strict **alerts budget of Top-5 assets/day**. It also generates lightweight charts, bundles the outputs, publishes a “latest_test” snapshot, and produces a human-readable report for GitHub.
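
As a rough illustration of the alerts budget, the sketch below picks the top-N assets per day from asset-hour scores. It assumes the asset-day score is the peak hourly risk for that asset on that day; the function name, tuple layout, and toy rows are illustrative and not taken from the earlier cells.

```python
import heapq
from collections import defaultdict


def top_n_assets_per_day(rows, n=5):
    """rows: iterable of (date_utc, asset_id, risk) tuples at asset-hour grain.

    Returns {date_utc: [(asset_id, peak_risk), ...]} with at most n assets per day,
    highest risk first. Rolling up by the daily peak is an assumption.
    """
    by_day = defaultdict(dict)
    for day, asset, risk in rows:
        previous = by_day[day].get(asset, float("-inf"))
        by_day[day][asset] = max(previous, risk)
    return {
        day: heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
        for day, scores in sorted(by_day.items())
    }


# Toy asset-hour scores (made-up values)
toy_rows = [
    ("2025-12-10", "A", 0.2),
    ("2025-12-10", "A", 0.9),
    ("2025-12-10", "B", 0.5),
    ("2025-12-10", "C", 0.7),
    ("2025-12-11", "B", 0.4),
]
alerts = top_n_assets_per_day(toy_rows, n=2)
print(alerts)
```

A fixed per-day budget keeps the triage workload constant regardless of how the raw scores are calibrated.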

### What we produced
- **Scored TEST table (asset-hour):** saved as parquet + CSV
- **Alerts under budget:** Top-5 risky assets/day with:
  - **when**: peak-risk hour (`ts_peak`)
  - **why**: top coefficient contributions at that peak hour (push-to-1 vs push-to-0)
- **Dashboard tables:** alert queue, top sites, top lines, daily trend, KPIs
- **Figures:** daily risk trend, positive-hours trend, latest-day alert queue, top sites
- **Bundle:** ZIP containing the dashboard tables + figures
- **Stable publish target:** `data/processed/risk_scoring/latest_test/`
- **GitHub-friendly report target:** `reports/risk_scoring/latest_test/`
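
The "why" column can be illustrated with a small sketch. It assumes that, for a logistic regression, each feature's contribution to the logit is its coefficient multiplied by the preprocessed feature value at the peak hour. The feature names and numbers are hypothetical, and `top_contributions` is not a function from the earlier cells.

```python
def top_contributions(coefs, x_row, names, k=3):
    """Rank features by absolute logit contribution (coefficient * feature value).

    Positive contributions push the prediction toward 1, negative ones toward 0.
    """
    contribs = [(name, c * x) for name, c, x in zip(names, coefs, x_row)]
    ranked = sorted(contribs, key=lambda t: abs(t[1]), reverse=True)[:k]
    return [
        (name, value, "push-to-1" if value > 0 else "push-to-0")
        for name, value in ranked
    ]


# Hypothetical features and values, for illustration only
names = ["vibration_z", "temp_z", "idle_share"]
coefs = [2.0, -1.0, 0.5]
x_peak = [1.5, 2.0, 0.2]
drivers = top_contributions(coefs, x_peak, names, k=2)
print(drivers)
```

Because the contributions are additive on the logit scale, the retained drivers can be read as the largest individual pushes behind the alert, though they do not sum to the full score when features are truncated to the top k.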

### Where the “latest” outputs live
- **API-ready + operational files (not committed):**
  - `data/processed/risk_scoring/latest_test/`
- **GitHub-readable report and index (committed):**
  - `reports/risk_scoring/latest_test/`

### Operational note: payload field check
Before wiring the API endpoint, confirm the payload includes a valid latest day and queue rows:
- `latest_day` should be a date (e.g., `2025-12-11`)
- `alert_queue_latest_day` should be a list with 5 rows

If `latest_day` is `None`, we’ll patch the payload builder to set it from:
- `trend["date_utc"].max()` (preferred), or
- `alert_queue["date_utc"].max()`
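
One possible shape for that patch is sketched below, assuming the trend and alert queue are available as lists of records with ISO-formatted `date_utc` strings (which sort correctly as plain text). `infer_latest_day` is a hypothetical helper, not the payload builder's existing code.

```python
def infer_latest_day(trend_rows, alert_rows, key="date_utc"):
    """Return the latest day from the trend table, falling back to the alert queue.

    Assumes ISO-formatted date strings (YYYY-MM-DD), so max() on strings is valid.
    """
    for rows in (trend_rows, alert_rows):
        days = [r[key] for r in (rows or []) if r.get(key)]
        if days:
            return max(days)
    return None


# Toy records (made-up dates)
trend = [{"date_utc": "2025-12-10"}, {"date_utc": "2025-12-11"}]
queue = [{"date_utc": "2025-12-09"}]

from_trend = infer_latest_day(trend, queue)
from_queue = infer_latest_day([], queue)
from_nothing = infer_latest_day(None, [])
print(from_trend, from_queue, from_nothing)
```

Returning `None` when both sources are empty leaves it to the caller to decide whether a payload without a date should be published at all.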

