
#!/usr/bin/env python3
"""
regen_figures_from_artifacts_v2.py

Purpose
-------
Regenerate publication figures (ROC overlay, PR overlay, Reliability diagram)
for CIC-IoMT **Option A** using the SAME evaluation protocol as your Option A
training summary:

1) Combine the CIC-provided train.csv and test.csv into ONE pool ("original dataset")
2) Stratified split of the pool: Train 60%, Val 20%, Test 20%
3) Split Val 50/50 into Val-selection and Val-calibration (10% of the pool each)
4) Fit isotonic regression ONLY on Val-calibration
5) Apply temperature scaling using the temperature T stored in your meta.json
6) Plot ROC, PR, and reliability curves on the HELD-OUT Test split (20%)

This v2 script fixes the main pitfall that causes AUROC≈0.5:
- Do NOT treat the CIC-provided test file as the evaluation test set. In your
  study the official train/test split is degenerate. Option A avoids it by
  pooling both files and re-splitting them (steps 1 and 2).

Inputs
------
Required:
  --cic-train-csv
  --cic-test-csv
  --cic-label-col
  --cic-pipe-joblib
  --cic-mlp-joblib
  --cic-temp-meta

Optional:
  --cic-drop-cols ... (only if these columns were dropped during training)
  --summary-csv ...   (if provided, the script checks that the split counts match it)

Outputs (in --outdir)
---------------------
  roc_overlay_CIC_OptionA.png
  pr_overlay_CIC_OptionA.png
  reliability_overlay_CIC_OptionA.png
  optionA_split_audit.json
  optionA_plot_metrics.json

Colab usage (example)
---------------------
python regen_figures_from_artifacts_v2.py \
  --cic-train-csv "/content/CIC_train.csv" \
  --cic-test-csv  "/content/CIC_test.csv" \
  --cic-label-col "Label" \
  --cic-pipe-joblib "/content/artifacts/CIC_OptionA_pipe.joblib" \
  --cic-mlp-joblib  "/content/artifacts/CIC_OptionA_mlp.joblib" \
  --cic-temp-meta   "/content/results/CIC_IoMT__OptionA__Calibrated(temperature)__meta.json" \
  --random-state 42 \
  --outdir "/content/paper_exports/optionA_figures"

Notes
-----
- This script defines a SafeNaNDropper stub class so that joblib can unpickle
  the saved pipeline, which references that custom class.
- Uses matplotlib only and does not set explicit colors.
"""
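Steps 1 to 3 of the protocol can be sketched with scikit-learn's `train_test_split`. The sketch chains three splits:

- a 20% stratified test hold-out;
- a 25% cut of the remaining 80% (20% of the pool) for validation;
- a 50/50 split of validation into selection and calibration halves.

This is a minimal illustration on synthetic arrays. The helper name `optionA_split` is hypothetical, and the script's actual call order or seeding may differ. Row assignments from this sketch therefore need not match `optionA_split_audit.json`.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def optionA_split(X, y, random_state=42):
    """Hypothetical sketch of the Option A split on the pooled train+test rows.

    Returns (X, y) pairs: train 60%, val_sel 10%, val_cal 10%, test 20% of the pool.
    """
    # Step 2a: hold out 20% of the pool as the test set, stratified by label
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=random_state
    )
    # Step 2b: 25% of the remaining 80% equals 20% of the pool -> validation
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=random_state
    )
    # Step 3: split validation 50/50 into selection and calibration halves
    X_sel, X_cal, y_sel, y_cal = train_test_split(
        X_val, y_val, test_size=0.50, stratify=y_val, random_state=random_state
    )
    return {
        "train": (X_train, y_train),
        "val_sel": (X_sel, y_sel),
        "val_cal": (X_cal, y_cal),
        "test": (X_test, y_test),
    }


if __name__ == "__main__":
    # Step 1 (illustrative): these synthetic arrays stand in for train.csv + test.csv
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(1000, 5))
    y_pool = np.array([0] * 700 + [1] * 300)
    for name, (_, y_part) in optionA_split(X_pool, y_pool).items():
        print(name, len(y_part), round(float(y_part.mean()), 3))
```

The total split sizes depend only on the pool size and these fractions, not on the seed. That makes them a quick sanity check against the counts the script reports.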

In [None]:
# Confirm the v6 script is in /content (ls reports an error if the file is missing)
!ls -lah /content/regen_figures_from_artifacts_v6.py

In [None]:
# Start from a clean output directory
!rm -rf /content/paper_exports/optionA_figures
!mkdir -p /content/paper_exports/optionA_figures

# Regenerate the figures. -u turns off output buffering so progress appears live;
# tee prints the combined stdout/stderr and also saves it to run.log.
!python -u /content/regen_figures_from_artifacts_v6.py \
  --cic-train-csv "/content/CIC_IoMT_2024_WiFi_MQTT_train.csv" \
  --cic-test-csv  "/content/CIC_IoMT_2024_WiFi_MQTT_test.csv" \
  --cic-label-col "label" \
  --cic-pipe-joblib "/content/CIC_OptionA_pipe.joblib" \
  --cic-mlp-joblib  "/content/CIC_OptionA_mlp.joblib" \
  --cic-temp-meta   "/content/CIC_IoMT__OptionA__Calibrated(temperature)__meta.json" \
  --random-state 42 \
  --outdir "/content/paper_exports/optionA_figures" \
  2>&1 | tee /content/paper_exports/optionA_figures/run.log
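Steps 4 and 5 of the protocol can be sketched as follows:

- isotonic regression fitted only on Val-calibration;
- temperature scaling with the T read from the file passed as `--cic-temp-meta`.

This is a minimal sketch under several assumptions:

- meta.json stores the temperature under a key named `"T"` (hypothetical; check the real file);
- the isotonic calibrator is binary, acting on positive-class probabilities;
- `load_temperature`, `temperature_scale` and `fit_isotonic` are illustrative helpers, not the script's own functions.

Temperature scaling here works directly on `predict_proba` output, because softmax(log p / T) equals softmax(z / T) for any logits z whose softmax is p.

```python
import json

import numpy as np
from sklearn.isotonic import IsotonicRegression


def load_temperature(meta_path, key="T"):
    """Read the fitted temperature from meta.json (the key name is an assumption)."""
    with open(meta_path, "r") as f:
        return float(json.load(f)[key])


def temperature_scale(proba, T, eps=1e-12):
    """Apply temperature T > 0 to an (n_samples, n_classes) probability matrix.

    softmax(log(p) / T) equals softmax(z / T) for any logits z with softmax(z) = p,
    so the raw logits are not needed.
    """
    logp = np.log(np.clip(np.asarray(proba, dtype=float), eps, 1.0))
    z = logp / T
    z -= z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_isotonic(p_val_cal, y_val_cal):
    """Fit a binary isotonic calibrator on the Val-calibration split ONLY."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(np.asarray(p_val_cal, dtype=float), np.asarray(y_val_cal, dtype=float))
    return iso


if __name__ == "__main__":
    # Synthetic stand-ins; real inputs would come from the saved pipe/MLP artifacts
    iso = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.9], [0, 1, 0, 1, 1])
    print(iso.predict([0.15, 0.35, 0.95]))
    print(temperature_scale([[0.2, 0.8]], T=2.0))
```

For T > 0, temperature scaling is strictly increasing in each class score, so for a binary task it leaves the ROC and PR curves unchanged and only moves the reliability diagram. Isotonic regression is merely non-decreasing. It can merge nearby scores into ties and so shift AUROC slightly.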

In [None]:
!ls -lah /content/paper_exports/optionA_figures
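The `ls` listing can also be checked programmatically against the five files named in the docstring's "Outputs (in --outdir)" section. This is a minimal standard-library sketch; `missing_outputs` is a hypothetical helper, not part of the script.

```python
from pathlib import Path

# File names taken from the "Outputs (in --outdir)" list in the script docstring
EXPECTED_OUTPUTS = (
    "roc_overlay_CIC_OptionA.png",
    "pr_overlay_CIC_OptionA.png",
    "reliability_overlay_CIC_OptionA.png",
    "optionA_split_audit.json",
    "optionA_plot_metrics.json",
)


def missing_outputs(outdir, expected=EXPECTED_OUTPUTS):
    """Return the expected output files that are absent or empty in outdir."""
    outdir = Path(outdir)
    return [
        name
        for name in expected
        if not (outdir / name).is_file() or (outdir / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_outputs("/content/paper_exports/optionA_figures")
    print("All outputs present" if not missing else f"Missing or empty: {missing}")
```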

In [None]:
import json, pprint

with open("/content/paper_exports/optionA_figures/optionA_split_audit.json","r") as f:
    audit = json.load(f)

with open("/content/paper_exports/optionA_figures/optionA_plot_metrics.json","r") as f:
    metrics = json.load(f)

print("=== SPLIT AUDIT ===")
pprint.pp(audit)

print("\n=== PLOT METRICS ===")
pprint.pp(metrics)
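The reliability overlay compares predicted confidence with the observed positive rate on the held-out Test split. The sketch below computes equal-width reliability bins and an expected calibration error (ECE) for binary positive-class probabilities.

The 10-bin default and the name `reliability_bins` are assumptions. The script may bin differently or may not report ECE at all, so these numbers are not expected to match `optionA_plot_metrics.json`. scikit-learn's `sklearn.calibration.calibration_curve` returns similar per-bin values but omits empty bins.

```python
import numpy as np


def reliability_bins(p_pos, y_true, n_bins=10):
    """Equal-width reliability bins for binary positive-class probabilities.

    Returns per-bin mean predicted probability, observed positive rate, and count,
    plus the ECE: the count-weighted mean |gap| over non-empty bins.
    Empty bins get NaN for both rates.
    """
    p = np.asarray(p_pos, dtype=float)
    y = np.asarray(y_true, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # Bin i covers [i/n_bins, (i+1)/n_bins); p == 1.0 lands in the last bin
    idx = np.digitize(p, inner_edges)
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=p, minlength=n_bins)
    sum_y = np.bincount(idx, weights=y, minlength=n_bins)

    filled = counts > 0
    mean_p = np.full(n_bins, np.nan)
    frac_pos = np.full(n_bins, np.nan)
    mean_p[filled] = sum_p[filled] / counts[filled]
    frac_pos[filled] = sum_y[filled] / counts[filled]

    gaps = np.abs(mean_p[filled] - frac_pos[filled])
    ece = float(np.sum(counts[filled] * gaps) / counts.sum())
    return mean_p, frac_pos, counts, ece


if __name__ == "__main__":
    # Synthetic, calibrated-by-construction example: y ~ Bernoulli(p)
    rng = np.random.default_rng(0)
    p = rng.uniform(size=5000)
    y = (rng.uniform(size=5000) < p).astype(int)
    print(f"ECE: {reliability_bins(p, y)[3]:.3f}")
```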

In [None]:
from IPython.display import Image, display

display(Image(filename="/content/paper_exports/optionA_figures/roc_overlay_CIC_OptionA.png"))
display(Image(filename="/content/paper_exports/optionA_figures/pr_overlay_CIC_OptionA.png"))
display(Image(filename="/content/paper_exports/optionA_figures/reliability_overlay_CIC_OptionA.png"))
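Step 6 of the protocol overlays ROC and PR curves on the held-out Test split. One way to draw such overlays uses scikit-learn's `roc_curve` and `precision_recall_curve`. Colors are left to matplotlib's default cycle, as the docstring's Notes describe.

This is only a sketch. The following are assumptions rather than the script's actual code:

- the `variants` dict (for example raw and calibrated positive-class probabilities);
- the function names;
- the 300 dpi setting.

```python
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)


def overlay_summary(y_test, variants):
    """AUROC and average precision (AP) for each named probability vector."""
    return {
        name: {
            "auroc": float(roc_auc_score(y_test, p)),
            "ap": float(average_precision_score(y_test, p)),
        }
        for name, p in variants.items()
    }


def plot_roc_pr_overlay(y_test, variants, roc_path, pr_path):
    """Save ROC and PR overlays, leaving colors to matplotlib's default cycle."""
    import matplotlib.pyplot as plt  # deferred so overlay_summary works without matplotlib

    stats = overlay_summary(y_test, variants)

    fig, ax = plt.subplots()
    for name, p in variants.items():
        fpr, tpr, _ = roc_curve(y_test, p)
        ax.plot(fpr, tpr, label=f"{name} (AUROC={stats[name]['auroc']:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.savefig(roc_path, dpi=300, bbox_inches="tight")
    plt.close(fig)

    fig, ax = plt.subplots()
    for name, p in variants.items():
        precision, recall, _ = precision_recall_curve(y_test, p)
        ax.step(recall, precision, where="post", label=f"{name} (AP={stats[name]['ap']:.3f})")
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    ax.legend(loc="lower left")
    fig.savefig(pr_path, dpi=300, bbox_inches="tight")
    plt.close(fig)
```

A hypothetical call would be `plot_roc_pr_overlay(y_test, {"raw": p_raw, "isotonic": p_iso}, "roc.png", "pr.png")`. Here `p_raw` and `p_iso` are placeholder arrays of positive-class probabilities.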