
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
CIC-IoMT Option A: Train & evaluate tree baselines (RF, XGBoost) with a leakage-safe protocol.

This script is designed to complement your Option A LR→MLP runs and produce
publication-ready baseline rows for Table (clean ablation) on CIC-IoMT.

Protocol choices:
  1) holdout_split (recommended if you used the author-provided train/test files):
        - Train file is used as the training pool.
        - Holdout (author test) file is partitioned into:
            Val-selection, Val-calibration, and Final held-out Test.
        - Final model is trained on: Train + Val-selection
        - Calibration set is NOT used for training (reserved for probability calibration if you choose).
        - Report metrics on the Final held-out Test only.
  2) combined_threeway:
        - Combine train+holdout into one pool and stratify-split 60/20/20 (Train/Val/Test),
          then split Val into Val-selection and Val-calibration.

Outputs (in --outdir):
  - CIC_IoMT__OptionA__RF__binary_metrics.json
  - CIC_IoMT__OptionA__RF__binary_report.txt
  - CIC_IoMT__OptionA__XGB__binary_metrics.json
  - CIC_IoMT__OptionA__XGB__binary_report.txt
  - CIC_IoMT__OptionA__trees__summary.csv

Usage example (holdout_split):
  python cic_optionA_train_rf_xgb.py \
    --protocol holdout_split \
    --cic-train-csv /content/CIC_IoMT_2024_WiFi_MQTT_train.csv \
    --cic-holdout-csv /content/CIC_IoMT_2024_WiFi_MQTT_test.csv \
    --label-col label \
    --random-state 42 \
    --outdir /content/paper_exports/cic_optionA_trees

If you have optionA_split_audit.json from the LR→MLP Option A run, pass it to exactly
match the split counts used previously:
  --split-audit /content/paper_exports/optionA_figures/optionA_split_audit.json
"""

In [None]:
!pip -q install xgboost

In [None]:
!python -u /content/cic_optionA_train_rf_xgb.py \
  --protocol holdout_split \
  --cic-train-csv "/content/CIC_IoMT_2024_WiFi_MQTT_train.csv" \
  --cic-holdout-csv "/content/CIC_IoMT_2024_WiFi_MQTT_test.csv" \
  --label-col "label" \
  --random-state 42 \
  --outdir "/content/paper_exports/cic_optionA_trees"

In [None]:
!ls -lah /content/paper_exports/cic_optionA_trees
!cat /content/paper_exports/cic_optionA_trees/CIC_IoMT__OptionA__trees__summary.csv
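
As an alternative to `cat`, the summary can be loaded in Python for further processing. The sketch below makes no assumption about the column names in `CIC_IoMT__OptionA__trees__summary.csv`; it prints whatever header the file contains. The helper name `read_summary` is hypothetical and not part of the script.

```python
import csv
from pathlib import Path


def read_summary(csv_path):
    """Return the summary CSV as a list of dicts, one per row.

    Keys come from the file's own header row; values are left as strings.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Summary not found: {path}")
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


summary_path = Path(
    "/content/paper_exports/cic_optionA_trees/CIC_IoMT__OptionA__trees__summary.csv"
)

# Only print if the training cell above has already produced the file.
if summary_path.is_file():
    for row in read_summary(summary_path):
        print(", ".join(f"{key}={value}" for key, value in row.items()))
```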

In [None]:
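import os
import shutil

zip_file_path = "/content/cic_optionA_trees.zip"
directory_to_zip = "/content/paper_exports/cic_optionA_trees"

# Zip only if the export directory exists. make_archive raises on failure,
# so the success message is printed only when the archive was actually written.
if os.path.isdir(directory_to_zip):
    created = shutil.make_archive(
        os.path.splitext(zip_file_path)[0],  # make_archive appends ".zip" itself
        "zip",
        root_dir=os.path.dirname(directory_to_zip),
        base_dir=os.path.basename(directory_to_zip),
    )
    print(f"Successfully created {created}")
    print("You can now download this file from the Colab file browser.")
else:
    print(f"Error: Directory '{directory_to_zip}' not found.")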