<a href="https://colab.research.google.com/github/jintubhuyan-2000/ML-XAI_ForestFire/blob/main/Forest_TemporalSplit__FinalRevised2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install gdown --quiet

# Use gdown to download the folder
!gdown "https://drive.google.com/drive/u/0/folders/1ZPvCvLcy68RGEFKIA1elHrDdzMSXKZ8a" --folder

Retrieving folder contents
Processing file 1q0JYs5vwwDkEN3gVQBc4-7oAKGx6J189 AccuracyStats_2025_grassland.csv
Processing file 1wgWqbfJ5_qMgH2TgqbIRfgBseKvpUFJ2 ConfusionMatrix_2025_grassland.csv
Processing file 1CYhNkpQr_9q8G947aTYd41X26B_yzGh1 FireProbStats_PerDistrict_grassland.csv
Processing file 17GP1DDg1USLDjZAv-BBH-QSJlQjzAPSu RF_RegionTransfer_Test_grassland.csv
Processing file 1Y08p-xgwuWZNR20Mx2gbB27MZNoUZH_w RF_SpatialCV_Test_grassland.csv
Processing file 19xIgV46X3CI7f1txivigEZO7yioXOFxF RF_TemporalSplit_Test2025_grassland.csv
Processing file 1I4JLP1qu5AuP80uHISDV_asbihnH9pYr TestPoints_Predictors_grassland.csv
Processing file 1drsd-aZRdiKnNblLniHCuN7sGdc69dFG TrainPoints_Predictors_grassland.csv
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1q0JYs5vwwDkEN3gVQBc4-7oAKGx6J189
To: /content/Validation Datasets/AccuracyStats_2025_grassland.csv
100% 127/127 [00:00<00:00,

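The evaluation cell that follows resamples the test predictions with a stratified bootstrap and summarizes each metric as mean, SD, and `1.96 * SD / sqrt(K)`. The sketch below is a minimal numpy-only illustration on toy data, showing how those numbers relate. The labels, the scores, the helper names `stratified_bootstrap_indices` and `brier`, and the choice of 200 replicates are all assumptions made for illustration, not values from the notebook. The SD across bootstrap replicates is itself an estimate of the metric's standard error, so `1.96 * SD / sqrt(K)` describes only how precisely the K runs pin down their own average. A percentile interval over the replicates is one common alternative.

```python
import numpy as np

def stratified_bootstrap_indices(y_true, rng):
    """Resample indices with replacement within each class (class counts stay fixed)."""
    y_true = np.asarray(y_true)
    parts = []
    for cls in np.unique(y_true):
        cls_idx = np.flatnonzero(y_true == cls)
        parts.append(rng.choice(cls_idx, size=cls_idx.size, replace=True))
    return np.concatenate(parts)

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return float(np.mean((np.asarray(y_prob) - np.asarray(y_true)) ** 2))

# Toy data (hypothetical): 80 positives, 20 negatives, noisy scores.
rng = np.random.default_rng(0)
y_true = np.array([1] * 80 + [0] * 20)
y_prob = np.clip(0.3 * y_true + rng.uniform(0.0, 0.7, size=y_true.size), 0.0, 1.0)

n_runs = 200  # assumption: more replicates than the notebook's K_RUNS = 10
scores = np.empty(n_runs)
for i in range(n_runs):
    idx = stratified_bootstrap_indices(y_true, rng)
    scores[i] = brier(y_true[idx], y_prob[idx])

mean = scores.mean()
sd = scores.std(ddof=1)                            # spread of the metric across replicates
half_width_of_mean = 1.96 * sd / np.sqrt(n_runs)   # precision of the replicate average only
lo, hi = np.percentile(scores, [2.5, 97.5])        # percentile bootstrap interval

print(f"Brier mean={mean:.4f} sd={sd:.4f}")
print(f"1.96*sd/sqrt(K) half-width={half_width_of_mean:.4f}")
print(f"percentile 95% interval=({lo:.4f}, {hi:.4f})")
```

Because resampling is done within each class, the positive rate is identical in every replicate, so its SD across runs is zero by construction.
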
In [None]:
# Wildfire RF Evaluation & SHAP Analysis (multi-run + uncertainty)
# Paste into a Jupyter cell. Requires: pandas, numpy, matplotlib, scikit-learn
# Optional: shap (pip install shap) for SHAP outputs.
# Outputs: metrics CSV (per-run + aggregated), topk CSV, curves, summary text.
# Author: adapted for multi-run uncertainty & reproducibility

import os
from pathlib import Path
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.metrics import (
    roc_auc_score, roc_curve,
    average_precision_score, precision_recall_curve,
    brier_score_loss, f1_score, precision_score, recall_score
)
from sklearn.calibration import calibration_curve

# Optional SHAP
try:
    import shap
    SHAP_AVAILABLE = True
except Exception:
    SHAP_AVAILABLE = False

# ----------------- CONFIG -----------------
search_dirs = [
    r"/content/Validation Datasets",
]
OUT_DIR = Path(r"/content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/forest/Results/Temporal_Split_V5")
OUT_DIR.mkdir(parents=True, exist_ok=True)

# resampling / reproducibility settings
K_RUNS = 10                       # set between 5-10 (user request)
BASE_SEED = 20250908              # document this seed in Methods (changeable)
SAMPLING_METHOD = "stratified_bootstrap"  # options: 'stratified_bootstrap' or 'subsample_no_replacement'
TEST_SAMPLE_FRACTION = 1.0        # for subsampling (if used). For bootstrap keep =1.0

# Settings for curves / interpolation
ROC_FPR_GRID = np.linspace(0,1,200)
PR_RECALL_GRID = np.linspace(0,1,200)
CAL_PROB_BINS = np.linspace(0.0, 1.0, 11)  # 10 bins for reliability diagram
TOPK_PERCENTS = [1,5,10,20,30,40,50]

# Filenames
METRICS_PER_RUN_CSV = OUT_DIR / "metrics_per_run.csv"
METRICS_AGG_CSV = OUT_DIR / "metrics_aggregated_by_stratum.csv"
TOPK_PER_RUN_CSV = OUT_DIR / "topk_per_run.csv"
TOPK_AGG_CSV = OUT_DIR / "topk_aggregated.csv"
SUMMARY_TXT = OUT_DIR / "summary_report_multi_run.txt"

# ----------------- Helpers -----------------
def find_candidate_csv(dirs):
    candidates = []
    for d in dirs:
        for p in Path(d).rglob("*.csv"):
            candidates.append(p)
    if not candidates:
        return None
    # Any file whose name contains "rf_" matches here, so the first such file in rglob order wins.
    prioritized = [p for p in candidates if any(k in p.name.lower() for k in ("test2025", "test_2025", "rf_"))]
    if prioritized:
        return prioritized[0]
    prioritized = [p for p in candidates if any(k in p.name.lower() for k in ("test","2025","regiontransfer","region_transfer"))]
    if prioritized:
        return prioritized[0]
    return candidates[0]

def topk_capture(y_true, y_prob, ks=TOPK_PERCENTS):
    out = []
    order = np.argsort(-y_prob)
    y_true_sorted = np.array(y_true)[order]
    total_pos = float(y_true_sorted.sum())
    n = len(y_true_sorted)
    for k in ks:
        frac = k / 100.0
        top_n = max(1, int(math.ceil(n * frac)))
        captured = int(y_true_sorted[:top_n].sum())
        capture_rate = (captured / total_pos) if total_pos > 0 else np.nan
        out.append({
            "top_%": k,
            "top_n": top_n,
            "pos_captured_count": captured,
            "pos_captured_frac": capture_rate
        })
    return pd.DataFrame(out)

def interpolate_curve(x, y, x_grid):
    # simple interpolation (assumes x sorted ascending)
    # clamp to valid range
    xp = np.clip(x, 0.0, 1.0)
    yp = np.clip(y, 0.0, 1.0)
    # remove duplicated x for interpolation
    xp_unique, idx = np.unique(xp, return_index=True)
    yp_unique = yp[idx]
    # If too few unique points, fallback
    if len(xp_unique) < 2:
        return np.full_like(x_grid, yp_unique[0] if len(yp_unique)>0 else np.nan)
    return np.interp(x_grid, xp_unique, yp_unique)

# ----------------- Locate CSV -----------------
csv_path = find_candidate_csv(search_dirs)
if csv_path is None:
    raise FileNotFoundError(f"No CSV found in {search_dirs}. Place your exported CSV(s) in one of those folders.")
print("Using CSV:", csv_path)

# ----------------- Load & detect columns -----------------
df_orig = pd.read_csv(csv_path)
cols = list(df_orig.columns)
y_true_col = None
y_prob_col = None
# heuristics for truth and prob columns
for c in cols:
    lc = c.lower().strip()
    if lc in ('class','label','y','ground_truth','is_fire','fire') and y_true_col is None:
        y_true_col = c
    if lc in ('classification','probability','prob','pred','pred_prob','probability_1') and y_prob_col is None:
        y_prob_col = c
# fallback heuristics
if y_true_col is None:
    for c in cols:
        if 'class' in c.lower() or c.lower().startswith('label'):
            y_true_col = c
            break
if y_prob_col is None:
    for c in cols:
        if any(k in c.lower() for k in ('prob','classification','pred')):
            y_prob_col = c
            break

if y_true_col is None or y_prob_col is None:
    raise ValueError(f"Could not auto-detect required columns. Found: {cols}\nExpected a 'class' (ground truth) and 'classification'/'probability' column.")
print("Detected columns -> label:", y_true_col, ", prob:", y_prob_col)

# optional stratum column detection (forest / grassland)
stratum_col = None
for candidate in ['landcover','land_cover','lc','class_name','stratum','habitat','lcc']:
    if candidate in [c.lower() for c in cols]:
        # pick original-case column name
        for c in cols:
            if c.lower()==candidate:
                stratum_col = c
                break
        break
if stratum_col:
    print("Detected stratum column:", stratum_col)
else:
    print("No stratum column detected - will evaluate overall only. If you have 'forest'/'grassland' values, add as 'landcover' or 'class_name'.")

# ----------------- Data cleaning -----------------
df_orig = df_orig.dropna(subset=[y_true_col, y_prob_col]).copy()
df_orig['y_true'] = df_orig[y_true_col].astype(int)
df_orig['y_prob'] = pd.to_numeric(df_orig[y_prob_col], errors='coerce').clip(0,1)
df_orig = df_orig.dropna(subset=['y_prob']).reset_index(drop=True)
n_total = len(df_orig)
pos_rate_total = df_orig['y_true'].mean()

# ----------------- Multi-run evaluation -----------------
rng = np.random.default_rng(BASE_SEED)
seeds = [int(r) for r in rng.integers(0, 2**31-1, size=K_RUNS)]
print(f"Running {K_RUNS} runs with seeds: {seeds}")

metrics_rows = []
topk_rows = []

# We'll store per-run interpolated curves for averaging
roc_tpr_matrix = np.zeros((K_RUNS, len(ROC_FPR_GRID)))
pr_prec_matrix = np.zeros((K_RUNS, len(PR_RECALL_GRID)))
cal_bin_obs_matrix = np.zeros((K_RUNS, len(CAL_PROB_BINS)-1))  # one less than edges
topk_matrix = np.zeros((K_RUNS, len(TOPK_PERCENTS)))

for run_idx, seed in enumerate(seeds):
    np.random.seed(seed)
    if SAMPLING_METHOD == 'stratified_bootstrap':
        # resample indices stratified by y_true
        idx_list = []
        for cls in df_orig['y_true'].unique():
            cls_idx = df_orig.index[df_orig['y_true']==cls].tolist()
            if len(cls_idx)==0:
                continue
            # sample with replacement same size as original class count
            s = np.random.choice(cls_idx, size=len(cls_idx), replace=True)
            idx_list.extend(s.tolist())
        sampled_idx = np.array(idx_list)
    elif SAMPLING_METHOD == 'subsample_no_replacement':
        # random subsample without replacement
        n_take = int(math.ceil(n_total * TEST_SAMPLE_FRACTION))
        sampled_idx = np.random.choice(df_orig.index, size=n_take, replace=False)
    else:
        raise ValueError("Unknown SAMPLING_METHOD. Use 'stratified_bootstrap' or 'subsample_no_replacement'.")

    df = df_orig.loc[sampled_idx].reset_index(drop=True)

    # compute scalar metrics
    try:
        roc_auc = roc_auc_score(df['y_true'], df['y_prob'])
    except Exception:
        roc_auc = np.nan
    try:
        pr_auc = average_precision_score(df['y_true'], df['y_prob'])
    except Exception:
        pr_auc = np.nan
    brier = brier_score_loss(df['y_true'], df['y_prob'])
    # compute threshold-based metrics at chosen threshold (e.g., 0.5 or choose optimal)
    thresh = 0.5
    y_pred = (df['y_prob'] >= thresh).astype(int)
    f1 = f1_score(df['y_true'], y_pred, zero_division=0)
    prec = precision_score(df['y_true'], y_pred, zero_division=0)
    rec = recall_score(df['y_true'], y_pred, zero_division=0)

    # Save metrics
    metrics_rows.append({
        "run": run_idx,
        "seed": seed,
        "n_samples": len(df),
        "positive_rate": float(df['y_true'].mean()),
        "roc_auc": float(roc_auc),
        "pr_auc": float(pr_auc),
        "brier": float(brier),
        "threshold": thresh,
        "f1_at_0.5": float(f1),
        "precision_at_0.5": float(prec),
        "recall_at_0.5": float(rec)
    })

    # ROC curve -> interpolate TPR on common FPR grid
    try:
        fpr, tpr, _ = roc_curve(df['y_true'], df['y_prob'])
        tpr_interp = interpolate_curve(fpr, tpr, ROC_FPR_GRID)
    except Exception:
        tpr_interp = np.full_like(ROC_FPR_GRID, np.nan)
    roc_tpr_matrix[run_idx,:] = tpr_interp

    # PR curve -> interpolate Precision on common Recall grid
    try:
        precision, recall, _ = precision_recall_curve(df['y_true'], df['y_prob'])
        # precision_recall_curve returns precision values for thresholds; recall is decreasing
        # ensure recall sorted ascending for interpolation
        recall_sort_idx = np.argsort(recall)
        precision_sorted = precision[recall_sort_idx]
        recall_sorted = recall[recall_sort_idx]
        prec_interp = interpolate_curve(recall_sorted, precision_sorted, PR_RECALL_GRID)
    except Exception:
        prec_interp = np.full_like(PR_RECALL_GRID, np.nan)
    pr_prec_matrix[run_idx,:] = prec_interp

    # Calibration reliability -> bin predicted probabilities into CAL_PROB_BINS
    try:
        # Observed positive frequency per uniform bin, computed with pandas.cut
        # so the bins align exactly with the CAL_PROB_BINS edges.
        df['prob_bin'] = pd.cut(df['y_prob'], bins=CAL_PROB_BINS, include_lowest=True, labels=False)
        obs_per_bin = []
        for b in range(len(CAL_PROB_BINS)-1):
            sel = df['prob_bin']==b
            if sel.sum() == 0:
                obs_per_bin.append(np.nan)
            else:
                obs_per_bin.append(df.loc[sel, 'y_true'].mean())
        cal_bin_obs_matrix[run_idx,:] = np.array(obs_per_bin, dtype=float)
    except Exception:
        cal_bin_obs_matrix[run_idx,:] = np.full(len(CAL_PROB_BINS)-1, np.nan)

    # Top-K capture
    try:
        topk_df = topk_capture(df['y_true'].values, df['y_prob'].values, ks=TOPK_PERCENTS)
        for j,k in enumerate(TOPK_PERCENTS):
            topk_matrix[run_idx,j] = topk_df.loc[topk_df['top_%']==k, 'pos_captured_frac'].values[0]
        # record per-run topk in long form
        temp = topk_df.copy()
        temp['run'] = run_idx
        temp['seed'] = seed
        topk_rows.append(temp)
    except Exception:
        topk_matrix[run_idx,:] = np.nan
        topk_rows.append(pd.DataFrame())

# ----------------- Aggregation -----------------
metrics_df = pd.DataFrame(metrics_rows)
metrics_df.to_csv(METRICS_PER_RUN_CSV, index=False)

# Aggregate metrics overall (mean ± sd, and 95% CI) and by stratum if available
def agg_stats(series):
    mean = np.nanmean(series)
    sd = np.nanstd(series, ddof=1) if np.sum(~np.isnan(series))>1 else np.nan
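    # 95% half-width via normal approximation (z = 1.96) of the standard error of the mean across runs.
    # Note: with K_RUNS = 10, a t critical value (about 2.26 for 9 degrees of freedom) would give a wider interval.
    n = np.sum(~np.isnan(series))
    se = sd / math.sqrt(n) if n>0 and not np.isnan(sd) else np.nan
    ci95 = 1.96 * se  # stays NaN when se is NaN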
    return mean, sd, ci95

agg_rows = []
metrics_to_agg = ["roc_auc","pr_auc","brier","f1_at_0.5","precision_at_0.5","recall_at_0.5","positive_rate"]
for metric in metrics_to_agg:
    mean, sd, ci95 = agg_stats(metrics_df[metric].values)
    agg_rows.append({
        "stratum": "ALL",
        "metric": metric,
        "mean": mean,
        "sd": sd,
        "ci95": ci95
    })

# If stratum col exists, perform per-stratum aggregation by running the same evaluation across runs restricted to stratum.
if stratum_col:
    strata = df_orig[stratum_col].dropna().unique().tolist()
    for stratum_val in strata:
        # For each run, recompute metric restricted to stratum samples from that run's sampled indices.
        # Simplest: loop again over seeds and compute metrics per stratum (cheaper copies).
        per_stratum_metrics = []
        for run_idx, seed in enumerate(seeds):
            np.random.seed(seed)
            if SAMPLING_METHOD == 'stratified_bootstrap':
                idx_list = []
                for cls in df_orig['y_true'].unique():
                    cls_idx = df_orig.index[df_orig['y_true']==cls].tolist()
                    if len(cls_idx)==0:
                        continue
                    s = np.random.choice(cls_idx, size=len(cls_idx), replace=True)
                    idx_list.extend(s.tolist())
                sampled_idx = np.array(idx_list)
            else:
                n_take = int(math.ceil(n_total * TEST_SAMPLE_FRACTION))
                sampled_idx = np.random.choice(df_orig.index, size=n_take, replace=False)
            df_run = df_orig.loc[sampled_idx]
            df_run_stratum = df_run[df_run[stratum_col]==stratum_val]
            if len(df_run_stratum)==0:
                per_stratum_metrics.append({m: np.nan for m in metrics_to_agg})
                continue
            try:
                roc_auc = roc_auc_score(df_run_stratum['y_true'], df_run_stratum['y_prob'])
            except Exception:
                roc_auc = np.nan
            try:
                pr_auc = average_precision_score(df_run_stratum['y_true'], df_run_stratum['y_prob'])
            except Exception:
                pr_auc = np.nan
            brier = brier_score_loss(df_run_stratum['y_true'], df_run_stratum['y_prob'])
            thresh = 0.5
            y_pred = (df_run_stratum['y_prob']>=thresh).astype(int)
            f1 = f1_score(df_run_stratum['y_true'], y_pred, zero_division=0)
            prec = precision_score(df_run_stratum['y_true'], y_pred, zero_division=0)
            rec = recall_score(df_run_stratum['y_true'], y_pred, zero_division=0)
            per_stratum_metrics.append({
                "roc_auc": roc_auc, "pr_auc": pr_auc, "brier": brier,
                "f1_at_0.5": f1, "precision_at_0.5": prec, "recall_at_0.5": rec,
                "positive_rate": float(df_run_stratum['y_true'].mean())
            })
        per_stratum_df = pd.DataFrame(per_stratum_metrics)
        for metric in metrics_to_agg:
            mean, sd, ci95 = agg_stats(per_stratum_df[metric].values)
            agg_rows.append({
                "stratum": str(stratum_val),
                "metric": metric,
                "mean": mean,
                "sd": sd,
                "ci95": ci95
            })

agg_df = pd.DataFrame(agg_rows)
agg_df.to_csv(METRICS_AGG_CSV, index=False)

# ----------------- Top-K aggregation -----------------
non_empty_topk = [t for t in topk_rows if not t.empty]  # skip runs whose Top-K step failed
topk_all = pd.concat(non_empty_topk, ignore_index=True, sort=False) if non_empty_topk else pd.DataFrame()
if not topk_all.empty:
    topk_all.to_csv(TOPK_PER_RUN_CSV, index=False)
# aggregate across runs
topk_agg = []
for j,k in enumerate(TOPK_PERCENTS):
    vals = topk_matrix[:,j]
    mean, sd, ci95 = agg_stats(vals)
    topk_agg.append({"top_%": k, "mean_frac": mean, "sd": sd, "ci95": ci95})
topk_agg_df = pd.DataFrame(topk_agg)
topk_agg_df.to_csv(TOPK_AGG_CSV, index=False)


# ----------------- Plotting with ribbons -----------------
# ROC: mean TPR across runs at fixed FPR grid, with ribbon ±1 SD
mean_tpr = np.nanmean(roc_tpr_matrix, axis=0)
sd_tpr = np.nanstd(roc_tpr_matrix, axis=0, ddof=1)

plt.figure(figsize=(6, 5))
plt.plot(ROC_FPR_GRID, mean_tpr, label=f"Mean ROC (n={K_RUNS})", linewidth=2)
plt.fill_between(ROC_FPR_GRID, mean_tpr - sd_tpr, mean_tpr + sd_tpr, alpha=0.2)
plt.plot([0, 1], [0, 1], linestyle='--', color='gray', label='Random')

# ---- Font size adjustments (18 pt) ----
plt.xlabel('False Positive Rate', fontsize=18)
plt.ylabel('True Positive Rate', fontsize=18)
plt.title('ROC Curve (mean ± SD)', fontsize=18)
plt.legend(loc='lower right', fontsize=18)
plt.grid(alpha=0.3)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)

plt.tight_layout()
roc_path = OUT_DIR / 'roc_curve_mean_sd.png'
plt.savefig(roc_path, dpi=200)
plt.close()
print("Saved ROC with ribbons:", roc_path)


# ----------------- Precision-Recall Curve -----------------
mean_prec = np.nanmean(pr_prec_matrix, axis=0)
sd_prec = np.nanstd(pr_prec_matrix, axis=0, ddof=1)

plt.figure(figsize=(6, 5))
plt.plot(PR_RECALL_GRID, mean_prec, label=f"Mean PR (n={K_RUNS})", linewidth=2)
plt.fill_between(PR_RECALL_GRID, mean_prec - sd_prec, mean_prec + sd_prec, alpha=0.2)

plt.xlabel('Recall', fontsize=18)
plt.ylabel('Precision', fontsize=18)
plt.title('Precision-Recall Curve (mean ± SD)', fontsize=18)
plt.legend(loc='upper right', fontsize=18)
plt.grid(alpha=0.3)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)

plt.tight_layout()
pr_path = OUT_DIR / 'pr_curve_mean_sd.png'
plt.savefig(pr_path, dpi=200)
plt.close()
print("Saved PR with ribbons:", pr_path)


# ----------------- Top-K: Mean Capture Fraction -----------------
mean_topk = np.nanmean(topk_matrix, axis=0)
sd_topk = np.nanstd(topk_matrix, axis=0, ddof=1)

plt.figure(figsize=(6, 4))
plt.plot(TOPK_PERCENTS, mean_topk, marker='o', color='tab:blue', linewidth=1.8,
         label='Mean Top-K capture')
plt.fill_between(TOPK_PERCENTS,
                 mean_topk - sd_topk,
                 mean_topk + sd_topk,
                 color='tab:blue',
                 alpha=0.2,
                 label='± SD')

plt.xlabel('Top-k percent of highest risk area', fontsize=18)
plt.ylabel('Fraction of fires captured', fontsize=18)
plt.title('Top-K Capture (mean ± SD)', fontsize=18)
plt.ylim(0, 1)
plt.xlim(0, 50)
plt.grid(alpha=0.3)
plt.legend(loc='lower right', fontsize=14)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)

plt.tight_layout()
topk_plot_path = OUT_DIR / 'topk_curve_mean_sd.png'
plt.savefig(topk_plot_path, dpi=200)
plt.close()
print("Saved Top-K curve with ribbons:", topk_plot_path)


# ----------------- Reliability Plot -----------------
cal_bin_centers = (CAL_PROB_BINS[:-1] + CAL_PROB_BINS[1:]) / 2.0
mean_bin_obs = np.nanmean(cal_bin_obs_matrix, axis=0)
sd_bin_obs = np.nanstd(cal_bin_obs_matrix, axis=0, ddof=1)

plt.figure(figsize=(6, 5))
plt.plot(cal_bin_centers, mean_bin_obs, marker='o', linewidth=2, label='Mean calibration')
plt.fill_between(cal_bin_centers, mean_bin_obs - sd_bin_obs, mean_bin_obs + sd_bin_obs, alpha=0.2)
plt.plot([0, 1], [0, 1], linestyle='--', color='gray', label='Perfect')

plt.xlabel('Predicted probability (bin center)', fontsize=18)
plt.ylabel('Observed frequency', fontsize=18)
plt.title('Reliability Diagram (mean ± SD)', fontsize=18)
plt.legend(fontsize=18)
plt.grid(alpha=0.3)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)

plt.tight_layout()
rel_path = OUT_DIR / 'reliability_mean_sd.png'
plt.savefig(rel_path, dpi=200)
plt.close()
print("Saved reliability plot with ribbons:", rel_path)


# ----------------- Optional SHAP (single-run sample) - unchanged logic but left optional -----------------
shap_outputs = {}
meta_like = set([y_true_col, y_prob_col, 'y_true','y_prob','system:index','.geo','longitude','lat','latitude','year','quadrant'])
feature_cols = [c for c in df_orig.columns if c not in meta_like and np.issubdtype(df_orig[c].dtype, np.number)]
if SHAP_AVAILABLE and len(feature_cols) >= 2:
    try:
        X = df_orig[feature_cols].copy().fillna(0)
        y = df_orig['y_true'].copy()
        # small train/test split for SHAP model building
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
        rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42, class_weight='balanced_subsample')
        rf.fit(X_train, y_train)
        explainer = shap.TreeExplainer(rf)
        Xs = X_val.sample(min(5000, len(X_val)), random_state=42)
        shap_values = explainer.shap_values(Xs)
        # Reduce to a 2-D (n_samples, n_features) array for the positive class.
        # Depending on the shap version, a binary classifier yields either a list
        # with one array per class or a single 3-D array with a trailing class axis.
        if isinstance(shap_values, list):
            sv_pos = shap_values[1]
        elif np.ndim(shap_values) == 3:
            sv_pos = shap_values[:, :, 1]
        else:
            sv_pos = shap_values
        # Summary bar
        try:
            plt.figure(figsize=(6,6))
            shap.summary_plot(sv_pos, Xs, show=False, plot_type='bar')
            shap_summary_path = OUT_DIR / 'shap_summary_bar.png'
            plt.savefig(shap_summary_path, dpi=200, bbox_inches='tight')
            plt.close()
            shap_outputs['shap_summary'] = str(shap_summary_path)
        except Exception as e:
            print('SHAP summary_plot failed:', e)
        # Dependence for top feature
        try:
            abs_mean = np.mean(np.abs(sv_pos), axis=0)
            top_idx = int(np.argmax(abs_mean))
            top_feat = Xs.columns[top_idx]
            plt.figure(figsize=(6,5))
            shap.dependence_plot(top_feat, sv_pos, Xs, show=False)
            shap_dep_path = OUT_DIR / 'shap_dependence_top_feature.png'
            plt.savefig(shap_dep_path, dpi=200, bbox_inches='tight')
            plt.close()
            shap_outputs['top_feature'] = top_feat
        except Exception as e:
            print('SHAP dependence_plot failed:', e)
    except Exception as e:
        print('SHAP stage failed:', e)
else:
    if not SHAP_AVAILABLE:
        print('SHAP not installed - skip SHAP outputs. To enable, `pip install shap`.')
    else:
        print('Not enough numeric features for SHAP or no features found - skipping SHAP.')

# ----------------- Save textual summary -----------------
report_lines = [
    "Region-transfer / Temporal-style Multi-run Evaluation",
    f"File: {csv_path}",
    f"Total original samples: {n_total}, Overall positive rate: {pos_rate_total:.4f}",
    f"Sampling method: {SAMPLING_METHOD}",
    f"Number of runs (K): {K_RUNS}",
    f"Base seed (document for reproducibility): {BASE_SEED}",
    f"Seeds used: {seeds}",
    "",
    f"Metrics per-run saved to: {METRICS_PER_RUN_CSV}",
    f"Aggregated metrics saved to: {METRICS_AGG_CSV} (mean ± sd ± 95%CI)",
    f"Top-K per-run saved to: {TOPK_PER_RUN_CSV}",
    f"Top-K aggregated saved to: {TOPK_AGG_CSV}",
    f"ROC plot (mean ± SD): {roc_path}",
    f"PR plot (mean ± SD): {pr_path}",
    f"Reliability plot (mean ± SD): {rel_path}",
    f"Top-K plot (mean ± SD): {topk_plot_path}",
]
if shap_outputs.get('top_feature'):
    report_lines.append(f"SHAP top feature (single-run model): {shap_outputs.get('top_feature')}")
with open(SUMMARY_TXT, 'w') as f:
    f.write("\n".join(report_lines))

print("\n".join(report_lines))
print("Outputs written to:", OUT_DIR)


Using CSV: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Validation Datasets/RF_SpatialCV_Test_grassland.csv
Detected columns -> label: class , prob: classification
No stratum column detected - will evaluate overall only. If you have 'forest'/'grassland' values, add as 'landcover' or 'class_name'.
Running 10 runs with seeds: [1281540326, 1005233768, 2011547310, 1367058049, 1542280147, 90017157, 581114276, 1258272007, 2134214070, 1923482161]
Saved ROC with ribbons: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/roc_curve_mean_sd.png
Saved PR with ribbons: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/pr_curve_mean_sd.png
Saved Top-K curve with ribbons: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/topk_curve_mean_sd.png


  mean_bin_obs = np.nanmean(cal_bin_obs_matrix, axis=0)
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


Saved reliability plot with ribbons: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/reliability_mean_sd.png


  shap.summary_plot(shap_values[1] if isinstance(shap_values, list) else shap_values, Xs, show=False, plot_type='bar')
  summary_legacy(
  summary_legacy(


SHAP dependence_plot failed: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 50
Region-transfer / Temporal-style Multi-run Evaluation
File: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Validation Datasets/RF_SpatialCV_Test_grassland.csv
Total original samples: 2462, Overall positive rate: 0.8115
Sampling method: stratified_bootstrap
Number of runs (K): 10
Base seed (document for reproducibility): 20250908
Seeds used: [1281540326, 1005233768, 2011547310, 1367058049, 1542280147, 90017157, 581114276, 1258272007, 2134214070, 1923482161]

Metrics per-run saved to: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/metrics_per_run.csv
Aggregated metrics saved to: /content/drive/MyDrive/California_Fire_MS/Fire_Risk_Validation/grassland/Results/Temporal_Split_V5/metrics_aggregated_by_stratum.csv


If mounting Google Drive fails because the mountpoint already contains files, remove the existing content in the mountpoint before mounting the drive.
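The `SHAP dependence_plot failed` message in the output above reports mismatched sizes of 2 and 50. One plausible reading, which is an assumption because the log does not state the installed shap version, is that `explainer.shap_values` returned a single 3-D array shaped (samples, features, classes) rather than a list with one array per class. In that case the `isinstance(shap_values, list)` check is false and the full 3-D array is passed to the plotting call. The sketch below uses synthetic arrays only (no shap import) and a hypothetical helper, `positive_class_shap`, to show how both layouts can be reduced to one 2-D array before ranking features.

```python
import numpy as np

def positive_class_shap(shap_values, class_index=1):
    """Return a 2-D (n_samples, n_features) array for one class.

    Handles three layouts: a list of per-class arrays, a 3-D array with a
    trailing class axis, or an already 2-D array (returned unchanged).
    """
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        return arr[:, :, class_index]
    return arr

# Synthetic attributions (hypothetical): 4 samples, 50 features, feature 7 dominant.
pos = np.zeros((4, 50))
pos[:, 7] = 0.5
as_list = [-pos, pos]                # list layout: one array per class
as_3d = np.stack(as_list, axis=-1)   # array layout: shape (4, 50, 2)

# Averaging the raw 3-D array over samples leaves a (50, 2) array, not one value per feature.
print(np.mean(np.abs(as_3d), axis=0).shape)

# After normalization both layouts give one importance value per feature.
for values in (as_list, as_3d):
    sv = positive_class_shap(values)
    top_idx = int(np.argmax(np.mean(np.abs(sv), axis=0)))
    print(sv.shape, top_idx)
```
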