## SHAP Explainability (Task 3)

This notebook uses SHAP to explain the **best Task 2 model** for each dataset (Fraud_Data and creditcard):

- Global explanations: SHAP summary plot (top features)
- Local explanations (3 examples):
  - True Positive (TP): correctly detected fraud
  - False Positive (FP): legitimate transaction flagged as fraud
  - False Negative (FN): missed fraud
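
The three local cases map onto cells of the confusion matrix. As a minimal sketch of how one row per case might be selected from true labels and predictions, consider the snippet below. `pick_examples` is a hypothetical helper written for illustration, not the code in `src.modeling.task3_shap`; only the key names (`tp_index`, `fp_index`, `fn_index`) mirror those used later in this notebook, and labels are assumed to be 0 (legitimate) and 1 (fraud).

```python
from typing import Dict, Optional, Sequence


def pick_examples(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Optional[int]]:
    """Return the first TP, FP and FN row index, or None when a case is absent."""
    found: Dict[str, Optional[int]] = {"tp_index": None, "fp_index": None, "fn_index": None}
    for i, (truth, pred) in enumerate(zip(y_true, y_pred)):
        if truth == 1 and pred == 1:
            key = "tp_index"  # fraud, flagged as fraud
        elif truth == 0 and pred == 1:
            key = "fp_index"  # legitimate, flagged as fraud
        elif truth == 1 and pred == 0:
            key = "fn_index"  # fraud, not flagged
        else:
            continue  # true negative: not one of the three cases
        if found[key] is None:
            found[key] = i
    return found


# Toy labels, made up for illustration only
print(pick_examples([0, 1, 0, 1, 1], [0, 1, 1, 0, 1]))
```

A case can legitimately come back as `None` when the explained sample contains no such row, which is why the plotting helper later in the notebook handles a missing index.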

### Prereqs

1) Run Task 2 first so that the trained models exist:

```bash
python -m scripts.task2_train --dataset all
```

2) Install the Task 3 dependencies:

```bash
pip install -r requirements-task3.txt
```
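
Before running the cells below, it may help to confirm that step 1 actually produced something. The following is a rough pre-flight sketch: `models_ready` is a hypothetical helper, and because the file names and formats written by Task 2 are not specified here, it only checks that the models directory (`../models`, as configured in the setup cell) is non-empty.

```python
from pathlib import Path


def models_ready(models_dir: Path) -> bool:
    """Return True if models_dir exists and contains at least one file."""
    return models_dir.is_dir() and any(p.is_file() for p in models_dir.rglob("*"))


# Assumption: the notebook is run from a folder one level below the repo root
if not models_ready(Path("../models")):
    print("No model files found under ../models; run Task 2 training first.")
```

A non-empty directory does not guarantee that the right model for each dataset is present, so treat this as a quick sanity check rather than a validation step.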



In [None]:
from pathlib import Path
import sys

# Ensure repo root is on PYTHONPATH so `import src...` works in Jupyter
sys.path.insert(0, str(Path("..").resolve()))

import shap

from src.modeling.task3_shap import Task3Paths, explain_task3

RAW_DIR = Path("../data/raw")
REPORTS_DIR = Path("../reports")
MODELS_DIR = Path("../models")

paths = Task3Paths(raw_dir=RAW_DIR, reports_dir=REPORTS_DIR, models_dir=MODELS_DIR)

shap.initjs()



In [None]:
# Explain the best model for Fraud_Data

res_fraud = explain_task3(dataset="fraud", paths=paths)
res_fraud["model_name"], res_fraud["examples"], res_fraud["n_test_sample_explained"]



In [None]:
# Global explanation: SHAP summary (Fraud_Data)

shap.plots.beeswarm(res_fraud["shap_values"], max_display=15)



In [None]:
# Local explanations (Fraud_Data): TP / FP / FN

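def show_case(res, idx, title):
    """Plot a SHAP waterfall for one explained row, or report that the case is missing."""
    if idx is None:
        # No row of this type was found among the explained test rows
        print(f"{title}: not found in the explained sample (try increasing explain_size)")
        return
    print(title)
    # Waterfall plot (works in most environments)
    shap.plots.waterfall(res["shap_values"][idx], max_display=15)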

show_case(res_fraud, res_fraud["examples"]["tp_index"], "True Positive (fraud correctly flagged)")
show_case(res_fraud, res_fraud["examples"]["fp_index"], "False Positive (legitimate flagged)")
show_case(res_fraud, res_fraud["examples"]["fn_index"], "False Negative (missed fraud)")



In [None]:
# Explain the best model for creditcard

res_cc = explain_task3(dataset="creditcard", paths=paths)
res_cc["model_name"], res_cc["examples"], res_cc["n_test_sample_explained"]



In [None]:
# Global explanation: SHAP summary (creditcard)

shap.plots.beeswarm(res_cc["shap_values"], max_display=15)



In [None]:
# Local explanations (creditcard): TP / FP / FN

show_case(res_cc, res_cc["examples"]["tp_index"], "True Positive (fraud correctly flagged)")
show_case(res_cc, res_cc["examples"]["fp_index"], "False Positive (legitimate flagged)")
show_case(res_cc, res_cc["examples"]["fn_index"], "False Negative (missed fraud)")

