# LIME Report (Local + Global)

This notebook runs the existing `lime_explain.py` script **without modifying it** and displays:
- the local LIME result for one patient (HTML + text)
- the global aggregated LIME summary (CSV + plot)


## Method note (for report)
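We use LIME (Ribeiro et al., 2016) to explain the CatBoost classifier. For each patient, LIME generates perturbed copies of the patient's feature vector (for tabular data, sampled using training‑set feature statistics), queries the classifier on them, and weights each perturbed sample by its proximity to the original patient. It then fits a sparse, interpretable linear surrogate to these weighted predictions. The surrogate's coefficients indicate which features increase or decrease the predicted risk in that neighborhood. Because LIME only needs the model's predictions, it treats CatBoost as a black box and does not inspect its internal tree structure.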
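The perturb, weight, and fit steps can be illustrated with a minimal NumPy sketch. It is not the `lime` package's implementation and not necessarily what `lime_explain.py` does: `black_box_risk` is a hypothetical stand-in for the CatBoost model, perturbations are Gaussian in standardized units, and LIME's feature-selection step (which is what makes the surrogate sparse) is omitted. The kernel has the square-root-exponential shape and default width (0.75 times the square root of the number of features) used by LIME's tabular explainer.

```python
import numpy as np


def black_box_risk(X):
    # Hypothetical stand-in for model.predict_proba(X)[:, 1]; any black box works.
    z = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
    return 1.0 / (1.0 + np.exp(-z))


def lime_style_weights(predict_fn, x, scale, n_samples=5000,
                       kernel_width=None, ridge=1e-3, seed=0):
    """Return (intercept, coefficients) of a locally weighted linear surrogate.

    Coefficients are per one `scale` unit (e.g. one training-set std) of each feature.
    """
    rng = np.random.default_rng(seed)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.size)
    # 1) Perturb: sample points around the instance x.
    offsets = rng.normal(size=(n_samples, x.size))
    Z = x + offsets * scale
    # 2) Query the black box on the perturbed points.
    y = predict_fn(Z)
    # 3) Weight: samples closer to x (in standardized distance) count more.
    d = np.sqrt((offsets ** 2).sum(axis=1))
    w = np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
    # 4) Fit a weighted ridge regression on the standardized offsets.
    A = np.hstack([np.ones((n_samples, 1)), offsets])
    reg = ridge * np.eye(A.shape[1])
    reg[0, 0] = 0.0  # do not penalize the intercept
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + reg, A.T @ (w * y))
    return coef[0], coef[1:]


intercept, coefs = lime_style_weights(black_box_risk, np.array([0.5, 0.2]), np.array([1.0, 1.0]))
print(intercept, coefs)
```

With this hypothetical model, the first coefficient comes out positive and the second negative, matching the signs of the model's local slopes at the explained point.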

To obtain a population‑level view, we compute LIME explanations for many (or all) test‑set patients and aggregate the local contributions. The global summary reports each feature's mean absolute contribution across these explanations, which ranks features by how strongly they tend to influence individual predictions in the cohort.
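The aggregation can be sketched in plain Python. This illustrates the mean-absolute-contribution idea and is not the code in `lime_explain.py`; the input format (one `{feature: weight}` dict per patient), the feature names, and the choice to count a feature missing from an explanation as weight 0 are assumptions. If the local explanations use discretized conditions such as `age > 60`, they would need to be mapped back to a base feature name before aggregation.

```python
from collections import defaultdict


def aggregate_lime(explanations):
    """Aggregate per-patient LIME weights into (feature, mean_abs, mean_signed) rows.

    `explanations` is a list of {feature: weight} dicts, one per patient (assumed format).
    """
    n = len(explanations)
    sum_abs = defaultdict(float)
    sum_signed = defaultdict(float)
    for exp in explanations:
        for feature, weight in exp.items():
            sum_abs[feature] += abs(weight)
            sum_signed[feature] += weight
    # Divide by all patients: a feature absent from an explanation counts as 0.
    rows = [(f, sum_abs[f] / n, sum_signed[f] / n) for f in sum_abs]
    # Rank by mean absolute contribution, strongest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


# Hypothetical weights for three patients.
for row in aggregate_lime([{"age": 0.30, "bmi": -0.10},
                           {"age": 0.10, "bmi": 0.20},
                           {"age": -0.20}]):
    print(row)
```

Dividing by the total number of patients, rather than only those whose explanation mentions the feature, keeps a rarely selected feature from looking as important as one that appears in every explanation.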


## 1) Setup paths (auto-discovery)


In [None]:
import os
from pathlib import Path

# Find repo root by looking for models/outputs
_cwd = Path(os.getcwd()).resolve()
BASE_DIR = None
for _p in [_cwd] + list(_cwd.parents):
    if (_p / 'models' / 'outputs').exists():
        BASE_DIR = _p
        break
if BASE_DIR is None:
    raise FileNotFoundError('Repo root (a folder containing models/outputs) not found. Open the notebook from inside the project folder.')

SCRIPT_PATH = BASE_DIR / 'explainability' / 'lime' / 'lime_explain.py'
LOCAL_HTML = BASE_DIR / 'explainability' / 'lime' / 'lime_explanation.html'
LOCAL_TXT = BASE_DIR / 'explainability' / 'lime' / 'lime_explanation.txt'
GLOBAL_CSV = BASE_DIR / 'explainability' / 'lime' / 'lime_global_summary.csv'
GLOBAL_PNG = BASE_DIR / 'explainability' / 'lime' / 'lime_global_summary.png'

print('BASE_DIR:', BASE_DIR)
print('SCRIPT_PATH exists:', SCRIPT_PATH.exists())


## 2) Local LIME (single patient)


### Interpretation (Local)
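The local LIME output explains **one patient**. Positive weights increase the predicted risk in that patient's neighborhood; negative weights decrease it. This gives case‑level reasoning that can support clinical review, with the caveat that the weights describe a local surrogate approximation rather than the CatBoost model's exact internal logic.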
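Reading a local explanation amounts to separating risk-raising from risk-lowering terms and ordering each group by strength. The sketch below assumes the explanation is available as a list of `(condition, weight)` pairs, which is the format returned by the `lime` package's `Explanation.as_list()`; the conditions and weights in the example are made up for illustration.

```python
def split_by_direction(weights):
    """Split (condition, weight) pairs into risk-increasing and risk-decreasing lists.

    Each list is ordered from strongest to weakest effect.
    """
    increases = sorted((p for p in weights if p[1] > 0), key=lambda p: p[1], reverse=True)
    decreases = sorted((p for p in weights if p[1] < 0), key=lambda p: p[1])
    return increases, decreases


# Hypothetical local explanation for one patient.
example = [("age > 60", 0.12), ("bmi <= 24.5", -0.05),
           ("smoker=1", 0.08), ("sbp <= 120", -0.09)]
raises, lowers = split_by_direction(example)
print("Raises risk:", raises)
print("Lowers risk:", lowers)
```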


In [None]:
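# Run local LIME for one patient (change --index to explain a different patient).
# sys.executable runs the script with this kernel's Python, so it sees the same installed
# packages; the quotes keep paths that contain spaces intact.
import sys
!"{sys.executable}" "{SCRIPT_PATH}" --index 0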


In [None]:
from IPython.display import HTML, display

if LOCAL_HTML.exists():
    display(HTML(LOCAL_HTML.read_text(encoding='utf-8')))
else:
    print('No local HTML output found:', LOCAL_HTML)

if LOCAL_TXT.exists():
    # Print at most the first 2,000 characters of the text explanation.
    print(LOCAL_TXT.read_text(encoding='utf-8')[:2000])
else:
    print('No local TXT output found:', LOCAL_TXT)


## 3) Global LIME (aggregation)


### Interpretation (Global)
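The global table and plot aggregate up to `--max-instances` local explanations (1,000 in the cell below). The **mean |weight|** measures a feature's average strength of influence regardless of direction, while the **mean (signed) weight** indicates whether the feature tends to raise or lower predicted risk across patients. A mean weight near zero combined with a large mean |weight| means the direction varies between patients, not that the feature is unimportant. Because the summary is built from per‑patient explanations, every global pattern can be traced back to individual cases.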
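A small worked example shows why both statistics are needed. The per-patient weights below are invented for one hypothetical feature: in both cases the mean |weight| is 0.2, but the mean weight is 0.2 when the direction is consistent and 0 when it flips between patients. The `frac_positive` value is an extra illustration, not necessarily a column in `lime_global_summary.csv`.

```python
def direction_summary(weights):
    """Return (mean_abs, mean_signed, frac_positive) for one feature's per-patient weights."""
    n = len(weights)
    mean_abs = sum(abs(w) for w in weights) / n
    mean_signed = sum(weights) / n
    frac_positive = sum(w > 0 for w in weights) / n
    return mean_abs, mean_signed, frac_positive


consistent = [0.2, 0.2, 0.2, 0.2]  # always raises risk
mixed = [0.2, -0.2, 0.2, -0.2]     # raises risk for half the patients, lowers it for the rest
print(direction_summary(consistent))
print(direction_summary(mixed))
```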


In [None]:
# Run global LIME aggregation (adjust --max-instances if needed).
# As above, use this kernel's Python and quote the script path.
import sys
!"{sys.executable}" "{SCRIPT_PATH}" --aggregate --max-instances 1000 --progress-every 100


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

if GLOBAL_CSV.exists():
    df = pd.read_csv(GLOBAL_CSV)
    display(df.head(20))
else:
    print('No global CSV found:', GLOBAL_CSV)

if GLOBAL_PNG.exists():
    from IPython.display import Image
    display(Image(filename=str(GLOBAL_PNG)))
else:
    print('No global PNG found:', GLOBAL_PNG)
