# Model Explainability

**Concept:** Model explainability means understanding how a model arrives at its predictions. Accuracy alone is not enough: the people who rely on a model need to trust its outputs, validate its reasoning, and act on its insights with confidence.

In [7]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import lime
import lime.lime_tabular

X_train = pd.read_csv("../data/processed/X_train.csv")
X_val = pd.read_csv("../data/processed/X_val.csv")
X_test = pd.read_csv("../data/processed/X_test.csv")
y_train = pd.read_csv("../data/processed/y_train.csv")
y_val = pd.read_csv("../data/processed/y_val.csv")
y_test = pd.read_csv("../data/processed/y_test.csv")

target_variable = "tree_type"

models = {
    "Logistic Regression": LogisticRegression(C=0.01, penalty='l2', solver='saga'),
    "Random Forest": RandomForestClassifier(max_depth=None, max_features='sqrt', min_samples_split=2, n_estimators=100),
    "SVM": SVC(C=100, gamma='auto', kernel='rbf', probability=True),
    "KNN": KNeighborsClassifier(metric='euclidean', n_neighbors=7, weights='distance'),
}

# Fit all the models
for name, model in models.items():
    model.fit(X_train, y_train[target_variable])
    score = model.score(X_val, y_val[target_variable])  # score against the same target column used for fitting
    print(f"{name} Validation Score: {score:.3f}")

Logistic Regression Validation Score: 0.849
Random Forest Validation Score: 0.911
SVM Validation Score: 0.898
KNN Validation Score: 0.915


## LIME: Local Interpretable Explanations

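**Concept:** LIME (Local Interpretable Model-agnostic Explanations) explains an individual prediction by fitting a simple, interpretable surrogate (by default a weighted ridge regression) that approximates the complex model's behavior in the neighborhood of that prediction. It perturbs the instance to generate nearby samples, queries the model on them, weights each sample by its proximity to the original instance, and fits the surrogate to those weighted predictions. Think of it as drawing a "local map" around each prediction to see what the model is doing in that specific region.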
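The steps above can be sketched from scratch in NumPy. This is a simplified illustration, not the `lime` library's implementation: it assumes continuous features, draws Gaussian perturbations scaled by each feature's training standard deviation, skips LIME's default quartile discretization, and fits a weighted ridge regression (alpha = 1) in standardized units. The kernel `sqrt(exp(-d² / w²))` with width `w = 0.75 * sqrt(n_features)` mirrors the tabular explainer's defaults. The function name `lime_sketch` and its parameters are hypothetical.

```python
import numpy as np


def lime_sketch(predict_fn, x, feature_std, num_samples=5000,
                kernel_width=None, alpha=1.0, seed=0):
    """Simplified LIME for one instance with continuous features (illustrative).

    predict_fn maps an (n, d) array to an (n,) array of scores to explain,
    e.g. the probability of one class. Returns (intercept, weights), where
    each weight is per standard deviation of the corresponding feature.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # default width in lime's tabular explainer

    # 1. Perturb: draw standardized offsets and map them back to feature units
    z = rng.normal(size=(num_samples, d))
    z[0] = 0.0  # keep the original instance as the first sample
    samples = x + z * np.asarray(feature_std, dtype=float)

    # 2. Query the black-box model on every perturbed sample
    y = np.asarray(predict_fn(samples), dtype=float)

    # 3. Weight samples by proximity to x (exponential kernel on standardized distance)
    dist = np.linalg.norm(z, axis=1)
    w = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

    # 4. Fit a weighted ridge regression: y ~ intercept + z @ weights
    A = np.column_stack([np.ones(num_samples), z])
    penalty = alpha * np.eye(d + 1)
    penalty[0, 0] = 0.0  # do not penalize the intercept
    AtW = A.T * w  # scales column i of A.T by w[i], i.e. A^T W
    coef = np.linalg.solve(AtW @ A + penalty, AtW @ y)
    return coef[0], coef[1:]
```

With one of the fitted classifiers, `predict_fn` could be `lambda X: model.predict_proba(X)[:, 1]`, `x` a row of `X_val.values`, and `feature_std` `X_train.values.std(axis=0)`; each returned weight then approximates the local change in that class probability per standard deviation of the feature.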

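**Why LIME rather than SHAP?**

This notebook compares four different model families (logistic regression, random forest, SVM, and KNN). LIME's model-agnostic design and speed make it a practical single tool for explaining all of them, at the cost of explanations that are only local and can vary between runs. The trade-offs: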

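- **LIME vs SHAP**
  - LIME Strengths:
    - Model-agnostic: works with any model that exposes a prediction function.
    - Fast for explaining individual predictions.
    - Local linear approximations are easy to interpret.
  - LIME Limitations:
    - Explains only local behavior around a single instance.
    - Can be unstable: similar instances, or repeated runs on the same instance, may receive different explanations because the neighborhood is randomly sampled.
    - Explanation quality depends on the neighborhood sampling and kernel width.
  - SHAP Strengths:
    - Attributions are based on Shapley values, which satisfy formal fairness properties (e.g. the attributions sum to the difference between the prediction and a baseline).
    - More stable and consistent results.
    - Provides both local and global insights (global importance by aggregating local values).
  - SHAP Limitations:
    - The model-agnostic KernelSHAP explainer is computationally expensive; fast exact explainers exist only for specific model families (e.g. TreeSHAP for tree ensembles).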
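SHAP's attribution guarantees come from Shapley values. A minimal sketch computes them exactly by enumerating every coalition of features, under the simplifying assumption that a "missing" feature is replaced by a single baseline value (the `shap` library instead averages over a background dataset). The toy model `f` and the helpers `make_value_fn` and `exact_shapley` are hypothetical. The number of model calls grows exponentially with the number of features, which is why model-agnostic SHAP relies on approximations such as KernelSHAP.

```python
from itertools import combinations
from math import factorial


def make_value_fn(f, x, baseline):
    """Coalition value: evaluate f with absent features set to the baseline."""
    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
        return f(z)
    return value


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Toy model with an interaction between features 1 and 2
f = lambda z: 2 * z[0] + 3 * z[1] * z[2]
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = exact_shapley(make_value_fn(f, x, baseline), 3)
print([round(p, 6) for p in phi])             # [2.0, 1.5, 1.5]
print(round(sum(phi), 6), f(x) - f(baseline))  # 5.0 5.0
```

The output shows two of the fairness properties: the interaction term is split equally between features 1 and 2 (symmetry), and the attributions sum exactly to `f(x) - f(baseline)` (efficiency).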

In [9]:
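# Basic LIME usage (install with: pip install lime)
def explain_with_lime(model, X_train, X_test, feature_names, sample_index=0):
    # Create the LIME explainer; the training data supplies the feature
    # statistics used for perturbation and for discretizing continuous features
    explainer = lime.lime_tabular.LimeTabularExplainer(
        X_train,
        feature_names=list(feature_names),
        class_names=[str(c) for c in model.classes_],
        mode='classification'
    )

    # LIME passes NumPy arrays to the prediction function; wrap them in a
    # DataFrame so models fitted on named columns do not emit feature-name warnings
    def predict_proba(data):
        return model.predict_proba(pd.DataFrame(data, columns=feature_names))

    # Explain one prediction. By default LIME explains label index 1, i.e. the
    # probability of model.classes_[1] (pass labels=... or top_labels=... for others)
    explanation = explainer.explain_instance(
        X_test[sample_index],
        predict_proba,
        num_features=5
    )

    # Show explanation: each contribution is a coefficient of the local
    # surrogate model, not an exact change in predicted probability
    print(f"LIME explanation for sample {sample_index}:")
    for feature, contribution in explanation.as_list():
        direction = "increases" if contribution > 0 else "decreases"
        print(f"- {feature} {direction} prediction by {abs(contribution):.3f}")

for name, model in models.items():
    print(f"\n{name} Model Explanation with LIME:")
    explain_with_lime(model, X_train.values, X_val.values, X_train.columns, sample_index=0)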


Logistic Regression Model Explanation with LIME:
LIME explanation for sample 0:
- basal_area > 0.50 decreases prediction by 0.427
- average_height > 0.44 increases prediction by 0.162
- 0.00 < age <= 0.45 decreases prediction by 0.065
- 0.00 < dbh <= 0.44 increases prediction by 0.036
- -0.31 < trees_per_ha <= 0.00 increases prediction by 0.015

Random Forest Model Explanation with LIME:
LIME explanation for sample 0:
- basal_area > 0.50 decreases prediction by 0.311
- average_height > 0.44 increases prediction by 0.096
- -0.31 < trees_per_ha <= 0.00 increases prediction by 0.076
- 0.00 < age <= 0.45 decreases prediction by 0.043
- 0.00 < dbh <= 0.44 increases prediction by 0.002

SVM Model Explanation with LIME:
LIME explanation for sample 0:
- basal_area > 0.50 decreases prediction by 0.228
- average_height > 0.44 increases prediction by 0.201
- 0.00 < age <= 0.45 decreases prediction by 0.074
- -0.31 < trees_per_ha <= 0.00 increases prediction by 0.070
- 0.00 < dbh <= 0.44 decreases prediction by 0.039

KNN Model Explanation with LIME:
LIME explanation for sample 0:
- basal_area > 0.50 decreases prediction by 0.372
- -0.31 < trees_per_ha <= 0.00 increases prediction by 0.153
- average_height > 0.44 increases prediction by 0.114
- 0.00 < age <= 0.45 decreases prediction by 0.099
- 0.00 < dbh <= 0.44 decreases prediction by 0.003
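The four explanations above select the same five features but weight them differently; for example, `dbh` pushes the prediction up for logistic regression and random forest and down for SVM and KNN. A small hypothetical helper, `topk_agreement`, can quantify such differences: the Jaccard overlap of the top-k features by absolute weight, plus the share of overlapping features whose signs agree. It accepts the `(feature, weight)` pairs returned by `explanation.as_list()`. The same helper could compare two LIME runs on one instance with different `random_state` values passed to `LimeTabularExplainer`, to probe the instability listed among LIME's limitations. The example values are the rounded logistic regression and KNN weights from the output above.

```python
def topk_agreement(exp_a, exp_b, k=3):
    """Compare two explanations given as lists of (feature, weight) pairs.

    Returns (jaccard, sign_agreement): the Jaccard overlap of the top-k
    features by absolute weight, and the fraction of shared top-k features
    whose weights have the same sign. Features are matched by their label,
    which assumes both explanations use the same discretization.
    """
    def top(exp):
        return dict(sorted(exp, key=lambda p: abs(p[1]), reverse=True)[:k])

    top_a, top_b = top(exp_a), top(exp_b)
    shared = top_a.keys() & top_b.keys()
    union = top_a.keys() | top_b.keys()
    jaccard = len(shared) / len(union) if union else 1.0
    if not shared:
        return jaccard, 0.0
    same_sign = sum((top_a[f] > 0) == (top_b[f] > 0) for f in shared)
    return jaccard, same_sign / len(shared)


# Rounded weights copied from the logistic regression and KNN explanations above
lr = [("basal_area > 0.50", -0.427), ("average_height > 0.44", 0.162),
      ("0.00 < age <= 0.45", -0.065), ("0.00 < dbh <= 0.44", 0.036),
      ("-0.31 < trees_per_ha <= 0.00", 0.015)]
knn = [("basal_area > 0.50", -0.372), ("-0.31 < trees_per_ha <= 0.00", 0.153),
       ("average_height > 0.44", 0.114), ("0.00 < age <= 0.45", -0.099),
       ("0.00 < dbh <= 0.44", -0.003)]

print(topk_agreement(lr, knn, k=3))  # (0.5, 1.0)
print(topk_agreement(lr, knn, k=5))  # (1.0, 0.8)
```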

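Because LIME explains only local behavior, one rough way to get a global picture is to explain many validation rows and average the absolute weights per feature. The sketch below assumes each explanation is supplied as `explanation.as_map()[1]`, i.e. `(feature_index, weight)` pairs for label 1, the label LIME explains by default; the helper name `global_importance` is hypothetical. Features outside an explanation's top `num_features` count as zero for that row. The `lime` package also includes a `submodular_pick` module for selecting a representative set of explanations.

```python
from collections import defaultdict


def global_importance(local_maps, feature_names):
    """Rank features by mean absolute LIME weight across explanations.

    local_maps: one list of (feature_index, weight) pairs per explained row,
    e.g. explanation.as_map()[1]. Features absent from an explanation
    contribute zero for that row.
    """
    totals = defaultdict(float)
    for pairs in local_maps:
        for idx, weight in pairs:
            totals[idx] += abs(weight)
    n = len(local_maps)
    ranking = [(feature_names[idx], total / n) for idx, total in totals.items()]
    # Highest mean absolute weight first
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

For example, with an `explainer` constructed the same way as inside `explain_with_lime`, `maps = [explainer.explain_instance(row, model.predict_proba, num_features=5).as_map()[1] for row in X_val.values[:100]]` followed by `global_importance(maps, list(X_train.columns))` would rank features for that model over the first 100 validation rows.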

