
# ***Explainable AI Workshop***

---



# **Objectives:**

* Build a machine learning model to predict customer churn.

* Perform a global explanation of the model's behavior using an appropriate global explanation method.

* Conduct a local analysis to explain individual predictions using a suitable local explanation technique.


**Customer Churn dataset:**

The aim is to predict whether a bank's customers will leave the bank.

The event that defines the customer abandonment is the closing of the customer's bank account.

**Data Set Story:**

It consists of 10000 observations and 12 variables.
The independent variables contain information about customers.
The dependent variable indicates customer abandonment.

**Features:**

* Surname: Surname
* CreditScore: Credit score
* Geography: Country (Germany / France / Spain)
* Gender: Gender (Female / Male)
* Age: Age
* Tenure: Number of years as a customer
* Balance: Account balance
* NumOfProducts: Number of bank products used
* HasCrCard: Credit card status (0 = No, 1 = Yes)
* IsActiveMember: Active membership status (0 = No, 1 = Yes)
* EstimatedSalary: Estimated salary
* Exited: Abandoned or not? (0 = No, 1 = Yes)


In [1]:
!pip install lime

Collecting lime
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: lime
  Building wheel for lime (setup.py) ... done
  Created wheel for lime: filename=lime-0.2.0.1-py3-none-any.whl size=283834 sha256=c2d124cdd5d170330906d4f4949a5cb1bf08455ca51997e3997360b02464f72f
  Stored in directory: /root/.cache/pip/wheels/e7/5d/0e/4b4fff9a47468fed5633211fb3b76d1db43fe806a17fb7486a
Successfully built lime
Installing collected packages: lime
Successfully installed lime-0.2.0.1


In [2]:
# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import lime
import lime.lime_tabular

# Print XAI definition
print("Explainable AI (XAI): Methods and techniques that make AI systems' decisions more transparent and interpretable to humans.")

Explainable AI (XAI): Methods and techniques that make AI systems' decisions more transparent and interpretable to humans.


In [3]:
# read data
data = pd.read_csv('churn_problem.csv')
data.head()


FileNotFoundError: [Errno 2] No such file or directory: 'churn_problem.csv'

In [None]:
data.shape

In [None]:
data = data.drop(['RowNumber', 'CustomerId','Surname'], axis=1)


In [None]:
data.head()

In [None]:
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()
data['Geography']= label_encoder.fit_transform(data['Geography'])

data['Geography'].unique()

In [None]:
label_encoder = preprocessing.LabelEncoder()
data['Gender']= label_encoder.fit_transform(data['Gender'])

data['Gender'].unique()

In [None]:
data.head()

In [None]:
# data = data.select_dtypes(exclude=['object'])


In [None]:
# Split the data
X = data.drop('Exited', axis=1)
y = data['Exited']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
print("Preprocessed data shape:", X.shape)
print("Target variable distribution:")
print(y.value_counts(normalize=True))

* 0 = the customer has not left (no "Exited")

* 1 = the customer has left (yes "Exited")

In [None]:
data.head()

In [None]:
X.info()

In [None]:
X_test.shape

# Interpretable ML

## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Train the model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Predictions
y_pred_lr = log_reg.predict(X_test)

# Evaluation
print("Logistic Regression Performance:")
print(classification_report(y_test, y_pred_lr))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_lr))

## Global Explanation

In [None]:
import pandas as pd
import numpy as np

# Extract feature names and coefficients
coefficients = log_reg.coef_[0]
feature_importance_lr = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": coefficients
}).sort_values(by="Coefficient", ascending=False)

feature_importance_lr


In [None]:
feature_importance_lr.sort_values("Coefficient", ascending=True).plot(
    x="Feature", y="Coefficient", kind="barh", figsize=(8,6),
    title="Logistic Regression Coefficients"
)


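These coefficients suggest that Geography, IsActiveMember and Gender are the main contributors to the churn prediction. A positive coefficient increases the predicted log-odds of churn, while a negative coefficient decreases them. Because the model was fitted on unscaled features, coefficient magnitudes depend on each feature's units and are only loosely comparable across features. Features such as CreditScore and EstimatedSalary have coefficients close to zero, meaning almost no linear effect per unit, which may partly explain why Logistic Regression performs differently from the Random Forest model.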
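
Coefficients become easier to compare once the features share a scale, and exponentiating a coefficient turns it into an odds ratio. The sketch below illustrates this on synthetic stand-in data, not on the churn dataset; the names `age`, `salary`, `X_demo` and `model_std` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical, not the churn dataset): two features with
# the same true effect per standard deviation but very different units.
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(40, 10, n)              # years
salary = rng.normal(100_000, 50_000, n)  # currency units
logit = 0.8 * (age - 40) / 10 + 0.8 * (salary - 100_000) / 50_000
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X_demo = np.column_stack([age, salary])

# Fit on standardized features so each coefficient reads "per one standard deviation".
X_std = StandardScaler().fit_transform(X_demo)
model_std = LogisticRegression(max_iter=1000).fit(X_std, y_demo)

std_coefs = model_std.coef_[0]
odds_ratios = np.exp(std_coefs)  # multiplicative change in the odds per +1 SD

for name, coef, ratio in zip(["age", "salary"], std_coefs, odds_ratios):
    print(f"{name}: coefficient={coef:.2f}, odds ratio={ratio:.2f}")
```

On the raw scale the `salary` coefficient would look tiny only because one currency unit is a very small step; after standardization the two effects are directly comparable.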

## Local Explanation

In [None]:
i = 0
single_example = X_test.iloc[i]

print("Selected customer data:")
print(single_example)

# Compute contribution = feature_value * coefficient
contribution = single_example * coefficients

local_explanation = pd.DataFrame({
    "Feature": X.columns,
    "Value": single_example.values,
    "Coefficient": coefficients,
    "Contribution": contribution
}).sort_values(by="Contribution", ascending=False)

local_explanation


Logistic Regression provides a transparent explanation for each prediction.
For this customer, the strongest positive factor was Age, contributing +2.03 to the log-odds of churn, while CreditScore was the strongest negative factor, contributing –1.57. Additional smaller contributions were made by Gender (–0.51), Geography (+0.25), and Balance (+0.39).
Features such as EstimatedSalary and HasCrCard had very small or zero influence.
This demonstrates how interpretable models allow us to understand precisely why a prediction was made for an individual case.
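
The per-feature contributions only add up to the model's score once the intercept is included, and the sigmoid then maps that score to a probability. A minimal sketch on synthetic stand-in data (the names `X_demo`, `y_demo` and `clf` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical, not the churn dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 3))
noise = rng.normal(scale=0.5, size=500)
y_demo = (X_demo[:, 0] - 0.5 * X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

x = X_demo[0]
contributions = x * clf.coef_[0]                     # per-feature share of the log-odds
log_odds = contributions.sum() + clf.intercept_[0]  # the intercept is part of the score
probability = 1 / (1 + np.exp(-log_odds))            # sigmoid maps log-odds to P(class 1)

print("contributions:", np.round(contributions, 3))
print("log-odds:", round(float(log_odds), 3))
print("P(class 1):", round(float(probability), 3))
```

The reconstructed values match what `clf.decision_function` and `clf.predict_proba` return for the same row, which is what makes this decomposition a faithful local explanation for a linear model.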

# Explainable ML

## Random Forest

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(X_train)
# Reuse the statistics learned on the training set; refitting on the test set leaks information
x_test_scaled = scaler.transform(X_test)

In [None]:
# Train a Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Evaluate the model
test_accuracy = rf_classifier.score(X_test, y_test)

# print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

In [None]:
from sklearn.metrics import roc_curve, roc_auc_score

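# Probability predictions on the unscaled test set: the forest was fitted on unscaled X_train,
# so scoring x_test_scaled would produce misleading, near-random probabilities
y_proba_original = rf_classifier.predict_proba(X_test)[:, 1]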

# ROC curve
fpr_original, tpr_original, _ = roc_curve(y_test, y_proba_original)
auc_original = roc_auc_score(y_test, y_proba_original)

print("AUC (Original Random Forest):", auc_original)


### **Global Explanation**

In [None]:
import matplotlib.pyplot as plt
# Get feature importances
importances = rf_classifier.feature_importances_
features = X.columns

# Plot top 10 important features
top_indices = importances.argsort()[::-1][:10]
plt.figure(figsize=(10, 6))
plt.barh(features[top_indices][::-1], importances[top_indices][::-1], color='skyblue')
plt.xlabel("Feature Importance")
plt.title("Global Feature Importance (Random Forest)")
plt.tight_layout()
plt.show()

**Train Random Forest WITHOUT Age**

In [None]:
# Remove Age from features
X_no_age = X.drop(columns=["Age"])

# Train-test split again
X_train_no_age, X_test_no_age, y_train_no_age, y_test_no_age = train_test_split(
    X_no_age, y, test_size=0.2, random_state=42
)

# Train RF again
rf_no_age = RandomForestClassifier(n_estimators=100, random_state=42)
rf_no_age.fit(X_train_no_age, y_train_no_age)

# Evaluate
y_pred_no_age = rf_no_age.predict(X_test_no_age)
print(classification_report(y_test_no_age, y_pred_no_age))


In [None]:
# Probability predictions
y_proba_no_age = rf_no_age.predict_proba(X_test_no_age)[:, 1]

# ROC curve
fpr_no_age, tpr_no_age, _ = roc_curve(y_test_no_age, y_proba_no_age)
auc_no_age = roc_auc_score(y_test_no_age, y_proba_no_age)

print("AUC - RF Without Age:", auc_no_age)


**Train Random Forest WITHOUT EstimatedSalary AND CreditScore**

In [None]:
X_no_salary_score = X.drop(columns=["EstimatedSalary", "CreditScore"])

X_train_ns, X_test_ns, y_train_ns, y_test_ns = train_test_split(
    X_no_salary_score, y, test_size=0.2, random_state=42
)

rf_no_salary_score = RandomForestClassifier(n_estimators=100, random_state=42)
rf_no_salary_score.fit(X_train_ns, y_train_ns)

y_pred_ns = rf_no_salary_score.predict(X_test_ns)
print(classification_report(y_test_ns, y_pred_ns))


**ROC Curve**

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(8,6))

# Original RF
plt.plot(fpr_original, tpr_original, label=f'Original RF (AUC = {auc_original:.3f})')

# Without Age
plt.plot(fpr_no_age, tpr_no_age, label=f'RF Without Age (AUC = {auc_no_age:.3f})')

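# Without Salary + CreditScore (computed here so this cell does not depend on a later one)
y_proba_ns = rf_no_salary_score.predict_proba(X_test_ns)[:, 1]
fpr_ns, tpr_ns, _ = roc_curve(y_test_ns, y_proba_ns)
auc_ns = roc_auc_score(y_test_ns, y_proba_ns)
plt.plot(fpr_ns, tpr_ns, label=f'RF Without Salary+Score (AUC = {auc_ns:.3f})')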

# Random baseline
plt.plot([0,1], [0,1], 'k--', label='Random Guess')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve Comparison')
plt.legend()
plt.grid(True)
plt.show()


The ROC curve above compares three Random Forest models: the original model using all features, a model without the Age feature, and a model without EstimatedSalary and CreditScore.

In the recorded run, the original model reached an AUC close to 0.5, which is equivalent to random guessing. This figure should be read with caution: the forest is fitted on unscaled features, so if its probabilities are computed on standardized test inputs the result is near-random regardless of how good the model is. The two reduced models are scored on unscaled test data and reach AUC ≈ 0.79 without Age and AUC ≈ 0.85 without EstimatedSalary and CreditScore. The higher score of the second reduced model suggests that EstimatedSalary and CreditScore contribute little to prediction, while the lower score without Age is consistent with Age appearing important in the global explanation.

Overall, the comparison is only meaningful when all three models are evaluated on inputs prepared the same way as their training data. Under that condition, retraining without a feature is a useful check on whether the features highlighted by explainability techniques truly contribute to reliable predictions.
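
One cause of a near-0.5 AUC worth ruling out before drawing conclusions about features is a preprocessing mismatch between training and scoring. The sketch below reproduces that effect on synthetic stand-in data, assuming a forest fitted on raw features; all names (`X_demo`, `rf`, `auc_matched`, `auc_mismatched`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): one informative feature on a large scale
# and one pure-noise feature.
rng = np.random.default_rng(42)
n = 2000
age = rng.normal(40, 10, n)
salary = rng.normal(100_000, 50_000, n)
y_demo = (age + rng.normal(0, 3, n) > 45).astype(int)
X_demo = np.column_stack([age, salary])

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Matched: the test set uses the same representation as the training set.
auc_matched = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# Mismatched: standardized inputs fall far outside the split thresholds the trees learned.
X_te_scaled = StandardScaler().fit(X_tr).transform(X_te)
auc_mismatched = roc_auc_score(y_te, rf.predict_proba(X_te_scaled)[:, 1])

print(f"AUC, raw test inputs:    {auc_matched:.3f}")
print(f"AUC, scaled test inputs: {auc_mismatched:.3f}")
```

Tree ensembles do not need feature scaling, so the simplest safeguard is to keep a single representation of the data for fitting, scoring and explaining a given model.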

In [None]:
# Probability predictions
y_proba_ns = rf_no_salary_score.predict_proba(X_test_ns)[:, 1]

# ROC curve
fpr_ns, tpr_ns, _ = roc_curve(y_test_ns, y_proba_ns)
auc_ns = roc_auc_score(y_test_ns, y_proba_ns)

print("AUC - RF Without Salary & CreditScore:", auc_ns)


### **Local Explanation**

In [None]:
# Create a LIME explainer on the unscaled training data used to fit the Random Forest
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=['No', 'Exited'],
    mode='classification'
)


# Plot the LIME explanation (once lime_exp has been generated)
# lime_exp.as_pyplot_figure()


In [None]:
# Demonstrating explanation stability
instance1 = X_test.iloc[0]
instance2 = X_test.iloc[1731]

In [None]:
y_test.iloc[0]

In [None]:
y_test.iloc[1731]

In [None]:
# LIME explanations
exp1 = lime_explainer.explain_instance(instance1.values, rf_classifier.predict_proba, num_features=5)
exp2 = lime_explainer.explain_instance(instance2.values, rf_classifier.predict_proba, num_features=5)

print("LIME explanation for instance 1:")
print(exp1.as_list())
print("\nLIME explanation for instance 2:")
print(exp2.as_list())

In [None]:
# Select a single instance for the LIME explanation
instance = X_test.iloc[0]
# lime_exp = lime_explainer.explain_instance(instance.values, rf_classifier.predict_proba, num_features=10)


In [None]:
# Generate LIME explanation for both classes
lime_exp = lime_explainer.explain_instance(
    instance1.values,
    rf_classifier.predict_proba,
    num_features=10,
    labels=[0, 1]  # Force explanation for both classes
)

# Now display both
for label in [0, 1]:
    print(f"\nExplanation for class {lime_exp.class_names[label]}:")
    for feature, weight in lime_exp.as_list(label=label):
        print(f"{feature}: {weight}")


Positive values for a class favor that class.

Negative values favor the other class.

Large absolute values indicate a strong influence on the decision.

| Condition | Impact on **"Exited"** |
|---|---|
| Young age → -0.1689 | Less likely to leave |
| Inactive member → +0.1257 | More likely to leave |
| Moderate number of bank products → -0.1535 | Less likely to leave |
| etc. | |
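
The sign convention above follows from how a local surrogate explanation is built: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients become the feature weights. The sketch below is a simplified illustration of that idea on synthetic data. It is not the `lime` library's implementation (which also discretizes features and selects a subset of them), and the helper `local_surrogate_weights` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic stand-in data (hypothetical): class 1 becomes likelier as feature 0 grows
# and as feature 1 shrinks; feature 2 is noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 3))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

def local_surrogate_weights(instance, predict_proba, label,
                            n_samples=2000, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to predict_proba around `instance`."""
    local_rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    samples = instance + local_rng.normal(size=(n_samples, instance.size))
    # 2. Query the black box for the probability of the chosen class.
    target = predict_proba(samples)[:, label]
    # 3. Weight each perturbed sample by its closeness to the instance.
    distances = np.linalg.norm(samples - instance, axis=1)
    proximity = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. The surrogate's coefficients are the local feature weights.
    surrogate = Ridge(alpha=1.0).fit(samples, target, sample_weight=proximity)
    return surrogate.coef_

instance = np.array([0.0, 0.0, 0.0])  # a point on the decision boundary
weights_class1 = local_surrogate_weights(instance, black_box.predict_proba, label=1)
weights_class0 = local_surrogate_weights(instance, black_box.predict_proba, label=0)

print("class 1 weights:", np.round(weights_class1, 3))
print("class 0 weights:", np.round(weights_class0, 3))
```

For a binary classifier the two probabilities sum to one, so the class 0 weights are the class 1 weights with the sign flipped, which is why a negative weight for one class reads as support for the other.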

In [None]:
import matplotlib.pyplot as plt

def plot_lime_explanation(lime_exp, label, ax, class_name):
    exp = lime_exp.as_list(label=label)
    features = [x[0] for x in exp]
    weights = [x[1] for x in exp]

    colors = ['green' if w > 0 else 'red' for w in weights]

    ax.barh(features, weights, color=colors)
    ax.set_title(f"Class: {class_name}")
    ax.axvline(0, color='black', linewidth=0.5)
    ax.invert_yaxis()  # Highest weight on top

# Create figure
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
fig.suptitle("LIME Explanation for Both Classes", fontsize=16)

# Plot manually for both classes
plot_lime_explanation(lime_exp, label=0, ax=axes[0], class_name="No")
plot_lime_explanation(lime_exp, label=1, ax=axes[1], class_name="Exited")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()


## ANN

In [None]:
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
scaler_ann = StandardScaler()
X_train_ann = scaler_ann.fit_transform(X_train)
X_test_ann = scaler_ann.transform(X_test)


In [None]:
ann_model = Sequential()
ann_model.add(Dense(16, activation='relu', input_shape=(X_train_ann.shape[1],)))
ann_model.add(Dense(8, activation='relu'))
ann_model.add(Dense(1, activation='sigmoid'))

In [None]:
ann_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
history = ann_model.fit(X_train_ann, y_train, epochs=20, batch_size=32, validation_split=0.2, verbose=0)


In [None]:
loss, acc = ann_model.evaluate(X_test_ann, y_test, verbose=0)
print("ANN Test Accuracy:", acc)

### Global Explanation

In [None]:
import shap
import numpy as np


X_background = shap.sample(X_train_ann, 50)

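# KernelExplainer calls the model many times; verbose=0 suppresses a progress bar per call
pred_fn = lambda x: ann_model.predict(x, verbose=0).flatten()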

explainer_ann = shap.KernelExplainer(pred_fn, X_background)

X_sample = X_test_ann[:100]
shap_values_ann = explainer_ann.shap_values(X_sample)

shap_values_fixed = np.squeeze(shap_values_ann)

In [None]:
shap.summary_plot(shap_values_fixed, X_sample, feature_names=X.columns)

The SHAP summary plot for the ANN model shows which features have the biggest influence on the model’s predictions for customer churn.

The feature that stands out the most is NumOfProducts: the number of products a customer has plays a major role in how the ANN decides whether they might leave. When the value of NumOfProducts is high (red dots), it tends to push the prediction toward churn. Age is also an important factor: older customers usually increase the probability of churn, while younger customers decrease it. Another strong feature is IsActiveMember, where being inactive has a clear positive impact on churn.

Features like Gender, Balance, and Geography show moderate influence, meaning they sometimes affect the prediction but not as consistently. On the other hand, CreditScore, EstimatedSalary, Tenure, and HasCrCard have very small SHAP values, which suggests that the ANN model does not rely heavily on them when making decisions.

Overall, this plot gives a good global view of how the ANN behaves and which features matter most for predicting churn.
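
The values in a SHAP plot are Shapley values, which Kernel SHAP approximates by sampling feature coalitions against a background set. For a handful of features they can be computed exactly by enumerating every coalition, which makes their key property visible: the base value plus the sum of the SHAP values equals the prediction. The sketch below assumes a toy logistic function standing in for the ANN; `exact_shapley` and `toy_predict` are hypothetical helpers, not part of the `shap` library.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance by enumerating all feature coalitions.

    Features outside a coalition keep their background values, and the prediction
    is averaged over the background rows.
    """
    n_features = x.size

    def value(coalition):
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]    # coalition features take the instance's values
        return predict(data).mean()  # remaining features are averaged out

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi

# Hypothetical toy model standing in for the ANN: feature 2 has no effect at all.
def toy_predict(data):
    return 1 / (1 + np.exp(-(2.0 * data[:, 0] - 1.0 * data[:, 1] + 0.0 * data[:, 2])))

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 1.0, 1.0])

phi = exact_shapley(toy_predict, x, background)
base_value = toy_predict(background).mean()
prediction = toy_predict(x[None, :])[0]

print("Shapley values:", np.round(phi, 4))
print("base value + sum:", round(float(base_value + phi.sum()), 4))
print("prediction:      ", round(float(prediction), 4))
```

Exact enumeration needs 2^n coalition evaluations per feature set, which is why sampling-based approximations are used for models with more than a few features.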

In [None]:
loss, acc = ann_model.evaluate(X_test_ann, y_test)
print(acc)


### Local Explanation

In [None]:
i = 0
sample = X_test_ann[i:i+1]

shap_values_single = explainer_ann.shap_values(sample)

# expected_value is a scalar when the model output is 1-D (as with pred_fn) and a
# length-1 array otherwise; np.ravel handles both cases
base_value = float(np.ravel(explainer_ann.expected_value)[0])

# Flatten the SHAP values and the sample to shape (n_features,) for a single-row force plot
shap.force_plot(base_value,
                np.squeeze(shap_values_single),
                sample.squeeze(),
                feature_names=list(X.columns),
                matplotlib=True)

In [None]:
import json

# Use a separate name so the `data` DataFrame is not overwritten
with open('Notebook_XAI_Eya_Guirat_4DS10.ipynb') as f:
    notebook = json.load(f)

# Remove widget metadata if it exists
if 'widgets' in notebook['metadata']:
    del notebook['metadata']['widgets']

with open('Notebook_XAI_Eya_Guirat_4DS10_fixed.ipynb', 'w') as f:
    json.dump(notebook, f)
