# 📘 [LDATS2350] - DATA MINING

## 📊 Python23 - MLP Regressor

**Prof. Robin Van Oirbeek**  
<br/>
**🧑‍🏫 Guillaume Deside** *(guillaume.deside@uclouvain.be)*  

---

## 🔹 What is an MLP Regressor?

An **MLP Regressor** (Multi-Layer Perceptron for regression) is a type of neural network used to predict **continuous numerical values**. It consists of:

- An **input layer** (one neuron per feature),
- One or more **hidden layers** with activation functions (e.g., ReLU),
- An **output layer** producing the predicted value.
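
To make the three layers concrete, here is a minimal NumPy sketch of a forward pass through a network with a single hidden layer. The weights are random placeholders and the sizes (13 inputs, 8 hidden neurons) are assumptions chosen for illustration; this is not how `MLPRegressor` is implemented internally.

```python
import numpy as np

def relu(z):
    # ReLU activation: keeps positive values, zeroes out negatives
    return np.maximum(0.0, z)

def mlp_forward(X, W1, b1, W2, b2):
    """One hidden layer with ReLU, followed by a linear output for regression."""
    hidden = relu(X @ W1 + b1)          # shape (n_samples, n_hidden)
    return (hidden @ W2 + b2).ravel()   # shape (n_samples,)

rng = np.random.default_rng(0)
n_features, n_hidden = 13, 8            # illustrative sizes (assumption)
X = rng.normal(size=(5, n_features))    # 5 hypothetical samples
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

y_hat = mlp_forward(X, W1, b1, W2, b2)
print(y_hat.shape)
```

Training consists of adjusting `W1`, `b1`, `W2` and `b2` so that `y_hat` gets close to the targets.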

---

### 🔧 Key Characteristics:
- Learns non-linear relationships between input and target.
- Trained via **backpropagation** and **gradient descent**.
- Supports multiple hidden layers and neurons per layer.
- Requires numerical input — categorical variables must be encoded.
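
As an illustration of the encoding requirement, the sketch below one-hot encodes a categorical column with pandas. The toy DataFrame and its column names (`area`, `city`) are hypothetical and unrelated to the course dataset.

```python
import pandas as pd

# Hypothetical toy data: one numeric and one categorical feature
df = pd.DataFrame({
    "area": [50.0, 80.0, 65.0],
    "city": ["Brussels", "Liege", "Brussels"],
})

# One-hot encode the categorical column so that every feature is numeric
encoded = pd.get_dummies(df, columns=["city"], dtype=float)
print(encoded)
```

The resulting all-numeric table can then be scaled and passed to the network.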

---

### ⚙️ Important Hyperparameters:
- `hidden_layer_sizes`: Tuple defining neurons in each hidden layer (e.g., `(100,)` or `(50, 20)`).
- `activation`: Activation function (`relu`, `tanh`, etc.).
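- `alpha`: Strength of the L2 regularization penalty (larger values shrink the weights more).
- `solver`: Optimization algorithm (`adam`, `lbfgs`, `sgd`).
- `max_iter`: Maximum number of training iterations; for `adam` and `sgd` this counts epochs (passes over the training data).
- `learning_rate_init`: Initial step size for weight updates; only used by the `sgd` and `adam` solvers.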
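
A short sketch of how these hyperparameters are passed to scikit-learn's `MLPRegressor`. The specific values are arbitrary starting points chosen for illustration, not recommendations for the course dataset.

```python
from sklearn.neural_network import MLPRegressor

# Illustrative values only (assumption); tune them on your own data
model = MLPRegressor(
    hidden_layer_sizes=(50, 20),   # two hidden layers: 50 then 20 neurons
    activation="relu",
    alpha=1e-3,                    # L2 penalty strength
    solver="adam",
    max_iter=500,
    learning_rate_init=1e-3,
    random_state=0,                # for reproducible weight initialization
)
print(model.get_params()["hidden_layer_sizes"])
```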

---

### 📌 Typical Workflow
1. **Split** into training/test sets.
2. **Preprocess** your data (scaling & encoding), fitting the scaler on the training set only to avoid leaking test information.
3. Define and **train** `MLPRegressor`.
4. Evaluate using **MAE, MSE, RMSE, R²**.
5. **Tune hyperparameters** with `GridSearchCV`.
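
The workflow can be sketched end to end as follows. The data come from `make_regression` as a synthetic stand-in (an assumption, not the course dataset), and the grid is kept deliberately small so that the example runs quickly. Wrapping the scaler in a `Pipeline` is one possible design choice: it ensures the scaler is refit on each cross-validation training fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the course dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Split first so the scaler never sees the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling and model in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(solver="lbfgs", max_iter=500, random_state=0)),
])

# Small illustrative grid; pipeline parameters use the "<step>__<param>" syntax
param_grid = {
    "mlp__hidden_layer_sizes": [(10,), (20, 10)],
    "mlp__alpha": [1e-4, 1e-2],
}
gs = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
gs.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set
y_pred = gs.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print("Best parameters:", gs.best_params_)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```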

---

## 🧠 Why Use MLP for Regression?
✅ Captures complex patterns.  
✅ Suitable for high-dimensional datasets.  
✅ Flexible architecture and activation.  
❌ Requires tuning.  
❌ Less interpretable than linear models.

---

## 📈 Regression Metrics

- **MAE (Mean Absolute Error)**:  
  $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

- **MSE (Mean Squared Error)**:  
  $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

- **RMSE (Root Mean Squared Error)**:  
  $RMSE = \sqrt{MSE}$

- **R² (Coefficient of Determination)**:  
  $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
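
The formulas above can be checked by hand on a tiny example. The four target/prediction pairs below are made up for illustration; in practice `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error` and `r2_score`.

```python
import numpy as np

# Hypothetical targets and predictions (illustration only)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

residuals = y_true - y_pred                       # [1, 0, -2, -1]
mae = np.mean(np.abs(residuals))                  # (1 + 0 + 2 + 1) / 4
mse = np.mean(residuals ** 2)                     # (1 + 0 + 4 + 1) / 4
rmse = np.sqrt(mse)

ss_res = np.sum(residuals ** 2)                   # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae}, MSE={mse}, RMSE={rmse:.4f}, R2={r2:.4f}")
```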

---


In [9]:
import warnings
warnings.filterwarnings("ignore")
import os
import pandas as pd

# Create folder for saving plots
os.makedirs("figures/MLP", exist_ok=True)

X_train = pd.read_csv("data/X_train_raw.csv").values
X_test = pd.read_csv("data/X_test_raw.csv").values
y_train = pd.read_csv("data/y_train_raw.csv")["target"].values
y_test = pd.read_csv("data/y_test_raw.csv")["target"].values

# Print the shape of the training and test datasets
print("Training data shape:", X_train.shape)
print("Training targets shape:", y_train.shape)
print("Test data shape:", X_test.shape)
print("Test targets shape:", y_test.shape)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(copy=False).fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Training data shape: (391, 13)
Training targets shape: (391,)
Test data shape: (99, 13)
Test targets shape: (99,)


### 📝 Exercise Instructions:

#### **Objective**:
Use an `MLPRegressor` to fit a neural network on a regression dataset, perform hyperparameter tuning with GridSearchCV, evaluate the model, and analyze the residuals.

#### **Steps**:

1. **Modeling**:
    - Define an `MLPRegressor` model.
    - Set up a grid of parameters: `hidden_layer_sizes`, `solver`, `alpha`, and `max_iter`.

2. **Model Selection**:
    - Use `GridSearchCV` with 3-fold cross-validation to find the best configuration.
    - Print the best parameters and scores.

3. **Evaluation**:
    - Compute $R^2$ on the test set.
    - Calculate and print MAE, MSE, RMSE, and R² for both train and test sets.

4. **Visualization**:
    - Create a residual plot showing prediction errors on both training and testing data.
    - Save the residual plot as `figures/MLP/residual_plot.png`.

![residual_plot.png](attachment:c5a54408-fcea-4241-ae4c-54f01a150f3e.png)

## 🔍 LIME & SHAP: Explaining Black-Box Models

Modern machine learning models like MLPs are powerful but **not inherently interpretable**. To gain trust and understand model behavior, we use **explainability tools**.

---

## 🔹 What is LIME?

**LIME** stands for **Local Interpretable Model-Agnostic Explanations**.

It builds a simple interpretable model (e.g., linear regression) around a single prediction to approximate the behavior of the complex model **locally**.

### ✅ Key Ideas:
- Perturb the input features around an instance.
- Observe how predictions change.
- Fit an interpretable model to these variations.
- Output a **ranking of feature importance** for that specific instance.
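
The four ideas above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions (Gaussian perturbations, an exponential proximity kernel, weighted least squares) and not the `lime` library's actual implementation; `lime_sketch` and `black_box` are hypothetical names.

```python
import numpy as np

def lime_sketch(predict_fn, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Return local linear coefficients approximating predict_fn around x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise
    offsets = rng.normal(scale=scale, size=(n_samples, x.size))
    Z = x + offsets
    # 2. Query the black-box model on the perturbed points
    y = predict_fn(Z)
    # 3. Weight samples by proximity to x (closer points count more)
    dist2 = np.sum(offsets ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width ** 2)
    # 4. Fit a weighted linear model (intercept + one slope per feature)
    A = np.hstack([np.ones((n_samples, 1)), offsets])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept, keep per-feature slopes

def black_box(Z):
    # Stand-in "complex model": feature 2 has no effect at all
    return 3.0 * Z[:, 0] - 2.0 * Z[:, 1]

x0 = np.array([1.0, 2.0, 3.0])
local_coefs = lime_sketch(black_box, x0)
ranking = np.argsort(-np.abs(local_coefs))  # most influential feature first
print(local_coefs.round(3), ranking)
```

Because the stand-in model is itself linear, the local surrogate recovers its slopes; for a real MLP the coefficients would only be valid near `x0`.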

### 🧪 Example (for regression):
```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    mode='regression'
)

# Explain a specific instance
i = 0  # Example index
exp = explainer.explain_instance(X_test[i], model.predict)
exp.show_in_notebook()
```

---

## 🔹 What is SHAP?

**SHAP** stands for **SHapley Additive exPlanations**.  
It uses principles from **cooperative game theory** to fairly attribute a prediction to each input feature.

### ✅ Key Concepts:
- Each feature is considered a "player" in the prediction.
- SHAP values measure the **marginal contribution** of each feature.
- Offers both **global** (across dataset) and **local** (per instance) interpretability.
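
To illustrate the game-theoretic idea, the sketch below computes exact Shapley values by enumerating every coalition of features. It assumes that "absent" features are replaced by a single baseline vector, which is a simplification: the `shap` library averages over background data and uses faster approximations. `exact_shapley` and `f` are hypothetical names, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict_fn, x, baseline):
    """Exact Shapley values of predict_fn at x relative to a baseline vector."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real values, the rest the baseline
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict_fn(z[None, :])[0]

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                # Marginal contribution of feature j when added to coalition S
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi

def f(Z):
    # Toy model with an interaction between features 1 and 2
    return 2.0 * Z[:, 0] + Z[:, 1] * Z[:, 2]

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(f, x, baseline)
print(phi, phi.sum())
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that gives SHAP its name.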

---

### 📊 SHAP for Regression (with MLP):
```python
import shap

# Use KernelExplainer or Explainer depending on the model
explainer = shap.Explainer(model.predict, X_train)

# Get SHAP values for the test set
shap_values = explainer(X_test)

# Visualize impact of each feature
shap.plots.beeswarm(shap_values)

# Local explanation for one instance
shap.plots.waterfall(shap_values[0])
```

---

## 🆚 LIME vs SHAP

| Feature          | LIME                      | SHAP                          |
|------------------|---------------------------|--------------------------------|
| Local/Global     | Local                     | Local & Global                |
| Model-agnostic   | ✅ Yes                    | ✅ Yes (with KernelExplainer) |
| Speed            | Fast                      | Slower (computationally heavy)|
| Output           | Feature importances (local)| Additive feature contributions|
| Use case         | Quick insight             | Theoretically grounded        |

---

## 🎯 When to Use

- Use **LIME** for quick, intuitive, local explanations.
- Use **SHAP** when you need consistent, additive attributions, e.g., for global feature analysis, fairness checks, or **audit-ready** interpretability.

---


In [36]:
!pip install lime shap

Collecting shap
  Downloading shap-0.47.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (24 kB)
Collecting slicer==0.0.8 (from shap)
  Downloading slicer-0.0.8-py3-none-any.whl.metadata (4.0 kB)
Downloading shap-0.47.0-cp312-cp312-macosx_11_0_arm64.whl (532 kB)
Downloading slicer-0.0.8-py3-none-any.whl (15 kB)
Installing collected packages: slicer, shap
Successfully installed shap-0.47.0 slicer-0.0.8


In [53]:
import numpy as np
import lime
import lime.lime_tabular
import shap
import matplotlib.pyplot as plt

# Ensure X_train and X_test are in numpy format
X_train_np = X_train if isinstance(X_train, np.ndarray) else X_train.to_numpy()
X_test_np = X_test if isinstance(X_test, np.ndarray) else X_test.to_numpy()

# Retrieve the best model found by GridSearchCV (`gs` is the fitted search from the exercise)
best_model = gs.best_estimator_

# Get feature names if available, otherwise fall back to generic names
feature_names = list(X_train.columns) if hasattr(X_train, 'columns') else [f"Feature {i}" for i in range(X_train_np.shape[1])]

# -------------------- LIME --------------------
explainer_lime = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train_np,
    feature_names=feature_names,
    mode='regression'
)

# Explain a single instance from the test set
i = 0
exp = explainer_lime.explain_instance(X_test_np[i], best_model.predict, num_features=10)
lime_fig = exp.as_pyplot_figure()
lime_fig.tight_layout()
lime_fig.savefig("figures/MLP/lime_explanation.png", dpi=300)

# -------------------- SHAP --------------------
explainer_shap = shap.Explainer(best_model.predict, X_train_np)
shap_values = explainer_shap(X_test_np[:100])  # limit to 100 instances for performance

# Beeswarm plot
plt.figure()
shap.plots.beeswarm(shap_values, show=False)
plt.tight_layout()
plt.savefig("figures/MLP/shap_beeswarm.png", dpi=300)

# Waterfall for the first instance
plt.figure()
shap.plots.waterfall(shap_values[0], show=False)
plt.tight_layout()
plt.savefig("figures/MLP/shap_waterfall.png", dpi=300)

"✅ LIME and SHAP explanations generated and saved in figures/MLP folder."


PermutationExplainer explainer: 100it [00:15,  2.61it/s]                        


'✅ LIME and SHAP explanations generated and saved in figures/MLP folder.'