## Permutation Feature Importance

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score, confusion_matrix

In [None]:
# Load the dataset
credit_score=pd.read_csv("data/credit_score.csv")

# Select features
features=['INCOME', 'DEBT', 'R_EXPENDITURE', 'R_ENTERTAINMENT', 'CAT_GAMBLING']
x=credit_score[features].copy()

# Encode CAT_GAMBLING as two 0/1 indicator columns;
# rows with any other value get 0 in both
x['GAMBLING_LOW']=(x['CAT_GAMBLING']=='Low').astype(int)
x['GAMBLING_HIGH']=(x['CAT_GAMBLING']=='High').astype(int)
x.drop(columns=['CAT_GAMBLING'], inplace=True)

# Target variable
y=credit_score['CREDIT_SCORE']

#### 🔹 Step 1: Importing XGBoost Regressor

```python
import xgboost as xgb
```

* `xgb.XGBRegressor` → This is the **regression model** from the XGBoost library.
* XGBoost (**Extreme Gradient Boosting**) is a very popular machine learning algorithm that builds an **ensemble of decision trees** in a boosting framework.

---

#### 🔹 Step 2: Creating the Model

```python
model = xgb.XGBRegressor(
    objective='reg:squarederror',
    max_depth=3,
    n_estimators=100
)
```

1. **`objective='reg:squarederror'`**

   * Defines the type of learning task.
   * `"reg:squarederror"` tells the model to minimize **squared-error loss**, the usual choice for regression. Averaged over the data, this loss is the **Mean Squared Error (MSE)**.
   * In other words, it tries to minimize the squared difference between predicted values and actual values.

2. **`max_depth=3`**

   * Controls the **maximum depth of each decision tree** in the boosting process.
   * A lower depth (like 3) → simpler trees, less risk of overfitting, but possibly underfitting.
   * A higher depth → more complex trees, better training fit, but risk of overfitting.

3. **`n_estimators=100`**

   * Number of **boosted trees (weak learners)** to build.
   * Each new tree corrects the errors made by the previous ones.
   * More trees generally improve performance but increase training time and risk of overfitting.
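
To make the squared-error objective concrete, the sketch below computes MSE by hand with NumPy and checks that, among constant predictions, the mean of the targets gives the lowest MSE. The numbers and the `mse` helper are made up for illustration and are not part of the notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up targets, for illustration only
y_true = np.array([500.0, 620.0, 580.0, 700.0])

# Three constant predictions: the mean of the targets and two other guesses
mse_mean = mse(y_true, np.full_like(y_true, y_true.mean()))
mse_low = mse(y_true, np.full_like(y_true, 550.0))
mse_high = mse(y_true, np.full_like(y_true, 650.0))

print(mse_mean, mse_low, mse_high)
```

This is why a boosting model that minimizes squared error can sensibly start from the average of `y`.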

---

#### 🔹 Step 3: Training the Model

```python
model.fit(x, y)
```

* `.fit()` is the method to **train the model**.

* Here:

  * `x` → Feature matrix (independent variables, input data).
    Here: `INCOME`, `DEBT`, `R_EXPENDITURE`, `R_ENTERTAINMENT`, and the two gambling indicator columns.
  * `y` → Target variable (dependent variable).
    Here: `CREDIT_SCORE`.

* During training:

  1. The model starts from a constant initial prediction (for example, the average of all `y` values).
  2. It calculates the **residuals** (errors between prediction and actual `y`).
  3. It fits a decision tree to predict those residuals.
  4. It adds that tree's output, scaled by the learning rate, to the current prediction.
  5. It repeats steps 2 to 4 until `n_estimators` trees have been built (100 here).
  6. Each tree is shallow (depth 3), making it a **weak learner**; boosting combines them into a strong predictor.

After this step, `model` has **learned patterns from the data** and is ready to make predictions.
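
The training loop described above can be sketched in plain NumPy. This is a simplified illustration, not XGBoost's actual implementation: it uses a single feature, depth-1 trees (stumps), and no regularization. The helper names, the synthetic data, and the learning rate of 0.3 are all assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residuals):
    """Fit a depth-1 tree on one feature: pick the split with the lowest squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty
    for threshold in np.unique(x)[:-1]:
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_estimators=100, learning_rate=0.3):
    """Boosting for squared error: each stump is fitted to the current residuals."""
    base = y.mean()                      # constant initial prediction
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - prediction       # errors of the current ensemble
        stump = fit_stump(x, residuals)  # weak learner fitted to the residuals
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=200)
y_demo = np.sin(x_demo) + rng.normal(scale=0.1, size=200)

_, _, pred_1 = boost(x_demo, y_demo, n_estimators=1)
_, _, pred_100 = boost(x_demo, y_demo, n_estimators=100)

mse_1 = np.mean((y_demo - pred_1) ** 2)
mse_100 = np.mean((y_demo - pred_100) ** 2)
print(mse_1, mse_100)
```

The training error with 100 stumps should be well below the error with one stump, which is the "many weak learners make a strong predictor" idea in miniature.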

---

In [None]:
# Train the model
model=xgb.XGBRegressor(objective='reg:squarederror', max_depth=3, n_estimators=100)
model.fit(x, y)

In [None]:
# Get the predictions
y_pred=model.predict(x)

# Model evaluation
fig, ax=plt.subplots(nrows=1, ncols=1, figsize=(8, 8))

ax.scatter(y, y_pred)
# Line of perfect predictions
ax.plot([y.min(), y.max()], [y.min(), y.max()], color='tab:red')

ax.set_ylabel('Predicted values', size=20)
ax.set_xlabel('Actual values', size=20)

In [None]:
# Calculate R^2 value to evaluate performance
baseline_score=model.score(x, y)
# The score when no feature has been permuted
print(baseline_score)
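
For scikit-learn-style regressors such as `XGBRegressor`, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it by hand on made-up numbers; the `r2` helper is illustrative and not part of the notebook.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up targets, for illustration only
y_true = np.array([500.0, 620.0, 580.0, 700.0])

perfect = r2(y_true, y_true)                                  # every prediction exact
mean_only = r2(y_true, np.full_like(y_true, y_true.mean()))   # always predict the mean
reversed_order = r2(y_true, y_true[::-1])                     # worse than predicting the mean

print(perfect, mean_only, reversed_order)
```

A score of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than that. Permuted features can therefore push the score below zero.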

### Permuting a feature

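- **Permuting a feature** means randomly shuffling that feature's values across the rows while leaving every other column and the target unchanged. The shuffled values must come from the feature itself, not from another feature or distribution.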
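
As a small illustration on synthetic data (the variable names and numbers below are made up), shuffling keeps exactly the same set of values in the column but destroys their pairing with the target:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: the target depends strongly on income
income = rng.normal(50_000, 10_000, size=1_000)
target = 0.01 * income + rng.normal(scale=20, size=1_000)

# Shuffle the feature values across rows
income_perm = rng.permutation(income)

# Same values, different order
same_values = np.array_equal(np.sort(income), np.sort(income_perm))

# The link between feature and target is broken
corr_before = np.corrcoef(income, target)[0, 1]
corr_after = np.corrcoef(income_perm, target)[0, 1]

print(same_values, corr_before, corr_after)
```

The correlation is high before the shuffle and close to zero afterwards, even though the distribution of the feature is unchanged.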

In [None]:
x_perm=x.copy()
x_perm['INCOME']=np.random.permutation(x_perm['INCOME'])

# Get predictions
y_pred=model.predict(x_perm)

In [None]:
# Model evaluation
fig, ax=plt.subplots(nrows=1, ncols=1, figsize=(8, 8))

ax.scatter(y, y_pred)
# Line of perfect predictions
ax.plot([y.min(), y.max()], [y.min(), y.max()], color='tab:red')

ax.set_ylabel('Predicted values', size=20)
ax.set_xlabel('Actual values', size=20)

- The points scatter much further from the line of perfect predictions once `INCOME` is permuted, so `INCOME` clearly contributes to the model's predictions. Whether it contributes the most can only be judged by permuting each of the other features in the same way.

- The model is using the relationship between `INCOME` and `CREDIT_SCORE` to make predictions. When we permute `INCOME`, we break this relationship, so the model makes worse predictions.

- It is difficult to judge objectively how much worse the predictions are from a visualization alone, so we compare scores instead.

In [None]:
permuted_score=model.score(x_perm, y)
print(permuted_score)

In [None]:
importance_score=baseline_score-permuted_score
print(importance_score)

### Permutation feature importance from scratch

In [None]:
def get_perm_importance(model, x, y, features, n=10):
    # Calculate baseline score (without permuting any feature)
    baseline_score=model.score(x, y)

    importance_scores={}

    for feature in features:
        x_perm=x.copy()
        sum_score=0

        # Repeat n times to get average importance score
        for i in range(n):
            # Calculate score when given feature is permuted
            x_perm[feature]=np.random.permutation(x_perm[feature])
            permuted_score=model.score(x_perm, y)

            sum_score+=permuted_score

        # Calculate decrease in score
        importance_score=baseline_score-(sum_score/n)
        importance_scores[feature]=importance_score

    return importance_scores
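
One way to sanity-check the function is to run it on a model whose behaviour is known in advance. The sketch below repeats the function so that it runs on its own and pairs it with `FixedLinearModel`, a hypothetical stand-in that predicts `2 * a` and ignores column `b`. The data is synthetic.

```python
import numpy as np
import pandas as pd

def get_perm_importance(model, x, y, features, n=10):
    """Average drop in model.score when each feature is permuted n times."""
    baseline_score = model.score(x, y)
    importance_scores = {}
    for feature in features:
        x_perm = x.copy()
        sum_score = 0
        for _ in range(n):
            x_perm[feature] = np.random.permutation(x_perm[feature])
            sum_score += model.score(x_perm, y)
        importance_scores[feature] = baseline_score - (sum_score / n)
    return importance_scores

class FixedLinearModel:
    """Hypothetical stand-in model: predicts 2 * a and never looks at b."""

    def predict(self, x):
        return 2.0 * x["a"].to_numpy()

    def score(self, x, y):
        # R^2, matching what scikit-learn-style regressors return
        y = np.asarray(y, dtype=float)
        residual = y - self.predict(x)
        return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data in which the target is exactly 2 * a
np.random.seed(0)
x_demo = pd.DataFrame({
    "a": np.random.normal(size=500),
    "b": np.random.normal(size=500),
})
y_demo = 2.0 * x_demo["a"]

demo_scores = get_perm_importance(FixedLinearModel(), x_demo, y_demo, x_demo.columns, n=5)
print(demo_scores)
```

Because the stand-in model never reads `b`, permuting `b` leaves the score untouched and its importance is exactly zero, while permuting `a` produces a large drop.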

```python
sorted_importance_scores = sorted(importance_scores.items(), key=lambda x: x[1])
```

1. **`importance_scores`**

   * The **dictionary** returned by `get_perm_importance`. With made-up feature names and values, it looks like:

     ```python
     importance_scores = {"Income": 0.45, "Debt": 0.25, "Age": 0.30}
     ```
   * Keys → feature names
   * Values → importance score (the drop in the model's score when that feature is permuted).

2. **`.items()`**

   * Returns a view of the dictionary's **(key, value) tuples**. Converted to a list, it looks like:

     ```python
     list(importance_scores.items())
     # [("Income", 0.45), ("Debt", 0.25), ("Age", 0.30)]
     ```

3. **`sorted(..., key=lambda x: x[1])`**

   * `sorted()` sorts the list of tuples.
   * `key=lambda x: x[1]` → tells Python to sort by the **second element of each tuple** (the importance score, not the feature name).
   * So the result is sorted **ascending by score**:

     ```python
     sorted_importance_scores = [("Debt", 0.25), ("Age", 0.30), ("Income", 0.45)]
     ```

---

```python
features, scores = zip(*sorted_importance_scores)
```

1. **`zip(*sorted_importance_scores)`**

   * `*` operator unpacks the list of tuples.
   * It’s like doing:

     ```python
     zip(("Debt", 0.25), ("Age", 0.30), ("Income", 0.45))
     ```
   * `zip` groups elements **by position**:

     * First elements together → `("Debt", "Age", "Income")`
     * Second elements together → `(0.25, 0.30, 0.45)`

2. **`features, scores = ...`**

   * Unpacks the zipped result into two variables:

     ```python
     features = ("Debt", "Age", "Income")
     scores   = (0.25, 0.30, 0.45)
     ```

---

In [None]:
# Calculate permutation feature importance
importance_scores=get_perm_importance(
    model=model,
    x=x,
    y=y,
    features=x.columns,
    n=10
)

# Display the importance scores using the bar plot
sorted_importance_scores=sorted(importance_scores.items(), key=lambda x: x[1])
features, scores=zip(*sorted_importance_scores)

plt.subplots(figsize=(8, 4))
plt.barh(features, scores)
plt.xlabel('Permutation Importance')

- We can conclude that `INCOME` and `DEBT` are the most important features contributing to the model's predictions.
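
The from-scratch result can be cross-checked against scikit-learn's `permutation_importance` from `sklearn.inspection`. The sketch below runs it on synthetic data with a `LinearRegression` model so that it is self-contained; in this notebook one would presumably pass `model`, `x`, and `y` instead. With no `scoring` argument the function uses the estimator's own `score` method, which is R² for regressors.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

# Synthetic data: feature 0 matters most, feature 1 a little, feature 2 not at all
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

demo_model = LinearRegression().fit(X_demo, y_demo)

# n_repeats plays the role of n in the from-scratch function
result = permutation_importance(
    demo_model, X_demo, y_demo, n_repeats=10, random_state=0
)

print(result.importances_mean)  # average drop in score per feature
print(result.importances_std)   # spread of the drop across repeats
```

The standard deviation across repeats gives a rough sense of how stable each importance estimate is, which the averaged from-scratch version does not report.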