# Permutation Feature Importance from Scratch

**Author**: Srimugunthan  
**Date**: February 2026

## Overview

**Permutation Feature Importance** is a model-agnostic method to measure feature importance by evaluating how much model performance decreases when a feature's values are randomly shuffled.

### Key Concept

If a feature is important:
- Shuffling it breaks the relationship with the target
- Model performance drops significantly
- High importance score

If a feature is not important:
- Shuffling it has little effect
- Model performance stays roughly the same
- Low/zero importance score

### Algorithm

```
1. Train model on training data
2. Calculate baseline score on validation data
3. For each feature:
   a. Shuffle the feature's values (breaks feature-target relationship)
   b. Calculate score with shuffled feature
   c. Importance = baseline_score - shuffled_score
   d. Repeat multiple times and average
```
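
As a minimal sketch of these steps, the example below uses a hypothetical `FirstFeatureModel` (a hand-written toy, not a scikit-learn class) whose prediction depends only on column 0. Because the model ignores column 1, shuffling that column cannot change the score, so its importance is exactly zero, while shuffling column 0 lowers accuracy. The data, names, and seed are assumptions chosen for illustration.

```python
import numpy as np

class FirstFeatureModel:
    """Hypothetical toy classifier: predicts 1 when column 0 is positive."""
    def predict(self, X):
        return (X[:, 0] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
X_val = rng.normal(size=(500, 2))
y_val = (X_val[:, 0] > 0).astype(int)      # target depends on column 0 only

model = FirstFeatureModel()                # step 1: "trained" model
baseline = accuracy(y_val, model.predict(X_val))   # step 2: baseline score

n_repeats = 10
importances = np.zeros((X_val.shape[1], n_repeats))
for j in range(X_val.shape[1]):            # step 3: loop over features
    for r in range(n_repeats):             # step 3d: repeat the shuffle
        X_perm = X_val.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])      # step 3a
        score = accuracy(y_val, model.predict(X_perm))    # step 3b
        importances[j, r] = baseline - score              # step 3c

mean_importance = importances.mean(axis=1)
print("baseline accuracy:", baseline)
print("mean importance per feature:", mean_importance)
```

The second entry of `mean_importance` is zero by construction, which is a handy sanity check when testing any implementation of the method.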

### Advantages

✅ Model-agnostic (works with any model)  
✅ Captures feature interactions  
✅ No retraining needed  
✅ Easy to understand  

### Disadvantages

⚠️ Computationally expensive  
⚠️ Can be affected by correlated features  
⚠️ Requires validation set  
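
To make the computational cost concrete: every shuffle needs one full prediction pass over the evaluation set, so the number of `predict` calls is `1 + n_features * n_repeats`. The sketch below counts those calls with a hypothetical `CountingModel` wrapper; the wrapper and the random data are illustrative assumptions.

```python
import numpy as np

class CountingModel:
    """Hypothetical stand-in model that counts how often predict() is called."""
    def __init__(self):
        self.n_calls = 0

    def predict(self, X):
        self.n_calls += 1
        return X.sum(axis=1)          # placeholder for real model inference

rng = np.random.default_rng(0)
X_val = rng.normal(size=(200, 30))    # 30 features
n_repeats = 10

model = CountingModel()
model.predict(X_val)                  # baseline pass
for j in range(X_val.shape[1]):
    for _ in range(n_repeats):
        X_perm = X_val.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        model.predict(X_perm)         # one pass per (feature, repeat) pair

print(model.n_calls)                  # 1 + 30 * 10 = 301
```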

---

## 1. Setup and Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_regression, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, r2_score, mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Set random seed
np.random.seed(42)

print("Libraries loaded successfully!")

## 2. Understanding the Concept

Let's visualize what happens when we shuffle a feature.

In [None]:
def visualize_shuffling_effect():
    """
    Visualize the effect of shuffling on feature-target relationship
    """
    np.random.seed(42)
    
    # Create synthetic data with clear relationship
    n_samples = 100
    
    # Important feature: strong correlation with target
    X_important = np.random.randn(n_samples)
    y = 2 * X_important + np.random.randn(n_samples) * 0.5
    
    # Unimportant feature: no correlation with target
    X_unimportant = np.random.randn(n_samples)
    
    # Shuffle the important feature
    X_important_shuffled = np.random.permutation(X_important)
    
    # Plot
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    # Important feature (before shuffling)
    axes[0].scatter(X_important, y, alpha=0.6, edgecolor='black')
    axes[0].set_xlabel('Important Feature')
    axes[0].set_ylabel('Target')
    axes[0].set_title(f'Important Feature (Original)\nCorrelation: {np.corrcoef(X_important, y)[0,1]:.3f}')
    axes[0].grid(True, alpha=0.3)
    
    # Important feature (after shuffling)
    axes[1].scatter(X_important_shuffled, y, alpha=0.6, edgecolor='black', color='orange')
    axes[1].set_xlabel('Important Feature (Shuffled)')
    axes[1].set_ylabel('Target')
    axes[1].set_title(f'Important Feature (Shuffled)\nCorrelation: {np.corrcoef(X_important_shuffled, y)[0,1]:.3f}')
    axes[1].grid(True, alpha=0.3)
    
    # Unimportant feature
    axes[2].scatter(X_unimportant, y, alpha=0.6, edgecolor='black', color='red')
    axes[2].set_xlabel('Unimportant Feature')
    axes[2].set_ylabel('Target')
    axes[2].set_title(f'Unimportant Feature\nCorrelation: {np.corrcoef(X_unimportant, y)[0,1]:.3f}')
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Key Insight:")
    print("   • Shuffling an IMPORTANT feature destroys the relationship with target")
    print("   • Shuffling an UNIMPORTANT feature has little effect (already no relationship)")
    print("   • This difference in effect is what permutation importance measures!")

visualize_shuffling_effect()

## 3. Implementing Permutation Importance from Scratch

### 3.1 Core Algorithm

In [None]:
class PermutationImportance:
    """
    Permutation Feature Importance from Scratch
    
    This implementation works with any sklearn-compatible model.
    """
    
    def __init__(self, model, X, y, metric='auto', n_repeats=10, random_state=42):
        """
        Initialize permutation importance calculator
        
        Args:
            model: Trained model with predict() method
            X: Feature matrix (validation/test set)
            y: Target vector (validation/test set)
            metric: Scoring metric ('accuracy', 'r2', 'mse', 'mae', 'auto')
            n_repeats: Number of times to shuffle each feature
            random_state: Random seed for reproducibility
        """
        self.model = model
        self.X = X if isinstance(X, np.ndarray) else X.values
        self.y = y if isinstance(y, np.ndarray) else y.values
        self.n_repeats = n_repeats
        self.random_state = random_state
        
        # Auto-detect metric based on model type
        if metric == 'auto':
            if hasattr(model, 'predict_proba'):
                self.metric = 'accuracy'
                self.higher_is_better = True
            else:
                self.metric = 'r2'
                self.higher_is_better = True
        else:
            self.metric = metric
            # Only metrics implemented in _get_score are listed here
            self.higher_is_better = metric in ('accuracy', 'r2')
        
        # Store results
        self.importances_mean = None
        self.importances_std = None
        self.importances_raw = None
        self.baseline_score = None
    
    def _get_score(self, X, y):
        """
        Calculate score based on specified metric
        """
        y_pred = self.model.predict(X)
        
        if self.metric == 'accuracy':
            return accuracy_score(y, y_pred)
        elif self.metric == 'r2':
            return r2_score(y, y_pred)
        elif self.metric == 'mse':
            return mean_squared_error(y, y_pred)
        elif self.metric == 'mae':
            return mean_absolute_error(y, y_pred)
        else:
            raise ValueError(f"Unknown metric: {self.metric}")
    
    def compute(self, verbose=True):
        """
        Compute permutation importance for all features
        
        Returns:
            Dictionary with importance statistics
        """
        np.random.seed(self.random_state)
        
        n_features = self.X.shape[1]
        
        # Step 1: Calculate baseline score (no permutation)
        self.baseline_score = self._get_score(self.X, self.y)
        
        if verbose:
            print(f"Baseline {self.metric}: {self.baseline_score:.4f}")
            print(f"Computing importance for {n_features} features...")
        
        # Initialize storage for results
        importances_raw = np.zeros((n_features, self.n_repeats))
        
        # Step 2: For each feature
        for feature_idx in range(n_features):
            if verbose and (feature_idx + 1) % 5 == 0:
                print(f"  Processed {feature_idx + 1}/{n_features} features...")
            
            # Step 3: Repeat shuffling n_repeats times
            for repeat_idx in range(self.n_repeats):
                # Create a copy of the data
                X_permuted = self.X.copy()
                
                # Step 4: Shuffle the feature column
                # This breaks the relationship between feature and target
                X_permuted[:, feature_idx] = np.random.permutation(X_permuted[:, feature_idx])
                
                # Step 5: Calculate score with shuffled feature
                permuted_score = self._get_score(X_permuted, self.y)
                
                # Step 6: Calculate importance
                # For metrics where higher is better (accuracy, R²)
                # For metrics where lower is better (MSE, MAE), flip the sign
                if self.higher_is_better:
                    importance = self.baseline_score - permuted_score
                else:
                    importance = permuted_score - self.baseline_score
                
                importances_raw[feature_idx, repeat_idx] = importance
        
        # Step 7: Aggregate results across repeats
        self.importances_mean = np.mean(importances_raw, axis=1)
        self.importances_std = np.std(importances_raw, axis=1)
        self.importances_raw = importances_raw
        
        if verbose:
            print("✓ Computation complete!")
        
        return {
            'importances_mean': self.importances_mean,
            'importances_std': self.importances_std,
            'importances_raw': self.importances_raw,
            'baseline_score': self.baseline_score
        }
    
    def get_feature_importance_df(self, feature_names=None):
        """
        Return feature importance as a pandas DataFrame
        """
        if self.importances_mean is None:
            raise ValueError("Must call compute() first!")
        
        if feature_names is None:
            feature_names = [f'Feature_{i}' for i in range(len(self.importances_mean))]
        
        df = pd.DataFrame({
            'Feature': feature_names,
            'Importance_Mean': self.importances_mean,
            'Importance_Std': self.importances_std
        })
        
        # Sort by importance
        df = df.sort_values('Importance_Mean', ascending=False).reset_index(drop=True)
        
        return df
    
    def plot(self, feature_names=None, top_k=None, figsize=(10, 6)):
        """
        Plot permutation importance with error bars
        """
        if self.importances_mean is None:
            raise ValueError("Must call compute() first!")
        
        if feature_names is None:
            feature_names = [f'Feature_{i}' for i in range(len(self.importances_mean))]
        
        # Sort by importance
        sorted_idx = np.argsort(self.importances_mean)[::-1]
        
        if top_k is not None:
            sorted_idx = sorted_idx[:top_k]
        
        # Plot
        plt.figure(figsize=figsize)
        plt.barh(range(len(sorted_idx)),
                self.importances_mean[sorted_idx],
                xerr=self.importances_std[sorted_idx],
                alpha=0.7,
                edgecolor='black',
                capsize=5)
        
        plt.yticks(range(len(sorted_idx)),
                  [feature_names[i] for i in sorted_idx])
        plt.xlabel(f'Permutation Importance ({self.metric})', fontsize=12)
        plt.title(f'Feature Importance (Baseline {self.metric}: {self.baseline_score:.4f})', 
                 fontsize=14, fontweight='bold')
        plt.gca().invert_yaxis()
        plt.grid(axis='x', alpha=0.3)
        plt.axvline(x=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
        plt.tight_layout()
        plt.show()

print("✓ PermutationImportance class defined!")

## 4. Example 1: Classification Task (Synthetic Data)

Let's test on a synthetic dataset where we know which features are important.

In [None]:
print("="*70)
print("EXAMPLE 1: BINARY CLASSIFICATION (Synthetic Data)")
print("="*70)

# Generate synthetic data
# 10 features: 5 informative, 3 redundant, 2 noise
X_class, y_class = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_redundant=3,
    n_repeated=0,
    n_classes=2,
    random_state=42
)

# Split data
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_class, y_class, test_size=0.3, random_state=42
)

print(f"\nDataset:")
print(f"  Training samples: {X_train_c.shape[0]}")
print(f"  Test samples: {X_test_c.shape[0]}")
print(f"  Features: {X_train_c.shape[1]}")
print(f"  Classes: {len(np.unique(y_class))}")

# Train Random Forest classifier
print("\nTraining Random Forest Classifier...")
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_c, y_train_c)

train_acc = accuracy_score(y_train_c, rf_classifier.predict(X_train_c))
test_acc = accuracy_score(y_test_c, rf_classifier.predict(X_test_c))

print(f"  Training Accuracy: {train_acc:.4f}")
print(f"  Test Accuracy: {test_acc:.4f}")

# Calculate permutation importance
print("\n" + "-"*70)
print("Computing Permutation Importance...")
print("-"*70)

perm_imp_class = PermutationImportance(
    model=rf_classifier,
    X=X_test_c,
    y=y_test_c,
    metric='accuracy',
    n_repeats=10,
    random_state=42
)

results_class = perm_imp_class.compute(verbose=True)

# Show results
feature_names_class = [f'Feature_{i}' for i in range(10)]
df_class = perm_imp_class.get_feature_importance_df(feature_names_class)

print("\n" + "="*70)
print("RESULTS")
print("="*70)
print(df_class.to_string(index=False))

In [None]:
# Plot results
perm_imp_class.plot(feature_names_class)

## 5. Example 2: Real Dataset (Breast Cancer)

Let's apply it to the Breast Cancer dataset.

In [None]:
print("="*70)
print("EXAMPLE 2: BREAST CANCER CLASSIFICATION (Real Data)")
print("="*70)

# Load Breast Cancer dataset
data = load_breast_cancer()
X_cancer = data.data
y_cancer = data.target
feature_names_cancer = data.feature_names

print(f"\nDataset:")
print(f"  Total samples: {X_cancer.shape[0]}")
print(f"  Features: {X_cancer.shape[1]}")
print(f"  Classes: {len(np.unique(y_cancer))}")

# Split data
X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(
    X_cancer, y_cancer, test_size=0.3, random_state=42
)

# Train Logistic Regression
print("\nTraining Logistic Regression...")
lr_cancer = LogisticRegression(max_iter=10000, random_state=42)
lr_cancer.fit(X_train_cancer, y_train_cancer)

train_acc_cancer = accuracy_score(y_train_cancer, lr_cancer.predict(X_train_cancer))
test_acc_cancer = accuracy_score(y_test_cancer, lr_cancer.predict(X_test_cancer))

print(f"  Training Accuracy: {train_acc_cancer:.4f}")
print(f"  Test Accuracy: {test_acc_cancer:.4f}")

# Calculate permutation importance
print("\n" + "-"*70)
print("Computing Permutation Importance...")
print("-"*70)

perm_imp_cancer = PermutationImportance(
    model=lr_cancer,
    X=X_test_cancer,
    y=y_test_cancer,
    metric='accuracy',
    n_repeats=10,
    random_state=42
)

results_cancer = perm_imp_cancer.compute(verbose=True)

# Show top 15 features
df_cancer = perm_imp_cancer.get_feature_importance_df(feature_names_cancer)

print("\n" + "="*70)
print("TOP 15 MOST IMPORTANT FEATURES")
print("="*70)
print(df_cancer.head(15).to_string(index=False))

In [None]:
# Plot top 15 features
perm_imp_cancer.plot(feature_names_cancer, top_k=15, figsize=(10, 8))

## 6. Example 3: Regression Task

Permutation importance also works for regression problems.
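
When the score is an error metric such as MSE, lower is better, so the difference is taken the other way round (`permuted - baseline`) to keep the reading "positive means important". The sketch below illustrates this with a hypothetical `LinearToyModel` whose fixed weights are an assumption made for the example.

```python
import numpy as np

class LinearToyModel:
    """Hypothetical regressor with fixed weights: y_hat = 3 * x0 (x1 unused)."""
    def predict(self, X):
        return 3.0 * X[:, 0]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
X_val = rng.normal(size=(1000, 2))
y_val = 3.0 * X_val[:, 0] + rng.normal(scale=0.5, size=1000)

model = LinearToyModel()
baseline_mse = mse(y_val, model.predict(X_val))

X_perm = X_val.copy()
X_perm[:, 0] = rng.permutation(X_perm[:, 0])
permuted_mse = mse(y_val, model.predict(X_perm))

# Lower-is-better metric: importance = permuted error - baseline error
importance_x0 = permuted_mse - baseline_mse
print(f"baseline MSE={baseline_mse:.3f}, permuted MSE={permuted_mse:.3f}, "
      f"importance={importance_x0:.3f}")
```

Note that importances computed from MSE are in squared target units, so they are not directly comparable with importances computed from R².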

In [None]:
print("="*70)
print("EXAMPLE 3: REGRESSION (Synthetic Data)")
print("="*70)

# Generate synthetic regression data
X_reg, y_reg = make_regression(
    n_samples=1000,
    n_features=15,
    n_informative=10,
    n_targets=1,
    noise=10.0,
    random_state=42
)

# Split data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42
)

print(f"\nDataset:")
print(f"  Training samples: {X_train_reg.shape[0]}")
print(f"  Test samples: {X_test_reg.shape[0]}")
print(f"  Features: {X_train_reg.shape[1]}")

# Train Random Forest Regressor
print("\nTraining Random Forest Regressor...")
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train_reg, y_train_reg)

train_r2 = r2_score(y_train_reg, rf_regressor.predict(X_train_reg))
test_r2 = r2_score(y_test_reg, rf_regressor.predict(X_test_reg))

print(f"  Training R²: {train_r2:.4f}")
print(f"  Test R²: {test_r2:.4f}")

# Calculate permutation importance using R²
print("\n" + "-"*70)
print("Computing Permutation Importance (using R²)...")
print("-"*70)

perm_imp_reg = PermutationImportance(
    model=rf_regressor,
    X=X_test_reg,
    y=y_test_reg,
    metric='r2',
    n_repeats=10,
    random_state=42
)

results_reg = perm_imp_reg.compute(verbose=True)

# Show results
feature_names_reg = [f'Feature_{i}' for i in range(15)]
df_reg = perm_imp_reg.get_feature_importance_df(feature_names_reg)

print("\n" + "="*70)
print("RESULTS")
print("="*70)
print(df_reg.to_string(index=False))

In [None]:
# Plot results
perm_imp_reg.plot(feature_names_reg)

## 7. Comparing with sklearn's Implementation

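Let's check that our implementation agrees with sklearn's `permutation_importance`. The two draw their shuffles from different random streams, so the values should be close rather than identical.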

In [None]:
from sklearn.inspection import permutation_importance as sklearn_perm_imp

print("="*70)
print("COMPARISON: Our Implementation vs sklearn")
print("="*70)

# Our implementation (already computed above for classification)
our_importances = perm_imp_class.importances_mean

# sklearn implementation
print("\nComputing with sklearn...")
sklearn_result = sklearn_perm_imp(
    rf_classifier,
    X_test_c,
    y_test_c,
    n_repeats=10,
    random_state=42,
    scoring='accuracy'
)

sklearn_importances = sklearn_result.importances_mean

# Compare
comparison_df = pd.DataFrame({
    'Feature': feature_names_class,
    'Our Implementation': our_importances,
    'sklearn': sklearn_importances,
    'Difference': np.abs(our_importances - sklearn_importances)
})

comparison_df = comparison_df.sort_values('Our Implementation', ascending=False)

print("\n" + "="*70)
print("COMPARISON RESULTS")
print("="*70)
print(comparison_df.to_string(index=False))

# Calculate correlation
correlation = np.corrcoef(our_importances, sklearn_importances)[0, 1]
print(f"\n✓ Correlation between implementations: {correlation:.6f}")
print(f"✓ Mean absolute difference: {comparison_df['Difference'].mean():.6f}")

if correlation > 0.99:
    print("\n✅ Close agreement: both implementations rank the features almost identically.")
else:
    print("\n⚠️  Differences remain, likely due to the different random shuffling order")

## 8. Understanding Negative Importance

Sometimes features have negative importance. What does this mean?
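
A small negative value often arises by chance alone. The sketch below uses a hypothetical `NoisyToyModel` that puts a small weight on a column unrelated to the target; the weights, sample size, and seed are assumptions for illustration. Across repeats the importance of that column scatters around zero, so individual repeats (and sometimes the mean) come out negative.

```python
import numpy as np

class NoisyToyModel:
    """Hypothetical regressor that leans slightly on a pure-noise column."""
    def predict(self, X):
        return X[:, 0] + 0.1 * X[:, 1]

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
X_val = rng.normal(size=(500, 2))                  # column 1 is unrelated to y
y_val = X_val[:, 0] + rng.normal(scale=0.5, size=500)

model = NoisyToyModel()
baseline = r2(y_val, model.predict(X_val))

n_repeats = 50
raw = np.empty(n_repeats)
for r in range(n_repeats):
    X_perm = X_val.copy()
    X_perm[:, 1] = rng.permutation(X_perm[:, 1])   # shuffle the noise column
    raw[r] = baseline - r2(y_val, model.predict(X_perm))

sem = raw.std(ddof=1) / np.sqrt(n_repeats)         # standard error of the mean
print(f"mean={raw.mean():+.4f}  std={raw.std(ddof=1):.4f}  sem={sem:.4f}")
print("share of repeats with negative importance:", np.mean(raw < 0))
```

A reasonable rule of thumb is to treat a mean that lies within about two standard errors of zero as indistinguishable from zero.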

In [None]:
def explain_negative_importance():
    """
    Explain what negative importance means
    """
    print("="*70)
    print("UNDERSTANDING NEGATIVE IMPORTANCE")
    print("="*70)
    
    print("\n📊 What does negative importance mean?\n")

    print("Positive Importance (e.g., +0.05):")
    print("  • Shuffling the feature DECREASES model performance")
    print("  • Feature is useful for prediction")
    print("  • Model relies on this feature")
    print("  • Interpretation: 'Shuffling this feature lowers accuracy by 0.05 (5 percentage points)'")

    print("\nZero Importance (e.g., ~0.00):")
    print("  • Shuffling the feature has NO measurable effect on performance")
    print("  • The model does not rely on this feature (or correlated features compensate)")
    print("  • Candidate for removal; confirm by retraining without it")

    print("\nNegative Importance (e.g., -0.02):")
    print("  • Shuffling the feature INCREASES model performance (!)")
    print("  • The feature may be hurting the model on this data")
    print("  • Possible reasons:")
    print("    - Feature adds noise")
    print("    - Feature is highly correlated with other features")
    print("    - Random variation (check std deviation)")
    print("    - Model overfit to this feature on the training set")

    print("\n💡 Key Insight:")
    print("   If importance is negative but std is high, it might just be noise.")
    print("   If importance is consistently negative, consider removing the feature.")

explain_negative_importance()

# Show features with negative importance from our examples
print("\n" + "="*70)
print("FEATURES WITH NEGATIVE IMPORTANCE (from our examples)")
print("="*70)

negative_features = df_class[df_class['Importance_Mean'] < 0]
if len(negative_features) > 0:
    print("\nClassification Example:")
    print(negative_features.to_string(index=False))
else:
    print("\nNo features with negative importance in classification example.")

## 9. Effect of n_repeats

How does the number of repetitions affect the results?
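
One point worth separating out first: the spread across repeats (the standard deviation) does not shrink as `n_repeats` grows. What shrinks is the uncertainty of the mean, roughly `std / sqrt(n_repeats)`. The sketch below illustrates this with a hypothetical `ToyModel` and synthetic data, both assumptions made for the example.

```python
import numpy as np

class ToyModel:
    """Hypothetical regressor: y_hat = x0."""
    def predict(self, X):
        return X[:, 0]

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
X_val = rng.normal(size=(300, 1))
y_val = X_val[:, 0] + rng.normal(scale=0.5, size=300)
model = ToyModel()
baseline = r2(y_val, model.predict(X_val))

def raw_importances(n_repeats):
    """Importance of column 0 for each of n_repeats independent shuffles."""
    out = np.empty(n_repeats)
    for r in range(n_repeats):
        X_perm = X_val.copy()
        X_perm[:, 0] = rng.permutation(X_perm[:, 0])
        out[r] = baseline - r2(y_val, model.predict(X_perm))
    return out

summary = {}
for n in (5, 50, 500):
    raw = raw_importances(n)
    std = raw.std(ddof=1)
    summary[n] = (raw.mean(), std, std / np.sqrt(n))   # (mean, std, sem)
    print(f"n_repeats={n:>3}  mean={summary[n][0]:.3f}  "
          f"std={std:.3f}  sem={summary[n][2]:.4f}")
```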

In [None]:
print("="*70)
print("ANALYZING EFFECT OF n_repeats")
print("="*70)

# Test with different n_repeats
n_repeats_values = [1, 5, 10, 20, 50]
results_by_repeats = {}

print("\nComputing importance with different n_repeats values...")

for n_repeats in n_repeats_values:
    print(f"  n_repeats={n_repeats}...", end=" ")
    
    perm_imp = PermutationImportance(
        model=rf_classifier,
        X=X_test_c,
        y=y_test_c,
        metric='accuracy',
        n_repeats=n_repeats,
        random_state=42
    )
    
    results = perm_imp.compute(verbose=False)
    results_by_repeats[n_repeats] = results
    print("Done")

# Plot comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Mean importance across n_repeats
for n_repeats in n_repeats_values:
    importances = results_by_repeats[n_repeats]['importances_mean']
    axes[0].plot(range(len(importances)), sorted(importances, reverse=True),
                marker='o', label=f'n_repeats={n_repeats}', alpha=0.7)

axes[0].set_xlabel('Feature Rank')
axes[0].set_ylabel('Importance')
axes[0].set_title('Mean Importance vs n_repeats')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Standard deviation across n_repeats
for n_repeats in n_repeats_values:
    std = results_by_repeats[n_repeats]['importances_std']
    axes[1].plot(range(len(std)), sorted(std, reverse=True),
                marker='o', label=f'n_repeats={n_repeats}', alpha=0.7)

axes[1].set_xlabel('Feature Rank')
axes[1].set_ylabel('Standard Deviation')
axes[1].set_title('Std Deviation vs n_repeats')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Key Insights:")
print("   • Higher n_repeats → Lower standard error of the mean (std / sqrt(n_repeats)) → More stable estimates")
print("   • The std across repeats itself does not shrink; with n_repeats=1 it is trivially 0")
print("   • Mean importance converges quickly (even n_repeats=5 is often good)")
print("   • Trade-off: Computation time vs stability")
print("   • Recommended: n_repeats=10 for most applications")

## 10. Visualizing Importance Distribution

In [None]:
def plot_importance_distributions(perm_imp, feature_names, top_k=5):
    """
    Plot distribution of importance across repeats for top features
    """
    # Get top k features
    sorted_idx = np.argsort(perm_imp.importances_mean)[::-1][:top_k]
    
    fig, axes = plt.subplots(1, top_k, figsize=(4*top_k, 4))
    
    if top_k == 1:
        axes = [axes]
    
    for i, feature_idx in enumerate(sorted_idx):
        importances = perm_imp.importances_raw[feature_idx]
        
        axes[i].hist(importances, bins=15, alpha=0.7, edgecolor='black')
        axes[i].axvline(perm_imp.importances_mean[feature_idx],
                       color='red', linestyle='--', linewidth=2,
                       label=f'Mean: {perm_imp.importances_mean[feature_idx]:.4f}')
        axes[i].set_xlabel('Importance')
        axes[i].set_ylabel('Frequency')
        axes[i].set_title(f'{feature_names[feature_idx]}\n(Rank #{i+1})')
        axes[i].legend()
        axes[i].grid(True, alpha=0.3)
    
    plt.suptitle(f'Importance Distribution Across Repeats (Top {top_k} Features)',
                fontsize=14, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.show()

print("Plotting importance distributions for top 5 features...")
plot_importance_distributions(perm_imp_class, feature_names_class, top_k=5)

## 11. Summary and Best Practices

### Algorithm Summary

```python
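# Runnable sketch of permutation importance (assumes a higher score is better).
# The function name, `evaluate` callback, and `seed` are illustrative choices.
import numpy as np

def permutation_importance_sketch(model, X_val, y_val, evaluate, n_repeats=10, seed=42):
    """evaluate(model, X, y) must return a score where higher is better."""
    rng = np.random.default_rng(seed)
    baseline_score = evaluate(model, X_val, y_val)
    feature_importance = {}

    for feature in range(X_val.shape[1]):
        importances = []

        for repeat in range(n_repeats):
            X_permuted = X_val.copy()
            X_permuted[:, feature] = rng.permutation(X_permuted[:, feature])
            permuted_score = evaluate(model, X_permuted, y_val)
            importance = baseline_score - permuted_score
            importances.append(importance)

        feature_importance[feature] = np.mean(importances)

    return feature_importance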
```

### Best Practices

1. **Use validation/test set** (not training set)
   - Training set can give misleading results due to overfitting

2. **Choose appropriate n_repeats**
   - n_repeats=10 is usually sufficient
   - Increase for more stable estimates
   - Decrease for faster computation

3. **Consider computational cost**
   - Cost ≈ n_features × n_repeats × time for one prediction pass over the evaluation set
   - Can be expensive for large datasets, many features, or slow models

4. **Interpret with caution for correlated features**
   - If features A and B are highly correlated
   - Shuffling A might not hurt performance (B compensates)
   - Both might show low importance even if jointly important

5. **Check standard deviation**
   - High std → Unstable estimate, increase n_repeats
   - Low std → Reliable estimate

6. **Compare with other methods**
   - Use alongside tree-based importances, SHAP, etc.
   - Different methods can reveal different insights
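
The correlated-features caveat in point 4 can be illustrated with a small sketch. Feature B is an exact copy of feature A, and two hypothetical hand-written predictors fit the target equally well: one relies on A alone, the other spreads the same signal over A and B. All names and data here are assumptions for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mean_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Mean drop in R² when `column` is shuffled (higher = more important)."""
    rng = np.random.default_rng(seed)
    baseline = r2(y, predict(X))
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        X_perm[:, column] = rng.permutation(X_perm[:, column])
        drops.append(baseline - r2(y, predict(X_perm)))
    return float(np.mean(drops))

rng = np.random.default_rng(42)
a = rng.normal(size=1000)
X_val = np.column_stack([a, a])     # column 1 (B) duplicates column 0 (A)
y_val = 2.0 * a

def uses_a_only(X):
    """Hypothetical model that puts all weight on A."""
    return 2.0 * X[:, 0]

def uses_a_and_b(X):
    """Hypothetical model that splits the same signal between A and B."""
    return X[:, 0] + X[:, 1]

imp_alone = mean_importance(uses_a_only, X_val, y_val, column=0)
imp_shared = mean_importance(uses_a_and_b, X_val, y_val, column=0)
print(f"importance of A when used alone:    {imp_alone:.3f}")
print(f"importance of A when shared with B: {imp_shared:.3f}")
```

In this setup A scores clearly lower once B carries half of the signal, even though the information in A is unchanged. Permuting correlated features together as a group is one way to measure their joint importance.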

### When to Use Permutation Importance

✅ **Good for:**
- Model-agnostic feature importance
- Comparing features across different models
- Understanding feature impact on predictions
- Feature selection

❌ **Not ideal for:**
- Extremely large datasets (slow)
- Highly correlated features (misleading)
- When you need individual prediction explanations (use SHAP)

---

## 12. Key Takeaways

### Core Concept
Permutation importance measures how much model performance drops when a feature is randomly shuffled, breaking its relationship with the target.

### Algorithm Steps
1. Calculate baseline score on validation set
2. For each feature:
   - Shuffle feature values
   - Calculate new score
   - Importance = baseline - new_score
   - Repeat and average

### Advantages
- Model-agnostic (works with any model)
- Intuitive interpretation
- Captures feature interactions
- No model retraining needed

### Limitations
- Computationally expensive
- Can be misleading with correlated features
- Requires separate validation set

### Interpretation
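- **High positive**: Important feature
- **Zero**: The model does not rely on the feature (or a correlated feature compensates)
- **Negative**: Often noise in the estimate; if consistent, may indicate a harmful feature or correlation issues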

---

## Further Reading

- [Breiman (2001) - Random Forests](https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf)
- [sklearn Permutation Importance](https://scikit-learn.org/stable/modules/permutation_importance.html)
- [Interpretable Machine Learning Book - Permutation Feature Importance](https://christophm.github.io/interpretable-ml-book/feature-importance.html)