
In [None]:
# problem4_permutation_importance.py
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score

# Synthetic binary classification data: 8 features, 4 of them informative.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=feature_names)

# Hold out 30% of the rows (180 samples) for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Test acc:", accuracy_score(y_test, clf.predict(X_test)))

# Shuffle each feature 20 times on the test set and average the drop in accuracy.
r = permutation_importance(clf, X_test, y_test, n_repeats=20, random_state=0)
importances = pd.Series(r.importances_mean, index=feature_names).sort_values(ascending=False)
print("Permutation importances (top 5):")
print(importances.head(5))


Test acc: 0.9833333333333333
Permutation importances (top 5):
f7    0.207222
f0    0.163889
f4    0.067778
f5    0.059167
f3    0.037500
dtype: float64


# Explanation:

**Model Accuracy**

The model achieved a test accuracy of ~98.3% (177 of the 180 held-out samples classified correctly), so the classifier performs very well on unseen data drawn from the same synthetic distribution.
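
As a small illustration of what the accuracy figure means: with `test_size=0.3` on 600 samples the test set holds 180 rows, so 0.9833 corresponds to 177 correct predictions. The sketch below uses hypothetical labels (not the notebook's actual predictions) chosen only to reproduce that ratio, and checks that a hand-computed accuracy agrees with `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, chosen only to reproduce the 177/180 ratio.
y_true = np.zeros(180, dtype=int)
y_pred = y_true.copy()
y_pred[:3] = 1  # three deliberate mistakes

manual_acc = np.mean(y_pred == y_true)        # fraction of matching labels
sklearn_acc = accuracy_score(y_true, y_pred)  # same quantity via scikit-learn
print(manual_acc, sklearn_acc)
```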

**Purpose of PFI (Permutation Feature Importance)**

PFI helps us understand which features contribute the most to the model’s decisions.

It works by randomly shuffling each feature one at a time and measuring how much the model performance drops.

If shuffling a feature causes a big drop in accuracy, that feature is important.

If the accuracy doesn't change much, that feature has little influence.
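
The shuffle-and-measure procedure described above can be sketched by hand. This is a minimal illustration, not scikit-learn's internal implementation: the helper `manual_permutation_importance` is a hypothetical name, it works on plain NumPy arrays rather than the DataFrame used in the notebook, and it uses its own random generator, so the numbers will be close to, but not identical to, those from `permutation_importance`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def manual_permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Hypothetical helper: mean accuracy drop per feature after shuffling it."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y, model.predict(X))
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):          # one feature at a time
        for k in range(n_repeats):       # repeat to average out shuffle noise
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, k] = baseline - accuracy_score(y, model.predict(X_perm))
    return drops.mean(axis=1)

manual = manual_permutation_importance(clf, X_test, y_test)
for j in np.argsort(manual)[::-1][:5]:
    print(f"f{j}: {manual[j]:.3f}")
```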

**Interpreting the values**

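Each value is the mean decrease in the model's score when that feature is permuted, averaged over the `n_repeats=20` shuffles. Because no `scoring` argument is passed, the score is the classifier's default (accuracy), so a value of 0.207 means test accuracy fell by about 0.207 on average.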
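
Since each importance is an average over repeated shuffles, it can help to look at the spread as well. The sketch below rebuilds the same model as the notebook (on NumPy arrays for brevity) and prints the mean together with the standard deviation from the `importances_std` attribute of the result; the raw per-repeat values live in `importances`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

r = permutation_importance(clf, X_test, y_test, n_repeats=20, random_state=0)

# One row per feature, one column per shuffle.
print(r.importances.shape)

# Mean drop in accuracy and its spread across the 20 shuffles.
for j in r.importances_mean.argsort()[::-1]:
    print(f"f{j}: {r.importances_mean[j]:.3f} +/- {r.importances_std[j]:.3f}")
```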

**Result Interpretation:**

| Feature | Importance | Meaning |
|---|---|---|
| f7 (0.207) | Highest impact | Shuffling f7 lowered test accuracy by about 0.21 (roughly 21 percentage points) on average, so f7 is the most influential feature. |
| f0 (0.164) | Second most important | Shuffling f0 lowered accuracy by about 0.16, so it is also a strong predictor. |
| f4 and f5 (0.06–0.07) | Moderate importance | These features contribute, but less than f7 and f0. |
| f3 (~0.038) | Small influence | Shuffling f3 has only a minor effect on predictions. |
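
To make the arithmetic concrete: an importance is the baseline score minus the average score after shuffling, so the average accuracy after shuffling can be recovered by subtraction. The sketch below uses only the numbers printed in the output above; the variable names are illustrative.

```python
# Values copied from the printed output above.
baseline_acc = 0.9833333333333333
top5 = {"f7": 0.207222, "f0": 0.163889, "f4": 0.067778, "f5": 0.059167, "f3": 0.037500}

for name, drop in top5.items():
    permuted_acc = baseline_acc - drop  # average accuracy once this feature is shuffled
    print(f"{name}: accuracy after shuffling ~ {permuted_acc:.3f} "
          f"({drop * 100:.1f} percentage points lower)")
```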

**Conclusion**

From the PFI results, features f7 and f0 are the most important, since permuting them decreases the model's prediction accuracy the most. This means the model mainly relies on these features for classification. Features like f4, f5, and f3 contribute less, and therefore have lower importance scores.

# **Questions and Expected Answers**

1. What does the test accuracy value indicate?

It tells how well the trained model performs on unseen data. A score of 0.98 means about 98% of the predictions on the test set were correct.

2. What is Permutation Feature Importance (PFI)?

PFI measures how the model’s performance changes when the values of one feature are randomly shuffled. If performance drops a lot, the feature is important.

3. Why do we shuffle one feature at a time?

To isolate the effect of that particular feature while keeping the rest of the model inputs unchanged.

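
A toy illustration of this isolation, using a small made-up matrix (`X_demo` is hypothetical, not the notebook's data): shuffling one column leaves the other columns untouched and keeps that column's own set of values, so only its link to the rows is broken. Both checks print `True`.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = np.arange(12, dtype=float).reshape(4, 3)  # toy matrix: 4 rows, 3 features

X_perm = X_demo.copy()
X_perm[:, 1] = rng.permutation(X_perm[:, 1])  # shuffle only column 1

# Columns 0 and 2 are untouched, and column 1 keeps the same set of values.
print(np.array_equal(X_perm[:, [0, 2]], X_demo[:, [0, 2]]))
print(np.array_equal(np.sort(X_perm[:, 1]), np.sort(X_demo[:, 1])))
```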
4. What does a higher importance score (like for f7) mean?

It means shuffling f7 caused the largest drop in model accuracy, so f7 contributes most to the predictions.

5. Why is PFI considered model-agnostic?

It does not depend on the model's internal structure; it only needs the fitted model's predictions and a performance score, so it can be applied to any estimator.

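
As a sketch of what model-agnostic means in practice, the same `permutation_importance` call can be applied to estimators that have no built-in importance attribute. The two models below are arbitrary choices for illustration and are not part of the original exercise.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative models only; any fitted estimator with a score would do.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    results[name] = r.importances_mean
    print(name, r.importances_mean.round(3))
```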
6. Can PFI handle both regression and classification models?

Yes, the concept remains the same: measure the drop in model performance after permutation.

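
A minimal regression sketch, assuming a synthetic dataset from `make_regression` and a random forest regressor (neither appears in the original notebook). The only change from the classification case is the score being tracked, here R² via `scoring="r2"`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic regression data, chosen only for illustration.
X, y = make_regression(n_samples=600, n_features=8, n_informative=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Importance is now the mean drop in R^2 when a feature is shuffled.
r = permutation_importance(reg, X_test, y_test, scoring="r2", n_repeats=10, random_state=0)
for j in r.importances_mean.argsort()[::-1]:
    print(f"f{j}: {r.importances_mean[j]:.3f}")
```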
7. Why is feature f3 less important than f7 in the output?

Because permuting f3 caused a smaller decrease in accuracy than permuting f7, meaning f3 influences predictions less.

8. What is the advantage of PFI over built-in feature importance of tree models?

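PFI measures importance through the model's actual predictive performance rather than its internal structure, and it can be computed on held-out data. The built-in impurity-based importances of tree models are derived from split statistics on the training set and can favor high-cardinality features.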
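
To compare the two kinds of importance side by side, the sketch below places the random forest's built-in `feature_importances_` next to the permutation importances for the same model. The column names in the table are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

r = permutation_importance(clf, X_test, y_test, n_repeats=20, random_state=0)

comparison = pd.DataFrame(
    {
        "impurity_based": clf.feature_importances_,  # from training-set splits, sums to 1
        "permutation": r.importances_mean,           # mean accuracy drop on the test set
    },
    index=feature_names,
)
print(comparison.sort_values("permutation", ascending=False))
```

The two columns are on different scales (a normalized share versus an accuracy drop), so the ranking of features is the more meaningful thing to compare.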