# Permutation Feature Importance 

Once you've built your model and tuned your hyperparameters, there are still a few more tricks to get the most out of your model. Feature Selection is an important technique that you should apply to most machine learning models (deep learning is an exception) to identify the features that make meaningful contributions to your model. Dropping features that are not important helps to reduce overfitting. Overfitting is discussed in another tutorial, but the gist is that if you train your model on too many features, it may become very good at fitting the training data, but poor at generalizing to new data.

Permutation Feature Importance (let's call it PFI) is slightly different from more popular ways of reducing the number of features, like Principal Component Analysis. For PFI, you identify important features by testing an already trained model on input data in which one feature's values have been randomly permuted, then observing how the score changes. If randomly permuting a feature does not affect the model's performance, then it may be a less important feature.

Just for review, creating a permutation of a dataset is just reordering it without adding or deleting any values. Here are all the permutations of [A, B, C]:

[A, B, C]

[B, C, A]

[C, A, B]

[A, C, B]

[C, B, A]

[B, A, C]
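As a quick check, the six orderings above can be generated with Python's standard `itertools.permutations`. This is only an illustrative sketch; the variable names are not part of the original notebook.

```python
from itertools import permutations

items = ['A', 'B', 'C']

# every ordering of the three items; n items give n! permutations
all_perms = [list(p) for p in permutations(items)]

for p in all_perms:
    print(p)

print(len(all_perms))  # 3! = 6
```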

Here is a toy example: let's say I have a random forest regression model, using the R^2 coefficient as a metric (at most 1, where 1 is a perfect fit), and have 10 features in my data. I will iterate through each feature. In a single loop iteration, I create a random permutation of one of the feature columns on a copy of the validation data. Then I run this data through the trained Random Forest and evaluate its R^2 performance. I then repeat this for all the features. If the original validation R^2 is 0.75, and it remains 0.75 after permuting columns 1, 3, and 5, then those features contribute little to the predictive power of the model, and we can try dropping them.
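The loop described above might be sketched as follows. To keep it small and dependency-light, this sketch assumes synthetic data with 3 features and swaps the random forest for a plain least-squares model fitted with NumPy; the data, the `r2_score` helper, and all variable names are illustrative assumptions, not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data: 3 features, only the first two influence the target
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# stand-in "trained model": least squares fitted on the first 400 rows
train_X, val_X = X[:400], X[400:]
train_y, val_y = y[:400], y[400:]
coef, *_ = np.linalg.lstsq(train_X, train_y, rcond=None)

def r2_score(X, y):
    """R^2 of the fitted linear model on (X, y)."""
    residual = y - X @ coef
    return 1.0 - residual.dot(residual) / ((y - y.mean()) ** 2).sum()

base = r2_score(val_X, val_y)

drops = []
for i in range(val_X.shape[1]):
    permuted = val_X.copy()                            # keep the validation data intact
    permuted[:, i] = rng.permutation(permuted[:, i])   # shuffle one column only
    drops.append(base - r2_score(permuted, val_y))     # large drop = important feature

print('baseline R^2:', base)
print('drop in R^2 per feature:', drops)
```

The third feature plays no role in the target, so shuffling it should leave the score almost unchanged, while shuffling the first feature should hurt the most.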

PFI is often used with random forest models, but because it only needs a trained model and a scoring metric, it can be extended to other classification and regression models.

Let's now see an example with the Melbourne Housing Dataset from the previous tutorials. In the next few cells, I will load and clean the data, then build a Random Forest Regressor baseline model.

In [None]:
import pandas as pd 
import numpy as np
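#Imputer was removed from sklearn.preprocessing in scikit-learn 0.22;
#SimpleImputer is its replacement, aliased here so the cells below still work
from sklearn.impute import SimpleImputer as Imputer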
from sklearn.ensemble import RandomForestRegressor

In [None]:
#read in the data 
data = pd.read_csv('../input/melb_data.csv')

#split the X and Y data 
melbourne_predictors = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea', 
                        'YearBuilt', 'Lattitude', 'Longtitude']
X = data[melbourne_predictors]
y = data['Price']

#fill in missing NaN values (see Handling Missing Values tutorial)
my_imputer = Imputer()
X_impute = my_imputer.fit_transform(X)

#create a training and validation split 
train_X, val_X = X_impute[:15000], X_impute[15000:] #start at 15000 so no row is skipped
train_y, val_y = y[:15000], y[15000:]

In [None]:
forest_model = RandomForestRegressor() #build a regression model 
forest_model.fit(train_X, train_y) #train the base model 
base_score = forest_model.score(val_X, val_y) #get the R^2 score on the validation data 
print('Base model R^2:', base_score)

The baseline model performed with an R^2 value of about 0.64. Now, we can use Permutation Feature Importance to identify which features contributed most to this score.

In [None]:
for i in range(val_X.shape[1]): 
    
    #don't overwrite the validation data! 
    val_X_PFI = val_X.copy() 
    
    #create a random permutation with numpy 
    val_X_PFI[:, i] = np.random.permutation(val_X_PFI[:, i]) 
    
    #recompute the R^2 score
    score = forest_model.score(val_X_PFI, val_y)  
    print ('Permute {} R^2: {}'.format(X.columns[i], score))

The results above are insightful. They may be slightly different when you run this, due to the random nature of the calculations. We can see that the R^2 coefficient remains within 0.03 of 0.64 for the Bathroom and BuildingArea features, and within 0.10 for the Landsize and YearBuilt features. This means that these features did not impact the predictive power of the model to a significant degree.
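Because a single shuffle is noisy, one option is to repeat the permutation several times per feature and average the drop in score. The helper below is a hedged sketch of that idea: `mean_permutation_drop` and the stand-in `FirstColumnModel` are hypothetical names introduced here, and the helper only assumes a fitted object with a `score(X, y)` method, as scikit-learn estimators have.

```python
import numpy as np

def mean_permutation_drop(model, X, y, n_repeats=10, seed=0):
    """Average drop in model.score when each column of X is shuffled.

    Returns (mean_drop, std_drop), each with one entry per column.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    base = model.score(X, y)
    drops = np.zeros((X.shape[1], n_repeats))
    for i in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()                             # never overwrite the input
            X_perm[:, i] = rng.permutation(X_perm[:, i])  # shuffle one column
            drops[i, r] = base - model.score(X_perm, y)
    return drops.mean(axis=1), drops.std(axis=1)


class FirstColumnModel:
    """Hypothetical stand-in model: predicts y from column 0 only."""
    def score(self, X, y):
        pred = 2.0 * X[:, 0]
        return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()


demo_rng = np.random.default_rng(1)
demo_X = demo_rng.normal(size=(200, 2))
demo_y = 2.0 * demo_X[:, 0]

mean_drop, std_drop = mean_permutation_drop(FirstColumnModel(), demo_X, demo_y)
print(mean_drop)  # column 0 shows a large drop, column 1 shows none
```

With the notebook's variables, the call would be `mean_permutation_drop(forest_model, val_X, val_y)`; a feature whose mean drop is small relative to its standard deviation is a candidate for removal.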

We can try dropping these four features and redoing the analysis. 

In [None]:
melbourne_predictors = ['Rooms', 'Lattitude', 'Longtitude']
X = data[melbourne_predictors]
y = data['Price']

#fill in missing NaN values (see Handling Missing Values tutorial)
my_imputer = Imputer()
X_impute = my_imputer.fit_transform(X)

#create a training and validation split 
train_X, val_X = X_impute[:15000], X_impute[15000:] #start at 15000 so no row is skipped
train_y, val_y = y[:15000], y[15000:]

forest_model = RandomForestRegressor() #build a regression model 
forest_model.fit(train_X, train_y) #train the base model 
base_score = forest_model.score(val_X, val_y) #get the R^2 score on the validation data 
print('New R^2:', base_score)

As you can see, the R^2 only drops slightly with this large reduction in dimensionality. Dropping features may slightly reduce the predictive power of this particular model, but in datasets with more features, or with more complicated models, you may see better gains.

# Conclusion 
Feature Selection with Permutation Feature Importance is a great way to reduce dimensionality of your dataset and help reduce overfitting. Here is a checklist of things to keep in mind when implementing PFI: 

- apply PFI on a trained model, using validation data
- make a copy of your validation data every iteration so you do not overwrite it
- use the same performance metric for all comparisons

# Your Turn 

Older versions of scikit-learn did not come with an implementation of PFI; since version 0.22 it provides `sklearn.inspection.permutation_importance`. See if you can extend this example to other performance metrics and model types (classification vs. regression).
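As one possible starting point, here is a minimal sketch that applies `sklearn.inspection.permutation_importance` (available in scikit-learn 0.22 and later) to a classifier, with accuracy as the metric. The synthetic data and variable names are assumptions made for illustration and are unrelated to the Melbourne dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# synthetic classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(train_X, train_y)

# scoring selects the metric; n_repeats averages over several shuffles
result = permutation_importance(clf, val_X, val_y, scoring='accuracy',
                                n_repeats=10, random_state=0)

for i, (mean, std) in enumerate(zip(result.importances_mean,
                                    result.importances_std)):
    print('feature {}: {:.3f} +/- {:.3f}'.format(i, mean, std))
```

Changing the `scoring` argument (for example to `'r2'` with a regressor) is one way to try other metrics and model types.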

### References 

[Interpretable Machine Learning - Christoph Molnar](https://christophm.github.io/interpretable-ml-book/permutation-feature-importance.html)

[pjh2011's Github /rf_perm_feat_import](https://github.com/pjh2011/rf_perm_feat_import)