# Permutation Feature Importance

In this assignment, you'll evaluate feature importance using permutation.

First, you will estimate feature importance using a random forest.

Next, you'll estimate permutation feature importance using any of the libraries that we covered in this section.

Finally, you will use permutation feature importance to select features recursively.

In [1]:
import pandas as pd

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

## Load data

In [2]:
variables = ['MSSubClass', 'LotArea', 'OverallQual', 'OverallCond',
             'YearBuilt', 'YearRemodAdd', 'BsmtFinSF1',
             'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF',  '1stFlrSF',
             '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath',
             'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr',
             'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageCars',
             'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch',
             '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal',
             'MoSold', 'YrSold', 'SalePrice']

In [3]:
# load dataset

data = pd.read_csv('../../houseprice.csv', usecols=variables)

data.shape

(1460, 34)

In [4]:
# separate train and test sets

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(labels=['SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0)

X_train.shape, X_test.shape

((1022, 33), (438, 33))

## Random Forests

In [5]:
# Train Random Forests

rf = RandomForestRegressor(
    n_estimators=100,
    max_depth=3,
    random_state=2909)

rf.fit(X_train, y_train)

In [6]:
# R2 in train set

rf.score(X_train, y_train)

0.8078308037529935

In [7]:
# R2 in test set

rf.score(X_test, y_test)

0.780510833461595

## Tree derived feature importance

Extract and plot the feature importance derived from the random forest. Which features are the most important?
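One possible starting point is sketched below. It reads the impurity-based importance from the `feature_importances_` attribute of a fitted `RandomForestRegressor` and sorts it. To keep the sketch runnable on its own, it uses synthetic data from `make_regression` with made-up column names (`var_0`, `var_1`, ...) instead of the house price data; those names and the dataset size are assumptions, not part of the assignment.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the house price data (assumption, so the sketch runs alone).
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(X.shape[1])])

# Same hyperparameters as the forest trained in the notebook.
rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=2909)
rf.fit(X, y)

# Impurity-based importance: one value per feature, normalised to sum to 1.
tree_importance = pd.Series(
    rf.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(tree_importance)
```

With the notebook's own `rf` and `X_train`, the same `pd.Series(...)` line applies unchanged, and `tree_importance.plot.bar()` draws the bar plot if matplotlib is installed.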

## Permutation Feature Importance

Calculate and plot feature importance by permutation. Use any Python library you like.
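As one option among the libraries you could pick, the sketch below uses `permutation_importance` from `sklearn.inspection`. It shuffles each feature several times on a held-out set and records the mean drop in R2. The synthetic data and the `var_*` column names are assumptions made so that the example runs without the CSV file.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house price data (assumption).
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(X.shape[1])])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=2909)
rf.fit(X_train, y_train)

# Shuffle each feature n_repeats times on the test set and
# measure how much the R2 drops compared with the unshuffled data.
result = permutation_importance(
    rf, X_test, y_test, scoring="r2", n_repeats=5, random_state=0)

perm_importance = pd.DataFrame(
    {"importance_mean": result.importances_mean,
     "importance_std": result.importances_std},
    index=X.columns,
).sort_values("importance_mean", ascending=False)

print(perm_importance)
```

Computing the importance on the test set rather than the train set shows which features the model relies on to generalise. If matplotlib is available, `perm_importance["importance_mean"].plot.bar(yerr=perm_importance["importance_std"])` plots the result with error bars.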

## Permutation with RFE

We will now use permutation importance to select features, but we will do it recursively.

Tasks:

- implement recursive feature elimination using permutation importance (you'll need to combine ELI5 and sklearn)
- plot the importance of the selected features
- draw some conclusions
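To illustrate the recursive idea behind these tasks, here is a minimal sketch that does not follow the ELI5 route. It swaps in a hand-written loop around sklearn's `permutation_importance`: at each round the model is retrained on the remaining features and the one with the lowest mean permutation importance is dropped. The helper name `permutation_rfe`, its parameters, the synthetic data and the choice of keeping 3 features are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def permutation_rfe(model, X_train, y_train, X_val, y_val,
                    n_features_to_select, n_repeats=5, random_state=0):
    """Hypothetical helper: drop the least important feature, one per round."""
    selected = list(X_train.columns)
    while len(selected) > n_features_to_select:
        # Retrain on the remaining features so importances are re-estimated.
        fitted = clone(model).fit(X_train[selected], y_train)
        result = permutation_importance(
            fitted, X_val[selected], y_val,
            n_repeats=n_repeats, random_state=random_state)
        # Remove the feature whose shuffling hurts performance the least.
        weakest = selected[int(np.argmin(result.importances_mean))]
        selected.remove(weakest)
    return selected


# Synthetic stand-in for the house price data (assumption).
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, random_state=0)
X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=2909)

selected = permutation_rfe(
    rf, X_train, y_train, X_test, y_test, n_features_to_select=3)

# Importance of the selected features, from a model trained on them only.
final_model = clone(rf).fit(X_train[selected], y_train)
final = permutation_importance(
    final_model, X_test[selected], y_test, n_repeats=5, random_state=0)
selected_importance = pd.Series(
    final.importances_mean, index=selected).sort_values(ascending=False)

print(selected_importance)
```

For brevity, the sketch selects and evaluates on the same held-out set, which gives an optimistic picture. A separate validation split, or cross-validation, would be the more careful choice for the selection step.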