# Recursive Feature Elimination
<br/>
<br/>


## Importing necessary libraries

In [128]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

## Load the dataset
This time I am using the 'Boston Housing Price' dataset from Kaggle. I hope you are all well acquainted with it.

In [25]:
df = pd.read_csv('C:/Users/Mehedi Hassan Galib/Desktop/Python/datas/housing_price.csv')
df.head()

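|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|------|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |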


## Shape of the dataset
The dataset has 14 columns: 13 explanatory variables and the response variable MEDV. Our main goal is to drop the unnecessary variables without reducing model performance.

In [5]:
df.shape

(506, 14)

## Dealing with missing values

In [45]:
df = df.fillna(df.mean())

## Splitting into explanatory and response variables

In [49]:
X = pd.DataFrame(df.iloc[:, 0:13])
y = pd.DataFrame(df.iloc[:, 13:14])

## Feature scaling

#### Note:
Feature scaling is very important here. Once we see how Recursive Feature Elimination (RFE) works, we will understand its importance.  
<br/>

In short, RFE first fits the model and calculates the coefficient of each variable, then removes the variable whose coefficient is closest to zero, because a coefficient near zero means the variable contributes little to the prediction. It then refits the model on the remaining variables and repeats until the requested number of variables is left. But if we don't scale the variables, the coefficients are on different scales and we can't compare them with each other.
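
To make the elimination loop concrete, here is a minimal sketch of the idea written by hand. It is not scikit-learn's actual implementation; the helper name `manual_rfe` and the synthetic data from `make_regression` are assumptions used only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 6 features, 3 of them informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=6, n_informative=3, noise=5.0, random_state=0
)
# Scale first so that the coefficients are comparable with each other.
X_demo = StandardScaler().fit_transform(X_demo)


def manual_rfe(X, y, n_keep):
    """Hypothetical helper: repeatedly drop the feature with the smallest |coefficient|."""
    remaining = list(range(X.shape[1]))  # column indices still in the model
    while len(remaining) > n_keep:
        model = LinearRegression().fit(X[:, remaining], y)
        weakest = int(np.argmin(np.abs(model.coef_)))  # position within `remaining`
        remaining.pop(weakest)
    return remaining


kept = manual_rfe(X_demo, y_demo, n_keep=3)
print(kept)  # indices of the columns that survived
```

Each pass refits the model on the surviving columns, which is why the coefficients have to be recomputed inside the loop rather than once at the start.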

In [57]:
scaling = StandardScaler()
X_std = scaling.fit_transform(df.iloc[:, 0:13])

## Splitting into train and test set

In [131]:
X_train, X_test, y_train, y_test = train_test_split(X_std, y, test_size = 0.2, random_state = 1)
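
A common variation, sketched here on synthetic data rather than on the housing data, is to split first and fit the scaler on the training set only, so that no information from the test set reaches the scaler. The variable names below are assumptions for illustration. The numbers in this notebook were obtained by scaling before splitting, so this variation could give slightly different results.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=100, n_features=4, noise=1.0, random_state=1)

# Split first ...
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

# ... then learn the mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(X_tr)
X_tr_std = scaler.transform(X_tr)
X_te_std = scaler.transform(X_te)  # the test set reuses the training statistics

print(X_tr_std.shape, X_te_std.shape)
```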

<br/>

## Linear Regression Model with all the variables

In [63]:
lr1 = LinearRegression()
model1 = lr1.fit(X_train, y_train)

<br/>

## Observing the absolute coefficient of each variable

In [141]:
coefs1 = dict(zip(X.columns, abs(lr1.coef_[0])))
coefs1

{'CRIM': 0.9658321132444522,
 'ZN': 1.3541926293607394,
 'INDUS': 0.13323164365740447,
 'CHAS': 0.5409943251067181,
 'NOX': 2.262655371784611,
 'RM': 2.1510663396972483,
 'AGE': 0.13481402718067037,
 'DIS': 3.134676350082103,
 'RAD': 2.676102908535655,
 'TAX': 1.8886215977459326,
 'PTRATIO': 2.144054650450591,
 'B': 0.6676469647668919,
 'LSTAT': 3.897626210850134}

## Prediction

In [67]:
y_pred = lr1.predict(X_test)

<br/>

## Model Evaluation

In [153]:
mse1 = mean_squared_error(y_test, y_pred)
mse1.round(4)

23.448

In [154]:
r2_score(y_test, y_pred).round(4)

0.7627

In [155]:
lr1.score(X_test, y_test).round(4)

0.7627

<br/>
<br/>

# Here comes RFE as a lifesaver!

In [115]:
rfe = RFE(estimator = LinearRegression(), n_features_to_select = 11, verbose = 1)
rfe.fit(X_train, y_train)

Fitting estimator with 13 features.
Fitting estimator with 12 features.


RFE(estimator=LinearRegression(), n_features_to_select=11, verbose=1)

<br/>

## List of variables RFE chose for our model

In [116]:
X.columns[rfe.support_]

Index(['CRIM', 'ZN', 'CHAS', 'NOX', 'RM', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B',
       'LSTAT'],
      dtype='object')

## Have a close look

Higher values mean the variable was dropped at an early stage. A value of 1 means the variable survived to the end and proved itself fit for the model. Survival of the fittest!

In [117]:
print(dict(zip(X.columns, rfe.ranking_)))

{'CRIM': 1, 'ZN': 1, 'INDUS': 3, 'CHAS': 1, 'NOX': 1, 'RM': 1, 'AGE': 2, 'DIS': 1, 'RAD': 1, 'TAX': 1, 'PTRATIO': 1, 'B': 1, 'LSTAT': 1}
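
The relationship between `ranking_` and `support_` can be checked on a small example. This is a sketch on synthetic data (an assumption for illustration): the selected variables are exactly those with rank 1, and with the default `step=1` the first variable eliminated gets the highest rank.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 6 features, keep 3.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=6, n_informative=3, noise=5.0, random_state=0
)

rfe_demo = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe_demo.fit(X_demo, y_demo)

print(rfe_demo.ranking_)  # 1 = kept; larger numbers were eliminated earlier
print(rfe_demo.support_)  # boolean mask, True exactly where ranking_ == 1
```

With 6 features and 3 kept, three features are eliminated one at a time, so the ranks run from 1 up to 4. The notebook's own run follows the same pattern: 13 features, 11 kept, and ranks up to 3.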


## Prediction

In [118]:
y1_pred = rfe.predict(X_test)

<br/>

## Model Evaluation

In [150]:
mse2 = mean_squared_error(y_test, y1_pred)
mse2.round(4)

23.4293

In [151]:
r2_score(y_test, y1_pred).round(4)

0.7629

In [152]:
rfe.score(X_test, y_test).round(4)

0.7629

<br/>
<br/>

## Conclusion:
In the first model, we worked with 13 variables and got an MSE of 23.448. In the second model, RFE left us with 11 variables, and we got an MSE of 23.4293, which is almost the same as the first one. More precisely, the second one is slightly better.
<br/>
<br/>

Sometimes we may have to adjust the value of the 'n_features_to_select' parameter to get model performance close to, or better than, that of the first model.
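
Instead of trying values by hand, one option is scikit-learn's `RFECV`, which picks the number of variables by cross-validation. The sketch below uses synthetic data, and the choices of `cv=5` and `neg_mean_squared_error` scoring are assumptions for illustration, not settings used in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 8 features, 4 of them informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=8, n_informative=4, noise=10.0, random_state=0
)

# Try every feature count from 8 down to 1 and keep the one with the best CV score.
selector = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=5,
    scoring="neg_mean_squared_error",
    min_features_to_select=1,
)
selector.fit(X_demo, y_demo)

print(selector.n_features_)  # number of variables chosen by cross-validation
print(selector.support_)     # mask of the chosen variables
```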

<br/>
<br/>

# One more time-saving tip

Don't try to measure the model performance with 'accuracy_score' like I did. 'accuracy_score' is only for classification models. I spent almost half an hour on this problem, and every time I got 'ValueError: continuous is not supported'. Thanks to Kaggle and Stack Overflow for helping me solve this issue!
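
The error is easy to reproduce on a few numbers. In this small sketch the true values are the first three MEDV values shown earlier, and the predictions are made up for illustration.

```python
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

y_true_demo = [24.0, 21.6, 34.7]  # first three MEDV values from the table above
y_pred_demo = [25.1, 20.9, 33.0]  # made-up predictions (an assumption)

# accuracy_score expects class labels, so continuous targets are rejected.
try:
    accuracy_score(y_true_demo, y_pred_demo)
    raised = False
except ValueError as err:
    raised = True
    print("accuracy_score failed:", err)

# Regression metrics accept continuous targets.
print(mean_squared_error(y_true_demo, y_pred_demo))
print(r2_score(y_true_demo, y_pred_demo))
```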
<br/>
<br/>

## Feel free to share your thoughts and if you find it helpful, please upvote. Thanks!