**Feature Selection**

The goal of feature selection is to reduce the number of predictors as far as possible without compromising predictive performance. It is used either to mitigate a specific problem in the interplay between the predictors and a model, or to reduce model complexity.


Problems to solve:

    1- Some models, such as SVMs and neural networks, are sensitive to uninformative predictors.
    2- When the number of predictors is much greater than the number of samples, an optimal subset of predictors should be chosen.
    3- Linear models are sensitive to inter-predictor correlation; reducing multicollinearity can improve predictive performance.
    4- Using fewer predictors also decreases the cost of acquiring new data.
    
Some implicit feature selection methods:


    1- Tree- and rule-based models
    2- Multivariate adaptive regression spline (MARS) models
    3- Regularization models

    Pros: 
        Relatively fast: the selection process is embedded in the model fitting process (implicit).
        Direct connection between the selected features and the objective function.
    Cons:
        Model-dependent
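
As a rough illustration of implicit selection through regularization, the sketch below fits a lasso model on synthetic data and lists the predictors whose coefficients were not shrunk to zero. The data set, the variable names and the value `alpha=1.0` are assumptions made for this example only; they are not taken from the notebook's car-price data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumption for illustration): 20 predictors, only 5 informative
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5,
    noise=1.0, random_state=0,
)

# The L1 penalty can set coefficients to exactly zero while the model is fitted,
# so selection happens inside the fitting step itself
lasso = Lasso(alpha=1.0)
lasso.fit(X_demo, y_demo)

selected = np.flatnonzero(lasso.coef_)  # indices of predictors the model kept
print("kept", len(selected), "of", X_demo.shape[1], "predictors:", selected)
```

Which predictors survive depends on `alpha`, which is one way the "model-dependent" drawback shows up in practice.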

Subset search procedures (explicit):
    
    1- Filter methods (e.g. statistical significance)
    2- Wrapper methods (iterative search procedures)
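
A filter method scores each predictor on its own, independently of the final model, and keeps the best-scoring ones. The sketch below uses scikit-learn's `SelectKBest` with a univariate F-test on synthetic data; the data and the choice `k=5` are assumptions for illustration. The wrapper approach is what the `SequentialFeatureSelector` cells later in this notebook apply.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data (assumption for illustration): 20 predictors, 5 informative
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5,
    noise=1.0, random_state=0,
)

# Score every predictor with a univariate F-test and keep the k best
filt = SelectKBest(score_func=f_regression, k=5)
X_filtered = filt.fit_transform(X_demo, y_demo)

print(X_filtered.shape)                 # number of columns equals k
print(filt.get_support(indices=True))   # indices of the retained predictors
```

Because each predictor is scored in isolation, a filter like this can keep several highly correlated predictors at once.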


In [60]:
import pandas as pd 
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression,Ridge
from sklearn.model_selection import train_test_split
import numpy as np
pd.set_option('display.max_columns',None)

In [61]:
df = pd.read_csv('clean_df.csv')
df = df.drop(columns=['Unnamed: 0', 'symboling'])
y = df.pop('price')
df.head()

Unnamed: 0,normalized-losses,make,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,width,height,curb-weight,engine-type,num-of-cylinders,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,city-L/100km,horsepower-binned,fuel-type-diesel,fuel-type-gas
0,122,alfa-romero,std,two,convertible,rwd,front,88.6,0.811148,0.890278,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000.0,21,27,11.190476,Low,0,1
1,122,alfa-romero,std,two,convertible,rwd,front,88.6,0.811148,0.890278,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000.0,21,27,11.190476,Low,0,1
2,122,alfa-romero,std,two,hatchback,rwd,front,94.5,0.822681,0.909722,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000.0,19,26,12.368421,Medium,0,1
3,164,audi,std,four,sedan,fwd,front,99.8,0.84863,0.919444,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10.0,102,5500.0,24,30,9.791667,Low,0,1
4,164,audi,std,four,sedan,4wd,front,99.4,0.84863,0.922222,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8.0,115,5500.0,18,22,13.055556,Low,0,1


In [62]:
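# Impute missing 'stroke' values in that column only; df[mask] = value would overwrite every column of those rows
df['stroke'] = df['stroke'].fillna(df['stroke'].mean())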

In [63]:
foo = pd.get_dummies(df,drop_first=True)
foo.head()

Unnamed: 0,normalized-losses,wheel-base,length,width,height,curb-weight,engine-size,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,city-L/100km,fuel-type-diesel,fuel-type-gas,make_alfa-romero,make_audi,make_bmw,make_chevrolet,make_dodge,make_honda,make_isuzu,make_jaguar,make_mazda,make_mercedes-benz,make_mercury,make_mitsubishi,make_nissan,make_peugot,make_plymouth,make_porsche,make_renault,make_saab,make_subaru,make_toyota,make_volkswagen,make_volvo,aspiration_std,aspiration_turbo,num-of-doors_four,num-of-doors_two,body-style_convertible,body-style_hardtop,body-style_hatchback,body-style_sedan,body-style_wagon,drive-wheels_4wd,drive-wheels_fwd,drive-wheels_rwd,engine-location_front,engine-location_rear,engine-type_dohc,engine-type_l,engine-type_ohc,engine-type_ohcf,engine-type_ohcv,num-of-cylinders_eight,num-of-cylinders_five,num-of-cylinders_four,num-of-cylinders_six,num-of-cylinders_three,num-of-cylinders_twelve,fuel-system_1bbl,fuel-system_2bbl,fuel-system_idi,fuel-system_mfi,fuel-system_mpfi,fuel-system_spdi,fuel-system_spfi,horsepower-binned_High,horsepower-binned_Low,horsepower-binned_Medium
0,122.0,88.6,0.811148,0.890278,48.8,2548.0,130.0,3.47,2.68,9.0,111.0,5000.0,21.0,27.0,11.190476,0.0,1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0
1,122.0,88.6,0.811148,0.890278,48.8,2548.0,130.0,3.47,2.68,9.0,111.0,5000.0,21.0,27.0,11.190476,0.0,1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0
2,122.0,94.5,0.822681,0.909722,52.4,2823.0,152.0,2.68,3.47,9.0,154.0,5000.0,19.0,26.0,12.368421,0.0,1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1
3,164.0,99.8,0.84863,0.919444,54.3,2337.0,109.0,3.19,3.4,10.0,102.0,5500.0,24.0,30.0,9.791667,0.0,1.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0
4,164.0,99.4,0.84863,0.922222,54.3,2824.0,136.0,3.19,3.4,8.0,115.0,5500.0,18.0,22.0,13.055556,0.0,1.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0


In [64]:
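# Standardize the predictors and min-max scale the target (statistics come from the full data set, before the split)
foo = (foo - foo.mean()) / foo.std()
y = (y - y.min()) / (y.max() - y.min())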

In [65]:
x_train, x_test, y_train, y_test = train_test_split(foo,y,test_size=0.33,random_state=66)
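
In the cells above, the scaling statistics are computed from all rows before the split, so the test rows influence the preprocessing. One possible alternative, sketched here on synthetic data, is to put the scaler in a pipeline so that it is fitted on the training rows only. The data and the names `X_tr`, `X_te`, `model` are assumptions for this example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=5.0, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=66,
)

# The scaler learns its mean and standard deviation from X_tr only;
# the same transformation is then applied to X_te at prediction time
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_tr, y_tr)

r2 = model.score(X_te, y_te)  # R^2 on the held-out rows
print(round(r2, 3))
```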

In [66]:
y_train.shape

(134,)

In [67]:
# Fit ridge regression for each candidate alpha and record the test-set sum of squared errors
alphas = [0.0001,0.001,0.01,0.1,0.3,0.5,0.8,0.9,0.95,1]
l = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha)
    ridge.fit(x_train,y_train)
    p = ridge.predict(x_test)
    l.append(np.sum((y_test-p)**2))

In [71]:
l.index(min(l))

9

In [72]:
# Ridge regression with the best alpha from the search above (index 9, alpha=1)
ridge = Ridge(alpha=1)
ridge.fit(x_train,y_train)
p = ridge.predict(x_test)

In [76]:
np.sum((y_test-p)**2)

0.1633469698824234
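
The search above picks alpha by its error on the test set, so the reported test error is likely to be somewhat optimistic. A common alternative is to choose alpha by cross-validation on the training rows and touch the test rows only once. The sketch below does this with `RidgeCV` on synthetic data, reusing the same alpha grid; the data and the 5-fold setting are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=30, noise=10.0, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=66,
)

# Same candidate grid as in the notebook; alpha is chosen by 5-fold CV on X_tr only
alphas = [0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.8, 0.9, 0.95, 1]
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_tr, y_tr)

print("alpha chosen by cross-validation:", ridge_cv.alpha_)

# The test set is used once, for the final error estimate
sse = ((y_te - ridge_cv.predict(X_te)) ** 2).sum()
print("test SSE:", sse)
```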

In [89]:
# Forward selection ('auto' keeps half of the features when tol is None)
lr = LinearRegression()
sfs = SequentialFeatureSelector(lr, n_features_to_select='auto',direction='forward')
sfs.fit(x_train,y_train)

In [91]:
foo.shape

(201, 74)

In [90]:
transformed = sfs.transform(x_train)
test_transformed = sfs.transform(x_test)
transformed.shape

(134, 37)
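
The shape only tells how many predictors were kept. To see which ones, the selector's boolean mask can be applied to the column names. The sketch below shows this on a small synthetic DataFrame; the column names `x0` to `x7` and the choice of three features are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data with hypothetical column names (assumption for illustration)
X_arr, y_demo = make_regression(
    n_samples=120, n_features=8, n_informative=3, noise=1.0, random_state=0,
)
X_demo = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(8)])

sfs_demo = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward",
)
sfs_demo.fit(X_demo, y_demo)

# get_support() returns a boolean mask over the input columns
mask = sfs_demo.get_support()
selected_cols = list(X_demo.columns[mask])
print(selected_cols)
```

In the notebook itself, the equivalent would be `x_train.columns[sfs.get_support()]`.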

In [92]:
lr.fit(transformed,y_train)
p = lr.predict(test_transformed)
np.sum((y_test-p)**2)

0.1458743609538687

In [93]:
# Backward selection: start from all features and remove one at a time
sfs = SequentialFeatureSelector(lr, n_features_to_select='auto',direction='backward')
sfs.fit(x_train,y_train)

In [94]:
transformed = sfs.transform(x_train)
test_transformed = sfs.transform(x_test)
transformed.shape

(134, 37)

In [95]:
lr.fit(transformed,y_train)
p = lr.predict(test_transformed)
np.sum((y_test-p)**2)

0.3683101554692601

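On this single train/test split, linear regression with forward selection gives the lowest test sum of squared errors (0.146), ahead of ridge regression with alpha=1 (0.163) and linear regression with backward selection (0.368).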
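
A ranking based on one split of about 200 rows can change with a different random seed. A more stable comparison, sketched below on synthetic data, puts the selector and the regression in one pipeline and scores it with cross-validation, so that feature selection is repeated inside every fold. The data, the fold counts and `n_features_to_select=5` are assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption for illustration): 10 predictors, 4 informative
X_demo, y_demo = make_regression(
    n_samples=150, n_features=10, n_informative=4, noise=5.0, random_state=0,
)

scores = {}
for direction in ("forward", "backward"):
    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=5, direction=direction, cv=3,
    )
    # Selection runs inside each outer fold, so held-out rows never guide it
    model = make_pipeline(selector, LinearRegression())
    fold_scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="neg_mean_squared_error",
    )
    scores[direction] = fold_scores.mean()

print(scores)  # negative MSE: values closer to zero are better
```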