### V. Algorithm Selection and Fine-Tuning

In this part, we train several algorithms on the same train/test split and compare their results. We then choose the best-performing algorithm so that it can be fine-tuned.

This notebook presents the following parts:

    1) Model creation
    2) Algorithm testing: linear regression (simple, lasso, ridge), boosted decision tree regressor, random forest regressor, Bayesian linear regressor
    3) Choosing the best algorithm
    4) Saving the model


     

In [1]:
# import libraries
import pandas as pd
from sklearn import preprocessing
import sklearn.model_selection as ms
from sklearn import linear_model
import sklearn.metrics as sklm
import numpy as np
import numpy.random as nr
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as ss
import math
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics


%matplotlib inline

In [2]:
#import data set
df=pd.read_csv('dfprepared3.csv')
df.shape

(1399, 18)

In [3]:
df.head(2)

Unnamed: 0,capacity,failure_rate,margin,price,prod_cost,product_type:_auto-portee,product_type:_electrique,product_type:_essence,Quality:_Basic,Quality:_High,Quality:_Medium,Warranty_years:_1,Warranty_years:_2,Warranty_years:_3,Perc_Margin:_High,Perc_Margin:_Low,Perc_Margin:_Medium,attractiveness
0,-1.873473,-1.683579,2.342817,2.187839,0.911872,1,0,0,1,0,0,0,0,1,0,0,1,0.650648
1,-1.380486,-1.746504,2.854882,2.395929,0.583018,1,0,0,1,0,0,0,0,1,0,0,1,0.699792


**1) Model Creation**

In [4]:
# separate train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(labels=['attractiveness'], axis=1),
    df['attractiveness'],
    test_size=0.3,
    random_state=0)

X_train.shape, X_test.shape

((979, 17), (420, 17))

**2) Algorithm testing**

2.1 Algorithm: **Linear Regression**

2.1.1 Simple Linear Regression

In [5]:
#Train the Model and predict
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_log_error
lm = LinearRegression()
lm.fit(X_train,y_train)
lm_predictions = lm.predict(X_test)

In [6]:
#print RMSLE
print ('Simple Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, lm_predictions)))

Simple Regression RMSLE is 0.07242206659110598
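The metric printed in these cells is the root mean squared logarithmic error, i.e. the square root of the mean of `(log(1 + prediction) - log(1 + target))²`. As a minimal sketch of what `np.sqrt(mean_squared_log_error(...))` computes, the hypothetical helper `rmsle` below spells the formula out with NumPy. The sample values are illustrative only and are not taken from the data set.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # This sketch refuses negative values, where the metric is not meaningful
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("rmsle expects non-negative targets and predictions")
    # log1p(x) = log(1 + x), so a target of exactly 0 stays finite
    squared_log_errors = (np.log1p(y_pred) - np.log1p(y_true)) ** 2
    return float(np.sqrt(squared_log_errors.mean()))

# Illustrative values only (hypothetical, not from dfprepared3.csv)
example_true = [0.65, 0.70, 0.40]
example_pred = [0.60, 0.75, 0.45]
print("RMSLE:", rmsle(example_true, example_pred))
```

Because the error is taken on a log scale, RMSLE penalizes relative rather than absolute differences, which is worth keeping in mind when the target lies in a narrow range.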


2.1.2 Lasso Linear Regression (l1)

In [7]:
#Train the Model and predict
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_log_error
# Use a lowercase instance name so the Lasso class is not overwritten when the cell is re-run
lasso = Lasso()
lasso.fit(X_train,y_train)
Lasso_predictions = lasso.predict(X_test)

In [8]:
#print RMSLE
print ('Lasso Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Lasso_predictions)))

Lasso Regression RMSLE is 0.07920657308951531
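The Lasso above is fitted with scikit-learn's default penalty, `alpha=1.0`. One plausible reason for its higher error, not verified on this data set, is that such a strong penalty shrinks every coefficient to zero when the target has a small scale. The sketch below illustrates the effect on synthetic stand-in data (`X_demo`, `y_demo` and the other names are hypothetical) and uses `LassoCV` to select `alpha` by cross-validation on the training portion only.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data (hypothetical, not dfprepared3.csv)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
true_coef = np.array([0.05, -0.03, 0.0, 0.02, 0.0])
y_demo = 0.5 + X_demo @ true_coef + rng.normal(scale=0.01, size=300)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Default alpha=1.0: far too strong for a target on this scale
default_lasso = Lasso().fit(Xd_train, yd_train)
n_nonzero_default = int(np.sum(default_lasso.coef_ != 0))
print("non-zero coefficients (alpha=1.0):", n_nonzero_default)

# LassoCV searches a grid of alphas with 5-fold cross-validation
tuned_lasso = LassoCV(cv=5, random_state=0).fit(Xd_train, yd_train)
n_nonzero_tuned = int(np.sum(tuned_lasso.coef_ != 0))
print("selected alpha:", tuned_lasso.alpha_)
print("non-zero coefficients (tuned):", n_nonzero_tuned)
```

If the same pattern shows up on the real data, comparing a tuned Lasso against the other models would be a fairer test than comparing the default one.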


2.1.3 Ridge Regression (l2)

In [9]:
#Train the Model and predict
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_log_error
# Use a lowercase instance name so the Ridge class is not overwritten when the cell is re-run
ridge = Ridge()
ridge.fit(X_train,y_train)
Ridge_predictions = ridge.predict(X_test)

In [10]:
#print RMSLE
print ('Ridge Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Ridge_predictions)))

Ridge Regression RMSLE is 0.07165632155144348


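2.2 Algorithm: **Boosted Decision Tree Regressor**

Note: the cell below fits a single `DecisionTreeRegressor` with default settings. No boosting is applied, so the "Boosted" label in the printed output refers to this plain tree.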

In [11]:
#Train the Model and predict
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_log_error
Tree = DecisionTreeRegressor()
Tree.fit(X_train,y_train)
Tree_predictions = Tree.predict(X_test)

In [12]:
#print RMSLE
print ('Boosted Decision Tree Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Tree_predictions)))

Boosted Decision Tree Regression RMSLE is 0.06004657556100968
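The cell above fits a single `DecisionTreeRegressor`, which is not a boosted model. As a hedged sketch of what an actual boosted tree ensemble could look like, the example below compares a single tree with scikit-learn's `GradientBoostingRegressor` on synthetic data. The data, the variable names and the hyperparameter values are assumptions chosen for illustration; they are not the notebook's method or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-negative stand-in data (hypothetical) so that RMSLE is defined
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 4))
y_demo = np.abs(np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2
                + rng.normal(scale=0.1, size=400))

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

models = {
    "single tree": DecisionTreeRegressor(random_state=0),
    # Illustrative settings, not tuned values
    "gradient boosting": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=3, random_state=0),
}

demo_scores = {}
for name, model in models.items():
    model.fit(Xd_train, yd_train)
    # Boosted predictions can dip below zero, so clip before taking logs
    preds = np.clip(model.predict(Xd_test), 0, None)
    demo_scores[name] = float(np.sqrt(mean_squared_log_error(yd_test, preds)))
    print(name, "RMSLE:", demo_scores[name])
```

Fixing `random_state` in both models keeps the comparison reproducible from one run to the next.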


2.3 Algorithm: **Random Forest Regressor**

In [13]:
#Train the Model and predict
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
Forest = RandomForestRegressor()
Forest.fit(X_train,y_train)
Forest_predictions = Forest.predict(X_test)



In [14]:
#print RMSLE
print ('Random Forest Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Forest_predictions)))

Random Forest Regression RMSLE is 0.04796913613049845
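The forest above uses default hyperparameters and no fixed `random_state`. Since the stated goal of this part is to fine-tune the best algorithm, the following is a minimal sketch of how a grid search over a random forest might be set up with `GridSearchCV`. The synthetic data, the variable names and the grid values are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, non-negative stand-in data (hypothetical; swap in X_train / y_train)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = np.abs(0.6 + 0.1 * X_demo[:, 0] - 0.05 * X_demo[:, 1] ** 2
                + rng.normal(scale=0.02, size=200))
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Small illustrative grid; the values are assumptions, not recommendations
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_log_error",  # closer to 0 is better
    cv=3,
)
search.fit(Xd_train, yd_train)

# best_estimator_ is refitted on the whole training set by default
best_forest = search.best_estimator_
test_rmsle = float(np.sqrt(
    mean_squared_log_error(yd_test, best_forest.predict(Xd_test))))
print("best parameters:", search.best_params_)
print("test RMSLE:", test_rmsle)
```

The test set is only used once, after the search has finished, so the reported score is not biased by the hyperparameter selection.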


2.4 Algorithm: **Bayesian Linear Regressor**

In [15]:
#Train the Model and predict
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_log_error
Bayesian = BayesianRidge()
Bayesian.fit(X_train,y_train)
Bayesian_predictions = Bayesian.predict(X_test)

In [16]:
#print RMSLE
print ('Bayesian Ridge Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Bayesian_predictions)))

Bayesian Ridge Regression RMSLE is 0.07204981804528336


**3) Compare and choose the best model**

In [17]:
print ('Simple Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, lm_predictions)))
print ('Lasso Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Lasso_predictions)))
print ('Ridge Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Ridge_predictions)))
print ('Boosted Decision Tree Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Tree_predictions)))
print ('Random Forest Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Forest_predictions)))
print ('Bayesian Ridge Regression RMSLE is', np.sqrt(mean_squared_log_error(y_test, Bayesian_predictions)))

Simple Regression RMSLE is 0.07242206659110598
Lasso Regression RMSLE is 0.07920657308951531
Ridge Regression RMSLE is 0.07165632155144348
Boosted Decision Tree Regression RMSLE is 0.06004657556100968
Random Forest Regression RMSLE is 0.04796913613049845
Bayesian Ridge Regression RMSLE is 0.07204981804528336


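On this test split, the Random Forest Regressor has the lowest RMSLE (about 0.048), ahead of the decision tree (about 0.060) and the linear models (about 0.072 to 0.079), so we select it as the best model. The forest is fitted without a fixed `random_state`, so its exact score can vary between runs.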
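A single train/test split gives only one error estimate per model. As a hedged sketch, the comparison could be repeated with k-fold cross-validation to see how stable the ranking is. The example below runs on synthetic stand-in data, and the helper `clipped_rmsle`, the candidate list and all variable names are hypothetical. Predictions are clipped at zero before scoring, because linear models can produce negative values for which the logarithmic error is not meaningful.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import make_scorer, mean_squared_log_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-negative stand-in data (hypothetical, not dfprepared3.csv)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = np.clip(0.6 + 0.05 * X_demo[:, 0] - 0.03 * X_demo[:, 1]
                 + 0.02 * X_demo[:, 0] * X_demo[:, 1]
                 + rng.normal(scale=0.01, size=300), 0, None)

def clipped_rmsle(y_true, y_pred):
    """RMSLE with predictions clipped at 0 (an assumption of this sketch)."""
    return np.sqrt(mean_squared_log_error(y_true, np.clip(y_pred, 0, None)))

# greater_is_better=False makes the scorer return the negated error
scorer = make_scorer(clipped_rmsle, greater_is_better=False)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_results = {}
for name, model in candidates.items():
    fold_errors = -cross_val_score(model, X_demo, y_demo, scoring=scorer, cv=cv)
    cv_results[name] = fold_errors
    print(f"{name}: mean RMSLE {fold_errors.mean():.4f} (std {fold_errors.std():.4f})")
```

Reporting the spread across folds alongside the mean helps judge whether the gap between two models is larger than the fold-to-fold noise.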

**4) Save the model**

In [18]:
# Note: this writes the prepared data set to CSV; it does not persist a fitted model
df.to_csv("dftobeimproved.csv", index=False)
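The cell above writes the data frame to disk rather than a fitted estimator. If the intent is also to keep the trained model, one common option is `joblib`, sketched below under stated assumptions: the data is synthetic, and the file name `forest_model.joblib` and all variable names are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical); in the notebook this would be X_train / y_train
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 0.6 + 0.05 * X_demo[:, 0] + rng.normal(scale=0.01, size=100)

forest_demo = RandomForestRegressor(n_estimators=50, random_state=0)
forest_demo.fit(X_demo, y_demo)

# Hypothetical file name, written to the system temp directory for this sketch
model_path = os.path.join(tempfile.gettempdir(), "forest_model.joblib")
joblib.dump(forest_demo, model_path)

# Reload the estimator and check that it predicts the same values
restored_forest = joblib.load(model_path)
same_predictions = bool(np.allclose(
    restored_forest.predict(X_demo), forest_demo.predict(X_demo)))
print("restored model matches original:", same_predictions)
```

A saved model should generally be loaded with the same scikit-learn version that produced it, and only from files you trust, since loading executes pickled code.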