### II. Algorithm Selection 

In this part, we fit a baseline model with several algorithms and compare which one delivers the best results.

The goal is to determine whether a linear or a non-linear model suits the data better, as this will determine the kind of feature selection carried out in the next notebook.

The algorithms tested are:

* Simple Linear Regression 
* Ridge Regression
* Lasso
* Decision tree regressor
* Random forest regressor
* Bayesian linear regressor
* SVM regressor
* XGB regressor


In [1]:
# import libraries
import pandas as pd
from sklearn import preprocessing
import sklearn.model_selection as ms
from sklearn import linear_model
import sklearn.metrics as sklm
import numpy as np
import numpy.random as nr
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as ss
import math
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

### 1. Import data



In [2]:
# Load the training features
X_train=pd.read_csv('X_train.csv',sep= ',')
X_train.shape

(174, 27)

In [3]:
X_test=pd.read_csv('X_test.csv',sep= ',')
X_test.shape

(31, 27)

In [4]:
# Load the training target
y_train=pd.read_csv('y_train.csv',sep= ',')
y_train.shape

(174, 1)

In [5]:
# Load the test target
y_test=pd.read_csv('y_test.csv',sep= ',')
y_test.shape

(31, 1)
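
The targets load as single-column DataFrames of shape `(174, 1)` and `(31, 1)`, while scikit-learn estimators generally expect a 1-D target. Some estimators warn about column-vector targets, and the notebook's `warnings.filterwarnings('ignore')` would hide that. A minimal sketch of flattening the target, using a made-up frame (`y_frame` and its `target` column are hypothetical names, not taken from the CSV files):

```python
import pandas as pd

# Hypothetical stand-in for y_train: a single-column DataFrame, as read_csv returns
y_frame = pd.DataFrame({"target": [10.0, 20.0, 30.0]})

# Flatten to a 1-D array of shape (n_samples,)
y_flat = y_frame.to_numpy().ravel()

print(y_frame.shape)
print(y_flat.shape)
```

In the notebook, the same call could be applied as `y_train.to_numpy().ravel()` before fitting.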

### 2. Algorithm Testing

#### 2.1 Linear Regression 

##### 2.1.1 Simple Linear Regression

In [6]:
#Train the Model and predict
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
lm = LinearRegression()
lm.fit(X_train,y_train)
lm_predictions = lm.predict(X_test)

In [7]:
# Print RMSE
print ('Simple Regression RMSE is', np.sqrt(mean_squared_error(y_test, lm_predictions)))


Simple Regression RMSE is 3934.656560357617
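
The metric used throughout is the root mean squared error (RMSE), i.e. the square root of the mean of the squared differences between true and predicted values. It is not the RMSLE, which compares log-transformed values. As a rough sketch, the same quantity can be computed with NumPy alone (the helper name `rmse` and the toy numbers are illustrative, not part of the notebook):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two array-likes of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values for illustration only, not taken from the notebook's data
example = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(example)
```

Because RMSE is expressed in the units of the target, the values reported in this notebook are only meaningful relative to the scale of `y_test`.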


##### 2.1.2 Lasso Linear Regression (L1)

In [8]:
#Train the Model and predict
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
# Lowercase instance name so the Lasso class is not shadowed when the cell is re-run
lasso = Lasso()
lasso.fit(X_train,y_train)
Lasso_predictions = lasso.predict(X_test)

In [9]:
# Print RMSE
print ('Lasso Regression RMSE is', np.sqrt(mean_squared_error(y_test, Lasso_predictions)))

Lasso Regression RMSE is 3907.50195764579


##### 2.1.3 Ridge Regression (L2)

In [10]:
#Train the Model and predict
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
# Lowercase instance name so the Ridge class is not shadowed when the cell is re-run
ridge = Ridge()
ridge.fit(X_train,y_train)
Ridge_predictions = ridge.predict(X_test)

In [11]:
# Print RMSE
print ('Ridge Regression RMSE is', np.sqrt(mean_squared_error(y_test, Ridge_predictions)))

Ridge Regression RMSE is 3774.5442779356827


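#### 2.2 Algorithm: Decision Tree Regressor

In [12]:
# Train the model and predict
# Note: DecisionTreeRegressor fits a single tree; no boosting is applied,
# even though the printed label says "Boosted"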
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
Tree = DecisionTreeRegressor()
Tree.fit(X_train,y_train)
Tree_predictions = Tree.predict(X_test)

In [13]:
# Print RMSE
print ('Boosted Decision Tree Regression RMSE is', np.sqrt(mean_squared_error(y_test, Tree_predictions)))

Boosted Decision Tree Regression RMSE is 5200.84673394631
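
The cell above uses `DecisionTreeRegressor`, which fits a single tree rather than a boosted ensemble. If a boosted tree model was intended, scikit-learn's `GradientBoostingRegressor` is one option. The sketch below contrasts the two on synthetic data. The data from `make_regression` is an assumption standing in for the notebook's CSV files, so its scores say nothing about the real dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with the same shape as the notebook's split (174 train, 31 test, 27 features)
X, y = make_regression(n_samples=205, n_features=27, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=31, random_state=0)

# A single tree versus a boosted ensemble of shallow trees
single_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
boosted = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

for name, model in [("Single tree", single_tree), ("Gradient boosting", boosted)]:
    score = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    print(f"{name} RMSE: {score:.2f}")
```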


#### 2.3 Algorithm: Random Forest Regressor

In [14]:
#Train the Model and predict
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
Forest = RandomForestRegressor()
Forest.fit(X_train,y_train)
Forest_predictions = Forest.predict(X_test)

In [15]:
# Print RMSE
print ('Random Forest Regression RMSE is', np.sqrt(mean_squared_error(y_test, Forest_predictions)))

Random Forest Regression RMSE is 5305.622302668888


#### 2.4 Algorithm: Bayesian Linear Regressor

In [16]:
#Train the Model and predict
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error
Bayesian = BayesianRidge()
Bayesian.fit(X_train,y_train)
Bayesian_predictions = Bayesian.predict(X_test)

In [17]:
# Print RMSE
print ('Bayesian Ridge Regression RMSE is', np.sqrt(mean_squared_error(y_test, Bayesian_predictions)))

Bayesian Ridge Regression RMSE is 5015.695707203811


#### 2.5 Algorithm: XGB Regressor

In [18]:
import xgboost as xgb

# Use a separate name for the estimator so the xgb module is not shadowed
xgb_model = xgb.XGBRegressor()
xgb_model.fit(X_train,y_train)
xgb_predictions = xgb_model.predict(X_test)

print ('XGB Regression RMSE is', np.sqrt(mean_squared_error(y_test, xgb_predictions)))

XGB Regression RMSE is 4859.740070030585


#### 2.6 Algorithm: Support Vector Regressor

In [19]:
#importing module
from sklearn.svm import SVR
#making the instance
svr = SVR()
#learning
svr.fit(X_train,y_train)
#Prediction
svrprediction=svr.predict(X_test)

print ('SVR Regression RMSE is', np.sqrt(mean_squared_error(y_test, svrprediction)))

SVR Regression RMSE is 4479.984470096623


### 3. Compare and choose the best model

In [20]:
print ('Simple Regression RMSE is', np.sqrt(mean_squared_error(y_test, lm_predictions)))
print ('Lasso Regression RMSE is', np.sqrt(mean_squared_error(y_test, Lasso_predictions)))
print ('Ridge Regression RMSE is', np.sqrt(mean_squared_error(y_test, Ridge_predictions)))
print ('Boosted Decision Tree Regression RMSE is', np.sqrt(mean_squared_error(y_test, Tree_predictions)))
print ('Random Forest Regression RMSE is', np.sqrt(mean_squared_error(y_test, Forest_predictions)))
print ('Bayesian Ridge Regression RMSE is', np.sqrt(mean_squared_error(y_test, Bayesian_predictions)))
print ('XGB Regression RMSE is', np.sqrt(mean_squared_error(y_test, xgb_predictions)))

print ('SVR Regression RMSE is', np.sqrt(mean_squared_error(y_test, svrprediction)))

Simple Regression RMSE is 3934.656560357617
Lasso Regression RMSE is 3907.50195764579
Ridge Regression RMSE is 3774.5442779356827
Boosted Decision Tree Regression RMSE is 5200.84673394631
Random Forest Regression RMSE is 5305.622302668888
Bayesian Ridge Regression RMSE is 5015.695707203811
XGB Regression RMSE is 4859.740070030585
SVR Regression RMSE is 4479.984470096623
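
The comparison can also be written as a loop that fits each model and collects the scores in one sorted table. This is a minimal sketch on synthetic data with only three of the models; the `models` dictionary, the `ranking` name and the `make_regression` data are assumptions for illustration, not the notebook's actual setup:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's train/test CSV files
X, y = make_regression(n_samples=205, n_features=27, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=31, random_state=0)

# Map a display name to an unfitted estimator
models = {
    "Simple Regression": LinearRegression(),
    "Lasso Regression": Lasso(),
    "Ridge Regression": Ridge(),
}

# Fit each model and record its test RMSE
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))

# Lowest RMSE first
ranking = pd.Series(scores, name="RMSE").sort_values()
print(ranking)
```

Adding another algorithm then only requires one more entry in `models`.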


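The three lowest RMSE values belong to Ridge (3774.5), Lasso (3907.5) and simple linear regression (3934.7); Bayesian ridge (5015.7) is the one linear model that falls behind SVR and XGB. The best model is therefore a **linear regression**.

In the next notebook, I will try to improve it with proper feature selection, followed by fine-tuning of the model hyperparameters.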