## Model Selection & Grid Search CV

![Namespace Labs](../../../../labs.png)

## Predict the number of shares an article will get

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

In [2]:
df = pd.read_csv('OnlineNewsPopularity.csv')

In [3]:
df.head()

| | url | timedelta | n_tokens_title | n_tokens_content | n_unique_tokens | n_non_stop_words | n_non_stop_unique_tokens | num_hrefs | num_self_hrefs | num_imgs | ... | min_positive_polarity | max_positive_polarity | avg_negative_polarity | min_negative_polarity | max_negative_polarity | title_subjectivity | title_sentiment_polarity | abs_title_subjectivity | abs_title_sentiment_polarity | shares |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://mashable.com/2013/01/07/amazon-instant-... | 731 | 12 | 219 | 0.663594 | 1.0 | 0.815385 | 4 | 2 | 1 | ... | 0.1 | 0.7 | -0.35 | -0.6 | -0.2 | 0.5 | -0.1875 | 0.0 | 0.1875 | 593 |
| 1 | http://mashable.com/2013/01/07/ap-samsung-spon... | 731 | 9 | 255 | 0.604743 | 1.0 | 0.791946 | 3 | 1 | 1 | ... | 0.033333 | 0.7 | -0.11875 | -0.125 | -0.1 | 0.0 | 0.0 | 0.5 | 0.0 | 711 |
| 2 | http://mashable.com/2013/01/07/apple-40-billio... | 731 | 9 | 211 | 0.57513 | 1.0 | 0.663866 | 3 | 1 | 1 | ... | 0.1 | 1.0 | -0.466667 | -0.8 | -0.133333 | 0.0 | 0.0 | 0.5 | 0.0 | 1500 |
| 3 | http://mashable.com/2013/01/07/astronaut-notre... | 731 | 9 | 531 | 0.503788 | 1.0 | 0.665635 | 9 | 0 | 1 | ... | 0.136364 | 0.8 | -0.369697 | -0.6 | -0.166667 | 0.0 | 0.0 | 0.5 | 0.0 | 1200 |
| 4 | http://mashable.com/2013/01/07/att-u-verse-apps/ | 731 | 13 | 1072 | 0.415646 | 1.0 | 0.54089 | 19 | 19 | 20 | ... | 0.033333 | 1.0 | -0.220192 | -0.5 | -0.05 | 0.454545 | 0.136364 | 0.045455 | 0.136364 | 505 |


In [4]:
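# The target column name carries a leading space in this CSV (' shares');
# 'url' is dropped because it is a text identifier, not a numeric feature.
X = df.drop([' shares', 'url'], axis=1)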

In [5]:
y = df[' shares']

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=101)

In [8]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [9]:
from sklearn.ensemble import GradientBoostingRegressor
# The next import is needed only on scikit-learn < 1.0, where
# HistGradientBoostingRegressor was still experimental.
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge


In [21]:
regressors = [GradientBoostingRegressor(), HistGradientBoostingRegressor(), Lasso(), Ridge()]

# Fit each model with its default settings and report its test-set RMSE.
for regressor in regressors:
    regressor.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, regressor.predict(X_test)))
    print('The {}  Root Mean Squared Error is {} '.format(regressor, rmse))


The GradientBoostingRegressor()  Root Mean Squared Error is 9828.043004846559 
The HistGradientBoostingRegressor()  Root Mean Squared Error is 9271.95542730209 
The Lasso()  Root Mean Squared Error is 14304.968106200065 
The Ridge()  Root Mean Squared Error is 13232.735501407204 
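
`mean_absolute_error` is imported above, but only RMSE is reported. The two metrics can rank models differently, because RMSE squares each error and so weights a few large misses heavily. A minimal sketch on made-up numbers (the values below are illustrative assumptions, not taken from the dataset):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy targets and predictions (assumed values): three small errors, one large miss.
y_true = np.array([100, 200, 300, 10000])
y_pred = np.array([110, 190, 310, 1000])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# The single large miss pulls RMSE well above MAE.
print('MAE is {}, RMSE is {}'.format(mae, rmse))
```

Reporting both numbers for each regressor in the loop would show whether the gap between models comes from typical articles or from a handful of extreme ones.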


In [11]:
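parameters = {
    # scikit-learn >= 1.0 calls these 'squared_error' and 'absolute_error';
    # the older names used here were removed in 1.2.
    'loss': ['least_squares', 'least_absolute_deviation'],
    'learning_rate': [0.1, 0.2, 0.3],
    'max_depth': [3, 5, 6],  # the maximum depth of each tree
}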

In [12]:
from sklearn.model_selection import GridSearchCV

In [17]:
regressor = GridSearchCV(HistGradientBoostingRegressor(), parameters, verbose=1, cv=5)
# cv=5 runs 5-fold cross-validation for every parameter combination.
# refit=True (the default) then refits the best combination on all of X_train.
regressor.fit(X_train, y_train)

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  90 out of  90 | elapsed:  1.1min finished


GridSearchCV(cv=5, estimator=HistGradientBoostingRegressor(),
             param_grid={'learning_rate': [0.1, 0.2, 0.3],
                         'loss': ['least_squares', 'least_absolute_deviation'],
                         'max_depth': [3, 5, 6]},
             verbose=1)
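
The log line "Fitting 5 folds for each of 18 candidates, totalling 90 fits" follows from the size of the grid: every combination of one value per hyperparameter is a candidate, and each candidate is fitted once per fold. A small sketch that reproduces the count with `itertools.product`, used here as a stand-in for the enumeration the search performs:

```python
from itertools import product

parameters = {
    'loss': ['least_squares', 'least_absolute_deviation'],
    'learning_rate': [0.1, 0.2, 0.3],
    'max_depth': [3, 5, 6],
}
cv_folds = 5

# One candidate per combination: 2 losses x 3 learning rates x 3 depths.
candidates = [dict(zip(parameters, values))
              for values in product(*parameters.values())]

n_candidates = len(candidates)
n_fits = n_candidates * cv_folds
print('{} candidates, {} fits'.format(n_candidates, n_fits))
```

The refit on the full training set that follows the search is one extra fit on top of this total, so grid size and fold count together drive the run time.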

![KFOLD](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)
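
The figure shows the idea behind k-fold cross-validation: the training data is cut into k parts, and each part serves once as the validation set while the rest is used for fitting. A pure-Python sketch of that splitting, assuming contiguous, unshuffled folds; `k_fold_indices` is a hypothetical helper written for illustration, not a scikit-learn function:

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

splits = list(k_fold_indices(n_samples=12, n_folds=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print('fold {}: validate on {}, train on {} samples'.format(
        fold, val_idx, len(train_idx)))
```

Every sample lands in exactly one validation fold, so each candidate's score is averaged over predictions on data it was not fitted on.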

In [18]:
regressor.best_params_

{'learning_rate': 0.1, 'loss': 'least_squares', 'max_depth': 3}
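
`best_params_` is the combination with the best mean cross-validated score. With no `scoring` argument, that score is the estimator's own `score` method, which for scikit-learn regressors is R² rather than the RMSE printed in this notebook. A minimal sketch of ranking candidates by RMSE instead, using synthetic data and a `Ridge` model as stand-ins (both are assumptions chosen to keep the example fast):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption; not the news-popularity dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

search = GridSearchCV(
    Ridge(),
    {'alpha': [0.1, 1.0, 10.0]},
    scoring='neg_root_mean_squared_error',  # negated so that higher is better
    cv=5,
)
search.fit(X_demo, y_demo)

# Flip the sign to read the best mean cross-validated RMSE.
best_cv_rmse = -search.best_score_
print(search.best_params_, best_cv_rmse)
```

The per-candidate means are available in `search.cv_results_['mean_test_score']`, which helps to see how close the runner-up combinations were.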

In [19]:
predictions = regressor.predict(X_test)

In [20]:
# Reuse the predictions computed in the previous cell.
print('Root Mean Squared Error is {} '.format(np.sqrt(mean_squared_error(y_test, predictions))))


Root Mean Squared Error is 9148.701877324085 
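
To put the tuned result in context, it can be compared with the default `HistGradientBoostingRegressor()` score printed earlier. The arithmetic below uses only the two RMSE values reported in this notebook. Whether a gap of this size exceeds run-to-run variation is not something the notebook establishes.

```python
# RMSE values copied from the outputs above.
default_rmse = 9271.95542730209   # HistGradientBoostingRegressor() with defaults
tuned_rmse = 9148.701877324085    # best estimator found by the grid search

improvement = default_rmse - tuned_rmse
improvement_pct = 100 * improvement / default_rmse
print('RMSE reduced by {:.2f} ({:.2f}%)'.format(improvement, improvement_pct))
```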


Happy coding!