## Hyperparameter Tuning with GridSearchCV

In [1]:
!pip --quiet install mglearn

We tune the hyperparameters of three regression models (Ridge, Lasso, and SVR) with a cross-validated grid search and compare their scores.

In [2]:
import pandas as pd
import numpy as np

import mglearn
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR

import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
cd ..

/home/jovyan/Ames-Iowa-Data


In [4]:
df = pd.read_csv('data/final_ames_df.csv')

In [5]:
final_ames_df = df.copy()  # read_csv already returns a DataFrame; copy so df stays untouched

In [6]:
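# Drop the leftover 'Unnamed: 0' column (typically an index saved with the CSV).
final_ames_df = final_ames_df.drop(['Unnamed: 0'], axis = 1)
# Separate the target (SalePrice) from the predictor columns.
target = final_ames_df['SalePrice']
features = final_ames_df.drop(['SalePrice'], axis = 1)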

In [7]:
from sklearn.model_selection import GridSearchCV, ShuffleSplit, StratifiedShuffleSplit, train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=42)

### Ridge Regression

Relevant hyperparameters (only `alpha` and `solver` are searched below):

**alpha** : {float, array-like}, shape (n_targets)

Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.

**max_iter** : int, optional

Maximum number of iterations for conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ solver, the default value is 1000.

**solver** : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}

Solver to use in the computational routines:

- *‘auto’* chooses the solver automatically based on the type of data.
- *‘svd’* uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.
- *‘cholesky’* uses the standard scipy.linalg.solve function to obtain a closed-form solution.
- *‘sparse_cg’* uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
- *‘lsqr’* uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest but may not be available in old scipy versions. It also uses an iterative procedure.
- *‘sag’* uses a Stochastic Average Gradient descent, and *‘saga’* uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that fast convergence of ‘sag’ and ‘saga’ is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
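The scaling note for ‘sag’ and ‘saga’ can be sketched as follows. This is a minimal example, not the search run in this notebook: it uses synthetic data from `make_regression` as a stand-in for the Ames features, and the names `X_demo`, `y_demo`, `scaled_ridge`, `demo_grid`, and `demo_search` are made up for illustration. Wrapping the scaler and the model in a `Pipeline` means the scaler is refit on each training fold during cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the Ames data set).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=5.0, random_state=0
)
# Deliberately put the columns on very different scales.
X_demo = X_demo * np.logspace(0, 3, X_demo.shape[1])

# Standardize the features, then fit Ridge with the 'saga' solver.
scaled_ridge = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge(solver="saga", max_iter=5000, random_state=0)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
demo_grid = {"ridge__alpha": np.logspace(-3, 3, 7)}

demo_search = GridSearchCV(scaled_ridge, param_grid=demo_grid, cv=3)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_)
print(demo_search.best_score_)
```

The same pattern should apply to the real data by passing `X_train` and `y_train` to `fit`; whether it changes the selected `alpha` here has not been checked.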

In [17]:
ridge_params = {
    'alpha' : np.logspace(-3, 3, 7),
    'solver' : ['sag', 'saga']
}

In [18]:
ridge_gs = GridSearchCV(Ridge(), param_grid = ridge_params, return_train_score=True)

In [19]:
ridge_gs.fit(X_train, y_train)

GridSearchCV(cv=None, error_score='raise',
       estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'alpha': array([  1.00000e-03,   1.00000e-02,   1.00000e-01,   1.00000e+00,
         1.00000e+01,   1.00000e+02,   1.00000e+03]), 'solver': ['sag', 'saga']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

In [20]:
cv_results = pd.DataFrame(ridge_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False)

| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_alpha | param_solver | params | rank_test_score | split0_test_score | split0_train_score | split1_test_score | split1_train_score | split2_test_score | split2_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 0.505927 | 0.000746 | 0.880314 | 0.959749 | 100.0 | saga | {'alpha': 100.0, 'solver': 'saga'} | 1 | 0.831846 | 0.966047 | 0.892674 | 0.957733 | 0.91652 | 0.955467 | 0.075475 | 1.8e-05 | 0.03565 | 0.004548 |
| 10 | 0.184194 | 0.000758 | 0.879673 | 0.960143 | 100.0 | sag | {'alpha': 100.0, 'solver': 'sag'} | 2 | 0.831064 | 0.966386 | 0.892458 | 0.958091 | 0.915598 | 0.955953 | 0.021403 | 2e-06 | 0.035669 | 0.0045 |
| 12 | 0.05565 | 0.000703 | 0.873041 | 0.923959 | 1000.0 | sag | {'alpha': 1000.0, 'solver': 'sag'} | 3 | 0.839243 | 0.934968 | 0.882173 | 0.919566 | 0.897776 | 0.917344 | 0.002271 | 4.3e-05 | 0.024749 | 0.007837 |
| 13 | 0.139342 | 0.000726 | 0.872852 | 0.923677 | 1000.0 | saga | {'alpha': 1000.0, 'solver': 'saga'} | 4 | 0.839068 | 0.934634 | 0.882031 | 0.919372 | 0.897526 | 0.917026 | 0.008636 | 3e-06 | 0.024728 | 0.007807 |
| 9 | 1.220285 | 0.000771 | 0.854359 | 0.965325 | 10.0 | saga | {'alpha': 10.0, 'solver': 'saga'} | 5 | 0.809357 | 0.969961 | 0.855135 | 0.965329 | 0.898708 | 0.960686 | 0.314149 | 1.1e-05 | 0.036473 | 0.003786 |
| 8 | 0.519624 | 0.00076 | 0.849656 | 0.965854 | 10.0 | sag | {'alpha': 10.0, 'solver': 'sag'} | 6 | 0.805561 | 0.970578 | 0.8484 | 0.965834 | 0.895134 | 0.961149 | 0.090623 | 1.4e-05 | 0.03657 | 0.003849 |
| 7 | 1.517133 | 0.00074 | 0.843088 | 0.965998 | 1.0 | saga | {'alpha': 1.0, 'solver': 'saga'} | 7 | 0.802527 | 0.97051 | 0.834106 | 0.966335 | 0.892768 | 0.961148 | 0.406506 | 7e-06 | 0.037377 | 0.00383 |
| 5 | 1.553943 | 0.00075 | 0.841562 | 0.966062 | 0.1 | saga | {'alpha': 0.1, 'solver': 'saga'} | 8 | 0.801777 | 0.970571 | 0.830914 | 0.966428 | 0.892134 | 0.961188 | 0.421094 | 8e-06 | 0.037641 | 0.003839 |
| 3 | 1.544111 | 0.000737 | 0.841548 | 0.966069 | 0.01 | saga | {'alpha': 0.01, 'solver': 'saga'} | 9 | 0.801977 | 0.970575 | 0.830723 | 0.96644 | 0.892083 | 0.961192 | 0.413934 | 1.1e-05 | 0.037566 | 0.00384 |
| 1 | 1.551249 | 0.000742 | 0.841456 | 0.966065 | 0.001 | saga | {'alpha': 0.001, 'solver': 'saga'} | 10 | 0.801849 | 0.970571 | 0.830783 | 0.966432 | 0.891875 | 0.961194 | 0.409553 | 1.7e-05 | 0.037513 | 0.003837 |


In [21]:
ridge_gs.best_params_

{'alpha': 100.0, 'solver': 'saga'}

In [22]:
ridge_gs.best_score_

0.88031379774797136

### Lasso Regression

**alpha** : float, optional

Constant that multiplies the L1 term. Defaults to 1.0. alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised; use the LinearRegression object instead.
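To illustrate what the L1 multiplier does, the sketch below counts non-zero Lasso coefficients at several values of `alpha`. It runs on synthetic data (an assumption, not the Ames data), and `X_demo`, `y_demo`, `alpha_max`, and `nonzero_counts` are illustrative names. `alpha_max` is the smallest `alpha` at which all coefficients are zero under scikit-learn's Lasso objective, `1 / (2 * n_samples) * ||y - Xw||^2 + alpha * ||w||_1`, computed on centered data because `fit_intercept=True` is the default.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption: NOT the Ames data set).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# With an intercept, Lasso works on centered X and y.
X_centered = X_demo - X_demo.mean(axis=0)
y_centered = y_demo - y_demo.mean()

# Smallest alpha for which every coefficient is exactly zero.
alpha_max = np.max(np.abs(X_centered.T @ y_centered)) / len(y_demo)

# Count surviving (non-zero) coefficients at fractions of alpha_max.
nonzero_counts = {}
for fraction in (0.001, 0.01, 0.1, 0.5, 1.01):
    model = Lasso(alpha=fraction * alpha_max, max_iter=10000)
    model.fit(X_demo, y_demo)
    nonzero_counts[fraction] = int(np.count_nonzero(model.coef_))

print(nonzero_counts)
```

A larger `alpha` drives more coefficients to exactly zero, which is why the grid for Lasso is usually searched on a logarithmic scale.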

In [26]:
lasso_params = {
    'alpha' : np.logspace(-4, -1, 4)
}

In [27]:
lasso_gs = GridSearchCV(Lasso(), param_grid = lasso_params, return_train_score=True)

In [28]:
lasso_gs.fit(X_train, y_train)
cv_results = pd.DataFrame(lasso_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()



| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_alpha | params | rank_test_score | split0_test_score | split0_train_score | split1_test_score | split1_train_score | split2_test_score | split2_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.016454 | 0.000602 | 0.887108 | 0.949295 | 0.01 | {'alpha': 0.01} | 1 | 0.841858 | 0.955326 | 0.901027 | 0.94857 | 0.918525 | 0.943987 | 0.001142 | 3.2e-05 | 0.032805 | 0.004657 |
| 1 | 0.099546 | 0.000738 | 0.855098 | 0.965735 | 0.001 | {'alpha': 0.001} | 2 | 0.817414 | 0.970932 | 0.845166 | 0.965449 | 0.902847 | 0.960826 | 0.029389 | 2e-05 | 0.035571 | 0.004131 |
| 0 | 0.187474 | 0.000711 | 0.823073 | 0.968591 | 0.0001 | {'alpha': 0.0001} | 3 | 0.800072 | 0.973871 | 0.782236 | 0.969071 | 0.887086 | 0.962831 | 0.010779 | 7e-06 | 0.045785 | 0.00452 |
| 3 | 0.006979 | 0.000537 | 0.806663 | 0.826697 | 0.1 | {'alpha': 0.1} | 4 | 0.767506 | 0.83743 | 0.833861 | 0.824428 | 0.818655 | 0.818235 | 0.000137 | 1.5e-05 | 0.028394 | 0.007999 |


In [29]:
lasso_gs.best_params_

{'alpha': 0.01}

In [30]:
lasso_gs.best_score_

0.88710806168687895
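`best_score_` is the mean cross-validated score on the training split. A possible follow-up, sketched here on synthetic data rather than the Ames data, is to score the refit best estimator on the held-out split produced by `train_test_split`. The names `X_tr`, `X_te`, `y_tr`, `y_te`, `search`, `cv_r2`, and `test_r2` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: NOT the Ames data set).
X_demo, y_demo = make_regression(
    n_samples=300, n_features=15, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

alpha_grid = {"alpha": np.logspace(-4, -1, 4)}
search = GridSearchCV(Lasso(max_iter=10000), param_grid=alpha_grid, cv=3)
search.fit(X_tr, y_tr)

# With refit=True (the default), the best configuration is retrained on
# all of X_tr, and search.score() evaluates that refit model.
cv_r2 = search.best_score_           # mean R^2 across the CV folds
test_r2 = search.score(X_te, y_te)   # R^2 on data unseen during the search

print(search.best_params_, round(cv_r2, 3), round(test_r2, 3))
```

In this notebook the equivalent call would be `lasso_gs.score(X_test, y_test)` (or `ridge_gs.score(X_test, y_test)`), which has not been run here.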

### Support Vector Regression (SVR)

In [35]:
svr_params = {
    'C' : np.logspace(-3,3,7),
    'kernel' : ['linear', 'poly', 'rbf', 'sigmoid']
}

In [36]:
svr_gs = GridSearchCV(SVR(), param_grid= svr_params, return_train_score=True)

In [37]:
svr_gs.fit(X_train, y_train)
cv_results = pd.DataFrame(svr_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()

KeyboardInterrupt: 
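The search above was interrupted before it finished, so no SVR results are available. One possible way to make such a search cheaper, sketched below on synthetic data, is to standardize the features inside a `Pipeline` and start from a smaller grid. This is an assumption about what might help, not something that was run on the Ames data; `scaled_svr`, `small_grid`, and `svr_demo` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: NOT the Ames data set).
X_demo, y_demo = make_regression(
    n_samples=150, n_features=8, noise=5.0, random_state=0
)

# Scale the features before the SVR step.
scaled_svr = Pipeline([
    ("scaler", StandardScaler()),
    ("svr", SVR()),
])

# A reduced grid: 4 values of C x 2 kernels = 8 candidates
# instead of the 28 candidates in svr_params.
small_grid = {
    "svr__C": np.logspace(-1, 2, 4),
    "svr__kernel": ["linear", "rbf"],
}

svr_demo = GridSearchCV(scaled_svr, param_grid=small_grid, cv=3)
svr_demo.fit(X_demo, y_demo)

print(svr_demo.best_params_, round(svr_demo.best_score_, 3))
```

`GridSearchCV` also accepts `n_jobs` to fit candidates in parallel and `verbose` to report progress, which can help gauge how long a full search would take.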

In [None]:
svr_gs.best_params_

In [None]:
svr_gs.best_score_