## Hyperparameter Tuning with GridSearchCV

In [1]:
!pip --quiet install mglearn

We will tune the hyperparameters of our Ridge, Lasso, and SVR models to refine their performance.

In [2]:
import pandas as pd
import numpy as np

import mglearn
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR

import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
cd ..

/home/jovyan/Ames-Iowa-Data


In [4]:
df = pd.read_csv('data/final_ames_df.csv')

In [5]:
final_ames_df = pd.DataFrame(df)

In [6]:
final_ames_df = final_ames_df.drop(['Unnamed: 0'], axis = 1)
target = final_ames_df['SalePrice']
features = final_ames_df.drop(['SalePrice'], axis = 1)

In [7]:
from sklearn.model_selection import GridSearchCV, ShuffleSplit, StratifiedShuffleSplit, train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=42)

### Ridge Regression

Hyperparameters to tune:

**alpha** : {float, array-like}, shape (n_targets)

Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
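To make the effect of `alpha` concrete, the sketch below fits Ridge over the same `np.logspace(-4, 4, 9)` range used later in this notebook and records the size of the coefficient vector at each value. It runs on synthetic data from `make_regression`, which is an assumption made only so the example is self-contained; the names `X_demo`, `y_demo`, and `coef_norms` are illustrative and are not part of the Ames workflow.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption: not the Ames features)
X_demo, y_demo = make_regression(n_samples=200, n_features=20,
                                 noise=10.0, random_state=0)

alphas = np.logspace(-4, 4, 9)
coef_norms = []
for alpha in alphas:
    # Stronger regularization (larger alpha) penalizes large coefficients more
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    coef_norms.append(np.linalg.norm(model.coef_))

for alpha, norm in zip(alphas, coef_norms):
    print(f"alpha={alpha:g}  coefficient norm={norm:.2f}")
```

The printed norms should shrink as `alpha` grows, which is the variance reduction the description above refers to.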

**max_iter** : int, optional

Maximum number of iterations for conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ solver, the default value is 1000.

**solver** : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}

Solver to use in the computational routines:

- *‘auto’* chooses the solver automatically based on the type of data.
- *‘svd’* uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.
- *‘cholesky’* uses the standard scipy.linalg.solve function to obtain a closed-form solution.
- *‘sparse_cg’* uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
- *‘lsqr’* uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest but may not be available in old scipy versions. It also uses an iterative procedure.
- *‘sag’* uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
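Because ‘sag’ and ‘saga’ converge quickly only on features of similar scale, one option is to put a scaler and the model in a `Pipeline` so the scaling is refit inside every cross-validation fold. The following is a minimal sketch on synthetic data, not the configuration used in this notebook; the step names `"scale"` and `"ridge"`, the grid values, and `max_iter=5000` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the Ames features)
X_demo, y_demo = make_regression(n_samples=300, n_features=10,
                                 noise=5.0, random_state=0)
# Deliberately put the columns on very different scales
X_demo = X_demo * np.logspace(0, 3, X_demo.shape[1])

scaled_ridge = Pipeline([
    ("scale", StandardScaler()),  # standardize each feature within each fold
    ("ridge", Ridge(solver="saga", max_iter=5000, random_state=0)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"ridge__alpha": np.logspace(-2, 2, 5)}

scaled_gs = GridSearchCV(scaled_ridge, param_grid=param_grid, cv=5)
scaled_gs.fit(X_demo, y_demo)
print(scaled_gs.best_params_, round(scaled_gs.best_score_, 3))
```

Scaling inside the pipeline, rather than once before the split, keeps each validation fold out of the scaler's fit.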

In [None]:
ridge_params ={
    'alpha' : np.logspace(-4, 4, 9),
    'solver' : ['sag', 'saga']
}

In [None]:
ridge_gs = GridSearchCV(Ridge(), param_grid = ridge_params, return_train_score=True)

In [None]:
ridge_gs.fit(X_train, y_train)

In [None]:
cv_results = pd.DataFrame(ridge_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()

In [None]:
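# Keep only the 'saga' rows and make alpha numeric so it plots on a log axis
saga_results = cv_results[cv_results['param_solver'] == 'saga'].copy()
saga_results['param_alpha'] = saga_results['param_alpha'].astype(float)

(saga_results[['mean_train_score','mean_test_score','param_alpha']]
 .plot(x='param_alpha'))

# Mark the alpha with the highest mean CV score (not the score value itself)
best_alpha = saga_results.loc[saga_results['mean_test_score'].idxmax(), 'param_alpha']
plt.axvline(best_alpha, c='r', ls='--', label = 'optimal alpha')
plt.title('Complexity Curve for Ridge')
plt.legend()
plt.xscale('log')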

In [None]:
ridge_gs.best_params_

In [None]:
ridge_gs.best_score_

### Lasso Regression

**alpha** : float, optional

Constant that multiplies the L1 term. Defaults to 1.0. alpha = 0 is equivalent to an ordinary least square, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.
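The L1 penalty drives some coefficients exactly to zero, and a larger `alpha` zeroes more of them. The sketch below counts the nonzero coefficients over a range of `alpha` values. It uses synthetic data from `make_regression`, and the names `X_demo`, `y_demo`, and `nonzero_counts`, along with the `alpha` range, are assumptions chosen for illustration rather than values taken from the Ames data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data with only a few informative features (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=30,
                                 n_informative=5, noise=10.0, random_state=0)

alphas = np.logspace(-2, 3, 6)
nonzero_counts = []
for alpha in alphas:
    # A larger max_iter helps coordinate descent converge at small alpha
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_demo, y_demo)
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))

for alpha, count in zip(alphas, nonzero_counts):
    print(f"alpha={alpha:g}  nonzero coefficients={count}")
```

Reading the counts next to the cross-validation scores shows how much of the feature set the model is still using at each `alpha`.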

In [None]:
lasso_params = {
    'alpha' : np.logspace(-6, 0, 7)
}

In [None]:
lasso_gs = GridSearchCV(Lasso(), param_grid = lasso_params, return_train_score=True)

In [None]:
lasso_gs.fit(X_train, y_train)
cv_results = pd.DataFrame(lasso_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()

In [None]:
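# Make alpha numeric so it plots correctly on a log axis
cv_results['param_alpha'] = cv_results['param_alpha'].astype(float)

(cv_results[['mean_train_score','mean_test_score','param_alpha']]
 .plot(x='param_alpha'))

# Mark the alpha selected by the grid search instead of a hard-coded value
plt.axvline(lasso_gs.best_params_['alpha'], c='r', ls='--', label = 'optimal alpha')
plt.title('Complexity Curve for Lasso')
plt.legend()
plt.xscale('log')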

In [None]:
lasso_gs.best_params_

In [None]:
lasso_gs.best_score_

### Support Vector Regression (SVR)

In [11]:
svr_params = {
    'C' : np.logspace(-3,3,7)
#     'kernel' : ['linear', 'poly', 'rbf', 'sigmoid']
}

In [15]:
svr_gs = GridSearchCV(SVR(kernel = 'linear'), param_grid= svr_params, return_train_score=True)

In [16]:
svr_gs.fit(X_train, y_train)

KeyboardInterrupt: 

In [None]:
cv_results = pd.DataFrame(svr_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()

In [None]:
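# SVR results have no 'param_solver' or 'param_alpha' columns; C is the tuned parameter
svr_results = cv_results.copy()
svr_results['param_C'] = svr_results['param_C'].astype(float)

(svr_results[['mean_train_score','mean_test_score','param_C']]
 .plot(x='param_C'))

# Mark the C with the highest mean CV score (not the score value itself)
best_C = svr_results.loc[svr_results['mean_test_score'].idxmax(), 'param_C']
plt.axvline(best_C, c='r', ls='--', label = 'optimal C')
plt.title('Complexity Curve for SVM Regression')
plt.legend()
plt.xscale('log')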

In [None]:
svr_params = {
    'C' : np.logspace(1,3,3),
    'kernel' : ['linear', 'sigmoid']
}

In [None]:
svr_gs = GridSearchCV(SVR(), param_grid= svr_params, return_train_score=True)

In [None]:
svr_gs.fit(X_train, y_train)

In [None]:
cv_results = pd.DataFrame(svr_gs.cv_results_)
cv_results.sort_values('mean_test_score', ascending=False).head()

In [None]:
svr_gs.best_params_

In [None]:
svr_gs.best_score_

In [None]:
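def complexity_curve(results, model, tuning_param, stationary_param, stationary_val):
    """Plot mean train/test CV scores against one tuned parameter, holding another fixed."""
    # Select the rows where the stationary parameter equals the requested value
    subset = results[results[stationary_param] == stationary_val].copy()
    subset[tuning_param] = subset[tuning_param].astype(float)

    subset[['mean_train_score','mean_test_score', tuning_param]].plot(x=tuning_param)

    # Mark the tuning-parameter value with the highest mean CV score
    best_val = subset.loc[subset['mean_test_score'].idxmax(), tuning_param]
    plt.axvline(best_val, c='r', ls='--', label = 'optimal {}'.format(tuning_param))
    plt.title('Complexity Curve for {}'.format(model))
    plt.legend()
    plt.xscale('log')

In [None]:
# Column names and the solver value are strings; pass the Ridge results, not the SVR ones
complexity_curve(pd.DataFrame(ridge_gs.cv_results_), 'Ridge', 'param_alpha', 'param_solver', 'saga')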