### Colab Activity 9.3: Using GridSearchCV

**Expected Time: 45 Minutes**


This activity focuses on using `GridSearchCV` to search over hyperparameter values for the `Ridge` estimator. You will first run a grid search over the parameters of a single estimator. Then you will pass a pipeline to the grid search, which requires naming both the pipeline step and the hyperparameter you are searching over.

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


### The Data

We again use the California housing dataset from scikit-learn. You are building regression models with `MedHouseVal` as the target. The data is loaded and previewed below.

In [None]:
cali = fetch_california_housing(as_frame=True)

In [None]:
cali.frame.head()

In [None]:
X = cali.frame.drop('MedHouseVal', axis=1)
y = cali.frame['MedHouseVal']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Problem 1

#### Dictionary for grid search



As discussed in the videos, to search over hyperparameters, you have to create a dictionary with a key whose name exactly matches the hyperparameter.  With the `Ridge` estimator, this will be `alpha`.  Create a dictionary with `alpha` as the key and values `[0.1, 1.0, 10.0]` and assign it to the variable `params_dict` below.  
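
To see how a parameter dictionary is interpreted, the sketch below expands one into its individual candidate settings using scikit-learn's `ParameterGrid`. The names `example_params`, `candidates`, and `two_key_grid` are illustrative and not part of the activity; `fit_intercept` is used only as a second `Ridge` argument to show how the grid grows.

```python
from sklearn.model_selection import ParameterGrid

# Same dictionary shape as the one requested in this problem.
example_params = {'alpha': [0.1, 1.0, 10.0]}

# ParameterGrid expands the dictionary into one setting per candidate.
candidates = list(ParameterGrid(example_params))
for candidate in candidates:
    print(candidate)

# With two keys, every combination of values is tried (3 x 2 settings here).
two_key_grid = ParameterGrid({'alpha': [0.1, 1.0, 10.0],
                              'fit_intercept': [True, False]})
print(len(two_key_grid))
```

Each key must match an argument name of the estimator, which is why the key for `Ridge` is `alpha`.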

In [None]:


params_dict = {
    'alpha': [.1, 1.0, 10.0]
}

# Answer check
print(params_dict.values())
print(params_dict.keys())

### Problem 2

#### Creating the grid search object


Instantiate a `Ridge()` regressor and assign to `ridge`.

Next, use `GridSearchCV` to instantiate a grid search object with `ridge` as the estimator. Set the argument `param_grid` equal to `params_dict`. Assign your grid to `grid` below.

In [None]:


ridge = Ridge()
grid = GridSearchCV(estimator=ridge, param_grid=params_dict)

# Answer check
print(grid.get_params()['param_grid'])
print(grid)
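
When only `estimator` and `param_grid` are given, `GridSearchCV` uses 5-fold cross-validation and ranks candidates with the estimator's own `score` method, which for `Ridge` is $R^2$, not MSE. The sketch below writes those choices out and swaps in an MSE-based scorer. It uses synthetic data from `make_regression` as a stand-in so it runs without a download; `X_demo`, `y_demo`, and `demo_grid` are illustrative names.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the housing dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

demo_grid = GridSearchCV(
    estimator=Ridge(),
    param_grid={'alpha': [0.1, 1.0, 10.0]},
    cv=5,                              # the default, written out explicitly
    scoring='neg_mean_squared_error',  # higher is better, so MSE is negated
)
demo_grid.fit(X_demo, y_demo)

print(demo_grid.best_params_)
print(-demo_grid.best_score_)  # mean cross-validated MSE of the best alpha
```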

### Problem 3

#### Performing the grid search




- Use the `fit` function on `grid` to train your model using `X_train`  and `y_train`.
- Use the `predict` function on `grid` to compute the predictions on `X_train`. Assign your result to `train_preds`.
- Use the `predict` function on `grid` to compute the predictions on `X_test`. Assign your result to `test_preds`.
- Use the `mean_squared_error` function to compute the MSE between `y_train` and `train_preds`. Assign your result to `train_mse`.
- Use the `mean_squared_error` function to compute the MSE between `y_test` and `test_preds`. Assign your result to `test_mse`.



In [None]:
grid.fit(X_train, y_train)

train_preds = grid.predict(X_train)
test_preds = grid.predict(X_test)
# mean_squared_error expects (y_true, y_pred) in that order
train_mse = mean_squared_error(y_train, train_preds)
test_mse = mean_squared_error(y_test, test_preds)

# Answer check
print(f'Train MSE: {train_mse}')
print(f'Test MSE: {test_mse}')
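
Calling `predict` on a fitted grid works because, with the default `refit=True`, the search retrains the best candidate on all of the training data and forwards `predict` to it. The sketch below checks this on synthetic stand-in data; `X_demo`, `y_demo`, `demo_grid`, and the other names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the housing dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

# refit=True is the default: the best candidate is retrained on all of X_demo.
demo_grid = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]})
demo_grid.fit(X_demo, y_demo)

via_grid = demo_grid.predict(X_demo)
via_best = demo_grid.best_estimator_.predict(X_demo)

# Both routes go through the same refit model.
same_predictions = np.allclose(via_grid, via_best)
print(same_predictions)
```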

### Problem 4

#### Identify optimal alpha value


Use your fitted grid to determine the optimal alpha value.  Assign this as a float to `best_alpha` below.  (**Hint**: Use the `best_params_` attribute of the fitted grid.)

In [None]:


best_alpha = grid.best_params_['alpha']  # best_params_ is a dict; pull out the float

# Answer check
print(f'Best alpha: {best_alpha}')
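
Beyond `best_params_`, a fitted grid stores the cross-validated score of every candidate in `cv_results_`, which helps show how close the runners-up were. The sketch below is a minimal illustration on synthetic stand-in data; `demo_grid`, `res`, and `best_row_params` are illustrative names.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the housing dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

demo_grid = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]})
demo_grid.fit(X_demo, y_demo)

res = demo_grid.cv_results_
# One entry per candidate: its settings, mean validation score, and rank.
for params, mean_score, rank in zip(res['params'],
                                    res['mean_test_score'],
                                    res['rank_test_score']):
    print(params, round(float(mean_score), 4), int(rank))

# best_index_ points at the row that produced best_params_.
best_row_params = res['params'][demo_grid.best_index_]
print(best_row_params)
```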

### Problem 5

#### Pipeline with Grid Search


To use a `Pipeline` in a `GridSearchCV`, prefix each key in your parameter dictionary with the name of the pipeline step it belongs to, followed by two underscores.  For example, we will create a pipeline with steps named `scale` and `ridge` that applies the `StandardScaler` followed by the `Ridge` regressor.  To search over the alpha parameter of the `ridge` step, we write `ridge__alpha`. (Note there are two underscores here.)
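
One way to check which prefixed names a pipeline accepts is to list its parameters with `get_params`. The sketch below builds a pipeline with the same step names used in this problem; `demo_pipe` and `searchable` are illustrative names.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

demo_pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])

# Names of the form <step>__<parameter> are the ones a grid search can tune.
searchable = [name for name in demo_pipe.get_params() if '__' in name]
print(searchable)

# set_params accepts the same prefixed names that GridSearchCV uses.
demo_pipe.set_params(ridge__alpha=10.0)
print(demo_pipe.named_steps['ridge'].alpha)
```

The prefix comes from the name given to the step in the `Pipeline` constructor, not from the class name, so renaming the step changes the key.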

Below, you are provided a pipeline and dictionary ready to be used in a new grid search.  You are to instantiate, fit, and score a grid search on the train and test data using mean squared error. Create your grid object as `grid_2` below and assign the training error and test error to `model_2_train_mse` and `model_2_test_mse`.  Determine the optimal value for `alpha` and assign it as a dictionary to `model_2_best_alpha` below.

In [None]:
pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])

In [None]:
param_dict = {'ridge__alpha': [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]}

In [None]:


grid_2 = GridSearchCV(estimator=pipe, param_grid=param_dict)
grid_2.fit(X_train, y_train)  # the search must be fit before predicting
model_2_train_mse = mean_squared_error(y_train, grid_2.predict(X_train))
model_2_test_mse = mean_squared_error(y_test, grid_2.predict(X_test))
model_2_best_alpha = grid_2.best_params_

# Answer check
print(f'Test MSE: {model_2_test_mse}')
print(f'Best Alpha: {list(model_2_best_alpha.values())[0]}')