## CHAPTER 12
---
# MODEL SELECTION

---
- In this book, model selection means choosing both the best learning algorithm and its best hyperparameters
- In this chapter, we will cover techniques to efficiently select the best model from a set of candidates
- Hyperparameters are the settings of a learning algorithm that we must choose before training starts, as opposed to the parameters learned from the data

## 12.1 Selecting Best Models Using Exhaustive Search

**Problem:** You want to select the best model by searching over a range of hyperparameters

**Solution:** Use scikit-learn’s GridSearchCV
- GridSearchCV is a brute-force approach to model selection using cross-validation.
- Specifically, a user defines sets of possible values for one or more hyperparameters, and then GridSearchCV trains a model for every combination of those values.
- The combination with the best mean cross-validated score is selected as the best model.

In [1]:
# Load libraries
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create logistic regression
logistic = linear_model.LogisticRegression(max_iter=10000)

# Create range of candidate regularization values
C = np.logspace(0, 4, 10)

# Create range of candidate solver values
l1_solver = ['liblinear', 'saga']
l2_solver = ['newton-cg', 'lbfgs', 'sag', 'saga']

# Create dictionary hyperparameter candidates
hyperparameters = [dict(C=C, penalty=['l1'], solver=l1_solver), 
                   dict(C=C, penalty=['l2'], solver=l2_solver)]

# Create grid search
gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, verbose=0)

# Fit grid search
best_model = gridsearch.fit(features, target)

# View best hyperparameters
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])
print('Best Solver:', best_model.best_estimator_.get_params()['solver'])

Best Penalty: l1
Best C: 2.7825594022071245
Best Solver: saga


**My additions to the book's code:**
- The code as written in the book threw many `FitFailedWarning` messages because some hyperparameter values cannot be combined (for example, not every solver supports the `l1` penalty). I wish scikit-learn skipped those combinations silently, but it reports every failed fit.
- I added the solver values and split the hyperparameters variable into two dictionaries, one for the l1 parameters and one for the l2 parameters.
- To my surprise it worked perfectly and I actually learned something new.
- I also added max_iter=10000 to the logistic variable because I was getting a ConvergenceWarning with the default iteration limit.
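
The effect of splitting the grid into two dictionaries can be checked without fitting anything. The sketch below is my own illustration, not the book's code: it rebuilds the same `hyperparameters` list and expands it with scikit-learn's `ParameterGrid`, which expands each dictionary separately, so an `l1` penalty is never paired with an l2-only solver.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same candidate values as in the cell above
C = np.logspace(0, 4, 10)
hyperparameters = [
    dict(C=C, penalty=['l1'], solver=['liblinear', 'saga']),
    dict(C=C, penalty=['l2'], solver=['newton-cg', 'lbfgs', 'sag', 'saga']),
]

# Each dictionary in the list is expanded on its own, then the results are chained
grid = ParameterGrid(hyperparameters)

# Distinct (penalty, solver) pairs that the search will actually try
pairs = sorted({(params['penalty'], params['solver']) for params in grid})
for penalty, solver in pairs:
    print(penalty, solver)

print('Candidates:', len(grid))
```

Only six penalty/solver pairs appear, all taken from within a single dictionary, which is why the failed fits disappear.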

#### Discussion:
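- Let's count the candidate models from which the best was selected:
    - 10 values of `C` × 1 penalty (`l1`) × 2 solvers = 20 models
    - 10 values of `C` × 1 penalty (`l2`) × 4 solvers = 40 models
    - *Total:* 60 models
    - With `cv=5`, each candidate is fit once per fold, so the search performs 60 × 5 = 300 fits
- The best model's parameters are: solver=saga, penalty=l1, and C≈2.78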
- By default, after identifying the best hyperparameters, GridSearchCV will retrain a model using the best hyperparameters on the entire dataset (rather than leaving a fold out for cross-validation). We can use this model to predict values like any other scikit-learn model:

In [2]:
# Predict target vector
best_model.predict(features)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [3]:
best_model.best_estimator_

LogisticRegression(C=2.7825594022071245, max_iter=10000, penalty='l1',
                   solver='saga')
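
To look at the refit behaviour more closely, here is a minimal sketch on a deliberately reduced grid (three values of `C` with the default solver, an assumption I made only to keep the run short). It reads the number of candidates and folds back from the fitted search and then uses `best_estimator_`, which was retrained on all rows because `refit=True` is the default.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import GridSearchCV

features, target = datasets.load_iris(return_X_y=True)

# Reduced grid, chosen here for speed (not the grid used above)
small_grid = dict(C=np.logspace(0, 4, 3))

search = GridSearchCV(
    linear_model.LogisticRegression(max_iter=10000),
    small_grid,
    cv=5,
).fit(features, target)

# One entry per candidate in cv_results_; n_splits_ is the number of folds
n_candidates = len(search.cv_results_['params'])
n_cv_fits = n_candidates * search.n_splits_

print('Candidates:', n_candidates)
print('Cross-validation fits:', n_cv_fits)
print('Best mean CV accuracy:', search.best_score_)
print('Best C:', search.best_params_['C'])

# best_estimator_ was refit on the full dataset after the search finished
predictions = search.best_estimator_.predict(features)
print('Predictions shape:', predictions.shape)
```

Note that `best_score_` is the mean score across the held-out folds, so it is a fairer estimate than scoring the refit model on the same data it was trained on.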

One GridSearchCV parameter is worth noting: verbose. Although mostly unnecessary, it can be reassuring during a long search to see that the search is progressing. The verbose parameter controls how many messages are printed during the search: 0 shows no output, and 1 to 3 print messages with increasing detail.
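
A small sketch of the difference between `verbose=0` and `verbose=1`. The helper `run_search` and the reduced grid are my own assumptions for illustration; the helper captures whatever the search prints to standard output so the two settings can be compared. The exact wording of the messages may vary between scikit-learn versions.

```python
import io
from contextlib import redirect_stdout

import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import GridSearchCV

features, target = datasets.load_iris(return_X_y=True)
small_grid = dict(C=np.logspace(0, 4, 3))  # reduced grid, chosen here for speed


def run_search(verbose):
    """Fit a small grid search and return whatever it printed to stdout."""
    search = GridSearchCV(
        linear_model.LogisticRegression(max_iter=10000),
        small_grid,
        cv=5,
        verbose=verbose,
    )
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        search.fit(features, target)
    return buffer.getvalue()


quiet_log = run_search(verbose=0)
progress_log = run_search(verbose=1)

print('Characters printed with verbose=0:', len(quiet_log))
print('Output with verbose=1:')
print(progress_log)
```

In a notebook you would simply pass `verbose=1` to `GridSearchCV` and watch the cell output; the capture step is only there to make the comparison explicit.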

## 12.2 Selecting Best Models Using Randomized Search

**Problem:** You want a computationally cheaper method than exhaustive search to select the best model.

**Solution:** Use scikit-learn’s RandomizedSearchCV