# Introduction
> In machine learning, training algorithms are used to learn model parameters by minimizing a loss function. Some algorithms, like support vector classifiers and random forests, have user-defined hyperparameters that affect parameter learning.
> > Parameters are learned during training, while hyperparameters are set manually. For instance, random forests consist of decision trees, but the number of trees must be predetermined.
> > > Choosing hyperparameter values, and sometimes the learning algorithm itself, is known as `hyperparameter tuning` or `model selection`. The goal is to find the learning algorithm and hyperparameters that produce the best model. This chapter covers techniques for efficiently selecting the best model from a set of candidates.
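
The distinction between hyperparameters and learned parameters can be seen in a small sketch. It assumes the iris data used later in the chapter; the value `n_estimators=10` and the variable names are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a small example dataset
features, target = load_iris(return_X_y=True)

# n_estimators is a hyperparameter: it is fixed by us before training starts
forest = RandomForestClassifier(n_estimators=10, random_state=0)

# Training learns the parameters, here the structure of each decision tree
forest.fit(features, target)

# The fitted trees are stored on the model; their count matches the hyperparameter
n_trees = len(forest.estimators_)
print("Number of fitted trees:", n_trees)
```

Training never changes `n_estimators`; picking a good value for it is the job of the search methods described in the rest of the chapter.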

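# Selecting the Best Models Using Exhaustive Search
> `GridSearchCV` is a brute-force method for model selection through `cross-validation`.
> > The user defines a set of possible values for one or more hyperparameters, and `GridSearchCV` trains a model for every combination of those values.
> > The best model is the combination with the highest cross-validated performance score.
> > > For example, the solution below uses logistic regression and searches over two hyperparameters: C and the regularization penalty. Other arguments, such as the solver and the maximum number of iterations, are fixed rather than searched. For each searched hyperparameter we supply an explicit list of candidate values.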

In [1]:
# If you want to select the best model by searching over a range of hyperparameters.
# Use scikit-learn’s GridSearchCV:
# Load libraries
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create logistic regression
logistic = linear_model.LogisticRegression(max_iter=500, solver='liblinear')
# Create range of candidate penalty hyperparameter values
penalty = ['l1','l2']
# Create range of candidate regularization hyperparameter values
C = np.logspace(0, 4, 10)
# Create dictionary of hyperparameter candidates
hyperparameters = dict(C=C, penalty=penalty)
# Create grid search
gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, verbose=0)
# Fit grid search
best_model = gridsearch.fit(features, target)
# Show the best model
print(best_model.best_estimator_)

LogisticRegression(C=7.742636826811269, max_iter=500, penalty='l1',
                   solver='liblinear')


In [2]:
np.logspace(0, 4, 10)

array([1.00000000e+00, 2.78255940e+00, 7.74263683e+00, 2.15443469e+01,
       5.99484250e+01, 1.66810054e+02, 4.64158883e+02, 1.29154967e+03,
       3.59381366e+03, 1.00000000e+04])

- We define 10 candidate values for C with `np.logspace(0, 4, 10)` and two possible values for the regularization penalty: ['l1', 'l2'].
For each combination of C and regularization penalty values, we train the model
and evaluate it using k-fold cross-validation. In our solution, we have 10 possible
values of C, 2 possible values of regularization penalty, and 5 folds. That gives
10 × 2 = `20 candidate models`, each fitted once per fold, for 10 × 2 × 5 = 100 model fits, from which the best candidate is selected.
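
The arithmetic can be checked without training anything. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the same grid; the variable names are illustrative. It separates the number of distinct hyperparameter combinations from the number of model fits that cross-validation performs.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same candidate values as in the grid search above
grid = ParameterGrid({"C": np.logspace(0, 4, 10), "penalty": ["l1", "l2"]})

# Each element of the grid is one hyperparameter combination (one candidate)
n_candidates = len(grid)

# With 5-fold cross-validation every candidate is fitted once per fold
n_folds = 5
n_fits = n_candidates * n_folds

print("Candidates:", n_candidates)
print("Cross-validation fits:", n_fits)
```

With the default `refit=True`, one further fit on the full dataset follows once the best combination is known.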

In [3]:
# Once GridSearchCV is complete, we can see the hyperparameters of the best model:
# View best hyperparameters
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])

Best Penalty: l1
Best C: 7.742636826811269


- By default, after identifying the best hyperparameters, GridSearchCV will retrain a model using the best hyperparameters on the entire dataset (rather than leaving a fold out for cross-validation).
- We can use this model to predict values like any other scikit-learn model:


In [4]:
# Predict target vector
best_model.predict(features)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
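
Besides predicting, the fitted search object also records how each candidate scored. The following is a minimal sketch under simplifying assumptions: it searches over three illustrative values of C only and uses the default solver rather than the notebook's `liblinear` setup, so the scores it prints will not match the cells above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

features, target = load_iris(return_X_y=True)

# Simplified grid (an assumption for this sketch): three values of C, default solver
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [1.0, 10.0, 100.0]}, cv=5)
search.fit(features, target)

# cv_results_ holds one mean cross-validated score per candidate
mean_scores = search.cv_results_["mean_test_score"]
print("Mean CV score per candidate:", mean_scores)

# best_score_ is the mean cross-validated score of the winning candidate
print("Best mean CV score:", search.best_score_)
print("Best hyperparameters:", search.best_params_)

# Because refit=True by default, predict() delegates to best_estimator_,
# which was retrained on the whole dataset
predictions = search.predict(features)
same = np.array_equal(predictions, search.best_estimator_.predict(features))
print("predict() matches best_estimator_.predict():", same)
```

Note that `best_score_` comes from held-out folds, while predictions on `features` are made on data the refitted model has already seen, so the two are not directly comparable.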

# Selecting the Best Models Using Randomized Search
> `RandomizedSearchCV` is often a cheaper alternative to `GridSearchCV`: instead of trying every combination, it evaluates a fixed number of random combinations drawn from user-supplied hyperparameter values or distributions.
> > Through `randomized sampling`, scikit-learn searches over the supplied distributions (here, a uniform distribution for C) for good hyperparameter values.


In [5]:
# If you want a computationally cheaper method than exhaustive search to select the best model.
# Use scikit-learn’s RandomizedSearchCV:
# Load libraries
from scipy.stats import uniform
from sklearn import linear_model, datasets
from sklearn.model_selection import RandomizedSearchCV
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create logistic regression
logistic = linear_model.LogisticRegression(max_iter=500, solver='liblinear')
# Create range of candidate regularization penalty hyperparameter values
penalty = ['l1', 'l2']
# Create distribution of candidate regularization hyperparameter values
C = uniform(loc=0, scale=4)
# Create hyperparameter options
hyperparameters = dict(C=C, penalty=penalty)
# Create randomized search
randomizedsearch = RandomizedSearchCV(
    logistic, hyperparameters, random_state=1, n_iter=100, cv=5, verbose=0,
    n_jobs=-1)

In [6]:
# Fit randomized search
best_model = randomizedsearch.fit(features, target)
# Print best model
print(best_model.best_estimator_)

LogisticRegression(C=1.668088018810296, max_iter=500, penalty='l1',
                   solver='liblinear')


In [7]:
# View best hyperparameters
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])

Best Penalty: l1
Best C: 1.668088018810296


In [8]:
# Predict target vector
best_model.predict(features)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

- The number of sampled combinations of hyperparameters (i.e., the number of candidate models trained) is specified with the n_iter (number of iterations) setting.
- It’s worth noting that RandomizedSearchCV isn’t inherently faster than GridSearchCV, but it often achieves comparable performance to GridSearchCV in less time just by testing fewer combinations.
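
To see what "sampled combinations" means in practice, scikit-learn's `ParameterSampler` can draw them without fitting any model. This is a rough sketch: it reuses the distributions from the solution above, but `n_iter=5` is an illustrative value chosen to keep the printout short.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# uniform(loc=0, scale=4) is a continuous uniform distribution on [0, 4]
distributions = {"C": uniform(loc=0, scale=4), "penalty": ["l1", "l2"]}

# Draw 5 random combinations; each one would become a candidate model
samples = list(ParameterSampler(distributions, n_iter=5, random_state=1))

for sample in samples:
    print(sample)
```

Values from a list such as `penalty` are picked uniformly at random, while `C` is drawn from the continuous distribution, so the search is not limited to a fixed set of C values.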

# Selecting the Best Models from Multiple Learning Algorithms

In [10]:
# If you want to select the best model by searching over a range of learning algorithms and their respective hyperparameters.
# Create a dictionary of candidate learning algorithms and their hyperparameters to use as the search space for GridSearchCV:
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# Set random seed
np.random.seed(0)
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create a pipeline
pipe = Pipeline([("classifier", RandomForestClassifier())])

In [11]:
# Create dictionary with candidate learning algorithms and their hyperparameters
search_space = [{"classifier": [LogisticRegression(max_iter=500, solver='liblinear')],
                 "classifier__penalty": ['l1', 'l2'],
                 "classifier__C": np.logspace(0, 4, 10)},
                {"classifier": [RandomForestClassifier()],
                 "classifier__n_estimators": [10, 100, 1000],
                 "classifier__max_features": [1, 2, 3]}]
# Create grid search
gridsearch = GridSearchCV(pipe, search_space, cv=5, verbose=0)
# Fit grid search
best_model = gridsearch.fit(features, target)
# Print best model
print(best_model.best_estimator_)


Pipeline(steps=[('classifier',
                 LogisticRegression(C=7.742636826811269, max_iter=500,
                                    penalty='l1', solver='liblinear'))])




#### Each learning algorithm has its own hyperparameters, and we define their candidate values using the format `classifier__[hyperparameter name]`.
> For example, for our logistic regression, the first dictionary in `search_space` defines the candidate values for the regularization strength, `classifier__C`, and the candidate types of regularization penalty, `classifier__penalty`. The second dictionary does the same for the random forest's `n_estimators` and `max_features`.
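
The double-underscore names follow the same convention that `Pipeline.set_params` accepts, so they can be tried directly outside a grid search. A minimal sketch, reusing the step name `classifier` from the pipeline above; the specific values are illustrative only:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# One-step pipeline whose step is named "classifier"
pipe = Pipeline([("classifier", RandomForestClassifier())])

# "classifier__n_estimators" targets n_estimators of the step named "classifier"
pipe.set_params(classifier__n_estimators=10)
n_estimators = pipe.get_params()["classifier__n_estimators"]
print("n_estimators:", n_estimators)

# The bare step name replaces the whole estimator, which is how the
# search space swaps between learning algorithms
pipe.set_params(classifier=LogisticRegression(max_iter=500))

# After the swap, the prefix addresses the new estimator's hyperparameters
pipe.set_params(classifier__C=10.0)
print("Current classifier:", pipe.named_steps["classifier"])
```

During the search, `GridSearchCV` applies each candidate's settings to the pipeline in this manner before fitting it.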

In [12]:
# After the search is complete, we can use best_estimator_ to view the best model’s learning algorithm and hyperparameters:
# View best model
print(best_model.best_estimator_.get_params()["classifier"])

LogisticRegression(C=7.742636826811269, max_iter=500, penalty='l1',
                   solver='liblinear')


In [13]:
# Predict target vector
best_model.predict(features)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# End of chapter 12