# Hyperparameter Tuning
- At the end of the day, we want the model that best represents the data.
- In linear regression, fitting simply chooses the parameters (coefficients) that best fit the data.
- In Lasso/Ridge, we have to choose alpha before fitting.
- In k-NN, we have to choose the number of neighbors, n_neighbors (k).

- Values like alpha and k are called hyperparameters.

- The art lies in choosing the right values for the hyperparameters.

- These values cannot be learned by fitting the model; they must be set before fitting.
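
To make the distinction concrete, here is a minimal sketch on synthetic data (the data and the names `X_demo`, `y_demo` are assumptions for illustration, not part of these notes). It shows that alpha is passed in before fitting, while the coefficients come out of `fit()`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): y depends linearly on two features plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: we choose it before calling fit()
ridge = Ridge(alpha=1.0)
ridge.fit(X_demo, y_demo)

# coef_ and intercept_ are parameters: fit() learns them from the data
print("chosen alpha:", ridge.alpha)
print("learned coefficients:", ridge.coef_)
print("learned intercept:", ridge.intercept_)
```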

### Choosing the right values

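- Try a range of different hyperparameter values

- Fit a model for each value separately

- See how well each performs

- Pick the value that performs best

- It is essential to use cross-validation (CV) here; relying on a single train/test split risks overfitting the hyperparameters to the test set

This process is called hyperparameter tuning.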
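
The steps above can be sketched as a plain loop. This is only an illustration on synthetic data; the candidate values in `candidate_k` and the names `X_demo`, `y_demo` are assumptions, not something from these notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

candidate_k = [1, 3, 5, 7, 9]  # hyperparameter values to try
mean_scores = {}
for k in candidate_k:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: fit and score the model on 5 different splits
    scores = cross_val_score(knn, X_demo, y_demo, cv=5)
    mean_scores[k] = scores.mean()

# Keep the value with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best k:", best_k)
```

GridSearchCV automates exactly this kind of loop and also handles more than one hyperparameter at a time.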


When we have two hyperparameters, say C and K, we try every combination of their candidate values on a grid and cross-validate each one.
This is called grid search CV.

![Grid Search CV](grid search cv.png)
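
As a rough sketch of what the grid contains, scikit-learn's `ParameterGrid` can enumerate the combinations. The names C and K follow the text; the candidate values are made up for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical candidate values for two hyperparameters named C and K
grid = ParameterGrid({"C": [0.1, 1, 10], "K": [3, 5]})

combos = list(grid)
print(len(combos))  # 3 values of C times 2 values of K
for combo in combos:
    print(combo)
```

With 5-fold cross-validation, each of these combinations is fitted 5 times, which is why the cost grows quickly as hyperparameters are added.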

In [None]:
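# How to use GridSearchCV (assumes X and y are already loaded)
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 50)} # dictionary mapping each hyperparameter to the values to try
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
knn_cv.best_params_
knn_cv.best_score_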

## Hyperparameter tuning with GridSearchCV

Above, we tuned the n_neighbors parameter of KNeighborsClassifier() using GridSearchCV. <br>
Let's now use GridSearchCV with logistic regression on the diabetes dataset instead! <br><br>

Like the alpha parameter of lasso and ridge regularization that you saw earlier, <br>
logistic regression also has a regularization parameter: C. <br>
    C controls the inverse of the regularization strength, and this is what you will tune in this exercise. <br>
    A large C can lead to an overfit model, while a small C can lead to an underfit model. <br><br>
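
A small sketch of what "inverse of the regularization strength" means in practice: the smaller C is, the more the coefficients are shrunk toward zero. The data here is synthetic (an assumption for illustration, not the diabetes dataset), and the three C values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption), not the diabetes dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

coef_norms = {}
for C in [0.001, 1.0, 1000.0]:
    clf = LogisticRegression(C=C, solver="liblinear")
    clf.fit(X_demo, y_demo)
    # Size of the learned coefficient vector for this value of C
    coef_norms[C] = np.linalg.norm(clf.coef_)

print(coef_norms)
```

The coefficient norm for the smallest C should come out well below the one for the largest C, which is the sense in which small C means strong regularization.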

Let's use GridSearchCV and logistic regression to find the optimal C in this hyperparameter space. <br>

Here, we are focusing on the process of setting up the hyperparameter grid and performing grid-search cross-validation. <br>
In practice, you will also want to hold out a portion of your data for evaluation purposes. <br>

In [5]:
import pandas as pd
import numpy as np
df = pd.read_csv('/media/Datascience/Projects/Giridhar/Datasets/pima-indians-diabetes-database/diabetes.csv')
X = df.iloc[:,range(0,8)].values
y = df.iloc[:,[-1]].values

In [14]:
# Import necessary modules
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Setup the hyperparameter grid
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# Instantiate a logistic regression classifier: logreg
logreg = LogisticRegression(solver = 'liblinear')

# Instantiate the GridSearchCV object: logreg_cv
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)

# Fit it to the data
print(y.shape)
y_mod = np.ravel(y,order='C')
logreg_cv.fit(X, y_mod)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_)) 
print("Best score is {}".format(logreg_cv.best_score_))


(768, 1)
Tuned Logistic Regression Parameters: {'C': 268.2695795279727}
Best score is 0.7708333333333334


## Hyperparameter tuning with RandomizedSearchCV
GridSearchCV can be computationally expensive, <br>
especially if you are searching over a large hyperparameter space <br>
and dealing with multiple hyperparameters. <br>

A solution to this is to use RandomizedSearchCV, in which not all hyperparameter values are tried out. <br>
Instead, a fixed number of hyperparameter settings is sampled from specified probability distributions. <br>
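
To see what "sampled from specified probability distributions" looks like, here is a minimal sketch using scikit-learn's `ParameterSampler`. The name `param_dist_demo` and its contents are assumptions chosen for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# A list is sampled uniformly; a scipy distribution is sampled via its rvs() method
param_dist_demo = {
    "max_depth": [3, None],
    "min_samples_leaf": randint(1, 9),  # integers from 1 to 8
}

# Draw a fixed number of settings (n_iter) instead of trying every combination
samples = list(ParameterSampler(param_dist_demo, n_iter=5, random_state=0))
for setting in samples:
    print(setting)
```

RandomizedSearchCV draws its candidate settings in a comparable way and then cross-validates each one; its `n_iter` argument controls how many settings are tried.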

Here, we will be using a new model: the Decision Tree.
Don't worry about the specifics of how this model works.
Just like k-NN, linear regression, and logistic regression,
decision trees in scikit-learn have .fit() and .predict() methods
that you can use in exactly the same way as before.

Decision trees have many parameters that can be tuned,
such as max_features, max_depth, and min_samples_leaf.
This makes it an ideal use case for RandomizedSearchCV.

In [15]:
# Import necessary modules
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Setup the parameters and distributions to sample from: param_dist
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiate a Decision Tree classifier: tree
tree = DecisionTreeClassifier()

# Instantiate the RandomizedSearchCV object: tree_cv
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# Fit it to the data
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))


Tuned Decision Tree Parameters: {'criterion': 'gini', 'max_depth': None, 'max_features': 7, 'min_samples_leaf': 7}
Best score is 0.7356770833333334


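Note that when both search over the same candidate values, RandomizedSearchCV cannot outperform GridSearchCV, because it tries only a subset of the combinations that the grid search evaluates exhaustively. Instead, it is valuable because it saves on computation time.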

### Why we need a hold-out set

- A hold-out set is a portion of the data set aside for testing
- It is not used during cross-validation
- It therefore gives a clear idea of how the model performs on unseen data


In addition to C, logistic regression has a 'penalty' hyperparameter which specifies whether to use 'l1' or 'l2' regularization.

Let's create a hold-out set and tune the 'C' and 'penalty' hyperparameters of a logistic regression classifier using GridSearchCV on the training set.


In [19]:
# Import necessary modules
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Create the hyperparameter grid
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space, 'penalty': ['l1', 'l2']}

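# Instantiate the logistic regression classifier: logreg
# (liblinear supports both the 'l1' and 'l2' penalties; the default solver in recent scikit-learn versions does not support 'l1')
logreg = LogisticRegression(solver='liblinear')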

# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)

# Instantiate the GridSearchCV object: logreg_cv
logreg_cv = GridSearchCV(logreg, param_grid, cv = 5)

# Fit it to the training data (flatten y_train from shape (n, 1) to (n,))
logreg_cv.fit(X_train, np.ravel(y_train))

# Print the optimal parameters and best score
print("Tuned Logistic Regression Parameter: {}".format(logreg_cv.best_params_))
print("Tuned Logistic Regression Accuracy: {}".format(logreg_cv.best_score_))

Tuned Logistic Regression Parameter: {'C': 31.622776601683793, 'penalty': 'l2'}
Tuned Logistic Regression Accuracy: 0.7673913043478261
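
The accuracy printed above is the cross-validated score on the training set. The point of the hold-out set is to score the tuned model on data the search never saw. A minimal, self-contained sketch of that last step follows, using synthetic data and a small illustrative grid (both are assumptions, not the diabetes data or the grid used above).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42
)

# Tune C with cross-validation on the training portion only
search = GridSearchCV(
    LogisticRegression(solver="liblinear"), {"C": [0.01, 1, 100]}, cv=5
)
search.fit(X_tr, y_tr)

# best_score_ is the mean CV accuracy; score() evaluates the refitted best model on the hold-out set
cv_score = search.best_score_
test_score = search.score(X_te, y_te)
print("CV accuracy:", cv_score)
print("Hold-out accuracy:", test_score)
```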


## Hold-out set in practice II: Regression

Remember lasso and ridge regression from the previous chapter? 
Lasso used the L1 penalty to regularize, while ridge used the L2 penalty. 
There is another type of regularized regression known as the elastic net.
In elastic net regularization, the penalty term is a linear combination of the L1 and L2 penalties:

$$a \cdot L1 + b \cdot L2$$

In scikit-learn, this term is represented by the 'l1_ratio' parameter: An 'l1_ratio' of 1 corresponds to an L1 penalty, and anything lower is a combination of L1 and L2.
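
As a quick check of the 'l1_ratio' = 1 case, the sketch below fits an elastic net and a lasso with the same alpha on synthetic regression data (the data and the alpha value are assumptions for illustration). In scikit-learn, alpha scales the overall penalty and 'l1_ratio' sets the mix between L1 and L2.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic regression data (assumption)
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)

lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X_demo, y_demo)   # pure L1 penalty
enet_mix = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_demo, y_demo)  # half L1, half L2

# With l1_ratio=1 the elastic net should reproduce the lasso coefficients
print(np.allclose(lasso.coef_, enet_l1.coef_))
print(lasso.coef_)
print(enet_mix.coef_)
```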

In this exercise, we will use GridSearchCV to tune the 'l1_ratio' of an elastic net model trained on the Gapminder data.

As in the previous exercise, we will use a hold-out set to evaluate the model's performance.

In [17]:
import warnings
warnings.simplefilter('ignore')

In [18]:
# Import necessary modules
# Use Gapminder
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)

# Create the hyperparameter grid
l1_space = np.linspace(0, 1, 30)
param_grid = {'l1_ratio': l1_space}

# Instantiate the ElasticNet regressor: elastic_net
elastic_net = ElasticNet()

# Setup the GridSearchCV object: gm_cv
gm_cv = GridSearchCV(elastic_net, param_grid, cv = 5)

# Fit it to the training data
gm_cv.fit(X_train, y_train)

# Predict on the test set and compute metrics
y_pred = gm_cv.predict(X_test)
r2 = gm_cv.score(X_test, y_test)
mse = mean_squared_error(y_test, y_pred)
print("Tuned ElasticNet l1 ratio: {}".format(gm_cv.best_params_))
print("Tuned ElasticNet R squared: {}".format(r2))
print("Tuned ElasticNet MSE: {}".format(mse))

Tuned ElasticNet l1 ratio: {'l1_ratio': 0.0}
Tuned ElasticNet R squared: 0.24765337510702687
Tuned ElasticNet MSE: 0.16664179543611013
