## Hyperparameter Tuning

- Hyperparameters are parameters we specify before fitting the model (for example, `alpha` in lasso regression or `n_neighbors` in KNN)
- We tune them to optimize our model's performance
- Steps to tune our hyperparameters:
	- Try lots of different hyperparameter values
	- Fit a model for each of them separately
	- See how well each one performs
	- Choose the best performing values
- We use cross-validation on the training set so that we don't overfit the hyperparameters to the test set
- We keep the test set for the final evaluation only
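
The steps above can be sketched as a plain loop. This is a minimal illustration, not a prescribed method: the synthetic data from `make_classification`, the KNN model, and the candidate values are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not a dataset from these notes)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Hold out a test set that is used only once, at the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Try several values, fit each one separately, score with cross-validation
candidates = [2, 5, 8, 11]
cv_scores = {}
for k in candidates:
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()

# Choose the best performing value
best_k = max(cv_scores, key=cv_scores.get)

# Final evaluation on the untouched test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(best_k, round(test_score, 3))
```

Because the candidates are compared only on cross-validation folds of the training data, the test score remains an honest estimate of performance on unseen data.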

## Grid Search
- We create a table (grid) with one axis per hyperparameter we want to optimize, so that each cell is one combination of values

### Example:
| Number of neighbors | Metric: euclidean | Metric: manhattan |
|---------------------|-------------------|-------------------|
| 11                  |                   |                   |
| 8                   |                   |                   |
| 5                   |                   |                   |
| 2                   |                   |                   |
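
As a rough sketch, the cells of such a grid can be enumerated with scikit-learn's `ParameterGrid`. The key names `n_neighbors` and `metric` are assumed here because they match the `KNeighborsClassifier` arguments; the notes themselves only name the hyperparameters informally.

```python
from sklearn.model_selection import ParameterGrid

# One entry per hyperparameter, listing the values to try
grid = ParameterGrid({
    "n_neighbors": [2, 5, 8, 11],
    "metric": ["euclidean", "manhattan"],
})

# Every cell of the table is one combination: 4 values x 2 metrics = 8
combinations = list(grid)
print(len(combinations))
for combo in combinations:
    print(combo)
```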

Then we calculate the score for each combination of hyperparameters and fill in the table with the results:

| Number of neighbors | Metric: euclidean | Metric: manhattan |
|---------------------|-------------------|-------------------|
| 11                  | 0.8716            | 0.8692            |
| 8                   | 0.8704            | 0.8688            |
| 5                   | **0.8748**        | 0.8714            |
| 2                   | 0.8634            | 0.8646            |

In this case we use the combination of hyperparameters that performs best:
- Number of neighbors = 5
- Metric = euclidean
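
A search over this same grid might look like the following with `GridSearchCV`. This is only a sketch under stated assumptions: the data is synthetic (`make_classification`), and the 5-fold setup is an arbitrary choice, so the scores it prints will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the table's data is not available)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# The same grid as in the table: 4 neighbor counts x 2 distance metrics
param_grid = {
    "n_neighbors": [2, 5, 8, 11],
    "metric": ["euclidean", "manhattan"],
}

kf = KFold(n_splits=5, shuffle=True, random_state=1)
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=kf)
knn_cv.fit(X, y)

# Best combination and its mean cross-validated accuracy
print(knn_cv.best_params_)
print(knn_cv.best_score_)
```

The per-cell scores that would fill the table are available in `knn_cv.cv_results_["mean_test_score"]`, in the same order as `knn_cv.cv_results_["params"]`.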



## Exercise 
Now that you have seen how to perform grid search hyperparameter tuning, you are going to **build a lasso regression model with optimal hyperparameters** to predict blood glucose levels using the features in the `diabetes_df` dataset.

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv('./datasets/diabetes_clean.csv')

In [6]:
# 'glucose' is the target, so it must not also appear among the features
features = ['pregnancies', 'diastolic', 'triceps', 'insulin', 'bmi',
            'dpf', 'age', 'diabetes']
target = ['glucose']

In [8]:
df_features = df[features]
df_target = df[target]

In [9]:
df_features.head(2)

Unnamed: 0,pregnancies,diastolic,triceps,insulin,bmi,dpf,age,diabetes
0,6,72,35,0,33.6,0.627,50,1
1,1,66,29,0,26.6,0.351,31,0


In [10]:
df_target.head(2)

Unnamed: 0,glucose
0,148
1,85


In [11]:
from sklearn.linear_model import Lasso

In [12]:
model = Lasso()

In [15]:
from sklearn.model_selection import GridSearchCV, KFold

In [16]:
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
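
To see what this splitter does, the sketch below applies the same `KFold` settings to nine toy rows. The array `X_demo` is a hypothetical stand-in, not part of the exercise data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Nine toy rows (hypothetical data, used only to show the index splits)
X_demo = np.arange(9).reshape(9, 1)

kfold_demo = KFold(n_splits=3, shuffle=True, random_state=1)

fold_sizes = []
for train_idx, valid_idx in kfold_demo.split(X_demo):
    # Each row lands in the validation fold exactly once
    fold_sizes.append((len(train_idx), len(valid_idx)))
    print("train:", train_idx, "validation:", valid_idx)
```

With three splits, each candidate hyperparameter value is fitted three times, each time validated on a different third of the rows.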

In [18]:
params = {
    "alpha": [0.00001, 0.0001, 0.001, 0.01]
}
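
Values like these are evenly spaced on a log scale, so as an alternative sketch the same grid could be generated with `np.logspace`. The name `params_logspace` is introduced here only for illustration.

```python
import numpy as np

# Four values from 10**-5 to 10**-2, evenly spaced on a log scale
alphas = np.logspace(-5, -2, num=4)

# Equivalent grid to the hand-written list of alpha values
params_logspace = {"alpha": alphas.tolist()}
print(params_logspace)
```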

In [20]:
model_cross_validation = GridSearchCV(model, 
                                      param_grid=params,
                                      cv=kfold)

In [21]:
from sklearn.model_selection import train_test_split

In [24]:
feature_train, feature_test, target_train, target_test = train_test_split(
    df_features, df_target, test_size=0.3, random_state=1
)

In [29]:
print("Train rows: {}".format(feature_train.shape[0]))

print("Test rows: {}".format(feature_test.shape[0]))


Train rows: 537
Test rows: 231
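
The remaining steps are to fit the grid search on the training rows and score the chosen model on the test rows. The sketch below shows one way this might look. Since `diabetes_clean.csv` is not available here, it assumes synthetic data from `make_regression` with the same number of rows, so the printed numbers say nothing about the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in for the diabetes features and glucose target (assumption)
X, y = make_regression(n_samples=768, n_features=8, noise=10.0, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Same cross-validation and grid settings as in the cells above
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
params = {"alpha": [0.00001, 0.0001, 0.001, 0.01]}
lasso_cv = GridSearchCV(Lasso(), param_grid=params, cv=kfold)

# Cross-validate every alpha on the training rows only
lasso_cv.fit(X_train, y_train)
print("Best alpha:", lasso_cv.best_params_["alpha"])
print("Best CV R^2:", lasso_cv.best_score_)

# Final evaluation: the refitted best model scored once on the test rows
test_r2 = lasso_cv.score(X_test, y_test)
print("Test R^2:", test_r2)
```

With the notebook's own variables, the equivalent calls would be `model_cross_validation.fit(feature_train, target_train)` followed by `model_cross_validation.score(feature_test, target_test)`.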
