# Hyperparameter Tuning and Cross-Validation Using `GridSearchCV`

- **Hyperparameters** are the parameters, or knobs, of an ML algorithm that the user sets before training begins. They control the complexity and behavior of the algorithm.
- **Hyperparameter Tuning** is the process of finding the best **hyperparameter** settings for a machine learning model. It is done by trying different values and combinations, evaluating the model for each one, and keeping the combination that performs best.
- **Cross-Validation**
    - A very important concept in ML. It involves partitioning the dataset into multiple subsets (called **folds**), then training and evaluating the model multiple times, each time holding out a different fold for evaluation and training on the remaining folds.
    - It provides a more reliable estimate of the model's performance by reducing the chance that the result depends on one particular subgroup of the data, as can happen with a single train-test split.

![cv](https://www.mathworks.com/discovery/cross-validation/_jcr_content/mainParsys/image.adapt.full.medium.jpg/1718274806179.jpg)
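To make the fold rotation concrete, here is a minimal sketch using scikit-learn's `KFold` on ten toy sample indices. The toy data and the variable names (`samples`, `validation_folds`) are illustrative assumptions, not part of the notebook. Note also that when `GridSearchCV` is given an integer `cv` and a classifier, it uses `StratifiedKFold`, which additionally preserves class proportions in each fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples, indexed 0..9, stand in for a real dataset.
samples = np.arange(10)

# Five folds without shuffling: each fold holds out two consecutive samples.
kfold = KFold(n_splits=5)

validation_folds = []
for fold_number, (train_idx, val_idx) in enumerate(kfold.split(samples)):
    # Train on four folds, validate on the one held-out fold.
    validation_folds.append(val_idx.tolist())
    print(f"fold {fold_number}: train={train_idx.tolist()} validate={val_idx.tolist()}")
```

Every sample lands in the validation fold exactly once, so each one contributes to the performance estimate.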

In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#sklearn libraries
from sklearn.datasets import make_classification #to generate synthetic data
from sklearn.model_selection import train_test_split, GridSearchCV #to split the data into train and test and use GridSearchCV
from sklearn.tree import DecisionTreeClassifier #the ML algo 
from sklearn.metrics import accuracy_score #evaluation

In [17]:
# generate the data

X, y = make_classification(n_samples=9000,
                           n_features=18,
                           n_informative=4,
                           n_redundant=12,
                           random_state=2)

In [18]:
# split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

In [19]:
#dtc_model = DecisionTreeClassifier(max_depth=i)
#dtc_model.fit(X_train, y_train)

- scikit-learn has dedicated functions that perform cross-validation only, such as `cross_val_score`
- However, it's recommended to perform cross-validation and hyperparameter tuning together in one step with `GridSearchCV`
- Possible Hyperparameters for our model:
```python
class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None)
```
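For comparison, cross-validation on its own, with one fixed hyperparameter setting, might look like the sketch below. It assumes the function the notebook has in mind is `cross_val_score`. The smaller dataset (`X_demo`, `y_demo`) and the choice of `max_depth=4` are illustrative assumptions made to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A smaller synthetic dataset than the notebook's, to keep the sketch fast.
X_demo, y_demo = make_classification(n_samples=1000,
                                     n_features=18,
                                     n_informative=4,
                                     n_redundant=12,
                                     random_state=2)

# One fixed hyperparameter setting; max_depth=4 is an arbitrary choice.
model = DecisionTreeClassifier(max_depth=4, random_state=10)

# Returns one accuracy score per fold (five folds here).
fold_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
print(fold_scores)
print(fold_scores.mean())
```

This scores a single configuration. `GridSearchCV` repeats the same procedure for every combination in a grid and keeps track of the best one.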

1. Define the Hyperparameter Grid as a dictionary

In [28]:
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [2, 4, 6, 10, 20],
    'min_samples_split': [2, 5, 10, 15, 20],
    'random_state': [10],  # a single value: fixes the seed so results are reproducible
}

2. Define the number of Cross-Validation Folds and the evaluation metric

In [29]:
cv_folds = 5
evaluation_metric = 'accuracy'

3. Set up and fit `GridSearchCV` with `DecisionTreeClassifier`

In [30]:
gs_dtc = GridSearchCV(DecisionTreeClassifier(),  # instantiate the model without hyperparameters; the grid supplies them
                      param_grid,
                      scoring=evaluation_metric,
                      cv=cv_folds
                      )

In [31]:
gs_dtc.fit(X, y)

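How many fits did `GridSearchCV` run? 2 x 5 x 5 x 1 = 50 hyperparameter combinations, each trained on 5 CV folds, gives 250 fits. With the default `refit=True`, there is one more final fit of the best combination on all the data passed to `fit`.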
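The count can be checked with scikit-learn's `ParameterGrid`, which enumerates the combinations in a grid. The sketch below repeats the grid and fold count from the cells above so that it runs on its own. The names `n_candidates` and `n_cv_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid and fold count as in the notebook cells above.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [2, 4, 6, 10, 20],
    'min_samples_split': [2, 5, 10, 15, 20],
    'random_state': [10],
}
cv_folds = 5

# Number of distinct hyperparameter combinations: 2 * 5 * 5 * 1.
n_candidates = len(ParameterGrid(param_grid))

# Each combination is trained and scored once per fold.
n_cv_fits = n_candidates * cv_folds

print(n_candidates, n_cv_fits)  # prints: 50 250
```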

We can ask `GridSearchCV` to provide the best hyperparameters

In [32]:
gs_dtc.best_params_

{'criterion': 'gini',
 'max_depth': 10,
 'min_samples_split': 20,
 'random_state': 10}

In [33]:
gs_dtc.best_score_

0.8873333333333333

The hyperparameters above give a mean cross-validated accuracy of about 88.7%.
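`best_score_` is an average over the validation folds, and the search above was fit on all of `X` and `y`, so the test split created earlier was never used. A minimal sketch of one alternative is shown below: tune on the training split only, then check the refit model on the held-out test split. The smaller dataset, the reduced grid `small_grid`, and the `_demo`, `_tr`, `_te` names are assumptions made to keep the example quick. They are not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Smaller synthetic dataset than the notebook's, for speed.
X_demo, y_demo = make_classification(n_samples=2000,
                                     n_features=18,
                                     n_informative=4,
                                     n_redundant=12,
                                     random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# A reduced, illustrative grid (6 combinations).
small_grid = {'max_depth': [2, 4, 6], 'min_samples_split': [2, 10]}

search = GridSearchCV(DecisionTreeClassifier(random_state=10),
                      small_grid, scoring='accuracy', cv=5)
search.fit(X_tr, y_tr)  # cross-validation sees the training split only

# With the default refit=True, predict() uses the best model refit on X_tr.
test_accuracy = accuracy_score(y_te, search.predict(X_te))
print(search.best_params_, search.best_score_, test_accuracy)
```

The test accuracy comes from data that played no part in choosing the hyperparameters, so it is a less optimistic estimate than `best_score_`.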

We can view the results of all the iterations

In [36]:
pd.DataFrame(gs_dtc.cv_results_).head(10)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_criterion | param_max_depth | param_min_samples_split | param_random_state | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.023123 | 0.003084 | 0.000584 | 0.000236 | gini | 2 | 2 | 10 | `{'criterion': 'gini', 'max_depth': 2, 'min_sam...` | 0.704444 | 0.697222 | 0.707222 | 0.683333 | 0.713333 | 0.701111 | 0.010286 | 46 |
| 1 | 0.020866 | 0.000169 | 0.000551 | 0.000207 | gini | 2 | 5 | 10 | `{'criterion': 'gini', 'max_depth': 2, 'min_sam...` | 0.704444 | 0.697222 | 0.707222 | 0.683333 | 0.713333 | 0.701111 | 0.010286 | 46 |
| 2 | 0.020834 | 0.000619 | 0.000483 | 0.000223 | gini | 2 | 10 | 10 | `{'criterion': 'gini', 'max_depth': 2, 'min_sam...` | 0.704444 | 0.697222 | 0.707222 | 0.683333 | 0.713333 | 0.701111 | 0.010286 | 46 |
| 3 | 0.020625 | 0.00053 | 0.000694 | 0.00029 | gini | 2 | 15 | 10 | `{'criterion': 'gini', 'max_depth': 2, 'min_sam...` | 0.704444 | 0.697222 | 0.707222 | 0.683333 | 0.713333 | 0.701111 | 0.010286 | 46 |
| 4 | 0.021235 | 0.000378 | 0.000543 | 0.000262 | gini | 2 | 20 | 10 | `{'criterion': 'gini', 'max_depth': 2, 'min_sam...` | 0.704444 | 0.697222 | 0.707222 | 0.683333 | 0.713333 | 0.701111 | 0.010286 | 46 |
| 5 | 0.039322 | 0.000421 | 0.000538 | 5.2e-05 | gini | 4 | 2 | 10 | `{'criterion': 'gini', 'max_depth': 4, 'min_sam...` | 0.83 | 0.827778 | 0.831111 | 0.817778 | 0.802222 | 0.821778 | 0.010855 | 31 |
| 6 | 0.044461 | 0.011339 | 0.000536 | 0.000193 | gini | 4 | 5 | 10 | `{'criterion': 'gini', 'max_depth': 4, 'min_sam...` | 0.83 | 0.827778 | 0.831111 | 0.817778 | 0.802222 | 0.821778 | 0.010855 | 31 |
| 7 | 0.038548 | 0.000293 | 0.000594 | 0.000361 | gini | 4 | 10 | 10 | `{'criterion': 'gini', 'max_depth': 4, 'min_sam...` | 0.83 | 0.827778 | 0.831111 | 0.817778 | 0.802222 | 0.821778 | 0.010855 | 31 |
| 8 | 0.038471 | 0.000392 | 0.000722 | 0.000286 | gini | 4 | 15 | 10 | `{'criterion': 'gini', 'max_depth': 4, 'min_sam...` | 0.83 | 0.827778 | 0.831111 | 0.817778 | 0.802222 | 0.821778 | 0.010855 | 31 |
| 9 | 0.037399 | 0.000557 | 0.000691 | 0.000345 | gini | 4 | 20 | 10 | `{'criterion': 'gini', 'max_depth': 4, 'min_sam...` | 0.83 | 0.827778 | 0.831111 | 0.817778 | 0.802222 | 0.821778 | 0.010855 | 31 |
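The full `cv_results_` table is wide, and `head(10)` shows the rows in grid order rather than by quality. A sketch of one way to narrow it down is to keep a few columns and sort by `rank_test_score`. It uses a small illustrative dataset and grid so that it runs on its own. The names `search`, `results`, `columns` and `top` are assumptions, not the notebook's.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small illustrative dataset and grid (6 combinations), for speed.
X_demo, y_demo = make_classification(n_samples=1000,
                                     n_features=18,
                                     n_informative=4,
                                     n_redundant=12,
                                     random_state=2)
search = GridSearchCV(DecisionTreeClassifier(random_state=10),
                      {'max_depth': [2, 4, 6], 'min_samples_split': [2, 10]},
                      scoring='accuracy', cv=5)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)

# Keep only the columns needed to compare candidates, best rank first.
columns = ['param_max_depth', 'param_min_samples_split',
           'mean_test_score', 'std_test_score', 'rank_test_score']
top = results[columns].sort_values('rank_test_score').head(3)
print(top)
```

With the notebook's own search, the same two lines applied to `gs_dtc.cv_results_` would list its top-ranked combinations.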


In [37]:
from sklearn.tree import export_text

tree_view = export_text(gs_dtc.best_estimator_)
print(tree_view)

|--- feature_9 <= 0.69
|   |--- feature_16 <= -0.72
|   |   |--- feature_15 <= 0.65
|   |   |   |--- feature_9 <= -0.66
|   |   |   |   |--- feature_9 <= -0.91
|   |   |   |   |   |--- feature_16 <= -1.48
|   |   |   |   |   |   |--- feature_12 <= -0.03
|   |   |   |   |   |   |   |--- feature_4 <= -1.51
|   |   |   |   |   |   |   |   |--- feature_5 <= 0.11
|   |   |   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |   |   |--- feature_5 >  0.11
|   |   |   |   |   |   |   |   |   |--- feature_6 <= -2.51
|   |   |   |   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |   |   |   |--- feature_6 >  -2.51
|   |   |   |   |   |   |   |   |   |   |--- class: 0
|   |   |   |   |   |   |   |--- feature_4 >  -1.51
|   |   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |--- feature_12 >  -0.03
|   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |--- feature_16 >  -1.48
|   |   |   |   |   |   |--- feature_6 <= 1.66
|   |   |   |   |   |   |