# Hyperparameter Tuning

_Summarized by QH_  
_First version: 2023-07-08_  
_Last updated on: 2023-07-08_  

## Model hyperparameters vs model parameters?
* Model hyperparameters are settings, chosen before training, that control the modeling or learning process.
* Model parameters are the values that are estimated from the data during the training process.
* Model hyperparameters set the direction of the training process and therefore affect the estimated model parameters.

Think of the knobs on a radio that you use to tune the frequency and volume: they are the hyperparameters!
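The distinction can be made concrete with a small sketch. The synthetic dataset and the specific values `C=0.5` and `max_iter=200` are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset, used for illustration only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: set by us before training starts
model = LogisticRegression(C=0.5, max_iter=200)
hyperparameters = model.get_params()

# Parameters: estimated from the data by fit()
model.fit(X, y)

print("Hyperparameter C:", hyperparameters["C"])
print("Learned coefficients:", model.coef_)
print("Learned intercept:", model.intercept_)
```

Changing `C` and refitting gives different coefficients, which is the sense in which hyperparameters steer the estimation of parameters.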


## Hyperparameter Importance

Some hyperparameters are more important in terms of determining the model performance than others. 

For example, in a random forest classifier, the following hyperparameters:
* `n_jobs`: The number of jobs to run in parallel.
* `random_state`: Controls both the randomness of the bootstrapping of the samples used when building trees and the sampling of the features to consider when looking for the best split at each node.
* `verbose`: Controls the verbosity when fitting and predicting.

have little impact on model performance compared to the following:
* `n_estimators`: The number of trees to build.
* `max_features`: The number of features to consider when looking for the best split.
* `max_depth`: The maximum depth of each tree.
* `min_samples_leaf`: The minimum number of samples required to be at a leaf node.
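As a quick check, the hyperparameters listed above can be inspected on an estimator with `get_params()`. This is a minimal sketch that uses the names as spelled in scikit-learn; the two group labels (`housekeeping`, `performance`) are hypothetical names for this example.

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
params = rf.get_params()  # dict of every hyperparameter and its current value

# Hyperparameters that mostly affect runtime, reproducibility, or logging
housekeeping = ["n_jobs", "random_state", "verbose"]
# Hyperparameters that are usually worth tuning
performance = ["n_estimators", "max_features", "max_depth", "min_samples_leaf"]

for name in housekeeping + performance:
    print(f"{name}: {params[name]}")
```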

## How to choose hyperparameters?
This process is called hyperparameter tuning: we search for the combination of hyperparameter values that gives the lowest prediction error.

### Constraints among hyperparameters
Watch out for potential conflicts between hyperparameters. For example, in `LogisticRegression()` not every `solver` option can be combined with every `penalty` option.
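A minimal sketch of such a conflict, assuming a small synthetic dataset: the `lbfgs` solver does not support an L1 penalty, so fitting fails. One way to respect the constraint in a search is to describe the grid as a list of dictionaries, so that only valid pairs are generated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid

# Small synthetic dataset, used for illustration only
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# An invalid pair: lbfgs cannot fit an L1-penalized model
conflict_raised = False
try:
    LogisticRegression(solver="lbfgs", penalty="l1").fit(X, y)
except ValueError as err:
    conflict_raised = True
    print("Conflict:", err)

# A list of dicts enumerates only the combinations we know are valid
valid_grid = ParameterGrid([
    {"solver": ["lbfgs"], "penalty": ["l2"]},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"]},
])
for combination in valid_grid:
    print(combination)
```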

### Avoiding "Silly" choices
Avoid values that will almost certainly not give decent model performance, for example:
* Random Forest
    * A very low number of trees (`n_estimators`)
* K-Nearest Neighbor
    * 1 or 2 neighbors (`n_neighbors`)
* Candidate values that differ by only a very small amount compared with the hyperparameter's range.
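To avoid candidate values that sit too close together, one option is to generate them on an even or logarithmic scale. This is only a sketch; the ranges below are assumptions for illustration, not recommendations.

```python
import numpy as np

# Log-spaced candidates suit hyperparameters that span orders of magnitude
learning_rates = np.logspace(-3, 0, num=4)

# Evenly spaced candidates suit hyperparameters with a bounded range
subsamples = np.linspace(0.4, 1.0, num=4)

print(learning_rates)
print(subsamples)
```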

### Grid Search
For each hyperparameter you want to tune, list the candidate values, test every combination, and find the combination with the best model performance. For example, for gradient boosting we might tune the following hyperparameters:
* `learning_rate`: [0.001, 0.01, 0.1, 0.2]
* `max_depth`: [4, 6, 8, 10, 12, 15, 20, 25, 30]
* `subsample`: [0.4, 0.6, 0.7, 0.8, 0.9]
* `max_features`: ['log2', 'sqrt']

We will test all $4 \times 9 \times 5 \times 2 = 360$ combinations. If we use 10-fold cross-validation, we will fit a total of $360 \times 10 = 3600$ models.
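The arithmetic above can be checked with `ParameterGrid`, which enumerates every combination in a grid. This sketch uses the `max_features` values from the manual loop code in this section.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 0.2],
    "max_depth": [4, 6, 8, 10, 12, 15, 20, 25, 30],
    "subsample": [0.4, 0.6, 0.7, 0.8, 0.9],
    "max_features": ["log2", "sqrt"],
}

n_combinations = len(ParameterGrid(param_grid))  # 4 * 9 * 5 * 2
n_folds = 10
n_models = n_combinations * n_folds

print("Combinations:", n_combinations)
print("Models with 10-fold cross-validation:", n_models)
```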

* Advantages of Grid Search
    * You are guaranteed to find the best result within this grid, since you have performed an exhaustive search.
* Disadvantages of Grid Search
    * It is computationally expensive. The cost grows exponentially with the number of hyperparameters and multiplies with every extra value tested.
    * It is uninformed. Each model run is independent: the results of previous models do not inform the next choice.

You can do this manually by looping through each combination, as follows:

In [None]:
# python packages
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_hastie_10_2
from sklearn.metrics import accuracy_score

In [None]:
# import the data
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

In [None]:
# Adjust the lists of values to test
learn_rate_list = [0.001, 0.01, 0.1, 0.2]
max_depth_list = [4, 6, 8, 10, 12, 15, 20, 25, 30]
subsample_list = [0.4, 0.6, 0.7, 0.8, 0.9]
max_features_list = ['log2', 'sqrt']

def gbm_grid_search(learn_rate, max_depth, subsample, max_features):
    # Fit one model for a single combination and score it on the test set
    model = GradientBoostingClassifier(
        learning_rate=learn_rate,
        max_depth=max_depth,
        subsample=subsample,
        max_features=max_features)
    predictions = model.fit(X_train, y_train).predict(X_test)
    # Return all four hyperparameters so the row matches the DataFrame columns
    return [learn_rate, max_depth, subsample, max_features,
            accuracy_score(y_test, predictions)]

# Loop over every combination (4 * 9 * 5 * 2 = 360 models, so this is slow)
results_list = []
for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        for subsample in subsample_list:
            for max_features in max_features_list:
                results_list.append(gbm_grid_search(learn_rate, max_depth, subsample, max_features))
results_df = pd.DataFrame(results_list, columns=['learning_rate', 'max_depth', 'subsample', 'max_features', 'accuracy'])
print(results_df)
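The nested loops can also be flattened with `itertools.product`, and the best combination read off the results table. The sketch below is deliberately scaled down so it runs quickly: the sample size, the two-by-two grid, and `n_estimators=20` are assumptions, not the grid used above.

```python
from itertools import product

import pandas as pd
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Smaller dataset than above so the example finishes in seconds
X, y = make_hastie_10_2(n_samples=1000, random_state=0)
X_train, X_test = X[:700], X[700:]
y_train, y_test = y[:700], y[700:]

# Reduced grid, for illustration only
grid = {"learning_rate": [0.01, 0.1], "max_depth": [2, 4]}

rows = []
# product() yields every combination, replacing one nested loop per hyperparameter
for learning_rate, max_depth in product(grid["learning_rate"], grid["max_depth"]):
    model = GradientBoostingClassifier(
        learning_rate=learning_rate,
        max_depth=max_depth,
        n_estimators=20,
        random_state=0,
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    rows.append({"learning_rate": learning_rate, "max_depth": max_depth, "accuracy": accuracy})

results_df = pd.DataFrame(rows)
# Row with the highest test accuracy
best_row = results_df.loc[results_df["accuracy"].idxmax()]
print(results_df)
print(best_row)
```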

Or you can use `GridSearchCV` in `scikit-learn` as follows:

```
sklearn.model_selection.GridSearchCV(
                                    estimator,
                                    param_grid,
                                    scoring=None,
                                    n_jobs=None,
                                    refit=True,
                                    cv=None,
                                    verbose=0,
                                    pre_dispatch='2*n_jobs',
                                    error_score=nan,
                                    return_train_score=False
                                    )
```

1. Define the method (e.g. Random Forest) we use for modeling - `estimator`
2. Define the hyperparameter grid - `param_grid`

In [None]:
from sklearn.model_selection import GridSearchCV
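A minimal sketch of the two steps with `GridSearchCV` might look like the following. The small dataset, the two-by-two grid, `scoring="accuracy"`, and `cv=3` are assumptions chosen to keep the run short.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small dataset so the search finishes quickly
X, y = make_hastie_10_2(n_samples=600, random_state=0)

# Step 1: define the estimator
estimator = RandomForestClassifier(random_state=0)

# Step 2: define the hyperparameter grid (2 * 2 = 4 combinations)
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 6]}

# 3-fold cross-validation: 4 combinations * 3 folds = 12 fits, plus one refit
grid_search = GridSearchCV(estimator, param_grid, scoring="accuracy", cv=3)
grid_search.fit(X, y)

print("Best hyperparameters:", grid_search.best_params_)
print("Best cross-validated accuracy:", grid_search.best_score_)
```

Because `refit=True` by default, `grid_search.best_estimator_` is already refitted on all of `X` and `y` with the best combination and can be used for prediction directly.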


## References
1. [Hyperparameter Tuning Course on Datacamp](https://app.datacamp.com/learn/courses/hyperparameter-tuning-in-python)