# Hyperparameter Tuning

Parameters, such as a model's weights `W` and biases `b`, are learned from the data during training.

Hyperparameters, in contrast, are set before training begins and govern the training process itself, e.g., the learning rate or the number of trees in a random forest.

Common hyperparameters

1. Learning rate
2. Batch size
3. Number of epochs
4. Number of layers and neurons in a neural network
5. Regularization parameters (L1/L2)
6. Number of trees in a random forest
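
To make the parameter/hyperparameter distinction concrete, here is a minimal sketch in plain Python. The function `train_linear` and the toy data are hypothetical and only for illustration: `learning_rate` and `n_epochs` are hyperparameters chosen by hand, while `w` and `b` are parameters learned by gradient descent.

```python
def train_linear(xs, ys, learning_rate, n_epochs):
    """Fit y = w * x + b with batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters: learned during training
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Hyperparameters: fixed before training starts.
w, b = train_linear(xs, ys, learning_rate=0.05, n_epochs=2000)
print(round(w, 3), round(b, 3))
```

With these settings the learned values end up very close to the true slope and intercept. A learning rate that is too large would make the same loop diverge, which is why such values are tuned.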

### Grid Search

Grid search involves specifying a set of values for each hyperparameter, and then exhaustively trying all possible combinations of these values to find the best-performing set.

Example:
For a Support Vector Machine (SVM) with hyperparameters `C` (regularization parameter) and `gamma` (kernel coefficient), a grid search might look like this:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

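# X_train and y_train are assumed to be the training features and labels.
# cv=5 (the default) scores each combination with 5-fold cross-validation;
# refit=True retrains the best combination on the full training set.
grid = GridSearchCV(SVC(), param_grid, cv=5, refit=True, verbose=2)
grid.fit(X_train, y_train)

print(grid.best_params_)     # best hyperparameter combination found
print(grid.best_estimator_)  # model refit with those hyperparameters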
```
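
To see what "exhaustively trying all possible combinations" amounts to, the sketch below expands the same grid by hand using only the standard library. The helper `expand_grid` is a hypothetical name for illustration; `n_folds = 5` assumes 5-fold cross-validation.

```python
from itertools import product

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["rbf"],
}


def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
n_folds = 5  # assumption: 5-fold cross-validation

print(len(combinations))            # 16 combinations (4 * 4 * 1)
print(len(combinations) * n_folds)  # 80 model fits, before the final refit
print(combinations[0])
```

The number of fits is the product of the list lengths times the number of folds, so adding one more hyperparameter with four values would quadruple the cost.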

### Random Search

Random search involves sampling a fixed number of hyperparameter combinations from a specified distribution. This can be more efficient than grid search, especially when dealing with a large number of hyperparameters or a large range of possible values.

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'max_features': ['sqrt', 'log2', None],  # 'auto' was removed in scikit-learn 1.3
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False]
}

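random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(),
    param_distributions=param_dist,
    n_iter=100,       # number of sampled combinations
    cv=3,             # 3-fold cross-validation per combination
    verbose=2,
    random_state=42,  # makes the sampling reproducible
    n_jobs=-1,        # use all available CPU cores
)
# X_train and y_train are assumed to be the training features and labels.
random_search.fit(X_train, y_train)

print(random_search.best_params_)
print(random_search.best_estimator_)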
```
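
The sampling step itself can be sketched with the standard library. This is an illustration of the idea, not scikit-learn's implementation: the helper `sample_candidates` is hypothetical, and it draws with replacement for simplicity.

```python
import random

param_dist = {
    "n_estimators": [10, 50, 100, 200],
    "max_features": ["sqrt", "log2", None],
    "max_depth": [None, 10, 20, 30, 40, 50],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}


def sample_candidates(dist, n_iter, seed=42):
    """Draw n_iter combinations, picking each value uniformly at random.

    Simplification: sampling is with replacement, so duplicates are possible.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in dist.items()}
        for _ in range(n_iter)
    ]


# Size of the full grid that an exhaustive search would have to cover.
grid_size = 1
for values in param_dist.values():
    grid_size *= len(values)

candidates = sample_candidates(param_dist, n_iter=100)
print(grid_size)        # 1296 combinations in the full grid
print(len(candidates))  # 100 combinations actually evaluated
```

The budget is fixed by `n_iter` regardless of how large the search space grows, which is where the efficiency gain over grid search comes from.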

### Bayesian Optimization

Bayesian optimization builds a probabilistic model (a surrogate) of the objective function and uses it to select the most promising hyperparameters to evaluate next on the true objective function. It balances exploration of uncertain regions against exploitation of regions already known to perform well, so it can find good hyperparameters in fewer evaluations.

**Example**:
Using the `bayes_opt` library for Bayesian Optimization:

```python
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

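def rf_cv(n_estimators, min_samples_split, max_features):
    # bayes_opt proposes floats, so integer hyperparameters are cast with int().
    model = RandomForestClassifier(
        n_estimators=int(n_estimators),
        min_samples_split=int(min_samples_split),
        max_features=min(max_features, 0.999),  # fraction of features, kept below 1
    )
    # X_train and y_train are assumed to be the training features and labels.
    # The mean cross-validated accuracy is the objective to maximize.
    return cross_val_score(model, X_train, y_train, scoring='accuracy').mean()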

pbounds = {
    'n_estimators': (10, 200),
    'min_samples_split': (2, 10),
    'max_features': (0.1, 0.999)
}

optimizer = BayesianOptimization(
    f=rf_cv,
    pbounds=pbounds,
    random_state=42,
)

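# 10 random evaluations to seed the model, then 30 guided evaluations.
optimizer.maximize(init_points=10, n_iter=30)

print(optimizer.max)  # best score ('target') and its hyperparameters ('params')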
```
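
The loop that such a library runs can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not how `bayes_opt` is implemented: it assumes a Gaussian process surrogate with a fixed RBF kernel, an upper-confidence-bound acquisition rule, and a toy one-dimensional `objective` standing in for a cross-validation score. All function names here are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def objective(x):
    """Toy stand-in for a validation score; its maximum is at x = 0.7."""
    return -(x - 0.7) ** 2


rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201)  # search space for one hyperparameter

# Seed the surrogate with a few random evaluations.
x_obs = rng.uniform(0.0, 1.0, size=3)
y_obs = objective(x_obs)

for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + 2.0 * std                # high mean = exploit, high std = explore
    x_next = candidates[np.argmax(ucb)]   # most promising point under the model
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))  # evaluate the true objective

best_x = x_obs[np.argmax(y_obs)]
print(best_x, y_obs.max())
```

Each iteration spends one evaluation of the true objective, so the surrogate is worth its overhead mainly when a single evaluation (for example, a full cross-validation run) is expensive.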

### Gradient-Based Optimization

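Some advanced techniques use gradient information to adjust hyperparameters, for example by differentiating the validation loss with respect to a hyperparameter. In deep learning frameworks, closely related techniques adjust the effective learning rate during training rather than fixing it up front: learning rate schedules, adaptive learning rates (Adam, RMSprop), and learning rate warm-ups.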
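
As an illustration of a warm-up followed by a schedule, here is a minimal sketch in plain Python. The specific shape (linear warm-up, then cosine decay) and the names `lr_at_step`, `base_lr`, `warmup_steps`, and `total_steps` are assumptions chosen for the example, not something the techniques above prescribe.

```python
import math


def lr_at_step(step, base_lr=0.1, warmup_steps=5, total_steps=50):
    """Learning rate for a given step: linear warm-up, then cosine decay."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates are small.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, 4, 5, 25, 50):
    print(step, round(lr_at_step(step), 4))
```

The schedule itself introduces hyperparameters (`base_lr`, `warmup_steps`, `total_steps`), so it reduces sensitivity to a single learning rate value rather than removing the need for tuning.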

Best practices

1. Start Simple: Begin with a simple model and a limited hyperparameter space. Gradually expand as needed.
2. Cross-Validation: Use cross-validation so that each hyperparameter setting is scored on several held-out folds and the chosen values are not overfit to a single validation split.
3. Early Stopping: Implement early stopping to avoid overfitting in deep learning models.
4. Learning Curves: Plot learning curves to diagnose the learning process and adjust hyperparameters accordingly.
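
The early-stopping rule from the list above can be sketched as follows. The function `early_stopping_epoch`, the `patience` and `min_delta` arguments, and the validation losses are all hypothetical values made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has not improved by more than min_delta
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative losses: improving until epoch 3, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In practice the model weights from `best_epoch` are the ones to keep, and `patience` is itself a hyperparameter: too small stops on noise, too large wastes training time.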