# **Hyperparameter Tuning**

#### **What is Hyperparameter Tuning?**
> Hyperparameter tuning is the process of finding the best combination of hyperparameters for a given algorithm. A hyperparameter is a parameter whose value controls the learning process; it is set before training rather than learned from the data.

#### **Types of Hyperparameter Tuning:**
1. **Grid Search**: An exhaustive search that evaluates every combination of values in a predefined grid of hyperparameters.
2. **Random Search**: A non-exhaustive search that evaluates a fixed number of combinations sampled at random from the hyperparameter space.
3. **Bayesian Optimization**: An iterative search that builds a probabilistic model of the objective function from the results so far and uses it to choose the next combination to evaluate.
4. **Gradient-based Optimization**: An iterative search that applies gradient descent to the hyperparameters, which requires the objective to be differentiable with respect to them.
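The difference between the first two techniques can be sketched with the standard library alone. This is a minimal illustration, not a real tuning run: the search space and `toy_objective` are hypothetical stand-ins, and a real objective would train and score a model.

```python
import itertools
import random

# Hypothetical search space: two hyperparameters with a few candidate values each.
search_space = {
    "n_estimators": [100, 200, 300, 400],
    "max_depth": [4, 6, 8],
}

def toy_objective(params):
    """Stand-in for a validation score; higher is better, peak at (300, 6)."""
    return -abs(params["n_estimators"] - 300) / 100 - abs(params["max_depth"] - 6)

names = list(search_space)

# Grid search: evaluate every combination (4 * 3 = 12 candidates).
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*search_space.values())
]
best_grid = max(grid_candidates, key=toy_objective)

# Random search: evaluate only a random subset of the same combinations.
rng = random.Random(0)
random_candidates = rng.sample(grid_candidates, k=5)
best_random = max(random_candidates, key=toy_objective)

print(len(grid_candidates), best_grid)
print(len(random_candidates), best_random)
```

Grid search is guaranteed to find the best point in the grid, while random search trades that guarantee for a smaller, fixed budget of evaluations.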

#### **Why is Hyperparameter Tuning Important?**
- Hyperparameter tuning is important because it can significantly improve the performance of a machine learning model. By finding the best combination of hyperparameters, we can improve the accuracy, precision, recall, and F1 score of a machine learning model. 
- Hyperparameter tuning can also help us to avoid overfitting and underfitting by finding the best combination of hyperparameters.

#### **Key Concepts of Hyperparameter Tuning:**
- **Hyperparameters**: Hyperparameters are the parameters of an algorithm that are set before training the model. They control the learning process of the algorithm.
- **Hyperparameter Space**: Hyperparameter space is the space of all possible combinations of hyperparameters. It is the space in which we search for the best combination of hyperparameters.
- **Objective Function**: The objective function is the function that we want to maximize or minimize. In the context of hyperparameter tuning, it is the performance metric of the machine learning model, such as validation accuracy.
- **Search Technique**: Search technique is the technique that we use to search for the best combination of hyperparameters. There are many search techniques, such as grid search, random search, Bayesian optimization, and gradient-based optimization.
- **Validation Set**: The validation set is data held out from training and used to calculate the performance metric for each combination of hyperparameters.
- **Cross-Validation**: Cross-validation splits the data into several folds, trains the model on all but one fold, and evaluates it on the remaining fold, repeating this so that each fold serves as the validation set once. The performance metric is averaged over the folds.
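To make the cross-validation idea concrete, here is a minimal sketch of how k-fold splitting assigns sample indices to folds. The helper `k_fold_indices` is hypothetical and written only for illustration; scikit-learn provides its own splitters, which may also shuffle or stratify.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every sample appears in exactly one validation fold, so each data point contributes to the performance estimate once.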

#### **Import Libraries:**

In [1]:
# Import Libraries:
from sklearn.ensemble import RandomForestClassifier # Model to tune: Random Forest
from sklearn.model_selection import train_test_split, GridSearchCV # Hyperparameter tuning with GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix # Model Evaluation Metrics

#### **Import Iris Data Set from Sklearn:**

In [7]:
# Import the data using sklearn:
from sklearn.datasets import load_iris
iris = load_iris()
print(iris)
print('-------------------')
print(iris.data)
print('-------------------')
print(iris.target)

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
     

#### **Separate the Features (X) and Target (y):**

In [3]:
X = iris.data
y = iris.target
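
A more compact way to inspect the data than printing the whole object is to look at shapes, names, and class counts. This is a small sketch; the use of `numpy.bincount` for the class counts is an illustrative choice, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)              # (n_samples, n_features)
print(y.shape)              # one label per sample
print(iris.feature_names)   # names of the feature columns
print(iris.target_names)    # names of the classes
print(np.bincount(y))       # number of samples per class
```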

#### **Define the Model:**

In [8]:
model = RandomForestClassifier()

#### **Create the Dictionary of Hyperparameters:**

In [9]:
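# Create the hyperparameter grid:
param_grid = {
    'n_estimators': [100, 200, 300, 400],      # Number of trees in random forest
    'max_features': ['auto', 'sqrt', 'log2'],  # Features considered per split ('auto' is rejected by recent scikit-learn versions)
    'max_depth': [4, 5, 6, 7, 8],              # Maximum depth of each tree
    'criterion': ['gini', 'entropy']           # Function to measure split quality
}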

**Code Understanding:**
This is a dictionary that defines a grid of hyperparameters for a machine learning model. Each key in the dictionary is a hyperparameter name, and the corresponding value is a list of values that you want to try for that hyperparameter. 

**Here's what each hyperparameter means:**

- `'n_estimators'`: This is the number of trees in the forest. The list `[100, 200, 300, 400]` means you want to try training the model with 100, 200, 300, and 400 trees and see which number gives the best performance.

- `'max_features'`: This is the number of features to consider when looking for the best split. `'sqrt'` uses `sqrt(n_features)` features and `'log2'` uses `log2(n_features)` features. `'auto'` was an alias for `'sqrt'` for `RandomForestClassifier` in older scikit-learn releases; it was deprecated in version 1.1 and removed in 1.3, so recent versions reject it.

- `'max_depth'`: This is the maximum depth of the tree. The list `[4, 5, 6, 7, 8]` means you want to try training the model with a maximum depth of 4, 5, 6, 7, and 8 and see which depth gives the best performance.

- `'criterion'`: This is the function to measure the quality of a split. The options are `'gini'` for the Gini impurity and `'entropy'` for the information gain.

This grid can be used with a search technique such as grid search or random search to find the best combination of hyperparameters for your model. Grid search trains a model for every combination in the grid, whereas random search trains only a sampled subset; in both cases, the combination with the best validation performance is selected.
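
The size of the search follows directly from the grid: the number of candidates is the product of the list lengths. The sketch below counts them for the grid defined above, assuming 5-fold cross-validation (`n_folds` is an assumption made for this calculation).

```python
import math

param_grid = {
    'n_estimators': [100, 200, 300, 400],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy'],
}

# One candidate per combination of values: 4 * 3 * 5 * 2.
n_candidates = math.prod(len(values) for values in param_grid.values())

n_folds = 5  # assumed number of cross-validation folds
n_fits = n_candidates * n_folds

print(n_candidates)  # 120
print(n_fits)        # 600
```

Adding one more value to any list multiplies the total, which is why exhaustive grids become expensive quickly.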

#### **Set up the Grid Search:**
- Using `GridSearchCV`

In [10]:
# Set up the grid search:
grid = GridSearchCV(
    estimator=model,        # Model to be tuned
    param_grid=param_grid,  # Hyperparameter grid
    cv=5,                   # 5-fold cross-validation
    scoring='accuracy',     # Use accuracy as the evaluation metric
    verbose=1,              # Print progress messages during the search
    n_jobs=-1               # Use all available cores
)

This code sets up a grid search for hyperparameter tuning using the `GridSearchCV` class from the `sklearn.model_selection` module.

Here's a breakdown of the parameters:

- `estimator=model`: This is the model that you want to tune. The `model` variable should be an instance of a scikit-learn estimator.

- `param_grid=param_grid`: This is the grid of hyperparameters that you want to search. The `param_grid` variable should be a dictionary where each key is a hyperparameter name and each value is a list of values to try for that hyperparameter.

- `cv=5`: This is the number of folds to use for cross-validation. The data will be split into 5 parts, and the model will be trained and evaluated 5 times so that each part is used as the validation set once.

- `scoring='accuracy'`: This is the metric to use for evaluating the models. In this case, it's using accuracy.

- `verbose=1`: This controls the verbosity of the output. A value of 1 prints progress messages, such as the number of folds, candidates, and fits.

- `n_jobs=-1`: This is the number of jobs to run in parallel. A value of `-1` means use all available cores on your machine.

> `GridSearchCV` tries all combinations of hyperparameters in the grid and performs cross-validation for each combination. It then selects the combination that gives the best mean cross-validated score according to the scoring metric.

In summary, this code is setting up a comprehensive search over the specified parameter grid for a given model, using cross-validation, with the goal of finding the best hyperparameters to maximize the accuracy of the model.
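
What the search does internally can be approximated with an explicit loop. The following is a simplified sketch, not the library's implementation: `small_grid` is a hypothetical reduced grid chosen so the loop runs quickly, and `random_state=0` is added only for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

X, y = load_iris(return_X_y=True)

# A deliberately small, hypothetical grid so the loop runs quickly.
small_grid = {
    'n_estimators': [25, 50],
    'max_depth': [2, 4],
}

best_score, best_params = -1.0, None
for params in ParameterGrid(small_grid):
    candidate = RandomForestClassifier(random_state=0, **params)
    # Mean accuracy over 5 cross-validation folds for this combination.
    scores = cross_val_score(candidate, X, y, cv=5, scoring='accuracy')
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), params

print(best_params, round(best_score, 3))
```

`GridSearchCV` adds conveniences on top of this loop, such as parallel execution, a `cv_results_` table, and refitting the best model on the full data.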

#### **Fit the Model:**

In [12]:
%%time
# Fit the grid search model:
grid.fit(X, y)   # Fit the model with the data 

# Print the best hyperparameters:
print(f"Best paramters: {grid.best_params_}")

Fitting 5 folds for each of 120 candidates, totalling 600 fits


200 fits failed out of a total of 600.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
44 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 890, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1344, in wrapper
    estimator._validate_params()
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\utils\_param_validation.py", line 95, in val

Best paramters: {'criterion': 'gini', 'max_depth': 4, 'max_features': 'log2', 'n_estimators': 300}


This is the final step of the grid search process. Here's what it does:

- `grid.fit(X, y)`: This line fits the grid search model to your data. The `X` variable is your features, and the `y` variable is your target. The grid search model will try all combinations of hyperparameters in the grid, perform cross-validation for each combination, and select the combination of hyperparameters that gives the best performance according to the scoring metric.

- `print(f"Best paramters: {grid.best_params_}")`: This line prints the best hyperparameters found by the grid search. The `best_params_` attribute of the grid search model is a dictionary where each key is a hyperparameter name and each value is the best value for that hyperparameter.

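**Errors:** The output shows that 200 of the 600 fits failed. The failing fits all come from one value in the grid: `'auto'` for `max_features`. The installed scikit-learn version requires the 'max_features' parameter of RandomForestClassifier to be an int in the range [1, inf), a float in the range (0.0, 1.0], a str among {'sqrt', 'log2'} or None, so every candidate that uses `'auto'` fails parameter validation. One third of the 120 candidates (40) use `'auto'`, and each is fitted on 5 folds, which accounts for the 200 failed fits. Their scores are set to nan, so these candidates are effectively excluded from the ranking.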

Despite these errors, the grid search found a valid best combination: `{'criterion': 'gini', 'max_depth': 4, 'max_features': 'log2', 'n_estimators': 300}`.
- This means that the best model found by the grid search is a random forest classifier with 300 trees, a maximum depth of 4, the `'log2'` rule for the number of features considered at each split, and Gini impurity as the function to measure the quality of a split.
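
One way to avoid the failed fits is to leave `'auto'` out of the grid and ask the search to raise on any failing fit. The sketch below assumes a reduced grid (`fixed_grid`, a hypothetical name) and a fixed `random_state` so it runs quickly and repeatably; it is not the notebook's original configuration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Reduced grid for a quick run; 'auto' is left out of max_features.
fixed_grid = {
    'n_estimators': [50, 100],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 8],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=fixed_grid,
    cv=5,
    scoring='accuracy',
    error_score='raise',   # stop at the first failing fit instead of scoring it as nan
)
search.fit(X, y)

mean_scores = search.cv_results_['mean_test_score']
print(len(mean_scores), np.isnan(mean_scores).any())
print(search.best_params_, round(search.best_score_, 3))
```

With `error_score='raise'`, an invalid value surfaces immediately as an exception rather than as a warning buried in the output.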

## **Using `RandomizedSearchCV`**

In [15]:
# Import Libraries:
from sklearn.model_selection import RandomizedSearchCV # Hyperparameter Tuning using RandomizedSearchCV

# Create the hyperparameter search space:
param_grid = {
    'n_estimators': [100, 200, 300, 400],      # Number of trees in random forest
    'max_features': ['auto', 'sqrt', 'log2'],  # Features considered per split ('auto' is rejected by recent scikit-learn versions)
    'max_depth': [4, 5, 6, 7, 8],              # Maximum depth of each tree
    'criterion': ['gini', 'entropy']           # Function to measure split quality
}

# Set up the randomized search:
grid = RandomizedSearchCV(
    estimator=model,                 # Model to be tuned
    param_distributions=param_grid,  # Values to sample from
    cv=5,                            # 5-fold cross-validation
    scoring='accuracy',              # Use accuracy as the evaluation metric
    verbose=1,                       # Print progress messages during the search
    n_jobs=-1,                       # Use all available cores
    n_iter=20,                       # Number of sampled parameter settings
)
# Fit the randomized search model:
grid.fit(X, y)   # Fit the model with the data

# Print the best hyperparameters:
print(f"Best paramters: {grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits


30 fits failed out of a total of 100.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
20 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 890, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1344, in wrapper
    estimator._validate_params()
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Hp\miniconda3\envs\python_ml\Lib\site-packages\sklearn\utils\_param_validation.py", line 95, in vali

Best paramters: {'n_estimators': 300, 'max_features': 'sqrt', 'max_depth': 7, 'criterion': 'entropy'}


The code sets up a randomized search for hyperparameter tuning using the `RandomizedSearchCV` class from the `sklearn.model_selection` module.

Here's a breakdown of the parameters:

- `estimator=model`: This is the model that you want to tune. The `model` variable should be an instance of a scikit-learn estimator.

- `param_distributions=param_grid`: This is the search space to sample from. It is a dictionary where each key is a hyperparameter name and each value is either a list of values to try or a distribution to sample from. Here it reuses the same lists as the grid search.

- `cv=5`: This is the number of folds to use for cross-validation. The data will be split into 5 parts, and the model will be trained and evaluated 5 times so that each part is used as the validation set once.

- `scoring='accuracy'`: This is the metric to use for evaluating the models. In this case, it's using accuracy.

- `verbose=1`: This controls the verbosity of the output. A value of 1 prints progress messages, such as the number of folds, candidates, and fits.

- `n_jobs=-1`: This is the number of jobs to run in parallel. A value of -1 means use all available cores on your machine.

- `n_iter=20`: This is the number of parameter settings that are sampled, so only 20 of the 120 possible combinations are evaluated. When every parameter is given as a list, as here, the settings are sampled without replacement.

`RandomizedSearchCV` randomly samples combinations of hyperparameters from the search space, performs cross-validation for each sampled combination, and selects the combination that gives the best performance according to the scoring metric.
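
Because `param_distributions` also accepts distributions, numeric hyperparameters can be drawn from ranges rather than fixed lists. The sketch below uses `scipy.stats.randint` and `ParameterSampler` to show what such sampling produces; the ranges are hypothetical choices that mirror the lists used above.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Hypothetical distributions: integers are drawn from ranges instead of fixed lists.
param_distributions = {
    'n_estimators': randint(100, 401),   # integers from 100 to 400 inclusive
    'max_depth': randint(4, 9),          # integers from 4 to 8 inclusive
    'max_features': ['sqrt', 'log2'],    # lists are sampled uniformly
    'criterion': ['gini', 'entropy'],
}

# Draw 5 parameter settings; random_state makes the draw repeatable.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for params in sampled:
    print(params)
```

The same dictionary could be passed as `param_distributions` to `RandomizedSearchCV`, which draws its candidates in the same way.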

**Observations from the Output:**
> The output shows that 30 of the 100 fits failed, which corresponds to 6 of the 20 sampled candidates, each fitted on 5 folds. These candidates use `'auto'` for `max_features`, but the 'max_features' parameter of RandomForestClassifier must be an int in the range [1, inf), a float in the range (0.0, 1.0], a str among {'sqrt', 'log2'} or None. Their scores are set to nan.

Despite these errors, the randomized search found a valid best combination: `'n_estimators': 300, 'max_features': 'sqrt', 'max_depth': 7, 'criterion': 'entropy'`. This means that the best model found by the randomized search is a random forest classifier with 300 trees, a maximum depth of 7, the `'sqrt'` rule for the number of features considered at each split, and entropy as the function to measure the quality of a split. Neither the model nor the search sets `random_state`, so the selected combination can differ from run to run.
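
The searches above are fitted on all of `X` and `y`, so no data is left to check the tuned model on unseen samples. A possible follow-up, sketched here under assumptions, is to hold out a test set first using the `train_test_split` and `accuracy_score` imports from the first cell. The split ratio, seeds, and reduced search space are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the split ratio and seeds are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions={
        'n_estimators': [50, 100, 200],
        'max_features': ['sqrt', 'log2'],
        'max_depth': [4, 6, 8],
    },
    n_iter=5,
    cv=5,
    scoring='accuracy',
    random_state=42,
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training split (refit=True by default).
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(search.best_params_)
print(round(test_accuracy, 3))
print(classification_report(y_test, y_pred))
```

The cross-validated score guides the choice of hyperparameters, while the held-out test accuracy gives a separate estimate of how the chosen model generalizes.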