<a href="https://colab.research.google.com/github/poovarasansivakumar2003/Marvel_Batch_4_works/blob/main/Task_5_Hyperparameter_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Hyperparameter Tuning**
Hyperparameter tuning is the process of choosing hyperparameter values that give a machine learning model the best performance on a given dataset. Hyperparameters are settings that are not learned during training and must be fixed before training starts, such as the learning rate, the number of trees, or the maximum tree depth.
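
To make the distinction concrete, here is a minimal sketch using scikit-learn's `LogisticRegression` on the Iris dataset. The choice of model and the values `C=1.0` and `max_iter=1000` are assumptions made only for illustration: `C` is set by us before fitting, while `coef_` and `intercept_` exist only after `fit()` has learned them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C and max_iter are hyperparameters: we choose them before fitting.
model = LogisticRegression(C=1.0, max_iter=1000)
hyperparameters = model.get_params()

# coef_ and intercept_ are parameters: they are learned by fit().
model.fit(X, y)
learned_coef_shape = model.coef_.shape            # (3 classes, 4 features)
learned_intercept_shape = model.intercept_.shape  # one intercept per class

print("Hyperparameter C:", hyperparameters["C"])
print("Learned coefficient matrix shape:", learned_coef_shape)
```

Tuning changes values such as `C`; it never edits `coef_` directly.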

### **Key Concepts in Hyperparameter Tuning**
<ol>
<li>

**Hyperparameters vs. Parameters:**</li>
<ul>
<li>Hyperparameters are set before training (e.g., learning rate, number of estimators).</li>
<li>Parameters are learned during training (e.g., weights in neural networks).</li>
</ul>
<li>

**Why Tune Hyperparameters?**</li>
<ul><li>Well-chosen hyperparameters can noticeably improve model performance. The goal is to find settings that generalize well to unseen data, not settings that only fit the training set.</li></ul>

<li>

**Common Hyperparameter Tuning Techniques:**</li>
<ul>

<li>

**Grid Search**: Exhaustively searches over a specified parameter grid.</li>
<li>

**Random Search**: Samples a fixed number of combinations at random from the specified ranges or distributions. It is usually cheaper than Grid Search because it does not evaluate every combination, although it is not guaranteed to find the best one.</li>
<li>

**Bayesian Optimization**: Builds a probabilistic model of how the hyperparameters affect the score and uses past evaluations to choose the most promising values to try next.</li>
<li>

**Automated Hyperparameter Optimization (AutoML)**: Automates the search itself, using techniques such as Genetic Algorithms or Sequential Model-Based Optimization (SMBO).</li>
<li>

**Cross-Validation**: Not a search method itself, but the evaluation technique the search methods rely on. The data is split into training and validation folds several times, so the estimated performance does not depend on one particular split.</li>
</ul>
</ol>
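
As a rough illustration of Random Search, the sketch below uses scikit-learn's `RandomizedSearchCV` with integer distributions from `scipy.stats.randint`. The ranges, `n_iter=8`, and the use of the full Iris dataset are assumptions chosen to keep the example small; they are not recommended settings.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions and lists to sample from (illustrative ranges only)
param_distributions = {
    "n_estimators": randint(10, 50),      # integers in [10, 50)
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": randint(2, 11),  # integers in [2, 11)
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=8,            # only 8 sampled combinations are evaluated
    cv=3,                # each one is scored with 3-fold cross-validation
    scoring="accuracy",
    random_state=42,
)
random_search.fit(X, y)

print("Candidates evaluated:", len(random_search.cv_results_["params"]))
print("Best hyperparameters:", random_search.best_params_)
print("Best mean CV accuracy:", round(random_search.best_score_, 3))
```

The cost is controlled by `n_iter` rather than by the size of the search space, which is the practical advantage over an exhaustive grid.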

### **Steps for Hyperparameter Tuning**
<ol>
<li>

**Select the Model**: Choose an appropriate machine learning model for your problem (e.g., Random Forest, XGBoost, Neural Networks).</li>
<li>

**Choose the Dataset**: Pick a dataset that aligns with the problem (e.g., classification or regression). You can use popular datasets like the Iris dataset, Titanic dataset, or MNIST.</li>
<li>

**Define the Hyperparameter Space**: Determine the hyperparameters to tune and their respective ranges.</li>
<li>

**Choose the Tuning Method**: Select a tuning method (Grid Search, Random Search, Bayesian Optimization).</li>
<li>

**Train and Evaluate**: Use cross-validation to train and score a model for each hyperparameter combination on the chosen dataset.</li>
<li>

**Select the Best Model**: Choose the combination with the best cross-validated score, then confirm its performance on a held-out test set.</li>
</ol>
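
The steps above can be sketched as an explicit loop, which is roughly what a grid search does for you. This is a simplified illustration, not how `GridSearchCV` is implemented internally; the two-by-two grid is an assumption chosen so the loop finishes quickly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

# Steps 1-2: model family (Random Forest) and dataset (Iris)
X, y = load_iris(return_X_y=True)

# Step 3: hyperparameter space (small, illustrative values)
grid = ParameterGrid({"n_estimators": [10, 30], "max_depth": [2, None]})

# Steps 4-5: try every combination, scoring each with 3-fold cross-validation
results = []
for params in grid:
    model = RandomForestClassifier(random_state=42, **params)
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    results.append((scores.mean(), params))

# Step 6: keep the combination with the highest mean CV accuracy
best_score, best_params = max(results, key=lambda item: item[0])
print("Candidates tried:", len(results))
print("Best hyperparameters:", best_params)
print("Best mean CV accuracy:", round(best_score, 3))
```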

### **Example: Hyperparameter Tuning with Random Forest**
We'll use the `Iris` dataset for a classification problem and apply Random Forest to demonstrate hyperparameter tuning using `GridSearchCV` from `scikit-learn`.

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the model
rf = RandomForestClassifier(random_state=42)

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False]
}

# Set up GridSearchCV
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2, scoring='accuracy')

# Train the model using GridSearchCV
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Evaluate the model with the best hyperparameters
best_rf = grid_search.best_estimator_
y_pred = best_rf.predict(X_test)

# Print accuracy and classification report
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


Fitting 3 folds for each of 216 candidates, totalling 648 fits
Best Hyperparameters: {'bootstrap': True, 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 5, 'n_estimators': 200}
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
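
The first line of the output ("216 candidates, totalling 648 fits") follows directly from the grid size. A short check, using scikit-learn's `ParameterGrid` to enumerate the same grid:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False]
}

# Number of combinations: 3 * 4 * 3 * 3 * 2
n_candidates = len(ParameterGrid(param_grid))
# Each combination is fitted once per fold (cv=3)
n_fits = n_candidates * 3
print(n_candidates, n_fits)  # 216 648
```

Adding one more value to any list multiplies the total, which is why exhaustive grids become expensive quickly.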



### **Explanation**
<ol>
<li>

**Data Preparation**: Load the Iris dataset and split it into training and testing sets.</li>
<li>

**Model Selection**: Initialize a RandomForestClassifier.</li>
<li>

**Define Hyperparameter Grid**: Create a dictionary defining the hyperparameters and their possible values to tune.</li>
<li>

**Set up GridSearchCV**: Use `GridSearchCV` to perform an exhaustive search over the specified hyperparameter grid, scoring each combination with 3-fold cross-validation (`cv=3`).</li>
<li>

**Train and Find Best Model**: Fit every combination and keep the one with the highest mean cross-validated accuracy. By default, `GridSearchCV` then refits that combination on the full training set.</li>
<li>

**Evaluate the Model**: Predict on the held-out test set with the refitted best estimator and report accuracy and a classification report.</li>
</ol>
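
Beyond `best_params_`, a fitted search also exposes per-candidate scores in `cv_results_`. The sketch below is a hedged illustration on a deliberately small grid (an assumption, so that it runs quickly) and shows the difference between the mean cross-validated accuracy and the accuracy on the held-out test set.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Deliberately small grid so the sketch runs quickly
small_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), small_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# One row per candidate, sorted by rank (1 = best mean CV score)
results = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results.to_string(index=False))

# best_score_ is the mean CV accuracy; score() uses the held-out test set
print("Mean CV accuracy of best candidate:", round(search.best_score_, 3))
print("Test-set accuracy:", round(search.score(X_test, y_test), 3))
```

Note that `mean_test_score` in `cv_results_` refers to the validation folds inside the training data, not to `X_test`.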

### **Conclusion**
Hyperparameter tuning is an important step in getting the most out of a machine learning model. Techniques such as Grid Search, Random Search, and Bayesian Optimization, combined with cross-validation, can noticeably improve performance over default settings.