# **Fine-Tuning:**

Fine-tuning, in the sense used here, means systematically trying different hyperparameter values to improve a machine learning model's performance. Boosting algorithms such as AdaBoost, Gradient Boosting, XGBoost, and LightGBM each expose several tunable hyperparameters. The steps below give a general guide to tuning them:


### 1. Identify Hyperparameters:
   - **Learning Rate (or Step Size):** Controls the contribution of each weak learner to the final prediction. Lower values often lead to better generalization but require more boosting rounds.
   - **Number of Estimators (or Boosting Rounds):** The number of weak learners (trees) to train. Increasing the number of estimators can improve performance but may lead to overfitting.
   - **Tree-Specific Hyperparameters:** For algorithms built on decision trees (like XGBoost and LightGBM), consider parameters such as `max_depth`, `min_child_weight`, `subsample`, and `colsample_bytree`.
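
As a rough illustration of where these settings live in code, the sketch below sets several of them on scikit-learn's `GradientBoostingClassifier` and compares two learning rates at a fixed number of boosting rounds. The dataset size and all parameter values are arbitrary choices for the example, not recommendations. Note that `min_child_weight` and `colsample_bytree` are XGBoost/LightGBM names and are not parameters of this scikit-learn estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic dataset (sizes chosen only to keep the example fast)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

scores = {}
for learning_rate in (0.01, 0.1):
    model = GradientBoostingClassifier(
        learning_rate=learning_rate,  # contribution of each tree
        n_estimators=50,              # number of boosting rounds
        max_depth=3,                  # depth of each individual tree
        subsample=0.8,                # fraction of rows sampled per tree
        random_state=42,
    )
    # Mean 3-fold cross-validated accuracy for this setting
    scores[learning_rate] = cross_val_score(
        model, X, y, cv=3, scoring="accuracy"
    ).mean()

for learning_rate, score in scores.items():
    print(f"learning_rate={learning_rate}: mean CV accuracy = {score:.3f}")
```

With a small learning rate, 50 rounds may not be enough for the model to converge, which is why the learning rate and the number of estimators are usually tuned together.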


### 2. Split the Dataset:
   - Divide your dataset into training, validation, and testing sets. The training set is used to train the model, the validation set helps in hyperparameter tuning, and the testing set is reserved for the final evaluation.
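
One common way to get three sets is to call scikit-learn's `train_test_split` twice. The sketch below assumes a 60% / 20% / 20% split and uses `stratify` to keep class proportions similar across the sets; both choices are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# First hold out 20% of the data as the final test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Then split the remaining 80% into training (75%) and validation (25%),
# which gives 60% / 20% / 20% of the original data overall
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_test))  # prints: 600 200 200
```

When cross-validation is used for tuning, the separate validation set is often dropped and the folds of the training set play that role instead.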


### 3. Choose a Hyperparameter Search Method:
   - **Grid Search:** Specify a predefined set of hyperparameter values, and the algorithm will try all possible combinations. It's comprehensive but computationally expensive.
   - **Random Search:** Specify a range for each hyperparameter, and the algorithm randomly samples combinations. It's less computationally expensive than grid search and often finds good results.
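
To make the difference in cost concrete, the sketch below only enumerates the candidates each method would try, without training any model. It uses the Python standard library; the search space values are illustrative. Here random search draws from a discrete grid without replacement, whereas in practice it can also sample from continuous ranges.

```python
import itertools
import random

# Example search space (values are illustrative)
param_grid = {
    "learning_rate": [0.01, 0.1, 0.2],
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 7],
    "min_samples_split": [2, 4, 6],
}

# Grid search: every combination of the listed values
names = list(param_grid)
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*(param_grid[name] for name in names))
]

# Random search: a fixed budget of combinations drawn at random
rng = random.Random(42)
n_iter = 10
random_candidates = rng.sample(grid_candidates, n_iter)

print(len(grid_candidates))    # 3 * 3 * 3 * 3 = 81 candidates
print(len(random_candidates))  # 10 candidates
```

Each candidate is then trained and scored once per cross-validation fold, so the number of candidates directly drives the total training cost.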


### 4. Perform Hyperparameter Tuning:
   - Use cross-validation on the training set to score each candidate set of hyperparameter values; in every split, the held-out fold acts as the validation data.
   - If you keep a single fixed validation set instead, evaluate each candidate on that set.
   - Repeat this process for every combination in the search space and keep the best-scoring one.
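
The loop below is a minimal sketch of this step written by hand, assuming a deliberately tiny search space and 3-fold cross-validation so it runs quickly. `GridSearchCV` and `RandomizedSearchCV` automate the same idea.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A deliberately tiny search space so the loop finishes quickly
learning_rates = [0.05, 0.2]
max_depths = [2, 3]

best_score, best_params = -1.0, None
for learning_rate, max_depth in product(learning_rates, max_depths):
    model = GradientBoostingClassifier(
        learning_rate=learning_rate,
        max_depth=max_depth,
        n_estimators=50,
        random_state=42,
    )
    # 3-fold CV on the training set only; the test set is never touched here
    score = cross_val_score(
        model, X_train, y_train, cv=3, scoring="accuracy"
    ).mean()
    if score > best_score:
        best_score = score
        best_params = {"learning_rate": learning_rate, "max_depth": max_depth}

print("Best CV accuracy:", round(best_score, 3))
print("Best parameters:", best_params)
```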


### 5. Evaluate on the Testing Set:
   - Once you've identified the best hyperparameters using the validation set, evaluate the model's performance on the testing set to get an unbiased estimate of its generalization ability.



### 6. Analyze and Iterate:
   - Analyze the impact of tuning on model performance, considering metrics like accuracy, precision, recall, or others relevant to your problem.
   - Iterate if needed, adjusting the hyperparameter search space based on the results.
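
As a small illustration of the metrics mentioned above, the sketch below computes accuracy, precision, and recall with `sklearn.metrics` on hand-written labels. The labels are made up for the example and are not output from any model in this notebook.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hand-written labels for illustration (not real model output)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On imbalanced data these three numbers can diverge sharply, which is a reason to choose the tuning metric based on the problem rather than defaulting to accuracy.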


This iterative process helps you find a well-performing set of hyperparameters for your boosting algorithm, which should improve model performance on new, unseen data.


### **Hyperparameter Tuning**
**Using Grid Search and Randomized Search with Cross-Validation**


In [2]:
# Importing Libraries
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

In [3]:
# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [4]:
# Define the hyperparameter search space
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6]
}

In [5]:
# Create a GradientBoostingClassifier
gbc = GradientBoostingClassifier(random_state=42)


In [6]:
# Example with Grid Search
# 81 parameter combinations x 5 folds = 405 cross-validation fits, so this cell can be slow
grid_search = GridSearchCV(gbc, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)


In [None]:
# Example with Random Search
# n_iter=10 evaluates only 10 randomly chosen combinations from the search space
random_search = RandomizedSearchCV(gbc, param_distributions=param_grid, n_iter=10, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)


In [None]:
# Print the best hyperparameters
print("Best Hyperparameters from Grid Search:", grid_search.best_params_)
print("Best Hyperparameters from Random Search:", random_search.best_params_)
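
Beyond the single best setting, a fitted search object keeps the score of every candidate in its `cv_results_` attribute, which can help when deciding how to adjust the search space. The sketch below is self-contained and assumes a small grid and dataset chosen only to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Small grid so the example runs quickly
small_grid = {"learning_rate": [0.05, 0.2], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=30, random_state=42),
    small_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# Pair each candidate with its mean cross-validated score, best first
results = sorted(
    zip(search.cv_results_["mean_test_score"], search.cv_results_["params"]),
    key=lambda pair: pair[0],
    reverse=True,
)
for mean_score, params in results:
    print(f"{mean_score:.3f}  {params}")
```

If the best candidates sit at the edge of a parameter's range, that is a hint to extend the range in that direction on the next iteration.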

### **Checking Accuracy with One Boosting Algorithm**

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter search space
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6]
}

# Create a GradientBoostingClassifier
gbc = GradientBoostingClassifier(random_state=42)

# Perform Randomized Search for hyperparameter tuning
random_search = RandomizedSearchCV(gbc, param_distributions=param_grid, n_iter=10, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = random_search.best_params_

# RandomizedSearchCV refits the best configuration on the full training set
# by default (refit=True), so the tuned model can be reused directly
best_model = random_search.best_estimator_

# Make predictions on the test set
predictions = best_model.predict(X_test)

# Evaluate accuracy on the testing set
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on Testing Set: {accuracy:.2f}")

# Print the best hyperparameters
print("Best Hyperparameters:", best_params)