Tuning a Gradient Boosting Regression model mostly comes down to a few key hyperparameters: the learning rate (learning_rate), the number of trees (n_estimators), and the tree depth (max_depth). Grid Search and Random Search are two popular ways to explore a range of values for these hyperparameters and select the combination that scores best under cross-validation.

Grid Search
Grid Search takes a list of candidate values for each hyperparameter and trains and evaluates a model for every combination of those values. It is exhaustive over the grid you specify, so no listed combination is missed, but the number of combinations is the product of the list lengths, and each one is fitted once per cross-validation fold. The cost therefore grows quickly as you add hyperparameters or candidate values.
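To make the cost concrete, here is a minimal sketch that counts the candidates in a grid using scikit-learn's ParameterGrid. The grid values are the same ones used in the grid search in this section; the variable names (n_candidates, cv_folds, total_fits) are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the GridSearchCV example in this section
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5],
}

grid = ParameterGrid(param_grid)

# Number of combinations is the product of the list lengths: 3 * 3 * 3
n_candidates = len(grid)

# Each candidate is fitted once per cross-validation fold
cv_folds = 5
total_fits = n_candidates * cv_folds

print("Candidates:", n_candidates)
print("Total fits with 5-fold CV:", total_fits)

# Each element of the grid is a plain dict of hyperparameter values
first_candidate = next(iter(grid))
print("Example candidate:", first_candidate)
```

Adding a fourth hyperparameter with three values would triple both numbers, which is why grids are usually kept small.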

Random Search
Random Search draws a fixed number of hyperparameter combinations at random from distributions (or lists of values) that you define. It does not cover every combination, but its cost is set by the number of samples you request rather than by the size of the search space. This often makes it more efficient than Grid Search, especially when only a few of the hyperparameters strongly affect performance, and it can find a good combination with far fewer model fits.
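As a rough sketch of what this looks like in scikit-learn, the example below uses RandomizedSearchCV with distributions from scipy.stats. The dataset size, the distribution ranges, n_iter=5, and cv=3 are assumptions chosen to keep the example fast, not recommended settings.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Small synthetic dataset (sizes chosen only to keep the sketch quick)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Distributions to sample from (illustrative ranges)
param_distributions = {
    'n_estimators': randint(50, 151),      # integers from 50 to 150
    'learning_rate': uniform(0.01, 0.19),  # floats from 0.01 to 0.20
    'max_depth': randint(2, 5),            # integers from 2 to 4
}

# n_iter fixes the number of sampled combinations, and so the cost
random_search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring='neg_mean_squared_error',
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Best score (neg_mean_squared_error):", random_search.best_score_)
```

Here the search performs 5 x 3 = 15 fits regardless of how wide the ranges are; raising n_iter trades more compute for a better chance of landing on a strong combination.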

In [1]:
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import GradientBoostingRegressor

# Generate a synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model
model = GradientBoostingRegressor()

# Define the grid of hyperparameters to search
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5]
}

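# Set up the grid search: 5-fold cross-validation on all CPU cores,
# scored by negated mean squared error (closer to 0 is better)
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
    verbose=2,
    scoring='neg_mean_squared_error',
)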

# Perform the grid search
grid_search.fit(X_train, y_train)

# Print the best parameters and best score
print("Best parameters:", grid_search.best_params_)
print("Best score (neg_mean_squared_error):", grid_search.best_score_)

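# Evaluate on the test set (score() returns R^2 for regressors, not MSE,
# so this value is not on the same scale as the cross-validation score above)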
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print("Test score:", test_score)


Fitting 5 folds for each of 27 candidates, totalling 135 fits
Best parameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300}
Best score (neg_mean_squared_error): -2728.2846132144614
Test score: 0.9450805380973991
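Note that the two scores reported above use different metrics: the best cross-validation score is a negated mean squared error, while the test score from score() is R^2. A minimal sketch of reporting both metrics on the test set is shown below. It fits a default GradientBoostingRegressor on a smaller synthetic dataset as a stand-in for grid_search.best_estimator_, so its numbers will not match the output above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in data and model (in the notebook, use best_model and the existing split)
X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)

# MSE is on the same scale as -grid_search.best_score_
test_mse = mean_squared_error(y_test, y_pred)

# R^2 is what model.score(X_test, y_test) returns
test_r2 = r2_score(y_test, y_pred)

print("Test MSE:", test_mse)
print("Test R^2:", test_r2)
```

Comparing the test MSE with the negated best_score_ gives a like-for-like check of how well the cross-validation estimate carries over to held-out data.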
