Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is an ensemble machine learning technique that builds a predictive model from weak learners, typically shallow decision trees, in a stage-wise manner. Each new tree is fit to the residual errors of the current ensemble (more generally, to the negative gradient of the loss function), and its scaled prediction is added to the ensemble, so the training error shrinks with each stage.
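
For squared-error loss, the stage-wise update described above can be written out as follows. The symbols ($F_m$ for the ensemble after $m$ trees, $h_m$ for the $m$-th tree, $\nu$ for the learning rate, $M$ for the number of trees) are notation chosen here for illustration, not taken from the original text.

```latex
\begin{aligned}
F_0(x) &= \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
r_i^{(m)} &= y_i - F_{m-1}(x_i), \qquad i = 1, \dots, n \\
h_m &= \text{tree fit to } \{(x_i, r_i^{(m)})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad m = 1, \dots, M
\end{aligned}
```

With the loss $L = \tfrac{1}{2}(y - F)^2$, the residual $y_i - F_{m-1}(x_i)$ equals the negative gradient $-\partial L / \partial F$ evaluated at $F_{m-1}(x_i)$, which is where the name "gradient boosting" comes from.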

Q2. Simple Gradient Boosting Algorithm from Scratch

Here's a simple implementation of gradient boosting in Python. It uses NumPy for the arithmetic and scikit-learn's DecisionTreeRegressor as the weak learner:

In [1]:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

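class SimpleGradientBoostingRegressor:
    """Gradient boosting for squared-error regression using shallow decision trees."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators    # number of boosting stages (trees)
        self.learning_rate = learning_rate  # shrinkage applied to each tree's output
        self.max_depth = max_depth          # depth limit for each tree
        self.models = []
        self.initial_pred = None

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.models = []  # reset so that calling fit() again does not stack old trees
        # Stage 0: start from the constant prediction that minimizes squared error
        self.initial_pred = np.mean(y)
        residuals = y - self.initial_pred

        for _ in range(self.n_estimators):
            # Fit the next tree to the residuals of the current ensemble
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            # Shrink the tree's contribution and update the residuals
            residuals -= self.learning_rate * tree.predict(X)
            self.models.append(tree)
        return self

    def predict(self, X):
        # Start from the initial constant, then add each tree's scaled correction
        y_pred = np.full(X.shape[0], self.initial_pred)
        for tree in self.models:
            y_pred += self.learning_rate * tree.predict(X)
        return y_pred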

# Example dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 1.9, 3.2, 4.3, 5.1])

# Train the model
model = SimpleGradientBoostingRegressor(n_estimators=10, learning_rate=0.1, max_depth=2)
model.fit(X, y)

# Predict and evaluate on the training data (there is no held-out set here)
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")


Mean Squared Error: 0.27839675798649904
R-squared: 0.8673290326026977
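
To make the stage-wise correction visible, the sketch below records the training MSE after every boosting stage on the same five points. It is not the class above: it assumes a hand-written one-split tree (a "stump") in pure NumPy on a single feature with at least two distinct values, and `fit_stump`, `predict_stump`, and `mse_per_stage` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def fit_stump(x, r):
    """Pick the threshold on the 1-D feature x that minimizes squared error on r."""
    best = None
    xs = np.unique(x)  # sorted distinct feature values
    # Candidate thresholds are midpoints between neighbouring feature values
    for t in (xs[:-1] + xs[1:]) / 2.0:
        left = x <= t
        left_mean, right_mean = r[left].mean(), r[~left].mean()
        sse = ((r[left] - left_mean) ** 2).sum() + ((r[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]  # (threshold, left leaf value, right leaf value)


def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return np.where(x <= t, left_mean, right_mean)


x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([1.2, 1.9, 3.2, 4.3, 5.1])

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # stage 0: predict the mean
mse_per_stage = [np.mean((y_demo - prediction) ** 2)]

for _ in range(10):
    residuals = y_demo - prediction
    stump = fit_stump(x_demo, residuals)
    # Add the shrunken correction from the new stump to the running prediction
    prediction = prediction + learning_rate * predict_stump(stump, x_demo)
    mse_per_stage.append(np.mean((y_demo - prediction) ** 2))

for stage, mse in enumerate(mse_per_stage):
    print(f"stage {stage:2d}: training MSE = {mse:.4f}")
```

Because each leaf predicts the mean residual of its points and the learning rate lies between 0 and 1, the training MSE cannot increase from one stage to the next. This says nothing about error on unseen data.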


Q3. Experiment with Different Hyperparameters

Here's how you can use grid search to find the best hyperparameters:

In [2]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'n_estimators': [10, 50, 100],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [1, 2, 3]
}

# SimpleGradientBoostingRegressor does not implement the scikit-learn estimator
# interface (get_params/set_params) that GridSearchCV requires, so we use
# sklearn's GradientBoostingRegressor for demonstration purposes
model = GradientBoostingRegressor()

# Run a 3-fold cross-validated grid search; sklearn maximizes the score,
# so MSE is supplied in negated form
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

print(f"Best Hyperparameters: {grid_search.best_params_}")
# Negate best_score_ to report a positive MSE
print(f"Best Score: {-grid_search.best_score_}")


Best Hyperparameters: {'learning_rate': 0.2, 'max_depth': 2, 'n_estimators': 100}
Best Score: 1.5500000266301834
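
A from-scratch regressor can also be searched with GridSearchCV if it follows scikit-learn's estimator conventions. The sketch below is one possible way to do that, not the code used for the results above: `SearchableGradientBoostingRegressor`, `demo_grid`, and the synthetic data are assumptions introduced for illustration. It relies on `BaseEstimator` to supply `get_params`/`set_params` from the constructor signature and on `RegressorMixin` to mark the class as a regressor.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor


class SearchableGradientBoostingRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical variant of the from-scratch regressor that GridSearchCV can clone."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        # Store constructor arguments unchanged; get_params() reads them back by name
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Fitted state uses a trailing underscore, following sklearn convention
        self.initial_pred_ = y.mean()
        self.models_ = []
        residuals = y - self.initial_pred_
        for _ in range(self.n_estimators):
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            residuals = residuals - self.learning_rate * tree.predict(X)
            self.models_.append(tree)
        return self

    def predict(self, X):
        y_pred = np.full(np.asarray(X).shape[0], self.initial_pred_)
        for tree in self.models_:
            y_pred += self.learning_rate * tree.predict(X)
        return y_pred


# Synthetic data (an assumption for illustration, not the dataset above)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(40, 1))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=40)

# A deliberately small grid so the search finishes quickly
demo_grid = {
    "n_estimators": [10, 50],
    "learning_rate": [0.1, 0.3],
    "max_depth": [1, 2],
}
demo_search = GridSearchCV(
    SearchableGradientBoostingRegressor(),
    param_grid=demo_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
demo_search.fit(X_demo, y_demo)

print(f"Best Hyperparameters: {demo_search.best_params_}")
print(f"Best CV MSE: {-demo_search.best_score_}")
```

The key design constraint is that `__init__` only assigns its arguments to attributes of the same name. GridSearchCV clones the estimator for every parameter combination, and cloning works by reading those attributes back and passing them to the constructor again.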
