Q1. What is Gradient Boosting Regression?

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

Q4. What is a weak learner in Gradient Boosting?

Q5. What is the intuition behind the Gradient Boosting algorithm?

Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?

Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting
algorithm?

Q1. Gradient Boosting Regression is a machine learning algorithm that combines multiple weak predictive models (typically shallow decision trees) sequentially to create a strong predictive model. It is a boosting algorithm: each weak model is trained to correct the errors of the ensemble built so far. At every iteration it fits a new model to the negative gradient of the loss function with respect to the current predictions and adds that model's output, scaled by a learning rate, to the ensemble, so the loss decreases step by step.

In [7]:
import numpy as np

# Define the loss function (mean squared error)
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Define the gradient of the loss function
def mse_gradient(y_true, y_pred):
    return 2 * (y_pred - y_true) / len(y_true)

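# Define a simple decision stump (a one-split tree) as a weak learner
class DecisionTree:
    def __init__(self, max_depth=1):
        # Stored for interface compatibility only; this learner always makes one split
        self.max_depth = max_depth

    def fit(self, X, y):
        # X is a 1-D feature array, so there is only one feature to split on
        self.feature_idx = 0
        # Split at the mean of X rather than searching for the best threshold
        self.threshold = X.mean()
        # Each side predicts the mean target of the samples that fall in it
        self.left_value = np.mean(y[X < self.threshold])
        self.right_value = np.mean(y[X >= self.threshold])

    def predict(self, X):
        return np.where(X < self.threshold, self.left_value, self.right_value)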

# Define the gradient boosting regressor
class GradientBoostingRegressor:
    def __init__(self, n_estimators=10, learning_rate=0.1, max_depth=1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.estimators = []

    def fit(self, X, y):
        # Start from a constant model: the mean of y
        self.initial_prediction = np.mean(y)
        predictions = np.full(len(y), self.initial_prediction)
        # Reset the ensemble so that calling fit again does not keep old trees
        self.estimators = []

        # Build the ensemble of weak learners
        for _ in range(self.n_estimators):
            # Compute the gradient of the MSE loss at the current predictions.
            # Its negative is proportional to the residuals y - predictions;
            # the 1/len(y) factor in mse_gradient shrinks every step.
            gradient = mse_gradient(y, predictions)

            # Train a decision tree to approximate the gradient
            tree = DecisionTree(max_depth=self.max_depth)
            tree.fit(X, gradient)

            # Step against the gradient, scaled by the learning rate
            predictions -= self.learning_rate * tree.predict(X)

            # Add the tree to the ensemble
            self.estimators.append(tree)

    def predict(self, X):
        # Start from the constant initial prediction
        predictions = np.full(len(X), self.initial_prediction)

        # Subtract each tree's scaled gradient estimate, mirroring fit
        for tree in self.estimators:
            predictions -= self.learning_rate * tree.predict(X)

        return predictions

# Create a sample dataset
X = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 6, 8, 10, 12])

# Create and fit the gradient boosting regressor
regressor = GradientBoostingRegressor(n_estimators=10, learning_rate=0.1, max_depth=1)
regressor.fit(X, y)

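# Predict on new data (all three points fall on the right side of the
# single split, so they receive the same prediction)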
X_new = np.array([7, 8, 9])
y_pred = regressor.predict(X_new)
print(y_pred)


[7.86258582 7.86258582 7.86258582]
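
Q2 also asks for an evaluation with mean squared error and R-squared, which the cell above does not compute. A minimal NumPy sketch of both metrics follows. The helper names `mse` and `r_squared` and the hand-written `stump_pred` array (what a single mean-split stump fitted directly to y would predict) are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y = np.array([2, 4, 6, 8, 10, 12])

# Constant prediction at the mean: the starting point of boosting
baseline = np.full(len(y), y.mean())

# Hypothetical predictions of one stump split at X = 3.5
stump_pred = np.array([4, 4, 4, 10, 10, 10])

print("baseline MSE:", mse(y, baseline), "R^2:", r_squared(y, baseline))
print("stump    MSE:", mse(y, stump_pred), "R^2:", r_squared(y, stump_pred))
```

Passing `regressor.predict(X)` as `y_pred` would score the fitted model on its training data; a held-out set gives a more honest estimate.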


In [None]:
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV

# Define the loss function (mean squared error)
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Define the gradient of the loss function
def mse_gradient(y_true, y_pred):
    return 2 * (y_pred - y_true) / len(y_true)

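# Define a simple decision stump (a one-split tree) as a weak learner
class DecisionTree:
    def __init__(self, max_depth=1):
        # Stored for interface compatibility only; this learner always makes one split
        self.max_depth = max_depth

    def fit(self, X, y):
        # X is 2-D here (n_samples, n_features), so select one column before
        # building the boolean masks; indexing the 1-D y with a 2-D mask fails
        self.feature_idx = 0
        feature = np.asarray(X)[:, self.feature_idx]
        self.threshold = feature.mean()
        self.left_value = np.mean(y[feature < self.threshold])
        self.right_value = np.mean(y[feature >= self.threshold])

    def predict(self, X):
        feature = np.asarray(X)[:, self.feature_idx]
        return np.where(feature < self.threshold, self.left_value, self.right_value)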

# Define the gradient boosting regressor compatible with scikit-learn
# (mixins are listed before BaseEstimator, as scikit-learn recommends)
class GradientBoostingRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_estimators=10, learning_rate=0.1, max_depth=1):
        # __init__ only stores hyperparameters; learned state is created in fit
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Start from a constant model: the mean of y
        self.initial_prediction_ = np.mean(y)
        predictions = np.full(len(y), self.initial_prediction_)
        # Fresh list on every fit, so refitting does not keep old trees
        self.estimators_ = []

        # Build the ensemble of weak learners
        for _ in range(self.n_estimators):
            # Compute the gradient of the MSE loss at the current predictions.
            # Its negative is proportional to the residuals y - predictions.
            gradient = mse_gradient(y, predictions)

            # Train a decision tree to approximate the gradient
            tree = DecisionTree(max_depth=self.max_depth)
            tree.fit(X, gradient)

            # Step against the gradient, scaled by the learning rate
            predictions -= self.learning_rate * tree.predict(X)

            # Add the tree to the ensemble
            self.estimators_.append(tree)

        return self

    def predict(self, X):
        # Start from the constant initial prediction
        predictions = np.full(len(X), self.initial_prediction_)

        # Subtract each tree's scaled gradient estimate, mirroring fit
        for tree in self.estimators_:
            predictions -= self.learning_rate * tree.predict(X)

        return predictions

# Create a sample dataset
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10, 12])

# Create the gradient boosting regressor
regressor = GradientBoostingRegressor()

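# Define the parameter grid
# (max_depth is included for completeness, but the one-split DecisionTree
# above ignores it, so it does not change the score)
param_grid = {
    'n_estimators': [10, 50, 100],
    'learning_rate': [0.1, 0.01],
    'max_depth': [1, 2, 3]
}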

# Perform grid search
grid_search = GridSearchCV(regressor, param_grid, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# Print the best hyperparameters and the corresponding performance
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Score (MSE):", -grid_search.best_score_)
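
Q3 also mentions random search. The sketch below is one possible NumPy-only version, not the notebook's method: it samples hyperparameters at random and scores each draw with leave-one-out cross-validation. The helpers `fit_predict_boosting` and `loo_mse`, the sampling ranges, and the number of draws are all assumptions chosen for illustration. The weak learner is the same mean-split stump as above, fitted to the residuals directly.

```python
import numpy as np

def fit_predict_boosting(X_train, y_train, X_test, n_estimators, learning_rate):
    """Boost mean-split stumps on residuals and return predictions for X_test."""
    threshold = X_train.mean()
    left_tr, left_te = X_train < threshold, X_test < threshold
    pred_tr = np.full(len(y_train), y_train.mean())
    pred_te = np.full(len(X_test), y_train.mean())
    for _ in range(n_estimators):
        residuals = y_train - pred_tr              # negative gradient of 0.5 * squared error
        left_val = residuals[left_tr].mean()
        right_val = residuals[~left_tr].mean()
        pred_tr += learning_rate * np.where(left_tr, left_val, right_val)
        pred_te += learning_rate * np.where(left_te, left_val, right_val)
    return pred_te

def loo_mse(X, y, n_estimators, learning_rate):
    """Leave-one-out cross-validated mean squared error."""
    errors = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i              # hold out sample i
        pred = fit_predict_boosting(X[mask], y[mask], X[i:i + 1],
                                    n_estimators, learning_rate)
        errors.append((y[i] - pred[0]) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(0)
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 6, 8, 10, 12], dtype=float)

best = None
for _ in range(20):                                # 20 random draws (assumed budget)
    params = {
        "n_estimators": int(rng.integers(10, 101)),         # integer in [10, 100]
        "learning_rate": float(10 ** rng.uniform(-2, 0)),   # log-uniform in [0.01, 1)
    }
    score = loo_mse(X, y, **params)
    if best is None or score < best[0]:
        best = (score, params)

print("Best params:", best[1])
print("Best LOO MSE:", best[0])
```

Sampling the learning rate on a log scale spreads the draws evenly across orders of magnitude, which a uniform draw on [0.01, 1] would not do.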


Q4. In Gradient Boosting, a weak learner is a simple base model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). In Gradient Boosting Regression, weak learners are typically shallow decision trees with a small maximum depth; a depth-1 tree with a single split is called a decision stump. Each tree has low predictive power on its own, but combined in an ensemble they form a much stronger model.
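
As an illustration of a weak learner, here is a sketch of a decision stump that searches every midpoint between sorted feature values for the split with the lowest squared error. This differs from the mean-split tree in the cells above. The names `fit_stump` and `predict_stump` are hypothetical, and the sketch assumes a single feature with at least two distinct values.

```python
import numpy as np

def fit_stump(x, y):
    """Return (threshold, left_value, right_value) minimising the squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold fits between two equal values
        threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
        left, right = y_sorted[:i], y_sorted[i:]
        # Sum of squared errors when each side predicts its own mean
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x < threshold, left_value, right_value)

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 6, 8, 10, 12], dtype=float)

stump = fit_stump(x, y)
print("threshold, left, right:", stump)
print("predictions:", predict_stump(stump, x))
```

On this dataset the best split lands at 3.5, with leaf values 4 and 10. A single stump can only output two distinct values, which is why many of them are needed.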

Q5. The intuition behind the Gradient Boosting algorithm is to add weak models to the ensemble one at a time, each correcting the mistakes made by the models before it. Each new model is fitted to the negative gradient of the loss function with respect to the current predictions; for squared error loss, this negative gradient is simply the residual (true value minus current prediction). This amounts to gradient descent in the space of predictions. By combining many weak models in this way, Gradient Boosting builds a powerful ensemble that can capture complex relationships in the data.
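
The link between residuals and negative gradients can be checked numerically. The sketch below assumes the per-sample loss L = 0.5 * (y - F)^2, where F is the current prediction, and compares a finite-difference estimate of -dL/dF with y - F. The function name `half_squared_error` and the sample values are illustrative.

```python
import numpy as np

def half_squared_error(y_true, y_pred):
    """Per-sample loss L = 0.5 * (y - F)^2."""
    return 0.5 * (y_true - y_pred) ** 2

y = np.array([2.0, 4.0, 6.0])
F = np.array([3.0, 3.0, 3.0])   # current predictions (assumed values)

eps = 1e-6
# Central finite difference approximates dL/dF for each sample
grad = (half_squared_error(y, F + eps) - half_squared_error(y, F - eps)) / (2 * eps)

residuals = y - F
print("negative gradient:", -grad)
print("residuals:        ", residuals)
```

With the mean squared error used in the cells above, the gradient carries an extra factor of 2 / n, so it points in the same direction as the residuals but with a smaller magnitude.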

Q6. The Gradient Boosting algorithm builds an ensemble of weak learners in a sequential manner. Here's a high-level overview of the process:

1. Initialize the predictions with the mean (or any other suitable value) of the target variable.
2. Compute the negative gradient (residuals) of the loss function with respect to the current predictions.
3. Train a weak learner (e.g., decision tree) to fit the negative gradient, aiming to minimize the residuals.
4. Update the predictions by adding the learning rate multiplied by the predictions of the weak learner.
5. Repeat steps 2-4 for a specified number of iterations (or until convergence).
6. The final ensemble is formed by combining the predictions of all the weak learners.
By iteratively adding weak models and updating the predictions, the ensemble gradually improves its ability to fit the data and reduce the overall loss.
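
The loop described above can be traced on the toy dataset. This is a sketch under simplifying assumptions: the weak learner is a stump split at the mean of X, it is fitted to the residuals y - predictions directly, and the training loss is recorded after every round. The variable names are illustrative.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 6, 8, 10, 12], dtype=float)
learning_rate = 0.1
n_rounds = 10

# Initialise with the mean of the target
predictions = np.full(len(y), y.mean())
losses = [np.mean((y - predictions) ** 2)]

left = X < X.mean()   # fixed mean split used by every stump
for _ in range(n_rounds):
    # Negative gradient of 0.5 * squared error: the residuals
    residuals = y - predictions
    # Fit the stump: each side predicts its mean residual
    update = np.where(left, residuals[left].mean(), residuals[~left].mean())
    # Add the scaled weak-learner output to the ensemble prediction
    predictions += learning_rate * update
    losses.append(np.mean((y - predictions) ** 2))

print("training MSE per round:", np.round(losses, 3))
print("final predictions:", np.round(predictions, 3))
```

The loss falls every round but levels off above zero, because a stump with one fixed split cannot separate points on the same side of the threshold.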

Q7. The steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm are as follows:

1. Define a loss function: Choose an appropriate loss function that measures the discrepancy between the predicted values and the true values of the target variable. For regression problems, the mean squared error (MSE) is commonly used as the loss function.

2. Initialize the ensemble: Set the initial predictions of the ensemble to a constant value, usually the mean of the target variable. This acts as the "zeroth" model in the ensemble.

3. Compute the negative gradients: Calculate the negative gradients of the loss function with respect to the current predictions. For squared error loss these are the residuals, up to a constant factor; for other losses they are often called pseudo-residuals. The negative gradient represents the direction and magnitude of the correction needed to reduce the loss.

4. Train a weak learner: Fit a weak learner (e.g., decision tree) to the negative gradients. The weak learner is trained to approximate the relationship between the input features and the negative gradients, effectively correcting the mistakes made by the previous models. The weak learner is typically a shallow decision tree with limited depth.

5. Update the predictions: Adjust the predictions of the ensemble by adding the learning rate multiplied by the predictions of the weak learner. The learning rate controls the contribution of each weak learner to the ensemble. A smaller learning rate makes the updates more conservative, so more weak learners are usually needed to reach the same fit.

6. Repeat steps 3-5: Iterate the process by computing new negative gradients based on the updated predictions and training additional weak learners to fit the negative gradients. Each new weak learner further improves the predictions of the ensemble by addressing the remaining errors or residuals.

7. Final ensemble prediction: The final prediction is the initial constant plus the sum of the predictions of all the weak learners, each scaled by the learning rate.

By iteratively minimizing the loss function through the training of weak learners and updating the ensemble's predictions, the Gradient Boosting algorithm gradually builds a strong predictive model that can effectively capture complex patterns and relationships in the data.
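
The steps above can be written compactly. The notation here is an assumption introduced for illustration: L is the loss, F_m the ensemble after m rounds, h_m the m-th weak learner, r_{im} the negative gradient for sample i at round m, \nu the learning rate, and M the number of rounds.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}},
          \quad i = 1, \dots, n \\
h_m &\approx \arg\min_{h} \sum_{i=1}^{n} \bigl( r_{im} - h(x_i) \bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \quad m = 1, \dots, M
\end{align*}
```

For the squared error loss L(y, F) = \tfrac{1}{2}(y - F)^2, the first line gives the mean of the targets and the second line reduces to r_{im} = y_i - F_{m-1}(x_i), the ordinary residual.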