Q1. What is Gradient Boosting Regression?


Gradient Boosting is a boosting algorithm that combines several weak learners into a strong learner. Each new model is trained to reduce the loss (such as mean squared error for regression or cross-entropy for classification) of the ensemble built so far, using gradient descent in function space. In each iteration, the algorithm computes the gradient of the loss function with respect to the predictions of the current ensemble and then trains a new weak model to approximate the negative of this gradient. The predictions of the new model, scaled by a learning rate, are then added to the ensemble, and the process is repeated until a stopping criterion is met.

For regression with squared-error loss, the initial prediction is the average of the y-column; the first decision tree is then fitted to the residuals left by that constant value.
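
A minimal sketch of this first step, assuming squared-error loss. The toy data, the tree depth of 2, and the learning rate of 0.1 are illustrative choices and are not taken from the exercise.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=50)

# Step 0: the initial prediction is the mean of the targets
f0 = np.full_like(y, y.mean())

# Step 1: fit a shallow tree to the residuals of that constant prediction
residuals = y - f0
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)

# Updated prediction: the constant plus a shrunken correction from the tree
learning_rate = 0.1
f1 = f0 + learning_rate * tree.predict(X)

mse_before = np.mean((y - f0) ** 2)
mse_after = np.mean((y - f1) ** 2)
print("MSE with mean only:", mse_before)
print("MSE after one tree:", mse_after)
```

Even one shrunken tree should lower the training error a little; repeating the step is what makes the ensemble strong.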

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.


In [5]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []

    def fit(self, X, y):
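        # start from an implicit prediction of zero, so the initial
        # residuals equal the targets (cast to float for in-place updates)
        residuals = np.asarray(y, dtype=float).copy()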

        # loop over the number of estimators
        for i in range(self.n_estimators):
            # fit a decision tree to the residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)

            # predict the residuals for the current tree
            residuals_pred = tree.predict(X)

            # subtract the shrunken predictions to get the residuals of the updated ensemble
            residuals -= self.learning_rate * residuals_pred

            # add the current tree to the list of trees
            self.trees.append(tree)

    def predict(self, X):
        # initialize the predictions with zeros
        y_pred = np.zeros(X.shape[0])

        # loop over the trees and add the predictions
        for tree in self.trees:
            y_pred += self.learning_rate * tree.predict(X)

        return y_pred

In [6]:
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# generate a random regression problem
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# train the gradient boosting regressor
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)

# make predictions on the testing set
y_pred = gb.predict(X_test)

# calculate mean squared error and R-squared
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MSE:", mse)
print("R-squared:", r2)

MSE: 623.4539375433474
R-squared: 0.823539706892307


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.


In [7]:
from sklearn.datasets import make_regression
# note: this import shadows the from-scratch GradientBoostingRegressor class defined above
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

# fit scikit-learn's gradient boosting regressor with its default hyperparameters
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

In [8]:
# R-squared on the test set
model.score(X_test, y_test)

0.655427488553789
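
The cells above fit the model with default hyperparameters only. A grid search over learning rate, number of trees, and tree depth could be sketched as follows, using scikit-learn's `GridSearchCV`. The grid values, the 3-fold cross-validation, and `random_state=0` are assumptions chosen to keep the example small, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

# Candidate values for each hyperparameter (illustrative grid)
param_grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3],
}

# Evaluate every combination with 3-fold cross-validation on the training set
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated R-squared:", search.best_score_)
print("Test R-squared:", search.score(X_test, y_test))
```

For larger grids, `RandomizedSearchCV` from the same module samples a fixed number of combinations instead of trying all of them.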

Q4. What is a weak learner in Gradient Boosting?


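The term weak learner refers to a simple model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean in regression. Boosting algorithms start with a single weak learner and add more in sequence. Shallow decision trees are overwhelmingly used here, but technically, any model will do.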
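
As a rough illustration, a depth-1 decision tree (a "stump") is a typical weak learner for regression. The sketch below, on an assumed synthetic dataset, compares it with the mean baseline, for which R-squared on the training data is 0.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=1)

# A stump makes a single split, so it can only output two distinct values
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_r2 = r2_score(y, stump.predict(X))

print("Number of leaves:", stump.get_n_leaves())
print("Training R-squared of the stump:", stump_r2)
```

The stump beats the mean baseline but is far from a perfect fit, which is exactly the kind of model boosting is designed to combine.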

Q5. What is the intuition behind the Gradient Boosting algorithm?


The intuition behind the Gradient Boosting algorithm is to iteratively improve a model by focusing on the residuals (or errors) of the previous model. It works by adding new weak models (often decision trees) to the ensemble, with each model trained to correct the errors made by the previous models.

In the beginning, the Gradient Boosting algorithm fits a first model to the entire dataset. Each subsequent model is then trained on the residuals of the ensemble built so far. The algorithm continues to reduce the residuals by adding new models to the ensemble until a stopping criterion is met, such as reaching a maximum number of models or achieving a certain level of performance.

The final model is the sum of all the weak models, each scaled by the same learning rate (shrinkage) rather than by a per-model performance weight. The Gradient Boosting algorithm is known to be a powerful and flexible method for both regression and classification tasks.

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?



The Gradient Boosting algorithm builds an ensemble of weak learners in a stage-wise manner. It starts from an initial model (a constant such as the mean, or a single weak learner, usually a decision tree) fitted to the training data, and calculates the residuals of its predictions. The next weak learner is then fitted to these residuals, rather than the original target values, with the aim of reducing the error made so far. This process is repeated iteratively, with each subsequent learner fitted to the residuals of the current ensemble of learners, until a predefined number of learners has been fitted or a convergence criterion has been met.

More generally, each weak learner is fitted to the negative gradient of the loss function with respect to the predictions of the current ensemble of learners. For squared-error loss this negative gradient is exactly the residual, which is why each new learner tries to correct the mistakes made by the learners before it.
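
A small numerical check of the link between gradients and residuals, assuming the squared-error loss written as 0.5 * (y - f)^2. The arrays and the helper name `squared_error_loss` are made up for illustration.

```python
import numpy as np


def squared_error_loss(y, f):
    """Per-sample squared-error loss, with the conventional factor of 1/2."""
    return 0.5 * (y - f) ** 2


# Targets and current ensemble predictions (hypothetical values)
y = np.array([3.0, -1.0, 2.5])
f = np.array([2.0, 0.5, 2.5])

# Central finite difference of the loss with respect to each prediction
eps = 1e-6
grad = (squared_error_loss(y, f + eps) - squared_error_loss(y, f - eps)) / (2 * eps)

negative_gradient = -grad
residuals = y - f

print("Negative gradient:", negative_gradient)
print("Residuals:        ", residuals)
```

The two arrays agree up to floating-point error, so fitting a tree to the residuals is the same as fitting it to the negative gradient for this loss.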

The final ensemble of weak learners is then combined to make predictions on new data. The combination is the initial prediction plus the sum of the predictions made by each weak learner, with every learner scaled by the same learning rate.
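
In scikit-learn's `GradientBoostingRegressor` with its default squared-error loss, the prediction can be rebuilt by hand as the initial constant plus the learning rate times the sum of the tree predictions. The sketch below checks this on an assumed synthetic dataset; the choice of 20 trees is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

model = GradientBoostingRegressor(
    n_estimators=20, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X, y)

# Start from the initial constant prediction (the training mean for squared error)
manual_pred = model.init_.predict(X).astype(float)

# Add each tree's prediction, scaled by the same learning rate
for stage in model.estimators_:
    tree = stage[0]  # one tree per boosting stage for regression
    manual_pred += model.learning_rate * tree.predict(X)

print("Matches model.predict:", np.allclose(manual_pred, model.predict(X)))
```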

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

1. We start by assuming a function to approximate the target variable. This function is usually set to a constant value which is the mean of the target variable.
2. We then calculate the errors by taking the difference between the actual target variable and the predicted values from step 1.
3. We fit a new model, called a weak learner, to the errors. This model is typically a decision tree with a small depth.
4. We add the predictions from this weak learner to our previous approximation, with a shrinkage parameter that controls the contribution of the weak learner. The new function becomes a better approximation of the target variable.
5. We repeat steps 2-4 until a stopping criterion is met, such as a maximum number of iterations or a minimum improvement in performance.
6. Finally, we obtain the final model by combining the weak learners with their corresponding shrinkage parameters, and use it to make predictions on new data.

By repeating this process, the algorithm gradually improves the approximation of the target variable, using the residuals of the current model as the new target variable for the next model. This lets the ensemble capture complex, non-linear relationships between the input and output variables, even though each individual learner is simple.
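
The steps above can be written compactly for squared-error loss. The notation is an assumption introduced here for illustration: $F_m$ is the ensemble after $m$ rounds, $h_m$ is the weak learner fitted in round $m$, $\nu$ is the shrinkage (learning rate), and $M$ is the total number of rounds.

```latex
\begin{aligned}
F_0(x) &= \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(step 1: constant initial model)} \\
r_i^{(m)} &= y_i - F_{m-1}(x_i)
  = -\left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)},
  \quad L(y, F) = \tfrac{1}{2}(y - F)^2
  && \text{(step 2: residuals)} \\
h_m &\approx \text{weak learner fitted to } \{(x_i, r_i^{(m)})\}_{i=1}^{n}
  && \text{(step 3)} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
  && \text{(step 4: shrunken update)} \\
F_M(x) &= \bar{y} + \nu \sum_{m=1}^{M} h_m(x)
  && \text{(step 6: final model)}
\end{aligned}
```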