# Question No. 1:
What is Gradient Boosting Regression?

## Answer:
Gradient Boosting Regression is an ensemble learning method for regression problems (the same boosting idea also applies to classification). It combines many weak models, typically shallow decision trees, into a single stronger model. The algorithm starts from a simple model and then adds models one at a time, each fitted to correct the errors of the ensemble built so far.
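
As a sketch, this additive structure can be written as a formula. The symbols $F_0$, $h_m$, $\nu$ and $M$ are notation introduced here for illustration and do not come from the answer above.

$$
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
$$

Here $F_0$ is the initial simple model, each $h_m$ is a weak model fitted to the errors of the ensemble built before it, $M$ is the number of weak models, and $\nu \in (0, 1]$ is a learning rate that shrinks each correction.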

# Question No. 2:
Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

## Answer:

In [3]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

import warnings
warnings.filterwarnings('ignore')

class GradientBoostingRegressor:
    """Minimal gradient boosting for squared-error loss, built from shallow regression trees."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.init_prediction = None

    def fit(self, X, y):
        # Start from a constant model: the mean of the training targets
        self.init_prediction = np.mean(y)
        self.trees = []
        # Residuals of the current ensemble on the training data
        residuals = y - self.init_prediction
        for _ in range(self.n_estimators):
            # Fit a tree to the current residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            # Shrink the tree's contribution and update the residuals
            residuals = residuals - self.learning_rate * tree.predict(X)
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Sum the predictions of all trees (built-in sum; np.sum on a generator is deprecated)
        trees_pred = sum(tree.predict(X) for tree in self.trees)
        # Use the mean stored during fit, not a global y, plus the shrunken tree predictions
        return self.init_prediction + self.learning_rate * trees_pred

# Generate some random data
X = np.random.rand(100, 3)
y = np.sum(X, axis=1) + np.random.normal(scale=0.1, size=100)

# Split the data into training and testing sets
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Train the gradient boosting regressor on the training set
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbr.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = gbr.predict(X_test)

# Evaluate the model using mean squared error and R-squared
mse = np.mean((y_pred - y_test)**2)
r2 = 1 - mse / np.var(y_test)

print("Mean squared error: {:.3f}".format(mse))
print("R-squared: {:.3f}".format(r2))

Mean squared error: 0.027
R-squared: 0.837
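
As a sanity check on the hand-written metrics, the short sketch below compares the same formulas against scikit-learn's `mean_squared_error` and `r2_score` on a tiny made-up example. The arrays `y_true` and `y_hat` are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical targets and predictions, used only to compare the formulas
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.7])

# Same formulas as in the cell above
mse_manual = np.mean((y_hat - y_true) ** 2)
r2_manual = 1 - mse_manual / np.var(y_true)

# Library implementations
mse_sklearn = mean_squared_error(y_true, y_hat)
r2_sklearn = r2_score(y_true, y_hat)

print("MSE matches:", np.isclose(mse_manual, mse_sklearn))
print("R-squared matches:", np.isclose(r2_manual, r2_sklearn))
```

The R-squared shortcut works because `np.var` computes the population variance by default, so `mse / np.var(y_true)` equals the ratio of the residual sum of squares to the total sum of squares.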


# Question No. 3:
Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

## Answer:

In [4]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires scikit-learn < 1.2
from sklearn.datasets import load_boston
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor

boston = load_boston()
X, y = boston.data, boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [3, 4, 5]
}

gbr = GradientBoostingRegressor(random_state=42)

grid_search = GridSearchCV(gbr, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(X_train, y_train)

print("Best hyperparameters: ", grid_search.best_params_)

y_pred = grid_search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error: {:.3f}".format(mse))
print("R-squared: {:.3f}".format(r2))

Best hyperparameters:  {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Mean squared error: 6.209
R-squared: 0.915
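
The question also mentions random search. The sketch below shows one way to do it with `RandomizedSearchCV`, sampling from roughly the same ranges as the grid above. It uses synthetic data from `make_regression` rather than the Boston data, and the sample size, noise level, `n_iter=10` and `cv=3` are assumptions chosen to keep the example quick.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data (an assumption for illustration)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Distributions to sample from instead of a fixed grid
param_distributions = {
    "n_estimators": randint(50, 151),        # integers from 50 to 150
    "learning_rate": loguniform(0.01, 0.5),  # log-uniform between 0.01 and 0.5
    "max_depth": randint(3, 6),              # integers from 3 to 5
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=10,                       # number of sampled configurations
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print("Best hyperparameters:", search.best_params_)
print("Test R-squared: {:.3f}".format(r2_score(y_test, y_pred)))
```

Random search evaluates a fixed number of sampled configurations, so its cost does not grow with the number of hyperparameters the way a full grid does.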


# Question No. 4:
What is a weak learner in Gradient Boosting?

## Answer:
In Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline, such as random guessing in classification or always predicting the mean in regression. Typically, shallow decision trees are used as weak learners, for example stumps (depth 1) or trees only a few levels deep.
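
To make this concrete, the sketch below compares a decision stump with the mean baseline on a small synthetic dataset. The data-generating process (sum of three features plus noise) is an assumption for illustration, and the comparison is made on the training data only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): target is the sum of three features plus noise
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

# Trivial baseline: always predict the mean of the targets
baseline_mse = np.mean((y - y.mean()) ** 2)

# Weak learner: a decision stump, i.e. a tree with a single split
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
stump_mse = np.mean((y - stump.predict(X)) ** 2)

print(f"Baseline (mean) training MSE: {baseline_mse:.3f}")
print(f"Decision stump training MSE:  {stump_mse:.3f}")
```

The stump improves on the baseline but still leaves most of the error unexplained, which is the kind of model boosting is designed to combine.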

# Question No. 5:
What is the intuition behind the Gradient Boosting algorithm?

## Answer:
The intuition behind Gradient Boosting can be explained as follows:

Suppose we have a complex problem that we want to solve using machine learning. We have a large dataset with many features and a target variable that we want to predict. We can train a simple model, such as a linear regression, to predict the target variable, but this model may not be able to capture all the complex patterns in the data.

To improve the performance of the model, we can use an ensemble of models. Gradient Boosting is a type of ensemble learning algorithm that combines the predictions of multiple weak learners, such as decision trees, to create a strong learner.

The algorithm works as follows: we first train a weak learner on the data and compute its residuals, i.e., the actual values minus the predicted values. We then train another weak learner on those residuals and add its predictions to the previous learner's predictions. We keep adding weak learners in this way until we reach a set number of learners or the combined model predicts the target variable accurately enough.
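
The first two stages of this process can be sketched as follows. The synthetic data and the tree depth of 2 are assumptions for illustration, and no learning rate is applied, to keep the example as close to the description as possible.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): target is the sum of three features plus noise
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

# Stage 1: a shallow tree fitted directly to the targets
tree_1 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
pred_1 = tree_1.predict(X)

# Residuals: actual values minus predicted values
residuals = y - pred_1

# Stage 2: a second shallow tree fitted to the residuals of the first
tree_2 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
pred_2 = pred_1 + tree_2.predict(X)

mse_1 = np.mean((y - pred_1) ** 2)
mse_2 = np.mean((y - pred_2) ** 2)
print(f"Training MSE after one tree:  {mse_1:.4f}")
print(f"Training MSE after two trees: {mse_2:.4f}")
```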

# Question No. 6:
How does the Gradient Boosting algorithm build an ensemble of weak learners?

## Answer:
The Gradient Boosting algorithm builds an ensemble of weak learners in a sequential manner. At each step of the algorithm, a new weak learner is added to the ensemble and its predictions are combined with the predictions of the previous weak learners to improve the overall prediction accuracy.

The algorithm works as follows:

1. Initialize the ensemble with a constant value, such as the mean or median of the target variable.

2. Use the current ensemble to predict the training data. In the first iteration, this prediction is just the constant from step 1.

3. Calculate the residuals, i.e., the actual values minus the predicted values. These residuals represent the errors made by the current ensemble.

4. Train a new weak learner, such as a decision tree, on the residuals. This new learner is trained to predict the residuals, so that it can correct the errors made by the current ensemble.

5. Combine the predictions of the new learner with the predictions of the previous learners. This is done by adding the new learner's predictions to the ensemble's predictions, scaled by a weighting factor (the learning rate) that controls how much each learner contributes to the final prediction.

6. Repeat steps 3 to 5 for a predetermined number of iterations or until a certain level of accuracy is achieved.
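
The loop described above can be sketched as follows for squared-error loss. The synthetic data, the 50 rounds, the learning rate of 0.1 and the tree depth of 2 are assumptions for illustration, and the error is tracked on the training data only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): target is the sum of three features plus noise
rng = np.random.default_rng(2)
X = rng.random((200, 3))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Step 1: initialize the ensemble with a constant (the mean of the targets)
prediction = np.full(y.shape, y.mean())
history = [np.mean((y - prediction) ** 2)]  # training MSE after each round

for _ in range(n_rounds):
    # Step 3: residuals of the current ensemble
    residuals = y - prediction
    # Step 4: fit a new weak learner to the residuals
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Step 5: add its predictions, scaled by the learning rate
    prediction = prediction + learning_rate * tree.predict(X)
    history.append(np.mean((y - prediction) ** 2))

print(f"Training MSE before boosting: {history[0]:.4f}")
print(f"Training MSE after {n_rounds} rounds: {history[-1]:.4f}")
```

With a learning rate between 0 and 1 and squared-error loss, each round can only keep or lower the training error. Error on held-out data can still start to rise once the ensemble overfits.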

# Question No. 7:
What are the steps involved in constructing the mathematical intuition of the Gradient Boosting
algorithm?

## Answer:
1. **Define the loss function:** The first step is to define a loss function that measures the difference between the predicted values and the actual values. This loss function is typically a differentiable function, such as the mean squared error or the cross-entropy loss.

2. **Initialize the model:** We start with an initial model that predicts the target variable, such as a constant value or the mean of the target variable.

3. **Compute the negative gradient of the loss function:** We compute the negative gradient of the loss function with respect to the current model's predictions. For each training example, the negative gradient gives the direction in which the prediction should move to reduce the loss. For squared-error loss written as 0.5 * (y - F(x))^2, it is exactly the residual y - F(x), which is why fitting residuals is a special case of this procedure.

4. **Train a new weak learner:** We train a new weak learner, such as a decision tree, to predict the negative gradient from the input features. Adding its predictions then moves the ensemble's predictions in the direction that reduces the loss.

5. **Update the model:** We update the model by adding the predictions of the new learner to the predictions of the previous model. We use a weighting factor, called the learning rate, to control the contribution of the new learner to the final prediction.

6. **Repeat steps 3 to 5:** We repeat steps 3 to 5 for a predetermined number of iterations or until a certain level of accuracy is achieved.
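
The steps above can be sketched with the loss supplied as a pluggable negative-gradient function. The helper names (`negative_gradient_squared`, `negative_gradient_absolute`, `boost`) and the synthetic data are hypothetical, introduced only for this illustration. The sketch also omits refinements that full implementations may apply, such as re-optimising leaf values for the chosen loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def negative_gradient_squared(y, pred):
    # Loss 0.5 * (y - pred)**2: the negative gradient is the residual
    return y - pred

def negative_gradient_absolute(y, pred):
    # Loss |y - pred|: the negative gradient is the sign of the residual
    return np.sign(y - pred)

def boost(X, y, negative_gradient, init, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Start from a constant, then repeatedly fit a tree to the negative gradient."""
    pred = np.full(y.shape, init, dtype=float)       # step 2: initial model
    for _ in range(n_rounds):                        # step 6: repeat
        target = negative_gradient(y, pred)          # step 3: negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, target)                          # step 4: fit weak learner
        pred = pred + learning_rate * tree.predict(X)  # step 5: update model
    return pred

# Synthetic data (an assumption): target is the sum of three features plus noise
rng = np.random.default_rng(3)
X = rng.random((200, 3))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

pred_sq = boost(X, y, negative_gradient_squared, init=y.mean())
pred_abs = boost(X, y, negative_gradient_absolute, init=np.median(y))

print(f"Training MSE (squared loss):  {np.mean((y - pred_sq) ** 2):.4f}")
print(f"Training MAE (absolute loss): {np.mean(np.abs(y - pred_abs)):.4f}")
```

Only the negative-gradient function and the initial constant change between the two runs. The boosting loop itself is identical, which is the point of framing the algorithm in terms of gradients.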