In [1]:
# Q1. What is Gradient Boosting Regression?
# Gradient Boosting Regression is a machine learning technique for regression tasks. It is an ensemble method that combines many weak regression models (typically shallow decision trees) into a strong one. The models are built sequentially: each new weak learner is fit to the errors (residuals) of the ensemble built so far, and its scaled predictions are added to the ensemble, which progressively reduces the overall prediction error.
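As a quick illustration, scikit-learn's `GradientBoostingRegressor` implements this technique. The sketch below fits it to a small synthetic sine-shaped dataset and compares its training error with a predict-the-mean baseline. The data, the names `X_toy`/`y_toy`, and the hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Small synthetic 1-D regression problem (assumed, for illustration only)
rng = np.random.default_rng(0)
X_toy = np.linspace(0, 6, 60).reshape(-1, 1)
y_toy = np.sin(X_toy).ravel() + 0.1 * rng.normal(size=60)

# An ensemble of 100 shallow trees, each fit to the errors of the trees before it
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0)
model.fit(X_toy, y_toy)

# Compare against the trivial model that always predicts the mean
baseline_mse = mean_squared_error(y_toy, np.full_like(y_toy, y_toy.mean()))
model_mse = mean_squared_error(y_toy, model.predict(X_toy))
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {model_mse:.4f}")
```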

In [2]:
#Answer 2

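import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Create a simple dataset
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([2, 3.8, 6, 8.2, 10, 12.1])

# Define the number of trees (weak learners) and learning rate
n_trees = 100
learning_rate = 0.1

# Initialize the initial prediction as the mean of target values
initial_prediction = np.mean(y)
predictions = np.full(y.shape, initial_prediction)
trees = []  # keep the fitted trees so new inputs can be predicted later

# Implement gradient boosting algorithm
for i in range(n_trees):
    # For squared-error loss, the negative gradient is the residual y - prediction
    residuals = y - predictions
    # Fit a weak learner (a depth-1 regression tree) to the residuals.
    # The weak learner must depend on X: a constant such as np.mean(residuals)
    # is 0 when the model starts from the mean, so nothing would be learned.
    tree = DecisionTreeRegressor(max_depth=1, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)
    # Add the tree's predictions, scaled by the learning rate, to the ensemble
    predictions += learning_rate * tree.predict(X)

# Evaluate the model (on the training data)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")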


In [3]:
# Q3. Experiment with different hyperparameters to optimize the performance of the model:


from sklearn.tree import DecisionTreeRegressor

# Note: every configuration is scored on the same data it was trained on, so
# larger n_trees / learning_rate values will tend to win. For a real model,
# compare configurations on a held-out set or with cross-validation.
best_mse = float('inf')
best_r2 = None
best_n_trees = None
best_learning_rate = None

for n_trees in [50, 100, 150, 200]:
    for learning_rate in [0.01, 0.1, 0.2, 0.3]:
        initial_prediction = np.mean(y)
        predictions = np.full(y.shape, initial_prediction)

        for i in range(n_trees):
            residuals = y - predictions
            # Fit a depth-1 regression tree to the residuals and take a scaled step
            tree = DecisionTreeRegressor(max_depth=1, random_state=0)
            tree.fit(X, residuals)
            predictions += learning_rate * tree.predict(X)

        mse = mean_squared_error(y, predictions)
        r2 = r2_score(y, predictions)

        # Keep the configuration with the lowest MSE and record its R-squared
        if mse < best_mse:
            best_mse = mse
            best_r2 = r2
            best_n_trees = n_trees
            best_learning_rate = learning_rate

print(f"Best Mean Squared Error: {best_mse:.4f}")
print(f"Best R-squared: {best_r2:.4f}")
print(f"Best Number of Trees: {best_n_trees}")
print(f"Best Learning Rate: {best_learning_rate}")
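Because the search above scores every setting on the data it was trained on, it cannot detect overfitting. A more reliable sketch uses scikit-learn's `GridSearchCV` with `GradientBoostingRegressor` (depth-1 trees) and cross-validation over the same grid. The fold setup (`KFold` with 3 shuffled splits, `random_state=0`) is an assumption; with only six points each validation fold holds two samples, so the selected values are illustrative rather than trustworthy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# The six-point dataset from Answer 2
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([2, 3.8, 6, 8.2, 10, 12.1])

param_grid = {
    "n_estimators": [50, 100, 150, 200],
    "learning_rate": [0.01, 0.1, 0.2, 0.3],
}

# Each configuration is scored on folds it was not trained on
search = GridSearchCV(
    GradientBoostingRegressor(max_depth=1, random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated MSE: {-search.best_score_:.4f}")
```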


In [None]:
# Q4. What is a weak learner in Gradient Boosting?
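# A weak learner in Gradient Boosting is a simple, high-bias model that on its own performs only somewhat better than a trivial baseline (random guessing in classification, predicting the mean in regression). Shallow decision trees, such as depth-1 stumps, are the usual choice, but other simple models such as linear regression can also be used.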
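A decision stump, the most common weak learner, can be written in a few lines of NumPy. The hypothetical `fit_stump` and `predict_stump` helpers below (illustrative names, not a library API) try every split point of a single feature and predict the mean target on each side. A stump can output only two values, so it captures just one coarse step in the data; here it is compared with the predict-the-mean baseline on the dataset from Answer 2.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (decision stump) to targets r on a 1-D feature x.

    Tries every split between consecutive sorted x values and returns
    (threshold, left_value, right_value) with the lowest squared error.
    Assumes x contains at least two distinct values.
    """
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best = (np.inf, None, None, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between equal feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    """Predict left_value for x <= threshold, right_value otherwise."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 3.8, 6, 8.2, 10, 12.1])

stump = fit_stump(x, y)
stump_mse = np.mean((y - predict_stump(stump, x)) ** 2)
mean_mse = np.mean((y - y.mean()) ** 2)
print(f"stump: {stump}, stump MSE: {stump_mse:.4f}, mean-only MSE: {mean_mse:.4f}")
```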

# Q5. What is the intuition behind the Gradient Boosting algorithm?
# The intuition behind Gradient Boosting is to iteratively correct the errors made by the previous weak learners. Each new weak learner is trained to predict the residuals (the differences between the true values and the current predictions) of the ensemble so far, and adding a scaled version of its predictions removes part of the remaining error. Repeating this step reduces the training error iteration by iteration, so many weak learners together form a strong model.
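To see the error-correcting loop in action, the sketch below starts from the mean, repeatedly fits a depth-1 `DecisionTreeRegressor` to the current residuals of the six-point dataset from Answer 2, and records the training MSE after each step. Because each tree predicts the mean residual in each leaf, a step with learning rate 0.1 cannot increase the squared training error. The 50 iterations and the learning rate are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([1, 2, 3, 4, 5, 6], dtype=float).reshape(-1, 1)
y = np.array([2, 3.8, 6, 8.2, 10, 12.1])

learning_rate = 0.1
predictions = np.full_like(y, y.mean())  # start from the mean
mse_history = [np.mean((y - predictions) ** 2)]

for _ in range(50):
    residuals = y - predictions  # errors of the ensemble so far
    # Learn to predict those errors with a weak learner
    tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, residuals)
    predictions += learning_rate * tree.predict(X)  # take a small corrective step
    mse_history.append(np.mean((y - predictions) ** 2))

# Training MSE every 10 iterations
print([round(m, 3) for m in mse_history[::10]])
```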

# Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?
# The Gradient Boosting algorithm builds an ensemble of weak learners iteratively. At each iteration, it fits a new weak learner to the negative gradient of the loss function with respect to the current predictions (for squared-error loss, simply the residuals). The learner's predictions, scaled by the learning rate, are then added to the ensemble, so the final prediction is the initial constant prediction plus the learning-rate-weighted sum of the predictions of all learners.
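This additive structure can be checked on a fitted scikit-learn model. The sketch below (toy data and settings are assumptions) rebuilds the predictions of `GradientBoostingRegressor` from its documented `init_` and `estimators_` attributes as the initial constant plus `learning_rate` times the sum of the tree outputs. The match is expected for the default squared-error loss, where each tree's fitted leaf values are added as they are.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([1, 2, 3, 4, 5, 6], dtype=float).reshape(-1, 1)
y = np.array([2, 3.8, 6, 8.2, 10, 12.1])

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0)
model.fit(X, y)

# Initial constant prediction (the mean of y for the default squared-error loss)
initial = model.init_.predict(X)
# estimators_ has shape (n_estimators, 1); each entry is one fitted regression tree
tree_sum = sum(stage[0].predict(X) for stage in model.estimators_)
manual = initial + model.learning_rate * tree_sum

print(np.allclose(manual, model.predict(X)))
```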

# Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?

# 1. Initialize the model with a constant prediction: the value that minimizes the loss, which for squared-error loss is the average target value.
# 2. Compute the pseudo-residuals: the negative gradient of the loss function with respect to the current predictions. For squared-error loss these are the ordinary residuals y - F(x).
# 3. Fit a weak learner (e.g., a shallow decision tree) to the pseudo-residuals.
# 4. Multiply the weak learner's predictions by the learning rate and add them to the ensemble.
# 5. Repeat steps 2-4 for a predefined number of iterations.
# 6. The final prediction is the initial constant plus the learning-rate-scaled sum of the predictions of all weak learners.
# Each iteration is a gradient-descent step on the loss in function space, which is how the procedure optimizes the loss and improves predictive performance.
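The steps can be written compactly for a differentiable loss L, learning rate ν and M boosting iterations. This is a simplified form with a fixed step size; many implementations also re-optimize the values in each tree's leaves (a line search) for losses other than squared error.

```latex
\begin{aligned}
&\text{1. } F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
  \qquad \big(F_0 = \bar{y} \text{ for } L(y, F) = \tfrac{1}{2}(y - F)^2\big) \\
&\text{2. } r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}
  \qquad \big(r_{im} = y_i - F_{m-1}(x_i) \text{ for squared error}\big) \\
&\text{3. } h_m = \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
&\text{4. } F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1 \\
&\text{5. repeat steps 2 to 4 for } m = 1, \dots, M \\
&\text{6. } \hat{y}(x) = F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
```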