# Assignment no 71 (Gradient Boost) (17.4.23)

### Q1. What is Gradient Boosting Regression?

**Ans-** 
- Gradient Boosting Regression is a supervised machine learning technique that models the relationship between one or more input features (X) and a continuous target variable (y).
- Besides predictions, a fitted model can report which factors (Xi) most influence the dependent variable (y) through feature importances. The same boosting framework is used for regression tasks, among others.
- It gives a prediction model in the form of an ensemble of weak prediction models, typically shallow decision trees, added one at a time so that each new tree corrects the errors of the ensemble built so far.

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [1]:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

In [11]:
# Generate some example data

from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

In [12]:
# Split the data into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test  = train_test_split(X, y, test_size=0.3, random_state=42)

In [14]:
# Train the gradient boosting model

from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor() # creating an object of Gradient Boost Regressor

gbr.fit(X_train, y_train)

In [17]:
# Make predictions on the test set

y_pred = gbr.predict(X_test)
y_pred

array([-11.49735754,  -4.03523658,  38.13667388, -53.73729762,
       -42.10841813, -10.70259687,  34.83090542, -43.75183739,
       -62.53762241,  47.20340744, -10.48511194, -31.101997  ,
        -3.09548915, -18.90302248,  10.89834179,  13.8306622 ,
       -22.78335407,   9.18346273, -28.16330545, -19.91097178,
       -29.35283188,  -4.33553175,  39.49646116,  36.87669352,
         2.97184675, -27.37768328,  46.95889601, -56.20913972,
        79.56187483,  36.48045401])

In [18]:
# Evaluate the model's performance

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean squared error for Gradient Boost Regressor is {mse:.4f}.")
print(f"R-squared score for Gradient Boost Regressor is {r2:.4f}.")

Mean squared error for Gradient Boost Regressor is 85.6359.
R-squared score for Gradient Boost Regressor is 0.9339.
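
The cells above use scikit-learn's `GradientBoostingRegressor`. Since the question asks for a from-scratch version, here is a minimal sketch that uses only NumPy. It assumes squared-error loss and uses one-split decision stumps as weak learners. The helper names (`fit_stump`, `predict_stump`, `fit_gradient_boosting`, `predict_gradient_boosting`) and the synthetic dataset are illustrative choices, not part of the original notebook.

```python
import numpy as np


def fit_stump(X, targets):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            left_value, right_value = targets[left].mean(), targets[~left].mean()
            sse = (np.sum((targets[left] - left_value) ** 2)
                   + np.sum((targets[~left] - right_value) ** 2))
            if sse < best_sse:
                best, best_sse = (j, t, left_value, right_value), sse
    return best


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.2):
    init = y.mean()                       # initial model: a constant prediction
    pred = np.full(len(y), init)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred              # negative gradient of squared-error loss
        stump = fit_stump(X, residuals)   # weak learner fit to the residuals
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return init, learning_rate, stumps


def predict_gradient_boosting(model, X):
    init, learning_rate, stumps = model
    pred = np.full(X.shape[0], init)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred


# Small synthetic regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

model = fit_gradient_boosting(X_train, y_train)
y_pred = predict_gradient_boosting(model, X_test)

# Metrics computed directly with NumPy
mse = np.mean((y_test - y_pred) ** 2)
r2 = 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
baseline_mse = np.mean((y_test - y_train.mean()) ** 2)  # always predict the training mean

print(f"Baseline MSE (mean prediction): {baseline_mse:.4f}")
print(f"Boosted stumps MSE: {mse:.4f}")
print(f"Boosted stumps R-squared: {r2:.4f}")
```

The stump search is exhaustive and slow on large data; it is meant to show the mechanics, not to compete with a library implementation.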


### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

In [19]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

In [20]:
# Defining the hyperparameter grid

param_grid = {
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [1, 2, 3, 4]
}


In [21]:
# Create a grid search object

grid_search = GridSearchCV(gbr, param_grid, cv=5)

In [22]:
# Fit the grid search object to the data

grid_search.fit(X_train, y_train)

In [23]:
# Print the best hyperparameters

print(f"The best hyperparameters for our gradient boost regression model are {grid_search.best_params_}.")

The best hyperparameters for our gradient boost regression model are {'learning_rate': 0.2, 'max_depth': 1, 'n_estimators': 200}.


In [24]:
# Make predictions on the test set using the best model

y_pred = grid_search.predict(X_test)

In [26]:
# Evaluate the model's performance

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean squared error for Gradient Boost Regressor is {mse:.4f}")
print()
print(f"R-squared score for Gradient Boost Regressor is {r2:.4f}")

Mean squared error for Gradient Boost Regressor is 43.9197

R-squared score for Gradient Boost Regressor is 0.9661
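
The question also mentions random search, and `RandomizedSearchCV` is imported above but not used. A possible sketch follows, sampling from roughly the same ranges as the grid. The distributions, `n_iter=20`, and the `random_state` values are assumptions made for this example.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Distributions to sample from (ranges mirror the grid used above)
param_distributions = {
    "learning_rate": uniform(0.01, 0.19),  # continuous on [0.01, 0.20]
    "n_estimators": randint(10, 201),      # integers from 10 to 200
    "max_depth": randint(1, 5),            # integers from 1 to 4
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=20,        # number of sampled hyperparameter combinations
    cv=5,
    random_state=0,
)
random_search.fit(X_train, y_train)

best_params = random_search.best_params_
test_r2 = random_search.score(X_test, y_test)  # R-squared of the refit best model

print(f"Best hyperparameters from random search: {best_params}")
print(f"R-squared score on the test set: {test_r2:.4f}")
```

Random search evaluates a fixed number of sampled combinations (20 here, against 64 in the full grid), which is useful when the grid would be too large to search exhaustively.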


### Q4. What is a weak learner in Gradient Boosting?

**Ans-** 
1. A weak learner is a simple model that performs only slightly better than random chance. 
2. In the context of gradient boosting, *weak learners are combined iteratively to form a strong learner*. 
3. The idea is to start with a single weak learner, typically a decision tree, and then train additional weak learners that focus on the areas where the previous weak learners performed poorly.
4. This process continues until a predetermined stopping condition is met, such as reaching a set number of weak learners or a plateau in the model's performance.

**The goal of boosting is:**
1. To make small adjustments to the prediction function gradually, evolving its shape in a slow and controlled manner to combat overfitting.
2. To leave the building of the complex predictive function to the boosting algorithm, not to the weak learner being boosted.

This approach allows for the creation of a strong learner from many simple models, rather than relying on a single complex model that may be prone to overfitting.
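
To make the weak versus strong distinction concrete, the sketch below compares a single depth-1 decision tree (a stump) with a boosted ensemble of 200 such stumps on the same kind of data used earlier. The hyperparameters and `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# One weak learner on its own: a tree with a single split
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_train, y_train)
stump_r2 = r2_score(y_test, stump.predict(X_test))

# Many weak learners combined by gradient boosting
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=200, learning_rate=0.2, random_state=0
).fit(X_train, y_train)
boosted_r2 = r2_score(y_test, boosted.predict(X_test))

print(f"R-squared score of a single stump: {stump_r2:.4f}")
print(f"R-squared score of 200 boosted stumps: {boosted_r2:.4f}")
```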

### Q5. What is the intuition behind the Gradient Boosting algorithm?

**Ans-** 
The intuition behind the Gradient Boosting algorithm is to iteratively improve the predictions of a model by training additional models that focus on the areas where the previous models performed poorly. This is done by combining multiple weak learners, which are simple models that perform only slightly better than random chance, to form a strong learner.
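
One way to see this iterative improvement is to track the error as trees are added. The sketch below uses scikit-learn's `staged_predict`, which yields the ensemble's predictions after each boosting stage. The dataset and hyperparameters are assumptions that mirror the earlier cells.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

# Training error of the ensemble after 1, 2, ..., 100 trees
train_mse = [mean_squared_error(y_train, pred) for pred in gbr.staged_predict(X_train)]

for n_trees in (1, 10, 50, 100):
    print(f"{n_trees:3d} trees: training MSE = {train_mse[n_trees - 1]:.2f}")
```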

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

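**Ans-** The Gradient Boosting algorithm builds its ensemble sequentially. It starts from a simple initial prediction, such as the mean of the target, and at each iteration fits a new weak learner, typically a shallow decision tree, to the errors of the current ensemble (the residuals for squared-error loss; in general, the negative gradient of the loss). The new learner's output is scaled by the learning rate and added to the ensemble, so each weak learner focuses on the areas where the previous models performed poorly. Summing many such small corrections forms a strong learner.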
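
The construction can be written compactly as follows. The symbols are notation assumed for this sketch: $L$ is the loss, $F_m$ the ensemble after $m$ stages, $h_m$ the weak learner added at stage $m$, $\nu$ the learning rate, and $M$ the number of stages.

```latex
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
h_m &\approx \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \quad m = 1, \dots, M
\end{aligned}
```

For squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the initial model $F_0$ is the mean of the targets and $r_{im} = y_i - F_{m-1}(x_i)$, the ordinary residual.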

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

**Ans-**
The mathematical intuition behind the Gradient Boosting algorithm involves several steps. Here is a brief overview of the process:

1. Define an initial model: An initial model F0 is defined to predict the target variable y. This model is associated with a residual (y - F0).

2. Fit a new model to the residuals: A new model h1 is fit to the residuals from the previous step.

3. Combine the models: Now, F0 and h1 are combined to give F1, the boosted version of F0. The mean squared error of F1 on the training data will typically be lower than that of F0.

The process is repeated iteratively, with each new model being fit to the residuals of the current combined model and then added to it to form a stronger learner. The goal is to gradually improve the predictions of the model by making small adjustments to the prediction function in a slow and controlled manner, in order to combat overfitting.
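
The three steps can be traced by hand on a tiny dataset. In the sketch below the initial model is written `F0`, the model fit to the residuals `h1`, and the combined model `F1`; these names, the six data points, and the use of a one-split stump with no learning rate (a full step) are assumptions for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 3.0, 4.0, 10.0, 11.0, 12.0])

# Step 1: initial model F0 predicts the mean of y everywhere
F0 = np.full_like(y, y.mean())
residuals = y - F0


# Step 2: fit a one-split stump h1 to the residuals
def stump_predictions(x, r, threshold):
    left = x <= threshold
    return np.where(left, r[left].mean(), r[~left].mean())


thresholds = (x[:-1] + x[1:]) / 2.0  # midpoints between neighbouring x values
best_t = min(
    thresholds,
    key=lambda t: np.sum((residuals - stump_predictions(x, residuals, t)) ** 2),
)
h1 = stump_predictions(x, residuals, best_t)

# Step 3: combine the models
F1 = F0 + h1

mse0 = np.mean((y - F0) ** 2)
mse1 = np.mean((y - F1) ** 2)
print(f"Chosen threshold: {best_t}")     # 3.5
print(f"MSE of F0: {mse0:.3f}")          # 16.667
print(f"MSE of F1: {mse1:.3f}")          # 0.667
```

Repeating steps 2 and 3 on the residuals `y - F1` would give `F2`, and so on.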