# Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression (GBR) is a machine learning algorithm for regression problems that builds an ensemble of decision trees to make predictions. It is a boosting algorithm: it trains a sequence of weak models, typically shallow decision trees, in a greedy, stage-wise manner to minimize a loss function. Each new model is fitted to reduce the errors left by the models before it, so the ensemble becomes more accurate as trees are added. The term "gradient" refers to the fact that each new model is fitted to the negative gradient of the loss function with respect to the current predictions, which amounts to gradient descent in function space.

# Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [1]:
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_targets=1, random_state=42, shuffle=True)

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [3]:
from sklearn.ensemble import GradientBoostingRegressor

In [4]:
reg = GradientBoostingRegressor()

In [5]:
reg.fit(X_train, y_train)

In [6]:
y_pred = reg.predict(X_test)

In [7]:
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_pred))

660.7153591266557


In [8]:
from sklearn.metrics import r2_score
print(r2_score(y_test, y_pred))

0.9152930790674478
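
The cells above use scikit-learn's `GradientBoostingRegressor`. Since the question asks for a from-scratch version, here is a minimal sketch using only NumPy. It assumes squared error loss and uses depth-1 trees (stumps) as weak learners. The helper names (`fit_stump`, `predict_stump`, `fit_gbr`, `predict_gbr`, `mse`, `r2`) and the synthetic dataset are illustrative choices for this sketch, not part of any library.

```python
import numpy as np


def fit_stump(X, r):
    """Fit a depth-1 regression tree (a stump) to targets r by exhaustive search."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            left_mean, right_mean = r[left].mean(), r[~left].mean()
            sse = ((r[left] - left_mean) ** 2).sum() + ((r[~left] - right_mean) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)


def fit_gbr(X, y, n_estimators=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to the residuals."""
    f0 = y.mean()  # the constant that minimizes squared error
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(X, residuals)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict_gbr(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred


def mse(y_true, y_hat):
    return float(np.mean((y_true - y_hat) ** 2))


def r2(y_true, y_hat):
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Small synthetic dataset (an assumption made for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=120)
X_train, X_test, y_train, y_test = X[:84], X[84:], y[:84], y[84:]

model = fit_gbr(X_train, y_train, n_estimators=100, learning_rate=0.1)
mse_train = mse(y_train, predict_gbr(model, X_train))
mse_test = mse(y_test, predict_gbr(model, X_test))
r2_test = r2(y_test, predict_gbr(model, X_test))
print(mse_train, mse_test, r2_test)
```

The exhaustive threshold search is quadratic in the number of samples per feature, which is acceptable for a small dataset like this one but would need a smarter split search on larger data.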


# Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

In [9]:
from sklearn.model_selection import GridSearchCV

In [10]:
## Hyperparameter Tuning
parameter = {
    'n_estimators': [50, 100, 150, 200],
    'learning_rate': [0.001, 0.01, 0.1, 0.2, 0.3],
    'max_depth': [1, 2, 3, 4]
}

regressor = GradientBoostingRegressor()

In [11]:
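# Use a regression scorer. 'accuracy' is a classification metric: on a continuous
# target every candidate scores NaN and the search falls back to the first grid point.
# The outputs recorded in the cells below appear to come from a run with scoring='accuracy'.
regressorCV = GridSearchCV(regressor, param_grid=parameter, cv=2, scoring='neg_mean_squared_error')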

In [12]:
import warnings
warnings.filterwarnings('ignore')

In [13]:
regressorCV.fit(X_train,y_train)

In [14]:
regressorCV.best_params_

{'learning_rate': 0.001, 'max_depth': 1, 'n_estimators': 50}

In [15]:
reg = GradientBoostingRegressor(n_estimators=50 , max_depth=1,learning_rate=0.001)

In [16]:
reg.fit(X_train, y_train)

In [17]:
y_pred = reg.predict(X_test)

In [18]:
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_pred))

9755.49450123349


In [21]:
from sklearn.metrics import r2_score
print(r2_score(y_test, y_pred))
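
The question also mentions random search. The following is a hedged sketch of how the same search space could be sampled with `RandomizedSearchCV`, assuming the same synthetic dataset and split as in Q2 and a regression scorer (`neg_mean_squared_error`). The values `n_iter=10` and `cv=3` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_samples=100, n_features=4, n_targets=1, random_state=42, shuffle=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_distributions = {
    "n_estimators": [50, 100, 150, 200],
    "learning_rate": [0.001, 0.01, 0.1, 0.2, 0.3],
    "max_depth": [1, 2, 3, 4],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,                         # number of sampled candidates (assumed)
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(-search.best_score_)             # mean cross-validated MSE of the best candidate
print(-search.score(X_test, y_test))   # test-set MSE of the refitted best model
```

Because the scorer is negated MSE, `best_score_` is at most zero. Checking `search.cv_results_["mean_test_score"]` for NaN values is a quick way to catch a scorer that does not apply to the problem.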

# Q4. What is a weak learner in Gradient Boosting?

In Gradient Boosting, a weak learner is a simple model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). It is usually a decision tree with a small depth and only a few splits. In each boosting iteration, a weak learner is trained on the residuals (the actual values minus the current predictions) left by the previous iterations. The weak learners are then combined to create a strong learner that can make accurate predictions.
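
To make the idea concrete, here is a small sketch comparing a depth-1 tree (a stump) against the trivial "always predict the mean" baseline. The one-dimensional sine dataset is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data (assumed for this sketch): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# Trivial baseline: predict the mean of y for every sample.
baseline_mse = np.mean((y - y.mean()) ** 2)

# Weak learner: a tree with a single split, so it can output only two values.
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_mse = np.mean((y - stump.predict(X)) ** 2)

print(baseline_mse, stump_mse)
```

The stump improves on the baseline but cannot follow the curve on its own, which is the sense in which it is "weak".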

# Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind the Gradient Boosting algorithm is to iteratively add models to the ensemble, where each model is trained to correct the errors of the ensemble built so far. At each iteration, the new model is fitted to the negative gradient of the loss function with respect to the current ensemble's predictions, hence the name Gradient Boosting. For squared error loss, this negative gradient is exactly the residual (actual value minus current prediction), so each new model is fitted to the residual errors. This process continues until the error stops improving or a specified number of iterations is reached. The final model is the sum of the initial prediction and all the weak learners, with each weak learner's contribution scaled by the learning rate; unlike AdaBoost, the learners are not weighted by their individual performance on the training set. The result is a powerful prediction model that can capture complex patterns in the data.
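
The link between gradients and residuals can be checked numerically. The sketch below assumes the loss `0.5 * (y - f) ** 2` (the factor 0.5 is a convention that makes the gradient exactly `f - y`) and compares a finite-difference gradient with the residuals on a few made-up values.

```python
import numpy as np


def squared_error_loss(y, f):
    # Per-sample loss; the 0.5 factor is a convention assumed for this sketch.
    return 0.5 * (y - f) ** 2


y = np.array([3.0, -1.0, 2.5])   # actual values (made up)
f = np.array([2.0, 0.5, 2.5])    # current predictions (made up)

# Central finite difference of the loss with respect to each prediction.
eps = 1e-6
numeric_grad = (squared_error_loss(y, f + eps) - squared_error_loss(y, f - eps)) / (2 * eps)

residuals = y - f
print(np.allclose(-numeric_grad, residuals))  # True: negative gradient equals residual
```

For other losses, such as absolute error, the negative gradient is no longer the plain residual, which is why the algorithm is stated in terms of gradients.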

# Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

The Gradient Boosting algorithm builds an ensemble of weak learners sequentially. It starts from a simple initial prediction, usually a constant such as the mean of the target, and calculates the error (residual) between the actual values and the predicted values. Then it fits a decision tree to the residuals and adds that tree's output to the predictions. This process is repeated iteratively, with each new tree fitted to the residuals of the current ensemble, until the residuals can no longer be reduced or a predefined number of trees is reached. Finally, the contributions of all the models are summed to make the final prediction.

More generally, at each iteration the algorithm fits a weak learner to the negative gradient of the loss function with respect to the current predictions. Adding that learner moves the predictions in the direction that reduces the loss, which is why the method is called "Gradient Boosting". The learning rate is a parameter that scales the contribution of each model to the final prediction. A lower learning rate means that each model has a smaller influence on the final prediction, which helps reduce overfitting but usually requires more trees.
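
The effect of the learning rate can be observed with scikit-learn's `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch below assumes the same synthetic dataset as in Q2 and compares two learning rates (0.01 and 0.1, chosen for illustration) on the training error.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for lr in (0.01, 0.1):
    model = GradientBoostingRegressor(n_estimators=100, learning_rate=lr, random_state=42)
    model.fit(X_train, y_train)
    # Training MSE after 1, 2, ..., 100 trees.
    results[lr] = [mean_squared_error(y_train, p) for p in model.staged_predict(X_train)]
    print(lr, results[lr][0], results[lr][-1])
```

With the smaller learning rate, each tree moves the predictions less, so the training error falls more slowly for the same number of trees. This is the trade-off behind pairing a low learning rate with a larger `n_estimators`.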

# Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

The following are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm:

* Define the loss function: The first step is to define the loss function that measures the difference between the predicted values and the actual values. The most commonly used loss function is the mean squared error.

* Initialize the model: The second step is to initialize the model with a constant value that minimizes the loss function. For the mean squared error, this constant is the mean of the target variable.

* Fit the weak learner: In the third step, a weak learner, typically a shallow decision tree, is fitted on the data to make predictions. A weak learner is a model that performs only slightly better than a trivial baseline such as always predicting the mean.

* Calculate the residual error: The fourth step is to calculate the residual error by subtracting the predicted values from the actual values.

* Update the model: The fifth step is to update the model by fitting a new weak learner to the residual errors. The goal is to find a new weak learner that can predict the residual errors of the previous model.

* Repeat the process: Steps 4 and 5 are repeated until the desired level of accuracy is achieved or until the number of weak learners reaches a predefined limit.

* Make predictions: The final step is to make predictions by adding the initial constant to the sum of the predictions of all weak learners in the ensemble, each scaled by the learning rate.
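
The steps above can be written compactly in the usual notation. The symbols are assumptions introduced here for illustration: $L$ is the loss, $F_m$ is the ensemble after $m$ rounds, $h_m$ is the weak learner added in round $m$, $\nu$ is the learning rate, $M$ is the number of rounds, and $r_{im}$ is the pseudo-residual of sample $i$ in round $m$.

```latex
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
h_m &= \text{weak learner fitted to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \quad m = 1, \dots, M \\
\hat{y}(x) &= F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
```

For the squared error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the first line gives $F_0 = \bar{y}$ (the mean of the targets) and the pseudo-residual reduces to the ordinary residual $r_{im} = y_i - F_{m-1}(x_i)$.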