## 1.

Gradient Boosting Regression is a popular machine learning technique for regression tasks; the same boosting framework is also used for classification. It is an ensemble learning method that combines the predictions of many weak learners (usually shallow decision trees) to create a strong predictive model. The "gradient" in the name refers to the gradient of the loss function: each new learner is fitted to the negative gradient of the loss with respect to the current predictions.

Here's how Gradient Boosting Regression works:

1. Base Learners: Gradient Boosting starts from a simple initial prediction, typically a constant such as the mean of the target. The base learners added afterwards are weak models, usually shallow decision trees. A tree with a single split on one feature is called a "stump".

2. Residual Learning: It then sequentially adds more decision trees to the ensemble. Each new tree is trained to correct the errors made by the previously added trees. Instead of fitting the target values directly, the new tree is trained on the residuals (the differences between the actual target values and the predictions made by the current ensemble).

3. Gradient Descent: Boosting performs gradient descent in function space rather than over a fixed set of parameters. At each iteration, the algorithm computes the negative gradient of the loss function (usually squared error for regression problems) with respect to the current predictions and fits the new tree to it. For squared error, this negative gradient is exactly the residual, so fitting a tree to the residuals is one gradient descent step.

4. Learning Rate: To control the contribution of each tree to the ensemble, a learning rate is introduced. The learning rate scales the predictions made by each tree before adding them to the overall ensemble. A lower learning rate can improve the generalization of the model but may require more trees to achieve high accuracy.

5. Ensemble Prediction: The final prediction is the initial prediction plus the sum of the predictions of all the individual trees, each scaled by the learning rate.
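
The steps above can be sketched in a few lines of Python. This is a minimal illustration for squared error only, not scikit-learn's implementation; the helper names `fit_boosting` and `predict_boosting` and the synthetic data are assumptions made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=1):
    """Fit a squared-error boosting ensemble of shallow trees (illustrative helper)."""
    baseline = float(np.mean(y))                # constant initial prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction              # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                  # the new tree learns the current errors
        prediction += learning_rate * tree.predict(X)  # shrink the tree, then add it
        trees.append(tree)
    return baseline, trees


def predict_boosting(X, baseline, trees, learning_rate=0.1):
    """Baseline plus the learning-rate-scaled sum of all tree predictions."""
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction


# Hypothetical synthetic data, similar in shape to the data used in section 2.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = 4 * X_demo[:, 0] + 3 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

baseline, trees = fit_boosting(X_demo, y_demo)
fitted = predict_boosting(X_demo, baseline, trees)

baseline_mse = np.mean((y_demo - baseline) ** 2)
boosted_mse = np.mean((y_demo - fitted) ** 2)
print("training MSE of the constant baseline:", baseline_mse)
print("training MSE after boosting:", boosted_mse)
```

The same `learning_rate` has to be used for fitting and for prediction, since it is part of how each tree enters the ensemble.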

## 2.

In [51]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [52]:
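np.random.seed(42)  # fix the seed so the synthetic data is reproducible
X = np.random.rand(100, 2)  # 100 samples, 2 features drawn uniformly from [0, 1)
y = 4 * X[:, 0] + 3 * X[:, 1] + 2 * np.random.randn(100)  # linear signal plus Gaussian noise (std 2)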

In [53]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [54]:
GBC = GradientBoostingRegressor()

In [55]:
GBC.fit(X_train,y_train)

GradientBoostingRegressor()

In [56]:
y_pred = GBC.predict(X_test)

In [57]:
MSE = mean_squared_error(y_test,y_pred)
R2Score = r2_score(y_test,y_pred)

In [58]:
print(MSE)
print(R2Score)

3.5603611719250954
0.3436085306451624


## 3.

In [59]:
from sklearn.model_selection import GridSearchCV

In [60]:
parameter = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200, 300],
    'max_depth': [1, 2, 3],
}

In [61]:
GBC1 = GradientBoostingRegressor()

In [62]:
GSC = GridSearchCV(GBC1, parameter, cv = 5, scoring='r2')

In [63]:
GSC.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=GradientBoostingRegressor(),
             param_grid={'learning_rate': [0.01, 0.1, 0.2],
                         'max_depth': [1, 2, 3],
                         'n_estimators': [100, 200, 300]},
             scoring='r2')

In [64]:
GSC.best_params_

{'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 200}

In [65]:
GBR = GradientBoostingRegressor(learning_rate = 0.01, max_depth = 1, n_estimators = 200)

In [66]:
GBR.fit(X_train,y_train)

GradientBoostingRegressor(learning_rate=0.01, max_depth=1, n_estimators=200)

In [67]:
y_pred1 = GBR.predict(X_test)

In [68]:
MSE1 = mean_squared_error(y_test,y_pred1)
R2Score1 = r2_score(y_test,y_pred1)

In [69]:
print(MSE1)
print(R2Score1)

3.6087014123195202
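
Refitting a model by hand with the best parameters is not strictly necessary. As a hedged sketch: with the default `refit=True`, `GridSearchCV` refits the best configuration on the whole training set and exposes it as `best_estimator_`, and `best_score_` holds its mean cross-validated score. The data and the smaller grid below are assumptions chosen to keep the example quick, so the selected parameters need not match the ones above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data with the same structure as in section 2.
rng = np.random.default_rng(42)
X_grid = rng.random((100, 2))
y_grid = 4 * X_grid[:, 0] + 3 * X_grid[:, 1] + 2 * rng.normal(size=100)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_grid, y_grid, test_size=0.3, random_state=42
)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    {"learning_rate": [0.01, 0.1], "n_estimators": [50, 100], "max_depth": [1, 2]},
    cv=5,
    scoring="r2",
)
search.fit(X_tr, y_tr)

best_model = search.best_estimator_   # already refit on all of X_tr
print(search.best_params_)
print(search.best_score_)             # mean cross-validated R^2 of the best setting
print(best_model.score(X_te, y_te))   # R^2 on the held-out test set
```

Comparing `best_score_` with the test-set score is a quick check on whether the setting chosen by cross-validation also holds up on unseen data.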
0.3346964793421464


## 4.

In Gradient Boosting, a weak learner (also called a base learner) is a simple model that performs only slightly better than a trivial baseline, such as random guessing in classification or always predicting the mean in regression. In practice, the weak learners are usually shallow decision trees, which have modest predictive power on their own.
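
To make the idea concrete, the sketch below compares a single decision stump with an ensemble of 100 boosted stumps on the same training data. The data is synthetic and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic regression data.
rng = np.random.default_rng(1)
X_weak = rng.random((300, 2))
y_weak = 4 * X_weak[:, 0] + 3 * X_weak[:, 1] + rng.normal(scale=0.5, size=300)

# A single stump: one split on one feature, so only two possible output values.
stump = DecisionTreeRegressor(max_depth=1).fit(X_weak, y_weak)

# The same kind of stump, but 100 of them combined by gradient boosting.
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=100, random_state=0
).fit(X_weak, y_weak)

stump_r2 = r2_score(y_weak, stump.predict(X_weak))
boosted_r2 = r2_score(y_weak, boosted.predict(X_weak))
print("training R^2 of one stump:", stump_r2)
print("training R^2 of 100 boosted stumps:", boosted_r2)
```

Both scores are measured on the training data, so they show how much more the ensemble can fit than one stump, not how well either generalizes.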

## 5.

The intuition behind the Gradient Boosting algorithm can be best understood by breaking it down into its two key components: "Gradient" and "Boosting."

1. Boosting: Boosting is an ensemble learning technique that combines multiple weak learners (often decision trees) to create a strong learner. A weak learner is a model that performs slightly better than random chance, but it doesn't have high predictive power on its own. Boosting sequentially builds a series of weak learners, each focusing on the mistakes made by its predecessors. In Gradient Boosting, the final prediction is the sum of the predictions from all weak learners, each scaled by the same learning rate. Weighting learners by their accuracy is how AdaBoost works, not Gradient Boosting.

2. Gradient: The "gradient" in Gradient Boosting refers to the gradient of a loss function with respect to the predictions made by the ensemble. In simpler terms, it's a measure of how much the loss function will change if we make slight adjustments to the model's predictions. By moving the predictions in the direction of the negative gradient, each iteration reduces the loss.
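
For squared error, the negative gradient with respect to the predictions is simply the residual. The sketch below checks this numerically with a central finite difference. The loss is written with a factor of 0.5 (a common convention, assumed here so the gradient has no extra factor of 2), and the numbers are made up for illustration.

```python
import numpy as np


def half_squared_error(y_true, y_pred):
    """Loss 0.5 * sum((y - F)^2), viewed as a function of the predictions F."""
    return 0.5 * np.sum((y_true - y_pred) ** 2)


y_true = np.array([3.0, -1.0, 2.0])
y_pred = np.array([2.5, 0.0, 4.0])

# Central finite-difference estimate of d(loss) / d(prediction_i).
eps = 1e-6
numeric_gradient = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    bumped_up = y_pred.copy()
    bumped_down = y_pred.copy()
    bumped_up[i] += eps
    bumped_down[i] -= eps
    numeric_gradient[i] = (
        half_squared_error(y_true, bumped_up) - half_squared_error(y_true, bumped_down)
    ) / (2 * eps)

residuals = y_true - y_pred
print("negative gradient:", -numeric_gradient)
print("residuals:        ", residuals)
```

The two printed vectors agree up to floating-point error, which is why "fit the next learner to the residuals" and "fit the next learner to the negative gradient" describe the same step for this loss.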

Intuition Step by Step:

1. Initialization: The boosting process starts with a simple initial prediction, usually a constant such as the mean of the target values. These initial predictions are poor because they ignore the features entirely.

2. Residuals: To improve the predictions, we calculate the residuals, which are the differences between the true values and the predictions made by the current model. These residuals represent the errors that need to be corrected.

3. Fit the Next Weak Learner: The next weak learner is trained to predict the residuals of the previous model. It focuses on the mistakes made by the previous model and tries to correct them.

4. Weighted Combination: The predictions of the new weak learner are scaled by the learning rate and added to the predictions of the current ensemble.

5. Update Predictions: The updated predictions are again compared to the true values, and new residuals are calculated. The process of computing residuals, fitting a weak learner to them, and updating the ensemble continues for a specified number of iterations (the number of trees) or until early stopping halts it.

6. Final Prediction: The final prediction is the initial prediction plus the sum of the predictions from all weak learners, each scaled by the learning rate.
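
One way to watch this process is scikit-learn's `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch below, on assumed synthetic data, records the training error at every stage.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical synthetic regression data.
rng = np.random.default_rng(2)
X_stage = rng.random((200, 2))
y_stage = 4 * X_stage[:, 0] + 3 * X_stage[:, 1] + rng.normal(scale=0.5, size=200)

model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0
)
model.fit(X_stage, y_stage)

# staged_predict yields one prediction array per boosting stage (1 tree, 2 trees, ...).
stage_mse = [
    mean_squared_error(y_stage, stage_prediction)
    for stage_prediction in model.staged_predict(X_stage)
]

for n_trees in (1, 10, 50, 100):
    print(n_trees, "trees -> training MSE", stage_mse[n_trees - 1])
```

Running the same loop on a held-out set instead of the training set is a common way to pick the number of trees, since the held-out error eventually stops improving.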

## 6.

Gradient boosting is an ensemble learning algorithm that builds an ensemble of weak learners, typically decision trees, to make predictions. The algorithm works by iteratively adding new weak learners to the ensemble, each of which is trained to correct the errors of the previous learners.

The gradient boosting algorithm works as follows:

1. The algorithm starts with a baseline prediction, such as the mean of the target variable.
2. A weak learner is trained to predict the residuals, i.e. the differences between the actual target values and the baseline prediction.
3. The weak learner's output, scaled by the learning rate, is added to the current prediction, and the residuals are recomputed to reflect the errors that remain.
4. A new weak learner is trained to predict the updated residuals.
5. Steps 3 and 4 are repeated until the desired number of weak learners has been added to the ensemble.
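
A small worked example of this loop, using three made-up target values. To keep the arithmetic visible, it assumes an idealised weak learner that predicts the residuals exactly, which a real tree generally will not do, and a learning rate of 0.5.

```python
import numpy as np

y_toy = np.array([1.0, 2.0, 6.0])
learning_rate = 0.5

# Baseline: every sample gets the mean of the targets, here 3.0.
prediction = np.full_like(y_toy, y_toy.mean())

for round_index in range(3):
    residuals = y_toy - prediction
    # Idealised weak learner (an assumption): its output equals the residuals.
    learner_output = residuals
    prediction = prediction + learning_rate * learner_output
    print(f"round {round_index + 1}: residuals={residuals}, prediction={prediction}")

remaining = y_toy - prediction
# After round 1 the prediction is [2.0, 2.5, 4.5]; after round 3 it is
# [1.25, 2.125, 5.625], so the remaining residuals are [-0.25, -0.125, 0.375].
```

With a learning rate of 0.5 and a perfect learner, each round halves the residuals, which shows why a smaller learning rate needs more rounds to reach the same fit.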

## 7.

The mathematical intuition behind the Gradient Boosting algorithm can be built up in the following steps:

1. Choose a loss function. The loss function measures how far the model's predictions are from the true values. Common choices are squared error for regression and binary cross-entropy (log loss) for classification.

2. Initialize the model. The model starts from initial predictions, usually a constant chosen to minimize the loss (for squared error, the mean of the target). A simple fitted model can be used instead.

3. Calculate the gradients. The gradients are the derivatives of the loss function with respect to the model predictions. The negative gradients indicate the direction in which the model predictions need to be updated in order to reduce the loss.

4. Update the model predictions. A weak learner is fitted to the negative gradients, and its output, scaled by the learning rate, is added to the current predictions. The learning rate therefore sets the step size of each update.

5. Repeat steps 3-4 for a fixed number of iterations, or until the loss stops improving.
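
For squared error, these steps can be written out explicitly. The notation is introduced here for illustration and is not taken from the text above: $F_m$ is the ensemble after $m$ rounds, $h_m$ is the $m$-th weak learner, $\nu$ is the learning rate, and $r_{im}$ is the negative gradient for sample $i$ in round $m$.

```latex
\begin{aligned}
L\bigl(y_i, F(x_i)\bigr) &= \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2 \\
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) = \frac{1}{n}\sum_{i=1}^{n} y_i \\
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}} = y_i - F_{m-1}(x_i) \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad h_m \text{ fitted to } \{(x_i, r_{im})\}_{i=1}^{n}
\end{aligned}
```

The third line shows that, for this loss, the negative gradient is the ordinary residual. For other losses the same recipe applies, but $r_{im}$ is no longer the plain difference between target and prediction.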