## Question - 1
ans - 

Gradient Boosting Regression is a machine learning technique used for building regression models. It is a type of ensemble learning method that combines the predictions of multiple weak learners (typically decision trees) sequentially to create a strong predictive model.

## Question - 2
ans - 

In [1]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression


x, y = make_regression(n_samples = 1000 , n_features = 2 , noise=0.5 , random_state=0)


x_train , x_test , y_train , y_test = train_test_split(x, y , test_size= 0.20 , random_state = 0)



gb_regressor = GradientBoostingRegressor(n_estimators=100 , learning_rate=0.1)

gb_regressor.fit(x_train , y_train)

In [2]:
y_pred = gb_regressor.predict(x_test)

In [3]:
from sklearn.metrics import r2_score , mean_squared_error


print(r2_score(y_pred , y_test))
print(mean_squared_error(y_pred , y_test))

0.9930457722283429
19.958906902781127


## Question - 3
ans - 

In [4]:
from sklearn.model_selection import GridSearchCV


parameters = {'loss':['squared_error', 'absolute_error'],
             'learning_rate':[0.1],
             'n_estimators':[50,100,150],
             'criterion':['friedman_mse' , 'squared_error'],
             'max_depth':[3,4,5],
             'alpha':[0.3,0.6]}


grid_regressor = GridSearchCV(GradientBoostingRegressor() , param_grid= parameters)

In [5]:
grid_regressor.fit(x_train , y_train)

In [6]:
grid_regressor.best_params_

{'alpha': 0.6,
 'criterion': 'squared_error',
 'learning_rate': 0.1,
 'loss': 'squared_error',
 'max_depth': 3,
 'n_estimators': 150}

## Question - 4
ans - 

In the context of gradient boosting, a weak learner refers to a simple predictive model that performs slightly better than random guessing on a given learning task. Weak learners are typically simple models with limited complexity, such as shallow decision trees or linear models.

* Characteristics of Weak Learners:

1. Limited Complexity:

Weak learners are intentionally kept simple to prevent overfitting and improve generalization performance.
For example, decision trees with a small number of nodes or depth are commonly used as weak learners in gradient boosting.


2. Performance Slightly Better Than Random:

While weak learners may not perform well on their own, they should still provide some predictive power that is slightly better than random guessing.
Weak learners are often referred to as "weak" because their individual predictive performance is modest compared to more complex models.


3. Ensemble Learning:

In gradient boosting, weak learners are combined into a strong predictive model by sequentially adding them to the ensemble and optimizing their contributions to minimize the overall error of the model.

## Question - 5
ans - 

The intuition behind the Gradient Boosting algorithm can be understood by breaking down its key components and how they work together to build a strong predictive model. Here's a simplified explanation of the intuition behind Gradient Boosting:

1. Weak Learners:
Gradient Boosting combines multiple weak learners (typically decision trees) sequentially to form a strong ensemble model.
Each weak learner is trained on the residuals (errors) of the previous learners, focusing on capturing the remaining patterns in the data that were not captured by the previous models.

2. Sequential Model Building:
The algorithm builds the ensemble model iteratively, adding one weak learner at a time.
At each iteration, a new weak learner is trained to correct the errors made by the current ensemble of models.

3. Gradient Descent:
Gradient Boosting uses a gradient descent optimization technique to minimize the loss function (e.g., mean squared error) of the model.
At each iteration, the algorithm calculates the gradient of the loss function with respect to the predictions made by the current ensemble model.
It then fits a new weak learner to the negative gradient (residuals) of the loss function, effectively moving the model in the direction that minimizes the loss.

4. Additive Model:
The predictions of the ensemble model are the sum of the predictions made by all the individual weak learners.
Each weak learner contributes a small amount to the overall prediction, and the contributions are weighted by a learning rate parameter.

5. Regularization:
Gradient Boosting incorporates regularization techniques to prevent overfitting and improve model generalization.
Regularization is achieved through techniques such as shrinkage (learning rate), limiting the complexity of weak learners, and adding constraints on the model parameters.

## Question - 6
ans - 

## 1. Initialization:

The algorithm starts by initializing the ensemble model with a simple weak learner, typically a decision tree with shallow depth (few nodes) or even a single node.
This initial weak learner makes predictions based on the average of the target variable (for regression) or the majority class (for classification).

## 2. Iterative Training:

At each iteration (or boosting round), the algorithm adds a new weak learner to the ensemble.
The new weak learner is trained to predict the residuals (errors) of the current ensemble model rather than the original target variable.
The residuals are the differences between the actual target values and the predictions made by the current ensemble model.

## 3. Residual Calculation:

After each iteration, the residuals are updated based on the difference between the actual target values and the predictions made by the current ensemble model.
The residuals represent the part of the target variable that has not been captured by the current ensemble model.

## 4. Gradient Descent Optimization:

Gradient Boosting uses a gradient descent optimization technique to minimize the loss function (e.g., mean squared error for regression or cross-entropy loss for classification).
At each iteration, the algorithm calculates the negative gradient of the loss function with respect to the predictions made by the current ensemble model.
It then fits a new weak learner to the negative gradient (residuals), effectively moving the model in the direction that minimizes the loss.

## 5. Weighted Sum of Predictions:

The predictions of the ensemble model are the weighted sum of the predictions made by all the individual weak learners.
Each weak learner contributes a small amount to the overall prediction, and the contributions are weighted by a learning rate parameter.

## 6. Regularization:

Gradient Boosting incorporates regularization techniques to prevent overfitting and improve model generalization.
Regularization is achieved through techniques such as shrinkage (learning rate), limiting the complexity of weak learners, and adding constraints on the model parameters.

## 7. Stopping Criterion:

The algorithm continues to add weak learners to the ensemble until a stopping criterion is met, such as reaching a maximum number of iterations, achieving a minimum improvement in the loss function, or reaching a maximum depth for the weak learners.

## Question - 7
ans - 

Constructing the mathematical intuition behind the Gradient Boosting algorithm involves understanding the underlying principles of gradient descent optimization, the concept of residuals, and the additive nature of the ensemble model. Here are the key steps involved in building the mathematical intuition of Gradient Boosting:

## 1. Loss Function:

* Start with a defined loss function that measures the discrepancy between the actual target values and the predictions made by the model.

* For regression problems, the loss function could be mean squared error (MSE), while for classification problems, it could be cross-entropy loss.

## 2. Gradient Descent Optimization:

* Understand the concept of gradient descent, which is an optimization algorithm used to minimize the loss function.

* Gradient descent iteratively updates the model parameters (weights) in the direction that reduces the loss function the most.

* Calculate the gradient of the loss function with respect to the model predictions to determine the direction of steepest descent.


## 3. Residuals:

* Define the residuals as the differences between the actual target values and the predictions made by the current model.

* Residuals represent the errors or discrepancies in the predictions that the model has not yet captured.


## 4. Weak Learners:

* Introduce the concept of weak learners, which are simple models (e.g., decision trees) that perform slightly better than random guessing.

* Each weak learner is trained to predict the residuals of the current model rather than the original target values.

* Weak learners focus on capturing the remaining patterns in the data that were not captured by the previous models.


## 5. Additive Model:

* Understand that Gradient Boosting constructs an ensemble model by sequentially adding weak learners to the ensemble.

* Each weak learner contributes a small amount to the overall prediction, and the contributions are weighted by a learning rate parameter.

* The predictions of the ensemble model are the sum of the predictions made by all the individual weak learners.

## 6. Gradient Boosting Algorithm:

* Combine the concepts of gradient descent optimization, residuals, weak learners, and additive model to formulate the Gradient Boosting algorithm.

* At each iteration, calculate the negative gradient of the loss function with respect to the predictions made by the current ensemble model.

* Fit a new weak learner to the negative gradient (residuals), effectively moving the model in the direction that minimizes the loss.

## 7. Regularization:

* Consider the incorporation of regularization techniques in Gradient Boosting to prevent overfitting and improve model generalization.

* Regularization techniques may include shrinkage (learning rate), limiting the complexity of weak learners, and adding constraints on the model parameters.