# Boosting-2

#### Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is a popular ensemble machine learning technique for regression tasks; models of this kind are often referred to as Gradient Boosting Machines (GBM). It applies the gradient boosting algorithm to regression: the predictions of many weak learners, typically shallow decision trees, are combined into a single strong regression model.

In Gradient Boosting Regression, weak learners are trained sequentially. Each new learner is fit to the residuals of the current ensemble, that is, the differences between the actual target values and the ensemble's current predictions. For squared-error loss these residuals are, up to a constant factor, the negative gradient of the loss with respect to the predictions, which is where the method gets its name. Adding each new learner's scaled output gradually reduces the prediction error. The process continues until a predefined number of weak learners (trees) is reached or a convergence criterion is met.
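As a quick illustration of the idea, the sketch below fits scikit-learn's `GradientBoostingRegressor` to a small synthetic dataset and scores it on held-out data. The noisy sine data, the variable names (`X_demo`, `y_demo`, `gbr`), and the hyperparameter values are assumptions chosen for the example, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 200).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, X_demo.shape[0])

# Hold out a quarter of the points for evaluation
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# 100 shallow trees, each contribution scaled by the learning rate
gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_tr, y_tr)

# For regressors, score() returns R-squared on the given data
test_r2 = gbr.score(X_te, y_te)
print(f"Held-out R-squared: {test_r2:.3f}")
```
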

#### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [3]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data: a noisy sine curve
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Parameters
n_estimators = 100
learning_rate = 0.1

# Initialize predictions with the mean of y
predictions = np.full_like(y, np.mean(y))

for i in range(n_estimators):
    # Calculate residuals
    residuals = y - predictions
    
    # Fit a decision tree to the residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    
    # Update predictions by adding the tree's prediction (scaled by learning rate)
    predictions += learning_rate * tree.predict(X)

# Evaluate the model (note: these scores are computed on the training data)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

Mean Squared Error: 0.0009078224225023124
R-squared: 0.9980425519594686
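The scores above are computed on the same data the model was trained on, so they are likely optimistic. A minimal sketch of a held-out evaluation follows, assuming the same kind of noisy sine data; the train/test split, the stored `trees` list, and the `base_value` name are additions for illustration and are not part of the original cell.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same kind of data as above, with a fixed seed so the run is repeatable
rng = np.random.default_rng(42)
X_all = np.linspace(0, 10, 100).reshape(-1, 1)
y_all = np.sin(X_all).ravel() + rng.normal(0, 0.1, X_all.shape[0])
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42
)

n_estimators, learning_rate = 100, 0.1

# Start from the training mean and keep every fitted tree for later prediction
base_value = y_train.mean()
train_pred = np.full(y_train.shape, base_value)
trees = []
for _ in range(n_estimators):
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X_train, y_train - train_pred)  # fit the current residuals
    train_pred += learning_rate * tree.predict(X_train)
    trees.append(tree)

# Predict unseen points: initial value plus the scaled sum of all trees
test_pred = base_value + learning_rate * sum(t.predict(X_test) for t in trees)

train_mse = mean_squared_error(y_train, train_pred)
test_mse = mean_squared_error(y_test, test_pred)
test_r2 = r2_score(y_test, test_pred)
print(f"Train MSE: {train_mse:.4f}")
print(f"Test MSE: {test_mse:.4f}")
print(f"Test R-squared: {test_r2:.4f}")
```

Comparing the train and test MSE gives a rough sense of how much the ensemble has fit the noise rather than the underlying curve.
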


#### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

In [None]:
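from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Candidate values for each hyperparameter (27 combinations in total)
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5]
}

# 5-fold cross-validated grid search, scored by negative MSE (higher is better)
grid_search = GridSearchCV(GradientBoostingRegressor(), param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

best_params = grid_search.best_params_
best_estimator = grid_search.best_estimator_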

print("Best Hyperparameters:", best_params)
print("Best Model:", best_estimator)
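The question also mentions random search. The sketch below shows one way to do it with `RandomizedSearchCV`, which samples a fixed number of configurations from distributions instead of trying every grid point. The sampling ranges, `n_iter=10`, the shuffled `KFold`, and the `X_rs`/`y_rs` names are assumptions made for this example; the folds are shuffled because the inputs are sorted, so unshuffled folds would each cover one contiguous stretch of the curve.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X_rs = np.linspace(0, 10, 100).reshape(-1, 1)
y_rs = np.sin(X_rs).ravel() + rng.normal(0, 0.1, X_rs.shape[0])

# Distributions to sample from (ranges are illustrative assumptions)
param_distributions = {
    "n_estimators": randint(50, 201),      # integers in [50, 200]
    "learning_rate": uniform(0.01, 0.19),  # floats in [0.01, 0.20]
    "max_depth": randint(2, 6),            # integers in [2, 5]
}

# Shuffle the folds because X is sorted
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=10,  # number of sampled configurations
    cv=cv,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_rs, y_rs)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Random search tends to be the cheaper option when only a few hyperparameters matter, since its cost is set by `n_iter` rather than by the size of the grid.
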

#### Q4. What is a weak learner in Gradient Boosting?

A weak learner in Gradient Boosting is a simple, low-complexity model that performs only somewhat better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). Weak learners are usually decision trees of limited depth; a tree with a single split is called a stump. Other simple models, such as linear regression, can also be used. They are called "weak" because each one alone has a limited ability to capture the underlying patterns in the data. Gradient Boosting combines many of these weak learners to create a strong predictive model.
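To make "weak" concrete, the sketch below compares three training-set errors on a noisy sine curve: a constant mean prediction, a single decision stump, and 200 boosted stumps. The data, the round count, and the learning rate of 0.1 are assumptions picked for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(1)
X_w = np.linspace(0, 10, 100).reshape(-1, 1)
y_w = np.sin(X_w).ravel() + rng.normal(0, 0.1, X_w.shape[0])

# Trivial baseline: always predict the mean of the targets
baseline_mse = mean_squared_error(y_w, np.full_like(y_w, y_w.mean()))

# One weak learner: a stump, i.e. a tree with a single split
stump = DecisionTreeRegressor(max_depth=1).fit(X_w, y_w)
stump_mse = mean_squared_error(y_w, stump.predict(X_w))

# Many weak learners combined: stumps fit to successive residuals
pred = np.full_like(y_w, y_w.mean())
for _ in range(200):
    s = DecisionTreeRegressor(max_depth=1).fit(X_w, y_w - pred)
    pred += 0.1 * s.predict(X_w)
boosted_mse = mean_squared_error(y_w, pred)

print(f"Mean baseline MSE:  {baseline_mse:.4f}")
print(f"Single stump MSE:   {stump_mse:.4f}")
print(f"Boosted stumps MSE: {boosted_mse:.4f}")
```

All three numbers are training errors, so they show how much each model can fit, not how well it generalizes.
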

#### Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind Gradient Boosting is to build an ensemble of weak learners sequentially, with each learner correcting the errors of the previous ones. Each new learner is fit to the current ensemble's errors (the residuals), so the examples that are currently predicted worst have the most influence on what it learns, and the remaining error shrinks at each step. This process continues until a predefined stopping criterion is met or a specified number of weak learners has been added. The final ensemble sums the scaled predictions of all weak learners to produce a strong, accurate model.
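The step-by-step error reduction can be observed directly. The sketch below records the training MSE after every boosting round; with squared-error loss, least-squares trees, and a learning rate between 0 and 1, the training MSE cannot increase from one round to the next. The data, tree depth, and round count are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(2)
X_i = np.linspace(0, 10, 100).reshape(-1, 1)
y_i = np.sin(X_i).ravel() + rng.normal(0, 0.1, X_i.shape[0])

# Round 0 is the constant mean prediction
pred_i = np.full_like(y_i, y_i.mean())
mse_history = [np.mean((y_i - pred_i) ** 2)]

for _ in range(50):
    residuals = y_i - pred_i  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X_i, residuals)
    pred_i += 0.1 * tree.predict(X_i)  # small corrective step
    mse_history.append(np.mean((y_i - pred_i) ** 2))

# Print the training error at a few selected rounds
for round_idx in (0, 1, 5, 10, 25, 50):
    print(f"round {round_idx:2d}: training MSE = {mse_history[round_idx]:.4f}")
```
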

#### Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?

Gradient Boosting builds an ensemble of weak learners through an iterative process:
* **Initialize:** Start from a constant prediction, typically the mean of the target values for squared-error loss. The first weak learner is then trained on the residuals of this initial prediction (the differences between actual and predicted values).
* **Sequential Training:** In each iteration, a new weak learner is trained on the residuals produced by the current ensemble, so it focuses on reducing the errors the ensemble still makes.
* **Scaled Combination:** Unlike AdaBoost, Gradient Boosting does not weight learners by their individual performance. Each learner's output is multiplied by a step size, usually a fixed learning rate (shrinkage) shared by all learners; some variants also choose a per-step multiplier by line search on the loss.
* **Update Predictions:** The ensemble's predictions are updated by adding the scaled predictions of the newly trained weak learner.
* **Repeat:** The training, scaling, and update steps are repeated for a specified number of iterations or until a predefined stopping criterion (e.g., no further improvement) is met.
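The "no further improvement" stopping criterion in the last step can be sketched with a validation set: boosting stops once the validation error has not improved for a set number of rounds. The `patience` value, the validation split, the noise level, and all variable names here are assumptions for illustration. scikit-learn's `GradientBoostingRegressor` offers a comparable mechanism through its `n_iter_no_change` and `validation_fraction` parameters.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a noisier sine curve
rng = np.random.default_rng(3)
X_e = np.linspace(0, 10, 200).reshape(-1, 1)
y_e = np.sin(X_e).ravel() + rng.normal(0, 0.3, X_e.shape[0])
X_fit, X_val, y_fit, y_val = train_test_split(
    X_e, y_e, test_size=0.3, random_state=3
)

max_rounds, learning_rate, patience = 500, 0.1, 10

# Initialize both prediction vectors with the training mean
fit_pred = np.full(y_fit.shape, y_fit.mean())
val_pred = np.full(y_val.shape, y_fit.mean())
best_val_mse, best_round, trees = np.inf, 0, []

for round_idx in range(1, max_rounds + 1):
    # Sequential training: fit the residuals of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X_fit, y_fit - fit_pred)
    # Update predictions with the scaled output of the new tree
    fit_pred += learning_rate * tree.predict(X_fit)
    val_pred += learning_rate * tree.predict(X_val)
    trees.append(tree)

    val_mse = np.mean((y_val - val_pred) ** 2)
    if val_mse < best_val_mse:
        best_val_mse, best_round = val_mse, round_idx
    elif round_idx - best_round >= patience:
        break  # no improvement for `patience` rounds

# Keep only the trees up to the best validation round
trees = trees[:best_round]
print(f"Stopped at round {round_idx}, best round {best_round}, "
      f"validation MSE {best_val_mse:.4f}")
```
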

#### Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?

To construct the mathematical intuition behind Gradient Boosting, you can follow these steps:
* **Initialize Predictions:** Start with a constant prediction, typically the mean of the target values for regression with squared-error loss.
* **Compute Residuals:** Calculate the residuals by subtracting the current predictions from the actual target values. For squared-error loss these are, up to a constant factor, the negative gradient of the loss with respect to the predictions; for other losses the negative gradient (the "pseudo-residuals") takes their place.
* **Train a Weak Learner:** Fit a weak learner (e.g., a decision tree) to the residuals. This weak learner aims to capture the patterns in the errors that remain.
* **Choose the Step Size:** Determine how much the new learner contributes to the ensemble. In practice this is usually a fixed learning rate between 0 and 1, optionally combined with a multiplier found by line search on the loss. It is not a performance-based weight as in AdaBoost.
* **Update Predictions:** Update the ensemble's predictions by adding the scaled predictions of the current weak learner.
* **Repeat:** Continue the process iteratively. In each iteration, recompute the residuals, train a new weak learner on them, and update the ensemble's predictions.
* **Stopping Criterion:** Decide when to stop the iterations. This could be based on the number of iterations, a predefined level of performance, or other criteria.
* **Final Prediction:** The final prediction of the Gradient Boosting ensemble is the initial prediction plus the sum of all scaled weak learner predictions.
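
The steps above can be written compactly as follows. The notation is an assumption introduced here, not taken from the text: $L$ is the loss, $F_m$ the ensemble after $m$ rounds, $h_m$ the weak learner fit in round $m$, $\nu$ the learning rate, and $M$ the total number of rounds. The squared-error case uses $L(y, F) = \tfrac{1}{2}(y - F)^2$.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
  && \text{(the mean of } y \text{ for squared error)} \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}
  = y_i - F_{m-1}(x_i)
  && \text{(residuals, for squared error)} \\
h_m &= \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1 \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
  && \text{(final prediction)}
\end{aligned}
$$

Read this way, each round is one gradient-descent step on the loss, taken in the space of predictions rather than in the space of model parameters.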