<a href="https://colab.research.google.com/github/golu628/assignment/blob/main/boosting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Q1. What is Gradient Boosting Regression?
Gradient Boosting Regression is an ensemble learning technique that builds a strong predictive model by combining many weak learners, typically shallow decision trees. The models are built sequentially: each new model is fit to the negative gradient of a loss function with respect to the current ensemble's predictions, so it corrects the errors of its predecessors.
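
As a point of reference, the technique can be tried with scikit-learn's built-in `GradientBoostingRegressor`. This is only an illustrative sketch: the synthetic sine data, the seed, and the hyperparameter values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Keep a test set aside so the score reflects unseen points
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 shallow trees, each added with a shrinkage factor (learning rate) of 0.1
gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

test_r2 = r2_score(y_test, gbr.predict(X_test))
print("Test R²:", test_r2)
```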

Q2. Implement Gradient Boosting Regression from Scratch
Here is a simple implementation using NumPy, with scikit-learn's DecisionTreeRegressor as the weak learner:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Generate a simple dataset (seeded so the results are reproducible)
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

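# Gradient Boosting from scratch (squared-error loss)
class GradientBoostingRegressorScratch:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.init_prediction = 0.0
        self.models = []

    def fit(self, X, y):
        # Reset state so that calling fit() twice does not stack old trees
        self.models = []
        # Start from the constant that minimizes squared error: the mean of y
        self.init_prediction = float(np.mean(y))
        y_pred = np.full(len(y), self.init_prediction)
        for _ in range(self.n_estimators):
            # For squared error, the negative gradient is the residual
            residual = y - y_pred
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residual)
            y_pred += self.learning_rate * tree.predict(X)
            self.models.append(tree)
        return self

    def predict(self, X):
        # Base prediction plus the shrunken contribution of every tree
        y_pred = np.full(X.shape[0], self.init_prediction)
        for tree in self.models:
            y_pred += self.learning_rate * tree.predict(X)
        return y_pred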

# Train model
model = GradientBoostingRegressorScratch(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X, y)
y_pred = model.predict(X)

# Evaluate (on the training data, so these scores are optimistic)
print("MSE:", mean_squared_error(y, y_pred))
print("R² Score:", r2_score(y, y_pred))

Q3. Experiment with Hyperparameters
A manual grid search with scikit-learn's ParameterGrid tries every combination of learning rate, number of estimators, and tree depth:

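from sklearn.model_selection import ParameterGrid, train_test_split

# Hold out a validation set: scoring on the training data would always
# favor the most complex model, whether or not it generalizes
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

param_grid = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [2, 3, 4]
}

best_score = -np.inf
best_params = None

for params in ParameterGrid(param_grid):
    model = GradientBoostingRegressorScratch(**params)
    model.fit(X_train, y_train)
    score = r2_score(y_val, model.predict(X_val))
    if score > best_score:
        best_score = score
        best_params = params

print("Best validation R² Score:", best_score)
print("Best Parameters:", best_params)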
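
The same kind of search can also be run with cross-validation instead of a single score per combination. The sketch below uses scikit-learn's `GridSearchCV` with its own `GradientBoostingRegressor` rather than the scratch class; the data, the seed, and the 5-fold setup are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [2, 3, 4],
}

# Shuffle the folds: X is sorted, so unshuffled folds would each hold out
# one contiguous stretch of the curve
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="r2",
    cv=cv,
)
search.fit(X, y)

print("Best cross-validated R²:", search.best_score_)
print("Best parameters:", search.best_params_)
```

Averaging the score over several folds makes the choice of hyperparameters less sensitive to one particular split.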

Q4. What is a Weak Learner in Gradient Boosting?
A weak learner is a model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean in regression. In Gradient Boosting, decision trees with shallow depth (often 1–3) are used as weak learners. Each weak learner is fit to the residuals (errors) of the combined model so far.
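
To make the idea concrete, the following sketch compares a single depth-1 tree (a "stump") with an ensemble of 200 boosted stumps. The data, the learning rate of 0.1, and the number of stumps are assumptions chosen for the example, and both scores are measured on the training data.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A single stump makes one split, so it can only predict two distinct values
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_r2 = r2_score(y, stump.predict(X))

# Boost 200 stumps: each one is fit to the residuals of the ensemble so far
learning_rate = 0.1
pred = np.full(len(y), y.mean())
for _ in range(200):
    weak = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)
    pred += learning_rate * weak.predict(X)
boosted_r2 = r2_score(y, pred)

print("Single stump R² (train):", stump_r2)
print("Boosted stumps R² (train):", boosted_r2)
```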

Q5. What is the Intuition Behind Gradient Boosting?
Think of it as gradient descent in function space:

Start with a simple prediction (e.g., the mean).

At each step, compute the residuals (what the current model is getting wrong).

Fit a weak model to these residuals.

Update the ensemble by adding the predictions of the new model, scaled by a learning rate.

Each new model reduces the error gradually, similar to how gradient descent reduces the loss.
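
The steps above can be traced in a few lines by recording the training loss after every tree. This is a minimal sketch under assumed settings (synthetic sine data, depth-2 trees, learning rate 0.1, 50 iterations), not a full implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.full(len(y), y.mean())       # start with the mean
losses = [np.mean((y - pred) ** 2)]    # training MSE before any tree

for _ in range(50):
    residual = y - pred                                  # what the model gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)              # small step toward the targets
    losses.append(np.mean((y - pred) ** 2))

print("Training MSE after 0, 10 and 50 trees:", losses[0], losses[10], losses[50])
```

With squared-error loss and a learning rate between 0 and 1, the training MSE does not increase from one iteration to the next, which mirrors the step-by-step descent described above.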

Q6. How Does Gradient Boosting Build an Ensemble?
Initialize the model with a base prediction (e.g., mean of targets).

For each iteration:

Compute the residuals (for squared-error loss, these equal the negative gradients of the loss function).

Train a weak learner (e.g., decision tree) on the residuals.

Update the model by adding this weak learner's predictions, scaled by the learning rate.

The final prediction is the base prediction plus the sum of all weak learners' outputs, each scaled by the learning rate.
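
This additive structure can be checked against a fitted scikit-learn model by rebuilding its prediction by hand from the fitted attributes `init_` and `estimators_`. The sketch assumes the default squared-error loss; the data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

gbr = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
).fit(X, y)

# Base prediction: for the default squared-error loss this is the mean of y
manual = gbr.init_.predict(X).ravel().astype(float)

# Add each tree's output, scaled by the learning rate
for stage in gbr.estimators_:   # each row holds one regression tree
    manual += gbr.learning_rate * stage[0].predict(X)

print("Matches gbr.predict:", np.allclose(manual, gbr.predict(X)))
```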

Q7. Steps to Construct Mathematical Intuition of Gradient Boosting
Define a loss function, e.g., Mean Squared Error.

Initialize the model with a constant prediction that minimizes the loss.

For each iteration:

Compute the negative gradient of the loss with respect to the current predictions (for Mean Squared Error, these are the residuals).

Fit a weak learner to predict this negative gradient.

Update the model:

$$F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)$$

where $\eta$ is the learning rate and $h_m(x)$ is the new tree.

Repeat for a fixed number of iterations or until convergence.
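
The steps above can be sketched with the loss left as a plug-in. The helper `boost` and its parameters are hypothetical names introduced for this example. For squared error $L = \tfrac{1}{2}(y - F)^2$ the negative gradient is $y - F$ and the best constant is the mean; for absolute error $L = |y - F|$ the negative gradient is $\operatorname{sign}(y - F)$ and the best constant is the median. Full implementations usually also re-optimize the leaf values for the chosen loss, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, init_value, neg_gradient,
          n_estimators=100, learning_rate=0.1, max_depth=2):
    """Return training predictions built as F_m = F_{m-1} + eta * h_m.

    Hypothetical helper for illustration; leaf values are not re-optimized.
    """
    F = np.full(len(y), init_value, dtype=float)   # constant initial model F_0
    for _ in range(n_estimators):
        pseudo_residual = neg_gradient(y, F)       # negative gradient at F_{m-1}
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, pseudo_residual)
        F += learning_rate * h.predict(X)          # F_m = F_{m-1} + eta * h_m
    return F

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Squared error: start from the mean, follow the residuals
F_l2 = boost(X, y, y.mean(), lambda y, F: y - F)

# Absolute error: start from the median, follow the sign of the residuals
F_l1 = boost(X, y, np.median(y), lambda y, F: np.sign(y - F))

print("MSE, constant vs boosted:", np.mean((y - y.mean()) ** 2), np.mean((y - F_l2) ** 2))
print("MAE, constant vs boosted:", np.mean(np.abs(y - np.median(y))), np.mean(np.abs(y - F_l1)))
```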

