# Gradient Boosting for Classification (Step-by-Step)

## Dataset

We use a small dataset of five students, with CGPA as the single feature and a binary placement label (1 = placed, 0 = not placed):

| CGPA | Placed |
|------|--------|
| 5.70 | 0      |
| 6.25 | 1      |
| 7.10 | 0      |
| 8.15 | 1      |
| 9.60 | 1      |

---

## Stage-Wise Additive Modeling

Gradient boosting builds the model stage by stage, adding one weak learner at a time. For binary classification, the base model is a constant prediction on the **log(odds)** (logit) scale.

### Step 1: Initial Prediction (Log-Odds)

We initialize the model with the log-odds of the positive class, i.e. the natural log of (placed / not placed):

- Total placed = 3
- Total students = 5
- Not placed = 5 - 3 = 2

$$
\text{log(odds)} = \log\left(\frac{3}{2}\right) = \log(1.5) \approx 0.405
$$

This value becomes the **initial prediction** for every instance.
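
As a quick check, the initial log-odds can be computed directly from the labels. This is a minimal sketch in plain Python; the variable names are illustrative and not part of the original walkthrough.

```python
import math

# Placement labels from the dataset table (1 = placed, 0 = not placed).
placed = [0, 1, 0, 1, 1]

n_placed = sum(placed)                  # 3
n_not_placed = len(placed) - n_placed   # 2

# Natural log of the odds; used as the starting prediction for every student.
initial_log_odds = math.log(n_placed / n_not_placed)
print(round(initial_log_odds, 3))  # 0.405
```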

### Step 2: Convert Log-Odds to Probability

We convert log-odds to predicted probabilities using the sigmoid function:

$$
p = \frac{1}{1 + e^{-0.405}} \approx 0.60
$$

So, initial predicted probability for all is **0.60**.
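
The conversion can be sketched as a small helper. The function name `sigmoid` is an illustrative choice. Note that applying it to the initial log-odds simply recovers the fraction of placed students, 3/5.

```python
import math

def sigmoid(log_odds):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

initial_log_odds = math.log(3 / 2)
initial_prob = sigmoid(initial_log_odds)
print(round(initial_prob, 2))  # 0.6
```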

---

## Step 3: Residuals (Gradient of Loss)

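For the log loss (binary cross-entropy), the negative gradient with respect to the log-odds is the difference between the label and the predicted probability. This pseudo-residual is what the next tree is fitted to: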

$$
\text{residual}_i = y_i - \hat{p}_i
$$

| CGPA | Placed (y) | Pred (p) | Residual |
|------|------------|----------|----------|
| 5.70 | 0          | 0.60     | -0.60    |
| 6.25 | 1          | 0.60     | +0.40    |
| 7.10 | 0          | 0.60     | -0.60    |
| 8.15 | 1          | 0.60     | +0.40    |
| 9.60 | 1          | 0.60     | +0.40    |
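
The residual column can be reproduced with a short sketch (plain Python, illustrative variable names). Because the initial probability equals the share of placed students, the residuals sum to zero at this stage.

```python
cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]
placed = [0, 1, 0, 1, 1]
initial_prob = 0.6  # from Step 2

# Pseudo-residual for log loss: label minus predicted probability.
residuals = [y - initial_prob for y in placed]

for x, r in zip(cgpa, residuals):
    print(f"{x:.2f}  {r:+.2f}")
```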

---

## Stage 2: Fit a Decision Tree to Residuals

Sort by CGPA and group residuals:

```plaintext
[5.70, -0.6]
[6.25, +0.4]
[7.10, -0.6]
[8.15, +0.4]
[9.60, +0.4]
```

## Step 4: Similarity Score (Gain Calculation)

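To decide the best split, we score each candidate threshold (the midpoint between two adjacent CGPA values) with a similarity score based on the **residuals** from the previous stage. Both sums in the formula run over the samples that fall in a node, and \( p \) is each sample's predicted probability from the previous stage.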

### Formula:

$$
SS = \frac{(\sum residuals)^2}{\sum p(1 - p)}
$$

Where:
- \( p = 0.6 \)
- \( p(1 - p) = 0.6 \times (1 - 0.6) = 0.24 \)
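
A minimal sketch of this score as a function, assuming no regularization term (matching the formula above). The name `similarity_score` and its argument layout are illustrative.

```python
def similarity_score(residuals, probs):
    """(sum of residuals)^2 / sum of p * (1 - p) over the samples in one node."""
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1 - p) for p in probs)
    return numerator / denominator

# Root node: all five residuals, each with previous probability 0.6.
# The residuals cancel out, so the root score is (numerically) zero.
root_score = similarity_score([-0.6, 0.4, -0.6, 0.4, 0.4], [0.6] * 5)
```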

---

## Example Split: CGPA < 7.62

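### Split Details:
The threshold is the midpoint between 7.10 and 8.15 (7.625, written as 7.62 here).

- **Left Node Residuals** (CGPA < 7.62): \[-0.6, +0.4, -0.6\]
- **Right Node Residuals** (CGPA ≥ 7.62): \[+0.4, +0.4\]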

---

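### Left Node Similarity Score:

$$
SS_L = \frac{(-0.6 + 0.4 - 0.6)^2}{3 \times 0.24} = \frac{(-0.8)^2}{0.72} = \frac{0.64}{0.72} \approx 0.89
$$

---

### Right Node Similarity Score:

$$
SS_R = \frac{(0.4 + 0.4)^2}{2 \times 0.24} = \frac{(0.8)^2}{0.48} = \frac{0.64}{0.48} \approx 1.33
$$

---

### Total Gain:

In the usual formulation, the gain of a split is the similarity of the two children minus the similarity of the parent node, \( SS_L + SS_R - SS_{root} \). Here the five residuals sum to zero, so \( SS_{root} = 0 \) and the gain is just the sum of the two child scores. The other candidate thresholds (5.975, 6.675, and 8.875) give lower totals (about 1.88, 0.14, and 0.83), so CGPA < 7.62 is the best split.

$$
Gain = SS_L + SS_R - SS_{root} = 0.89 + 1.33 - 0 = \boxed{2.22}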
$$
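
The comparison across candidate splits can be sketched as a small loop. This assumes thresholds are taken at the midpoints between adjacent CGPA values and that every sample still has the initial probability 0.6. The names are illustrative.

```python
cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]          # already sorted
residuals = [-0.6, 0.4, -0.6, 0.4, 0.4]
prev_prob = 0.6  # same for every sample at this stage

def similarity_score(res, p):
    # All samples share the same p here, so sum of p*(1-p) = n * p * (1-p).
    return sum(res) ** 2 / (len(res) * p * (1 - p))

scores = {}
for lo, hi in zip(cgpa, cgpa[1:]):
    threshold = (lo + hi) / 2  # midpoint between adjacent CGPA values
    left = [r for x, r in zip(cgpa, residuals) if x < threshold]
    right = [r for x, r in zip(cgpa, residuals) if x >= threshold]
    scores[threshold] = (similarity_score(left, prev_prob)
                         + similarity_score(right, prev_prob))

best_threshold = max(scores, key=scores.get)
for threshold, score in scores.items():
    print(f"CGPA < {threshold:.3f}: SS_L + SS_R = {score:.3f}")
print("best:", best_threshold)
```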




## Stage 2: Model Update Using Best Split (CGPA < 7.62)

From **Stage 1**, the base model predicted:

$$
\text{Initial Log-Odds} = m_1 = \log\left(\frac{p}{1 - p}\right) = \log\left(\frac{0.6}{0.4}\right) \approx 0.405
$$

We now **add a new weak learner (decision tree)** trained on the residuals, scaled by a learning rate \( \eta = 0.3 \).

---

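### Stage 2 Decision Tree Output:
Using the best split (CGPA < 7.62), each leaf outputs the sum of its residuals divided by the sum of \( p(1 - p) \) over its samples: \( -0.8 / 0.72 \approx -1.11 \) for the left leaf and \( 0.8 / 0.48 \approx 1.667 \) for the right leaf (truncated to 1.66 in the tables).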

| CGPA  | Residual | Leaf Output |
|--------|-----------|---------------|
| 5.70   | -0.6     | -1.11 (Left) |
| 6.25   | +0.4     | -1.11 (Left) |
| 7.10   | -0.6     | -1.11 (Left) |
| 8.15   | +0.4     | +1.66 (Right) |
| 9.60   | +0.4     | +1.66 (Right) |
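
The leaf values in this table follow from the residuals and the previous probabilities. The sketch below assumes the common choice for log loss, where a leaf outputs sum(residuals) / sum(p(1 - p)), which is a single Newton step. The helper name `leaf_output` is illustrative. Rounding gives 1.67 for the right leaf, where the table truncates to 1.66.

```python
def leaf_output(residuals, probs):
    """Leaf value on the log-odds scale: sum(residuals) / sum(p * (1 - p))."""
    return sum(residuals) / sum(p * (1 - p) for p in probs)

left_leaf = leaf_output([-0.6, 0.4, -0.6], [0.6] * 3)   # CGPA < 7.62
right_leaf = leaf_output([0.4, 0.4], [0.6] * 2)         # CGPA >= 7.62
print(round(left_leaf, 2), round(right_leaf, 2))  # -1.11 1.67
```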

We now update the model:

$$
m_2 = m_1 + \eta \times (\text{Stage 2 Tree Output})
$$

---

### Updated Log-Odds (m₂):

| CGPA  | Stage 1 Log-Odds | Stage 2 Output | Updated Log-Odds (m₂) |
|--------|------------------|------------------|-------------------------|
| 5.70   | 0.405            | -1.11            | 0.405 + 0.3×(-1.11) = **0.072** |
| 6.25   | 0.405            | -1.11            | 0.072 |
| 7.10   | 0.405            | -1.11            | 0.072 |
| 8.15   | 0.405            | +1.66            | 0.405 + 0.3×1.66 = **0.903** |
| 9.60   | 0.405            | +1.66            | 0.903 |
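
The update in this table is a one-line computation per sample. The sketch below reuses the rounded values from the tables above (0.405, -1.11, 1.66) so that it reproduces the same third decimals. The variable names are illustrative.

```python
eta = 0.3          # learning rate
m1 = 0.405         # initial log-odds (same for every student)
split = 7.62
left_leaf, right_leaf = -1.11, 1.66  # rounded leaf outputs from the table

cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]

# Each student gets the output of the leaf its CGPA falls into.
m2 = [m1 + eta * (left_leaf if x < split else right_leaf) for x in cgpa]
print([round(v, 3) for v in m2])  # [0.072, 0.072, 0.072, 0.903, 0.903]
```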

---

### Convert Log-Odds to Probability:

$$
\hat{p} = \frac{e^{\text{log-odds}}}{1 + e^{\text{log-odds}}}
$$

| CGPA  | Log-Odds (m₂) | Predicted Probability (\( \hat{p} \)) |
|--------|-----------------|-----------------------------|
| 5.70   | 0.072           | \( \frac{e^{0.072}}{1 + e^{0.072}} \approx 0.518 \) |
| 6.25   | 0.072           | 0.518 |
| 7.10   | 0.072           | 0.518 |
| 8.15   | 0.903           | \( \approx 0.712 \) |
| 9.60   | 0.903           | 0.712 |
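
A short sketch of this conversion, using the \( e^{z} / (1 + e^{z}) \) form shown above, which is algebraically the same as the \( 1 / (1 + e^{-z}) \) form from Step 2. The function name is illustrative.

```python
import math

def prob_from_log_odds(z):
    """Same sigmoid as in Step 2, written as e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

m2 = [0.072, 0.072, 0.072, 0.903, 0.903]
probs = [prob_from_log_odds(z) for z in m2]
print([round(p, 3) for p in probs])  # [0.518, 0.518, 0.518, 0.712, 0.712]
```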

---

### Compute Residuals for Stage 3:

$$
r = y - \hat{p}
$$

| CGPA  | Placed (y) | Predicted (\( \hat{p} \)) | Residual (y - \( \hat{p} \)) |
|--------|-------------|------------------------------|--------------------------|
| 5.70   | 0           | 0.518                        | -0.518 |
| 6.25   | 1           | 0.518                        | +0.482 |
| 7.10   | 0           | 0.518                        | -0.518 |
| 8.15   | 1           | 0.712                        | +0.288 |
| 9.60   | 1           | 0.712                        | +0.288 |

---




## Stage 3: Final Weak Learner (Tree) on Stage 2 Residuals

We use the updated residuals from Stage 2 to fit the third decision tree.

### Residuals from Stage 2:

| CGPA  | Placed (y) | Predicted (\( \hat{p} \)) | Residual (Stage 2) |
|--------|------------|---------------------------|--------------------|
| 5.70   | 0          | 0.518                     | -0.518             |
| 6.25   | 1          | 0.518                     | +0.482             |
| 7.10   | 0          | 0.518                     | -0.518             |
| 8.15   | 1          | 0.712                     | +0.288             |
| 9.60   | 1          | 0.712                     | +0.288             |

The student with CGPA 6.25 was placed (y = 1), so that residual is 1 - 0.518 = +0.482.

---

### Similarity Score:

$$
SS = \frac{(\sum \text{residuals})^2}{\sum p(1 - p)}
$$

The predicted probabilities are no longer all 0.6, so \( p(1 - p) \) must use each sample's current probability:

- CGPA 5.70, 6.25, 7.10: \( p(1 - p) = 0.518 \times 0.482 \approx 0.2497 \)
- CGPA 8.15, 9.60: \( p(1 - p) = 0.712 \times 0.288 \approx 0.2051 \)

Scoring every candidate threshold gives:

| Split         | Sum of Left Residuals | Sum of Right Residuals | \( SS_L + SS_R \) |
|---------------|-----------------------|------------------------|-------------------|
| CGPA < 5.975  | -0.518                | +0.540                 | **1.40**          |
| CGPA < 6.675  | -0.036                | +0.058                 | 0.01              |
| CGPA < 7.625  | -0.554                | +0.576                 | 1.22              |
| CGPA < 8.875  | -0.266                | +0.288                 | 0.48              |

### Tree Split: CGPA < 5.975

| Left Node (CGPA < 5.975) | Residuals: -0.518                         |
|--------------------------|-------------------------------------------|
| Right Node (≥ 5.975)     | Residuals: +0.482, -0.518, +0.288, +0.288 |

- Left Node:

$$
SS_L = \frac{(-0.518)^2}{0.2497} \approx \frac{0.2683}{0.2497} \approx 1.075
$$

- Right Node:

$$
SS_R = \frac{(0.482 - 0.518 + 0.288 + 0.288)^2}{2 \times 0.2497 + 2 \times 0.2051} = \frac{(0.540)^2}{0.9096} \approx \frac{0.2916}{0.9096} \approx 0.321
$$

- Gain:

$$
\text{Gain} = SS_L + SS_R \approx 1.075 + 0.321 \approx 1.40
$$

The parent node's score (about 0.0004) is the same for every candidate, so it does not change the ranking.

Best split: **CGPA < 5.975**. The earlier split at CGPA < 7.62 now scores lower (about 1.22), because the positive residual at 6.25 partly cancels the negative residuals in its left node.

---

### Stage 3 Tree Output:

Each leaf outputs the sum of its residuals divided by the sum of \( p(1 - p) \) over its samples:

| CGPA  | Residual | Tree Output                                |
|--------|----------|--------------------------------------------|
| 5.70   | -0.518   | -0.518 / 0.2497 ≈ **-2.074** (Left)        |
| 6.25   | +0.482   | 0.540 / 0.9096 ≈ **+0.594** (Right)        |
| 7.10   | -0.518   | +0.594 (Right)                             |
| 8.15   | +0.288   | +0.594 (Right)                             |
| 9.60   | +0.288   | +0.594 (Right)                             |

---

## Final Prediction (Stage 3 Model)

We now update the prediction as:

$$
\text{Final Log-Odds} = m_3 = m_2 + \eta \times (\text{Stage 3 Output})
\quad \text{with } \eta = 0.3
$$

| CGPA  | Stage 2 Log-Odds | Stage 3 Output | Final Log-Odds |
|--------|------------------|------------------|------------------|
| 5.70   | 0.072            | -2.074           | 0.072 + 0.3×(-2.074) ≈ **-0.550** |
| 6.25   | 0.072            | +0.594           | 0.072 + 0.3×0.594 ≈ **0.250** |
| 7.10   | 0.072            | +0.594           | 0.250 |
| 8.15   | 0.903            | +0.594           | 0.903 + 0.3×0.594 ≈ **1.081** |
| 9.60   | 0.903            | +0.594           | 1.081 |

---

### Final Prediction (Probabilities)

$$
\hat{p} = \frac{e^{\text{final log-odds}}}{1 + e^{\text{final log-odds}}}
$$

| CGPA  | Final Log-Odds | Predicted Probability |
|--------|------------------|------------------------|
| 5.70   | -0.550           | \( \frac{e^{-0.550}}{1 + e^{-0.550}} \approx 0.366 \) |
| 6.25   | 0.250            | \( \approx 0.562 \) |
| 7.10   | 0.250            | 0.562 |
| 8.15   | 1.081            | \( \approx 0.747 \) |
| 9.60   | 1.081            | 0.747 |

---

## ✅ Final Summary

- We performed **stage-wise additive modeling** using gradient boosting for **binary classification**.
- Each stage fitted a tree to the errors (residuals) of the previous model.
- The final output combines the initial prediction with the scaled outputs of all weak learners.

**Total Log-Odds**, where \( h_2 \) and \( h_3 \) are the outputs of the Stage 2 and Stage 3 trees:
$$
m_3 = m_1 + \eta h_2 + \eta h_3
$$

**Final Probabilities** are used to make binary predictions using a threshold (e.g., 0.5).



## 🔍 Model Predictions vs. Actual Labels

| CGPA  | Actual Placement (Label) | Final Log-Odds | Predicted Probability | Predicted Class |
|--------|--------------------------|------------------|------------------------|------------------|
| 5.70   | 0                        | -0.550           | 0.366                  | 0                |
| 6.25   | 1                        | 0.250            | 0.562                  | 1                |
| 7.10   | 0                        | 0.250            | 0.562                  | 1                |
| 8.15   | 1                        | 1.081            | 0.747                  | 1                |
| 9.60   | 1                        | 1.081            | 0.747                  | 1                |

✅ **Prediction Rule**: If `probability >= 0.5`, predict **1** (Placed); else **0** (Not Placed)

🧠 **Conclusion**: After three stages the model correctly classifies 4 of the 5 training points. The student with CGPA 7.10 (not placed) lies between two placed students (6.25 and 8.15) and is still predicted as placed. Further boosting stages or deeper trees would be needed to separate it.
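
To tie the stages together, here is a compact sketch of the whole procedure in plain Python. It rests on several assumptions that are not spelled out in the walkthrough: each weak learner is a single-split tree (a stump), candidate thresholds are midpoints between adjacent CGPA values, the split maximizing \( SS_L + SS_R \) is chosen, and there is no regularization term. All function and variable names are illustrative. The script works with unrounded values, so third decimals can differ slightly from hand-computed tables.

```python
import math

cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]
placed = [0, 1, 0, 1, 1]
eta = 0.3  # learning rate used in this walkthrough

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity(res, hess):
    # (sum of residuals)^2 / sum of p * (1 - p)
    return sum(res) ** 2 / sum(hess)

def fit_stump(x, res, hess):
    """Return (threshold, left_value, right_value) maximizing SS_L + SS_R."""
    xs = sorted(x)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [i for i, v in enumerate(x) if v < thr]
        right = [i for i, v in enumerate(x) if v >= thr]
        score = (similarity([res[i] for i in left], [hess[i] for i in left])
                 + similarity([res[i] for i in right], [hess[i] for i in right]))
        if best is None or score > best[0]:
            # Leaf value: sum(residuals) / sum(p * (1 - p)) within the leaf.
            lv = sum(res[i] for i in left) / sum(hess[i] for i in left)
            rv = sum(res[i] for i in right) / sum(hess[i] for i in right)
            best = (score, thr, lv, rv)
    return best[1:]

# Stage 1: constant log-odds of the positive class.
n_pos = sum(placed)
log_odds = [math.log(n_pos / (len(placed) - n_pos))] * len(placed)

# Stages 2 and 3: fit a stump to the residuals, then take a scaled step.
thresholds = []
for stage in range(2):
    probs = [sigmoid(z) for z in log_odds]
    res = [y - p for y, p in zip(placed, probs)]
    hess = [p * (1 - p) for p in probs]
    thr, lv, rv = fit_stump(cgpa, res, hess)
    thresholds.append(thr)
    log_odds = [z + eta * (lv if v < thr else rv) for z, v in zip(log_odds, cgpa)]

final_probs = [sigmoid(z) for z in log_odds]
predicted = [int(p >= 0.5) for p in final_probs]

print("split thresholds:", thresholds)
print("final probabilities:", [round(p, 3) for p in final_probs])
print("predicted classes:", predicted)
```

Because it recomputes every stage from the dataset table, the script can be used to cross-check the hand calculations. Raising the `range(2)` bound adds further boosting stages.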
