```{contents}
```

## Gradient Boosting Classifier

**Gradient Boosting Classifier (GBC)** is the classification version of Gradient Boosting. It builds an ensemble of weak learners (usually shallow decision trees) in a **stage-wise fashion**: each new learner is fit to correct the errors of the ensemble built so far.

---

### 1. Objective

* Given training data $(x_i, y_i)$ with $y_i \in \{0,1\}$ or $\{-1,+1\}$, the goal is to minimize a **classification loss**.
* Common choice: **Logistic loss**

$$
L(y, F(x)) = \log\big(1 + e^{-y F(x)}\big), \quad y \in \{-1,+1\}
$$

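where $F(x)$ is the additive model's raw score (in log-odds units). With labels $y \in \{0,1\}$ and $p = \frac{1}{1+e^{-F(x)}}$, the same loss reads $-\big[y \ln p + (1-y)\ln(1-p)\big]$; the steps below use the $\{0,1\}$ encoding.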
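
The two label encodings give the same loss value, which a few lines of NumPy can confirm. This is a minimal sketch: the function names `logistic_loss_pm1` and `cross_entropy_01` and the sample scores are illustrative assumptions, not part of any library.

```python
import numpy as np

def logistic_loss_pm1(y_pm1, F):
    """Logistic loss for labels in {-1, +1} and raw scores F."""
    # logaddexp(0, z) evaluates log(1 + e^z) without overflow
    return np.logaddexp(0.0, -y_pm1 * F)

def cross_entropy_01(y01, F):
    """Binary cross-entropy for labels in {0, 1}, with p = sigmoid(F)."""
    p = 1.0 / (1.0 + np.exp(-F))
    return -(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

F = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # illustrative raw scores
y01 = np.array([0, 1, 1, 0, 1])             # labels in {0, 1}
y_pm1 = 2 * y01 - 1                         # the same labels in {-1, +1}

loss_a = logistic_loss_pm1(y_pm1, F)
loss_b = cross_entropy_01(y01, F)
print(np.allclose(loss_a, loss_b))          # both encodings agree
```

A confident wrong score (for example $F = 1.5$ with label 0) gets a much larger loss than a confident correct one, which is what drives the boosting updates.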

---

### 2. Initialization

* Start with a constant model:

$$
F_0(x) = \ln \frac{p}{1-p}
$$

where $p$ is the proportion of positive samples.

* This is the **log-odds of the positive class**.
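
As a quick check of the initialization, the sketch below computes the log-odds for a small made-up label vector and compares it with a brute-force search over constant scores. It assumes $\{0,1\}$ labels and the sigmoid link $p = 1/(1+e^{-F})$ used in step 3; the data and the grid are illustrative.

```python
import numpy as np

y = np.array([1, 0, 0, 1, 1, 0, 1, 1])    # illustrative labels in {0, 1}

p = y.mean()                               # proportion of positive samples
F0 = np.log(p / (1 - p))                   # log-odds of the positive class

# Mapping F0 back through the sigmoid recovers the base rate p
p_back = 1.0 / (1.0 + np.exp(-F0))

# Brute-force check: among constant scores c, the mean log-loss is lowest near F0
grid = np.linspace(-3.0, 3.0, 601)
mean_loss = [np.mean(np.logaddexp(0.0, c) - y * c) for c in grid]
best_c = grid[int(np.argmin(mean_loss))]

print(F0, p_back, best_c)
```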

---

### 3. Iterative boosting process

At each iteration $m$:

#### a) Compute pseudo-residuals

* Pseudo-residuals are the **negative gradient** of the logistic loss with respect to the current score $F_{m-1}(x_i)$, with $y_i \in \{0,1\}$:

$$
r_{im} = y_i - p_{i}^{(m-1)}
$$

where $p_i^{(m-1)} = \frac{1}{1+e^{-F_{m-1}(x_i)}}$ is the predicted probability.

* Intuition: residuals = *true label – predicted probability*.
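
The residual formula can be verified numerically. The following sketch compares $y_i - p_i$ against a finite-difference estimate of the negative gradient of the log-loss; the helper names and the sample values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def log_loss_01(y, F):
    # Log-loss for y in {0, 1}, written directly in terms of the raw score F
    return np.logaddexp(0.0, F) - y * F

y = np.array([1, 0, 1, 0])                   # illustrative labels
F_prev = np.array([0.3, -1.2, -0.4, 2.0])    # illustrative scores F_{m-1}(x_i)

# Pseudo-residuals: true label minus predicted probability
residuals = y - sigmoid(F_prev)

# Central finite difference of the loss with respect to F
eps = 1e-6
num_grad = (log_loss_01(y, F_prev + eps) - log_loss_01(y, F_prev - eps)) / (2 * eps)

print(residuals)
print(np.allclose(residuals, -num_grad, atol=1e-6))
```

Positive samples always get a positive residual and negative samples a negative one; the magnitude is largest where the current probability is most wrong.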

---

#### b) Fit weak learner

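* Train a small regression tree $h_m(x)$ on the pseudo-residuals $r_{im}$ (a regressor, because the targets are real-valued).
* The tree captures the regions where the current ensemble's predicted probabilities are furthest from the true labels.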
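
A minimal sketch of this step for the first round, assuming scikit-learn's `DecisionTreeRegressor` as the weak learner and the same synthetic data settings as the example at the end of this page. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Round 1: the current model is the constant log-odds score
p = y.mean()
F_prev = np.full(len(y), np.log(p / (1 - p)))
residuals = y - 1.0 / (1.0 + np.exp(-F_prev))

# The weak learner is a regressor: its targets are real-valued residuals
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
tree.fit(X, residuals)
h = tree.predict(X)

# A depth-3 tree has at most 8 leaves; each leaf predicts the mean residual inside it
print(tree.get_n_leaves())
print(np.mean((residuals - h) ** 2), np.var(residuals))
```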

---

#### c) Compute multiplier

* Find the step size $\gamma_m$ that minimizes the total loss along the direction $h_m$ (line search):

$$
\gamma_m = \arg\min_\gamma \sum_{i=1}^n L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big)
$$
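
The line search is a one-dimensional minimization, sketched here with `scipy.optimize.minimize_scalar`. The labels, scores, and tree outputs are made-up values for illustration. Note that tree-based implementations often go further and fit a separate value per leaf instead of one global $\gamma_m$; this sketch follows the single-multiplier form given above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def total_log_loss(y, F):
    # Sum of log-losses for y in {0, 1} and raw scores F
    return np.sum(np.logaddexp(0.0, F) - y * F)

y = np.array([1, 1, 0, 0, 1, 0])                  # illustrative labels
F_prev = np.zeros(6)                              # illustrative current scores
h = np.array([0.5, 0.4, -0.5, -0.3, -0.1, 0.2])   # illustrative tree outputs h_m(x_i)

# gamma_m = argmin over gamma of sum_i L(y_i, F_{m-1}(x_i) + gamma * h_m(x_i))
result = minimize_scalar(lambda g: total_log_loss(y, F_prev + g * h))
gamma = result.x

loss_before = total_log_loss(y, F_prev)
loss_after = total_log_loss(y, F_prev + gamma * h)
print(gamma, loss_before, loss_after)
```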

---

#### d) Update model

$$
F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x)
$$

* $\nu$ = learning rate (shrinkage).
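
Steps a) to d) can be combined into a short from-scratch loop. This is a sketch under stated assumptions, not how scikit-learn implements it: `fit_gbc` is a hypothetical helper, the line search is restricted to $\gamma \in [0, 10]$ to keep it bounded, and a single multiplier per tree is used rather than per-leaf values.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def sigmoid(F):
    return 1.0 / (1.0 + np.exp(-F))

def total_log_loss(y, F):
    return np.sum(np.logaddexp(0.0, F) - y * F)

def fit_gbc(X, y, n_rounds=20, nu=0.1, max_depth=3):
    """Hypothetical helper: returns F_0, the fitted stages, and training scores."""
    p = y.mean()
    F0 = np.log(p / (1 - p))                  # initialization: log-odds
    F = np.full(len(y), F0)
    stages = []
    for _ in range(n_rounds):
        residuals = y - sigmoid(F)            # a) pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                # b) fit weak learner
        h = tree.predict(X)
        gamma = minimize_scalar(              # c) bounded line search (assumption)
            lambda g: total_log_loss(y, F + g * h),
            bounds=(0.0, 10.0),
            method="bounded",
        ).x
        F = F + nu * gamma * h                # d) shrunken update
        stages.append((gamma, tree))
    return F0, stages, F

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
F0, stages, F_train = fit_gbc(X, y)

loss_start = total_log_loss(y, np.full(len(y), F0))
loss_end = total_log_loss(y, F_train)
print(loss_start, loss_end)
```

Because the loss is convex along each search direction, taking only a fraction $\nu$ of the optimal step still lowers the training loss every round.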

---

### 4. Final prediction

* After $M$ rounds, we have:

$$
F_M(x) = F_0(x) + \nu \sum_{m=1}^M \gamma_m h_m(x)
$$

* Convert to probability with sigmoid:

$$
p(x) = \frac{1}{1 + e^{-F_M(x)}}
$$

* Predict class:

$$
\hat{y} = \begin{cases}1 & p(x) \geq 0.5 \\ 0 & p(x) < 0.5\end{cases}
$$
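
The same three steps (raw score, sigmoid, threshold) can be reproduced by hand from a fitted scikit-learn model. The sketch assumes a binary problem with the default log-loss, where `decision_function` returns the raw score $F_M(x)$; the data settings mirror the example at the end of this page.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X_train, y_train)

F_M = gbc.decision_function(X_test)     # raw additive score F_M(x)
p = 1.0 / (1.0 + np.exp(-F_M))          # sigmoid -> probability of class 1
y_hat = (p >= 0.5).astype(int)          # threshold at 0.5

# The manual pipeline should agree with the library's own outputs
print(np.allclose(p, gbc.predict_proba(X_test)[:, 1]))
print(np.array_equal(y_hat, gbc.predict(X_test)))
```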

---

### 5. Intuition

* Each tree is trained on the **errors of the previous ensemble**.
* Predictions are updated in small steps (learning rate).
* Over many iterations, the accumulated score $F_m(x)$ moves predicted probabilities closer to the true labels, which refines the decision boundary.
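
One way to watch this gradual improvement is to track the training log-loss after each stage with `staged_predict_proba`. The sketch below uses illustrative data settings and only looks at training loss, which says nothing by itself about generalization.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X, y)

# One probability matrix per boosting stage, so one loss value per stage
train_losses = [log_loss(y, proba) for proba in gbc.staged_predict_proba(X)]

for m in (1, 10, 50, 100):
    print(m, round(train_losses[m - 1], 4))
```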

---

**Key Features**

* Handles binary and multiclass classification (one-vs-rest or multinomial loss).
* Can use different loss functions, such as log-loss (also called deviance) or exponential loss.
* Sensitive to learning rate and number of trees (requires tuning).
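* With log-loss, typically more robust to noisy labels and outliers than AdaBoost: log-loss grows roughly linearly for badly misclassified points, whereas AdaBoost's exponential loss grows exponentially and so concentrates on them.

---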
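
Since the learning rate and the number of trees interact, they are usually tuned together. A small cross-validated grid search is one possible approach; the grid values below are arbitrary illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative grid: smaller learning rates usually need more trees
param_grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "n_estimators": [50, 100, 200],
}

search = GridSearchCV(
    GradientBoostingClassifier(max_depth=3, random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)   # accuracy of the refit best model
print(test_accuracy)
```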

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbc.fit(X_train, y_train)

# Predictions
y_pred = gbc.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

```
Accuracy: 0.8666666666666667
```
