# XGBoost (Extreme Gradient Boosting) Classifier

In this video, we are going to discuss a new machine learning algorithm called **XGBoost**, which stands for **Extreme Gradient Boosting**. This algorithm can solve both **classification** and **regression** problems.

We will focus on a **classification example** and see how decision trees are constructed sequentially in XGBoost, including the important parameters and the equation used to calculate the final output.

---

## Dataset

- Input Features: `Salary` and `Credit Score`
- Output Feature: `Credit Card Approval` (binary classification)

---

## Step 1: Create Base Model

For **binary classification**, the base model outputs a **probability**:

$$
\hat{y}_0 = 0.5
$$

A probability of 0.5 favors neither class, so the base model starts out **unbiased**. It corresponds to log-odds of 0.

---

## Step 2: Compute Residuals

Residuals for the first decision tree are computed as:

$$
r_i = y_i - \hat{y}_i
$$

Example residuals:

| Record | $y_i$ | $\hat{y}_0$ | $r_i$ |
|--------|-------|-------------|-------|
| 1      | 0     | 0.5         | -0.5  |
| 2      | 1     | 0.5         | 0.5   |
| 3      | 1     | 0.5         | 0.5   |
| 4      | 0     | 0.5         | -0.5  |
| 5      | 1     | 0.5         | 0.5   |
| 6      | 1     | 0.5         | 0.5   |
| 7      | 1     | 0.5         | 0.5   |
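The residual table can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of any library.

```python
# Labels from the table above (1 = approved, 0 = not approved).
y = [0, 1, 1, 0, 1, 1, 1]

# Base model: the same probability for every record.
base_prob = 0.5

# Residual = actual label minus predicted probability.
residuals = [label - base_prob for label in y]

print(residuals)  # [-0.5, 0.5, 0.5, -0.5, 0.5, 0.5, 0.5]
```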

---

## Step 3: Construct Decision Tree

- Input features: `Salary` and `Credit Score`
- Output feature: Residuals \( r_i \)

Split example:

- Feature: `Salary`
- Threshold: `<= 50K` and `> 50K`

Residuals for splits:

- **Left Child (<= 50K):** `[-0.5, 0.5, 0.5, 0.5]`
- **Right Child (> 50K):** `[-0.5, 0.5, 0.5]`
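The split above can be sketched in Python as follows. The salary values are hypothetical (the notes do not list them) and were chosen only so that the split reproduces the two children shown; `split_residuals` is an illustrative helper name.

```python
# Hypothetical salaries in thousands, NOT from the original dataset.
salary_k = [30, 40, 45, 70, 50, 60, 80]
residuals = [-0.5, 0.5, 0.5, -0.5, 0.5, 0.5, 0.5]


def split_residuals(feature, residuals, threshold):
    """Return (left, right): residuals where feature <= threshold, and the rest."""
    left = [r for x, r in zip(feature, residuals) if x <= threshold]
    right = [r for x, r in zip(feature, residuals) if x > threshold]
    return left, right


left, right = split_residuals(salary_k, residuals, threshold=50)
print(left)   # [-0.5, 0.5, 0.5, 0.5]
print(right)  # [-0.5, 0.5, 0.5]
```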

---

## Step 4: Calculate Similarity Score

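Similarity score formula:

$$
\text{Similarity Score} = \frac{\left(\sum_i r_i\right)^2}{\sum_i p_i (1 - p_i) + \lambda}
$$

Where \( p_i \) is the probability predicted for record \( i \) by the previous model (here the base model, so \( p_i = 0.5 \)) and \( \lambda \) is the regularization parameter. The residuals are summed first and then squared, so residuals of opposite sign cancel out. This example uses \( \lambda = 0 \).

- **Left Child:** 

$$
\text{Similarity} = \frac{(-0.5 + 0.5 + 0.5 + 0.5)^2}{0.5(1-0.5) \times 4} = \frac{1}{1} = 1
$$

- **Right Child:** 

$$
\text{Similarity} = \frac{(-0.5 + 0.5 + 0.5)^2}{0.5(1-0.5) \times 3} = \frac{0.25}{0.75} \approx 0.33
$$

- **Root Node** (all seven residuals, which sum to 1.5): 

$$
\text{Similarity} = \frac{(1.5)^2}{0.5(1-0.5) \times 7} = \frac{2.25}{1.75} \approx 1.29
$$

---

## Step 5: Calculate Gain

$$
\text{Gain} = \text{Similarity}_{\text{Left}} + \text{Similarity}_{\text{Right}} - \text{Similarity}_{\text{Root}}
$$

Example:

$$
\text{Gain} = 1 + 0.333 - 1.286 \approx 0.05
$$

Compute the gain for every candidate split and choose the one (feature and threshold) with the **highest gain**.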
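A minimal Python sketch of Steps 4 and 5, assuming the similarity score is the squared sum of a node's residuals divided by \( \sum_i p_i(1-p_i) + \lambda \), with \( \lambda = 0 \) and every \( p_i = 0.5 \). The function names `similarity` and `gain` are illustrative and not part of the XGBoost library.

```python
def similarity(residuals, probs, lam=0.0):
    """(sum of residuals)^2 / (sum of p * (1 - p) + lambda)."""
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1 - p) for p in probs) + lam
    return numerator / denominator


def gain(left, right, prob=0.5, lam=0.0):
    """Gain of a split when every record shares the same previous probability."""
    def sim(rs):
        return similarity(rs, [prob] * len(rs), lam)

    root = left + right  # the parent node holds all residuals
    return sim(left) + sim(right) - sim(root)


left = [-0.5, 0.5, 0.5, 0.5]
right = [-0.5, 0.5, 0.5]

print(round(similarity(left, [0.5] * len(left)), 2))
print(round(similarity(right, [0.5] * len(right)), 2))
print(round(gain(left, right), 2))
```

Raising `lam` enlarges every denominator and so shrinks every score, which makes splits that isolate only a few records less attractive.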

---

## Step 6: Further Splits

- Next feature: `Credit Score`
- Consider splits like `Bad` vs `Good/Normal`
- Compute similarity score and gain for each split
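- Continue until a stopping criterion is reached, for example a minimum **cover value**

Cover value formula, summed over the records in a node:

$$
\text{Cover} = \sum_i p_i (1 - p_i)
$$

- Stop splitting when a child's cover falls below the minimum allowed value (the `min_child_weight` parameter in XGBoost, 1 by default)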
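The cover check can be sketched as below, assuming cover is the sum of \( p_i(1-p_i) \) over the records in a node and that it is compared against a minimum such as XGBoost's `min_child_weight`. The helper names `cover` and `keep_child` are illustrative.

```python
def cover(probs):
    """Sum of p * (1 - p) over the records that reach a node."""
    return sum(p * (1 - p) for p in probs)


def keep_child(probs, min_child_weight=1.0):
    """A child is kept only if its cover reaches the minimum."""
    return cover(probs) >= min_child_weight


four_records = [0.5] * 4
three_records = [0.5] * 3

print(cover(four_records), keep_child(four_records))
print(cover(three_records), keep_child(three_records))
```

With every probability at 0.5, four records give a cover of 1.0 and three records give 0.75, so only the four-record child passes a minimum of 1.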

---

## Step 7: Predicting New Data

1. Pass new data through **base model** and compute **log-odds**:

$$
\text{log-odds} = \log\frac{p}{1-p}
$$

- For base probability 0.5: 

$$
\text{log-odds} = \log\frac{0.5}{0.5} = 0
$$

2. Pass through **decision tree(s)** sequentially, scaling each tree's output \( H_k(x) \) by the **learning rate** \( \alpha \) and adding it to the base log-odds:

$$
F(x) = \log\frac{\hat{y}_0}{1-\hat{y}_0} + \alpha H_1(x) + \alpha H_2(x) + \dots
$$

3. Apply **sigmoid activation function** to get probability:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

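- Example, for a record that lands in a leaf with output 1 (base log-odds 0, \( \alpha = 0.1 \)):

$$
\hat{y}_1 = \sigma(0 + 0.1 \times 1) = \sigma(0.1) \approx 0.525
$$

- For another record that lands in a leaf with output 0.33:

$$
\hat{y}_2 = \sigma(0 + 0.1 \times 0.33) = \sigma(0.033) \approx 0.508
$$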
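The three prediction steps can be sketched in Python using only the standard library. The function names are illustrative, and the leaf outputs 1 and 0.33 are the ones used in the examples above.

```python
import math


def log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))


def predict_proba(base_prob, tree_outputs, learning_rate=0.1):
    """Base log-odds plus the scaled outputs of all trees, passed through the sigmoid."""
    z = log_odds(base_prob) + learning_rate * sum(tree_outputs)
    return sigmoid(z)


print(round(predict_proba(0.5, [1.0]), 3))
print(round(predict_proba(0.5, [0.33]), 3))
```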

---

## Step 8: Repeat

- Compute new residuals \( r_i = y_i - \hat{y}_i \) from the updated probabilities
- Construct the next decision tree using these residuals as the output
- Continue until the desired number of trees is constructed
- A small learning rate \( \alpha \) shrinks each tree's contribution, which helps prevent overfitting
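The repeat loop can be illustrated with a deliberately reduced sketch. It assumes the tree structure stays fixed and follows only the four records of the left child (labels 0, 1, 1, 1), so each "tree" is a single leaf whose output is the residual sum divided by \( \sum_i p_i(1-p_i) + \lambda \). A real XGBoost model re-searches the splits for every tree; `boost_leaf` is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def boost_leaf(labels, n_trees=50, learning_rate=0.1, lam=0.0):
    """Repeatedly fit one leaf to the residuals of the records it contains."""
    z = 0.0  # log-odds of the base probability 0.5
    for _ in range(n_trees):
        p = sigmoid(z)  # current probability, shared by all records in the leaf
        residuals = [y - p for y in labels]
        # Leaf output: sum of residuals / (sum of p * (1 - p) + lambda).
        leaf_output = sum(residuals) / (len(labels) * p * (1 - p) + lam)
        z += learning_rate * leaf_output
    return sigmoid(z)


left_labels = [0, 1, 1, 1]
print(round(boost_leaf(left_labels, n_trees=1), 3))
print(round(boost_leaf(left_labels, n_trees=200), 3))
```

After one tree the probability moves only slightly away from 0.5; with many trees it approaches the fraction of positive labels in the leaf, which is the behaviour the learning rate is meant to slow down.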

---

## Summary

The **XGBoost classifier** prediction is:

$$
\hat{y} = \sigma\Big(\log\frac{\hat{y}_0}{1-\hat{y}_0} + \alpha H_1(x) + \alpha H_2(x) + \dots + \alpha H_n(x)\Big)
$$

- Base model: unbiased probability of 0.5 (log-odds 0)
- Sequential trees trained on residuals
- Similarity score and gain used to select splits
- Learning rate and cover value prevent overfitting
- Sigmoid function converts log-odds to probability
- For multiclass problems, softmax replaces the sigmoid

---

This process applies similarly to **regression**; the main changes are the base prediction and the denominator of the **similarity weight**, while the **gain** is computed the same way.


# XGBoost Regression Machine Learning Algorithm

In this video, we are going to discuss the **XGBoost Regression** machine learning algorithm.

Similar to the **XGBoost Classification** algorithm, we will take a simple dataset and see how **sequential decision trees** are created. Then we will compare the differences between both.

---

## 1. Problem Statement

We have a **regression dataset**:

- **Input Features:** `experience` and `gap`
- **Output Feature (Dependent):** `salary`

Goal: Predict **salary** based on `experience` and `gap`.

This is a **regression problem**, and XGBoost regressor can solve it efficiently.

---

## 2. XGBoost Classifier Recap

For XGBoost classifier:

- We use **similarity weight** to construct decision trees.
- **Similarity weight formula (classification):**

$$
w = \frac{\left(\sum \text{residual}\right)^2}{\sum p(1-p) + \lambda}
$$

- **Gain** is calculated to decide the best split.

---

## 3. XGBoost Regressor: Steps

### Step 1: Base Model
- Base model predicts the **average** of the output:

$$
\text{Base Model Output} = \frac{40 + 42 + 52 + 60 + 62}{5} = \frac{256}{5} = 51.2 \approx 51
$$

- Let this predicted value be $\hat{y} = 51$ K.

### Step 2: Compute Residuals
- Residuals ($r_1$) = Actual Salary - Base Model Output

| Salary | Residual |
|--------|----------|
| 40     | -11      |
| 42     | -9       |
| 52     | 1        |
| 60     | 9        |
| 62     | 11       |
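Steps 1 and 2 can be checked with a short Python sketch. The variable names are illustrative; the rounding of the mean to 51 follows the walkthrough above.

```python
# Salaries in thousands, from the table above.
salaries = [40, 42, 52, 60, 62]

mean_salary = sum(salaries) / len(salaries)  # 51.2
base_prediction = round(mean_salary)         # the walkthrough rounds this to 51

# Residual = actual salary minus base model output.
residuals = [s - base_prediction for s in salaries]

print(mean_salary, base_prediction)  # 51.2 51
print(residuals)                     # [-11, -9, 1, 9, 11]
```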

### Step 3: Construct Decision Tree
- Inputs: $x_i = \text{experience, gap}$
- Output: $r_1$

**Example Split:**
- Feature: `experience`
- Threshold: 2

| Condition          | Residuals      |
|-------------------|----------------|
| experience ≤ 2     | -11             |
| experience > 2     | -9, 1, 9, 11   |

---

### Step 4: Calculate Similarity Weight
**For Regression:**

$$
w = \frac{\left(\sum \text{residual}\right)^2}{\text{number of residuals} + \lambda}
$$

The residuals are summed first and then squared. The examples below use $\lambda = 1$.

- **Left child (≤2):**

$$
w_\text{left} = \frac{(-11)^2}{1 + 1} = \frac{121}{2} = 60.5
$$

- **Right child (>2):**

$$
w_\text{right} = \frac{(-9+1+9+11)^2}{4 + 1} = \frac{12^2}{5} = \frac{144}{5} = 28.8
$$

- **Root Node** (all five residuals, which sum to 1):

$$
w_\text{root} = \frac{(-11-9+1+9+11)^2}{5 + 1} = \frac{1}{6} \approx 0.17
$$

---

### Step 5: Calculate Gain
**Gain formula:**

$$
\text{Gain} = w_\text{left} + w_\text{right} - w_\text{root}
$$

- Example Calculation:

$$
\text{Gain} = 60.5 + 28.8 - 0.17 = 89.13
$$

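- If a different threshold gives a higher gain, choose that split instead. For example, if a threshold of 2.5 places the residuals -11 and -9 in the left child and 1, 9, 11 in the right child, the gain is $\frac{400}{3} + \frac{441}{4} - \frac{1}{6} \approx 243.42$.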
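A minimal Python sketch of the regression similarity weight and gain, assuming the weight is the squared sum of a node's residuals divided by the number of residuals plus $\lambda$, with $\lambda = 1$. The function names are illustrative and not part of the XGBoost library.

```python
def similarity(residuals, lam=1.0):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def gain(left, right, lam=1.0):
    """Similarity of both children minus similarity of the parent node."""
    root = left + right
    return similarity(left, lam) + similarity(right, lam) - similarity(root, lam)


# Split at experience <= 2, as in the table from Step 3.
left, right = [-11], [-9, 1, 9, 11]

print(round(similarity(left), 2))
print(round(similarity(right), 2))
print(round(gain(left, right), 2))
```

Calling `gain` once per candidate threshold and keeping the largest value is enough to pick the split for a node.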

---

### Step 6: Further Splitting
- Further splits can be based on `gap`.
- Base learner output = 51 K
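- Decision Tree 1 output for a record is the average of the residuals in the leaf it falls into (strictly, XGBoost divides the residual sum by the count plus $\lambda$, which equals the average when $\lambda = 0$).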
- Example for `experience > 2.5` and `gap = no`:

$$
\text{Decision Tree Output} = \text{Average}(1, 9) = 5
$$

---

### Step 7: Predicting New Values
- **Predicted output** formula:

$$
\hat{y}_\text{new} = \text{Base Model Output} + \alpha \cdot (\text{Decision Tree Output})
$$

- Example (learning rate $\alpha = 0.1$):

$$
\hat{y}_\text{new} = 51 + 0.1 \cdot 5 = 51.5
$$

- For record with `experience = 2` and `gap = yes`:

$$
\hat{y}_\text{new} = 51 + 0.1 \cdot (-10) = 50
$$

- Similarly, compute predicted values for all records.
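The prediction formula can be sketched as a small Python function. `predict` is an illustrative name, and the leaf outputs 5 and -10 are the ones used in the examples above.

```python
def predict(base_prediction, tree_outputs, learning_rate=0.1):
    """Base model output plus the scaled outputs of all trees built so far."""
    return base_prediction + learning_rate * sum(tree_outputs)


print(predict(51, [5]))    # record that lands in the leaf with output 5
print(predict(51, [-10]))  # record that lands in the leaf with output -10
```

Passing more than one value in `tree_outputs` covers the case where several trees have already been built.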

---

### Step 8: Constructing Next Tree
- Compute new residuals $r_2$ = Actual Salary - updated prediction $\hat{y}_\text{new}$, and use them as the output for the next tree.
- Repeat the steps with each new set of residuals until the desired number of sequential trees is built.
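Computing the residuals for the second tree might look like the sketch below. The per-record leaf outputs of tree 1 are assumed for illustration (the notes only give the leaf with output 5 and the value -10), and `next_residuals` is a hypothetical helper.

```python
def next_residuals(actuals, predictions):
    """Residuals for the next tree: actual value minus current prediction."""
    return [y - p for y, p in zip(actuals, predictions)]


salaries = [40, 42, 52, 60, 62]  # in thousands
base_prediction = 51
learning_rate = 0.1

# ASSUMED leaf output of tree 1 for each record, for illustration only.
tree1_outputs = [-10, -10, 5, 5, 11]

predictions = [base_prediction + learning_rate * out for out in tree1_outputs]
r2 = next_residuals(salaries, predictions)

print([round(p, 1) for p in predictions])
print([round(r, 1) for r in r2])
```

The new residuals are slightly smaller in magnitude than the first set, which is the small step per tree that the learning rate is meant to produce.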

---

## 4. Summary

- **XGBoost Regressor** builds **sequential decision trees** using residuals.
- **Similarity Weight (Regression):**

$$
w = \frac{\left(\sum \text{residual}\right)^2}{\text{number of residuals} + \lambda}
$$

- **Similarity Weight (Classification):**

$$
w = \frac{\left(\sum \text{residual}\right)^2}{\sum p(1-p) + \lambda}
$$

- **Gain** is used to select the best split.
- **Final prediction**:

$$
\hat{y} = \text{Base Model} + \alpha \sum_i (\text{Tree}_i \text{ output})
$$

- **Hyperparameters:** Learning rate $\alpha$, lambda $\lambda$, number of trees, etc.

---

This process is similar to XGBoost classification but tailored for **regression problems**.

---

