The transcript explains **XGBoost (Extreme Gradient Boosting)** in detail through a step-by-step example.

### Key Points:

1. **Introduction**

   * XGBoost handles classification and regression.
   * Builds decision trees sequentially; each new tree corrects the errors left by the previous ones.

2. **Example Setup**

   * Dataset: `salary` + `credit score` → `credit card approval (yes/no)`.
   * Binary classification task.

3. **Step 1: Base Model**

   * Start with a base model that predicts a constant probability (0.5) for every record.
   * Calculate **residuals** = `actual − prediction`.
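The base-model step can be sketched in Python. The labels below are a hypothetical toy set (1 = approved, 0 = not approved), not the transcript's dataset.

```python
# Hypothetical toy labels: 1 = approved, 0 = not approved (not from the transcript).
labels = [0, 1, 1, 0, 1, 1, 0]

# Step 1: the base model predicts the same probability for every record.
base_probability = 0.5
predictions = [base_probability] * len(labels)

# Residual = actual - prediction, computed per record.
residuals = [y - p for y, p in zip(labels, predictions)]
print(residuals)  # [-0.5, 0.5, 0.5, -0.5, 0.5, 0.5, -0.5]
```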

4. **Step 2: First Decision Tree**

   * Use residuals as targets to build a tree.
   * Split features (`salary` or `credit`) based on **similarity score** and **gain**:

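     * **Similarity score** = (Σ residuals)² ÷ (Σ[p × (1−p)] + λ), where p is each record's previous predicted probability. The residuals are summed first and then squared.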
     * **Gain** = (similarity left + similarity right) − similarity root.
   * Select split with highest gain.
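A minimal sketch of the two formulas above follows. It assumes each node is given as parallel lists of residuals and previous probabilities, and that the root is the union of the left and right children. The function names `similarity_score` and `gain` are illustrative and are not the XGBoost library's API. `lam` stands for λ and defaults to 0 here.

```python
def similarity_score(residuals, probabilities, lam=0.0):
    # (sum of residuals)^2 / (sum of p*(1-p) + lambda); assumes a non-empty node.
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1 - p) for p in probabilities) + lam
    return numerator / denominator

def gain(left_res, left_prob, right_res, right_prob, lam=0.0):
    # Gain = similarity(left) + similarity(right) - similarity(root),
    # where the root node holds the records of both children.
    root_res = left_res + right_res
    root_prob = left_prob + right_prob
    return (similarity_score(left_res, left_prob, lam)
            + similarity_score(right_res, right_prob, lam)
            - similarity_score(root_res, root_prob, lam))

# A split that separates negative and positive residuals perfectly:
print(gain([-0.5, -0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]))  # 4.0
```

Here the root's residuals cancel (similarity 0), while each pure child scores 2.0, so the split is worth 4.0.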

5. **Tree Splitting Example**

   * Feature chosen: `salary ≤ 50K` vs. `> 50K`.
   * Residual values distributed across nodes.
   * Compute similarity for left, right, and root.
   * The split with the higher gain is preferred.
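The salary split can be evaluated the same way. This sketch uses hypothetical salaries (in thousands) and labels, tries a few candidate thresholds, and keeps the one with the highest gain. With this made-up data, the best threshold happens to be 50. That result is a property of the toy numbers, not of the transcript's table.

```python
# Hypothetical toy data: salary in thousands and approval label (not the transcript's table).
salaries = [30, 45, 60, 80, 25, 55, 40]
labels = [0, 0, 1, 1, 0, 1, 1]
probs = [0.5] * len(labels)  # base-model predictions
residuals = [y - p for y, p in zip(labels, probs)]

def similarity_score(res, prob, lam=0.0):
    return sum(res) ** 2 / (sum(p * (1 - p) for p in prob) + lam)

def node_similarity(indices, lam=0.0):
    # Similarity of the node that holds the records at these indices.
    return similarity_score([residuals[i] for i in indices],
                            [probs[i] for i in indices], lam)

def split_gain(threshold, lam=0.0):
    # Gain of the split salary <= threshold vs. salary > threshold.
    left = [i for i, s in enumerate(salaries) if s <= threshold]
    right = [i for i, s in enumerate(salaries) if s > threshold]
    root = list(range(len(salaries)))
    return node_similarity(left, lam) + node_similarity(right, lam) - node_similarity(root, lam)

candidates = [30, 40, 50, 60]
best_threshold = max(candidates, key=split_gain)
print(best_threshold)  # 50
```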

6. **Further Splits**

   * Next, the `credit` feature is tested.
   * Its categories (bad/good/normal) can be grouped in several ways, and each grouping gives a candidate split.
   * Compute similarity and gain again.
   * Continue splitting until stopping criteria met.
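For a categorical feature like `credit`, one simple way to enumerate candidate splits is to compare one category against the rest. This enumeration is an assumption for illustration, since the XGBoost library has its own handling for categorical features. The data below is hypothetical.

```python
# Hypothetical toy data (not from the transcript).
credit = ["bad", "good", "normal", "good", "bad", "normal", "good"]
labels = [0, 1, 1, 1, 0, 0, 1]
probs = [0.5] * len(labels)
residuals = [y - p for y, p in zip(labels, probs)]

def similarity_score(res, prob, lam=0.0):
    return sum(res) ** 2 / (sum(p * (1 - p) for p in prob) + lam)

def node_similarity(indices, lam=0.0):
    return similarity_score([residuals[i] for i in indices],
                            [probs[i] for i in indices], lam)

all_idx = list(range(len(labels)))
root_sim = node_similarity(all_idx)

# One-vs-rest candidate splits: {category} vs. all other categories.
gains = {}
for category in sorted(set(credit)):
    left = [i for i in all_idx if credit[i] == category]
    right = [i for i in all_idx if credit[i] != category]
    gains[category] = node_similarity(left) + node_similarity(right) - root_sim

best = max(gains, key=gains.get)
print(best)  # good
```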

7. **Prediction Process**

   * For a new record:

     * Base model output → transformed to **log odds**: log(p/(1−p)).
     * Pass through the decision trees → add each tree's leaf output, Σ residuals ÷ (Σ[p × (1−p)] + λ), scaled by the learning rate.
     * Apply **sigmoid activation** to convert back to probability.
   * Example shown: base model (0.5 → log odds = 0), learning rate α = 0.1, each tree's leaf output value multiplied by α, final probability via sigmoid (\~0.52 or 0.508 depending on the record).
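The prediction path can be sketched as follows. It assumes a single tree and a hypothetical leaf whose three records all have residual 0.5 and previous probability 0.5. The leaf output used is Σ residuals ÷ (Σ[p × (1−p)] + λ), the standard XGBoost leaf weight for log loss. Because the numbers are illustrative, the result differs from the transcript's \~0.52.

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def leaf_output(residuals, probabilities, lam=0.0):
    # Leaf value in log-odds units: sum(residuals) / (sum(p*(1-p)) + lambda)
    return sum(residuals) / (sum(p * (1 - p) for p in probabilities) + lam)

base_p = 0.5
alpha = 0.1                      # learning rate
z = log_odds(base_p)             # 0.0 for p = 0.5

# Hypothetical leaf that the new record falls into (not the transcript's numbers).
leaf = leaf_output([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])   # 1.5 / 0.75 = 2.0
z += alpha * leaf                # add the scaled tree contribution in log-odds space
print(round(sigmoid(z), 4))      # 0.5498
```

The contributions are added in log-odds space and converted to a probability only once at the end. This is why the base probability 0.5 is first mapped to 0.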

8. **Sequential Model Construction**

   * Final prediction = sigmoid( base log odds + α·tree₁(x) + α·tree₂(x) + ... ), where treeₖ(x) is the leaf output that record x reaches in tree k.
   * Each new tree is trained on the residuals of the current combined prediction.
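Putting the steps together, here is a minimal boosting loop. It assumes depth-1 trees (stumps) on a single numeric feature, one shared learning rate, and λ = 1. `fit` and `fit_stump` are hypothetical names. The real library grows deeper trees, handles many features, and applies further regularization.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def similarity(res, prob, lam):
    # (sum of residuals)^2 / (sum of p*(1-p) + lambda)
    return sum(res) ** 2 / (sum(p * (1 - p) for p in prob) + lam)

def leaf_value(res, prob, lam):
    # Leaf output in log-odds units: sum of residuals / (sum of p*(1-p) + lambda)
    return sum(res) / (sum(p * (1 - p) for p in prob) + lam)

def fit_stump(x, residuals, probs, lam):
    """Hypothetical helper: depth-1 tree on one feature, split chosen by highest gain."""
    def pick(idx, values):
        return [values[i] for i in idx]

    root = similarity(residuals, probs, lam)
    best = None
    for t in sorted(set(x))[:-1]:  # skip the max so the right child is never empty
        left = [i for i, v in enumerate(x) if v <= t]
        right = [i for i, v in enumerate(x) if v > t]
        g = (similarity(pick(left, residuals), pick(left, probs), lam)
             + similarity(pick(right, residuals), pick(right, probs), lam)
             - root)
        if best is None or g > best[0]:
            best = (g, t, left, right)
    _, t, left, right = best
    left_val = leaf_value(pick(left, residuals), pick(left, probs), lam)
    right_val = leaf_value(pick(right, residuals), pick(right, probs), lam)
    return lambda v: left_val if v <= t else right_val

def fit(x, y, n_trees=20, alpha=0.3, lam=1.0, base_p=0.5):
    """Hypothetical helper: returns a function mapping a feature value to a probability."""
    base = math.log(base_p / (1 - base_p))  # base model in log-odds space
    trees = []
    z = [base] * len(x)  # current log-odds for each training record
    for _ in range(n_trees):
        probs = [sigmoid(v) for v in z]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # residuals of the current model
        tree = fit_stump(x, residuals, probs, lam)
        trees.append(tree)
        z = [zi + alpha * tree(xi) for zi, xi in zip(z, x)]
    return lambda v: sigmoid(base + sum(alpha * f(v) for f in trees))

# Hypothetical toy data: salary (thousands) -> approved (1) or not (0).
salaries = [25, 30, 40, 45, 55, 60, 80]
approved = [0, 0, 0, 0, 1, 1, 1]
predict = fit(salaries, approved)
print(round(predict(80), 3), round(predict(25), 3))
```

Each round recomputes probabilities from the running log-odds, so every new stump fits what the ensemble so far still gets wrong.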

9. **Important Parameters**

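   * **Learning rate (α):** Scales each tree's contribution; smaller values help reduce overfitting.
   * **Lambda (λ):** Regularization hyperparameter added to the denominator of the similarity score (and of the leaf output). Larger λ lowers similarity and gain, which discourages splits. It is chosen via cross-validation.
   * **Cover:** The sum of p(1−p) over the records in a node. If a child node's cover falls below a minimum threshold (`min_child_weight` in the XGBoost library, default 1), the split is not made.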
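The effect of λ and a cover check can be shown on a tiny node. `split_allowed` and `min_cover` are hypothetical names that stand in for the library's `min_child_weight` rule.

```python
def similarity_score(residuals, probabilities, lam):
    # (sum of residuals)^2 / (sum of p*(1-p) + lambda)
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in probabilities) + lam)

def cover(probabilities):
    # Cover of a node for binary log loss: sum of p*(1-p) over its records.
    return sum(p * (1 - p) for p in probabilities)

def split_allowed(left_probs, right_probs, min_cover=1.0):
    # Hypothetical rule: refuse a split if either child's cover is below min_cover.
    return cover(left_probs) >= min_cover and cover(right_probs) >= min_cover

node_residuals = [0.5, 0.5]
node_probs = [0.5, 0.5]
print(similarity_score(node_residuals, node_probs, lam=0.0))  # 2.0
print(similarity_score(node_residuals, node_probs, lam=1.0))  # 0.666...
# With p = 0.5 each record adds 0.25 to cover, so a child needs 4 records to reach 1.0.
print(split_allowed([0.5] * 2, [0.5] * 4))  # False (left cover is 0.5)
print(split_allowed([0.5] * 4, [0.5] * 4))  # True
```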

10. **Extension**

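    * For regression: same process, but the similarity denominator uses the number of residuals instead of Σ p(1−p), giving (Σ residuals)² ÷ (number of residuals + λ). No sigmoid is applied to the output.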
    * For multiclass classification: sigmoid replaced by softmax.
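For the two extensions, here is a sketch of the regression similarity score and a numerically stable softmax. The regression version assumes squared-error loss, where each record's p(1−p) term becomes 1. Both function names are illustrative.

```python
import math

def regression_similarity(residuals, lam=0.0):
    # Squared-error loss: each record contributes 1 to the denominator instead of p*(1-p).
    return sum(residuals) ** 2 / (len(residuals) + lam)

def softmax(scores):
    # Converts per-class raw scores into probabilities that sum to 1.
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(regression_similarity([-2.0, 4.0, 1.0], lam=1.0))  # 3.0**2 / (3 + 1) = 2.25
print(softmax([2.0, 1.0, 0.1]))
```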

