```{contents}
```

# Cost Functions

XGBoost is **not a single model** but a gradient-boosting **framework**: the cost function (a.k.a. loss function, or *objective* in XGBoost's terminology) is a setting you choose, and the right choice depends on whether you're solving a regression or a classification problem.

---

## Cost Functions in **XGBRegressor**

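In regression, the task is to predict a continuous value. XGBRegressor minimizes a **loss function** that measures prediction error. Most of the options below are smooth; absolute error and quantile loss have a kink at zero error, where they are not differentiable.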

### Common loss functions:

1. **Squared Error (default)**

   $$
   L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2
   $$

   * Penalizes larger errors more heavily.
   * Smooth and differentiable.
   * Works well when errors are normally distributed.

2. **Absolute Error (MAE)**

   $$
   L(y, \hat{y}) = |y - \hat{y}|
   $$

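   * Robust to outliers: the penalty grows only linearly with the size of the error.
   * Harder to optimize: the gradient is a constant $\pm 1$ that jumps at zero error, and the second derivative is zero, so it carries no curvature information.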

3. **Huber Loss** (mix between MSE & MAE)

   $$
   L(y, \hat{y}) =
   \begin{cases}
   \frac{1}{2}(y - \hat{y})^2 & |y - \hat{y}| \leq \delta \\
   \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & |y - \hat{y}| > \delta
   \end{cases}
   $$

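   * Balances robustness and sensitivity: quadratic for small errors, linear for large ones.
   * $\delta$ sets the error size at which the loss switches from quadratic to linear.
   * XGBoost's built-in objective is the smooth Pseudo-Huber variant (`objective="reg:pseudohubererror"`), which approximates the piecewise form above.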

4. **Quantile Loss**

   $$
   L(y, \hat{y}) = \max(\alpha(y - \hat{y}), (1-\alpha)(\hat{y} - y))
   $$

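   * $\alpha \in (0, 1)$ is the target quantile; under- and over-prediction are penalized asymmetrically, and $\alpha = 0.5$ gives half the absolute error (the median).
   * Useful for prediction intervals (not just the mean).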

🔑 In practice, XGBRegressor defaults to **squared error loss** unless specified (`objective="reg:squarederror"`).
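
To make the four regression formulas above concrete, here is a minimal pure-Python sketch that evaluates each one for a single prediction. The function names and default values of `delta` and `alpha` are illustrative choices, not XGBoost's API, and XGBoost's own implementations are vectorized and may differ in detail.

```python
def squared_error(y, y_hat):
    """0.5 * (y - y_hat)^2, as in the formula above."""
    return 0.5 * (y - y_hat) ** 2


def absolute_error(y, y_hat):
    """|y - y_hat|."""
    return abs(y - y_hat)


def huber(y, y_hat, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * r - 0.5 * delta ** 2


def quantile_loss(y, y_hat, alpha=0.5):
    """Pinball loss for target quantile alpha in (0, 1)."""
    return max(alpha * (y - y_hat), (1 - alpha) * (y_hat - y))


# Compare a small error with an outlier-sized error (prediction fixed at 0).
for target in (0.5, 10.0):
    print(
        f"error={target:>4}: "
        f"squared={squared_error(target, 0.0):.3f}  "
        f"absolute={absolute_error(target, 0.0):.3f}  "
        f"huber={huber(target, 0.0):.3f}  "
        f"quantile(0.9)={quantile_loss(target, 0.0, alpha=0.9):.3f}"
    )
```

For the outlier-sized error, the squared loss is several times larger than the absolute or Huber loss, which is the robustness trade-off described in the list.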

---

## Cost Functions in **XGBClassifier**

For classification, the model predicts class probabilities and then assigns a class.
Most of the cost functions measure **probability calibration**: how close the predicted probabilities are to the true labels, with confident mistakes penalized the most.

### Common loss functions:

1. **Logistic Loss (Binary Classification)**

   $$
   L(y, \hat{p}) = - \big( y \log(\hat{p}) + (1-y) \log(1 - \hat{p}) \big)
   $$

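   where $y \in \{0, 1\}$ is the true label, $\hat{y}$ is the raw model output, and $\hat{p} = \sigma(\hat{y}) = \frac{1}{1+e^{-\hat{y}}}$.

   * Penalizes confident wrong predictions heavily.
   * Optimized with Newton-style updates that use both the gradient and the second derivative (Hessian).
   * Selected with `objective="binary:logistic"`, the XGBClassifier default for binary problems.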

2. **Softmax Loss (Multiclass Classification)**
   For $K$ classes:

   $$
   L(y, \hat{p}) = - \sum_{k=1}^{K} \mathbf{1}_{y=k} \log(\hat{p}_k)
   $$

   where

   $$
   \hat{p}_k = \frac{e^{\hat{y}_k}}{\sum_{j=1}^{K} e^{\hat{y}_j}}
   $$

   * Standard cross-entropy loss.
   * Used when `objective="multi:softprob"` or `"multi:softmax"`.

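3. **Hinge Loss (SVM-style, optional)**
   For labels $y \in \{-1, +1\}$ and raw score $\hat{y}$:

   $$
   L(y, \hat{y}) = \max(0, 1 - y\hat{y})
   $$

   * Focuses on the **margin** between classes: the loss is zero once a point is on the correct side with margin at least 1.
   * Not probabilistic: with `objective="binary:hinge"`, XGBoost predicts hard 0/1 labels rather than probabilities.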
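
The three classification formulas above can be checked with a short pure-Python sketch. The function names are illustrative rather than part of XGBoost, class indices are 0-based here (the formula uses $1 \dots K$), and the logistic loss is written naively, so it is only safe for moderate raw scores.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(y, raw_score):
    """Binary log loss for a label y in {0, 1} and a raw score."""
    p = sigmoid(raw_score)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def softmax(scores):
    """Turn K raw scores into K probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(true_class, scores):
    """Cross-entropy: minus the log-probability of the true class."""
    return -math.log(softmax(scores)[true_class])


def hinge_loss(y_signed, raw_score):
    """Hinge loss for a label y_signed in {-1, +1}."""
    return max(0.0, 1.0 - y_signed * raw_score)


# A confident wrong prediction costs far more than a confident right one.
print("logistic, confident and right:", logistic_loss(1, 4.0))
print("logistic, confident and wrong:", logistic_loss(1, -4.0))
print("softmax, true class 2:", softmax_loss(2, [1.0, 2.0, 3.0]))
print("hinge, correct with margin:", hinge_loss(+1, 2.0))
print("hinge, wrong side:", hinge_loss(-1, 2.0))
```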

---

### Intuition: Why these losses?

* Regression losses → measure **distance** between prediction & actual values.
* Classification losses → measure **probability calibration** (confidence in correct class).
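* Most are **twice differentiable**, allowing **gradient boosting** with 1st & 2nd order derivatives. The exceptions (absolute error, quantile loss, hinge loss) have a kink where the derivative is undefined and zero curvature elsewhere.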
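
To illustrate how first- and second-order derivatives are used, the sketch below computes the gradient and Hessian of the squared error and logistic losses with respect to the raw prediction, then combines them into a Newton-style leaf value $w = -\sum g / (\sum h + \lambda)$. This follows the standard second-order derivation for gradient-boosted trees; the function names are illustrative, `reg_lambda` stands for the L2 regularization strength, and a real implementation also handles tree structure, learning rate, and other regularizers.

```python
import math


def squared_error_grad_hess(y, y_hat):
    """Derivatives of 0.5 * (y - y_hat)^2 with respect to y_hat."""
    return y_hat - y, 1.0


def logistic_grad_hess(y, raw_score):
    """Derivatives of the binary log loss with respect to the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def newton_leaf_weight(grads, hessians, reg_lambda=1.0):
    """Leaf value minimizing the second-order approximation of the loss."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


# Two samples fall into the same leaf; the current prediction is 0 for both.
targets = [3.0, 5.0]
pairs = [squared_error_grad_hess(y, 0.0) for y in targets]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

# Without regularization, squared error recovers the mean residual.
print(newton_leaf_weight(grads, hessians, reg_lambda=0.0))
# With reg_lambda > 0 the leaf value is shrunk toward zero.
print(newton_leaf_weight(grads, hessians, reg_lambda=1.0))
```

For a loss whose second derivative is zero, such as absolute error, the denominator would reduce to `reg_lambda` alone, which is why such losses need different treatment from the smooth ones.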

---

**Summary:**

* **XGBRegressor** → squared error, MAE, Huber, quantile.
* **XGBClassifier** → logistic loss (binary), softmax loss (multiclass), hinge loss (SVM-like).
* Loss choice depends on whether you want robustness to outliers, probability calibration, or hard-margin separation.
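
As a quick reference, the losses in the summary can be mapped to XGBoost `objective` strings as sketched below. The dictionary keys are hypothetical labels chosen for this example, the values of `huber_slope` and `quantile_alpha` are arbitrary, and `reg:absoluteerror` and `reg:quantileerror` are assumed to be available only in recent XGBoost releases. Each entry is meant to be passed as keyword arguments to `XGBRegressor` or `XGBClassifier`.

```python
# Hypothetical lookup table: loss name -> XGBoost parameters.
XGB_OBJECTIVES = {
    # Regression
    "squared_error": {"objective": "reg:squarederror"},
    "absolute_error": {"objective": "reg:absoluteerror"},
    "pseudo_huber": {"objective": "reg:pseudohubererror", "huber_slope": 1.0},
    "quantile": {"objective": "reg:quantileerror", "quantile_alpha": 0.9},
    # Classification
    "logistic": {"objective": "binary:logistic"},
    "softmax": {"objective": "multi:softprob"},
    "hinge": {"objective": "binary:hinge"},
}

for name, params in XGB_OBJECTIVES.items():
    print(f"{name:>15}: {params}")
```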

---