```{contents}
```

# Cost Functions

## 1. **Mean Squared Error (MSE)**

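* The most common cost function for regression trees.
* At each split, the tree chooses the feature and threshold that **minimize the weighted MSE of the two child nodes**. Because each node predicts the mean of its samples, this is the same as **minimizing the variance of the target values** within the children.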

$$
MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y})^2
$$

* Here:

  * $y_i$ = actual value
  * $\hat{y}$ = predicted value (mean of the target values in the node)
  * $n$ = number of samples in the node

👉 Minimizing MSE produces nodes whose target values are close together.
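
As a rough illustration of the split rule described above, the sketch below scans one feature and keeps the threshold with the lowest weighted child MSE. The helper names `mse` and `best_split` and the toy data are assumptions made for this example; real libraries use much faster incremental updates.

```python
def mse(values):
    """MSE of a node that predicts the mean of its target values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_child_mse) for the best split on one feature."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        # Weight each child's MSE by the share of samples it receives.
        cost = (len(left) * mse(left) + len(right) * mse(right)) / n
        if cost < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, cost)
    return best


# Hypothetical toy data: two clearly separated groups of target values.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, cost = best_split(x, y)
print(threshold, cost)
```

On this toy data the chosen threshold falls between the two groups, so each child contains targets that are close together.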

---

## 2. **Mean Absolute Error (MAE)**

$$
MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}|
$$

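* Here $\hat{y}$ is the **median** of the target values in the node (instead of the mean), because the median is the value that minimizes the absolute error.
* More **robust to outliers** than MSE, since each error contributes linearly rather than quadratically.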
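
The small sketch below illustrates the robustness claim on a hypothetical node containing one outlier. The `mae` helper and the sample values are assumptions made for this example.

```python
from statistics import mean, median


def mae(values, prediction):
    """Mean absolute error of a constant prediction over the node's targets."""
    return sum(abs(v - prediction) for v in values) / len(values)


# Hypothetical node: four similar targets and one outlier.
node = [10.0, 11.0, 12.0, 13.0, 100.0]

mean_pred = mean(node)      # pulled toward the outlier
median_pred = median(node)  # stays with the bulk of the data

mae_at_mean = mae(node, mean_pred)
mae_at_median = mae(node, median_pred)
print(mean_pred, median_pred, mae_at_mean, mae_at_median)
```

The median prediction stays near the typical values and gives the lower MAE, while the mean is dragged toward the outlier.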

---

## 3. **Friedman’s Mean Squared Error (Friedman MSE)**

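* A variation of MSE available in `scikit-learn` as `criterion="friedman_mse"`.
* Node impurity is still measured with MSE, but candidate splits are ranked with **Friedman's improvement score**, which favors splits whose two children are both well populated and far apart in mean target value. It was introduced for **gradient boosting** trees.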
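
For reference, Friedman's improvement score for a candidate split is usually written as follows, where $n_L, n_R$ are the child sample counts and $\bar{y}_L, \bar{y}_R$ the child means (this notation is an assumption introduced here and ignores sample weights):

$$
\Delta = \frac{n_L \, n_R}{n_L + n_R} \left( \bar{y}_L - \bar{y}_R \right)^2
$$

A minimal sketch of that score, with a hypothetical helper name and toy data, not the `scikit-learn` implementation:

```python
def friedman_improvement(left, right):
    """Friedman's improvement score for a split into two groups of targets."""
    n_l, n_r = len(left), len(right)
    mean_l = sum(left) / n_l
    mean_r = sum(right) / n_r
    return n_l * n_r / (n_l + n_r) * (mean_l - mean_r) ** 2


# Hypothetical targets, split two different ways.
balanced = friedman_improvement([5.0, 6.0, 7.0], [20.0, 21.0, 22.0])
lopsided = friedman_improvement([5.0], [6.0, 7.0, 20.0, 21.0, 22.0])
print(balanced, lopsided)
```

A higher score indicates a better split. Here the balanced split, which separates the two groups cleanly, scores higher than the lopsided one.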

---

## 4. **Poisson (for count regression)**

* For target values that represent **counts** (non-negative integers).
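* Cost function is based on **Poisson deviance**, which requires positive predictions ($\hat{y}_i > 0$) and treats the term $y_i \log \frac{y_i}{\hat{y}_i}$ as 0 when $y_i = 0$: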

$$
D(y, \hat{y}) = 2 \sum_{i=1}^n \left( y_i \log \frac{y_i}{\hat{y}_i} - (y_i - \hat{y}_i) \right)
$$
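
The deviance formula above can be evaluated directly. The sketch below is a plain transcription under the usual convention for zero counts. The function name and sample counts are assumptions made for this example, and library implementations may scale or average the value differently.

```python
import math


def poisson_deviance(y_true, y_pred):
    """Poisson deviance D(y, y_hat) summed over all samples."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predictions must be strictly positive")
        # The log term is taken as 0 when the observed count is 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


# Hypothetical node of counts, predicted by the node mean.
counts = [0, 1, 2, 5]
node_mean = sum(counts) / len(counts)
deviance = poisson_deviance(counts, [node_mean] * len(counts))
perfect = poisson_deviance([2, 3], [2.0, 3.0])
print(deviance, perfect)
```

The deviance is zero when predictions match the observed counts exactly and grows as they diverge.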

---

**Summary**

* **MSE** → default, sensitive to outliers, good for general regression.
* **MAE** → robust to outliers, gives median-based predictions.
* **Friedman MSE** → specialized variant of MSE, mainly used in gradient boosting ensembles.
* **Poisson** → best for count-based data.
