# 🌲 Out-of-Bag (OOB) Evaluation

## 1. Why OOB?

When building each tree in a Random Forest, we use **bootstrap sampling**:

* We sample $N$ data points *with replacement* from the training set of size $N$.
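* On average, about **63%** of the distinct training samples end up in the bootstrap sample, because any given point is missed by all $N$ draws with probability $(1 - 1/N)^N \approx e^{-1} \approx 0.37$.
* The remaining **37%** or so are **not used** for training that tree; these are called **Out-of-Bag (OOB) samples**.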

These OOB samples can act like a **test set** for that tree.
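The 63% / 37% split can be checked with a quick simulation. The sketch below is only an illustration: the helper name `oob_fraction`, the training set size `n = 1000`, and the number of trials are arbitrary assumptions, not part of any library.

```python
import random

def oob_fraction(n, rng):
    """Draw one bootstrap sample of size n; return the fraction of points left out."""
    # Unique indices hit by n draws with replacement
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 1000                 # assumed training set size
trials = 200             # assumed number of simulated bootstrap samples

avg_oob = sum(oob_fraction(n, rng) for _ in range(trials)) / trials
analytic = (1 - 1 / n) ** n  # probability a given point is never drawn

print(f"simulated OOB fraction: {avg_oob:.3f}")
print(f"analytic  OOB fraction: {analytic:.3f}")
```

Both numbers should land close to $e^{-1} \approx 0.368$, and the agreement tightens as `n` grows.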

---

## 2. How it works mathematically

Let the training set be $\{(x_i, y_i)\}_{i=1}^N$.
For each tree $t$:

* $B_t$ = bootstrap sample used to train tree $t$.
* $OOB_t$ = data points not in $B_t$.

Now:

* For each data point $x_i$, collect predictions only from the trees where $x_i \in OOB_t$.
* Aggregate those predictions:

For **classification**:

$$
\hat{y}_i^{OOB} = \arg\max_{c} \sum_{t: x_i \in OOB_t} \mathbb{1}(\hat{y}_t(x_i) = c)
$$

For **regression**:

$$
\hat{y}_i^{OOB} = \frac{1}{|T_i|} \sum_{t: x_i \in OOB_t} \hat{y}_t(x_i)
$$

where $T_i = \{t : x_i \in OOB_t\}$.

Then compute the overall **OOB error**:

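$$
\text{OOB Error} = \frac{1}{N} \sum_{i=1}^N \mathbb{1}(\hat{y}_i^{OOB} \neq y_i)
$$

This is the misclassification rate, used for classification. For regression, replace the indicator with the squared error to get the OOB mean squared error:

$$
\text{OOB MSE} = \frac{1}{N} \sum_{i=1}^N \left(\hat{y}_i^{OOB} - y_i\right)^2
$$

Both sums assume every point is OOB for at least one tree ($|T_i| \geq 1$), which holds in practice once the forest has more than a handful of trees. A point with $T_i = \emptyset$ has no OOB prediction and has to be left out of the average.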
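The aggregation and error steps above can be sketched in a few lines of plain Python for the classification case. Everything here is illustrative: `oob_predictions`, `oob_error`, and the toy predictions and masks are made-up names and data, and the sketch assumes that points which are never OOB are skipped when averaging.

```python
from collections import Counter

def oob_predictions(tree_preds, oob_masks):
    """Majority vote per point, using only trees where that point was out-of-bag.

    tree_preds[t][i]: label predicted by tree t for point i.
    oob_masks[t][i]:  True if point i was NOT in tree t's bootstrap sample.
    Returns None for points that were never out-of-bag.
    """
    n_points = len(tree_preds[0])
    result = []
    for i in range(n_points):
        votes = Counter(
            preds[i] for preds, mask in zip(tree_preds, oob_masks) if mask[i]
        )
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result

def oob_error(y_true, y_oob):
    """Misclassification rate over the points that have an OOB prediction."""
    scored = [(y, p) for y, p in zip(y_true, y_oob) if p is not None]
    return sum(y != p for y, p in scored) / len(scored)

# Toy setup (hypothetical): 3 trees, 4 training points
y = [0, 1, 1, 0]
tree_preds = [
    [0, 1, 0, 1],   # tree 0
    [0, 1, 1, 0],   # tree 1
    [1, 1, 1, 0],   # tree 2
]
oob_masks = [
    [True,  False, True,  False],   # tree 0 did not train on points 0 and 2
    [True,  True,  False, False],   # tree 1 did not train on points 0 and 1
    [False, True,  False, False],   # tree 2 did not train on point 1
]

y_oob = oob_predictions(tree_preds, oob_masks)
err = oob_error(y, y_oob)
print(y_oob, err)
```

Note that tree 2's wrong prediction for point 0 never counts against the forest, because point 0 was in that tree's bootstrap sample. Point 3 was in every bootstrap sample, so it gets no OOB prediction and is excluded from the error.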

---

## 3. Key Insights

* **No separate validation set needed** — OOB acts like cross-validation.
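* OOB error is a **good estimate** of test error, though not strictly unbiased: each point is scored by only the roughly 37% of trees that did not see it, so the estimate can be slightly pessimistic compared with the full forest.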
* Saves computation since it reuses the training process.

---

## 4. Example Intuition

Imagine 100 trees:

* Each sample is OOB in about 37 trees (on average).
* Those trees act like mini test predictors for that sample.
* At the end, you’ve tested every point multiple times **without ever using it in training** for those trees.
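The "about 37 trees" figure, and how unlikely it is for a point to have few or no OOB trees, follows from a binomial calculation, since each tree draws its bootstrap sample independently. The snippet below is a rough sketch; the training set size of 1000 and the threshold of 20 trees are assumptions chosen only for illustration.

```python
from math import comb

n_points = 1000                           # assumed training set size
n_trees = 100                             # forest size from the example above

p_oob = (1 - 1 / n_points) ** n_points    # chance a point is OOB for one tree

# Expected number of trees for which a given point is OOB
expected_oob_trees = n_trees * p_oob

# Chance the point is in every bootstrap sample (never OOB)
p_never_oob = (1 - p_oob) ** n_trees

# Chance the point is OOB in fewer than 20 trees (binomial lower tail)
p_fewer_than_20 = sum(
    comb(n_trees, k) * p_oob**k * (1 - p_oob) ** (n_trees - k)
    for k in range(20)
)

print(f"expected OOB trees per point: {expected_oob_trees:.1f}")
print(f"P(never OOB):                 {p_never_oob:.2e}")
print(f"P(OOB in < 20 trees):         {p_fewer_than_20:.2e}")
```

With 100 trees, the chance that a point is never OOB is negligible, so in practice every point ends up with a reasonably sized set of "test" trees.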

---

✅ **Summary**:
OOB evaluation is like having a **free built-in cross-validation** in Random Forests. It uses the roughly 37% of data not seen by each tree to estimate performance, making Random Forest efficient and self-validating.

