# ⚡ Random Forest Workflow

Random Forest is an **ensemble of decision trees** built by combining **bagging** (bootstrap aggregating) with **random feature selection** at each split.

---

## 🔹 Step 1: Prepare the Dataset

* Start with the original dataset of $N$ samples and $p$ features.

---

## 🔹 Step 2: Bootstrap Sampling

* Create $B$ training sets, each of $N$ samples, by sampling from the original dataset **with replacement** (bootstrap sampling).
* Each tree gets a slightly different dataset.
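
A minimal sketch of one bootstrap draw, assuming samples are addressed by row index (`bootstrap_indices` is a hypothetical helper, not a library function). Rows that are never drawn form that tree's out-of-bag set, which Step 6 uses; for large $N$, roughly $1 - 1/e \approx 63.2\%$ of the distinct rows land in each bootstrap sample.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # duplicates are allowed
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]  # never drawn
    return in_bag, out_of_bag

rng = random.Random(0)
n = 10_000
in_bag, oob = bootstrap_indices(n, rng)
print(len(in_bag))               # always equals n
print(len(set(in_bag)) / n)      # fraction of distinct rows, close to 0.632
```
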

---

## 🔹 Step 3: Grow Decision Trees

* For each bootstrap sample, grow a decision tree using the following rules:

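  1. At each split, **randomly select a subset of features** (say $m < p$; common defaults are $m \approx \sqrt{p}$ for classification and $m \approx p/3$ for regression).
  2. Choose the best split (feature and threshold) within this subset using a split criterion (e.g., **Gini impurity** or **entropy** for classification, mean squared error for regression).
  3. Keep splitting recursively until a stopping criterion (e.g., maximum depth or minimum samples per leaf) is met; trees are often grown deep and left unpruned.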
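
The split rule in items 1 and 2 can be sketched for a single node as follows. This assumes numeric features stored as a list of rows `X` and class labels `y`; `gini` and `best_split` are hypothetical helpers. Every observed value of each sampled feature is tried as a threshold, and the split with the lowest weighted Gini impurity wins. Using entropy instead would change only the `gini` function, and growing a full tree means applying the same search recursively to the left and right rows.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2, where p_k are class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, m, rng):
    """Evaluate only m randomly chosen features (m <= p) and return the
    (feature, threshold, weighted_gini) with the lowest impurity, or None
    if none of the sampled features can separate the rows."""
    p = len(X[0])
    candidates = rng.sample(range(p), m)  # fresh random subset at every split
    best = None
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Toy data: features 0 and 1 separate the classes, feature 2 is constant.
X = [[2.0, 0.1, 5.0], [1.0, 0.9, 5.0], [3.0, 0.2, 5.0], [0.5, 0.8, 5.0]]
y = [0, 1, 0, 1]
print(best_split(X, y, m=2, rng=random.Random(42)))
```

With $m = 2$ of the $p = 3$ toy features, at least one informative feature is always sampled, so a perfect split (weighted Gini of 0) is found.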

---

## 🔹 Step 4: Build the Forest

* Repeat Steps 2 and 3 until the forest contains $B$ trees.
* Each tree is trained **independently** of the others, so the trees can be built in parallel.
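
Steps 2 through 4 can be tied together in a deliberately small sketch. To stay short it grows depth-1 trees (decision stumps) instead of full trees, a simplification rather than how Random Forests are normally grown; `build_forest`, `fit_stump`, and `predict_stump` are hypothetical names. Each tree gets its own seeded generator, so no tree depends on another and the loop body could run in any order.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, m, rng):
    """Depth-1 tree: best Gini split over m randomly chosen features.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in rng.sample(range(len(X[0])), m):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the bootstrap majority everywhere
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def build_forest(X, y, B, m, seed=0):
    """Steps 2-4: B stumps, each fit on its own bootstrap sample."""
    n = len(X)
    forest = []
    for b in range(B):
        rng = random.Random(seed * 1_000_003 + b)  # per-tree generator, no shared state
        idx = [rng.randrange(n) for _ in range(n)]  # Step 2: bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        forest.append(fit_stump(Xb, yb, m, rng))    # Step 3: randomized tree
    return forest                                   # Step 4: B independent trees

X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.0]]
y = [0, 0, 1, 1]
forest = build_forest(X, y, B=25, m=1, seed=0)
votes = [predict_stump(s, [3.0, 0.0]) for s in forest]
print(Counter(votes).most_common(1)[0][0])  # majority vote of the 25 stumps
```
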

---

## 🔹 Step 5: Make Predictions

* For **classification**:

  $$
  \hat{y} = \text{majority vote}\{h_1(x), h_2(x), \dots, h_B(x)\}
  $$
* For **regression**:

  $$
  \hat{y} = \frac{1}{B}\sum_{b=1}^B h_b(x)
  $$


### 🔹 Components

* $\hat{y}$ → final prediction for input $x$.
* $B$ → number of trees in the forest.
* $h_b(x)$ → prediction from the $b$-th tree.
* $\text{majority vote}\{\cdot\}$ → most frequent class among the $B$ tree predictions (classification).
* $\frac{1}{B}\sum_{b=1}^B$ → average of the $B$ tree predictions (regression).
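
Given the per-tree predictions $h_1(x), \dots, h_B(x)$ for one input, both aggregation rules reduce to a few lines. This is a standard-library sketch; `majority_vote` and `average` are hypothetical names.

```python
from collections import Counter

def majority_vote(predictions):
    """Classification: most frequent label among the B tree outputs.
    Ties go to the label encountered first in `predictions`."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: (1/B) * sum over b of h_b(x)."""
    return sum(predictions) / len(predictions)

tree_labels = ["cat", "dog", "cat", "cat", "dog"]  # h_1(x), ..., h_5(x)
tree_values = [2.0, 3.0, 2.5, 4.0, 3.5]

print(majority_vote(tree_labels))  # cat
print(average(tree_values))        # 3.0
```

Libraries do not all use hard voting: scikit-learn's `RandomForestClassifier`, for example, averages the trees' predicted class probabilities instead.
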

---

## 🔹 Step 6: Evaluate Model (Optional)

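* Use **Out-of-Bag (OOB) samples** (rows not included in a tree's bootstrap sample) for validation: each sample is predicted only by the trees that did not train on it, and the resulting OOB error estimates generalization error without a separate validation set.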
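
A minimal sketch of the OOB error for a classification forest, assuming each tree's prediction for every row has already been computed; `oob_error` and its argument layout are hypothetical, and the toy inputs are hand-made rather than produced by real trees.

```python
from collections import Counter

def oob_error(y, in_bag_sets, tree_preds):
    """Out-of-bag misclassification rate.

    y           : true labels, length N
    in_bag_sets : in_bag_sets[b] is the set of row indices tree b trained on
    tree_preds  : tree_preds[b][i] is tree b's prediction for row i
    Each row is scored only by the trees that did not see it; rows that
    appear in every bootstrap sample are skipped.
    """
    errors, scored = 0, 0
    for i, label in enumerate(y):
        votes = [preds[i] for bag, preds in zip(in_bag_sets, tree_preds)
                 if i not in bag]
        if not votes:
            continue  # no OOB tree for this row
        scored += 1
        if Counter(votes).most_common(1)[0][0] != label:
            errors += 1
    return errors / scored if scored else float("nan")

y = [0, 1, 1, 0]
in_bag_sets = [{0, 1}, {1, 2}, {0, 2}]  # row 3 is out-of-bag for every tree
tree_preds = [[0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]]
print(oob_error(y, in_bag_sets, tree_preds))  # 0.25 (only row 0 is misclassified)
```
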

---

# 🚀 Intuition

* **Bagging** reduces variance by averaging over many trees.
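* **Random feature selection** makes the trees more diverse by reducing the correlation between them, which makes averaging more effective.
* Together, they lower the variance of the ensemble without much increase in bias, which is why Random Forests are typically **accurate and robust** with little tuning.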
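
The two mechanisms above can be made precise with a standard variance calculation. Assume, as an idealization, that each tree's prediction at $x$ has variance $\sigma^2$ and every pair of trees has correlation $\rho$ (both symbols are introduced here for this sketch). Then the averaged prediction has variance:

$$
\begin{aligned}
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^B h_b(x)\right)
&= \frac{1}{B^2}\left[B\sigma^2 + B(B-1)\rho\sigma^2\right] \\
&= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
\end{aligned}
$$

Adding trees shrinks the second term toward zero (the bagging effect), but the first term $\rho\sigma^2$ does not depend on $B$; lowering $\rho$ through random feature selection is what reduces it.
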

---

✅ In short:
Dataset → Bootstrap samples → Randomized trees → Ensemble (voting/averaging) → Final prediction.
