# Random Forest (Classification & Regression)

**Random Forest** is an **Ensemble Learning algorithm** that uses **multiple Decision Trees to make predictions**.

It works using the concept of **Bagging (Bootstrap Aggregating) + Random Feature Selection**.

---

## Definition

### *Random Forest Classification* 

A Random Forest classifier builds **many decision trees**, and each tree votes for a class.
The final class is assigned based on **majority voting**.

Example:

If 100 trees are trained, and 60 vote for Class A → final prediction = Class A.
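
The voting step can be sketched in a few lines of plain Python. The function name `majority_vote` and the vote list are made up for illustration; they are not part of any library.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Return the class predicted by the most trees, plus its vote share."""
    counts = Counter(tree_votes)
    winner, n_votes = counts.most_common(1)[0]
    return winner, n_votes / len(tree_votes)

# Hypothetical votes from 100 trees: 60 for class "A", 40 for class "B".
votes = ["A"] * 60 + ["B"] * 40
label, share = majority_vote(votes)
print(label, share)  # A 0.6
```

Some libraries (scikit-learn, for example) average the per-tree class probabilities instead of counting hard votes, so treat this as the textbook version of the idea.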


### *Random Forest Regression*

A Random Forest regressor builds many regression trees, and each tree predicts a number.
The final output is the average of all tree predictions.

Example:

If 5 trees predict: [10, 12, 11, 13, 14]

→ Final prediction = (10 + 12 + 11 + 13 + 14) / 5 = 12
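
The same calculation as a tiny sketch, using only the standard library. `average_prediction` is a hypothetical helper name.

```python
from statistics import fmean

def average_prediction(tree_predictions):
    """Average the numeric outputs of all trees."""
    return fmean(tree_predictions)

# The five hypothetical tree outputs from the example above.
preds = [10, 12, 11, 13, 14]
print(average_prediction(preds))  # 12.0
```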

---

## Why Do Random Forests Work Well?

Because each tree:

- Trains on a **different subset of data** (bootstrap sampling)

- Gets a **random subset of features at each split**

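This makes the trees differ from one another, which lowers the correlation between their errors. Averaging many weakly correlated trees results in:

✔️ Less overfitting than a single deep tree

✔️ Higher accuracy on unseen data in most cases

✔️ More stable predictions (lower variance)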
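
Why correlation matters can be made precise with a standard result for averages of identically distributed variables. As a sketch, assume each of $B$ trees has prediction variance $\sigma^2$ and every pair of trees has correlation $\rho$ (these symbols are introduced here for illustration only):

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

The second term shrinks as more trees are added, but the first term depends only on $\rho$. Adding trees alone cannot remove it, which is why sampling rows and features to lower $\rho$ helps.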

---

## Core Concepts of Random Forest

### 1️⃣ Bootstrap Sampling

Each tree is trained on a random sample of the dataset with replacement.

### 2️⃣ Random Feature Selection

At each split, the tree considers only a random subset of the features and picks the best split among them.

### 3️⃣ Ensemble Prediction

- Classification → Majority Vote

- Regression → Average Prediction

---

### Advantages

| Advantage                            | Explanation                      |
| ------------------------------------ | -------------------------------- |
| High Accuracy                        | Due to combination of many trees |
| Reduces Overfitting                  | Each tree sees different data    |
| Handles Numerical + Categorical Data | Works on mixed data (some libraries require categories to be encoded first) |
| Works Well on Large Datasets         | Trees are built independently, so training parallelizes |
| No Need for Feature Scaling          | Trees don’t need normalization   |

### Disadvantages

| Disadvantage              | Explanation                             |
| ------------------------- | --------------------------------------- |
| Less Interpretable        | Harder to understand than a single tree |
| Slower than a Single Tree | Many trees = more computation           |
| More Memory Use           | Stores many models                      |


---

### Random Forest Hyperparameters

| Parameter           | Meaning                                      |
| ------------------- | -------------------------------------------- |
| `n_estimators`      | Number of trees in the forest                |
| `max_depth`         | Maximum depth of each tree                   |
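| `criterion`         | Split quality measure: `gini` or `entropy` for classification; `squared_error` (called `mse` in older scikit-learn releases) for regression |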
| `max_features`      | Number of features to consider at each split |
| `min_samples_split` | Minimum samples to split a node              |
| `min_samples_leaf`  | Minimum samples in a leaf node               |
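
The parameter names in this table match scikit-learn's `RandomForestClassifier`, so the sketch below assumes that library is installed. The dataset is synthetic and the values are arbitrary starting points, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration (200 rows, 20 features).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=5,           # maximum depth of each tree
    criterion="gini",      # split quality measure
    max_features="sqrt",   # features considered at each split
    min_samples_split=2,   # minimum samples needed to split a node
    min_samples_leaf=1,    # minimum samples required in a leaf
    random_state=0,        # fixed seed so the run is repeatable
)
clf.fit(X, y)
print(len(clf.estimators_))  # 100
```

In practice these values are usually tuned with cross-validation rather than set by hand.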

---


## 1️⃣ Row Sampling (Bootstrap Sampling)

Row sampling means selecting rows (samples) from the dataset randomly with replacement.

✔️ Used In:

- Bagging

- Random Forest

- Bootstrap-based models

✔️ Why do we do this?

- To create different datasets for each tree

- Makes trees less correlated

- Reduces overfitting

- Improves model stability

✔️ Example:

Original dataset = 100 rows

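Each tree trains on a bootstrap sample of 100 rows drawn with replacement. Because some rows are drawn more than once, only about 63 of the original rows appear in it; the remaining ≈ 37 rows are left out of that tree's sample.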
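
A minimal sketch of one bootstrap draw, using only the standard library. The helper name `bootstrap_indices` and the seed are arbitrary choices for illustration.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n_rows = 100
sample = bootstrap_indices(n_rows, rng)
unique_rows = set(sample)
left_out = set(range(n_rows)) - unique_rows

print(len(sample))       # 100: the sample is as large as the dataset
print(len(unique_rows))  # number of distinct rows this tree sees
print(len(left_out))     # rows this tree never sees

# Expected fraction of distinct rows: 1 - (1 - 1/n)^n, which tends to 1 - 1/e.
expected_unique = 1 - (1 - 1 / n_rows) ** n_rows
print(round(expected_unique, 3))  # 0.634
```

The distinct-row count varies from draw to draw, but it stays close to 63 for a dataset of 100 rows.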

--- 

## 2️⃣ Column Sampling (Feature Sampling)

Column sampling means selecting a subset of features randomly when making a split in a decision tree.

✔️ Used In:

- Random Forest

- Extra Trees

- Gradient Boosting (optional)

✔️ Why do we do this?

- Forces each tree to learn different feature combinations

- Reduces correlation between trees

- Improves generalization

- Helps when many features are irrelevant

✔️ Example:

Total features = 20

With the common sqrt rule, Random Forest considers only ⌊√20⌋ = 4 features at each split (√20 ≈ 4.47, rounded down).
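
As a rough sketch of the per-split draw, assuming the sqrt rule with rounding down. `features_for_split` is a hypothetical helper, not a library function.

```python
import math
import random

def features_for_split(n_features, rng):
    """Pick floor(sqrt(n_features)) distinct feature indices for one split."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
# Three hypothetical splits in the same tree: each one draws a fresh subset.
for _ in range(3):
    print(features_for_split(20, rng))
```

Each printed list holds 4 of the 20 feature indices, and the lists generally differ from split to split.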

--- 

## 3️⃣ Combined Sampling (Row + Column Sampling)

This means applying both at the same time:

- ✔️ Row sampling

- ✔️ Column sampling

This is what Random Forest does.

✔️ Benefits:

- Maximum diversity between trees

- Better performance

- Stronger ensemble

- Less overfitting

✔️ Example:

- Rows → Random bootstrap samples

- Columns → Only a subset chosen at each split
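
To show both kinds of sampling working together, here is a toy forest of one-split trees ("stumps") in plain Python. It is a teaching sketch under strong simplifications, not how any library implements Random Forest: all function names are hypothetical, the data is synthetic, and because a stump has a single split, its feature subset is drawn once per tree.

```python
import math
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Find the best single threshold split among the given features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        best = (0, feature_ids[0], math.inf, majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees, rng):
    n_rows, n_features = len(X), len(X[0])
    k = max(1, int(math.sqrt(n_features)))
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # row sampling
        feature_ids = rng.sample(range(n_features), k)         # column sampling
        forest.append(
            fit_stump([X[i] for i in rows], [y[i] for i in rows], feature_ids)
        )
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Synthetic data: only feature 0 carries signal, the other three are noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]

forest = fit_forest(X, y, n_trees=15, rng=rng)
accuracy = sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(X)
print(accuracy)
```

Stumps whose feature subset misses feature 0 are little better than guessing, yet the vote can still be carried by the stumps that saw it. That is the ensemble effect this section describes.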

--- 

## 4️⃣ Feature Sampling (Same as Column Sampling)

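Feature sampling is another name for column sampling.

The random subset of features can be drawn at one of two levels:

- Node (split) level: a fresh subset is drawn at every split. This is the most common choice and is what `max_features` controls in Random Forest.

- Tree level: one subset is drawn per tree and reused for all of its splits.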

Types:

- sqrt(n_features) → default for classification

- log2(n_features) → fewer features than sqrt once n_features exceeds 16, so stronger randomization

- all features → no feature sampling (equivalent to bagged decision trees)
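
To compare these rules on concrete feature counts, here is a small sketch. The rule names `"sqrt"` and `"log2"` follow the list above, `"all"` is a label made up for this example, and rounding down is an assumption about how a library would turn the result into an integer.

```python
import math

def n_features_per_split(n_features, rule):
    """Translate a feature-sampling rule into a feature count."""
    if rule == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if rule == "log2":
        return max(1, int(math.log2(n_features)))
    if rule == "all":
        return n_features
    raise ValueError(f"unknown rule: {rule}")

for n in (20, 100, 1000):
    print(n, [n_features_per_split(n, r) for r in ("sqrt", "log2", "all")])
# 20 [4, 4, 20]
# 100 [10, 6, 100]
# 1000 [31, 9, 1000]
```

With 20 features the sqrt and log2 rules give the same count. The gap only opens up on wider datasets.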

--- 

## All Concepts in One Table

| Concept               | What is sampled?        | Used in                | Purpose                                |
| --------------------- | ----------------------- | ---------------------- | -------------------------------------- |
| **Row Sampling**      | Samples/rows            | Bagging, Random Forest | Reduce overfitting, increase diversity |
| **Column Sampling**   | Features                | Random Forest          | Reduce correlation, improve stability  |
| **Combined Sampling** | Rows + Features         | Random Forest          | Maximum diversity between trees        |
| **Feature Sampling**  | Same as column sampling | Decision Trees, RF     | Keep a few strong features from dominating every tree |
