# **Decision Tree Regression**

---

## 🌳 Decision Tree Regression – Step 1: The Basics

### ✅ What Is It?

A **Decision Tree Regressor** splits your dataset into segments by asking **yes/no questions** on feature values — like a flowchart — to predict a continuous target.

> Think of it like a "20 Questions" game that splits the data until it gets close enough to a target value.

---

### 🧠 Example

```plaintext
Is Present_Price > 7?
├── Yes: Is Car_Age > 3?
│   ├── Yes → predict 4.5
│   └── No  → predict 6.2
└── No: predict 2.8
```

Each leaf node gives a predicted `Selling_Price`.
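Here is a minimal sketch of the same idea in code. It assumes scikit-learn and NumPy are installed. The eight rows are made-up `Present_Price`/`Car_Age`/`Selling_Price` values, not real listings. The code fits a depth-2 tree, prints its rules with `export_text`, and checks that each leaf predicts the mean `Selling_Price` of the training rows that end up in it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data: columns are [Present_Price, Car_Age]
X = np.array([
    [9.0, 2], [10.0, 5], [8.5, 6], [12.0, 1],
    [4.0, 3], [5.5, 7], [3.0, 9], [6.0, 4],
])
y = np.array([7.0, 4.2, 4.8, 9.1, 2.9, 2.2, 1.5, 3.4])  # Selling_Price

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Print the learned yes/no rules; each leaf line shows its predicted value
print(export_text(tree, feature_names=["Present_Price", "Car_Age"]))

# Check: every leaf predicts the mean Selling_Price of the training rows it contains
leaf_ids = tree.apply(X)
for leaf in np.unique(leaf_ids):
    rows = leaf_ids == leaf
    print(f"leaf {leaf}: mean of its rows = {y[rows].mean():.3f}, "
          f"prediction = {tree.predict(X[rows])[0]:.3f}")
```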

---

## 🧠 Step 2: Why It's Different (Compared to Linear/Lasso/Ridge/ElasticNet/Polynomial)

| Feature                  | Decision Tree                                | Linear Models (Ridge, Lasso, etc.)                             |
| ------------------------ | -------------------------------------------- | -------------------------------------------------------------- |
| Model type               | Non-parametric                               | Parametric (learn weights)                                     |
| Handles non-linearity    | ✅ Yes                                        | ❌ No (only Polynomial regression can)                          |
| Handles interactions     | ✅ Yes (automatically)                        | ❌ Only via PolynomialFeatures                                  |
| Feature scaling needed?  | ❌ No (splits depend only on the order of values) | ✅ Yes for Ridge/Lasso/ElasticNet (the penalty is scale-sensitive) |
| Categorical data         | ✅ Ordinal (label) encoding usually works     | ⚠️ Needs one-hot encoding                                       |
| Interpretability         | ✅ Yes (read the tree's rules)                | ✅ Yes (read the coefficients)                                  |
| Overfitting risk         | ⚠️ High if not pruned                         | Moderate                                                       |
| Feature selection        | ✅ Yes (via splits)                           | ⚠️ Only Lasso & ElasticNet (by setting coefficients to 0)       |
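One way to check the scaling row is to fit the same tree on raw and standardized features. This sketch uses hypothetical data: the `Present_Price` and `Kms_Driven` values and the target formula are invented. It assumes scikit-learn is available. The two trees should agree on every training prediction, because a threshold split depends only on how values are ordered within a feature, and standardization does not change that order.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical features on very different scales: Present_Price and Kms_Driven
present_price = rng.integers(10, 150, 200) / 10        # 1.0 to 14.9
kms_driven = rng.integers(1_000, 150_000, 200).astype(float)
X = np.column_stack([present_price, kms_driven])
y = 0.6 * present_price - 0.00002 * kms_driven + rng.normal(0, 0.3, 200)

X_scaled = StandardScaler().fit_transform(X)

raw_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
scaled_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

# Standardization keeps the order of values within each feature, so both trees
# partition the training rows the same way and make the same predictions.
same = np.allclose(raw_tree.predict(X), scaled_tree.predict(X_scaled))
print("identical predictions:", same)
```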

---

## ✍️ Step 3: Math Behind It (Simple & Practical)

### Goal:

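At each split in the tree, we want the two branches to be as homogeneous as possible, i.e. to **minimize the variance (MSE)** of the target within each branch. Here a branch's MSE is the average squared distance between its targets and the branch mean, which is also the value that branch predicts.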

### Split Logic:

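For every candidate split (a feature plus a threshold value), the tree scores the two resulting branches by their size-weighted MSE:

```python
# MSE_left / MSE_right: mean squared deviation of each branch's targets from that branch's mean
MSE_split = (n_left / n_total) * MSE_left + (n_right / n_total) * MSE_right
```

It selects the feature and threshold that give the **lowest weighted MSE** after the split.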
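The split rule can be written out directly. The sketch below is a hand-rolled, single-feature version. The helper names `mse`, `split_score` and `best_threshold` are hypothetical, not a library API. It scores the midpoint between every pair of neighbouring values, then compares its choice with the root threshold of a depth-1 scikit-learn tree on made-up data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mse(values):
    """Mean squared deviation from the group mean (the group's prediction)."""
    return float(np.mean((values - values.mean()) ** 2))

def split_score(x, y, threshold):
    """MSE_split = (n_left / n_total) * MSE_left + (n_right / n_total) * MSE_right."""
    left, right = y[x <= threshold], y[x > threshold]
    n_total = len(y)
    return len(left) / n_total * mse(left) + len(right) / n_total * mse(right)

def best_threshold(x, y):
    """Score a threshold between every pair of neighbouring distinct values of x."""
    values = np.unique(x)                          # sorted, distinct
    candidates = (values[:-1] + values[1:]) / 2    # midpoints, so neither side is empty
    scores = [split_score(x, y, t) for t in candidates]
    best = int(np.argmin(scores))
    return float(candidates[best]), scores[best]

# Hypothetical one-feature data with a jump in the target just above x = 6
rng = np.random.default_rng(42)
x = rng.integers(0, 40, 60) / 4
y = np.where(x > 6, 5.0, 1.0) + rng.normal(0, 0.5, 60)

threshold, score = best_threshold(x, y)
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
print("hand-rolled:", threshold, "scikit-learn:", stump.tree_.threshold[0])
```

The hand-rolled search only tries midpoints. scikit-learn also places its thresholds at midpoints, and any cut inside the same gap would produce the same two groups anyway.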

---



## 🌳 **What is Decision Tree Regression?**

It’s a model that makes predictions by **splitting the data step-by-step**, asking simple yes/no questions at each stage.

Think of it as a **"game of 20 questions"** — but for numbers.

---

### 🎯 Goal: Predict a number (like `Selling_Price`)

Instead of trying to draw a straight line (like linear regression), it:

1. Looks at the data.
2. Finds a feature and value that splits it into 2 parts that are **more “pure”** (less varied).
3. Keeps splitting those parts again and again, like a flowchart.
4. At the end (leaf node), it **predicts the average of the samples in that group**.
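The four steps above map onto a short recursive function. This is a from-scratch teaching sketch in plain NumPy, not how scikit-learn implements trees internally; `best_split`, `build` and `predict_one` are hypothetical names. It scans every feature and midpoint and recurses into both halves. It stops at `max_depth`, at a pure group, or when fewer than `min_samples_split` samples remain.

```python
import numpy as np

def best_split(X, y):
    """Steps 1-2: return (feature, threshold) with the lowest size-weighted MSE, or None."""
    best, best_score = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            # np.var is the mean squared deviation from the group mean (ddof=0)
            score = (left.sum() * y[left].var() + (~left).sum() * y[~left].var()) / len(y)
            if score < best_score:
                best, best_score = (j, float(t)), score
    return best

def build(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Step 3: split recursively. Step 4: a leaf stores the mean target of its samples."""
    if depth >= max_depth or len(y) < min_samples_split or np.all(y == y[0]):
        return {"value": float(y.mean())}
    split = best_split(X, y)
    if split is None:                       # every feature is constant in this node
        return {"value": float(y.mean())}
    j, t = split
    left = X[:, j] <= t
    return {
        "feature": j,
        "threshold": t,
        "left": build(X[left], y[left], depth + 1, max_depth, min_samples_split),
        "right": build(X[~left], y[~left], depth + 1, max_depth, min_samples_split),
    }

def predict_one(node, row):
    """Follow the yes/no questions from the root down to a leaf."""
    while "value" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

# Tiny Car_Age -> Selling_Price table (same values as the worked example in these notes)
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0]])
y = np.array([8.0, 7.5, 6.5, 3.0, 2.5])
stump = build(X, y, max_depth=1)
print(stump)
```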

---

## 🧠 Example (Very Simplified)

Imagine this tiny dataset:

| Car\_Age | Selling\_Price |
| -------- | -------------- |
| 1        | 8.0            |
| 2        | 7.5            |
| 3        | 6.5            |
| 7        | 3.0            |
| 8        | 2.5            |

Let’s say we want to split this into 2 groups.

* The algorithm checks: "What if I split at Car\_Age = 4?"

  * Group A: Age ≤ 4 → prices = \[8.0, 7.5, 6.5]
  * Group B: Age > 4 → prices = \[3.0, 2.5]

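Now it asks:

> “Is this split better than splitting at Car\_Age = 2?” (which would give \[8.0, 7.5] vs \[6.5, 3.0, 2.5])

Any threshold between 3 and 7 (4, 5 or 6) produces exactly the same two groups. So only one cut per gap between neighbouring values needs checking; scikit-learn uses the midpoint, here 5.0.

It repeats this search over every candidate threshold of every feature.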
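The comparison can be worked out for this table directly. The sketch below uses NumPy only; `weighted_mse` is a hypothetical helper. It tries one cut in each gap between neighbouring ages and prints the size-weighted MSE of the two groups.

```python
import numpy as np

car_age = np.array([1, 2, 3, 7, 8])
price = np.array([8.0, 7.5, 6.5, 3.0, 2.5])

def weighted_mse(mask):
    """Size-weighted MSE of the groups price[mask] and price[~mask]."""
    n = len(price)
    total = 0.0
    for group in (price[mask], price[~mask]):
        total += len(group) / n * np.mean((group - group.mean()) ** 2)
    return total

# Any threshold inside the same gap gives the same groups, so test one cut per gap
ages = np.unique(car_age)
for t in (ages[:-1] + ages[1:]) / 2:
    print(f"Car_Age <= {t}: weighted MSE = {weighted_mse(car_age <= t):.4f}")
```

Worked by hand, the cuts at 1.5, 2.5, 5.0 and 7.5 score 3.7375, 1.9250, 0.2583 and 3.0500. The cut that separates ages 1 to 3 from ages 7 and 8 wins by a wide margin.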

---

## 📏 How Does It Measure “Better”?

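It uses **Mean Squared Error (MSE)**. For a group of samples, this is the average squared difference between each sample's price and the group's mean price.

* Lower MSE = the group's mean is a better prediction for its members
* It chooses the split with the **lowest combined MSE**, where each branch's MSE is weighted by its share of the samples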

---

## 🔍 Then What Happens?

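* Once the best first split is found, it **recursively splits again** in both branches.
* This continues until one of these holds for a node:

  * The maximum depth is reached (`max_depth` in scikit-learn)
  * The group is pure: all targets are equal, so its MSE is 0
  * There are too few samples left to split (`min_samples_split`, `min_samples_leaf`)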

---

### 🌳 Visual Structure of Tree

```plaintext
Q1: Is Present_Price > 6?
├── Yes → Q2: Is Car_Age > 3?
│   ├── Yes → predict 4.2
│   └── No  → predict 6.8
└── No  → predict 3.1
```

---

## 🧠 Concept Summary

* It's not trying to fit a line; it's **cutting the feature space into axis-aligned rectangles** and predicting one constant value (the mean) inside each
* It works well when:

  * Data is non-linear
  * Data has rule-like patterns (common in real-world systems like cars)
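The "rectangles" picture is easy to check in one dimension, where each rectangle is an interval of `Car_Age`. This sketch assumes scikit-learn; the exponential price curve and the noise are invented for illustration. A depth-3 tree is evaluated on a fine grid, and the number of distinct predictions never exceeds the number of leaves.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
# Hypothetical non-linear curve: price drops fast in the first years, then levels off
car_age = rng.uniform(0, 12, size=(300, 1))
price = 10 * np.exp(-0.3 * car_age.ravel()) + rng.normal(0, 0.3, 300)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(car_age, price)

grid = np.linspace(0, 12, 1000).reshape(-1, 1)
pred = tree.predict(grid)
# A step function: one constant prediction per leaf (an interval of Car_Age)
print("leaves:", tree.get_n_leaves(), "distinct predictions:", len(np.unique(pred)))
```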

---

## 🎯 Problem: We have many features like:

```
['Present_Price', 'Car_Age', 'Kms_Driven', 'Owner', ...]
```

So how does the tree decide **where to split and which feature to use**?

---

## 🧠 Step-by-Step: How Splitting Works with Multiple Features

### 🔁 At Every Node in the Tree:

1. **Look at all features**, one by one.
2. For each feature, try **every candidate split point** (in practice, a threshold between each pair of neighbouring sorted values):

   * e.g. For `Present_Price`: try splitting at 4.5, 6.0, 8.0, and so on.
3. For each candidate split:

   * Divide the data into left/right groups
   * Compute **Mean Squared Error (MSE)** for that split
4. Choose the split (feature + value) that gives the **lowest MSE**.

> ✅ This is called **greedy splitting**, because it always chooses the **best split for the current node only**, without checking whether a different split now would lead to a better tree further down.
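Greedy scoring only looks at the immediate drop in MSE. A classic case where that is short-sighted is XOR-style data, sketched below with hypothetical values in plain NumPy. No single root split lowers the MSE at all, yet one more level of splits fits the data exactly.

```python
import numpy as np

# Hypothetical XOR-style data: the target is 1 only when exactly one feature is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def weighted_mse(y, left):
    """Size-weighted MSE of y[left] and y[~left] (np.var = mean squared deviation)."""
    n = len(y)
    return left.sum() / n * y[left].var() + (~left).sum() / n * y[~left].var()

print("MSE before splitting:", y.var())
# Binary features have a single candidate threshold, 0.5
for j in range(X.shape[1]):
    print(f"root split on feature {j} at 0.5: weighted MSE = {weighted_mse(y, X[:, j] <= 0.5)}")

# Split on feature 0 anyway, then on feature 1 inside each branch: both children become pure
left = X[:, 0] <= 0.5
for branch in (left, ~left):
    Xb, yb = X[branch], y[branch]
    print("branch after second split:", weighted_mse(yb, Xb[:, 1] <= 0.5))
```

A learner that refused any split without an immediate MSE reduction would stop at the root here, even though a depth-2 tree is perfect.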

---

## 🔍 Mini Example

Suppose we have:

```python
features = ['Present_Price', 'Car_Age']
```

At the root level, it tries:

* `Present_Price <= 5` → MSE = 4.1
* `Car_Age <= 3` → MSE = 2.9 ✅

It picks `Car_Age <= 3`, because it gives the **lowest MSE**.

Then on the left and right subsets, it repeats the same process:

* Try both features again
* Find best local split
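The mini example can be reproduced on hypothetical data where `Selling_Price` depends only on `Car_Age`. `Present_Price` is shuffled so that no threshold on it separates the prices cleanly. The sketch first scores the best threshold for each feature directly; `best_split_for` is a hypothetical helper. It then reads the root split of a depth-1 scikit-learn tree from its `tree_.feature` and `tree_.threshold` arrays.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: price depends on Car_Age only; Present_Price is shuffled noise
car_age = np.arange(1, 11, dtype=float)
present_price = np.array([5, 9, 2, 7, 4, 8, 1, 6, 3, 10], dtype=float)
X = np.column_stack([present_price, car_age])
y = np.where(car_age <= 5, 8.0, 3.0)
features = ["Present_Price", "Car_Age"]

def best_split_for(x, y):
    """Lowest size-weighted MSE over midpoints between neighbouring distinct values of x."""
    values = np.unique(x)
    best = (None, np.inf)
    for t in (values[:-1] + values[1:]) / 2:
        left = x <= t
        score = (left.sum() * y[left].var() + (~left).sum() * y[~left].var()) / len(y)
        if score < best[1]:
            best = (float(t), float(score))
    return best

for j, name in enumerate(features):
    t, score = best_split_for(X[:, j], y)
    print(f"best {name} split: <= {t}, weighted MSE = {score:.3f}")

root = DecisionTreeRegressor(max_depth=1).fit(X, y).tree_
print("scikit-learn root split:", features[root.feature[0]], "<=", root.threshold[0])
```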

---

## 🔢 What If You Have 100+ Features?

It still works the same:

* It checks **each feature independently**
* Picks the **best feature + split point** at every level
* Tends to **ignore unhelpful features**: a feature that never wins a split is never used (though in a deep tree, noise features can still win some splits by chance)

> ⚠️ With scikit-learn's defaults (`max_depth=None`, `min_samples_leaf=1`), nothing limits the tree's size, so it can grow very complex unless you prune it
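One way to see which features a tree actually used is `feature_importances_`, which is zero for a feature that never wins a split. This sketch assumes scikit-learn and uses invented data where only column 0 out of 100 drives the target. With a clean target, the tree stops after one perfect split, so the 99 noise columns stay unused. With a noisy target and no depth limit, some noise columns do get picked.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_samples, n_features = 200, 100
# Hypothetical data: only column 0 matters; the other 99 columns are random noise
X = rng.normal(size=(n_samples, n_features))
y_clean = np.where(X[:, 0] > 0, 5.0, 1.0)
y_noisy = y_clean + rng.normal(0, 1.0, n_samples)

clean_tree = DecisionTreeRegressor(random_state=0).fit(X, y_clean)
noisy_tree = DecisionTreeRegressor(random_state=0).fit(X, y_noisy)

# feature_importances_: each feature's total weighted MSE reduction, normalized to sum to 1
print("clean target, noise features used:", np.count_nonzero(clean_tree.feature_importances_[1:]))
print("noisy target, noise features used:", np.count_nonzero(noisy_tree.feature_importances_[1:]))
```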

---

## 🧠 Summary

| Concept           | Decision Tree Behavior                 |
| ----------------- | -------------------------------------- |
| Multiple features | Tried one-by-one at each split         |
| Split chosen by   | Lowest MSE from all candidates         |
| Feature selection | Done **implicitly** via splitting      |
| Multicollinearity | Doesn't hurt predictions (no coefficients to destabilize), but importance can be split across correlated features |

---



# **✂️ What is Pruning in Decision Trees?**

**Pruning** means:

> "Stop the tree from growing too deep or cut it back after it's grown."

It’s used to prevent **overfitting**, where the model becomes too complex and memorizes the training data.

---

## 🌳 Why Pruning Is Needed

Without pruning:

* A tree can keep splitting until each **leaf has only 1 sample**
* This leads to:

  * **Very low error on training data** ✅
  * **High error on new/unseen data** ❌

> The tree becomes like a student who **memorizes** the answers instead of learning to **generalize**
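The memorizing behaviour shows up as a gap between training and test scores. This sketch uses hypothetical noisy data and scikit-learn's `train_test_split`; for a regressor, `score` returns R². The unrestricted tree reaches an R² of (essentially) 1.0 on the rows it was trained on, while its test R² is lower.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical noisy data: a smooth wave plus random noise
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.4, 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (None, 3):  # None = keep splitting until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: leaves={tree.get_n_leaves()}, "
          f"train R^2={results[depth][0]:.3f}, test R^2={results[depth][1]:.3f}")
```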

---

## 🔧 Types of Pruning

### 1. **Pre-Pruning (a.k.a. Early Stopping)**

Stop tree growth **before** it becomes too deep:

| Parameter           | Meaning                            |
| ------------------- | ---------------------------------- |
| `max_depth`         | Max depth of the tree              |
| `min_samples_split` | Min samples needed to split a node |
| `min_samples_leaf`  | Min samples in a leaf node         |
| `max_leaf_nodes`    | Max number of leaf nodes (final predictions) |

✅ The most common approach; all four are constructor parameters of scikit-learn's `DecisionTreeRegressor`
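The sketch below sets all four parameters on one `DecisionTreeRegressor`; the values are chosen arbitrarily and the data is invented. It then checks the fitted tree against those limits using the low-level `tree_` arrays. In those arrays, `children_left == -1` marks a leaf, and `n_node_samples` counts the training samples per node.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))                 # hypothetical features
y = X[:, 0] * np.sin(X[:, 1]) + rng.normal(0, 0.5, 500)

pre_pruned = DecisionTreeRegressor(
    max_depth=4,             # no path longer than 4 splits
    min_samples_split=20,    # a node needs at least 20 samples to be split
    min_samples_leaf=10,     # every leaf keeps at least 10 samples
    max_leaf_nodes=12,       # at most 12 leaves
    random_state=0,
).fit(X, y)

t = pre_pruned.tree_
is_leaf = t.children_left == -1
print("depth:", pre_pruned.get_depth(), "leaves:", pre_pruned.get_n_leaves(),
      "smallest leaf:", t.n_node_samples[is_leaf].min())
```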

---

### 2. **Post-Pruning (a.k.a. Cost-Complexity Pruning)**

* Let the tree grow fully
* Then **cut back unhelpful branches**
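* Based on a trade-off between **tree complexity and performance**: scikit-learn keeps the subtree that minimizes `R(T) + ccp_alpha × (number of leaves)`, where `R(T)` is the leaves' total sample-weighted MSE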

In scikit-learn:

```python
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor(ccp_alpha=0.01)
```

Where:

* `ccp_alpha` is the **cost-complexity pruning parameter**
* Higher `ccp_alpha` = more pruning; the default, `ccp_alpha=0.0`, performs no pruning
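In practice, `ccp_alpha` is usually chosen by validation rather than guessed. Here is a sketch assuming scikit-learn and invented data. `cost_complexity_pruning_path` lists the alphas at which the fully grown tree loses branches, and larger alphas give trees with fewer leaves. 5-fold cross-validated R² then picks one of them. Only every 10th alpha is tried, to keep the example fast.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))                 # hypothetical features
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(0, 0.4, 300)

# Effective alphas at which the fully grown tree loses a branch (increasing order)
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
candidates = path.ccp_alphas[:-1][::10]   # drop the last alpha (root only); thin the list

leaves, cv_r2 = [], []
for alpha in candidates:
    model = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    leaves.append(model.fit(X, y).get_n_leaves())
    cv_r2.append(cross_val_score(model, X, y, cv=5).mean())

best_alpha = candidates[int(np.argmax(cv_r2))]
print("leaves per alpha:", leaves)
print("chosen ccp_alpha:", best_alpha)
```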

---

## 🧠 Summary

| Term             | Explanation                             |
| ---------------- | --------------------------------------- |
| **Overfitting**  | Tree memorizes noise in training data   |
| **Pruning**      | Prevents/corrects overfitting           |
| **Pre-pruning**  | Stops splits early (e.g., max\_depth=4) |
| **Post-pruning** | Removes weak branches after training    |

---