```{contents}
```

## Assumptions

### 1. **Recursive Partitioning Can Capture the True Pattern**

* The algorithm assumes the data can be separated into **subgroups** that are relatively **homogeneous** in terms of the target class.
* Example: splitting on "Weather = Sunny" should meaningfully separate classes like "Play Tennis" vs "Don’t Play".
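This is a minimal sketch of what "relatively homogeneous" means numerically. It computes Gini impurity (one minus the sum of squared class proportions) before and after splitting on `Weather == Sunny`. The ten labelled days are hypothetical toy data. `gini` and `split_impurity` are illustrative helper names, not a library API.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted average Gini impurity of two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy data: (weather, label) for ten days
days = [
    ("Sunny", "Don't Play"), ("Sunny", "Don't Play"), ("Sunny", "Don't Play"),
    ("Sunny", "Play"), ("Overcast", "Play"), ("Overcast", "Play"),
    ("Overcast", "Play"), ("Rain", "Play"), ("Rain", "Play"), ("Rain", "Don't Play"),
]

labels = [label for _, label in days]
sunny = [label for weather, label in days if weather == "Sunny"]
other = [label for weather, label in days if weather != "Sunny"]

parent = gini(labels)                 # 6 Play / 4 Don't Play
after = split_impurity(sunny, other)  # Sunny: 1 Play / 3 Don't; others: 5 Play / 1 Don't
print(f"parent Gini:        {parent:.3f}")          # 0.480
print(f"after split:        {after:.3f}")           # 0.317
print(f"impurity reduction: {parent - after:.3f}")  # 0.163
```

Both children are purer than the parent, so the split reduces impurity. Recursive partitioning repeats the same computation inside each child.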

---

### 2. **Features Have Predictive Power**

* At least some features must **carry information about the target**.
* Otherwise, any impurity reduction from a split reflects only noise in the training sample, and the tree won’t learn patterns that generalize.
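A quick way to see whether a feature "carries information" is to compare the impurity reduction (Gini gain) of splitting on it. The sketch below uses hypothetical data with two features. One mostly tracks the target, and the other is unrelated to it. `gini_gain` is an illustrative helper, not a library function.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def gini_gain(feature, labels):
    """Impurity reduction from splitting samples by the values of one feature."""
    n = len(labels)
    children = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        children += len(subset) / n * gini(subset)
    return gini(labels) - children

# Hypothetical data: 8 samples with a balanced binary target
target      = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 1, 1, 1, 1, 1]  # mostly agrees with the target
noise       = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the target

print(f"informative: {gini_gain(informative, target):.3f}")  # 0.300
print(f"noise:       {gini_gain(noise, target):.3f}")        # 0.000
```

Here the noise feature has exactly zero gain because the data was constructed that way. With real data, a noise feature usually shows a small positive gain by chance, and that is how trees end up fitting noise.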

---

### 3. **No Linear/Distributional Assumptions**

* Unlike linear models such as linear or logistic regression, decision trees **don’t assume linearity** between features and target.
* They don’t assume **normality of features** or **equal variance** across classes.
  ✅ This makes them non-parametric and flexible.
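Each split only asks whether a value falls at or below a threshold. As a result, the chosen partition depends on the ordering of a feature's values, not on their scale or distribution. This sketch uses hypothetical data and an illustrative helper, `best_threshold_split`, which is not a library function. It finds the lowest-impurity threshold split on a feature and on a strictly increasing transform of it (`math.exp`), then checks that both choose the same partition of the samples.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_threshold_split(xs, ys):
    """Return (left, right) sample indices for the lowest weighted-Gini split x <= t."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best, best_score = None, float("inf")
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if score < best_score:
            best, best_score = (sorted(left), sorted(right)), score
    return best

# Hypothetical data: label is 1 only in a middle band of x (not a linear trend)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 1, 1, 0, 0, 0]

split_raw = best_threshold_split(xs, ys)
split_exp = best_threshold_split([math.exp(x) for x in xs], ys)
print(split_raw)               # ([0, 1, 2, 3, 4], [5, 6, 7])
print(split_raw == split_exp)  # True
```

Rescaling or standardizing a feature (with a positive scale factor) is also a strictly increasing transform. This is why trees do not need feature scaling.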

---

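### 4. **Features Are Evaluated Independently at Each Split**

* At each split, the algorithm evaluates each feature separately and chooses the feature (and threshold) that most reduces impurity.
* It does **not assume the features are independent**, but a single split cannot see feature interactions. An interaction is captured only if it emerges through successive splits deeper in the tree. Because the search is greedy, an interaction that gives no impurity reduction at the first split (e.g. an XOR pattern) can be missed if the tree stops when no split helps.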
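The classic case that a single greedy split cannot see is XOR. The target depends on two features jointly, but neither feature alone changes the class mix. The sketch below uses hypothetical binary data and an illustrative helper, `gain`. It shows zero Gini gain for both features at the root, and perfect separation once the second feature is used inside each child.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def gain(rows, labels, feature):
    """Gini reduction from splitting binary rows on one feature (0 vs 1)."""
    left = [y for r, y in zip(rows, labels) if r[feature] == 0]
    right = [y for r, y in zip(rows, labels) if r[feature] == 1]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted

# XOR target: y = 1 when x0 != x1 (each input combination appears twice)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
labels = [int(a != b) for a, b in rows]

# Root: neither feature reduces impurity on its own
root_gains = [gain(rows, labels, f) for f in (0, 1)]
print(root_gains)  # [0.0, 0.0]

# Inside each child of an x0 split, x1 separates the classes perfectly
child_gains = []
for v in (0, 1):
    sub_rows = [r for r in rows if r[0] == v]
    sub_labels = [y for r, y in zip(rows, labels) if r[0] == v]
    child_gains.append(gain(sub_rows, sub_labels, 1))
print(child_gains)  # [0.5, 0.5]
```

A tree that stops as soon as the best split has zero gain would therefore stop at the root on XOR data, even though a depth-2 tree fits it exactly.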

---

### 5. **Sufficient Data for Each Split**

* Assumes there’s **enough data in each node** to compute reliable impurity measures (Gini/Entropy).
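* Small datasets (or very deep trees with tiny leaves) give noisy impurity estimates and make trees unstable: a small change in the training data can produce a very different tree (high variance).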
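To see why tiny nodes are unreliable, the sketch below simulates nodes of different sizes drawn from the same hypothetical population. The population has two equally likely classes, so the true Gini impurity is 0.5. The sketch then measures how much the estimated Gini varies at each node size. The setup is illustrative only; it is not how any library computes impurity.

```python
import random
import statistics
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

rng = random.Random(0)  # fixed seed so the simulation is reproducible
TRUE_P = 0.5            # hypothetical: two equally likely classes, true Gini = 0.5

def simulate(node_size, trials=1000):
    """Mean and spread of Gini estimates computed from random nodes of one size."""
    estimates = [
        gini([int(rng.random() < TRUE_P) for _ in range(node_size)])
        for _ in range(trials)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)

results = {n: simulate(n) for n in (2, 5, 50, 500)}
for n, (mean, sd) in results.items():
    print(f"node size {n:>3}: mean Gini {mean:.3f}, std {sd:.3f}")
```

Small nodes produce estimates that scatter widely and are also biased low. The expected plug-in Gini of an n-sample node is (n - 1)/n times the true value, so a single-sample leaf always looks perfectly pure. In scikit-learn, `DecisionTreeClassifier` guards against tiny nodes with parameters such as `min_samples_split` and `min_samples_leaf`.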

---

### 6. **Target Variable Is Well-Defined**

* Assumes that the target classes are **mutually exclusive and exhaustive**.
* Example: A loan application is either "Approved" or "Rejected", not both.
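One way to act on this assumption is to check the labels before training. The sketch below uses a hypothetical helper, `check_single_label`. It rejects samples with several labels (not mutually exclusive) and samples with a missing or unknown label (not exhaustive). The class names come from the loan example above.

```python
def check_single_label(targets, classes):
    """Raise ValueError unless every target is exactly one label from `classes`."""
    allowed = set(classes)
    for i, target in enumerate(targets):
        # A collection of labels breaks mutual exclusivity
        if isinstance(target, (set, frozenset, list, tuple)):
            raise ValueError(f"sample {i}: multiple labels {target!r}")
        # A missing (None) or unexpected label breaks exhaustiveness
        if target not in allowed:
            raise ValueError(f"sample {i}: label {target!r} is not one of {sorted(allowed)}")
    return True

CLASSES = ("Approved", "Rejected")
print(check_single_label(["Approved", "Rejected", "Approved"], CLASSES))  # True

for bad in (["Approved", {"Approved", "Rejected"}], ["Rejected", None]):
    try:
        check_single_label(bad, CLASSES)
    except ValueError as err:
        print("invalid:", err)
```

If a sample genuinely can carry several labels, the task is multi-label classification. It needs a different setup (for example, one binary target per label) rather than a single multi-class tree.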

---

**Summary**

* **What Trees DON’T assume:** linearity, normality, equal variance, feature scaling.
* **What Trees DO assume:**

  * Recursive partitioning can separate data meaningfully.
  * Some features are predictive.
  * Enough samples exist per node to make good splits.
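The assumptions in this summary map onto concrete pieces of the algorithm:

* recursive partitioning is the recursion itself;
* predictive features are what make a split's weighted impurity drop;
* "enough samples per node" is a stopping rule.

The following is a minimal from-scratch sketch under those assumptions, not the implementation used by scikit-learn or any other library. `build_tree`, `predict`, the `min_samples_split`/`max_depth` stopping rules, and the data are all illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(rows, labels, min_samples_split=2, max_depth=3, depth=0):
    """Greedy recursive partitioning on numeric features using Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop if the node is pure, has too few samples to split, or is at max depth
    if gini(labels) == 0.0 or len(labels) < min_samples_split or depth >= max_depth:
        return majority
    best = None  # (weighted child impurity, feature index, threshold)
    for f in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are non-empty
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all rows identical: nothing to split on
        return majority
    _, f, t = best

    def child(keep):
        # Recurse on the samples whose feature value passes the test
        idx = [i for i, r in enumerate(rows) if keep(r[f])]
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          min_samples_split, max_depth, depth + 1)

    return {"feature": f, "threshold": t,
            "left": child(lambda v: v <= t), "right": child(lambda v: v > t)}

def predict(tree, row):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical data: one numeric feature, target is 1 only in a middle band
rows = [(0.5,), (1.0,), (1.5,), (2.0,), (2.5,), (3.0,), (3.5,), (4.0,)]
labels = [0, 0, 1, 1, 1, 0, 0, 0]

tree = build_tree(rows, labels)
print(tree)
print([predict(tree, r) for r in rows])  # [0, 0, 1, 1, 1, 0, 0, 0]

# Requiring more samples per split stops growth early: the 5-sample left child
# becomes a majority-vote leaf
coarse = build_tree(rows, labels, min_samples_split=6)
print([predict(coarse, r) for r in rows])  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The full tree needs two splits to isolate the band, which no single linear threshold can do. Raising `min_samples_split` trades that detail for leaves backed by more data.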

---

👉 This small set of assumptions is one reason **Decision Trees work well in practice**, especially when combined into ensembles (Random Forests, Gradient Boosting).