## 🌳 **Assumptions in Decision Trees**

### 1. **Features split data meaningfully**

* Assumes that some features (or combinations) can split the data into groups that are closer to being “pure.”
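* Example: in the classic Play Tennis dataset, Outlook = *Overcast* always gives Play Tennis = *Yes*, so that branch is perfectly pure.
* If no feature produces purer groups, the tree either stops early or keeps splitting on noise.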
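
To make "purity" concrete, here is a minimal sketch that scores the Outlook split with entropy. It assumes the class counts from the classic 14-row Play Tennis dataset (9 Yes / 5 No overall); those counts and the helper name `entropy` are illustrative assumptions, not something defined in these notes.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

# (yes, no) counts per Outlook value, assumed from the classic Play Tennis data
outlook_groups = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}

n = 14
parent = entropy((9, 5))
# Weighted average entropy of the child groups after splitting on Outlook
weighted_children = sum(
    (sum(c) / n) * entropy(c) for c in outlook_groups.values()
)
info_gain = parent - weighted_children

for name, c in outlook_groups.items():
    print(f"{name:9s} entropy = {entropy(c):.3f}")
print(f"Information gain of Outlook = {info_gain:.3f}")
```

The Overcast group has entropy 0 (perfectly pure), which is exactly the kind of group a useful feature should produce.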

---

### 2. **Greedy splitting is “good enough”**

* At each node, the tree chooses the *best split locally* (using Entropy, Gini, etc.).
* Assumes that this greedy approach leads to a reasonably good global structure.
* It does **not** search all possible tree structures: finding an optimal tree is NP-complete, so exhaustive search is impractical.
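
A small sketch of where the greedy assumption can break: on XOR-style data, neither feature looks useful on its own, even though two splits together classify everything. The helper names `entropy` and `info_gain` and the toy rows are assumptions made for illustration.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return sum(p * log2(1 / p) for p in probs)

def info_gain(rows, labels, feature):
    """Information gain from splitting on one binary feature (given by index)."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]  # XOR of the two features

# At the root, both features look useless to a greedy learner
root_gains = [info_gain(rows, labels, f) for f in (0, 1)]

# After splitting on feature 0 anyway, feature 1 separates the left branch perfectly
left_rows = [x for x in rows if x[0] == 0]
left_labels = [y for x, y in zip(rows, labels) if x[0] == 0]
second_level_gain = info_gain(left_rows, left_labels, 1)

print("root gains:", root_gains)
print("gain of feature 1 inside the left branch:", second_level_gain)
```

A greedy learner sees no reason to prefer either root split here, which is why the "good enough" assumption is an assumption and not a guarantee.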

---

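### 3. **Splits are final (no backtracking)**

* Once a split is made, the tree never revisits or revises it; every later split is chosen only within the subset defined by the splits above it.
* Order of splits matters: in ID3-style trees, a categorical feature used at a node is not reused further down that path, whereas CART-style trees can split on the same numeric feature again with a different threshold.
* This is not an assumption that features are statistically independent. Trees can capture interactions, because each split is conditional on the ones above it.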

---

### 4. **Data is sufficient to avoid overfitting**

* Trees assume there’s enough training data to learn meaningful splits.
* Small datasets → risk of capturing noise (overfitting).
* That’s why pruning, depth or leaf-size limits, and ensemble methods (Random Forests, gradient boosting such as XGBoost) are used.
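
The overfitting risk can be sketched with a tiny 1-D tree written from scratch. Everything here is an assumption made for illustration: the synthetic data (true rule `x > 0.5` with 20% of labels flipped), the helper names, and the use of a depth limit as a stand-in for pruning. It is not how any particular library implements trees.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build(points, max_depth=None):
    """Grow a 1-D tree on (x, y) pairs. A leaf is an int, a node is a tuple."""
    labels = [y for _, y in points]
    majority = int(2 * sum(labels) >= len(labels))
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    xs = sorted({x for x, _ in points})
    best = None
    for t in xs[:-1]:  # candidate rule: x <= t goes left, x > t goes right
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # all x values identical, nothing left to split
        return majority
    t = best[1]
    next_depth = None if max_depth is None else max_depth - 1
    return (
        t,
        build([p for p in points if p[0] <= t], next_depth),
        build([p for p in points if p[0] > t], next_depth),
    )

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

def sample(rng, n, noise=0.2):
    """True rule: y = 1 if x > 0.5. Each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train, test = sample(rng, 40), sample(rng, 1000)

deep = build(train)                 # grown until every leaf is pure
stump = build(train, max_depth=1)   # a single split

for name, tree in [("deep", deep), ("stump", stump)]:
    print(f"{name:5s} train={accuracy(tree, train):.2f} test={accuracy(tree, test):.2f}")
```

The fully grown tree scores perfectly on the training set because it memorizes the flipped labels too. The depth-limited tree cannot do that, and on data like this it will typically hold up better on the held-out sample, though the exact numbers depend on the seed.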

---

### 5. **Splitting criteria reflect “purity” correctly**

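* Assumes impurity measures like **Entropy** (used to compute **Information Gain**) or the **Gini Index** truly identify which feature is best.
* With imbalanced classes, a node can look nearly pure simply because the majority class dominates. Information Gain also favors features with many distinct values (e.g., an ID column), which is why C4.5 uses gain ratio instead.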
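
Two quick checks show how a purity measure can mislead. This is a minimal sketch with made-up labels and hypothetical feature names (`row_id`, `useful`); the helpers are written from scratch for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a multiway split on a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

# 1) Imbalance: 95 negatives and 5 positives already look "almost pure"
#    before any split has been made.
imbalanced = [0] * 95 + [1] * 5
print("Gini of unsplit imbalanced node:", round(gini(imbalanced), 3))

# 2) Many-valued feature: a unique row ID gets the maximum possible gain,
#    beating a feature that actually tracks the label on 6 of 8 rows.
labels = [0, 1, 0, 1, 1, 0, 1, 0]
row_id = list(range(len(labels)))   # unique per row, carries no real signal
useful = [0, 1, 0, 1, 1, 0, 0, 1]

print("gain(row_id):", info_gain(row_id, labels))
print("gain(useful):", round(info_gain(useful, labels), 3))
```

The ID column wins on information gain only because every group contains a single row, so the measure rewards fragmentation instead of predictive value.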

---

### 6. **Stationarity of data**

* Assumes the relationship between features and target remains consistent in train vs. test (no major distribution shift).
* Otherwise, the thresholds and splits learned during training won’t generalize.
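
A toy sketch of what a shift does to a learned split: a one-split tree (a stump) is fit on data where the label flips at `x >= 5`, then scored on data where the relationship has moved to `x >= 8`. The data and the helper names `fit_stump` and `accuracy` are assumptions made for illustration.

```python
def fit_stump(points):
    """Pick the threshold t (rule: predict 1 if x > t) with the fewest training errors."""
    xs = sorted({x for x, _ in points})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum((x > t) != bool(y) for x, y in points))

def accuracy(t, points):
    return sum((x > t) == bool(y) for x, y in points) / len(points)

train = [(x, int(x >= 5)) for x in range(10)]
same_dist = [(x, int(x >= 5)) for x in range(10)]  # same relationship as training
shifted = [(x, int(x >= 8)) for x in range(10)]    # relationship has moved

t = fit_stump(train)
print("learned threshold:", t)
print("accuracy, same distribution:", accuracy(t, same_dist))
print("accuracy, shifted:", accuracy(t, shifted))
```

The threshold itself is fine for the data it was learned on; it only fails because the feature-target relationship changed after training.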

---

✅ **Key takeaway**: Decision trees don’t assume linearity, normality, or homoscedasticity the way linear regression does. Their main assumptions are: *features provide useful splits, greedy local decisions lead to a good global tree, and enough data exists to prevent overfitting*.
