### Decision Trees

####  Introduction
A **Decision Tree** is a supervised machine learning algorithm used for both **classification** and **regression** tasks. It models decisions as a hierarchy of **if-else conditions**. At each internal node, the data is split on the feature (and threshold) that best separates the target values according to a splitting criterion.

---

### Key Concepts

#### Root Node
- The **starting point** of the tree.
- Represents the entire dataset, which is split into child nodes.

#### Splitting
- The process of dividing nodes into **sub-nodes** based on feature conditions.

#### Decision Node
- A node that **splits** into further sub-nodes.

#### Leaf Node
- The **final** output node (contains no further splits).
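- In classification, it holds a predicted **class label** (typically the majority class of the training samples that reach it).
- In regression, it holds a **continuous value** (typically the mean target of the training samples that reach it).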

#### Pruning
- Reducing tree size by **removing branches** to avoid **overfitting**.
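
To make the node types concrete, here is a minimal sketch of a binary tree node in Python. `Node`, `predict_one` and `prune_to_leaf` are hypothetical names used only for illustration; they are not part of any library. The thresholds in the example tree are hand-picked, not learned.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Node:
    """A node in a binary decision tree (hypothetical structure for illustration)."""
    # Decision-node fields: the feature index to test and the split threshold.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # followed when x[feature] <= threshold
    right: Optional["Node"] = None  # followed when x[feature] > threshold
    # Leaf-node field: a class label (classification) or a number (regression).
    value: Optional[Union[int, float]] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict_one(root: Node, x: Sequence[float]) -> Union[int, float]:
    """Start at the root and follow feature tests until a leaf is reached."""
    node = root
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def prune_to_leaf(node: Node, value: Union[int, float]) -> None:
    """Pruning in its simplest form: collapse a subtree into a single leaf."""
    node.feature = None
    node.threshold = None
    node.left = None
    node.right = None
    node.value = value


# Hand-built example on the four iris features (thresholds chosen by hand).
# The root is a decision node on petal length (index 2); its right child is a
# decision node on petal width (index 3); the three Node(value=...) are leaves.
root = Node(feature=2, threshold=2.45,
            left=Node(value=0),
            right=Node(feature=3, threshold=1.75,
                       left=Node(value=1), right=Node(value=2)))

print(predict_one(root, [5.1, 3.5, 1.4, 0.2]))  # 0
prune_to_leaf(root.right, 2)  # remove the petal-width split
print(predict_one(root, [6.0, 3.0, 4.5, 1.5]))  # 2 (it would be 1 before pruning)
```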

---

####  How Decision Trees Work

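* **Select the Best Feature**:  
   - Score every candidate split (a feature plus a threshold) with a criterion such as **Gini Impurity**, **Entropy (Information Gain)**, or **Variance Reduction**.

* **Split the Data**:  
   - The split with the best **split criterion** score divides the node's samples into child nodes. The choice is greedy: each split is locally best, with no look-ahead to later splits.

* **Repeat Until a Stopping Condition is Met**:  
   - Apply the same procedure recursively to each child node. Typical stopping criteria are reaching a **maximum depth**, having fewer than a **minimum number of samples per node**, or the node being **pure** (all samples share one label).

* **Make Predictions**:  
   - Traverse from the root to a **leaf node** by following the feature tests, then output that leaf's value.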
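
The steps above can be sketched from scratch as a greedy, recursive builder. This is a minimal sketch, assuming CART-style binary splits scored with Gini impurity and thresholds taken at midpoints between sorted unique feature values. The function names are hypothetical, and the code is far slower than library implementations such as scikit-learn's `DecisionTreeClassifier`. It only illustrates the select, split, repeat and predict steps.

```python
import numpy as np
from sklearn.datasets import load_iris


def gini(y):
    """Gini impurity 1 - sum_i p_i^2 of an array of integer class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def best_split(X, y):
    """Try every feature and midpoint threshold; return (feature, threshold) with the
    lowest weighted child impurity, or None if no split improves on the parent."""
    best, best_score, n = None, gini(y), len(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between sorted values
            mask = X[:, j] <= t
            score = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / n
            if score < best_score:
                best, best_score = (j, float(t)), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Recursively split until a stopping condition is met.
    Leaves are {"leaf": majority_class}; decision nodes store the feature test."""
    majority = int(np.bincount(y).argmax())
    # Stopping conditions: depth limit, too few samples, or a pure node.
    if depth >= max_depth or len(y) < min_samples_split or gini(y) == 0.0:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:  # no split reduces impurity
        return {"leaf": majority}
    j, t = split
    mask = X[:, j] <= t
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples_split),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples_split),
    }


def predict(tree, x):
    """Traverse from the root to a leaf using the sample's feature values."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


X, y = load_iris(return_X_y=True)
tree = build_tree(X, y, max_depth=2)
preds = np.array([predict(tree, x) for x in X])
acc = float((preds == y).mean())
print(f"root split: feature {tree['feature']} <= {tree['threshold']:.2f}")
print(f"training accuracy: {acc:.3f}")
```

Ties are broken by keeping the first split found, which is a simplification. Libraries may order or break ties differently.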

---

####  Splitting Criteria

####  **For Classification:**
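1. **Gini Impurity**  
   \[
   Gini = 1 - \sum_{i} p_i^2
   \]  
   - where \(p_i\) is the proportion of the node's samples that belong to class \(i\).
   - Measures **impurity** in a node: 0 for a pure node (a single class), larger when classes are mixed (lower is better).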

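2. **Entropy & Information Gain**  
   \[
   Entropy = -\sum_{i} p_i \log_2 p_i
   \]  
   - Measures the uncertainty of the class labels in a node: 0 for a pure node, and maximal (\(\log_2\) of the number of classes) when all classes are equally frequent.

   \[
   \text{Information Gain} = \text{Entropy}(\text{Parent}) - \sum_{k} \frac{N_k}{N}\,\text{Entropy}(\text{Child}_k)
   \]
   - where \(N\) is the number of samples in the parent node and \(N_k\) the number in child \(k\), so each child's entropy is weighted by its share of the samples.
   - Higher **Information Gain** means a better split.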
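
As a worked example of the formulas, the sketch below computes them for a node shaped like the full iris dataset (50 samples in each of 3 classes). It then evaluates a split that isolates class 0. The helper names are hypothetical; only NumPy is assumed.

```python
import numpy as np


def class_proportions(labels):
    """p_i: fraction of the node's samples that belong to class i."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()


def gini(labels):
    p = class_proportions(labels)
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    p = class_proportions(labels)  # only classes present in the node, so every p_i > 0
    return -np.sum(p * np.log2(p))


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted


# A node shaped like the full iris dataset: 50 samples of each of 3 classes.
parent = np.repeat([0, 1, 2], 50)
left, right = parent[parent == 0], parent[parent != 0]  # isolate class 0

print(round(gini(parent), 3))                             # 0.667  (= 1 - 3 * (1/3)^2)
print(round(entropy(parent), 3))                          # 1.585  (= log2 3)
print(round(information_gain(parent, [left, right]), 3))  # 0.918  (= log2 3 - 2/3)
```

The left child is pure (entropy 0) and the right child is a 50/50 mix (entropy 1). The weighted child entropy is therefore \(\tfrac{100}{150} \cdot 1 = \tfrac{2}{3}\).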

####  **For Regression:**
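- Splits are chosen to minimize the size-weighted **Mean Squared Error (MSE)** of the child nodes (equivalently, to maximize **Variance Reduction**) or, alternatively, the **Mean Absolute Error (MAE)**.
- With MSE, a leaf predicts the mean target of its samples; with MAE, the median.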
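
For regression, the same greedy search applies with a different score. This is a minimal sketch, assuming the MSE criterion, of how candidate thresholds on a single feature are compared by variance reduction. The toy data and function names are made up for illustration.

```python
import numpy as np


def mse(y):
    """MSE of predicting the node mean for every sample (equal to the node variance)."""
    return float(np.mean((y - y.mean()) ** 2))


def variance_reduction(y, mask):
    """Parent MSE minus the size-weighted MSE of the two children defined by mask."""
    left, right = y[mask], y[~mask]
    n = len(y)
    return mse(y) - (len(left) / n * mse(left) + len(right) / n * mse(right))


# Toy data: one feature x and a target that jumps between x = 3 and x = 4.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

for t in [1.5, 2.5, 3.5, 4.5, 5.5]:  # midpoints between consecutive x values
    print(t, round(variance_reduction(y, x <= t), 3))
```

The threshold 3.5, which separates the low and high targets, gives by far the largest reduction. The two resulting leaves would predict the child means, 1.0 and 5.0.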

---

####  Advantages of Decision Trees

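- **Easy to interpret & visualize**: the learned model is a set of readable if-else rules
- **No need for feature scaling** (e.g., Standardization), because each split compares a single feature against a threshold
- **Handles both numerical & categorical data** in principle (scikit-learn's implementation expects categorical features to be encoded numerically)
- **Can handle missing values** in some implementations (e.g., C4.5); others require imputing them first
- **Performs implicit feature selection**: features never chosen for a split do not influence predictions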
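
Three of these advantages can be checked directly with scikit-learn. This is a sketch, assuming the iris data and an arbitrary `max_depth=2` to keep the printed rules short:

- `export_text` prints the learned rules.
- `feature_importances_` shows which features the tree actually used.
- Refitting on features multiplied by 100 leaves the predictions unchanged.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree keeps the printed rules short; max_depth=2 is an arbitrary choice.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Interpretability: the learned if-else rules as plain text.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Implicit feature selection: features never used in a split get importance 0.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")

# No scaling needed: rescaling every feature by 100 does not change the predictions.
clf_scaled = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data * 100, iris.target)
same = (clf_scaled.predict(iris.data * 100) == clf.predict(iris.data)).all()
print("same predictions after scaling:", same)
```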

---

####  Disadvantages of Decision Trees

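- **Prone to overfitting**: deep, unpruned trees can memorize the training data
- **Sensitive to noisy data** and unstable: small changes in the training set can produce a very different tree
- **Often less accurate than ensembles** such as Random Forests, which combine many trees to reduce variance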
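
A quick way to see overfitting, and how limiting depth or pruning counters it, is to compare training and test accuracy. This sketch uses synthetic data from `make_classification` with `flip_y=0.2`, so 20% of labels are assigned at random to mimic noise. The dataset sizes and the values `max_depth=3` and `ccp_alpha=0.01` are arbitrary, untuned choices. `ccp_alpha` enables scikit-learn's minimal cost-complexity pruning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data where 20% of the labels are assigned at random (label noise).
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "unrestricted": DecisionTreeClassifier(random_state=0),                  # grown until pure
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),      # pre-pruning
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),  # post-pruning
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>15}: depth={model.get_depth():2d} "
          f"train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```

Typically the unrestricted tree fits the training set perfectly but scores lower on the test set than the constrained trees. Exact numbers depend on the data and seeds.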

---




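```python
# Dataset, model, data-splitting and evaluation utilities from scikit-learn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import graphviz  # renders the tree description produced by export_graphviz
```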


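```python
# Load the Iris dataset: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X = iris.data    # sepal length, sepal width, petal length, petal width (cm)
y = iris.target  # class labels: 0 = setosa, 1 = versicolor, 2 = virginica
```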


```python
X[:10]
```

```
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])
```

```python
y[:10]
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```

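```python
# Hold out 20% of the samples for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
```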
