# **<font color='darkorange'>Decision Tree Learning Tutorial</font>**

---

## **<font color='darkorange'>1. Introduction to Decision Tree Algorithm</font>**

A **Decision Tree** is one of the most popular machine learning algorithms. It uses a **tree-like structure** to represent decisions and their possible outcomes.  

### **Key Features**:
- **Supervised Learning**: It can be used for **Classification** and **Regression** tasks.
- **Tree Representation**: A decision tree consists of:
  - **Root Node**: The starting point.
  - **Internal Nodes**: Represent tests on attributes.
  - **Branches**: Denote outcomes of the tests.
  - **Leaf Nodes**: Represent class labels or regression outputs.

<img  src="https://drive.google.com/uc?id=1WOpBBe9pX9uAPZKzepKfP2iseTkhMyhs"/>
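To make the four kinds of nodes concrete, here is a minimal sketch of a binary tree node in Python. The `Node` class, the `predict` helper, and the small hand-built tree are hypothetical illustrations, not part of any library. The sketch assumes numeric features that are tested with `value <= threshold`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (illustrative structure, not a library API)."""
    feature: Optional[str] = None       # attribute tested at an internal node
    threshold: Optional[float] = None   # branch rule: go left if value <= threshold
    left: Optional["Node"] = None       # branch taken when the test is true
    right: Optional["Node"] = None      # branch taken when the test is false
    label: Optional[str] = None         # class label, set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, sample: dict) -> str:
    """Follow branches from the root until a leaf node is reached."""
    while not node.is_leaf():
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical hand-built tree: the root tests "Student" (0 = No, 1 = Yes),
# one internal node tests "Age", and three leaves hold class labels.
root = Node(
    feature="Student", threshold=0.5,
    left=Node(feature="Age", threshold=30,
              left=Node(label="No"), right=Node(label="Yes")),
    right=Node(label="Yes"),
)

print(predict(root, {"Student": 1, "Age": 22}))  # Yes
print(predict(root, {"Student": 0, "Age": 25}))  # No
```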

### **Assumptions**:
1. The whole training set is considered as the **root** at the start.
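2. Classic algorithms such as ID3 expect **categorical** feature values, so continuous values are **discretized** first. CART (and scikit-learn) can instead split continuous features directly on thresholds such as `Age <= 30`.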
3. Records are split **recursively** based on attribute values.
4. The order of placing attributes is determined using **statistical measures** like **Gini Index** or **Entropy**.
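As a rough illustration of assumption 2, a continuous feature can be binned into categories with pandas' `pd.cut`. The bin edges and the labels `young`, `middle`, and `senior` below are made up for this sketch; in practice they would come from domain knowledge or a binning strategy.

```python
import pandas as pd

ages = pd.Series([22, 25, 47, 35, 52], name="Age")

# Hypothetical bin edges and labels, chosen only for illustration.
# Intervals are right-closed: (0, 30], (30, 45], (45, 120]
age_group = pd.cut(ages, bins=[0, 30, 45, 120], labels=["young", "middle", "senior"])
print(age_group.tolist())  # ['young', 'young', 'senior', 'middle', 'senior']
```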

---

## **<font color='darkorange'>Classification and Regression Trees (CART)</font>**

Decision trees are also referred to by the more modern term **CART**, which stands for **Classification and Regression Trees**.

### **What is CART?**
- **Classification**: Predicts a **categorical outcome**.  
- **Regression**: Predicts a **continuous outcome**.  
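A minimal sketch of the two CART flavors, assuming scikit-learn is installed: `DecisionTreeClassifier` returns a class label, while `DecisionTreeRegressor` returns the mean target value of the leaf a sample falls into. The toy data below is invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data (made up for illustration): one numeric feature with two clear groups
X = [[1], [2], [3], [10], [11], [12]]
y_class = ["low", "low", "low", "high", "high", "high"]  # categorical target
y_value = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]                 # continuous target

clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y_value)

print(clf.predict([[2.5]]))  # a class label from the left leaf
print(reg.predict([[2.5]]))  # the mean target of the left leaf (about 1.0)
```

With `max_depth=1`, each model can make only one split, so the contrast between a categorical and a continuous prediction is easy to see.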

### **History**:
The term **CART** was introduced by **Leo Breiman**, and it forms the basis for advanced ensemble algorithms like:
- **Bagged Trees**
- **Random Forests**
- **Boosted Trees**

---
## **<font color='darkorange'>Classification Tree Example</font>**

<img src="https://drive.google.com/uc?id=18fn15kdDpYto-fbnUeiP_REDMwiLrdYC" width=700/>

---
## **<font color='darkorange'>Regression Tree Example</font>**

<img src="https://drive.google.com/uc?id=1lLgkWbcUX4ggRRKtMYVPuJW7tnQFG1nM" width=700/>

---



## **<font color='darkorange'>1. Steps to Create a Decision Tree</font>**

To create a **Decision Tree**, follow these steps:

### **Step 1: Start with the Entire Dataset**
- Treat the entire dataset as the **root node**.

### **Step 2: Choose the Best Attribute for Splitting**
- Use a **splitting criterion** to determine the best attribute to split the data.  
- Common criteria: **Gini Index** or **Entropy** for classification, and **Mean Squared Error (MSE)** for regression.

### **Step 3: Split the Dataset**
- Split the dataset into child nodes based on the selected attribute's values. CART always makes **binary** splits, for example `Age <= 30` versus `Age > 30`.

### **Step 4: Recursively Repeat the Process**
- Treat each child node as a new dataset and repeat the process:
  - Choose the best attribute.  
  - Split further until a **stopping condition** is met.

### **Step 5: Assign Class Labels or Values to Leaf Nodes**
- For **classification**: Assign the most common class in the node.  
- For **regression**: Assign the mean of the target values in the node.
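The five steps can be sketched from scratch in plain Python. This is a simplified, CART-style illustration under stated assumptions (numeric features only, binary `<=` splits at midpoints between distinct values, Gini as the criterion, and hypothetical helper names `gini`, `best_split`, `build_tree`, and `predict`); it is not how scikit-learn implements trees internally.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Step 2: try every feature and threshold; return (feature, threshold, weighted Gini)."""
    n, best = len(rows), None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Steps 1-5: recursively split until a stopping condition holds."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop on a pure node, at max depth, or when too few samples remain
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples:
        return {"leaf": majority}  # Step 5: most common class in the node
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):  # no impurity reduction
        return {"leaf": majority}
    f, t, _ = split
    # Step 3: binary split on the chosen feature and threshold
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Step 4: recurse on each child node
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Follow the branches from the root to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Columns: Age, Income, Student (1 = Yes, 0 = No)
rows = [[22, 30000, 1], [25, 40000, 0], [47, 50000, 0], [35, 60000, 1], [52, 70000, 0]]
labels = ["Yes", "No", "Yes", "Yes", "No"]
tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"])  # 2 0.5 (root splits on Student)
print([predict(tree, r) for r in rows])
```

On the five-row example dataset from the practical example below, this sketch places `Student` (feature index 2) at the root and fits all five training rows.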

---

## **<font color='darkorange'>2. Splitting Criteria in Decision Trees</font>**

The **splitting criterion** determines how to split the dataset at each node. Different criteria are used for **classification** and **regression** trees.

---

### **<font color='blue'>2.1 Gini Index (for Classification)</font>**

The **Gini Index** measures the **impurity** of a node. A **lower Gini Index** means the node is purer.

#### **Formula**:
$$
Gini = 1 - \sum_{i=1}^c p_i^2
$$
Where:
- $ p_i $ is the probability of class $ i $ in the node.  
- $ c $ is the number of classes.

#### **Example**:
If a node contains 4 samples: 3 in Class A and 1 in Class B:  
$$
Gini = 1 - \left( \frac{3}{4} \right)^2 - \left( \frac{1}{4} \right)^2 = 0.375
$$
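A small Python helper can confirm the worked example. The function name `gini` is our own choice, and class probabilities are estimated as the proportion of each label in the node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini(["A", "A", "A", "B"]))  # 0.375, matching the worked example
print(gini(["A", "A", "A", "A"]))  # 0.0 for a pure node
```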

---

### **<font color='blue'>2.2 Entropy and Information Gain (for Classification)</font>**

**Entropy** measures the level of **disorder** (impurity) in a node: it is 0 for a pure node and grows as the classes become more mixed. **Information Gain** is the reduction in entropy achieved by a split.

#### **Formula for Entropy**:
$$
Entropy = -\sum_{i=1}^c p_i \log_2(p_i)
$$
Where:
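- $ p_i $ is the probability of class $ i $ in the node.
- $ c $ is the number of classes (a class with $ p_i = 0 $ contributes 0, since $ 0 \log_2 0 $ is taken as 0).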

#### **Formula for Information Gain**:
$$
Information\ Gain = Entropy_{parent} - \sum_{j} \left( \frac{n_j}{N} \times Entropy_{child\ j} \right)
$$
Where:
- $ n_j $ is the number of samples in child node $ j $.  
- $ N $ is the total number of samples in the parent node.

**Goal**: Choose the attribute that **maximizes** the Information Gain.
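The two formulas can be sketched in Python as follows. `entropy` and `information_gain` are illustrative helper names, and `children` is assumed to be the list of label lists produced by a candidate split.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits; only classes present in the node are summed."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["A", "A", "A", "B"]
print(round(entropy(parent), 4))                                     # 0.8113
print(round(information_gain(parent, [["A", "A", "A"], ["B"]]), 4))  # 0.8113 (perfect split)
print(round(information_gain(parent, [["A", "B"], ["A", "A"]]), 4))  # 0.3113
```

The perfect split recovers the full parent entropy (about 0.811 bits), while the mixed split gains only about 0.311 bits, so Information Gain prefers the first.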

---

### **<font color='blue'>2.3 Mean Squared Error (MSE) for Regression</font>**

For regression trees, **Mean Squared Error (MSE)** measures the impurity of a node. The best split is the one that **minimizes** the size-weighted MSE of the resulting child nodes.

#### **Formula**:
$$
MSE = \frac{\sum_{i=1}^n (y_i - \hat{y})^2}{n}
$$
Where:
- $ y_i $ is the actual value.  
- $ \hat{y} $ is the predicted value (mean of the target values in the node).  
- $ n $ is the number of samples in the node.
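The sketch below computes the MSE of a node and the size-weighted MSE of a candidate split, the quantity a regression tree would try to minimize. The helper names `mse` and `split_mse` and the target values are made up for illustration.

```python
def mse(values):
    """MSE of a node that predicts the mean of its own target values."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values) / len(values)


def split_mse(left, right):
    """Size-weighted MSE of two child nodes (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * mse(left) + len(right) / n * mse(right)


y = [1.0, 2.0, 3.0, 4.0]
print(mse(y))                             # 1.25 before splitting
print(split_mse([1.0, 2.0], [3.0, 4.0]))  # 0.25 after a split in the middle
print(split_mse([1.0], [2.0, 3.0, 4.0]))  # about 0.5, a worse split
```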

---

## **<font color='darkorange'>3. Stopping Conditions for Building Decision Trees</font>**

The splitting process stops when one of the following conditions is met:

1. **Pure Node**: All samples in a node belong to the same class.
2. **Maximum Depth**: The tree has reached a pre-defined depth.
3. **Minimum Samples**: A node contains fewer samples than a threshold.
4. **No Improvement**: Splitting the node does not reduce impurity (or reduces it by less than a chosen threshold).
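In scikit-learn, these stopping conditions correspond roughly to `DecisionTreeClassifier` hyperparameters. The mapping below is our own approximation: `min_impurity_decrease` blocks splits that reduce impurity too little, an impurity-based proxy for "no improvement" rather than a direct measure of model performance. The Iris dataset and the parameter values are arbitrary choices for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each stopping condition above maps loosely onto one hyperparameter:
clf = DecisionTreeClassifier(
    max_depth=2,                 # 2. Maximum Depth
    min_samples_split=10,        # 3. Minimum samples required to split a node
    min_impurity_decrease=0.01,  # 4. Minimum impurity reduction for a split
    random_state=0,
).fit(X, y)
# 1. Pure nodes are never split further, whatever the settings.

print(clf.get_depth(), clf.get_n_leaves())
```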

---

## **<font color='darkorange'>4. Practical Example: Steps to Create a Decision Tree</font>**

### **Example Dataset: Classification**
| **Age (Years)** | **Income (USD)** | **Student** | **Buys Product (Yes/No)** |
|-----------------|------------------|-------------|---------------------------|
| 22             | 30,000           | Yes         | Yes                       |
| 25             | 40,000           | No          | No                        |
| 47             | 50,000           | No          | Yes                       |
| 35             | 60,000           | Yes         | Yes                       |
| 52             | 70,000           | No          | No                        |

---

### **Steps**:
1. Start with the **root node**.
2. Evaluate candidate splits for every feature using the **Gini Index** or **Entropy**:  
   - Age, Income, and Student status.
3. Split the data on the feature whose split gives the **lowest weighted Gini Index** of the child nodes or the **highest Information Gain**.
4. Continue recursively until stopping conditions are met.
5. Assign **class labels** at the leaf nodes.
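Steps 2 and 3 can be checked by hand on the table above. This sketch (with the hypothetical helper `best_threshold`) scores every binary split `value <= t` for each feature and keeps the lowest weighted Gini Index per feature. It assumes midpoints between sorted distinct values as candidate thresholds, a common choice in CART-style implementations.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Lowest weighted Gini over binary splits 'value <= t' for one feature."""
    n, best = len(values), (None, float("inf"))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


features = {
    "Age": [22, 25, 47, 35, 52],
    "Income": [30000, 40000, 50000, 60000, 70000],
    "Student": [1, 0, 0, 1, 0],  # Yes = 1, No = 0
}
buys = ["Yes", "No", "Yes", "Yes", "No"]

print("root Gini:", round(gini(buys), 4))  # 0.48
scores = {name: best_threshold(col, buys) for name, col in features.items()}
for name, (t, score) in scores.items():
    print(f"{name}: split at <= {t}, weighted Gini = {score:.4f}")
print("best root feature:", min(scores, key=lambda k: scores[k][1]))
```

On this data, Student gives a weighted Gini of about 0.267, versus 0.3 for the best Age and Income splits, so a Gini-based tree would be expected to choose Student at the root.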

---

### **Code for Classification Tree**:

```python
# Import required libraries
from sklearn.tree import DecisionTreeClassifier, plot_tree
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Create the dataset
data = {
    'Age': [22, 25, 47, 35, 52],
    'Income': [30000, 40000, 50000, 60000, 70000],
    'Student': [1, 0, 0, 1, 0],  # Yes = 1, No = 0
    'Buys': ['Yes', 'No', 'Yes', 'Yes', 'No']
}
df = pd.DataFrame(data)

# Step 2: Define features (X) and target (y)
X = df[['Age', 'Income', 'Student']]
y = df['Buys']

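# Step 3: Train the Decision Tree Classifier (random_state fixes tie-breaking for reproducibility)
classifier = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
classifier.fit(X, y)

# Step 4: Visualize the Decision Tree
plt.figure(figsize=(12, 8))
# Pass plain lists; class names are taken from classifier.classes_ so their order always matches the model
plot_tree(classifier, feature_names=list(X.columns), class_names=list(classifier.classes_), filled=True, rounded=True)
plt.title("Decision Tree for Classification Example")
plt.show()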
