# **Decision Tree - The Ultimate Guide (Beginner to Advanced)**

## **1. Introduction to Decision Trees**

A **Decision Tree** is a supervised learning algorithm used for both **classification** and **regression** problems. It is structured like a flowchart, where each internal node represents a **decision based on a feature**, each branch represents an **outcome**, and each leaf node represents a **final decision or class**.

Decision Trees are widely used because they are:

- **Easy to understand & visualize**: The decision-making process is transparent.

- **Versatile**: Can handle both numerical and categorical data.

- **Non-parametric**: No assumptions about data distribution.

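- **Capable of handling missing values**: Some implementations (e.g., classic CART) do this with surrogate splits. Note that scikit-learn does not implement surrogate splits.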

## **2. Understanding the Structure of a Decision Tree**

- **Root Node**: The topmost node where the first split occurs.

- **Decision Nodes**: Internal nodes where further splits happen.

- **Leaf Nodes**: The final output (class label or regression value).

- **Splitting**: Process of dividing data based on conditions.

- **Pruning**: Removing unnecessary splits to prevent overfitting.

- **Max Depth**: The maximum number of levels the tree can have.

- **Impurity**: How mixed the data is at a node (lower impurity is better).
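These terms map onto attributes of a fitted scikit-learn tree. The following is a minimal sketch, assuming scikit-learn is installed and reusing the toy study-hours data from Section 9; the variable names are illustrative only.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: hours studied -> pass (1) / fail (0)
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

n_nodes = clf.tree_.node_count          # root + decision nodes + leaf nodes
n_leaves = clf.get_n_leaves()           # leaf nodes only
depth = clf.get_depth()                 # longest path from the root to a leaf
root_impurity = clf.tree_.impurity[0]   # Gini impurity at the root (default criterion)

print(f"nodes={n_nodes}, leaves={n_leaves}, depth={depth}, root impurity={root_impurity:.2f}")
```

Because a single threshold separates the two classes in this data, the fitted tree consists of just a root node and two leaves.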

![Decision Tree](https://github.com/rakeshravikumar-ML/ml_algorithm_helper/blob/01f99c916eb0b62ca90145da664e0ff9287cc59d/Decision_Tree/Decision_Tree.png)

## **3. How Does a Decision Tree Work?**

A Decision Tree works by recursively partitioning the dataset at each node based on the best splitting criterion. The goal is to **reduce impurity** and make the groups as homogeneous as possible.

### **Step-by-step working**

1. Select the **best feature** to split the data using a metric (Gini, Entropy, etc.).

2. Recursively split the dataset into subgroups.

3. Stop splitting when a stopping criterion is met (e.g., max depth, minimum samples per leaf).
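Step 1 can be sketched in plain Python for a single numeric feature. This is a simplified illustration, not how any particular library implements it: the helper names `gini` and `best_split` are made up for this example, and candidate thresholds are assumed to be midpoints between consecutive distinct values.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one numeric feature."""
    best_threshold, best_score = None, float("inf")
    uniq = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(uniq, uniq[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Impurity after the split, weighted by the size of each child
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
threshold, score = best_split(hours, passed)
print(threshold, score)  # 3.5 0.0
```

A full tree builder would repeat this search over every feature, then call itself on the left and right subsets (step 2) until a stopping criterion is met (step 3).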

## **4. Impurity Measures: Entropy & Gini Impurity**

### **4.1 Entropy (Used in ID3 Algorithm)**

Entropy measures the **randomness** or **uncertainty** in a dataset. The goal is to reduce entropy with each split.

\[ \text{Entropy} = - \sum_i p_i \log_2 p_i \]

where \(p_i\) is the proportion of samples at the node that belong to class \(i\).

#### **Example Calculation:**

Suppose we have a dataset with 10 observations:

- 6 belong to Class A

- 4 belong to Class B

\[ \text{Entropy} = - \left( \tfrac{6}{10} \log_2 \tfrac{6}{10} + \tfrac{4}{10} \log_2 \tfrac{4}{10} \right) \approx 0.971 \]

### **4.2 Gini Impurity (Used in CART Algorithm)**

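Gini Impurity measures the probability that a randomly chosen sample would be misclassified if it were labeled at random according to the class distribution at the node:

\[ \text{Gini} = 1 - \sum_i p_i^2 \]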

#### **Example Calculation:**

For the same dataset (6 in Class A, 4 in Class B):

\[ \text{Gini} = 1 - (0.6^2 + 0.4^2) = 1 - 0.52 = 0.48 \]

## **5. Information Gain & Splitting Criteria**

Information Gain (IG) determines the best feature to split the data:

\[ IG = \text{Entropy before split} - \text{Weighted Entropy after split} \]

The feature that **maximizes Information Gain** is chosen.
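To make the formula concrete, here is a small sketch that computes Information Gain from class counts. The parent node is the 6/4 dataset from Section 4; the two candidate splits are hypothetical and chosen only to contrast an impure split with a perfect one.

```python
import math


def entropy(class_counts):
    """Entropy (in bits) of a node, given the number of samples per class."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c != 0)


def information_gain(parent_counts, children_counts):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


parent = [6, 4]                     # 6 in Class A, 4 in Class B
mixed_split = [[5, 1], [1, 3]]      # hypothetical split: children are still impure
perfect_split = [[6, 0], [0, 4]]    # hypothetical split: both children are pure

ig_mixed = information_gain(parent, mixed_split)
ig_perfect = information_gain(parent, perfect_split)
print(f"IG (mixed split):   {ig_mixed:.4f}")
print(f"IG (perfect split): {ig_perfect:.4f}")
```

The perfect split removes all uncertainty, so its gain equals the parent's entropy; the mixed split recovers only part of it and would be ranked lower.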

## **6. Overfitting & Pruning**

Overfitting occurs when a tree memorizes training data instead of learning patterns. Solutions include:

- **Pre-pruning** (Limiting depth, minimum samples per leaf).

- **Post-pruning** (Growing the full tree first, then trimming branches that add little predictive value).
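Both strategies can be tried in scikit-learn. The sketch below is one possible setup, not a recommended recipe: it assumes the built-in breast cancer dataset, uses `max_depth` and `min_samples_leaf` for pre-pruning, and uses cost-complexity pruning (`ccp_alpha`) as the post-pruning method. The choice of the middle alpha is arbitrary; in practice it would be tuned, for example with cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: fully grown tree with no limits
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: stop growth early via depth and leaf-size limits
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42).fit(X_train, y_train)

# Post-pruning: compute the candidate alphas, then refit with one of them
path = full.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary middle value (assumption)
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:12s} nodes={model.tree_.node_count:3d} "
          f"train={model.score(X_train, y_train):.3f} test={model.score(X_test, y_test):.3f}")
```

Comparing node counts with train and test accuracy shows how much of the full tree's size was spent fitting the training set rather than improving generalization.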

## **7. When & How to Use Decision Trees**

### **When to Use Decision Trees**

✅ When **interpretability** is crucial (e.g., medical diagnosis, finance).

✅ When working with **small to medium-sized datasets**.

✅ When handling **both numerical and categorical data**.

✅ When you need a model that doesn’t require feature scaling.
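The point about feature scaling can be checked empirically. The following sketch, which assumes scikit-learn's Iris dataset and `StandardScaler`, fits the same tree on raw and standardized features and compares the predictions. Splits depend only on the ordering of values within each feature, so a rescaling should leave the partition unchanged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_scaled, y)

same = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print("Identical predictions on raw vs. standardized features:", same)
```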

### **When NOT to Use Decision Trees**

❌ When the dataset is **very large** → Deep trees can overfit and training can become slow.

❌ When **complex relationships** exist → Neural Networks or SVMs might be better.

❌ When **very high accuracy is needed** → Use ensembles like **Random Forest**.

## **8. Advantages & Disadvantages of Decision Trees**

### **Advantages**

✔ **Simple & interpretable** - Can be visualized as a flowchart.

✔ **No need for feature scaling** - Works with raw data.

✔ **Handles categorical & numerical features**.

✔ **Fast for small datasets**.

### **Disadvantages**

❌ **Prone to overfitting** - If too deep, it memorizes data.

❌ **Sensitive to small changes** - A slight change in data can alter structure.

❌ **Not great for large datasets** - Slower than linear models.

## **9. Implementing Decision Tree in Python**

Now, let's implement a Decision Tree Classifier using `sklearn`.

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import tree
import math
    

In [None]:

def entropy(class_counts):
    total = sum(class_counts)
    return -sum((count / total) * math.log2(count / total) for count in class_counts if count != 0)

# Example Calculation: 6 in Class A, 4 in Class B
class_counts = [6, 4]
entropy_value = entropy(class_counts)
print(f'Entropy: {entropy_value:.4f}')
    

In [None]:

def gini_impurity(class_counts):
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

# Example Calculation: 6 in Class A, 4 in Class B
gini_value = gini_impurity(class_counts)
print(f'Gini Impurity: {gini_value:.4f}')
    

In [None]:

data = {'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Pass_Exam': [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]}  # 1 = Pass, 0 = Fail

df = pd.DataFrame(data)
df.head()
    

In [None]:

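# Features must be 2-D (DataFrame); the target is 1-D (Series)
X = df[['Study_Hours']]
y = df['Pass_Exam']

# Hold out 30% of the rows (3 of 10) for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)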
    

In [None]:

# Split on entropy (information gain) and cap the depth to limit overfitting
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=42)
clf.fit(X_train, y_train)
    

In [None]:

plt.figure(figsize=(8,5))
tree.plot_tree(clf, filled=True, feature_names=['Study_Hours'], class_names=['Fail', 'Pass'])
plt.show()
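As a text alternative to the plot, scikit-learn's `export_text` prints the learned rules. This is a self-contained sketch: for brevity it refits a tree on all ten rows of the toy data rather than on the training split, so its rules may differ from the plotted tree. The names `hours`, `passed`, and `model` are illustrative.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy data as above, written as plain lists
hours = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
passed = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42).fit(hours, passed)

# One line per node: the split condition for decision nodes, the predicted class for leaves
rules = export_text(model, feature_names=["Study_Hours"])
print(rules)
```

The printed rules are easy to paste into reports or logs where an image is not practical.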
    

In [None]:

# Accuracy on the 3 held-out rows; with so few samples this is only a rough sanity check
y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
    