In [None]:
# Import libraries for creating diagrams and mathematical examples
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Rectangle, FancyBboxPatch
import seaborn as sns
from math import log2

# Set up matplotlib for better visualization
plt.style.use('default')
sns.set_palette("husl")

print("📚 Decision Tree Theory - Part 1")
print("================================")
print("Libraries imported successfully for theoretical explanations and examples.")

# Q1. Explain the concept of a Decision Tree. What kind of problems is it best suited for?

## Answer:

### **Concept of Decision Tree:**

A **Decision Tree** is a tree-like model used for both classification and regression tasks that makes decisions by splitting the data based on feature values. It mimics human decision-making by asking a series of questions about the features and following different paths based on the answers until reaching a final decision.

### **How it Works:**
1. **Starting Point**: Begin with the entire dataset at the root
2. **Splitting**: Choose the best feature and threshold to split the data
3. **Branching**: Create branches for different values/ranges of the chosen feature
4. **Recursion**: Repeat the process for each subset until stopping criteria are met
5. **Prediction**: Follow the path from root to leaf based on new data's feature values
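
The splitting step can be sketched on a single numeric feature. This is a minimal illustration, not how any particular library does it: it scores each candidate threshold by counting majority-vote errors (real algorithms usually score with entropy or Gini impurity), and the names `errors`, `best_threshold`, and the income data are hypothetical.

```python
from collections import Counter

def errors(labels):
    """Samples misclassified if this group predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_threshold(values, labels):
    """Return (threshold, total_errors) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = errors(left) + errors(right)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: income in $1000s and whether the loan was approved
incomes = [20, 35, 45, 55, 70, 90]
approved = ["no", "no", "no", "yes", "yes", "yes"]

threshold, n_errors = best_threshold(incomes, approved)
print(f"Best split: income <= {threshold} (errors: {n_errors})")
```

A full tree builder would run this search over every feature, pick the best one, and then recurse on the two resulting subsets until a stopping criterion is met.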

### **Tree Structure:**
- **Root Node**: Top node containing all data
- **Internal Nodes**: Decision points that split data based on features
- **Branches**: Connections representing possible feature values
- **Leaf Nodes**: Terminal nodes containing final predictions/classifications
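
One way to make this structure concrete is a nested dictionary, where internal nodes are dicts and leaves are plain strings. The sketch below assumes an illustrative loan-approval tree; the feature names, thresholds, and the `predict` helper are assumptions for this example rather than the output of a trained model.

```python
# Hypothetical loan-approval tree: internal nodes are dicts, leaves are strings.
loan_tree = {
    "feature": "income", "threshold": 50_000,          # root node
    "no": {"feature": "age", "threshold": 25,          # internal node
           "no": "Reject", "yes": "Approve"},          # leaf nodes
    "yes": {"feature": "credit_score", "threshold": 700,
            "no": "Review", "yes": "Approve"},
}

def predict(node, applicant):
    """Follow branches from the root until a leaf (a string) is reached."""
    path = []
    while isinstance(node, dict):
        answer = "yes" if applicant[node["feature"]] > node["threshold"] else "no"
        path.append(f"{node['feature']} > {node['threshold']}? {answer}")
        node = node[answer]  # move along the chosen branch
    return node, path

decision, path = predict(
    loan_tree, {"income": 60_000, "age": 30, "credit_score": 750}
)
print(" -> ".join(path), "->", decision)
```

Each key lookup corresponds to following one branch, so the printed path is exactly the chain of rules that produced the decision.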

### **Problems Best Suited For:**

#### **1. Classification Problems:**
- **Binary Classification**: Email spam detection, medical diagnosis
- **Multi-class Classification**: Image recognition, customer segmentation
- **Examples**: 
  - Determining loan approval (Approved/Rejected)
  - Medical diagnosis (Disease A/B/C/Healthy)
  - Customer churn prediction (Will churn/Won't churn)

#### **2. Regression Problems:**
- **Continuous Target Prediction**: House price prediction, stock prices
- **Examples**:
  - Predicting sales revenue based on marketing spend
  - Estimating delivery time based on distance and traffic

#### **3. Specific Problem Characteristics Where Decision Trees Excel:**

**✅ Categorical Features:**
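- Algorithms such as ID3 and C4.5 split on categorical variables directly; some implementations (scikit-learn, for example) still require categorical features to be numerically encoded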
- Works well with mixed data types (categorical + numerical)

**✅ Non-linear Relationships:**
- Captures complex, non-linear patterns in data
- No assumptions about feature distributions

**✅ Feature Interactions:**
- Automatically captures interactions between features
- Can model complex decision boundaries

**✅ Interpretability Requirements:**
- Provides clear, explainable decision rules
- Easy to understand for non-technical stakeholders

**✅ Missing Data Handling:**
- Some implementations handle missing values directly, for example through surrogate splits in CART
- Other implementations require missing values to be imputed first

#### **4. Real-World Applications:**

**Healthcare:**
- Symptom-based diagnosis systems
- Treatment recommendation engines
- Risk assessment for medical procedures

**Finance:**
- Credit scoring and loan approval
- Fraud detection systems
- Investment decision support

**Marketing:**
- Customer segmentation
- Targeted advertising campaigns
- Recommendation systems

**Business Operations:**
- Supply chain optimization
- Quality control processes
- Human resource decisions

#### **5. Advantages for Specific Problem Types:**

**When Feature Interpretability is Crucial:**
- Regulatory compliance (banking, healthcare)
- Scientific research where understanding relationships matters
- Business decisions requiring justification

**When Data has Complex Patterns:**
- Non-linear relationships between features and target
- Multiple feature interactions
- Hierarchical decision-making processes

**When Minimal Data Preprocessing is Desired:**
- Mixed data types (numerical and categorical)
- Missing values present
- No need for feature scaling or normalization

### **Limitations to Consider:**

- ❌ **Overfitting**: Can create overly complex trees that don't generalize well
- ❌ **Instability**: Small changes in data can result in very different trees
- ❌ **Bias**: Favors features with more levels when using certain splitting criteria
- ❌ **Linear Relationships**: May not efficiently capture simple linear patterns

### **Summary:**

Decision Trees are particularly well-suited for problems requiring **interpretable models** with **mixed data types**, **complex non-linear relationships**, and scenarios where **understanding the decision process** is as important as prediction accuracy. They excel in domains like healthcare, finance, and business where stakeholders need to understand and trust the model's reasoning.

In [None]:
# Visual representation of a Decision Tree concept
print("🌳 Decision Tree Concept Visualization")
print("="*50)

# Create a visual example of a decision tree for loan approval
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))

# Left subplot: Tree structure diagram
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 10)
ax1.set_aspect('equal')

# Draw the tree structure
# Root node
root = FancyBboxPatch((4, 8), 2, 1, boxstyle="round,pad=0.1", 
                     facecolor='lightblue', edgecolor='black', linewidth=2)
ax1.add_patch(root)
ax1.text(5, 8.5, 'Income > 50K?', ha='center', va='center', fontweight='bold', fontsize=10)

# Left branch (No)
left_internal = FancyBboxPatch((1, 5.5), 2, 1, boxstyle="round,pad=0.1",
                              facecolor='lightgreen', edgecolor='black', linewidth=1)
ax1.add_patch(left_internal)
ax1.text(2, 6, 'Age > 25?', ha='center', va='center', fontweight='bold', fontsize=9)

# Right branch (Yes)
right_internal = FancyBboxPatch((7, 5.5), 2, 1, boxstyle="round,pad=0.1",
                               facecolor='lightgreen', edgecolor='black', linewidth=1)
ax1.add_patch(right_internal)
ax1.text(8, 6, 'Credit Score\n> 700?', ha='center', va='center', fontweight='bold', fontsize=9)

# Leaf nodes
leaf1 = FancyBboxPatch((0, 3), 1.5, 0.8, boxstyle="round,pad=0.1",
                      facecolor='lightcoral', edgecolor='black', linewidth=1)
ax1.add_patch(leaf1)
ax1.text(0.75, 3.4, 'Reject', ha='center', va='center', fontweight='bold', fontsize=9)

leaf2 = FancyBboxPatch((2.5, 3), 1.5, 0.8, boxstyle="round,pad=0.1",
                      facecolor='lightcoral', edgecolor='black', linewidth=1)
ax1.add_patch(leaf2)
ax1.text(3.25, 3.4, 'Approve', ha='center', va='center', fontweight='bold', fontsize=9)

leaf3 = FancyBboxPatch((6, 3), 1.5, 0.8, boxstyle="round,pad=0.1",
                      facecolor='lightcoral', edgecolor='black', linewidth=1)
ax1.add_patch(leaf3)
ax1.text(6.75, 3.4, 'Review', ha='center', va='center', fontweight='bold', fontsize=9)

leaf4 = FancyBboxPatch((8.5, 3), 1.5, 0.8, boxstyle="round,pad=0.1",
                      facecolor='lightcoral', edgecolor='black', linewidth=1)
ax1.add_patch(leaf4)
ax1.text(9.25, 3.4, 'Approve', ha='center', va='center', fontweight='bold', fontsize=9)

# Draw connections
# Root to internal nodes
ax1.plot([4.5, 2.5], [8, 6.5], 'k-', linewidth=2)
ax1.plot([5.5, 7.5], [8, 6.5], 'k-', linewidth=2)
ax1.text(3.2, 7.3, 'No', fontweight='bold', color='red')
ax1.text(6.8, 7.3, 'Yes', fontweight='bold', color='green')

# Internal nodes to leaves
ax1.plot([1.5, 0.75], [5.5, 3.8], 'k-', linewidth=1)
ax1.plot([2.5, 3.25], [5.5, 3.8], 'k-', linewidth=1)
ax1.plot([7.5, 6.75], [5.5, 3.8], 'k-', linewidth=1)
ax1.plot([8.5, 9.25], [5.5, 3.8], 'k-', linewidth=1)

ax1.text(1, 4.5, 'No', fontweight='bold', color='red', fontsize=8)
ax1.text(3, 4.5, 'Yes', fontweight='bold', color='green', fontsize=8)
ax1.text(7, 4.5, 'No', fontweight='bold', color='red', fontsize=8)
ax1.text(9, 4.5, 'Yes', fontweight='bold', color='green', fontsize=8)

ax1.set_title('Decision Tree Structure\n(Loan Approval Example)', fontsize=12, fontweight='bold')
ax1.axis('off')

# Add legend
legend_elements = [
    plt.Rectangle((0, 0), 1, 1, facecolor='lightblue', edgecolor='black', label='Root Node'),
    plt.Rectangle((0, 0), 1, 1, facecolor='lightgreen', edgecolor='black', label='Internal Node'),
    plt.Rectangle((0, 0), 1, 1, facecolor='lightcoral', edgecolor='black', label='Leaf Node')
]
ax1.legend(handles=legend_elements, loc='upper right', bbox_to_anchor=(1, 0.2))

# Right subplot: Problem types suited for Decision Trees
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 10)

# Create categories
categories = [
    ('Classification\nProblems', 8.5, ['Email Spam Detection', 'Medical Diagnosis', 'Customer Segmentation']),
    ('Regression\nProblems', 6.5, ['House Price Prediction', 'Sales Forecasting', 'Risk Assessment']),
    ('Business\nApplications', 4.5, ['Loan Approval', 'Marketing Campaigns', 'Quality Control']),
    ('Data\nCharacteristics', 2.5, ['Mixed Data Types', 'Non-linear Patterns', 'Feature Interactions'])
]

colors = ['lightblue', 'lightgreen', 'lightyellow', 'lightpink']

for i, (category, y_pos, examples) in enumerate(categories):
    # Category box
    cat_box = FancyBboxPatch((1, y_pos-0.4), 2.5, 0.8, boxstyle="round,pad=0.1",
                            facecolor=colors[i], edgecolor='black', linewidth=2)
    ax2.add_patch(cat_box)
    ax2.text(2.25, y_pos, category, ha='center', va='center', fontweight='bold', fontsize=10)
    
    # Examples: one row per example, centered vertically on the category box
    for j, example in enumerate(examples):
        ex_y = y_pos + 0.4 - j * 0.4  # shared center for box, label, and arrow tip
        ex_box = FancyBboxPatch((4.5, ex_y - 0.15), 4.5, 0.3, boxstyle="round,pad=0.05",
                               facecolor='white', edgecolor='gray', linewidth=1)
        ax2.add_patch(ex_box)
        ax2.text(6.75, ex_y, example, ha='center', va='center', fontsize=9)
        
        # Arrow from the category box to this example row
        ax2.annotate('', xy=(4.4, ex_y), xytext=(3.6, y_pos),
                    arrowprops=dict(arrowstyle='->', color='black', lw=1))

ax2.set_title('Problems Best Suited for Decision Trees', fontsize=12, fontweight='bold')
ax2.axis('off')

plt.tight_layout()
plt.show()

# Example decision path
print("\n📝 Example Decision Path:")
print("="*40)
print("For a loan applicant with:")
print("• Income: $60,000 (> $50K) ✓")
print("• Credit Score: 750 (> 700) ✓")
print("• Decision Path: Root → Income > 50K? (Yes) → Credit Score > 700? (Yes) → APPROVE")
print("\nThis demonstrates how Decision Trees make transparent, rule-based decisions!")

print("\n✅ Decision Tree concept visualization completed!")

# Q2. Define the following terms with examples: Root Node, Leaf Node, Internal Node, Branch

## Answer:

Understanding the anatomy of a Decision Tree is crucial for comprehending how the algorithm works. Each component plays a specific role in the decision-making process.
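
The four terms can also be read off a small data structure. The sketch below is an illustration under assumptions: the `Node` class and `describe` helper are hypothetical names, and the tree mirrors the weather example used in this answer's figure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: dict = field(default_factory=dict)  # branch label -> child Node

def describe(root):
    """Return ({node label: role}, [(parent, branch label, child), ...])."""
    roles, branches = {}, []
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        if parent is None:
            roles[node.label] = "root"        # no parent
        elif node.children:
            roles[node.label] = "internal"    # has a parent and children
        else:
            roles[node.label] = "leaf"        # has a parent, no children
        for branch_label, child in node.children.items():
            branches.append((node.label, branch_label, child.label))
            stack.append((child, node))
    return roles, branches

tree = Node("Weather = ?", {
    "Sunny": Node("Temperature > 20°C?", {
        "No": Node("Stay Indoors"), "Yes": Node("Go for Walk")}),
    "Rainy": Node("Wind Speed > 15 mph?", {
        "No": Node("Indoor Activity"), "Yes": Node("Outdoor Sports")}),
})

roles, branches = describe(tree)
for label, role in roles.items():
    print(f"{role:>8}: {label}")
print(f"{len(branches)} branches")
```

A node's role depends only on whether it has a parent and whether it has children, which is why the same test works for a tree of any depth.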

In [None]:
# Detailed visualization of Decision Tree components
print("🌳 DECISION TREE COMPONENTS - DEFINITIONS AND EXAMPLES")
print("="*60)

# Create a comprehensive diagram showing all components
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Decision Tree Components - Definitions and Examples', fontsize=16, fontweight='bold')

# Example 1: Weather Decision Tree
ax1 = axes[0, 0]
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 10)
ax1.set_aspect('equal')

# Root Node
root = FancyBboxPatch((3.5, 8), 3, 1.2, boxstyle="round,pad=0.1",
                     facecolor='gold', edgecolor='red', linewidth=3)
ax1.add_patch(root)
ax1.text(5, 8.6, 'Weather = ?', ha='center', va='center', fontweight='bold', fontsize=12)
ax1.text(5, 9.5, 'ROOT NODE', ha='center', va='center', fontweight='bold', 
         fontsize=10, color='red', bbox=dict(boxstyle="round,pad=0.3", facecolor='white', edgecolor='red'))

# Internal Nodes
internal1 = FancyBboxPatch((1, 5.5), 2.5, 1, boxstyle="round,pad=0.1",
                          facecolor='lightblue', edgecolor='blue', linewidth=2)
ax1.add_patch(internal1)
ax1.text(2.25, 6, 'Temperature\n> 20°C?', ha='center', va='center', fontweight='bold', fontsize=10)

internal2 = FancyBboxPatch((6.5, 5.5), 2.5, 1, boxstyle="round,pad=0.1",
                          facecolor='lightblue', edgecolor='blue', linewidth=2)
ax1.add_patch(internal2)
ax1.text(7.75, 6, 'Wind Speed\n> 15 mph?', ha='center', va='center', fontweight='bold', fontsize=10)

ax1.text(1, 7.2, 'INTERNAL NODES', ha='left', va='center', fontweight='bold', 
         fontsize=10, color='blue', bbox=dict(boxstyle="round,pad=0.3", facecolor='white', edgecolor='blue'))

# Leaf Nodes
leaf1 = FancyBboxPatch((0, 2.5), 1.8, 1, boxstyle="round,pad=0.1",
                      facecolor='lightgreen', edgecolor='green', linewidth=2)
ax1.add_patch(leaf1)
ax1.text(0.9, 3, 'Stay\nIndoors', ha='center', va='center', fontweight='bold', fontsize=10)

leaf2 = FancyBboxPatch((2.3, 2.5), 1.8, 1, boxstyle="round,pad=0.1",
                      facecolor='lightgreen', edgecolor='green', linewidth=2)
ax1.add_patch(leaf2)
ax1.text(3.2, 3, 'Go for\nWalk', ha='center', va='center', fontweight='bold', fontsize=10)

leaf3 = FancyBboxPatch((5.7, 2.5), 1.8, 1, boxstyle="round,pad=0.1",
                      facecolor='lightgreen', edgecolor='green', linewidth=2)
ax1.add_patch(leaf3)
ax1.text(6.6, 3, 'Indoor\nActivity', ha='center', va='center', fontweight='bold', fontsize=10)

leaf4 = FancyBboxPatch((8.2, 2.5), 1.8, 1, boxstyle="round,pad=0.1",
                      facecolor='lightgreen', edgecolor='green', linewidth=2)
ax1.add_patch(leaf4)
ax1.text(9.1, 3, 'Outdoor\nSports', ha='center', va='center', fontweight='bold', fontsize=10)

ax1.text(8.5, 4.2, 'LEAF NODES', ha='right', va='center', fontweight='bold', 
         fontsize=10, color='green', bbox=dict(boxstyle="round,pad=0.3", facecolor='white', edgecolor='green'))

# Branches
# Use a plain '-' format string: combining 'k-' with color= triggers a redundant-color warning
ax1.plot([4.2, 2.8], [8, 6.5], '-', linewidth=3, color='purple')
ax1.plot([5.8, 7.2], [8, 6.5], '-', linewidth=3, color='purple')
ax1.plot([1.7, 0.9], [5.5, 3.5], '-', linewidth=2, color='purple')
ax1.plot([2.8, 3.2], [5.5, 3.5], '-', linewidth=2, color='purple')
ax1.plot([7.2, 6.6], [5.5, 3.5], '-', linewidth=2, color='purple')
ax1.plot([8.3, 9.1], [5.5, 3.5], '-', linewidth=2, color='purple')

# Branch labels
ax1.text(3.2, 7.3, 'Sunny', fontweight='bold', color='orange', fontsize=10,
         bbox=dict(boxstyle="round,pad=0.2", facecolor='white', edgecolor='purple'))
ax1.text(6.8, 7.3, 'Rainy', fontweight='bold', color='orange', fontsize=10,
         bbox=dict(boxstyle="round,pad=0.2", facecolor='white', edgecolor='purple'))

ax1.text(0.8, 4.5, 'No', fontweight='bold', color='orange', fontsize=9)
ax1.text(3.5, 4.5, 'Yes', fontweight='bold', color='orange', fontsize=9)
ax1.text(6.2, 4.5, 'No', fontweight='bold', color='orange', fontsize=9)
ax1.text(9.5, 4.5, 'Yes', fontweight='bold', color='orange', fontsize=9)

ax1.text(5, 1, 'BRANCHES', ha='center', va='center', fontweight='bold', 
         fontsize=12, color='purple', bbox=dict(boxstyle="round,pad=0.3", facecolor='white', edgecolor='purple'))

ax1.set_title('Weather Activity Decision Tree', fontsize=12, fontweight='bold')
ax1.axis('off')

# Example 2: Component Definitions Table
ax2 = axes[0, 1]
ax2.axis('off')

definitions = [
    ("ROOT NODE", "gold", "red", [
        "• The topmost node of the tree",
        "• Contains the entire dataset initially", 
        "• First decision point in the tree",
        "• Has no parent node",
        "• Example: 'Weather = ?' in our tree"
    ]),
    ("INTERNAL NODE", "lightblue", "blue", [
        "• Intermediate decision points",
        "• Test a specific feature/attribute",
        "• Have both parent and child nodes",
        "• Split data into subsets",
        "• Example: 'Temperature > 20°C?'"
    ]),
    ("LEAF NODE", "lightgreen", "green", [
        "• Terminal nodes (end points)",
        "• Contain final predictions/decisions",
        "• Have parent but no child nodes",
        "• No further splitting occurs",
        "• Example: 'Go for Walk', 'Stay Indoors'"
    ]),
    ("BRANCH", "white", "purple", [
        "• Connections between nodes",
        "• Represent possible outcomes",
        "• Show the path of decisions",
        "• Labeled with conditions/values",
        "• Example: 'Sunny', 'Rainy', 'Yes', 'No'"
    ])
]

y_start = 9.5
for i, (title, bg_color, border_color, points) in enumerate(definitions):
    y_pos = y_start - i * 2.3
    
    # Title box
    title_box = FancyBboxPatch((0.5, y_pos-0.3), 8, 0.6, boxstyle="round,pad=0.1",
                              facecolor=bg_color, edgecolor=border_color, linewidth=2)
    ax2.add_patch(title_box)
    ax2.text(4.5, y_pos, title, ha='center', va='center', fontweight='bold', 
             fontsize=12, color=border_color)
    
    # Definition points
    for j, point in enumerate(points):
        ax2.text(0.7, y_pos - 0.8 - j*0.3, point, ha='left', va='center', fontsize=10)

ax2.set_xlim(0, 9)
ax2.set_ylim(0, 10)
ax2.set_title('Component Definitions', fontsize=12, fontweight='bold')

# Example 3: Real-world example - Medical Diagnosis
ax3 = axes[1, 0]
ax3.set_xlim(0, 10)
ax3.set_ylim(0, 10)
ax3.set_aspect('equal')

# Medical diagnosis tree
ax3.text(5, 9.5, 'Medical Diagnosis Example', ha='center', va='center', 
         fontweight='bold', fontsize=12)

# Root
med_root = FancyBboxPatch((3.5, 8), 3, 0.8, boxstyle="round,pad=0.1",
                         facecolor='gold', edgecolor='red', linewidth=2)
ax3.add_patch(med_root)
ax3.text(5, 8.4, 'Fever > 38°C?', ha='center', va='center', fontweight='bold', fontsize=10)

# Internal nodes
med_int1 = FancyBboxPatch((1.5, 6), 2, 0.8, boxstyle="round,pad=0.1",
                         facecolor='lightblue', edgecolor='blue', linewidth=2)
ax3.add_patch(med_int1)
ax3.text(2.5, 6.4, 'Cough?', ha='center', va='center', fontweight='bold', fontsize=10)

med_int2 = FancyBboxPatch((6.5, 6), 2, 0.8, boxstyle="round,pad=0.1",
                         facecolor='lightblue', edgecolor='blue', linewidth=2)
ax3.add_patch(med_int2)
ax3.text(7.5, 6.4, 'Headache?', ha='center', va='center', fontweight='bold', fontsize=10)

# Leaf nodes
# Leaf order matches the branch labels: Fever Yes + Cough Yes -> Flu (as in the traversal example)
diagnoses = [
    (0.5, 4, 'Flu'),
    (2.5, 4, 'Common\nCold'),
    (6, 4, 'Monitor\nSymptoms'),
    (8.5, 4, 'Possible\nMigraine')
]

for x, y, diagnosis in diagnoses:
    leaf = FancyBboxPatch((x-0.6, y-0.4), 1.2, 0.8, boxstyle="round,pad=0.1",
                         facecolor='lightgreen', edgecolor='green', linewidth=2)
    ax3.add_patch(leaf)
    ax3.text(x, y, diagnosis, ha='center', va='center', fontweight='bold', fontsize=9)

# Connections
ax3.plot([4.2, 3], [8, 6.8], 'k-', linewidth=2)
ax3.plot([5.8, 7], [8, 6.8], 'k-', linewidth=2)
ax3.plot([2, 1.1], [6, 4.8], 'k-', linewidth=2)
ax3.plot([3, 3.1], [6, 4.8], 'k-', linewidth=2)
ax3.plot([7, 6.6], [6, 4.8], 'k-', linewidth=2)
ax3.plot([8, 9.1], [6, 4.8], 'k-', linewidth=2)

# Labels
ax3.text(3.4, 7.5, 'Yes', fontweight='bold', color='red', fontsize=9)
ax3.text(6.6, 7.5, 'No', fontweight='bold', color='blue', fontsize=9)
ax3.text(1.3, 5.3, 'Yes', fontweight='bold', color='red', fontsize=8)
ax3.text(3.3, 5.3, 'No', fontweight='bold', color='blue', fontsize=8)
ax3.text(6.5, 5.3, 'No', fontweight='bold', color='blue', fontsize=8)
ax3.text(8.8, 5.3, 'Yes', fontweight='bold', color='red', fontsize=8)

ax3.axis('off')

# Example 4: Tree Traversal Example
ax4 = axes[1, 1]
ax4.axis('off')
ax4.text(0.5, 9.5, 'Tree Traversal Example', ha='center', va='center', 
         fontweight='bold', fontsize=12)

traversal_text = """
EXAMPLE PATIENT:
• Fever: 39°C (> 38°C) → Yes
• Cough: Present → Yes
• Decision Path: Root → Fever? (Yes) → Cough? (Yes) → DIAGNOSIS: Flu

TREE COMPONENT IDENTIFICATION:

ROOT NODE:
└── "Fever > 38°C?" 
    ├── Contains all patient data initially
    └── First decision point

INTERNAL NODES:
├── "Cough?" (left branch from root)
└── "Headache?" (right branch from root)
    ├── Test specific symptoms
    └── Split patients into subgroups

LEAF NODES:
├── "Common Cold" (final diagnosis)
├── "Flu" (final diagnosis)  
├── "Monitor Symptoms" (final recommendation)
└── "Possible Migraine" (final diagnosis)
    └── No further splitting, contain final decisions

BRANCHES:
├── "Yes"/"No" connections between nodes
├── Show possible paths through the tree
└── Labeled with decision outcomes
"""

ax4.text(0, 8.5, traversal_text, ha='left', va='top', fontsize=9, 
         fontfamily='monospace', bbox=dict(boxstyle="round,pad=0.5", facecolor='lightyellow'))

ax4.set_xlim(0, 1)
ax4.set_ylim(0, 10)

plt.tight_layout()
plt.show()

# Summary table
print("\n📊 COMPONENT SUMMARY TABLE")
print("="*80)

components_data = {
    'Component': ['Root Node', 'Internal Node', 'Leaf Node', 'Branch'],
    'Definition': [
        'Topmost node containing entire dataset',
        'Intermediate decision points testing features', 
        'Terminal nodes with final predictions',
        'Connections showing decision paths'
    ],
    'Characteristics': [
        'No parent, has children, first split',
        'Has parent and children, feature tests',
        'Has parent, no children, final output',
        'Labeled connections between nodes'
    ],
    'Example': [
        '"Weather = ?" in weather tree',
        '"Temperature > 20°C?" splitting on temperature',
        '"Go for Walk" as final decision',
        '"Sunny", "Rainy", "Yes", "No" labels'
    ]
}

components_df = pd.DataFrame(components_data)
print(components_df.to_string(index=False))

print(f"\n✅ Decision Tree components visualization and definitions completed!")
print(f"Each component plays a crucial role in the tree's decision-making process.")

# Q3. Explain the concept of Entropy in Decision Trees with mathematical examples

**Entropy** is a fundamental concept in decision trees that measures the **impurity** or **randomness** in a dataset. It helps determine the best feature to split on at each node by quantifying how mixed the target classes are.

## 🎯 **Key Concepts:**

### **What is Entropy?**
- **Entropy** measures the disorder or uncertainty in a set of data
- **Lower entropy** = more pure/homogeneous data (better)
- **Higher entropy** = more mixed/heterogeneous data (worse for classification)
- **Range**: 0 (perfect purity) to log₂(n) where n = number of classes

### **Mathematical Formula:**
```
Entropy(S) = -∑(i=1 to c) p_i × log₂(p_i)

Where:
- S = dataset or subset
- c = number of classes
- p_i = proportion of samples belonging to class i
- log₂ = logarithm base 2
```

### **Interpretation:**
- **Entropy = 0**: All samples belong to the same class (perfect purity)
- **Entropy = 1**: For binary classification, equal distribution of classes (maximum impurity)
- **Entropy = log₂(c)**: Maximum entropy for c classes with equal distribution
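
These boundary cases are easy to check numerically. The following is a minimal sketch of the formula above; the helper name `entropy_from_counts` is chosen for this example, and the 9-versus-5 case uses the Play Tennis class counts.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Classes with zero count are skipped: p * log2(p) tends to 0 as p -> 0
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

pure = entropy_from_counts([10, 0])             # all samples in one class
balanced = entropy_from_counts([5, 5])          # binary, 50-50 split
four_way = entropy_from_counts([2, 2, 2, 2])    # four equally likely classes
tennis = entropy_from_counts([9, 5])            # 9 "Yes", 5 "No"

print(f"pure:     {pure:.3f}")      # 0.000
print(f"balanced: {balanced:.3f}")  # 1.000
print(f"four_way: {four_way:.3f}")  # 2.000, i.e. log2(4)
print(f"tennis:   {tennis:.3f}")    # 0.940
```

The skewed 9-to-5 case lands just below the binary maximum of 1, which matches the intuition that it is mixed but not maximally so.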

In [None]:
# Entropy Calculation Examples and Visualizations
print("🧮 ENTROPY IN DECISION TREES - MATHEMATICAL EXAMPLES")
print("="*65)

import math

def calculate_entropy(class_counts):
    """Calculate entropy given class counts"""
    total = sum(class_counts)
    if total == 0:
        return 0
    
    entropy = 0
    for count in class_counts:
        if count > 0:
            probability = count / total
            entropy -= probability * math.log2(probability)
    
    return entropy

def calculate_proportions(class_counts):
    """Calculate class proportions (all zeros for an empty set)"""
    total = sum(class_counts)
    if total == 0:
        return [0.0 for _ in class_counts]
    return [count / total for count in class_counts]

# Example datasets for entropy calculation
examples = [
    {
        'name': 'Perfect Purity (All Same Class)',
        'data': 'Play Tennis Dataset',
        'positive': 9, 'negative': 0,
        'description': 'All samples want to play tennis'
    },
    {
        'name': 'Maximum Impurity (Equal Distribution)', 
        'data': 'Play Tennis Dataset',
        'positive': 5, 'negative': 5,
        'description': 'Equal number of positive and negative samples'
    },
    {
        'name': 'Moderate Impurity (Skewed Distribution)',
        'data': 'Play Tennis Dataset', 
        'positive': 7, 'negative': 2,
        'description': 'More positive samples than negative'
    },
    {
        'name': 'High Impurity (Slightly Skewed)',
        'data': 'Play Tennis Dataset',
        'positive': 6, 'negative': 4, 
        'description': 'Slightly more positive samples'
    }
]

# Calculate and display entropy for each example
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Entropy Calculation Examples', fontsize=16, fontweight='bold')

calculation_results = []

for i, example in enumerate(examples):
    pos = example['positive']
    neg = example['negative']
    total = pos + neg
    
    # Calculate proportions
    p_pos = pos / total if total > 0 else 0
    p_neg = neg / total if total > 0 else 0
    
    # Calculate entropy
    entropy = calculate_entropy([pos, neg])
    
    # Store results
    calculation_results.append({
        'example': example['name'],
        'positive': pos,
        'negative': neg,
        'total': total,
        'p_positive': p_pos,
        'p_negative': p_neg,
        'entropy': entropy
    })
    
    # Visualization
    ax = axes[i//2, i%2]
    
    # Pie chart showing class distribution
    if total > 0:
        sizes = [pos, neg]
        colors = ['lightgreen', 'lightcoral']
        labels = [f'Positive ({pos})', f'Negative ({neg})']
        
        wedges, texts, autotexts = ax.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
                                         startangle=90, textprops={'fontsize': 10})
    
    # Add entropy calculation details
    ax.text(0, -1.5, f"Calculation:", fontweight='bold', ha='center', fontsize=11)
    
    if total > 0:
        if pos > 0 and neg > 0:
            calc_text = f"Entropy = -({p_pos:.3f} × log₂({p_pos:.3f}) + {p_neg:.3f} × log₂({p_neg:.3f}))\n"
            calc_text += f"        = -({p_pos:.3f} × {math.log2(p_pos):.3f} + {p_neg:.3f} × {math.log2(p_neg):.3f})\n"
            calc_text += f"        = {entropy:.4f}"
        elif pos == 0:
            calc_text = f"Entropy = -(0 × log₂(0) + 1 × log₂(1))\n        = -(0 + 0) = 0"
        else:  # neg == 0
            calc_text = f"Entropy = -(1 × log₂(1) + 0 × log₂(0))\n        = -(0 + 0) = 0"
    else:
        calc_text = "No data"
    
    ax.text(0, -2.2, calc_text, ha='center', fontsize=9, fontfamily='monospace',
            bbox=dict(boxstyle="round,pad=0.3", facecolor='lightyellow'))
    
    # Title and entropy result
    ax.set_title(f"{example['name']}\nEntropy = {entropy:.4f}", 
                fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

# Detailed calculation table
print("\n📊 DETAILED ENTROPY CALCULATIONS")
print("="*90)

results_df = pd.DataFrame(calculation_results)
results_df['entropy_rounded'] = results_df['entropy'].round(4)

print(results_df[['example', 'positive', 'negative', 'total', 'p_positive', 'p_negative', 'entropy_rounded']].to_string(index=False))

# Real-world example: Email Classification
print(f"\n🌟 REAL-WORLD EXAMPLE: Email Classification")
print("="*55)

email_scenarios = [
    {'spam': 0, 'ham': 10, 'scenario': 'Trusted sender folder'},
    {'spam': 5, 'ham': 5, 'scenario': 'Mixed inbox'},  
    {'spam': 8, 'ham': 2, 'scenario': 'Suspicious folder'},
    {'spam': 1, 'ham': 9, 'scenario': 'Clean inbox'}
]

print("Email Classification Entropy Analysis:")
print("-" * 50)

for scenario in email_scenarios:
    spam_count = scenario['spam']
    ham_count = scenario['ham']
    total = spam_count + ham_count
    entropy = calculate_entropy([spam_count, ham_count])
    
    print(f"\n📧 Scenario: {scenario['scenario']}")
    print(f"   Spam emails: {spam_count}, Ham emails: {ham_count}")
    print(f"   Total emails: {total}")
    print(f"   Entropy: {entropy:.4f}")
    
    if entropy == 0:
        print(f"   → Pure set: all emails are the same type, so no uncertainty remains.")
    elif entropy > 0.9:
        print(f"   → High uncertainty! Difficult to predict email type.")
    else:
        print(f"   → Moderate uncertainty. Some predictability exists.")

# Entropy vs Number of Classes
print(f"\n🔢 ENTROPY WITH MULTIPLE CLASSES")
print("="*40)

multi_class_examples = [
    {'classes': [10], 'name': '1 class (pure set, entropy 0)'},
    {'classes': [5, 5], 'name': '2 classes (binary)'},
    {'classes': [3, 3, 4], 'name': '3 classes'},
    {'classes': [2, 2, 3, 3], 'name': '4 classes'},
    {'classes': [2, 2, 2, 2, 2], 'name': '5 classes'}
]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Maximum possible entropy for different number of classes
num_classes = [1, 2, 3, 4, 5]
max_entropies = [0 if n == 1 else math.log2(n) for n in num_classes]

ax1.bar(num_classes, max_entropies, color='skyblue', alpha=0.7, edgecolor='navy')
ax1.set_xlabel('Number of Classes')
ax1.set_ylabel('Maximum Possible Entropy')
ax1.set_title('Maximum Entropy vs Number of Classes')
ax1.grid(True, alpha=0.3)

for i, (n, max_ent) in enumerate(zip(num_classes, max_entropies)):
    ax1.text(n, max_ent + 0.05, f'{max_ent:.2f}', ha='center', fontweight='bold')

# Actual entropy calculations for examples
actual_entropies = []
example_names = []

for example in multi_class_examples:
    if len(example['classes']) > 1:  # Skip single class
        entropy = calculate_entropy(example['classes'])
        actual_entropies.append(entropy)
        example_names.append(f"{len(example['classes'])} classes")

ax2.bar(range(len(actual_entropies)), actual_entropies, color='lightcoral', alpha=0.7, edgecolor='darkred')
ax2.set_xlabel('Example Scenarios')
ax2.set_ylabel('Calculated Entropy')
ax2.set_title('Calculated Entropy for Example Distributions')
ax2.set_xticks(range(len(example_names)))
ax2.set_xticklabels(example_names, rotation=45)
ax2.grid(True, alpha=0.3)

for i, ent in enumerate(actual_entropies):
    ax2.text(i, ent + 0.05, f'{ent:.2f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

# Key insights
print(f"\n💡 KEY INSIGHTS ABOUT ENTROPY:")
print("-" * 35)
insights = [
    "🎯 Lower entropy = better for decision trees (more pure splits)",
    "⚖️ Entropy = 0 means perfect classification (all same class)", 
    "🌪️ Higher entropy = more mixed classes = harder to classify",
    "📊 Binary classification: max entropy = 1.0 (50-50 split)",
    "🔢 For n classes: max entropy = log₂(n) with equal distribution",
    "🎲 Entropy guides feature selection in decision tree algorithms"
]

for insight in insights:
    print(f"   {insight}")

print(f"\n✅ Entropy concept explanation with mathematical examples completed!")
print(f"Entropy is crucial for determining the best splits in decision trees.")

# Q4. What is Information Gain? Explain with examples and show calculations

**Information Gain** is a splitting criterion used by decision tree algorithms such as ID3 (C4.5 uses the related gain ratio, while CART typically uses Gini impurity) to determine the best feature for splitting the dataset at each node. It measures how much **uncertainty** is reduced by splitting on a particular feature.

## 🎯 **Key Concepts:**

### **What is Information Gain?**
- **Information Gain** = Reduction in entropy after splitting on a feature
- **Higher Information Gain** = better feature for splitting
- **Goal**: Select the feature that maximizes information gain at each split
- **Result**: A greedy choice of the locally most informative split at each node (this does not guarantee the smallest possible tree overall)

### **Mathematical Formula:**
```
Information Gain(S, A) = Entropy(S) - Weighted_Average_Entropy(S, A)

Where:
- S = current dataset/subset
- A = attribute/feature being considered for split
- Weighted_Average_Entropy = Σ(|Sv|/|S|) × Entropy(Sv)
- Sv = subset of S where attribute A has value v
- |S| = size of dataset S
```

### **Detailed Formula:**
```
IG(S, A) = Entropy(S) - Σ(v ∈ Values(A)) (|Sv|/|S|) × Entropy(Sv)

Steps:
1. Calculate entropy of original dataset: Entropy(S)
2. Split dataset based on feature A into subsets Sv
3. Calculate weighted average entropy of subsets
4. Information Gain = Original Entropy - Weighted Average Entropy
```

### **Interpretation:**
- **IG = 0**: No information gained (useless split)
- **IG > 0**: Some information gained (useful split)  
- **Higher IG**: More valuable the split (better feature)
- **IG = Entropy(S)**: Perfect split (pure subsets)
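
The formula and its boundary cases can be checked with a few lines of code. This is a sketch under stated assumptions: `entropy_from_counts` and `information_gain` are names chosen for this example, and the Weather counts are those of the classic Play Tennis dataset (9 Yes and 5 No overall; Sunny 2/3, Overcast 4/0, Rainy 3/2).

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(
        (sum(child) / n) * entropy_from_counts(child) for child in child_counts
    )
    return entropy_from_counts(parent_counts) - weighted

# Weather split: [yes, no] counts for Sunny, Overcast, Rainy
ig_weather = information_gain([9, 5], [[2, 3], [4, 0], [3, 2]])

# Perfect split: every child is pure, so IG equals the parent entropy
ig_perfect = information_gain([5, 5], [[5, 0], [0, 5]])

# Useless split: each child has the same class mix as the parent
ig_useless = information_gain([4, 4], [[2, 2], [2, 2]])

print(f"Weather: {ig_weather:.4f}")  # about 0.2467
print(f"Perfect: {ig_perfect:.4f}")  # 1.0000
print(f"Useless: {ig_useless:.4f}")  # 0.0000
```

The Overcast subset is already pure, so it contributes nothing to the weighted entropy, which is a large part of why Weather scores well on this dataset.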

In [None]:
# Information Gain Calculation Examples and Visualizations
print("📈 INFORMATION GAIN IN DECISION TREES - COMPLETE EXAMPLES")
print("="*70)

def calculate_information_gain(original_entropy, subsets_info):
    """
    Calculate information gain
    subsets_info: list of tuples (subset_size, subset_entropy)
    """
    total_size = sum(size for size, _ in subsets_info)
    weighted_entropy = sum((size/total_size) * entropy for size, entropy in subsets_info)
    return original_entropy - weighted_entropy

# Example 1: Play Tennis Dataset (Classic Example)
print("🎾 EXAMPLE 1: PLAY TENNIS DATASET")
print("="*40)

# Original dataset
tennis_data = {
    'total_samples': 14,
    'play_yes': 9,
    'play_no': 5
}

# Calculate original entropy
original_entropy = calculate_entropy([tennis_data['play_yes'], tennis_data['play_no']])
print(f"Original Dataset: {tennis_data['play_yes']} Yes, {tennis_data['play_no']} No")
print(f"Original Entropy: {original_entropy:.4f}")

# Feature 1: Weather (Sunny, Overcast, Rainy)
weather_splits = {
    'Sunny': {'yes': 2, 'no': 3, 'total': 5},
    'Overcast': {'yes': 4, 'no': 0, 'total': 4}, 
    'Rainy': {'yes': 3, 'no': 2, 'total': 5}
}

# Feature 2: Humidity (High, Normal)  
humidity_splits = {
    'High': {'yes': 3, 'no': 4, 'total': 7},
    'Normal': {'yes': 6, 'no': 1, 'total': 7}
}

# Feature 3: Wind (Weak, Strong)
wind_splits = {
    'Weak': {'yes': 6, 'no': 2, 'total': 8},
    'Strong': {'yes': 3, 'no': 3, 'total': 6}
}

# Calculate Information Gain for each feature
features = {
    'Weather': weather_splits,
    'Humidity': humidity_splits, 
    'Wind': wind_splits
}

results = {}

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Information Gain Calculations - Play Tennis Dataset', fontsize=16, fontweight='bold')

for idx, (feature_name, splits) in enumerate(features.items()):
    print(f"\n🌟 Feature: {feature_name}")
    print("-" * 30)
    
    # Calculate entropy for each subset
    subset_entropies = []
    total_weighted_entropy = 0
    
    for value, counts in splits.items():
        subset_entropy = calculate_entropy([counts['yes'], counts['no']])
        subset_entropies.append((counts['total'], subset_entropy))
        weight = counts['total'] / tennis_data['total_samples']
        total_weighted_entropy += weight * subset_entropy
        
        print(f"{value}: {counts['yes']} Yes, {counts['no']} No → Entropy = {subset_entropy:.4f}")
    
    # Calculate Information Gain
    info_gain = calculate_information_gain(original_entropy, subset_entropies)
    results[feature_name] = info_gain
    
    print(f"Weighted Average Entropy: {total_weighted_entropy:.4f}")
    print(f"Information Gain: {original_entropy:.4f} - {total_weighted_entropy:.4f} = {info_gain:.4f}")
    
    # Visualization
    if idx < 3:
        ax = axes[idx//2, idx%2]
        
        # Create bar chart showing entropy reduction
        categories = ['Original'] + list(splits.keys())
        entropies = [original_entropy] + [calculate_entropy([splits[val]['yes'], splits[val]['no']]) for val in splits.keys()]
        colors = ['red'] + ['lightblue'] * len(splits)
        
        bars = ax.bar(categories, entropies, color=colors, alpha=0.7, edgecolor='navy')
        ax.set_ylabel('Entropy')
        ax.set_title(f'{feature_name}\nInformation Gain = {info_gain:.4f}')
        ax.grid(True, alpha=0.3)
        
        # Add value labels on bars
        for bar, entropy in zip(bars, entropies):
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                   f'{entropy:.3f}', ha='center', va='bottom', fontweight='bold')
        
        # Add sample size annotations
        for i, (val, counts) in enumerate(splits.items()):
            ax.text(i+1, -0.1, f'n={counts["total"]}', ha='center', va='top', 
                   fontsize=9, style='italic')

# Best feature selection
best_feature = max(results, key=results.get)
ax_summary = axes[1, 1]
ax_summary.axis('off')

# Sort features once by information gain (highest first) and reuse the result
ranked = sorted(results.items(), key=lambda x: x[1], reverse=True)
ranking_lines = "\n".join(
    f"   {rank}. {name}: {gain:.4f}" for rank, (name, gain) in enumerate(ranked, start=1)
)

summary_text = f"""
📊 INFORMATION GAIN COMPARISON

Weather:   {results['Weather']:.4f}
Humidity:  {results['Humidity']:.4f}  
Wind:      {results['Wind']:.4f}

🏆 BEST FEATURE: {best_feature}
   (Highest Information Gain)

💡 DECISION: Split on {best_feature} first
   as it provides maximum information gain
   and reduces uncertainty the most.

📈 INFORMATION GAIN RANKING:
{ranking_lines}
"""

ax_summary.text(0.1, 0.9, summary_text, transform=ax_summary.transAxes, fontsize=11,
               verticalalignment='top', fontfamily='monospace',
               bbox=dict(boxstyle="round,pad=0.5", facecolor='lightgreen', alpha=0.8))

plt.tight_layout()
plt.show()

# Detailed step-by-step calculation for the best feature from Example 1
print(f"\n🔍 DETAILED STEP-BY-STEP CALCULATION")
print("="*45)
print(f"Let's work through the {best_feature} feature calculation in detail:")

# Look up the split counts for the winning feature directly
splits = features[best_feature]

print(f"\n🎯 Feature: {best_feature}")
print("─" * 25)

print(f"Step 1: Original Dataset Entropy")
print(f"   Total: {tennis_data['total_samples']} samples ({tennis_data['play_yes']} Yes, {tennis_data['play_no']} No)")
p_yes = tennis_data['play_yes'] / tennis_data['total_samples']
p_no = tennis_data['play_no'] / tennis_data['total_samples']
print(f"   P(Yes) = {tennis_data['play_yes']}/{tennis_data['total_samples']} = {p_yes:.3f}")
print(f"   P(No) = {tennis_data['play_no']}/{tennis_data['total_samples']} = {p_no:.3f}")
print(f"   Entropy(S) = -({p_yes:.3f} × log₂({p_yes:.3f}) + {p_no:.3f} × log₂({p_no:.3f}))")
print(f"             = {original_entropy:.4f}")

print(f"\nStep 2: Calculate Entropy for Each Subset")
weighted_sum = 0
for i, (value, counts) in enumerate(splits.items()):
    print(f"   {value}: {counts['total']} samples ({counts['yes']} Yes, {counts['no']} No)")
    if counts['total'] > 0:
        p_yes_subset = counts['yes'] / counts['total']
        p_no_subset = counts['no'] / counts['total']
        subset_entropy = calculate_entropy([counts['yes'], counts['no']])
        weight = counts['total'] / tennis_data['total_samples']
        weighted_contribution = weight * subset_entropy
        weighted_sum += weighted_contribution
        
        print(f"      P(Yes|{value}) = {counts['yes']}/{counts['total']} = {p_yes_subset:.3f}")
        print(f"      P(No|{value}) = {counts['no']}/{counts['total']} = {p_no_subset:.3f}")
        print(f"      Entropy({value}) = {subset_entropy:.4f}")
        print(f"      Weight = {counts['total']}/{tennis_data['total_samples']} = {weight:.3f}")
        print(f"      Weighted Contribution = {weight:.3f} × {subset_entropy:.4f} = {weighted_contribution:.4f}")

print(f"\nStep 3: Calculate Weighted Average Entropy")
print(f"   Weighted Average = {weighted_sum:.4f}")

print(f"\nStep 4: Calculate Information Gain")
final_ig = original_entropy - weighted_sum
print(f"   Information Gain = {original_entropy:.4f} - {weighted_sum:.4f} = {final_ig:.4f}")

# Example 2: Medical Diagnosis Example
print(f"\n🏥 EXAMPLE 2: MEDICAL DIAGNOSIS")
print("="*35)

medical_data = {
    'total': 20,
    'disease_yes': 12,
    'disease_no': 8
}

medical_original_entropy = calculate_entropy([medical_data['disease_yes'], medical_data['disease_no']])
print(f"Medical Dataset: {medical_data['disease_yes']} Disease, {medical_data['disease_no']} No Disease")
print(f"Original Entropy: {medical_original_entropy:.4f}")

# Medical features
medical_features = {
    'Fever': {
        'High': {'disease': 8, 'healthy': 2, 'total': 10},
        'Low': {'disease': 4, 'healthy': 6, 'total': 10}
    },
    'Cough': {
        'Present': {'disease': 10, 'healthy': 3, 'total': 13},
        'Absent': {'disease': 2, 'healthy': 5, 'total': 7}
    },
    'Age': {
        'Young': {'disease': 3, 'healthy': 5, 'total': 8},
        'Old': {'disease': 9, 'healthy': 3, 'total': 12}
    }
}

medical_results = {}

print(f"\nMedical Feature Analysis:")
print("-" * 30)

for feature_name, splits in medical_features.items():
    subset_info = []
    for value, counts in splits.items():
        subset_entropy = calculate_entropy([counts['disease'], counts['healthy']])
        subset_info.append((counts['total'], subset_entropy))
    
    info_gain = calculate_information_gain(medical_original_entropy, subset_info)
    medical_results[feature_name] = info_gain
    print(f"{feature_name}: Information Gain = {info_gain:.4f}")

best_medical_feature = max(medical_results, key=medical_results.get)
print(f"\n🏆 Best Medical Feature: {best_medical_feature} (IG = {medical_results[best_medical_feature]:.4f})")

# Information Gain Comparison Chart
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Tennis dataset comparison
tennis_features = list(results.keys())
tennis_gains = list(results.values())
bars1 = ax1.bar(tennis_features, tennis_gains, color=['gold', 'lightblue', 'lightcoral'], 
               alpha=0.8, edgecolor='navy')
ax1.set_title('Play Tennis Dataset\nInformation Gain by Feature')
ax1.set_ylabel('Information Gain')
ax1.grid(True, alpha=0.3)

for bar, gain in zip(bars1, tennis_gains):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.005,
            f'{gain:.4f}', ha='center', va='bottom', fontweight='bold')

# Medical dataset comparison  
medical_features_list = list(medical_results.keys())
medical_gains = list(medical_results.values())
bars2 = ax2.bar(medical_features_list, medical_gains, color=['lightgreen', 'orange', 'pink'],
               alpha=0.8, edgecolor='navy') 
ax2.set_title('Medical Diagnosis Dataset\nInformation Gain by Feature')
ax2.set_ylabel('Information Gain')
ax2.grid(True, alpha=0.3)

for bar, gain in zip(bars2, medical_gains):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 0.005,
            f'{gain:.4f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

# Key insights
print(f"\n💡 KEY INSIGHTS ABOUT INFORMATION GAIN:")
print("-" * 42)
insights = [
    "🎯 Higher Information Gain = Better feature for splitting",
    "⚡ Information Gain = Entropy Reduction after splitting", 
    "🔄 ID3 algorithm uses Information Gain to build decision trees",
    "📊 At each node, choose the feature with maximum Information Gain",
    "🎲 Perfect split: IG = Original Entropy (pure subsets)",
    "❌ Useless split: IG = 0 (no entropy reduction)",
    "🌳 Greedy approach: locally optimal decisions at each node",
    "⚖️ Weighted average considers subset sizes proportionally"
]

for insight in insights:
    print(f"   {insight}")

print(f"\n✅ Information Gain explanation with detailed calculations completed!")
print(f"Information Gain is the driving force behind decision tree construction.")

# Q5. Compare Gini Impurity and Entropy as splitting criteria. Which one is better?

Both **Gini Impurity** and **Entropy** are impurity measures used in decision trees to determine the best splits. While they serve the same purpose, they have different mathematical formulations and computational characteristics.

## 🎯 **Mathematical Formulations:**

### **Gini Impurity:**
```
Gini(S) = 1 - Σ(i=1 to c) p_i²

Where:
- S = dataset or subset  
- c = number of classes
- p_i = proportion of samples belonging to class i
- Range: 0 (pure) to 0.5 (maximum impurity for binary classification)
```

### **Entropy:**
```
Entropy(S) = -Σ(i=1 to c) p_i × log₂(p_i)

Where:
- S = dataset or subset
- c = number of classes  
- p_i = proportion of samples belonging to class i
- Range: 0 (pure) to 1 (maximum impurity for binary classification)
```
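
Both formulas generalize beyond two classes. As a rough illustration (the helpers `gini` and `entropy` below are hypothetical names that take class proportions, and the skewed distribution is made up), a uniform distribution over c classes gives the maximum of each measure: 1 - 1/c for Gini and log₂(c) for entropy.

```python
from math import log2

def gini(probs):
    """Gini impurity from class proportions: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy (base 2) from class proportions; zero proportions are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Maximum impurity occurs when all c classes are equally likely
for c in (2, 3, 4):
    uniform = [1.0 / c] * c
    print(f"c={c}: max Gini = {gini(uniform):.4f}, max entropy = {entropy(uniform):.4f}")

# A skewed three-class distribution scores lower on both measures
skewed = [0.7, 0.2, 0.1]
print(f"skewed: Gini = {gini(skewed):.4f}, entropy = {entropy(skewed):.4f}")
```

So the binary ranges quoted above (0 to 0.5 and 0 to 1) are the c = 2 case; with more classes the upper bounds grow toward 1 for Gini and without limit for entropy.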

## ⚖️ **Key Differences:**

| Aspect | Gini Impurity | Entropy |
|--------|---------------|---------|
| **Formula** | 1 - Σp_i² | -Σp_i × log₂(p_i) |
| **Computation** | Faster (no logarithm) | Slower (logarithm required) |
| **Range (binary)** | 0 to 0.5 | 0 to 1.0 |
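| **Sensitivity** | Less sensitive to small changes in class probabilities | More sensitive to small changes in class probabilities, especially near pure nodes |
| **Algorithm** | CART (Classification and Regression Trees) | ID3, C4.5 |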
| **Curve Shape** | Quadratic | Logarithmic |
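
The computation row can be measured rather than assumed. The sketch below times both measures with the standard-library `timeit` module on a fixed three-class distribution; the function names, the distribution, and the repetition count are arbitrary choices for illustration. Absolute numbers depend on the machine and Python version, so no expected output is shown.

```python
import timeit
from math import log2

probs = [0.5, 0.3, 0.2]  # arbitrary three-class distribution

def gini(p):
    """Gini impurity: needs only multiplications and additions."""
    return 1.0 - sum(x * x for x in p)

def entropy(p):
    """Entropy: needs one logarithm per non-empty class."""
    return -sum(x * log2(x) for x in p if x > 0)

n = 20_000  # number of repeated calls per measurement
gini_time = timeit.timeit(lambda: gini(probs), number=n)
entropy_time = timeit.timeit(lambda: entropy(probs), number=n)

print(f"Gini:    {gini_time:.4f} s for {n} calls")
print(f"Entropy: {entropy_time:.4f} s for {n} calls")
```

When training a full tree, evaluating the impurity measure is often only a small part of the total work, so a gap measured here may not translate into a comparable difference in training time.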

In [None]:
# Gini Impurity vs Entropy - Comprehensive Comparison
import math  # math.log2 is used below; importing here keeps this cell independent of earlier cells
print("⚖️ GINI IMPURITY vs ENTROPY - COMPREHENSIVE COMPARISON")
print("="*65)

def calculate_gini(class_counts):
    """Calculate Gini Impurity given class counts"""
    total = sum(class_counts)
    if total == 0:
        return 0
    
    gini = 1.0
    for count in class_counts:
        probability = count / total
        gini -= probability ** 2
    
    return gini

def calculate_gini_gain(original_gini, subsets_info):
    """Calculate Gini Gain (similar to Information Gain)"""
    total_size = sum(size for size, _ in subsets_info)
    weighted_gini = sum((size/total_size) * gini for size, gini in subsets_info)
    return original_gini - weighted_gini

# Example datasets for comparison
comparison_datasets = [
    {'name': 'Perfect Purity', 'positive': 10, 'negative': 0},
    {'name': 'Maximum Impurity', 'positive': 5, 'negative': 5},
    {'name': 'Moderate Skew', 'positive': 7, 'negative': 3},
    {'name': 'High Skew', 'positive': 8, 'negative': 2},
    {'name': 'Very High Skew', 'positive': 9, 'negative': 1}
]

# Calculate both measures for each dataset
comparison_results = []

print("📊 DIRECT COMPARISON: GINI vs ENTROPY")
print("="*45)
print(f"{'Dataset':<20} {'Positive':<8} {'Negative':<8} {'Gini':<8} {'Entropy':<8} {'Difference':<10}")
print("-" * 75)

for dataset in comparison_datasets:
    pos = dataset['positive']
    neg = dataset['negative']
    
    gini = calculate_gini([pos, neg])
    entropy = calculate_entropy([pos, neg])
    difference = abs(entropy - gini)
    
    comparison_results.append({
        'name': dataset['name'],
        'positive': pos,
        'negative': neg,
        'gini': gini,
        'entropy': entropy,
        'difference': difference
    })
    
    print(f"{dataset['name']:<20} {pos:<8} {neg:<8} {gini:<8.4f} {entropy:<8.4f} {difference:<10.4f}")

# Visualization: Gini vs Entropy curves
print(f"\n📈 VISUALIZATION: GINI vs ENTROPY CURVES")
print("="*45)

# Create probability range for binary classification
p_values = np.linspace(0.001, 0.999, 1000)  # Avoid 0 and 1 to prevent log(0)
gini_values = [2 * p * (1 - p) for p in p_values]  # Gini = 2p(1-p) for binary
entropy_values = [-p * math.log2(p) - (1-p) * math.log2(1-p) for p in p_values]

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Gini Impurity vs Entropy - Comprehensive Analysis', fontsize=16, fontweight='bold')

# Plot 1: Impurity curves
ax1 = axes[0, 0]
ax1.plot(p_values, gini_values, 'b-', linewidth=3, label='Gini Impurity', alpha=0.8)
ax1.plot(p_values, entropy_values, 'r-', linewidth=3, label='Entropy', alpha=0.8)
ax1.set_xlabel('Probability of Positive Class (p)')
ax1.set_ylabel('Impurity Measure')
ax1.set_title('Impurity Curves Comparison')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Highlight key points
key_points = [0.1, 0.3, 0.5, 0.7, 0.9]
for p in key_points:
    gini_val = 2 * p * (1 - p)
    entropy_val = -p * math.log2(p) - (1-p) * math.log2(1-p)
    ax1.plot(p, gini_val, 'bo', markersize=8)
    ax1.plot(p, entropy_val, 'ro', markersize=8)
    ax1.text(p, max(gini_val, entropy_val) + 0.05, f'p={p}', ha='center', fontweight='bold', fontsize=8)

# Plot 2: Difference between measures
ax2 = axes[0, 1]
difference_values = [abs(e - g) for e, g in zip(entropy_values, gini_values)]
ax2.plot(p_values, difference_values, 'g-', linewidth=3, label='|Entropy - Gini|')
ax2.set_xlabel('Probability of Positive Class (p)')
ax2.set_ylabel('Absolute Difference')
ax2.set_title('Difference Between Entropy and Gini')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Find maximum difference
max_diff_idx = np.argmax(difference_values)
max_diff_p = p_values[max_diff_idx]
max_diff_val = difference_values[max_diff_idx]
ax2.plot(max_diff_p, max_diff_val, 'ro', markersize=10)
ax2.text(max_diff_p, max_diff_val + 0.01, f'Max diff at p={max_diff_p:.3f}', 
         ha='center', fontweight='bold', bbox=dict(boxstyle="round,pad=0.3", facecolor='yellow'))

# Plot 3: Comparison on the example class distributions defined above
ax3 = axes[1, 0]
dataset_names = [result['name'] for result in comparison_results]
gini_vals = [result['gini'] for result in comparison_results]
entropy_vals = [result['entropy'] for result in comparison_results]

x = np.arange(len(dataset_names))
width = 0.35

bars1 = ax3.bar(x - width/2, gini_vals, width, label='Gini Impurity', color='skyblue', alpha=0.8)
bars2 = ax3.bar(x + width/2, entropy_vals, width, label='Entropy', color='lightcoral', alpha=0.8)

ax3.set_xlabel('Dataset Types')
ax3.set_ylabel('Impurity Value')
ax3.set_title('Example Dataset Comparison')
ax3.set_xticks(x)
ax3.set_xticklabels(dataset_names, rotation=45, ha='right')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height + 0.005,
               f'{height:.3f}', ha='center', va='bottom', fontweight='bold', fontsize=8)

# Plot 4: Text panel summarizing performance trade-offs and use cases
ax4 = axes[1, 1]
ax4.axis('off')

# Performance comparison table
performance_text = """
🏃 COMPUTATIONAL PERFORMANCE

                    Gini        Entropy
Computation Speed:  ⭐⭐⭐⭐⭐     ⭐⭐⭐
Memory Usage:       ⭐⭐⭐⭐⭐     ⭐⭐⭐⭐⭐
Sensitivity:        ⭐⭐⭐       ⭐⭐⭐⭐⭐
Mathematical:       ⭐⭐⭐       ⭐⭐⭐⭐⭐

🎯 USE CASES

GINI IMPURITY:
✅ Large datasets (faster computation)
✅ Real-time applications  
✅ CART algorithm
✅ When training speed is the priority

ENTROPY:
✅ Theoretical analysis
✅ Information theory applications
✅ ID3, C4.5 algorithms
✅ When mathematical rigor is important

🏆 WINNER: Context Dependent!
• Speed needed → Gini
• Theory/Research → Entropy
• Most practical applications → Gini
"""

ax4.text(0.05, 0.95, performance_text, transform=ax4.transAxes, fontsize=10,
         verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle="round,pad=0.5", facecolor='lightgreen', alpha=0.8))

plt.tight_layout()
plt.show()

# Detailed mathematical comparison
print(f"\n🔢 DETAILED MATHEMATICAL COMPARISON")
print("="*40)

print(f"Example: Dataset with 6 positive, 4 negative samples")
pos, neg = 6, 4
total = pos + neg

print(f"\nGini Impurity Calculation:")
p_pos = pos / total
p_neg = neg / total
gini_result = 1 - (p_pos**2 + p_neg**2)
print(f"  Gini = 1 - (p_pos² + p_neg²)")
print(f"       = 1 - ({p_pos:.2f}² + {p_neg:.2f}²)")
print(f"       = 1 - ({p_pos**2:.4f} + {p_neg**2:.4f})")
print(f"       = 1 - {p_pos**2 + p_neg**2:.4f}")
print(f"       = {gini_result:.4f}")

print(f"\nEntropy Calculation:")
entropy_result = -(p_pos * math.log2(p_pos) + p_neg * math.log2(p_neg))
print(f"  Entropy = -(p_pos × log₂(p_pos) + p_neg × log₂(p_neg))")
print(f"          = -({p_pos:.2f} × log₂({p_pos:.2f}) + {p_neg:.2f} × log₂({p_neg:.2f}))")
print(f"          = -({p_pos:.2f} × {math.log2(p_pos):.4f} + {p_neg:.2f} × {math.log2(p_neg):.4f})")
print(f"          = -({p_pos * math.log2(p_pos):.4f} + {p_neg * math.log2(p_neg):.4f})")
print(f"          = {entropy_result:.4f}")

print(f"\nDifference: |{entropy_result:.4f} - {gini_result:.4f}| = {abs(entropy_result - gini_result):.4f}")

# Practical example: Feature selection comparison
print(f"\n🌟 PRACTICAL EXAMPLE: FEATURE SELECTION")
print("="*45)

# Sample dataset for feature selection
sample_data = {
    'total': 16,
    'class_a': 9,
    'class_b': 7
}

original_gini = calculate_gini([sample_data['class_a'], sample_data['class_b']])
original_entropy = calculate_entropy([sample_data['class_a'], sample_data['class_b']])

print(f"Original Dataset: {sample_data['class_a']} Class A, {sample_data['class_b']} Class B")
print(f"Original Gini: {original_gini:.4f}")
print(f"Original Entropy: {original_entropy:.4f}")

# Feature comparison
features_comparison = {
    'Feature X': {
        'Left': {'a': 2, 'b': 6},
        'Right': {'a': 7, 'b': 1}
    },
    'Feature Y': {
        'Left': {'a': 5, 'b': 3}, 
        'Right': {'a': 4, 'b': 4}
    }
}

print(f"\nFeature Selection Comparison:")
print("-" * 35)

for feature_name, splits in features_comparison.items():
    print(f"\n{feature_name}:")
    
    # Calculate Gini Gain
    gini_subsets = []
    entropy_subsets = []
    
    for split_name, counts in splits.items():
        subset_gini = calculate_gini([counts['a'], counts['b']])
        subset_entropy = calculate_entropy([counts['a'], counts['b']])
        subset_size = counts['a'] + counts['b']
        
        gini_subsets.append((subset_size, subset_gini))
        entropy_subsets.append((subset_size, subset_entropy))
        
        print(f"  {split_name}: {counts['a']} A, {counts['b']} B → Gini={subset_gini:.4f}, Entropy={subset_entropy:.4f}")
    
    gini_gain = calculate_gini_gain(original_gini, gini_subsets)
    info_gain = calculate_information_gain(original_entropy, entropy_subsets)
    
    print(f"  Gini Gain: {gini_gain:.4f}")
    print(f"  Information Gain: {info_gain:.4f}")
    
    # Determine best feature by each metric
    if feature_name == 'Feature X':
        x_gini, x_entropy = gini_gain, info_gain
    else:
        y_gini, y_entropy = gini_gain, info_gain

print(f"\nFeature Selection Results:")
print(f"  Gini prefers: {'Feature X' if x_gini > y_gini else 'Feature Y'}")
print(f"  Entropy prefers: {'Feature X' if x_entropy > y_entropy else 'Feature Y'}")

if (x_gini > y_gini) == (x_entropy > y_entropy):
    print("  ✅ Both metrics agree on the best feature!")
else:
    print("  ⚠️ Metrics disagree - rare but possible!")

# Final recommendation
print(f"\n🎯 FINAL RECOMMENDATION")
print("="*25)

recommendations = [
    "🚀 For production systems: Gini is a common choice (slightly cheaper to compute)",
    "📚 For research/education: Entropy ties directly to information theory", 
    "⚡ For large datasets: Gini avoids the logarithm, though the speedup is often modest",
    "🔬 For theoretical work: Entropy provides better insights",
    "🏭 scikit-learn's DecisionTreeClassifier defaults to Gini",
    "📊 Both usually give similar tree structures",
    "🎲 Choice rarely affects final model performance significantly",
    "💡 When in doubt, benchmark both on your specific data"
]

for rec in recommendations:
    print(f"   {rec}")

print(f"\n✅ Gini vs Entropy comparison completed!")
print(f"Both are excellent metrics - choose based on your specific needs!")

# 🎓 Part 1 Conclusion: Decision Tree Theory Mastery

## 📚 **What We've Learned:**

### **Core Concepts Covered:**
1. **Decision Tree Fundamentals** - Understanding the algorithm and its applications
2. **Tree Components** - Root nodes, internal nodes, leaf nodes, and branches  
3. **Entropy** - Mathematical foundation for measuring dataset impurity
4. **Information Gain** - The driving force behind optimal feature selection
5. **Gini vs Entropy** - Comparative analysis of splitting criteria

### **Key Mathematical Insights:**
- **Entropy**: Measures uncertainty and guides optimal splits
- **Information Gain**: Quantifies the value of each feature for classification
- **Gini Impurity**: Provides a computationally efficient alternative to entropy
- **Weighted Averages**: Essential for calculating gains across subsets

### **Practical Knowledge Gained:**
✅ How to identify the best features for decision tree splits  
✅ Mathematical calculations behind tree construction  
✅ Understanding trade-offs between different splitting criteria  
✅ Real-world applications in various domains (medical, business, etc.)  
✅ Performance considerations for large datasets  

---

## 🚀 **Ready for Part 2!**

With this solid theoretical foundation, you're now prepared to tackle the **practical implementation** in **Part 2**, where we'll:

- 🍄 Work with the **Mushroom Classification Dataset**
- 🛠️ Implement decision trees from scratch and using sklearn
- 📊 Apply preprocessing techniques and feature engineering
- 🎯 Evaluate model performance with various metrics
- 🔧 Perform hyperparameter tuning and optimization
- 📈 Create comprehensive visualizations and interpretations

**The theory you've mastered here will directly inform every decision in the practical implementation!**

---

*Excellent work completing the theoretical foundation! Now let's put this knowledge into practice with real data.* 🌟

# Decision Tree – Part 1: Theoretical Understanding

This notebook contains theoretical questions and explanations about Decision Trees, covering fundamental concepts, mathematical foundations, and comparative analysis.