# **Problem Statement**  
## **4. Build a Decision Tree Classifier from Scratch using Gini Index**

Build a Decision Tree classifier from scratch using the Gini Index as the splitting criterion.

Given a labeled dataset:
- Recursively split the dataset based on feature values
- Choose splits that minimize Gini impurity
- Construct a tree until a stopping condition is met
- Use the tree to classify unseen samples
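
For reference, the standard definition of Gini impurity that the splitting criterion relies on can be written as follows. The symbols $S$, $L$, $R$, and $p_k$ are notation introduced here for illustration, not names from the notebook: $S$ is the set of samples at a node, $p_k$ is the fraction of samples in $S$ with class $k$, and $L$, $R$ are the two groups produced by a candidate split.

$$
G(S) = 1 - \sum_{k} p_k^{2}
\qquad
G_{\text{split}} = \frac{|L|}{|S|}\,G(L) + \frac{|R|}{|S|}\,G(R)
$$

For the example labels `y = [0, 0, 1, 0, 1]`, the unsplit node has $p_0 = 3/5$ and $p_1 = 2/5$, so $G(S) = 1 - (9/25 + 4/25) = 12/25 = 0.48$. A split that separates the classes perfectly gives $G_{\text{split}} = 0$.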

### Constraints & Example Inputs/Outputs

- Binary or multi-class classification
- Numerical features only (for simplicity)
- Greedy splitting (no backtracking)
- No use of sklearn or ML libraries

Example Input:
```python
X = [
    [2.7, 2.5],
    [1.3, 1.5],
    [3.0, 3.5],
    [2.0, 2.0],
    [3.5, 4.0]
]
y = [0, 0, 1, 0, 1]

```

Expected Output:
```text
Predicted class for [3.0, 3.0] → 1
```

### Solution Approach

### How a Decision Tree (Gini) Works

1. For each feature:
   - Try all possible split points
2. For each split:
   - Divide the data into left and right groups
   - Compute the weighted Gini index of the two groups
3. Choose the split with the minimum weighted Gini
4. Recursively repeat for the child nodes
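
The steps above can be illustrated on a single feature. The following is a minimal sketch, assuming the `<` rule sends a sample to the left group; the helper names `gini_impurity`, `weighted_gini`, and `split_score` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of one group: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left_labels, right_labels):
    """Impurity of a split: each group's impurity weighted by its share of samples."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) \
         + (len(right_labels) / n) * gini_impurity(right_labels)

# First feature and labels from the example input
values = [2.7, 1.3, 3.0, 2.0, 3.5]
labels = [0, 0, 1, 0, 1]

def split_score(threshold):
    """Score the split 'value < threshold goes left' on the data above."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    return weighted_gini(left, right)

for threshold in sorted(values):
    print(threshold, round(split_score(threshold), 4))
```

On this data the threshold 3.0 separates the two classes completely, so it scores 0 and would be the split chosen in step 3.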

### Stopping Conditions
- All samples belong to one class
- Maximum depth reached
- Minimum samples per node reached
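
The three conditions can be gathered into one predicate. This is a sketch under assumptions: `should_stop` is a hypothetical helper that does not appear in the notebook's code, rows are assumed to carry the label in the last column, and the defaults mirror `build_tree(max_depth=3, min_size=1)`.

```python
def should_stop(rows, depth, max_depth=3, min_size=1):
    """Return True if a node holding `rows` at `depth` should become a leaf."""
    labels = {row[-1] for row in rows}  # label is assumed to be the last column
    if len(labels) <= 1:
        return True   # all samples belong to one class (or the node is empty)
    if depth >= max_depth:
        return True   # maximum depth reached
    if len(rows) <= min_size:
        return True   # minimum samples per node reached
    return False

print(should_stop([[1.0, 0], [2.0, 0]], depth=1))  # pure node
print(should_stop([[1.0, 0], [2.0, 1]], depth=1))  # mixed node, limits not hit
```

An explicit purity check like this avoids searching for splits in a node that is already pure.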

### Solution Code

### Helper Functions (Gini & Splits)

In [8]:
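from collections import Counter

def gini_index(groups, classes):
    """Weighted Gini impurity of a candidate split.

    groups:  the row groups produced by the split (label in the last column)
    classes: the distinct class labels in the node being split
    """
    total_samples = sum(len(group) for group in groups)
    gini = 0.0

    for group in groups:
        size = len(group)
        if size == 0:
            continue  # an empty group contributes nothing

        # Sum of squared class proportions within this group
        counts = Counter(row[-1] for row in group)
        score = sum((counts[cls] / size) ** 2 for cls in classes)

        # Weight the group's impurity by its share of the samples
        gini += (1 - score) * (size / total_samples)

    return gini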


### Brute Force Best Split Finder

In [9]:
# Brute-force split search: every observed value of every feature is tried as a threshold
def test_split(index, value, dataset):
    """Partition rows: feature < value goes left, everything else goes right."""
    left, right = [], []
    for row in dataset:
        if row[index] < value:
            left.append(row)
        else:
            right.append(row)
    return left, right

def get_best_split(dataset):
    """Return the feature index, threshold, and groups with the lowest weighted Gini."""
    class_values = list(set(row[-1] for row in dataset))
    best_index, best_value, best_score, best_groups = None, None, float("inf"), None

    for index in range(len(dataset[0]) - 1):  # the last column is the label
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)

            # Strict "<" keeps the first split found among ties
            if gini < best_score:
                best_index, best_value, best_score, best_groups = index, row[index], gini, groups

    return {
        "index": best_index,
        "value": best_value,
        "groups": best_groups
    }
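
The loop in `get_best_split` scores the value of every row, so repeated values are evaluated more than once, and the smallest value of a feature always produces an empty left group under the `<` rule. A minimal sketch of a helper that lists only the thresholds yielding two non-empty groups follows; `candidate_thresholds` is a hypothetical name, not part of the notebook's code.

```python
def candidate_thresholds(dataset, index):
    """Distinct values of one feature, excluding the minimum.

    With the rule 'value < threshold goes left', the minimum value would
    leave the left group empty, so it is never a useful threshold.
    """
    values = sorted({row[index] for row in dataset})
    return values[1:]

dataset = [
    [2.7, 2.5, 0],
    [1.3, 1.5, 0],
    [3.0, 3.5, 1],
    [2.0, 2.0, 0],
    [3.5, 4.0, 1],
]
print(candidate_thresholds(dataset, 0))
```

If every row has the same value for a feature, the helper returns an empty list, and a caller would have to treat that feature as unsplittable.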


### Decision Tree Construction (Optimized Recursive)

In [10]:
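# Recursive tree construction, bounded by max_depth and min_size
def to_terminal(group):
    """Leaf value: the most common class label in the group."""
    labels = [row[-1] for row in group]
    return Counter(labels).most_common(1)[0][0]

def split(node, max_depth, min_size, depth):
    """Grow the children of node in place, or turn them into leaves."""
    left, right = node["groups"]
    del node["groups"]  # the rows are no longer needed on this node

    # One side is empty: no real split, so both children share one leaf value
    if not left or not right:
        node["left"] = node["right"] = to_terminal(left + right)
        return

    # Depth limit reached: both children become leaves
    if depth >= max_depth:
        node["left"], node["right"] = to_terminal(left), to_terminal(right)
        return

    # Left child: leaf if the group is small enough, otherwise split further
    if len(left) <= min_size:
        node["left"] = to_terminal(left)
    else:
        node["left"] = get_best_split(left)
        split(node["left"], max_depth, min_size, depth + 1)

    # Right child: same rule as the left child
    if len(right) <= min_size:
        node["right"] = to_terminal(right)
    else:
        node["right"] = get_best_split(right)
        split(node["right"], max_depth, min_size, depth + 1)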


In [11]:
def build_tree(train, max_depth=3, min_size=1):
    """Build a tree from rows whose last column is the class label."""
    root = get_best_split(train)
    split(root, max_depth, min_size, 1)  # the root is at depth 1
    return root


### Predictive Function

In [12]:
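def predict(node, row):
    """Walk from node to a leaf, using the same '<' rule as test_split."""
    if row[node["index"]] < node["value"]:
        # A dict child is an internal node; anything else is a leaf label
        return predict(node["left"], row) if isinstance(node["left"], dict) else node["left"]
    else:
        return predict(node["right"], row) if isinstance(node["right"], dict) else node["right"]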
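
Because the tree is a nested dictionary, it can help to print it as nested rules when debugging predictions. The sketch below is an illustration only: `format_tree` is a hypothetical helper, and `example_tree` is a hand-written tree in the same shape that `build_tree` returns, not the tree learned from the test data.

```python
def format_tree(node, depth=0):
    """Render a nested-dict tree as indented if/else lines."""
    indent = "  " * depth
    if not isinstance(node, dict):
        return [f"{indent}predict {node}"]  # leaf: a class label
    lines = [f"{indent}if x[{node['index']}] < {node['value']}:"]
    lines += format_tree(node["left"], depth + 1)
    lines.append(f"{indent}else:")
    lines += format_tree(node["right"], depth + 1)
    return lines

# Hand-written example tree (hypothetical values)
example_tree = {
    "index": 0, "value": 3.0,
    "left": 0,
    "right": {"index": 1, "value": 3.8, "left": 1, "right": 0},
}
print("\n".join(format_tree(example_tree)))
```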


### Alternative Approaches

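#### Brute Force
- Try every observed value of every feature as a split point
- Finds the best greedy split at each node, but is slow on large datasets

#### Optimized
- Limit depth and minimum samples per node (`max_depth` and `min_size` in the code above)
- Prune subtrees that do not improve the result
- Evaluate only a random subset of features at each split (the Random Forest idea)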
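
Random feature selection can be sketched as follows. This is a minimal illustration, not part of the notebook's code: `sample_feature_indices` is a hypothetical helper, and defaulting the subset size to the square root of the feature count is a common convention assumed here rather than something the notebook specifies.

```python
import math
import random

def sample_feature_indices(n_features, rng, k=None):
    """Pick k distinct feature indices to consider at one split."""
    if k is None:
        k = max(1, int(math.sqrt(n_features)))  # assumed default subset size
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)  # seeded so repeated runs pick the same features
print(sample_feature_indices(9, rng))
```

A split finder would then loop over the returned indices instead of over every feature.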

### Test Case

In [13]:
# Test Case 1: Simple Binary Classification

dataset = [
    [2.7, 2.5, 0],
    [1.3, 1.5, 0],
    [3.0, 3.5, 1],
    [2.0, 2.0, 0],
    [3.5, 4.0, 1]
]

tree = build_tree(dataset, max_depth=3)

In [14]:
test_sample = [3.0, 3.0]
print("Prediction:", predict(tree, test_sample))

Prediction: 1


In [15]:
# Test Case 2: Multiple Predictions

X_test = [
    [1.5, 1.7],
    [3.2, 3.8],
    [2.1, 2.2]
]

for x in X_test:
    print(x, "→", predict(tree, x))


[1.5, 1.7] → 0
[3.2, 3.8] → 1
[2.1, 2.2] → 0


In [16]:
# Test Case 3: Edge Case (Single Feature)

dataset = [
    [1.0, 0],
    [2.0, 0],
    [3.0, 1],
    [4.0, 1]
]

tree = build_tree(dataset)
print(predict(tree, [3.5]))


1


## Complexity Analysis

### Time Complexity
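- Split search: O(n² × d) per node for n samples and d features, because each of the n × d candidate thresholds requires an O(n) pass to partition the rows and compute the Gini index
- Tree depth: O(log n) when splits are roughly balanced, O(n) in the worst case, and always capped by `max_depth`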
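
One common way to reduce the per-feature search cost is to sort the values once and update class counts incrementally while sweeping through them, which takes O(n log n) for the sort plus a pass whose cost per step grows with the number of classes. This is not what the notebook's code does; the sketch below uses a hypothetical function, `best_threshold_sorted`, and assumes the same `<` rule for the left group.

```python
from collections import Counter

def best_threshold_sorted(values, labels):
    """Return (threshold, weighted_gini) for one feature using a sorted sweep."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left = Counter()          # class counts left of the candidate boundary
    right = Counter(labels)   # class counts right of the candidate boundary
    best = (None, float("inf"))

    for i in range(1, n):
        prev_value, prev_label = pairs[i - 1]
        left[prev_label] += 1     # move one sample from right to left
        right[prev_label] -= 1
        if pairs[i][0] == prev_value:
            continue              # equal values cannot be separated by '<'
        gini_left = 1 - sum((c / i) ** 2 for c in left.values())
        gini_right = 1 - sum((c / (n - i)) ** 2 for c in right.values())
        score = (i * gini_left + (n - i) * gini_right) / n
        if score < best[1]:
            best = (pairs[i][0], score)
    return best

print(best_threshold_sorted([2.7, 1.3, 3.0, 2.0, 3.5], [0, 0, 1, 0, 1]))
```

On the first feature of the example data this picks the threshold 3.0 with a weighted Gini of 0, the same split the brute-force search selects at the root.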

### Space Complexity
- Tree storage: O(n) nodes in the worst case, where every leaf holds a single sample
