## ID3 Decision Tree Classifier

### Class: `DecisionTree`

#### Hyperparameters
- `max_depth`: Maximum depth of the tree; a node at this depth becomes a leaf.
- `min_samples_split`: Minimum number of samples a node must contain before it can be split.

#### Methods
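- `fit`: Builds the tree from the training features and labels.
- `prune`: Prunes the tree using a held-out validation set.
- `predict`: Returns a predicted label for each input row.
- `get_rules`: Returns the decision rules encoded by the tree.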
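
As a rough illustration of what `predict` and `get_rules` might do, the sketch below walks a nested-dict tree with the same keys that `model.tree` prints later in this notebook (`leaf`, `feature`, `children`, `label`, `majority_label`). The function names `predict_one` and `get_rules`, the fallback to `majority_label` for unseen values, and the tiny hand-written tree are assumptions for illustration, not the actual implementation in `decision_tree.py`.

```python
# Hypothetical helpers; the node layout mirrors the dict printed by model.tree.
def predict_one(node, row):
    """Follow one sample (a dict of feature -> value) down to a leaf."""
    while not node['leaf']:
        child = node['children'].get(row[node['feature']])
        if child is None:
            # Assumption: values unseen during training fall back to the
            # majority label stored on the internal node.
            return node['majority_label']
        node = child
    return node['label']


def get_rules(node, conditions=()):
    """Return one (conditions, label) pair per leaf."""
    if node['leaf']:
        return [(list(conditions), node['label'])]
    rules = []
    for value, child in node['children'].items():
        rules += get_rules(child, conditions + ((node['feature'], value),))
    return rules


# Tiny hand-written tree, for illustration only.
tree = {
    'leaf': False,
    'feature': 'Competitive Advantage',
    'majority_label': 'failure',
    'children': {
        'yes': {'leaf': True, 'label': 'success'},
        'no': {'leaf': True, 'label': 'failure'},
    },
}

print(predict_one(tree, {'Competitive Advantage': 'yes'}))
for conditions, label in get_rules(tree):
    print(conditions, '=>', label)
```

Each rule is the list of `(feature, value)` tests on the path from the root to one leaf, so the number of rules equals the number of leaves.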

### Algorithm Steps
1. **Initialization**: Set hyperparameters.
2. **Building**: 
    - Check base cases (all labels same, max depth reached).
    - Calculate entropy and information gain.
    - Choose best feature, split, and recurse.
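3. **Pruning**: Temporarily replace an internal node with a leaf and check accuracy on the validation set; keep the change only if accuracy does not drop.
4. **Prediction**: For each sample, traverse the tree from the root to a leaf and return that leaf's label.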
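
The building step above can be sketched as a short recursive function. This is a minimal sketch under stated assumptions, not the code in `decision_tree.py`: rows are plain dicts, the helper names `entropy`, `information_gain`, and `build` are hypothetical, and the output dict uses the same keys as the tree printed later in this notebook.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of each split."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build(rows, labels, features, depth=0, max_depth=5, min_samples_split=2):
    """Recursively build a nested-dict ID3 tree."""
    majority = Counter(labels).most_common(1)[0][0]
    # Base cases: pure node, no features left, depth or size limit reached.
    if (len(set(labels)) == 1 or not features
            or depth >= max_depth or len(labels) < min_samples_split):
        return {'leaf': True, 'label': majority}
    # Split on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {'leaf': False, 'feature': best, 'children': {},
            'majority_label': majority}
    remaining = [f for f in features if f != best]
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        node['children'][value] = build(list(sub_rows), list(sub_labels),
                                        remaining, depth + 1,
                                        max_depth, min_samples_split)
    return node


# Toy data, for illustration only.
rows = [
    {'Competitive Advantage': 'yes', 'Second Opinion': 'positive'},
    {'Competitive Advantage': 'yes', 'Second Opinion': 'negative'},
    {'Competitive Advantage': 'no', 'Second Opinion': 'positive'},
    {'Competitive Advantage': 'no', 'Second Opinion': 'negative'},
]
labels = ['success', 'success', 'failure', 'failure']
tree = build(rows, labels, ['Competitive Advantage', 'Second Opinion'])
print(tree)
```

In the toy data, `Competitive Advantage` separates the labels perfectly while `Second Opinion` carries no information, so the root splits on the former and both children are leaves.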

### Utility Functions
- `accuracy`: Fraction of predictions that match the true labels.
- `entropy`: Shannon entropy of a set of class labels.
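
The notebook does not show how these utilities are implemented. Assuming they follow the standard ID3 definitions, the quantities involved would be the following, where $S$ is the set of samples at a node, $p_c$ is the fraction of $S$ with class $c$, and $S_v$ is the subset of $S$ in which feature $A$ takes value $v$:

$$
H(S) = -\sum_{c} p_c \log_2 p_c
$$

$$
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
$$

$$
\mathrm{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\hat{y}_i = y_i\right]
$$

As a worked example, a node with two `success` and two `failure` samples has $H(S) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$ bit. If a feature splits those four samples into two pure halves, each half has entropy $0$, so $IG = 1 - 0 = 1$ bit.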

### Usage
1. `model = DecisionTree(max_depth=5)`
2. `model.fit(X, y)`
3. `predictions = model.predict(X_test)`

In [1]:
import numpy as np 
import pandas as pd 
import decision_tree as dt  # <-- Your implementation

In [2]:
data_2 = pd.read_csv('data_2.csv')
data_2.drop('Birth Month', axis=1, inplace=True)
data_2

Unnamed: 0,Founder Experience,Second Opinion,Competitive Advantage,Lucurative Market,Outcome,Split
0,moderate,negative,yes,no,success,train
1,high,positive,yes,no,failure,train
2,low,negative,no,no,failure,train
3,low,negative,no,no,failure,train
4,low,positive,yes,yes,success,train
...,...,...,...,...,...,...
195,moderate,positive,no,yes,failure,test
196,low,negative,no,no,failure,test
197,moderate,negative,no,no,failure,test
198,moderate,negative,no,no,failure,test


In [3]:

data_2_train = data_2.query('Split == "train"')
data_2_valid = data_2.query('Split == "valid"')
data_2_test = data_2.query('Split == "test"')
X_train, y_train = data_2_train.drop(columns=['Outcome', 'Split']), data_2_train.Outcome
X_valid, y_valid = data_2_valid.drop(columns=['Outcome', 'Split']), data_2_valid.Outcome
X_test, y_test = data_2_test.drop(columns=['Outcome', 'Split']), data_2_test.Outcome
data_2.Split.value_counts()

Split
test     100
train     50
valid     50
Name: count, dtype: int64

In [4]:
# Fit model (TO TRAIN SET ONLY)
model_2 = dt.DecisionTree(max_depth=4, min_samples_split=2)  # <-- Feel free to add hyperparameters 
model_2.fit(X_train, y_train)
validation_data = {'X': X_valid, 'y': y_valid}
#model_2.prune(model_2.tree, validation_data)

print(model_2.tree)

print(f'Train: {dt.accuracy(y_train, model_2.predict(X_train)) * 100 :.1f}%')
print(f'Valid: {dt.accuracy(y_valid, model_2.predict(X_valid)) * 100 :.1f}%')

{'leaf': False, 'feature': 'Competitive Advantage', 'children': {'yes': {'leaf': False, 'feature': 'Founder Experience', 'children': {'moderate': {'leaf': False, 'feature': 'Lucurative Market', 'children': {'no': {'leaf': True, 'label': 'success'}, 'yes': {'leaf': True, 'label': 'failure'}}, 'majority_label': 'failure'}, 'high': {'leaf': True, 'label': 'failure'}, 'low': {'leaf': False, 'feature': 'Second Opinion', 'children': {'positive': {'leaf': True, 'label': 'success'}, 'negative': {'leaf': True, 'label': 'failure'}}, 'majority_label': 'failure'}}, 'majority_label': 'failure'}, 'no': {'leaf': False, 'feature': 'Founder Experience', 'children': {'low': {'leaf': False, 'feature': 'Second Opinion', 'children': {'negative': {'leaf': True, 'label': 'failure'}, 'positive': {'leaf': True, 'label': 'success'}}, 'majority_label': 'failure'}, 'moderate': {'leaf': False, 'feature': 'Second Opinion', 'children': {'positive': {'leaf': False, 'feature': 'Lucurative Market', 'children': {'no': {
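
The `prune` call in the cell above is commented out, and its implementation is not shown. A minimal sketch of reduced-error pruning on a tree of this shape could look like the following. The free functions `predict_one`, `accuracy`, and `prune`, their signatures, and the toy tree and validation rows are all assumptions for illustration; the real method takes a `validation_data` dict instead.

```python
def predict_one(node, row):
    """Follow one sample (a dict of feature -> value) down to a leaf."""
    while not node['leaf']:
        child = node['children'].get(row[node['feature']])
        if child is None:
            return node['majority_label']  # assumption: fallback for unseen values
        node = child
    return node['label']


def accuracy(root, rows, labels):
    """Fraction of validation rows the tree classifies correctly."""
    hits = sum(predict_one(root, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def prune(root, node, rows, labels):
    """Bottom-up: collapse a subtree unless validation accuracy would drop."""
    if node['leaf']:
        return
    for child in node['children'].values():
        prune(root, child, rows, labels)
    before = accuracy(root, rows, labels)
    saved = dict(node)  # shallow copy so the change can be undone
    node.clear()
    node.update({'leaf': True, 'label': saved['majority_label']})
    if accuracy(root, rows, labels) < before:
        node.clear()
        node.update(saved)  # pruning hurt accuracy: restore the subtree


# Toy tree with an overfit 'yes' branch, for illustration only.
tree = {
    'leaf': False, 'feature': 'Competitive Advantage', 'majority_label': 'success',
    'children': {
        'yes': {
            'leaf': False, 'feature': 'Second Opinion', 'majority_label': 'success',
            'children': {
                'positive': {'leaf': True, 'label': 'success'},
                'negative': {'leaf': True, 'label': 'failure'},
            },
        },
        'no': {'leaf': True, 'label': 'failure'},
    },
}
valid_rows = [
    {'Competitive Advantage': 'yes', 'Second Opinion': 'positive'},
    {'Competitive Advantage': 'yes', 'Second Opinion': 'negative'},
    {'Competitive Advantage': 'no', 'Second Opinion': 'positive'},
]
valid_labels = ['success', 'success', 'failure']

prune(tree, tree, valid_rows, valid_labels)
print(tree)
```

In this toy case the `Second Opinion` split is collapsed because removing it does not lower validation accuracy, while the root split is restored because collapsing it would.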

In [5]:
import itertools
import pydot

graph = pydot.Dot(graph_type='graph')
node_ids = itertools.count()  # Unique IDs so repeated features and labels get separate nodes

def add_node(label):
    # Use a generated name as the node identity and show the text as its label
    name = f"n{next(node_ids)}"
    graph.add_node(pydot.Node(name, label=label))
    return name

def visit(node, parent=None, edge_label=None):
    # Add this tree node, connect it to its parent, then recurse into its children
    label = f"Leaf: {node['label']}" if node['leaf'] else node['feature']
    name = add_node(label)
    if parent is not None:
        # The edge label is the feature value that leads to this child
        graph.add_edge(pydot.Edge(parent, name, label=edge_label))
    if not node['leaf']:
        for child_val, child_node in node['children'].items():
            visit(child_node, name, str(child_val))

visit(model_2.tree)
graph.write_png('example1_graph.png')