# Decision Tree – Step-by-Step (Classification)
Predict **Pass / Fail** from hours studied and number of practice tests taken, using scikit-learn's **DecisionTreeClassifier**.

Covers:
- Build & train a Decision Tree
- Control overfitting with `max_depth`
- Evaluate with accuracy, a confusion matrix, and a classification report
- Visualize the tree

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 1) Create a small dataset

In [None]:
# Each row is one student: [hours_studied, practice_tests_taken]
X = np.array([
    [1, 0],
    [2, 0],
    [2, 1],
    [3, 1],
    [3, 2],
    [4, 2],
    [5, 2],
    [6, 3],
    [7, 3],
    [8, 4],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0=Fail, 1=Pass

print('X shape:', X.shape)
print('y shape:', y.shape)
print('First 5 rows of X:\n', X[:5])
print('First 5 labels:', y[:5])

## 2) Train/Test split

In [None]:
# Hold out 30% for testing; stratify=y keeps the Fail/Pass ratio similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print('Train size:', len(X_train))
print('Test size:', len(X_test))
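Because `stratify=y` is set, both halves should keep roughly the original 4:6 Fail/Pass ratio. A quick check, re-creating the same toy data so the cell runs on its own:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# np.bincount counts how many samples of each class (0, 1) landed in each split
print('Full data class counts:', np.bincount(y))
print('Train class counts:    ', np.bincount(y_train, minlength=2))
print('Test class counts:     ', np.bincount(y_test, minlength=2))
```

With only 3 test samples the ratio cannot be matched exactly, but stratification guarantees neither class is missing from the test set here.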

## 3) Create the Decision Tree model

In [None]:
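# max_depth=3 caps the number of split levels, a simple guard against overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree  # in a notebook, this displays the model's configuration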
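The constructor only sets `max_depth` and `random_state`; everything else stays at scikit-learn's defaults. `get_params()` lists every setting, which is a quick way to see what the model will actually use. A small sketch printing a few of the parameters most relevant to how the tree grows:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(max_depth=3, random_state=42)

# get_params() returns every constructor argument, including the defaults
params = tree.get_params()
for name in ['criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'random_state']:
    print(f'{name}: {params[name]}')
```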

## 4) Train (fit) the model

In [None]:
tree.fit(X_train, y_train)
print('Decision Tree trained!')
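Fitting a classification tree means repeatedly choosing the split that makes the child nodes as pure as possible. With the default `criterion='gini'`, purity is measured by Gini impurity, $1 - \sum_k p_k^2$, where $p_k$ is the fraction of samples in class $k$. The brute-force search below is a simplified illustration of how the root split is chosen, not scikit-learn's actual implementation. It uses the full 10-row dataset rather than the training split, and compares its answer with the root node of a tree fitted on the same data, read from the fitted model's `tree_` attribute.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature/threshold pair; keep the lowest weighted child Gini."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

print('Root Gini impurity:', round(gini(y), 3))
feature, threshold, score = best_split(X, y)
print(f'Best split: feature {feature} <= {threshold} (weighted Gini {score:.3f})')

# Compare with the root node of a scikit-learn tree fitted on the same data
clf = DecisionTreeClassifier(random_state=42).fit(X, y)
print('sklearn root split: feature', clf.tree_.feature[0], '<=', clf.tree_.threshold[0])
```

On this data, feature 1 (`PracticeTests`) at threshold 1.5 separates Fail from Pass perfectly (weighted Gini 0), so both searches should land on that split. The impurity of the unsplit root is $1 - (0.4^2 + 0.6^2) = 0.48$.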

## 5) Predict and evaluate

In [None]:
y_pred = tree.predict(X_test)

print('Predictions:', y_pred)
print('Actual:     ', y_test)

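print('\nAccuracy:', accuracy_score(y_test, y_pred))
# Rows = actual class, columns = predicted class
print('\nConfusion Matrix:\n', confusion_matrix(y_test, y_pred))
# zero_division=0 avoids warnings if a class is never predicted on this tiny test set
print('\nClassification Report:\n',
      classification_report(y_test, y_pred, target_names=['Fail', 'Pass'], zero_division=0))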
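Accuracy is the fraction of correct predictions: the diagonal of the confusion matrix divided by its total. In scikit-learn's layout, rows are actual classes and columns are predicted classes, so for labels `[0, 1]` the flattened matrix reads `TN, FP, FN, TP`, treating Pass (1) as the positive class. A self-contained check that rebuilds the pipeline above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
y_pred = tree.predict(X_test)

# labels=[0, 1] fixes the row/column order even if a class is absent
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
acc_from_cm = (tn + tp) / cm.sum()  # correct predictions sit on the diagonal

print('TN, FP, FN, TP:', tn, fp, fn, tp)
print('Accuracy from confusion matrix:', acc_from_cm)
print('accuracy_score:               ', accuracy_score(y_test, y_pred))
```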

## 6) Visualize the tree

In [None]:
plt.figure(figsize=(12, 6))
plot_tree(
    tree,
    feature_names=['Hours', 'PracticeTests'],
    class_names=['Fail', 'Pass'],
    filled=True,
    rounded=True
)
plt.title('Decision Tree Visualization')
plt.show()
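For a text version of the same tree (handy in logs or when plotting is unavailable), `sklearn.tree.export_text` prints the split rules as an indented outline. A sketch that rebuilds the trained model so the cell runs on its own:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Each "|---" line is a split condition; "class: k" lines are leaves (0=Fail, 1=Pass)
rules = export_text(tree, feature_names=['Hours', 'PracticeTests'])
print(rules)
```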

## 7) Try your own input (new student)

In [None]:
new_student = np.array([[4, 1]])  # 4 hours studied, 1 practice test
pred = tree.predict(new_student)[0]
proba = tree.predict_proba(new_student)[0]

print('New student [hours, practice_tests]:', new_student[0])
print('Predicted class (0=Fail, 1=Pass):', pred)
print('Probabilities [P(Fail), P(Pass)]:', proba)
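For a decision tree, `predict_proba` returns the class distribution of the training samples in the leaf the input lands in. The sketch below finds that leaf with `apply()` and reads the stored class values from `tree_.value`, normalizing them because, depending on the scikit-learn version, that array may hold either weighted counts or proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

new_student = np.array([[4, 1]])
leaf = tree.apply(new_student)[0]       # index of the leaf node reached
values = tree.tree_.value[leaf][0]      # per-class values stored at that leaf
leaf_proba = values / values.sum()      # normalize to probabilities

print('Leaf index:', leaf)
print('Leaf class distribution:', leaf_proba)
print('predict_proba:          ', tree.predict_proba(new_student)[0])
```

Because the training data can be separated perfectly (for example by `PracticeTests <= 1.5`), every leaf of the fitted tree is pure, so these probabilities come out as exactly 0 and 1. On noisier data, or with a smaller `max_depth`, they would be fractions.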

## 8) Optional: see how depth affects performance

In [None]:
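# None lets the tree grow until its leaves are pure.
# With only 3 test samples, accuracy can only be 0, 1/3, 2/3, or 1.
for depth in [1, 2, 3, 4, 5, None]: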
    m = DecisionTreeClassifier(max_depth=depth, random_state=42)
    m.fit(X_train, y_train)
    acc = accuracy_score(y_test, m.predict(X_test))
    print(f'max_depth={depth} -> accuracy={acc:.3f}')
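The loop above only reports test accuracy. Comparing it with training accuracy, and with the size of the tree actually grown, is a more direct way to spot overfitting: a large train/test gap suggests the tree is memorizing the training data. A sketch using `get_depth()` and `get_n_leaves()`:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

results = []
for depth in [1, 2, 3, 4, 5, None]:
    m = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    # .score() returns mean accuracy for classifiers
    train_acc = m.score(X_train, y_train)
    test_acc = m.score(X_test, y_test)
    results.append((depth, m.get_depth(), m.get_n_leaves(), train_acc, test_acc))
    print(f'max_depth={depth}: grown depth={m.get_depth()}, leaves={m.get_n_leaves()}, '
          f'train acc={train_acc:.3f}, test acc={test_acc:.3f}')
```

On this toy set a single split already classifies the training data perfectly, so every `max_depth` setting should grow the same depth-1 tree with two leaves. The effect of `max_depth` only becomes visible on data that is not this cleanly separable.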