# Random Forest – Step-by-Step (Classification)
Predict **Pass / Fail** from two features (hours and practice tests) using a **RandomForestClassifier**.

You will learn:
- Why a Random Forest is an ensemble of **many decision trees**
- How to train the model and make predictions
- How to evaluate those predictions
- How to read **feature importance**

In [None]:
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 1) Create a small dataset

In [None]:
# Each row is one student: [hours, practice_tests]
X = np.array([
    [1, 0],
    [2, 0],
    [2, 1],
    [3, 1],
    [3, 2],
    [4, 2],
    [5, 2],
    [6, 3],
    [7, 3],
    [8, 4],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0=Fail, 1=Pass

print('X shape:', X.shape)
print('y shape:', y.shape)
print('First 5 rows of X:\n', X[:5])
print('First 5 labels:', y[:5])

## 2) Train/Test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print('Train size:', len(X_train))
print('Test size:', len(X_test))
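
To see what `stratify=y` does, one can count the labels on each side of the split. The following is a minimal sketch that re-creates the toy dataset so it runs on its own; `np.bincount` is used here only as a convenient counter and is not part of the original walkthrough.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# bincount returns [number of Fail labels, number of Pass labels]
train_counts = np.bincount(y_train, minlength=2)
test_counts = np.bincount(y_test, minlength=2)
print('Train class counts [Fail, Pass]:', train_counts)
print('Test class counts  [Fail, Pass]:', test_counts)
```

With stratification, both classes should appear in the test set in roughly the same proportion as in the full data, which matters when there are only 10 rows.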

## 3) Build the Random Forest model

In [None]:
rf = RandomForestClassifier(
    n_estimators=200,   # number of trees
    max_depth=4,        # limit depth to reduce overfitting
    random_state=42
)
rf

## 4) Train the model

In [None]:
rf.fit(X_train, y_train)
print('Random Forest trained!')
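
A fitted forest keeps its individual trees in `estimators_`, and in scikit-learn the forest's class probabilities are the average of the per-tree probabilities. The sketch below checks this by hand. It assumes the same toy data and, for brevity, fits on all 10 rows rather than on the training split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=42)
rf.fit(X, y)  # assumption: fit on all rows, only to inspect the forest

print('Number of trees:', len(rf.estimators_))
print('Type of one tree:', type(rf.estimators_[0]).__name__)

sample = np.array([[4, 1]])

# Ask every tree for its own class probabilities, then average them
tree_probs = np.array([tree.predict_proba(sample)[0] for tree in rf.estimators_])
manual_avg = tree_probs.mean(axis=0)
forest_prob = rf.predict_proba(sample)[0]

print('Average of tree probabilities:', manual_avg)
print('Forest predict_proba:         ', forest_prob)
```

The two printed vectors should agree, which is the sense in which a Random Forest is "many decision trees" combined into one prediction.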

## 5) Predict and evaluate

In [None]:
y_pred = rf.predict(X_test)
y_prob = rf.predict_proba(X_test)

print('Predictions:', y_pred)
print('Actual:     ', y_test)

print('\nAccuracy:', accuracy_score(y_test, y_pred))
print('\nConfusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('\nClassification Report:\n', classification_report(y_test, y_pred))

print('\nProbabilities [P(Fail), P(Pass)] (up to 5 rows):\n', y_prob[:5])
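
Reading the confusion matrix is easier with a small worked case. The labels below are hypothetical, made up for illustration rather than taken from the model. In scikit-learn, rows are the actual classes and columns are the predicted classes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only (not model output)
y_true = np.array([0, 0, 1, 1, 1])
y_hat = np.array([0, 1, 1, 1, 0])

# Rows = actual class, columns = predicted class, in the order [Fail, Pass]
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f'TN={tn} FP={fp} FN={fn} TP={tp}')

# Accuracy is the diagonal (correct predictions) over the total
manual_acc = (tn + tp) / cm.sum()
print('Accuracy from matrix:', manual_acc)
print('accuracy_score:      ', accuracy_score(y_true, y_hat))
```

Here one Fail is predicted as Pass and one Pass is predicted as Fail, so 3 of 5 predictions are correct.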

## 6) Try your own input (new student)

In [None]:
new_student = np.array([[4, 1]])
pred = rf.predict(new_student)[0]
proba = rf.predict_proba(new_student)[0]

print('New student [hours, practice_tests]:', new_student[0])
print('Predicted class (0=Fail, 1=Pass):', pred)
print('Probabilities [P(Fail), P(Pass)]:', proba)
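
The `[P(Fail), P(Pass)]` label relies on the column order of `predict_proba`, which follows `rf.classes_`. The sketch below checks that order instead of assuming it. The `label_names` dictionary is a hypothetical helper, and the model is fit on all 10 rows for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=42)
rf.fit(X, y)  # assumption: fit on all rows to keep the example short

new_student = np.array([[4, 1]])
proba = rf.predict_proba(new_student)[0]
pred = rf.predict(new_student)[0]

# classes_ gives the label that each probability column refers to
print('Column order (rf.classes_):', rf.classes_)

label_names = {0: 'Fail', 1: 'Pass'}  # hypothetical helper for printing
for cls, p in zip(rf.classes_, proba):
    print(f'P({label_names[int(cls)]}) = {p:.3f}')

# The predicted class is the one with the highest probability
print('Predicted:', label_names[int(pred)])
```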

## 7) Feature importance (which feature matters more?)

In [None]:
feature_names = ['Hours', 'PracticeTests']
importances = rf.feature_importances_

for name, imp in sorted(zip(feature_names, importances), key=lambda x: x[1], reverse=True):
    print(f'{name}: {imp:.3f}')
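
Impurity-based importances are normalized to sum to 1, and on a dataset this small they can shift with the random seed. The sketch below refits the forest with a few seeds to show how stable the ranking is. The seed values are arbitrary choices, and the model is fit on all 10 rows for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

feature_names = ['Hours', 'PracticeTests']
all_importances = []

for seed in [0, 1, 2]:  # arbitrary seeds, chosen only for illustration
    model = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=seed)
    model.fit(X, y)
    imp = model.feature_importances_
    all_importances.append(imp)
    summary = ', '.join(f'{n}={v:.3f}' for n, v in zip(feature_names, imp))
    print(f'seed={seed}: {summary} (sum={imp.sum():.3f})')
```

If the ordering flips between seeds, the importances are better read as a rough indication than as a firm ranking.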

## 8) Optional: compare different numbers of trees

In [None]:
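# Note: with only 3 test samples, accuracy can only be 0, 1/3, 2/3, or 1,
# so differences between settings are coarse.
for n in [10, 50, 100, 200, 500]: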
    m = RandomForestClassifier(n_estimators=n, max_depth=4, random_state=42)
    m.fit(X_train, y_train)
    acc = accuracy_score(y_test, m.predict(X_test))
    print(f'n_estimators={n} -> accuracy={acc:.3f}')
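
Because a single tiny test set gives a coarse accuracy estimate, cross-validation is one way to make the comparison a little less noisy. This is a sketch of an alternative, not part of the original walkthrough. It assumes 4 stratified folds (the smaller class has 4 rows) and a shortened list of tree counts.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same toy data as above: [hours, practice_tests] -> 0=Fail, 1=Pass
X = np.array([[1, 0], [2, 0], [2, 1], [3, 1], [3, 2],
              [4, 2], [5, 2], [6, 3], [7, 3], [8, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Assumption: 4 folds, so each fold can hold one of the 4 Fail rows
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

mean_scores = {}
for n in [10, 50, 200]:
    model = RandomForestClassifier(n_estimators=n, max_depth=4, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    mean_scores[n] = scores.mean()
    print(f'n_estimators={n} -> mean CV accuracy={scores.mean():.3f} '
          f'(folds: {np.round(scores, 2)})')
```

Every row is used for testing exactly once, so the averaged score depends less on which 3 rows happened to land in the test split.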