![image.png](https://i.imgur.com/a3uAqnb.png)


# **🎯 Machine Learning Exercise: Student Performance Prediction**

---

## **📋 Exercise Overview**

Build a complete ML pipeline to predict student performance with **three tasks**:

1. **Regression**: Predict final exam score (0-100)
2. **Binary Classification**: Pass/Fail (≥60 = Pass)
3. **Multi-Class Classification**: Grade (A, B, C, D, F)

**Features:**
- study_hours, attendance, previous_score, sleep_hours, extracurricular

**Difficulty:** 🟢 Easy | 🟡 Medium | 🔴 Hard

**Total: 21 TODOs**


# **1️⃣ Setup**

In [None]:
from IPython.display import clear_output
%pip install tqdm -q
clear_output()

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
%matplotlib inline

## **🟢 TODO 1: Create Dataset**

In [None]:
# TODO: Complete dataset creation
np.random.seed(42)
n_samples = 500

# TODO: Create features (Hint: np.random.uniform)
study_hours = np.random.uniform(5, 40, n_samples)  # Worked example; complete the others below
attendance = # TODO: range 50-100
previous_score = # TODO: range 40-95
sleep_hours = # TODO: range 4-9
extracurricular = # TODO: np.random.choice([0, 1], n_samples)

# Create target
final_score = (0.4 * study_hours + 0.3 * attendance + 0.2 * previous_score + 
               0.1 * sleep_hours * 5 + 3 * extracurricular + 
               np.random.normal(0, 5, n_samples))
final_score = np.clip(final_score, 0, 100)

# TODO: Create DataFrame
df = pd.DataFrame({
    'study_hours': study_hours,
    # TODO: Add the other feature columns and 'final_score'
})

print(f"✓ Created dataset with {len(df)} samples")
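
Before moving on, it can help to check which score range the target formula can produce at all. The sketch below plugs the feature bounds from the hints above into the noise-free part of the formula. The helper name `noise_free_score` is made up for this illustration and is not part of the exercise.

```python
def noise_free_score(study_hours, attendance, previous_score, sleep_hours, extracurricular):
    """Deterministic part of the target formula (Gaussian noise left out)."""
    return (0.4 * study_hours + 0.3 * attendance + 0.2 * previous_score
            + 0.1 * sleep_hours * 5 + 3 * extracurricular)

# Feature bounds taken from the TODO hints: lowest and highest value of each feature
lowest = noise_free_score(5, 50, 40, 4, 0)
highest = noise_free_score(40, 100, 95, 9, 1)
print(f"Noise-free score range: {lowest:.1f} to {highest:.1f}")
```

With a ceiling of about 72.5 before noise, scores of 80 and above should be very rare, so expect the lower grades to dominate later in the notebook.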

# **2️⃣ EDA**

## **🟢 TODO 2: Dataset Info**

In [None]:
# TODO: Display dataset info
# df.info()

## **🟡 TODO 3: Missing Values**

In [None]:
# TODO: Check for missing values
missing = # TODO: df.isnull().sum()
print("Missing values:")
# print(missing)
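
The synthetic dataset is generated without gaps, so the check above should report zeros. To see what a non-zero report looks like, here is a small sketch on a made-up frame (`toy` is illustrative data, not the exercise dataset).

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in the first column
toy = pd.DataFrame({
    "study_hours": [10.0, np.nan, 25.0],
    "attendance": [80.0, 90.0, 75.0],
})

missing_toy = toy.isnull().sum()   # number of NaN entries per column
print(missing_toy)
```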

## **🟡 TODO 4: Visualize Distribution**

In [None]:
# TODO: Create histogram of final_score
plt.figure(figsize=(10, 6))
# TODO: plt.hist(df['final_score'], bins=30)
# plt.xlabel('Final Score')
# plt.ylabel('Frequency')
# plt.title('Score Distribution')
plt.show()

# **3️⃣ Feature Engineering**

## **🟢 TODO 5: Binary Target**

In [None]:
# TODO: Create pass_fail (1 if score>=60, else 0)
df['pass_fail'] = # TODO: np.where(df['final_score'] >= 60, 1, 0)
print(df['pass_fail'].value_counts())
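
The boundary case is worth checking: a score of exactly 60 counts as a pass. A minimal sketch on made-up scores:

```python
import numpy as np

toy_scores = np.array([59.9, 60.0, 72.5, 30.0])
toy_pass = np.where(toy_scores >= 60, 1, 0)   # 1 where the condition holds, else 0
print(toy_pass)  # [0 1 1 0]
```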

## **🟡 TODO 6: Multi-Class Target**

In [None]:
# TODO: Create grade column
def assign_grade(score):
    if score >= 90: return 'A'
    elif score >= 80: return # TODO
    elif score >= 70: return # TODO
    # TODO: Complete for D and F
    
df['grade'] = df['final_score'].apply(assign_grade)
print(df['grade'].value_counts())
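
As an alternative to `apply` with an if/elif chain, the same thresholds can be expressed with `pd.cut`. This is a sketch of a different technique than the one the TODO asks for, shown on made-up scores.

```python
import numpy as np
import pandas as pd

toy_scores = pd.Series([95.0, 80.0, 79.9, 60.0, 12.0])

# right=False makes each bin closed on the left: [60, 70), [70, 80), ...
toy_grades = pd.cut(
    toy_scores,
    bins=[-np.inf, 60, 70, 80, 90, np.inf],
    labels=["F", "D", "C", "B", "A"],
    right=False,
)
print(toy_grades.tolist())  # ['A', 'B', 'C', 'D', 'F']
```

Note that `pd.cut` returns a categorical column rather than plain strings.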

## **🟡 TODO 7: Encode Grades**

In [None]:
# TODO: Map grades to numbers
grade_mapping = {
    'A': 4,
    'B': 3,
    # TODO: Complete mapping for C, D, F (continue down to 0)
}
df['grade_encoded'] = df['grade'].map(grade_mapping)

# **4️⃣ Preprocessing**

## **🟡 TODO 8: Feature Scaling**

In [None]:
from sklearn.preprocessing import StandardScaler

# TODO: Select features
X = df[[
    'study_hours',
    'attendance',
    # TODO: Add the remaining features
]]
y_reg = df['final_score']
y_binary = df['pass_fail']
y_multi = df['grade_encoded']

# TODO: Scale features
scaler = StandardScaler()
X_scaled = # TODO: scaler.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)
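
`StandardScaler` rescales each column to zero mean and unit variance. The arithmetic is equivalent to the NumPy sketch below, which uses the population standard deviation (`ddof=0`). `toy_X` is made-up data.

```python
import numpy as np

toy_X = np.array([[10.0, 50.0],
                  [20.0, 75.0],
                  [30.0, 100.0]])

# Standardize each column: subtract its mean, divide by its population std
toy_scaled = (toy_X - toy_X.mean(axis=0)) / toy_X.std(axis=0)

print(toy_scaled.mean(axis=0))  # approximately 0 per column
print(toy_scaled.std(axis=0))   # approximately 1 per column
```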

# **5️⃣ Part A: Linear Regression**

## **🔴 TODO 9: MSE Function**

In [None]:
# TODO: Implement MSE
def mean_squared_error(y_true, y_pred):
    # TODO: MSE = mean((y_true - y_pred)^2)
    return # TODO
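
A hand-checkable case is useful for testing your implementation. The sketch below uses a hypothetical name, `mse_sketch`, so it does not clash with the function you write above.

```python
import numpy as np

def mse_sketch(y_true, y_pred):
    """Mean of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

print(mse_sketch([1, 2, 3], [1, 2, 5]))  # (0 + 0 + 4) / 3
```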

## **🔴 TODO 10: Gradient Descent**

In [None]:
# TODO: Complete gradient descent
def gradient_descent_regression(X, y, lr=0.01, n_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    losses = []
    
    for _ in tqdm(range(n_iters)):
        # TODO: y_pred = X @ theta
        # TODO: loss = mean_squared_error(y, y_pred)
        # TODO: losses.append(loss)
        # TODO: gradient = (1/m) * X.T @ (y_pred - y)  # factor 2 from the MSE derivative is absorbed into lr
        # TODO: theta -= lr * gradient
        pass
    
    return theta, losses
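
To see the update rule converge, here is a minimal sketch on noise-free toy data where the true coefficients are known. The names `gd_linear_sketch`, `X_toy`, and `y_toy` are assumptions for illustration, and the progress bar is left out to keep the block self-contained.

```python
import numpy as np

def gd_linear_sketch(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for least squares; X must already include a bias column."""
    m, n = X.shape
    theta = np.zeros(n)
    losses = []
    for _ in range(n_iters):
        y_pred = X @ theta
        losses.append(np.mean((y - y_pred) ** 2))   # track MSE per iteration
        gradient = (1 / m) * X.T @ (y_pred - y)     # gradient of half the MSE
        theta -= lr * gradient
    return theta, losses

# Noise-free toy data generated from y = 1 + 2x
x = np.linspace(-1, 1, 50)
X_toy = np.c_[np.ones_like(x), x]
y_toy = 1 + 2 * x

theta_toy, losses_toy = gd_linear_sketch(X_toy, y_toy)
print(theta_toy)  # close to [1, 2]
```

Plotting `losses_toy` against the iteration number is a quick way to confirm that the learning rate is neither too small nor too large.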

## **🟡 TODO 11: Train Model**

In [None]:
from sklearn.model_selection import train_test_split

# TODO: Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_reg, test_size=0.2, random_state=42)

# TODO: Add bias
X_train_bias = np.c_[np.ones(len(X_train)), X_train]
X_test_bias = # TODO

# TODO: Train
theta, losses = gradient_descent_regression(X_train_bias, y_train, lr=0.1, n_iters=1000)

## **🟢 TODO 12: Evaluate**

In [None]:
from sklearn.metrics import r2_score

# TODO: Predict and calculate RMSE, R²
y_pred = # TODO
rmse = # TODO: np.sqrt(mean_squared_error(y_test, y_pred))
r2 = # TODO

print(f"RMSE: {rmse:.2f}, R²: {r2:.3f}")
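
`r2_score` computes one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below works both metrics out by hand on made-up values, which can help when checking your own numbers.

```python
import numpy as np

y_true_toy = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_toy = np.array([2.5, 5.5, 7.0, 9.5])

rmse_toy = np.sqrt(np.mean((y_true_toy - y_pred_toy) ** 2))

ss_res = np.sum((y_true_toy - y_pred_toy) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true_toy - y_true_toy.mean()) ** 2)   # total sum of squares
r2_toy = 1 - ss_res / ss_tot

print(f"RMSE: {rmse_toy:.3f}, R²: {r2_toy:.4f}")  # RMSE: 0.433, R²: 0.9625
```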

# **6️⃣ Part B: Binary Classification**

## **🔴 TODO 13: Sigmoid Function**

In [None]:
# TODO: Implement sigmoid
def sigmoid(z):
    return # TODO: 1 / (1 + np.exp(-z))
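
Two properties make good sanity checks: the sigmoid of 0 is 0.5, and `sigmoid(z) + sigmoid(-z)` equals 1. A sketch with a hypothetical name, `sigmoid_sketch`:

```python
import numpy as np

def sigmoid_sketch(z):
    return 1 / (1 + np.exp(-z))

z_toy = np.array([-2.0, 0.0, 2.0])
p_toy = sigmoid_sketch(z_toy)

print(p_toy)                            # middle value is 0.5
print(p_toy + sigmoid_sketch(-z_toy))   # every entry sums to 1
```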

## **🔴 TODO 14: Binary Cross-Entropy**

In [None]:
# TODO: Implement BCE
def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1-eps)
    return # TODO: -mean(y_true*log(y_pred) + (1-y_true)*log(1-y_pred))
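
Binary cross-entropy is small for confident correct predictions and large for confident wrong ones. The sketch below (hypothetical name `bce_sketch`, made-up probabilities) shows both cases.

```python
import numpy as np

def bce_sketch(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_toy = np.array([1, 0])
confident_right = bce_sketch(y_toy, np.array([0.9, 0.1]))   # equals -log(0.9)
confident_wrong = bce_sketch(y_toy, np.array([0.1, 0.9]))   # equals -log(0.1)
print(confident_right, confident_wrong)
```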

## **🔴 TODO 15: Logistic Gradient Descent**

In [None]:
# TODO: Complete logistic regression
def gradient_descent_logistic(X, y, lr=0.01, n_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    losses = []
    
    for _ in tqdm(range(n_iters)):
        # TODO: z = X @ theta
        # TODO: y_pred = sigmoid(z)
        # TODO: loss = binary_cross_entropy(y, y_pred)
        # TODO: losses.append(loss)
        # TODO: gradient = (1/m) * X.T @ (y_pred - y)
        # TODO: theta -= lr * gradient
        pass
    
    return theta, losses
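
The loop has the same shape as the regression version. Only the prediction and the loss change. Below is a minimal sketch on a tiny, linearly separable toy set, ending with the 0.5 threshold that turns probabilities into labels. All names (`gd_logistic_sketch`, `X_toy`, `y_toy`) are assumptions for illustration.

```python
import numpy as np

def gd_logistic_sketch(X, y, lr=0.1, n_iters=1000, eps=1e-15):
    """Batch gradient descent for logistic regression; X includes a bias column."""
    m, n = X.shape
    theta = np.zeros(n)
    losses = []
    for _ in range(n_iters):
        y_pred = 1 / (1 + np.exp(-(X @ theta)))               # sigmoid of the logits
        clipped = np.clip(y_pred, eps, 1 - eps)
        losses.append(-np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped)))
        theta -= lr * (1 / m) * X.T @ (y_pred - y)
    return theta, losses

# One feature: negative values are class 0, positive values are class 1
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
X_toy = np.c_[np.ones_like(x), x]
y_toy = np.array([0, 0, 0, 1, 1, 1])

theta_toy, losses_toy = gd_logistic_sketch(X_toy, y_toy)
probs_toy = 1 / (1 + np.exp(-(X_toy @ theta_toy)))
labels_toy = (probs_toy >= 0.5).astype(int)                   # threshold at 0.5
print(labels_toy, np.mean(labels_toy == y_toy))
```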

## **🟡 TODO 16: Train & Evaluate Binary Model**

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix

# TODO: Split, train, predict
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_scaled, y_binary, test_size=0.2, random_state=42)

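# TODO: Add bias, train with gradient_descent_logistic, then threshold the predicted probabilities at 0.5
# y_pred_b = (probabilities >= 0.5).astype(int)
# accuracy = accuracy_score(y_test_b, y_pred_b)
# print(f"Accuracy: {accuracy:.3f}")
# print(confusion_matrix(y_test_b, y_pred_b))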

# **7️⃣ Part C: Multi-Class Classification**

## **🔴 TODO 17: Softmax Function**

In [None]:
# TODO: Implement softmax
def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return # TODO: exp_z / sum(exp_z, axis=1, keepdims=True)
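
Subtracting the row maximum does not change the result but keeps `np.exp` from overflowing. The sketch below (hypothetical name `softmax_sketch`) checks that each row sums to 1, even for very large logits.

```python
import numpy as np

def softmax_sketch(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))   # shift each row for stability
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)    # normalize per row

z_toy = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1000.0, 1000.0]])   # would overflow without the shift
probs_toy = softmax_sketch(z_toy)

print(probs_toy.sum(axis=1))   # each row sums to 1
print(probs_toy[1])            # equal logits give equal probabilities
```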

## **🔴 TODO 18: Categorical Cross-Entropy**

In [None]:
# TODO: Implement CCE
def categorical_cross_entropy(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1-eps)
    return # TODO: -mean(sum(y_true * log(y_pred), axis=1))
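
The order of the reductions matters: sum over classes within each sample, then average over samples. A sketch on two made-up samples, using a hypothetical name `cce_sketch`:

```python
import numpy as np

def cce_sketch(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)   # sum over classes
    return np.mean(per_sample)                               # mean over samples

y_true_toy = np.array([[1, 0, 0],
                       [0, 1, 0]])
y_pred_toy = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
print(cce_sketch(y_true_toy, y_pred_toy))   # equals -(log 0.7 + log 0.8) / 2
```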

## **🔴 TODO 19: One-Hot Encoding**

In [None]:
# TODO: Implement one-hot encoding
def one_hot_encode(y, num_classes):
    n = len(y)
    one_hot = np.zeros((n, num_classes))
    # TODO: one_hot[np.arange(n), y] = 1
    return one_hot
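
The indexing trick in the hint puts a 1 in column `y[i]` of row `i`. A small sketch with a hypothetical name, `one_hot_sketch`:

```python
import numpy as np

def one_hot_sketch(y, num_classes):
    y = np.asarray(y, dtype=int)
    one_hot = np.zeros((len(y), num_classes))
    one_hot[np.arange(len(y)), y] = 1   # row i gets a 1 in column y[i]
    return one_hot

print(one_hot_sketch([0, 2, 1], num_classes=3))
```

This only works when the labels are integers from 0 to `num_classes - 1`. A label equal to or above `num_classes` raises an `IndexError`.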

## **🔴 TODO 20: Softmax Gradient Descent**

In [None]:
# TODO: Complete softmax regression
def gradient_descent_softmax(X, y, num_classes, lr=0.01, n_iters=1000):
    m, n = X.shape
    theta = np.zeros((n, num_classes))
    y_onehot = one_hot_encode(y, num_classes)
    losses = []
    
    for _ in tqdm(range(n_iters)):
        # TODO: z = X @ theta
        # TODO: y_pred = softmax(z)
        # TODO: loss = categorical_cross_entropy(y_onehot, y_pred)
        # TODO: losses.append(loss)
        # TODO: gradient = (1/m) * X.T @ (y_pred - y_onehot)
        # TODO: theta -= lr * gradient
        pass
    
    return theta, losses
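
Putting softmax, cross-entropy, and one-hot targets together, here is a minimal sketch of the full loop on three well-separated toy clusters, followed by the `argmax` step that picks the most probable class. The names, the toy data, and the hyperparameters `lr=0.5` and `n_iters=2000` are assumptions chosen for this illustration.

```python
import numpy as np

def softmax_rows(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def gd_softmax_sketch(X, y, num_classes, lr=0.5, n_iters=2000, eps=1e-15):
    """Batch gradient descent for softmax regression; X includes a bias column."""
    m, n = X.shape
    theta = np.zeros((n, num_classes))
    y_onehot = np.eye(num_classes)[y]             # one-hot via identity-matrix rows
    losses = []
    for _ in range(n_iters):
        y_pred = softmax_rows(X @ theta)
        log_p = np.log(np.clip(y_pred, eps, 1 - eps))
        losses.append(-np.mean(np.sum(y_onehot * log_p, axis=1)))
        theta -= lr * (1 / m) * X.T @ (y_pred - y_onehot)
    return theta, losses

# Three well-separated 1-D clusters with labels 0, 1, 2
x = np.array([-2.2, -2.0, -1.8, -0.2, 0.0, 0.2, 1.8, 2.0, 2.2])
X_toy = np.c_[np.ones_like(x), x]
y_toy = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

theta_toy, losses_toy = gd_softmax_sketch(X_toy, y_toy, num_classes=3)
pred_toy = np.argmax(softmax_rows(X_toy @ theta_toy), axis=1)   # most probable class per row
print(pred_toy, np.mean(pred_toy == y_toy))
```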

## **🟡 TODO 21: Train & Evaluate Multi-Class**

In [None]:
# TODO: Split, train, predict
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X_scaled, y_multi, test_size=0.2, random_state=42)

num_classes = len(np.unique(y_multi))

# TODO: Add bias, train with gradient_descent_softmax, predict with np.argmax(..., axis=1), calculate accuracy
# accuracy_m = accuracy_score(y_test_m, y_pred_m)
# print(f"Accuracy: {accuracy_m:.3f}")


# **🎓 Completion Summary**

## **Congratulations! 🎉**

You've completed **21 TODOs** covering:
- ✅ Data generation & EDA
- ✅ Feature engineering
- ✅ Linear Regression (MSE, Gradient Descent)
- ✅ Binary Classification (Sigmoid, BCE)
- ✅ Multi-Class Classification (Softmax, CCE)

**Expected Results:**
- Regression: RMSE ~5-8, R² ~0.6-0.7 (bounded by the noise term, which has standard deviation 5)
- Binary: Accuracy ~85-95%
- Multi-Class: Accuracy ~70-85%

**Next Steps:**
- Compare with sklearn implementations
- Try different hyperparameters
- Add cross-validation
- Implement regularization
