# Day 13: Train/Test Split & Cross-validation

When training ML models, we must test them on **unseen data** to check generalization.  

---

## 1. Train/Test Split
- Splits dataset into:
  - **Training set** → used to fit the model.
  - **Test set** → used to evaluate model performance.
- Common splits are **70/30** and **80/20** (train/test).
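The split described above can be sketched by hand to show what is going on: shuffle the row indices, then cut off a fraction for testing. The helper name `manual_train_test_split` and its `seed` parameter are hypothetical and used only for illustration. This is not how scikit-learn implements `train_test_split`, and the two will not pick the same rows.

```python
import numpy as np


def manual_train_test_split(n_samples, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # shuffled row positions
    n_test = int(round(n_samples * test_size))    # rows held out for testing
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = manual_train_test_split(10, test_size=0.3)
print("train rows:", sorted(train_idx.tolist()))
print("test rows:", sorted(test_idx.tolist()))
```

Every row lands in exactly one of the two sets, so the test rows never influence the fitted model.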

---

## 2. Cross-validation (CV)
- Instead of a single split, we divide data into **k folds**.
- Train on (k-1) folds, test on the remaining fold, and repeat k times so that each fold serves as the test set exactly once.
- Average the results for a **more reliable performance estimate**.

✅ Common: **k=5** or **k=10** → *k-Fold CV*.
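The fold rotation described above can be sketched with plain NumPy. The generator `k_fold_indices` is a hypothetical helper written for this note. It assumes shuffling before splitting, which is a choice made here and not something k-fold CV requires.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold CV.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)  # k chunks of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        # Everything outside fold i is used for training.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


splits = list(k_fold_indices(10, k=5))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train size={len(train_idx)}, test rows={sorted(test_idx.tolist())}")
```

With 10 rows and k=5, every test fold holds only 2 rows, which matters when choosing an evaluation metric.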

In [1]:
```python
# Day 13: Train/Test Split & Cross-validation
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Sample dataset
data = {
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Exam_Score": [35, 50, 65, 70, 75, 78, 85, 87, 90, 92],
}
df = pd.DataFrame(data)

X = df[["Hours_Studied"]]
y = df["Exam_Score"]
```


In [2]:
```python
# 1. Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MSE (Test Set):", mse)
print("R² (Test Set):", r2)
```


```
MSE (Test Set): 7.8587148318792215
R² (Test Set): 0.9720219804244806
```


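In [3]:
```python
# 2. Cross-validation (5-Fold)
# An integer cv on a regressor means KFold without shuffling, so with
# 10 rows each test fold is 2 consecutive rows of the sorted data.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Cross-validation R² Scores:", scores)
print("Average R² Score:", np.mean(scores))
```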


```
Cross-validation R² Scores: [  -5.29482741  -12.07847222   -9.88679027   -8.40451389 -144.98901644]
Average R² Score: -36.13072404676177
```
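The strongly negative R² values come from the fold setup, not from a poor model. Each test fold contains only 2 neighbouring rows, and R² compares the model's squared error with the variance of those 2 points. That variance is tiny, so even modest prediction errors give large negative scores. In the last fold, for instance, the model is trained on hours 1 to 8 and must extrapolate to hours 9 and 10. One possible adjustment, sketched below, is to shuffle the folds and score with mean absolute error. Both choices are assumptions made for this example, not the only reasonable ones.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Same sample dataset as in the cells above.
df = pd.DataFrame({
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Exam_Score": [35, 50, 65, 70, 75, 78, 85, 87, 90, 92],
})
X = df[["Hours_Studied"]]
y = df["Exam_Score"]

# Assumption: shuffled folds, so test rows are spread across the range of X.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn reports errors as negative scores (higher is better),
# so flip the sign to get MAE in exam-score points.
neg_mae = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_absolute_error"
)
mae_scores = -neg_mae

print("MAE per fold:", mae_scores)
print("Mean MAE:", mae_scores.mean())
```

MAE is expressed in the same units as `Exam_Score` and does not divide by the variance of the test fold, so it stays interpretable even when a fold has only 2 rows.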


### ✅ Key Points:
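- **Train/Test Split** gives a quick evaluation, but the result depends on which rows happen to land in the test set.
- **Cross-validation** averages over several splits, which usually gives a more reliable estimate, provided each fold is large enough for the metric to be meaningful. In the 5-fold run above, each test fold holds only 2 rows, so R² is unstable and turns strongly negative.
- Prefer CV over a single split for **small datasets**, and pick a metric (such as MSE or MAE) that stays meaningful on small folds.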
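To see how much a single split can depend on the random draw, the sketch below refits the same model with several `random_state` values and records the test MSE for each. The seed list is arbitrary and chosen only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same sample dataset as in the cells above.
df = pd.DataFrame({
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Exam_Score": [35, 50, 65, 70, 75, 78, 85, 87, 90, 92],
})
X = df[["Hours_Studied"]]
y = df["Exam_Score"]

results = {}
for seed in [0, 1, 2, 3, 4]:  # arbitrary seeds
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    results[seed] = mean_squared_error(y_test, model.predict(X_test))
    print(f"random_state={seed}: test MSE = {results[seed]:.2f}")
```

If the printed values vary widely, a single split is a noisy estimate for this dataset and averaging over folds is the safer choice.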
