# Overfitting and Underfitting Demo (Google Colab)

This notebook gives a simple visual demonstration of **underfitting**, **overfitting**, and a **good fit** by fitting polynomial models of increasing degree to a small dataset.

Run the cells in order while presenting, and pause after each plot to explain it.

## 1. Import Libraries
Run this cell first to import the required libraries.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```

## 2. Create Dataset (Hours Studied vs Marks)

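We create a small dataset of 8 points, where `X` is hours studied and `y` is marks scored. Part of the data is held out as a test set to check how well each model generalizes to points it has not seen.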

```python
# Sample dataset: hours studied vs marks scored
# scikit-learn expects a 2D feature array of shape (n_samples, n_features), hence reshape(-1, 1)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([15, 28, 35, 50, 65, 80, 88, 95])  # Not exactly linear: gains slow down after 6 hours

# Hold out 25% of the points as a test set (for checking generalization)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

plt.scatter(X, y)
plt.title("Dataset: Hours Studied vs Marks Scored")
plt.xlabel("Hours Studied")
plt.ylabel("Marks Scored")
plt.show()
```
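
With only 8 points, `test_size=0.25` leaves 6 points for training and 2 for testing. This is worth keeping in mind for the higher-degree models later. The standalone sketch below rebuilds the same arrays and checks the split sizes. Which two hours end up in the test set depends on `random_state=42`, so they are printed rather than assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same 8-point dataset as above: hours studied (as a column) and marks scored
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([15, 28, 35, 50, 65, 80, 88, 95])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 25% of 8 samples -> 2 test points; the remaining 6 are used for training
print("Train size:", len(X_train))
print("Test size:", len(X_test))
print("Test hours:", X_test.ravel())
```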

## 3. Helper Function to Fit and Plot

This function will:
- Create polynomial features of a given degree
- Train a linear regression model
- Plot the curve along with data points
- Print training and test Mean Squared Error (MSE)

```python
def fit_and_plot(degree, title):
    """Fit a polynomial model of the given degree, plot it, and print train/test MSE.

    Uses the global X, X_train, X_test, y_train, y_test defined above.
    """
    # Expand each x into the columns 1, x, x^2, ..., x^degree
    poly = PolynomialFeatures(degree=degree)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)  # reuse the transform fitted on the training data

    # Ordinary least squares on the polynomial features
    model = LinearRegression()
    model.fit(X_train_poly, y_train)

    # Predictions
    y_train_pred = model.predict(X_train_poly)
    y_test_pred = model.predict(X_test_poly)

    # Errors: a large gap between train and test MSE signals overfitting
    train_mse = mean_squared_error(y_train, y_train_pred)
    test_mse = mean_squared_error(y_test, y_test_pred)

    # Evaluate the fitted curve on a fine grid across the full data range
    X_range = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
    X_range_poly = poly.transform(X_range)
    y_range_pred = model.predict(X_range_poly)

    plt.scatter(X_train, y_train, label="Train Data")
    plt.scatter(X_test, y_test, marker='x', label="Test Data")
    plt.plot(X_range, y_range_pred, label=f"Model (degree={degree})")
    plt.title(title)
    plt.xlabel("Hours Studied")
    plt.ylabel("Marks Scored")
    plt.legend()
    plt.show()

    print(f"Degree: {degree}")
    print(f"Train MSE: {train_mse:.2f}")
    print(f"Test MSE: {test_mse:.2f}")
```
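
The MSE printed by the helper is the average of the squared differences between the true and predicted marks. The tiny example below checks by hand that `mean_squared_error` computes exactly this. The values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true marks and predictions, just for illustration
y_true = np.array([50.0, 80.0])
y_pred = np.array([48.0, 83.0])

# Differences are 2 and -3, squared errors are 4 and 9, so the mean is 6.5
manual_mse = np.mean((y_true - y_pred) ** 2)
sklearn_mse = mean_squared_error(y_true, y_pred)

print(manual_mse, sklearn_mse)  # 6.5 6.5
```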


## 4. Underfitting Demo (Degree = 1)

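A linear (degree 1) model can only draw a straight line, so it cannot bend to follow changes in the trend, such as marks levelling off at higher study hours. A model that is too simple to capture the pattern is **underfitting**.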

```python
fit_and_plot(degree=1, title="Underfitting: Linear Model (Degree 1)")
```
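
Underfitting is easiest to see when the pattern is clearly curved. The sketch below uses synthetic data (`y = x²`), not the marks dataset. A straight line leaves a training error that more training cannot remove, while adding the `x²` feature drives the error to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, clearly curved data: y = x^2 for x = 0..5
x = np.arange(6).reshape(-1, 1)
y = (x ** 2).ravel()

# Degree 1: a straight line is too simple for this curve
linear = LinearRegression().fit(x, y)
linear_mse = mean_squared_error(y, linear.predict(x))

# Degree 2: the x^2 column lets the model match the curve exactly
x_quad = PolynomialFeatures(degree=2).fit_transform(x)
quad = LinearRegression().fit(x_quad, y)
quad_mse = mean_squared_error(y, quad.predict(x_quad))

print(f"Degree 1 train MSE: {linear_mse:.2f}")
print(f"Degree 2 train MSE: {quad_mse:.2f}")
```

Here the straight line's training MSE is 56/9 (about 6.22), and the quadratic's is essentially 0. The error comes from the model being too simple, not from too little training.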

## 5. Good Fit Demo (Degree = 3)

A degree 3 model is flexible enough to follow the curve. It still has fewer coefficients (4) than training points (6), so it cannot simply memorize them. This is a **good fit**.

```python
fit_and_plot(degree=3, title="Good Fit: Polynomial Model (Degree 3)")
```
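
The helper builds the features and the model as separate steps. A more compact way to express the same degree-3 model is a scikit-learn pipeline, which applies the same feature transform at both fit and predict time. The sketch below uses synthetic cubic data (`y = x³ - 2x`), chosen only so the correct answer is known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data that follows an exact cubic: y = x^3 - 2x
x = np.arange(8, dtype=float).reshape(-1, 1)
y = (x ** 3 - 2 * x).ravel()

# Polynomial features + linear regression chained into one estimator
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)

# A degree-3 model can represent this curve, so predictions match the formula
x_new = np.array([[2.5], [8.0]])
print(model.predict(x_new))  # close to [10.625, 496.]
```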

## 6. Overfitting Demo (Degree = 10)

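A high-degree model (degree 10) has 11 coefficients but only 6 training points, so it can bend to pass through (essentially) every training point. It memorizes the training data, noise included, instead of learning the trend. This is **overfitting**. It typically shows up as a near-zero training MSE, a larger test MSE, and a curve that swings sharply between points.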

```python
fit_and_plot(degree=10, title="Overfitting: Polynomial Model (Degree 10)")
```
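
Why does degree 10 overfit here? A degree-d polynomial on one input has d + 1 coefficients, and this split has only 6 training points. The sketch below first counts the coefficients for the three degrees used above. It then shows that once the number of coefficients equals the number of points (degree 5 for 6 points), the training error collapses to essentially zero. The 6 points are the first six rows of the dataset, an arbitrary choice for illustration, not the actual training split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Six points taken from the hours/marks data (first six rows, for illustration only)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float).reshape(-1, 1)
y = np.array([15, 28, 35, 50, 65, 80], dtype=float)

# Number of coefficients (columns) each degree produces for one input feature
n_coeffs = {d: PolynomialFeatures(degree=d).fit_transform(x).shape[1] for d in (1, 3, 10)}
print(n_coeffs)  # {1: 2, 3: 4, 10: 11}

# Degree 5 gives 6 coefficients for 6 points: enough to pass through all of them
x_poly = PolynomialFeatures(degree=5).fit_transform(x)
model = LinearRegression().fit(x_poly, y)
train_mse = mean_squared_error(y, model.predict(x_poly))
print(f"Degree 5 train MSE: {train_mse:.6f}")
```

A training error of zero here does not mean the model is good. It only means the model had enough freedom to memorize the points.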

## 7. Summary

- **Underfitting**: Model too simple → high error on both the training and test sets.
- **Overfitting**: Model too complex → very low training error but much higher test error.
- **Good Fit**: Complexity matched to the data → low error on both sets, with only a small gap between training and test error.
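
To back the summary with numbers instead of plots, the same experiment can be run as a loop that only records errors. The sketch below rebuilds the dataset and split from section 2. It limits the degrees to 1 through 5 so that no model has more coefficients than the 6 training points. Training error can only stay the same or fall as the degree grows, because each model contains the previous one as a special case. The test error is what reveals underfitting or overfitting. The exact numbers depend on which two points `random_state=42` places in the test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same dataset and split as in section 2
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([15, 28, 35, 50, 65, 80, 88, 95])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Record (train MSE, test MSE) for each polynomial degree
results = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"Degree {degree}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```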