# <font color="#418FDE" size="6.5" uppercase>**Fitting And Overfitting**</font>

>Last update: 20260201.
    
By the end of this lecture, you will be able to:
- Describe in intuitive terms how training adjusts model parameters to reduce loss. 
- Differentiate underfitting, appropriate fitting, and overfitting using examples. 
- Explain why evaluating models only on training data can be misleading. 


## **1. How Models Adjust**

### **1.1. From Guess to Better Fit**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_01_01.jpg?v=1769960586" width="250">



>* Training starts with a rough, usually poor guess
>* Loss measures errors and guides gradual parameter adjustments

>* Bread recipe tuning mirrors model training feedback
>* Loss guides parameter tweaks to improve predictions

>* Model follows downhill slope from high loss
>* Many small steps gradually reach better predictions
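
To make the idea of loss as feedback concrete, here is a minimal sketch. It assumes a made-up dataset where the target is exactly twice the input, and a one-parameter model `y_hat = weight * x`. The data and the helper name `mse_for_weight` are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

# Hypothetical toy data: targets follow y = 2x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

def mse_for_weight(weight):
    """Mean squared error of the one-parameter model y_hat = weight * x."""
    predictions = weight * x
    return float(np.mean((y - predictions) ** 2))

# A rough first guess followed by two progressively better guesses.
for guess in [0.0, 1.0, 1.9]:
    print(f"weight = {guess:.1f} -> loss = {mse_for_weight(guess):.3f}")
```

Each guess closer to the true weight of 2 produces a smaller loss, which is the signal training uses to decide which way to adjust.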



### **1.2. Stepwise Model Improvement**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_01_02.jpg?v=1769960598" width="250">



>* Model checks prediction errors and tweaks parameters slightly
>* Repeated small tweaks gradually lower loss and improve fit

>* Training is like walking downhill using local slope
>* Many small, local steps gradually reach lower loss

>* Like practicing piano, learning happens through refinements
>* Many small parameter tweaks gradually reduce prediction errors
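
The downhill-walking picture can be sketched with a single parameter. The loss function `(w - 3)^2`, the starting guess, and the learning rate below are assumptions chosen only to keep the example small. Each step applies the rule "new w = old w minus learning rate times slope".

```python
# Hypothetical loss whose lowest point is at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def slope(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0              # poor starting guess
learning_rate = 0.1  # assumed step size
losses = []

for step in range(25):
    losses.append(loss(w))            # loss before this step
    w = w - learning_rate * slope(w)  # take one small step downhill

print("final w:", round(w, 3))
print("first and last recorded loss:", round(losses[0], 3), round(losses[-1], 5))
```

No single step reaches the minimum, but every step uses only the local slope and still lowers the loss a little.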



### **1.3. Seeing Parameters Move**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_01_03.jpg?v=1769960611" width="250">



>* Model parameters act like adjustable control knobs
>* Training nudges knobs to lower loss over time

>* Simple house-price example shows parameter knobs
>* Training adjusts size and age weights to reduce loss

>* Basketball practice mirrors gradual parameter adjustments
>* Refined parameters give accurate responses without retraining



In [None]:
#@title Python Code - Seeing Parameters Move

# This script shows parameters moving during simple training.
# We use a tiny linear model with manual updates.
# Watch how loss shrinks as weights slowly adjust.

# import required numerical and plotting libraries.
import numpy as np
import matplotlib.pyplot as plt

# set deterministic random seed for reproducible behavior.
np.random.seed(0)

# create small synthetic dataset for a simple line.
true_w, true_b = 2.0, 1.0
x = np.linspace(0, 5, 20)

# generate targets with small noise for realism.
noise = np.random.normal(loc=0.0, scale=0.5, size=x.shape)
y = true_w * x + true_b + noise

# initialize model parameters with poor starting guess.
w, b = -1.0, 5.0

# choose learning rate and number of training steps.
learning_rate = 0.05
steps = 20

# prepare lists to store parameter and loss history.
history_w, history_b, history_loss = [], [], []

# define function to compute predictions from parameters.
def predict(x_values, weight, bias):
    return weight * x_values + bias

# define function to compute mean squared error loss.
def mse_loss(y_true, y_pred):
    diff = y_true - y_pred
    return float(np.mean(diff ** 2))

# training loop that nudges parameters to reduce loss.
for step in range(steps):
    y_pred = predict(x, w, b)
    loss = mse_loss(y, y_pred)

    # compute gradients for weight and bias analytically.
    error = y_pred - y
    grad_w = float(np.mean(2 * error * x))
    grad_b = float(np.mean(2 * error))

    # update parameters by stepping opposite gradient direction.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

    # store history: loss was measured before this update, parameters after it.
    history_w.append(w)
    history_b.append(b)
    history_loss.append(loss)

# print a few snapshots to see parameters move.
print("Initial guess weight and bias were -1.0 and 5.0.")
print("Final learned weight and bias are", round(w, 2), round(b, 2))
print("True underlying weight and bias are", true_w, true_b)
print("First recorded loss value was", round(history_loss[0], 3))
print("Last recorded loss value was", round(history_loss[-1], 3))

# create a simple plot showing parameter and loss changes.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# plot loss over training steps to show improvement.
axes[0].plot(range(1, steps + 1), history_loss, marker="o")
axes[0].set_title("Loss shrinking as parameters move")
axes[0].set_xlabel("Training step index")
axes[0].set_ylabel("Mean squared error loss")

# plot weight and bias trajectories together for comparison.
axes[1].plot(range(1, steps + 1), history_w, label="weight w")
axes[1].plot(range(1, steps + 1), history_b, label="bias b")
axes[1].axhline(true_w, color="gray", linestyle="--", label="true w")
axes[1].axhline(true_b, color="black", linestyle=":", label="true b")
axes[1].set_title("Parameters nudged toward better values")
axes[1].set_xlabel("Training step index")
axes[1].set_ylabel("Parameter value")
axes[1].legend(loc="best")

plt.tight_layout()
plt.show()



## **2. Model Complexity Balance**

### **2.1. Underfitting Simple Models**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_02_01.jpg?v=1769960664" width="250">



>* Underfitting means the model is too simple
>* It makes systematic errors and misses key patterns

>* Straight-line models can’t follow curved data patterns
>* Oversimplified features cause systematic errors across domains

>* Overly simple models stay biased and inaccurate
>* Spot underfitting and choose richer models instead
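
A minimal sketch of underfitting, assuming synthetic data that follows a parabola with a little noise. It fits a straight line and a degree-2 curve with `np.polyfit` and compares their errors. The dataset and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical curved data: y = x^2 plus a little noise.
x = np.linspace(-3.0, 3.0, 30)
y = x ** 2 + rng.normal(0.0, 0.3, size=x.shape)

# Fit a straight line (degree 1) and a parabola (degree 2).
line_coeffs = np.polyfit(x, y, deg=1)
curve_coeffs = np.polyfit(x, y, deg=2)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

mse_line = mse(y, np.polyval(line_coeffs, x))
mse_curve = mse(y, np.polyval(curve_coeffs, x))
print("Straight-line training MSE:", round(mse_line, 3))
print("Parabola training MSE:", round(mse_curve, 3))

# Residuals of the line show a systematic pattern, not random scatter.
residuals = y - np.polyval(line_coeffs, x)
print("Line residual at left end, middle, right end:",
      round(residuals[0], 2), round(residuals[len(x) // 2], 2), round(residuals[-1], 2))
```

The straight line is wrong in a structured way: it predicts too low at both ends and too high in the middle. That kind of systematic error, visible even on the training data, is the signature of underfitting.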



### **2.2. Overly Complex Models**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_02_02.jpg?v=1769960676" width="250">



>* Too-flexible models fit noise and quirks
>* They memorize training data, failing to generalize

>* Moderate models learn general links between features
>* Overly complex models memorize quirks and fail on new data

>* Complex models memorize quirks from specific datasets
>* They seem accurate but fail on new situations



In [None]:
#@title Python Code - Overly Complex Models

# This script shows overly complex models intuitively.
# We compare simple and complex curves on noisy data.
# Focus on how complexity affects generalization performance.

# import required numerical and plotting libraries.
import numpy as np
import matplotlib.pyplot as plt

# set deterministic random seed for reproducible results.
np.random.seed(42)

# create one dimensional input data for training examples.
x_train = np.linspace(0, 1, 10)

# define true underlying relationship as a simple line.
true_y_train = 2 * x_train + 1

# add small noise to create realistic training targets.
noise_train = np.random.normal(loc=0.0, scale=0.1, size=x_train.shape)

# compute noisy training targets from true line plus noise.
y_train = true_y_train + noise_train

# create separate test inputs to check generalization behavior.
x_test = np.linspace(0, 1, 100)

# compute true outputs for test inputs without noise.
true_y_test = 2 * x_test + 1

# fit simple linear polynomial model to training data.
coeffs_simple = np.polyfit(x_train, y_train, deg=1)

# fit overly complex polynomial model with high degree.
coeffs_complex = np.polyfit(x_train, y_train, deg=9)

# evaluate simple model predictions on training inputs.
y_pred_simple_train = np.polyval(coeffs_simple, x_train)

# evaluate complex model predictions on training inputs.
y_pred_complex_train = np.polyval(coeffs_complex, x_train)

# compute simple model predictions on dense test grid.
y_pred_simple_test = np.polyval(coeffs_simple, x_test)

# compute complex model predictions on dense test grid.
y_pred_complex_test = np.polyval(coeffs_complex, x_test)

# define helper function for mean squared error calculation.
def mean_squared_error(y_true, y_pred):
    # validate shapes before computing mean squared error.
    assert y_true.shape == y_pred.shape

    return float(np.mean((y_true - y_pred) ** 2))

# compute training error for simple linear model.
mse_simple_train = mean_squared_error(y_train, y_pred_simple_train)

# compute training error for complex polynomial model.
mse_complex_train = mean_squared_error(y_train, y_pred_complex_train)

# compute test error for simple linear model against the noise-free true line.
mse_simple_test = mean_squared_error(true_y_test, y_pred_simple_test)

# compute test error for complex polynomial model against the same true line.
mse_complex_test = mean_squared_error(true_y_test, y_pred_complex_test)

# print concise summary comparing training and test errors.
print("Simple model training MSE:", round(mse_simple_train, 4))
print("Complex model training MSE:", round(mse_complex_train, 4))
print("Simple model test MSE:", round(mse_simple_test, 4))
print("Complex model test MSE:", round(mse_complex_test, 4))

# create figure to visualize true line and both models.
plt.figure(figsize=(6, 4))

# plot noisy training points as scattered markers.
plt.scatter(x_train, y_train, color="black", label="Training data")

# plot true underlying simple relationship line.
plt.plot(x_test, true_y_test, color="green", label="True relationship")

# plot simple model line which should generalize well.
plt.plot(x_test, y_pred_simple_test, color="blue", label="Simple model")

# plot complex model curve showing overfitting wiggles.
plt.plot(x_test, y_pred_complex_test, color="red", label="Complex model")

# add axis labels and legend for clarity.
plt.xlabel("Input feature value")
plt.ylabel("Target value")
plt.legend()

# display the final plot to visually compare models.
plt.show()




### **2.3. Balanced Model Fit**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_02_03.jpg?v=1769960731" width="250">



>* Captures main data patterns without chasing noise
>* Performs similarly on training and new data

>* Balanced model uses meaningful medical patterns, not trivia
>* It generalizes reliably across new patients and hospitals

>* Choose model complexity that matches real patterns
>* Align model behavior with domain knowledge and common sense
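
One way to look for a balanced fit is to try several complexity levels and compare training error with error on held-out data. The sketch below assumes a sine-shaped pattern, a noise level of 0.2, and polynomial degrees 1, 3, and 9. All of these are illustrative choices, and exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_pattern(x):
    # Assumed underlying pattern, chosen only for this illustration.
    return np.sin(2.0 * np.pi * x)

# Hypothetical training data and a separate validation set.
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_pattern(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
x_val = np.linspace(0.02, 0.98, 50)
y_val = true_pattern(x_val) + rng.normal(0.0, 0.2, size=x_val.shape)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Fit polynomials of increasing complexity and record both errors.
results = {}
for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_error = mse(y_train, np.polyval(coeffs, x_train))
    val_error = mse(y_val, np.polyval(coeffs, x_val))
    results[degree] = (train_error, val_error)
    print(f"degree {degree}: train MSE = {train_error:.4f}, "
          f"validation MSE = {val_error:.4f}")
```

Training error can only go down as the degree grows, so it cannot identify the balanced model on its own. The degree to prefer is the one whose validation error is low and close to its training error.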



## **3. Generalization Beyond Training**

### **3.1. Training vs Unseen Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_03_01.jpg?v=1769960745" width="250">



>* Training adjusts model parameters using labeled examples
>* Real goal is accurate predictions on unseen cases

>* Memorizing practice questions hides lack of understanding
>* Models must generalize beyond training examples to work

>* Generalization means handling new, slightly different data
>* Training-only success can hide failures on real cases
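
A minimal sketch of setting aside unseen data, assuming a synthetic dataset of 40 labeled examples and a 30/10 split. The model is fitted on the training portion only and then scored on both portions. The dataset, split sizes, and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled examples: y = 3x + 2 plus noise.
x = rng.uniform(0.0, 10.0, size=40)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=x.shape)

# Shuffle indices, then keep 30 examples for training and 10 as unseen data.
indices = rng.permutation(len(x))
train_idx, test_idx = indices[:30], indices[30:]

# Fit a line using the training examples only.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

train_mse = mse(y[train_idx], slope * x[train_idx] + intercept)
test_mse = mse(y[test_idx], slope * x[test_idx] + intercept)
print("Training MSE:", round(train_mse, 3))
print("Unseen-data MSE:", round(test_mse, 3))
```

The model never sees the 10 held-out examples while fitting, so the second number is an honest estimate of how it handles new cases.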



### **3.2. Limits of Memorization**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_03_02.jpg?v=1769960756" width="250">



>* Overfitting stores examples instead of learning patterns
>* Training accuracy can hide poor generalization performance

>* Memorized spam filters fail on new tricks
>* We must test if models learn general patterns

>* Memorizing one hospital’s data fails elsewhere
>* Training accuracy hides poor real‑world generalization
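
The spam-filter example can be sketched with a pure lookup table. The messages, labels, and the hand-picked keyword rule below are all made up for illustration. Real spam filters are far more sophisticated.

```python
# Hypothetical training messages and labels (1 = spam, 0 = not spam).
training_examples = {
    "win a free prize now": 1,
    "meeting moved to 3pm": 0,
    "claim your free reward": 1,
    "lunch tomorrow?": 0,
}

# Hypothetical new messages the filter has never seen.
new_messages = {
    "get a free vacation today": 1,
    "see you at the meeting": 0,
    "free gift waiting for you": 1,
}

def memorizer_predict(message):
    # Pure memorization: exact lookup, defaulting to "not spam" if unseen.
    return training_examples.get(message, 0)

def keyword_predict(message):
    # A simple general pattern: the word "free" suggests spam.
    return 1 if "free" in message.lower().split() else 0

def accuracy(predict, examples):
    correct = sum(predict(msg) == label for msg, label in examples.items())
    return correct / len(examples)

for name, predict in [("memorizer", memorizer_predict), ("keyword rule", keyword_predict)]:
    print(name,
          "| training accuracy:", accuracy(predict, training_examples),
          "| new-message accuracy:", round(accuracy(predict, new_messages), 2))
```

Both approaches score perfectly on the training messages, so training accuracy alone cannot tell them apart. Only the new messages reveal that the memorizer learned nothing reusable.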



### **3.3. Why Separate Evaluation**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_05/Lecture_B/image_03_03.jpg?v=1769960768" width="250">



>* Training data shows performance on familiar examples only
>* Separate evaluation checks generalization to new situations

>* Complex models can memorize noise and quirks
>* Separate evaluation reveals poor performance on new data

>* Separate evaluation checks if changes truly generalize
>* Guards against complex models that impress on training data but fail on new data



In [None]:
#@title Python Code - Why Separate Evaluation

# This script shows why separate evaluation matters.
# We compare training and test errors for memorization.
# Focus is on generalization beyond seen training examples.

# Required libraries are available in Colab by default.
# No extra installations are necessary for this script.
# Uncomment below only if environment lacks numpy or matplotlib.
# !pip install numpy matplotlib

# Import required standard and numerical libraries.
import numpy as np
import matplotlib.pyplot as plt

# Set deterministic random seed for reproducibility.
np.random.seed(0)

# Create simple one dimensional input features.
x_all = np.linspace(0.0, 1.0, 20)

# Create noisy targets from a true underlying linear relationship.
y_all = 2.0 * x_all + 1.0 + np.random.normal(0.0, 0.05, 20)

# Split into training (first half) and test (second half) inputs.
# Note that the test inputs lie beyond the training range.
train_x, test_x = x_all[:10], x_all[10:]

# Split corresponding target values for training and test.
train_y, test_y = y_all[:10], y_all[10:]

# Validate shapes before further computations.
assert train_x.shape == train_y.shape

# Define helper function to compute mean squared error.
def mean_squared_error(y_true, y_pred):
    # Ensure shapes match before computing error.
    assert y_true.shape == y_pred.shape

    return float(np.mean((y_true - y_pred) ** 2))


# Train simple linear model using closed form solution.
X_design = np.vstack((np.ones_like(train_x), train_x)).T

# Compute parameters using normal equation solution.
theta = np.linalg.pinv(X_design.T @ X_design) @ (X_design.T @ train_y)

# Extract intercept and slope from parameter vector.
intercept, slope = float(theta[0]), float(theta[1])

# Compute linear model predictions on training inputs.
train_pred_linear = intercept + slope * train_x

# Compute linear model predictions on test inputs.
test_pred_linear = intercept + slope * test_x

# Compute training mean squared error for linear model.
train_mse_linear = mean_squared_error(train_y, train_pred_linear)

# Compute test mean squared error for linear model.
test_mse_linear = mean_squared_error(test_y, test_pred_linear)

# Build memorization model that stores training targets.
memorized_train_y = train_y.copy()

# Define function that predicts by nearest training neighbor.
def memorize_predict(train_inputs, train_targets, new_inputs):
    # Ensure training and target shapes are compatible.
    assert train_inputs.shape == train_targets.shape

    # Allocate prediction array for new inputs.
    preds = np.zeros_like(new_inputs)

    # Loop over each new input and find nearest neighbor.
    for i, value in enumerate(new_inputs):
        distances = np.abs(train_inputs - value)
        nearest_index = int(np.argmin(distances))
        preds[i] = train_targets[nearest_index]

    return preds


# Compute memorization predictions on training data.
train_pred_mem = memorize_predict(train_x, memorized_train_y, train_x)

# Compute memorization predictions on test data.
test_pred_mem = memorize_predict(train_x, memorized_train_y, test_x)

# Compute training mean squared error for memorization model.
train_mse_mem = mean_squared_error(train_y, train_pred_mem)

# Compute test mean squared error for memorization model.
test_mse_mem = mean_squared_error(test_y, test_pred_mem)

# Print concise comparison of both models and datasets.
print("Linear model train MSE:", round(train_mse_linear, 4))

# Print linear model test error for comparison.
print("Linear model test MSE:", round(test_mse_linear, 4))

# Print memorization model training error showing perfect fit.
print("Memorization model train MSE:", round(train_mse_mem, 4))

# Print memorization model test error showing poor generalization.
print("Memorization model test MSE:", round(test_mse_mem, 4))

# Create dense grid for plotting model predictions.
plot_x = np.linspace(0.0, 1.0, 100)

# Compute linear predictions on dense grid.
plot_y_linear = intercept + slope * plot_x

# Compute memorization predictions on dense grid.
plot_y_mem = memorize_predict(train_x, memorized_train_y, plot_x)

# Start a new figure for visualization.
plt.figure(figsize=(6, 4))

# Plot training points as blue circles.
plt.scatter(train_x, train_y, color="blue", label="Train points")

# Plot test points as orange triangles.
plt.scatter(test_x, test_y, color="orange", marker="^", label="Test points")

# Plot linear model prediction line.
plt.plot(plot_x, plot_y_linear, color="green", label="Linear model")

# Plot memorization model as a step-like curve.
plt.step(plot_x, plot_y_mem, where="mid", color="red", label="Memorization model")

# Label x axis to show input feature.
plt.xlabel("Input feature x value")

# Label y axis to show target variable.
plt.ylabel("Target y value")

# Add legend explaining plotted elements.
plt.legend(loc="best")

# Add title emphasizing generalization beyond training.
plt.title("Training fit versus generalization on separate test data")

# Display the final plot to the user.
plt.show()



# <font color="#418FDE" size="6.5" uppercase>**Fitting And Overfitting**</font>


In this lecture, you learned to:
- Describe in intuitive terms how training adjusts model parameters to reduce loss. 
- Differentiate underfitting, appropriate fitting, and overfitting using examples. 
- Explain why evaluating models only on training data can be misleading. 

In the next Module (Module 6), we will cover 'Evaluation Metrics'.