# <font color="#418FDE" size="6.5" uppercase>**Using model.fit**</font>

>Last update: 20260125.
    
By the end of this Lecture, you will be able to:
- Configure model.fit with appropriate batch sizes, epochs, and validation strategies for a given dataset. 
- Use Keras callbacks to monitor training, implement early stopping, and save model checkpoints. 
- Interpret training and validation curves to diagnose underfitting and overfitting. 


## **1. Configuring model.fit**

### **1.1. Batch size and epochs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_01_01.jpg?v=1769392054" width="250">



>* Batch size affects update frequency, stability, memory
>* Epoch count trades training time against underfitting and overfitting risk

>* Batch size is limited by hardware memory
>* Tune batch size for speed and generalization

>* The number of epochs needed depends on task complexity
>* Use validation curves to stop training at optimum
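The arithmetic below counts optimizer updates to make the batch-size trade-off concrete. It is a minimal sketch that assumes the final partial batch is kept, which is how `model.fit` iterates over NumPy arrays by default. The helper names are hypothetical, not Keras functions.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (optimizer updates) in one pass over the data."""
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples, batch_size, epochs):
    """Total optimizer updates over the whole training run."""
    return steps_per_epoch(num_samples, batch_size) * epochs


# Same 4000-sample subset and 5 epochs as the code cell in this section.
for batch_size in (16, 128):
    print(
        f"batch_size={batch_size}: "
        f"{steps_per_epoch(4000, batch_size)} steps/epoch, "
        f"{total_updates(4000, batch_size, 5)} updates in 5 epochs"
    )
```

With the same number of epochs, the small-batch run performs roughly eight times as many weight updates (1250 vs. 160), and each update uses a noisier gradient estimate.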



#@title Python Code - Batch size and epochs

# This script shows batch size and epochs.
# It uses a tiny MNIST subset dataset.
# It keeps training fast and output short.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values to the range [0, 1].
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolution layers.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Use a small subset to keep training quick.
train_samples = 4000
val_samples = 1000
x_train_small = x_train[:train_samples]
y_train_small = y_train[:train_samples]

# Hold out the following samples, disjoint from the subset, for validation.
x_val_small = x_train[train_samples:train_samples + val_samples]
y_val_small = y_train[train_samples:train_samples + val_samples]

# Validate shapes before building the model.
print("Train subset shape:", x_train_small.shape)
print("Validation subset shape:", x_val_small.shape)

# Build and compile a fresh CNN so each run starts from scratch.
def build_model():
    # Reset seeds so every run starts from the same initial weights.
    keras.utils.set_random_seed(seed_value)
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Define two different batch sizes for comparison.
small_batch_size = 16
large_batch_size = 128
epochs = 5

# Train a fresh model with the small batch size.
history_small = build_model().fit(
    x_train_small,
    y_train_small,
    batch_size=small_batch_size,
    epochs=epochs,
    validation_data=(x_val_small, y_val_small),
    verbose=0,
)

# Train a second fresh model with the large batch size and same epochs.
history_large = build_model().fit(
    x_train_small,
    y_train_small,
    batch_size=large_batch_size,
    epochs=epochs,
    validation_data=(x_val_small, y_val_small),
    verbose=0,
)

# Helper function to summarize final metrics.
def summarize_run(name, history):
    train_loss = history.history["loss"][-1]
    train_acc = history.history["accuracy"][-1]
    val_loss = history.history["val_loss"][-1]
    val_acc = history.history["val_accuracy"][-1]
    print(
        f"{name} -> loss: {train_loss:.3f}, acc: {train_acc:.3f}, "
        f"val_loss: {val_loss:.3f}, val_acc: {val_acc:.3f}"
    )

# Print a short explanation header line.
print("Comparing small and large batch sizes after training.")

# Show results for small batch size training.
summarize_run("Small batch size", history_small)

# Show results for large batch size training.
summarize_run("Large batch size", history_large)



### **1.2. Choosing Validation Inputs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_01_02.jpg?v=1769392097" width="250">



>* Validation data must be separate from training
>* Make validation realistic to estimate real performance

>* Choose validation method based on dataset structure
>* Respect ordering, mimic deployment to avoid misleading scores

>* Match preprocessing, avoid random validation augmentations
>* Preserve real class balance for stable feedback
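`validation_split` takes the *last* fraction of the arrays you pass, before any shuffling. The pure-Python sketch below mimics that behavior with index lists to show why it misleads when data is sorted, for example by class or by time. The helper `tail_split` is hypothetical, and its rounding may differ slightly from Keras internals.

```python
def tail_split(num_samples, validation_split):
    """Hold out the last fraction of samples, as validation_split does."""
    split_at = int(num_samples * (1.0 - validation_split))
    indices = list(range(num_samples))
    return indices[:split_at], indices[split_at:]


# Toy labels sorted by class: 80 samples of class 0, then 20 of class 1.
labels = [0] * 80 + [1] * 20

train_idx, val_idx = tail_split(len(labels), validation_split=0.2)
val_labels = [labels[i] for i in val_idx]

# Every validation sample is class 1, so validation metrics would be biased.
print("Validation size:", len(val_idx))
print("Validation classes:", sorted(set(val_labels)))
```

You can avoid this by shuffling once before calling `fit` or by passing explicit `validation_data`. For time-ordered data, a tail split is the right choice because it mimics predicting the future from the past.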



#@title Python Code - Choosing Validation Inputs

# This script shows choosing validation inputs.
# It compares validation_split and explicit validation_data.
# It uses a tiny MNIST subset for speed.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Check whether a GPU is visible; TensorFlow uses it automatically.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_type = "GPU"
else:
    device_type = "CPU"

# Print which device type will likely be used.
print("Using device type:", device_type)

# Load MNIST dataset from Keras datasets.
(mnist_x_train, mnist_y_train), _ = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to the range [0, 1].
mnist_x_train = mnist_x_train.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
mnist_x_train = np.expand_dims(mnist_x_train, axis=-1)

# Confirm shapes are as expected before splitting.
print("Training data shape:", mnist_x_train.shape)

# Use a small subset to keep runtime short.
subset_size = 6000
mnist_x_train = mnist_x_train[:subset_size]
mnist_y_train = mnist_y_train[:subset_size]

# Verify subset sizes are consistent and safe.
print("Subset size:", mnist_x_train.shape[0])

# Build a simple convolutional classification model.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compile the model with standard settings.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# validation_split holds out the last 20% before shuffling the rest.
history_split = model.fit(
    mnist_x_train,
    mnist_y_train,
    batch_size=64,
    epochs=3,
    validation_split=0.2,
    shuffle=True,
    verbose=0,
)

# Manually hold out the same last 20% as explicit validation_data.
val_size = int(0.2 * subset_size)
explicit_x_val = mnist_x_train[-val_size:]
explicit_y_val = mnist_y_train[-val_size:]

# Use the remaining data strictly for training.
explicit_x_train = mnist_x_train[:-val_size]
explicit_y_train = mnist_y_train[:-val_size]

# Confirm explicit split sizes are consistent.
print("Explicit train size:", explicit_x_train.shape[0])

# Rebuild a fresh model for fair comparison.
model_explicit = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compile the second model with same settings.
model_explicit.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train using explicit validation_data; shuffle training data as in run one.
history_explicit = model_explicit.fit(
    explicit_x_train,
    explicit_y_train,
    batch_size=64,
    epochs=3,
    validation_data=(explicit_x_val, explicit_y_val),
    shuffle=True,
    verbose=0,
)

# Extract final metrics from both training runs.
final_split_val_acc = history_split.history["val_accuracy"][-1]
final_explicit_val_acc = history_explicit.history["val_accuracy"][-1]

# Print concise comparison of validation strategies.
print("Final val_accuracy with validation_split:",
      round(float(final_split_val_acc), 4))

# Show validation accuracy using explicit validation_data.
print("Final val_accuracy with explicit data:",
      round(float(final_explicit_val_acc), 4))

# validation_split always holds out the tail of the arrays, unshuffled.
print("Use validation_split only when sample order is already random.")
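With imbalanced classes, even a random split can leave the validation set with a skewed class mix. A common remedy is a stratified split, which samples the same fraction from each class. The sketch below is a hypothetical pure-Python helper, not a Keras API. Its index lists could be used to build explicit `validation_data`. scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction, seed=0):
    """Split indices so each class keeps roughly the same share in validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy imbalanced labels: 80 of class 0 and 20 of class 1.
labels = [0] * 80 + [1] * 20
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)
val_counts = Counter(labels[i] for i in val_idx)
print("Validation class counts:", dict(sorted(val_counts.items())))
```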




### **1.3. Shuffling and Class Weights**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_01_03.jpg?v=1769392135" width="250">



>* Shuffle data so batches reflect overall distribution
>* Prevents order-based overfitting and stabilizes training

>* Imbalanced datasets make models ignore rare classes
>* Class weights amplify rare examples; shuffling spreads them across batches

>* Shuffle whole sequences while preserving internal order
>* Use class weights so rare sequences matter
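The code cell in this section computes balanced class weights as total / (2 * class_count). The same formula generalizes to k classes as total / (k * class_count). The sketch below computes it with the standard library only. `balanced_class_weights` is a hypothetical helper, and its output has the `{class: weight}` shape that the `class_weight` argument of `model.fit` expects.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: total / (k * count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in sorted(counts.items())}


# Toy imbalanced labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print("Class weights:", weights)

# Each class now contributes the same total weight to the loss.
counts = Counter(labels)
print("Weighted totals:", {c: counts[c] * w for c, w in weights.items()})
```

The rare class gets a weight nine times larger than the common class, which matches the 9:1 imbalance.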



#@title Python Code - Shuffling and Class Weights

# This script shows shuffling and class weights.
# It uses a tiny imbalanced dataset example.
# It runs quickly with minimal printed output.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reduce dataset size for quick training.
x_train = x_train[:6000]
y_train = y_train[:6000]

# Create binary labels to induce imbalance.
minority_class = 1
binary_labels = (y_train == minority_class).astype("int32")

# Check class counts for imbalance.
unique, counts = np.unique(binary_labels, return_counts=True)
print("Class counts:", dict(zip(unique, counts)))

# Normalize images and add channel dimension.
x_train = x_train.astype("float32") / 255.0
x_train = np.expand_dims(x_train, axis=-1)

# Validate shapes before building model.
print("Train shape:", x_train.shape, binary_labels.shape)

# Build a small sequential model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Compile model with binary crossentropy loss.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Balanced weights: total / (2 * class_count) for each class.
# np.unique sorts labels, so counts are ordered as [class 0, class 1].
neg, pos = counts
weight_for_0 = (1.0 / neg) * (neg + pos) / 2.0
weight_for_1 = (1.0 / pos) * (neg + pos) / 2.0
class_weights = {0: weight_for_0, 1: weight_for_1}

# Print class weights for inspection.
print("Class weights:", class_weights)

# Train with shuffling enabled and class weights.
history = model.fit(
    x_train,
    binary_labels,
    epochs=3,
    batch_size=64,
    shuffle=True,
    class_weight=class_weights,
    validation_split=0.2,
    verbose=0,
)

# Extract final training and validation metrics.
final_loss = history.history["loss"][-1]
final_val_loss = history.history["val_loss"][-1]
final_acc = history.history["accuracy"][-1]
final_val_acc = history.history["val_accuracy"][-1]

# Print concise summary of training results.
print("Final loss and val_loss:", round(final_loss, 3), round(final_val_loss, 3))
print("Final acc and val_acc:", round(final_acc, 3), round(final_val_acc, 3))
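For sequence data, the bullets earlier in this section recommend shuffling whole sequences while keeping each sequence's internal order. Suppose each sample passed to `model.fit` is an entire sequence, so the input array has shape `(num_sequences, timesteps, features)`. In that case `shuffle=True` already does this, because it shuffles along the first axis only. The pure-Python sketch below shows the same idea explicitly with a hypothetical `shuffle_sequences` helper.

```python
import random


def shuffle_sequences(sequences, seed=0):
    """Shuffle the order of sequences without touching their contents."""
    order = list(range(len(sequences)))
    random.Random(seed).shuffle(order)
    return [sequences[i] for i in order]


# Three toy sequences; each inner list must keep its time order.
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
shuffled = shuffle_sequences(sequences, seed=1)
print("Shuffled order:", shuffled)
```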




## **2. Keras Training Callbacks**

### **2.1. Tuning Early Stopping**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_02_01.jpg?v=1769392206" width="250">



>* Early stopping halts training before serious overfitting
>* Balances enough learning time against memorizing noise

>* Choose metric, patience, and minimum meaningful improvement
>* Adjust settings to handle plateaus and noise

>* Tune early stopping iteratively using training curves
>* Balance compute cost, metric noise, and performance
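To build intuition for `patience` and `min_delta`, the sketch below replays a list of validation losses through simplified early-stopping logic. A value counts as an improvement only if it beats the best value so far by more than `min_delta`. Training stops after `patience` consecutive epochs without an improvement. This approximates `tf.keras.callbacks.EarlyStopping` in `mode="min"` but is not its actual source, and the losses are made-up toy numbers.

```python
import math


def simulate_early_stopping(values, patience, min_delta):
    """Replay a monitored metric (lower is better) through early-stopping rules.

    Returns (stop_index, best_index) as 0-based epoch indices; stop_index is
    None when training would run through every value.
    """
    best = math.inf
    best_index = None
    wait = 0
    for index, value in enumerate(values):
        if value < best - min_delta:
            # Improvement larger than min_delta: record it and reset the counter.
            best = value
            best_index = index
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return index, best_index
    return None, best_index


# Made-up validation losses: fast gains, tiny gains, then a rise.
val_losses = [0.50, 0.40, 0.395, 0.39, 0.41, 0.42]

# Settings matching the baseline and strict callbacks in this section.
print("baseline:", simulate_early_stopping(val_losses, patience=2, min_delta=0.001))
print("strict:  ", simulate_early_stopping(val_losses, patience=1, min_delta=0.01))
```

The strict setting stops three epochs earlier, but it also ignores the small improvements that the baseline setting accepts. With `restore_best_weights=True`, the real callback rolls the model back to the weights from the best epoch.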



#@title Python Code - Tuning Early Stopping

# This script demonstrates tuning early stopping.
# It uses a tiny MNIST subset for speed.
# It keeps training output short and clear.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Check whether a GPU is visible; TensorFlow uses it automatically.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_name = "GPU"
else:
    device_name = "CPU"

# Print which device will be mainly used.
print("Using device:", device_name)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize images to the range [0, 1].
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for Conv2D.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before training.
assert x_train.shape[0] == y_train.shape[0]
assert x_test.shape[0] == y_test.shape[0]

# Use a small subset for quick training.
train_samples = 800
val_samples = 200
x_train_small = x_train[:train_samples]
y_train_small = y_train[:train_samples]

# Hold out the following samples, disjoint from the subset, for validation.
x_val_small = x_train[train_samples:train_samples + val_samples]
y_val_small = y_train[train_samples:train_samples + val_samples]

# Build a simple convolutional model.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        16,
        (3, 3),
        activation="relu",
        input_shape=(28, 28, 1),
    ),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compile model with suitable settings.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Define a baseline early stopping callback.
baseline_es = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    min_delta=0.001,
    restore_best_weights=True,
)

# Define a stricter early stopping callback.
strict_es = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=1,
    min_delta=0.01,
    restore_best_weights=True,
)

# Train model with baseline early stopping.
history_baseline = model.fit(
    x_train_small,
    y_train_small,
    validation_data=(x_val_small, y_val_small),
    epochs=20,
    batch_size=64,
    callbacks=[baseline_es],
    verbose=0,
)

# Record number of epochs actually run.
epochs_baseline = len(history_baseline.history["loss"])

# Clone the architecture with fresh weights (clone_model copies no weights).
model_strict = tf.keras.models.clone_model(model)
model_strict.build(input_shape=(None, 28, 28, 1))
model_strict.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train model with stricter early stopping.
history_strict = model_strict.fit(
    x_train_small,
    y_train_small,
    validation_data=(x_val_small, y_val_small),
    epochs=20,
    batch_size=64,
    callbacks=[strict_es],
    verbose=0,
)

# Record number of epochs for strict setting.
epochs_strict = len(history_strict.history["loss"])

# Evaluate both models on shared test subset.
small_test_samples = 500
x_test_small = x_test[:small_test_samples]
y_test_small = y_test[:small_test_samples]

# Evaluate baseline early stopping model.
baseline_eval = model.evaluate(
    x_test_small,
    y_test_small,
    verbose=0,
)

# Evaluate strict early stopping model.
strict_eval = model_strict.evaluate(
    x_test_small,
    y_test_small,
    verbose=0,
)

# Print concise comparison of both strategies.
print("Baseline ES epochs:", epochs_baseline)
print("Strict ES epochs:", epochs_strict)
print("Baseline ES test loss:", round(baseline_eval[0], 4))
print("Baseline ES test acc:", round(baseline_eval[1], 4))
print("Strict ES test loss:", round(strict_eval[0], 4))
print("Strict ES test acc:", round(strict_eval[1], 4))




### **2.2. Checkpoint Paths and Formats**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_02_02.jpg?v=1769392286" width="250">



>* Organize checkpoints in clear, separate experiment folders
>* Use descriptive filenames to track training progress

>* Choose formats by what they store: full model (`.keras`) or weights only (`.weights.h5`)
>* Full-model formats ease sharing and deployment

>* Organize checkpoints by project, configs, and logs
>* Keep key checkpoints to ensure reproducible, auditable models



#@title Python Code - Checkpoint Paths and Formats

# This script shows simple Keras checkpoint usage.
# It focuses on paths and file formats.
# Run cells sequentially in a Colab notebook.

# Install TensorFlow if not already available.
# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import pathlib
import random

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)

# Set TensorFlow random seed value.
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a small subset for quick training.
train_samples = 4000
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Normalize pixel values to the range [0, 1].
x_train = x_train.astype("float32") / 255.0

# Add channel dimension for Conv2D.
x_train = x_train[..., tf.newaxis]

# Confirm shapes are as expected.
assert x_train.shape[0] == train_samples

# Create a simple sequential model.
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# Compile model with basic settings.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Create a base directory for checkpoints.
base_dir = pathlib.Path("checkpoints_example")
base_dir.mkdir(exist_ok=True)

# Define an experiment specific subdirectory.
experiment_dir = base_dir / "mnist_small_run"
experiment_dir.mkdir(exist_ok=True)

# Show the resolved experiment directory.
print("Checkpoint directory:", experiment_dir.resolve())

# Define a filename pattern with epoch and val accuracy.
ckpt_pattern = "epoch_{epoch:02d}_valacc_{val_accuracy:.3f}"

# Create full checkpoint path with Keras format.
ckpt_path = str(experiment_dir / (ckpt_pattern + ".keras"))

# Configure ModelCheckpoint callback for best model.
checkpoint_cb = keras.callbacks.ModelCheckpoint(
    filepath=ckpt_path,
    monitor="val_accuracy",
    save_best_only=True,
    save_weights_only=False,
    mode="max",
    verbose=0,
)

# Configure EarlyStopping to avoid long training.
early_stop_cb = keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=2,
    restore_best_weights=True,
    verbose=0,
)

# Train model briefly with validation split.
history = model.fit(
    x_train,
    y_train,
    epochs=5,
    batch_size=64,
    validation_split=0.2,
    callbacks=[checkpoint_cb, early_stop_cb],
    verbose=0,
)

# List checkpoint files created in directory.
ckpt_files = sorted(experiment_dir.glob("*.keras"))

# Print short summary of checkpoint files.
print("Number of checkpoint files:", len(ckpt_files))

# Print each checkpoint filename clearly.
for path in ckpt_files:
    print("Saved checkpoint:", path.name)

# save_best_only writes only on improvement, so the last file is the best.
if ckpt_files:
    best_model_path = str(ckpt_files[-1])
else:
    best_model_path = None

# Safely load model if a checkpoint exists.
if best_model_path is not None:
    loaded_model = keras.models.load_model(best_model_path)
else:
    loaded_model = model

# Prepare a small test subset with the same preprocessing as training.
x_test_small = x_test[:1000].astype("float32") / 255.0

# Add channel dimension for test images.
x_test_small = x_test_small[..., tf.newaxis]
y_test_small = y_test[:1000]

# Confirm shapes before evaluation.
assert x_test_small.shape[0] == y_test_small.shape[0]

# Evaluate silently and print concise results.
loss, acc = loaded_model.evaluate(
    x_test_small,
    y_test_small,
    verbose=0,
)

# Show accuracy from loaded checkpoint model.
print("Loaded checkpoint test accuracy:", round(acc, 3))




### **2.3. TensorBoard Training Logs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_02_03.jpg?v=1769392330" width="250">



>* TensorBoard logs metrics over time during training
>* Interactive dashboards reveal learning progress and stability

>* Log many metrics and visualize multiple runs
>* Compare runs to find faster, better generalization

>* Visualize why early stopping and checkpoints trigger
>* Use curves to tune hyperparameters and callbacks
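Comparing runs in TensorBoard is easiest when each run writes to its own subdirectory under one base log directory. You then point TensorBoard at that base, for example with `tensorboard --logdir logs_tf_callbacks`, or in Colab with `%load_ext tensorboard` followed by `%tensorboard --logdir logs_tf_callbacks`. The sketch below builds such run names from a timestamp plus hyperparameter tags. `make_run_dir` and its tag format are a hypothetical convention, not something TensorBoard requires.

```python
import datetime
import os


def make_run_dir(base_log_dir, hparams, now):
    """Build a unique, self-describing run directory name (not created on disk)."""
    stamp = now.strftime("run_%Y%m%d_%H%M%S")
    # Sort keys so the same hyperparameters always produce the same tag order.
    tags = "_".join(f"{key}{value}" for key, value in sorted(hparams.items()))
    name = f"{stamp}_{tags}" if tags else stamp
    return os.path.join(base_log_dir, name)


# Fixed timestamp for reproducible output; use datetime.datetime.now() in practice.
fixed_time = datetime.datetime(2026, 1, 25, 10, 15, 0)
run_a = make_run_dir("logs_tf_callbacks", {"bs": 64, "lr": 0.001}, fixed_time)
run_b = make_run_dir("logs_tf_callbacks", {"bs": 128, "lr": 0.001}, fixed_time)
print(run_a)
print(run_b)
```

You would pass each returned path as `log_dir` to `keras.callbacks.TensorBoard`. The runs then appear side by side and are labeled by their hyperparameters.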



#@title Python Code - TensorBoard Training Logs

# This script shows TensorBoard training logs.
# It uses a tiny model and dataset.
# Run it in Google Colab for practice.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import datetime
import numpy as np

# Import TensorFlow and Keras.
import tensorflow as tf
from tensorflow import keras

# Set deterministic seeds for reproducibility.
np.random.seed(7)
tf.random.set_seed(7)

# Print TensorFlow version in one line.
print("TensorFlow version:", tf.__version__)

# Check whether a GPU is visible; TensorFlow uses it automatically.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_name = "GPU"
else:
    device_name = "CPU"

# Print which device will be mainly used.
print("Using device:", device_name)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a small subset for quick training.
train_samples = 4000
test_samples = 1000

# Slice the dataset to the chosen size.
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice the test set similarly.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize images to the range [0,1].
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for Conv2D.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before building model.
print("Train shape:", x_train.shape, y_train.shape)

# Build a simple CNN model.
model = keras.Sequential([
    keras.layers.Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Compile the model with basic settings.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Create a base log directory for TensorBoard.
base_log_dir = "logs_tf_callbacks"
os.makedirs(base_log_dir, exist_ok=True)

# Build a unique run directory using timestamp.
run_id = datetime.datetime.now().strftime("run_%Y%m%d_%H%M%S")
log_dir = os.path.join(base_log_dir, run_id)

# Create the TensorBoard callback object.
tensorboard_cb = keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=0,
    write_graph=True,
    write_images=False,
)

# Create an early stopping callback.
early_stop_cb = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True,
)

# Create a model checkpoint callback.
checkpoint_path = os.path.join(base_log_dir, "best_model.keras")
checkpoint_cb = keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    monitor="val_loss",
    save_best_only=True,
)

# Train the model with callbacks attached.
history = model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=5,
    batch_size=64,
    callbacks=[tensorboard_cb, early_stop_cb, checkpoint_cb],
    verbose=0,
)

# Evaluate the model silently on test data.
loss, acc = model.evaluate(x_test, y_test, verbose=0)

# Print a short training summary.
print("TensorBoard logs saved to:", log_dir)

# Print where the best model checkpoint is stored.
print("Best model checkpoint:", checkpoint_path)

# Print final test performance metrics.
print("Test loss:", round(float(loss), 4), "Test accuracy:", round(float(acc), 4))




## **3. Monitoring Training Curves**

### **3.1. Inspecting History Objects**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_03_01.jpg?v=1769392369" width="250">



>* History object logs metrics for every epoch
>* It reveals training and validation performance over time

>* Metrics store per-epoch training and validation values
>* Comparing sequences reveals progress and early overfitting

>* View metric histories as learning trajectories over epochs
>* Use trajectories to spot underfitting, overfitting, and stagnation
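`history.history` is a plain dictionary that maps each metric name to a list with one value per epoch. Ordinary Python can therefore summarize a trajectory. The sketch below finds the epoch with the lowest validation loss and measures the final gap between training and validation loss. The helper and the toy values are made up for illustration and are not real training results.

```python
def summarize_history(history_dict, monitor="val_loss"):
    """Return the best epoch (1-based) for `monitor` and the final loss gap.

    Assumes lower is better for `monitor`, which holds for loss metrics.
    """
    values = history_dict[monitor]
    best_index = min(range(len(values)), key=values.__getitem__)
    final_gap = history_dict["val_loss"][-1] - history_dict["loss"][-1]
    return best_index + 1, final_gap


# Made-up toy values in the same layout as `history.history`.
toy_history = {
    "loss": [0.90, 0.50, 0.35, 0.28, 0.22],
    "val_loss": [0.70, 0.48, 0.40, 0.41, 0.44],
}
best_epoch, gap = summarize_history(toy_history)
print("Best val_loss epoch:", best_epoch)
print("Final val_loss - loss gap:", round(gap, 3))
```

Here training loss keeps falling after epoch 3 while validation loss creeps up. That widening gap is the early signature of overfitting.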



#@title Python Code - Inspecting History Objects

# This script shows how to inspect History objects.
# It focuses on training and validation metric trajectories.
# Use it to connect curves with underfitting and overfitting.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a small subset to keep runtime low.
train_samples = 4000
test_samples = 1000
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Reduce test set size for quick evaluation.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize pixel values to the range [0, 1].
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before building the model.
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)

# Build a small convolutional neural network.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile the model with accuracy metric.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train the model quietly and capture the History.
history = model.fit(
    x_train,
    y_train,
    epochs=8,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Access the history dictionary from the History object.
history_dict = history.history

# Print available metric keys stored per epoch.
print("History keys:", list(history_dict.keys()))

# Extract training and validation accuracy sequences.
train_acc = history_dict.get("accuracy", [])
val_acc = history_dict.get("val_accuracy", [])

# Extract training and validation loss sequences.
train_loss = history_dict.get("loss", [])
val_loss = history_dict.get("val_loss", [])

# Print first three epochs to inspect trajectories.
for epoch in range(3):
    ta = float(train_acc[epoch])
    va = float(val_acc[epoch])
    tl = float(train_loss[epoch])
    vl = float(val_loss[epoch])
    print(
        f"Epoch {epoch+1}: acc={ta:.3f}, val_acc={va:.3f}, "
        f"loss={tl:.3f}, val_loss={vl:.3f}"
    )

# Print final epoch metrics to compare with early epochs.
last = len(train_acc) - 1
print(
    f"Final epoch {last+1}: acc={train_acc[last]:.3f}, "
    f"val_acc={val_acc[last]:.3f}"
)

# Import matplotlib for a single compact plot.
import matplotlib.pyplot as plt

# Create a simple plot of loss curves over epochs.
plt.figure(figsize=(5, 3))
plt.plot(train_loss, label="train_loss")
plt.plot(val_loss, label="val_loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs validation loss trajectory")
plt.legend()
plt.tight_layout()
plt.show()



### **3.2. Visualizing Training Metrics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_03_02.jpg?v=1769392413" width="250">



>* Plot loss and accuracy across training epochs
>* Use curves to spot learning patterns and issues

>* Plot key training and validation metrics together
>* Use curves to spot plateaus and overtraining

>* Compare runs to see hyperparameter effects visually
>* Use curves as a dashboard for overfitting
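Epoch-to-epoch noise can hide the underlying trend in a curve. An exponential moving average is one simple way to smooth it, similar in spirit to TensorBoard's smoothing slider, though TensorBoard's exact formula may differ. The hypothetical `smooth` helper below could be applied to any `history.history` list before plotting. The noisy values are made up.

```python
def smooth(values, weight=0.6):
    """Exponential moving average: s_t = weight * s_{t-1} + (1 - weight) * x_t."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    for value in values:
        # The first point has no history, so it is used as its own previous value.
        previous = smoothed[-1] if smoothed else value
        smoothed.append(weight * previous + (1.0 - weight) * value)
    return smoothed


# Made-up noisy validation losses with one spike at epoch 4.
noisy_val_loss = [0.70, 0.52, 0.47, 0.60, 0.44, 0.43]
print([round(v, 3) for v in smooth(noisy_val_loss)])
```

A higher `weight` gives a smoother curve, but it reacts more slowly to real changes in trend. It is worth plotting the raw and smoothed curves together.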



#@title Python Code - Visualizing Training Metrics

# This script visualizes training and validation metrics.
# It uses a tiny MNIST subset for speed.
# Focus is on reading training curves.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Check whether a GPU is visible to TensorFlow.
physical_gpus = tf.config.list_physical_devices("GPU")
use_gpu = bool(physical_gpus)
print("Using GPU:", use_gpu)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Reduce dataset size for quick demonstration.
train_samples = 4000
test_samples = 1000
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data to a small subset.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize images to the range [0, 1].
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before building model.
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)

# Build a small convolutional neural network.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile model with suitable loss and metrics.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train with validation_split=0.2, which holds out the last 20% of the
# training arrays (taken before shuffling); verbose=0 silences epoch logs.
history = model.fit(
    x_train,
    y_train,
    epochs=8,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Extract the history dictionary: metric name -> list of per-epoch values.
history_dict = history.history
epochs = range(1, len(history_dict["loss"]) + 1)

# Import matplotlib for plotting curves.
import matplotlib.pyplot as plt

# Create a new figure for loss curves.
plt.figure(figsize=(6, 4))

# Plot training and validation loss against 1-based epoch numbers.
plt.plot(epochs, history_dict["loss"], label="train_loss")
plt.plot(epochs, history_dict["val_loss"], label="val_loss")

# Label axes and add legend for clarity.
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs Validation Loss")
plt.legend()

# Display the plot to visualize learning.
plt.show()
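The cell stores `history.history` in `history_dict`, a plain dictionary that maps each metric name (`"loss"`, `"val_loss"`, `"accuracy"`, `"val_accuracy"`) to a list with one value per epoch. The helper below is a minimal sketch, not part of Keras: the name `summarize_history` is hypothetical, and the numbers in the usage example are made-up values shaped like a real history, not results from the cell above. It prints the gap between validation and training loss for each epoch and returns the epoch with the lowest validation loss, which is usually the first point to look for on the plot.

```python
def summarize_history(history_dict):
    """Print per-epoch losses and return the 1-based epoch with the lowest val_loss.

    Assumes history_dict has the shape of Keras' History.history:
    metric name -> list with one float per epoch.
    """
    train_loss = history_dict["loss"]
    val_loss = history_dict["val_loss"]
    for epoch, (tr, va) in enumerate(zip(train_loss, val_loss), start=1):
        gap = va - tr  # positive gap: validation loss sits above training loss
        print(f"epoch {epoch:2d}  train={tr:.3f}  val={va:.3f}  gap={gap:+.3f}")
    # Index of the smallest validation loss, converted to a 1-based epoch number.
    best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
    return best_index + 1


# Hypothetical values for illustration only (not output from the cell above).
example_history = {
    "loss": [0.90, 0.45, 0.30, 0.22, 0.16, 0.11],
    "val_loss": [0.60, 0.40, 0.33, 0.31, 0.33, 0.36],
}
best_epoch = summarize_history(example_history)
print("Lowest validation loss at epoch", best_epoch)  # → 4
```

With the real cell, `summarize_history(history_dict)` works unchanged; a gap that keeps widening epoch after epoch is the numeric counterpart of the two curves pulling apart on the plot.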



### **3.3. Spotting Overfitting Trends**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_04/Lecture_A/image_03_03.jpg?v=1769392450" width="250">



>* Overfitting means memorizing training data, not generalizing
>* Training improves while validation worsens, revealing overfitting

>* Ignore single noisy spikes in validation metrics
>* Overfitting shows as a sustained rise in validation loss while validation accuracy plateaus

>* Use divergence to stop or adjust training
>* Apply regularization or use a simpler model to prevent overfitting
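One way to separate a single noisy spike from a sustained trend is to smooth the validation loss with a short moving average, then require the smoothed curve to rise for several consecutive epochs while training loss keeps falling. The sketch below assumes a trailing window of 3 epochs and a run of 3 rises; `window`, `min_rise_epochs`, and the helper names `moving_average` and `looks_like_overfitting` are illustrative choices, not Keras parameters, and the loss values are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first epochs average over fewer points."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def looks_like_overfitting(train_loss, val_loss, window=3, min_rise_epochs=3):
    """Return True if smoothed val loss rose for min_rise_epochs consecutive
    epochs at the end of training while train loss kept decreasing."""
    smooth_val = moving_average(val_loss, window)
    if len(smooth_val) <= min_rise_epochs:
        return False  # too few epochs to judge a trend
    recent_val = smooth_val[-(min_rise_epochs + 1):]
    recent_train = train_loss[-(min_rise_epochs + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling


# Hypothetical curves for illustration only.
train = [1.0, 0.7, 0.5, 0.4, 0.3, 0.22, 0.16, 0.11]
val_spike = [0.9, 0.7, 0.6, 0.9, 0.55, 0.5, 0.48, 0.47]          # one noisy epoch
val_diverging = [0.9, 0.6, 0.45, 0.40, 0.42, 0.46, 0.51, 0.57]   # sustained rise
print(looks_like_overfitting(train, val_spike))      # → False
print(looks_like_overfitting(train, val_diverging))  # → True
```

A longer window suppresses more noise but reacts later to a genuine rise, so the window and run length are trade-offs to tune per dataset rather than fixed rules.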



In [None]:
#@title Python Code - Spotting Overfitting Trends

# This script shows overfitting using training curves.
# It trains a small model on MNIST digits.
# Then it plots training and validation loss trends.

# !pip install tensorflow==2.20.0

# Import required libraries.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Report which device type TensorFlow will use by default.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_type = "GPU"
else:
    device_type = "CPU"
print("Using device type:", device_type)

# Load MNIST dataset from Keras utilities.
(mnist_x_train, mnist_y_train), (mnist_x_test, mnist_y_test) = (
    tf.keras.datasets.mnist.load_data()
)

# Confirm dataset shapes before subsampling.
print("Train shape:", mnist_x_train.shape, "Test shape:", mnist_x_test.shape)

# Reduce dataset size to speed up training.
train_samples = 4000
val_samples = 2000
x_train_small = mnist_x_train[:train_samples]
y_train_small = mnist_y_train[:train_samples]

# Hold out the next 2000 training images, disjoint from the subset above.
x_val_small = mnist_x_train[train_samples:train_samples + val_samples]
y_val_small = mnist_y_train[train_samples:train_samples + val_samples]

# Normalize pixel values to the range [0, 1].
x_train_small = x_train_small.astype("float32") / 255.0
x_val_small = x_val_small.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
x_train_small = np.expand_dims(x_train_small, axis=-1)
x_val_small = np.expand_dims(x_val_small, axis=-1)

# Verify shapes after preprocessing steps.
print("Train small shape:", x_train_small.shape)
print("Val small shape:", x_val_small.shape)

# Build a small convolutional model, declaring the input shape with
# tf.keras.Input (Keras 3 warns when input_shape is passed to a layer).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compile model with suitable loss and optimizer.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train for many epochs to encourage overfitting.
history = model.fit(
    x_train_small,
    y_train_small,
    epochs=40,
    batch_size=64,
    validation_data=(x_val_small, y_val_small),
    verbose=0,
)

# Extract loss curves from training history.
train_loss = history.history["loss"]
val_loss = history.history["val_loss"]

# Print final training and validation losses.
print("Final train loss:", round(train_loss[-1], 4))
print("Final val loss:", round(val_loss[-1], 4))

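# Report the best epoch and whether validation loss rose after it.
best_epoch = val_loss.index(min(val_loss)) + 1
print("Lowest val loss at epoch:", best_epoch)
if val_loss[-1] > min(val_loss):
    print("Validation loss increased after its best epoch.")
else:
    print("Validation loss did not clearly increase.")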

# Import matplotlib for plotting curves.
import matplotlib.pyplot as plt

# Create a figure for loss curves.
plt.figure(figsize=(6, 4))

# Plot training loss across epochs.
plt.plot(train_loss, label="Train loss")

# Plot validation loss across epochs.
plt.plot(val_loss, label="Val loss")

# Add labels and legend for clarity.
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training vs validation loss over epochs")
plt.legend()

# Display the plot to visually inspect overfitting.
plt.show()
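Keras automates "stop when the curves diverge" with `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`, passed to `model.fit(..., callbacks=[...])`. To see what `patience` means on a curve like the one just plotted, the sketch below replays a recorded validation-loss list through a simplified patience rule: track the best value so far, count epochs without improvement, and stop once that count reaches `patience`. It is a plain-Python approximation written for illustration (the name `simulate_early_stopping` is hypothetical), not Keras' own implementation, and it ignores callback options such as `min_delta`, `baseline`, and `start_from_epoch`. The example losses are made up.

```python
def simulate_early_stopping(val_loss, patience=3):
    """Replay a recorded val_loss list through a simplified patience rule.

    Returns (stop_epoch, best_epoch) as 1-based epoch numbers.
    stop_epoch is the last epoch that would run; if the rule never
    triggers, it equals len(val_loss).
    """
    best_value = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, value in enumerate(val_loss, start=1):
        if value < best_value:
            # New best: remember it and reset the counter.
            best_value = value
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_loss), best_epoch


# Hypothetical validation losses for illustration only.
val_loss_example = [0.60, 0.42, 0.35, 0.33, 0.34, 0.36, 0.39, 0.41, 0.45]
stop_epoch, best_epoch = simulate_early_stopping(val_loss_example, patience=3)
print("Training would stop after epoch", stop_epoch)  # → 7
print("Best val_loss at epoch", best_epoch)           # → 4
```

Applied to the `val_loss` list from the cell above, `simulate_early_stopping(val_loss, patience=3)` gives a rough idea of how many of the 40 epochs a patience of 3 would actually have used.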



# <font color="#418FDE" size="6.5" uppercase>**Using model.fit**</font>


In this lecture, you learned to:
- Configure model.fit with appropriate batch sizes, epochs, and validation strategies for a given dataset. 
- Use Keras callbacks to monitor training, implement early stopping, and save model checkpoints. 
- Interpret training and validation curves to diagnose underfitting and overfitting. 

In the next Lecture (Lecture B), we will go over 'Custom Training Loops'.