<a href="https://colab.research.google.com/github/lahirumanulanka/ai-notebooks/blob/main/03_optimize_nn_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Sumani
# https://www.linkedin.com/in/sumanaruban/
# https://github.com/Sumanaruban
# 29-7-2024

# Introduction

This notebook continues from the previous Jupyter notebook, [02_fully_connected_mnist.ipynb](./02_fully_connected_mnist.ipynb), where we built and trained a fully connected neural network in PyTorch to classify handwritten digits from the MNIST dataset. With the fundamental steps of model development covered, this notebook provides a series of incremental exercises designed to deepen your understanding of neural network training and optimization. Note that the solution cells in this notebook are written with TensorFlow/Keras rather than PyTorch.

Each exercise introduces a small modification to the existing model or training process. These modifications will help you explore the effects of different model architectures, activation functions, optimizers, learning rates, regularization techniques, data augmentation, loss functions, and evaluation metrics. By completing these exercises, you will gain hands-on experience in tuning neural networks and improving model performance.

# How to Use This Notebook
1. **Review the Previous Notebook**: Ensure you are familiar with the steps and code in the previous notebook, as this notebook builds upon that foundation.
2. **Complete the Exercises**: Work through each exercise one by one. Make the specified changes to the model or training process and run the cells to see the effects.
3. **Document Your Observations**: For each exercise, take note of how the changes impact the model's training and evaluation metrics. Provide explanations for any improvements or deteriorations in performance.
4. **Experiment and Explore**: Feel free to experiment further by combining different techniques or exploring additional modifications beyond the provided exercises.

By the end of this notebook, you will have a deeper understanding of how various factors influence the training and performance of neural networks, equipping you with the knowledge and skills to effectively tune and optimize models for real-world applications.

# Exercise 1: Modify the Model Architecture

- **Add a Hidden Layer**: Add an additional hidden layer with 256 neurons between the existing layers. Re-train the model and evaluate its performance.
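
To get a feel for what the extra layer costs, the sketch below counts the trainable parameters of a stack of fully connected layers. The helper name `dense_param_count` is hypothetical, and the 784-512-10 baseline is only an assumption about the previous notebook's architecture, inferred from the 512-unit first layer used in this exercise.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's unit count.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# Assumed baseline (hypothetical): 784 inputs -> 512 hidden -> 10 outputs.
baseline = dense_param_count([784, 512, 10])
# Architecture used in this exercise: an extra 256-unit hidden layer.
with_extra_layer = dense_param_count([784, 512, 256, 10])

print(f"baseline: {baseline} parameters")
print(f"with extra layer: {with_extra_layer} parameters")
```

Under these assumptions the extra layer adds roughly a third more parameters, which is worth keeping in mind when comparing training time and overfitting between the two models.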

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

In [5]:
# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model with an additional hidden layer
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='relu'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')

# Save the model
model.save('mnist_model_with_hidden_layer.h5')

# Load the model (optional, for verification)
loaded_model = keras.models.load_model('mnist_model_with_hidden_layer.h5')

# Verify the loaded model
loaded_test_loss, loaded_test_acc = loaded_model.evaluate(x_test, y_test)
print(f'Loaded model test accuracy: {loaded_test_acc:.4f}')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 13ms/step - accuracy: 0.8868 - loss: 0.3922 - val_accuracy: 0.9651 - val_loss: 0.1141
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.9732 - loss: 0.0879 - val_accuracy: 0.9702 - val_loss: 0.0966
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.9841 - loss: 0.0500 - val_accuracy: 0.9721 - val_loss: 0.0991
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.9888 - loss: 0.0334 - val_accuracy: 0.9757 - val_loss: 0.0824
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.9917 - loss: 0.0253 - val_accuracy: 0.9778 - val_loss: 0.0828
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9743 - loss: 0.0900
Test accuracy: 0.9779
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9743 - loss: 0.0900
Loaded model test accuracy: 0.9779


# Exercise 2: Change the Activation Function

- **Use Different Activation Functions**: Replace the ReLU activation function with other activation functions such as Sigmoid or Tanh. Observe the impact on training and evaluation metrics.
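
As a rough illustration of why the choice of activation can affect training, the sketch below evaluates the derivative of each function at a few inputs. It uses only the standard library; the function names are chosen for this example and are not part of Keras.

```python
import math


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, at most 1."""
    return 1.0 - math.tanh(x) ** 2


for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.4f}  "
          f"sigmoid'={sigmoid_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

Sigmoid and Tanh gradients shrink toward zero for large positive or negative inputs (saturation), whereas the ReLU gradient stays at 1 for any positive input. This is one commonly cited reason for differences in convergence, though the effect on a small network like this one may be modest.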

In [6]:
model = models.Sequential()
model.add(layers.Dense(512, activation='tanh', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='tanh'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with Tanh activation
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with Tanh activation
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with Tanh activation: {test_acc:.4f}')

# Save the model with Tanh activation
model.save('mnist_model_with_tanh_activation.h5')

# Load the model with Tanh activation (optional, for verification)
loaded_model_tanh = keras.models.load_model('mnist_model_with_tanh_activation.h5')

# Verify the loaded model with Tanh activation
loaded_test_loss_tanh, loaded_test_acc_tanh = loaded_model_tanh.evaluate(x_test, y_test)
print(f'Loaded model with Tanh activation test accuracy: {loaded_test_acc_tanh:.4f}')

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.8777 - loss: 0.4039 - val_accuracy: 0.9528 - val_loss: 0.1556
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9626 - loss: 0.1267 - val_accuracy: 0.9643 - val_loss: 0.1148
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9756 - loss: 0.0824 - val_accuracy: 0.9682 - val_loss: 0.1073
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9832 - loss: 0.0563 - val_accuracy: 0.9721 - val_loss: 0.0930
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9869 - loss: 0.0419 - val_accuracy: 0.9723 - val_loss: 0.0927
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9662 - loss: 0.1047
Test accuracy with Tanh activation: 0.9703
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9662 - loss: 0.1047
Loaded model with Tanh activation test accuracy: 0.9703


# Exercise 3: Change the Optimizer

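- **Use Adam Optimizer**: Replace the SGD optimizer with the Adam optimizer. Compare the training speed and final accuracy. Exercises 1 and 2 above already train with Adam, so the cell below trains the same architecture with SGD to provide the other side of the comparison.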
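
To make the difference between the two optimizers concrete, the sketch below applies one update step of plain SGD and one step of a textbook Adam update to the toy loss f(w) = scale * (w - 3)^2. This is a minimal standalone illustration, not the Keras implementation; the toy loss, the learning rate of 0.01, and the hyperparameter defaults are assumptions made for this example.

```python
import math


def sgd_step(w, grad, lr):
    """Plain gradient descent: the step is proportional to the gradient."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update with bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def grad_f(w, scale):
    """Gradient of the toy loss f(w) = scale * (w - 3)^2."""
    return 2.0 * scale * (w - 3.0)


lr = 0.01
results = {}
for scale in (1.0, 100.0):
    g = grad_f(0.0, scale)
    w_sgd = sgd_step(0.0, g, lr)
    w_adam, _, _ = adam_step(0.0, g, 0.0, 0.0, 1, lr)
    results[scale] = (w_sgd, w_adam)
    print(f"scale={scale:>5}: SGD moves to {w_sgd:.4f}, Adam moves to {w_adam:.4f}")
```

The first SGD step grows with the gradient (and overshoots the minimum at w = 3 when the loss is scaled by 100), while the first Adam step has a magnitude close to the learning rate in both cases. This gradient normalization is one reason Adam is often less sensitive to the scale of the loss.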

In [7]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='relu'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model with SGD optimizer
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with SGD optimizer
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with SGD optimizer
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with SGD optimizer: {test_acc:.4f}')

# Save the model with SGD optimizer
model.save('mnist_model_with_sgd_optimizer.h5')

# Load the model with SGD optimizer (optional, for verification)
loaded_model_sgd = keras.models.load_model('mnist_model_with_sgd_optimizer.h5')

# Verify the loaded model with SGD optimizer
loaded_test_loss_sgd, loaded_test_acc_sgd = loaded_model_sgd.evaluate(x_test, y_test)
print(f'Loaded model with SGD optimizer test accuracy: {loaded_test_acc_sgd:.4f}')

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.6552 - loss: 1.3636 - val_accuracy: 0.8938 - val_loss: 0.4041
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - accuracy: 0.8923 - loss: 0.3981 - val_accuracy: 0.9127 - val_loss: 0.3111
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9107 - loss: 0.3144 - val_accuracy: 0.9237 - val_loss: 0.2729
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.9206 - loss: 0.2785 - val_accuracy: 0.9298 - val_loss: 0.2479
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 9ms/step - accuracy: 0.9272 - loss: 0.2591 - val_accuracy: 0.9348 - val_loss: 0.2296
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9254 - loss: 0.2641
Test accuracy with SGD optimizer: 0.9350
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9254 - loss: 0.2641
Loaded model with SGD optimizer test accuracy: 0.9350


# Exercise 4: Adjust the Learning Rate

- **Experiment with Learning Rates**: Try different learning rates (e.g., 0.1, 0.001, 0.0001) with the SGD and Adam optimizers. Record the effects on the model's training performance.
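
The effect of the learning rate is easiest to see in isolation on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2 from the same starting point for each of the three rates listed above. The toy loss, the step count, and the helper name `run_sgd` are assumptions made for this illustration.

```python
def run_sgd(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient of w**2 is 2*w
    return w * w


# Every run starts from the same w0, so the rates are compared like for like.
final_loss = {lr: run_sgd(lr) for lr in (0.1, 0.001, 0.0001)}

for lr, loss in final_loss.items():
    print(f"lr={lr}: final loss {loss:.6f}")
```

With a fixed budget of steps, the smaller rates barely move away from the starting loss of 1.0, while the largest rate gets close to the minimum. On a real network a rate that is too large can also diverge, so the ordering seen here is not guaranteed in general.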

In [8]:
learning_rates = [0.1, 0.001, 0.0001]

# Note: `model` is the network already trained with SGD in Exercise 3 and it is
# not rebuilt inside the loop. Recompiling does not reset the weights, so each
# run continues from where the previous one stopped and the accuracies reflect
# cumulative training rather than each learning rate in isolation. Rebuild the
# model inside the loop for a like-for-like comparison. Only SGD is covered here.
for lr in learning_rates:
    print(f'\nTraining with learning rate: {lr}')

    # Compile the model with the current learning rate
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=lr),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

    # Evaluate the model
    test_loss, test_acc = model.evaluate(x_test, y_test)
    print(f'Test accuracy with SGD optimizer and learning rate {lr}: {test_acc:.4f}')

    # Save the model with the current learning rate
    model.save(f'mnist_model_with_sgd_lr_{lr}.h5')

    # Load the model with the current learning rate (optional, for verification)
    loaded_model_lr = keras.models.load_model(f'mnist_model_with_sgd_lr_{lr}.h5')

    # Verify the loaded model with the current learning rate
    loaded_test_loss_lr, loaded_test_acc_lr = loaded_model_lr.evaluate(x_test, y_test)
    print(f'Loaded model with SGD optimizer and learning rate {lr} test accuracy: {loaded_test_acc_lr:.4f}')


Training with learning rate: 0.1
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9268 - loss: 0.2484 - val_accuracy: 0.9549 - val_loss: 0.1585
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 9ms/step - accuracy: 0.9607 - loss: 0.1326 - val_accuracy: 0.9657 - val_loss: 0.1169
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9712 - loss: 0.0963 - val_accuracy: 0.9682 - val_loss: 0.1056
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9801 - loss: 0.0690 - val_accuracy: 0.9720 - val_loss: 0.0915
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9857 - loss: 0.0528 - val_accuracy: 0.9718 - val_loss: 0.0903
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9697 - loss: 0.0967
Test accuracy with SGD optimizer and learning rate 0.1: 0.9740
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9697 - loss: 0.0967
Loaded model with SGD optimizer and learning rate 0.1 test accuracy: 0.9740

Training with learning rate: 0.001
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9897 - loss: 0.0400 - val_accuracy: 0.9768 - val_loss: 0.0778
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9919 - loss: 0.0339 - val_accuracy: 0.9768 - val_loss: 0.0769
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.9917 - loss: 0.0343 - val_accuracy: 0.9773 - val_loss: 0.0763
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - accuracy: 0.9926 - loss: 0.0312 - val_accuracy: 0.9773 - val_loss: 0.0761
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.9932 - loss: 0.0314 - val_accurac
Test accuracy with SGD optimizer and learning rate 0.001: 0.9782
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9743 - loss: 0.0809
Loaded model with SGD optimizer and learning rate 0.001 test accuracy: 0.9782

Training with learning rate: 0.0001
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.9929 - loss: 0.0310 - val_accuracy: 0.9774 - val_loss: 0.0759
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 9ms/step - accuracy: 0.9926 - loss: 0.0324 - val_accuracy: 0.9774 - val_loss: 0.0758
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 9ms/step - accuracy: 0.9924 - loss: 0.0325 - val_accuracy: 0.9774 - val_loss: 0.0758
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.9924 - loss: 0.0311 - val_accuracy: 0.9774 - val_loss: 0.0758
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 9ms/step - accuracy: 0.9930 - loss: 0.0319 - val_acc
Test accuracy with SGD optimizer and learning rate 0.0001: 0.9781
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9742 - loss: 0.0807
Loaded model with SGD optimizer and learning rate 0.0001 test accuracy: 0.9781


# Exercise 5: Implement Dropout

- **Add Dropout Layers**: Introduce dropout layers with a dropout probability of 0.5 to the network. Check if the model's performance improves by reducing overfitting.
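
For intuition about what a dropout layer does to its inputs, here is a minimal pure-Python sketch of inverted dropout. The function name `dropout` and its `rng` argument are hypothetical choices for this example; Keras applies the same idea to whole tensors.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At inference time the input passes through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("training:", train_out)
print("inference:", eval_out)
```

Because dropout is active only during training, the training accuracy reported per epoch is measured on a handicapped network, which is one reason it can sit below the validation accuracy in the early epochs.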

In [9]:
# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model with dropout layers
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dropout(0.5))  # Dropout layer with 50% dropout rate
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))  # Another dropout layer with 50% dropout rate
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with dropout layers
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with dropout layers
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with dropout layers: {test_acc:.4f}')

# Save the model with dropout layers
model.save('mnist_model_with_dropout.h5')

# Load the model with dropout layers (optional, for verification)
loaded_model_dropout = keras.models.load_model('mnist_model_with_dropout.h5')

# Verify the loaded model with dropout layers
loaded_test_loss_dropout, loaded_test_acc_dropout = loaded_model_dropout.evaluate(x_test, y_test)
print(f'Loaded model with dropout layers test accuracy: {loaded_test_acc_dropout:.4f}')

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 15s 17ms/step - accuracy: 0.7935 - loss: 0.6482 - val_accuracy: 0.9588 - val_loss: 0.1384
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9418 - loss: 0.1941 - val_accuracy: 0.9689 - val_loss: 0.1030
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9553 - loss: 0.1523 - val_accuracy: 0.9735 - val_loss: 0.0934
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 13ms/step - accuracy: 0.9614 - loss: 0.1262 - val_accuracy: 0.9709 - val_loss: 0.0981
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 14ms/step - accuracy: 0.9675 - loss: 0.1071 - val_accuracy: 0.9758 - val_loss: 0.0833
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9721 - loss: 0.0885
Test accuracy with dropout layers: 0.9757
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9721 - loss: 0.0885
Loaded model with dropout layers test accuracy: 0.9757


# Exercise 6: Batch Normalization

- **Add Batch Normalization**: Incorporate batch normalization layers into the network and observe any changes in training stability and speed.
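
The core of batch normalization is a per-feature standardization over the current mini-batch, followed by a learned scale and shift. The sketch below shows the training-time forward pass for a single feature in pure Python. The function name `batch_norm` and the sample values are assumptions for illustration; `gamma` and `beta` stand in for the learned parameters.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Training-time batch normalization for one feature.

    batch holds that feature's value for every example in the mini-batch.
    eps guards against division by zero when the batch variance is tiny.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]


pre_activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(pre_activations)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
print(f"mean={out_mean:.4f}, variance={out_var:.4f}")
```

At inference time the layer uses moving averages of the mean and variance collected during training instead of the statistics of the current batch, which this sketch does not model.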

In [3]:
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model with batch normalization layers
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.BatchNormalization())  # Batch normalization layer
model.add(layers.Dense(256, activation='relu'))
model.add(layers.BatchNormalization())  # Another batch normalization layer
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with batch normalization layers
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with batch normalization layers
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with batch normalization layers: {test_acc:.4f}')

# Save the model with batch normalization layers
model.save('mnist_model_with_batch_normalization.h5')

# Load the model with batch normalization layers (optional, for verification)
loaded_model_batch_norm = keras.models.load_model('mnist_model_with_batch_normalization.h5')

# Verify the loaded model with batch normalization layers
loaded_test_loss_batch_norm, loaded_test_acc_batch_norm = loaded_model_batch_norm.evaluate(x_test, y_test)
print(f'Loaded model with batch normalization layers test accuracy: {loaded_test_acc_batch_norm:.4f}')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 13s 14ms/step - accuracy: 0.9039 - loss: 0.3118 - val_accuracy: 0.9682 - val_loss: 0.1071
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 20s 13ms/step - accuracy: 0.9737 - loss: 0.0856 - val_accuracy: 0.9707 - val_loss: 0.0951
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 14ms/step - accuracy: 0.9809 - loss: 0.0597 - val_accuracy: 0.9720 - val_loss: 0.0953
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 20s 14ms/step - accuracy: 0.9857 - loss: 0.0424 - val_accuracy: 0.9728 - val_loss: 0.1052
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 20s 13ms/step - accuracy: 0.9894 - loss: 0.0337 - val_accuracy: 0.9723 - val_loss: 0.0973
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9695 - loss: 0.1060
Test accuracy with batch normalization layers: 0.9749
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9695 - loss: 0.1060
Loaded model with batch normalization layers test accuracy: 0.9749


# Exercise 7: Data Augmentation

- **Apply Data Augmentation**: Implement data augmentation techniques like random rotations, shifts, and flips to the training dataset. Evaluate the model's robustness to these transformations.
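
To show what a shift augmentation does at the pixel level, the sketch below moves a tiny image left or right by a whole number of pixels. It is a simplified stand-in, not how `ImageDataGenerator` works internally: the helper name `shift_horizontal`, the integer shift, and the zero fill for vacated pixels are all assumptions made for this example.

```python
def shift_horizontal(image, dx, fill=0.0):
    """Shift a 2-D image (list of rows) dx pixels right; negative dx shifts left.

    Pixels that move past the edge are dropped and vacated pixels get `fill`.
    """
    width = len(image[0])
    shifted = []
    for row in image:
        if dx >= 0:
            new_row = [fill] * min(dx, width) + row[:max(width - dx, 0)]
        else:
            new_row = row[min(-dx, width):] + [fill] * min(-dx, width)
        shifted.append(new_row)
    return shifted


# A 3x3 image with a vertical stroke in the middle column.
image = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]

right = shift_horizontal(image, 1)
left = shift_horizontal(image, -1)

for row in right:
    print(row)
```

The label is unchanged by such a shift, which is why it is a safe augmentation for digits. A horizontal flip, by contrast, can turn a digit into a shape that no longer matches its label, so flips are usually left off for MNIST.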

In [5]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Create an ImageDataGenerator for data augmentation
datagen = ImageDataGenerator(
    rotation_range=10,  # Random rotations
    width_shift_range=0.1,  # Random horizontal shifts
    height_shift_range=0.1,  # Random vertical shifts
    horizontal_flip=False,  # MNIST is not typically flipped horizontally
    zoom_range=0.1  # Random zoom
)

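# Fit the data generator on the training data.
# Note: fit() computes dataset statistics that are only needed for featurewise
# normalization or ZCA whitening; neither is enabled above, so this call is optional.
datagen.fit(x_train)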

# Build the model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

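# Train the model using the data generator.
# Note: validation_data is the test set here, so the val_* values are test-set metrics.
# Note: steps_per_epoch is 60000 // 64 = 937, one fewer than the 938 batches the
# generator yields per pass. This mismatch is the likely reason epochs 2 and 4 in
# the output below process only a single batch; omitting steps_per_epoch lets
# Keras take the epoch length from the generator.
model.fit(datagen.flow(x_train, y_train, batch_size=64),
          epochs=5,
          validation_data=(x_test, y_test),
          steps_per_epoch=len(x_train) // 64)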

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with data augmentation: {test_acc:.4f}')

Epoch 1/5
937/937 ━━━━━━━━━━━━━━━━━━━━ 71s 73ms/step - accuracy: 0.8026 - loss: 0.6142 - val_accuracy: 0.9843 - val_loss: 0.0494
Epoch 2/5
  1/937 ━━━━━━━━━━━━━━━━━━━━ 42s 46ms/step - accuracy: 0.9531 - loss: 0.1369
937/937 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9531 - loss: 0.1369 - val_accuracy: 0.9841 - val_loss: 0.0487
Epoch 3/5
937/937 ━━━━━━━━━━━━━━━━━━━━ 68s 72ms/step - accuracy: 0.9641 - loss: 0.1176 - val_accuracy: 0.9845 - val_loss: 0.0475
Epoch 4/5
937/937 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 1.0000 - loss: 0.0223 - val_accuracy: 0.9854 - val_loss: 0.0473
Epoch 5/5
937/937 ━━━━━━━━━━━━━━━━━━━━ 70s 74ms/step - accuracy: 0.9743 - loss: 0.0814 - val_accuracy: 0.9877 - val_loss: 0.0361
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9839 - loss: 0.0453
Test accuracy with data augmentation: 0.9877


# Exercise 8: Learning Rate Scheduling

- **Implement Learning Rate Scheduling**: Use a learning rate scheduler to decrease the learning rate during training. Compare the training process and model performance.
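
The schedule used in this exercise holds the learning rate for the first five epochs and then multiplies it by exp(-0.1) each epoch. The sketch below reproduces that rule with the standard library so the per-epoch values can be inspected without training. The starting rate of 0.001 is taken from the `learning_rate: 0.0010` values in the training log; the loop that feeds each epoch's result into the next is an assumption about how the callback applies the function.

```python
import math


def scheduler(epoch, lr):
    """Keep the rate for epochs 0-4, then decay it by exp(-0.1) per epoch."""
    if epoch < 5:
        return lr
    return lr * math.exp(-0.1)


lr = 0.001  # starting rate, as shown in the training log
schedule = []
for epoch in range(10):
    lr = scheduler(epoch, lr)  # each epoch starts from the previous epoch's rate
    schedule.append(lr)

for epoch, value in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: learning rate {value:.6f}")
```

After ten epochs the rate has been decayed five times, ending at about 61% of its starting value, so this is a fairly gentle schedule.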

In [6]:
from tensorflow.keras.callbacks import LearningRateScheduler

# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='relu'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Define a learning rate scheduler
def scheduler(epoch, lr):
    if epoch < 5:
        return lr
    else:
        return float(lr * tf.math.exp(-0.1))  # Ensure the returned value is a Python float

# Create a LearningRateScheduler callback
lr_scheduler = LearningRateScheduler(scheduler)

# Train the model with learning rate scheduling
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, callbacks=[lr_scheduler])

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with learning rate scheduling: {test_acc:.4f}')

# Save the model with learning rate scheduling
model.save('mnist_model_with_lr_scheduling.h5')

# Load the model with learning rate scheduling (optional, for verification)
loaded_model_lr_sched = keras.models.load_model('mnist_model_with_lr_scheduling.h5')

# Verify the loaded model with learning rate scheduling
loaded_test_loss_lr_sched, loaded_test_acc_lr_sched = loaded_model_lr_sched.evaluate(x_test, y_test)
print(f'Loaded model with learning rate scheduling test accuracy: {loaded_test_acc_lr_sched:.4f}')

Epoch 1/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 13s 15ms/step - accuracy: 0.8864 - loss: 0.3880 - val_accuracy: 0.9586 - val_loss: 0.1290 - learning_rate: 0.0010
Epoch 2/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9728 - loss: 0.0871 - val_accuracy: 0.9703 - val_loss: 0.0997 - learning_rate: 0.0010
Epoch 3/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.9834 - loss: 0.0523 - val_accuracy: 0.9692 - val_loss: 0.0993 - learning_rate: 0.0010
Epoch 4/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9881 - loss: 0.0372 - val_accuracy: 0.9758 - val_loss: 0.0832 - learning_rate: 0.0010
Epoch 5/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9923 - loss: 0.0236 - val_accuracy: 0.9746 - val_loss: 0.0951 - learning_rate: 0.0010
Epoch 6/10
750/750 ━━━━━━━━━━━━━━━━━━━━
Test accuracy with learning rate scheduling: 0.9841
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9801 - loss: 0.0960
Loaded model with learning rate scheduling test accuracy: 0.9841


# Exercise 9: Change the Loss Function

- **Use Different Loss Functions**: Experiment with different loss functions such as Mean Squared Error (MSE) and observe how the choice of loss function affects model training and performance.
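
Before comparing runs, it helps to see how the two losses score the same prediction. The sketch below evaluates categorical crossentropy and mean squared error for a single one-hot target in pure Python. The three-class example and its probabilities are made up for illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Negative log-probability assigned to the true class (one-hot y_true)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


def mean_squared_error(y_true, y_pred):
    """Average squared difference across all output units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 0.0, 0.0]  # one-hot target: class 0
y_pred = [0.7, 0.2, 0.1]  # hypothetical softmax output

cce = categorical_crossentropy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

print(f"categorical crossentropy: {cce:.4f}")
print(f"mean squared error:       {mse:.4f}")
```

The two losses live on different scales, so the raw loss values from a crossentropy run and an MSE run cannot be compared directly; accuracy is the fairer yardstick between them.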

In [7]:
# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='relu'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with categorical crossentropy
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with categorical crossentropy
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy with categorical crossentropy: {test_acc:.4f}')

# Change the loss function to Mean Squared Error (MSE).
# Note: recompiling does not reset the weights, so the MSE run below continues
# from the model already trained with categorical crossentropy. Rebuild the
# model first to compare the two losses from the same starting point.
model.compile(optimizer='adam',
              loss='mean_squared_error',  # Change to MSE
              metrics=['accuracy'])

# Train the model with MSE loss function
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model with MSE loss function
test_loss_mse, test_acc_mse = model.evaluate(x_test, y_test)
print(f'Test accuracy with Mean Squared Error loss: {test_acc_mse:.4f}')

# Save the model with MSE loss function
model.save('mnist_model_with_mse_loss.h5')

# Load the model with MSE loss function (optional, for verification)
loaded_model_mse = keras.models.load_model('mnist_model_with_mse_loss.h5')

# Verify the loaded model with MSE loss function
loaded_test_loss_mse, loaded_test_acc_mse = loaded_model_mse.evaluate(x_test, y_test)
print(f'Loaded model with MSE loss test accuracy: {loaded_test_acc_mse:.4f}')

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 13ms/step - accuracy: 0.8820 - loss: 0.3873 - val_accuracy: 0.9663 - val_loss: 0.1100
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9748 - loss: 0.0828 - val_accuracy: 0.9711 - val_loss: 0.0979
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 13ms/step - accuracy: 0.9835 - loss: 0.0520 - val_accuracy: 0.9732 - val_loss: 0.0948
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.9879 - loss: 0.0377 - val_accuracy: 0.9754 - val_loss: 0.0848
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.9922 - loss: 0.0249 - val_accuracy: 0.9776 - val_loss: 0.0800
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9751 - loss: 0.0846
Test accuracy with categorical crossentropy: 0.9791
Epoch 1/5
750/750
Test accuracy with Mean Squared Error loss: 0.9691
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9634 - loss: 0.0060
Loaded model with MSE loss test accuracy: 0.9691


# Exercise 10: Evaluate with Different Metrics

- **Implement Precision, Recall, and F1-Score**: Extend the evaluation metrics to include precision, recall, and F1-score. Analyze the model's performance using these metrics.
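
To make the support-weighted averaging of these metrics less of a black box, the sketch below computes weighted precision, recall, and F1 by hand for a tiny multi-class example. The helper name `weighted_scores` and the six sample labels are hypothetical; returning 0.0 when a class has no predictions is a simplifying assumption of this sketch.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes in y_true."""
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n       # class share of the dataset
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision, recall, f1 = weighted_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```

Support-weighted recall is algebraically identical to accuracy, so a recall figure that matches the test accuracy is expected rather than a coincidence. Macro averaging (an unweighted mean over classes) would give a view that is more sensitive to weak classes.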

In [3]:
from sklearn.metrics import precision_score, recall_score, f1_score

# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape((60000, 28 * 28)).astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28)).astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(256, activation='relu'))  # Additional hidden layer with 256 neurons
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')

# Get predictions
y_pred = model.predict(x_test)

# Convert predictions and true labels to class indices
y_pred_classes = tf.argmax(y_pred, axis=1).numpy()
y_true_classes = tf.argmax(y_test, axis=1).numpy()

# Calculate precision, recall, and F1-score
precision = precision_score(y_true_classes, y_pred_classes, average='weighted')
recall = recall_score(y_true_classes, y_pred_classes, average='weighted')
f1 = f1_score(y_true_classes, y_pred_classes, average='weighted')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1-Score: {f1:.4f}')

# Save the model
model.save('mnist_model_with_metrics.h5')

# Load the model (optional, for verification)
loaded_model_metrics = keras.models.load_model('mnist_model_with_metrics.h5')

# Verify the loaded model
loaded_test_loss_metrics, loaded_test_acc_metrics = loaded_model_metrics.evaluate(x_test, y_test)
print(f'Loaded model test accuracy: {loaded_test_acc_metrics:.4f}')

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 11s 13ms/step - accuracy: 0.8837 - loss: 0.3939 - val_accuracy: 0.9650 - val_loss: 0.1183
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.9727 - loss: 0.0855 - val_accuracy: 0.9757 - val_loss: 0.0869
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9833 - loss: 0.0515 - val_accuracy: 0.9732 - val_loss: 0.0900
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 14ms/step - accuracy: 0.9886 - loss: 0.0351 - val_accuracy: 0.9766 - val_loss: 0.0821
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 19s 12ms/step - accuracy: 0.9913 - loss: 0.0258 - val_accuracy: 0.9745 - val_loss: 0.1027
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9730 - loss: 0.0977
Test accuracy: 0.9769
313/313 ━━━━━━━━━━━━━━━━━━━━
Precision: 0.9770
Recall: 0.9769
F1-Score: 0.9769
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9730 - loss: 0.0977
Loaded model test accuracy: 0.9769


# Conclusion

These exercises will help you to understand the impact of different neural network architectures, optimization techniques, and evaluation metrics on the model's performance.