# Task 3 – Neural Networks (35%)

## Part 1 (25 marks)

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Define constants
IMAGE_SIZE = (176, 208)  # (height, width) in pixels; assumed image dimensions
BATCH_SIZE = 32
NUM_CLASSES = 4
EPOCHS = 10

# Data preprocessing: rescale pixel values from [0, 255] to [0, 1]
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# One sub-directory per class; labels are one-hot encoded (class_mode='categorical')
# and images are loaded as 3-channel RGB, the flow_from_directory default
train_generator = train_datagen.flow_from_directory(
    directory='train',
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical'
)

test_generator = test_datagen.flow_from_directory(
    directory='test',
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical'
)

# Simple (fully connected) neural network: flatten each image into a vector,
# then one hidden layer and a softmax output over the classes.
# layers.Input is used instead of input_shape=..., which Keras 3 warns about.
model_simple = models.Sequential([
    layers.Input(shape=(*IMAGE_SIZE, 3)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

model_simple.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])

# Note: the test set is also used as validation data, so val_accuracy
# reports test-set performance after every epoch
history_simple = model_simple.fit(train_generator,
                                  epochs=EPOCHS,
                                  validation_data=test_generator)

# Convolutional neural network: three Conv2D + MaxPooling2D stages
# extract spatial features before the dense classifier
model_cnn = models.Sequential([
    layers.Input(shape=(*IMAGE_SIZE, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

model_cnn.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

history_cnn = model_cnn.fit(train_generator,
                            epochs=EPOCHS,
                            validation_data=test_generator)
```



```
Found 5121 images belonging to 4 classes.
Found 1279 images belonging to 4 classes.
Epoch 1/10
161/161 ━━━━━━━━━━━━━━━━━━━━ 49s 256ms/step - accuracy: 0.4285 - loss: 18.9361 - val_accuracy: 0.5262 - val_loss: 4.3823
Epoch 2/10
161/161 ━━━━━━━━━━━━━━━━━━━━ 75s 251ms/step - accuracy: 0.5899 - loss: 2.4801 - val_accuracy: 0.5551 - val_loss: 2.6830
Epoch 3/10
161/161 ━━━━━━━━━━━━━━━━━━━━ 39s 243ms/step - accuracy: 0.6011 - loss: 1.9743 - val_accuracy: 0.5457 - val_loss: 2.7805
Epoch 4/10
 20/161 ━━━━━━━━━━━━━━━━━━━━ 35s 248ms/step - accuracy: 0.6483 - loss: 1.2638
```
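One way to compare the two Part 1 architectures is by their number of trainable parameters. The sketch below recomputes the counts by hand in plain Python, assuming the Keras defaults used in the code ('valid' padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`) and 3-channel 176×208 inputs. The helper names are hypothetical; the totals should agree with what `model_simple.summary()` and `model_cnn.summary()` report.

```python
# Hypothetical helpers: hand-count trainable parameters for the Part 1 models,
# assuming Keras defaults ('valid' padding, stride 1 conv, stride = pool size).

def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a 'valid' (unpadded) stride-1 convolution."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after 'valid' max pooling whose stride equals the pool size."""
    return size // pool


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weight matrix + biases


def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    return kernel * kernel * c_in * c_out + c_out  # kernels + biases


def simple_nn_params(h=176, w=208, c=3, hidden=128, classes=4) -> int:
    flat = h * w * c  # length of the flattened input vector
    return dense_params(flat, hidden) + dense_params(hidden, classes)


def cnn_params(h=176, w=208, c=3, filters=(32, 64, 128), hidden=128, classes=4) -> int:
    total = 0
    for f in filters:
        total += conv_params(c, f)
        # each stage: 3x3 valid conv, then 2x2 max pooling; channels become f
        h, w, c = pool_out(conv_out(h)), pool_out(conv_out(w)), f
    flat = h * w * c
    return total + dense_params(flat, hidden) + dense_params(hidden, classes)


print(f"Simple NN: {simple_nn_params():,} parameters")  # 14,058,116
print(f"CNN:       {cnn_params():,} parameters")        # 7,958,212
```

Almost all of the simple network's parameters sit in its first `Dense` layer (109,824 inputs × 128 units), so the CNN ends up with roughly half as many parameters despite having more layers: pooling shrinks the feature map to 20×24×128 before it is flattened.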
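In the training log, validation accuracy peaks at epoch 2 and dips at epoch 3 while training accuracy keeps rising. A small helper for picking the best epoch from a Keras `History.history` dict (the per-epoch metric lists returned by `model.fit`) could look like this. `best_epoch` is a hypothetical name, and the sample values are copied from the three completed epochs in the log.

```python
# Hypothetical helper: find the best epoch in a Keras-style history dict
# (metric name -> list of per-epoch values).

def best_epoch(history: dict, metric: str = "val_accuracy",
               mode: str = "max") -> tuple[int, float]:
    """Return (1-based epoch number, value) of the best epoch for `metric`."""
    values = history[metric]
    pick = max if mode == "max" else min  # accuracy: higher is better; loss: lower
    best_value = pick(values)
    return values.index(best_value) + 1, best_value


# Values copied from the three completed epochs of the simple network's log
logged = {
    "val_accuracy": [0.5262, 0.5551, 0.5457],
    "val_loss": [4.3823, 2.6830, 2.7805],
}
print(best_epoch(logged))                                 # (2, 0.5551)
print(best_epoch(logged, metric="val_loss", mode="min"))  # (2, 2.683)
```

Once both runs finish, `best_epoch(history_simple.history)` and `best_epoch(history_cnn.history)` could be compared directly. Keras's `EarlyStopping` callback with `restore_best_weights=True` is a built-in alternative that acts on the same metric during training.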

## Part 2 (10 marks)

1. **Effect of Learning Rate:**
   - Learning rate of 0.00000001 (1e-8): This is an extremely small learning rate, so every parameter update is tiny. Within a fixed training budget (such as 10 epochs) the loss would barely decrease, leaving the model close to its random initialization and far from a good solution.
   - Learning rate of 10: This is an extremely large learning rate, so every update takes a very large step. The optimizer is likely to overshoot the minimum, causing the loss to oscillate wildly or diverge, possibly to infinity or NaN.
   - Advantages of a Higher Learning Rate:
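     - Faster convergence: Each update moves the parameters further, so the model can reach a good region of the loss landscape in fewer iterations, as long as the rate stays below the point where training diverges.
     - Works well for simple problems: Higher learning rates may be suitable for simpler problems with smooth, well-conditioned loss landscapes.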
   - Disadvantages of a Higher Learning Rate:
     - Risk of divergence: Too high a learning rate may cause the loss function to diverge or fluctuate wildly, making it difficult to converge to an optimal solution.
     - Overshooting: Large updates can cause the optimizer to overshoot the optimal solution, leading to oscillations or instability.
   - Advantages of a Lower Learning Rate:
     - Stability: Lower learning rates typically result in more stable training with smaller updates, reducing the risk of divergence.
     - Precision: Smaller updates allow the optimizer to fine-tune the parameters more precisely, potentially leading to better convergence.
   - Disadvantages of a Lower Learning Rate:
     - Slow convergence: Very low learning rates may lead to slow convergence, requiring more iterations to reach the optimal solution.
     - Prone to getting stuck in local minima: In complex loss landscapes, lower learning rates may cause the optimizer to get stuck in local minima or saddle points.

2. **Effect of Batch Size:**
   - Advantages of a Higher Batch Size:
     - Faster computation: A larger batch processes more samples in parallel and needs fewer parameter updates per epoch, so each epoch usually runs faster, especially on hardware optimized for parallel processing such as GPUs.
     - Smoother gradients: Larger batch sizes tend to produce smoother gradient estimates, which can lead to more stable training and convergence.
   - Disadvantages of a Higher Batch Size:
     - Memory requirements: Activations for every sample in a batch must be held in memory at once, so large batches may exceed the available GPU or system memory or force the use of a smaller model.
     - Generalization performance: Larger batch sizes may lead to poorer generalization. Their gradient estimates are less noisy, and large-batch training has been observed to converge to sharper minima that transfer less well to unseen data.
   - Advantages of a Lower Batch Size:
     - Improved generalization: Smaller batch sizes often generalize better in practice; the noise in their gradient estimates acts as a form of implicit regularization, which can help prevent overfitting.
     - Escaping poor minima: The noisier gradient estimates can help the optimizer escape from sharp local minima and saddle points and explore the parameter space more effectively.
   - Disadvantages of a Lower Batch Size:
     - Noisier convergence: Each update is based on a smaller subset of the training data, so the loss fluctuates more from step to step, and training may need more updates (or a lower learning rate) to settle.
     - Less efficient computation: Smaller batch sizes may lead to less efficient computation, especially on hardware optimized for parallel processing, as fewer samples are processed simultaneously.
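The learning-rate trade-off can be seen on a toy problem. This minimal sketch runs plain gradient descent (not Adam, which the Part 1 models use) on the one-parameter loss f(w) = w², whose gradient is 2w, so each step multiplies w by (1 − 2·lr). The function name, starting point and step count are illustrative assumptions.

```python
# Toy illustration (assumption): plain gradient descent on f(w) = w**2,
# not the Adam optimizer or the networks from Part 1.

def gradient_descent(lr: float, w0: float = 1.0, steps: int = 50) -> float:
    """Run `steps` updates of w <- w - lr * f'(w) with f'(w) = 2*w; return final w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # equivalent to w *= (1 - 2 * lr)
    return w


# lr=1e-8 barely moves w, lr=0.1 approaches the minimum at 0, lr=10 blows up
for lr in (1e-8, 0.1, 10):
    w = gradient_descent(lr)
    print(f"lr={lr:g}: final w = {w:.3e}, loss = {w * w:.3e}")
```

With lr = 10 the factor is 1 − 20 = −19, so |w| grows 19-fold per step and the run diverges. With lr = 1e-8 the factor is almost exactly 1, so after 50 steps w has hardly changed. With lr = 0.1 the factor is 0.8 and w shrinks steadily toward the minimum.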
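Two of the batch-size effects can be checked numerically in plain Python. `steps_per_epoch` counts parameter updates per epoch: with the 5,121 training images and `BATCH_SIZE = 32` from Part 1 it gives 161, matching the 161/161 steps in the training log. `minibatch_gradient_spread` estimates how noisy a mini-batch gradient is on a synthetic one-dimensional least-squares problem (y = 3x + noise), which is an assumption made purely for illustration. Both function names are hypothetical.

```python
import math
import random
import statistics


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Parameter updates per epoch; the final batch may be smaller than batch_size."""
    return math.ceil(num_samples / batch_size)


def minibatch_gradient_spread(batch_size: int, num_samples: int = 5000,
                              trials: int = 500, seed: int = 0) -> float:
    """Standard deviation of mini-batch gradient estimates on a toy problem.

    Toy data (illustrative assumption): y = 3x + Gaussian noise.
    Per-sample loss (w*x - y)**2 has gradient 2*(w*x - y)*x with respect to w.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    ys = [3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    w = 0.0  # evaluate every gradient estimate at the same fixed parameter value
    estimates = []
    for _ in range(trials):
        batch = rng.sample(range(num_samples), batch_size)  # without replacement
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        estimates.append(grad)
    return statistics.stdev(estimates)


print(steps_per_epoch(5121, 32))  # 161
for b in (8, 32, 256):
    print(f"batch size {b:>3}: gradient std = {minibatch_gradient_spread(b):.3f}")
```

The printed spread shrinks roughly as 1/√B, which is the sense in which larger batches give smoother gradient estimates while smaller batches inject more noise into each update.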