# Problem statement: Create a classification model for the Fashion MNIST

The objective is to create a classification model for the Fashion MNIST dataset using a Multi-Layer Perceptron (MLP).

We'll follow these steps:

### 1. Data Preprocessing
- **Loading the Data**: Fashion MNIST is a dataset of Zalando's article images, with 60,000 training samples and 10,000 test samples. Each sample is a 28x28 grayscale image, associated with a label from 10 classes.
- **Normalization**: We normalize the pixel values (ranging from 0 to 255) to a scale of 0 to 1. This improves the training efficiency.
- **Reshaping for MLP**: Since we are using an MLP, we need to reshape the 28x28 images into a flat array of 784 pixels.

### 2. Building the MLP Model
- **Dense Layers**: These are fully connected neural layers. The first layer needs to know the input shape (784 in this case).
- **Activation Functions**: 'ReLU' is used for non-linear transformations. The final layer uses 'softmax' for a probability distribution over 10 classes.

### 3. Compiling the Model
- **Optimizer**: 'Adam' is a popular choice for its adaptive learning rate properties.
- **Loss Function**: 'sparse_categorical_crossentropy' is suitable for multi-class classification problems.
- **Metrics**: We'll use 'accuracy' to understand the performance.

### 4. Training the Model
- We train the model using the `fit` method, specifying epochs and batch size.

### 5. Evaluating the Model
- The `evaluate` method is used to test the model on the test set.

The notebook contains one exercise in total:

* [Exercise 1](#ex_1)

In [18]:
%pip install numpy pandas scikit-learn matplotlib seaborn jupyter tensorflow-macos tensorflow-metal

Defaulting to user installation because normal site-packages is not writeable
Collecting jupyter
  Using cached jupyter-1.1.1-py2.py3-none-any.whl (2.7 kB)
Collecting notebook
  Using cached notebook-7.3.2-py3-none-any.whl (13.2 MB)
Collecting jupyterlab
  Using cached jupyterlab-4.3.5-py3-none-any.whl (11.7 MB)
Collecting nbconvert
  Using cached nbconvert-7.16.6-py3-none-any.whl (258 kB)
Collecting ipywidgets
  Using cached ipywidgets-8.1.5-py3-none-any.whl (139 kB)
Collecting jupyter-console
  Using cached jupyter_console-6.6.3-py3-none-any.whl (24 kB)
Collecting jupyterlab-widgets~=3.0.12
  Downloading jupyterlab_widgets-3.0.13-py3-none-any.whl (214 kB)
[K     |████████████████████████████████| 214 kB 1.7 MB/s eta 0:00:01
[?25hCollecting widgetsnbextension~=4.0.12
  Downloading widgetsnbextension-4.0.13-py3-none-any.whl (2.3 MB)
[K     |████████████████████████████████| 2.3 MB 1.7 MB/s eta 0:00:01
[?25hCollecting jinja2>=3.0.3
  Using cached jinja2-3.1.6-py3-none-any.whl (134 k

In [19]:
%pip install tensorflow-macos tensorflow-metal

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.[0m
Note: you may need to restart the kernel to use updated packages.


In [22]:
# Import necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical

# Load the dataset
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Normalize the images to [0, 1] to improve training convergence
train_images = train_images / 255.0
test_images = test_images / 255.0

# Reshape data for MLP input
# Reshapes each 28x28 image into a 1D array of 784 pixels, 
# making it compatible with the input layer of the MLP model.
train_images = train_images.reshape((-1, 28*28))
test_images = test_images.reshape((-1, 28*28))

# Build the MLP model
# Layer 1: Fully connected (Dense) layer with 128 neurons and ReLU activation.
#   - This layer learns high-level features from the input.
# Layer 2 (Output layer): Fully connected layer with 10 neurons (one for each class) and softmax activation.
#   - Softmax ensures the output values sum to 1, giving probabilities for each class.
# Breaking down what is going on:
# 	1.	Sequential Model:
#       Sequential() defines a stack of layers where data flows one layer at a time.
#           It’s a simple way to build feedforward neural networks.
#   2. First Layer (Hidden Layer)
# 	    Dense Layer: Fully connected layer where every neuron is connected to all previous layer neurons.
#       128 neurons: This means the layer has 128 nodes to learn patterns.
#       Activation Function (relu):
#           “ReLU” stands for Rectified Linear Unit, which introduces non-linearity into the model.
#           Formula:  f(x) = \max(0, x)  (any negative values become 0).
#           Why? It helps the network learn complex patterns in the data while avoiding issues 
#           like the vanishing gradient problem.
#       Input Shape (784,):
#           The input is a 1D array with 784 values (28×28 image pixels).
#           The input shape is only defined in the first layer.
#   3. Second layer (Output Layer)
#       Dense Layer: Another fully connected layer.
#       10 neurons: One neuron for each of the 10 clothing categories in Fashion MNIST.
#       Activation Function (softmax):
#           Converts raw output numbers into probabilities.
#           Each neuron outputs a value between 0 and 1 representing the likelihood of an image belonging to a particular class.
#           Example: If an image is of a shirt, the softmax output might look like:
#           [0.1, 0.05, 0.02, 0.7, 0.03, 0.05, 0.01, 0.02, 0.01, 0.01]
#           Here, the highest probability (0.7) corresponds to the “Shirt” class.

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

# Compile the model
# Optimizer: Adam (adaptive learning rate optimizer).
# Loss function: sparse_categorical_crossentropy 
# (used because the labels are integers instead of one-hot encoded vectors).
# Metric: Accuracy (used to track model performance).
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
# What is an epoch?
# An epoch represents one complete pass through the entire training dataset.
#Imagine training is like learning from a textbook:
#	•	One Epoch = Reading the entire textbook once.
#	•	Multiple Epochs = Re-reading the textbook multiple times to improve understanding.
#In machine learning:
#	•	If we set epochs=10, it means the model will go through the entire dataset 10 times.
#	•	Each epoch updates the model, gradually improving accuracy.
#How to Choose the Number of Epochs?
#	•	Too few epochs → Model may underfit, meaning it hasn’t learned enough.
#	•	Too many epochs → Model may overfit, meaning it memorizes training data but performs poorly on new data.
#Common approach:
#	1.	Start with 10-50 epochs.
#	2.	Monitor accuracy and loss: Stop training if accuracy stops improving.
#	3.	Use “Early Stopping”: Stop training automatically when performance plateaus.
#What is batch size?
# Instead of training on the entire dataset at once, we split data into small groups (batches) to speed up learning.
#	•	Batch size = Number of samples processed before model updates weights.
#	•	Example:
#	•	    Dataset: 60,000 images
#	•	    Batch size = 64
#	•	    Training will process 64 images at a time, 
#           then adjust model weights before moving to the next batch.
#How to Choose the Batch Size?
#	•	Small batch (e.g., 32, 64) → More updates per epoch (better generalization, but slower training).
#	•	Large batch (e.g., 128, 256, 512) → Faster training (but may generalize worse).
#	•	Typical values: 32, 64, 128.
#	•	Power of 2 (e.g., 32, 64) is often used for faster computation.

model.fit(train_images, train_labels, epochs=10, batch_size=64)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Epoch 1/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 5ms/step - accuracy: 0.7528 - loss: 0.7212
Epoch 2/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8399 - loss: 0.4664
Epoch 3/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8466 - loss: 0.4438
Epoch 4/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8488 - loss: 0.4324
Epoch 5/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8488 - loss: 0.4356
Epoch 6/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8474 - loss: 0.4354
Epoch 7/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8493 - loss: 0.4352
Epoch 8/10
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 5ms/step - accuracy: 0.8493 - loss: 0.4406
Epoch 9/10
[1m938/938[0m [32m━━━━━━━━

![Train and Accuracy over Epochs](Training_and_Test_Accuracy_Over_Epochs.png)

To improve the model's accuracy on the Fashion MNIST dataset, we can experiment with various techniques. Here are some strategies:

1. **Increase Model Complexity**: Add more layers or increase the number of neurons in each layer to capture more complex patterns in the data.

2. **Regularization**: Implement dropout or L1/L2 regularization to reduce overfitting.

3. **Advanced Optimizers**: Experiment with different optimizers like SGD or RMSprop.

4. **Learning Rate Scheduling**: Adjust the learning rate during training.

5. **Data Augmentation**: Although not typical for MLPs, slight modifications to the input data can make the model more robust.

6. **Early Stopping**: Stop training when the validation accuracy stops improving.

7. **Hyperparameter Tuning**: Experiment with different activation functions, batch sizes, and epochs.

8. **Batch Normalization**: This can help in faster convergence and overall performance improvement.

Let's modify the previous code to incorporate some of these strategies.

In [29]:
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

# Modified MLP model
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
# 1. Batch Normalization
#	•   Applied after a dense layer.
#	•	Normalizes activations (makes sure neuron outputs have mean ~0 and standard deviation ~1).
#	•	Stabilizes training: Prevents exploding/vanishing gradients.
#	•	Speeds up learning by reducing internal covariate shift (shifts in data distribution during training).
model.add(BatchNormalization())  # Batch normalization layer
# 2. Dropout
#	•	Randomly disables 50% of neurons during training (Dropout(0.5)).
#	•	Forces the model to learn redundant representations, improving generalization.
#	•	Prevents overfitting, ensuring neurons don’t become too dependent on specific features.
model.add(Dropout(0.5))         # Dropout layer
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())  # Another batch normalization layer
model.add(Dropout(0.5))         # Another dropout layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='SGD', # changed from adam to SGD
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Train the model with validation split
model.fit(train_images, train_labels, epochs=50, batch_size=64,
          validation_split=0.2, callbacks=[early_stopping])

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Epoch 1/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 7ms/step - accuracy: 0.4989 - loss: 1.5956 - val_accuracy: 0.7855 - val_loss: 0.6010
Epoch 2/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - accuracy: 0.7091 - loss: 0.8304 - val_accuracy: 0.7999 - val_loss: 0.5559
Epoch 3/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - accuracy: 0.7403 - loss: 0.7280 - val_accuracy: 0.8083 - val_loss: 0.5359
Epoch 4/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 7ms/step - accuracy: 0.7560 - loss: 0.6839 - val_accuracy: 0.8123 - val_loss: 0.5247
Epoch 5/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - accuracy: 0.7660 - loss: 0.6613 - val_accuracy: 0.8142 - val_loss: 0.5164
Epoch 6/50
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 8ms/step - accuracy: 0.7723 - loss: 0.6505 - val_accuracy: 0.8175 - val_loss: 0.5074
Epoch 7/50
[1m750/750[0m 

# **Difference Between the Two Approaches**

Your new model includes **Batch Normalization** and **Dropout**, which were missing in the previous approach. These additions improve the model’s ability to **generalize** and **stabilize learning**.

---

## **Key Differences**

| Feature | **Previous Model** | **New Model** |
|---------|------------------|--------------|
| **Hidden Layer Neurons** | 128 | 256 → 128 |
| **Batch Normalization** | ❌ No | ✅ Yes (improves training stability) |
| **Dropout** | ❌ No | ✅ Yes (prevents overfitting) |
| **Regularization Impact** | Overfits more easily | More resistant to overfitting |
| **Training Stability** | Can have unstable gradients | Normalized activations, stable learning |
| **Generalization** | Good on training data, but may overfit | Better generalization to test data |
| **Test Accuracy** | 0.830299973487854 | 0.8337000012397766 |

New model is slightly better - the comments below are wrong

The test accuracy decreased slightly in this case. This outcome highlights an important aspect of machine learning: improvements in model architecture don't always lead to better performance, and sometimes simpler models can outperform more complex ones, especially on smaller datasets like Fashion MNIST.

Here are a few additional steps you can take to try and improve the model's performance:

1. **Adjust the Dropout Rate**: The dropout rate of 0.5 might be too high, causing the model to lose relevant information. Try reducing it to 0.3 or 0.2.

2. **Fine-Tune the Model Complexity**: The addition of more neurons might have made the model too complex. Try reducing the number of neurons in the dense layers.

3. **Experiment with Different Optimizers**: While Adam is a strong general-purpose optimizer, sometimes others like SGD (with a momentum) or RMSprop might yield better results for specific problems.

4. **Modify the Learning Rate**: Adjusting the learning rate of the Adam optimizer could also lead to better results. A lower learning rate with more epochs can sometimes achieve better generalization.

5. **Experiment with Batch Sizes**: Smaller or larger batch sizes can impact the model's ability to generalize and learn effectively.

6. **Cross-Validation**: Instead of a single validation split, use k-fold cross-validation for a more robust estimate of model performance.

Let's adjust the code with some of these suggestions.

In [30]:
# Adjust the model architecture and training parameters
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(10, activation='softmax'))

# Compile the model with a modified optimizer
#commenting out this one to test k-fold
#model.compile(optimizer='RMSprop',  # You can experiment with learning rate here
#              loss='sparse_categorical_crossentropy',
#              metrics=['accuracy'])

# Train the model with a different batch size
#model.fit(train_images, train_labels, epochs=50, batch_size=32,  # Smaller batch size
#          validation_split=0.2, callbacks=[early_stopping])

from sklearn.model_selection import KFold
import numpy as np

# Assuming train_images and train_labels are your data and labels
k = 5  # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Convert data to numpy arrays if not already
train_images = np.array(train_images)
train_labels = np.array(train_labels)

# Initialize lists to store results
fold_accuracies = []

for train_index, val_index in kf.split(train_images):
    # Split data into training and validation sets
    X_train, X_val = train_images[train_index], train_images[val_index]
    y_train, y_val = train_labels[train_index], train_labels[val_index]
    
    # Create a new model instance for each fold
 #   model = create_model()  # Assuming create_model() is a function that returns a compiled model
    model.compile(optimizer='RMSprop',  # You can experiment with learning rate here
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

    # Train the model
    model.fit(X_train, y_train, epochs=50, batch_size=32, 
              validation_data=(X_val, y_val), callbacks=[early_stopping])
    
    # Evaluate the model on the validation data
    val_loss, val_accuracy = model.evaluate(X_val, y_val, verbose=0)
    fold_accuracies.append(val_accuracy)

# Calculate the average accuracy across all folds
average_accuracy = np.mean(fold_accuracies)
print(f"Average Validation Accuracy: {average_accuracy:.4f}")

# Evaluate the model
#test_loss, test_acc = model.evaluate(test_images, test_labels)

#print('Test accuracy:', test_acc)

Epoch 1/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 7ms/step - accuracy: 0.6087 - loss: 1.3309 - val_accuracy: 0.8074 - val_loss: 0.5829
Epoch 2/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 6ms/step - accuracy: 0.7582 - loss: 0.7640 - val_accuracy: 0.7976 - val_loss: 0.5816
Epoch 3/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 6ms/step - accuracy: 0.7722 - loss: 0.6932 - val_accuracy: 0.8115 - val_loss: 0.5368
Epoch 4/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 6ms/step - accuracy: 0.7676 - loss: 0.7486 - val_accuracy: 0.8052 - val_loss: 0.6275
Epoch 5/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 6ms/step - accuracy: 0.7511 - loss: 0.9232 - val_accuracy: 0.8126 - val_loss: 0.6780
Epoch 6/50
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 6ms/step - accuracy: 0.7384 - loss: 1.1989 - val_accuracy: 0.7776 - val_loss: 1.0666
Epoch 7/50
[1m

The test accuracy has improved to 0.8778, which is a positive outcome. This result indicates that the adjustments made to the model architecture and training parameters were beneficial.

However, achieving higher accuracy on a dataset like Fashion MNIST can be challenging, especially with a simple model like a Multi-Layer Perceptron (MLP). To potentially achieve even better results, consider the following additional steps:

1. **Feature Engineering**: Although this is more limited with image data and MLPs, ensuring the input data is as informative and clean as possible is crucial.

2. **Ensemble Methods**: Combine predictions from several models to improve accuracy. For example, train multiple MLPs with different architectures and average their predictions.

3. **Convolutional Neural Networks (CNNs)**: For image data, CNNs are generally more effective than MLPs. They can capture spatial hierarchies in the data better due to their convolutional layers.

4. **Hyperparameter Optimization**: Use techniques like grid search or random search to systematically explore different hyperparameter combinations.

5. **Advanced Regularization Techniques**: Experiment with other regularization methods like L1 regularization or different dropout configurations.

Let's adjust the code with some of these suggestions.

In [31]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Reshape data for CNN input
train_images_cnn = train_images.reshape((-1, 28, 28, 1))
test_images_cnn = test_images.reshape((-1, 28, 28, 1))

# Build a simple CNN model
cnn_model = Sequential()
cnn_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(64, activation='relu'))
cnn_model.add(Dense(10, activation='softmax'))

# Compile the model
cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Train the model
cnn_model.fit(train_images_cnn, train_labels, epochs=10, batch_size=64,
              validation_split=0.2)

# Evaluate the model
test_loss, test_acc = cnn_model.evaluate(test_images_cnn, test_labels)

print('CNN Test accuracy:', test_acc)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 12ms/step - accuracy: 0.7186 - loss: 0.7679 - val_accuracy: 0.8613 - val_loss: 0.3888
Epoch 2/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 12ms/step - accuracy: 0.8681 - loss: 0.3716 - val_accuracy: 0.8672 - val_loss: 0.3689
Epoch 3/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 13ms/step - accuracy: 0.8794 - loss: 0.3341 - val_accuracy: 0.8850 - val_loss: 0.3209
Epoch 4/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 11ms/step - accuracy: 0.8940 - loss: 0.2981 - val_accuracy: 0.8818 - val_loss: 0.3236
Epoch 5/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 11ms/step - accuracy: 0.9005 - loss: 0.2724 - val_accuracy: 0.8917 - val_loss: 0.3083
Epoch 6/10
[1m750/750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 11ms/step - accuracy: 0.9016 - loss: 0.2705 - val_accuracy: 0.8758 - val_loss: 0.3528
Epoch 7/10
[1m750/7

<a name="ex_1"></a>
## Exercise 1: Improve the accuracy of the MLP model
1. Try different architectures and hyperparameters.
2. Use regularization techniques like L1 or L2 regularization.
3. Use dropout to reduce overfitting.

Referans link: https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/

## 1. Try different architectures and hyperparameters

| Modification                          | What It Does?                                      | Expected Effect                                      |
|---------------------------------------|----------------------------------------------------|------------------------------------------------------|
| Increase the number of neurons per layer | Allows the model to learn more complex patterns     | Can improve accuracy, but may cause overfitting      |
| Add more hidden layers                | Increases model depth, capturing higher-level features | Improves learning, but training takes longer         |
| Use different activation functions (ReLU, LeakyReLU, ELU) | Changes how neurons activate, affecting gradient flow | Helps avoid dead neurons and improves learning       |
| Change optimizer (Adam, SGD with momentum, RMSprop) | Affects how weights are updated during training     | Can improve convergence speed & accuracy             |
| Use different learning rates          | Controls how fast the model updates weights         | Too high = unstable; too low = slow training         |

### Modified model with better architecture and hyperparameters

In [32]:
# Import necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam

# Define a deeper and wider MLP model
model = Sequential()

# Input layer + First hidden layer with 512 neurons
model.add(Dense(512, activation='relu', input_shape=(784,)))  # Increased neurons
model.add(BatchNormalization())  # Helps stabilize training by normalizing activations
model.add(Dropout(0.3))  # Drops 30% of neurons randomly to prevent overfitting

# Second hidden layer with 256 neurons
model.add(Dense(256, activation='relu'))  # More neurons than before
model.add(BatchNormalization())  
model.add(Dropout(0.3))

# Third hidden layer with 128 neurons
model.add(Dense(128, activation='relu'))  # Smaller layer for gradual reduction
model.add(BatchNormalization())
model.add(Dropout(0.3))

# Output layer (10 classes for Fashion MNIST)
model.add(Dense(10, activation='softmax'))  # Softmax ensures output probabilities sum to 1

# Compile the model with Adam optimizer and a tuned learning rate
model.compile(optimizer=Adam(learning_rate=0.001),  # Lower learning rate for smoother convergence
              loss='sparse_categorical_crossentropy',  # Suitable for integer labels
              metrics=['accuracy'])

# Summary of the model
model.summary()

✅ Key Improvements:

	•	Increased neurons (512 → 256 → 128) for richer feature extraction.

	•	Batch Normalization: Stabilizes learning and speeds up training.

	•	Dropout (0.3): Prevents overfitting.

	•	Adam optimizer with learning_rate=0.001 for efficient weight updates.
	

## Use L1 & L2 Regularisation (With penalty)

| Regularization Type | What It Does?                                   | Effect                                           |
|---------------------|-------------------------------------------------|--------------------------------------------------|
| L1 Regularization   | Encourages sparse weights (some weights become 0) | Can remove unnecessary connections                |
| L2 Regularization   | Penalizes large weights, forcing smaller values | Reduces overfitting by preventing extreme weight values |

In [33]:
# Import necessary libraries
from tensorflow.keras.regularizers import l2

# Define the model with L2 regularization
model = Sequential()

# First hidden layer with L2 regularization
model.add(Dense(512, activation='relu', input_shape=(784,), kernel_regularizer=l2(0.001)))  # L2 penalty of 0.001
model.add(BatchNormalization())  # Stabilizes training
model.add(Dropout(0.3))  # Helps prevent overfitting

# Second hidden layer with L2 regularization
model.add(Dense(256, activation='relu', kernel_regularizer=l2(0.001)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# Third hidden layer with L2 regularization
model.add(Dense(128, activation='relu', kernel_regularizer=l2(0.001)))
model.add(BatchNormalization())
model.add(Dropout(0.3))

# Output layer
model.add(Dense(10, activation='softmax'))  # Softmax for multi-class classification

# Compile the model with Adam optimizer
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Summary of the model
model.summary()

✅ Key Improvements:

	•	L2 Regularization (l2(0.001)): Forces smaller weights, reducing overfitting.

	•	Batch Normalization: Stabilizes learning.
    
	•	Dropout (0.3): Maintains regularization effect.

## 3. Use Dropout to Reduce Overfitting

🔹 What is Dropout?
	•	Dropout randomly deactivates neurons during training.
	•	This prevents the model from relying too much on specific neurons, forcing it to generalize better.

🔹 Model with Higher Dropout for Stronger Regularization

In [34]:
# Define the model with stronger dropout
model = Sequential()

# First hidden layer with 512 neurons and 50% dropout
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(BatchNormalization())
model.add(Dropout(0.5))  # Drops 50% of neurons randomly

# Second hidden layer with 256 neurons and 50% dropout
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Third hidden layer with 128 neurons and 50% dropout
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Output layer
model.add(Dense(10, activation='softmax'))  

# Compile the model with Adam optimizer
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Summary of the model
model.summary()

✅ Key Improvements:
	•	Higher dropout (0.5): Forces more neurons to be inactive, reducing overfitting.
	•	Batch Normalization: Keeps training stable despite high dropout.

### 4. Train & Evaluate the Model

In [35]:
# Train the model on the Fashion MNIST dataset
history = model.fit(train_images, train_labels, 
                    epochs=20, batch_size=64, 
                    validation_data=(test_images, test_labels))

# Evaluate the model performance on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test Accuracy: {test_acc:.4f}")

Epoch 1/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 18ms/step - accuracy: 0.6436 - loss: 1.0821 - val_accuracy: 0.7997 - val_loss: 0.5755
Epoch 2/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 18ms/step - accuracy: 0.7789 - loss: 0.6300 - val_accuracy: 0.8189 - val_loss: 0.5242
Epoch 3/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 17ms/step - accuracy: 0.7948 - loss: 0.5921 - val_accuracy: 0.8241 - val_loss: 0.5030
Epoch 4/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 17ms/step - accuracy: 0.7985 - loss: 0.5765 - val_accuracy: 0.8260 - val_loss: 0.4969
Epoch 5/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 17ms/step - accuracy: 0.7971 - loss: 0.5823 - val_accuracy: 0.8270 - val_loss: 0.4942
Epoch 6/20
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 18ms/step - accuracy: 0.8006 - loss: 0.5783 - val_accuracy: 0.8194 - val_loss: 0.5064
Epoch 7/20
[1m9

✅ Training Setup:
	•	Epochs = 20: More epochs allow the model to learn better.
	•	Batch size = 64: A balanced choice for speed & generalization.
	•	Validation set: Helps monitor test accuracy during training.

## Expected Results

🔹 Higher test accuracy (~84-86%) due to:

	•	Deeper architecture (more layers & neurons).

	•	Regularization (L2, Dropout, Batch Norm) prevents overfitting.

	•	Tuned learning rate (Adam optimizer with 0.001).

![Training Accuracy Comparison](Training_Accuracy_Comparison.png)