# Problem statement: Create a classification model for the Fashion MNIST dataset

The objective is to create a classification model for the Fashion MNIST dataset using a Multi-Layer Perceptron (MLP).

We'll follow these steps:

### 1. Data Preprocessing
- **Loading the Data**: Fashion MNIST is a dataset of Zalando's article images, with 60,000 training samples and 10,000 test samples. Each sample is a 28x28 grayscale image, associated with a label from 10 classes.
- **Normalization**: We scale the pixel values from the range 0 to 255 down to the range 0 to 1. Inputs on a small, consistent scale generally help gradient-based training converge faster and more stably.
- **Reshaping for MLP**: Since we are using an MLP, we need to reshape the 28x28 images into a flat array of 784 pixels.
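
The two preprocessing steps above can be sketched on a small synthetic array, so that nothing has to be downloaded. The random `images` array is only an assumed stand-in with the same dtype and per-image shape as the real data; the notebook itself loads the actual dataset with `fashion_mnist.load_data()`.

```python
import numpy as np

# Synthetic stand-in for a batch of Fashion MNIST images: 5 images of
# 28x28 uint8 pixels (assumption: same dtype and shape as the real data).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0].
scaled = images.astype("float32") / 255.0

# Flatten each 28x28 image into a 784-element vector for the MLP.
flat = scaled.reshape((-1, 28 * 28))

print(flat.shape)  # (5, 784)
print(flat.dtype)  # float32
```

Using `-1` for the first dimension lets NumPy infer the number of images, so the same line works for both the training and the test set.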

### 2. Building the MLP Model
- **Dense Layers**: These are fully connected neural layers. The first layer needs to know the input shape (784 in this case).
- **Activation Functions**: 'ReLU' is used for non-linear transformations. The final layer uses 'softmax' for a probability distribution over 10 classes.
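
To make the layer sizes and activations concrete, here is a minimal pure-Python sketch. The helper names (`dense_params`, `relu`, `softmax`) are illustrative and not part of Keras; the layer sizes (784 inputs, 128 hidden units, 10 outputs) are the ones used in the first model of this notebook.

```python
import math

def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

hidden = dense_params(784, 128)  # 784 * 128 + 128 = 100480
output = dense_params(128, 10)   # 128 * 10 + 10 = 1290
print(hidden + output)           # 101770

print(relu([-1.0, 0.5]))         # [0.0, 0.5]
probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                # approximately 1.0
```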

### 3. Compiling the Model
- **Optimizer**: 'Adam' is a popular choice for its adaptive learning rate properties.
- **Loss Function**: 'sparse_categorical_crossentropy' is suitable for multi-class classification when the labels are integer class indices (0 to 9 here) rather than one-hot vectors.
- **Metrics**: We'll use 'accuracy' to understand the performance.
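
As a rough illustration of what the loss and the metric measure, the sketch below computes both by hand for two made-up predictions. The function names and the example probabilities are hypothetical; Keras's own implementation additionally guards against `log(0)`, which this sketch does not.

```python
import math

def sparse_categorical_crossentropy(labels, probabilities):
    """Mean of -log(probability assigned to the true class); labels are integer indices."""
    losses = [-math.log(p[y]) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)

def accuracy(labels, probabilities):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = 0
    for y, p in zip(labels, probabilities):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == y)
    return hits / len(labels)

# Hypothetical example: 2 samples, 3 classes.
labels = [0, 2]
probabilities = [
    [0.7, 0.2, 0.1],  # true class 0 gets probability 0.7
    [0.1, 0.1, 0.8],  # true class 2 gets probability 0.8
]

loss = sparse_categorical_crossentropy(labels, probabilities)
acc = accuracy(labels, probabilities)
print(round(loss, 4), acc)
```

The loss shrinks as the probability assigned to the correct class approaches 1, whereas accuracy only checks whether the correct class has the highest probability.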

### 4. Training the Model
- We train the model using the `fit` method, specifying epochs and batch size.
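
The number of steps that appears in each epoch's progress bar follows from the training-set size, the batch size, and any validation split. The helper below is a sketch of that arithmetic (the function name is illustrative, not a Keras API); the four calls use the settings of the training runs in this notebook.

```python
import math

def steps_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of batches per epoch after holding out a validation fraction."""
    n_val = int(n_samples * validation_split)
    n_train = n_samples - n_val
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(n_train / batch_size)

print(steps_per_epoch(60000, 64))                         # 938
print(steps_per_epoch(60000, 64, validation_split=0.2))   # 750
print(steps_per_epoch(60000, 32, validation_split=0.2))   # 1500
print(steps_per_epoch(60000, 128, validation_split=0.2))  # 375
```

These values match the `938/938`, `750/750`, `1500/1500`, and `375/375` counters in the training logs.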

### 5. Evaluating the Model
- The `evaluate` method is used to test the model on the test set.

The notebook contains one exercise in total:

* [Exercise 1](#ex_1)

In [1]:
# Import necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten  # Flatten is used by the CNN in a later cell
from tensorflow.keras.datasets import fashion_mnist

# Load the dataset
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Normalize the images to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

# Reshape data for MLP input
train_images = train_images.reshape((-1, 28*28))
test_images = test_images.reshape((-1, 28*28))

# Build the MLP model
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=64)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7648 - loss: 0.6796
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8575 - loss: 0.3995
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8724 - loss: 0.3561
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8815 - loss: 0.3243
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8859 - loss: 0.3128
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8950 - loss: 0.2918
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8982 - loss: 0.2775
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9028 - loss: 0.2645
Epoch 9/10
938/938 [output truncated]

To improve the model's accuracy on the Fashion MNIST dataset, we can experiment with various techniques. Here are some strategies:

1. **Increase Model Complexity**: Add more layers or increase the number of neurons in each layer to capture more complex patterns in the data.

2. **Regularization**: Implement dropout or L1/L2 regularization to reduce overfitting.

3. **Alternative Optimizers**: Experiment with other optimizers such as SGD with momentum or RMSprop.

4. **Learning Rate Scheduling**: Adjust the learning rate during training.

5. **Data Augmentation**: Although not typical for MLPs, slight modifications to the input data can make the model more robust.

6. **Early Stopping**: Stop training when the validation accuracy stops improving.

7. **Hyperparameter Tuning**: Experiment with different activation functions, batch sizes, and epochs.

8. **Batch Normalization**: This can help in faster convergence and overall performance improvement.
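
Of the strategies above, early stopping is the easiest to illustrate without a framework. The sketch below mimics patience-based stopping on a list of validation losses. The function name and the loss values are hypothetical, and Keras's `EarlyStopping` callback offers more options (for example `min_delta` and `restore_best_weights`) than this simplified logic.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 3.
val_losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.42]

print(early_stopping_epoch(val_losses, patience=3))  # 6
print(early_stopping_epoch(val_losses, patience=5))  # None
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs.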

Let's modify the previous code to incorporate some of these strategies.

In [2]:
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

# Modified MLP model
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(BatchNormalization())  # Batch normalization layer
model.add(Dropout(0.5))         # Dropout layer
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())  # Another batch normalization layer
model.add(Dropout(0.5))         # Another dropout layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Train the model with validation split
model.fit(train_images, train_labels, epochs=50, batch_size=64,
          validation_split=0.2, callbacks=[early_stopping])

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Epoch 1/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6814 - loss: 0.9600 - val_accuracy: 0.8253 - val_loss: 0.4624
Epoch 2/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8048 - loss: 0.5528 - val_accuracy: 0.8497 - val_loss: 0.4080
Epoch 3/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8235 - loss: 0.4981 - val_accuracy: 0.8563 - val_loss: 0.3862
Epoch 4/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8325 - loss: 0.4658 - val_accuracy: 0.8609 - val_loss: 0.3812
Epoch 5/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8405 - loss: 0.4540 - val_accuracy: 0.8478 - val_loss: 0.4071
Epoch 6/50
750/750 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8386 - loss: 0.4532 - val_accuracy: 0.8529 - val_loss: 0.3955
Epoch 7/50
750/750 [output truncated]

The test accuracy decreased slightly in this case. This shows that a more elaborate architecture does not automatically perform better: a simpler model can outperform a more complex one, especially on a relatively small dataset such as Fashion MNIST.

Here are a few additional steps you can take to try and improve the model's performance:

1. **Adjust the Dropout Rate**: The dropout rate of 0.5 might be too high, causing the model to lose relevant information. Try reducing it to 0.3 or 0.2.

2. **Fine-Tune the Model Complexity**: The addition of more neurons might have made the model too complex. Try reducing the number of neurons in the dense layers.

3. **Experiment with Different Optimizers**: While Adam is a strong general-purpose optimizer, others such as SGD with momentum or RMSprop sometimes yield better results on specific problems.

4. **Modify the Learning Rate**: Adjusting the learning rate of the Adam optimizer could also lead to better results. A lower learning rate with more epochs can sometimes achieve better generalization.

5. **Experiment with Batch Sizes**: Smaller or larger batch sizes can impact the model's ability to generalize and learn effectively.

6. **Cross-Validation**: Instead of a single validation split, use k-fold cross-validation for a more robust estimate of model performance.
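
The cross-validation idea can be sketched as index bookkeeping. The generator below is an illustrative, framework-free version (the name `k_fold_indices` is hypothetical); in practice the indices are usually shuffled first, and a library utility such as scikit-learn's `KFold` is a common choice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train, val in folds:
    print(len(train), val)
```

Each sample lands in the validation set exactly once, so averaging the validation score over the folds uses all of the training data for evaluation.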

Let's adjust the code with some of these suggestions.

In [3]:
# Adjust the model architecture and training parameters
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(10, activation='softmax'))

# Compile the model. The optimizer is still Adam with its default learning rate;
# pass an Adam instance with a custom learning_rate to experiment with it.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with a smaller batch size.
# early_stopping is the callback defined in the previous cell.
model.fit(train_images, train_labels, epochs=50, batch_size=32,
          validation_split=0.2, callbacks=[early_stopping])

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Epoch 1/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.6849 - loss: 0.8894 - val_accuracy: 0.8368 - val_loss: 0.4320
Epoch 2/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8290 - loss: 0.4844 - val_accuracy: 0.8487 - val_loss: 0.4156
Epoch 3/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8401 - loss: 0.4458 - val_accuracy: 0.8636 - val_loss: 0.3743
Epoch 4/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8496 - loss: 0.4135 - val_accuracy: 0.8660 - val_loss: 0.3693
Epoch 5/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8532 - loss: 0.4027 - val_accuracy: 0.8695 - val_loss: 0.3599
Epoch 6/50
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8596 - loss: 0.3875 - val_accuracy: 0.8739 - val_loss: 0.3515
Epoch 7/50
[output truncated]

The test accuracy improved to 0.8778, which suggests that the smaller architecture, the lower dropout rate, and the smaller batch size helped on this run.

However, achieving higher accuracy on a dataset like Fashion MNIST can be challenging, especially with a simple model like a Multi-Layer Perceptron (MLP). To potentially achieve even better results, consider the following additional steps:

1. **Feature Engineering**: Although this is more limited with image data and MLPs, ensuring the input data is as informative and clean as possible is crucial.

2. **Ensemble Methods**: Combine predictions from several models to improve accuracy. For example, train multiple MLPs with different architectures and average their predictions.

3. **Convolutional Neural Networks (CNNs)**: For image data, CNNs are generally more effective than MLPs. They can capture spatial hierarchies in the data better due to their convolutional layers.

4. **Hyperparameter Optimization**: Use techniques like grid search or random search to systematically explore different hyperparameter combinations.

5. **Advanced Regularization Techniques**: Experiment with other regularization methods like L1 regularization or different dropout configurations.
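
The ensemble idea from the list above can be illustrated with plain arrays. The probabilities below are made up for two hypothetical models (`model_a_probs`, `model_b_probs`) on two samples with three classes; in a real setting they would come from each model's `predict` output.

```python
import numpy as np

# Hypothetical softmax outputs of two models: shape (n_samples, n_classes).
model_a_probs = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.5, 0.3]])
model_b_probs = np.array([[0.2, 0.7, 0.1],
                          [0.1, 0.3, 0.6]])

# Average the class probabilities across models, then pick the most likely class.
ensemble_probs = np.mean([model_a_probs, model_b_probs], axis=0)
ensemble_labels = ensemble_probs.argmax(axis=1)

print(model_a_probs.argmax(axis=1))  # [0 1]
print(ensemble_labels)               # [1 2]
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, which is why the ensemble's labels here differ from model A's own predictions.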

Let's try one of these suggestions and replace the MLP with a small CNN.

In [4]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Reshape data for CNN input
train_images_cnn = train_images.reshape((-1, 28, 28, 1))
test_images_cnn = test_images.reshape((-1, 28, 28, 1))

# Build a simple CNN model
cnn_model = Sequential()
cnn_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(64, activation='relu'))
cnn_model.add(Dense(10, activation='softmax'))

# Compile the model
cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Train the model
cnn_model.fit(train_images_cnn, train_labels, epochs=10, batch_size=64,
              validation_split=0.2)

# Evaluate the model
test_loss, test_acc = cnn_model.evaluate(test_images_cnn, test_labels)

print('CNN Test accuracy:', test_acc)

Epoch 1/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 12s 13ms/step - accuracy: 0.7324 - loss: 0.7666 - val_accuracy: 0.8499 - val_loss: 0.4226
Epoch 2/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.8685 - loss: 0.3604 - val_accuracy: 0.8830 - val_loss: 0.3367
Epoch 3/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.8902 - loss: 0.3058 - val_accuracy: 0.8848 - val_loss: 0.3264
Epoch 4/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.8978 - loss: 0.2805 - val_accuracy: 0.8977 - val_loss: 0.2886
Epoch 5/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 10s 13ms/step - accuracy: 0.9078 - loss: 0.2487 - val_accuracy: 0.8953 - val_loss: 0.2864
Epoch 6/10
750/750 ━━━━━━━━━━━━━━━━━━━━ 9s 13ms/step - accuracy: 0.9170 - loss: 0.2292 - val_accuracy: 0.9056 - val_loss: 0.2668
Epoch 7/10
750/750 [output truncated]
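
To see how the image shrinks as it passes through the CNN defined in cell 4, the sketch below traces the spatial size and the parameter counts by hand. It assumes the Keras defaults used there: `'valid'` padding with stride 1 for `Conv2D`, and non-overlapping windows for `MaxPooling2D`. The helper names are illustrative.

```python
def conv_out(size, kernel):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping pooling (any remainder is dropped)."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

size = 28
size = conv_out(size, 3)   # 26 after Conv2D(32, (3, 3))
size = pool_out(size, 2)   # 13 after MaxPooling2D((2, 2))
size = conv_out(size, 3)   # 11 after Conv2D(64, (3, 3))
size = pool_out(size, 2)   # 5 after MaxPooling2D((2, 2))
flattened = size * size * 64  # 1600 features enter the dense layers

total_params = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
                + dense_params(flattened, 64) + dense_params(64, 10))
print(size, flattened, total_params)  # 5 1600 121930
```

Most of the parameters sit in the first dense layer after `Flatten`; the convolutional layers themselves are comparatively small because their kernels are shared across image positions.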

<a name="ex_1"></a>
## Exercise 1: Improve the accuracy of the MLP model
1. Try different architectures and hyperparameters.
2. Use regularization techniques like L1 or L2 regularization.
3. Use dropout to reduce overfitting.

Reference link: https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/
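
As a hint for the regularization part of the exercise, the sketch below computes a combined L1/L2 penalty by hand for a tiny, made-up weight vector. The function name is illustrative; the formula (`l1` times the sum of absolute weights plus `l2` times the sum of squared weights) is the one Keras's `l1_l2` regularizer is documented to apply.

```python
def l1_l2_penalty(weights, l1, l2):
    """Penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term

# Hypothetical weights of a tiny layer.
weights = [0.5, -0.25, 0.0, 1.0]

penalty = l1_l2_penalty(weights, l1=1e-5, l2=1e-4)
print(penalty)  # about 1.4875e-04
```

Because the penalty is added to the loss that Keras reports during training, loss values from a regularized model are generally not directly comparable with those from an unregularized one; accuracy is the safer metric for comparing the two.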

In [5]:
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.optimizers import Adam

# Create a more complex MLP with regularization
model = Sequential([
    # First layer with L1L2 regularization
    Dense(512, activation='relu', input_shape=(784,),
          kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4)),
    BatchNormalization(),
    Dropout(0.3),
    
    # Second layer
    Dense(256, activation='relu',
          kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4)),
    BatchNormalization(),
    Dropout(0.3),
    
    # Third layer
    Dense(128, activation='relu',
          kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4)),
    BatchNormalization(),
    Dropout(0.2),
    
    # Output layer
    Dense(10, activation='softmax')
])

# Compile with custom learning rate
optimizer = Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer,
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

# Train with early stopping
early_stopping = EarlyStopping(
    monitor='val_accuracy',
    patience=10,
    restore_best_weights=True
)

# Fit the model
history = model.fit(
    train_images, 
    train_labels,
    epochs=100,
    batch_size=128,
    validation_split=0.2,
    callbacks=[early_stopping]
)

# Evaluate
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Final test accuracy: {test_acc:.4f}')

Epoch 1/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.5966 - loss: 1.5525 - val_accuracy: 0.8192 - val_loss: 0.8479
Epoch 2/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7998 - loss: 0.8988 - val_accuracy: 0.8484 - val_loss: 0.7351
Epoch 3/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.8240 - loss: 0.8132 - val_accuracy: 0.8606 - val_loss: 0.7068
Epoch 4/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.8423 - loss: 0.7623 - val_accuracy: 0.8637 - val_loss: 0.6851
Epoch 5/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.8498 - loss: 0.7308 - val_accuracy: 0.8656 - val_loss: 0.6775
Epoch 6/100
375/375 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.8579 - loss: 0.7053 - val_accuracy: 0.8723 - val_loss: 0.6571
Epoch 7/100
[output truncated]