# Problem statement: Create a classification model for the Fashion MNIST dataset

The objective is to create a classification model for the Fashion MNIST dataset using a Multi-Layer Perceptron (MLP).

We'll follow these steps:

### 1. Data Preprocessing
- **Loading the Data**: Fashion MNIST is a dataset of Zalando's article images, with 60,000 training samples and 10,000 test samples. Each sample is a 28x28 grayscale image, associated with a label from 10 classes.
- **Normalization**: We normalize the pixel values (ranging from 0 to 255) to a scale of 0 to 1. Keeping the inputs in a small, consistent range generally helps gradient-based training converge faster and more stably.
- **Reshaping for MLP**: Since we are using an MLP, we need to reshape the 28x28 images into a flat array of 784 pixels.
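
The two preprocessing steps above can be checked on a small synthetic batch before touching the real data. This is a minimal sketch: `fake_images` is a hypothetical stand-in for a few Fashion MNIST images, generated at random with NumPy.

```python
import numpy as np

# Synthetic stand-in for a batch of Fashion MNIST images: 4 grayscale
# images of 28x28 pixels stored as unsigned 8-bit integers (0..255).
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Normalization: dividing by 255.0 converts to float and maps pixels into [0, 1].
normalized = fake_images / 255.0

# Reshaping: -1 lets NumPy infer the batch size; 28 * 28 = 784 features per image.
flattened = normalized.reshape((-1, 28 * 28))

print(flattened.shape)  # (4, 784)
print(flattened.dtype)  # float64
```

The same two operations are applied to the real training and test arrays in the first code cell.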

### 2. Building the MLP Model
- **Dense Layers**: These are fully connected neural layers. The first layer needs to know the input shape (784 in this case).
- **Activation Functions**: 'ReLU' is used for non-linear transformations. The final layer uses 'softmax' for a probability distribution over 10 classes.
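
To make the layer description concrete, here is a small sketch in plain Python that counts the trainable parameters of the baseline network used later in this notebook (784 inputs, 128 hidden units, 10 outputs) and shows what softmax does to a vector of raw scores. The helper names `dense_params` and `softmax` are illustrative, not Keras functions.

```python
import math

def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

hidden = dense_params(784, 128)   # first Dense layer
output = dense_params(128, 10)    # softmax output layer
print(hidden, output, hidden + output)  # 100480 1290 101770

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(abs(sum(probs) - 1.0) < 1e-9)  # True
```

The total should match the "Total params" figure reported by `model.summary()` for the baseline model.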

### 3. Compiling the Model
- **Optimizer**: 'Adam' is a popular choice for its adaptive learning rate properties.
- **Loss Function**: 'sparse_categorical_crossentropy' is suitable for multi-class classification when the labels are integer class indices (0 to 9 here), so the labels do not need to be one-hot encoded.
- **Metrics**: We'll use 'accuracy' to understand the performance.
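
For a single sample, sparse categorical cross-entropy is the negative log of the probability the model assigns to the true class. The sketch below works through that arithmetic in plain Python; the probability vector `probs` is made up for illustration and the function is a simplified stand-in, not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: -log(probability assigned to the true class)."""
    return -math.log(probs[label])

# Hypothetical softmax output over the 10 classes (sums to 1).
probs = [0.05, 0.05, 0.70, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02]

loss_confident = sparse_categorical_crossentropy(2, probs)  # true class got 0.70
loss_wrong = sparse_categorical_crossentropy(0, probs)      # true class got 0.05

print(round(loss_confident, 4))  # 0.3567
print(round(loss_wrong, 4))      # 2.9957
```

The loss is small when the true class receives high probability and grows quickly as that probability approaches zero.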

### 4. Training the Model
- We train the model using the `fit` method, specifying epochs and batch size.
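
Batch size and validation split together determine how many weight updates happen per epoch. The sketch below is a rough calculation, assuming the validation fraction is simply held out from the training set before batching; `batches_per_epoch` is a hypothetical helper, not part of Keras.

```python
import math

def batches_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of batches (weight updates) in one epoch of training."""
    n_val = int(n_samples * validation_split)   # samples held out for validation
    n_train = n_samples - n_val
    return math.ceil(n_train / batch_size)      # the last batch may be smaller

print(batches_per_epoch(60000, 64))       # 938
print(batches_per_epoch(60000, 64, 0.2))  # 750
print(batches_per_epoch(60000, 32, 0.2))  # 1500
```

These counts should correspond to the step counter Keras prints in the progress bar for the settings used in this notebook.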

### 5. Evaluating the Model
- The `evaluate` method is used to test the model on the test set.

The notebook contains one exercise in total:

* [Exercise 1](#ex_1)

In [None]:
# Import necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten  # Flatten is used by the CNN cell later on
from tensorflow.keras.datasets import fashion_mnist

# Load the dataset
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Normalize the images to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

# Reshape data for MLP input
train_images = train_images.reshape((-1, 28*28))
test_images = test_images.reshape((-1, 28*28))

# Build the MLP model
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=64)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

To improve the model's accuracy on the Fashion MNIST dataset, we can experiment with various techniques. Here are some strategies:

1. **Increase Model Complexity**: Add more layers or increase the number of neurons in each layer to capture more complex patterns in the data.

2. **Regularization**: Implement dropout or L1/L2 regularization to reduce overfitting.

3. **Alternative Optimizers**: Experiment with different optimizers such as SGD with momentum or RMSprop.

4. **Learning Rate Scheduling**: Adjust the learning rate during training.

5. **Data Augmentation**: Although not typical for MLPs, slight modifications to the input data can make the model more robust.

6. **Early Stopping**: Stop training when a validation metric (such as the validation loss) stops improving.

7. **Hyperparameter Tuning**: Experiment with different activation functions, batch sizes, and epochs.

8. **Batch Normalization**: This can help in faster convergence and overall performance improvement.
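
The idea behind early stopping with a patience value can be sketched in a few lines of plain Python. This is a simplified illustration of the logic, not the Keras `EarlyStopping` implementation, and the validation losses are invented for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value (0.40) is reached at epoch 3.
losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.405, 0.44, 0.45]

print(early_stopping_epoch(losses, patience=5))  # 8
print(early_stopping_epoch(losses, patience=2))  # 5
```

A larger patience tolerates noisy validation curves at the cost of training for more epochs after the best one.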

Let's modify the baseline MLP to incorporate batch normalization, dropout, and early stopping.

In [None]:
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

# Modified MLP model
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(BatchNormalization())  # Batch normalization layer
model.add(Dropout(0.5))         # Dropout layer
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())  # Another batch normalization layer
model.add(Dropout(0.5))         # Another dropout layer
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Train the model with validation split
model.fit(train_images, train_labels, epochs=50, batch_size=64,
          validation_split=0.2, callbacks=[early_stopping])

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

The test accuracy decreased slightly in this case. This outcome highlights an important aspect of machine learning: a more elaborate architecture does not always perform better, and a simpler model can outperform a more complex one, especially on relatively small datasets such as Fashion MNIST.

Here are a few additional steps you can take to try and improve the model's performance:

1. **Adjust the Dropout Rate**: The dropout rate of 0.5 might be too high, causing the model to lose relevant information. Try reducing it to 0.3 or 0.2.

2. **Fine-Tune the Model Complexity**: The addition of more neurons might have made the model too complex. Try reducing the number of neurons in the dense layers.

3. **Experiment with Different Optimizers**: While Adam is a strong general-purpose optimizer, others such as SGD with momentum or RMSprop sometimes yield better results for specific problems.

4. **Modify the Learning Rate**: Adjusting the learning rate of the Adam optimizer could also lead to better results. A lower learning rate with more epochs can sometimes achieve better generalization.

5. **Experiment with Batch Sizes**: Smaller or larger batch sizes can impact the model's ability to generalize and learn effectively.

6. **Cross-Validation**: Instead of a single validation split, use k-fold cross-validation for a more robust estimate of model performance.
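
The k-fold idea from the last point can be illustrated with index arithmetic alone. The sketch below splits sample indices into contiguous folds; `k_fold_indices` is a hypothetical helper written for this example, and in practice the data would usually be shuffled first (scikit-learn's `KFold` offers this).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx)  # [0, 1], then [2, 3], ... up to [8, 9]
```

Each fold serves as the validation set exactly once, and a fresh model would be trained on the remaining indices for every fold.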

Let's adjust the model with a lower dropout rate, fewer neurons, and a smaller batch size.

In [None]:
# Adjust the model architecture and training parameters
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))         # Reduced dropout rate
model.add(Dense(10, activation='softmax'))

# Compile the model (Adam with its default learning rate is kept here;
# pass an optimizer instance instead of the string 'adam' to change it)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with a different batch size (reuses the early_stopping callback defined earlier)
model.fit(train_images, train_labels, epochs=50, batch_size=32,  # Smaller batch size
          validation_split=0.2, callbacks=[early_stopping])

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

The test accuracy has improved to 0.8778. This suggests that the adjustments made to the model architecture and training parameters were beneficial, although a single training run is not conclusive because results vary with random initialization.

However, achieving higher accuracy on a dataset like Fashion MNIST can be challenging, especially with a simple model like a Multi-Layer Perceptron (MLP). To potentially achieve even better results, consider the following additional steps:

1. **Feature Engineering**: Although this is more limited with image data and MLPs, ensuring the input data is as informative and clean as possible is crucial.

2. **Ensemble Methods**: Combine predictions from several models to improve accuracy. For example, train multiple MLPs with different architectures and average their predictions.

3. **Convolutional Neural Networks (CNNs)**: For image data, CNNs are generally more effective than MLPs. They can capture spatial hierarchies in the data better due to their convolutional layers.

4. **Hyperparameter Optimization**: Use techniques like grid search or random search to systematically explore different hyperparameter combinations.

5. **Advanced Regularization Techniques**: Experiment with other regularization methods like L1 regularization or different dropout configurations.
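
The ensemble idea from the second point can be shown with plain Python: average the class probabilities from several models and pick the class with the highest mean. The probability vectors below are invented, and only three classes are used to keep the example short.

```python
def average_predictions(all_probs):
    """Average per-class probabilities across models for a single sample."""
    n_models = len(all_probs)
    n_classes = len(all_probs[0])
    return [sum(model[c] for model in all_probs) / n_models for c in range(n_classes)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical softmax outputs of three models for the same image.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.5, 0.3]
model_c = [0.3, 0.5, 0.2]

avg = average_predictions([model_a, model_b, model_c])
print(argmax(model_a))  # 0
print(argmax(avg))      # 1
```

Here one model on its own picks class 0, while the averaged prediction sides with the other two models and picks class 1.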

Let's try the third suggestion and build a simple CNN.

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Reshape data for CNN input
train_images_cnn = train_images.reshape((-1, 28, 28, 1))
test_images_cnn = test_images.reshape((-1, 28, 28, 1))

# Build a simple CNN model
cnn_model = Sequential()
cnn_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(64, activation='relu'))
cnn_model.add(Dense(10, activation='softmax'))

# Compile the model
cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Train the model
cnn_model.fit(train_images_cnn, train_labels, epochs=10, batch_size=64,
              validation_split=0.2)

# Evaluate the model
test_loss, test_acc = cnn_model.evaluate(test_images_cnn, test_labels)

print('CNN Test accuracy:', test_acc)
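
To see how the image shrinks as it passes through the CNN above, the spatial sizes can be traced by hand. This sketch assumes the Keras defaults used in that cell (no padding, stride 1 for convolutions, non-overlapping pooling); `conv_out` and `pool_out` are illustrative helpers.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

size = 28
size = conv_out(size, 3)   # Conv2D(32, (3, 3))  -> 26
size = pool_out(size, 2)   # MaxPooling2D((2, 2)) -> 13
size = conv_out(size, 3)   # Conv2D(64, (3, 3))  -> 11
size = pool_out(size, 2)   # MaxPooling2D((2, 2)) -> 5

flat_features = size * size * 64  # Flatten over 64 feature maps
print(size, flat_features)  # 5 1600
```

The Dense(64) layer therefore receives 1,600 features per image, compared with 784 raw pixels in the MLP.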

<a name="ex_1"></a>
## Exercise 1: Improve the accuracy of the MLP model
1. Try different architectures and hyperparameters.
2. Use regularization techniques like L1 or L2 regularization.
3. Use dropout to reduce overfitting.

Reference link: https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/