Import the Zalando dataset

In [18]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # Disable GPU

from tensorflow.keras.datasets import fashion_mnist
import numpy as np
import sys
import tensorflow.keras
import pandas as pd
import sklearn as sk
import scipy as sp
import tensorflow as tf
import platform


print(f"Python Platform: {platform.platform()}")
print(f"Tensor Flow Version: {tf.__version__}")
print(f"Keras Version: {tensorflow.keras.__version__}")
print()
print(f"Python {sys.version}")
print(f"Pandas {pd.__version__}")
print(f"Scikit-Learn {sk.__version__}")
print(f"SciPy {sp.__version__}")
gpu = len(tf.config.list_physical_devices('GPU')) > 0
print("GPU is", "available" if gpu else "NOT AVAILABLE")

((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()

Python Platform: macOS-13.3-arm64-arm-64bit
Tensor Flow Version: 2.16.2
Keras Version: 3.5.0

Python 3.11.2 (v3.11.2:878ead1ac1, Feb  7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]
Pandas 2.1.4
Scikit-Learn 1.3.2
SciPy 1.11.2
GPU is available


Prepare the data (reshaping the samples and one-hot encoding the labels):

In [19]:
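# One-hot encode the training labels: row i gets a 1 in column trainY[i]
labels_train = np.zeros((60000, 10))
labels_train[np.arange(60000), trainY] = 1
# Add a trailing channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
data_train = trainX.reshape(60000, 28, 28, 1)

# Same two steps for the test set
labels_test = np.zeros((10000, 10))
labels_test[np.arange(10000), testY] = 1
data_test = testX.reshape(10000, 28, 28, 1)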
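
The indexing trick used for the one-hot encoding can be checked on a handful of labels. The sketch below is only an illustration: the four-label array `y` and the helper name `one_hot` are assumptions and do not appear in the notebook.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes))
    # Row i gets a 1 in column labels[i]; every other entry stays 0.
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded


y = np.array([0, 3, 9, 3])      # made-up class indices, not taken from the dataset
encoded = one_hot(y, 10)
print(encoded.shape)            # (4, 10)
print(encoded.argmax(axis=1))   # [0 3 9 3], the original labels
```

Taking the argmax along each row recovers the original class index, which is a quick way to verify the encoding.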

Note that in this case the network’s inputs are tensors of shape
(number_of_images, image_height, image_width, color_channels). Since the
Zalando dataset is made up of grayscale images, color_channels is equal to 1.
Unlike feed-forward neural networks, which take flattened tensors as input (one
observation per row), each observation here keeps its two-dimensional layout.
Check the dimensions with the following code:

In [20]:
print('Dimensions of the training dataset: ', data_train.shape)
print('Dimensions of the test dataset: ', data_test.shape)
print('Dimensions of the training labels: ', labels_train.shape)
print('Dimensions of the test labels: ', labels_test.shape)

Dimensions of the training dataset:  (60000, 28, 28, 1)
Dimensions of the test dataset:  (10000, 28, 28, 1)
Dimensions of the training labels:  (60000, 10)
Dimensions of the test labels:  (10000, 10)


Normalize the data

In [21]:
# Scale the pixel values from [0, 255] to [0, 1]
data_train_norm = data_train / 255.0
data_test_norm = data_test / 255.0
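
To see what the division does, the following sketch applies it to a tiny image. The 2x2 array `image` is made up for illustration and is not part of the dataset.

```python
import numpy as np

# A made-up 2x2 gray image with 8-bit pixel values (not taken from the dataset).
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float converts the integers to floating point
# and rescales them so that 0 stays 0 and 255 becomes 1.
image_norm = image / 255.0

print(image_norm.min(), image_norm.max())  # 0.0 1.0
```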

Build the network. With Keras, creating and training a CNN
model is straightforward; the following function defines the network’s architecture:
In [22]:
from tensorflow.keras import models, layers


def build_model():
    # create model
    model = models.Sequential()
    # declare the input shape explicitly: 28x28 gray images, one channel
    model.add(layers.Input(shape=(28, 28, 1)))
    # first block: 6 filters of size 3x3, followed by 2x2 max pooling
    model.add(layers.Conv2D(6, (3, 3), strides=(1, 1),
                            activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2),
                                  strides=(2, 2)))
    # second block: 16 filters of size 5x5, followed by 2x2 max pooling
    model.add(layers.Conv2D(16, (5, 5), strides=(1, 1),
                            activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2),
                                  strides=(2, 2)))
    # classifier: flatten the feature maps and add two dense layers
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['categorical_accuracy'])
    return model

When building CNNs in Keras, each call to model.add corresponds to one layer.
The build_model function creates a CNN by stacking Conv2D layers (convolutional
layers) and MaxPooling2D layers (max pooling layers). The stride is a tuple because
it gives the stride along each spatial dimension (rows and columns). In our example
the images are grayscale, but they could also be RGB, for example. In that case the
last dimension of the input would be 3 (one entry per color channel) instead of 1.
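
The difference between gray and RGB inputs can be seen directly in the array shapes. This is a small sketch with made-up batches of five images filled with zeros; the names `gray_batch` and `rgb_batch` are assumptions used only here.

```python
import numpy as np

# Made-up batches of 5 images of 28x28 pixels (zeros stand in for real data).
gray_batch = np.zeros((5, 28, 28))             # gray images, no channel axis yet
gray_batch = gray_batch.reshape(5, 28, 28, 1)  # add the channel axis used by Conv2D
rgb_batch = np.zeros((5, 28, 28, 3))           # RGB images: three color channels

print(gray_batch.shape)                   # (5, 28, 28, 1)
print(rgb_batch.shape)                    # (5, 28, 28, 3)
print(gray_batch.ndim == rgb_batch.ndim)  # True
```

Both batches have four dimensions; only the size of the last one changes.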

Display the architecture of the model so far, using model.summary():

In [23]:
model = build_model()
model.summary()



Note that the output of every convolutional and pooling layer is a 3D tensor of
shape (height, width, number_of_filters). The summary shows one additional first
dimension, the batch size, which is set to None because the network does not know it
yet; the model can therefore be applied to batches of any size. The width and height
decrease as you go deeper into the network. The number of output channels of each
Conv2D layer is controlled by its first argument. Typically, as the width and height
decrease, you can afford (computationally) to add more output filters to each Conv2D layer.
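
How the width and height shrink can be worked out by hand. The sketch below assumes that no padding is applied (the Keras default, since build_model does not set a padding argument); the helper name `output_size` is hypothetical.

```python
def output_size(size, kernel, stride):
    """Output length along one axis for a layer without padding."""
    return (size - kernel) // stride + 1


size = 28                        # the input images are 28x28
conv1 = output_size(size, 3, 1)  # Conv2D, 3x3 kernel, stride 1
pool1 = output_size(conv1, 2, 2) # MaxPooling2D, 2x2 window, stride 2
conv2 = output_size(pool1, 5, 1) # Conv2D, 5x5 kernel, stride 1
pool2 = output_size(conv2, 2, 2) # MaxPooling2D, 2x2 window, stride 2

print(conv1, pool1, conv2, pool2)  # 26 13 9 4
```

Under this assumption the last pooling layer produces 4x4 feature maps, one for each of the 16 filters of the second convolutional layer.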

To complete the model, we added two Dense layers. They take vectors as input
(which are 1D), while the current output is a 3D tensor. This is why you first need to
flatten the 3D output to 1D, then add one or more Dense layers on top.
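
The effect of flattening can be illustrated with plain NumPy. The sketch assumes that the last pooling layer outputs 4x4 feature maps with 16 channels, which follows from the layer settings in build_model when no padding is used; the array `features` is a stand-in filled with zeros.

```python
import numpy as np

# Stand-in for the output of the last pooling layer: a batch of 2 samples,
# each with 4x4 feature maps and 16 channels (assumed shape, see the lead-in).
features = np.zeros((2, 4, 4, 16))

# Flattening keeps the batch axis and merges the other three axes into one.
flat = features.reshape(features.shape[0], -1)
print(flat.shape)  # (2, 256)

# A Dense layer with 128 units stores one weight per (input, unit) pair
# plus one bias per unit.
dense_params = flat.shape[1] * 128 + 128
print(dense_params)  # 32896
```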

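Train and test the network. We use mini-batches of 100 observations with the Adam
optimizer (set in build_model) and train the network for ten epochs.

If you run this code (it took roughly four minutes on a medium-performance laptop),
it will start, after just one epoch, with a training accuracy of about 76.3%. After ten
epochs it will reach a training accuracy of about 91% (88% on the dev set, which here
is the test set passed as validation_data). Exact numbers vary from run to run; the log
below, for instance, reports a training accuracy of 67.0% after the first epoch.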
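
The number of weight updates follows directly from the dataset size and the batch size. A minimal calculation, using the values stated above:

```python
import math

num_samples = 60000  # size of the training set
batch_size = 100
epochs = 10

# One weight update per mini-batch; a partial last batch would still count as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 600
print(total_updates)    # 6000
```

The value 600 is the step counter that Keras prints for each epoch during training.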

In [24]:
model.fit(data_train_norm, labels_train,
          validation_data=(data_test_norm, labels_test),
          epochs=10, batch_size=100, verbose=1)

Epoch 1/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 19s 29ms/step - categorical_accuracy: 0.6703 - loss: 0.9373 - val_categorical_accuracy: 0.8178 - val_loss: 0.4899
Epoch 2/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 19s 31ms/step - categorical_accuracy: 0.8385 - loss: 0.4400 - val_categorical_accuracy: 0.8502 - val_loss: 0.4113
Epoch 3/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 17s 28ms/step - categorical_accuracy: 0.8626 - loss: 0.3807 - val_categorical_accuracy: 0.8547 - val_loss: 0.3941
Epoch 4/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 23s 39ms/step - categorical_accuracy: 0.8759 - loss: 0.3443 - val_categorical_accuracy: 0.8696 - val_loss: 0.3631
Epoch 5/10
600/600 ━━━━━━━━━━━━━━━━━━━━ 41s 38ms/step - categorical_accuracy: 0.8824 - loss: 0.3262 - val_categorical_accuracy: 0.8654 - val_loss: 0.3678
Epoch 6/10
600/600 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x2900e5610>
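
The categorical_accuracy values in the log are the fraction of samples whose most probable predicted class matches the class marked in the one-hot label. The sketch below reproduces that calculation with NumPy on made-up numbers; `probs` and `targets` are illustrative and were not produced by the model.

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
# Made-up one-hot labels for the same 4 samples.
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 1]])

# A prediction counts as correct when the argmax of the output
# equals the argmax of the one-hot label.
correct = probs.argmax(axis=1) == targets.argmax(axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.75
```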

Try changing the network’s hyperparameters to see if you can get a better accuracy:
for example, change the kernel size, the stride, and the padding.
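
Before retraining, it can help to check how a given choice changes the size of the feature maps. The sketch below uses the output-size rules that Keras applies for padding='valid' (no padding) and padding='same'; the helper name `conv_output_size` and the grid of values are assumptions chosen for illustration.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Output length along one axis for 'valid' or 'same' padding."""
    if padding == "same":
        # 'same' pads the input so that only the stride reduces the size.
        return math.ceil(size / stride)
    # 'valid' applies no padding.
    return (size - kernel) // stride + 1


# Feature-map size after a first convolutional layer applied to 28x28 images.
for kernel in (3, 5):
    for stride in (1, 2):
        for padding in ("valid", "same"):
            out = conv_output_size(28, kernel, stride, padding)
            print(f"kernel={kernel} stride={stride} padding={padding}: {out}x{out}")
```

For instance, a 3x3 kernel with stride 1 gives 26x26 feature maps with 'valid' padding and 28x28 with 'same' padding.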