# Classify clothes using TensorFlow

### About the Dataset
Fashion-MNIST is a dataset of Zalando's article images, consisting of a **training set of 60,000 examples** and a **test set of 10,000 examples.** Each example is a **28x28 grayscale image**, associated with a label from **10 classes.** Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and the same structure of training and testing splits.

- Training set - 60,000 examples
- Test set - 10,000 examples
- Each example is a 28x28 grayscale image
- 10 classes

### Content of Dataset
Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels.

- Each row is a separate image.
- Column 1 is the class label.
- The remaining columns are pixel values (784 in total).
- Each value is the darkness of the pixel (0 to 255).
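
To make the row layout described above concrete, here is a minimal sketch that splits one flattened row into its label and a 28x28 image. The `row` array is synthetic placeholder data made up for illustration, not a real example from the dataset.

```python
import numpy as np

# Hypothetical flattened row: 1 label column followed by 784 pixel columns.
rng = np.random.default_rng(0)
row = np.concatenate(([9], rng.integers(0, 256, size=784)))

label = int(row[0])              # column 1 holds the class label
image = row[1:].reshape(28, 28)  # the remaining 784 values form the image

print(label, image.shape)
```

Reshaping with `(28, 28)` assumes the pixels are stored row by row, which is the usual convention for this kind of flattened image.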

### Labels
Each training and test example is assigned to one of the following labels:

0 - T-shirt/top

1 - Trouser

2 - Pullover

3 - Dress

4 - Coat

5 - Sandal

6 - Shirt

7 - Sneaker

8 - Bag

9 - Ankle boot
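
For readable output later on, the label list above can be kept as a plain Python list indexed by label. This is only a convenience sketch; `class_names` and `label_to_name` are hypothetical names that the rest of the notebook does not use.

```python
# Class names in label order, as listed above.
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Return the class name for an integer label in the range 0-9."""
    return class_names[label]


print(label_to_name(7))  # Sneaker
```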

### Objective
Train a CNN model on this dataset.

In [1]:
# import required libraries
import numpy as np
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, Input
from tensorflow.keras.datasets import fashion_mnist
import tensorflow as tf
import gzip

In [2]:
# load train and test dataset
def load_data():
    filePath_train_set = '/cxldata/datasets/project/fashion-mnist/train-images-idx3-ubyte.gz'
    filePath_train_label = '/cxldata/datasets/project/fashion-mnist/train-labels-idx1-ubyte.gz'
    filePath_test_set = '/cxldata/datasets/project/fashion-mnist/t10k-images-idx3-ubyte.gz'
    filePath_test_label = '/cxldata/datasets/project/fashion-mnist/t10k-labels-idx1-ubyte.gz'

    # Label files have an 8-byte header, image files a 16-byte header;
    # the offset argument skips the header so only the data is read.
    with gzip.open(filePath_train_label, 'rb') as trainLbpath:
        trainLabel = np.frombuffer(trainLbpath.read(), dtype=np.uint8,
                                   offset=8)

    with gzip.open(filePath_train_set, 'rb') as trainSetpath:
        trainSet = np.frombuffer(trainSetpath.read(), dtype=np.uint8,
                                 offset=16).reshape(len(trainLabel), 28, 28)

    with gzip.open(filePath_test_label, 'rb') as testLbpath:
        testLabel = np.frombuffer(testLbpath.read(), dtype=np.uint8,
                                  offset=8)

    with gzip.open(filePath_test_set, 'rb') as testSetpath:
        testSet = np.frombuffer(testSetpath.read(), dtype=np.uint8,
                                offset=16).reshape(len(testLabel), 28, 28)

    # np.frombuffer returns read-only arrays, so return writable copies.
    trainX = trainSet.copy()
    testX = testSet.copy()
    trainY = trainLabel.copy()
    testY = testLabel.copy()

    return trainX, trainY, testX, testY

x_train, y_train, x_test, y_test = load_data()
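
The `offset=16` and `offset=8` arguments above skip the IDX file headers. As a rough illustration of what those 16 bytes contain for an image file, the sketch below parses the header explicitly (magic number, image count, rows, columns, each a big-endian 32-bit integer) instead of hard-coding the shape. `read_idx_images` is a hypothetical helper, and the file it reads is a tiny synthetic one built in memory, not the real dataset.

```python
import gzip
import io
import struct

import numpy as np


def read_idx_images(fileobj):
    """Parse an IDX3 image file: a 16-byte header followed by uint8 pixels."""
    data = fileobj.read()
    # Header: magic number, image count, rows, cols (big-endian uint32 each).
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number: {magic}")
    return np.frombuffer(data, dtype=np.uint8, offset=16).reshape(count, rows, cols)


# Build a tiny synthetic IDX3 file in memory (2 images of 28x28 pixels).
pixels = (np.arange(2 * 28 * 28) % 256).astype(np.uint8)
raw = struct.pack(">IIII", 2051, 2, 28, 28) + pixels.tobytes()
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as f:
    f.write(raw)
buffer.seek(0)

with gzip.open(buffer, "rb") as f:
    images = read_idx_images(f)

print(images.shape)
```

Reading the dimensions from the header makes the parser independent of the label file, at the cost of a little extra code.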

In [3]:
# scale the pixel intensities down to the 0-1 range and convert them to floats, by dividing by 255.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

(60000, 28, 28) (10000, 28, 28) (60000,) (10000,)
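
As a quick sanity check on the scaling step, the following sketch applies the same operation to a small synthetic array and confirms the resulting dtype and value range. The `pixels` array is made up for illustration.

```python
import numpy as np

# Synthetic uint8 "image" covering both extremes of the pixel range.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same operation as above: cast to float32, then divide by 255.
scaled = pixels.astype("float32") / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```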


In [4]:
# Add dimension at the end as images are greyscale.
input_shape = (x_train.shape[1:] + (1,)) # (28, 28, 1) -> 2D CNNs accept 3D input tensors.
print("Input shape = ", input_shape)

Input shape =  (28, 28, 1)
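
Note that computing `input_shape` does not change the data itself; the arrays still need a trailing channel axis before being passed to the model. The sketch below shows the relationship on a small synthetic batch (the `batch` array is a stand-in, not the real training data).

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale images.
batch = np.zeros((5, 28, 28), dtype="float32")

# Per-sample shape expected by the model: height, width, channels.
input_shape = batch.shape[1:] + (1,)

# Add the channel axis to the data so each sample matches input_shape.
batch_with_channel = np.expand_dims(batch, -1)

print(input_shape, batch_with_channel.shape)
```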


In [5]:
# convert our labels to one-hot encoded form
num_classes = len(np.unique(y_train))
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
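
To see what one-hot encoding produces, here is a NumPy-only sketch on three made-up labels. It is not how `to_categorical` is implemented, only an illustration of the result: each label becomes a row with a single 1 at the label's index, and `argmax` recovers the original label.

```python
import numpy as np

labels = np.array([0, 3, 9])  # made-up integer labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

# argmax along the class axis recovers the integer labels.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)
print(recovered)
```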

In [6]:
# Model using Functional API
inp = Input(shape=input_shape)
_ = Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(inp)
_ = Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(_)
_ = MaxPool2D(pool_size=(2, 2))(_)
_ = Dropout(0.25)(_)
_ = Flatten()(_)
_ = Dense(units=128, activation='relu')(_)
_ = Dropout(0.2)(_)
_ = Dense(units=num_classes, activation='softmax')(_)
model = Model(inputs=inp, outputs=_)
model.summary()

W1120 11:37:06.469989 139785740138304 deprecation.py:506] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1179776   
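
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults used in the model above (`padding='valid'`, stride 1, one bias per filter or unit); the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per input plus one bias, for each output unit."""
    return (in_units + 1) * out_units


def valid_conv_output(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


conv1 = conv2d_params(3, 3, 1, 32)    # first Conv2D layer
conv2 = conv2d_params(3, 3, 32, 64)   # second Conv2D layer

side = valid_conv_output(28, 3)       # 28 -> 26
side = valid_conv_output(side, 3)     # 26 -> 24
side = side // 2                      # 2x2 max pooling: 24 -> 12
flat = side * side * 64               # Flatten output size

dense1 = dense_params(flat, 128)      # first Dense layer

print(conv1, conv2, flat, dense1)  # 320 18496 9216 1179776
```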

In [7]:
# Train
# categorical_crossentropy is the loss function for multi-class classification with one-hot labels
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
history = model.fit(np.expand_dims(x_train, -1), y_train, batch_size=128, epochs=12, validation_split=0.3)

Train on 42000 samples, validate on 18000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
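
The "Train on 42000 samples, validate on 18000 samples" line follows from `validation_split=0.3`: Keras holds out the last 30% of the arrays passed to `fit` (taken before shuffling) for validation. The arithmetic can be sketched as follows; this is a plain calculation, not Keras's own code.

```python
import math

num_samples = 60000
validation_split = 0.3
batch_size = 128

# 70% of the samples are used for training, the rest for validation.
num_train = round(num_samples * (1 - validation_split))
num_val = num_samples - num_train

# Number of weight updates per epoch; the last batch may be smaller.
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size

print(num_train, num_val, steps_per_epoch, last_batch_size)  # 42000 18000 329 16
```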


In [8]:
# Evaluation on test set
loss, accuracy = model.evaluate(np.expand_dims(x_test, -1), y_test, verbose=0)
print("Loss = ", loss)
print("Accuracy = ", accuracy)

Loss =  0.28951597318053246
Accuracy =  0.9165
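
To clarify what the two reported numbers measure, here is a NumPy sketch of categorical cross-entropy and accuracy on a tiny made-up batch. The `y_true` and `y_pred` arrays are synthetic, and the formulas are the textbook definitions rather than Keras's exact implementation (which, for example, also clips predictions for numerical stability).

```python
import numpy as np

# Synthetic one-hot targets and softmax-like predictions: 3 samples, 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.5, 0.2]], dtype="float32")

# Categorical cross-entropy: mean over samples of -sum(y_true * log(y_pred)).
loss = float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Accuracy: fraction of samples whose most probable class matches the target.
accuracy = float(np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1)))

print("Loss = ", loss)
print("Accuracy = ", accuracy)
```

In this toy batch the third sample is misclassified, so two of the three predictions are correct, and the low probability assigned to its true class accounts for most of the loss.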
