In [1]:
2 + 3

5

In [2]:
import numpy as np

In [4]:
import tensorflow as tf


In [5]:
from tensorflow import keras

In [6]:
# import dataset
fashion_mnist = keras.datasets.fashion_mnist

In [7]:
# make training and testing data
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

In this dataset, each image is a 28x28 array. The pixel intensities are represented as integers from 0 to 255 (dtype uint8), not as floats.

In [8]:
X_train_full.shape

(60000, 28, 28)

In [9]:
X_train_full.dtype

dtype('uint8')

Create a validation set from the full training set. It will be the first 5,000 images of the full training set; the remaining 55,000 images are used for training.

In [None]:
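# validation set: first 5,000 images
X_valid = X_train_full[:5000]

# training set: remaining 55,000 images
X_train = X_train_full[5000:]

# validation labels: same slice as X_valid so labels stay aligned with images
y_valid = y_train_full[:5000]

# training labels: same slice as X_train
y_train = y_train_full[5000:]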

In [14]:
X_train.shape

(55000, 28, 28)

In [15]:
X_valid.shape

(5000, 28, 28)
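A split like this is only useful if the images and labels are sliced at the same index. The sketch below checks that on small synthetic NumPy arrays; the names `X_full` and `y_full` and the sizes are illustrative stand-ins, not the notebook's data.

```python
import numpy as np

# Synthetic stand-in for the full training set: 20 "images" of 2x2 pixels.
# Each image is filled with its own index so alignment is easy to verify.
n_samples, n_valid = 20, 5
X_full = np.arange(n_samples).reshape(-1, 1, 1) * np.ones((1, 2, 2), dtype=int)
y_full = np.arange(n_samples)  # label i belongs to image i

# Slice images and labels at the same index so the pairs stay together.
X_valid, X_train = X_full[:n_valid], X_full[n_valid:]
y_valid, y_train = y_full[:n_valid], y_full[n_valid:]

print(X_valid.shape, X_train.shape)  # (5, 2, 2) (15, 2, 2)

# Every training image still carries the label it started with.
print(bool((X_train[:, 0, 0] == y_train).all()))  # True
```

If the labels were sliced with the opposite indices, the two arrays would have different lengths and the alignment check would fail.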

In [16]:
# scale the X values to be between 0 and 1 since gradient descent will be used
X_train = X_train / 255.0
X_valid = X_valid / 255.0

In [18]:
X_train.max()

1.0
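To see what the division does to the data type and value range, here is a minimal sketch on a tiny synthetic array (`X_raw` is an illustrative name, not part of the notebook).

```python
import numpy as np

# Synthetic stand-in for a row of uint8 pixel values (0..255).
X_raw = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing an integer array by a float produces a float64 array.
X_scaled = X_raw / 255.0

print(X_scaled.dtype)                  # float64
print(X_scaled.min(), X_scaled.max())  # 0.0 1.0
```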

In [19]:
# create class names 
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

In [20]:
# look up the class name of the first training label
class_names[y_train[0]]

'Ankle Boot'

### Building the Model

In [21]:
model = keras.models.Sequential()

This is the first layer of the model. It is a Flatten layer, whose role is to convert each 28x28 input image into a 1D array of 784 values: given a batch X, it computes X.reshape(-1, 28*28). The -1 tells NumPy to infer that dimension, so the number of rows (one per image) is preserved whatever the batch size is. This layer only does preprocessing, so it has no parameters. The input shape we pass is the shape of a single instance, not including the batch size.

In [22]:
# create layers to the model
model.add(keras.layers.Flatten(input_shape = [28, 28]))
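The flattening step can be reproduced with plain NumPy. This is a sketch on a synthetic batch of zeros (`X_batch` is an illustrative name), not a look inside the Keras implementation.

```python
import numpy as np

# Synthetic batch of 3 grayscale images, 28x28 each (zeros as placeholders).
X_batch = np.zeros((3, 28, 28))

# Keep the batch dimension (-1 lets NumPy infer it) and flatten each image.
X_flat = X_batch.reshape(-1, 28 * 28)

print(X_flat.shape)  # (3, 784)
```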

Next comes a Dense hidden layer with 300 neurons, using the ReLU activation function. Each Dense layer manages its own weight matrix, with one weight for every connection between its inputs and its neurons. Here each of the 300 neurons has 784 weights (one per input pixel), for 784x300 = 235,200 weights. The layer also has one bias term per neuron. For each neuron it computes sum(xi*wi) + b and then applies ReLU. Adding the 300 bias terms gives a total of 235,500 parameters.

In [23]:
# first hidden layer
model.add(keras.layers.Dense(300, activation='relu'))
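The computation of a dense layer can be sketched in NumPy as a matrix product plus a bias, followed by ReLU. The weights here are random and the input batch is synthetic; the sketch only illustrates the shapes and the parameter count, and assumes nothing about how Keras initializes the layer.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_neurons = 28 * 28, 300
X = rng.random((2, n_inputs))               # 2 flattened images (synthetic)
W = rng.normal(size=(n_inputs, n_neurons))  # one column of 784 weights per neuron
b = np.zeros(n_neurons)                     # one bias per neuron

# Weighted sum plus bias, then ReLU: max(0, z) applied element-wise.
Z = X @ W + b
A = np.maximum(0, Z)

print(A.shape)          # (2, 300)
print(W.size + b.size)  # 235500
```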

The second hidden layer works the same way. The number of outputs of the previous layer (300) is multiplied by the number of neurons in this layer (100), giving 300x100 = 30,000 weights. Adding the 100 bias terms gives a total of 30,100 parameters.

In [24]:
# second hidden layer with fewer neurons and the same activation function
model.add(keras.layers.Dense(100, activation='relu'))

Now it is time for the Dense output layer. It has 10 neurons, one for each class. Since the classes are exclusive, we use the softmax activation function. Its outputs can be interpreted as the probabilities of the instance belonging to each of the 10 classes, and they sum to 1.

For this last layer, the previous layer has 100 neurons. Multiplying by the 10 neurons in this layer gives 100x10 = 1,000 weights, and adding the 10 bias terms gives a total of 1,010 parameters.

In [25]:
# output layer
model.add(keras.layers.Dense(10, activation='softmax'))
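To illustrate why softmax outputs can be read as class probabilities, here is a small NumPy sketch. The `softmax` helper and the example scores are written for this illustration and are not taken from Keras.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max improves numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw scores for one instance, one score per class (10 classes).
logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, -0.2, 1.5, 0.7]])
probs = softmax(logits)

print(probs.shape)                           # (1, 10)
print(bool(np.isclose(probs.sum(), 1.0)))    # True
print(int(probs.argmax(axis=1)[0]))          # 0
```

The predicted class is the index with the highest probability, which is also the index of the largest raw score.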

Alternatively, we could have passed a list of layers when creating the Sequential model, like this:

In [26]:
#model = keras.models.Sequential([
    #keras.layers.Flatten(input_shape = [28, 28]),
    #keras.layers.Dense(300, activation = 'relu'),
    #keras.layers.Dense(100, activation = 'relu'),
    #keras.layers.Dense(10, activation = 'softmax')
#])

In [27]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235500    
                                                                 
 dense_1 (Dense)             (None, 100)               30100     
                                                                 
 dense_2 (Dense)             (None, 10)                1010      
                                                                 
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
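The parameter counts in the summary can be checked by hand: each Dense layer has (inputs x neurons) weights plus one bias per neuron. The short sketch below redoes that arithmetic in plain Python; `dense_params` is a helper defined only for this illustration.

```python
# Layer sizes: 784 flattened inputs, then Dense(300), Dense(100), Dense(10).
layer_sizes = [28 * 28, 300, 100, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

# Pair each layer's input size with its output size.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [235500, 30100, 1010]
print(sum(per_layer))  # 266610
```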
