# An example using Fashion MNIST


## Start Coding

Let's start by importing TensorFlow:

In [None]:
import tensorflow as tf

The Fashion MNIST data is available directly in the tf.keras datasets API. You load it like this:

In [None]:
fash_mnist = tf.keras.datasets.fashion_mnist

Calling load_data on this object gives you two pairs of arrays: the training set and the test set, each containing the images of the clothing items and their labels.


In [None]:
(training_images, training_labels), (test_images, test_labels) = fash_mnist.load_data()

In [None]:
training_images.shape

What do these values look like? Let's print a training image and a training label to see. Experiment with different indices in the array. For example, also take a look at index 42: that's a different boot than the one at index 0.


In [None]:
import numpy as np
np.set_printoptions(linewidth=200)
import matplotlib.pyplot as plt

print(training_labels[0])
print(training_images[0])

plt.imshow(training_images[0], cmap='gray')

You'll notice that all of the values in the image are between 0 and 255. When training a neural network, it's easier, for various reasons, if all values lie between 0 and 1. Rescaling them this way is called '**normalizing**'. Fortunately, the images are NumPy arrays, so it's easy to normalize them without looping. You do it like this:

In [None]:
training_images = training_images / 255.0
test_images = test_images / 255.0
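
To see why no loop is needed, here is a minimal sketch on a tiny made-up array (a stand-in for an image, not real Fashion MNIST data). Dividing a NumPy array by a scalar applies the division to every element at once.

```python
import numpy as np

# A tiny stand-in for an image: unsigned 8-bit pixels in [0, 255].
pixels = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# Dividing by a float scalar is applied element-wise (broadcasting),
# and the result is a floating-point array of the same shape.
scaled = pixels / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())  # 0.0 1.0
```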

Now you might be wondering why there are 2 sets, training and testing. Remember we spoke about this in the intro? The idea is to have 1 set of data for training, and then another set of data that the model hasn't yet seen, to check how good it is at classifying values. After all, when you're done, you're going to want to try it out with data that it hasn't previously seen!

Let's now design the model. There are quite a few new concepts here, but don't worry, you'll get the hang of them.

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])


**Sequential**: Defines a SEQUENCE of layers in the neural network.

**Flatten**: Remember earlier, when you printed the images out, that they were a square? Flatten just takes that square and turns it into a 1-dimensional array.

**Dense**: Adds a layer of neurons, each connected to every output of the previous layer.
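
As a rough illustration of what Flatten and Dense do to the shape of the data, here is a NumPy sketch. It assumes 28x28 input images (the Fashion MNIST image size) and uses random placeholder weights; in Keras the weights are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one 28x28 grayscale image with values in [0, 1).
image = rng.random((28, 28))

# Flatten: the 28x28 grid becomes a vector of 784 values.
flat = image.reshape(-1)

# Dense layer with 128 neurons: one weight per (input, neuron) pair
# plus one bias per neuron. These weights are random placeholders.
weights = rng.normal(size=(784, 128))
bias = np.zeros(128)
pre_activation = flat @ weights + bias

print(flat.shape)            # (784,)
print(pre_activation.shape)  # (128,)
```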

Each layer of neurons needs an **activation function** to tell it what to do. There are lots of options, but just use these for now.

**Relu** effectively means "If X>0 return X, else return 0", so it only passes values of 0 or greater to the next layer in the network.
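
The rule above can be sketched in one line of NumPy. The function name `relu` and the sample values are chosen here for illustration.

```python
import numpy as np

def relu(x):
    """Return x where x > 0, and 0 elsewhere."""
    return np.maximum(x, 0.0)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values))
```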

**Softmax** takes a set of values and turns them into probabilities that sum to 1, with the biggest value getting by far the largest share. So, for example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1], softmax returns values close to [0,0,0,0,1,0,0,0,0,0], which saves you from fishing through the raw outputs looking for the biggest value. The goal is to save a lot of coding!
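
Here is a minimal NumPy sketch of softmax, assuming the usual definition (exponentiate each score, then divide by the sum). The ten scores are made up for illustration, one per clothing class.

```python
import numpy as np

def softmax(x):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow; the result is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

# Ten raw scores, one per class (made-up values).
scores = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1])
probs = softmax(scores)

print(probs.round(3))
print(np.argmax(probs))  # index of the most likely class: 4
```

The output is not exactly a list of zeros and a one, but the largest score dominates, and `np.argmax` reads off the predicted class.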


The next thing to do, now that the model is defined, is to actually build it. You do this by compiling it with an optimizer and a loss function as before, and then you train it by calling **model.fit**, asking it to fit your training data to your training labels, i.e. to figure out the relationship between the training data and its actual labels, so that in the future, given data that looks like the training data, it can predict what the label for that data would be.

In [None]:
model.compile(optimizer = tf.optimizers.SGD(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, epochs=5)
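
To give an idea of what the 'sparse_categorical_crossentropy' loss measures, here is a simplified NumPy sketch: for each sample, take the probability the model assigned to the true class and average the negative logarithms. The predictions and labels are made up, and the Keras implementation includes numerical safeguards that are omitted here.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    rows = np.arange(len(labels))
    true_class_probs = probs[rows, labels]
    return float(np.mean(-np.log(true_class_probs)))

# Two made-up predictions over 3 classes (each row sums to 1)
# and their integer labels ("sparse" means labels are class indices).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])

loss = sparse_categorical_crossentropy(probs, labels)
print(loss)
```

The more probability the model puts on the correct class, the closer the loss gets to 0.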

Once it's done training, you should see an accuracy value at the end of the final epoch. It might look something like 0.9098; the exact value depends on the optimizer and on the random initialization of the weights. A value of 0.9098 tells you that your neural network is about 91% accurate in classifying the training data, i.e., it figured out a pattern match between the images and the labels that worked 91% of the time. Not great, but not bad considering it was only trained for 5 epochs and done quite quickly.

But how would it work with unseen data? That's why we have the test images. We can call model.evaluate. It will report back the loss and the accuracy. Let's give it a try:

In [None]:
model.evaluate(test_images, test_labels)
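
The accuracy reported by the evaluation can be sketched by hand: pick the most probable class for each sample and count how often it matches the label. The predictions and labels below are made up for illustration.

```python
import numpy as np

# Made-up softmax outputs for 4 samples over 3 classes, plus their true labels.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the largest probability in each row.
predicted = np.argmax(probs, axis=1)
accuracy = float(np.mean(predicted == labels))

print(predicted)
print(accuracy)  # 0.75
```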

#### Model summary

To get a summary of the model architecture, you can use "model.summary()":

In [None]:
model.summary()
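
The parameter counts shown in the summary can be checked by hand. The sketch below assumes 28x28 input images and the layer sizes used above (128 hidden neurons, 10 outputs); `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_pixels = 28 * 28  # assumes 28x28 images, flattened to 784 values
hidden = dense_params(n_pixels, 128)
output = dense_params(128, 10)

print(hidden)           # 100480
print(output)           # 1290
print(hidden + output)  # 101770
```

The Flatten layer only reshapes its input, so it adds no parameters.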

In [None]:
# common code
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(linewidth=200)

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

# To do 

- "Play" with the model: Try different values/configurations of the hyper-parameters and keep track of the obtained results.
- Introduce a validation set (15% of the training dataset) by adding the parameter 'validation_split=0.15' to the function 'model.fit'.
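
According to the Keras documentation, the validation samples are taken from the end of the training arrays, before any shuffling. The following is a rough sketch of that split on stand-in indices rather than real images. It is not the Keras source code, and the rounding is an assumption.

```python
import numpy as np

n_samples = 60000        # size of the Fashion MNIST training set
validation_split = 0.15

n_val = round(n_samples * validation_split)
n_train = n_samples - n_val

# Stand-in indices instead of real images, to show which samples land where.
indices = np.arange(n_samples)
train_idx, val_idx = indices[:n_train], indices[n_train:]

print(n_train, n_val)           # 51000 9000
print(val_idx[0], val_idx[-1])  # 51000 59999
```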

### Report the obtained performance here:

##### 10 epochs => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### 15 epochs => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...
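
One possible way to read off the values for the report is sketched below. The dictionary is a stand-in for `history.history` with made-up numbers, and `final_metrics` is a hypothetical helper; the keys 'accuracy' and 'val_accuracy' are the ones used in the plotting code of the experiments.

```python
# Stand-in for `history.history`; the numbers are made up for illustration only.
fake_history = {
    'accuracy':     [0.78, 0.83, 0.85],
    'val_accuracy': [0.80, 0.82, 0.84],
}

def final_metrics(history_dict):
    """Return the last-epoch training and validation accuracy."""
    return history_dict['accuracy'][-1], history_dict['val_accuracy'][-1]

train_acc, val_acc = final_metrics(fake_history)
print(f"Train accuracy = {train_acc:.4f}")
print(f"Validation acc = {val_acc:.4f}")
```

The test accuracy is not part of the history; it comes from the call to model.evaluate on the test set.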

In [None]:
# Exp 1

# Here, we work with a two-layer NN.
# Learning is performed using SGD, with a learning rate of 0.01 and momentum = 0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = tf.optimizers.SGD(learning_rate=0.01, momentum=0.0),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, ... shuffle=True, epochs=15)

model.summary()

model.evaluate(test_images, test_labels)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
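
Exp 1 trains with SGD at a learning rate of 0.01 and momentum 0. As a minimal sketch of what one update does, the code below applies the standard momentum form of the SGD rule to a toy one-parameter problem. The toy function f(w) = w**2 and the helper `sgd_step` are assumptions made for illustration, not part of the experiment.

```python
def sgd_step(w, grad, velocity, learning_rate=0.01, momentum=0.0):
    """One SGD update; with momentum=0 this reduces to w - learning_rate * grad."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w.
w, velocity = 1.0, 0.0
for _ in range(100):
    w, velocity = sgd_step(w, 2 * w, velocity)

print(w)  # has moved from 1.0 toward the minimum at 0
```

With a small learning rate each step moves the weight only slightly, which is one reason SGD can need many epochs.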


# To do 

- Change the optimizer from 'SGD' to 'adam' in the function 'model.compile(...)'...

### Report the obtained performance here:

##### 15 epochs => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### What do you think of the variance?
...

In [None]:
# Exp 2

# Change the optimizer from 'SGD' to 'adam' in the function 'model.compile(...)'...

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = ...,
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, validation_split=0.15, shuffle=True, epochs=15)

model.summary()

model.evaluate(test_images, test_labels)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


# To do 

- In the previous experiment, we can observe high variance (a gap between the training and validation performance).
- Use dropout to fix the problem. You can add the layer "tf.keras.layers.Dropout(0.2)" before the hidden layer.
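
To illustrate what a dropout layer with rate 0.2 does during training, here is a NumPy sketch of the usual "inverted dropout" scheme: each input is zeroed with probability 0.2 and the survivors are scaled by 1 / (1 - 0.2). The function and the all-ones input are made up for illustration.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10000)
dropped = dropout(activations, rate=0.2, rng=rng)

print((dropped == 0).mean())  # fraction zeroed, close to 0.2
print(dropped.mean())         # close to 1.0: scaling preserves the expected value
```

Dropout is only active during training; when evaluating or predicting, the layer passes its input through unchanged.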

### Report the obtained performance here:

##### 15 epochs => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### Does the dropout fix the problem?
...

In [None]:
# Exp 3

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    ...
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, validation_split=0.15, shuffle=True, epochs=15)

model.summary()

model.evaluate(test_images, test_labels)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


# To do 

- In the previous experiment, we have fixed the overfitting problem using dropout. Let's try to improve the accuracy of the model.
- Increase the number of neurons in the hidden layer to 512, then to 1024.

### Report the obtained performance here:

##### 15 epochs for "512 hidden neurons" => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### 15 epochs for "1024 hidden neurons" => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### Is it satisfying?
...

In [None]:
# Exp 4

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dropout(0.2),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, validation_split=0.15, shuffle=True, epochs=15)

model.summary()

model.evaluate(test_images, test_labels)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

# 15 epochs => 
# Train accuracy = 0.90
# Validation acc = 0.89
# Test acc = 0.89


# To do 

- Let's try to improve the performance by adding another hidden layer.
- So, we will use two hidden layers (512, 128). For each layer, use dropout with a rate of 0.2.

### Report the obtained performance here:

##### 15 epochs => 
- Train accuracy = ...
- Validation acc = ...
- Test acc = ...

##### Is it satisfying?
...

In [None]:
# Exp 5

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dropout(...),
                                    tf.keras.layers.Dense(..., activation=tf.nn.relu), 
                                    tf.keras.layers.Dropout(...),
                                    tf.keras.layers.Dense(..., activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(training_images, training_labels, validation_split=0.15, shuffle=True, epochs=15)

model.summary()

model.evaluate(test_images, test_labels)


acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


# Final report
...