# Beyond Hello World, A Computer Vision Example
In the previous exercise you saw how to create a neural network that figured out the problem you were trying to solve. This gave an explicit example of learned behavior. Of course, in that instance, it was a bit of overkill because it would have been easier to write the function Y=2x-1 directly, instead of bothering with using Machine Learning to learn the relationship between X and Y for a fixed set of values, and extending that for all values.

But what about a scenario where writing rules like that is much more difficult -- for example a computer vision problem? Let's take a look at a scenario where we can recognize different items of clothing, trained from a dataset containing 10 different types.

***Unlike the original Coursera notebook, this version uses `tf.keras.datasets.fashion_mnist` throughout, rather than switching to `tf.keras.datasets.mnist` (the MNIST digits) in some cells.***

In [5]:
import tensorflow as tf
print(tf.__version__)

from time import time
from timeit import default_timer

1.14.0


#### To ensure reproducibility, set a bunch of random seed values

***Note: even with these seeds set, TensorFlow results may not be exactly reproducible when running on a GPU!***

In [3]:
# Seed value
# Apparently you may use different seed values at each stage
seed_value= 0

# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set `tensorflow` pseudo-random generator at a fixed value
tf.set_random_seed(seed_value)
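
As a quick sanity check of what seeding buys you, here is a minimal sketch using only Python's standard library (it does not touch TensorFlow). The helper name `draw` is made up for this illustration.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random floats drawn right after seeding the generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(0)
second = draw(0)   # same seed as `first`
other = draw(1)    # different seed

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

The same idea applies to the NumPy and TensorFlow seeds above: fixing the seed fixes the sequence of pseudo-random numbers each library hands out.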

## Start Coding

TensorFlow was already imported in the first cell above, so we can go straight to the data.

The Fashion MNIST data is available directly in the tf.keras datasets API. You load it like this:

In [3]:
mnist = tf.keras.datasets.fashion_mnist

Calling `load_data` on this object gives you two tuples of two arrays each: the training and testing images of the clothing items, and their labels.


In [4]:
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

What do these values look like? Let's print a training image and a training label to see. Experiment with different indices in the array. For example, also take a look at index 42, which is a different boot from the one at index 0.


In [5]:
import matplotlib.pyplot as plt
plt.imshow(training_images[0])
print(training_labels[0])
print(training_images[0])

9
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0  13  73   0
    0   1   4   0   0   0   0   1   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   3   0  36 136 127  62
   54   0   0   0   1   3   4   0   0   3]
 [  0   0   0   0   0   0   0   0   0   0   0   0   6   0 102 204 176 134
  144 123  23   0   0   0   0  12  10   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0 155 236 207 178
  107 156 161 109  64  23  77 130  72  15]
 [  0   0   0   0   0   0   0   0   0   0   0   1   0  69 207 223 218 216
  216 163 127 121 122 146 141  88 172  66]
 [  0   0   0   0   0   0   0   0   0   1   1   1   0 200 232 

You'll notice that all of the values in the image are between 0 and 255. When training a neural network, for various reasons it's easier if we treat all values as between 0 and 1, a process called '**normalizing**'. Fortunately, the images are NumPy arrays, so it's easy to normalize them without looping. You do it like this:

In [6]:
training_images  = training_images / 255.0
test_images = test_images / 255.0
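
To see why no loop is needed, here is a small sketch on a made-up 2x3 "image" (the pixel values are hypothetical). Dividing a NumPy array by a scalar applies the division to every element at once.

```python
import numpy as np

# A tiny stand-in for one image: 2x3 pixels with uint8 intensities (made-up values).
pixels = np.array([[0, 51, 102],
                   [153, 204, 255]], dtype=np.uint8)

# Dividing by a float scalar broadcasts over every element
# and produces a floating-point array.
normalized = pixels / 255.0

print(normalized.dtype)
print(normalized.min(), normalized.max())
```

The original array is left untouched; the division returns a new array, which is why the notebook assigns the result back to `training_images` and `test_images`.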

Now you might be wondering why there are 2 sets...training and testing -- remember we spoke about this in the intro? The idea is to have 1 set of data for training, and then another set of data...that the model hasn't yet seen...to see how good it would be at classifying values. After all, when you're done, you're going to want to try it out with data that it hadn't previously seen!

Let's now design the model. There are quite a few new concepts here, but don't worry, you'll get the hang of them.

In [7]:
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

W0723 09:53:00.583595 139772214904640 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


**Sequential**: That defines a SEQUENCE of layers in the neural network

**Flatten**: Remember that our images were a 28x28 square when you printed them out? Flatten just takes that square and turns it into a one-dimensional array of 784 values.

**Dense**: Adds a layer of neurons

Each layer of neurons needs an **activation function** to tell it what to do. There are lots of options, but just use these for now.

**Relu** effectively means "If X>0 return X, else return 0" -- so what it does is pass only values of 0 or greater to the next layer in the network.

**Softmax** takes a set of values and turns them into probabilities that add up to 1, so that the biggest value gets the biggest probability. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], softmax saves you from fishing through it looking for the biggest value, and turns it into something very close to [0,0,0,0,1,0,0,0,0] -- The goal is to save a lot of coding!
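
The two activation functions described above can be sketched in plain Python. This is only an illustration of the arithmetic, not how TensorFlow implements them; the function names are chosen here for readability.

```python
import math

def relu(x):
    """Pass positive values through unchanged; clamp everything else to 0."""
    return x if x > 0 else 0.0

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05]
probs = softmax(scores)

print([relu(x) for x in (-2.0, 0.0, 3.5)])
print(probs.index(max(probs)))  # position of the largest probability
print(sum(probs))               # the probabilities add up to (approximately) 1
```

Note that softmax does not produce exact zeros and ones: every entry stays strictly between 0 and 1, and the largest score simply takes almost all of the probability.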


The next thing to do, now the model is defined, is to actually build it. You do this by compiling it with an optimizer and loss function as before -- and then you train it by calling **model.fit**, asking it to fit your training data to your training labels -- i.e. have it figure out the relationship between the training data and its actual labels, so that in future, if you have data that looks like the training data, it can predict the label for that data.

In [8]:
model.compile(optimizer = tf.train.AdamOptimizer(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

t0 = time()
model.fit(training_images, training_labels, epochs=25)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.")

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25

model fit Δt: 1m, 39.3s.


Once it's done training -- you should see an accuracy value at the end of the final epoch. It might look something like 0.9098. This tells you that your neural network is about 91% accurate in classifying the training data, i.e. it figured out a pattern match between the images and the labels that worked 91% of the time. Not great, but not bad for a short run that finishes quickly. (That figure assumes only 5 epochs; the cell above trains for 25, so your value will likely differ.)

But how would it work with unseen data? That's why we have the test images. We can call model.evaluate, pass in the test images and labels, and it will report back the loss and the accuracy on them. Let's give it a try:

In [9]:
model.evaluate(test_images, test_labels)



[0.3956207452893257, 0.8832]

For me, that returned an accuracy of about 0.8832, which means it was about 88% accurate. As expected, it does not do as well with *unseen* data as it did with data it was trained on! As you go through this course, you'll look at ways to improve this.

To explore further, try the below exercises:


# Exploration Exercises

### Exercise 1:
For this first exercise, run the code below. It creates a set of classifications for each of the test images, and then prints the first entry in the classifications. The output, after you run it, is a list of numbers. Why do you think this is, and what do those numbers represent?

In [10]:
classifications = model.predict(test_images)

print(classifications[0])

[1.5770331e-11 4.1191679e-11 3.8599857e-12 1.6767428e-15 1.0556020e-15
 2.2329674e-07 2.2510032e-12 8.7730652e-05 3.8850845e-12 9.9991202e-01]


Hint: try running print(test_labels[0]) -- and you'll get a 9. Does that help you understand why this list looks the way it does? 

In [11]:
print(test_labels[0])

9


#### Question 1. What does this list represent?


1.   It's 10 random meaningless values
2.   It's the first 10 classifications that the computer made
3.   It's the probability that this item is each of the 10 classes



#### Answer: 
The correct answer is (3)

The output of the model is a list of 10 numbers. Each number is the probability that the item being classified belongs to the corresponding class, i.e. the first value in the list is the probability that the clothing item is of class '0', the next is for class '1', etc. Notice that all but one of them are VERY LOW probabilities.

For class 9, the probability was about 0.9999, i.e. the neural network is telling us that the item almost certainly belongs to class 9.

#### Question 2. How do you know that this list tells you that the item is an ankle boot?


1.   There's not enough information to answer that question
2.   The 10th element on the list is the biggest, and the ankle boot is labelled 9
3.   The ankle boot is label 9, and there are 0->9 elements in the list




#### Answer
The correct answer is (2). Both the list and the labels are 0 based, so the ankle boot having label 9 means that it is the 10th of the 10 classes. The list having the 10th element being the highest value means that the Neural Network has predicted that the item it is classifying is most likely an ankle boot.
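
Picking the most likely class out of such a list is a one-liner. The sketch below reuses the probabilities printed for the first test image in Exercise 1 and the class names published with the Fashion MNIST dataset; it runs without TensorFlow.

```python
# Class names in label order, as published with the Fashion MNIST dataset.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# The probabilities printed for the first test image in Exercise 1.
probs = [1.5770331e-11, 4.1191679e-11, 3.8599857e-12, 1.6767428e-15,
         1.0556020e-15, 2.2329674e-07, 2.2510032e-12, 8.7730652e-05,
         3.8850845e-12, 9.9991202e-01]

# Index of the largest probability, i.e. the predicted label.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, class_names[predicted])
```

With a NumPy array such as `classifications[0]`, `np.argmax` would give the same index.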

### Run the following neural net classifier for Fashion MNIST, using a hidden layer with 512 activation units.

**25 epochs, GPU:  / CPU: ~2m, 35s**

In [12]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=25)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25

model fit Δt: 2m, 37.3s.



 [1.8626867e-13 9.3411151e-17 2.6283591e-16 1.8817326e-22 1.9218954e-13
 7.5917725e-08 4.6711880e-13 4.7544523e-05 4.4580587e-14 9.9995244e-01]
9


### Exercise 2: 
Let's now look at the layers in your model. Experiment with different values for the dense layer with 512 neurons. What different results do you get for loss, training time etc? Why do you think that's the case?

#### 128 neurons in hidden layer

**25 epochs, GPU: / CPU: ~1m, 40s**

In [13]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=25)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25

model fit Δt: 1m, 42.4s.



 [4.00698181e-12 6.59834737e-20 4.42526658e-13 5.70492574e-21
 5.52180643e-13 9.62022023e-06 1.13418755e-13 2.68724310e-04
 2.96344789e-11 9.99721587e-01]
9


#### 1024 neurons in hidden layer

50 epochs instead of 25, since there are more neurons to train.

**50 epochs, GPU: / CPU: ~8m, 40s**

In [14]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(1024, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=50)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50

model fit Δt: 8m, 41.6s.



 [2.49241727e-24 2.65261926e-23 8.00470555e-28 1.03659397e-26
 5.44670923e-29 1.11057309e-10 4.46130897e-21 3.63680819e-09
 1.14250614e-23 1.00000000e+00]
9


#### Question 1. Increase to 1024 Neurons -- What's the impact?

1. Training takes longer, but is more accurate
2. Training takes longer, but no impact on accuracy
3. Training takes the same time, but is more accurate


#### Answer
The correct answer is (1). By adding more neurons we have to do more calculations, which slows down the process, but in this case they have a good impact -- the training loss ends up lower. That doesn't mean it's always a case of 'more is better' -- you can hit the law of diminishing returns very quickly!

### Exercise 3: 

#### Question 1. What would happen if you removed the Flatten() layer? Why do you think that's the case?


#### Answer

You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should match the shape of your data. Right now our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to 'flatten' that 28x28 into a 784x1. Instead of writing all the code to handle that ourselves, we add the Flatten() layer at the beginning, and when the arrays are loaded into the model later, they'll automatically be flattened for us.

In [15]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([#tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(64, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=5)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/5


InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [896,10] and labels shape [32]
	 [[{{node loss_4/output_1_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
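
The numbers in this error message can be reproduced with a little arithmetic. The sketch below is a hedged reading of the message: it assumes Keras' default batch size of 32 and relies on the fact that a Dense layer acts only on the last axis of its input.

```python
batch_size = 32          # Keras' default batch size, matching "labels shape [32]"
image_shape = (28, 28)   # one Fashion MNIST image
num_classes = 10

# Without Flatten, Dense treats each of the 28 rows of every image as a
# separate 28-value input, so a batch yields 32 * 28 rows of 10 outputs.
logit_rows_without_flatten = batch_size * image_shape[0]

# With Flatten, each image becomes a single 784-value input,
# so a batch yields one row of 10 outputs per image.
flattened_length = image_shape[0] * image_shape[1]
logit_rows_with_flatten = batch_size

print((logit_rows_without_flatten, num_classes))
print(flattened_length, (logit_rows_with_flatten, num_classes))
```

The first printed shape matches the `[896,10]` in the error: there are 896 rows of predictions but only 32 labels, so the loss cannot pair them up.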

### Exercise 4: 

#### Questions. Consider the final (output) layer. Why does it have 10 neurons? What would happen if you had a different number than 10? For example, try training the network with 5.

#### Answer

You get an error as soon as it finds an unexpected value. Another rule of thumb -- the number of neurons in the last layer should match the number of classes you are classifying for. In this case there are 10 clothing classes, labeled 0-9, hence you should have 10 neurons in your final layer.

In [16]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(64, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(5, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=5)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/5


InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 5).  Label values: 9 8 5 5 8 9 0 4 8 9 6 0 3 4 4 3 3 6 6 3 6 4 1 5 1 3 3 1 3 8 0 4
	 [[{{node loss_5/output_1_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
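
To see why a label of 9 cannot work with 5 outputs, here is a plain-Python sketch of what sparse categorical cross-entropy computes for a single example. It illustrates the idea only and is not TensorFlow's implementation; the prediction values are made up.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: negative log of the probability given to the true class."""
    if not 0 <= label < len(probs):
        raise ValueError(f"label {label} is outside the valid range [0, {len(probs)})")
    return -math.log(probs[label])

ten_outputs = [0.01] * 9 + [0.91]   # hypothetical 10-way prediction favouring class 9
five_outputs = [0.2] * 5            # hypothetical 5-way prediction

loss_ok = sparse_categorical_crossentropy(9, ten_outputs)
print(loss_ok)

try:
    sparse_categorical_crossentropy(9, five_outputs)
except ValueError as err:
    print("error:", err)
```

The label is used as an index into the model's outputs, so every label value needs a matching output neuron.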

### Exercise 5: two hidden layers, each with 128 neurons

#### Question 1. Consider the effects of additional layers in the network. What will happen if you add another layer between the one with 512 and the final layer with 10?

#### Answer

There isn't a significant impact -- because this is relatively simple data. For far more complex data (including color images to be classified as flowers that you'll see in the next lesson), extra layers are often necessary.

**100 epochs, GPU: / CPU: ~7m, 40s**

In [18]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=100)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

### Exercise 6: hidden layer 512 neurons, 50 epochs

#### Question 1. Consider the impact of training for more or fewer epochs. Why do you think that would be the case? 

Try 15 epochs -- you'll probably get a model with a much better loss than the one with 5.

Try 30 epochs -- you might see the loss value stop decreasing, and sometimes increase. This is a side effect of something called 'overfitting' which you can learn about [somewhere], and it's something you need to keep an eye out for when training neural networks. There's no point in wasting your time training if you aren't improving your loss, right? :)

**50 epochs, GPU: / CPU: ~5m, 40s**

In [19]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=50)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50

model fit Δt: 5m, 38.4s.



 [1.9045789e-29 8.0406305e-29 1.4603901e-31 0.0000000e+00 3.8137137e-31
 1.9946210e-12 1.3396883e-33 5.6032667e-09 2.2578751e-27 1.0000000e+00]
9


### Exercise 7: 512 hidden neurons, un-normalized inputs

#### Question 1. Before you trained, you normalized the data, going from values that were 0-255 to values that were 0-1. What would be the impact of removing that?

Here's the complete code to give it a try. Why do you think you get different results?

<font color="darkgreen">**The poor results (non-normalized pixel intensities) result from very shallow gradients in softmax when input values &Gt; 1. (The linear relu mapping to the hidden layer does not affect results except where negative bias terms &rarr; 0 values.)**</font>
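
To get a feel for the saturation argument, here is a small pure-Python sketch (not part of the original notebook). It assumes, purely for illustration, that multiplying the inputs by 255 multiplies the scores reaching softmax by the same factor; the scores themselves are made up.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def grad_term(p):
    """d(p_i)/d(z_i) for softmax is p_i * (1 - p_i); it vanishes as p_i nears 1."""
    return p * (1.0 - p)

small_scores = [0.2, 0.5, 1.0]                  # hypothetical scores from 0-1 inputs
large_scores = [s * 255 for s in small_scores]  # the same scores scaled up by 255

small_probs = softmax(small_scores)
large_probs = softmax(large_scores)

print("small:", small_probs, grad_term(max(small_probs)))
print("large:", large_probs, grad_term(max(large_probs)))
```

With the large scores, softmax is pinned at essentially 1 for one class and 0 for the rest, and the gradient term collapses toward zero, which leaves the optimizer very little signal to work with.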



**25 epochs, GPU: / CPU: ~2m, 45s**

In [6]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images=training_images/1.0		# This merely converts to floating point
test_images=test_images/1.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=25)
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25

model fit Δt: 2m, 44.3s.



 [0.0000000e+00 2.2304909e-34 0.0000000e+00 0.0000000e+00 0.0000000e+00
 2.5079723e-03 0.0000000e+00 1.4297239e-04 0.0000000e+00 9.9734902e-01]
9


### Exercise 8: "standard" model, interrupted when loss < 0.15

Earlier, when you trained for extra epochs, you saw that the loss might stop improving. Waiting for the training to finish may have taken a while, and you might have thought 'wouldn't it be nice if I could stop the training when I reach a desired value?' -- e.g. 95% accuracy might be enough for you, and if you reach that after 3 epochs, why sit around waiting for it to finish a lot more epochs? So how would you fix that? Like any other program, you have callbacks! Let's see them in action. The callback below stops training once the training loss falls below 0.15.

**2x epochs, GPU: / CPU: ~1m, s**

In [7]:
# import tensorflow as tf
# print(tf.__version__)
tf.set_random_seed(seed_value)

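class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop once the training loss reported for this epoch drops below 0.15.
        # Note: the message below mentions 85% accuracy, but the test is on loss.
        logs = logs or {}
        loss = logs.get('loss')
        if loss is not None and loss < 0.15:
            print("\nReached 85% accuracy so cancelling training!")
            self.model.stop_training = True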

callbacks = myCallback()


mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

t0 = time()
model.fit(training_images, training_labels, epochs=25, callbacks=[callbacks])
Δt = time() - t0
print(f"\nmodel fit Δt: {int(Δt//60)}m, {Δt % 60.0:4.1f}s.\n")
model.evaluate(test_images, test_labels)
print("")

classifications = model.predict(test_images)

print("\n", classifications[0])
print(test_labels[0])

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Reached 85% accuracy so cancelling training!

model fit Δt: 2m, 26.4s.



 [2.0639333e-17 4.4007718e-20 9.8756241e-18 9.8625670e-24 1.8242249e-16
 2.0454716e-07 2.8667591e-15 1.3607236e-05 6.7534735e-18 9.9998617e-01]
9


## Summary of results

|Exercise|Description|Train Loss @ Epoch 10|Epochs|Final Train Loss|Test Loss|
|:------:|-----------|:-------------:|:---------:|:---------------:|:--------:|
|Baseline|512 hidden|0.2230|25|0.1360|0.3991|
|&nbsp;|128 hidden|0.2360|25|0.1581|0.3672|
|2|1024 hidden|0.2189|50|0.0784|0.5889|
|5|128 & 128 hidden|0.2344|100|0.0524|1.0361|
|6|512 hidden|0.2218|50|0.0800|0.6030|
|7|512 hidden, not normed|0.4469|25|0.4278|0.5513|
|8|512 hidden, callback|0.2234|22|0.1462|0.3759|
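
One way to read this table is to look at the gap between the final training value and the test value for each run. The sketch below copies the numbers from the table and sorts the runs by that gap; treating a large gap as a sign of overfitting is a common heuristic, not a formal test.

```python
# (description, epochs, final training value, test value), copied from the table above.
runs = [
    ("512 hidden", 25, 0.1360, 0.3991),
    ("128 hidden", 25, 0.1581, 0.3672),
    ("1024 hidden", 50, 0.0784, 0.5889),
    ("128 & 128 hidden", 100, 0.0524, 1.0361),
    ("512 hidden", 50, 0.0800, 0.6030),
    ("512 hidden, not normed", 25, 0.4278, 0.5513),
    ("512 hidden, callback", 22, 0.1462, 0.3759),
]

# Sort by the gap between test and training values, smallest gap first.
by_gap = sorted(runs, key=lambda r: r[3] - r[2])

for name, epochs, train, test in by_gap:
    print(f"{name:24s} epochs={epochs:3d} gap={test - train:.4f}")
```

The longest run (two hidden layers, 100 epochs) has the lowest training value but by far the largest gap, which fits the overfitting discussion in Exercise 6.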