##### Portions Copyright 2019 The TensorFlow Authors.
Modified by Jung Hee Kim and Michael Glass.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Beyond Hello World, A Computer Vision Example
In the previous exercise you saw how to create a neural network that figured out the problem you were trying to solve. This gave an explicit example of learned behavior. Of course, in that instance, it was a bit of overkill because it would have been easier to write the function $y=2x+1$ directly, instead of bothering with using Machine Learning to learn the relationship between X and Y for a fixed set of values, and extending that for all values.

But what about a scenario where writing rules like that is much more difficult -- for example a computer vision problem? Let's take a look at a scenario where we can recognize different items of clothing or a handwritten digit, trained from a dataset containing 10 different types of clothes or the 10 different digits.

## Start Coding

Let's start by importing TensorFlow.

In [None]:
import tensorflow as tf
print(tf.__version__)

The MNIST digit data and the Fashion MNIST data are available directly in the tf.keras datasets API.
* MNIST is hand-written digits, scanned as 28x28 gray-scale images.
* Fashion MNIST contains images of 10 different articles of apparel, also scanned as 28x28 gray-scale images.

The following code cell shows how to load the two data sets, so you can experiment with either one.

We will use the fashion data for illustration. Later we can switch to the digit images.

In [None]:
mnist = tf.keras.datasets.fashion_mnist        # Clothing images
#mnist = tf.keras.datasets.mnist               # Digit images

Calling ``load_data()`` on this object gives you two sets of two arrays: the training values and the testing values, each made up of the images of the clothing items and their labels.

* The input values are in arrays whose names end in ``_images``.
* The corresponding correct output values are in arrays whose names end in ``_labels``.


In [None]:
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
print("Number of training cases=", len(training_labels), "  Number of test cases=", len(test_labels))

What do these values look like? Let's print a training image and a training label to see.
Experiment with different indices in the array.

The 10 different classes of apparel are described here: https://github.com/zalandoresearch/fashion-mnist#labels.

We will illustrate with sneakers or digit 7, which have label=7 in the training data.

In [None]:
# This code will find the index numbers of the first 5 examples of any label
target_label = 7   # Digit 7 (mnist digits) or Sneaker (fashion)
cur_index = -1
training_labels_list = training_labels.tolist()
locations = []
for i in range(5):
  cur_index = training_labels_list.index(target_label, cur_index+1)
  locations.append(cur_index)
print("First 5 locations of label number", target_label, ":", locations)

In [None]:
import numpy as np
np.set_printoptions(linewidth=120)

# Pick an index number from the list of locations above
indexNumber = 6   # In fashion, 6 is a sneaker

print('Label=',training_labels[indexNumber])
print(training_images[indexNumber])
print(training_labels[:30])



We can use matplotlib to visualize the 28x28 image as gray-scale.


In [None]:
import matplotlib.pyplot as plt

plt.imshow(255-training_images[indexNumber], cmap='gray')

Now try another index number for comparison between the above code cell and the below code cell.
For example:
* Fashion: try index numbers 6 vs. 14, or 0 vs. 42.
* Digits: try index numbers 2 vs 9

In [None]:
indexNumber2 = 14 # In fashion 14 is another sneaker
plt.imshow(255-training_images[indexNumber2], cmap='gray')

You'll notice that all of the values in the image are between 0 and 255. When training a neural network, for various reasons it's easier if we treat all values as between 0.0 and 1.0, a process called '**normalizing**'. In Python with NumPy it's easy to normalize an array by 'broadcasting': dividing an array by a single number applies the division to every element of the array.

Remember that we loaded the data in two parts: training and testing.

In [None]:
# Normalize numbers to the range [0.0, 1.0]
training_images  = training_images / 255.0
test_images = test_images / 255.0
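
To see broadcasting on something small enough to read, here is a minimal sketch using a made-up 2x3 array instead of a real 28x28 image. The array name `pixels` and its values are invented for illustration only.

```python
import numpy as np

# A tiny 2x3 "image" of 8-bit pixel values (made up for illustration)
pixels = np.array([[0, 51, 102],
                   [153, 204, 255]], dtype=np.uint8)

# Dividing by a single float is broadcast over every element of the array
normalized = pixels / 255.0

print(normalized.dtype)                     # float64
print(normalized.min(), normalized.max())   # smallest and largest value after scaling
print(normalized)
```

The integer array becomes a floating-point array of the same shape, which is what happens to `training_images` and `test_images` above.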



---




Now we build the neural network model. This one will have three layers.




In [None]:
model = tf.keras.models.Sequential([tf.keras.Input([28,28]),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    # tf.keras.layers.Dropout(rate=0.2),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

**Sequential**: the neural network is defined as a SEQUENCE of layers, each one feeding the next.

**Flatten**: Input layer. Remember earlier, when you printed the images out, that each one was a square? Flatten just takes that square and turns it into a 1-dimensional vector. The input is a 28 x 28 tensor, so the flattened vector has 784 values.

**Dense**: Middle layer. A layer of 128 neurons.
* A *dense* layer means "densely connected" from the preceding layer. Every middle-layer neuron has a connection from every input-layer neuron.

* **Relu** (*rectified linear*) activation function effectively means "If X>0 return X, else return 0" -- so it only passes values 0 or greater to the next layer in the network. You saw relu in your EasyEquation neural network homework.
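
The relu rule can be sketched in a couple of lines of plain Python. This is only an illustration of the rule quoted above, not how TensorFlow implements it; the function name `relu` and the sample values are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, replace the rest with 0."""
    return x if x > 0 else 0.0

# Made-up weighted sums arriving at five neurons
weighted_sums = [-2.5, -0.1, 0.0, 0.7, 3.0]
activations = [relu(v) for v in weighted_sums]
print(activations)  # [0.0, 0.0, 0.0, 0.7, 3.0]
```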

**Dense**: Output layer. A layer of 10 neurons, densely connected from the middle layer.

* **Softmax** activation effectively massages the 10 output values into the *appearance* of probabilities with the most likely answer boosted. The sum of all the values will be 1. So, for example\
``[0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1]`` is turned into\
 ``[0,   0,   0,    0,   1,   0,   0,    0,    0,    0]``  
(not exactly 1.0, a little bit less, and not exactly 0.0, a little bit more).

Note also that if there are two similar large values, meaning that your network thinks that both of those have similar high likelihood, softmax will put both of them close to 0.5.
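
Both behaviours can be checked with a minimal plain-Python sketch of softmax. The function below is an illustrative assumption, not TensorFlow's implementation, and the two input lists are made up: one has a single dominant value, the other has two equally large values.

```python
import math

def softmax(values):
    """Exponentiate each value, then divide by the total so the results sum to 1."""
    largest = max(values)                            # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# One clearly dominant value (index 4)
one_winner = softmax([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1])
# Two equally large values (indices 2 and 4)
two_winners = softmax([0.1, 0.1, 9.5, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1])

print([round(p, 3) for p in one_winner])
print([round(p, 3) for p in two_winners])
```

In the first list the dominant entry ends up just under 1.0; in the second, the two large entries share the total and each ends up just under 0.5.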

Softmax is valuable for classification tasks, where the output of the neural network will tell us the **class** (category) of the data. In this lab we have 10 classes, numbered 0 to 9.  The softmax outputs are like this:
* Label=0, output=``[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]``
* Label=1, output=``[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]``\
 ...
* Label=9, output=``[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]``

Note: the numbers won't be exactly 0.0 and 1.0, but close.

Note also that softmax is a transfer function: it is the function within the neuron which transforms the weighted sum of inputs into an output value. It differs from the other transfer functions in that the 10 output neurons are effectively coordinating with each other, deciding which is the biggest and adjusting their outputs so the sum is 1.0.

**One-hot encoding** is the name we give to the above scheme for encoding the category labels. If you have N categories, you have N outputs -- where only 1 output is 1 and the others are 0.
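
As a small illustration, one-hot encoding a label can be sketched in plain Python as follows. The helper name `one_hot` is hypothetical and not part of the lab code.

```python
def one_hot(label, num_classes=10):
    """Return a list of num_classes zeros with a single 1 at position `label`."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded

print(one_hot(0))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(7))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```

The labels loaded from the datasets are plain integers (0 to 9). The ``sparse_categorical_crossentropy`` loss used in this lab accepts those integers directly, so you do not need to convert the labels yourself.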



In [None]:
model.summary()

Summary:
* Input layer has 784 neurons; it does no computation.
 *  It flattens the 28x28 2-D input tensor to a 784-element 1-D vector, with no adjustable parameters.
* Middle layer has 128 neurons.
 * Each neuron receives 784 inputs from input layer, so each neuron has 784 weights.
 * Each neuron also has a bias input
 * So 785 adjustable parameters for each of 128 neurons = 100,480 adjustable parameters.
* Output layer has 10 neurons.
 * Each receives 128 inputs (with weights) from middle layer
 * Plus a bias input
 * Total is 129 parameters for each of 10 neurons = 1,290 adjustable parameters
* Grand total is 100,480 + 1,290 = 101,770
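
The arithmetic above can be double-checked with a few lines of plain Python. The helper name `dense_params` is an assumption made for this sketch; it simply counts one weight per input plus one bias for every neuron.

```python
def dense_params(num_inputs, num_neurons):
    """Each neuron has one weight per input plus one bias."""
    return (num_inputs + 1) * num_neurons

flattened_inputs = 28 * 28                        # 784 values after Flatten
middle = dense_params(flattened_inputs, 128)      # middle layer
output = dense_params(128, 10)                    # output layer
print(middle, output, middle + output)            # 100480 1290 101770
```

These figures should match the "Param #" column printed by ``model.summary()``.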

The next thing to do, now the model is defined, is to actually build it. You do this by compiling it with an optimizer and loss function as before.
Then you train it by calling the ``fit()`` method using the training data (the images and the labels).

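The *categorical cross-entropy* loss function is useful for classification tasks, when the neural network output is categories. The code below uses the ``sparse_categorical_crossentropy`` variant, which accepts the integer labels 0 to 9 directly.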
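
To make the loss concrete, here is a minimal plain-Python sketch of cross-entropy for a single example with an integer label, assuming the network outputs a list of 10 probabilities. The function name and the probability lists are invented for illustration; real implementations typically also average over a batch and guard against taking the log of zero, which this sketch does not.

```python
import math

def cross_entropy_one_example(true_label, predicted_probs):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(predicted_probs[true_label])

# Made-up network outputs for an item whose true label is 7
confident_right = [0.01] * 10
confident_right[7] = 0.91           # 0.91 + 9 * 0.01 = 1.0
unsure = [0.1] * 10                 # every class considered equally likely

low_loss = cross_entropy_one_example(7, confident_right)
high_loss = cross_entropy_one_example(7, unsure)
print(low_loss, high_loss)
```

The more probability the network places on the correct class, the closer the loss gets to 0.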

The *Adam* optimizer is a variant of stochastic gradient descent, which we used
in the first lab.


In [None]:
# First compile the model with our chosen optimizer algorithm, loss function, and metrics

model.compile(optimizer = tf.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# Now train
model.fit(training_images, training_labels, epochs=10)

Notice that this time we are reporting both the loss value and the accuracy. The accuracy is the fraction of cases that are correctly classified. Look at the accuracy value at the end of the final epoch. If it was approximately 0.89, that would tell you that your neural network is about 89% accurate in classifying the training data.
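
As a quick illustration of what "fraction of cases that are correctly classified" means, here is a plain-Python sketch. The function name and both label lists are hypothetical, made up for the example; they are not real model output.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of cases where the predicted class equals the true class."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Hypothetical labels: 9 of the 10 predictions match
true_labels      = [9, 2, 1, 1, 6, 1, 4, 6, 5, 7]
predicted_labels = [9, 2, 1, 1, 6, 1, 4, 4, 5, 7]
print(accuracy(true_labels, predicted_labels))  # 0.9
```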

But how would it work with unseen data? That's why we have the test data. We can call the ``evaluate()`` method, using the test images and the correct labels. It will report back the loss and accuracy.

In [None]:
model.evaluate(test_images, test_labels)

For me, that returned an accuracy of about 86% on the fashion data. As expected, it did not do as well with *unseen* test data as it did with the data it was trained on.

To explore further, try the below exercises:


# Exploration Exercises

## Exercise 1:
For this first exercise, run the code below. It classifies each of the test images. Pick two different index numbers to see.
The output, after you run it, is a list of numbers. Why do you think this is?

Hint: consider the output layer of the neural network.

In [None]:
classifications = model.predict(test_images)

# Two example test cases
testIndexNumber1=0
testIndexNumber2=1

print('test1 output=', classifications[testIndexNumber1])
print('test2 output=', classifications[testIndexNumber2])

Now try running ``print(test_labels[testIndexNumber1], test_labels[testIndexNumber2])``.
Does that help you understand the output list?

In [None]:

# Correct labels for the two example test cases
print(test_labels[testIndexNumber1])
print(test_labels[testIndexNumber2])

In [None]:
plt.imshow(1.0-test_images[testIndexNumber2], cmap='gray')


### What does an output list represent?


1.   It's 10 random meaningless values
2.   It's the first 10 classifications that the computer made
3.   It's the probability that this item is each of the 10 classes


#### Answer:
The correct answer is (3).

The output of the model is a list of 10 numbers. Each number is the probability that the item being classified belongs to the corresponding class.

For the fashion data set, the labels are here: https://github.com/zalandoresearch/fashion-mnist#labels, i.e. the first value in the list is for class 0 (T-shirt/top), the next is a 1 (Trouser) etc.

For the digits data set, the labels are the same as the digit. Class 0 means digit "0" was written, etc.
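
To turn an output list into a single predicted class, pick the position of the largest value. A minimal plain-Python sketch follows; the class names are the ones listed at the Fashion MNIST link above, while the helper name and the example output list are made up for illustration (not real model output).

```python
FASHION_CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                       "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def predicted_class(output_list):
    """Index of the largest value in the model's 10-number output."""
    return max(range(len(output_list)), key=lambda i: output_list[i])

# Hypothetical output list for one image
example_output = [0.01, 0.0, 0.02, 0.0, 0.01, 0.05, 0.01, 0.85, 0.01, 0.04]
best = predicted_class(example_output)
print(best, FASHION_CLASS_NAMES[best])  # 7 Sneaker
```

With the NumPy arrays returned by ``model.predict()``, ``np.argmax(classifications[0])`` gives the same index.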


## Exercise 2:
Let's now look at the layers in your model. Experiment with different numbers of neurons in the middle layer, for example 1024 or 16 neurons. What different results do you get for loss, training time etc? Why do you think that's the case?


In [None]:
import tensorflow as tf
print(tf.__version__)

mnist = tf.keras.datasets.mnist
#mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

modelB = tf.keras.models.Sequential([tf.keras.Input([28,28]),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(1024, activation=tf.nn.relu),  # try 1024 or 16 here
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

modelB.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

modelB.fit(training_images, training_labels, epochs=5)

print("\nEvaluation on test data:")
modelB.evaluate(test_images, test_labels)

classifications = modelB.predict(test_images)



### Question 1. Increase to 1024 Neurons -- What's the impact?

1. Training takes longer, but is a lot more accurate
2. Training takes longer, but only a little more accurate
3. Training takes the same time, but is more accurate


#### Answer
The correct answer: adding more neurons means we have to do more calculations, which slows down the process, but in this case they have a good impact -- we do get more accurate. That doesn't mean it's always a case of 'more is better'; you can hit the law of diminishing returns very quickly!

## Exercise 3:

What would happen if you remove the Flatten() layer? Why do you think that's the case?

You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to 'flatten' that 28x28 array into a 784x1 vector. Instead of writing all the code to handle that ourselves, we add the Flatten() layer at the beginning, and when the arrays are loaded into the model later, they'll automatically be flattened for us.
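
What flattening does to the shape can be sketched in plain Python with nested lists. This is only an illustration of the idea, not how the Flatten() layer is implemented; the helper name `flatten` and the blank image are assumptions made for the example.

```python
def flatten(image_rows):
    """Concatenate the rows of a 2-D list into one 1-D list."""
    return [pixel for row in image_rows for pixel in row]

image = [[0.0] * 28 for _ in range(28)]    # a blank 28x28 "image"
flat = flatten(image)
print(len(image), len(image[0]), "->", len(flat))  # 28 28 -> 784
```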

In [None]:
import tensorflow as tf
print(tf.__version__)

#mnist = tf.keras.datasets.mnist
mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

# The Flatten layer is deliberately commented out, so fit() is expected to fail with a shape error
modelC = tf.keras.models.Sequential([#tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(64, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

modelC.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

modelC.fit(training_images, training_labels, epochs=5)

modelC.evaluate(test_images, test_labels)

classifications = modelC.predict(test_images)

print(classifications[0])
print(test_labels[0])

## Exercise 4:

Consider the final (output) layer. Why are there 10 neurons in it? What would happen if you had a different number than 10? For example, try training the network with 5.

You get an error as soon as it finds an unexpected value. Another rule of thumb -- the number of neurons in the last layer should match the number of classes you are classifying for. In this case it's the digits 0-9, so there are 10 of them, hence you should have 10 neurons in your final layer.

In [None]:
import tensorflow as tf
print(tf.__version__)

mnist = tf.keras.datasets.mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

modelC = tf.keras.models.Sequential([tf.keras.Input([28,28]),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(64, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(5, activation=tf.nn.softmax)])

modelC.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy')

modelC.fit(training_images, training_labels, epochs=5)

print("\nEvaluating")
modelC.evaluate(test_images, test_labels)

classifications = modelC.predict(test_images)

print(classifications[0])
print(test_labels[0])

## Exercise 5:

Consider the effects of additional layers in the network. What will happen if you add another dense layer in the middle, just before the final layer with 10 neurons?

Ans: There isn't a significant impact. For far more complex data extra layers are often necessary.

In [None]:
import tensorflow as tf
print(tf.__version__)

#mnist = tf.keras.datasets.mnist
mnist = tf.keras.datasets.fashion_mnist


(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

modelD = tf.keras.models.Sequential([tf.keras.Input([28,28]),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(20, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

modelD.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy', metrics=['accuracy'])

modelD.fit(training_images, training_labels, epochs=5)

print("\nEvaluating")
modelD.evaluate(test_images, test_labels)

classifications = modelD.predict(test_images)

print(classifications[0])
print(test_labels[0])

## Exercise 6:

Consider the impact of training for more or fewer epochs.

* Try 15 epochs -- you'll probably get a model with a much better loss than the one with 5 epochs.
* Try 30 epochs -- you might see the loss value stop decreasing, and sometimes increase.

There is also the problem of **overfitting**, where the trained model has "learned" the specific training data and doesn't work as well on data it hasn't seen yet.
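
One common way to spot overfitting is to track the loss on data the model was not trained on and note where it stops improving. The sketch below is plain Python with hypothetical per-epoch loss values, made up for illustration (they are not measured results from this lab); the helper name `best_epoch` is also an assumption.

```python
def best_epoch(held_out_losses):
    """Return the 1-based epoch with the lowest loss on held-out data."""
    lowest = min(held_out_losses)
    return held_out_losses.index(lowest) + 1

# Hypothetical losses: training loss keeps falling, held-out loss turns back up
training_losses = [0.50, 0.38, 0.33, 0.30, 0.28, 0.26, 0.24, 0.23]
held_out_losses = [0.44, 0.39, 0.36, 0.35, 0.34, 0.35, 0.36, 0.38]

print(best_epoch(held_out_losses))  # 5
```

In this made-up run, training past epoch 5 keeps lowering the training loss while the held-out loss gets worse, which is the pattern to watch for.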

In [None]:
import tensorflow as tf
print(tf.__version__)

#mnist = tf.keras.datasets.mnist
mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels) ,  (test_images, test_labels) = mnist.load_data()

training_images = training_images/255.0
test_images = test_images/255.0

modelE = tf.keras.models.Sequential([tf.keras.Input([28,28]),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

modelE.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy', metrics=['accuracy'])

modelE.fit(training_images, training_labels, epochs=15)

print("\nEvaluating")
modelE.evaluate(test_images, test_labels)

classifications = modelE.predict(test_images)

print(classifications[34])
print(test_labels[34])

## Exercise 7:

Before you trained, you normalized the data, going from values that were 0-255 to values that were 0-1. What would be the impact of removing that? Here's the complete code to give it a try. Why do you think you get different results?

In [None]:
import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
#mnist = tf.keras.datasets.mnist

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
#training_images=training_images/255.0
#test_images=test_images/255.0
modelF = tf.keras.models.Sequential([
  tf.keras.Input([28,28]),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
modelF.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
modelF.fit(training_images, training_labels, epochs=5)
print("\nEvaluating")
modelF.evaluate(test_images, test_labels)
classifications = modelF.predict(test_images)

print(classifications[0])
print(test_labels[0])

## Exercise 8:

Earlier, when you trained for extra epochs, you saw that the loss might stop improving. It might have taken a while to wait for the training to get there, and you might have thought 'wouldn't it be nice if I could stop the training when I reach a desired value?' For example, 95% accuracy might be enough for you, and if you reach that after 3 epochs, why sit around waiting for it to finish a lot more epochs? So how would you fix that? As in many other programs, you have callbacks! Let's see them in action...

In [None]:
import tensorflow as tf
print(tf.__version__)

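class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    # logs holds the metrics for the epoch that just finished; it may be None
    accuracy = (logs or {}).get('accuracy')
    if accuracy is not None and accuracy > 0.85:
      print("\n\nReached 85% accuracy so cancelling training!\n")
      self.model.stop_training = True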

callbacks = myCallback()
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
modelG = tf.keras.models.Sequential([
  tf.keras.Input([28,28]),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
modelG.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
modelG.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])
