# Introduction to tinyML

> A summary of "Fundamentals of TinyML" from Harvard University.

```python
import numpy as np
import tensorflow as tf
```

## Multi-layer neural network

First, let's re-train our original single-layer network and see what it predicts for $X = 10.0$ and what weights it learns.

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

my_layer = Dense(units=1, input_shape=[1])
model = Sequential([
    my_layer
])
model.compile(optimizer='sgd', loss='mean_squared_error')

xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500, verbose=False)
```


```python
model.predict([10.0])
```

```
array([[18.98457]], dtype=float32)
```

```python
my_layer.get_weights()
```

```
[array([[1.9977634]], dtype=float32), array([-0.99306583], dtype=float32)]
```
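
The training data follows $y = 2x - 1$ exactly, so the ideal single-neuron weights are $w = 2$ and $b = -1$, which give $2 \cdot 10 - 1 = 19$ for $X = 10.0$. As a quick check outside TensorFlow, here is a sketch that solves the same problem with an ordinary least-squares fit using NumPy's `np.polyfit`. The learned values above (about 1.998 and -0.993) are close to this solution.

```python
import numpy as np

# Same training data as the Keras example above
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

# Degree-1 least-squares fit: returns [slope, intercept]
w, b = np.polyfit(xs, ys, deg=1)

prediction = w * 10.0 + b
print(f"w = {w:.4f}, b = {b:.4f}, prediction for 10.0 = {prediction:.4f}")
```

The network reaches the same line by gradient descent instead of solving for it directly. That is why its weights are only approximately 2 and -1 after 500 epochs.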

Next, let's train a two-layer network and see what its prediction and weights are.

```python
my_layer_1 = Dense(units=2, input_shape=[1])
my_layer_2 = Dense(units=1)
model = Sequential([
    my_layer_1,
    my_layer_2
])
model.compile(optimizer='sgd', loss='mean_squared_error')

model.fit(xs, ys, epochs=500, verbose=False)
```


```python
model.predict([10.0])
```

```
array([[18.999998]], dtype=float32)
```

```python
my_layer_1.get_weights()
```

```
[array([[1.2703493 , 0.06604026]], dtype=float32),
 array([-0.41525835, -0.15489535], dtype=float32)]
```

```python
my_layer_2.get_weights()
```

```
[array([[1.5339977],
        [0.7766009]], dtype=float32),
 array([-0.24270207], dtype=float32)]
```

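Finally, we can compute the output of our two-layer network by hand to better understand how it works. Neither Dense layer specifies an activation function, so each neuron simply multiplies its inputs by its weights and adds its bias.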

```python
value_to_predict = 10.0

# Layer 1: get_weights() returns [kernel of shape (1, 2), bias of shape (2,)]
layer1_w1 = my_layer_1.get_weights()[0][0][0]
layer1_w2 = my_layer_1.get_weights()[0][0][1]
layer1_b1 = my_layer_1.get_weights()[1][0]
layer1_b2 = my_layer_1.get_weights()[1][1]

# Layer 2: kernel of shape (2, 1), bias of shape (1,)
layer2_w1 = my_layer_2.get_weights()[0][0]
layer2_w2 = my_layer_2.get_weights()[0][1]
layer2_b = my_layer_2.get_weights()[1][0]

# No activation functions: each neuron computes weight * input + bias
neuron1_output = (layer1_w1 * value_to_predict) + layer1_b1
neuron2_output = (layer1_w2 * value_to_predict) + layer1_b2

neuron3_output = (layer2_w1 * neuron1_output) + (layer2_w2 * neuron2_output) + layer2_b
neuron3_output
```

```
array([18.999998], dtype=float32)
```
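
Because neither layer has an activation, the two layers compose into a single linear function. Writing layer 1's weights and biases as $w_i, b_i$ and layer 2's as $v_i, c$, we get $y = v_1 (w_1 x + b_1) + v_2 (w_2 x + b_2) + c = (v_1 w_1 + v_2 w_2)\,x + (v_1 b_1 + v_2 b_2 + c)$. The sketch below copies the printed weights by hand, so it runs without TensorFlow, and collapses them into one effective weight and bias. Both come out at about 2 and -1, which is the same line the single-layer network learned.

```python
import numpy as np

# Weights printed by my_layer_1.get_weights() and my_layer_2.get_weights() above
W1 = np.array([[1.2703493, 0.06604026]])   # shape (1, 2): input -> 2 hidden neurons
b1 = np.array([-0.41525835, -0.15489535])  # shape (2,)
W2 = np.array([[1.5339977], [0.7766009]])  # shape (2, 1): 2 hidden neurons -> output
b2 = np.array([-0.24270207])               # shape (1,)

# Two-step forward pass, like the manual computation above
x = np.array([[10.0]])
hidden = x @ W1 + b1
two_layer_output = hidden @ W2 + b2

# With no activations, the layers collapse into y = w_eff * x + b_eff
w_eff = (W1 @ W2).item()
b_eff = (b1 @ W2 + b2).item()
collapsed_output = w_eff * 10.0 + b_eff

print(f"w_eff = {w_eff:.5f}, b_eff = {b_eff:.5f}")
print(two_layer_output.item(), collapsed_output)
```

This is why stacking Dense layers without nonlinear activations (such as ReLU) adds no expressive power.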

## Exploring Categorical Classification

### Start with a simple neural network for MNIST
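Note that there are two Dense layers, one with 20 neurons and one with 10. The Flatten layer in front of them has no weights; it only reshapes each 28×28 image into a vector of 784 values.

The 10-neuron layer is our final layer because there are 10 classes (the digits 0 to 9) to classify.

Train it, and you should see it reach about 98% accuracy on the training data.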

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Flatten

(X_train, y_train), (X_val, y_val) = mnist.load_data()

X_train = X_train / 255.0
X_val = X_val / 255.0

model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(20, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
```


### Examine the test data

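Using ```model.evaluate```, you can get metrics for a test set. In this case we only have a training set and a validation set (the second split returned by ```mnist.load_data()```, loaded as ```X_val``` and ```y_val```), so we can try it out on the validation set. Passing it as ```validation_data``` to ```model.fit``` only reports metrics; the model never trains on it. The accuracy will be slightly lower, maybe around 96%, because the model has not trained on these images and does not generalize perfectly. Still, it's a pretty good score.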

You can also run predictions on images and compare them against their actual labels. The first image in the set (index 0) is a 7, and the output for neuron 7 is about 9.98e-1 (over 99%), so the model got it right!

```python
model.evaluate(X_val, y_val)
classifications = model.predict(X_val)
```
```python
classifications[0]
```

```
array([1.3297609e-06, 1.2231325e-12, 1.2750495e-06, 1.5201789e-03,
       1.1146433e-13, 7.4768068e-06, 5.8630570e-14, 9.9841964e-01,
       1.6355241e-05, 3.3744462e-05], dtype=float32)
```

```python
y_val[0]
```

```
7
```
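
The 10 outputs sum to 1 because the last layer uses ```tf.nn.softmax```, which turns raw scores (logits) into probabilities: $\text{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. Here is a minimal NumPy sketch with made-up logits. It subtracts the maximum first for numerical stability, which does not change the result.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp; result unchanged
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical logits for the 10 digit classes; class 7 has the highest score
logits = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 0.0, -3.0, 9.0, 1.5, 2.0])
probs = softmax(logits)
print(probs.round(4), probs.sum(), np.argmax(probs))
```

Softmax preserves the ordering of the scores, so the neuron with the largest logit is also the one with the highest probability.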

## 2-2 Coding Assignment

### Exploring DNN learning with TensorFlow

In this assignment, we'll dive a little deeper with a series of hands-on exercises to better understand DNN learning with TensorFlow. Remember that if you are taking the class for a certificate, we will be asking you questions about the assignment in the test!

We start by setting up the problem for you.

```python
from tensorflow.keras.datasets import fashion_mnist

# Load in Fashion MNIST
(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Define the base model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(20, activation=tf.nn.relu),
    Dense(10, activation=tf.nn.softmax)
])
```

Neural networks learn best when the data is scaled (normalized) to fall within a consistent range. One range practitioners often use is [0, 1]. How might you do this for the training and test images used here?

*A hint: these images are grayscale, and like each channel of the standard [RGB](https://www.rapidtables.com/web/color/RGB_Color.html) format, each pixel is stored as an 8-bit value from 0 to 255*

```python
training_images = training_images / 255.
test_images = test_images / 255.
```
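
Fashion MNIST pixels are 8-bit unsigned integers (`uint8`) from 0 to 255. Dividing by 255 maps them into [0, 1] and also converts them to floating point. The sketch below uses a tiny made-up array instead of real images to show the dtype and range before and after scaling.

```python
import numpy as np

# A tiny stand-in "image" with the same dtype as Fashion MNIST (values 0-255)
fake_image = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by the float 255. promotes to float64 and rescales to [0, 1]
scaled = fake_image / 255.

print(fake_image.dtype, fake_image.min(), fake_image.max())
print(scaled.dtype, scaled.min(), scaled.max())
```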

Using these normalized images, let's compile our model with an adaptive optimizer (Adam) to learn faster and a categorical loss function to distinguish between the various classes we are trying to classify. Since this is a fairly simple dataset, we will only train for 5 epochs.
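
Adam is "adaptive" because it keeps running averages of each parameter's gradient and squared gradient, and it scales every update by them. Below is a minimal NumPy sketch of the standard Adam update rule on a hypothetical one-parameter loss $(w - 3)^2$. It uses the default betas and epsilon documented for ```tf.keras.optimizers.Adam``` (0.9, 0.999, 1e-7). The learning rate, however, is an assumed 0.05 rather than the default 0.001, so the toy problem converges quickly. Keras's own implementation may differ in small numerical details.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.05, beta_1=0.9, beta_2=0.999, epsilon=1e-7, steps=1000):
    """Minimize a 1-D function with the standard Adam update, given its gradient."""
    w = w0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        # Bias correction: both averages start at 0, so early estimates are scaled up
        m_hat = m / (1 - beta_1 ** t)
        v_hat = v / (1 - beta_2 ** t)
        # Dividing by sqrt(v_hat) normalizes the step, so its size depends little on gradient scale
        w -= lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

# Hypothetical toy loss (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```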

```python
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# fit the model to the training data
model.fit(training_images, training_labels, epochs=5)

# test the model on the test data
model.evaluate(test_images, test_labels)
```

```
[0.4527340829372406, 0.8378000259399414]
```
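
The first number returned by ```model.evaluate``` is the loss. With integer labels, sparse categorical cross-entropy averages $-\log p_{\text{label}}$ over the samples, where $p_{\text{label}}$ is the probability the model assigned to the correct class. Here is a NumPy sketch with made-up probabilities (not outputs from the model above):

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class)."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

# Hypothetical softmax outputs for 2 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer class indices, as in Fashion MNIST

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)  # equals -(log 0.7 + log 0.8) / 2
```

The loss only looks at the probability of the correct class. Pushing that probability toward 1 drives the loss toward 0.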

Once it's done training, you should see an accuracy value at the end of the final epoch. It might look something like 0.8658, which tells you that your neural network is about 86% accurate in classifying the training data; i.e., it found a pattern match between the images and the labels that worked 86% of the time. But how would it work with unseen data? That's why we have the test images. We can call ```model.evaluate```, pass in the test images and test labels, and it will report back the loss and the accuracy on that test set. The accuracy should be about 0.85 or thereabouts (the run above reached 0.8378). Not bad!
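
The accuracy metric is the fraction of images whose most probable class (the index of the largest output) matches the label. Here is a small sketch with hypothetical predictions and labels:

```python
import numpy as np

# Hypothetical softmax outputs for 4 images over 3 classes
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 1, 0])

predicted = np.argmax(probs, axis=1)     # most probable class per image
accuracy = np.mean(predicted == labels)  # fraction of correct predictions
print(predicted, accuracy)
```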

But what did it actually learn? If we run inference on the model using ```model.predict```, we get the following list of values. **What does it represent?**

*A hint: try running ```print(test_labels[0])```*

```python
classifications = model.predict(test_images)
classifications[0]
```

```
array([7.5199951e-06, 6.9121331e-09, 3.4158870e-06, 6.7152919e-06,
       2.1683377e-06, 1.0825696e-02, 1.1499319e-04, 5.2464876e-02,
       4.5593292e-03, 9.3201536e-01], dtype=float32)
```
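
Each of the 10 values is the probability the model assigns to one clothing class, and the largest one is its prediction. The class names below are the standard Fashion MNIST label names; they do not appear in the code above. The sketch applies ```np.argmax``` to the printed output to pick the predicted class.

```python
import numpy as np

# Standard Fashion MNIST class names, indexed by label 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# classifications[0] as printed above
probs = np.array([7.5199951e-06, 6.9121331e-09, 3.4158870e-06, 6.7152919e-06,
                  2.1683377e-06, 1.0825696e-02, 1.1499319e-04, 5.2464876e-02,
                  4.5593292e-03, 9.3201536e-01])

predicted = int(np.argmax(probs))
print(predicted, class_names[predicted], f"{probs[predicted]:.1%}")
```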

Let's now look at the layers in your model. What happens if you increase the number of neurons in the hidden Dense layer? (The cell below goes from 20 to 128 neurons, more than double.) What different results do you get for loss, training time, etc.? Why do you think that's the case?

```python
NUMBER_OF_NEURONS = 128

# define the new model
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(NUMBER_OF_NEURONS, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

# compile, fit and evaluate the model again
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)
```

```
[0.3516485095024109, 0.8744999766349792]
```

Now consider the effect of adding layers to the network instead of simply adding more neurons to the same layer. First, update the model to add an additional Dense layer between the two existing Dense layers. (The cell below also widens the first hidden layer to 512 neurons.)

```python
YOUR_NEW_LAYER = Dense(256, activation=tf.nn.relu)

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    YOUR_NEW_LAYER,
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
```

Let's then compile, fit, and evaluate our model. What happens to the error? How does this compare to the original model and to the model with more neurons in a single hidden layer?

```python
# compile, fit and evaluate the model again
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)
```

```
[0.34299716353416443, 0.8779000043869019]
```
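
One way to reason about the size and training-time differences between the three Fashion MNIST models is to count their trainable parameters. A Dense layer with $n_{in}$ inputs and $n_{out}$ neurons has $n_{in} \cdot n_{out}$ weights plus $n_{out}$ biases, and the Flatten layer has none. This matters for tinyML, where the model has to fit in a microcontroller's memory. Here is a plain-Python sketch; for a real Keras model, ```model.summary()``` reports the same counts.

```python
def dense_params(n_in, n_out):
    """Weights plus biases in a fully connected (Dense) layer."""
    return n_in * n_out + n_out

def count_params(input_size, layer_sizes):
    """Total parameters for a stack of Dense layers after Flatten."""
    total = 0
    n_in = input_size
    for n_out in layer_sizes:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

INPUT_SIZE = 28 * 28  # Flatten turns each 28x28 image into 784 values

models = {
    "base (20 -> 10)": [20, 10],
    "wider (128 -> 10)": [128, 10],
    "deeper (512 -> 256 -> 10)": [512, 256, 10],
}
for name, sizes in models.items():
    print(f"{name}: {count_params(INPUT_SIZE, sizes):,} parameters")
```

Going from 20 to 128 hidden neurons multiplies the parameter count by roughly six, and the 512/256 model is more than 30 times larger than the base model. This is why each epoch takes longer even though the accuracy gains are modest.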

Before training, you normalized the data. What would be the impact of skipping that step? To see for yourself, fill in the following lines of code to get a non-normalized set of data, and then re-fit and evaluate the model using this data.

```python
# get non-normalized Fashion MNIST data by undoing the earlier scaling
training_images_non = training_images * 255
test_images_non = test_images * 255

# re-compile, re-fit and re-evaluate
# (a fresh Dense(256) layer is created here: reusing the YOUR_NEW_LAYER object
#  would share its already-trained weights with the previous model)
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(256, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images_non, training_labels, epochs=5)
model.evaluate(test_images_non, test_labels)
classifications = model.predict(test_images_non)
```

Sometimes, if you train for too many epochs, you may find that training stops improving and wish you could quit early. Good news: you can! TensorFlow supports callbacks, which are subclasses of ```tf.keras.callbacks.Callback``` whose methods (such as ```on_epoch_end```) run during training and can check the results after each epoch. Modify this callback to make sure it stops training early, but not before reaching at least the second epoch!

*A hint: ```logs.get(METRIC_NAME)``` will return the value of METRIC_NAME for the epoch that just finished*

```python
# define and instantiate your custom Callback
class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        accuracy = logs.get('accuracy')
        # epoch is 0-based, so epoch >= 1 means at least two epochs have finished
        if epoch >= 1 and accuracy is not None and accuracy > 0.86:
            self.model.stop_training = True

callbacks = myCallback()

# re-compile and re-fit (with a fresh 256-neuron layer rather than the
# already-trained YOUR_NEW_LAYER object)
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(256, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])
```

```
Epoch 1/5
Epoch 2/5
```
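
The stopping rule itself can be checked without TensorFlow. The sketch below replays a list of hypothetical per-epoch accuracies through the same logic (stop once accuracy exceeds the threshold, but never before the second epoch) and reports how many epochs would run. Note that ```simulate_early_stopping``` is an illustrative helper, not a Keras API.

```python
def simulate_early_stopping(epoch_accuracies, threshold=0.86, min_epochs=2):
    """Return how many epochs run before the stopping rule fires."""
    for epoch, accuracy in enumerate(epoch_accuracies):  # epoch is 0-based, like Keras
        if epoch + 1 >= min_epochs and accuracy > threshold:
            return epoch + 1  # training stops at the end of this epoch
    return len(epoch_accuracies)  # rule never fired: all epochs run

# Hypothetical accuracies: already above 0.86 after the first epoch
print(simulate_early_stopping([0.87, 0.89, 0.90, 0.91, 0.92]))  # stops after epoch 2
# Hypothetical accuracies that never cross the threshold
print(simulate_early_stopping([0.80, 0.82, 0.84, 0.85, 0.855]))
```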

