# Using TensorFlow

The tutorials I went through in my Coursera courses used an outdated version of TensorFlow, so I need to relearn it all from first principles. It turns out that *a lot* has changed. The documentation for TensorFlow is quite good, and [there are several guides available](https://www.tensorflow.org/guide/). I'll go through some basics from the guides and then try using `tf.keras` on the MNIST dataset.

First, we start with importing TensorFlow and other needed modules.

In [1]:
import tensorflow as tf
import timeit
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

Let's start with some simple variables and see how tensors work in TensorFlow.

In [2]:
# Create variables
a = tf.Variable(5)
print(a)
b = tf.Variable(2, name='b')
print(b)
print(b.numpy())

<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=5>
<tf.Variable 'b:0' shape=() dtype=int32, numpy=2>
2


So a `Variable` is a stored TensorFlow quantity that can be changed, as opposed to a `tf.constant`, which cannot be updated. This is useful for things like our weights and biases, which are updated through backpropagation. Additionally, we can call `.numpy()` on a tensor or variable to convert it to a `numpy` array.

In [3]:
c = tf.constant([[1, 2],
                 [3, 4]])
print(c)
print()
print(c.numpy())

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)

[[1 2]
 [3 4]]
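
To make the difference between the two concrete, here is a minimal sketch. The names `v` and `k` are illustrative and not part of the cells above; `assign` and `assign_add` are the standard in-place update methods on `tf.Variable`.

```python
import tensorflow as tf

# A variable holds state that can be updated in place.
v = tf.Variable(5)
v.assign(7)        # overwrite the stored value
v.assign_add(1)    # increment the stored value
print(v.numpy())   # 8

# A constant is an immutable tensor: it exposes no assign method.
k = tf.constant(5)
print(hasattr(k, "assign"))  # False
```

This in-place update is what an optimizer relies on when it adjusts weights during training.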


We can also use broadcasting in TensorFlow. This will help a lot when combining weight terms and biases in a model, which tend to be of size $n \times m$ and $1 \times m$, respectively.

In [4]:
print(c+8)

tf.Tensor(
[[ 9 10]
 [11 12]], shape=(2, 2), dtype=int32)
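
The scalar case above extends to the $n \times m$ plus $1 \times m$ case mentioned earlier. A small sketch with made-up values (`x` and `bias` are illustrative names, with $n = 2$ and $m = 3$):

```python
import tensorflow as tf

# A (2, 3) tensor standing in for a layer's pre-bias output.
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# A (1, 3) bias row; broadcasting repeats it across both rows of x.
bias = tf.constant([[10.0, 20.0, 30.0]])

y = x + bias
print(y.shape)    # (2, 3)
print(y.numpy())  # rows: [11. 22. 33.] and [14. 25. 36.]
```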


Now let's have a look at functions. TensorFlow runs most efficiently when operations are compiled into [graphs](https://www.tensorflow.org/guide/intro_to_graphs). This means that rather than calling `tensor1 + tensor2` eagerly, it's better to wrap such operations in a function that TensorFlow turns into a graph. This allows for faster computations, especially when several operations are wrapped into one graph. According to the documentation,

> "Graphs are data structures that contain a set of tf.Operation objects, which represent units of computation; and `tf.Tensor` objects, which represent the units of data that flow between operations. They are defined in a `tf.Graph` context. Since these graphs are data structures, they can be saved, run, and restored all without the original Python code."

We can create functions with a function decorator, where a function definition is preceded by `@tf.function`, or we can create a TensorFlow function object using `tf.function(myFunction)`. Additionally, `tf.function` recursively traces any function it calls, so if there are nested functions, we need only specify the outermost function as a `tf.function` object.

In [5]:
# Create function with decorator
@tf.function
def add(a, b):
    return a + b

# Create function with tf.function
def mult(a, b):
    return a*b
multFunction = tf.function(mult)

# Create an inner and outer function, where only the outer function is a tf.function object
def inner_function(a,b):
    return (a+b)
@tf.function
def outer_function(a,b):
    return inner_function(a,b)**2

# Evaluate functions
print(add(a,b))
print(multFunction(a,b))
print(outer_function(a,b))

tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(49, shape=(), dtype=int32)
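
One way to see what `tf.function` is doing is to put a Python side effect in the function body: it runs only while TensorFlow traces the function into a graph, not on later calls that reuse that graph. A minimal sketch, where `square` and `trace_log` are illustrative names rather than anything from the cells above:

```python
import tensorflow as tf

trace_log = []  # filled by plain Python code, so only while tracing

@tf.function
def square(x):
    # This append is ordinary Python: it executes during tracing only.
    trace_log.append(x.dtype.name)
    return x * x

square(tf.constant(2))    # first int32 scalar: traces a new graph
square(tf.constant(3))    # same dtype and shape: reuses that graph
square(tf.constant(2.0))  # float32 scalar: traces a second graph

print(trace_log)
```

The list should gain one entry per new input signature (here an int32 scalar and a float32 scalar), which is also why a Python `print` inside a `tf.function` appears to fire only once.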


The performance gain won't really show up with simple functions that just add two numbers, but it can make a pretty big difference for substantially more complex operations. Let's measure the actual difference with a simple model whose `call` method is wrapped in `tf.function` and another that is not.

In [6]:
# Create a subclassed model to classify pictures
class SequentialModel(tf.keras.Model):
  def __init__(self, **kwargs):
    super(SequentialModel, self).__init__(**kwargs)
    self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
    # Add a lot of small layers
    num_layers = 100
    self.my_layers = [tf.keras.layers.Dense(64, activation="relu")
                      for n in range(num_layers)]
    self.dropout = tf.keras.layers.Dropout(0.2)
    self.dense_2 = tf.keras.layers.Dense(10)

  def call(self, x):
    x = self.flatten(x)
    for layer in self.my_layers:
      x = layer(x)
    x = self.dropout(x)
    x = self.dense_2(x)
    return x

# Set input data
input_data = tf.random.uniform([20, 28, 28])

# Build the model without the tf.function specification
eager_model = SequentialModel()

# Don't count the time for the initial build.
eager_model(input_data)
print("Eager time:", timeit.timeit(lambda: eager_model(input_data), number=100))

# Wrap the call method in a `tf.function`
graph_model = SequentialModel()
graph_model.call = tf.function(graph_model.call)

# Don't count the time for the initial build and trace.
graph_model(input_data)
print("Graph time:", timeit.timeit(lambda: graph_model(input_data), number=100))

Eager time: 1.7726695000000001
Graph time: 0.2920102999999994


So we can see a pretty notable speed-up here (roughly 6x, from about 1.77 s to about 0.29 s for 100 calls) in this simple model with 100 layers when using graph computation rather than the default eager computation.

## MNIST Classification

Now let's use `tf.keras` to create a model to classify images in the MNIST dataset. This is taken almost directly from a tutorial on the TensorFlow documentation pages, [found here](https://www.tensorflow.org/tutorials/quickstart/beginner).

Let's first load the dataset embedded within the TensorFlow package. 

In [7]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
x_train = x_train[..., tf.newaxis].astype("float32")
x_test = x_test[..., tf.newaxis].astype("float32")

Now we'll build our model by stacking layers.

In [8]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

The model will return a vector of "logits" or "log-odds" scores, one for each class.

In [9]:
predictions = model(x_train[:1]).numpy()
predictions

array([[-0.00139627, -0.13016906,  0.02925561, -0.15676871, -0.4639399 ,
         0.14254001,  0.23802276,  0.5029071 ,  0.4270174 ,  0.4075299 ]],
      dtype=float32)

We can then use the softmax function (`tf.nn.softmax`) to convert these logits to probabilities for each class.

In [10]:
tf.nn.softmax(predictions).numpy()

array([[0.08682629, 0.07633538, 0.08952887, 0.07433165, 0.05467276,
        0.10026789, 0.11031372, 0.1437697 , 0.13326278, 0.13069096]],
      dtype=float32)

The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and a true class index and returns a scalar loss for each example. This loss is equal to the negative log probability of the true class: it is zero if the model is sure of the correct class. This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10)` ~ 2.3.

In [11]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()

2.2999098
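
The arithmetic behind that number can be checked by hand. The sketch below uses plain NumPy rather than the Keras loss and copies the logits printed above. The true label of 5 is an assumption on my part: it is the class index that reproduces the reported loss, and it is not shown in the notebook output.

```python
import numpy as np

# Logits copied from the untrained model's output above.
logits = np.array([-0.00139627, -0.13016906, 0.02925561, -0.15676871, -0.4639399,
                   0.14254001, 0.23802276, 0.5029071, 0.4270174, 0.4075299])
true_label = 5  # assumption: the class index of the first training image

# Softmax, shifted by the max logit for numerical stability.
shifted = logits - logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()

# Sparse categorical cross-entropy: negative log probability of the true class.
loss = -np.log(probs[true_label])
print(loss)

# Baseline for a perfectly uniform guess over 10 classes.
print(-np.log(1 / 10))
```

The first value should land very close to the 2.2999 reported by `loss_fn`, and the second is the 2.3 baseline for uniform guessing.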

Now we compile the model with a specified loss function and optimizer.

In [12]:
model.compile(optimizer='adam',loss=loss_fn,metrics=['accuracy'])

Finally, running `model.fit` will train the model to minimize the loss.

In [13]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x17b01e4e0a0>

Now we can evaluate the accuracy of the model on the test dataset.

In [14]:
model.evaluate(x_test,  y_test, verbose=2)

313/313 - 0s - loss: 0.0745 - accuracy: 0.9767


[0.07445415109395981, 0.9767000079154968]

It looks like the model did well, reaching about 97.7\% accuracy on the test set!

## MNIST Classification with CNN

Now that we've created a model using a simple neural network, let's try using a CNN. This is taken almost directly from a tutorial on the TensorFlow documentation pages, [found here](https://www.tensorflow.org/tutorials/quickstart/advanced).

We'll again start by loading the MNIST data.

In [15]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
x_train = x_train[..., tf.newaxis].astype("float32")
x_test = x_test[..., tf.newaxis].astype("float32")

Now we'll use `tf.data` to batch and shuffle the dataset.

In [16]:
train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).shuffle(10000).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
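
To see what `shuffle` and `batch` do without loading MNIST, here is a toy sketch on ten made-up samples. The names `features`, `labels`, and `toy_ds` are illustrative only.

```python
import tensorflow as tf

# Ten toy samples; each label is twice its feature so pairs are easy to check.
features = tf.range(10)
labels = features * 2

# Slice into (feature, label) pairs, shuffle them, then group into batches of 4.
toy_ds = tf.data.Dataset.from_tensor_slices((features, labels)).shuffle(10).batch(4)

batch_sizes = []
pairs_intact = True
for x, y in toy_ds:
    batch_sizes.append(int(x.shape[0]))
    # Shuffling moves whole (feature, label) pairs, so they stay aligned.
    pairs_intact = pairs_intact and bool(tf.reduce_all(y == x * 2))

print(batch_sizes)   # [4, 4, 2]: the last batch holds the remainder
print(pairs_intact)  # True
```

The same thing happens with `train_ds`: 60,000 images in batches of 32, with each image kept alongside its label.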

Now we build the `tf.keras` model using the Keras model subclassing API. Additionally, we'll choose an optimizer and loss function for training and select metrics to measure the loss and the accuracy of the model.

In [17]:
class MyModel(Model):
    
  def __init__(self):
    super(MyModel, self).__init__()
    self.conv1 = Conv2D(32, 3, activation='relu')
    self.flatten = Flatten()
    self.d1 = Dense(128, activation='relu')
    self.d2 = Dense(10)

  def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    x = self.d1(x)
    return self.d2(x)

# Create an instance of the model
model = MyModel()

# Define the loss function and optimizer
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define the metrics for loss and accuracy
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
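
These metric objects are stateful: each call folds a new batch into a running result, which is why they need resetting between epochs. A small sketch with made-up numbers (`mean_metric`, `acc`, and the toy logits are illustrative, not taken from the model above):

```python
import tensorflow as tf

# Mean keeps a running average of everything passed to it.
mean_metric = tf.keras.metrics.Mean()
mean_metric(2.0)
mean_metric(4.0)
print(float(mean_metric.result()))  # 3.0

# SparseCategoricalAccuracy compares argmax(logits) with integer labels.
acc = tf.keras.metrics.SparseCategoricalAccuracy()
toy_logits = tf.constant([[2.0, 0.1],   # argmax 0
                          [0.3, 1.5],   # argmax 1
                          [0.9, 0.2]])  # argmax 0
toy_labels = tf.constant([0, 1, 1])     # third prediction is wrong
acc(toy_labels, toy_logits)
print(float(acc.result()))  # 2 of 3 correct
```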

To train the model, we use `tf.GradientTape`. According to the documentation:

> "TensorFlow provides the `tf.GradientTape` API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually `tf.Variable`s. TensorFlow 'records' relevant operations executed inside the context of a `tf.GradientTape` onto a "tape". TensorFlow then uses that tape to compute the gradients of a 'recorded' computation using reverse mode differentiation."

So now let's create functions for training and testing the model.

In [18]:
@tf.function
def train_step(images, labels):
    
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different behavior during training versus inference (e.g. Dropout).
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)

    # Compute and apply the gradients outside the tape context, so the tape only records the forward pass
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # Accumulate the running loss and accuracy for the current epoch
    train_loss(loss)
    train_accuracy(labels, predictions)
    
@tf.function
def test_step(images, labels):
    
    # training=False is only needed if there are layers with different behavior during training versus inference (e.g. Dropout).
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    
    test_loss(t_loss)
    test_accuracy(labels, predictions)
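
To see the tape on its own, outside of a training step, here is a minimal sketch that differentiates $y = x^2$ at $x = 3$. The names `x`, `y`, and `dy_dx` are illustrative.

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations inside the context are recorded on the tape.
with tf.GradientTape() as tape:
    y = x ** 2

# Reverse-mode differentiation of the recorded computation: dy/dx = 2x.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 6.0
```

In `train_step` the same pattern applies, with the loss in place of `y` and the model's trainable variables in place of `x`.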

Now we're set to run the model and see the results. Let's run the model for 5 epochs.

In [21]:
EPOCHS = 5

for epoch in range(EPOCHS):
    
    # Reset the metrics at the start of each epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in train_ds:
        train_step(images, labels)

    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    print(f'Epoch {epoch + 1}, '
          f'Loss: {train_loss.result()}, '
          f'Accuracy: {train_accuracy.result() * 100}, '
          f'Test Loss: {test_loss.result()}, '
          f'Test Accuracy: {test_accuracy.result() * 100}')

Epoch 1, Loss: 0.1328735500574112, Accuracy: 96.01667022705078, Test Loss: 0.0704181119799614, Test Accuracy: 97.73999786376953
Epoch 2, Loss: 0.04222521185874939, Accuracy: 98.69499969482422, Test Loss: 0.05747160688042641, Test Accuracy: 98.12999725341797
Epoch 3, Loss: 0.02281898260116577, Accuracy: 99.25333404541016, Test Loss: 0.05551176890730858, Test Accuracy: 98.25
Epoch 4, Loss: 0.013914735987782478, Accuracy: 99.54833221435547, Test Loss: 0.06104082986712456, Test Accuracy: 98.18999481201172
Epoch 5, Loss: 0.010572392493486404, Accuracy: 99.65666198730469, Test Loss: 0.0683385357260704, Test Accuracy: 98.13999938964844


And we have our image classifier trained to about 98\% test accuracy now. Using Keras in TensorFlow seems pretty easy, but this is certainly going to take some getting used to.