# Paperspace Gradient: TensorFlow 2 Quick Start for Beginners
Last modified: Oct 28th 2021

This notebook demonstrates TensorFlow 2 usage in a Gradient Notebook. The material is based on the original [TensorFlow 2 Beginner Quick Start](https://www.tensorflow.org/tutorials/quickstart/beginner).

See the end of the notebook for the original copyright notice and license.

This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:

1. Build a neural network that classifies images
2. Train this neural network
3. Evaluate the accuracy of the model

## Setup and GPUs

TensorFlow 2 is already installed in our container, so we can import it directly.

In [1]:
import tensorflow as tf

Provided that you have started this Notebook on a [Gradient GPU instance](https://docs.paperspace.com/gradient/more/instance-types), Gradient will see the GPU and TensorFlow will utilize it. Using a GPU provides large speedups for many machine learning models.

We can list the available GPUs.

In [2]:
gpus = tf.config.list_physical_devices('GPU')
gpus

2021-10-29 00:21:11.957814: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero


[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

By default, TensorFlow allocates all the memory on a GPU to the model being run. This is fine until, for example, you run a container like this one that has more than one `.ipynb` notebook in it: the second notebook's model may then fail because the GPU is out of memory.

We can mitigate this by setting the memory used to [grow as needed](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) rather than being allocated all at once at the start.

In [3]:
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

1 Physical GPUs, 1 Logical GPUs



## Load and prepare the data

The model classifies the handwritten digits 0-9 in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

The data are loaded, and the pixel values are converted from integers to floating-point numbers and scaled to the range 0 to 1 by dividing by 255.

In [4]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
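
As a small check on what the scaling step does, the sketch below applies the same division to a stand-in array. It assumes NumPy is available, and `fake_images` is a hypothetical array used here in place of the real MNIST arrays, which `load_data` returns as 8-bit integers.

```python
import numpy as np

# Hypothetical stand-in for MNIST pixels: 8-bit integers in 0..255.
fake_images = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by the float 255.0 promotes the array to floating point
# and rescales every pixel into the range [0, 1].
scaled = fake_images / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

The same reasoning applies to `x_train` and `x_test`: after the division, the smallest possible pixel value is 0.0 and the largest is 1.0.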


## Build and train the model

The model is a simple neural network with one hidden dense layer of 128 units, a dropout layer, and a dense output layer with 10 units, one per class. We build it using `tf.keras.Sequential` by stacking layers.

The model itself outputs raw scores; converting those scores into probabilities is kept as a separate step.

In [5]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
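
To get a feel for the size of this model, the trainable parameters can be counted by hand. This is a back-of-the-envelope sketch in plain Python, not part of the original notebook; the variable names are illustrative.

```python
# Layer sizes taken from the model definition above.
n_inputs = 28 * 28   # Flatten turns each 28x28 image into 784 values
n_hidden = 128       # units in the first Dense layer
n_classes = 10       # units in the final Dense layer

# A Dense layer has one weight per (input, output) pair plus one bias per output.
# Flatten and Dropout have no trainable parameters.
hidden_params = n_inputs * n_hidden + n_hidden
output_params = n_hidden * n_classes + n_classes
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 100480 1290 101770
```

Calling `model.summary()` in the notebook should report the same total.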

For each example the model returns a vector of "[logits](https://developers.google.com/machine-learning/glossary#logits)" or "[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)" scores, one for each class.

In [6]:
predictions = model(x_train[:1]).numpy()
predictions

2021-10-29 00:21:26.647873: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


array([[ 0.16978383,  0.0247148 , -0.12805404, -0.06819983, -0.21171437,
         0.25576675,  0.27811742,  0.414046  , -0.10003339, -0.7693459 ]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to "probabilities" for each class.

In [7]:
tf.nn.softmax(predictions).numpy()

array([[0.11476036, 0.09926341, 0.08520058, 0.0904559 , 0.07836269,
        0.12506442, 0.12789118, 0.14651214, 0.08762171, 0.04486762]],
      dtype=float32)
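
To make the conversion concrete, the following sketch recomputes the softmax by hand with NumPy (assumed to be available), using the logits printed above. It is an illustration of the formula, not how TensorFlow implements it internally.

```python
import numpy as np

# Logits copied from the model output printed above.
logits = np.array([0.16978383, 0.0247148, -0.12805404, -0.06819983, -0.21171437,
                   0.25576675, 0.27811742, 0.414046, -0.10003339, -0.7693459])

# Softmax: exponentiate each logit, then divide by the sum so the values add to 1.
# Subtracting the maximum first does not change the result but avoids overflow.
exps = np.exp(logits - logits.max())
probs = exps / exps.sum()

print(probs.round(4))
print(probs.sum())
print(probs.argmax())  # 7, the class with the largest logit
```

The values agree with the `tf.nn.softmax` output to within floating-point rounding.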

(Note: It is possible to bake this `tf.nn.softmax` in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.)
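
One way to see the stability issue is to compare two routes to the same loss in 32-bit floats. The sketch below is a NumPy illustration with made-up logits, not TensorFlow's actual implementation; libraries may clip very small probabilities instead of returning infinity, which keeps the loss finite but no longer exact.

```python
import numpy as np

# Made-up logits in float32 where the true class (index 0) scores far below the other.
logits = np.array([0.0, 200.0], dtype=np.float32)
true_class = 0

# Route 1: compute softmax probabilities first, then take the log.
shifted = logits - logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()
with np.errstate(divide="ignore"):
    loss_from_probs = -np.log(probs[true_class])   # exp(-200) underflows to 0

# Route 2: work directly from the logits using the log-sum-exp identity.
log_sum_exp = logits.max() + np.log(np.exp(shifted).sum())
loss_from_logits = log_sum_exp - logits[true_class]

print(loss_from_probs, loss_from_logits)
```

Once the probability has underflowed to zero, the information needed for an exact loss is gone, whereas working from the logits recovers the exact value of 200.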

To train the network, we choose an optimizer and a loss function.

The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and the index of the true class, and returns a scalar loss for each example.

In [8]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: it approaches zero when the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [9]:
loss_fn(y_train[:1], predictions).numpy()

2.0789263
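
The printed loss can be reproduced by hand from the softmax output shown earlier. The sketch below assumes the first training label is 5, which is consistent with the loss value above; it uses only the Python standard library.

```python
import math

# Probability the untrained model assigned to class 5, copied from the
# softmax output above. Assumption: the first training label is 5.
p_true_class = 0.12506442

# Cross-entropy for one example is the negative log probability of the true class.
loss = -math.log(p_true_class)

# Baseline for a model that assigns probability 1/10 to every class.
uniform_loss = -math.log(1 / 10)

print(round(loss, 4), round(uniform_loss, 4))  # 2.0789 2.3026
```

The untrained model happens to give the true class slightly more than 1/10, so its loss is slightly below the uniform baseline.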

The optimizer is the commonly used Adam algorithm.

In [10]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
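
For readers curious about what Adam does, here is a simplified single-parameter sketch of its update rule in plain Python. It follows the textbook formulas and the default hyperparameters documented for `tf.keras.optimizers.Adam`; the function `adam_step` is hypothetical and the Keras implementation may differ in detail.

```python
import math

# Default hyperparameters documented for tf.keras.optimizers.Adam.
lr, beta_1, beta_2, eps = 0.001, 0.9, 0.999, 1e-7

def adam_step(param, grad, m, v, t):
    """One Adam update for a single scalar parameter (t counts steps from 1)."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from param = 1.0 with two very different gradient sizes.
small, _, _ = adam_step(1.0, 0.01, 0.0, 0.0, t=1)
large, _, _ = adam_step(1.0, 100.0, 0.0, 0.0, t=1)
print(1.0 - small, 1.0 - large)
```

Both steps move the parameter by roughly the learning rate, which illustrates how Adam rescales updates so that step size depends little on the raw gradient magnitude.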

The `Model.fit` method trains the model, adjusting the model parameters to minimize the loss.

In [11]:
model.fit(x_train, y_train, epochs=5)

2021-10-29 00:21:35.453495: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f27d0059460>

## Evaluate the model

The `Model.evaluate` method checks the model's performance, usually on a "[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)" or "[Test-set](https://developers.google.com/machine-learning/glossary#test-set)".

In [12]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0782 - accuracy: 0.9747


[0.07815324515104294, 0.9746999740600586]
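
The returned list holds the loss followed by the accuracy, and the `313/313` in the progress line is the number of batches. The arithmetic below is a sketch that assumes the Keras default batch size of 32 and the standard MNIST split sizes.

```python
import math

n_train_examples = 60_000   # size of the MNIST training split
n_test_examples = 10_000    # size of the MNIST test split
batch_size = 32             # Keras default when batch_size is not passed

# The last batch may be smaller than batch_size, so round up.
train_batches = math.ceil(n_train_examples / batch_size)
test_batches = math.ceil(n_test_examples / batch_size)

print(train_batches, test_batches)  # 1875 313
```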

The image classifier is now trained to about 97.5% accuracy on the test set. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

## Output predictions

If you want your model to return probabilities, you can wrap the trained model and attach a softmax layer to it.

In [13]:
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])

In [14]:
probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[2.90114457e-07, 3.68077524e-09, 1.96653928e-05, 6.03780791e-04,
        8.58420279e-12, 6.22024515e-07, 1.62259420e-12, 9.99363840e-01,
        1.29077000e-06, 1.05249010e-05],
       [3.24419389e-08, 1.89962948e-03, 9.98077989e-01, 2.02968185e-05,
        2.06696558e-13, 1.25068027e-06, 1.05556296e-07, 2.60848255e-13,
        6.73356908e-07, 4.45306257e-11],
       [1.20596084e-07, 9.98204470e-01, 4.95736080e-04, 1.07503402e-05,
        8.22353613e-05, 2.17296974e-06, 3.61531202e-05, 1.06120668e-03,
        1.06352491e-04, 8.53167137e-07],
       [9.99978542e-01, 2.14903442e-10, 1.53154779e-05, 2.32770930e-07,
        9.49640921e-10, 1.64647008e-06, 1.01331477e-06, 2.03536658e-07,
        5.99166185e-07, 2.54709153e-06],
       [3.42733329e-05, 6.16482332e-09, 3.69191519e-04, 4.08460920e-07,
        9.92645025e-01, 1.69179850e-06, 1.98810056e-04, 1.14649156e-04,
        1.16404044e-05, 6.62431307e-03]], dtype=float32)>
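
To turn these probabilities into predicted digits, take the index of the largest value in each row. The sketch below uses NumPy (assumed to be available) on the values printed above; `probs` is an illustrative name for a copy of that output.

```python
import numpy as np

# Probabilities copied from the output above: one row per test image.
probs = np.array([
    [2.90114457e-07, 3.68077524e-09, 1.96653928e-05, 6.03780791e-04, 8.58420279e-12,
     6.22024515e-07, 1.62259420e-12, 9.99363840e-01, 1.29077000e-06, 1.05249010e-05],
    [3.24419389e-08, 1.89962948e-03, 9.98077989e-01, 2.02968185e-05, 2.06696558e-13,
     1.25068027e-06, 1.05556296e-07, 2.60848255e-13, 6.73356908e-07, 4.45306257e-11],
    [1.20596084e-07, 9.98204470e-01, 4.95736080e-04, 1.07503402e-05, 8.22353613e-05,
     2.17296974e-06, 3.61531202e-05, 1.06120668e-03, 1.06352491e-04, 8.53167137e-07],
    [9.99978542e-01, 2.14903442e-10, 1.53154779e-05, 2.32770930e-07, 9.49640921e-10,
     1.64647008e-06, 1.01331477e-06, 2.03536658e-07, 5.99166185e-07, 2.54709153e-06],
    [3.42733329e-05, 6.16482332e-09, 3.69191519e-04, 4.08460920e-07, 9.92645025e-01,
     1.69179850e-06, 1.98810056e-04, 1.14649156e-04, 1.16404044e-05, 6.62431307e-03],
])

# The predicted class is the column with the highest probability in each row.
predicted_digits = probs.argmax(axis=1)
print(predicted_digits)  # [7 2 1 0 4]
```

In the notebook, the same result could be obtained with `probability_model(x_test[:5]).numpy().argmax(axis=1)` and compared against `y_test[:5]`.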

## Next steps

To proceed with TensorFlow 2 in Gradient, you can:
    
 - Try out the `quick_start_advanced.ipynb` notebook in this same container if you want to see some more advanced usage
 - Look at other Gradient material, such as the [tutorials](https://docs.paperspace.com/gradient/get-started/tutorials-list), [ML Showcase](https://ml-showcase.paperspace.com), [blog](https://blog.paperspace.com), or [community](https://community.paperspace.com)
 - Start writing your own projects, using our [documentation](https://docs.paperspace.com/gradient) when needed
 
If you get stuck or need help, [contact support](https://support.paperspace.com), and we will be happy to assist.

Good luck!

## Original TensorFlow copyright notice and license
##### Copyright 2019 The TensorFlow Authors.

In [15]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.