# Paperspace Gradient: TensorFlow 2 Quick Start for Beginners
Last modified: Nov 18th 2021

## Purpose and intended audience

This Quick Start tutorial demonstrates TensorFlow 2 usage in a Gradient Notebook. It is aimed at users who are relatively new to TensorFlow, although we assume basic knowledge of Python.

For more advanced users who are familiar with TensorFlow and want a quick overview of a more general model being built on Gradient, see the Quick Start for Advanced Users, `quick_start_advanced.ipynb`.

We use [Keras](https://www.tensorflow.org/guide/keras/overview) within TensorFlow to:

- Build a neural network that classifies images
- Train this neural network
- Evaluate the accuracy of the model

We then suggest some next steps for continuing with Gradient.

The material is based on the original [TensorFlow 2 Beginner Quick Start](https://www.tensorflow.org/tutorials/quickstart/beginner).

See the end of the notebook for the original copyright notice and license.

## Check that you are on a GPU instance

The notebook is designed to run on a Gradient GPU instance (as opposed to a CPU-only instance). To see the instance type (e.g., A4000), click the instance icon in the left-hand navigation bar of the Gradient Notebook interface, which indicates whether the instance is CPU or GPU.

![Gradient Notebooks instance type](https://s3.amazonaws.com/ps.public.resources/images/instance_type.png)

If the instance type is CPU, you can change it by clicking *Stop Instance*, then clicking the displayed instance type to open a drop-down list. Select a GPU instance and start the Notebook again.

For help with instances, see the Gradient documentation on [instance types](https://docs.paperspace.com/gradient/more/instance-types) or [starting a Gradient Notebook](https://docs.paperspace.com/gradient/explore-train-deploy/notebooks).

## Setup and GPUs

TensorFlow 2 is already installed in our container, so we can import it directly.

In [None]:
import tensorflow as tf

Provided that you have started this Notebook on a [Gradient GPU instance](https://docs.paperspace.com/gradient/more/instance-types), Gradient will see the GPU and TensorFlow will utilize it. Using a GPU provides large speedups for many machine learning models.

We can list the available GPUs.

In [None]:
gpus = tf.config.list_physical_devices('GPU')
gpus

2021-11-18 22:25:56.588194: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 22:25:56.596964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 22:25:56.597592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero


[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

By default, TensorFlow allocates nearly all of a GPU's memory to the first process that uses it. This is fine until more than one notebook shares the same GPU, as can happen in a container like this one that holds more than one `.ipynb` notebook: the second notebook's model may then fail because the GPU is out of memory.

We can help with this by setting the memory used to [grow as needed](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) rather than being allocated all at once at the start.

In [None]:
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

1 Physical GPUs, 1 Logical GPUs



## Load and prepare the data

The model classifies the handwritten digits 0-9 in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

The data are loaded, and the pixel values are converted from integers in the range 0-255 to floating-point numbers in the range 0-1 by dividing by 255.

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
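As a rough illustration of the normalization step, the sketch below applies the same division to a tiny 2x2 "image" in plain Python. The pixel values are made up for the example; the real MNIST arrays hold 28x28 images.

```python
# Hypothetical 2x2 grayscale "image"; the values are made up for illustration.
raw_image = [[0, 128],
             [191, 255]]

# Same operation as x_train / 255.0, written out element by element.
scaled_image = [[pixel / 255.0 for pixel in row] for row in raw_image]

for row in scaled_image:
    print([round(value, 3) for value in row])
```

Every scaled value lies between 0 and 1, which keeps the inputs on a scale that is convenient for training.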

## Build and train the model

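The model is a simple neural network with a single hidden dense layer of 128 units, followed by dropout and a dense output layer with one unit for each of the 10 classes. We build it using `tf.keras.Sequential` by stacking layers.

The model itself outputs raw scores; converting those scores into probabilities is kept as a separate step.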

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
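To get a feel for the size of this network, the following sketch counts its trainable parameters by hand in plain Python. It assumes the standard dense-layer layout of one weight per input-unit pair plus one bias per unit; the variable names are ours, not part of the Keras API.

```python
# Layer sizes taken from the model definition above.
input_size = 28 * 28      # Flatten turns each 28x28 image into 784 values
hidden_units = 128        # first Dense layer
output_units = 10         # final Dense layer, one score per digit

# A Dense layer has one weight per (input, unit) pair plus one bias per unit.
hidden_params = input_size * hidden_units + hidden_units
output_params = hidden_units * output_units + output_units

# Flatten and Dropout have no trainable parameters.
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

The total can be compared against the figure reported by `model.summary()`.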

For each example the model returns a vector of "[logits](https://developers.google.com/machine-learning/glossary#logits)" or "[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)" scores, one for each class.

In [None]:
predictions = model(x_train[:1]).numpy()
predictions

2021-11-18 22:26:10.980847: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


array([[ 0.15088367, -0.05580643, -0.00876592, -0.27116978,  0.63206875,
        -0.49716207, -0.01700336, -0.05629416,  0.8836522 ,  0.7265957 ]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to "probabilities" for each class.

In [None]:
tf.nn.softmax(predictions).numpy()

array([[0.09107447, 0.07406829, 0.07763574, 0.05971744, 0.14735764,
        0.04763805, 0.07699885, 0.07403217, 0.1895108 , 0.1619665 ]],
      dtype=float32)
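For readers who want to see what softmax does, here is a minimal pure-Python sketch applied to the logits printed above. It is an illustration of the formula only, not TensorFlow's implementation, and the helper name `softmax` is our own.

```python
import math

# Logits copied from the output printed above.
logits = [0.15088367, -0.05580643, -0.00876592, -0.27116978, 0.63206875,
          -0.49716207, -0.01700336, -0.05629416, 0.8836522, 0.7265957]

def softmax(values):
    """Exponentiate each value and divide by the sum of the exponentials."""
    largest = max(values)  # subtracting the max does not change the result
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax(logits)
print([round(p, 4) for p in probabilities])
print("sum:", round(sum(probabilities), 6))
```

The values are all positive and sum to 1, and the largest logit (index 8) receives the largest probability.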

(Note: It is possible to bake this `tf.nn.softmax` in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.)
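The sketch below shows one way such numerical problems can arise, using plain Python rather than TensorFlow. The logits are deliberately extreme and made up for the example, and both helper functions are hypothetical illustrations, not the library's code.

```python
import math

def naive_log_softmax(logits):
    """Take the softmax first, then its logarithm (can overflow)."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def stable_log_softmax(logits):
    """Work directly on the logits using the log-sum-exp trick."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return [z - log_sum_exp for z in logits]

# Made-up logits, chosen to be large enough to overflow math.exp.
extreme_logits = [1000.0, 0.0]

try:
    naive_log_softmax(extreme_logits)
    naive_failed = False
except OverflowError:
    naive_failed = True

print("naive version overflowed:", naive_failed)
print("stable version:", stable_log_softmax(extreme_logits))
```

A loss that receives logits can use the second form internally, whereas a loss that only sees softmax outputs has already lost that option.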

To train the network, we choose an optimizer and loss function for training.

The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and the index of the true class, and returns a scalar loss for each example.

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: it is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [None]:
loss_fn(y_train[:1], predictions).numpy()

3.0441236
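The loss value can be checked by hand from the softmax probabilities printed earlier. The sketch below assumes that the first training label, `y_train[0]`, is the digit 5; that label is not shown in this notebook, but it is consistent with the loss value above.

```python
import math

# Softmax probabilities copied from the output printed earlier.
probabilities = [0.09107447, 0.07406829, 0.07763574, 0.05971744, 0.14735764,
                 0.04763805, 0.07699885, 0.07403217, 0.1895108, 0.1619665]

# Assumption: the first training label, y_train[0], is the digit 5.
true_class = 5

# Sparse categorical cross-entropy for a single example.
loss = -math.log(probabilities[true_class])

# Loss of a model that assigns probability 1/10 to every class.
uniform_loss = -math.log(1 / 10)

print(round(loss, 4), round(uniform_loss, 4))
```

The untrained model happens to give this particular class a probability below 1/10, which is why the loss for this single example is higher than the uniform baseline of about 2.3.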

The optimizer is the commonly used Adam algorithm.

In [None]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
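To give a flavor of what Adam does, here is a minimal sketch of its update rule for a single scalar parameter, in plain Python. The function `adam_step` and the toy problem are hypothetical, and the hyperparameter values are what we understand to be the usual Keras defaults; treat them as assumptions. The real optimizer applies the same idea to every weight in the model.

```python
import math

def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one Adam update to a single scalar parameter."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy problem (made up for illustration): minimize f(x) = x**2 starting at x = 1.
x, m, v = 1.0, 0.0, 0.0
history = []
for step in range(1, 4):
    gradient = 2 * x                              # derivative of x**2
    x, m, v = adam_step(x, gradient, m, v, step)
    history.append(x)

print(history)
```

Each step moves the parameter by roughly the learning rate in the direction that reduces the loss, largely independent of the raw gradient magnitude.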

The `Model.fit` method trains the model, adjusting the model parameters to minimize the loss.

In [None]:
model.fit(x_train, y_train, epochs=5)

2021-11-18 22:26:22.003587: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f8db43ed040>

## Evaluate the model

The `Model.evaluate` method checks the model's performance, usually on a "[validation set](https://developers.google.com/machine-learning/glossary#validation-set)" or "[test set](https://developers.google.com/machine-learning/glossary#test-set)".

In [None]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0716 - accuracy: 0.9767


[0.07155822962522507, 0.9767000079154968]
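The numbers in this output can be related back to the data with a little arithmetic. The sketch below assumes that the MNIST test set holds 10,000 images and that `Model.evaluate` uses a batch size of 32 when none is given; neither figure is stated in this notebook.

```python
import math

test_examples = 10_000   # assumed size of the MNIST test set
batch_size = 32          # assumed Keras default when batch_size is not given

# The last batch may be smaller than the others, so round up.
num_batches = math.ceil(test_examples / batch_size)

# Accuracy is the fraction of test images classified correctly.
accuracy = 0.9767
correct = round(accuracy * test_examples)

print(num_batches, correct)
```

Under these assumptions the batch count matches the `313/313` shown in the progress line, and the accuracy corresponds to 9,767 correctly classified test images.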

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

## Output predictions

If you want your model to return probabilities, you can wrap the trained model and attach a softmax layer to it.

In [None]:
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])

In [None]:
probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[1.7349062e-07, 5.7786383e-07, 2.8814042e-05, 2.4379074e-04,
        4.4788934e-12, 1.1708044e-06, 5.1462865e-11, 9.9963212e-01,
        2.0317297e-07, 9.3058909e-05],
       [1.6954492e-07, 2.7827043e-04, 9.9967492e-01, 3.4268265e-05,
        7.8544746e-16, 1.1802070e-05, 2.3913348e-08, 3.0798752e-10,
        6.2784687e-07, 4.8282083e-11],
       [1.3037444e-06, 9.9576807e-01, 1.3244587e-04, 9.0079986e-05,
        1.0926812e-04, 5.7470035e-05, 7.3475398e-06, 2.7898736e-03,
        1.0398781e-03, 4.2153715e-06],
       [9.9974960e-01, 1.0754984e-07, 6.9114314e-05, 1.9143697e-06,
        1.4249044e-07, 1.5499092e-04, 1.7205464e-06, 6.5770114e-06,
        9.4809042e-07, 1.4924598e-05],
       [3.7654374e-06, 2.9582652e-09, 1.3960643e-05, 3.9624075e-07,
        9.8526043e-01, 8.6292096e-07, 1.9666064e-05, 1.0832635e-04,
        2.2412544e-06, 1.4590355e-02]], dtype=float32)>
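To turn these probabilities into predicted digits, take the index of the largest value in each row. The sketch below does this in plain Python on the numbers printed above; the helper `argmax` is our own, and in TensorFlow the same result can be obtained with `tf.argmax(..., axis=1)`.

```python
# Probabilities copied from the output printed above, one row per test image.
rows = [
    [1.7349062e-07, 5.7786383e-07, 2.8814042e-05, 2.4379074e-04, 4.4788934e-12,
     1.1708044e-06, 5.1462865e-11, 9.9963212e-01, 2.0317297e-07, 9.3058909e-05],
    [1.6954492e-07, 2.7827043e-04, 9.9967492e-01, 3.4268265e-05, 7.8544746e-16,
     1.1802070e-05, 2.3913348e-08, 3.0798752e-10, 6.2784687e-07, 4.8282083e-11],
    [1.3037444e-06, 9.9576807e-01, 1.3244587e-04, 9.0079986e-05, 1.0926812e-04,
     5.7470035e-05, 7.3475398e-06, 2.7898736e-03, 1.0398781e-03, 4.2153715e-06],
    [9.9974960e-01, 1.0754984e-07, 6.9114314e-05, 1.9143697e-06, 1.4249044e-07,
     1.5499092e-04, 1.7205464e-06, 6.5770114e-06, 9.4809042e-07, 1.4924598e-05],
    [3.7654374e-06, 2.9582652e-09, 1.3960643e-05, 3.9624075e-07, 9.8526043e-01,
     8.6292096e-07, 1.9666064e-05, 1.0832635e-04, 2.2412544e-06, 1.4590355e-02],
]

def argmax(values):
    """Return the index of the largest element in a list."""
    return max(range(len(values)), key=values.__getitem__)

predicted_digits = [argmax(row) for row in rows]
print(predicted_digits)
```

Each index is the digit the model considers most likely for the corresponding test image.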

## Next steps

To proceed with TensorFlow 2 in Gradient, you can:
    
 - Try out the `quick_start_advanced.ipynb` notebook in this same container if you want to see some more advanced usage
 - Look at other Gradient material, such as the [tutorials](https://docs.paperspace.com/gradient/get-started/tutorials-list), [ML Showcase](https://ml-showcase.paperspace.com), [blog](https://blog.paperspace.com), or [community](https://community.paperspace.com)
 - Start writing your own projects, using our [documentation](https://docs.paperspace.com/gradient) when needed
 
If you get stuck or need help, [contact support](https://support.paperspace.com), and we will be happy to assist.

Good luck!

## Original TensorFlow copyright notice and license
##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.