# Jupyter Notebook Example on ARC using TensorFlow

A first, optional step is to disable TensorFlow's logging output, which can clutter the output cells of a Jupyter notebook. However, the verbose log is useful for checking whether the CUDA libraries were loaded. If you're using TensorFlow on your system for the first time, I recommend __keeping__ the logging so you can understand what TensorFlow is doing. If you are familiar with how it works, you can disable logging using the commands below.

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # https://stackoverflow.com/a/64438413
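
The variable needs to be set before TensorFlow is imported for it to take effect. As a rough guide, the commonly documented values are `0` (show everything), `1` (hide INFO), `2` (hide INFO and WARNING) and `3` (hide ERROR as well). The sketch below wraps the setting in a small helper; the name `set_tf_log_level` and the `TF_LOG_LEVELS` table are hypothetical and not part of TensorFlow.

```python
import os

# What each TF_CPP_MIN_LOG_LEVEL value hides (assumed from common usage).
TF_LOG_LEVELS = {
    "0": "nothing hidden",
    "1": "INFO hidden",
    "2": "INFO and WARNING hidden",
    "3": "INFO, WARNING and ERROR hidden",
}

def set_tf_log_level(level: str) -> str:
    """Set the TensorFlow C++ log level. Call this before `import tensorflow`."""
    if level not in TF_LOG_LEVELS:
        raise ValueError(f"level must be one of {sorted(TF_LOG_LEVELS)}")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return TF_LOG_LEVELS[level]

# Same effect as the cell above, with a readable description returned.
print(set_tf_log_level("2"))
```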

To start, we first need to import TensorFlow. Here, we alias it as `tf` to make typing easier.

In [2]:
import tensorflow as tf

TensorFlow makes working with GPUs very easy. As long as a GPU is visible to TensorFlow, it can use it. Here we retrieve the GPUs that are visible to TensorFlow as a `list` of `PhysicalDevice` objects.

In [3]:
# Get list of GPU objects.
gpus = tf.config.list_physical_devices('GPU')

# Print the number of GPUs that are visible to TensorFlow.
print(f"Num GPUs Available: {len(gpus)}")

# Loop over all GPUs and print their name and device type.
for gpu in gpus:
    print(f"Name: {gpu.name}, Type: {gpu.device_type}")

Num GPUs Available: 1
Name: /physical_device:GPU:0, Type: GPU


Now let's do some basic tensor operations on the GPU specifically. There are multiple ways of doing this. By default, TensorFlow places operations on the first visible GPU and falls back to the CPU if no GPU is available. So you can simply perform tensor operations and they will automatically be placed on the appropriate device!

In [4]:
# Log the device on which each operation runs.
tf.debugging.set_log_device_placement(True)

In [5]:
# Run operation on available hardware.
a = tf.zeros((10,2)) # Left matrix with 10 rows and 2 columns
b = tf.zeros((2,3)) # Right matrix with 2 rows and 3 columns
c = tf.tensordot(a, b, axes=1) # Matrix dot product results in 10 rows and 3 columns
print(f"{c.shape=}")

Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
c.shape=TensorShape([10, 3])
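
The resulting shape can be worked out by hand: with an integer `axes=n`, the last `n` dimensions of the left tensor are contracted with the first `n` dimensions of the right tensor, and the remaining dimensions are concatenated. The following is a minimal pure-Python sketch of that rule; `tensordot_shape` is a hypothetical helper written for illustration and is not a TensorFlow function.

```python
def tensordot_shape(a_shape, b_shape, axes=1):
    """Shape produced by contracting the last `axes` dims of a with the first `axes` dims of b."""
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)
    if axes < 0 or axes > min(len(a_shape), len(b_shape)):
        raise ValueError("axes is out of range for the given shapes")
    # The contracted dimensions must match exactly.
    if axes and a_shape[len(a_shape) - axes:] != b_shape[:axes]:
        raise ValueError(f"cannot contract {a_shape} with {b_shape} over {axes} axes")
    # Keep the leading dims of a and the trailing dims of b.
    return a_shape[:len(a_shape) - axes] + b_shape[axes:]

# (10, 2) contracted with (2, 3) over one axis gives (10, 3).
print(tensordot_shape((10, 2), (2, 3), axes=1))
```

For two matrices and `axes=1` this is the ordinary matrix product, which is consistent with the `MatMul` operation in the log above.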


There are of course other ways of running on GPUs. The next easiest way is to manually tell TensorFlow which GPU you want to use. To do this, we use a Python context manager called `tf.device()` which runs operations within its context on the device you choose.

In [6]:
# Iterate over GPUs and perform same computation on each separately.
for gpu in gpus:

    # Use context manager to have computation done on GPU.
    # with tf.device(gpu.name.replace('physical_device', 'device')):
    with tf.device(gpu.name.split(':', 1)[1]):
        a = tf.zeros((10,2)) # Left matrix with 10 rows and 2 columns
        b = tf.zeros((2,3)) # Right matrix with 2 rows and 3 columns
        c = tf.tensordot(a, b, axes=1) # Matrix dot product results in 10 rows and 3 columns
        print(f"{c.shape=}")

Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
c.shape=TensorShape([10, 3])
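
The string handling in the cell above is worth spelling out: a physical device is named like `/physical_device:GPU:0`, and the cell strips everything up to the first colon to obtain `GPU:0` before passing it to `tf.device()`. The commented-out alternative instead rewrites the prefix to get `/device:GPU:0`. A small sketch of both conversions, using plain strings so it runs without a GPU; the function names are hypothetical.

```python
def to_short_device_name(physical_name: str) -> str:
    """'/physical_device:GPU:0' -> 'GPU:0' (same as split(':', 1)[1])."""
    prefix, sep, device = physical_name.partition(":")
    if not sep or not device:
        raise ValueError(f"unexpected device name: {physical_name!r}")
    return device

def to_full_device_name(physical_name: str) -> str:
    """'/physical_device:GPU:0' -> '/device:GPU:0' (the commented-out variant)."""
    return physical_name.replace("physical_device", "device")

name = "/physical_device:GPU:0"  # value printed by the GPU listing cell
print(to_short_device_name(name))
print(to_full_device_name(name))
```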


Another way of doing this is to use TensorFlow `Strategy` objects, which allow more advanced control over where computations occur (on the CPU, on a single GPU, or on multiple GPUs in parallel). To start, let's use just a single GPU here.

In [7]:
# To make things simple, let's create strategy for a single GPU
# (the first one in our list, at device index "0")
strategy = tf.distribute.OneDeviceStrategy('gpu:0')

# Use context manager to have computation done on GPU.
with strategy.scope():
    a = tf.zeros((10,2)) # Left matrix with 10 rows and 2 columns
    b = tf.zeros((2,3)) # Right matrix with 2 rows and 3 columns
    c = tf.tensordot(a, b, axes=1) # Matrix dot product results in 10 rows and 3 columns
    print(f"{c.shape=}")

Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
c.shape=TensorShape([10, 3])


Now let's say you need more compute power. If you have multiple GPUs available, you can change the strategy to support multiple GPUs in __parallel__. There are also multiple ways of doing this, all of which are covered in the [TensorFlow documentation](https://www.tensorflow.org/guide/gpu#:~:text=Note%3A%20Use%20tf.,machines%2C%20is%20using%20Distribution%20Strategies). The simplest way is to use a `MirroredStrategy`. In more complex settings, this strategy replicates an AI model on each GPU and splits each batch across the GPUs, thus efficiently parallelizing computation.

In [8]:
# Create a mirrored strategy using all GPUs visible to TensorFlow.
strategy = tf.distribute.MirroredStrategy([
    gpu.name.split(':', 1)[1] for gpu in gpus
])

# Use context manager to have computation done on GPU.
with strategy.scope():
    a = tf.zeros((10,2)) # Left matrix with 10 rows and 2 columns
    b = tf.zeros((2,3)) # Right matrix with 2 rows and 3 columns
    c = tf.tensordot(a, b, axes=1) # Matrix dot product results in 10 rows and 3 columns
    print(f"{c.shape=}")

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
c.shape=TensorShape([10, 3])
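
To make the idea of splitting batches across GPUs concrete, here is a conceptual sketch in plain Python. It only illustrates the arithmetic of dividing one global batch into per-replica shards; it is not how TensorFlow implements `MirroredStrategy`, and `split_batch` is a hypothetical helper.

```python
def split_batch(batch, num_replicas):
    """Split a global batch into near-equal contiguous shards, one per replica."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    base, extra = divmod(len(batch), num_replicas)
    shards, start = [], 0
    for replica_id in range(num_replicas):
        # The first `extra` replicas take one additional example.
        size = base + (1 if replica_id < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards

global_batch = list(range(8))  # stand-in for 8 training examples
for replica_id, shard in enumerate(split_batch(global_batch, 2)):
    print(f"replica {replica_id}: {shard}")
```

With a single GPU, as in the output above, there is only one replica, so it receives the whole batch.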
