# AIST 4010 TUT1: Deep Learning with a GPU on Colab

In this notebook we will learn how to do deep learning with a GPU in Colab. You will see how to access the GPU and how to set up a deep learning environment with PyTorch and TensorFlow in Colab.

**Author:** Licheng Zong

Inspired by the [TensorFlow with GPU](https://colab.research.google.com/notebooks/gpu.ipynb) notebook.

## Enabling and testing the GPU

First, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings.
- Select GPU from the Hardware Accelerator drop-down.

Next, we'll check the GPU and CUDA information with `nvidia-smi`.

In [None]:
!/opt/bin/nvidia-smi

Thu Jan 13 07:39:15 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
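The table above is written for humans. If you want the same facts inside Python (for example, to log which GPU a run used), one option is to ask `nvidia-smi` for CSV output through its `--query-gpu` and `--format` options. This is a minimal sketch: the helper names `parse_gpu_csv` and `query_gpus` are our own, and it assumes `nvidia-smi` is on the `PATH` (on this Colab image you can pass `"/opt/bin/nvidia-smi"` instead).

```python
import shutil
import subprocess

# Columns requested from nvidia-smi, in order.
FIELDS = ("name", "driver_version", "memory.total")

def parse_gpu_csv(text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus

def query_gpus(exe="nvidia-smi"):
    """Return one dict per GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which(exe) is None:
        return []  # no nvidia-smi on this machine (e.g. a CPU-only runtime)
    result = subprocess.run(
        [exe, "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # e.g. the tool exists but no usable GPU/driver was found
    return parse_gpu_csv(result.stdout)

print(query_gpus())
```

On a Colab GPU runtime this should give one entry per attached card; on a CPU-only machine it simply returns an empty list.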

## Connect to the GPU with TensorFlow

Import TensorFlow and check its version

In [None]:
import tensorflow as tf
print(tf.__version__)

2.7.0
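The version string alone does not say which CUDA and cuDNN releases the TensorFlow package was built against. Assuming TensorFlow 2.3 or newer, `tf.sysconfig.get_build_info()` reports this. The keys read below (`is_cuda_build`, `cuda_version`, `cudnn_version`) are the ones we expect on GPU builds, so the sketch uses `.get()` in case a key is absent; the helper name `tf_build_info` is ours, and the guarded import only lets the snippet run where TensorFlow is not installed.

```python
def tf_build_info():
    """TensorFlow version plus the CUDA/cuDNN versions it was built with."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        # .get() because CPU-only builds may not report the CUDA keys
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }

print(tf_build_info())
```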


Check whether a GPU device is available to TensorFlow

In [None]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0
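The cell above stops the notebook with `SystemError` when no GPU is found, and `tf.test.gpu_device_name()` only reports the first GPU. A softer check, sketched here with the public `tf.config.list_physical_devices` API (TensorFlow 2.1+), returns every visible GPU and leaves it to you what to do when the list is empty. The helper name `list_tf_gpus` is ours, and the guarded import is only there so the snippet runs without TensorFlow.

```python
def list_tf_gpus():
    """Names of the GPUs TensorFlow can see, e.g. '/physical_device:GPU:0'."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # no TensorFlow, so no TensorFlow-visible GPUs
    return [d.name for d in tf.config.list_physical_devices("GPU")]

gpus = list_tf_gpus()
if gpus:
    print("TensorFlow sees:", gpus)
else:
    print("No GPU visible to TensorFlow; computations will run on the CPU.")
```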


Print the GPU information with TensorFlow

In [None]:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.physical_device_desc for x in local_device_protos if x.device_type == 'GPU']

print(get_available_gpus())

['device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7']
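`device_lib` lives under `tensorflow.python`, which is TensorFlow's internal namespace rather than its documented API. Assuming TensorFlow 2.3 or newer, a public alternative is `tf.config.experimental.get_device_details`, which returns a dict with `device_name` and `compute_capability` for GPUs; as the `experimental` prefix suggests, treat it as subject to change. For the Tesla K80 above, the compute capability should correspond to the `3.7` printed by `device_lib`, expressed as a `(major, minor)` tuple. The helper name `tf_gpu_details` is ours.

```python
def tf_gpu_details():
    """(name, compute capability) for each GPU, via the public tf.config API."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # guard only so the snippet runs without TensorFlow
    details = []
    for gpu in tf.config.list_physical_devices("GPU"):
        info = tf.config.experimental.get_device_details(gpu)
        details.append((info.get("device_name"), info.get("compute_capability")))
    return details

print(tf_gpu_details())
```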


## Connect to the GPU with PyTorch

Import PyTorch and check its version

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
print(torch.__version__)

1.10.0+cu111
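The `+cu111` suffix marks a PyTorch build compiled against CUDA 11.1. That is separate from the `CUDA Version: 11.2` in the `nvidia-smi` header, which is the highest CUDA version the installed driver supports. A small sketch to read the relevant values from Python (the helper name `torch_build_info` is ours; `torch.version.cuda` is `None` on CPU-only builds):

```python
def torch_build_info():
    """PyTorch version, the CUDA version it was built with, and GPU availability."""
    try:
        import torch
    except ImportError:
        return None  # guard only so the snippet runs without PyTorch
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }

print(torch_build_info())
```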


Check whether a GPU device is available to PyTorch

In [None]:
# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Using device: cuda
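Choosing a `device` only matters once tensors and models are placed on it. A minimal sketch, reusing the `device` logic from the cell above: move the model's parameters with `.to(device)` and create inputs directly on the same device, since PyTorch raises an error when an operation mixes CPU and CUDA tensors. The helper name `forward_on_device` and the layer sizes are arbitrary choices for illustration.

```python
def forward_on_device():
    """Run a tiny linear layer on the GPU if available, else on the CPU."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return None  # guard only so the snippet runs without PyTorch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(4, 2).to(device)    # parameters now live on `device`
    x = torch.randn(8, 4, device=device)  # allocate the input there as well
    with torch.no_grad():
        y = model(x)
    return tuple(y.shape), y.device.type

print(forward_on_device())
```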


Print the GPU information with PyTorch

In [None]:
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))

Tesla K80
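`get_device_name(0)` only reports the name of the first GPU. To list every visible GPU together with its memory and compute capability, `torch.cuda.device_count()` and `torch.cuda.get_device_properties(i)` can be combined as below; the properties object exposes `name`, `total_memory` (in bytes), `major`, and `minor`. The helper name `torch_gpu_summary` and the GiB rounding are our own choices.

```python
def torch_gpu_summary():
    """One dict per visible CUDA device; empty list when CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []  # guard only so the snippet runs without PyTorch
    summary = []
    for i in range(torch.cuda.device_count()):  # 0 when no GPU is visible
        props = torch.cuda.get_device_properties(i)
        summary.append({
            "index": i,
            "name": props.name,
            "total_memory_gib": round(props.total_memory / 1024**3, 1),
            "compute_capability": f"{props.major}.{props.minor}",
        })
    return summary

print(torch_gpu_summary())
```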


## Observe TensorFlow speedup on GPU relative to CPU

This example applies a typical convolutional neural network layer to a batch of
random images and manually places the resulting ops on either the CPU or the GPU
to compare execution speed.
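Before timing anything, it can help to confirm that `tf.device` placed the op where you asked. In eager mode each result tensor carries a `.device` string, so a quick check can look like the sketch below. As far as we know, soft device placement is on by default in eager mode, so a request for a missing `'/GPU:0'` may silently fall back to the CPU; checking `.device` makes that visible. The helper name `placed_on` is ours.

```python
def placed_on(device_spec="/CPU:0"):
    """Create a small tensor under `device_spec` and report where it landed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # guard only so the snippet runs without TensorFlow
    with tf.device(device_spec):
        x = tf.random.normal((2, 2))
    # A full device string, ending in e.g. 'device:CPU:0' or 'device:GPU:0'
    return x.device

print(placed_on("/CPU:0"))
print(placed_on("/GPU:0"))  # on a CPU-only runtime, check whether this fell back
```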

Prepare the GPU environment

In [None]:
import tensorflow as tf
import timeit

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')

Define the CPU and GPU functions

In [None]:
def cpu():
  with tf.device('/cpu:0'):
    random_image_cpu = tf.random.normal((100, 100, 100, 3))
    net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
    return tf.math.reduce_sum(net_cpu)

def gpu():
  with tf.device('/device:GPU:0'):
    random_image_gpu = tf.random.normal((100, 100, 100, 3))
    net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
    return tf.math.reduce_sum(net_gpu)
  
# We run each op once to warm up
cpu()
gpu()

<tf.Tensor: shape=(), dtype=float32, numpy=-164.0846>

Compare running times on the GPU and CPU

In [None]:
# Run the op several times
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
3.6941374410000662
GPU (s):
0.052690824000023895
GPU speedup over CPU: 70x
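The comparison above can be wrapped in a small reusable helper. This sketch (the names `benchmark` and `speedup` are ours) makes the warm-up calls, times `number` further calls with `timeit`, and reports the speedup as a float instead of truncating it with `int()`. One caveat we have not verified for this notebook: GPU work can be dispatched asynchronously, so for a stricter measurement you may want the timed function to bring its result back to the host (for example with `.numpy()` in TensorFlow, or `torch.cuda.synchronize()` in PyTorch).

```python
import timeit

def benchmark(fn, number=10, warmup=1):
    """Total seconds for `number` calls of `fn`, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # the first call may pay one-off setup costs, so it is not timed
    return timeit.timeit(fn, number=number)

def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s

# Using the totals printed above (CPU vs GPU, ten runs each):
print(f"{speedup(3.6941374410000662, 0.052690824000023895):.1f}x")  # prints 70.1x
```

With the `cpu` and `gpu` functions defined earlier, `speedup(benchmark(cpu), benchmark(gpu))` repeats the same ten-run comparison in one line.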
