# Running PyTorch vs TensorFlow on GPUs

In [3]:
import numpy as np 
import tensorflow as tf 
import torch
import time

## Summary for TensorFlow

If a TensorFlow operation has both CPU and GPU implementations, TensorFlow gives the GPU priority unless you specify otherwise.

For example, on a system with devices `cpu:0` and `gpu:0`, `gpu:0` is selected automatically to run `matmul`.
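To see which device TensorFlow actually picked, you can inspect the `device` attribute of the result. The following is a minimal sketch, assuming TensorFlow 2.x in eager mode; the variable names are illustrative, and the printed device strings depend on your machine.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means ops will run on the CPU.
gpus = tf.config.list_physical_devices('GPU')
print('GPUs visible to TensorFlow:', gpus)

a = tf.random.normal([4, 4])
b = tf.random.normal([4, 4])
c = tf.matmul(a, b)  # no tf.device block: TensorFlow chooses the device

# In eager mode, each tensor records the device that holds it.
print('Result placed on:', c.device)

# Pin the same op to the CPU explicitly.
with tf.device('/cpu:0'):
    c_cpu = tf.matmul(a, b)
print('Pinned result placed on:', c_cpu.device)
```

On a machine without a GPU, both results report a CPU device.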

Below are three examples:

1. Generate matrices *and* run `matmul` all on the CPU.

2. Run `matmul` on a single GPU, but generate the matrices on the CPU (i.e. with NumPy). This is not recommended: the overhead of sending the data to the GPU at every iteration cancels out any speedup, so the run is no faster than the case where everything runs on the CPU.

3. Generate tensors that are assigned to GPU memory automatically (i.e. TensorFlow's default) and then run `matmul` on them. In this case everything is done on the GPU and the speedup is considerable (about 0.35 s versus 0.8 s in the runs below).
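When the data has to come from NumPy, one way to avoid the per-iteration transfer described in item 2 is to convert the arrays once, before the loop. This is a sketch under the assumption that the same inputs can be reused across iterations, so it measures something different from the three examples; the `device` variable is introduced here for illustration and falls back to the CPU when no GPU is visible.

```python
import time

import numpy as np
import tensorflow as tf

N = 100
ntimes = 1000

# Use the first GPU if TensorFlow can see one, otherwise stay on the CPU.
device = '/gpu:0' if tf.config.list_physical_devices('GPU') else '/cpu:0'

# Generate the data with NumPy, then copy it to the chosen device a single time.
with tf.device(device):
    a = tf.constant(np.random.randn(N, N))
    b = tf.constant(np.random.randn(N, N))

tstart = time.perf_counter()
for _ in range(ntimes):
    with tf.device(device):
        c = tf.matmul(a, b)  # inputs are already on `device`
result = c.numpy()  # copying back to the host waits for the last matmul to finish
tend = time.perf_counter()

print('Time Elapsed: {} s'.format(tend - tstart))
```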


In [41]:
# Example 1 (everything done on CPU)

N = 100
ntimes = 1000
tstart = time.time()

for _ in range(ntimes):
    a = np.random.randn(N,N)
    b = np.random.randn(N,N)
    with tf.device('/cpu:0'):
        c = tf.matmul(a,b)

tend = time.time()
print('Time Elapsed: {} s\n'.format(tend-tstart))


Time Elapsed: 0.8255360126495361 s



In [42]:
# Example 2 (matrices generated on CPU, multiplication done on GPU)

N = 100
ntimes = 1000
tstart = time.time()

for _ in range(ntimes):
    a = np.random.randn(N,N)
    b = np.random.randn(N,N)
    c = tf.matmul(a,b)

tend = time.time()
print('Time Elapsed: {} s\n'.format(tend-tstart))


Time Elapsed: 0.8029618263244629 s



In [43]:
# Example 3 (everything done on GPU)

N = 100
ntimes = 1000
tstart = time.time()

for _ in range(ntimes):
    a = tf.random.normal([N,N])
    b = tf.random.normal([N,N])
    c = tf.matmul(a,b)

tend = time.time()
print('Time Elapsed: {} s\n'.format(tend-tstart))


Time Elapsed: 0.35160183906555176 s



## Summary for PyTorch

To use the GPU implementation, you must first allocate the tensors in GPU memory yourself (unlike TensorFlow, PyTorch does not do this automatically). Operations that you then perform on those tensors, e.g. `matmul`, run on the GPU.

To allocate a tensor in GPU memory, create it with a CUDA device (e.g. `device='cuda:0'`) or move an existing tensor there with `.to('cuda:0')` or `.cuda()`. Either way the result has a `torch.cuda` tensor type.
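The sketch below shows three common ways to place a tensor on a device. It assumes a reasonably recent PyTorch; the `device` variable and the fallback to the CPU are additions for illustration, so that the snippet also runs on a machine without CUDA.

```python
import numpy as np
import torch

# Fall back to the CPU when this PyTorch build or machine has no CUDA support.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

N = 100

# 1. Create the tensor directly on the target device.
a = torch.randn(N, N, device=device)

# 2. Move an existing CPU tensor with .to(); if the tensor is already on
#    `device`, the same tensor is returned.
b = torch.randn(N, N).to(device)

# 3. Wrap a NumPy array, then move it. NumPy defaults to float64, so cast to
#    float32 to match `a` and `b`.
x = torch.from_numpy(np.random.randn(N, N)).float().to(device)

# All operands live on the same device, so the ops run there too.
c = torch.matmul(a, b) + x
print(c.device, c.dtype)
```

Mixing tensors from different devices in one operation raises an error, so it helps to pick the device once and pass it everywhere.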

In [44]:
# Example 1 (everything done on CPU)

N = 100
ntimes = 1000
tstart = time.time()

for _ in range(ntimes):
    a = np.random.randn(N,N)
    b = np.random.randn(N,N)
    c = torch.matmul(torch.from_numpy(a), torch.from_numpy(b))

tend = time.time()
print('Time Elapsed: {} s\n'.format(tend-tstart))


Time Elapsed: 0.6669421195983887 s



In [39]:
# Example 2 (everything done on GPU)

N = 100
ntimes = 1000
tstart = time.time()

for _ in range(ntimes):
    a = torch.randn(N,N, device='cuda:0')
    b = torch.randn(N,N, device='cuda:0')
    c = torch.matmul(a,b)

tend = time.time()
print('Time Elapsed: {} s\n'.format(tend-tstart))

AssertionError: Torch not compiled with CUDA enabled
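The `AssertionError` above means that the installed PyTorch build has no CUDA support, so the request for `cuda:0` fails. A hedged sketch of the same benchmark that checks for CUDA first is given below. The helper `time_matmul` is a hypothetical name, not part of PyTorch. It also calls `torch.cuda.synchronize` before reading the clock, because CUDA kernels are launched asynchronously and the loop can otherwise finish before the GPU does.

```python
import time

import torch


def time_matmul(device, N=100, ntimes=1000):
    """Return the seconds taken by `ntimes` random N x N matmuls on `device`."""
    device = torch.device(device)
    if device.type == 'cuda':
        torch.cuda.synchronize(device)  # finish earlier GPU work before starting the clock
    tstart = time.perf_counter()
    for _ in range(ntimes):
        a = torch.randn(N, N, device=device)
        b = torch.randn(N, N, device=device)
        c = torch.matmul(a, b)
    if device.type == 'cuda':
        torch.cuda.synchronize(device)  # wait for the queued kernels to complete
    return time.perf_counter() - tstart


# Use the GPU only when this PyTorch build can actually reach one.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
elapsed = time_matmul(device)
print('Time Elapsed on {}: {} s'.format(device, elapsed))
```

On a CPU-only build this runs the loop on the CPU instead of raising the error.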