# GPU vs. CPU Time Testing

In this notebook, we compare the speed at which the CPU and the GPU complete a matrix multiplication of the same random arrays.

First, do the imports. We set the TensorFlow "log level" to 3 so that it suppresses INFO, WARNING, and ERROR log messages, and we enable device placement logging so that the output still shows whether each TensorFlow operation takes place on the CPU or the GPU.

In [1]:
import os
# Set log level to 3 (before importing TensorFlow) to suppress INFO, WARNING, and ERROR messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf
import numpy as np

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

tf.debugging.set_log_device_placement(True)

Num GPUs Available:  1


### Create some tensors

Create the matrices that we'll be working with. NumPy's random generator returns float64 values by default, so we cast them to float32, the single-precision format that GPUs typically handle most efficiently for matrix multiplication.

In [2]:
array_a = np.random.rand(4000,6000).astype(np.float32)
array_b = np.random.rand(6000,4000).astype(np.float32)
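
As a quick sanity check on the cast, the sketch below uses a small stand-in array (the 4 x 6 shape is an arbitrary choice for illustration, not part of the notebook) to show the dtype before and after `astype`, and estimates how much memory one of the full 4000 x 6000 float32 matrices occupies.

```python
import numpy as np

# Small stand-in array; the notebook's arrays are 4000 x 6000
sample = np.random.rand(4, 6)
sample32 = sample.astype(np.float32)

print(sample.dtype, sample32.dtype)    # float64 float32
print(sample.nbytes, sample32.nbytes)  # 192 96

# Size of one full 4000 x 6000 float32 matrix, computed without allocating it
full_bytes = 4000 * 6000 * np.dtype(np.float32).itemsize
print(f"{full_bytes / 1e6:.0f} MB per matrix")  # 96 MB per matrix
```

Casting to float32 halves the memory footprint, which also halves the amount of data that has to be copied to the GPU.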

### Matrix multiplication on the CPU

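We run the "%%time" magic command, which reports the total runtime of the cell as the "Wall time" at the bottom of the cell output. Note that the measured time covers everything in the cell: creating the tensors on the device, the multiplication itself, and printing the result.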

In [3]:
%%time

with tf.device('/CPU:0'):
  # Place tensors on the CPU
  a = tf.constant(array_a)
  b = tf.constant(array_b)
  # Run the matrix multiplication on the CPU
  c = tf.matmul(a, b)   
    
print(c)

Executing op _MklMatMul in device /job:localhost/replica:0/task:0/device:CPU:0
tf.Tensor(
[[1514.3379 1538.084  1503.3572 ... 1517.2496 1528.9951 1515.0752]
 [1509.2175 1526.9805 1500.2554 ... 1531.8606 1521.5109 1519.6726]
 [1516.6642 1526.9082 1497.382  ... 1504.0918 1509.9412 1505.5955]
 ...
 [1495.1278 1502.5363 1467.5347 ... 1489.9993 1489.1797 1473.3014]
 [1496.9824 1518.2943 1494.0378 ... 1504.7379 1516.57   1499.1046]
 [1496.7546 1499.8086 1470.2448 ... 1482.6279 1487.8771 1477.4657]], shape=(4000, 4000), dtype=float32)
CPU times: user 3.12 s, sys: 261 ms, total: 3.39 s
Wall time: 701 ms


### Matrix multiplication on the GPU

Now we do the same calculation on the GPU.

In [4]:
%%time

with tf.device('/GPU:0'):
  # Place tensors on the GPU
  a = tf.constant(array_a)
  b = tf.constant(array_b)
  # Run the matrix multiplication on the GPU
  c = tf.matmul(a, b)   
    
print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[1514.3392 1538.0817 1503.3596 ... 1517.2487 1528.9945 1515.0747]
 [1509.2147 1526.9796 1500.2537 ... 1531.8633 1521.5155 1519.6733]
 [1516.6649 1526.9099 1497.3842 ... 1504.0929 1509.9387 1505.5957]
 ...
 [1495.1288 1502.5367 1467.5345 ... 1490.0023 1489.1785 1473.3015]
 [1496.9811 1518.2958 1494.0375 ... 1504.7385 1516.5706 1499.1068]
 [1496.7513 1499.8108 1470.2441 ... 1482.6249 1487.8734 1477.4644]], shape=(4000, 4000), dtype=float32)
CPU times: user 228 ms, sys: 270 ms, total: 497 ms
Wall time: 495 ms
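
The "%%time" magic used in the timed cells is only available in IPython and Jupyter, and it measures a single run. Outside a notebook, a rough equivalent can be sketched with the standard library. The helper name `time_call` and its `repeats` and `warmup` parameters are hypothetical, chosen for illustration, and the summation workload is only a stand-in for the matrix multiplication.

```python
import time

def time_call(fn, repeats=3, warmup=1):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    # Warm-up calls are run but not timed, so one-time setup cost is excluded
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload (an assumption for this sketch, not the notebook's matmul)
elapsed = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"Best wall time: {elapsed * 1000:.2f} ms")
```

To time the multiplication from this notebook, one could pass something like `lambda: tf.matmul(a, b).numpy()` as `fn`. Calling `.numpy()` copies the result back to host memory, so the timer stops only after the computation has actually finished. The warm-up run matters mostly on the GPU, where the first call may include one-time initialization.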


### Final time comparison

When you run the two cells above, you should find that the calculation is faster on the GPU than on the CPU. In the run shown here, the GPU cell took 495 ms of wall time versus 701 ms on the CPU, which makes it roughly 1.4 times as fast. Continue working your way through the other tutorial notebooks in this directory to learn the ins and outs of doing data analysis with GPU-accelerated computing!
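
The wall times recorded in the outputs above (701 ms on the CPU, 495 ms on the GPU) can be turned into a speedup figure with a line of arithmetic. The variable names below are illustrative, and the numbers come from this particular run, so your own results will differ.

```python
# Wall times copied from the cell outputs above (single run each)
cpu_wall_ms = 701
gpu_wall_ms = 495

speedup = cpu_wall_ms / gpu_wall_ms
time_saved = 1 - gpu_wall_ms / cpu_wall_ms

print(f"Speedup: {speedup:.2f}x")      # Speedup: 1.42x
print(f"Time saved: {time_saved:.0%}")  # Time saved: 29%
```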