# CPU vs GPU

The speed difference between a CPU and a GPU can be significant in deep learning. But how much? Let’s run a test: time a large matrix multiplication on each device and compare.

The original source code is from https://medium.com/@erikhallstrm/hello-world-tensorflow-649b15aed18c, with some modifications by @faizmisman.

<img src="https://blogs.nvidia.com/wp-content/uploads/2009/12/6a00d834515fca69e201287663224d970c.jpg">

In [None]:
# Import the necessary libraries

from __future__ import print_function
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D
import tensorflow as tf
import time

## The code below lists the CPU and GPU devices available on your machine.

NOTE: If TensorFlow can use both the CPU and a GPU, the output includes "/device:CPU:0" and "/device:GPU:0". If you have multiple GPUs, the list continues with "/device:GPU:1" and so on.

In [None]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
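
The full device listing can be verbose, so it may help to reduce it to a short summary per device type. The sketch below is one possible way to do that and is not part of the original notebook: `group_devices` and `sample_names` are hypothetical names, and the sample list is hard-coded so the snippet runs without TensorFlow.

```python
import re
from collections import defaultdict

# Matches device names of the form "/device:GPU:0".
_DEVICE_RE = re.compile(r"^/device:([A-Za-z_]+):(\d+)$")


def group_devices(names):
    """Group device-name strings by device type, e.g. {"GPU": [0, 1]}."""
    groups = defaultdict(list)
    for name in names:
        match = _DEVICE_RE.match(name)
        if match:  # silently skip names that do not follow the pattern
            groups[match.group(1)].append(int(match.group(2)))
    return {kind: sorted(ids) for kind, ids in groups.items()}


# Illustrative names only; these were not read from a real machine.
sample_names = ["/device:CPU:0", "/device:GPU:0", "/device:GPU:1"]
print(group_devices(sample_names))
```

On a real machine, the names could be collected with `[d.name for d in device_lib.list_local_devices()]` and passed to the helper in place of `sample_names`.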

## The function below creates the random matrices r1 and r2 and times their matrix product on each device

In [None]:
def get_times():
    # Wall-clock times per device, one entry per matrix size
    device_times = {
        "/gpu:0": [],
        "/cpu:0": []
    }
    # Matrix sizes to test, as range(initial_size, end_size, interval): 1000, 2000, ..., 5000
    matrix_sizes = range(1000, 6000, 1000)

    for size in matrix_sizes:
        for device_name in device_times.keys():
            shape = (size, size)
            print("####### Calculating on the " + device_name + " with matrix size " + str(size) + " #######")
            data_type = tf.float16  # half precision; change to tf.float32 to compare in single precision
            with tf.device(device_name):
                r1 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)  # random matrix r1 with values in [0, 1)
                r2 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)  # random matrix r2 with values in [0, 1)
                dot_operation = tf.matmul(r2, r1)  # matrix product r2 x r1

            with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
                # Time a single run; the random matrices are generated as part of this run
                start_time = time.time()
                result = session.run(dot_operation)
                time_taken = time.time() - start_time
                device_times[device_name].append(time_taken)
                print("Time taken: " + str(time_taken))

    return device_times, matrix_sizes
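
The function above times a single run per device and size, so any one-time setup cost that happens on the first run would be counted in the measurement. A common way to reduce that effect is to do an untimed warm-up call and then keep the best of several timed calls. The helper below is a minimal sketch of that idea, not the original author's method: `time_call` and `small_matmul` are hypothetical names, and the workload is a tiny pure-Python matrix product so the snippet runs without TensorFlow.

```python
import time


def time_call(fn, warmup=1, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls to fn.

    `warmup` untimed calls run first so that one-time setup cost is excluded.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def small_matmul(n=30):
    """Stand-in workload: multiply an n x n matrix by itself in pure Python."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


print("Best of 3 runs: %.6f s" % time_call(small_matmul))
```

Inside `get_times`, the same helper could wrap the call as `time_call(lambda: session.run(dot_operation))`, at the cost of running each matrix product several times.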

Now let's check how long it takes the CPU and the GPU to run the computation.

In [None]:
device_times, matrix_sizes = get_times()
gpu_times = device_times["/gpu:0"]
cpu_times = device_times["/cpu:0"]
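
To answer the "how much?" question with a single number per matrix size, the two timing lists can be turned into a ratio of CPU time to GPU time. The sketch below shows the arithmetic only: `speedups` is a hypothetical helper, and the timings are made-up placeholder values, not measured results.

```python
def speedups(cpu_times, gpu_times):
    """Return cpu_time / gpu_time for each pair of measurements."""
    return [cpu / gpu for cpu, gpu in zip(cpu_times, gpu_times)]


# Placeholder timings in seconds, for illustration only (not measured).
example_sizes = [1000, 2000, 3000]
example_cpu = [2.0, 9.0, 20.0]
example_gpu = [1.0, 3.0, 4.0]

for size, ratio in zip(example_sizes, speedups(example_cpu, example_gpu)):
    # A ratio above 1 means the GPU run was faster than the CPU run.
    print("size %d: CPU time / GPU time = %.1f" % (size, ratio))
```

With real measurements, `speedups(cpu_times, gpu_times)` can be called directly on the two lists extracted above.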

Plot the time taken against the matrix size for each device

In [None]:
plt.plot(matrix_sizes[:len(gpu_times)], gpu_times, 'o-', label = 'GPU')
plt.plot(matrix_sizes[:len(cpu_times)], cpu_times, 'o-', label = 'CPU')
plt.ylabel('Time (s)')
plt.xlabel('Matrix size')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
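
Raw times grow quickly with matrix size, which can make the curves hard to compare. One optional normalization is throughput: multiplying two n x n matrices takes roughly 2n^3 floating-point operations (about n^3 multiplications and n^3 additions), so dividing that count by the elapsed time gives an approximate rate. The helper below is a sketch of that calculation; `matmul_gflops` is a hypothetical name, and the result is only a rough estimate because the measured time can include more than the multiplication itself.

```python
def matmul_gflops(size, seconds):
    """Approximate GFLOP/s for a size x size matrix product that took `seconds`."""
    flops = 2.0 * size ** 3  # about n^3 multiplications plus n^3 additions
    return flops / seconds / 1e9


# Example with a placeholder time of 1 second for a 1000 x 1000 product.
print("%.1f GFLOP/s" % matmul_gflops(1000, 1.0))
```

Applied to the measurements, this could be computed as `[matmul_gflops(n, t) for n, t in zip(matrix_sizes, gpu_times)]` and plotted in place of the raw times.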