<a href="https://colab.research.google.com/github/neurorishika/PSST/blob/colab-kaggle-restructure/Tutorial/Optional%20Material/TensorFlow%20Benchmark/Benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> &nbsp; <a href="https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neurorishika/PSST/colab-kaggle-restructure/Tutorial/Optional%20Material/TensorFlow%20Benchmark/Benchmark.ipynb" target="_parent"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open in Kaggle"/></a>

## Comparing Python vs TensorFlow Performance

To justify using TensorFlow over plain Python (NumPy), we benchmark a few simple operations and compare the run times of the two implementations.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
import time

In [2]:
!nvidia-smi

Fri Aug 27 09:52:44 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
!lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:            0
CPU MHz:             2299.998
BogoMIPS:            4599.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs 

### Matrix Multiplication

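We compare the time it takes NumPy and TensorFlow (on CPU and on GPU) to evaluate the product of two square matrices of varying sizes. Each measurement is repeated `n_replicate` times, and we report the median run time together with the 5th and 95th percentiles. Note that the TensorFlow timings include graph construction and session start-up, not just the multiplication itself.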

In [4]:
n_replicate = 20
matrix_sizes = [8,32,128,512,2048,8192]
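The measurement pattern used in the cells below (repeat, then take the median and the 5th/95th percentiles) can be sketched as a small helper. This is only an illustration, not the notebook's own code: `time_replicates` is a hypothetical name, it uses `time.perf_counter` rather than `time.time`, and the 64x64 example times only the product, whereas the NumPy cell below also times the random matrix generation.

```python
import time
import numpy as np

def time_replicates(fn, n_replicate=20):
    """Run fn() n_replicate times; return (median, 5th pct, 95th pct) in seconds."""
    run_time = []
    for _ in range(n_replicate):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        run_time.append(time.perf_counter() - start)
    return (np.median(run_time),
            np.quantile(run_time, 0.05),
            np.quantile(run_time, 0.95))

# Hypothetical usage: time a 64x64 NumPy matrix product
rng = np.random.default_rng(0)
a = rng.uniform(size=(64, 64))
b = rng.uniform(size=(64, 64))
median, lci, uci = time_replicates(lambda: a @ b, n_replicate=20)
print("Runtime = {:0.6f} ({:0.6f}-{:0.6f}) secs".format(median, lci, uci))
```

A higher-resolution clock mainly matters for the smallest problem sizes, where individual runs can be shorter than the four decimal places printed in the outputs.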

#### TensorFlow CPU Version

In [5]:
config = tf.ConfigProto(
        device_count = {'GPU': 0}
    )
with open("TFCPUMatrixBenchmark.csv","w") as file:
  file.write("Matrix Size,Median,Lower 95% CI,Upper 95% CI\n")
  for n in matrix_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        tf.reset_default_graph()
        with tf.device('/CPU:0'):
          a = tf.random_uniform([n,n])
          b = tf.random_uniform([n,n])
          c = tf.matmul(a,b)
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            c = sess.run(c)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{}x{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,n,median,lci,uci))
    print("For a {}x{} Matrix, Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,n,median,lci,uci))

For a 8x8 Matrix, Runtime = 0.0076 (0.0072-0.0304) secs
For a 32x32 Matrix, Runtime = 0.0071 (0.0069-0.0081) secs
For a 128x128 Matrix, Runtime = 0.0079 (0.0075-0.0116) secs
For a 512x512 Matrix, Runtime = 0.0148 (0.0136-0.0213) secs
For a 2048x2048 Matrix, Runtime = 0.3042 (0.2912-0.3201) secs
For a 8192x8192 Matrix, Runtime = 18.9908 (18.8000-19.3735) secs


#### TensorFlow GPU Version

In [6]:
with open("TFGPUMatrixBenchmark.csv","w") as file:
  file.write("Matrix Size,Median,Lower 95% CI,Upper 95% CI\n")
  for n in matrix_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        tf.reset_default_graph()
        with tf.device('/GPU:0'):
          a = tf.random_uniform([n,n])
          b = tf.random_uniform([n,n])
          c = tf.matmul(a,b)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            c = sess.run(c)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{}x{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,n,median,lci,uci))
    print("For a {}x{} Matrix, Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,n,median,lci,uci))

For a 8x8 Matrix, Runtime = 0.0138 (0.0133-0.3986) secs
For a 32x32 Matrix, Runtime = 0.0136 (0.0132-0.0170) secs
For a 128x128 Matrix, Runtime = 0.0139 (0.0133-0.0164) secs
For a 512x512 Matrix, Runtime = 0.0153 (0.0142-0.0192) secs
For a 2048x2048 Matrix, Runtime = 0.0265 (0.0258-0.0292) secs
For a 8192x8192 Matrix, Runtime = 0.4717 (0.4657-0.5200) secs


#### NumPy Version (CPU only)

In [7]:
with open("NumpyMatrixBenchmark.csv","w") as file:
  file.write("Matrix Size,Median,Lower 95% CI,Upper 95% CI\n")
  for n in matrix_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        a = np.random.uniform(size=(n,n))
        b = np.random.uniform(size=(n,n))
        c = np.matmul(a,b)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{}x{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,n,median,lci,uci))
    print("For a {}x{} Matrix, Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,n,median,lci,uci))

For a 8x8 Matrix, Runtime = 0.0000 (0.0000-0.0022) secs
For a 32x32 Matrix, Runtime = 0.0000 (0.0000-0.0001) secs
For a 128x128 Matrix, Runtime = 0.0031 (0.0030-0.0033) secs
For a 512x512 Matrix, Runtime = 0.0223 (0.0199-0.0260) secs
For a 2048x2048 Matrix, Runtime = 0.6440 (0.6299-0.6595) secs
For a 8192x8192 Matrix, Runtime = 33.3957 (32.8667-33.6082) secs
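To make the comparison easier to read, the median run times printed above can be turned into speedup ratios. The sketch below copies the medians from the three outputs; the helper name `ratio` is an assumption for illustration. The NumPy medians for the 8x8 and 32x32 cases print as 0.0000, so no meaningful ratio can be formed for them.

```python
# Median run times (seconds) copied from the outputs above
sizes = [8, 32, 128, 512, 2048, 8192]
tf_cpu = [0.0076, 0.0071, 0.0079, 0.0148, 0.3042, 18.9908]
tf_gpu = [0.0138, 0.0136, 0.0139, 0.0153, 0.0265, 0.4717]
numpy_cpu = [0.0000, 0.0000, 0.0031, 0.0223, 0.6440, 33.3957]

def ratio(baseline, candidate):
    """Return baseline / candidate, or None when the baseline rounds to zero."""
    return baseline / candidate if baseline > 0 else None

# Values above 1 mean the GPU run was faster than the baseline
gpu_vs_tf_cpu = [ratio(c, g) for c, g in zip(tf_cpu, tf_gpu)]
gpu_vs_numpy = [ratio(n, g) for n, g in zip(numpy_cpu, tf_gpu)]

for n, r_cpu, r_np in zip(sizes, gpu_vs_tf_cpu, gpu_vs_numpy):
    r_np_text = "n/a" if r_np is None else "{:0.2f}x".format(r_np)
    print("{}x{}: GPU vs TF CPU = {:0.2f}x, GPU vs NumPy = {}".format(n, n, r_cpu, r_np_text))
```

On these numbers the GPU is slower than both CPU variants for small matrices, where fixed overhead dominates, and pulls ahead only for the larger sizes.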


### RK4 Integration

We compare the time it takes NumPy and TensorFlow to integrate varying numbers of differential equations of the form $\dot x = \sin(xt)$ with a fourth-order Runge-Kutta (RK4) scheme.
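For reference, one RK4 step as implemented by both integrators below can be written out as follows. Here $f(y, t)$ is the right-hand side (matching the `func(y, t)` argument order in the code), and $y_n$, $t_n$, and $\Delta t$ are notation introduced for this sketch.

$$
\begin{aligned}
k_1 &= f(y_n,\; t_n) \\
k_2 &= f\left(y_n + \tfrac{\Delta t}{2}\, k_1,\; t_n + \tfrac{\Delta t}{2}\right) \\
k_3 &= f\left(y_n + \tfrac{\Delta t}{2}\, k_2,\; t_n + \tfrac{\Delta t}{2}\right) \\
k_4 &= f\left(y_n + \Delta t\, k_3,\; t_n + \Delta t\right) \\
y_{n+1} &= y_n + \tfrac{\Delta t}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right)
\end{aligned}
$$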

In [8]:
n_replicate = 20
equation_sizes = [1,10,100,1000,10000,100000]
t = np.arange(0,5,0.01)

#### TensorFlow Integrator

In [9]:
def tf_check_type(t, y0): # Ensure Input is Correct
    if not (y0.dtype.is_floating and t.dtype.is_floating):
        raise TypeError('Error in Datatype')

class Tf_Integrator():
    
    def integrate(self, func, y0, t): 
        time_delta_grid = t[1:] - t[:-1]
        
        def scan_func(y, t_dt): 
            t, dt = t_dt
            dy = self._step_func(func,t,dt,y)
            return y + dy

        y = tf.scan(scan_func, (t[:-1], time_delta_grid),y0)
        return tf.concat([[y0], y], axis=0)
    
    def _step_func(self, func, t, dt, y):
        k1 = func(y, t)
        half_step = t + dt / 2
        dt_cast = tf.cast(dt, y.dtype) # Failsafe

        k2 = func(y + dt_cast * k1 / 2, half_step)
        k3 = func(y + dt_cast * k2 / 2, half_step)
        k4 = func(y + dt_cast * k3, t + dt)
        return tf.add_n([k1, 2 * k2, 2 * k3, k4]) * (dt_cast / 6)
    

def odeint_tf(func, y0, t):
    t = tf.convert_to_tensor(t, preferred_dtype=tf.float64, name='t')
    y0 = tf.convert_to_tensor(y0, name='y0')
    tf_check_type(t, y0)  # argument order matches the definition tf_check_type(t, y0)
    return Tf_Integrator().integrate(func,y0,t)
        
def f(X,t):
    return tf.sin(X*t)

TensorFlow CPU Version

In [10]:
config = tf.ConfigProto(
        device_count = {'GPU': 0}
    )

with open("TFCPUIntegrationBenchmark.csv","w") as file:
  file.write("Number of Equations,Median,Lower 95% CI,Upper 95% CI\n")
  for n in equation_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        tf.reset_default_graph()
        with tf.device('/CPU:0'):
          state = odeint_tf(f,tf.constant([0.]*n,dtype=tf.float64),t)
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            state = sess.run(state)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,median,lci,uci))
    print("For {} equation(s), Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,median,lci,uci))

For 1 equation(s), Runtime = 0.1129 (0.1047-0.2044) secs
For 10 equation(s), Runtime = 0.1097 (0.1064-0.1235) secs
For 100 equation(s), Runtime = 0.1158 (0.1128-0.1258) secs
For 1000 equation(s), Runtime = 0.1575 (0.1492-0.1686) secs
For 10000 equation(s), Runtime = 0.4570 (0.4499-0.4771) secs
For 100000 equation(s), Runtime = 3.7758 (3.7220-3.8195) secs


TensorFlow GPU Version

In [11]:
with open("TFGPUIntegrationBenchmark.csv","w") as file:
  file.write("Number of Equations,Median,Lower 95% CI,Upper 95% CI\n")
  for n in equation_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        tf.reset_default_graph()
        with tf.device('/GPU:0'):
          state = odeint_tf(f,tf.constant([0.]*n,dtype=tf.float64),t)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            state = sess.run(state)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,median,lci,uci))
    print("For {} equation(s), Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,median,lci,uci))

For 1 equation(s), Runtime = 0.4121 (0.3917-0.4338) secs
For 10 equation(s), Runtime = 0.4143 (0.4028-0.4378) secs
For 100 equation(s), Runtime = 0.4395 (0.4120-0.5239) secs
For 1000 equation(s), Runtime = 0.4443 (0.4257-0.4611) secs
For 10000 equation(s), Runtime = 0.4701 (0.4550-0.5066) secs
For 100000 equation(s), Runtime = 0.6729 (0.6429-0.7258) secs


#### NumPy Integrator

In [12]:
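def python_check_type(y,t): # Ensure Input is Correct
    # np.issubdtype is the supported way to test for a floating-point dtype;
    # comparing a dtype to the abstract np.floating with == is deprecated
    return np.issubdtype(y.dtype, np.floating) and np.issubdtype(t.dtype, np.floating)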

class python_Integrator():
    
    def integrate(self,func,y0,t):
        time_delta_grid = t[1:] - t[:-1]
        
        y = np.zeros((y0.shape[0],t.shape[0]))
        y[:,0] = y0

        for i in range(time_delta_grid.shape[0]):
            k1 = func(y[:,i], t[i])                               # RK4 Integration Steps
            half_step = t[i] + time_delta_grid[i] / 2
            k2 = func(y[:,i] + time_delta_grid[i] * k1 / 2, half_step)
            k3 = func(y[:,i] + time_delta_grid[i] * k2 / 2, half_step)
            k4 = func(y[:,i] + time_delta_grid[i] * k3, t[i] + time_delta_grid[i])
            y[:,i+1]= (k1 + 2 * k2 + 2 * k3 + k4) * (time_delta_grid[i] / 6) + y[:,i]
        return y

def odeint_python(func,y0,t):
    y0 = np.array(y0)
    t = np.array(t)
    if python_check_type(y0,t):
        return python_Integrator().integrate(func,y0,t)
    else:
        # Raise, as the TensorFlow version does, instead of printing and returning None
        raise TypeError('Error in Datatype')
        
def f(X,t): # NumPy right-hand side; this replaces the TensorFlow `f` defined earlier
    return np.sin(X*t)
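The benchmark starts every equation at $x(0) = 0$, for which $\sin(xt) = 0$ and the solution stays at zero, so it measures cost but does not exercise the accuracy of the integrator. As a rough sanity check of the RK4 update itself, the sketch below integrates a different test equation, $\dot y = -y$, and compares against the exact solution $e^{-t}$. It is a self-contained restatement of the loop above; the names `rk4_integrate`, `decay`, `t_check`, and `y_check` are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def rk4_integrate(func, y0, t):
    """Fixed-step RK4 over the time grid t; returns an array of shape (len(y0), len(t))."""
    y = np.zeros((len(y0), len(t)))
    y[:, 0] = y0
    for i in range(len(t) - 1):
        dt = t[i + 1] - t[i]
        k1 = func(y[:, i], t[i])
        k2 = func(y[:, i] + dt * k1 / 2, t[i] + dt / 2)
        k3 = func(y[:, i] + dt * k2 / 2, t[i] + dt / 2)
        k4 = func(y[:, i] + dt * k3, t[i] + dt)
        y[:, i + 1] = y[:, i] + (k1 + 2 * k2 + 2 * k3 + k4) * (dt / 6)
    return y

def decay(y, t):
    # Hypothetical test equation dy/dt = -y with exact solution y0 * exp(-t)
    return -y

t_check = np.arange(0, 5, 0.01)  # same time grid as the benchmark
y_check = rk4_integrate(decay, np.array([1.0]), t_check)
max_error = np.max(np.abs(y_check[0] - np.exp(-t_check)))
print("max abs error:", max_error)
```

With a step of 0.01 the maximum error should be many orders of magnitude below the solution itself; a large error would point to a mistake in the update formula.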

NumPy Version (CPU only)

In [13]:
with open("NumpyIntegrationBenchmark.csv","w") as file:
  file.write("Number of Equations,Median,Lower 95% CI,Upper 95% CI\n")
  for n in equation_sizes:
    run_time = []
    for i in range(n_replicate):
        start = time.time()
        solution = odeint_python(f,[0.]*n,t)
        end = time.time()
        run_time.append(end-start)
    median = np.median(run_time)
    lci = np.quantile(run_time,0.05)
    uci = np.quantile(run_time,0.95)
    file.write("{},{:0.4f},{:0.4f},{:0.4f}\n".format(n,median,lci,uci))
    print("For {} equation(s), Runtime = {:0.4f} ({:0.4f}-{:0.4f}) secs".format(n,median,lci,uci))

  


For 1 equation(s), Runtime = 0.0355 (0.0343-0.0392) secs
For 10 equation(s), Runtime = 0.0247 (0.0165-0.0447) secs
For 100 equation(s), Runtime = 0.0480 (0.0457-0.0498) secs
For 1000 equation(s), Runtime = 0.0826 (0.0791-0.0925) secs
For 10000 equation(s), Runtime = 0.7697 (0.7479-0.7913) secs
For 100000 equation(s), Runtime = 5.3550 (3.3901-7.4485) secs
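One way to summarize the three integration benchmarks is to ask which implementation had the lowest median run time at each problem size. The sketch below copies the medians from the outputs above; the dictionary layout and variable names are assumptions for illustration.

```python
# Median run times (seconds) copied from the integration outputs above
equation_sizes = [1, 10, 100, 1000, 10000, 100000]
medians = {
    "TensorFlow CPU": [0.1129, 0.1097, 0.1158, 0.1575, 0.4570, 3.7758],
    "TensorFlow GPU": [0.4121, 0.4143, 0.4395, 0.4443, 0.4701, 0.6729],
    "NumPy":          [0.0355, 0.0247, 0.0480, 0.0826, 0.7697, 5.3550],
}

# For each problem size, pick the implementation with the smallest median
fastest = [
    min(medians, key=lambda name: medians[name][i])
    for i in range(len(equation_sizes))
]

for n, name in zip(equation_sizes, fastest):
    print("{:>6} equation(s): fastest = {}".format(n, name))
```

On these numbers NumPy is fastest up to 1000 equations, and the GPU only comes out ahead at the largest size. The medians at 10000 equations are close for the two TensorFlow variants, so that particular ranking should be read with the reported percentile ranges in mind.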
