The following code demonstrates the power of CUDA.<br>
The code blocks are broken into three sections:<br>
1. Initialize our variables and load libraries<br>
2. Perform a repetitive array multiplication with CUDA<br>
3. Perform a repetitive array multiplication without CUDA<br>

A 'for' loop is used to repeat the calculation 30 million times.<br>

The two arrays, $A$ and $B$, each have length 1,000:
$$
\left(\begin{array}{cccc}
A_{1}, & A_{2}, & \cdots, & A_{1000}
\end{array}\right)
\left(\begin{array}{cccc}
B_{1}, & B_{2}, & \cdots, & B_{1000}
\end{array}\right)
$$
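
To make the timed operation explicit, here is a sketch of what each pass of the loop computes. It assumes $A$ and $B$ stand for `arrayA` and `arrayB` in the code that follows, and that the output is called $C$ after `arrayC`:
$$
C_{i} = A_{i} \cdot B_{i}, \qquad i = 1, \ldots, 1000
$$
This is an elementwise product, not a dot product or a matrix multiplication, so every $C_{i}$ can be computed independently of the others.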

### Using the following hardware:<br>
1. CPU = Intel(R) Core(TM) i5-7600K @ 3.80 GHz (16 GB RAM)
2. GPU = NVIDIA GeForce GTX 1070 (8 GB RAM)<br>
_*Demo for Dave H._

# Initialize our variables and load libraries.

In [1]:
import pycuda.autoinit
import pycuda.driver as drv
import numpy

from timeit import default_timer as timer

from pycuda.compiler import SourceModule

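# Two input arrays of 1,000 random single-precision floats
arrayA = numpy.random.randn(1000).astype(numpy.float32)
arrayB = numpy.random.randn(1000).astype(numpy.float32)
# Output array with the same shape and dtype, filled with zeros
arrayC = numpy.zeros_like(arrayA)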

## The following is the CUDA C code that gets compiled and loaded onto the GPU.

In [2]:
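mod = SourceModule("""
#include <stdint.h>

// Each thread handles one array element and repeats the multiplication 30 million times.
__global__ void multiply_them(float *arrayC, float *arrayA, float *arrayB)
{
    const int i = threadIdx.x;  // a single block is launched, so threadIdx.x is the element index
    for (uint64_t counter = 0; counter < 30000000; counter++)
    {
        arrayC[i] = arrayA[i] * arrayB[i];
    }
}
""")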

multiply_them = mod.get_function("multiply_them")
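
To illustrate how the kernel maps threads to array elements, here is a minimal CPU-only sketch in plain Python. It is an emulation under stated assumptions, not the PyCUDA code path: `multiply_them_emulated` is a hypothetical helper, the simulated threads run one after another instead of in parallel, and the repeat count is cut to 3 so it finishes instantly.

```python
def multiply_them_emulated(array_a, array_b, threads_per_block, repeats=3):
    """CPU stand-in for the kernel: one outer-loop pass per simulated thread."""
    array_c = [0.0] * len(array_a)
    for thread_idx_x in range(threads_per_block):  # stands in for threadIdx.x
        i = thread_idx_x
        for _ in range(repeats):  # the real kernel repeats this 30 million times
            array_c[i] = array_a[i] * array_b[i]
    return array_c


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = multiply_them_emulated(a, b, threads_per_block=len(a))
print(c)  # [10.0, 40.0, 90.0, 160.0]
```

On the GPU the outer loop disappears: every value of `threadIdx.x` runs at the same time, which is where the speedup comes from.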

## In CUDA (GPU)
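1. Start the timer<br>
2. Call the CUDA function to perform the calculations (`drv.In` and `drv.Out` copy the arrays to and from the GPU, so the transfers are included in the timing)<br>
3. Print out the execution time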

In [3]:
start = timer()
multiply_them(drv.Out(arrayC), drv.In(arrayA), drv.In(arrayB), block=(1000,1,1), grid=(1,1))
duration = timer() - start
print("It took {0:.4f} seconds to execute.".format(duration))

It took 5.0169 seconds to execute.
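
The launch above uses a single block of 1,000 threads, which fits within the limit of 1,024 threads per block on current NVIDIA GPUs. For longer arrays the work would have to be split across several blocks. The sketch below shows one common way to size such a launch; `launch_config` and the default of 256 threads per block are assumptions for illustration, not part of PyCUDA. A kernel launched this way would also need to compute its index as `blockIdx.x * blockDim.x + threadIdx.x`, take the array length `n` as an extra argument, and skip any thread where `i >= n`.

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit on current NVIDIA GPUs


def launch_config(n, threads_per_block=256):
    """Return (block, grid) tuples in the shape PyCUDA expects for a 1-D launch."""
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    # Ceiling division: enough blocks to cover all n elements
    blocks = (n + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (blocks, 1)


block, grid = launch_config(1000)
print(block, grid)  # (256, 1, 1) (4, 1)
```

With 4 blocks of 256 threads, 1,024 threads are launched for 1,000 elements, so the last 24 threads must do nothing; that is what the bounds check is for.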


## In RAM (CPU)
1. Start the timer<br>
2. Run the Python loop, which uses NumPy to perform the calculations<br>
3. Print out the execution time

In [4]:
start = timer()
for i in range(0,30000000):
    arrayC = arrayA * arrayB
duration = timer() - start
print("It took {0:.4f} seconds to execute.".format(duration))

It took 21.5464 seconds to execute.
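
As a quick way to compare the two results, the reported timings can be turned into a speedup factor. This is a small sketch using the two numbers printed above; both come from a single run each, so the ratio is a rough indication, not a careful benchmark.

```python
gpu_seconds = 5.0169   # printed by the CUDA cell
cpu_seconds = 21.5464  # printed by the CPU cell

speedup = cpu_seconds / gpu_seconds
print("GPU speedup: {0:.2f}x".format(speedup))  # GPU speedup: 4.29x
```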
