<a href="https://colab.research.google.com/github/evaneschneider/parallel-programming/blob/master/gpu_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to GPU programming with Numba

Yesterday we discussed the principles of parallel programming and explored the key aspects of using Numba: the `@jit` decorator, benchmarking, and the `@vectorize` decorator for NumPy ufuncs. Today we are going to build on that foundation and use Numba to do parallel calculations in Python by taking advantage of Numba's GPU interface (and Google's free GPUs - thanks, Colab!).

In [0]:
import numpy as np
from numba import cuda

#### Problem 0 - Accessing the GPU

**0a)** In order to run Numba functions using the GPU, we have to do a couple of things. First, go to the Runtime menu, click on 'Change Runtime Type', and in the pop-up box, under 'Hardware Accelerator', select 'GPU'. Save the Runtime.

**0b)** Ideally, that's all we should have to do. But in practice, even though the CUDA libraries are installed, for some reason Colab usually can't find them. So, we'll just figure out where they are, and then point Colab to them.

In [4]:
!find / -iname 'libdevice'
!find / -iname 'libnvvm.so'

/usr/local/cuda-10.0/nvvm/libdevice
/usr/local/cuda-10.0/nvvm/lib64/libnvvm.so


Paste the location of the libraries into the following code box:

In [0]:
import os
os.environ['NUMBAPRO_LIBDEVICE'] = "/usr/local/cuda-10.0/nvvm/libdevice"
os.environ['NUMBAPRO_NVVM'] = "/usr/local/cuda-10.0/nvvm/lib64/libnvvm.so"
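
If you would rather not copy the paths by hand, the search and the assignment can be combined. The following is only a minimal sketch: the helper name `locate_cuda_libs` is hypothetical, the default search root `/usr/local` is an assumption based on the output above (the `find` commands search from `/`), and the match is case-sensitive, unlike `find -iname`.

```python
import os

def locate_cuda_libs(root="/usr/local"):
    """Walk `root` and return (libdevice_dir, libnvvm_path).

    Either entry is None if nothing with that name was found.
    Hypothetical helper, not part of Numba.
    """
    libdevice_dir = None
    libnvvm_path = None
    for dirpath, dirnames, filenames in os.walk(root):
        # libdevice is a directory; libnvvm.so is a file (often a symlink)
        if libdevice_dir is None and "libdevice" in dirnames:
            libdevice_dir = os.path.join(dirpath, "libdevice")
        if libnvvm_path is None and "libnvvm.so" in filenames:
            libnvvm_path = os.path.join(dirpath, "libnvvm.so")
        if libdevice_dir and libnvvm_path:
            break  # both found, stop walking
    return libdevice_dir, libnvvm_path

libdevice_dir, libnvvm_path = locate_cuda_libs()
if libdevice_dir and libnvvm_path:
    # Same environment variables as in the cell above
    os.environ["NUMBAPRO_LIBDEVICE"] = libdevice_dir
    os.environ["NUMBAPRO_NVVM"] = libnvvm_path
else:
    print("CUDA libraries not found under the search root")
```

If more than one CUDA version is installed, this picks whichever one `os.walk` reaches first, so it is worth printing the two paths and checking them against the `find` output.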

And that should do it! Okay, now that we've pointed Numba to the correct libraries, let's get going. To start, we are going to return to the simplest function we created yesterday - the vector add.
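
Before moving on, it can be worth confirming that Numba actually sees a GPU. Here is a minimal sketch of such a check, assuming a Numba version that provides `cuda.is_available()`; the wrapper name `gpu_available` is hypothetical and only there so the check degrades gracefully if Numba itself is missing.

```python
def gpu_available():
    """Return True if Numba reports a usable CUDA GPU, False otherwise."""
    try:
        from numba import cuda
    except ImportError:
        # Numba is not installed in this environment
        return False
    return bool(cuda.is_available())

if gpu_available():
    print("Numba can see a CUDA GPU")
else:
    print("No CUDA GPU visible - check the runtime type and the library paths")
```

If this reports no GPU, the most likely culprits are the two steps above: the runtime type was not switched to 'GPU', or the library paths do not match what `find` printed.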

### Problem 1 - Vector Addition on GPUs

[[0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1]
 [2 2 2 2 2 2 2 2 2 2]
 [3 3 3 3 3 3 3 3 3 3]
 [4 4 4 4 4 4 4 4 4 4]]
