---
type: GPU, Parallel

required_modules:
  - numba
  - math
  - numpy
  - vectorize
  - cuda
---

## Source Information
---
**Created by**: Abe Stern

**Updated**: October 01, 2024 by Gloria Seo

**Resources**: http://numba.pydata.org/

---

## Goal
This notebook demonstrates how to use Numba's `vectorize` decorator to compute the angles of many triangles with the law of cosines, running the computation on a GPU.

# CUDA Ufuncs

Numba’s `vectorize()` decorator allows Python functions taking scalar input arguments to be used as NumPy ufuncs. With `target='cuda'`, Numba compiles a pure Python function into a ufunc that operates over NumPy arrays and executes on the GPU.

Using `vectorize()`, you write your function as operating over input scalars rather than arrays. Numba generates the surrounding loop (or kernel), allowing efficient iteration over the actual inputs.
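To make the scalar-to-array idea concrete, the sketch below writes out by hand the kind of element-wise loop that `vectorize()` generates for you. It uses plain NumPy only. `scale_and_shift` and `apply_elementwise` are hypothetical names chosen for illustration, and the explicit Python loop is a conceptual stand-in, not the code Numba actually emits.

```python
import numpy as np


def scale_and_shift(x, y):
    """Scalar function: written for one pair of numbers, not for arrays."""
    return 2.0 * x + y


def apply_elementwise(scalar_func, *arrays):
    """Hand-written stand-in for the loop that vectorize() generates.

    Assumes 1-D input arrays of equal length.
    """
    out = np.empty_like(arrays[0])
    for i in range(out.size):
        # Call the scalar function once per element position.
        out[i] = scalar_func(*(arr[i] for arr in arrays))
    return out


x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
result = apply_elementwise(scale_and_shift, x, y)
print(result)
```

With `vectorize()`, only the scalar function is written by hand; the loop, or on a GPU the kernel with one thread per element, is produced by the compiler.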

### Law of Cosines

For a triangle with sides $a$, $b$, and $c$, where $C$ is the angle opposite side $c$, the law of cosines states that

$$
\frac{a^2+b^2-c^2}{2ab}=\cos C
$$
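Solving for the angle gives $C = \arccos\left(\frac{a^2+b^2-c^2}{2ab}\right)$. As a quick sanity check, the sketch below evaluates this for two triangles whose angles are known in advance. It uses only the standard library, and `angle_opposite_c` is a hypothetical helper name.

```python
import math


def angle_opposite_c(a, b, c):
    """Return the angle (in radians) opposite side c, via the law of cosines."""
    cos_c = (a**2 + b**2 - c**2) / (2.0 * a * b)
    return math.acos(cos_c)


# A 3-4-5 triangle is right-angled, so the angle opposite the side of
# length 5 should be pi/2 radians (90 degrees).
right_angle = angle_opposite_c(3.0, 4.0, 5.0)
print(math.degrees(right_angle))

# An equilateral triangle has three angles of pi/3 radians (60 degrees).
equilateral = angle_opposite_c(1.0, 1.0, 1.0)
print(math.degrees(equilateral))
```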

### Numba Ufunc Kernel

Now, let’s define a GPU-accelerated function to compute our angles. We use the `@vectorize` decorator with `target='cuda'` and pass type signatures to it, which triggers eager (decoration-time) compilation. GPU-targeted ufuncs require signatures. This cell relies on the imports listed under Required Modules (cell `In [2]`), which must run first:

In [3]:
@vectorize(['float32(float32, float32, float32)',
            'float64(float64, float64, float64)'],
           target='cuda')
def compute_angle(a, b, c):
    cos_c = ( a**2 + b**2 - c**2 ) / ( 2.0 * a * b )
    return math.acos(cos_c)
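On a machine without a CUDA GPU, the same scalar function can be compiled for the CPU instead. The sketch below picks the target at run time with `cuda.is_available()`. The fallback policy, the names `compute_angle_scalar` and `compute_angle_portable`, and the `np.vectorize` stand-in for environments without Numba are all assumptions added for illustration and are not part of the original notebook.

```python
import math

import numpy as np

try:
    from numba import cuda, vectorize
    # Assumed policy: use the GPU when Numba can see one, otherwise the CPU.
    TARGET = 'cuda' if cuda.is_available() else 'cpu'
except ImportError:
    vectorize, TARGET = None, None  # Numba missing: fall back to NumPy below

SIGNATURES = ['float32(float32, float32, float32)',
              'float64(float64, float64, float64)']


def compute_angle_scalar(a, b, c):
    """Scalar law-of-cosines function, same body as compute_angle."""
    cos_c = (a**2 + b**2 - c**2) / (2.0 * a * b)
    return math.acos(cos_c)


if vectorize is not None:
    # Same decorator as in the notebook, applied as a plain function call.
    compute_angle_portable = vectorize(SIGNATURES, target=TARGET)(compute_angle_scalar)
else:
    # Slow pure-Python stand-in, only so the sketch runs without Numba.
    compute_angle_portable = np.vectorize(compute_angle_scalar)

# Two triangles with known angles: 3-4-5 (right angle) and 1-1-1 (60 degrees).
sides_a = np.array([3.0, 1.0])
sides_b = np.array([4.0, 1.0])
sides_c = np.array([5.0, 1.0])
angles = compute_angle_portable(sides_a, sides_b, sides_c)
print(np.degrees(angles))
```

Because the inputs here are `float64`, the second signature is the one selected. Passing `float32` arrays would select the first.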

## Required Modules for the Jupyter Notebook

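Before running the notebook, make sure the following modules are available. `vectorize` and `cuda` are imported from `numba`, and `math` is part of the Python standard library.

**Modules: numba (provides `vectorize` and `cuda`), math, numpy**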

In [2]:
import numba
from numba import vectorize, cuda
import numpy as np
import math

### Prepare Data

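Next, we need to prepare our input data. We create large arrays of random numbers to represent the sides of our triangles. Adding 3, 4, and 5 to samples drawn from [0, 1) guarantees that every triple $(a, b, c)$ forms a valid triangle, so the argument of `acos` always stays within [-1, 1].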

In [4]:
N = int(5e8)
dtype = np.float32

# prepare the input
a = np.array(np.random.sample(N)+3, dtype=dtype)
b = np.array(np.random.sample(N)+4, dtype=dtype)
c = np.array(np.random.sample(N)+5, dtype=dtype)
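At this size, memory is worth checking before running the cell. The sketch below estimates the footprint of the arrays and shows one possible way to draw `float32` samples directly instead of converting from `float64`. The `default_rng` approach is an alternative, not what the notebook uses, and `N_SMALL` is a hypothetical reduced size so the example runs on any machine.

```python
import numpy as np

N = int(5e8)
dtype = np.float32

# Each float32 element takes 4 bytes, so one array needs N * 4 bytes.
bytes_per_array = N * np.dtype(dtype).itemsize
print(f"one array:    {bytes_per_array / 1e9:.1f} GB")
print(f"three inputs: {3 * bytes_per_array / 1e9:.1f} GB")

# np.random.sample returns float64, so each input is briefly held as an
# 8-byte-per-element temporary before it is converted to float32.
# Alternative (assumes NumPy >= 1.17): draw float32 samples directly.
rng = np.random.default_rng(seed=0)
N_SMALL = 1_000  # hypothetical small size for demonstration
a = rng.random(N_SMALL, dtype=dtype) + dtype(3)
b = rng.random(N_SMALL, dtype=dtype) + dtype(4)
c = rng.random(N_SMALL, dtype=dtype) + dtype(5)
print(a.dtype, b.dtype, c.dtype)
```

With `N = int(5e8)`, each `float32` array takes 2 GB, so the three inputs plus the output need about 8 GB on both the host and the GPU.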

### Call GPU Ufunc

Now, let’s call our GPU function just like we would a regular NumPy ufunc. Numba copies the inputs to the GPU, picks the kernel launch configuration, and copies the result back to the host for us automatically.

In [5]:
%%timeit -n2 -r5 -o
C_GPU = compute_angle(a, b, c)

4 s ± 73.4 ms per loop (mean ± std. dev. of 5 runs, 2 loops each)


<TimeitResult : 4 s ± 73.4 ms per loop (mean ± std. dev. of 5 runs, 2 loops each)>

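The output above shows the GPU computation time: about 4 s per call. Because the inputs are host NumPy arrays, this figure includes copying them to the GPU and copying the result back.

In [6]:
# store the timing result
GPU_TIMING = _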
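Since the GPU timing includes host-to-device and device-to-host copies, it can help to move the data explicitly and keep it on the GPU across calls. The sketch below shows one way to do that with Numba's device arrays. `angles_on_device` is a hypothetical helper, and the `ufunc` argument is assumed to be a three-argument function built with `@vectorize(..., target='cuda')`, such as `compute_angle`. The import guard is there only so the cell can be pasted on a machine without Numba or a GPU.

```python
try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:  # Numba is not installed, so there is no GPU path
    HAVE_GPU = False


def angles_on_device(ufunc, a, b, c):
    """Run a CUDA ufunc on device-resident arrays and return a host array.

    `ufunc` is assumed to be built with @vectorize(..., target='cuda').
    """
    # Copy the inputs to GPU memory explicitly, once.
    d_a, d_b, d_c = (cuda.to_device(x) for x in (a, b, c))
    # Allocate the output on the GPU so the call makes no implicit copies.
    d_out = cuda.device_array(a.shape, dtype=a.dtype)
    ufunc(d_a, d_b, d_c, out=d_out)
    cuda.synchronize()  # wait for the kernel to finish
    return d_out.copy_to_host()


if not HAVE_GPU:
    print("No CUDA GPU detected; angles_on_device cannot run here.")
```

On a GPU machine, `C_GPU = angles_on_device(compute_angle, a, b, c)` would give the same result as the direct call. Timing the `ufunc(...)` call together with `cuda.synchronize()` separately from the copies shows how much of the total is transfer overhead.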

### NumPy Version
To see how our GPU performance stacks up, we’ll also compute the angles using NumPy. This will help us measure the speedup we gain from using the GPU:

In [7]:
%%timeit -n1 -r1 -o
# CPU version
C_CPU = np.arccos(( a**2 + b**2 - c**2 ) / ( 2.0 * a * b ))

6.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


<TimeitResult : 6.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>

In [8]:
# store the timing result
CPU_TIMING = _

### Computing Speedup Factor

In [9]:
print('Speedup factor: ', CPU_TIMING.average / GPU_TIMING.average, 'X')

Speedup factor:  1.5568690690652713 X


## Checking Results
To ensure our results are consistent between the CPU and GPU calculations, we can recompute the values and check for agreement. Here’s how we do that:

In [10]:
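# recompute: variables assigned inside a %%timeit cell are not kept
C_GPU = compute_angle(a, b, c)
C_CPU = np.arccos(( a**2 + b**2 - c**2 ) / ( 2.0 * a * b ))

# every element must agree to within an absolute tolerance
tol = 1e-5
if np.allclose(C_CPU, C_GPU, rtol=0, atol=tol):
    print('results agree')
else:
    print('results differ')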

results agree
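A pass/fail message hides how close the two results actually are. The sketch below reports the largest absolute difference as well. It runs on the CPU only: a `float32` computation is compared against a `float64` reference, which is a stand-in for the GPU-versus-CPU comparison, not a reproduction of it. The array size `n` and the helper name `angles` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000  # hypothetical small size so the sketch runs quickly
a = (rng.random(n) + 3).astype(np.float32)
b = (rng.random(n) + 4).astype(np.float32)
c = (rng.random(n) + 5).astype(np.float32)


def angles(a, b, c):
    """Law-of-cosines angle, computed in the precision of the inputs."""
    return np.arccos((a**2 + b**2 - c**2) / (2.0 * a * b))


low = angles(a, b, c)  # float32 arithmetic
ref = angles(*(x.astype(np.float64) for x in (a, b, c)))  # float64 reference

# Report the worst-case difference alongside the tolerance check.
max_abs_diff = np.max(np.abs(low - ref))
agree = np.allclose(low, ref, rtol=0.0, atol=1e-5)
print(f"max |difference| = {max_abs_diff:.2e}, agree = {agree}")
```

Printing the maximum difference makes it easier to judge whether a tolerance such as `1e-5` is tight or generous for `float32` data.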


## Submit Ticket
If you find anything that needs to be changed, or if you would like to provide feedback or contribute to the notebook, please submit a ticket by contacting us at:

Email: consult@sdsc.edu

We appreciate your input and will review your suggestions promptly!
