# Extending the Numba CUDA Target Exercise

Create a Numba extension that implements support for the following class:

```python
import math


class Quaternion(object):
    """
    A quaternion. Not to be taken as an exemplar API or implementation of a
    quaternion!  For Numba extension example purposes only.

    A quaternion is: a + bi + cj + dk
    """
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    @property
    def phi(self):
        """The angle between the x-axis and the N-axis"""
        a = self.a
        b = self.b
        c = self.c
        d = self.d
        return math.atan(2 * (a * b + c * d) / (a * a - b * b - c * c + d * d))

    @property
    def theta(self):
        """The angle between the z-axis and the Z-axis"""
        a = self.a
        b = self.b
        c = self.c
        d = self.d
        return -math.asin(2 * (b * d - a * c))

    @property
    def psi(self):
        """The angle between the N-axis and the X-axis"""
        a = self.a
        b = self.b
        c = self.c
        d = self.d
        return math.atan(2 * (a * d + b * c) / (a * a + b * b - c * c - d * d))

    def __add__(self, other):
        return Quaternion(self.a + other.a,
                          self.b + other.b,
                          self.c + other.c,
                          self.d + other.d)
```

You can base your implementation on the Interval example: it needs a similar set of classes and functions for the typing, data model, and lowering.

When your extension is complete, the following kernel should work:

```python
@cuda.jit
def kernel(arr):
    q1 = Quaternion(1.0, 2.0, 9.75, 5.0)
    q2 = Quaternion(3.0, 4.0, 5.0, 6.0)
    q3 = q1 + q2
    arr[0] = q1.phi
    arr[1] = q1.theta
    arr[2] = q1.psi
    arr[3] = q3.a
    arr[4] = q3.b
    arr[5] = q3.c
    arr[6] = q3.d
```

such that calling it with:

```python
numba_res = np.zeros(7)
kernel[1, 1](numba_res)
print(numba_res)
```

prints:

```
[-0.94688683 -0.52359878 -0.40259505  4. 6. 14.75 11.]
```
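The expected values can be reproduced on the CPU without Numba, which gives a reference to compare the kernel output against. The sketch below is plain Python only: it repeats the formulas of the `Quaternion` class in compact form, and the helper name `reference_result` is a hypothetical one chosen for illustration.

```python
import math


class Quaternion:
    """Plain-Python stand-in mirroring the class from the exercise."""

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    @property
    def phi(self):
        a, b, c, d = self.a, self.b, self.c, self.d
        return math.atan(2 * (a * b + c * d) / (a * a - b * b - c * c + d * d))

    @property
    def theta(self):
        a, b, c, d = self.a, self.b, self.c, self.d
        return -math.asin(2 * (b * d - a * c))

    @property
    def psi(self):
        a, b, c, d = self.a, self.b, self.c, self.d
        return math.atan(2 * (a * d + b * c) / (a * a + b * b - c * c - d * d))

    def __add__(self, other):
        return Quaternion(self.a + other.a, self.b + other.b,
                          self.c + other.c, self.d + other.d)


def reference_result():
    """Compute on the CPU the seven values the kernel stores into `arr`."""
    q1 = Quaternion(1.0, 2.0, 9.75, 5.0)
    q2 = Quaternion(3.0, 4.0, 5.0, 6.0)
    q3 = q1 + q2
    return [q1.phi, q1.theta, q1.psi, q3.a, q3.b, q3.c, q3.d]


expected = reference_result()
print(expected)
```

When checking your extension, compare `numba_res` with these values using a small tolerance rather than exact equality, since the GPU math functions may differ from the CPU ones in the last few bits.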

## Hints

### Example solution

The code for the Interval example is available in [interval_example.py](interval_example.py). 

There is an example solution for the `Quaternion` class in [quaternion_solution.py](quaternion_solution.py).

### Implementing the `__add__` method

The `__add__` method supports addition of two `Quaternion` instances using the `+` operator, e.g.:

```python
q1 = Quaternion(1.0, 2.0, 3.0, 4.0)
q2 = Quaternion(5.0, 6.0, 7.0, 8.0)
q3 = q1 + q2
```

To implement typing for the `__add__` method, we need to type the `operator.add` function for the case where both arguments are of type `quaternion_type`. This is done with:

```python
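import operator

from numba.core.typing import signature
from numba.core.typing.templates import ConcreteTemplate
# Typing registry for the CUDA target
from numba.cuda.cudadecl import registry as cuda_registry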

@cuda_registry.register_global(operator.add)
class Quaternion_ops(ConcreteTemplate):
    cases = [signature(quaternion_type, quaternion_type, quaternion_type)]
```

* We pass the operator function (`operator.add`) to `register_global` to register typing for that operator.
* The `ConcreteTemplate` class allows us to give a list of accepted type signatures for a function.
* Signatures are constructed with `signature(return_type, *arg_types)`, where the types are instances of Numba types.
* The list of accepted signatures must be stored in `cases`, and can contain as many signatures as necessary. Here we only use one, but we could type more - for example, `signature(quaternion_type, quaternion_type, types.float32)` could be used for typing the addition of a `Quaternion` and a scalar.

To declare the lowering function, we use:

```python
import operator

from numba.cuda.cudaimpl import lower as cuda_lower

@cuda_lower(operator.add, quaternion_type, quaternion_type)
def cuda_quaternion_add(context, builder, sig, args):
    ...  # body: build the result from the two operands (see below)
```

The two `Quaternion` operands are passed in the list `args`. To implement the body of this function, construct struct proxies for both operands and for the result (the `Interval` example shows how struct proxies are used).

### Calling other functions inside lowering functions

Sometimes a lowering function needs to call another function. For example, lowering the `theta` property requires a call to `math.asin`. You can generate a call to another function using this pattern:

```python
# Declare signature of function to be called.
# (returns a float64, takes a float64 argument)
sig = signature(types.float64, types.float64)
# Look up the implementation of the function with the given signature
impl = context.get_function(math.asin, sig)
# Generate a call; `args` here is a list holding the one float64 value to pass
res = impl(builder, args)
```

### Building arithmetic operations

Various arithmetic operations are required (addition, subtraction, multiplication, etc.). These operations can be constructed with the `builder`. Operations on integers and floating point types use different instructions - for example, `builder.add` creates an integer addition whereas `builder.fadd` creates a floating point addition.

Look at the llvmlite documentation for [a list of all IR builder functions](https://llvmlite.readthedocs.io/en/latest/user-guide/ir/ir-builder.html). In particular, see the [floating point arithmetic section](https://llvmlite.readthedocs.io/en/latest/user-guide/ir/ir-builder.html#floating-point).
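One way to plan a lowering function is to first write the formula as straight-line code with a single operation per statement, so that each statement maps to one builder call. The sketch below does this for `theta` in plain Python. It is a planning aid rather than Numba code, and the function name `theta_straight_line` is hypothetical.

```python
import math


def theta_straight_line(a, b, c, d):
    """Evaluate -asin(2 * (b*d - a*c)) one primitive operation at a time.

    Each statement stands for one instruction a lowering function would
    emit; the comments name the builder call it corresponds to.
    """
    bd = b * d                 # builder.fmul
    ac = a * c                 # builder.fmul
    diff = bd - ac             # builder.fsub
    arg = 2.0 * diff           # builder.fmul with a float64 constant 2.0
    angle = math.asin(arg)     # call looked up with context.get_function
    return -0.0 - angle        # builder.fsub from the constant -0.0


print(theta_straight_line(1.0, 2.0, 9.75, 5.0))
```

Every intermediate name in this form becomes one LLVM value in the lowering function, which makes it easier to check that no step of the formula has been missed.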

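Historically, LLVM had no unary floating-point negation instruction, so negation of `x` can be implemented as `-0.0 - x` using `builder.fsub`. To get the constant `-0.0`, use `context.get_constant(types.float64, -0.0)`. Newer llvmlite releases also document a `builder.fneg` function; check the IR builder documentation for the version you have installed.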
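The constant has to be `-0.0` rather than `0.0` because of IEEE 754 signed zeros. The plain-Python sketch below models the subtraction with ordinary floats to show the difference. The helper names `negate_via_fsub` and `sign_bit` are hypothetical and exist only for this illustration.

```python
import math


def negate_via_fsub(x, zero):
    """Model negation as the subtraction `zero - x`, as an `fsub` would do."""
    return zero - x


def sign_bit(value):
    """Return True when the IEEE 754 sign bit of `value` is set."""
    return math.copysign(1.0, value) < 0


# For nonzero inputs both constants give the same answer.
print(negate_via_fsub(2.5, -0.0), negate_via_fsub(2.5, 0.0))

# For x = +0.0 only the -0.0 constant yields the correct result -0.0.
print(sign_bit(negate_via_fsub(0.0, -0.0)))  # sign bit set: -0.0
print(sign_bit(negate_via_fsub(0.0, 0.0)))   # sign bit clear: +0.0
```

The two constants only disagree when `x` is positive zero, so the distinction rarely changes a numerical result, but using `-0.0` keeps the subtraction an exact negation for every input.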