# GPUs

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/swd6_hpp/blob/main/docs/07_GPUs.ipynb)

GPUs (graphics processing units) are optimised for highly parallel numerical operations, while CPUs (central processing units) handle general-purpose computation.

GPU hardware is designed for data parallelism: it achieves high throughput when applying the same operation to many different elements at once.

Other types of accelerators (for example, TPUs) can be used too.

## [JAX](https://jax.readthedocs.io/en/latest/index.html)

...

## Automatic detection

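Many libraries can detect a GPU and use it, either automatically (TensorFlow places operations on an available GPU by default) or once data has been moved to the device explicitly (PyTorch). The snippets below check whether a GPU is visible.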

[`TensorFlow`](https://www.tensorflow.org/install/gpu)
```python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
```

[`PyTorch`](https://pytorch.org/docs/stable/notes/cuda.html)
```python
import torch
torch.cuda.is_available()
```
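
A common pattern is to pick the device once and fall back to the CPU when no GPU is found. The following is a minimal sketch, assuming PyTorch is installed; the variable names `device`, `x`, and `total` are illustrative only.

```python
import torch

# Use the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device
x = torch.ones(3, device=device)

# The computation runs wherever the tensor lives
total = (x * 2).sum()

# .item() copies the scalar result back to a Python float
print(device.type, total.item())
```

Written this way, the same script runs unchanged on machines with and without a GPU.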

## [CUDA](https://developer.nvidia.com/how-to-cuda-python) (Compute Unified Device Architecture)
...

### [Numba](https://numba.pydata.org/numba-doc/latest/index.html)
...

```python
from numba import cuda, vectorize
```

Numba's [`@vectorize`](https://numba.pydata.org/numba-doc/latest/user/vectorize.html) compiles a scalar function into a NumPy ufunc that runs on the CPU.
- [`@jit`](https://numba.readthedocs.io/en/stable/user/jit.html) can also be used.

```python
@vectorize
def do_maths(x, y):
    return x + y
```

The same [`@vectorize`](https://numba.pydata.org/numba-doc/latest/user/vectorize.html) decorator can target the GPU.
- [`@cuda.jit`](https://numba.readthedocs.io/en/stable/cuda/kernels.html) can also be used.

```python
# On the GPU, the type signature output(inputs) and the target must be given explicitly
@vectorize(['float32(float32, float32)'], target='cuda')
def do_maths(x, y):
    return x + y
```

Considerations and more information:
- Ensure the inputs are not too small and the calculation is not too simple, otherwise the overheads can outweigh any speed-up.
- Consider whether the calculation is worth the overhead of sending data to and from the GPU ([memory management](https://numba.pydata.org/numba-doc/dev/cuda/memory.html)).
- For arrays of different dimensions, use [generalized ufuncs](https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html) (NumPy), implemented in Numba as `guvectorize` on [CPUs](http://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator) and [GPUs](http://numba.pydata.org/numba-doc/latest/cuda/ufunc.html#generalized-cuda-ufuncs).
- Consider what data precision is required (e.g., is 64-bit needed?).
- For custom functions beyond ufuncs, write [kernels](https://numba.pydata.org/numba-doc/dev/cuda/kernels.html).
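
The precision question can be checked on the CPU before any GPU work. This is a rough sketch using NumPy only, with made-up random data: it compares the memory footprint of 64-bit and 32-bit copies of the same array, which is also the amount of data that would need to be transferred to the GPU, and measures the rounding error introduced by the down-cast.

```python
import numpy as np

# Example data: one million random values in [0, 1), float64 by default
rng = np.random.default_rng(0)
x64 = rng.random(1_000_000)

# Down-cast to 32-bit floats
x32 = x64.astype(np.float32)

# Half the bytes means half the data to move to and from the GPU
print(x64.nbytes, x32.nbytes)  # 8000000 4000000

# Largest absolute rounding error introduced by the cast
max_err = np.max(np.abs(x64 - x32))
print(max_err)
```

Whether an error of this size is acceptable depends on the calculation, so it is worth testing on representative data.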

## [RAPIDS](https://developer.nvidia.com/rapids)
A suite of GPU-accelerated data science libraries:
- Arrays and matrices:
  - [CuPy](https://cupy.dev/) for NumPy and SciPy

### [CuPy](https://cupy.dev/)

```python
import numpy as np
import cupy as cp

# NumPy on the CPU
x_cpu = np.zeros((10, ))
y_cpu = np.zeros((10, 5))
z_cpu = np.dot(x_cpu, y_cpu)

# CuPy on the GPU
x_gpu = cp.zeros((10, ))
y_gpu = cp.zeros((10, 5))
z_gpu = cp.dot(x_gpu, y_gpu)

# Convert between the two (each call copies data between CPU and GPU memory)
z_cpu = cp.asnumpy(z_gpu)  # GPU -> CPU
z_gpu = cp.asarray(z_cpu)  # CPU -> GPU
```
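
Because CuPy mirrors much of the NumPy API, one way to write code that runs with or without a GPU is to bind whichever module is usable to a single name. The sketch below assumes this pattern; the alias `xp` is a convention rather than a CuPy feature, and the device check via `cp.cuda.runtime.getDeviceCount()` is one possible way to confirm that a GPU is present.

```python
import numpy as np

try:
    import cupy as cp

    # Treat "no usable CUDA device" the same as "CuPy not installed"
    if cp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device found")
    xp = cp
except Exception:
    # Fall back to NumPy on the CPU
    xp = np

# The same array code runs against either backend
x = xp.zeros((10,))
y = xp.zeros((10, 5))
z = xp.dot(x, y)

# Bring the result back to a NumPy array if it lives on the GPU
z_host = z if xp is np else cp.asnumpy(z)
print(xp.__name__, z_host.shape)
```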

- Tabular data
  - [cuDF](https://docs.rapids.ai/api/cudf/stable/) for Pandas
- Machine learning
  - [cuML](https://docs.rapids.ai/api/cuml/stable/) for scikit-learn
  - [XGBoost](https://rapids.ai/xgboost.html) on GPUs
- Graphs and networks
  - [cuGraph](https://docs.rapids.ai/api/cugraph/stable/) for [NetworkX](https://networkx.org/)
- Multiple GPUs
  - [Dask with CUDA](https://rapids.ai/dask.html), cuDF, cuML, and others.

## Exercise

...

## Further information

### Other options

- ...

### Resources

- [CuPy - Sean Farley](https://www.youtube.com/watch?v=_AKDqw6li58), PyBay 2019.  
- [cuDF - Mark Harris](https://www.youtube.com/watch?v=lV7rtDW94do), PyCon AU 2019.  