# CUDA Array Interface

Because moving data between the CPU and the GPU is expensive, we want to keep as much data on the GPU as possible at all times.

Sometimes we also want to switch tools partway through a workflow. Perhaps we load an array of data with `cupy` but want to write a custom CUDA kernel with `numba`. Or perhaps we want to switch to a deep learning framework like `pytorch`.

When any of these libraries loads data onto the GPU, the array in memory is essentially the same. The differences between a cupy `ndarray` and a numba `DeviceNDArray` come down to how that array is wrapped and hooked into Python.

Thankfully, with utilities like [DLPack](https://github.com/dmlc/dlpack) and [`__cuda_array_interface__`](https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html) we can convert from one type to another without copying or modifying the data on the GPU. We just create a new Python wrapper object and pass the device pointer across, along with metadata such as the shape and data type.
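To make the idea of a shared device pointer more concrete, here is a minimal sketch of reading the dictionary that an object exposes through `__cuda_array_interface__`. The helper name `describe_cuda_array` is hypothetical and not part of any library. The sketch assumes a numeric array whose `typestr` ends in its item size in bytes, as in `<f8` for float64.

```python
import math


def describe_cuda_array(obj):
    """Summarize the buffer described by obj.__cuda_array_interface__.

    Hypothetical helper for illustration only. It reads metadata and
    never touches the data on the GPU.
    """
    iface = obj.__cuda_array_interface__
    # "data" is a pair: (device pointer as an integer, read-only flag)
    pointer, read_only = iface["data"]
    typestr = iface["typestr"]   # e.g. "<f8" for little-endian float64
    itemsize = int(typestr[2:])  # assumption: numeric typestr such as "<f8"
    shape = tuple(iface["shape"])
    return {
        "pointer": pointer,
        "read_only": read_only,
        "shape": shape,
        "typestr": typestr,
        "nbytes": itemsize * math.prod(shape),
    }
```

Calling `describe_cuda_array` on a cupy array should report the address and size of its device buffer. A library that consumes the interface needs roughly this information to build its own wrapper around the same memory.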

Ensuring compatibility between popular GPU Python libraries is one of the core goals of the RAPIDS community.

![](images/array-interface.png)

Let's see this in action!

We start off by creating an array with cupy.

In [None]:
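import cupy as cp

# 1 x 100,000 x 10,000 float64 values is roughly 8 GB of GPU memory;
# reduce the shape if your GPU has less memory available.
cp_arr = cp.random.random((1, 100_000, 10_000))
cp_arr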

In [None]:
type(cp_arr)

Now let's convert this to a Numba array.

In [None]:
from numba import cuda
numba_arr = cuda.to_device(cp_arr)
numba_arr

_Notice that the GPU memory usage stays the same. This is because `cp_arr` and `numba_arr` reference the same underlying data on the GPU; only the Python wrapper type differs._
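One way to check this without watching a memory monitor is to compare the device pointers that the two objects report. The following is a minimal sketch, assuming both objects expose `__cuda_array_interface__`. The helper name `shares_device_memory` is hypothetical.

```python
def shares_device_memory(a, b):
    """Return True if a and b report the same starting device address.

    Hypothetical helper for illustration. It compares only the start
    pointers, so it does not detect partially overlapping views.
    """
    ptr_a = a.__cuda_array_interface__["data"][0]
    ptr_b = b.__cuda_array_interface__["data"][0]
    return ptr_a == ptr_b
```

With the arrays above, `shares_device_memory(cp_arr, numba_arr)` should return `True` if the conversion did not copy the data.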

We can also convert our array to a pytorch `Tensor` object.

In [None]:
import torch  # Requires pytorch

In [None]:
torch_arr = torch.as_tensor(numba_arr, device='cuda')
torch_arr

In [None]:
type(torch_arr)