# Exploring DLPack APIs and semantics

_Note: installing all these libraries in a single environment is a little painful. Here is a way that works (as of now, YMMV): https://gist.github.com/rgommers/347b40695b526ff3993a61d36bdb1c6e_.

For the relevant part of the API standard doc, see: https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html

## Python APIs for each library

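Create a simple three-element array in each library and round-trip it with `to/from_dlpack`. The dtype is each library's default, so it is not int32 everywhere (MXNet, for example, returns floats):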

In [2]:
# CuPy
import cupy as cp

x = cp.arange(3)
capsule = x.toDlpack()
x2 = cp.fromDlpack(capsule)
x2

array([0, 1, 2])

In [3]:
# MXNet
import mxnet

x = mxnet.nd.arange(3)
# MXNet also has to_dlpack_for_write(), with identical docs (?)
# Looks like the same idea as JAX: keep ownership if _for_read(),
#                                  consume if _for_write().
capsule = x.to_dlpack_for_read()
x2 = mxnet.nd.from_dlpack(capsule)
x2


[0. 1. 2.]
<NDArray 3 @cpu(0)>

In [4]:
# PyTorch
import torch
import torch.utils.dlpack

x = torch.arange(3)
capsule = torch.utils.dlpack.to_dlpack(x)
x2 = torch.utils.dlpack.from_dlpack(capsule)
x2

tensor([0, 1, 2])

In [5]:
# TensorFlow
import tensorflow as tf

x = tf.range(3)
capsule = tf.experimental.dlpack.to_dlpack(x)
x2 = tf.experimental.dlpack.from_dlpack(capsule)
x2

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>

In [10]:
# JAX
import jax
import jax.dlpack

x = jax.numpy.arange(3)
# Note: take_ownership=False (default) requires jaxlib 0.1.57, released 11 Nov 2020
#       this is a mode where the user guarantees not to mutate the buffer
#       see https://github.com/google/jax/issues/4636
capsule = jax.dlpack.to_dlpack(x, take_ownership=True)
x2 = jax.dlpack.from_dlpack(capsule)
x2



DeviceArray([0, 1, 2], dtype=int32)

### Some observations
    
- All libraries except NumPy and Dask support DLPack, with a similar API
- Names are not the same:
  - `to_dlpack`, `from_dlpack` *functions* (PyTorch, TensorFlow, JAX)
  - `toDlpack`, `fromDlpack` *methods* (CuPy)
  - a `take_ownership` keyword (JAX)
  - `to_dlpack_for_read`/`to_dlpack_for_write` *methods* + `from_dlpack` *function* (MXNet)
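
One way to cope with the differing export names is a small adapter that tries each spelling in turn. The following is a minimal sketch only: `export_capsule` and the `Fake...` classes are hypothetical names invented here, and the lookup covers only the method-based spellings (`toDlpack`, `to_dlpack_for_read`, and the `__dlpack__` method proposed at the end of this notebook). Libraries that export through a module-level function would need their own branch.

```python
# Hypothetical helper (not part of any library): find whichever DLPack
# export method an array object offers and call it.
_EXPORT_METHOD_NAMES = (
    "__dlpack__",           # protocol method proposed later in this notebook
    "toDlpack",             # CuPy-style method
    "to_dlpack_for_read",   # MXNet-style method
)


def export_capsule(x):
    """Return a DLPack capsule from `x` using the first export method found."""
    for name in _EXPORT_METHOD_NAMES:
        method = getattr(x, name, None)
        if callable(method):
            return method()
    raise TypeError(f"{type(x).__name__} has no known DLPack export method")


# Stand-in classes so the sketch runs without any array library installed.
# The returned strings only mark which method was called.
class FakeCuPyArray:
    def toDlpack(self):
        return "capsule-from-toDlpack"


class FakeMXNetArray:
    def to_dlpack_for_read(self):
        return "capsule-from-to_dlpack_for_read"


print(export_capsule(FakeCuPyArray()))
print(export_capsule(FakeMXNetArray()))
```

A single agreed-upon method name would make this kind of dispatch unnecessary.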
  
  

## Interop between libraries

In [13]:
# TensorFlow - PyTorch interop
# ----------------------------
import tensorflow as tf
import torch.utils.dlpack

x = tf.range(3)
capsule = tf.experimental.dlpack.to_dlpack(x)
x2 = torch.utils.dlpack.from_dlpack(capsule)

x2 += 1
assert x2[2] == 3  # sanity check we got the data

capsule2 = torch.utils.dlpack.to_dlpack(x2)
x3 = tf.experimental.dlpack.from_dlpack(capsule2)

assert x3[2] == 3

In [14]:
# CuPy to PyTorch
# ```````````````
x = cp.arange(3)
capsule = x.toDlpack()
t2 = torch.utils.dlpack.from_dlpack(capsule)

# This will actually share memory:
t2[0] = 3
assert x[0] == 3

# Now see if `x` is still available after `t2` goes out of scope:
x = cp.arange(3)

def somefunc(x):
    capsule = x.toDlpack()
    t2 = torch.utils.dlpack.from_dlpack(capsule)
    t2[0] = 3
    return None

somefunc(x)
x += 1
assert x[0] == 4  # Yep

In [17]:
# JAX to PyTorch
# ``````````````
j = jax.numpy.arange(3)
capsule = jax.dlpack.to_dlpack(j, take_ownership=True)
t = torch.utils.dlpack.from_dlpack(capsule)

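# Now see the impact of JAX's immutability: with take_ownership=True the
# buffer was handed over to the capsule, so `j` can no longer be used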
j[0]

RuntimeError: Invalid argument: Invalid buffer passed to Execute() as argument 0 to replica 0: Invalid argument: Hold requested on deleted or donated buffer

In [18]:
# Consuming a capsule twice is a no-no!
j = jax.numpy.arange(3)
capsule = jax.dlpack.to_dlpack(j, take_ownership=True)
t = torch.utils.dlpack.from_dlpack(capsule)
t2 = torch.utils.dlpack.from_dlpack(capsule)

RuntimeError: from_dlpack received an invalid capsule. Note that DLTensor capsules can be consumed only once, so you might have already constructed a tensor from it once.

## What does an implementation look like?

A CuPy implementation snippet (only some essential parts) from https://github.com/cupy/cupy/blob/master/cupy/core/dlpack.pyx:

In [None]:
%%cython
cdef class DLPackMemory(memory.BaseMemory):
    """Memory object for a dlpack tensor.
    
    This does not allocate any memory.
    """
    cdef DLManagedTensor* dlm_tensor
    cdef object dltensor

    def __init__(self, object dltensor):
        self.dltensor = dltensor
        self.dlm_tensor = <DLManagedTensor *>cpython.PyCapsule_GetPointer(
            dltensor, 'dltensor')
        ...
        # Make sure this capsule will never be used again.
        cpython.PyCapsule_SetName(dltensor, 'used_dltensor')

    def __dealloc__(self):
        self.dlm_tensor.deleter(self.dlm_tensor)


cpdef ndarray fromDlpack(object dltensor):
    """Zero-copy conversion from a DLPack tensor to a :class:`~cupy.ndarray`."""
    mem = DLPackMemory(dltensor)
    ...
    return ndarray(shape_vec, cp_dtype, mem_ptr, strides=strides_vec)


cpdef object toDlpack(ndarray array) except +:
    cdef DLManagedTensor* dlm_tensor = \
        <DLManagedTensor*>stdlib.malloc(sizeof(DLManagedTensor))

    cdef size_t ndim = array._shape.size()
    cdef DLTensor* dl_tensor = &dlm_tensor.dl_tensor
    dl_tensor.data = array.data.ptr
    dl_tensor.ndim = ndim

    ...

    dlm_tensor.manager_ctx = <void *>array
    cpython.Py_INCREF(array)
    dlm_tensor.deleter = deleter

    return cpython.PyCapsule_New(dlm_tensor, 'dltensor', pycapsule_deleter)
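
The capsule renaming in the snippet above can be tried from plain Python. The following is a minimal sketch, assuming CPython, that uses `ctypes.pythonapi` to call the real capsule functions `PyCapsule_New`, `PyCapsule_IsValid`, and `PyCapsule_SetName`. The `consume` helper and the integer payload are hypothetical stand-ins, not CuPy code, and no deleter is involved.

```python
import ctypes

api = ctypes.pythonapi

# Declare the signatures of the CPython capsule functions used below.
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_SetName.restype = ctypes.c_int
api.PyCapsule_SetName.argtypes = [ctypes.py_object, ctypes.c_char_p]

# A capsule stores its name by pointer, so the bytes objects must stay alive.
FRESH = b"dltensor"
USED = b"used_dltensor"

# Stand-in payload; a real exporter would point at a DLManagedTensor.
payload = ctypes.c_int(42)
capsule = api.PyCapsule_New(ctypes.addressof(payload), FRESH, None)


def consume(capsule):
    """Mimic an importer: accept a fresh capsule once, then mark it as used."""
    if not api.PyCapsule_IsValid(capsule, FRESH):
        raise RuntimeError("invalid capsule: it may already have been consumed")
    # Make sure this capsule will never be used again.
    api.PyCapsule_SetName(capsule, USED)
    return "imported"


print(consume(capsule))
try:
    consume(capsule)
except RuntimeError as exc:
    print("second import failed:", exc)
```

The second call fails for the same reason as the double `from_dlpack` call earlier: the name check no longer matches once the capsule has been renamed.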

## So what's going on with ownership and deletion?

_Key bits of explanation by Tianqi Chen (from https://github.com/data-apis/consortium-feedback/issues/1)_:

Clarification wrt "consume exactly once": It does not mean that we are moving the memory from numpy to torch. Instead, the convention means that the PyCapsule can only be consumed exactly once. The exporter (that calls `to_dlpack`) still retains the memory.

... 

The memory will be released only after both `x` (exported tensor) and `t2` (imported tensor) go out of scope.

...

In particular, the `DLManagedTensor` contains a deleter that allows the consumer to signal that the tensor is no longer needed. Because of the way the signature is designed, we need to make sure that there is a sole consumer of the `DLManagedTensor`, so that the deleter is only called once when the consumer no longer needs the memory (otherwise it will cause a double free).

Of course, we can also change the signature to include refcounting (e.g. call `IncRef` when there is a copy) in `DLManagedTensor`; however, that means an additional requirement that not every exporter might support.

...

The way things work is that when the consumer chooses to de-allocate later, it will call into the deleter in the `DLManagedTensor`. A common implementation of a deleter will then decrease the refcount of the array object.

For example, in order to implement `np.to_dlpack`, we will call `PyIncRef` on the numpy object, and put the object pointer into the `manager_ctx` field. Then the deleter will call into `PyDecRef`.
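
The lifetime rule described above can be modelled in pure Python. This is a toy sketch under stated assumptions, not how any library implements it: `Buffer` and `ManagedTensor` are hypothetical classes, a strong Python reference stands in for the incref on the exporting object, and dropping that reference stands in for the decref done by the deleter.

```python
import gc
import weakref


class Buffer:
    """Stand-in for the exporter's array object that owns the memory."""


class ManagedTensor:
    """Toy model of DLManagedTensor: a context reference plus a deleter."""

    def __init__(self, owner):
        # Holding a strong reference plays the role of the incref on export.
        self.manager_ctx = owner
        self.deleted = False

    def deleter(self):
        if self.deleted:
            raise RuntimeError("deleter called twice (would be a double free)")
        self.deleted = True
        # Dropping the reference plays the role of the decref.
        self.manager_ctx = None


x = Buffer()
alive = weakref.ref(x)       # lets us observe when the buffer is released

managed = ManagedTensor(x)   # export: the consumer now holds `managed`
del x                        # the exporter's own name goes out of scope
gc.collect()
print("buffer alive after exporter is gone:", alive() is not None)

managed.deleter()            # the consumer signals it no longer needs the data
gc.collect()
print("buffer released after deleter ran:", alive() is None)
```

Calling `managed.deleter()` a second time raises, which mirrors why there must be a single consumer per `DLManagedTensor`.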

_My (Ralf's) thoughts on this ownership/deletion behaviour_:

- From what I can tell, the issues I saw on issue trackers and the discussions about being careful with ownership relate only to the *capsule*, not to the underlying memory.
- This is zero-copy, shared memory behaviour that works as expected.
- If the consuming library creates views, it itself is responsible for not letting the base array go out of scope (which would call the deleter) - but that's normal and won't result in unexpected behaviour.
- Producing libraries *may* decide to invalidate the buffer (like JAX does), but they don't *have* to do that. This only matters if the consumer mutates the data.

## Proposed Python API

In [None]:
# In the namespace implementing the array API standard

class ndarray():
    def __dlpack__(self):
        # Export a DLPack capsule
        ...
        
        
def from_dlpack(x: array):
    # Get capsule
    capsule = x.__dlpack__()
    
    # Construct own array type (here `ndarray`) from capsule
    # Guarantees that the capsule gets consumed exactly once
    ...
    return x_new
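
To make the proposal concrete, here is a runnable toy version in pure Python. It is a sketch under stated assumptions rather than a real implementation: `ToyCapsule` and `ToyArray` are hypothetical classes, a `bytearray` stands in for device memory, and no actual `DLManagedTensor` is involved. It illustrates the two properties discussed above: the data is shared without a copy, and the capsule is consumed exactly once because it never leaves `from_dlpack`.

```python
class ToyCapsule:
    """Stand-in for a PyCapsule: hands out its payload exactly once."""

    def __init__(self, payload):
        self._payload = payload

    def take(self):
        if self._payload is None:
            raise RuntimeError("capsule already consumed")
        payload, self._payload = self._payload, None
        return payload


class ToyArray:
    """Hypothetical array type backed by a bytearray."""

    def __init__(self, buffer):
        self._buffer = buffer

    def __dlpack__(self):
        # Export a capsule that refers to the same buffer (no copy).
        return ToyCapsule(self._buffer)

    def __getitem__(self, index):
        return self._buffer[index]

    def __setitem__(self, index, value):
        self._buffer[index] = value


def from_dlpack(x):
    # The capsule is created and consumed inside this function,
    # so user code never gets a chance to consume it twice.
    capsule = x.__dlpack__()
    return ToyArray(capsule.take())


x = ToyArray(bytearray([0, 1, 2]))
x2 = from_dlpack(x)
x2[0] = 3
print(x[0])  # the write through x2 is visible through x: prints 3
```

Keeping the capsule internal to `from_dlpack` is what removes the "consumed twice" failure mode seen with the explicit `to_dlpack`/`from_dlpack` pairs earlier.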