Heat uses PyTorch and mpi4py to enable memory-distributed array operations on multi-node (including multi-GPU) systems. Let's see what this means in practice.



In [6]:
import numpy as np
import torch

array = np.arange(60).reshape(5,4,3)
tensor = torch.arange(60).reshape(5,4,3)

tensor

tensor([[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17],
         [18, 19, 20],
         [21, 22, 23]],

        [[24, 25, 26],
         [27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]],

        [[36, 37, 38],
         [39, 40, 41],
         [42, 43, 44],
         [45, 46, 47]],

        [[48, 49, 50],
         [51, 52, 53],
         [54, 55, 56],
         [57, 58, 59]]])

Heat implements NumPy's API as far as possible. We can create a Heat array (officially `DNDarray`, or distributed n-dimensional array) with the same functions that we use to create NumPy arrays. We'll create a 3D DNDarray of integers ranging from 0 to 59 (5 matrices of size (4,3)).

In [5]:
#%%px
import heat as ht
dndarray = ht.arange(60).reshape(5,4,3)
dndarray

DNDarray([[[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]],

          [[12, 13, 14],
           [15, 16, 17],
           [18, 19, 20],
           [21, 22, 23]],

          [[24, 25, 26],
           [27, 28, 29],
           [30, 31, 32],
           [33, 34, 35]],

          [[36, 37, 38],
           [39, 40, 41],
           [42, 43, 44],
           [45, 46, 47]],

          [[48, 49, 50],
           [51, 52, 53],
           [54, 55, 56],
           [57, 58, 59]]], dtype=ht.int32, device=cpu:0, split=None)

Notice the additional metadata printed with the DNDarray. Compared to a NumPy ndarray, the DNDarray carries extra information: the device (in this case, the CPU) and the `split` axis. In the example above, the split axis is `None`, meaning that the DNDarray is not distributed and each MPI process has a full copy of the data.

Let's experiment with a distributed DNDarray: we'll create the same DNDarray as above, but this time distributed along the first axis.

In [None]:
%%px
dndarray = ht.arange(60, split=0).reshape(5,4,3)
dndarray

The `split` axis is now 0, meaning that the DNDarray is distributed along the first axis. Each MPI process has a slice of the data along the first axis. In order to see the data on each process, we can print the "local array" via the `larray` attribute.

In [7]:
%%px
dndarray.larray

tensor([[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17],
         [18, 19, 20],
         [21, 22, 23]],

        [[24, 25, 26],
         [27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]],

        [[36, 37, 38],
         [39, 40, 41],
         [42, 43, 44],
         [45, 46, 47]],

        [[48, 49, 50],
         [51, 52, 53],
         [54, 55, 56],
         [57, 58, 59]]], dtype=torch.int32)

Note that the `larray` is a `torch.Tensor` object. This is the underlying tensor that holds the data. The `dndarray` object is an MPI-aware wrapper around these process-local tensors, providing memory-distributed functionality and information.
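To make the relationship between process-local pieces and the global array concrete without launching MPI, here is a minimal sketch in plain NumPy. It is an illustration only, not Heat's implementation: NumPy arrays stand in for the `torch.Tensor` objects, `np.array_split` stands in for the distribution step, and the process count `nprocs` is a hypothetical value chosen for the example.

```python
import numpy as np

# The "global" array that a DNDarray would represent.
global_array = np.arange(60).reshape(5, 4, 3)

split = 0    # axis along which the data are distributed
nprocs = 2   # hypothetical number of MPI processes

# Each list entry stands in for the local array held by one process.
local_arrays = np.array_split(global_array, nprocs, axis=split)

for rank, local in enumerate(local_arrays):
    print(f"rank {rank}: local shape {local.shape}, first element {local.flat[0]}")

# Joining the local pieces along the split axis recovers the global array.
reassembled = np.concatenate(local_arrays, axis=split)
assert np.array_equal(reassembled, global_array)
```

In this picture, no single process needs to hold `global_array`; each one stores only its own piece, plus the metadata (global shape and split axis) needed to interpret it.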

The DNDarray can be distributed along any axis. Modify the cell above to distribute the DNDarray along a different axis, and see how the `larray`s change. You'll notice that the distributed arrays are always load-balanced, meaning that the data are distributed as evenly as possible across the MPI processes.
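If you want a rough idea of what to expect before running the exercise, the pure-Python sketch below computes load-balanced local shapes for a given split axis. The helper `local_shapes` is hypothetical, and it assumes that any leftover elements go to the lowest-ranked processes; this is a common convention, but the exact layout Heat produces may differ, so treat the `larray` output as the ground truth.

```python
def local_shapes(global_shape, split, nprocs):
    """Return one local shape per (hypothetical) process.

    Assumption: the split axis is divided as evenly as possible, and the
    first `extra` ranks each receive one additional element.
    """
    base, extra = divmod(global_shape[split], nprocs)
    shapes = []
    for rank in range(nprocs):
        local = list(global_shape)
        local[split] = base + (1 if rank < extra else 0)
        shapes.append(tuple(local))
    return shapes


# Try every split axis of the (5, 4, 3) array with three processes.
for axis in range(3):
    print(f"split={axis}:", local_shapes((5, 4, 3), axis, nprocs=3))
```

Under these assumptions, local sizes along the split axis never differ by more than one element between processes, which is what "load-balanced" means here.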