# Column, Series and DummyColumn

This notebook studies what happens when different kinds of objects are moved between GPUs via UCX. We move three kinds of objects:

1. cuDF `Column` objects (specifically `NumericalColumn` objects)
2. cuDF `Series` objects
3. `DummyColumn` objects (see `dummy.py`)

    `DummyColumn` makes it easy to create test objects whose serialized form is a particular kind of device array.
    
    #### Examples:
    
    ```python
    a = DummyColumn(size=10000, kind='cupy')  # serializing a yields a CuPy array of size 10000
    b = DummyColumn(size=100, kind='numba')   # serializing b yields a Numba DeviceArray
    ```

## Starting dask scheduler and workers via CLI

Use the commands below to start a Dask scheduler and two workers. Importantly, `CUDA_VISIBLE_DEVICES` must be set cyclically across the workers, e.g., `2,3` and `3,2`, NOT `2,3` and `2,3`, so that each worker's first visible device (the one it uses by default) is a different physical GPU.

```
$ SCHEDULER=1 UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc CUDA_VISIBLE_DEVICES=2,3 dask-scheduler --interface ib0 --protocol ucx

$ UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc CUDA_VISIBLE_DEVICES=3,2 dask-worker ucx://10.33.225.165:8786 --nthreads=1 --memory-limit 32gb --no-nanny --protocol=ucx --name=worker_0

$ UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc CUDA_VISIBLE_DEVICES=2,3 dask-worker ucx://10.33.225.165:8786 --nthreads=1 --memory-limit 32gb --no-nanny --protocol=ucx --name=worker_1
```
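The rotation rule generalizes to any number of workers. The helper below is a hypothetical sketch (`cyclic_visible_devices` is not part of Dask or dask-cuda). Worker `i` gets the device list rotated left by `i` positions, so every worker sees a different physical GPU as its device 0.

```python
def cyclic_visible_devices(devices, n_workers):
    """Hypothetical helper: one CUDA_VISIBLE_DEVICES value per worker.

    Worker i gets `devices` rotated left by i positions, so the first
    (default) device differs between workers as long as
    n_workers <= len(devices).
    """
    values = []
    for i in range(n_workers):
        k = i % len(devices)
        rotated = devices[k:] + devices[:k]
        values.append(",".join(str(d) for d in rotated))
    return values


for i, value in enumerate(cyclic_visible_devices([2, 3], 2)):
    print(f"worker {i}: CUDA_VISIBLE_DEVICES={value}")
```

Which worker receives which rotation does not matter. Only the first entries need to differ.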

## Checking NVLink utilization

Use the commands below to monitor NVLink utilization while device objects are being communicated.

Here `-i 2` selects the ID of the GPU to inspect and `-g 1` selects a counter ID; we have found only `1` to be helpful.

```
$ nvidia-smi nvlink -r 1  # reset counters
$ watch -n1 nvidia-smi nvlink -g 1 -i 2
```
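To snapshot the counters from inside the notebook (for example, just before and just after a transfer cell), the same command can be run through `subprocess`. This is a sketch: the helper names are hypothetical, and the output is returned as raw text because its exact layout may vary between driver versions.

```python
import shutil
import subprocess


def nvlink_counter_cmd(gpu_index, counter_id=1):
    """Build the `nvidia-smi nvlink -g <counter> -i <gpu>` command used above."""
    return ["nvidia-smi", "nvlink", "-g", str(counter_id), "-i", str(gpu_index)]


def read_nvlink_counters(gpu_index, counter_id=1):
    """Return the raw counter text, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        nvlink_counter_cmd(gpu_index, counter_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(" ".join(nvlink_counter_cmd(2)))
```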

## Some initial setup and checks

In [None]:
import sys, os

In [None]:
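# Mirror the UCX settings used to start the scheduler and workers, so the
# client's own UCX endpoint is configured the same way. These are set before
# dask/distributed are imported below.
base_env = {
    "NOTEBOOK": "1",
    "UCX_RNDV_SCHEME": "put_zcopy",
    "UCX_MEMTYPE_CACHE": "n",
    "UCX_TLS": "rc,cuda_copy,cuda_ipc",
    "CUDA_VISIBLE_DEVICES": "2,3",
}
os.environ.update(base_env)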
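The scheduler and workers were started with the same UCX variables on the command line. A small hypothetical helper can render `base_env` as the `KEY=value` shell prefix used in those commands, which makes it easier to keep the CLI and the notebook in sync. `NOTEBOOK` is dropped because the CLI commands above do not set it, and `CUDA_VISIBLE_DEVICES` can be overridden per worker.

```python
import shlex


def env_prefix(env, drop=("NOTEBOOK",), **overrides):
    """Hypothetical helper: render an env dict as a `KEY=value ...` prefix.

    Keys in `drop` are omitted; keyword overrides replace individual values
    (e.g. a per-worker CUDA_VISIBLE_DEVICES) while keeping key order.
    """
    merged = {k: v for k, v in env.items() if k not in drop}
    merged.update(overrides)
    return " ".join(f"{k}={shlex.quote(v)}" for k, v in merged.items())


base_env = {
    "NOTEBOOK": "1",
    "UCX_RNDV_SCHEME": "put_zcopy",
    "UCX_MEMTYPE_CACHE": "n",
    "UCX_TLS": "rc,cuda_copy,cuda_ipc",
    "CUDA_VISIBLE_DEVICES": "2,3",
}
print(env_prefix(base_env, CUDA_VISIBLE_DEVICES="3,2"))
```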

In [None]:
import dask
import distributed

In [None]:
print(f'Dask: {dask.__file__}')
print(f'Distributed: {distributed.__file__}')

## Connecting to the Dask cluster started on the CLI

In [None]:
from dask.distributed import Client, wait
# from dask_cuda import DGX

#cluster = DGX(CUDA_VISIBLE_DEVICES=[2,3], 
#              dashboard_address='10.33.227.165:8789')
#client = Client(cluster)
client = Client('ucx://10.33.225.165:8786')
client

## Setting the CUDA context on the workers (important!)

This needs to be the first thing that runs on every worker, before any device objects are created or communicated.

In [None]:
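# Alternatively, preload this on every worker through the global config,
# e.g. with the following YAML:
#     distributed:
#       worker:
#         preload:
#           - dask_cuda.initialize_context
def set_nb_context():
    """Create (or fetch) this worker's CUDA context via Numba."""
    import numba.cuda
    try:
        numba.cuda.current_context()
    except Exception as e:
        print(f"Failed to create a CUDA context: {e!r}")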

In [None]:
client.run(set_nb_context)

## Checking `CUDA_VISIBLE_DEVICES` on the workers

In [None]:
def get_env():
    import os
    return os.environ["CUDA_VISIBLE_DEVICES"]

In [None]:
client.run(get_env)
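`client.run` returns a dict mapping each worker address to the function's return value, so the cyclic `CUDA_VISIBLE_DEVICES` requirement can be checked programmatically. A minimal sketch; the helper name is hypothetical and the addresses in the example dict are made-up placeholders.

```python
def first_devices_are_distinct(envs):
    """envs: {worker_address: CUDA_VISIBLE_DEVICES string}, as returned by
    client.run(get_env). True if no two workers share a first device."""
    first = [value.split(",")[0].strip() for value in envs.values()]
    return len(first) == len(set(first))


# Illustrative input only; in the notebook use: envs = client.run(get_env)
envs = {
    "ucx://worker-a:40001": "3,2",
    "ucx://worker-b:40002": "2,3",
}
print(first_devices_are_distinct(envs))
```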

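## Adding `pwd` to `sys.path`

The workers must be able to import `dummy.py`, which lives in the notebook's working directory, so that directory is appended to each worker's `sys.path`.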

In [None]:
path = os.getcwd()

In [None]:
def set_path(path):
    import sys
    # Avoid duplicate entries if this cell is run more than once.
    if path not in sys.path:
        sys.path.append(path)
    return sys.path

In [None]:
result = client.run(set_path, path)

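## Getting worker addresses

In [None]:
# scheduler_info()['workers'] is keyed by worker address; unpacking it this
# way assumes exactly two workers are connected.
worker_1, worker_2 = client.scheduler_info()['workers']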

In [None]:
import cudf
import numpy as np
import cupy
from dummy import DummyColumn

## Moving DummyColumn objects that serialize to CuPy arrays - works

This works, and the transfer registers on the NVLink counter.

In [None]:
from dask.distributed import wait

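# Create 100 DummyColumn objects on each worker.
left = client.map(lambda x: DummyColumn(10000, "cupy"), range(100), workers=[worker_1])
right = client.map(lambda x: DummyColumn(10000, "cupy"), range(100), workers=[worker_2])
# Each pairing task depends on one object from each worker, so one of the two
# objects has to be sent to the other worker before the task can run.
results = client.map(lambda x, y: (x, y), left, right, priority=10)
_ = wait(results)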

## Moving DummyColumn objects that serialize to Numba device arrays - works

This works, and the transfer registers on the NVLink counter.

In [None]:
from dask.distributed import wait

left = client.map(lambda x: DummyColumn(10000, "numba"), range(100), workers=[worker_1])
right = client.map(lambda x: DummyColumn(10000, "numba"), range(100), workers=[worker_2])
results = client.map(lambda x,y: (x,y), left, right, priority=10)
_ = wait(results)

## Moving DummyColumn objects that serialize to RMM-backed device arrays - works

This works, and the transfer registers on the NVLink counter.

In [None]:
from dask.distributed import wait

left = client.map(lambda x: DummyColumn(10000, "rmm"), range(100), workers=[worker_1])
right = client.map(lambda x: DummyColumn(10000, "rmm"), range(100), workers=[worker_2])
results = client.map(lambda x,y: (x,y), left, right, priority=10)
_ = wait(results)

## Moving cuDF Column objects - works

This works, and the transfer registers on the NVLink counter.

In [None]:
from dask.distributed import wait

left = client.map(lambda x: cudf.Series(np.arange(10000))._column, range(100), workers=[worker_1])
right = client.map(lambda x: cudf.Series(np.arange(10000))._column, range(100), workers=[worker_2])
results = client.map(lambda x,y: (x,y), left, right, priority=10)
_ = wait(results)
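To judge whether the NVLink counters moved by a plausible amount, it helps to estimate the payload. A back-of-the-envelope sketch, assuming `np.arange(10000)` yields `int64` values (NumPy's default integer on 64-bit Linux) and that exactly one column per pair crosses between the two workers:

```python
# Payload estimate for the cuDF Column cell above.
# Assumptions: int64 elements (8 bytes each), one column transferred per pair.
N_ELEMENTS = 10_000
ITEMSIZE_BYTES = 8  # assumed int64
N_PAIRS = 100

bytes_per_column = N_ELEMENTS * ITEMSIZE_BYTES
total_bytes = N_PAIRS * bytes_per_column

print(f"{bytes_per_column:,} bytes per column")  # 80,000 bytes per column
print(f"{total_bytes / 1e6:.1f} MB in total")    # 8.0 MB in total
```

Treat this only as an order-of-magnitude check: the counters also see headers and any other traffic on the link.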

## Moving cuDF Series objects - does NOT work

This hangs. Before it does, the transfer *may* register on the NVLink counter.

In [None]:
from dask.distributed import wait

left = client.map(lambda x: cudf.Series(np.arange(10000)), range(100), workers=[worker_1])
right = client.map(lambda x: cudf.Series(np.arange(10000)), range(100), workers=[worker_2])
results = client.map(lambda x,y: (x,y), left, right, priority=10)
_ = wait(results)  # hangs for cudf.Series objects
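Because this cell hangs rather than failing, it can be useful to bound the wait. `distributed.wait` accepts a `timeout` argument and should raise once it expires; in the notebook that would look like `wait(results, timeout=60)` (60 seconds being an arbitrary choice). The runnable sketch below demonstrates the same idea with the standard library's `concurrent.futures.wait`, which instead returns the finished and unfinished futures. The blocked task is a stand-in for a stuck transfer, not a reproduction of the UCX hang.

```python
import concurrent.futures
import threading

release = threading.Event()


def quick_task(i):
    return i


def stuck_task():
    # Stand-in for a transfer that never completes (until released below).
    release.wait()
    return "released"


executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
futures = [executor.submit(quick_task, i) for i in range(3)]
futures.append(executor.submit(stuck_task))

# Bounded wait: returns after at most 0.5 s instead of blocking forever.
done, not_done = concurrent.futures.wait(futures, timeout=0.5)
print(f"{len(done)} finished, {len(not_done)} still pending after the timeout")

release.set()  # unblock the stand-in task so the executor can shut down cleanly
executor.shutdown(wait=True)
```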