# Multi-GPU Mean Calculation

In [10]:
from dask.distributed import Client

import cudf, dask_cudf
from dask_cuml import mean

There are a couple of ways to get data into cuML, both of which will need to be tested:
1. A large cudf object could be created and then passed to dask_cudf
2. The workers are asked to fetch the data directly

Since this will likely run with a single worker per GPU, it is important that cuDF DataFrames can work across GPUs. For example, when a very large cuDF DataFrame is partitioned across the workers, its GPU memory must be re-allocated on each new worker's local device and de-allocated on the DataFrame's original device.

__Example workflow__:
- User allocates a dask_cudf (or, eventually, a dask_cuml_array) and distributes it across the cluster
- User calls MGMean().calculate(dask_cudf) after the dask_cudf has been distributed
- MGMean performs redistribution / preprocessing
- MGMean gathers allocations (hostname/device/key triplets) from Dask workers
- MGMean C++ code is executed with the allocation information as its argument
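
The allocation-gathering step in the workflow above can be pictured with a small host-side sketch. This is not dask_cuml's actual code: the `Allocation` record, the `group_by_device` helper, and the hostnames and keys are all hypothetical names chosen for illustration. It only shows one plausible way to organize hostname/device/key triplets before handing them to native code.

```python
from collections import defaultdict
from typing import NamedTuple


class Allocation(NamedTuple):
    """Hypothetical record describing one partition held by a worker."""
    hostname: str  # machine running the worker
    device: int    # GPU ordinal on that machine
    key: str       # Dask key identifying the partition


def group_by_device(allocations):
    """Map each (hostname, device) pair to the partition keys it holds."""
    grouped = defaultdict(list)
    for alloc in allocations:
        grouped[(alloc.hostname, alloc.device)].append(alloc.key)
    return dict(grouped)


# Made-up values for illustration only
allocations = [
    Allocation("node-a", 0, "part-0"),
    Allocation("node-a", 1, "part-1"),
    Allocation("node-a", 0, "part-2"),
]
by_device = group_by_device(allocations)
print(by_device)
```

Grouping by device first would let the native side see, per GPU, which partitions are already local and which would need to be moved.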


In [11]:
client = Client(n_workers=1, threads_per_worker=1)

Port 8787 is already in use. 
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.


In [3]:
client

Client   Scheduler: tcp://127.0.0.1:41318   Dashboard: http://127.0.0.1:8787/status
Cluster  Workers: 1   Cores: 1   Memory: 1.57 GB


In [4]:
import numpy as np

# Two float32 columns of six rows each
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], dtype=np.float32)
b = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=np.float32)
df = cudf.DataFrame([('a', a), ('b', b)])
# chunksize=2 splits the six rows into three partitions
dask_df = dask_cudf.from_cudf(df, chunksize=2)

### Persist the DataFrame to scatter it out to the workers

In [5]:
dask_df = client.persist(dask_df)

In [6]:
print(dask_df)

<dask_cudf.DataFrame | 3 tasks | 3 npartitions>
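
The three partitions reported above are consistent with six rows split into chunks of two. The following pure-Python sketch shows that arithmetic; `partition_bounds` is a hypothetical helper, and it assumes contiguous fixed-size row ranges, which may not match exactly how dask_cudf chooses its divisions.

```python
import math


def partition_bounds(n_rows, chunksize):
    """Return (start, stop) row ranges for contiguous fixed-size chunks."""
    return [
        (start, min(start + chunksize, n_rows))
        for start in range(0, n_rows, chunksize)
    ]


bounds = partition_bounds(6, 2)
n_partitions = len(bounds)
print(bounds, n_partitions)
```

With 6 rows and a chunk size of 2, `math.ceil(6 / 2)` also gives 3, matching `npartitions` in the output.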


In [7]:
m = mean.MGMean()

In [8]:
result = m.calculate(dask_df)

[c_ulong(139954445358592), c_ulong(139954445358080), c_ulong(139954445357568)]


In [9]:
print(result)

      
0  3.5
1  4.5
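
As a host-side cross-check of these values, a mean over partitioned data can be computed by reducing each partition to per-column sums and a row count, then combining the partial results. The sketch below is plain Python and is not MGMean's implementation; `partial_stats` and `combine_means` are hypothetical helpers, and the three partitions are assumed to hold rows 0-1, 2-3, and 4-5 of the DataFrame built earlier.

```python
def partial_stats(rows):
    """Return per-column sums and the row count for one partition."""
    sums = [sum(column) for column in zip(*rows)]
    return sums, len(rows)


def combine_means(partials):
    """Combine (sums, count) pairs from all partitions into column means."""
    total_rows = sum(count for _, count in partials)
    total_sums = [sum(column) for column in zip(*(sums for sums, _ in partials))]
    return [s / total_rows for s in total_sums]


# Rows are (a, b) pairs, split into three partitions of two rows each
partitions = [
    [(1.0, 2.0), (2.0, 3.0)],
    [(3.0, 4.0), (4.0, 5.0)],
    [(5.0, 6.0), (6.0, 7.0)],
]
means = combine_means([partial_stats(rows) for rows in partitions])
print(means)
```

Carrying the row count alongside the sums keeps the result correct even when partitions differ in size, which a simple average of per-partition means would not.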
