Dask cuML contains parallel machine learning algorithms that can make use of multiple GPUs on a single host. It plays nicely with other projects in the Dask ecosystem, as well as with other RAPIDS projects such as Dask cuDF.
As an example, the following Python snippet loads input from a CSV file into a Dask cuDF DataFrame and performs a NearestNeighbors query in parallel on multiple GPUs:
    # Create a Dask CUDA cluster w/ one worker per device
    from dask_cuda import LocalCUDACluster
    cluster = LocalCUDACluster()

    # Read CSV file in parallel across workers
    import dask_cudf
    df = dask_cudf.read_csv("/path/to/csv")

    # Fit a NearestNeighbors model and query it
    from dask_cuml.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors=10)
    nn.fit(df)
    nn.kneighbors(df)
Dask CUDA Clusters
Using the LocalCUDACluster()
Clusters of Dask workers can be started in several different ways. One of the simplest methods used in non-CUDA Dask clusters is to use LocalCluster. For a CUDA variant of the LocalCluster that works well with Dask cuML, check out the LocalCUDACluster from the dask-cuda project.
Note: It's important to make sure the LocalCUDACluster is instantiated in your code before any CUDA contexts are created (e.g. before importing Numba or cuDF). Otherwise, it's possible that your workers will all be mapped to the same device.
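The ordering requirement in the note above can be guarded against with a small check. The following is only a sketch: `already_imported` and `start_cuda_cluster` are hypothetical helper names, not part of Dask cuML or dask-cuda, and the module list simply mirrors the two examples the note gives (Numba and cuDF). The dask-cuda and `dask.distributed` imports are deferred into the function so that the check runs first.

```python
import sys

# Modules the note above names as needing to be imported only after
# the cluster exists. This tuple is an assumption taken from that note.
CONTEXT_CREATING_MODULES = ("numba", "cudf")


def already_imported(loaded=None):
    """Return the listed modules present in `loaded` (default: sys.modules)."""
    loaded = sys.modules if loaded is None else loaded
    return [name for name in CONTEXT_CREATING_MODULES if name in loaded]


def start_cuda_cluster():
    """Start a LocalCUDACluster, refusing to run if it may be too late."""
    offenders = already_imported()
    if offenders:
        raise RuntimeError(
            "Imported before LocalCUDACluster was created: " + ", ".join(offenders)
        )
    # Imported lazily so that the check above runs before anything else.
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client

    cluster = LocalCUDACluster()
    client = Client(cluster)  # make this cluster the default for Dask work
    return cluster, client
```

Calling `start_cuda_cluster()` at the very top of a script, before any other RAPIDS imports, keeps the cluster creation first; the check is conservative and only looks at module names, not at whether a CUDA context actually exists.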
Using the dask-worker command
If you will be starting your workers using the dask-worker command, Dask cuML requires that each worker be started with its own unique CUDA_VISIBLE_DEVICES setting.
For example, a user with a workstation containing 2 devices would want their workers to be started with the following CUDA_VISIBLE_DEVICES settings (one per worker):
CUDA_VISIBLE_DEVICES=0,1 dask-worker --nprocs 1 --nthreads 1 scheduler_host:8786
CUDA_VISIBLE_DEVICES=1,0 dask-worker --nprocs 1 --nthreads 1 scheduler_host:8786
This enables each worker to map the device memory of its local cuDF DataFrames to a separate device.
Note: If starting Dask workers using dask-worker, --nprocs 1 must be used.
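For machines with more than two devices, the per-worker settings can be generated rather than typed by hand. The sketch below assumes that the two-device example above generalizes to a simple rotation, where worker i lists device i first and the remaining devices follow in order. That rotation, and the helper names `visible_devices` and `worker_commands`, are illustrative assumptions rather than part of Dask cuML.

```python
def visible_devices(worker_index, n_devices):
    """Rotate the device list so that `worker_index` comes first."""
    order = [(worker_index + offset) % n_devices for offset in range(n_devices)]
    return ",".join(str(device) for device in order)


def worker_commands(n_devices, scheduler="scheduler_host:8786"):
    """Build one dask-worker command line per device."""
    return [
        f"CUDA_VISIBLE_DEVICES={visible_devices(i, n_devices)} "
        f"dask-worker --nprocs 1 --nthreads 1 {scheduler}"
        for i in range(n_devices)
    ]


# Print the commands for a two-device workstation.
for command in worker_commands(2):
    print(command)
```

With `n_devices=2` this prints the same two command lines shown above; each printed line is meant to be run in its own shell so that every worker is a separate process.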
Supported Algorithms
- Nearest Neighbors
- Linear Regression
More ML algorithms are being worked on.
Dask cuML requires cuML to be installed. Refer to cuML on GitHub for more information.
Dask cuML can be installed using the rapidsai conda channel (if you have CUDA 9.2 installed, change the cudatoolkit version in the command to match):
conda install -c nvidia -c rapidsai -c conda-forge -c defaults dask-cuml cudatoolkit=10.0
Dask cuML can also be installed using pip.
pip install dask-cuml
Build/Install from Source
Dask cuML depends on:
Dask cuML can be installed with the following command at the root of the repository:
python setup.py install
Tests can be run using pytest.
Find out more details on the RAPIDS site
The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.