# Introduction to Dask: Hello World!
#### By Paul Hendricks
-------

While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. For the same reasons that scientific computing and deep learning have turned to NVIDIA GPU acceleration, data analytics and machine learning are workloads where GPU acceleration is ideal.

NVIDIA created RAPIDS, an open-source data analytics and machine learning acceleration platform that leverages GPUs to accelerate computations. RAPIDS is based on Python, has pandas-like and scikit-learn-like interfaces, is built on the Apache Arrow in-memory data format, and can scale from one GPU to multiple GPUs to multiple nodes. RAPIDS integrates easily into the world’s most popular Python-based data science workflows. RAPIDS accelerates data science end to end, from data prep to machine learning to deep learning. And through Arrow, Spark users can easily move data into the RAPIDS platform for acceleration.

In this notebook, we will show how to quickly set up Dask and run a "Hello World" example.

**Table of Contents**

* Setup
* Load Libraries
* Setup Dask
* Hello World!
* Sleeping in Parallel
* Conclusion

## Setup

This notebook was tested using the `rapidsai/rapidsai-dev-nightly:0.10-cuda10.0-devel-ubuntu18.04-py3.7` container from [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai-nightly) and run on the NVIDIA GV100 GPU. Please be aware that your system may be different and you may need to modify the code or install packages to run the examples below.

If you think you have found a bug or an error, please file an issue here: https://github.com/rapidsai/notebooks-contrib/issues

Before we begin, let's check out our hardware setup by running the `nvidia-smi` command.

In [None]:
!nvidia-smi

Next, let's see what CUDA version we have:

In [None]:
!nvcc --version

## Load Libraries

Next, let's load some libraries.

In [None]:
import dask; print('Dask Version:', dask.__version__)
from dask.delayed import delayed
from dask.distributed import Client
import os
import subprocess
import time

## Setup Dask

Dask is a library that allows for parallelized computing. Written in Python, it allows one to schedule tasks dynamically as well as handle large data structures, similar to those found in NumPy and pandas. In the subsequent tutorials, we'll show how to use Dask with pandas and cuDF and how we can use both to accelerate common ETL tasks as well as build ML models like XGBoost.

To learn more about Dask, check out the documentation here: http://docs.dask.org/en/latest/

Dask operates using a concept of a "Client" and "workers". The client tells the workers what tasks to perform and when to perform them. Typically, we set the number of workers to be equal to the number of computing resources we have available to us. For example, we might set `n_workers = 8` if we have 8 CPU cores on our machine that can each operate in parallel. This allows us to take advantage of all of our computing resources and get the most benefit from parallelization.
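As a CPU-only illustration of the client/worker idea, here is a minimal sketch that starts a local cluster with one single-threaded worker per core. It assumes `dask.distributed` is installed; the cap of 4 workers, the thread-based workers (`processes=False`), and the disabled dashboard are choices made for this sketch only, not settings used elsewhere in this notebook.

```python
import os

from dask.distributed import Client, LocalCluster

# Assumption for this sketch: one worker per CPU core, capped at 4 to stay light.
n_workers = min(4, os.cpu_count() or 1)

# processes=False runs the workers as threads inside the current process;
# dashboard_address=None skips starting the diagnostics dashboard.
cluster = LocalCluster(
    n_workers=n_workers,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
)
client = Client(cluster)

# cluster.workers maps worker names to the worker objects that were started.
started_workers = len(cluster.workers)
print("Workers started:", started_workers)

# Shut everything down so the sketch leaves nothing running.
client.close()
cluster.close()
```

On a GPU machine the same pattern applies, with the cluster class swapped for the GPU-aware one used in this notebook.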

Dask is a first-class citizen in the world of general-purpose GPU computing, and the RAPIDS ecosystem makes it very easy to use Dask with cuDF and XGBoost. As we see below, we can initiate a Cluster and Client using only a few lines of code.

In [None]:
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster()
client = Client(cluster)

Now, let's show our current Dask status. We should see the IP address of our Scheduler as well as the number of workers in our Cluster.

In [None]:
# show current Dask status
client

You can also see the status and more information at the Dashboard, found at `http://<scheduler_uri>/status`. You can ignore this for now; we'll dive into it in subsequent tutorials.

## Hello World

Our Dask Client and Dask Workers have been set up. It's time to execute our first program in parallel. We'll define a function that takes some value `x` and adds 5 to it.

In [None]:
def add_x_to_5(x):
    return x + 5

Next, we'll iterate over `range(n_workers)` and create an execution graph, where each worker is responsible for taking its ID and passing it to the function `add_x_to_5`. For example, Dask Worker 2 will result in the value 7.

An important thing to note is that the Dask Workers aren't actually executing anything yet - we're just defining the execution graph for our Dask Client to execute later. The `delayed` function wrapper ensures that this computation is in fact "delayed" and not executed on the spot.

In [None]:
n_workers = 4
results_delayed = [delayed(add_x_to_5)(i) for i in range(n_workers)]

In [None]:
results_delayed
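One way to convince yourself that building the graph runs nothing is to record every real call to the function. The sketch below is a standalone illustration: the `calls` list is a hypothetical helper added here, and it uses Dask's synchronous local scheduler so that it does not depend on the cluster created above.

```python
from dask import delayed

calls = []  # hypothetical helper: records every real invocation


def add_x_to_5(x):
    calls.append(x)
    return x + 5


# Wrapping and calling the function only builds a lazy task.
lazy = delayed(add_x_to_5)(2)
calls_before = list(calls)

# The function body runs only once we ask for the value.
value = lazy.compute(scheduler="synchronous")
calls_after = list(calls)

print("calls before compute:", calls_before)
print("value:", value)
print("calls after compute:", calls_after)
```

The list stays empty until `compute` is called, which is the "delayed" behavior described above.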

We'll use the Dask Client to compute the results. 

In [None]:
results = client.compute(results_delayed, optimize_graph=False, fifo_timeout="0ms")
time.sleep(1)  # give Dask a moment to finish executing the tasks

In [None]:
results
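A fixed one-second pause works for tasks this small, but it is a guess about how long the work takes. As a hedged alternative, `dask.distributed.wait` blocks until the futures are done. The sketch below is standalone and assumes a small thread-based local cluster (created with `Client(processes=False, ...)`) in place of the GPU cluster used in this notebook.

```python
from dask import delayed
from dask.distributed import Client, wait


def add_x_to_5(x):
    return x + 5


# Assumption for this sketch: a thread-based local cluster without a dashboard.
client = Client(
    processes=False,
    n_workers=1,
    threads_per_worker=4,
    dashboard_address=None,
)

futures = client.compute([delayed(add_x_to_5)(i) for i in range(4)])

# Block until every future has completed instead of sleeping for a fixed time.
wait(futures)
statuses = [future.status for future in futures]
print(statuses)

client.close()
```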

Note that `results` is a list of `Future` objects, not the "actual results" of adding 5 to each of `[0, 1, 2, 3]` - we need to collect and print the results. We can do so by calling the `result()` method for each of our results.

In [None]:
print([result.result() for result in results])
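Calling `result()` on each future is one option; `Client.gather` collects a whole list of futures in a single call. The following is a minimal standalone sketch, again assuming a thread-based local cluster as a stand-in for the GPU cluster above.

```python
from dask import delayed
from dask.distributed import Client


def add_x_to_5(x):
    return x + 5


# Assumption for this sketch: a thread-based local cluster without a dashboard.
client = Client(
    processes=False,
    n_workers=1,
    threads_per_worker=4,
    dashboard_address=None,
)

futures = client.compute([delayed(add_x_to_5)(i) for i in range(4)])

# gather blocks until the futures finish and returns their values in order.
values = client.gather(futures)
print(values)

client.close()
```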

## Sleeping in Parallel

To see that Dask is truly executing in parallel, we'll define a function that sleeps for 1 second and returns the string "Success!". Called `n_workers = 4` times in serial, this function will take about 4 seconds to execute.

In [None]:
def sleep_1():
    time.sleep(1)
    return 'Success!'

In [None]:
%%time

for _ in range(n_workers):
    sleep_1()

Using Dask, we see that this whole process takes a little over a second, provided the cluster has at least `n_workers` workers available - each worker is executing in parallel!

In [None]:
%%time

# define delayed execution graph
results_delayed = [delayed(sleep_1)() for _ in range(n_workers)]

# use client to perform computations using execution graph
results = client.compute(results_delayed, optimize_graph=False, fifo_timeout="0ms")

# collect and print results
print([result.result() for result in results])
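If no GPU cluster is available, the same serial-versus-parallel comparison can be sketched with Dask's local threaded scheduler. This swaps the distributed client for `dask.compute(..., scheduler="threads")`, and the shorter 0.2-second sleep and the name `sleep_briefly` are choices made for this sketch only.

```python
import time

import dask
from dask import delayed


def sleep_briefly():
    time.sleep(0.2)
    return "Success!"


n_tasks = 4

# Serial baseline: the calls run one after another.
start = time.perf_counter()
serial_results = [sleep_briefly() for _ in range(n_tasks)]
serial_seconds = time.perf_counter() - start

# Parallel run: one thread per task, so the sleeps overlap.
tasks = [delayed(sleep_briefly)() for _ in range(n_tasks)]
start = time.perf_counter()
parallel_results = dask.compute(*tasks, scheduler="threads", num_workers=n_tasks)
parallel_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f} s")
print(f"parallel: {parallel_seconds:.2f} s")
```

Threads are enough here because `time.sleep` releases the interpreter lock; CPU-bound Python functions would generally need process-based or distributed workers to see a similar speedup.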

## Conclusion

To learn more about RAPIDS, be sure to check out: 

* [Open Source Website](http://rapids.ai)
* [GitHub](https://github.com/rapidsai/)
* [Press Release](https://nvidianews.nvidia.com/news/nvidia-introduces-rapids-open-source-gpu-acceleration-platform-for-large-scale-data-analytics-and-machine-learning)
* [NVIDIA Blog](https://blogs.nvidia.com/blog/2018/10/10/rapids-data-science-open-source-community/)
* [Developer Blog](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/)
* [NVIDIA Data Science Webpage](https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/)
