<img style="float: right" src="img/saturn.png" width="300" />

# Scaling Machine Learning in Python

## Discussion

In this workshop, we covered:

- How to use Dask Dataframes for loading and cleaning data
- How to perform distributed model training with Dask
- How to scale a hyperparameter search across a cluster
- How to conduct a batch inference task over a cluster

## Dask concepts

We introduced several key Dask concepts while walking through the machine learning workflows. The examples were specific to the NYC taxi data, but the same concepts apply to parallelizing almost any workload with Dask.

- Initialize Dask: `SaturnCluster` and `Client`
- Dask's lazy evaluation: `.compute()`, `.persist()`, and `wait()`
- `dask.delayed` functions: when processing doesn't fit into `dask.dataframe` or `dask.array` classes
- Dask DataFrames: parallel pandas DataFrames
    - `map_partitions`: execute arbitrary functions
- Dask Joblib backend: parallelize scikit-learn algorithms
- Dask futures: execute functions remotely

## Could we use a large Jupyter Server instead of a Dask Cluster?

Yes. Dask works on a machine of any size: if you create a `Client` without passing it a cluster, Dask starts a `LocalCluster` on that machine.

```python
from dask.distributed import Client

# No cluster argument: Dask starts a LocalCluster on this machine
client = Client()
client
```

The Dashboard link displayed here will not load because the Jupyter Server is executing within Saturn Cloud. You can access it via the Jupyter proxy by copying the URL from this JupyterLab browser window and replacing `/lab/*` with `/proxy/8787/status`.

In Saturn Cloud, you can get a Jupyter Server with up to 64 cores and 512 GB of memory. However, we recommend using Dask Clusters because they can scale out to many more machines and handle more compute-intensive or data-intensive workloads.
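
If you do stay on a single server, you can also create the `LocalCluster` explicitly to control how its cores and memory are divided among workers. The following is a sketch under stated assumptions: the worker count, thread count, and memory limit are illustrative values rather than recommendations, and `processes=False` is set only so the example runs as a plain script.

```python
from dask.distributed import Client, LocalCluster

# Illustrative sizing only; match these values to the server you are using.
cluster = LocalCluster(
    n_workers=2,             # number of workers
    threads_per_worker=2,    # threads inside each worker
    memory_limit="1GB",      # memory limit per worker
    processes=False,         # assumption: thread-based workers for this sketch
    dashboard_address=None,  # disable the dashboard for this sketch
)
client = Client(cluster)
client.wait_for_workers(2)

# Mapping of worker address to its number of threads
worker_threads = client.nthreads()

client.close()
cluster.close()
```

On a large server, dropping `processes=False` gives each worker its own process, which usually suits workloads that hold the Python GIL.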

## What's next?

### RAPIDS

[RAPIDS](http://rapids.ai/) is an exciting project that accelerates data science workloads on the GPU, and parallelizes to multiple GPUs with Dask.

### Deep learning

Dask can scale out popular deep learning tools like TensorFlow and PyTorch.