# Scaling Machine Learning in Python

<img src="img/saturn.png" width="400" />

## Welcome!

This hands-on workshop shows how a standard data science and machine learning workflow built on pandas and scikit-learn can be parallelized using Dask clusters on Saturn Cloud.

After this workshop you will know:
- When you need parallel computing for your workflow
- How to use Dask DataFrames for loading and cleaning data
- How to perform distributed model training with Dask
- How to scale a hyperparameter search across a cluster
- How to run a batch inference task across a cluster

## Start Dask cluster

If you are not reading this from inside JupyterLab in Saturn Cloud, check out the [README.md](README.md) to set up your account and servers.

Run the cell below to confirm that your Dask cluster is up and running. If the cluster has not started yet, it may take a few minutes to spin up. If the output looks something like:
```
[2020-11-05 19:23:55] INFO - dask-saturn | Cluster is ready
Hello, world!
```
you are ready to go! Later notebooks explain each step in more detail.

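```python
from dask_saturn import SaturnCluster
from dask.distributed import Client

# Connect to the Saturn Cloud Dask cluster (starting it if necessary)
cluster = SaturnCluster()
# Point a Dask client at the cluster's scheduler
client = Client(cluster)
# Block until at least 3 workers have joined the cluster
client.wait_for_workers(3)

print('Hello, world!')
```

```
[2020-11-05 19:24:14] INFO - dask-saturn | Cluster is ready
Hello, world!
```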
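Once a client is connected, a quick sanity check is to send a trivial task to the workers and confirm that results come back. The sketch below is not part of the workshop materials: it assumes `dask.distributed` is installed and swaps in a `LocalCluster` for `SaturnCluster`, so it can run on a laptop without a Saturn Cloud account. The `square` helper, the worker count, and the input range are illustrative choices only.

```python
from dask.distributed import Client, LocalCluster


def square(x):
    """Toy task used only to confirm that workers execute code."""
    return x * x


# Assumption: a small in-process LocalCluster stands in for SaturnCluster.
# processes=False keeps the workers as threads in this process, and
# dashboard_address=None skips starting the diagnostics dashboard.
with LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
) as cluster, Client(cluster) as client:
    # Same readiness check as in the Saturn Cloud cell, with 2 workers
    client.wait_for_workers(2)

    # Schedule one task per input value, then collect the results
    futures = client.map(square, range(5))
    squares = client.gather(futures)

total = sum(squares)
print(squares, total)
```

On Saturn Cloud, the same `client.map` and `client.gather` calls should work with the `client` created from `SaturnCluster()`; only the cluster object changes.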
