<img src="img/saturn_logo.png" width="300" />

# Introduction to PyTorch with Dask

## Welcome!

This workshop helps people who use PyTorch for deep learning get familiar with Dask concepts that can make their work faster and easier. Our examples focus on computer vision tasks, but PyTorch and Dask can be used for many other kinds of deep learning work.

After this workshop, you will know:
* Basics of how Dask works
* How to run inference with a pretrained model on a Dask cluster
* How to run transfer learning on a Dask cluster


#### _Wait!_

<img src="https://media.giphy.com/media/tHy9Qsv4svHXt7Eiq5/giphy.gif" alt="wait" style="width: 200px;"/>

#### If you are _**not**_ reading this from inside Jupyter Lab in Saturn Cloud, check out the [README.md](README.md) to set up your account and servers.


***

## Saturn Cloud concepts

### Projects

A "Project" is where all the work done in Saturn Cloud resides. Each user can have multiple projects, and these projects can be shared between users. The services associated with each project are called "Resources" and they are organized in the following manner:

```
└── Project
    ├── Jupyter Server (*)
    │   └── Dask Cluster
    └── Deployment
        └── Dask Cluster
```

(*) Every Project has a Jupyter Server, while Dask Clusters and Deployments are optional.

### Images

An "Image" is a Docker image that contains a Python environment to be attached to various Resources. A Project is set to use one Image, and all Resources in that Project will utilize the same Image.

Saturn Cloud includes pre-built images for users to get up and running quickly. Users can create custom images by navigating to the "Images" tab from the Saturn Cloud UI.

### Jupyter Server

This resource runs the Jupyter Notebook and Jupyter Lab interfaces. You will likely spend most of your time "inside" one of these Jupyter interfaces.

### Dask Cluster

A Dask Cluster can be attached to a Jupyter Server to scale out work. Clusters are composed of a scheduler instance and any number of worker instances. Clusters can be created and started/stopped from the Saturn Cloud UI. The [dask-saturn](https://github.com/saturncloud/dask-saturn) package is the interface for working with Dask Clusters in a notebook or script within a Jupyter Server, and can also be used to start, stop, or resize the cluster.
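
To make the scheduler/worker split concrete, here is a minimal local stand-in that uses only the Python standard library. It is not dask-saturn code and does not start a cluster: a `ThreadPoolExecutor` plays the role of the workers, and its internal task queue plays the role of the scheduler. The `square` function is a hypothetical placeholder for a real unit of work. The `submit`/`result` pattern is the same style of interface that `dask.distributed.Client` offers.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Hypothetical unit of work, e.g. preprocessing a single image."""
    return x * x


# Three local threads stand in for three cluster workers; the executor's
# internal queue stands in for the scheduler that hands out tasks.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(square, i) for i in range(6)]
    # Collect results in submission order, blocking until each one is done.
    results = [future.result() for future in futures]

print(results)
```

On a real cluster the tasks run on separate machines rather than threads, which is why code and data must be reachable from every worker.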

### Deployment

A "Deployment" is a resource that is created to serve an always-on or scheduled workload, such as serving a machine learning model, hosting a dashboard via a web app, or running an ETL job. It utilizes the same project Image and code from the Jupyter Server, and can optionally have its own Dask cluster assigned to it.

Deployments will not be covered in this workshop.

### Code and data files

The filesystem of a Jupyter Server is maintained on persistent volumes, so any code or files created/uploaded will remain there after shutting down the server. 

However, these files are not automatically sent to the associated Dask cluster workers or Deployments, because those are different machines with their own filesystems.

**Code**: Code maintained in the `/home/jovyan/project` folder or through the Repositories feature will be sent to the resources when they are turned on. 

**Data files**: Data files should be managed outside of Saturn Cloud in systems such as S3 or a database. This ensures each worker in a Dask cluster has access to the data.

### Advanced settings

Advanced settings for Projects include Environment Variables and Start Scripts. These will not be covered in the workshop, but more information can be found in the [Saturn Cloud docs](https://www.saturncloud.io/docs/getting-started/spinning/jupyter/#advanced-settings).

***

## Hello world

Run the following cell to ensure your Dask cluster is up and running (if it is not yet started, it may take a few minutes to spin up). If you see something like:
```
[2020-12-03 21:39:00] INFO - dask-saturn | Cluster is ready
[2020-12-03 21:39:00] INFO - dask-saturn | Registering default plugins
[2020-12-03 21:39:01] INFO - dask-saturn | {'tcp://10.0.0.150:35343': {'status': 'OK'}, 'tcp://10.0.24.234:35189': {'status': 'OK'}, 'tcp://10.0.3.113:42823': {'status': 'OK'}}
Hello, world!
```
as the output, you are ready to go!

In [None]:
from dask_saturn import SaturnCluster
from dask.distributed import Client

cluster = SaturnCluster(
    n_workers=3,
    scheduler_size='medium',
    worker_size='p32xlarge',
    nthreads=8,
)
client = Client(cluster)
client.wait_for_workers(3)

print('Hello, world!')

Since we are working on GPU machines for this tutorial, we should check and make sure all our workers and this Jupyter instance have GPU resources.

In [None]:
import torch

torch.cuda.is_available() 

In [None]:
client.run(lambda: torch.cuda.is_available())
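
`client.run` returns a dictionary that maps each worker's address to that worker's return value, so the output can be checked in code rather than by eye. The sketch below assumes that shape. The helper name `workers_missing_gpu` is hypothetical, and the example dictionary is made up for illustration (its addresses are borrowed from the sample log output earlier in this notebook).

```python
from typing import Dict, List


def workers_missing_gpu(run_result: Dict[str, bool]) -> List[str]:
    """Return the addresses of workers that reported no CUDA device."""
    return sorted(addr for addr, has_gpu in run_result.items() if not has_gpu)


# Made-up example in the shape returned by
# client.run(lambda: torch.cuda.is_available())
example = {
    "tcp://10.0.0.150:35343": True,
    "tcp://10.0.24.234:35189": True,
    "tcp://10.0.3.113:42823": False,
}

missing = workers_missing_gpu(example)
print(missing)
```

With a live cluster you would pass the real result of `client.run(...)` instead of `example`; an empty list means every worker can see a GPU.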

## Access data

This workshop uses the [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/), which we have made available read-only on S3. No AWS account is required for access. Run the following cell to list the last ten image files in the bucket.

In [None]:
import s3fs

s3 = s3fs.S3FileSystem(anon=True)
s3.glob('s3://saturn-public-data/dogs/Images/*/*.jpg')[-10:]
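
The glob pattern above implies one folder per class under `Images/`. As a rough sketch, the class label can be recovered from each path. This assumes the folders follow the original Stanford Dogs naming of `<WordNet ID>-<breed>` (for example `n02085620-Chihuahua`). The helper `breed_from_key` and the sample keys are hypothetical, included only for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def breed_from_key(key: str) -> str:
    """Extract the breed name from an image key's parent folder."""
    folder = PurePosixPath(key).parent.name  # e.g. 'n02085620-Chihuahua'
    # Split on the first hyphen only, since some breed names contain hyphens.
    _, _, breed = folder.partition("-")
    # Fall back to the whole folder name if there is no ID prefix.
    return breed or folder


# Illustrative keys in the assumed layout (not actual glob output).
sample_keys = [
    "saturn-public-data/dogs/Images/n02085620-Chihuahua/img_1.jpg",
    "saturn-public-data/dogs/Images/n02085620-Chihuahua/img_2.jpg",
    "saturn-public-data/dogs/Images/n02099429-curly-coated_retriever/img_3.jpg",
]

counts = Counter(breed_from_key(key) for key in sample_keys)
print(counts)
```

In the notebook, the list returned by `s3.glob(...)` could be passed in place of `sample_keys` to count images per breed.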

## Exercises

Some code cells throughout the workshop notebooks will require input based on the concepts being introduced in the notebook. Some cells will have most of the code written out, but indicate places to be filled in. Others will be completely blank and will require you to write a few lines.

Try your best to fill in appropriate code and get the cell to run.

To check your work (or cheat), expand the cell immediately below it. Make sure to run your cell with the correct code or run the hidden cell, as subsequent cells may depend on it. Try it here!

In [None]:
### FILL IN THE BLANKS ###

def hello(name, x):
    print(f"Hello, {name}!")
    print(f"Your result is: {x + 5}")
    
my_name = <<< YOUR NAME HERE >>>
hello(my_name, 10)

In [None]:
def hello(name, x):
    print(f"Hello, {name}!")
    print(f"Your result is: {x + 5}")
    
my_name = "Kip"
hello(my_name, 10)

If you feel comfortable with all that, then we can begin with [Notebook 2](02-dask-basics.ipynb)!

<img src="https://media.giphy.com/media/XZrOvaUvmsCYL31HIe/giphy.gif" alt="go" style="width: 200px;"/>