# Use External Dask Cluster from Local Environment

In this example, you'll learn how to create and interact with a Dask cluster from your local environment using the Saturn Cloud service. This lets you skip the Saturn Cloud UI almost entirely, if you prefer.

Although we use Jupyter locally to demonstrate, the same technique applies to scripts and other kinds of ML workflows.

<img src="dask-cluster.png" width = 500px>

As the diagram illustrates, the pieces in the gray box make up the cluster, which is hosted on Saturn Cloud. The pink box (the Client) is your local machine, rather than a Jupyter instance that also runs on Saturn Cloud.


> This tutorial does not go into great detail about the underlying concepts of Dask, but we have [reference material for those who need more information](https://www.saturncloud.io/docs/reference/dask_concepts/).

## Setup 

If you haven't already, make sure to <a href="https://www.saturncloud.io/docs/getting-started/start_in_ten/" target='_blank' rel='noopener'>create a Saturn Cloud account</a> before you begin.

Once you have logged in to your account, you'll be brought to the Saturn Cloud projects page. Click "Create Custom Project".

Give the project a name (for example, "ml-demo") and leave all other settings at their defaults. Then click "Create".

After the project is created you'll be brought to that project's page. At this point you'll need to retrieve two ID values:

- **`project_id`** - the ID for this particular project. You can get it from the URL of the project page. For example, `https://app.community.saturnenterprise.io/dash/projects/a753517c0d4b40b598823cb759a83f50` has the `project_id` `a753517c0d4b40b598823cb759a83f50`.
- **`user_id`** - the token that identifies you as a valid user in Saturn Cloud. Go to [https://app.community.saturnenterprise.io/api/user/token](https://app.community.saturnenterprise.io/api/user/token) and save the page as `token.json`. Do not share this file with others.

> **Protect your token, as it allows access to your account!**
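If you would rather not copy the `project_id` out of the URL by hand, it can be pulled out programmatically. The following is a minimal sketch that assumes the ID is always the last path segment of the project page URL, as in the example above; the helper name `project_id_from_url` is hypothetical.

```python
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the last path segment of a Saturn Cloud project page URL."""
    # Drop any trailing slash so the final segment is never empty by accident.
    path = urlparse(url).path.rstrip("/")
    project_id = path.rsplit("/", 1)[-1]
    if not project_id:
        raise ValueError(f"No project ID found in URL: {url!r}")
    return project_id


url = "https://app.community.saturnenterprise.io/dash/projects/a753517c0d4b40b598823cb759a83f50"
project_id = project_id_from_url(url)
print(project_id)
```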

***

## Connect to Saturn Cloud Project

Use the following code chunk to load your user token file into your notebook environment. We'll reference it later.

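In [None]:
# Load the token file saved during setup
import json

with open('token.json') as f:
    data = json.load(f)

In [None]:
# Replace the placeholder with the project ID from your project page URL
project_id = 'INSERT YOUR PROJECT ID'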
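A missing or malformed token file otherwise only shows up later as a confusing connection error. As a hedged sketch, the loader below fails early with a clear message. It assumes the saved file is a JSON object with a `token` key, which is how the token is accessed in this tutorial; the function name `load_saturn_token` is hypothetical.

```python
import json
from pathlib import Path


def load_saturn_token(path):
    """Read the saved token file and return the value stored under 'token'."""
    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(f"Token file not found: {token_path}")
    with token_path.open() as f:
        data = json.load(f)
    # Guard against a file that parsed as JSON but is not the expected object.
    token = data.get("token") if isinstance(data, dict) else None
    if not token:
        raise KeyError(f"No 'token' entry in {token_path}")
    return token


# Example call, using whatever filename you saved the token page under:
# saturn_token = load_saturn_token("token.json")
```

Returning only the token string, rather than the whole parsed file, keeps the secret in a single variable that is easy to avoid printing.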

Now you are ready to connect your local workspace to your Saturn Cloud project, allowing you to interact with it from this notebook. Your `user_id` is required (here shown as `data['token']`), as well as the `project_id` discussed earlier.

In [None]:
from dask_saturn.external import ExternalConnection
from dask_saturn import SaturnCluster
from dask.distributed import Client, progress

# Connect to the project using the project ID and the token loaded above
conn = ExternalConnection(
    project_id=project_id,
    base_url='https://app.community.saturnenterprise.io',
    saturn_token=data['token']
)
conn

## Set Up Cluster

Finally, you are ready to set up a cluster in this project! You'll see info messages logged here until the cluster has started and is ready to use.

If the project already has a cluster, this same code starts it up instead of creating a new one. You can also resize the cluster with `cluster.scale()`. For more details, see our [documentation about managing clusters](https://www.saturncloud.io/docs/getting-started/create_cluster/).

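In [None]:
cluster = SaturnCluster(
    external_connection=conn,
    n_workers=4,                # number of Dask workers
    worker_size='8xlarge',      # instance size for each worker
    scheduler_size='2xlarge',   # instance size for the scheduler
    nthreads=32,                # threads per worker
    worker_is_spot=False        # whether workers run on spot instances
)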
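As a minimal sketch of resizing, the helper below wraps `cluster.scale()` with an upper bound, so a typo cannot request far more workers than intended. The helper name `scale_cluster` and the `max_workers` limit are hypothetical and not part of `dask-saturn`; only `cluster.scale()` comes from the text above.

```python
def scale_cluster(cluster, n_workers, max_workers=10):
    """Ask the cluster to resize, refusing requests outside 0..max_workers."""
    if not 0 <= n_workers <= max_workers:
        raise ValueError(
            f"Requested {n_workers} workers; allowed range is 0 to {max_workers}"
        )
    # Hand the validated request to the cluster object.
    cluster.scale(n_workers)
    return n_workers


# Example call once `cluster` exists:
# scale_cluster(cluster, 6)
```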


## Create Client Object

The client object connects our local environment to the new cluster. Displaying the object gives us a link to the Dask Dashboard for that cluster, where we can watch how the cluster is behaving.

In [None]:
client = Client(cluster)
# Block until all 4 requested workers have joined the cluster
client.wait_for_workers(4)
client
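Before running real workloads, it can help to confirm that work actually travels from your machine to the cluster and back. The sketch below submits one trivial task and also returns the dashboard URL. The function name `smoke_test` is hypothetical, while `submit`, `result`, and `dashboard_link` are standard `dask.distributed` client features.

```python
def smoke_test(client):
    """Run one trivial task on the cluster and report where to watch it."""
    # submit() ships the call to a worker and immediately returns a future.
    future = client.submit(sum, [1, 2, 3])
    # result() blocks until the worker sends the answer back.
    total = future.result()
    return total, client.dashboard_link


# Example call once `client` exists:
# total, link = smoke_test(client)
```

If the connection works, `total` is `6`. A call that hangs or raises usually points to a connection or environment mismatch rather than a problem with the task itself.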

***

## Environment Management

When using Saturn Cloud resources remotely, pay close attention to the versions of packages in your local environment and in the remote Saturn Cloud image. If your local environment has different packages or package versions than the image your Saturn Cloud resources use, you'll need to resolve that before running Dask code or using your cluster.

Once everything is set up as described above, we recommend first checking that your local notebook kernel has the same versions of certain key libraries as your Saturn Cloud cluster image. These are some of the key libraries that must match for Dask and Saturn Cloud resources to work smoothly:

* pandas: 1.2.3 or later
* dask: 2.30.0 or later
* distributed: 2.30.1 or later
* dask-saturn: 0.2.2 or later
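The minimum versions in this list can be checked against your local environment with the standard library alone. This is a rough sketch: the helper names are hypothetical, the version parser only handles plain dotted numbers, and it tests the minimums listed here rather than an exact match with the remote image.

```python
from importlib import metadata

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "pandas": "1.2.3",
    "dask": "2.30.0",
    "distributed": "2.30.1",
    "dask-saturn": "0.2.2",
}


def version_tuple(version):
    """Turn '2.30.1' into (2, 30, 1); anything after a non-digit is dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.30' compares equal to '2.30.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_local_versions(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} is older than {minimum}"
    return problems


for name, problem in check_local_versions().items():
    print(f"{name}: {problem}")
```

No output means every listed package is installed locally at or above its minimum.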

You can install or update these packages with `pip` or `conda`. In many Jupyter notebooks, the `%pip` magic lets you run these commands from a regular code chunk.
To catch some conflicts early, run `client.get_versions(check=True)` after you set up your Saturn client object. _That check won't tell you about pandas conflicts, so don't forget pandas!_
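Because the built-in check skips pandas, one way to compare it yourself is to ask every worker for its pandas version. The sketch below relies on `Client.run`, which executes a function on all workers and returns a dictionary keyed by worker address. The helper names are hypothetical.

```python
def _pandas_version():
    # Imported inside the function so it resolves on whichever machine runs it.
    import pandas
    return pandas.__version__


def pandas_version_mismatches(client, local_version):
    """Return {worker_address: version} for workers whose pandas differs."""
    worker_versions = client.run(_pandas_version)
    return {
        address: version
        for address, version in worker_versions.items()
        if version != local_version
    }


# Example call once `client` exists:
# pandas_version_mismatches(client, _pandas_version())
```

An empty dictionary means every worker reports the same pandas version as your local environment.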

***

Now you're ready to run code from your local environment, using remote Dask clusters! See our examples to try it yourself.