# Starting a Dask CPU cluster using the Python module/NERSC Python Jupyter kernel

## Start your Dask CPU cluster in a Slurm job
Before you start your Dask client, you first have to start your Dask cluster.
Open a Jupyter terminal (File -> New -> Terminal) or a regular terminal
and enter the following:

```shell
salloc -N 2 -n 64 -t 30 -C cpu
```

and wait for your 2-node CPU job to start. When it's ready, use the launch
script to start your Dask cluster. By default the script launches
64 Dask workers in total, one per Slurm task in your job.
To change the number of workers, change the value passed to `-n`
in the `salloc` command.

```shell
cd $SCRATCH/nersc-notebooks/perlmutter/dask
./launch.sh 
```

Wait a minute or so for your cluster and workers to start. 
Then proceed to the cells below to start and connect your Dask client. 
Make sure you have selected the `NERSC Python` kernel in the top right corner 
of this Jupyter notebook. 
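
Rather than guessing how long to wait, the notebook can poll for the scheduler file before connecting. The sketch below assumes the launch script writes `scheduler_file.json` into `$SCRATCH`, which is the path the client cell reads. The helper name `wait_for_scheduler_file` and its timeout values are illustrative and not part of the NERSC scripts.

```python
import json
import time
from pathlib import Path


def wait_for_scheduler_file(path, timeout=120.0, poll_interval=2.0):
    """Poll until `path` exists and contains valid JSON, then return its contents.

    Raises TimeoutError if the file is not readable within `timeout` seconds.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        try:
            return json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # File is missing or only partially written; retry until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"No usable scheduler file at {path} after {timeout} s"
                )
            time.sleep(poll_interval)
```

For example, `wait_for_scheduler_file(os.path.join(os.environ["SCRATCH"], "scheduler_file.json"))` could be called before creating the `Client`. Note that a stale file left over from an earlier job would also satisfy this check.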

## Now connect to your cluster with the Dask client

Now that your cluster and workers are ready, we can connect to them with the Dask client. The cell below starts the client and also sets the dashboard link so that the Dask dashboard is reachable through the JupyterHub proxy.

In [10]:
import dask
from dask.distributed import Client
import os

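# Path to the scheduler file in $SCRATCH
scheduler_file = os.path.join(os.environ["SCRATCH"], "scheduler_file.json")

# Route the dashboard link through the JupyterHub proxy
dask.config.set({"distributed.dashboard.link": "{JUPYTERHUB_SERVICE_PREFIX}proxy/{host}:{port}/status"})

# Connect to the running scheduler described by the scheduler file
client = Client(scheduler_file=scheduler_file)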
client

Connection method: Scheduler file
Scheduler file: /pscratch/sd/s/sanjeevc/scheduler_file.json
Dashboard: /user/sanjeevc/perlmutter-login-node-base/proxy/10.249.17.7:8787/status

Comm: tcp://10.249.17.7:8786
Workers: 0
Dashboard: /user/sanjeevc/perlmutter-login-node-base/proxy/10.249.17.7:8787/status
Total threads: 0
Started: Just now
Total memory: 0 B


2024-10-08 15:52:02,752 - distributed.client - ERROR - Failed to reconnect to scheduler after 30.00 seconds, closing client
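
The dashboard link set in the cell above is a template with placeholders. To our understanding, Dask fills these from environment variables together with the scheduler's host and port. The sketch below mimics that substitution in plain Python and is not Dask's own code. The helper name `expand_dashboard_link` is hypothetical, and the value used for `JUPYTERHUB_SERVICE_PREFIX` is inferred from the dashboard URL in the output above.

```python
import os

# Same template as in the client cell above.
LINK_TEMPLATE = "{JUPYTERHUB_SERVICE_PREFIX}proxy/{host}:{port}/status"


def expand_dashboard_link(template, host, port, environ=None):
    """Fill the template from environment variables plus the scheduler host/port."""
    values = dict(os.environ if environ is None else environ)
    values.update(host=host, port=port)
    return template.format(**values)


# Illustrative values, inferred from the output above.
example_env = {"JUPYTERHUB_SERVICE_PREFIX": "/user/sanjeevc/perlmutter-login-node-base/"}
print(expand_dashboard_link(LINK_TEMPLATE, "10.249.17.7", 8787, example_env))
# /user/sanjeevc/perlmutter-login-node-base/proxy/10.249.17.7:8787/status
```

If `JUPYTERHUB_SERVICE_PREFIX` is not set in the environment, the substitution in this sketch raises a `KeyError`, which is one reason the template is best used from inside a JupyterHub session.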


## Check your client and connect to the Dashboard

The client summary above should show the scheduler address, the number of
connected workers, and a link to the Dask dashboard. Click the link to open
the dashboard in a new tab.
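
The worker count can also be checked programmatically, which helps when the summary reports `Workers: 0`. This is a minimal sketch: `summarize_workers` is a hypothetical helper that takes the mapping returned by `client.nthreads()` (worker address to thread count), and the expected count of 64 comes from the default described earlier.

```python
def summarize_workers(nthreads_by_worker, expected_workers=None):
    """Return (n_workers, total_threads); raise if fewer workers than expected."""
    n_workers = len(nthreads_by_worker)
    total_threads = sum(nthreads_by_worker.values())
    if expected_workers is not None and n_workers < expected_workers:
        raise RuntimeError(
            f"Only {n_workers} of {expected_workers} workers have connected"
        )
    return n_workers, total_threads


# With a live client this would be called as:
#   summarize_workers(client.nthreads(), expected_workers=64)
# Hypothetical two-worker mapping so the block runs without a cluster:
print(summarize_workers({"tcp://10.0.0.1:40001": 2, "tcp://10.0.0.2:40002": 2}))
# (2, 4)
```

To block until the workers have arrived instead of checking once, `client.wait_for_workers(n_workers=64, timeout=120)` is an alternative; the timeout value here is only an example.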


### Testing Prefect Flow


In [7]:
!prefect config set PREFECT_API_URL="https://ard-modeling-service.slac.stanford.edu/api" 

Set 'PREFECT_API_URL' to 'https://ard-modeling-service.slac.stanford.edu/api'.
Updated profile 'default'.


In [8]:
from prefect import flow, task
from prefect_dask import DaskTaskRunner

# Run the flow's tasks on the existing Dask cluster by pointing the task runner at its scheduler
@flow(task_runner=DaskTaskRunner(address=client.scheduler.address))
def workflow(a: float, b: float) -> float:
    output1 = add.submit(a, b)
    output2 = mult.submit(output1, b)
    return output2

@task
def add(a: float, b: float) -> float:
    return a + b

@task
def mult(a: float, b: float) -> float:
    return a * b

In [9]:
output = workflow(1, 2)
print(output.result())

type='unpersisted' artifact_type='result' artifact_description='Unpersisted result of type `float`'
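
The flow chains two tasks, so it computes `(a + b) * b`. As a reference value when checking what the flow returns, here is the same arithmetic in plain Python, with no Prefect or Dask involved. The function names are hypothetical and chosen so that they do not shadow the `add` and `mult` tasks defined above.

```python
def add_plain(a: float, b: float) -> float:
    return a + b


def mult_plain(a: float, b: float) -> float:
    return a * b


def workflow_reference(a: float, b: float) -> float:
    """Plain-Python equivalent of the flow: (a + b) * b."""
    return mult_plain(add_plain(a, b), b)


print(workflow_reference(1, 2))  # 6
```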
