# Distributed Dask tips for coffea-casa

Dask works well at many scales, from a single machine to clusters of many machines. In coffea-casa, each user gets a preconfigured resource that is ready to scale.

![dask distributed](https://docs.dask.org/en/latest/_images/dask-cluster-manager.svg)

The Dask `Client` is the primary entry point for users of `dask.distributed`.

A Dask cluster is pre-configured for you automatically; you only need to initialize a `Client` by pointing it to the address of the scheduler (in coffea-casa it is always `tls://localhost:8786`):

In [20]:
from dask.distributed import Client

client = Client("tls://localhost:8786")
client

Connection method: Direct
Dashboard: /user/oksana.shadura@cern.ch/proxy/8787/status

Comm: tls://192.168.121.183:8786
Workers: 0
Dashboard: /user/oksana.shadura@cern.ch/proxy/8787/status
Total threads: 0
Started: Just now
Total memory: 0 B


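coffea-casa uses adaptive scaling to optimise resource usage, so the number of workers changes with the workload; a freshly connected client may report zero workers, as in the output above.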

## Dask dependency management plugins

Dask’s plugin system lets you run custom Python code in response to certain events. Plugins can target schedulers, workers, or nannies. A worker plugin, for example, runs custom Python code on all your workers at specific points in the worker’s lifecycle (e.g. when the worker process is started). The built-in dependency management plugins let you install packages on workers:

In [12]:
from dask.distributed import PipInstall

plugin = PipInstall(packages=["hepconvert"], pip_options=["--upgrade"])

client.register_plugin(plugin)

In [13]:
from dask.distributed import CondaInstall

plugin = CondaInstall(packages=["pyhepmc"])
client.register_plugin(plugin)
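After registering an install plugin it can be useful to confirm that the package really is present on every worker. The following is a minimal sketch, not part of coffea-casa: the helper name `installed_version` is hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of `distribution`, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Usage on a live cluster (assumes the `client` created earlier):
#   client.run(installed_version, "hepconvert")
# returns a dict mapping each worker address to a version string or None.
```

`client.run` executes the function once on each worker, so a `None` in the result points at a worker where the installation did not take effect.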

Or you can simply run a custom function on every worker:

In [46]:
def worker_setup(dask_worker):
    import os
    install_root_packages_cmd = "mamba install -y -c conda-forge root"
    os.system(install_root_packages_cmd)

In [47]:
client.register_worker_callbacks(worker_setup)

{'tls://red-c7123.unl.edu:36519': {'status': 'OK'},
 'tls://red-c7123.unl.edu:41071': {'status': 'OK'},
 'tls://red-c7123.unl.edu:42915': {'status': 'OK'},
 'tls://red-c7123.unl.edu:43757': {'status': 'OK'}}
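Note that `os.system` only returns the exit status of the command, and the function above ignores it. A sketch of a stricter variant, assuming you would rather have the callback raise when the command fails; the names `run_setup_command` and `worker_setup_checked` are hypothetical:

```python
import subprocess


def run_setup_command(cmd):
    """Run a shell command; raise CalledProcessError if it exits non-zero."""
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout


def worker_setup_checked(dask_worker):
    # Same install command as in the cell above, but a failure raises
    # instead of being silently discarded.
    run_setup_command("mamba install -y -c conda-forge root")


# Registered the same way as before (assumes `client` exists):
#   client.register_worker_callbacks(worker_setup_checked)
```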

Or to enable the CMSSW environment:

In [38]:
def worker_cmssw_setup(dask_worker):
    import os
    install_cmssw_packages_cmd = "source /cvmfs/cms.cern.ch/cmsset_default.sh; cd /cvmfs/cms.cern.ch/${SCRAM_ARCH}/cms/cmssw/CMSSW_12_6_5; cmsenv"
    os.system(install_cmssw_packages_cmd)

In [39]:
client.register_worker_callbacks(worker_cmssw_setup)

{'tls://red-c7123.unl.edu:36519': {'status': 'OK'},
 'tls://red-c7123.unl.edu:41071': {'status': 'OK'},
 'tls://red-c7123.unl.edu:42915': {'status': 'OK'},
 'tls://red-c7123.unl.edu:43757': {'status': 'OK'}}
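One caveat: a command started with `os.system` runs in a child shell, so any variables it exports disappear when that shell exits and never reach the worker process itself. A minimal sketch of one way around this, assuming the setup script can be sourced non-interactively: source it in a child shell, dump the resulting environment as JSON, and copy it into `os.environ`. The helper names `env_after_sourcing` and `worker_cmssw_env` are hypothetical, and whether this captures everything `cmsenv` provides is not verified here.

```python
import json
import os
import subprocess
import sys


def env_after_sourcing(script, shell="bash"):
    """Return the environment a child shell has after sourcing `script`."""
    dump = "import json, os; print(json.dumps(dict(os.environ)))"
    # "$1" is the script to source, "$2" the Python interpreter,
    # "$3" the one-liner that prints the environment as JSON.
    cmd = [
        shell, "-c", '. "$1" >/dev/null && exec "$2" -c "$3"',
        "_", script, sys.executable, dump,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def worker_cmssw_env(dask_worker):
    # Assumption: copying the sourced variables into the worker process
    # is sufficient for the tools you need.
    os.environ.update(env_after_sourcing("/cvmfs/cms.cern.ch/cmsset_default.sh"))
```

The callback can then be registered with `client.register_worker_callbacks(worker_cmssw_env)`, in the same way as the functions above.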

Or set an environment variable:

In [24]:
def set_env(dask_worker):
    import pathlib, os
    path = str(pathlib.Path(dask_worker.local_directory))
    os.environ["HOME_DIR"] = path

In [25]:
client.register_worker_callbacks(set_env)

{'tls://red-c7123.unl.edu:36519': {'status': 'OK'},
 'tls://red-c7123.unl.edu:41071': {'status': 'OK'},
 'tls://red-c7123.unl.edu:42915': {'status': 'OK'},
 'tls://red-c7123.unl.edu:43757': {'status': 'OK'}}

In [29]:
import os
client.run(lambda: os.environ["HOME_DIR"])

{'tls://red-c7123.unl.edu:36519': '/var/lib/condor/execute/dir_3977614/dask-scratch-space/worker-n7lkp1vq',
 'tls://red-c7123.unl.edu:41071': '/var/lib/condor/execute/dir_3977616/dask-scratch-space/worker-w99g3haf',
 'tls://red-c7123.unl.edu:42915': '/var/lib/condor/execute/dir_3977612/dask-scratch-space/worker-3y8tmbol',
 'tls://red-c7123.unl.edu:43757': '/var/lib/condor/execute/dir_3977611/dask-scratch-space/worker-zyiqagh8'}

You can also write your own plugins:

In [25]:
from dask.distributed import WorkerPlugin
class ErrorLogger(WorkerPlugin):
    def __init__(self, logger):
        self.logger = logger

    def setup(self, worker):
        self.worker = worker

    def transition(self, key, start, finish, *args, **kwargs):
        if finish == 'error':
            ts = self.worker.tasks[key]
            exc_info = (type(ts.exception), ts.exception, ts.traceback)
            self.logger.error(
                "Error during computation of '%s'.", key,
                exc_info=exc_info
            )

In [26]:
import logging
plugin = ErrorLogger(logging)
client.register_plugin(plugin) 

{'tls://red-c7123.unl.edu:35675': {'status': 'OK'},
 'tls://red-c7123.unl.edu:37019': {'status': 'OK'}}

Or you can upload a file with the `UploadFile` plugin (`UploadDirectory` does not work on coffea-casa for now; we are fixing it):

In [46]:
from distributed.diagnostics.plugin import UploadFile

client.register_plugin(UploadFile("test_cc.py"))  

{'tls://red-c7123.unl.edu:35675': {'status': 'OK'},
 'tls://red-c7123.unl.edu:37019': {'status': 'OK'}}
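To check that an uploaded module can actually be imported on the workers, a small helper can be run on each of them. This is a sketch under the assumption that the uploaded file is a Python module named `test_cc`; the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a module called `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


# Usage on a live cluster (assumes the `client` created earlier):
#   client.run(module_available, "test_cc")
# returns a dict mapping each worker address to True or False.
```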