### **Connecting to the EODC Dask**

This notebook shows how to connect to EODC Dask from within Argo Workflows.

**First some imports and global settings**

In [None]:
from hera.workflows import script, Parameter, DAG, Workflow
from hera.shared import global_config

global_config.host = "https://dev.services.eodc.eu/workflows/"
global_config.namespace = "<YOUR NAMESPACE>"
global_config.token = "<YOUR TOKEN>"
global_config.image = "ghcr.io/eodcgmbh/cluster_image:2025.2.0"
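
Rather than pasting the namespace and token into the notebook, they can be read from the environment. The following is a minimal sketch, assuming they are stored in environment variables called `ARGO_NAMESPACE` and `ARGO_TOKEN`. Both variable names and the helper `load_argo_settings` are hypothetical and not part of Hera.

```python
import os


def load_argo_settings(env=os.environ):
    """Collect namespace and token from the environment, failing early if one is missing."""
    # ARGO_NAMESPACE and ARGO_TOKEN are assumed names, chosen for this example only
    missing = [key for key in ("ARGO_NAMESPACE", "ARGO_TOKEN") if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {"namespace": env["ARGO_NAMESPACE"], "token": env["ARGO_TOKEN"]}


# Dry run with an explicit mapping instead of the real environment
settings = load_argo_settings({"ARGO_NAMESPACE": "demo", "ARGO_TOKEN": "secret"})
print(settings["namespace"])
```

The returned values could then be assigned to `global_config.namespace` and `global_config.token` in place of the placeholders.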

**Writing scripts**

Because EODC Dask runs on the same cluster as Argo Workflows, a script can connect to it by passing the cluster-internal `address` and `proxy_address` to the `Gateway`. The image specified in `cluster_options.image` must be the same one used to run the script (here, the image set in `global_config.image`). The first step initializes the cluster and prints its name, the second step connects to the cluster by that name and uses it, and a final step shuts the cluster down.

In [None]:
@script()
def initialize_dask():
    from dask_gateway import Gateway

    # Connect to the Gateway
    gateway = Gateway(
        address="http://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local",
        proxy_address="tcp://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local:80")
    
    # Define cluster options
    cluster_options = gateway.cluster_options()

    # Set the number of cores per worker
    cluster_options.worker_cores = 8

    # Set the memory per worker (in GB)
    cluster_options.worker_memory = 16

    # Specify the Docker image to use for the workers
    cluster_options.image = "ghcr.io/eodcgmbh/cluster_image:2025.2.0"

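    # Create a new cluster with the specified options. shutdown_on_close=False keeps
    # the cluster running after this step exits, so that later steps can connect to it.
    cluster = gateway.new_cluster(cluster_options, shutdown_on_close=False)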

    # Automatically scale the cluster between 1 and 10 workers based on workload
    cluster.adapt(minimum=1, maximum=10)

    # Optionally, scale the cluster to use only one worker
    # cluster.scale(1)

    # Get a Dask client for the cluster
    client = cluster.get_client()

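    # Print the cluster name: Argo captures stdout as the step result,
    # which the next step receives as its cluster_name argument
    print(cluster.name)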

In [None]:
@script()
def use_dask(cluster_name):
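    from dask_gateway import Gateway

    # Connect to the Gateway
    gateway = Gateway(
        address="http://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local",
        proxy_address="tcp://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local:80",
    )

    # Connect to the cluster created in the first step
    cluster = gateway.connect(cluster_name)

    # Get a Dask client for the cluster; computations are submitted through it
    client = cluster.get_client()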
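
The `use_dask` script stops once it has a client. Below is a minimal sketch of the kind of work that could follow. The Dask client's `submit` method and the futures it returns follow the `concurrent.futures` interface, so the hypothetical helper `sum_of_squares` is tried out here against a local `ThreadPoolExecutor` instead of a real cluster. `square` and `sum_of_squares` are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def sum_of_squares(client, n):
    """Submit one task per number and add up the results."""
    # client only needs submit(); a Dask client provides the same call
    futures = [client.submit(square, i) for i in range(n)]
    return sum(future.result() for future in futures)


# Local dry run: the thread pool stands in for the Dask client
with ThreadPoolExecutor(max_workers=2) as local_pool:
    total = sum_of_squares(local_pool, 4)
print(total)
```

Inside the script, the `client` obtained from `cluster.get_client()` would be passed in place of `local_pool`. As with the imports in the scripts above, the helper functions would have to be defined inside the script function or be importable from the image.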

In [None]:
@script()
def shutdown_cluster():
    from dask_gateway import Gateway

    # Connect to the Gateway
    gateway = Gateway(
        address="http://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local",
        proxy_address="tcp://traefik-dask-gateway-internal.dask-gateway.svc.cluster.local:80",
    )

    # Shut down the first running cluster, if there is one
    clusters = gateway.list_clusters()
    if clusters:
        cluster = gateway.connect(clusters[0].name)
        cluster.shutdown()
    else:
        print("No running clusters")
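
`shutdown_cluster` stops whichever cluster the gateway lists first, which may not be the one this workflow created if more than one cluster is running. The following is a minimal sketch of a more targeted variant, assuming the cluster name is available to the step (how it would reach the exit handler is not covered here). `shutdown_named_cluster` and `FakeGateway` are hypothetical names. The fake only mimics `list_clusters`, `connect` and `shutdown` so that the logic can be tried locally.

```python
from types import SimpleNamespace


def shutdown_named_cluster(gateway, cluster_name):
    """Shut down cluster_name if the gateway still lists it; return True if it did."""
    running = [report.name for report in gateway.list_clusters()]
    if cluster_name not in running:
        print(f"Cluster {cluster_name!r} is not running")
        return False
    gateway.connect(cluster_name).shutdown()
    return True


class FakeGateway:
    """Local stand-in for a gateway object (NOT the dask_gateway class)."""

    def __init__(self, names):
        self.names = list(names)

    def list_clusters(self):
        return [SimpleNamespace(name=name) for name in self.names]

    def connect(self, name):
        # The returned object only supports shutdown(), which forgets the cluster
        return SimpleNamespace(shutdown=lambda: self.names.remove(name))


fake = FakeGateway(["team.abc", "team.xyz"])
stopped = shutdown_named_cluster(fake, "team.xyz")
print(stopped, fake.names)
```

In a real script, the `gateway` created with `Gateway(...)` would be passed in place of `fake`.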

**Creating the Workflow**

The workflow must shut the cluster down after use, even if a step fails. To do this, we define a second DAG and register it as the workflow's exit handler via `on_exit`, so that it runs regardless of whether the main DAG succeeds.

In [None]:
with Workflow(
    generate_name="using-dask-",
    entrypoint="workflow",
    on_exit="shutdown"
) as w:

    with DAG(name="workflow"):
        init = initialize_dask()
        use = use_dask(arguments={"cluster_name": init.result})

        init >> use

    with DAG(name="shutdown"):
        shutdown_cluster()
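
The effect of `on_exit` is comparable to a `try`/`finally` block in plain Python. The sketch below is only an analogy run locally, not Hera or Argo code. `run_workflow` and the three step functions are hypothetical stand-ins for the DAGs above.

```python
def run_workflow(main_steps, exit_handler):
    """Run the main steps in order, then always run the exit handler."""
    status = "Succeeded"
    try:
        for step in main_steps:
            step()
    except Exception:
        status = "Failed"
    finally:
        # Runs whether or not a main step raised
        exit_handler()
    return status


log = []


def init():
    log.append("init")


def use():
    raise RuntimeError("computation failed")


def shutdown():
    log.append("shutdown")


status = run_workflow([init, use], shutdown)
print(status, log)
```

Even though the second step raises, the shutdown function still runs, which is the behaviour the exit handler provides for the Dask cluster.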

**Submitting the Workflow**

In [None]:
w.create()