This repository was archived by the owner on Feb 22, 2022. It is now read-only.

Dask-distributed cloud issues #2912

@mrocklin


Some notes on recent experience with the dask-distributed chart. CC @danielfrg, the original author of this chart, and @noelbundick and @martindurant who were involved in finding these issues.

Jupyter environment

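It would be good to ensure that the Jupyter container and the Dask worker containers share the same environment, since mismatched package versions between the client and the workers tend to cause hard-to-diagnose errors.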

Customizable environment

When deploying on different cloud providers we often want to include a few extra packages. For example, on Amazon we tend to want s3fs, on Google gcsfs, on Azure adlfs, and nothing extra when deploying on non-cloud systems. This also comes up when people want a container with slightly different versions of libraries.

To resolve this I recommend that we include small conda and pip install steps in setup scripts of the Docker container that refer to environment variables that optionally contain a list of extra packages to install. This might also allow us to remove some of the existing packages in the image.
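A minimal sketch of what such a setup step could look like, written in Python for illustration. The variable names `EXTRA_CONDA_PACKAGES` and `EXTRA_PIP_PACKAGES` and both helper functions are assumptions made up for this example, not something the chart or image currently defines. Each variable is read as a space-separated list of packages; an unset or empty variable installs nothing.

```python
import os
import shlex
import subprocess
import sys


def build_install_commands(env):
    """Return the install commands implied by the optional package variables.

    EXTRA_CONDA_PACKAGES and EXTRA_PIP_PACKAGES are hypothetical names;
    each holds a space-separated list of packages, or is unset/empty.
    """
    conda_packages = shlex.split(env.get("EXTRA_CONDA_PACKAGES", ""))
    pip_packages = shlex.split(env.get("EXTRA_PIP_PACKAGES", ""))

    commands = []
    if conda_packages:
        commands.append(["conda", "install", "--yes"] + conda_packages)
    if pip_packages:
        # Use the running interpreter's pip so packages land in the same environment.
        commands.append([sys.executable, "-m", "pip", "install"] + pip_packages)
    return commands


def install_extras(env=None):
    """Run the install commands; intended to be called once at container start."""
    for command in build_install_commands(os.environ if env is None else env):
        subprocess.check_call(command)
```

The container's startup script would call `install_extras()` before launching Jupyter or the Dask worker. Setting the same variables on both the Jupyter and worker containers would also help keep their environments aligned.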

Alternatively, @noelbundick recommended having an environment variable for each cloud provider, which would trigger the installation of packages specific to that platform. I could go either way, with a slight preference for providing an explicit list of packages.
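For comparison, the per-provider alternative could be sketched roughly as below. The provider keys (`aws`, `gcp`, `azure`) and the function name are hypothetical; only the package names come from the notes above. How the provider would be selected (for example through a single environment variable) is also an assumption.

```python
# Hypothetical mapping from a provider key to the packages mentioned above.
PROVIDER_PACKAGES = {
    "aws": ["s3fs"],
    "gcp": ["gcsfs"],
    "azure": ["adlfs"],
}


def packages_for_provider(provider):
    """Return the extra packages for a provider key, or [] when none is set."""
    key = (provider or "").strip().lower()
    if not key:
        # Non-cloud deployments install nothing extra.
        return []
    if key not in PROVIDER_PACKAGES:
        raise ValueError(f"unknown cloud provider: {provider!r}")
    return list(PROVIDER_PACKAGES[key])
```

The trade-off is visible here: the mapping has to be maintained inside the image, whereas an explicit package list leaves that choice to whoever deploys the chart.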

Rename to Dask

Is it likely that the community will ever want a helm chart of the single-machine scheduler? This seems somewhat unlikely to me. I'm inclined to rename this chart to just dask if that's feasible.
