Some notes on recent experience with the dask-distributed chart. CC @danielfrg, the original author of this chart, and @noelbundick and @martindurant who were involved in finding these issues.
Jupyter environment
It would be good to ensure that the Jupyter container and the Dask worker containers share the same environment.
Customizable environment
When deploying on different cloud providers we often want to include a few extra packages. For example, on Amazon we tend to want s3fs, on Google gcsfs, on Azure adlfs, and nothing extra when deploying on non-cloud systems. This also comes up when people want a container with slightly different versions of libraries.
To resolve this I recommend that we include small conda and pip install steps in setup scripts of the Docker container that refer to environment variables that optionally contain a list of extra packages to install. This might also allow us to remove some of the existing packages in the image.
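As a rough sketch of what such a setup step could look like, here is a small Python script that turns two environment variables into install commands. The variable names `EXTRA_CONDA_PACKAGES` and `EXTRA_PIP_PACKAGES` are placeholders chosen for illustration, not names the chart currently defines, and I'm assuming each holds a space-separated list of package specs.

```python
import os
import shlex
import subprocess


def build_install_commands(env):
    """Build install commands from two hypothetical environment variables.

    EXTRA_CONDA_PACKAGES and EXTRA_PIP_PACKAGES are assumed names; each is
    expected to hold a space-separated list of package specs, or be unset.
    """
    commands = []
    conda_packages = shlex.split(env.get("EXTRA_CONDA_PACKAGES", ""))
    pip_packages = shlex.split(env.get("EXTRA_PIP_PACKAGES", ""))
    if conda_packages:
        commands.append(["conda", "install", "--yes", *conda_packages])
    if pip_packages:
        commands.append(["pip", "install", *pip_packages])
    return commands


if __name__ == "__main__":
    # Does nothing when neither variable is set, so the default image is unchanged.
    for command in build_install_commands(os.environ):
        subprocess.check_call(command)
```

With something like this in the container start-up, setting `EXTRA_PIP_PACKAGES="s3fs"` on the Jupyter and worker containers would add s3fs to both, and leaving the variables unset would install nothing.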
Alternatively @noelbundick recommended having environment variables for each of the cloud providers, which would trigger the installation of packages specific to that cloud platform. I could go either way, with a slight preference for providing a list of explicit packages.
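For comparison, the per-provider approach might look roughly like the following. The flag names (`INSTALL_AWS`, `INSTALL_GCP`, `INSTALL_AZURE`) and the accepted truthy values are assumptions made up for this sketch; the package mapping comes from the examples above.

```python
# Hypothetical flag names; the packages are the ones mentioned for each provider.
PROVIDER_FLAGS = {
    "INSTALL_AWS": ["s3fs"],
    "INSTALL_GCP": ["gcsfs"],
    "INSTALL_AZURE": ["adlfs"],
}

TRUTHY = ("1", "true", "yes")


def packages_for_flags(env):
    """Return the extra packages implied by whichever provider flags are set."""
    packages = []
    for flag, provider_packages in PROVIDER_FLAGS.items():
        if env.get(flag, "").strip().lower() in TRUTHY:
            packages.extend(provider_packages)
    return packages
```

The trade-off is that the image has to carry this mapping and keep it current, whereas an explicit package list leaves that choice to whoever deploys the chart.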
Rename to Dask
Is it likely that the community will ever want a helm chart of the single-machine scheduler? This seems somewhat unlikely to me. I'm inclined to rename this chart to just dask if that's feasible.