Document dask/daskhub helm chart (dask#6560)
* Update kubernetes docs
TomAugspurger authored and kumarprabhu1988 committed Oct 29, 2020
1 parent fea5d64 commit 6e0eb0f
Showing 2 changed files with 181 additions and 19 deletions.
183 changes: 170 additions & 13 deletions docs/source/setup/kubernetes-helm.rst
Kubernetes and Helm
===================

It is easy to launch a Dask cluster and a Jupyter_ notebook server on cloud
resources using Kubernetes_ and Helm_.

.. _Kubernetes: https://kubernetes.io/
.. _Helm: https://helm.sh/
.. _Jupyter: https://jupyter.org/

This is particularly useful when you want to deploy a fresh Python environment
on Cloud services like Amazon Web Services, Google Compute Engine, or
If you already have Python environments running in a pre-existing Kubernetes
cluster, then you may prefer the :doc:`Kubernetes native<kubernetes-native>`
documentation, which is a bit lighter weight.


Launch Kubernetes Cluster
-------------------------

on one of the common cloud providers like Google, Amazon, or
Microsoft. We recommend the first part of the documentation in the guide
`Zero to JupyterHub <https://zero-to-jupyterhub.readthedocs.io/en/latest/>`_
that focuses on Kubernetes and Helm (you do not need to follow all of these
instructions). In particular, you don't need to install JupyterHub.

- `Creating a Kubernetes Cluster <https://zero-to-jupyterhub.readthedocs.io/en/latest/create-k8s-cluster.html>`_
- `Setting up Helm <https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-helm.html>`_

Alternatively, you may want to experiment with Kubernetes locally using
`Minikube <https://kubernetes.io/docs/getting-started-guides/minikube/>`_.


Which Chart is Right for You?
-----------------------------

Dask maintains a Helm chart repository containing various charts for the Dask
community at https://helm.dask.org/.
You will need to add this to your known channels and update your local charts::
helm repo add dask https://helm.dask.org/
helm repo update

We provide two Helm charts. The right one to choose depends on whether you're
deploying Dask for a single user or for many users.


================ =====================================================================
Helm Chart Use Case
================ =====================================================================
``dask/dask`` Single-user deployment with one notebook server and one Dask Cluster.
``dask/daskhub`` Multi-user deployment with JupyterHub and Dask Gateway.
================ =====================================================================

See :ref:`kubernetes-helm.single` or :ref:`kubernetes-helm.multi` for detailed
instructions on deploying either of these.
As you might suspect, deploying ``dask/daskhub`` is a bit more complicated since
there are more components. If you're just deploying for a single user, we'd recommend
using ``dask/dask``.

.. _kubernetes-helm.single:

Helm Install Dask for a Single User
-----------------------------------

Once your Kubernetes cluster is ready, you can deploy Dask using the Dask Helm_ chart::

helm install my-dask dask/dask

also an optional Jupyter server.


Verify Deployment
^^^^^^^^^^^^^^^^^

This might take a minute to deploy. You can check its status with
``kubectl``::
Additionally, you can list all active helm deployments with::


Connect to Dask and Jupyter
^^^^^^^^^^^^^^^^^^^^^^^^^^^

When we ran ``kubectl get services``, we saw some externally visible IPs:

variable automatically.
Configure Environment
^^^^^^^^^^^^^^^^^^^^^

By default, the Helm deployment launches three workers using one core each and
a standard conda environment. We can customize this environment by creating a
list``


Check status and logs
^^^^^^^^^^^^^^^^^^^^^

For standard issues, you should be able to see the worker status and logs using the
Dask dashboard (in particular, you can see the worker links from the ``info/`` page).
their logs with the following commands:
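As a sketch (the pod name below is illustrative, not taken from this
deployment), worker pods and their logs can be inspected with ``kubectl``:

.. code-block:: console

   $ kubectl get pods
   $ kubectl logs my-dask-worker-5d4f9c8b7-abcde
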
Delete a Helm deployment
^^^^^^^^^^^^^^^^^^^^^^^^
You can always delete a helm deployment using its name::
Cloud service (you will need to delete those explicitly).
Avoid the Jupyter Server
^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes you do not need to run a Jupyter server alongside your Dask cluster.
To disable it, include the following in your configuration file:

.. code-block:: yaml

   jupyter:
     enabled: false
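The same override can also be applied in place with Helm's ``--set`` flag
(``my-dask`` is the release name used earlier; a sketch):

.. code-block:: console

   $ helm upgrade my-dask dask/dask --set jupyter.enabled=false
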

.. _kubernetes-helm.multi:

Helm Install Dask for Multiple Users
------------------------------------

The ``dask/daskhub`` Helm Chart deploys JupyterHub_, `Dask Gateway`_, and configures
the two to work well together. In particular, Dask Gateway is registered as
a JupyterHub service so that Dask Gateway can re-use JupyterHub's authentication,
and the JupyterHub environment is configured to connect to the Dask Gateway
without any arguments.

.. note::

   The ``dask/daskhub`` helm chart came out of the `Pangeo`_ project, a community
   platform for big data geoscience.

.. _Pangeo: http://pangeo.io/
.. _Dask Gateway: https://gateway.dask.org/
.. _JupyterHub: https://jupyterhub.readthedocs.io/en/stable/

The ``dask/daskhub`` helm chart uses the JupyterHub and Dask-Gateway helm charts.
You'll want to consult the `JupyterHub helm documentation <https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-jupyterhub.html>`_
and `Dask Gateway helm documentation <https://gateway.dask.org/install-kube.html>`_ for further customization. The default values
are at https://github.com/dask/helm-chart/blob/master/daskhub/values.yaml.

Verify that you've set up a Kubernetes cluster and added Dask's helm charts:

.. code-block:: console

   $ helm repo add dask https://helm.dask.org/
   $ helm repo update

JupyterHub and Dask Gateway require a few secret tokens. We'll generate them
on the command line and insert the tokens in a ``secrets.yaml`` file that will
be passed to Helm.

Run the following command, and copy the output. This is our ``token-1``.

.. code-block:: console

   $ openssl rand -hex 32  # generate token-1

Run the command again and copy the output. This is our ``token-2``.

.. code-block:: console

   $ openssl rand -hex 32  # generate token-2

Now substitute those two values for ``<token-1>`` and ``<token-2>`` below.
Note that ``<token-2>`` is used twice: once for
``jupyterhub.hub.services.dask-gateway.apiToken``, and a second time for
``dask-gateway.gateway.auth.jupyterhub.apiToken``.

.. code-block:: yaml

   # file: secrets.yaml
   jupyterhub:
     proxy:
       secretToken: "<token-1>"
     hub:
       services:
         dask-gateway:
           apiToken: "<token-2>"

   dask-gateway:
     gateway:
       auth:
         jupyterhub:
           apiToken: "<token-2>"
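If you prefer to script this step, the tokens and the file can be generated
with Python's standard library; ``secrets.token_hex(32)`` produces the same
kind of value as ``openssl rand -hex 32``. A sketch mirroring the layout above:

.. code-block:: python

   import secrets

   # Each token is 32 random bytes, hex-encoded (64 characters)
   token_1 = secrets.token_hex(32)
   token_2 = secrets.token_hex(32)

   # token_2 intentionally appears twice: once for the JupyterHub service,
   # once for the Dask Gateway authenticator.
   secrets_yaml = f"""\
   jupyterhub:
     proxy:
       secretToken: "{token_1}"
     hub:
       services:
         dask-gateway:
           apiToken: "{token_2}"
   dask-gateway:
     gateway:
       auth:
         jupyterhub:
           apiToken: "{token_2}"
   """

   with open("secrets.yaml", "w") as f:
       f.write(secrets_yaml)
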

Now we're ready to install DaskHub:

.. code-block:: console

   $ helm upgrade --wait --install --render-subchart-notes \
       dhub dask/daskhub \
       --values=secrets.yaml

The output explains how to find the IPs for your JupyterHub deployment.

.. code-block:: console

   $ kubectl get service proxy-public
   NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
   proxy-public   LoadBalancer   10.43.249.239   35.202.158.223   443:31587/TCP,80:30500/TCP   2m40s

Creating a Dask Cluster
^^^^^^^^^^^^^^^^^^^^^^^

To create a Dask cluster on this deployment, users need to connect to the
Dask Gateway:

.. code-block:: python

   >>> from dask_gateway import GatewayCluster
   >>> cluster = GatewayCluster()
   >>> client = cluster.get_client()
   >>> cluster

Depending on the configuration, users may need to ``cluster.scale(n)`` to
get workers. See https://gateway.dask.org/ for more on Dask Gateway.
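Both manual and adaptive scaling are available on ``GatewayCluster``; running
either requires a live Gateway deployment, so this is only a sketch:

.. code-block:: python

   >>> cluster.scale(4)                      # request exactly 4 workers
   >>> cluster.adapt(minimum=1, maximum=10)  # or scale within bounds on demand
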

Matching the User Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dask clients will be running in the JupyterHub's singleuser environment. To ensure
that the same environment is used for the scheduler and workers, you can provide
it as a Gateway option and configure the ``singleuser`` environment to default
to the value set by JupyterHub.

.. code-block:: yaml

   # config.yaml
   jupyterhub:
     singleuser:
       extraEnv:
         DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'

   dask-gateway:
     extraConfig:
       optionHandler: |
         from dask_gateway_server.options import Options, Integer, Float, String

         def option_handler(options):
             if ":" not in options.image:
                 raise ValueError("When specifying an image you must also provide a tag")
             return {
                 "image": options.image,
             }

         c.Backend.cluster_options = Options(
             String("image", default="pangeo/base-notebook:2020.07.28", label="Image"),
             handler=option_handler,
         )

The user environment will need to include ``dask-gateway``. Any packages installed
manually after the ``singleuser`` pod started will not be included in the worker
environment.
17 changes: 11 additions & 6 deletions docs/source/setup/kubernetes.rst
particularly in the cloud. You can use Kubernetes to launch Dask workers in the
following two ways:

1. **Helm**:

   You can deploy Dask and (optionally) Jupyter or JupyterHub on Kubernetes
   easily using Helm_

   .. code-block:: bash

      helm repo add dask https://helm.dask.org/   # add the Dask Helm chart repository
      helm repo update                            # get latest Helm charts

      # For single-user deployments, use dask/dask
      helm install dask/dask                      # deploy standard Dask chart

      # For multi-user deployments, use dask/daskhub
      helm install dask/daskhub                   # deploy JupyterHub & Dask

This is a good choice if you want to do the following:

1. Run a managed Dask cluster for a long period of time
2. Also deploy a Jupyter / JupyterHub server from which to run code
3. Share the same Dask cluster between many automated services
4. Try out Dask for the first time on a cloud-based system
   like Amazon, Google, or Microsoft Azure where you already have
   a Kubernetes cluster. If you don't already have Kubernetes deployed,
   see our :doc:`Cloud documentation <cloud>`.

.. note::

