Dask gateway configuration problems on pangeo-xxlarge platform #3

Closed
guillaumeeb opened this issue Aug 11, 2022 · 7 comments

@guillaumeeb
Member

guillaumeeb commented Aug 11, 2022

I think this has already been said, but as I'm currently reviewing notebooks on the infrastructure, I thought I'd open issues to record the problems.

So first, the Dashboard link is not working.

Clicking on the generated Dashboard link, for instance Dashboard: [/services/dask-gateway/clusters/daskhub.e9bff8eab5134c32a5db353c5655c1f1/status](https://pangeo-xxlarge.vm.fedcloud.eu/services/dask-gateway/clusters/daskhub.e9bff8eab5134c32a5db353c5655c1f1/status) leads to a 404 error.

Connecting a client to the cluster generates a version mismatch:

/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/client.py:1274: VersionMismatchWarning: Mismatched versions found

+---------+----------------+----------------+----------------+
| Package | client         | scheduler      | workers        |
+---------+----------------+----------------+----------------+
| lz4     | 4.0.0          | None           | None           |
| pandas  | 1.4.2          | None           | None           |
| python  | 3.9.13.final.0 | 3.10.5.final.0 | 3.10.5.final.0 |
+---------+----------------+----------------+----------------+
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))

This is probably because the Docker image JupyterHub uses for the single-user notebook is not the same as the one dask-gateway uses for the workers.
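To make the warning above easier to read, here is a minimal sketch that lists which packages differ between client, scheduler and workers. The function name `find_version_mismatches` and the input layout are hypothetical and chosen only for illustration; the version strings are copied from the table above. With a live cluster, `client.get_versions(check=True)` from `distributed` should surface the same kind of information directly.

```python
def find_version_mismatches(versions):
    """Return {package: {role: version}} for packages whose versions differ.

    `versions` maps a role name ("client", "scheduler", "workers") to a
    {package: version} dict. A package missing from a role counts as None.
    This layout is an assumption made for the example.
    """
    packages = sorted({pkg for per_pkg in versions.values() for pkg in per_pkg})
    mismatches = {}
    for pkg in packages:
        per_role = {role: per_pkg.get(pkg) for role, per_pkg in versions.items()}
        # More than one distinct value means the roles disagree.
        if len(set(per_role.values())) > 1:
            mismatches[pkg] = per_role
    return mismatches


# Values copied from the VersionMismatchWarning table above.
reported = {
    "client": {"lz4": "4.0.0", "pandas": "1.4.2", "python": "3.9.13.final.0"},
    "scheduler": {"lz4": None, "pandas": None, "python": "3.10.5.final.0"},
    "workers": {"lz4": None, "pandas": None, "python": "3.10.5.final.0"},
}

for package, per_role in find_version_mismatches(reported).items():
    print(package, per_role)
```

If the notebook and the workers ran from the same image, the returned dictionary would be expected to be empty.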

@guillaumeeb changed the title from "Dask Dashboard is not accessible on pangeo-xxlarge platform" to "Dask gateway configuration problems on pangeo-xxlarge platform" on Aug 11, 2022
@tinaok
Collaborator

tinaok commented Aug 11, 2022

Yes, this is because the Docker image used for JupyterLab and the Dask workers' image are not the same.
But the warning did not prevent the job from running when I last tried. (It is also a good example we can use in the tutorial to show and explain distributed computing.)

What was problematic with this configuration when I tried it for the tutorial last week:

  • the dask-gateway password is not linked to DaskHub (in last week's configuration, one user could shut down another user's Dask cluster)
  • the JupyterHub proxy is not working, so we cannot use dask-labextension

@j34ni or @annefou might have an update on this, but your experience with kubectl/JupyterHub might help?

@j34ni
Collaborator

j34ni commented Aug 11, 2022

I agree that Dask Gateway authenticating with a password instead of through JupyterHub is an issue in the longer term.
However, I find it an advantage for the workshop, because we will be able to shut down clusters left running by participants (or multiple clusters opened by mistake) and so release resources.

@j34ni
Collaborator

j34ni commented Aug 11, 2022

@guillaumeeb: The dashboard link now works with the latest versions of the setup (pangeo-foss4g, for instance) when we also install Grafana

@guillaumeeb
Member Author

> Yes it is due to the docker image on jupyterlab and Dask worker's image is not the same image.
> But the warning did not prevent the job run when I tried last time. (and it is a good explanation we can use for tutorial, to show /understand distributed computing)

As I said in pangeo-data/foss4g-2022#20 (comment), I really think the images, at least, should be the same. Even if the versions are close enough in this case for the Client/Cluster to work, this is bad practice and can often cause unexpected errors.

About the dask-gateway authentication, I concur with @j34ni: this is really not an issue for the workshop, but it should be addressed in the longer term.

And if the Dashboard link now works, that's great! @j34ni, should I go back to the pangeo-foss4g instance to test things?

@guillaumeeb
Member Author

@j34ni I finally logged in to the front VM of the pangeo-foss4g deployment and looked at the values.yaml content produced by the following command:

sudo helm get values daskhub -n daskhub

It looks like dask-gateway is not enabled on this instance, is that correct?

dask-gateway:
  enabled: false
  gateway:
    auth:
      simple:
        password: pangeo_dask
      type: simple
dask-kubernetes:
  enabled: true
jupyterhub:
  hub:
    baseUrl: /jupyterhub/
    config:
      GenericOAuthenticator:
        allowed_groups:
        - urn:mace:egi.eu:group:vo.pangeo.eu:role=member#aai.egi.eu
        authorize_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/auth
        claim_groups_key: eduperson_entitlement
        client_id: id
        client_secret: secret
        login_service: EGI Check-In
        oauth_callback_url: https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/hub/oauth_callback
        scope:
        - openid
        - email
        - profile
        - eduperson_entitlement
        token_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/token
        userdata_params:
          state: state
        userdata_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo
        username_key: preferred_username
      JupyterHub:
        authenticator_class: generic-oauth
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
    enabled: true
  proxy:
    secretToken: hash
  singleuser:
    cpu:
      guarantee: 2
      limit: 4
    image:
      name: pangeo/ml-notebook
      tag: latest
    memory:
      guarantee: 4G
      limit: 16G

Would it be possible to run some tests on one instance or the other, or do you prefer to keep things as they are? Currently, I don't have access to the pangeo-xxlarge platform.
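For repeated checks of this kind, a small script can pull out the settings discussed in this thread. This is only a sketch: it assumes the values are exported as JSON (for example with `sudo helm get values daskhub -n daskhub -o json`), and the function name `summarize_daskhub_values` is hypothetical. `helm get values` reports user-supplied values only, so a missing key is returned as `None` here rather than guessed from the chart defaults.

```python
import json


def summarize_daskhub_values(values_json):
    """Summarize gateway and image settings from Helm values given as a JSON string."""
    values = json.loads(values_json)
    image = values.get("jupyterhub", {}).get("singleuser", {}).get("image", {})
    return {
        # None means the key was not set by the user (chart default applies).
        "dask_gateway_enabled": values.get("dask-gateway", {}).get("enabled"),
        "dask_kubernetes_enabled": values.get("dask-kubernetes", {}).get("enabled"),
        "singleuser_image": f"{image.get('name')}:{image.get('tag')}",
    }


# Trimmed-down copy of the values shown above, converted to JSON.
sample = json.dumps({
    "dask-gateway": {"enabled": False},
    "dask-kubernetes": {"enabled": True},
    "jupyterhub": {
        "singleuser": {"image": {"name": "pangeo/ml-notebook", "tag": "latest"}}
    },
})

print(summarize_daskhub_values(sample))
```

The reported single-user image can then be compared with whatever image the Dask workers are configured to use, which is the mismatch discussed earlier in this issue.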

@j34ni
Collaborator

j34ni commented Aug 11, 2022

@guillaumeeb: I did not manage to get a working infrastructure with EGI Check-in, a dask-gateway, and increased CPU and memory limits all at the same time.
The values.yaml you produced is what was in my email from Tue 2022-08-09 10:30.
Feel free to modify it and run as many tests as you want on pangeo-foss4g.

@guillaumeeb
Member Author

Closing this one as solved.
