
Use dask/daskhub helm chart #697

Merged
9 commits merged into pangeo-data:staging on Aug 31, 2020

Conversation

TomAugspurger
Member

No description provided.

@TomAugspurger
Member Author

I'm having some trouble connecting to the gateway from within the hub:

ClientConnectorSSLError: Cannot connect to host proxy-public:443 ssl:default [[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1076)]

That's connecting to http://proxy-public/services/dask-gateway/api/v1/clusters/. https://proxy-public/services/dask-gateway also doesn't work. Looking into it now.

@TomAugspurger
Member Author

I think this demonstrates the issue: On a singleuser pod in the kubernetes cluster, I want to make a request to http://proxy-public/services/dask-gateway/. When https is not enabled things work fine. I'm running into issues when it is enabled:

(notebook) jovyan@jupyter-tomaugspurger:~$ curl -LI http://proxy-public/services/dask-gateway/ -vv 
*   Trying 10.39.254.203:80...
* Connected to proxy-public (10.39.254.203) port 80 (#0)
> HEAD /services/dask-gateway/ HTTP/1.1
> Host: proxy-public
> User-Agent: curl/7.69.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 307 Temporary Redirect
HTTP/1.1 307 Temporary Redirect
< Location: https://proxy-public/services/dask-gateway/
Location: https://proxy-public/services/dask-gateway/
< Date: Wed, 26 Aug 2020 18:53:59 GMT
Date: Wed, 26 Aug 2020 18:53:59 GMT
< Content-Length: 18
Content-Length: 18
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8

< 
* Connection #0 to host proxy-public left intact
* Issue another request to this URL: 'https://proxy-public/services/dask-gateway/'
*   Trying 10.39.254.203:443...
* Connected to proxy-public (10.39.254.203) port 443 (#1)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /srv/conda/envs/notebook/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS alert, internal error (592):
* error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
* Closing connection 1
curl: (35) error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error
(notebook) jovyan@jupyter-tomaugspurger:~$ 

10.39.254.203 is the CLUSTER-IP for proxy-public.

$ kubectl -n staging get svc proxy-public
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
proxy-public   LoadBalancer   10.39.254.203   34.69.173.244   443:30970/TCP,80:30477/TCP   55d

@consideRatio do you have any guesses here (just off the top of your head, I'm happy to dig into this myself)? This is being used for the address passed to dask_gateway.Gateway().
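
For context, a minimal sketch of the call this address feeds into (the auth mode and the list_clusters call are illustrative assumptions, not the exact deployment code):

from dask_gateway import Gateway

# Sketch only: the address is the value under discussion; in the deployment it
# would normally come from Dask's configuration rather than being hard-coded.
gateway = Gateway(
    address="https://proxy-public/services/dask-gateway",
    auth="jupyterhub",
)
gateway.list_clusters()  # the request that fails with ClientConnectorSSLError above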

@consideRatio
Member

consideRatio commented Aug 27, 2020

Ah, so proxy-public is a service that switches its target to the autohttps pod when automatic cert acquisition and TLS termination are used. The flow is then: proxy-public svc → autohttps pod → proxy-http svc → proxy pod. So you want to send traffic to proxy-http instead if you have enabled the automatic HTTPS stuff.

Alternatively, use HTTPS traffic against the proxy-public svc for a detour through the autohttps pod.

@TomAugspurger
Member Author

Ohh thanks! I think when I tried proxy-http earlier I used http://proxy-http/services/dask-gateway:8000 instead of http://proxy-http:8000/services/dask-gateway. 😬

OK, so I'll need to figure out a semi-reliable way of detecting whether https is enabled, and if it is then we'll set the gateway address to http://proxy-http:8000/services/dask-gateway. Thanks!

@consideRatio
Member

If you are a pod in the namespace where the proxy-http service exists, you will have an env var named PROXY_HTTP_SVC, I think, or something like that; it's one of several env vars set by the k8s kubelet to help containers find the IPs etc. of the k8s services in the namespace they run in.

So, looking for these env vars indicates service availability, which tells you whether or not you should go there.

@TomAugspurger
Member Author

Thanks. I'm going to use PROXY_HTTP_SERVICE_HOST and PROXY_HTTP_SERVICE_PORT. Testing those out now but I think it'll work just fine.
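
Roughly the detection logic I have in mind (a sketch under the assumption that these kubelet-injected env vars are present exactly when the autohttps pod is deployed; not the exact change that went into the chart):

import os

# If the proxy-http service exists in this namespace, the kubelet injects
# PROXY_HTTP_SERVICE_HOST / PROXY_HTTP_SERVICE_PORT into every pod. Their
# presence means TLS termination happens in the autohttps pod, so we should
# talk plain HTTP to proxy-http instead of proxy-public.
host = os.environ.get("PROXY_HTTP_SERVICE_HOST")
port = os.environ.get("PROXY_HTTP_SERVICE_PORT")

if host and port:
    address = f"http://proxy-http:{port}/services/dask-gateway"
else:
    address = "http://proxy-public/services/dask-gateway"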

TomAugspurger added a commit to TomAugspurger/helm-chart-1 that referenced this pull request Aug 27, 2020
As discovered in
pangeo-data/pangeo-cloud-federation#697, the
current use of `proxy-public` doesn't work when https is enabled. We
detect this and use the appropriate service now.
TomAugspurger added a commit to dask/helm-chart that referenced this pull request Aug 27, 2020
* Handle https-enabled JupyterHub deployments

As discovered in
pangeo-data/pangeo-cloud-federation#697, the
current use of `proxy-public` doesn't work when https is enabled. We
detect this and use the appropriate service now.
@TomAugspurger marked this pull request as ready for review on August 27, 2020, 15:41
@TomAugspurger
Member Author

OK this should mostly be good to go.

@tjcrone could you update the ooi secrets files to change the top key from pangeo to daskhub?

diff --git a/deployments/gcp-uscentral1b/secrets/staging.yaml b/deployments/gcp-uscentral1b/secrets/staging.yaml
index 3b79dda..257cefe 100644
--- a/deployments/gcp-uscentral1b/secrets/staging.yaml
+++ b/deployments/gcp-uscentral1b/secrets/staging.yaml
@@ -1,4 +1,4 @@
-pangeo:
+daskhub:

Both ooi/secrets/staging.yaml and ooi/secrets.prod.yaml.

@TomAugspurger
Member Author

cc @scottyhq as well if you have any questions / concerns. In theory there shouldn't really be any changes, other than a few of the environment variables being set for us automatically now.

@scottyhq
Member

Awesome, thanks for making this happen! Linking to pangeo-data/helm-chart#129 for future reference. But merge away!

mem_guarantee: 25G
environment: {'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility'}
tolerations: [{'key': 'nvidia.com/gpu','operator': 'Equal','value': 'present','effect': 'NoSchedule'}]
extra_resource_limits: {"nvidia.com/gpu": "1"}
Member

I think applying this resource request will make the toleration be applied automatically by some controller in k8s, but it won't hurt to also apply the toleration manually.

Member Author

These are from a merge conflict, but @scottyhq might want to take a look at the comment :)

Member

If I understand correctly, specifying extra_resource_limits: {"nvidia.com/gpu": "1"} automatically sets tolerations: [{'key': 'nvidia.com/gpu','operator': 'Equal','value': 'present','effect': 'NoSchedule'}]?

Member

Honestly, these settings were copied over from GCP and we never did too much experimentation to see what was necessary or not. Also, things may have changed with more recent AMI versions and CUDA setups. This issue has some additional details: jupyterhub/zero-to-jupyterhub-k8s#994 (comment)
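
For reference, a hedged sketch (a Python dict in kubespawner profile_list terms) of how the settings discussed above fit together; the display name, description, and image are placeholders, not the exact values in this deployment:

# Illustrative only: a profile entry carrying both the GPU resource limit and
# the matching toleration. On clusters where the ExtendedResourceToleration
# admission controller is enabled, the toleration may be added automatically
# when nvidia.com/gpu is requested, but setting it explicitly does no harm.
gpu_profile = {
    "display_name": "ML notebook (GPU)",        # placeholder name
    "description": "GPU-enabled image; shut the server down when done",
    "kubespawner_override": {
        "image": "pangeo/ml-notebook:master",   # placeholder image
        "mem_guarantee": "25G",
        "environment": {"NVIDIA_DRIVER_CAPABILITIES": "compute,utility"},
        "extra_resource_limits": {"nvidia.com/gpu": "1"},
        "tolerations": [
            {
                "key": "nvidia.com/gpu",
                "operator": "Equal",
                "value": "present",
                "effect": "NoSchedule",
            }
        ],
    },
}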

kubespawner_override:
image: pangeo/base-notebook:master
- display_name: "Staging ML-notebook"
description: "https://github.com/pangeo-data/pangeo-docker-images/tree/master/ml-notebook"
Member

I think the title or description should indicate that you get a GPU machine, as that influences how the user may want to manually shut down the pod to save some money.

Member Author

Good call. Maybe "ML Notebook with GPU, please only use if you need it ;)"

@TomAugspurger mentioned this pull request on Aug 31, 2020
@TomAugspurger merged commit 941e17b into pangeo-data:staging on Aug 31, 2020