
Pod containing spawned server dies regularly #1430

Closed
tarekmehrez opened this issue Sep 30, 2019 · 8 comments

Comments

@tarekmehrez

tarekmehrez commented Sep 30, 2019

The spawned server for the authenticated user dies regularly, whether it is idle or actively running a notebook. Is that behaviour expected? Is there a configurable parameter that affects the lifespan of the spawned server?

All other pods (hub, proxy, puller, etc.) have been running for more than 30 days, so it is hard to figure out why the spawned servers keep being killed.

Helm chart version: jupyterhub-0.9-445a953

@consideRatio
Member

There is a culling mechanism; see the documentation about culling of user pods and note the default values:

cull:
  enabled: true
  users: false
  timeout: 3600
  every: 600
  concurrency: 10
  maxAge: 0
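
If this culling is the cause and the user servers should live longer, these values can be overridden in the chart configuration. A minimal sketch, assuming the usual config.yaml passed to helm upgrade (the 24-hour timeout is only an example; adjust the release name and chart reference to your deployment):

cull:
  enabled: true
  timeout: 86400   # seconds of inactivity before a server is culled (chart default: 3600)
  every: 600       # how often, in seconds, the culler runs

helm upgrade <your release> jupyterhub/jupyterhub --version <chart version> -f config.yaml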

But, is this why they are dying?

If you doubt that, inspect what is going on by looking at the logs of the hub or the user pod, if you can manage to capture them before the pod terminates and is no longer available. I think you will find culling activity in the hub log.

kubectl get -n <your namespace> deploy/hub
kubectl get -n <your namespace> pod/jupyter-someuser
kubectl logs -n <your namespace> deploy/hub
kubectl logs -n <your namespace> pod/jupyter-someuser
kubectl describe -n <your namespace> deploy/hub
kubectl describe -n <your namespace> pod/jupyter-someuser
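
If culling is what terminates the pods, the hub log should say so explicitly. A quick way to narrow the output down (the exact wording of the log lines may vary between versions):

kubectl logs -n <your namespace> deploy/hub | grep -i cull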

@tarekmehrez
Author

So even when a kernel is still running in the background, if the browser is closed and culling is enabled, these pods will be killed?
That might explain the behaviour I am seeing.
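
One way to check what the hub thinks is to ask its REST API for the user's recorded activity. A rough sketch, assuming an admin API token is available and <hub-address>/<username> are filled in:

curl -s -H "Authorization: token $API_TOKEN" http://<hub-address>/hub/api/users/<username>

The returned JSON includes a last_activity timestamp; if that timestamp stops advancing while a kernel is still busy, the culler will eventually treat the server as idle.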

@aguinaldoabbj

It's a reasonable mechanism. However, there seems to be a sync problem between JupyterHub and Kubernetes: the pod is culled, but JupyterHub appears to be unaware of that. When the user logs in again, a 503 error appears and the user can't spawn a new pod.
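
One way to see that mismatch directly is to compare the servers the hub believes are running with the pods that actually exist. A rough sketch, assuming an admin API token and the chart's default pod labels (the exact JSON fields can vary between JupyterHub versions):

# servers the hub believes are running
curl -s -H "Authorization: token $API_TOKEN" http://<hub-address>/hub/api/users \
  | jq -r '.[] | select(.server != null) | .name'

# single-user pods that actually exist
kubectl get pods -n <your namespace> -l component=singleuser-server

Any user listed by the first command but missing from the second is in the out-of-sync state described here.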

@consideRatio
Member

I never manage to get full clarity about this, but there are two cullers in play: one run by JupyterHub, and perhaps also one run by the Jupyter server that JupyterHub creates.

The culling done by JupyterHub does not communicate with KubeSpawner, which is the logic within JupyterHub that keeps track of which servers are running, and this may be the crux.

I don't think this is an issue in the latest version of the Helm chart though, or even in 0.8.2, as I think they run a check every 30 seconds to verify that everything is as it should be regarding users and their routes.

What version of the jupyterhub helm chart are you using where you experience this @aguinaldoabbj?

@aguinaldoabbj

Jupyterhub helm chart version 0.8.2 and app version 0.9.6 on Kubernetes 1.16.

@consideRatio
Member

:O Hmmm, can you run 0.8.2 on k8s 1.16? Is that your kubectl client version rather than the cluster itself?

@aguinaldoabbj

aguinaldoabbj commented Nov 27, 2019

No, I have upgraded the whole cluster to 1.16. The only big problem I had with that was hotfixed with the workaround suggested in jupyterhub/kubespawner#354

@consideRatio
Member

I'll go ahead and close this issue, as I don't have sufficient information to conclude on a concrete action. It is also an issue that I've not seen reported before, so I suspect it is unrelated to the Helm chart.
