Pod Evictions (memory, node shutdowns) - requires manual hub restart! #223

Closed
consideRatio opened this issue Aug 1, 2018 · 19 comments

@consideRatio
Member

About

We have an issue where, when a pod is removed unexpectedly, a hub restart seems to be required. This can happen if a node fails, is preempted, or an admin deletes a user pod, I think. This is really bad, I'd say, as it requires the affected user to contact an administrator, and the administrator then has to restart the hub to resolve it.

I deem this vital to fix, as preemptible nodes (Google) and spot instances (Amazon) are an awesome way to reduce cost, but using them currently risks causing this kind of huge trouble!

I'm still lacking some insight into what's going on in the proxy and in JupyterHub's user management though, so I'm hoping someone can pick this issue up from my writeup. @minrk or @yuvipanda, perhaps you know someone to /cc, or could give me a pointer on where to look to solve this?

Experience summary

  1. Pod dies
  2. "503 - Service Unavailable"
  3. hub/admin stop pod
  4. "Your server is stopping"
  5. Distracting consequence
  6. Issue still not resolved
  7. Hub restart
  8. Everything is fine

Experience log

1. A preemptible node was reclaimed and my user pod was lost

  • The user pod name was erik-2esundell
  • The user name was erik.sundell

2. Later - I visit the hub

[screenshot]


3. Directly after - I visit hub/admin and press the stop server button

  • The button turns blue and is labeled start server again

[screenshot]

  • The hub logs say
[I 2018-08-01 09:36:05.670 JupyterHub proxy:264] Removing user erik.sundell from proxy (/user/erik.sundell/)                                                                                                                 
[I 2018-08-01 09:36:05.675 JupyterHub spawner:1644] Deleting pod jupyter-erik-2esundell                                                                                                                                      
[W 2018-08-01 09:36:05.688 JupyterHub spawner:1657] No pod jupyter-erik-2esundell to delete. Assuming already deleted.                                                                                                       
[W 2018-08-01 09:36:15.671 JupyterHub base:751] User erik.sundell: server is slow to stop
[I 2018-08-01 09:36:15.672 JupyterHub log:158] 202 DELETE /hub/api/users/erik.sundell/server (erik.sundell@10.20.0.1) 10013.08ms
  • The proxy (CHP) logs say
09:36:05.673 - info: [ConfigProxy] Removing route /user/erik.sundell
09:36:05.674 - info: [ConfigProxy] 204 DELETE /api/routes/user/erik.sundell
  • If I press start server, this is what is shown
    [screenshot]

4. I revisit my singleuser server

  • What I see
    [screenshot]

  • What the hub log says after a while

[I 2018-08-01 09:36:50.337 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:36:52.736 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:36:57.838 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:36:58.940 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:04.053 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:09.154 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:14.235 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:19.374 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:24.450 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:29.546 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:34.699 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:36.529 JupyterHub proxy:301] Checking routes                                                                                                                                                             
[I 2018-08-01 09:37:39.774 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:44.877 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:49.953 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:37:55.064 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:38:00.139 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:38:05.262 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:38:10.430 JupyterHub base:978] erik.sundell is pending stop                                                                                                                                                 
[I 2018-08-01 09:38:15.512 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:20.614 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:25.705 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:28.067 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:29.443 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:34.546 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:36.529 JupyterHub proxy:301] Checking routes
[I 2018-08-01 09:38:39.654 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:44.747 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:49.873 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:38:54.987 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:00.088 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:05.170 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:10.253 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:15.341 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:20.452 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:25.532 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:30.642 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:35.720 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:36.529 JupyterHub proxy:301] Checking routes
[I 2018-08-01 09:39:40.825 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:45.924 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:51.031 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:39:56.143 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:01.262 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:06.360 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:11.459 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:14.980 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:17.269 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:22.437 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:27.544 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:32.660 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:36.529 JupyterHub proxy:301] Checking routes
[I 2018-08-01 09:40:37.750 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:42.819 JupyterHub base:978] erik.sundell is pending stop
[I 2018-08-01 09:40:47.880 JupyterHub base:978] erik.sundell is pending stop

[E 2018-08-01 09:40:50.170 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.stop_single_user.<locals>.stop() done, defined at /usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py:730> exception=TimeoutError('pod/jupyter-erik-2esundell did not disappear in 300 seconds!',)> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 740, in stop
        await user.stop(name)
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 548, in stop
        await spawner.stop()
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1664, in stop
        timeout=self.start_timeout
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/utils.py", line 155, in exponential_backoff
        raise TimeoutError(fail_message)
    TimeoutError: pod/jupyter-erik-2esundell did not disappear in 300 seconds!

5. I look again

  • This is now what I see
    [screenshot]

  • I think this is because of excessive refreshing of the pending-stop page, once for each of the log entries shown above.


6. I refresh

  • This is what I see
    [screenshot]

  • This is what the proxy (CHP) logs say - a route was added, but it is unreachable

09:41:36.531 - info: [ConfigProxy] Adding route /user/erik.sundell -> http://10.20.1.4:8888
09:41:36.532 - info: [ConfigProxy] 201 POST /api/routes/user/erik.sundell 
09:42:36.528 - info: [ConfigProxy] 200 GET /api/routes 
09:43:36.528 - info: [ConfigProxy] 200 GET /api/routes 
09:44:36.529 - info: [ConfigProxy] 200 GET /api/routes 
09:45:36.529 - info: [ConfigProxy] 200 GET /api/routes 
09:46:36.529 - info: [ConfigProxy] 200 GET /api/routes 
09:47:36.529 - info: [ConfigProxy] 200 GET /api/routes 
09:48:19.959 - error: [ConfigProxy] 503 GET /user/erik.sundell/ Error: connect EHOSTUNREACH 10.20.1.4:8888
    at Object.exports._errnoException (util.js:1020:11)
    at exports._exceptionWithHostPort (util.js:1043:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1086:14)

7. I restart the hub

  • The hub logs after restart
[I 2018-08-01 10:17:10.079 JupyterHub app:1667] Using Authenticator: builtins.GenericAuthenticator
[I 2018-08-01 10:17:10.079 JupyterHub app:1667] Using Spawner: kubespawner.spawner.KubeSpawner
[I 2018-08-01 10:17:10.080 JupyterHub app:1010] Loading cookie_secret from env[JPY_COOKIE_SECRET]
[W 2018-08-01 10:17:10.155 JupyterHub app:1129] JupyterHub.hub_connect_port is deprecated as of 0.9. Use JupyterHub.hub_connect_url to fully specify the URL for connecting to the Hub.
[I 2018-08-01 10:17:10.184 JupyterHub app:1199] Not using whitelist. Any authenticated user will be allowed.
[I 2018-08-01 10:17:10.237 JupyterHub reflector:176] watching for pods with label selector component=singleuser-server / field selector  in namespace jh
[W 2018-08-01 10:17:10.263 JupyterHub app:1460] erik.sundell appears to have stopped while the Hub was down
[W 2018-08-01 10:17:10.317 JupyterHub app:1513] Deleting OAuth client jupyterhub-user-erik-sundell
[I 2018-08-01 10:17:10.329 JupyterHub app:1849] Hub API listening on http://0.0.0.0:8081/hub/
[I 2018-08-01 10:17:10.329 JupyterHub app:1851] Private Hub API connect url http://10.0.8.147:8081/hub/
[I 2018-08-01 10:17:10.330 JupyterHub app:1864] Not starting proxy
[I 2018-08-01 10:17:10.334 JupyterHub proxy:301] Checking routes
[W 2018-08-01 10:17:10.335 JupyterHub proxy:363] Deleting stale route /user/erik.sundell/
[I 2018-08-01 10:17:10.337 JupyterHub app:1906] JupyterHub is now running at http://10.0.2.156:80/
[I 2018-08-01 10:18:10.341 JupyterHub proxy:301] Checking routes

8. I login again and everything works

  • The hub logs after restart and login
[I 2018-08-01 10:21:49.143 JupyterHub base:499] User logged in: erik.sundell
[I 2018-08-01 10:21:49.145 JupyterHub log:158] 302 GET /hub/oauth_callback?code=[secret]&state=[secret]&session_state=[secret] -> /user/erik.sundell/ (@10.20.0.1) 630.45ms
[I 2018-08-01 10:21:49.185 JupyterHub log:158] 302 GET /user/erik.sundell/ -> /hub/user/erik.sundell/ (@10.20.0.1) 1.56ms
[I 2018-08-01 10:21:49.322 JupyterHub spawner:1550] PVC claim-erik-2esundell already exists, so did not create new pvc.
[W 2018-08-01 10:21:59.231 JupyterHub base:679] User erik.sundell is slow to start (timeout=10)
[I 2018-08-01 10:21:59.231 JupyterHub base:1016] erik.sundell is pending spawn
[I 2018-08-01 10:21:59.238 JupyterHub log:158] 200 GET /hub/user/erik.sundell/ (erik.sundell@10.20.0.1) 10015.25ms
[I 2018-08-01 10:22:00.703 JupyterHub log:158] 200 GET /hub/api (@10.20.1.4) 0.75ms
[I 2018-08-01 10:22:03.854 JupyterHub base:628] User erik.sundell took 14.623 seconds to start
[I 2018-08-01 10:22:03.854 JupyterHub proxy:242] Adding user erik.sundell to proxy /user/erik.sundell/ => http://10.20.1.4:8888
[I 2018-08-01 10:22:03.857 JupyterHub users:510] Server erik.sundell is ready
@jgerardsimcock

I can confirm that we experience this frequently and my solution is always to restart the hub. We are on GKE.

@consideRatio changed the title from "Unexpected pod deaths - Super troubling UX" to "Pod Evictions (memory, node shutdowns) - requires manual hub restart!" Aug 9, 2018
@consideRatio
Member Author

consideRatio commented Aug 9, 2018

@minrk @betatim @choldgraf this looks suspicious to me

1. Only DELETED state

We don't look for an EVICTED state, for example. I can look into exactly what states a pod can be in and which states overlap, because they will: for example, a pod is considered Pending even while its status shows ContainerCreating, as I recall. Perhaps we can find one check that covers both DELETED and EVICTED at the same time?

if ev['type'] == 'DELETED':
    # This is an atomic delete operation on the dictionary!
    self.resources.pop(resource.metadata.name, None)
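For context, a Kubernetes watch only emits ADDED, MODIFIED and DELETED event types; an eviction arrives as a MODIFIED event whose pod has phase Failed and reason Evicted. A minimal sketch of how the handler could treat that case like a delete (hypothetical, not the current kubespawner code):

resource = ev['object']
if ev['type'] == 'DELETED':
    # The pod is gone from the API server: drop it from the cache.
    self.resources.pop(resource.metadata.name, None)
elif getattr(resource.status, 'reason', None) == 'Evicted':
    # Hypothetical handling: treat an evicted pod as gone as well,
    # so poll() reports it as stopped instead of still present.
    self.resources.pop(resource.metadata.name, None)
else:
    # Keep the cache up to date for ADDED / MODIFIED events.
    self.resources[resource.metadata.name] = resource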

2. Abort in finally

The finally clause has a stop and a break in it, even though the except clauses have continue etc... Hmmm... UPDATE: oops, okay, hmmm, perhaps not an issue after all, I jumped the gun.

finally:
    w.stop()
    if self._stop_event.is_set():
        self.log.info("%s watcher stopped", self.kind)
        break

For reference, this is how the control flow goes in a try / except / else / finally:

Try -> Except -> Finally

[flow diagram]

Try -> Else -> Finally

[flow diagram]

@consideRatio
Member Author

@minrk helped me narrow the space where this needs to be fixed:

spawner.py@poll, or the functions called from it, including spawner.py@stop.

@kakal

kakal commented Mar 15, 2019

Hi, @consideRatio
Is there any progress on this issue?
I have the same problem and have to restart the hub manually.
Could you please give me some suggestions for this defect?

@consideRatio
Member Author

Hi @kakal! I'm happy to see some activity for this issue as I think it is very important. I have not done any additional work to resolve this since. Hoping to get back to it.

@consideRatio
Member Author

consideRatio commented Oct 9, 2019

From check_spawner, kubespawner's poll is invoked, and after it returns check_spawner will decide to kill the evicted pod.

def poll(self):
    """
    Check if the pod is still running.

    Uses the same interface as subprocess.Popen.poll(): if the pod is
    still running, returns None. If the pod has exited, return the
    exit code if we can determine it, or 1 if it has exited but we
    don't know how. These are the return values JupyterHub expects.

    Note that a clean exit will have an exit code of zero, so it is
    necessary to check that the returned value is None, rather than
    just Falsy, to determine that the pod is still running.
    """
    # have to wait for first load of data before we have a valid answer
    if not self.pod_reflector.first_load_future.done():
        yield self.pod_reflector.first_load_future
    data = self.pod_reflector.pods.get(self.pod_name, None)
    if data is not None:
        if data.status.phase == 'Pending':
            return None
        ctr_stat = data.status.container_statuses
        if ctr_stat is None:  # No status, no container (we hope)
            # This seems to happen when a pod is idle-culled.
            return 1
        for c in ctr_stat:
            # return exit code if notebook container has terminated
            if c.name == 'notebook':
                if c.state.terminated:
                    # call self.stop to delete the pod
                    if self.delete_stopped_pods:
                        yield self.stop(now=True)
                    return c.state.terminated.exit_code
                break
        # None means pod is running or starting up
        return None
    # pod doesn't exist or has been deleted
    return 1

What needs to happen is that kubespawner or the hub realizes that pods have become evicted.

Questions raised

  • Is there a mechanism within kubespawner, or JupyterHub, that repeatedly probes whether the spawned servers are running? Yes, JupyterHub only: it calls the spawner's poll method every 30 seconds, according to the default poll_interval (see the config sketch after this list).
  • Is there a mechanism within kubespawner that would notice if a spawned server shuts down? If so, does it catch evictions as well? Maybe...
    • poll will shut down the pod with self.stop(now=True) if it finds a pod whose notebook container is in a terminated state. But it could also simply return 1, indicating an exit code of 1 (as if it were a system process).
    • Followup question: would reporting 1 back from poll when the pod has been evicted lead to the observed behavior?
  • Is it kubespawner's poll that stops the server, or JupyterHub, or both asynchronously?
  • Would an evicted pod show up in the list of self.pod_reflector.pods?
  • Is poll called in conjunction with a spawn somehow?
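As a note on the first question above: the polling frequency is a standard Spawner trait, so it can be tuned from jupyterhub_config.py. A minimal sketch (30 seconds is the default):

# jupyterhub_config.py
# How often (in seconds) JupyterHub calls the spawner's poll() to check
# whether each single-user server is still running.
c.Spawner.poll_interval = 30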

@consideRatio
Member Author

consideRatio commented Oct 9, 2019

The pod reflector - code base investigation

The pod reflector will watch and update itself, and it will only remove a pod from its list if it receives a DELETED watch event for it.

@consideRatio
Member Author

consideRatio commented Oct 9, 2019

A Python 3 memory bomb for debugging

No... This failed. The process was instead killed. Hmmm... I ran into the "out of memory killer" or "OOM killer".

#!/usr/bin/env python

import sys
import time

if len(sys.argv) != 2:
    print("usage: fillmem <number-of-megabytes>")
    sys.exit()

count = int(sys.argv[1])

# A tuple of 128K references: roughly 1 MiB of pointer storage on 64-bit Python.
megabyte = (0,) * (1024 * 128)

# Repeating the tuple `count` times allocates roughly `count` MiB at once.
data = megabyte * count

# Hold on to the memory and idle so it is not released.
while True:
    time.sleep(1)

[screenshot]
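As an aside, for anyone trying to reproduce an eviction rather than a container-level OOM kill, allocating memory gradually may work better, since node memory pressure can then build up before a cgroup limit is hit. A rough sketch (a hypothetical debugging aid; whether it triggers an eviction or the OOM killer depends on the pod's requests/limits and the node's state):

#!/usr/bin/env python3
# Allocate memory in small steps and hold on to it, printing progress,
# so memory pressure builds up gradually instead of all at once.
import sys
import time

step_mb = int(sys.argv[1]) if len(sys.argv) > 1 else 64

blocks = []
while True:
    blocks.append(bytearray(step_mb * 1024 * 1024))  # keep a reference so it is not freed
    print("allocated ~%d MB" % (len(blocks) * step_mb), flush=True)
    time.sleep(5)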

@rkdarst

rkdarst commented Oct 9, 2019

For reference, I'm quite sure I'm seeing this too. I don't have anything extra to help the debugging, other than that the database still showed pod_name in the state dict when the pod wasn't running anymore. Next week I'll be able to investigate more; I can't do more this week.

@consideRatio
Member Author

I think what happens is that we consider the pod terminated, but it remains in a Failed state.

kubernetes/kubernetes#54525 (comment)

In the poll function there is this section; note the check of ctr_stat. I think container statuses are missing for pods that have entered a Failed or Succeeded state where all containers have terminated, according to the specifications of these pod phases.

if data.status.phase == 'Pending':
    return None
ctr_stat = data.status.container_statuses
if ctr_stat is None:  # No status, no container (we hope)
    # This seems to happen when a pod is idle-culled.
    return 1

If a pod is evicted due to memory pressure etc., the pod will enter a Failed state. The key difference from being culled into a Succeeded (?) state is that a pod in the Failed state will remain and not be cleaned up quickly. It will have a pod-phase reason of "Evicted", I think. So if we want to solve this surgically, I think we should look for that specific scenario and decide what to do based on it. If we delete the pod we may lose some debugging information, but if we don't, we will fail to spawn new pods, since the old pod blocks the pod name, I think.

Perhaps we should look for this case, log stuff of relevance, and then delete it?
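A rough sketch of what that could look like inside poll(), assuming data is the cached V1Pod from the pod reflector (this is an illustration of the idea above, not the current kubespawner code):

if data.status.phase in ('Failed', 'Succeeded'):
    # e.g. phase=Failed, reason=Evicted,
    # message="The node was low on resource: memory. ..."
    self.log.warning(
        "Pod %s is in phase %s (reason=%s): %s",
        self.pod_name, data.status.phase,
        data.status.reason, data.status.message,
    )
    if self.delete_stopped_pods:
        # Delete the terminated pod so the pod name is freed up again.
        yield self.stop(now=True)
    return 1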

@consideRatio
Member Author

consideRatio commented Oct 10, 2019

Below are three pods that will remain after they stop. I can imagine they cause trouble by not being deleted: they remain listed in kubectl get pods, for example, and they block creation of new user pods, since a new user pod would have the same name as the old one.

Observations

  • Any pod in the Succeeded or Failed phase will probably remain for a while and block that pod name from being reused by future pods.
  • The Failed, Evicted pods lack a State for their container; instead they have a pod-level reason and message, as compared to a container-level reason.
  • This official documentation is the best reference on pod and container state etc.
  • There are subtle differences in the structure of kubectl describe and kubectl get -o yaml. For example, describe folds the pod's container-related status fields into the container information, so state that seems to be missing in kubectl describe may still show up in kubectl get -o yaml.

A pod quit - completes (pod phase: Succeeded)

# Created by: kubectl run ubuntu-success --image=ubuntu --restart=Never -- /bin/bash -c 'sleep 10'
#
# Inspect by: kubectl describe pod ubuntu-success
#
Name:         ubuntu-success
Namespace:    jh-dev
Priority:     0
Node:         jh-dev-control-plane/172.19.0.1
Start Time:   Thu, 10 Oct 2019 09:50:27 +0200
Labels:       run=ubuntu-success
Annotations:  <none>
Status:       Succeeded
IP:           192.168.194.144
Containers:
  ubuntu-success:
    Container ID:  containerd://f2b1ae47a6ea4a848f9e433a4ff7eafdd991c7c15a2f15724fba461c4d044959
    Image:         ubuntu
    Image ID:      docker.io/library/ubuntu@sha256:b88f8848e9a1a4e4558ba7cfc4acc5879e1d0e7ac06401409062ad2627e6fb58
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/bash
      -c
      sleep 10
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 10 Oct 2019 09:50:30 +0200
      Finished:     Thu, 10 Oct 2019 09:50:40 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4bc55 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-4bc55:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4bc55
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                           Message
  ----    ------     ----  ----                           -------
  Normal  Scheduled  90s   default-scheduler              Successfully assigned jh-dev/ubuntu-success to jh-dev-control-plane
  Normal  Pulling    89s   kubelet, jh-dev-control-plane  Pulling image "ubuntu"
  Normal  Pulled     87s   kubelet, jh-dev-control-plane  Successfully pulled image "ubuntu"
  Normal  Created    87s   kubelet, jh-dev-control-plane  Created container ubuntu-success
  Normal  Started    87s   kubelet, jh-dev-control-plane  Started container ubuntu-success

A pod quit - errors (pod phase: Failed)

# Created by: kubectl run ubuntu --image=ubuntu --restart=Never -- /bin/bash -c 'sleep 30; time-to-error-this-is-a-bogus-command-not-to-be-found'
#
# Inspect by: kubectl describe pods ubuntu
#
Name:         ubuntu
Namespace:    jh-dev
Priority:     0
Node:         jh-dev-control-plane/172.19.0.1
Start Time:   Thu, 10 Oct 2019 09:40:54 +0200
Labels:       run=ubuntu
Annotations:  <none>
Status:       Failed
IP:           192.168.194.143
Containers:
  ubuntu:
    Container ID:  containerd://2432fa15a30413c2ca24bb7a39e9a94c542d2c09a67f71caba3d45867acd05e0
    Image:         ubuntu
    Image ID:      docker.io/library/ubuntu@sha256:b88f8848e9a1a4e4558ba7cfc4acc5879e1d0e7ac06401409062ad2627e6fb58
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/bash
      -c
      sleep 30; eeasldfj
    State:          Terminated
      Reason:       Error
      Exit Code:    127
      Started:      Thu, 10 Oct 2019 09:41:00 +0200
      Finished:     Thu, 10 Oct 2019 09:41:30 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4bc55 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-4bc55:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4bc55
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                           Message
  ----    ------     ----   ----                           -------
  Normal  Scheduled  6m41s  default-scheduler              Successfully assigned jh-dev/ubuntu to jh-dev-control-plane
  Normal  Pulling    6m40s  kubelet, jh-dev-control-plane  Pulling image "ubuntu"
  Normal  Pulled     6m36s  kubelet, jh-dev-control-plane  Successfully pulled image "ubuntu"
  Normal  Created    6m35s  kubelet, jh-dev-control-plane  Created container ubuntu
  Normal  Started    6m35s  kubelet, jh-dev-control-plane  Started container ubuntu

A pod quit - evicted (pod phase: Failed, reason: Evicted)

Name:               proprietary-crawler-preemptive-79f7cbc6cf-rzck9
Namespace:          proprietary
Priority:           0
PriorityClassName:  <none>
Node:               gke-general-preemptible-40c94gb-pool-35dd7b4d-mv7w/
Start Time:         Fri, 26 Apr 2019 03:55:55 -0300
Labels:             app.kubernetes.io/instance=proprietary-crawler
                    app.kubernetes.io/name=proprietary-crawler
                    date=1544806882
                    pod-template-hash=3593767279
Annotations:        <none>
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: memory. Container proprietary-crawler was using 9402820Ki, which exceeds its request of 7G. 
IP:                 
Controlled By:      ReplicaSet/proprietary-crawler-preemptive-79f7cbc6cf
Containers:
  proprietary-crawler:
    Image:      <redacted>
    Port:       3333/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     8
      memory:  10G
    Requests:
      cpu:      4
      memory:   7G
    Liveness:   http-get http://:3333/health/proprietary-crawler/health delay=60s timeout=1s period=3s #success=1 #failure=5
    Readiness:  http-get http://:3333/health/proprietary-crawler/health delay=60s timeout=1s period=3s #success=1 #failure=3
    Environment:
      s_env:              prod
      JAVA_TOOL_OPTIONS:  -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Xms4g -Xmx8g
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cwcqv (ro)
Volumes:
  default-token-cwcqv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cwcqv
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  environment=production
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

kubectl get -o yaml pods

Succeeded

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-10-10T07:50:27Z"
  labels:
    run: ubuntu-success
  name: ubuntu-success
  namespace: jh-dev
  resourceVersion: "38440"
  selfLink: /api/v1/namespaces/jh-dev/pods/ubuntu-success
  uid: 0bc3db6d-68e4-42b2-af68-029d5108f81c
spec:
  containers:
  - args:
    - /bin/bash
    - -c
    - sleep 10
    image: ubuntu
    imagePullPolicy: Always
    name: ubuntu-success
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bc55
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: jh-dev-control-plane
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-4bc55
    secret:
      defaultMode: 420
      secretName: default-token-4bc55
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:50:27Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:50:41Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:50:41Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:50:27Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f2b1ae47a6ea4a848f9e433a4ff7eafdd991c7c15a2f15724fba461c4d044959
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:b88f8848e9a1a4e4558ba7cfc4acc5879e1d0e7ac06401409062ad2627e6fb58
    lastState: {}
    name: ubuntu-success
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://f2b1ae47a6ea4a848f9e433a4ff7eafdd991c7c15a2f15724fba461c4d044959
        exitCode: 0
        finishedAt: "2019-10-10T07:50:40Z"
        reason: Completed
        startedAt: "2019-10-10T07:50:30Z"
  hostIP: 172.19.0.1
  phase: Succeeded
  podIP: 192.168.194.144
  qosClass: BestEffort
  startTime: "2019-10-10T07:50:27Z"

Errored

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2019-10-10T07:40:54Z"
  labels:
    run: ubuntu
  name: ubuntu
  namespace: jh-dev
  resourceVersion: "37485"
  selfLink: /api/v1/namespaces/jh-dev/pods/ubuntu
  uid: 2a0cb50c-24a5-4461-9384-ace2852092f1
spec:
  containers:
  - args:
    - /bin/bash
    - -c
    - sleep 30; eeasldfj
    image: ubuntu
    imagePullPolicy: Always
    name: ubuntu
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bc55
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: jh-dev-control-plane
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-4bc55
    secret:
      defaultMode: 420
      secretName: default-token-4bc55
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:40:54Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:41:31Z"
    message: 'containers with unready status: [ubuntu]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:41:31Z"
    message: 'containers with unready status: [ubuntu]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-10-10T07:40:54Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://2432fa15a30413c2ca24bb7a39e9a94c542d2c09a67f71caba3d45867acd05e0
    image: docker.io/library/ubuntu:latest
    imageID: docker.io/library/ubuntu@sha256:b88f8848e9a1a4e4558ba7cfc4acc5879e1d0e7ac06401409062ad2627e6fb58
    lastState: {}
    name: ubuntu
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://2432fa15a30413c2ca24bb7a39e9a94c542d2c09a67f71caba3d45867acd05e0
        exitCode: 127
        finishedAt: "2019-10-10T07:41:30Z"
        reason: Error
        startedAt: "2019-10-10T07:41:00Z"
  hostIP: 172.19.0.1
  phase: Failed
  podIP: 192.168.194.143
  qosClass: BestEffort
  startTime: "2019-10-10T07:40:54Z"

Evicted

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"my-pod-2887686308","uid":"2169cfab-74f5-11e6-be44-0800277ad4a8","apiVersion":"extensions","resourceVersion":"188007"}}
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
      my-pod'
  creationTimestamp: 2016-09-07T12:18:32Z
  generateName: my-pod-2887686308-
  labels:
    pod-template-hash: "2887686308"
    run: my-pod
  name: my-pod-2887686308-3x99a
  namespace: default
  ownerReferences:
  - apiVersion: extensions/v1beta1
    controller: true
    kind: ReplicaSet
    name: my-pod-2887686308
    uid: 2169cfab-74f5-11e6-be44-0800277ad4a8
  resourceVersion: "188038"
  selfLink: /api/v1/namespaces/default/pods/my-pod-2887686308-3x99a
  uid: 31277ddf-74f5-11e6-be44-0800277ad4a8
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: my-pod
    resources:
      requests:
        cpu: 100m
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-akryk
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kubernetes-node-1
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-akryk
    secret:
      defaultMode: 420
      secretName: default-token-akryk
status:
  message: The node was low on memory.
  phase: Failed
  reason: Evicted
  startTime: 2016-09-07T12:18:32Z

https://github.com/microsoft/pai/blob/71ad3d63ff6b37301947c05663bc61b44d71032e/prometheus/test/data/no_condtion_pod.json#L177-L182

@consideRatio
Member Author

consideRatio commented Oct 10, 2019

Hmmm... But in the initial description of the issue, things remained in...

[I 2018-08-01 09:37:04.053 JupyterHub base:978] erik.sundell is pending stop

Which is what happens while the spawner is waiting to see the pod disappear from its resource list...

try:
    yield exponential_backoff(
        lambda: self.pod_reflector.pods.get(self.pod_name, None) is None,
        'pod/%s did not disappear in %s seconds!' % (self.pod_name, self.start_timeout),
        timeout=self.start_timeout,
    )
except TimeoutError:
    self.log.error("Pod %s did not disappear, restarting pod reflector", self.pod_name)
    self._start_watching_pods(replace=True)
    raise

But that only happens when...

resource = ev['object']
if ev['type'] == 'DELETED':
    # This is an atomic delete operation on the dictionary!
    self.resources.pop(resource.metadata.name, None)
else:
    # This is an atomic operation on the dictionary!
    self.resources[resource.metadata.name] = resource

But hmm... Wasn't a delete call made just before this wait started?

delete_options.grace_period_seconds = grace_seconds
self.log.info("Deleting pod %s", self.pod_name)
try:
    yield self.asynchronize(
        self.api.delete_namespaced_pod,
        name=self.pod_name,
        namespace=self.namespace,
        body=delete_options,
        grace_period_seconds=grace_seconds,
    )
except ApiException as e:
    if e.status == 404:
        self.log.warning(
            "No pod %s to delete. Assuming already deleted.",
            self.pod_name,
        )
    else:
        raise

Yes, but it returned 404 as indicated by this log trace:

[W 2018-08-01 09:36:05.688 JupyterHub spawner:1657] No pod jupyter-erik-2esundell to delete. Assuming already deleted.

Why could we end up with a 404 on the pod delete request while the pod remains in the reflector's list? Has it already been deleted, without ever triggering a DELETED V1WatchEvent?
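One way to narrow this down could be to cross-check the reflector's cache against the API server directly. A small diagnostic sketch, assuming the kubernetes Python client that kubespawner already uses (the helper name is made up):

from kubernetes.client.rest import ApiException

def pod_exists(api, name, namespace):
    """Ask the API server directly whether the pod still exists (api is a CoreV1Api)."""
    try:
        api.read_namespaced_pod(name=name, namespace=namespace)
        return True
    except ApiException as e:
        if e.status == 404:
            return False
        raise

# Compare with the reflector's cached view, e.g.:
#   in_cache = self.pod_reflector.pods.get(self.pod_name) is not None
#   in_api = pod_exists(self.api, self.pod_name, self.namespace)
# If in_cache and not in_api, the reflector has gone stale.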

@consideRatio
Member Author

Ah hmmm I realize that we can watch for k8s events about pods being Evicted and take action based on that... https://www.bluematador.com/blog/kubernetes-events-explained
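A minimal sketch of that idea, assuming the official kubernetes Python client and the jh namespace from the logs above (an illustration only, not something kubespawner does today):

from kubernetes import client, config, watch

config.load_incluster_config()  # or config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
# Stream namespace events and only look at those with reason "Evicted".
for ev in w.stream(v1.list_namespaced_event, namespace="jh",
                   field_selector="reason=Evicted"):
    event = ev['object']
    involved = event.involved_object
    print("Evicted:", involved.kind, involved.name, "-", event.message)
    # Here one could delete the evicted pod, or tell the hub its server has stopped.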

@tritab

tritab commented Jan 17, 2020

We've been experiencing this issue when a user pod exceeds its resource limit (e.g. consumes >8 GB of RAM). In this scenario, k8s was restarting the user's pod with an OOMKilled message, and the hub was unable to find the new user pod. This required the user to contact the admin team to manually delete their pod (we didn't try restarting the hub instead).

We have implemented a couple of updates that don't resolve the underlying issue, but do improve the user experience.

  1. Set c.KubeSpawner.extra_pod_config = { "restartPolicy": "Never" } (see the snippet after this list). This causes the hub to avoid restarting the pod, and the user is sent to the spawner screen instead. While this isn't optimal, it does remove the admin team from the resolution.
  2. Installed https://github.com/yuvipanda/nbresuse into the notebook image(s). Initially, we had hoped to help users avoid exceeding the pod memory limit by showing them a meter of their current memory usage. The extension does this well and has a nice warning feature where the meter turns red above a certain percentage (e.g. 90%). It has the added bonus that it proactively kills the offending process before the pod exceeds its memory limit. We weren't expecting this, but it seems to be reducing the frequency of these incidents.
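For reference, here is item 1 in configuration form; a minimal jupyterhub_config.py sketch (whether "Never" is the right policy depends on the deployment):

# jupyterhub_config.py
# Merge extra fields into the user pod spec. With restartPolicy "Never",
# Kubernetes will not restart a killed notebook pod behind the hub's back.
c.KubeSpawner.extra_pod_config = {"restartPolicy": "Never"}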

The above items certainly don't resolve the primary issue, but have helped us reduce user impacts.

As far as the root cause, would it be possible to modify the hub to capture a unique label for the pod rather than the IP address? A unique label has the upside that it doesn't change, even if the IP address changes because the pod is rescheduled by k8s.

@manics
Member

manics commented Jan 17, 2020

> As far as the root cause, would it be possible to modify the hub to capture a unique label for the pod rather than the IP address?

@tritab A similar issue was brought up in #386 which requires a persistent pod identifier for SSL certificates. The solution in that PR was to create a service. Presumably that should also solve your problem?

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/weve-been-seeing-503s-occasionally-for-user-servers/3824/2

@cameronraysmith

> This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
>
> https://discourse.jupyter.org/t/weve-been-seeing-503s-occasionally-for-user-servers/3824/2

Yes, in particular, the contents of the attached log.

jamesbond-inactive-to-503-hub-interface.txt

@consideRatio
Member Author

I think the current status is that the hub won't require a manual restart, but that it will require some time before its check_routes function has concluded that a pod is no longer functional. It would be good to have an update on the situation or a concrete plan to make it more robust.

@consideRatio
Member Author

> I think the current status is that the hub won't require a manual restart, but that it will require some time before its check_routes function has concluded that a pod is no longer functional.

This is the situation, and the issue is related to avoiding manual restarts specifically. Let's mark this as closed.
