
Persisted sessions are not restored at startup, only if requested by kernel ID, leaking pods? #1086

Closed
dnwe opened this issue May 10, 2022 · 7 comments · Fixed by #1095

dnwe (Contributor) commented May 10, 2022

Description

After the changes made under #737 (comment), persisted sessions were no longer loaded at startup, but only "on demand" when an /api/kernels/<kernel-id> request came in that matched one of the persisted sessions.

In the scenario where a single enterprise-gateway replica is running in a Kubernetes cluster and using KubernetesProcessProxy, this can lead to driver+executor pods being leaked, remaining as idle processes that are never cleaned up.
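
To make the failure mode concrete, here is a minimal sketch of the "on demand" pattern described above. The class and method names are illustrative stand-ins, not EG's actual code:

class KernelRegistry:
    """Illustrative stand-in for EG's kernel manager plus session store."""

    def __init__(self, persisted_sessions):
        self.persisted_sessions = persisted_sessions  # kernel_id -> session info
        self.live_kernels = {}  # kernel_id -> hydrated kernel manager

    def get_kernel(self, kernel_id):
        # On-demand path: hydrate only when this specific ID is requested.
        if kernel_id not in self.live_kernels and kernel_id in self.persisted_sessions:
            self.live_kernels[kernel_id] = self._hydrate(self.persisted_sessions[kernel_id])
        return self.live_kernels.get(kernel_id)

    def list_kernels(self):
        # /api/kernels reports only hydrated kernels, so pods whose sessions
        # are never requested after a restart stay invisible -- and leak.
        return list(self.live_kernels)

    def _hydrate(self, session):
        return {"session": session}  # stand-in for reconnecting a KernelManager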

Reproduce

  1. Deploy enterprise-gateway into a Kubernetes cluster and spin up some kernels.
  2. kubectl delete --grace-period=0 the EG pod (or send it a kill -9 manually) so that it restarts without stopping any kernels.
  3. Let time pass without querying those kernels.
  4. Note that the idle driver+executor pods are still present and seemingly orphaned.

Expected behavior

Persisted sessions should be restored at startup by re-enabling the commented-out call:

--- a/enterprise_gateway/enterprisegatewayapp.py
+++ b/enterprise_gateway/enterprisegatewayapp.py
@@ -141,8 +141,7 @@ class EnterpriseGatewayApp(EnterpriseGat
         )
 
         # Attempt to start persisted sessions
-        # Commented as part of https://github.com/jupyter/enterprise_gateway/pull/737#issuecomment-567598751
-        # self.kernel_session_manager.start_sessions()
+        self.kernel_session_manager.start_sessions()
 
         self.contents_manager = None  # Gateways don't use contents manager
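
In terms of the illustrative registry sketched under "Description", the restored start_sessions() call amounts to an eager pass over the session store at startup (again a sketch, not EG's real implementation):

def start_sessions(registry):
    # Eager path: hydrate every persisted session at startup so surviving
    # pods are immediately visible on /api/kernels (and subject to culling).
    for kernel_id, session in registry.persisted_sessions.items():
        if kernel_id not in registry.live_kernels:
            registry.live_kernels[kernel_id] = registry._hydrate(session)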

Context

  • Operating System and version: Kubernetes v1.21.12
  • Browser and version: N/A
  • Jupyter Server version: 2.6.0
dnwe added the bug label on May 10, 2022
welcome bot commented May 10, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

kevin-bates (Member) commented:

Hi @dnwe - thanks for opening this issue.

I suppose this makes sense: without the query, nothing will "hydrate" a KernelManager to "connect" to the outstanding kernels. Of course, the same "leak" can happen if there are no further uses of the orphaned kernels (i.e., the users have walked away), which I would view as outside the scope of this issue.

I'm curious whether you see this issue when multiple replicas are configured, since that is the design center here (Active/Active). If nothing requests a kernel's output following the termination of one of the EG nodes, then I'm not sure what the expected behavior should be. How are your kernels being terminated if users just stop operating against them? Are you relying completely on culling, and is that the difference here? Are you using a client other than Jupyter Lab? I'm fairly certain Lab sends periodic inquiries for the kernels it knows about.
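
For reference on the culling point: idle-kernel culling is driven by the standard Jupyter kernel-manager traits, which apply to EG as well. A sketch of a traitlets config file with illustrative values (tune them to your environment):

# Illustrative culling settings in a Jupyter traitlets config file.
c.MappingKernelManager.cull_idle_timeout = 3600  # cull kernels idle for over an hour
c.MappingKernelManager.cull_interval = 60        # run the culling check every minute
c.MappingKernelManager.cull_connected = False    # don't cull kernels with live connections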

I suppose we could introduce a "mode" here that denotes Active/Active or Active/Passive and bring back the call for Active/Passive modes, but if all EG nodes are terminated (ungracefully), kernel pods will be leaked irrespective of "mode" if they're never queried following the EG node's termination and subsequent restart.

What is the expected behavior here? There really shouldn't be a difference between running two replicas and running one EG instance followed by another, assuming a) they all use the same persisted session store and b) the users/applications using those kernels issue a request.

I think we need more information as to what the desired behavior is before proceeding. Thanks again for spending time on this as this is something we definitely want to support.

dnwe (Contributor, Author) commented May 10, 2022

The scenario here is that a single EG instance is responsible for managing the kernels within the Kubernetes namespace it's running in. It can theoretically be restarted at any time, e.g. if a node is rebooted underneath it or a new image/version is rolled out. In this scenario it's preferable to SIGKILL EG and have it re-discover all the persisted kernels at startup when it is rescheduled on another node, so that they're published on /api/kernels and their running state, and any future idleness, can be discovered externally.

kevin-bates (Member) commented:

Understood. We'll need to introduce the notion of a "mode" for this (it's not as simple as re-activating the old call). In the meantime, can you use multiple replicas where each is configured with client affinity? (Yes, that still "leaks" kernel pods if there are no additional uses of an orphaned kernel.)

Another approach would be to introduce a TTL to each kernel. Since the server pings the kernel every few seconds, we could have the kernel shut down if it hasn't been pinged in X amount of time.
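
A minimal sketch of that TTL idea (hypothetical, not an existing EG feature; the shutdown callback and the wiring of ping() to the server's pings are assumed):

import threading
import time

class TTLWatchdog:
    """Shut a kernel down if it hasn't been pinged within a TTL."""

    def __init__(self, shutdown, ttl_seconds=300, check_every=10):
        self.shutdown = shutdown  # callback that stops the kernel
        self.ttl_seconds = ttl_seconds
        self.check_every = check_every
        self.last_ping = time.monotonic()

    def ping(self):
        # Call this whenever the server pings the kernel.
        self.last_ping = time.monotonic()

    def start(self):
        def watch():
            while True:
                time.sleep(self.check_every)
                if time.monotonic() - self.last_ping > self.ttl_seconds:
                    self.shutdown()  # no ping within the TTL: shut down
                    return

        threading.Thread(target=watch, daemon=True).start()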

dnwe (Contributor, Author) commented May 10, 2022

dnwe commented May 10, 2022

Are you talking about multiple replicas with cookie-based session affinity via an nginx/traefik ingress, or NodePort with clientIP session affinity? I had seen some discussion around both before, but I'm not sure how it is intended to work if a given kernel is inadvertently queried on two running instances (e.g., the remote IP is a gateway and the cookie is lost by the client, so the affinity hashing isn't consistent).

I guess my goal is resiliency rather than availability. I'd like EG to be capable of being restarted at any time and rescheduled, and then to use the persisted sessions as the source of truth for what it should find (via the kube-api) running in its owned KERNEL_NAMESPACE, since those kernels will have kept making good progress while it was briefly offline. Any driver/executor pods that are lying around in the namespace but aren't known to EG (/api/kernels) can theoretically be culled by my own garbage-collection service (since we must assume they were once created but never persisted). Similarly, I'm fine with kernels being reported as "execution_state": "idle", as I can also perform my own tidy-up of those.
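
Such a garbage-collection service could be a small loop along the following lines. This is a sketch under stated assumptions: the EG endpoint, the namespace name, and the kernel_id pod label are placeholders to adjust for a real deployment, and the requests and kubernetes packages are assumed to be installed:

import requests
from kubernetes import client, config

EG_URL = "http://enterprise-gateway:8888"  # placeholder EG endpoint
NAMESPACE = "enterprise-gateway"           # placeholder KERNEL_NAMESPACE

config.load_incluster_config()  # or config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()

# Kernels EG knows about (i.e., visible on /api/kernels).
known = {k["id"] for k in requests.get(f"{EG_URL}/api/kernels").json()}

# Any kernel pod in the namespace that EG doesn't know about is orphaned.
for pod in v1.list_namespaced_pod(NAMESPACE, label_selector="kernel_id").items:
    if pod.metadata.labels.get("kernel_id") not in known:
        print(f"deleting orphaned pod {pod.metadata.name}")
        v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)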

kevin-bates (Member) commented:

> Are you talking about multiple replicas with cookie-based session affinity via an nginx/traefik ingress, or NodePort with clientIP session affinity?

EG doesn't care or know how the affinity is configured (much like myself 😄), just so long as, when the request reaches an EG instance, that instance will be the "host" for that client for the duration of the kernel or until EG terminates.

Is this something you'd like to contribute? Are you okay with introducing a configurable "AvailabilityMode" to distinguish between single and multi-node operations?

I'm also curious about what you're using for your persistent storage.

kevin-bates (Member) commented:

@dnwe - are you able to take PR #1095 for a spin in your environment? You should find that setting the mode to 'active-passive' gives you the behavior you need.
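
If the PR exposes this as a configurable trait, enabling it might look roughly like the line below; the trait name and value are assumptions taken from this comment, so check PR #1095 for the actual spelling.

# Hypothetical -- confirm the real trait name and accepted values in PR #1095.
c.EnterpriseGatewayApp.availability_mode = "active-passive"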
