Multi-server setup #109
Comments
@ssanderson is doing this, using postgres to mediate the Hub state. He would know best what's in the way of getting it to work. |
@jaredbischof what do you mean by "as a single instance"? It's possible to use a shared database between multiple jupyterhubs, but you have to ensure that the same user is routed to the correct single-user server and hub. That means you either need an external service that sits in front of the built-in proxy and knows how to route users, or you need a way to share the built-in proxy across multiple jupyterhubs. Both of those things are probably doable, but neither is especially supported out of the box.
Hi Scott, yeah I didn't know the best way to term what I meant but you're describing exactly what I'm talking about. Do you guys have a timeline for doing this? Just curious. Thanks for your response! |
I don't think there are any near-term plans to support that functionality in the main distribution. I'm working on a larger system for work (Quantopian) that manages clusters of jupyterhub servers. That project isn't in a particularly open-sourceable state, since it's wrapped up pretty intimately with our existing infrastructure. |
I'm going to give it a shot over the next few weeks for jupyter.wmflabs.org, possibly using Docker Swarm. Will keep you guys posted on how it goes. |
@yuvipanda great, thanks! You might look at https://github.com/compmodels/jupyterhub-deploy, where @jhamrick is using swarm to distribute user containers. That's not what is described here, though, which is multiple Hubs. |
(Many moons later...) So I've finally managed to set one up on https://tools.wmflabs.org/paws/hub/oauth_login, with a kubernetes backend. However, the jupyterhub instance itself (+ proxy) runs only once, so it's a SPOF. The two components that exist in the one 'hub' pod now are the proxy and the jupyterhub itself. Am I right in assuming that if I can somehow synchronize state between all the proxies, and use a mysql/postgres backend for jupyterhub, I can scale both of these horizontally and independently of each other, as far as I want? Is there any useful state in the jupyterhub process itself that isn't stored in the db? If this is correct, I can probably work on a way to horizontally scale the proxy out, which should work... There are multiple ways to do this, from fanning out to all outputs via a wrapper to implementing a different proxy with a compatible interface that uses etcd or something to sync data (that people who have more complex setups can use). But if jupyterhub itself stores state, we need to factor that out first...
This also helps solve my other problem, which is availability. I like having at least two of everything so I can drain one of traffic and do stuff to it... So as questions:
I'm super interested in pushing this forward :) |
The Hub process can be killed and resumed while leaving all other processes up, so there isn't any long-term state that resides in the Hub. All state is meant to reside in the database. #185 is probably the best illustration of state that resides in the process—mainly transients, such as spawn_pending, etc. I doubt it would behave properly if you made two simultaneous spawn requests of the same user on different Hubs using the same database. However, you should be able to do failover - start a second Hub and migrate URL handling before taking down the first Hub. |
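As a rough illustration of why two simultaneous spawn requests on different Hubs could misbehave, here is a toy model. It assumes a transient flag like spawn_pending is held only in each process's memory while the database is shared. `ToyHub`, `request_spawn`, and the dict standing in for the database are hypothetical names for this sketch, not JupyterHub's actual classes.

```python
class ToyHub:
    """Toy stand-in for one Hub process (hypothetical, not JupyterHub code)."""

    def __init__(self, name, shared_db):
        self.name = name
        self.db = shared_db          # shared by all Hubs, like a common postgres db
        self.spawn_pending = set()   # transient state, private to this process

    def request_spawn(self, user):
        # The guard only sees spawns that this process started.
        if user in self.spawn_pending:
            return False
        self.spawn_pending.add(user)
        # Record which Hub started a spawn for this user.
        self.db.setdefault(user, []).append(self.name)
        return True


shared_db = {}
hub_a = ToyHub("hub-a", shared_db)
hub_b = ToyHub("hub-b", shared_db)

first = hub_a.request_spawn("alice")      # accepted by hub-a
repeat = hub_a.request_spawn("alice")     # rejected: hub-a knows a spawn is pending
second = hub_b.request_spawn("alice")     # accepted: hub-b cannot see hub-a's flag
```

In this toy model both Hubs end up starting a spawn for the same user, because the in-memory guard is not visible across processes.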
Ok, so I'll try and get the proxy to be distributable by putting some work into it this week and see how it goes! |
Not quite the same, but somewhat related: I now have an nginx-based Configurable HTTP proxy that jupyterhub can easily use (https://github.com/yuvipanda/jupyterhub-nginx-chp). It implements most of the swagger spec, and all the jupyterhub functionality I tested works fine. When I hit the limits of that, we can probably write another one that scales better across multiple machines.
Good information by @yuvipanda, @ssanderson, and others related to this issue. As the issue is more than a year old and I'm not seeing a specific next action, I'm going to close this and mark it as "reference" so it will be discoverable and possibly included in future documentation. Thanks! |
I am trying to set up multiple hub instances behind an ELB in AWS. Does JupyterHub support this kind of configuration yet? I have an ELB with two JupyterHub instances attached to it, and I've enabled SSL (Secure TCP) listeners so that it can connect to Python2/3 kernels. When I access it via the ELB, it presents me a page from one of these servers, but when I click on 'Control Panel', 'Home', or 'Create new notebook', it routes me to the other server and presents me the login page again. Any directions to solve this will be much appreciated. Thanks
To run JupyterHub with multiple instances behind a load-balancer, you would have to ensure that the load balancer sends requests for the same user to the same Hub instance every time. |
@minrk Do you need to ensure the load balancer sends the requests for the same user to the same Hub instance to ensure their data is there? If the data was stored or replicated across the Hubs could you spawn the users on any hub? |
Since a user's server is persistent, you have to make sure that the same Hub gets every request for a given user, at least while that user's server is running in order to route requests properly. This is in-memory state for the Hub, so it isn't shared across instances. The Hub can reconstruct this from information persisted to the database, and does this at startup, but it doesn't reconstruct the state on every request, which would be needed for a single user to be handled correctly across multiple Hubs. |
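A minimal sketch of the "same user, same Hub" rule, assuming the layer in front of the Hubs can determine the username for each request (for example from a cookie or the URL path). The backend addresses and the `hub_for_user` helper are made up for illustration; this is not something JupyterHub or ELB provides.

```python
import hashlib
from typing import Sequence

# Hypothetical Hub backends; these addresses are placeholders.
HUBS = ("hub-a.internal:8000", "hub-b.internal:8000")


def hub_for_user(username: str, hubs: Sequence[str] = HUBS) -> str:
    """Pick a Hub deterministically, so one user always lands on the same Hub."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hubs)
    return hubs[index]


# Every request for "alice" maps to the same backend.
alice_first = hub_for_user("alice")
alice_again = hub_for_user("alice")
```

Note that with this simple modulo scheme, changing the list of Hubs can remap users, so it would only be safe to change while the affected users' servers are stopped.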
Am I correct in assuming that there are long-term plans to support multiple jupyterhub servers running as a single instance? I don't believe that this is possible right now unless I am missing something (please correct me if I'm wrong). It would be nice to have this for load balancing. Cheers!