Multi-server setup #109

jaredbischof · 2014-12-04T17:20:52Z

Am I correct in assuming that there are long-term plans to support multiple jupyterhub servers running as a single instance? I don't believe that this is possible right now unless I am missing something (please correct me if I'm wrong). It would be nice to have this for load balancing. Cheers!

minrk · 2014-12-04T19:04:55Z

@ssanderson is doing this, using postgres to mediate the Hub state. He would know best what's in the way of getting it to work.

ssanderson · 2014-12-04T19:21:58Z

@jaredbischof what do you mean by "as a single instance"? It's possible to use a shared database between multiple jupyterhubs, but you have to ensure that each user is always routed to the correct single-user server and hub. That means you either need an external service that sits in front of the built-in proxy and knows how to route users, or you need a way to share the built-in proxy across multiple jupyterhubs. Both of those things are probably doable, but neither is supported out of the box.

jaredbischof · 2014-12-04T20:13:40Z

Hi Scott, yeah I didn't know the best way to term what I meant but you're describing exactly what I'm talking about. Do you guys have a timeline for doing this? Just curious. Thanks for your response!

ssanderson · 2014-12-04T20:21:00Z

I don't think there are any near-term plans to support that functionality in the main distribution. I'm working on a larger system for work (Quantopian) that manages clusters of jupyterhub servers. That project isn't in a particularly open-sourceable state, since it's wrapped up pretty intimately with our existing infrastructure.

yuvipanda · 2015-03-08T11:55:05Z

I'm going to give it a shot over the next few weeks for jupyter.wmflabs.org, possibly using Docker Swarm. Will keep you guys posted on how it goes.

minrk · 2015-03-08T19:16:52Z

@yuvipanda great, thanks! You might look at https://github.com/compmodels/jupyterhub-deploy, where @jhamrick is using swarm to distribute user containers. That's not what is described here, though, which is multiple Hubs.

yuvipanda · 2015-12-01T04:01:15Z

(Many moons later...)

So I've finally managed to set one up on https://tools.wmflabs.org/paws/hub/oauth_login, with a kubernetes backend. However, only a single copy of the jupyterhub instance itself (+ proxy) is running, so it's a SPOF.

So the two components that exist in the one 'hub' pod now are: The proxy and the jupyterhub itself. Am I right in assuming that if I can somehow synchronize state between all the proxies, and use a mysql/postgres backend for jupyter, I can scale both of these separately however horizontally I want? Is there any useful state in the jupyterhub process itself that isn't stored in the db?

If this is correct, I can probably work on a way to horizontally scale the proxy out, which should work... There are multiple ways to do this, from fanning out to all outputs via a wrapper vs implementing a different proxy with a compatible interface that uses etcd or something to sync data (that people who have more complex setups can use). But if jupyterhub itself stores state, we need to factor that out first...

yuvipanda · 2015-12-01T05:39:16Z

This also helps solve my other problem, which is availability. I like having at least two of everything so I can drain one of traffic and do stuff to it...

So as questions:

1. What state (if any) is kept in the jupyterhub process itself?
2. If the answer to (1) is 'None', will just providing a scalable proxy be enough?
3. If the answer to (1) is not 'None', what is the state that's kept in there?

I'm super interested in pushing this forward :)

minrk · 2015-12-01T09:31:38Z

The Hub process can be killed and resumed while leaving all other processes up, so there isn't any long-term state that resides in the Hub. All state is meant to reside in the database. #185 is probably the best illustration of state that resides in the process—mainly transients, such as spawn_pending, etc. I doubt it would behave properly if you made two simultaneous spawn requests of the same user on different Hubs using the same database. However, you should be able to do failover - start a second Hub and migrate URL handling before taking down the first Hub.
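
To make the concern about simultaneous spawn requests concrete, here is a minimal toy sketch. It is not JupyterHub code: `ToyHub`, `request_spawn`, `finish_spawn`, and the dict-based "database" are all hypothetical stand-ins. It only assumes what is stated above, namely that a transient flag like `spawn_pending` lives in the Hub process rather than in the shared database.

```python
class ToyHub:
    """Toy stand-in for one Hub process (hypothetical, not JupyterHub's classes)."""

    def __init__(self, name, database):
        self.name = name
        self.database = database    # shared by every Hub in this sketch
        self.spawn_pending = set()  # transient state, private to this process

    def request_spawn(self, user):
        """Return True if this Hub would start a new spawn for `user`."""
        if user in self.spawn_pending or user in self.database["running"]:
            return False
        self.spawn_pending.add(user)
        return True

    def finish_spawn(self, user):
        """Record the finished spawn in the shared database."""
        self.spawn_pending.discard(user)
        self.database["running"].add(user)


shared_db = {"running": set()}
hub_a = ToyHub("a", shared_db)
hub_b = ToyHub("b", shared_db)

first = hub_a.request_spawn("alice")               # hub_a starts a spawn
duplicate_same_hub = hub_a.request_spawn("alice")  # blocked by hub_a's own flag
duplicate_other_hub = hub_b.request_spawn("alice") # hub_b cannot see that flag
```

In this toy model the second request on the same Hub is rejected, while the request on the other Hub goes through, because the pending flag was never written anywhere both processes can read.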

yuvipanda · 2015-12-01T22:56:07Z

Ok, so I'll try and get the proxy to be distributable by putting some work into it this week and see how it goes!

yuvipanda · 2016-01-26T00:04:24Z

Not quite the same, but somewhat related - I now have an nginx-based Configurable HTTP proxy that jupyterhub can easily use (https://github.com/yuvipanda/jupyterhub-nginx-chp) - it just implements most of the swagger spec and all the jupyterhub functionality I tested works fine. When I hit limits of that, we can probably write another one that scales better across multiple machines.

willingc · 2016-06-07T04:11:15Z

Good information by @yuvipanda, @ssanderson, and others related to this issue. As the issue is more than a year old and I'm not seeing a specific next action, I'm going to close this and mark it as "reference" so it will be discoverable and possibly included in future documentation. Thanks!

kishorchintal · 2016-09-30T18:57:03Z

I am trying to set up multiple Hub instances behind an ELB in AWS. Does JupyterHub support this kind of configuration yet? I have an ELB with two JupyterHub instances attached to it, and I've enabled SSL (Secure TCP) listeners so that it can connect to Python 2/3 kernels. When I access it via the ELB, it serves me a page from one of these servers, but when I click 'Control Panel', 'Home', or 'Create new notebook', it routes me to the other server and presents the login page again. Any directions to solve this would be much appreciated. Thanks

minrk · 2016-10-03T09:08:48Z

To run JupyterHub with multiple instances behind a load-balancer, you would have to ensure that the load balancer sends requests for the same user to the same Hub instance every time.
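
One way to picture "same user, same Hub every time" is a deterministic mapping from username to backend. The sketch below is only an illustration of that idea, not a feature of JupyterHub or of any particular load balancer. The backend addresses in `HUBS` and the helpers `username_from_path` and `pick_hub` are hypothetical, and it assumes the username can be read from a `/user/<name>/` path. Requests to Hub pages such as `/hub/home` do not carry the username in the path, so a real setup would need another key (for example a cookie) for those.

```python
import hashlib
import re
from typing import Optional, Sequence

# Hypothetical backend addresses, chosen only for this example.
HUBS = ("hub-a.internal:8000", "hub-b.internal:8000")

_USER_PATH = re.compile(r"^/user/([^/]+)(?:/|$)")


def username_from_path(path: str) -> Optional[str]:
    """Extract the username from a /user/<name>/... path, if present."""
    match = _USER_PATH.match(path)
    return match.group(1) if match else None


def pick_hub(username: str, hubs: Sequence[str] = HUBS) -> str:
    """Map a username to one backend, identically on every call."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hubs)
    return hubs[index]


user = username_from_path("/user/alice/tree")
backend = pick_hub(user) if user is not None else None
```

Note that a plain modulo mapping like this reassigns users whenever the list of backends changes, so it only keeps a user on one Hub while the set of Hubs is stable.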

jsill14 · 2017-03-16T19:13:02Z

@minrk Do you need to ensure the load balancer sends the requests for the same user to the same Hub instance to ensure their data is there? If the data was stored or replicated across the Hubs could you spawn the users on any hub?

minrk · 2017-03-17T09:53:44Z

Since a user's server is persistent, you have to make sure that the same Hub gets every request for a given user, at least while that user's server is running in order to route requests properly. This is in-memory state for the Hub, so it isn't shared across instances. The Hub can reconstruct this from information persisted to the database, and does this at startup, but it doesn't reconstruct the state on every request, which would be needed for a single user to be handled correctly across multiple Hubs.
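
The difference between "reconstructed at startup" and "reconstructed on every request" can be sketched with a toy model. Again, this is not JupyterHub code: `ToyCachingHub`, its methods, and the dict used as a database are hypothetical, and the server URL is made up for the example.

```python
class ToyCachingHub:
    """Toy Hub that loads its routing state from the database once, at startup."""

    def __init__(self, database):
        self.database = database
        # In-memory copy, built from the database only when the process starts.
        self.running = dict(database)

    def spawn(self, user, url):
        """Persist the new server and update this process's memory."""
        self.database[user] = url
        self.running[user] = url

    def lookup(self, user):
        """Answer from memory only; the database is not re-read per request."""
        return self.running.get(user)


db = {}
hub_a = ToyCachingHub(db)
hub_b = ToyCachingHub(db)

hub_a.spawn("alice", "http://10.0.0.5:8888")  # made-up address

seen_by_a = hub_a.lookup("alice")        # hub_a knows about the server
seen_by_b = hub_b.lookup("alice")        # hub_b's memory predates the spawn
restarted_b = ToyCachingHub(db)          # a restart reloads from the database
seen_after_restart = restarted_b.lookup("alice")
```

In this model the second Hub only learns about the running server after a restart, which is why requests for a given user have to keep landing on the Hub that spawned their server.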

yuvipanda mentioned this issue Dec 2, 2015

Add a Swagger (or similar?) spec for the REST API jupyterhub/configurable-http-proxy#39

Closed

ryanlovett mentioned this issue Jan 27, 2016

Students with 503 Proxy Target Missing/500 Internal Server Error data-8/connector-instructors#18

Closed

willingc added the question label Jun 7, 2016

willingc closed this as completed Jun 7, 2016

willingc added the reference label Jun 7, 2016
