Highly available JupyterHub proxy with Traefik

Long Description

JupyterHub uses a proxy to direct incoming user requests to notebook servers. The proxy's routing table determines which requests are sent where. For example, if user userA's notebook server is available at the address 10.0.1.2:8000, the routing table should contain the mapping /user/userA -> 10.0.1.2:8000. For each request, the proxy inspects the URL, consults the routing table and directs the request to the appropriate address.
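The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of JupyterHub or of any particular proxy. It assumes the routing table is a plain dict from URL path prefixes to target addresses, that the longest matching prefix wins, and that a catch-all `/` route points at the Hub. The function name `resolve_target` and the Hub address `10.0.1.1:8081` are hypothetical.

```python
from typing import Optional

# Hypothetical routing table: URL path prefix -> target address.
# Prefixes end in "/" so that /user/userAB/... does not match userA's route.
ROUTING_TABLE = {
    "/": "10.0.1.1:8081",             # catch-all route (assumed Hub address)
    "/user/userA/": "10.0.1.2:8000",  # userA's notebook server
}


def resolve_target(path: str, table: dict) -> Optional[str]:
    """Return the target of the longest route prefix matching `path`, or None."""
    best = None
    for prefix in table:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return table[best] if best is not None else None


print(resolve_target("/user/userA/tree", ROUTING_TABLE))  # 10.0.1.2:8000
print(resolve_target("/hub/login", ROUTING_TABLE))        # 10.0.1.1:8081
```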

As users start and stop their servers, JupyterHub must dynamically modify the routing table to add and remove routes. These modifications must not disrupt requests that are already being processed.
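One way to meet the no-disruption requirement is copy-on-write: each update builds a new table and swaps it in with a single assignment, while every request keeps using the snapshot it started with. The sketch below assumes an asyncio program. The class name `InMemoryRoutes` and the `snapshot()` helper are hypothetical; the method names mirror the `add_route`, `delete_route` and `get_all_routes` coroutines of JupyterHub's `Proxy` base class, but the class does not subclass it and its value layout is simplified.

```python
import asyncio
from types import MappingProxyType


class InMemoryRoutes:
    """Hypothetical copy-on-write routing table (not a jupyterhub Proxy subclass)."""

    def __init__(self):
        self._routes = MappingProxyType({})  # read-only snapshot seen by requests
        self._write_lock = asyncio.Lock()    # serializes writers; readers never wait

    def snapshot(self):
        # A request grabs one snapshot and uses it for its whole lifetime.
        return self._routes

    async def add_route(self, routespec, target, data):
        # The lock is redundant here (nothing awaits between copy and swap), but it
        # matters once an update awaits an external store such as etcd.
        async with self._write_lock:
            new = dict(self._routes)
            new[routespec] = {"target": target, "data": data}
            self._routes = MappingProxyType(new)  # single reference swap

    async def delete_route(self, routespec):
        async with self._write_lock:
            new = dict(self._routes)
            new.pop(routespec, None)
            self._routes = MappingProxyType(new)

    async def get_all_routes(self):
        return dict(self._routes)


async def main():
    table = InMemoryRoutes()
    await table.add_route("/user/userA/", "http://10.0.1.2:8000", {"user": "userA"})
    in_flight = table.snapshot()  # a request that is already being proxied
    await table.delete_route("/user/userA/")
    # The in-flight request still sees the route it started with.
    return "/user/userA/" in in_flight, await table.get_all_routes()


print(asyncio.run(main()))  # (True, {})
```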

configurable-http-proxy is currently the most widely used proxy implementation for JupyterHub. It keeps its routing table in memory, which means you can only run a single copy of configurable-http-proxy at a time. If the proxy process is disrupted (for example, because the node it runs on goes down or it uses too much memory), the whole JupyterHub becomes unavailable. This is particularly a problem in large-scale, dynamic systems like Zero to JupyterHub on Kubernetes, where nodes come and go.

In this project, you will implement a JupyterHub proxy that uses Traefik to do the routing and etcd to store the routing table. Because the routing table lives in etcd rather than in the memory of a single process, multiple copies of the proxy can easily run at once, making the proxy highly available. You will also integrate this proxy implementation into our high-scale Kubernetes distribution, Zero to JupyterHub on Kubernetes.
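To illustrate why moving the routing table out of process memory enables multiple replicas, the sketch below writes one route as etcd-style keys and shows two replicas rebuilding the same table from them. It assumes the Traefik 1.x key/value layout (`/traefik/backends/...` and `/traefik/frontends/...` with a `PathPrefix:` rule); check the layout against the Traefik version you deploy. A plain dict stands in for etcd, and the helper names and the backend/frontend naming scheme are hypothetical.

```python
import re

PREFIX = "/traefik"  # Traefik's default KV prefix (assumed Traefik 1.x layout)


def route_to_kv(routespec: str, target: str) -> dict:
    """Translate one route into Traefik 1.x-style key/value pairs."""
    # Hypothetical naming scheme: derive a key-safe name from the routespec.
    name = re.sub(r"[^A-Za-z0-9]+", "_", routespec).strip("_") or "root"
    return {
        f"{PREFIX}/backends/backend_{name}/servers/server1/url": target,
        f"{PREFIX}/frontends/frontend_{name}/backend": f"backend_{name}",
        f"{PREFIX}/frontends/frontend_{name}/routes/path/rule": f"PathPrefix:{routespec}",
    }


def routes_from_kv(store: dict) -> dict:
    """Rebuild {path prefix: target URL} from the shared keys, as any replica could."""
    table = {}
    for key, rule in store.items():
        m = re.fullmatch(rf"{PREFIX}/frontends/([^/]+)/routes/path/rule", key)
        if not m:
            continue
        backend = store[f"{PREFIX}/frontends/{m.group(1)}/backend"]
        url = store[f"{PREFIX}/backends/{backend}/servers/server1/url"]
        table[rule.split(":", 1)[1]] = url  # strip the "PathPrefix:" part
    return table


etcd = {}  # stand-in for the shared etcd cluster
etcd.update(route_to_kv("/user/userA/", "http://10.0.1.2:8000"))

# Two proxy replicas read the same keys, so either one can serve the request.
replica_1 = routes_from_kv(etcd)
replica_2 = routes_from_kv(etcd)
print(replica_1 == replica_2 == {"/user/userA/": "http://10.0.1.2:8000"})  # True
```

In a real deployment Traefik itself watches these keys; a proxy implementation would write them with an etcd client, ideally in a single etcd transaction so that no replica ever sees a half-written route.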

Stretch goal

Since configurable-http-proxy is written in Node.js, admins must install Node.js before they can set up JupyterHub. This complicates setup, since two runtimes (Node.js and Python 3) must be installed rather than just one (Python 3). A stretch goal would be to improve this situation by writing a proxy that runs in the same process as JupyterHub and does all the required proxying. This would make deploying JupyterHub much easier for smaller installs, make debugging easier and bring a host of other benefits.
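The in-process idea can be sketched with nothing but Python's `asyncio` streams: read the request line, pick a target by longest prefix match, then pipe bytes in both directions. This is a hypothetical minimal sketch with illustrative names, not the project's design. It routes only on the first request line of each connection, so a keep-alive connection that later requests another user's path would be misrouted; a production version would parse every request, add forwarding headers and handle unreachable targets.

```python
import asyncio


def resolve(path, routes):
    """Longest-prefix match of `path` against {prefix: (host, port)}."""
    matches = [p for p in routes if path.startswith(p)]
    return routes[max(matches, key=len)] if matches else None


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; nothing left to forward
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, routes):
    # Route on the request line only, e.g. b"GET /user/userA/tree HTTP/1.1\r\n".
    request_line = await client_reader.readline()
    parts = request_line.split()
    target = resolve(parts[1].decode("latin-1"), routes) if len(parts) == 3 else None
    if target is None:
        client_writer.write(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
        await client_writer.drain()
        client_writer.close()
        return
    up_reader, up_writer = await asyncio.open_connection(*target)
    up_writer.write(request_line)  # replay the consumed line, then stream the rest
    await asyncio.gather(pipe(client_reader, up_writer), pipe(up_reader, client_writer))


async def demo():
    async def backend(reader, writer):  # stands in for a notebook server
        await reader.readuntil(b"\r\n\r\n")
        writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
        await writer.drain()
        writer.close()

    backend_srv = await asyncio.start_server(backend, "127.0.0.1", 0)
    routes = {"/user/userA/": ("127.0.0.1", backend_srv.sockets[0].getsockname()[1])}
    proxy_srv = await asyncio.start_server(
        lambda r, w: handle_client(r, w, routes), "127.0.0.1", 0)
    reader, writer = await asyncio.open_connection(
        "127.0.0.1", proxy_srv.sockets[0].getsockname()[1])
    writer.write(b"GET /user/userA/tree HTTP/1.0\r\nHost: localhost\r\n\r\n")
    await writer.drain()
    response = await reader.read()  # read until the proxy closes the connection
    writer.close()
    proxy_srv.close()
    backend_srv.close()
    return response


print(asyncio.run(demo()).split(b"\r\n")[0])  # b'HTTP/1.0 200 OK'
```

Because the proxy only copies bytes once a request has been routed, WebSocket connections also pass through after their upgrade request is routed.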

How can applicants make a contribution to the project?

We require students to finish at least one project-specific microtask before they apply. The issues at https://github.com/jupyterhub/outreachy/labels/project-traefik-proxy are the microtasks specific to this project; you should complete at least one of them. Comment on the issue, or reach out to us at https://gitter.im/jupyterhub/jupyterhub for help!

Remember that we do not expect you to already have all the skills required to complete the tasks. Ask and we shall help!

Intern Benefits

You'll learn important development skills in this project:

  1. Asynchronous programming with Python
  2. Modeling & building distributed systems
  3. Tradeoffs between simplicity, high availability and latency in distributed systems
  4. Direct experience with modern large-scale system tools, such as Kubernetes, etcd and Traefik.

You'll also learn to work with a distributed community of people from various fields around the world. Your work will be featured prominently on the Project Jupyter Blog, and many people around the world will likely use this proxy in many ways.

Community Benefits

JupyterHub is gaining adoption in large-scale deployments that place a high value on availability. A highly available proxy would be a big step in that direction. In the long term, it also reduces the total amount of code the community has to maintain and lets JupyterHub easily benefit from improvements made by the Traefik and etcd communities. There will also be other performance and reliability improvements as a side effect of this change.

Timeline

Use this timeline as a starting point for this project in your application. Feel free to adjust it as appropriate:

  • Month 1:
    • Create a new jupyterhub-traefik-proxy package
    • Initial implementation of the JupyterHub Proxy API for Traefik
    • Common test suite for verifying uniform behavior of JupyterHub proxy implementations
  • Month 2:
    • Test deployments of JupyterHub using the new proxy
    • Implementations of both local TOML file and etcd configurations
  • Month 3:
    • Integrate the Traefik proxy implementation into zero-to-jupyterhub as the new default proxy
    • Performance profiling of the Traefik proxy
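The common test suite in the timeline could take the form of one set of checks run against every implementation. The sketch below is hypothetical: `check_proxy_contract` and the two toy classes are illustrative stand-ins (one dict-backed, one storing serialized routes in a flat key/value store), and the specific checks, including treating deletion of a missing route as a no-op, are assumptions rather than JupyterHub's documented contract. A real suite would more likely use pytest, parametrized over the actual proxy classes.

```python
import asyncio
import json


class DictProxy:
    """Toy implementation keeping routes in a dict, like an in-memory proxy."""

    def __init__(self):
        self.routes = {}

    async def add_route(self, routespec, target, data):
        self.routes[routespec] = {"routespec": routespec, "target": target, "data": data}

    async def delete_route(self, routespec):
        self.routes.pop(routespec, None)

    async def get_all_routes(self):
        return dict(self.routes)


class KVProxy:
    """Toy implementation serializing routes into a flat key/value store, like etcd."""

    def __init__(self):
        self.kv = {}  # stand-in for an external key/value store

    async def add_route(self, routespec, target, data):
        self.kv["routes" + routespec] = json.dumps({"target": target, "data": data})

    async def delete_route(self, routespec):
        self.kv.pop("routes" + routespec, None)

    async def get_all_routes(self):
        routes = {}
        for key, value in self.kv.items():
            spec = key[len("routes"):]
            routes[spec] = {"routespec": spec, **json.loads(value)}
        return routes


async def check_proxy_contract(proxy):
    """Shared checks that every implementation is expected to pass."""
    assert await proxy.get_all_routes() == {}
    await proxy.add_route("/user/userA/", "http://10.0.1.2:8000", {"user": "userA"})
    routes = await proxy.get_all_routes()
    assert routes["/user/userA/"]["target"] == "http://10.0.1.2:8000"
    assert routes["/user/userA/"]["data"] == {"user": "userA"}
    await proxy.delete_route("/user/userA/")
    await proxy.delete_route("/user/userA/")  # assumed: deleting a missing route is a no-op
    assert await proxy.get_all_routes() == {}


for impl in (DictProxy, KVProxy):
    asyncio.run(check_proxy_contract(impl()))
    print(impl.__name__, "passed")
```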