Jupyter and JupyterHub

Background

Jupyter Notebook, JupyterLab, and JupyterHub are developed by Project Jupyter, a non-profit, open source project.

Jupyter

Jupyter Notebook (previously named IPython Notebook) and JupyterLab are user interfaces for computational science and data science, commonly used with Spark, TensorFlow, and other big data processing frameworks. Data scientists and ML engineers across a variety of organizations use them for interactive tasks. They support multiple languages through runners called "language kernels", and let users run code, save code and results, and easily share “notebooks” containing code, documentation, visualization, and media.

JupyterHub

JupyterHub lets users manage authenticated access to multiple single-user Jupyter notebooks. JupyterHub delegates the launching of single-user notebooks to pluggable components called “spawners”. A community-maintained sub-project named kubespawner lets users provision single-user Jupyter notebooks that run as Kubernetes pods. kubeform_spawner extends kubespawner with a form in which users specify the CPU, memory, GPU, and desired image.

Quick Start

Refer to the user_guide for instructions on deploying JupyterHub via ksonnet.

Once that's completed, you will have a StatefulSet for JupyterHub, a configmap for configuration, and a service of type LoadBalancer, in addition to the requisite RBAC roles. If you are on Google Kubernetes Engine, the LoadBalancer service automatically creates an external IP address that can be used to access the Jupyter notebook. Note that this setup is for illustration purposes only. In a production environment, JupyterHub should be served over SSL and configured to use an authentication plugin.

If you're testing and want to avoid exposing JupyterHub on an external IP address, you can use kubectl instead to gain access to the hub on your local machine.

kubectl port-forward <jupyterhub-pod-name> 8000:8000

The above will expose JupyterHub on http://localhost:8000. The pod name can be obtained by running kubectl get pods, and will be tf-hub-0 by default.
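If you script your test setup, the port-forward step can be wrapped in a small helper. The following is a minimal sketch, not part of this repository: the function name `port_forward_command` is hypothetical, and it assumes `kubectl` is on your PATH and already pointed at the right cluster and namespace.

```python
import shlex


def port_forward_command(pod_name="tf-hub-0", local_port=8000, remote_port=8000):
    """Return the kubectl argument list that forwards a local port to the hub pod."""
    return ["kubectl", "port-forward", pod_name, f"{local_port}:{remote_port}"]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(port_forward_command()))
```

The returned list can be passed to `subprocess.run` once you are happy with it. Building a list rather than a shell string avoids quoting problems if the pod name ever comes from user input.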

Configuration

Configuration for JupyterHub is shipped separately and contained within the configmap defined by the core component. It is a Python file that JupyterHub reads at startup. The supplied configuration has reasonable defaults for the required fields and no authenticator configured by default. We also provide a number of parameters that can be used to configure the core component. To see a list of ksonnet parameters, run:

ks prototype describe kubeflow-core

If the provided parameters don't provide the flexibility you need, you can take advantage of ksonnet to customize the core component and use a config file fully specified by you.

Configuration includes sections for KubeSpawner and Authenticators. Spawner parameters include the form used when provisioning new Jupyter notebooks, and configuration defining how JupyterHub creates and interacts with Kubernetes pods for individual notebooks. Authenticator parameters correspond to the authentication mechanism used by JupyterHub.
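To make the spawner form concrete: JupyterHub hands submitted form data to the spawner's `options_from_form` hook as a dict that maps each field name to a list of strings. The sketch below shows how such data could be turned into spawn options. It is an illustration under assumptions, not the code kubeform_spawner uses: the function name `parse_spawn_form` and the field names `image`, `cpu`, `memory`, and `gpu` are hypothetical.

```python
def parse_spawn_form(formdata):
    """Convert submitted form data into a plain options dict.

    formdata maps each field name to a list of string values,
    so every field is read from index 0.
    """
    def first(name, default=""):
        values = formdata.get(name) or [default]
        return values[0].strip()

    # Field names below are assumptions made for this example.
    options = {"image": first("image")}
    cpu = first("cpu")
    if cpu:
        options["cpu"] = float(cpu)
    memory = first("memory")
    if memory:
        options["memory"] = memory  # e.g. "1Gi", kept as a string
    gpu = first("gpu")
    if gpu:
        options["gpu"] = int(gpu)
    return options
```

Fields left blank in the form are simply omitted from the result, which lets the spawner fall back to whatever defaults the configuration file sets.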

Additional information about configuration can be found in the Zero to JupyterHub with Kubernetes guide and the JupyterHub documentation.

Usage

If you're using the quick-start, the external IP address of the JupyterHub instance can be obtained from kubectl get svc.

kubectl get svc

NAME         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
tf-hub-0     ClusterIP      None            <none>          <none>         1h
tf-hub-lb    LoadBalancer   10.43.246.148   xx.yy.zz.ww     80:32689/TCP   36m

You can now open the external IP, http://xx.yy.zz.ww, in your browser. When you spawn a new notebook, a configuration page should appear that lets you choose the Jupyter notebook image, CPU, memory, and additional resources. With the default DummyAuthenticator, the hub accepts any username/password and lets that user create new notebooks. Use an alternate authenticator plugin if you want to secure your notebook server and use its administration functionality.
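If you want to script this lookup, the tabular output of `kubectl get svc` can be parsed directly. The sketch below is one way to do that, under the assumption that the output has the column layout shown in this section. The function name `external_url` is hypothetical, and the sample address used when trying it out should be replaced with your own.

```python
def external_url(svc_output, service_name="tf-hub-lb"):
    """Return http://<EXTERNAL-IP> for the named service, or None if unavailable."""
    lines = [line.split() for line in svc_output.strip().splitlines()]
    if not lines:
        return None
    header, rows = lines[0], lines[1:]
    name_col = header.index("NAME")
    ip_col = header.index("EXTERNAL-IP")
    for row in rows:
        if len(row) > ip_col and row[name_col] == service_name:
            ip = row[ip_col]
            # kubectl prints <pending> or <none> until an address is assigned.
            return None if ip.startswith("<") else f"http://{ip}"
    return None
```

Parsing human-readable output is fragile. For anything long-lived, asking kubectl for structured output (for example with `-o json`) is likely the sturdier choice.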

Customization

Using your own hub image

An image with JupyterHub 0.8.1, kubespawner 0.7.1 and two simple authenticator plugins can be built from within the docker/ directory using the Makefile provided. For example, if you're using Google Cloud Platform and have a project with ID foo configured to use gcr.io, you can do the following:

make build PROJECT_ID=foo
make push PROJECT_ID=foo

Notebook image

Images published in the Jupyter docker-stacks repo should work directly with the Hub. Notebook images used with this instance of the Hub have only two requirements: they must have the same version of JupyterHub installed as the Hub (0.8.1 by default), and a standard start-singleuser.sh must be accessible via the default PATH.
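Both requirements can be checked from inside a candidate notebook image before you point the Hub at it. The following is a minimal sketch: the function name `check_notebook_image` is hypothetical, and it assumes you pass in the installed version yourself (for example, as reported by `jupyterhub --version` inside the container).

```python
import shutil


def check_notebook_image(installed_version, expected_version="0.8.1",
                         script="start-singleuser.sh"):
    """Return a list of problems; an empty list means both requirements hold."""
    problems = []
    # Requirement 1: the image and the Hub run the same JupyterHub version.
    if installed_version != expected_version:
        problems.append(
            f"JupyterHub {installed_version} installed, hub expects {expected_version}"
        )
    # Requirement 2: the start script is reachable through PATH.
    if shutil.which(script) is None:
        problems.append(f"{script} not found on PATH")
    return problems
```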

GitHub OAuth Setup

After creating the initial Hub and exposing it on a public IP address, you can add GitHub-based authentication. First, create a GitHub OAuth application. The callback URL has the form http://xx.yy.zz.ww/hub/oauth_callback.

Once the GitHub application is created in the GitHub UI, update manifest/config.yaml with the callback URL, client ID, and client secret that GitHub provides. Comment out the DummyAuthenticator and set the JupyterHub authenticator_class to GitHubOAuthenticator, then set oauth_callback_url, client_id, and client_secret for the authenticator. An example configuration section might look like:

# GitHubOAuthenticator is provided by the oauthenticator package.
from oauthenticator.github import GitHubOAuthenticator

c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = 'http://xx.yy.zz.ww/hub/oauth_callback'
c.GitHubOAuthenticator.client_id = 'client_id_here'
c.GitHubOAuthenticator.client_secret = 'client_secret_here'
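Because the client secret is sensitive, you may prefer not to hard-code it in the configuration file. One option, sketched below, is to read the credentials from environment variables. This is an assumption rather than something this deployment sets up for you: the variable names `GITHUB_CLIENT_ID` and `GITHUB_CLIENT_SECRET` and the helper `github_oauth_settings` are hypothetical, and the variables would have to be made available to the hub pod.

```python
import os


def github_oauth_settings(external_host, env=None):
    """Build the three GitHubOAuthenticator settings from the environment.

    Raises KeyError if either credential variable is missing, so a
    misconfigured hub fails at startup instead of at first login.
    """
    env = os.environ if env is None else env
    return {
        "oauth_callback_url": f"http://{external_host}/hub/oauth_callback",
        # Hypothetical variable names, chosen for this example only.
        "client_id": env["GITHUB_CLIENT_ID"],
        "client_secret": env["GITHUB_CLIENT_SECRET"],
    }
```

In the configuration file, each returned value would then be assigned to the matching `c.GitHubOAuthenticator` attribute in place of the literal strings.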

Finally, you can update the configuration and apply the new configuration by doing the following:

ks apply ${ENVIRONMENT} -c ${COMPONENT_NAME}
kubectl delete pod tf-hub-0

After the old pod is deleted, a new pod comes up with the new configuration and uses the GitHub authenticator you specified in the previous step. You can additionally modify the JupyterHub configuration to add whitelists and admin users. For example, to limit the hub to only the GitHub users user1 and user2, you might use the following configuration:

c.Authenticator.whitelist = {'user1', 'user2'}
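The effect of the whitelist can be modeled in a few lines. The sketch below is a simplified illustration of the behavior, not JupyterHub's actual implementation: the function name `is_allowed` is hypothetical, and lowercasing the whitelist entries is an assumption made here to mirror the way JupyterHub lowercases usernames by default.

```python
def is_allowed(username, whitelist):
    """Return True if the user passes the whitelist check.

    An empty whitelist places no additional restriction on who may log in.
    """
    if not whitelist:
        return True
    # Compare case-insensitively (a simplification made for this example).
    return username.lower() in {name.lower() for name in whitelist}
```

Admin users are configured in the same style, through the `c.Authenticator.admin_users` set.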

Note that after you change the configuration and run kubectl apply -f config.yaml, the JupyterHub pod must be restarted before the new configuration takes effect.