
Refactored and significantly updated the Jupyter notebooks docs (#633)

* WIP Initial commit of refactored notebooks section.

* Minor additions.

* Finished draft of notebooks quickstart.

* Renamed some files and clarified the notebooks setup doc.

* Updated the guide to custom notebook images.

* A couple more screenshots plus text refinements.

* Fixed references to JupyterHub throughout the docs.

* Updated in response to review comments.

* One more link fix.

* Clarified auth in custom notebooks doc.
sarahmaddox authored and k8s-ci-robot committed Apr 19, 2019
1 parent b549c91 commit 4e1d8ab2aa67367844633f2d7357edb209b444e5
@@ -34,7 +34,7 @@ <h4 class="section-head">What is Kubeflow?</h4>
</div>
<div class="text">
<h4>Notebooks</h4>
<p>A JupyterHub to create and manage interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.</p>
<p>Services to create and manage interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.</p>
</div>
</div>

@@ -53,7 +53,7 @@ Kubeflow started as an open sourcing of the way Google ran [TensorFlow](https://

## Notebooks

Included in Kubeflow is [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) to create and manage multi-user interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.
Kubeflow includes services for spawning and managing [Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/latest/). Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.

## Using Kubeflow

@@ -4,139 +4,6 @@ description = "Using Jupyter notebooks in Kubeflow"
weight = 10
+++

## Bringing up a Jupyter Notebook

1. To connect to Jupyter, follow the [instructions](/docs/other-guides/accessing-uis)
to access the Kubeflow UI. From there you can navigate to JupyterHub.
![JupyterHub Link](/docs/images/jupyterlink.png)
1. Sign in:
 * On GCP you sign in using your Google Account.
 * If you are already logged in to your Google Account, you may not
 be prompted to log in again.
 * On all other platforms you can sign in using any username/password.
1. Click the **Start My Server** button. A dialog screen appears.
1. Select a CPU or GPU image from the **Image** dropdown menu, depending on whether you are doing CPU or GPU training and whether you have GPUs in your cluster. We currently offer a CPU and a GPU image for each TensorFlow minor version (for example: 1.4.1, 1.5.1, 1.6.0). Alternatively, you can type in the name of any TensorFlow image you want to run.
1. Allocate memory, CPU, GPU, or other resources according to your needs (1 CPU and 2Gi of memory are good starting points).
 * To allocate GPUs, make sure that you have GPUs available in your cluster.
 * Run the following command to check whether any NVIDIA GPUs are available:
 `kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"`
 * If you have GPUs available, you can schedule your server on a GPU node by specifying the following JSON in the **Extra Resource Limits** section: `{"nvidia.com/gpu": "1"}`
1. Click Spawn

* The images are tens of GBs in size and can take a long time to download,
depending on your network connection.

* You can check the status of your pod by running:

```
kubectl -n ${NAMESPACE} describe pods jupyter-${USERNAME}
```

* Where `${USERNAME}` is the name you used to log in.
* **GKE users:** if you have IAP turned on, the pod will be named differently.

* If you signed in as `USER@DOMAIN.EXT`, the pod will be named:

```
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
```
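
* To stream the pod's startup logs while you wait, you can run a command
like the sketch below. It assumes the same `${NAMESPACE}` and `${USERNAME}`
variables as above; substitute the escaped pod name if you are on GKE with
IAP turned on:

```
kubectl -n ${NAMESPACE} logs -f jupyter-${USERNAME}
```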

1. You should now be greeted with a Jupyter Notebook interface.

The image supplied above can be used for training TensorFlow models with Jupyter. The images include all the requisite plugins, including [TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard), which you can use for rich visualizations and insights into your models.

To test the installation, you can run a basic hello world (adapted from [mnist_softmax.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py)):

```
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Download and load the MNIST dataset.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# A simple softmax regression model: y = softmax(xW + b).
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss against the true labels.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Train for 1000 steps on mini-batches of 100 examples.
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate accuracy on the test set.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
```

Paste the example into a new Python 3 Jupyter notebook and execute the code. It should report an accuracy of approximately 0.9014 against the test data.

Note that when running on most cloud providers, the public IP address is exposed to the internet and is an unsecured endpoint by default. For a production deployment with SSL and authentication, refer to the [documentation](https://github.com/kubeflow/kubeflow/tree/{{< params "githubbranch" >}}/components/jupyterhub).


## Submitting k8s resources from a Jupyter notebook

The Jupyter notebook pods are assigned the `jupyter-notebook` service account. This service account is bound to the `jupyter-notebook` role, which has namespace-scoped permissions for the following k8s resources:

* pods
* deployments
* services
* jobs
* tfjobs
* pytorchjobs

This means that you can create these k8s resources directly from your Jupyter notebook. `kubectl` is already installed in the notebook, so you can create k8s resources by running the following command in a notebook cell:

```
!kubectl create -f myspec.yaml
```
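
For illustration, here is a minimal sketch of what `myspec.yaml` might contain. It defines a Kubernetes Job, one of the resource types listed above; the name and image are hypothetical placeholders:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job            # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox       # hypothetical image
        command: ["echo", "hello from a notebook-submitted job"]
      restartPolicy: Never
```
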
## Creating a custom Jupyter image
You can create your own Jupyter image and use it in your Kubeflow cluster.
Your custom image must meet the requirements of the Kubeflow notebook controller, which manages the life cycle of notebooks.
The Kubeflow web UI expects Jupyter to start when the image is run with a plain `docker run`. To achieve this, set the default command of your image to launch Jupyter, configured as follows:

* Set the working directory: `--notebook-dir=/home/jovyan`. The folder `/home/jovyan` is backed by a Kubernetes persistent volume (PV).
* Allow Jupyter to listen on all IP addresses: `--ip=0.0.0.0`
* Allow the user to run the notebook as root: `--allow-root`
* Set the port: `--port=8888`
* Disable authentication. Kubeflow takes care of authentication. Use the following to allow passwordless access to Jupyter: `--NotebookApp.token='' --NotebookApp.password=''`
* Allow any origin to access your Jupyter notebook server: `--NotebookApp.allow_origin='*'`
* Set the base URL: the Kubeflow notebook controller manages the base URL for the notebook server using an environment variable called `NB_PREFIX`. You should define the variable in your image and set the value of `base_url` as follows (the value is expanded by the shell at container start): `--NotebookApp.base_url=${NB_PREFIX}`

As an example, your Dockerfile should contain the following:


```
ENV NB_PREFIX /
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
```
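
Once your Dockerfile is ready, a typical build-and-push flow looks like the sketch below; the registry path and tag are placeholders, so substitute your own:

```
docker build -t gcr.io/myproject/my-jupyter-image:v1 .
docker push gcr.io/myproject/my-jupyter-image:v1
```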

## Building Docker images from a Jupyter notebook on GCP

If you are using Jupyter notebooks on GKE, you can submit Docker image builds to Cloud Build, which builds your Docker images and pushes them to Google Container Registry.

Activate the attached service account using:

```
!gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
```

If you have a Dockerfile in your current directory, you can submit a build using:

```
!gcloud container builds submit --tag gcr.io/myproject/myimage:tag .
```
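
Note that in more recent `gcloud` releases the build commands moved to the top level; the equivalent command is:

```
!gcloud builds submit --tag gcr.io/myproject/myimage:tag .
```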

Advanced documentation for building Docker images is available in the [Cloud Build quickstart](https://cloud.google.com/cloud-build/docs/quickstart-docker).


Your Kubeflow deployment includes support for spawning and managing Jupyter
notebooks. See how to [set up your notebooks](/docs/notebooks/setup/) and
[explore more notebook functionality](/docs/notebooks/).
@@ -67,7 +67,7 @@ cd ${KFAPP}/ks_app
ks generate google-cloud-filestore-pv google-cloud-filestore-pv --name="kubeflow-gcfs" \
--storageCapacity="${GCFS_STORAGE}" \
--serverIP="${GCFS_INSTANCE_IP_ADDRESS}"
ks param set jupyterhub disks "kubeflow-gcfs"
ks param set jupyter disks "kubeflow-gcfs"
```

* **GCFS_STORAGE** The size of the Cloud Filestore persistent volume claim
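
For illustration, hypothetical values for the variables the snippet above assumes (Cloud Filestore capacity is expressed in units such as `1T`):

```
GCFS_STORAGE="1T"                      # size of the persistent volume claim
GCFS_INSTANCE_IP_ADDRESS="10.0.0.2"    # IP address of your Cloud Filestore instance
```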
@@ -53,7 +53,7 @@ For example, to mount additional physical volumes (PVs) in Jupyter:

```
cd ${KF_APP}/ks_app
ks param set jupyterhub disks "kubeflow-gcfs"
ks param set jupyter disks "kubeflow-gcfs"
```

You can then redeploy using `kfctl`:
@@ -65,7 +65,7 @@ Here's an overview of what you accomplish by following this guide:
* Setting up [Kubeflow][kubeflow] in a [GKE][kubernetes-engine]
cluster.

* Testing the code locally using a [Jupyter notebook][jupyterhub].
* Testing the code locally using a [Jupyter notebook][jupyter-notebook].

* Training the model:

@@ -294,38 +294,13 @@ gsutil mb -c regional -l us-central1 gs://${BUCKET_NAME}
The sample you downloaded contains all the code you need. If you like, you
can experiment with and test the code in a Jupyter notebook.

The Kubeflow deployment includes [JupyterHub][jupyterhub] and a
corresponding load balancer service. You can choose to connect to JupyterHub
using the Kubeflow URL or locally.
The Kubeflow deployment includes services for spawning and managing
[Jupyter notebooks][jupyter-notebook].

1. Choose one of the options below to connect to JupyterHub:

* Click **JUPYTERHUB** on the Kubeflow UI (see screenshot above).
* Alternatively, follow the
[Kubeflow guide to Jupyter notebooks][kubeflow-jupyter] to connect
to JupyterHub locally.

1. Click **Start My Server** if prompted to do so.

1. Leave the **Image** details at the default setting on the JupyterHub
**Spawner Options** page. The default gives you a standard CPU image
with a recent version of TensorFlow.

1. Click **Spawn**.

It takes a few minutes for the notebook server to start.
After a minute or so, you should see a message on the web page:

```
Your server is starting up.
You will be redirected automatically when it's ready for you.
```

You should also see an event log which you can check periodically
while the server starts.

When the server is ready, the Jupyter notebook dashboard opens in your
browser.
1. Follow the [Kubeflow notebooks setup guide](/docs/notebooks/setup/) to
create a Jupyter notebook server and open the Jupyter UI.
Accept the default settings when configuring your notebook server. The
default configuration gives you a standard CPU image with a recent version of TensorFlow.

1. Create a new notebook by clicking **New > Python 2** on the Jupyter
dashboard.
@@ -999,7 +974,6 @@ using the [GCP Console][gcp-console].
[ks-apply]: https://github.com/ksonnet/ksonnet/blob/master/docs/cli-reference/ks_apply.md

[flask]: http://flask.pocoo.org/
[jupyterhub]: https://jupyter.org/hub

[kubeflow]: https://www.kubeflow.org/
[kubeflow-core]: https://github.com/kubeflow/kubeflow/tree/master/kubeflow/core
@@ -1009,6 +983,6 @@ using the [GCP Console][gcp-console].

[deploy-script]: https://github.com/kubeflow/kubeflow/blob/master/scripts/gke/deploy.sh

[jupyterhub]: http://jupyter.org/hub
[kubeflow-jupyter]: /docs/components/jupyter/
[jupyter-notebook]: https://jupyter-notebook.readthedocs.io
[kubeflow-jupyter]: /docs/notebooks/
[jupyter-nbviewer]: https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#notebook-user-interface
5 binary files (screenshot images) not shown.
@@ -0,0 +1,5 @@
+++
title = "Jupyter Notebooks"
description = "Using Jupyter notebooks in Kubeflow"
weight = 35
+++
@@ -0,0 +1,85 @@
+++
title = "Create a Custom Jupyter Image"
description = "Creating a custom Docker image for your Jupyter notebook"
weight = 30
+++

This guide tells you how to configure a custom Docker image for your Jupyter
notebook server in Kubeflow.

Your custom image must meet the requirements of the Kubeflow notebook
controller, which manages the life cycle of notebooks. The Kubeflow UI expects
Jupyter to start after launching the Docker image with `docker run`. You must
therefore set the default command of your Docker image to launch Jupyter.

Follow these steps to configure the launch command (`CMD`) in your Docker image:

* Set the working directory:

```
--notebook-dir=/home/jovyan
```

The `/home/jovyan` directory is backed by a
[Kubernetes persistent volume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

* Allow Jupyter to listen on all IP addresses:

```
--ip=0.0.0.0
```

* Allow the user to run the notebook as root:

```
--allow-root
```

* Set the port:

```
--port=8888
```

* Disable authentication. (Kubeflow takes care of authentication based on
the type of authentication selected during deployment of Kubeflow. After
authentication to Kubeflow, users can access all Kubeflow components
from the UI, including notebooks.)

Use the following setting to allow passwordless access to your Jupyter
notebook servers:

```
--NotebookApp.token='' --NotebookApp.password=''
```

* Allow any origin to access your Jupyter notebook server:


```
--NotebookApp.allow_origin='*'
```

* Set the base URL. The Kubeflow notebook controller manages the base URL for
the notebook server using an environment variable called `NB_PREFIX`. Your
Docker image should define the variable and set the value of `base_url` as
follows (the value is expanded by the shell at container start):

```
--NotebookApp.base_url=${NB_PREFIX}
```

Below is an example of what your Dockerfile should contain:


```
ENV NB_PREFIX /
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
```
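
Putting the pieces together, a minimal complete Dockerfile might look like the sketch below. The base image and the extra `pip install` are assumptions; any image with Jupyter installed will do, provided the launch command follows the rules above:

```
# Assumed public base image with Jupyter preinstalled.
FROM jupyter/base-notebook:latest

USER root

# Install any extra packages your notebooks need (TensorFlow is just an example).
RUN pip install --no-cache-dir tensorflow

ENV NB_PREFIX /
EXPOSE 8888
CMD ["sh", "-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
```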

## Next steps

When starting a Jupyter notebook server from the
Kubeflow UI, specify your custom Docker image. See the guide to [setting up
your Jupyter notebooks](/docs/notebooks/setup/).