Commit 4e1d8ab

sarahmaddox authored and k8s-ci-robot committed
Refactored and significantly updated the Jupyter notebooks docs (#633)
* WIP Initial commit of refactored notebooks section.
* Minor additions.
* Finished draft of notebooks quickstart.
* Renamed some files and clarified the notebooks setup doc.
* Updated the guide to custom notebook images.
* A couple more screenshots plus text refinements.
* Fixed references to JupyterHub throughout the docs.
* Updated in response to review comments.
* One more link fix.
* Clarified auth in custom notebooks doc.
1 parent b549c91 · commit 4e1d8ab

22 files changed: +489 −210 lines

content/_index.html

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ <h4 class="section-head">What is Kubeflow?</h4>
 </div>
 <div class="text">
 <h4>Notebooks</h4>
-<p>A JupyterHub to create and manage interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.</p>
+<p>Services to create and manage interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.</p>
 </div>
 </div>

content/docs/about/kubeflow.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Kubeflow started as an open sourcing of the way Google ran [TensorFlow](https://
 
 ## Notebooks
 
-Included in Kubeflow is [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) to create and manage multi-user interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.
+Kubeflow includes services for spawning and managing [Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/latest/). Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.
 
 ## Using Kubeflow
 

content/docs/components/jupyter.md

Lines changed: 3 additions & 136 deletions
@@ -4,139 +4,6 @@ description = "Using Jupyter notebooks in Kubeflow"
 weight = 10
 +++
 
-## Bringing up a Jupyter Notebook
-
-1. To connect to Jupyter follow the [instructions](/docs/other-guides/accessing-uis)
-   to access the Kubeflow UI. From there you will be able to navigate to JupyterHub
-   ![JupyterHub Link](/docs/images/jupyterlink.png)
-1. Sign in
-   * On GCP you sign in using your Google Account
-     * If you are already logged into your Google Account you may not
-       be prompted to login again
-   * On all other platforms you can sign in using any username/password
-1. Click the "Start My Server" button, and you will be greeted by a dialog screen.
-1. Select a CPU or GPU image from the Image dropdown menu depending on whether you are doing CPU or GPU training, or whether or not you have GPUs in your cluster. We currently offer a cpu and gpu image for each tensorflow minor version(eg: 1.4.1,1.5.1,1.6.0). Or you can type in the name of any TF image you want to run.
-1. Allocate memory, CPU, GPU, or other resources according to your need (1 CPU and 2Gi of Memory are good starting points)
-   * To allocate GPUs, make sure that you have GPUs available in your cluster
-   * Run the following command to check if there are any nvidia gpus available:
-     `kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"`
-   * If you have GPUs available, you can schedule your server on a GPU node by specifying the following json in `Extra Resource Limits` section: `{"nvidia.com/gpu": "1"}`
-1. Click Spawn
-
-   * The images are 10's of GBs in size and can take a long time to download
-     depending on your network connection
-
-   * You can check the status of your pod by doing
-
-     ```
-     kubectl -n ${NAMESPACE} describe pods jupyter-${USERNAME}
-     ```
-
-     * Where ${USERNAME} is the name you used to login
-     * **GKE users** if you have IAP turned on the pod will be named differently
-
-       * If you signed on as USER@DOMAIN.EXT the pod will be named
-
-         ```
-         jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
-         ```
-
-1. You should now be greeted with a Jupyter Notebook interface.
-
-The image supplied above can be used for training Tensorflow models with Jupyter. The images include all the requisite plugins, including [Tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) that you can use for rich visualizations and insights into your models.
-
-To test the install, we can run a basic hello world (adapted from [mnist_softmax.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py) )
-
-```
-from tensorflow.examples.tutorials.mnist import input_data
-mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
-
-import tensorflow as tf
-
-x = tf.placeholder(tf.float32, [None, 784])
-
-W = tf.Variable(tf.zeros([784, 10]))
-b = tf.Variable(tf.zeros([10]))
-
-y = tf.nn.softmax(tf.matmul(x, W) + b)
-
-y_ = tf.placeholder(tf.float32, [None, 10])
-cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
-
-train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
-
-sess = tf.InteractiveSession()
-tf.global_variables_initializer().run()
-
-for _ in range(1000):
-  batch_xs, batch_ys = mnist.train.next_batch(100)
-  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
-
-correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
-accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
-print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
-```
-
-Paste the example into a new Python 3 Jupyter notebook and execute the code. This should result in a 0.9014 accuracy result against the test data.
-
-Please note that when running on most cloud providers, the public IP address will be exposed to the internet and is an
-unsecured endpoint by default. For a production deployment with SSL and authentication, refer to the [documentation](https://github.com/kubeflow/kubeflow/tree/{{< params "githubbranch" >}}/components/jupyterhub).
-
-## Submitting k8s resources from Jupyter Notebook
-
-The Jupyter Notebook pods are assigned the `jupyter-notebook` service account. This service account is bound to `jupyter-notebook` role which has namespace-scoped permissions to the following k8s resources:
-
-* pods
-* deployments
-* services
-* jobs
-* tfjobs
-* pytorchjobs
-
-This means that you can directly create these k8s resources directly from your jupyter notebook. kubectl is already installed in the notebook, so you can create k8s resources running the following command in a jupyter notebook cell
-
-```
-!kubectl create -f myspec.yaml
-```
-
-## Creating a custom Jupyter image
-
-You can create your own Jupyter image and use it in your Kubeflow cluster.
-Your custom image needs to meet the requirements created by Kubeflow Notebook Controller. Kubeflow Notebook Controller manages the life-cycle of notebooks.
-Kubeflow Web UI expects the Jupyer to be launched upon running the docker image with only `docker run`. For that you need to set the default command of your image to launch Jupyter. The Jupyter launch command needs to be set as follows:
-
-* Set the working directory: `--notebook-dir=/home/jovyan`. This is because the folder `/home/jovyan` is backed by Kubernetes Persistent Volume (PV)
-* Allow Jupyter to listen on all IPs: `--ip=0.0.0.0`
-* Allow the user to run the notebook as root: `--allow-root`
-* Set port: `--port=8888`
-* Disable authentication. Kubeflow takes care of authentication. Use the following to allow passwordless access to Jupyter: `--NotebookApp.token='' --NotebookApp.password=''`
-* Allow any origin to access your Jupyter: `--NotebookApp.allow_origin='*'`
-* Set base_url: Kubeflow Notebook Controller manages the base URL for the notebook server using the environment variable called `NB_PREFIX`. Your should define the variable in your image and set the value of `base_url` as follows: `--NotebookApp.base_url=NB_PREFIX`
-
-As an example your Dockerfile should contain the following:
-
-```
-ENV NB_PREFIX /
-
-CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
-```
-
-## Building docker images from Jupyter Notebook on GCP
-
-If using Jupyter Notebooks on GKE, you can submit docker image builds to Cloud Build which builds your docker images and pushes them to Google Container Registry.
-
-Activate the attached service account using
-
-```
-!gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
-```
-
-If you have a Dockerfile in your current directory, you can submit a build using
-
-```
-!gcloud container builds submit --tag gcr.io/myproject/myimage:tag .
-```
-
-Advanced build documentation for docker images is available [here](https://cloud.google.com/cloud-build/docs/quickstart-docker)
-
+Your Kubeflow deployment includes support for spawning and managing Jupyter
+notebooks. See how to [set up your notebooks](/docs/notebooks/setup/) and
+[explore more notebook functionality](/docs/notebooks/).
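The deleted custom-image section above requires the notebook launch command to honor the `NB_PREFIX` environment variable set by the Kubeflow notebook controller. As a hedged sketch (not part of this commit; the variable names here are illustrative), the flag list from that section can be assembled and sanity-checked in Python:

```python
import os

# NB_PREFIX is set by the notebook controller; the Dockerfile in the
# deleted section defaults it to "/".
nb_prefix = os.environ.get("NB_PREFIX", "/")

# Flags mirror the requirements listed in the deleted jupyter.md section:
# working dir backed by a PV, listen on all IPs, no token/password auth
# (Kubeflow handles auth), any origin allowed, base_url taken from NB_PREFIX.
launch_cmd = [
    "jupyter", "notebook",
    "--notebook-dir=/home/jovyan",
    "--ip=0.0.0.0",
    "--no-browser",
    "--allow-root",
    "--port=8888",
    "--NotebookApp.token=''",
    "--NotebookApp.password=''",
    "--NotebookApp.allow_origin='*'",
    f"--NotebookApp.base_url={nb_prefix}",
]
print(" ".join(launch_cmd))
```

A custom image whose default command expands to this string should satisfy the controller's expectations described above.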

content/docs/gke/cloud-filestore.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ cd ${KFAPP}/ks_app
 ks generate google-cloud-filestore-pv google-cloud-filestore-pv --name="kubeflow-gcfs" \
   --storageCapacity="${GCFS_STORAGE}" \
   --serverIP="${GCFS_INSTANCE_IP_ADDRESS}"
-ks param set jupyterhub disks "kubeflow-gcfs"
+ks param set jupyter disks "kubeflow-gcfs"
 ```
 
 * **GCFS_STORAGE** The size of the Cloud Filestore persistent volume claim

content/docs/gke/customizing-gke.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ For example, to mount additional physical volumes (PVs) in Jupyter:
 
 ```
 cd ${KF_APP}/ks_app
-ks param set jupyterhub disks "kubeflow-gcfs"
+ks param set jupyter disks "kubeflow-gcfs"
 ```
 
 You can then redeploy using `kfctl`:

content/docs/gke/gcp-e2e.md

Lines changed: 9 additions & 35 deletions
@@ -65,7 +65,7 @@ Here's an overview of what you accomplish by following this guide:
 * Setting up [Kubeflow][kubeflow] in a [GKE][kubernetes-engine]
   cluster.
 
-* Testing the code locally using a [Jupyter notebook][jupyterhub].
+* Testing the code locally using a [Jupyter notebook][jupyter-notebook].
 
 * Training the model:
 
@@ -294,38 +294,13 @@ gsutil mb -c regional -l us-central1 gs://${BUCKET_NAME}
 The sample you downloaded contains all the code you need. If you like, you
 can experiment with and test the code in a Jupyter notebook.
 
-The Kubeflow deployment includes [JupyterHub][jupyterhub] and a
-corresponding load balancer service. You can choose to connect to JupyterHub
-using the Kubeflow URL or locally.
+The Kubeflow deployment includes services for spawning and managing
+[Jupyter notebooks][jupyter-notebook].
 
-1. Choose one of the options below to connect to JupyterHub:
-
-   * Click **JUPYTERHUB** on the Kubeflow UI (see screenshot above).
-   * Alternatively, follow the
-     [Kubeflow guide to Jupyter notebooks][kubeflow-jupyter] to connect
-     to JupyterHub locally.
-
-1. Click **Start My Server** if prompted to do so.
-
-1. Leave the **Image** details at the default setting on the JupyterHub
-   **Spawner Options** page. The default gives you a standard CPU image
-   with a recent version of TensorFlow.
-
-1. Click **Spawn**.
-
-   It takes a few minutes for the notebook server to start.
-   After a minute or so, you should see a message on the web page:
-
-   ```
-   Your server is starting up.
-   You will be redirected automatically when it's ready for you.
-   ```
-
-   You should also see an event log which you can check periodically
-   while the server starts.
-
-   When the server is ready, the Jupyter notebook dashboard opens in your
-   browser.
+1. Follow the [Kubeflow notebooks setup guide](/docs/notebooks/setup/) to
+   create a Jupyter notebook server and open the Jupyter UI.
+   Accept the default settings when configuring your notebook server. The
+   default configuration gives you a standard CPU image with a recent
+   version of TensorFlow.
 
 1. Create a new notebook by clicking **New > Python 2** on the Jupyter
    dashboard.
 
@@ -999,7 +974,6 @@ using the [GCP Console][gcp-console].
 [ks-apply]: https://github.com/ksonnet/ksonnet/blob/master/docs/cli-reference/ks_apply.md
 
 [flask]: http://flask.pocoo.org/
-[jupyterhub]: https://jupyter.org/hub
 
 [kubeflow]: https://www.kubeflow.org/
 [kubeflow-core]: https://github.com/kubeflow/kubeflow/tree/master/kubeflow/core
 
@@ -1009,6 +983,6 @@ using the [GCP Console][gcp-console].
 
 [deploy-script]: https://github.com/kubeflow/kubeflow/blob/master/scripts/gke/deploy.sh
 
-[jupyterhub]: http://jupyter.org/hub
-[kubeflow-jupyter]: /docs/components/jupyter/
+[jupyter-notebook]: https://jupyter-notebook.readthedocs.io
+[kubeflow-jupyter]: /docs/notebooks/
 [jupyter-nbviewer]: https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#notebook-user-interface
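The last two hunks above remove a duplicate `[jupyterhub]` link-reference definition and retarget the remaining references (the "One more link fix" in the commit message). As an illustrative sketch (the helper name is hypothetical, not from the commit), a few lines of Python can flag Markdown reference labels defined more than once:

```python
import re
from collections import Counter

def duplicate_link_labels(markdown: str):
    """Return link-reference labels defined more than once, e.g. '[jupyterhub]: url'."""
    labels = re.findall(r"^\[([^\]]+)\]:\s+\S+", markdown, flags=re.MULTILINE)
    return [label for label, n in Counter(labels).items() if n > 1]

# Reference block resembling the one fixed in gcp-e2e.md above.
doc = """\
[flask]: http://flask.pocoo.org/
[jupyterhub]: https://jupyter.org/hub
[kubeflow]: https://www.kubeflow.org/
[jupyterhub]: http://jupyter.org/hub
"""
print(duplicate_link_labels(doc))  # → ['jupyterhub']
```

Running such a check over the docs tree would catch the duplicated definition that this commit removes.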
(4 binary image files changed, screenshots: 32.7 KB, 25.4 KB, 98.1 KB, 125 KB)
