diff --git a/user_guide.md b/user_guide.md
index b7a6374d484..1be7b320f07 100644
--- a/user_guide.md
+++ b/user_guide.md
@@ -1,21 +1,26 @@
-# Using Kubeflow
+# Using Kubeflow

-If you are unfamiliar with ksonnet you may want to start by reading the [tutorial](https://ksonnet.io/docs/tutorial)
+This guide will walk you through the basics of deploying and interacting with Kubeflow. A basic understanding of Kubernetes, Tensorflow, and Ksonnet will help you follow this guide:

-## Requirements
+* [Kubernetes](https://kubernetes.io/docs/tutorials/kubernetes-basics/)
+* [Tensorflow](https://www.tensorflow.org/get_started/)
+* [Ksonnet](https://ksonnet.io/docs/tutorial)

- * ksonnet version [0.8.0](https://ksonnet.io/#get-started) or later.
- * See [below](#why-kubeflow-uses-ksonnet) for an explanation of why we use ksonnet
- * Kubernetes >= 1.8 [see here](https://github.com/tensorflow/k8s#requirements)
+## Requirements
+ * Kubernetes >= 1.8 [see here](https://github.com/tensorflow/k8s#requirements)
+ * ksonnet version [0.8.0](https://ksonnet.io/#get-started) or later. (See [below](#why-kubeflow-uses-ksonnet) for an explanation of why we use ksonnet.)

 ## Deploy Kubeflow

-Initialize a directory to contain your deployment
+We will be using Ksonnet to deploy Kubeflow into your cluster.
+
+Initialize a directory to contain your ksonnet application.
+
 ```
 ks init my-kubeflow
 ```

-Install the Kubeflow packages
+Install the Kubeflow packages into your application.

 ```
 cd my-kubeflow
@@ -25,7 +30,6 @@ ks pkg install kubeflow/tf-serving
 ks pkg install kubeflow/tf-job
 ```

-
 Create the Kubeflow core component. The core component includes
  * JupyterHub
  * TensorFlow job controller

@@ -36,45 +40,46 @@ NAMESPACE=kubeflow
 kubectl create namespace ${NAMESPACE}
 ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}
 ```
- * Feel free to change the namespace to value that better suits your environment.
+ * Feel free to change the namespace to a value that better suits your Kubernetes cluster.
+
+Ksonnet allows us to parameterize the Kubeflow deployment according to our needs. We will define two environments: nocloud and cloud.

-Define an environment that doesn't use any Cloud features
- * This environment could be used for minikube or a full K8s cluster that doesn't depend on a cloud features.
 ```
 ks env add nocloud
+ks env add cloud
 ```

-The default Kubeflow deployment will be suitable for this no cloud environment so you can just deploy the core components
+The `nocloud` environment can be used for minikube or other basic k8s clusters; the `cloud` environment will be used for GKE in this guide.
+
+If using GKE, we can configure our cloud environment to use GCP features with a single parameter:

 ```
-ks apply nocloud -c kubeflow-core
+ks param set kubeflow-core cloud gke --env=cloud
 ```

-If the user is running on a Cloud they could create an environment for this.
+Now let's set `${KF_ENV}` to `cloud` or `nocloud` to reflect our environment for the rest of the guide:

 ```
-ks env add cloud
-ks param set kubeflow-core cloud gke --env=cloud
-```
- * The cloud parameter triggers a set of curated cloud configs.
+$ KF_ENV=cloud|nocloud
+```
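+
+(Optional) You can sanity-check that the parameter took effect by listing the component's parameters for the cloud environment; this is purely a verification step, and the exact output format varies between ksonnet versions:
+
+```
+ks param list kubeflow-core --env=cloud
+```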

-They can then deploy to this environment
+Now apply the components to our Kubernetes cluster:

 ```
-ks apply cloud -c kubeflow-core
+ks apply ${KF_ENV} -c kubeflow-core
 ```

-At any time you can inspect the manifests for a particular component using `ks show` e.g
+At any time you can inspect the Kubernetes object definitions for a particular ksonnet component using `ks show`, e.g.

 ```
-ks show cloud -c kubeflow-core
+ks show ${KF_ENV} -c kubeflow-core
 ```

 ### Bringing up a Notebook

-Once you've deployed JupyterHub, a load balancer service is created. You can check its existence using the kubectl command line.
+The kubeflow-core component deployed JupyterHub and a corresponding load balancer service. You can check its status using the kubectl command line.

 ```commandline
 kubectl get svc
@@ -95,21 +100,53 @@ http://xx.yy.zz.ww:31942
 For some cloud deployments, the LoadBalancer service may take up to five minutes to display an external IP address.
 Re-executing `kubectl get svc` repeatedly will eventually show the external IP field populated.

-Once you have an external IP, you can proceed to visit that in your browser. The hub by default is configured to take any username/password combination. After entering the username and password, you can start a single-notebook server,
-request any resources (memory/CPU/GPU), and then proceed to perform single node training.
+Once you have an external IP, you can proceed to visit that in your browser. You should see a sign in prompt.

-We also ship standard docker images that you can use for training Tensorflow models with Jupyter.
+1. Sign in using any username/password.
+1. Click the "Start My Server" button; you will be greeted by a dialog screen.
+   1. Set the image to `gcr.io/kubeflow/tensorflow-notebook-cpu:v1` or `gcr.io/kubeflow/tensorflow-notebook-gpu:8fbc341245695e482848ac3c2034a99f7c1e5763`, depending on whether you want to train on CPUs or GPUs (and on whether your cluster has GPUs available).
+   1. Allocate memory, CPU, GPU, or other resources according to your needs (1 CPU and 2Gi of memory are good starting points).
+   1. Click Spawn.
+1. You should eventually be greeted with a Jupyter interface. Note that the GPU image is several gigabytes in size and may take a few minutes to download and start.

-* gcr.io/kubeflow/tensorflow-notebook-cpu
-* gcr.io/kubeflow/tensorflow-notebook-gpu
+The images supplied above can be used for training Tensorflow models with Jupyter. They include all the requisite plugins, including [Tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard), which you can use for rich visualizations and insights into your models.

-In the spawn window, when starting a new Jupyter instance, you can supply one of the above images to get started, depending on whether
-you want to run on CPUs or GPUs. The images include all the requisite plugins, including [Tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) that you can use for rich visualizations and insights into your models.
-Note that GPU-based image is several gigabytes in size and may take a few minutes to localize.
+To test the install, we can run a basic hello world (adapted from [mnist_softmax.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py)):

-Also, when running on Google Kubernetes Engine, the public IP address will be exposed to the internet and is an
-unsecured endpoint by default. For a production deployment with SSL and authentication, refer to the [documentation](components/jupyterhub).
+```
+from tensorflow.examples.tutorials.mnist import input_data
+mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
+
+import tensorflow as tf
+
+# Model: a single-layer softmax classifier over the 784-pixel MNIST inputs.
+x = tf.placeholder(tf.float32, [None, 784])
+W = tf.Variable(tf.zeros([784, 10]))
+b = tf.Variable(tf.zeros([10]))
+y = tf.nn.softmax(tf.matmul(x, W) + b)
+
+# Loss: cross-entropy against the one-hot labels.
+y_ = tf.placeholder(tf.float32, [None, 10])
+cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
+
+train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
+
+sess = tf.InteractiveSession()
+tf.global_variables_initializer().run()
+
+# Train for 1000 steps on mini-batches of 100 examples.
+for _ in range(1000):
+  batch_xs, batch_ys = mnist.train.next_batch(100)
+  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
+
+# Evaluate accuracy on the held-out test set.
+correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
+accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
+print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
+```
+
+Paste the example into a new Python 3 Jupyter notebook and execute it; this should yield an accuracy of about 0.9014 against the test data.
+
+Please note that when running on most cloud providers, the public IP address will be exposed to the internet and is an
+unsecured endpoint by default. For a production deployment with SSL and authentication, refer to the [documentation](components/jupyterhub).

 ### Serve a model

@@ -124,12 +161,22 @@ MODEL_PATH=gs://cloud-ml-dev_jlewi/tmp/inception
 ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} --namespace=${NAMESPACE} --model_path=${MODEL_PATH}
 ```

-Deploy it in a particular environment. The deployment will pick up environment parameters (e.g. cloud) and customize the deployment appropriately
+Deploy the model component. Ksonnet will pick up the existing parameters for your environment (e.g. cloud, nocloud) and customize the resulting deployment appropriately:
+
+```
+ks apply ${KF_ENV} -c ${MODEL_COMPONENT}
+```
+
+As before, a few pods and services have been created in your cluster. You can get the inception serving endpoint by querying Kubernetes:

 ```
-ks apply cloud -c ${MODEL_COMPONENT}
+kubectl get svc inception
+NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
+inception   LoadBalancer   10.35.255.136   ww.xx.yy.zz   9000:30936/TCP   28m
 ```

+In this example, you should be able to use the inception_client to hit ww.xx.yy.zz:9000, as sketched below.
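+
+For example, assuming you have built the `inception_client` binary from the [TensorFlow Serving](https://github.com/tensorflow/serving) examples (the binary path and image file below are placeholders; this deployment does not install the client for you), a query would look roughly like:
+
+```
+bazel-bin/tensorflow_serving/example/inception_client --server=ww.xx.yy.zz:9000 --image=/path/to/an/image.jpg
+```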
+
 ### Submitting a TensorFlow training job

 We treat each TensorFlow job as a [component](https://ksonnet.io/docs/tutorial#2-generate-and-deploy-an-app-component) in your app.
@@ -162,8 +209,7 @@ to directly edit the `params.libsonnet` file directly.

 To run your job

 ```
-ENVIRONMENT=cloud
-ks apply ${ENVIRONMENT} -c ${JOB_NAME}
+ks apply ${KF_ENV} -c ${JOB_NAME}
 ```

 For information on monitoring your job please refer to the [TfJob docs](https://github.com/tensorflow/k8s#monitoring-your-job).
@@ -182,7 +228,7 @@ ks generate tf-cnn ${CNN_JOB_NAME} --name=${CNN_JOB_NAME} --namespace=${NAMESPAC

 Submit it

 ```
-ks apply ${ENVIRONMENT} -c ${CNN_JOB_NAME}
+ks apply ${KF_ENV} -c ${CNN_JOB_NAME}
 ```

 The prototype provides a bunch of parameters to control how the job runs (e.g. use GPUs, run distributed, etc.). To see a list of parameters
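+
+One way to inspect those parameters (a sketch; it assumes the kubeflow registry's `tf-cnn` prototype is installed in your app as above, and the output format varies between ksonnet versions) is ksonnet's built-in prototype help:
+
+```
+ks prototype describe tf-cnn
+```
+
+When you are done experimenting, `ks delete ${KF_ENV} -c ${CNN_JOB_NAME}` (the mirror image of `ks apply`) tears the job back down.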