## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow).

The task the agent will learn to perform is operating a simulated KUKA robotic arm in 'KukaBulletEnv-v0', an OpenAI Gym environment built on the Bullet Physics engine. Feel free to skip to the Rendering section at the end to see what this will look like!

### Setup

This demo assumes you've followed the setup steps provided in the README. Beyond that, we still have a few in-notebook steps to perform.

The first is to specify the GCS bucket to which logs will be written; the GCP project, zone, and cluster we'll be working with; the Kubernetes namespace; the app name used to tag our training image; and the path on the filesystem running this notebook where the root of kubeflow/examples can be found.

By default, the notebook container is built with the examples at /home/jovyan/kubeflow-examples.

In [2]:
BUCKET="kubeflow-rl" # Your bucket name here!
EXAMPLES_ROOT="/home/jovyan/kubeflow-examples" # Your examples root here if different
NAMESPACE="rl"
PROJECT="kubeflow-rl" # Your project name here!
ZONE="us-east1-d"
CLUSTER="kubeflow-dev" # The name of your cluster here!
APP_NAME="agents"

In [3]:
import os
APP_ROOT=os.path.join(EXAMPLES_ROOT, "agents")

#### Verifying bucket

Errors that result from a training job attempting to log to a GCS bucket that doesn't exist, or to which you don't have access, can be confusing to debug, so let's verify our configuration in that regard before going any further. The following will create the bucket if it doesn't already exist; otherwise it will fail, indicating that the bucket already exists. Importantly, if the bucket already exists but we don't have access to it, this will be reported as well.

In [4]:
!gsutil mb gs://{BUCKET}

Creating gs://kubeflow-rl/...
ServiceException: 409 Bucket kubeflow-rl already exists.
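If you script this check rather than reading the message by eye, a small classifier over the `gsutil mb` text is enough. A minimal sketch: `classify_mb_output` is a hypothetical helper, and it assumes failures are printed as `<Name>Exception: <HTTP status> ...` (as in the 409 message above) and that a 403 status means you lack access.

```python
import re


def classify_mb_output(output: str) -> str:
    """Classify the text printed by `gsutil mb gs://BUCKET` (hypothetical helper).

    Assumes failures look like "<Name>Exception: <HTTP status> ...", as in the
    409 message above; a 403 status is assumed to mean "no access".
    """
    match = re.search(r"Exception: (\d{3})\b", output)
    if match is None:
        # No exception line: the bucket was created (or the output is unexpected).
        return "created" if "Creating gs://" in output else "unknown"
    status = match.group(1)
    if status == "409":
        return "exists"
    if status == "403":
        return "forbidden"
    return "error " + status


sample = ("Creating gs://kubeflow-rl/...\n"
          "ServiceException: 409 Bucket kubeflow-rl already exists.\n")
print(classify_mb_output(sample))  # exists
```

Since gsutil writes these messages to stderr, capture both streams when calling it from Python, e.g. `subprocess.run(["gsutil", "mb", "gs://" + BUCKET], stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout.decode()`.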


#### Configuring Kubernetes credentials

Having done so, we are ready to obtain credentials for the Kubernetes cluster on which we will run the training jobs.

In [5]:
!gcloud container clusters --project={PROJECT} --zone={ZONE} get-credentials {CLUSTER}

Fetching cluster endpoint and auth data.
kubeconfig entry generated for kubeflow-dev.
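`get-credentials` also makes the new entry kubectl's current context. On GKE the generated context name follows the pattern `gke_PROJECT_ZONE_CLUSTER` (the ksonnet output in the next step shows `gke_kubeflow-rl_us-east1-d_kubeflow-dev`), so a quick sanity check is to compare it with `kubectl config current-context`. A small sketch; the helper name is our own:

```python
def gke_context_name(project: str, zone: str, cluster: str) -> str:
    """Kubeconfig context name written by `gcloud container clusters get-credentials`."""
    return "gke_{}_{}_{}".format(project, zone, cluster)


print(gke_context_name("kubeflow-rl", "us-east1-d", "kubeflow-dev"))
# gke_kubeflow-rl_us-east1-d_kubeflow-dev
```

In the notebook you could compare `gke_context_name(PROJECT, ZONE, CLUSTER)` with the output of `!kubectl config current-context` before applying anything to the cluster.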


#### Configure ksonnet

Ksonnet is a tool that simplifies configuration management for Kubernetes deployments, which includes making it easier to specify and reconfigure distributed TensorFlow training jobs on Kubeflow.

The ksonnet app for this example lives in the `app/` subdirectory of APP_ROOT. We only need to add a `default` environment pointing at the cluster we just obtained credentials for:

In [6]:
os.chdir(os.path.join(APP_ROOT, "app"))
!ks env add default

INFO  Using context 'gke_kubeflow-rl_us-east1-d_kubeflow-dev' from the kubeconfig file specified at the environment variable $KUBECONFIG
INFO  Creating environment 'default' with namespace '', pointing at server at address 'https://35.229.119.45'
INFO  Generating environment metadata at path '/home/jovyan/kubeflow-examples/agents/app/environments/default'
INFO  Environment 'default' pointing to namespace '' and server address at 'https://35.229.119.45' successfully created


For more information on ksonnet, check out its documentation [here](https://ksonnet.io/).

### Building training image

First let's build the image we'll need to run our training job. We can do this with Google Container Builder as follows:

In [7]:
import datetime
import uuid
now=datetime.datetime.now()
BUILD_ID=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TRAINER_TAG="gcr.io/%s/%s:%s" % (PROJECT, APP_NAME, BUILD_ID)
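The same "timestamp plus four random hex characters" suffix is built here for BUILD_ID and again later for the training, TensorBoard, and render job names. A sketch of factoring it into one helper; `unique_suffix` is hypothetical and reproduces the inline expression exactly:

```python
import datetime
import uuid
from typing import Optional


def unique_suffix(now: Optional[datetime.datetime] = None) -> str:
    """Return "MMDD-HHMM-xxxx": a timestamp plus 4 random hex characters."""
    now = now or datetime.datetime.now()
    return now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]


# Equivalent to the BUILD_ID construction above; the date is arbitrary.
suffix = unique_suffix(datetime.datetime(2018, 2, 21, 23, 15))
print(suffix[:9])  # 0221-2315
```

The output is lowercase alphanumerics and dashes, so it is safe to embed in image tags and Kubernetes resource names; the four hex characters give 65,536 possible suffixes per minute.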

In [None]:
os.chdir(APP_ROOT)
!gcloud container builds submit -t {TRAINER_TAG} .

### Training

The objective of the training phase is to learn model parameters that yield high performance on the provided task. Here we'll launch several training jobs and monitor their progress.

#### Launching the TFJob

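We'll use [ksonnet](https://ksonnet.io/) to parameterize and apply a TFJob configuration (i.e. run a job). The study below uses the TRAINER_TAG image built above, but you can substitute a custom job image, such as one built and deployed with build.sh, or an existing image if you only want to change parameters. Each experiment is launched as `num_replicas` identically parameterized TFJobs, each logging to its own directory under STUDY_LOGS_ROOT. To view the templated job YAML for reference, run `!ks show default -c agents` from the app directory.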

In [14]:
study = {"name": "replicated-kuka-demo",
         "experiments": [{"name": "kuka",
                          "image": TRAINER_TAG,
                          "env": "KukaBulletEnv-v0",
                          "num_replicas": 4}]
        }

STUDY_LOGS_ROOT = "gs://{0}/studies/{1}".format(BUCKET, study["name"])

print(STUDY_LOGS_ROOT)

gs://kubeflow-rl/studies/replicated-kuka-demo


In [15]:
import datetime
import uuid
import pprint

os.chdir(os.path.join(APP_ROOT, "app"))

print("Preparing study: %s..." % study["name"])

for experiment in study["experiments"]:
    
    print("Preparing experiment: %s" % experiment["name"])
    
    # Get and set the job container image for this experiment
    IMAGE = experiment["image"]
    !ks param set agents image {IMAGE}

    # Set the gym learning environment on which to train
    ENVIRONMENT = experiment["env"]
    !ks param set agents env {ENVIRONMENT}

    # Set the algorithm and network part to use for policy and value networks
    !ks param set agents algorithm "agents.ppo.PPOAlgorithm"
    !ks param set agents network "agents.scripts.networks.feed_forward_gaussian"

    # Run in training mode with 30 CPUs and 30 agents for 15M steps
    !ks param set agents run_mode train
    !ks param set agents num_cpu 30
    !ks param set agents num_agents 30
    !ks param set agents steps 15e6

    !ks param set agents update_every 60
    !ks param set agents eval_episodes 25

    for replica_id in range(experiment["num_replicas"]):

        # Construct a unique name for the training job based on experiment["name"]
        now=datetime.datetime.now()
        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        BASE_NAME = experiment["name"]
        TRAIN_JOB_NAME=BASE_NAME + "-" + JOB_SALT
        !ks param set agents name {TRAIN_JOB_NAME}

        # Construct a log dir path for this experiment
        LOG_DIR="{0}/{1}".format(STUDY_LOGS_ROOT, TRAIN_JOB_NAME)
        !ks param set agents log_dir {LOG_DIR}

        if "replicas" not in experiment:
            experiment["replicas"] = []
        experiment["replicas"].append({"log_dir": LOG_DIR})
        
        print("Preparing replica %s of %s for experiment %s" % (replica_id + 1, experiment["num_replicas"], experiment["name"]))
        !ks apply default -c agents

Preparing study: replicated-kuka-demo...
Preparing experiment: kuka
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:0221-2315-5b40"' for component 'agents'
INFO  Parameter 'env' successfully set to '"KukaBulletEnv-v0"' for component 'agents'
INFO  Parameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents'
INFO  Parameter 'network' successfully set to '"agents.scripts.networks.feed_forward_gaussian"' for component 'agents'
INFO  Parameter 'run_mode' successfully set to '"train"' for component 'agents'
INFO  Parameter 'num_cpu' successfully set to '30' for component 'agents'
INFO  Parameter 'num_agents' successfully set to '30' for component 'agents'
INFO  Parameter 'steps' successfully set to '15e6' for component 'agents'
INFO  Parameter 'update_every' successfully set to '60' for component 'agents'
INFO  Parameter 'eval_episodes' successful
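The loop above issues each setting as a separate `ks param set agents NAME VALUE` shell command. A minimal sketch of driving the same step from a plain dict with `subprocess`, which makes per-experiment settings easier to vary across a study; the helper names are hypothetical, and `apply_params` assumes the working directory is the ksonnet app (as set by `os.chdir` above):

```python
import subprocess
from typing import Dict, List, Union

Param = Union[str, int, float]


def ks_param_set_commands(component: str, params: Dict[str, Param]) -> List[List[str]]:
    """Build one `ks param set COMPONENT NAME VALUE` argv per parameter."""
    return [["ks", "param", "set", component, name, str(value)]
            for name, value in params.items()]


def apply_params(component: str, params: Dict[str, Param]) -> None:
    """Run the commands; must be called from inside the ksonnet app directory."""
    for argv in ks_param_set_commands(component, params):
        subprocess.check_call(argv)


# The training settings used in the loop above.
training_params = {
    "run_mode": "train",
    "num_cpu": 30,
    "num_agents": 30,
    "steps": "15e6",  # kept as a string so Python doesn't render it as 15000000.0
    "update_every": 60,
    "eval_episodes": 25,
}
for argv in ks_param_set_commands("agents", training_params):
    print(" ".join(argv))
```

Passing an argument list rather than a shell string also avoids quoting problems when values contain spaces or special characters.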

In [16]:
study

{'experiments': [{'env': 'KukaBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:0221-2315-5b40',
   'name': 'kuka',
   'num_replicas': 4,
   'replicas': [{'log_dir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-603f'},
    {'log_dir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-195e'},
    {'log_dir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-36d1'},
    {'log_dir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-afdd'}]}],
 'name': 'replicated-kuka-demo'}

Now we can list the TFJobs and confirm that one job has been created per replica:

In [17]:
!kubectl get tfjobs -n rl

NAME                  AGE
kuka-0221-2329-195e   7s
kuka-0221-2329-36d1   5s
kuka-0221-2329-603f   9s
kuka-0221-2329-afdd   3s
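To check programmatically that one TFJob exists per replica, the tabular output can be parsed into rows. A sketch under the assumption that no cell contains whitespace (true for the columns shown above); `parse_kubectl_table` is a hypothetical helper, and `-o json` output is the more robust choice if you need more fields:

```python
from typing import Dict, List


def parse_kubectl_table(text: str) -> List[Dict[str, str]]:
    """Parse kubectl's default tabular output into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# The `kubectl get tfjobs -n rl` output shown above.
output = """NAME                  AGE
kuka-0221-2329-195e   7s
kuka-0221-2329-36d1   5s
kuka-0221-2329-603f   9s
kuka-0221-2329-afdd   3s
"""
jobs = parse_kubectl_table(output)
print([job["NAME"] for job in jobs if job["NAME"].startswith("kuka-")])
```

In the notebook, `output` could come from `subprocess.check_output(["kubectl", "get", "tfjobs", "-n", NAMESPACE]).decode()` and the job count could then be compared with `num_replicas`.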


#### Monitoring training

The IDs, status, and other metadata of pods involved in the training job can be displayed using the following:

In [24]:
!kubectl get pods -n rl --show-all

NAME                                        READY     STATUS    RESTARTS   AGE
kubeflow-train-l245f-607462752              1/3       Error     0          7d
kuka-0221-2329-195e-master-u4ky-0-9rgfh     1/1       Running   0          23s
kuka-0221-2329-36d1-master-yu6l-0-7z4fd     0/1       Pending   0          21s
kuka-0221-2329-603f-master-30qx-0-9ctf2     1/1       Running   0          25s
kuka-0221-2329-afdd-master-kbn7-0-f67rb     0/1       Pending   0          19s
tboard-0221-1650-51ca-tb-fd9fb675f-5mt79    1/1       Running   0          6h
tboard-0221-1654-ccf4-tb-75b64479f9-cnnw6   1/1       Running   0          6h
tboard-0221-1704-a7e4-tb-584968fc-5cztf     1/1       Running   0          6h
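The STATUS column above is kubectl's summary; each pod's `status.phase` field is one of Pending, Running, Succeeded, Failed, or Unknown. A sketch that reads phases from `kubectl get pods -n rl -o json` and keeps only this study's pods; `pod_phases` is hypothetical, and the payload below is a trimmed, illustrative sample rather than real output:

```python
import json
from collections import Counter
from typing import Dict


def pod_phases(pods_json: str, name_prefix: str = "") -> Dict[str, str]:
    """Map pod name -> status.phase from `kubectl get pods -o json` output."""
    items = json.loads(pods_json)["items"]
    return {pod["metadata"]["name"]: pod["status"]["phase"]
            for pod in items
            if pod["metadata"]["name"].startswith(name_prefix)}


# Illustrative payload shaped like `kubectl get pods -o json` (trimmed).
sample = json.dumps({"items": [
    {"metadata": {"name": "kuka-0221-2329-195e-master-u4ky-0-9rgfh"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "kuka-0221-2329-36d1-master-yu6l-0-7z4fd"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "tboard-0221-1650-51ca-tb-fd9fb675f-5mt79"}, "status": {"phase": "Running"}},
]})
phases = pod_phases(sample, name_prefix="kuka-")
print(Counter(phases.values()))
```

Pods that stay Pending for long are worth a `kubectl describe pod` to see why they have not been scheduled.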


In [30]:
TRAIN_JOB_NAME

'kuka-0221-2329-afdd'

Obtain the name of the master pod for the most recently launched job (TRAIN_JOB_NAME) and print its logs:

In [32]:
import subprocess
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + TRAIN_JOB_NAME,
                                      "-o", "jsonpath='{.items[*].metadata.name}'"]).decode("utf-8")
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

'kuka-0221-2329-afdd-master-kbn7-0-f67rb'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Start a new run and write summaries and checkpoints to gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-195e.
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-195e',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7f321b8cbde8>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.AdamOptimizer'>,
 'polic
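Because `subprocess.check_output` receives an argument list rather than a shell string, the single quotes around the jsonpath template in the cell above reach kubectl as literal text, which is why the printed pod name is wrapped in quotes (the `!kubectl logs` line still works because the shell strips them). A slightly more robust sketch; the helper names are hypothetical, and it reuses the `tf_job_name` label selector from the cell above:

```python
import subprocess
from typing import List


def master_pod_argv(namespace: str, job_name: str) -> List[str]:
    """kubectl argv listing pod names for a TFJob via its tf_job_name label.

    No shell is involved, so the jsonpath template needs no surrounding quotes.
    """
    return ["kubectl", "-n", namespace, "get", "pods",
            "--selector=tf_job_name=" + job_name,
            "-o", "jsonpath={.items[*].metadata.name}"]


def pod_names(raw: bytes) -> List[str]:
    """Split jsonpath output into names, tolerating stray quote characters."""
    return [name.strip("'\"") for name in raw.decode("utf-8").split()]


def get_master_pods(namespace: str, job_name: str) -> List[str]:
    """Run kubectl and return the pod names for the given job."""
    return pod_names(subprocess.check_output(master_pod_argv(namespace, job_name)))
```

A list is returned because the selector matches every pod carrying the job's label; for the single-master jobs launched here it should contain one name.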

##### Launching TensorBoard

**NOTE:** There is currently a bug where launching TensorBoard immediately after starting the training job leads to the TensorBoard deployment not displaying any summaries, even though they become available later in the run. Either wait a few minutes before proceeding, or, if you find that your TensorBoard deployment only displays a TensorFlow graph, repeat the following 5-10 minutes later in the run and examine the logs as above to check that the run is proceeding as expected.
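One way to tell whether summaries exist yet is to look for TensorFlow event files under STUDY_LOGS_ROOT before launching TensorBoard. A sketch, assuming the summaries land somewhere beneath the study root and follow TensorFlow's usual `events.out.tfevents.*` naming; `event_files` is hypothetical and the listing lines are made-up examples:

```python
from typing import Iterable, List


def event_files(listing: Iterable[str]) -> List[str]:
    """Return listed paths that look like TensorFlow summary event files.

    TensorFlow's summary writers name these files "events.out.tfevents.*".
    """
    return [path.strip() for path in listing
            if path.strip().rsplit("/", 1)[-1].startswith("events.out.tfevents.")]


# Hypothetical listing lines, shaped like `gsutil ls -r` output.
listing = [
    "gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-195e/config.yaml",
    "gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-2329-195e/train/"
    "events.out.tfevents.1519255800.kuka-0221-2329-195e-master",
]
print(len(event_files(listing)))  # 1
```

In the notebook, `listing = !gsutil ls -r {STUDY_LOGS_ROOT}` captures the recursive listing as a list of lines that can be passed straight to `event_files`.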

The following will create a tensorboard deployment using the STUDY_LOGS_ROOT as the source of tensorboard logs.

In [33]:
HPARAM_SET="tboard"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TBOARD_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
SECRET_NAME="gcp-credentials"
SECRET_FILE_NAME="secret.json"
NAMESPACE="rl"

!ks param set tensorboard name {TBOARD_JOB_NAME}
!ks param set tensorboard namespace {NAMESPACE}
!ks param set tensorboard log_dir {STUDY_LOGS_ROOT}
!ks param set tensorboard secret {SECRET_NAME}
!ks param set tensorboard secret_file_name {SECRET_FILE_NAME}
!ks show default -c tensorboard

!ks apply default -c tensorboard

INFO  Parameter 'name' successfully set to '"tboard-0221-2330-5c5c"' for component 'tensorboard'
INFO  Parameter 'namespace' successfully set to '"rl"' for component 'tensorboard'
INFO  Parameter 'log_dir' successfully set to '"gs://kubeflow-rl/studies/replicated-kuka-demo"' for component 'tensorboard'
INFO  Parameter 'secret' successfully set to '"gcp-credentials"' for component 'tensorboard'
INFO  Parameter 'secret_file_name' successfully set to '"secret.json"' for component 'tensorboard'
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: tboard-0221-2330-5c5c-tb
  namespace: rl
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorboard
        tb-job: tboard-0221-2330-5c5c
      name: tboard-0221-2330-5c5c
      namespace: rl
    spec:
      containers:
      - command:
        - /usr/local/bin/tensorboard
        - --logdir=gs://kubeflow-rl/studies/replicated-kuka-demo
        - --port=80
        en

##### Connecting to TensorBoard

To connect to TensorBoard from a browser on your local machine, start the Kubernetes proxy there with `kubectl proxy` (it listens on port 8001 by default), then open the URL printed by the next cell:

In [34]:
PROXY_PORT=8001
url=("http://127.0.0.1:{proxy_port}/api/v1/proxy/namespaces/{namespace}/services/{service_name}:80/".format(
    proxy_port=PROXY_PORT, namespace=NAMESPACE, service_name=TBOARD_JOB_NAME + "-tb"))
print(url)

http://127.0.0.1:8001/api/v1/proxy/namespaces/rl/services/tboard-0221-2330-5c5c-tb:80/
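The `/api/v1/proxy/...` path above is an older form of the API server's proxy URL. On newer Kubernetes versions the same Service is typically reached through the proxy subresource, `/api/v1/namespaces/NAMESPACE/services/NAME:PORT/proxy/`, so if the URL above returns a 404 it is worth trying that form. A sketch that builds either; `tensorboard_proxy_url` is a hypothetical helper:

```python
def tensorboard_proxy_url(proxy_port: int, namespace: str, service: str,
                          service_port: int = 80, legacy: bool = True) -> str:
    """URL for reaching a Service through a local `kubectl proxy`."""
    if legacy:
        # Older path style, as used in the cell above.
        return "http://127.0.0.1:{}/api/v1/proxy/namespaces/{}/services/{}:{}/".format(
            proxy_port, namespace, service, service_port)
    # Service proxy subresource style.
    return "http://127.0.0.1:{}/api/v1/namespaces/{}/services/{}:{}/proxy/".format(
        proxy_port, namespace, service, service_port)


print(tensorboard_proxy_url(8001, "rl", "tboard-0221-2330-5c5c-tb", legacy=False))
```

In the notebook, pass `PROXY_PORT`, `NAMESPACE`, and `TBOARD_JOB_NAME + "-tb"` as in the cell above.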


Note that the first time you launch a TensorBoard deployment, the UI will take a while to become available while the container image is pulled. You can obtain further details using `kubectl describe ...` or by examining the training pod logs (for example, to confirm that the run has not yet progressed to the point where summaries would have been written).

Below is a screenshot from TensorBoard of the mean_score for four identically parameterized workers training on the Kuka environment:

![](tboard_mean_score.png)

As you can see, there is variability both between and within workers, but some still appear to reach stably high performance.

### Rendering!

Let's take a look at the trained model performing the task!!

#### Initiating render jobs

Here we'll create a render job for each replica of each experiment in our study. Each render job generates multiple MP4 videos of the agent performing the task, written to ${LOG_DIR}/render/[some_unique_subdir_name] for the corresponding replica.

In [52]:
import datetime
import uuid
import os

os.chdir(os.path.join(APP_ROOT, "app"))

for experiment in study["experiments"]:
    for replica in experiment["replicas"]:
        
        LOG_DIR = replica["log_dir"]
        IMAGE = experiment["image"]
        
        now=datetime.datetime.now()

        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        RENDER_JOB_NAME="render-" + JOB_SALT
        !ks param set agents_render name {RENDER_JOB_NAME}

        !ks param set agents_render log_dir {LOG_DIR}
        !ks param set agents_render image {IMAGE}
        
        !ks apply default -c agents_render

INFO  Parameter 'name' successfully set to '"render-0221-1705-4149"' for component 'agents_render'
INFO  Parameter 'log_dir' successfully set to '"gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc"' for component 'agents_render'
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:0221-1635-d869"' for component 'agents_render'
INFO  Updating tfjobs rl.render-0221-1705-4149
INFO  Creating non-existent tfjobs rl.render-0221-1705-4149


Each render takes about 15 minutes to run. We can check whether the renders are complete by looking at the SUCCESSFUL (third) column of the following:

In [43]:
!kubectl get jobs -n rl --show-all

NAME                                  DESIRED   SUCCESSFUL   AGE
kuka-0221-1650-31dc-master-swbw-0     1         0            3m
render-0221-1651-b45a-master-v278-0   1         0            2m
render-0221-1653-6acb-master-rs5y-0   1         0            4s
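Rather than re-reading the table, a small filter can report which render jobs are still running. A sketch assuming the NAME / DESIRED / SUCCESSFUL column layout shown above (newer kubectl versions print a different COMPLETIONS column instead); `incomplete_jobs` is a hypothetical helper:

```python
from typing import List


def incomplete_jobs(text: str, prefix: str = "render-") -> List[str]:
    """Names of jobs with SUCCESSFUL < DESIRED in `kubectl get jobs` output.

    Assumes the NAME / DESIRED / SUCCESSFUL column layout shown above.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_i, want_i, done_i = (header.index(col) for col in ("NAME", "DESIRED", "SUCCESSFUL"))
    return [row[name_i] for row in body
            if row[name_i].startswith(prefix) and int(row[done_i]) < int(row[want_i])]


# The `kubectl get jobs -n rl --show-all` output shown above.
output = """NAME                                  DESIRED   SUCCESSFUL   AGE
kuka-0221-1650-31dc-master-swbw-0     1         0            3m
render-0221-1651-b45a-master-v278-0   1         0            2m
render-0221-1653-6acb-master-rs5y-0   1         0            4s
"""
print(incomplete_jobs(output))
```

An empty list means every render job has finished and its videos should be in the corresponding render/ subdirectory.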


First, let's get the GCS path of the directory containing renders for an experiment of interest. As a reminder, here's the structure of our study:

In [44]:
study

{'experiments': [{'env': 'KukaBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:0221-1635-d869',
   'name': 'kuka',
   'num_replicas': 1,
   'replicas': [{'log_dir': 'gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc'}]}],
 'name': 'replicated-kuka-demo'}

Let's look at the renders for the first replica of the first experiment. First, we list that replica's log dir and its render/ subdirectory:

In [54]:
kuka_experiment_logdir = study["experiments"][0]["replicas"][0]["log_dir"]
!gsutil ls {kuka_experiment_logdir}
!gsutil ls {kuka_experiment_logdir}/render

gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/checkpoint
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/config.yaml
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/model.ckpt-850020.data-00000-of-00001
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/model.ckpt-850020.index
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/model.ckpt-850020.meta
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/eval/
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/train/
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1651-aad9/
gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1653-6a90/
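The render subdirectory names start with the same `MMDD-HHMM` timestamp used for job names, so the most recent render can be picked from the listing instead of copied by hand. A sketch assuming that naming (lexicographic order then matches time order within a year); both helpers are hypothetical:

```python
from typing import Iterable, List, Optional


def render_dirs(listing: Iterable[str]) -> List[str]:
    """Paths of subdirectories directly under a render/ directory."""
    dirs = []
    for line in listing:
        path = line.strip()
        parts = path.rstrip("/").split("/")
        if path.endswith("/") and len(parts) >= 2 and parts[-2] == "render":
            dirs.append(path)
    return dirs


def latest_render_dir(listing: Iterable[str]) -> Optional[str]:
    """Most recent render dir, assuming names begin with an MMDD-HHMM stamp."""
    dirs = render_dirs(listing)
    if not dirs:
        return None
    return max(dirs, key=lambda path: path.rstrip("/").split("/")[-1])


# A subset of the listing shown above.
root = "gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/"
listing = [root + "eval/", root + "render/", root + "train/",
           root + "render/0221-1651-aad9/", root + "render/0221-1653-6a90/"]
print(latest_render_dir(listing))
```

In the notebook, `listing = !gsutil ls {kuka_experiment_logdir}/render` followed by `RENDER_DIR = latest_render_dir(listing)` would select the same directory as the one chosen by hand below.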


Now let's specify exactly which render directory to pull a render from for inspection. In the listing above, one of the render dirs is "gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1653-6a90/":

In [35]:
#RENDER_DIR="gs://[your bucket]/studies/[study name]/[job name]/render/[render id]" e.g.
RENDER_DIR="gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1653-6a90/"

Now let's take a look at one of those renders:

In [36]:
!mkdir -p /tmp/agents-render
!gsutil cp `gsutil ls {RENDER_DIR} | grep mp4 | head -n1` /tmp/agents-render/render.mp4

Copying gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1653-6a90/openaigym.video.0.44.video000000.mp4...
/ [1 files][195.7 KiB/195.7 KiB]                                                
Operation completed over 1 objects/195.7 KiB.                                    
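The backtick pipeline above takes the first listing line containing "mp4". A slightly stricter pure-Python equivalent, which only accepts paths ending in `.mp4`, is sketched below; `first_mp4` is hypothetical and the non-video listing line is a made-up example:

```python
from typing import Iterable, Optional


def first_mp4(listing: Iterable[str]) -> Optional[str]:
    """First listed path ending in ".mp4" (a stricter `grep mp4 | head -n1`)."""
    for line in listing:
        path = line.strip()
        if path.endswith(".mp4"):
            return path
    return None


render_dir = "gs://kubeflow-rl/studies/replicated-kuka-demo/kuka-0221-1650-31dc/render/0221-1653-6a90/"
listing = [render_dir + "example.json",  # hypothetical non-video file
           render_dir + "openaigym.video.0.44.video000000.mp4"]
print(first_mp4(listing))
```

In the notebook: `listing = !gsutil ls {RENDER_DIR}`, then `MP4 = first_mp4(listing)` and `!gsutil cp {MP4} /tmp/agents-render/render.mp4`.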


#### Inspecting the result

When a render job is complete, the log dir's render/ subdirectory will contain a number of short videos of episodes of the agent performing the grasping task. Here's an example of what one of those looks like for a well-trained model:

In [37]:
import base64
from IPython.display import HTML

mp4_path = '/tmp/agents-render/render.mp4'

# Read the video and inline it as a base64 data URI in an HTML5 <video> tag
with open(mp4_path, 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))
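The cell above embeds the whole video in the notebook, so the saved .ipynb grows by about 4/3 of the MP4's size (base64 overhead). If you want to view several renders, a reusable helper keeps the markup in one place. A sketch; `video_html` is our own name:

```python
import base64


def video_html(data: bytes, mime: str = "video/mp4") -> str:
    """Build an HTML5 <video> tag that embeds the bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return ('<video controls>'
            '<source src="data:{0};base64,{1}" type="{0}" />'
            '</video>').format(mime, encoded)


print(video_html(b"abc"))
```

In the notebook, `HTML(video_html(open(mp4_path, 'rb').read()))` displays the same player as above.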

### Great job! 🎉🎉🎉

If this is your first time working with these technologies you might be interested in some suggestions of good next steps. Here are some ideas:
- Try training on some other learning environments (using the environment IDs registered [here](https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/__init__.py)) and tweet your results! For example:
    - RacecarBulletEnv-v0
    - MinitaurBulletDuckEnv-v0
    - HalfCheetahBulletEnv-v0
- Take a shot at implementing your own gym learning environment and repeat the above.
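If you try writing your own environment, the core contract is small. A real one would subclass `gym.Env`, declare `observation_space` and `action_space` (for example `gym.spaces.Box`), and be registered with gym under an ID like the ones listed above. The toy sketch below skips gym entirely and only mirrors the classic Gym interface of this era, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; the environment itself is hypothetical:

```python
import random
from typing import Dict, Tuple

Obs = Tuple[float, float]


class ReachTargetEnv:
    """Toy 1-D task: move a point to a random target (illustrative only)."""

    def __init__(self, max_steps: int = 50, seed: int = 0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.position = 0.0
        self.target = 0.0
        self.steps = 0

    def reset(self) -> Obs:
        self.position = 0.0
        self.target = self.rng.uniform(-1.0, 1.0)
        self.steps = 0
        return (self.position, self.target)

    def step(self, action: float) -> Tuple[Obs, float, bool, Dict]:
        # Clip the action to [-0.1, 0.1] and move the point.
        self.position += max(-0.1, min(0.1, action))
        self.steps += 1
        distance = abs(self.target - self.position)
        reward = -distance  # closer is better
        done = distance < 0.05 or self.steps >= self.max_steps
        return (self.position, self.target), reward, done, {}


# Run one episode with a greedy policy that heads straight for the target.
env = ReachTargetEnv()
obs = env.reset()
done = False
while not done:
    position, target = obs
    obs, reward, done, info = env.step(target - position)
print(env.steps, round(reward, 3))
```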