## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow). The agent will be learning to operate a simulated robotic arm.

### Setup

This demo assumes you've followed the setup steps provided in the README file.

First we need to provide the path on our filesystem where the examples can be found, the namespace in which to run jobs (and in which NFS is deployed), and the ID and mount path for our NFS volume.

In [117]:
EXAMPLES_ROOT="/home/jovyan/kubeflow-examples" # Your examples root here if different
NAMESPACE="kubeflow"
APP_NAME="agents"
PROJECT="kubeflow-rl" # Your gcloud project here!
NFS_CLAIM_ID="nfs-1"
NFS_MOUNT_PATH="/mnt/nfs-1"

In [118]:
import os
APP_ROOT=os.path.join(EXAMPLES_ROOT, "agents")

#### Configure ksonnet

Ksonnet is a tool that simplifies configuration management for Kubernetes deployments, which makes it easier to specify and reconfigure distributed TensorFlow training jobs on Kubeflow. This example ships with a Ksonnet workspace in the `app` subdirectory. To use it we first need to register our Kubernetes cluster by running `ks env add default` from the root of the Ksonnet app.

In [None]:
os.chdir(os.path.join(APP_ROOT, "app"))
!ks env add default
# Did the above fail? That means you already have a default environment.
# Try running `ks env set default` instead!

For more information on Ksonnet check out their documentation [here](https://ksonnet.io/).

### Building training image

Currently in order to train our model on Kubeflow we'll need to bundle our local workspace into a Docker container. We can perform such a build using Google Container Builder. Running this command requires that the current workspace has been authenticated with the Google Cloud Platform. In the future this will not be required.

In [120]:
import datetime
import uuid

def gen_trainer_tag(project, app_name, registry_target="gcr.io"):
    now=datetime.datetime.now()
    build_id=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
    return "%s/%s/%s:%s" % (registry_target, project, app_name, build_id)

TRAINER_TAG = gen_trainer_tag(PROJECT, APP_NAME)
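
For illustration, a tag generated this way can be split back into its parts, which is handy when checking which build a job is running. The sketch below assumes the `registry/project/app:MMDD-HHMM-xxxx` layout produced by `gen_trainer_tag`; `parse_trainer_tag` is a hypothetical helper and not part of the example code.

```python
import re

# Pattern for tags shaped like "gcr.io/kubeflow-rl/agents:0405-1658-39bf".
# The layout mirrors gen_trainer_tag above; the helper itself is hypothetical.
_TAG_RE = re.compile(
    r"^(?P<registry>[^/]+)/(?P<project>[^/]+)/(?P<app>[^:]+):"
    r"(?P<date>\d{4})-(?P<time>\d{4})-(?P<salt>[0-9a-f]{4})$"
)

def parse_trainer_tag(tag):
    """Split a trainer image tag into its components, or raise ValueError."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError("Unexpected trainer tag format: %s" % tag)
    return match.groupdict()

parts = parse_trainer_tag("gcr.io/kubeflow-rl/agents:0405-1658-39bf")
print(parts["project"], parts["app"], parts["salt"])
```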

In [121]:
LAST_CACHE_FROM="tensorflow/tensorflow:1.4.1"

In [122]:
import yaml
import logging
import tempfile

def _generate_gcb_config(trainer_tag, build_dir, cache_from=LAST_CACHE_FROM):
    if not os.path.exists(build_dir):
        raise ValueError("Can't find provided build directory: %s" % build_dir)
    os.chdir(build_dir)
        
    build_config = {
        "steps": [
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": [
                    "pull", cache_from
                ]
            },
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": [
                    "build",
                    "--cache-from",
                    cache_from,
                    "-t", trainer_tag,
                    "."
                ]
            }
        ],
        "images": [trainer_tag]
    }
    
    # Write the config into a fresh temporary directory
    d = tempfile.mkdtemp()
    output_config_path = os.path.join(d, "build.yaml")
    with open(output_config_path, "w") as f:
        f.write(yaml.dump(build_config))

    logging.info("Generated build config: %s" % output_config_path)
    return output_config_path
        
BUILD_CONFIG_PATH = _generate_gcb_config(TRAINER_TAG, APP_ROOT, LAST_CACHE_FROM)
BUILD_CONFIG_PATH

'/tmp/tmp_dzp1ks3/build.yaml'
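
To make the two build steps easier to inspect, here is a minimal sketch that builds the same config as a plain dictionary without changing directories or writing files. `make_build_config` is a hypothetical name, the image tag passed in is a placeholder, and `json` is used here only to pretty-print the result.

```python
import json

def make_build_config(trainer_tag, cache_from):
    """Return the build steps as a plain dict, without touching the filesystem."""
    docker = "gcr.io/cloud-builders/docker"
    return {
        "steps": [
            # Pull the previous image so its layers are available as a cache.
            {"name": docker, "args": ["pull", cache_from]},
            # Build the workspace, reusing cached layers where possible.
            {"name": docker,
             "args": ["build", "--cache-from", cache_from, "-t", trainer_tag, "."]},
        ],
        "images": [trainer_tag],
    }

config = make_build_config("gcr.io/my-project/agents:example",  # placeholder tag
                           "tensorflow/tensorflow:1.4.1")
print(json.dumps(config, indent=2))
```

Keeping the config construction free of side effects makes it straightforward to check the steps before handing them to the builder.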

In [None]:
os.chdir(APP_ROOT)
!gcloud container builds submit --config {BUILD_CONFIG_PATH} .
# This is still slow because GCB doesn't cache images, has to pull multi-GB image every time

In [124]:
# Only run this cell if the build above was successful!
LAST_CACHE_FROM=TRAINER_TAG

The time to build this image is rather long on account of the time it takes to pip install pybullet and the lack of caching from previous builds. Hang tight as we're working on a much faster method to ship workspaces to speed up this process.

### Training

The objective of the training phase is to learn the parameterization of our model that confers a high level of performance on the robot arm control task (picking up blocks). Here we'll launch and monitor such a job.

#### Launching the TFJob

In local jargon a Study is simply a collection of (probably related) Experiments, which are themselves simply sets of parameters used for separate training runs. Here we'll run a study of the repeatability of training runs performed with identical parameter sets. As you can imagine, there is a lot of room for expressing and carrying out more interesting structured studies.

In [135]:
study = {"name": "replicated-kuka-demo",
         "experiments": [{"name": "kuka",
                          "image": TRAINER_TAG,
                          "env": "KukaBulletEnv-v0",
                          "num_replicas": 4}]
        }

STUDY_LOGS_ROOT = "/mnt/nfs-1/train_dirs/studies/{0}".format(study["name"])
os.makedirs(STUDY_LOGS_ROOT, exist_ok=True)

print(STUDY_LOGS_ROOT)

/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo


Now we'll use [ksonnet](https://ksonnet.io/) to run a TFJob for each experiment in our study. Each call to `ks param set` writes a parameter to the corresponding section of `app/components/params.libsonnet`. Note that parameters set in previous runs carry over to later uses of the same component.

A YAML form of the configuration for any component can be displayed using `ks show {environment e.g. default} -c {component name e.g. train}`. Jobs are submitted via `ks apply {environment} -c {component name}`.
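
As a rough sketch of how these commands fit together, the helpers below build the `ks param set` and `ks apply` argument lists from a dictionary of parameters without executing anything. The helper names `ks_param_set` and `ks_apply` are hypothetical; the command shapes are the ones used in this notebook.

```python
def ks_param_set(component, params):
    """Build one `ks param set` argument list per parameter."""
    return [["ks", "param", "set", component, key, str(value)]
            for key, value in params.items()]

def ks_apply(environment, component):
    """Build the `ks apply` argument list for a component."""
    return ["ks", "apply", environment, "-c", component]

commands = ks_param_set("train", {"env": "KukaBulletEnv-v0", "num_agents": 30})
commands.append(ks_apply("default", "train"))
for cmd in commands:
    # Each list could be passed to subprocess.check_call to actually run it.
    print(" ".join(cmd))
```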

In [136]:
import datetime
import uuid
import pprint

os.chdir(os.path.join(APP_ROOT, "app"))

print("Preparing study: %s..." % study["name"])

for experiment in study["experiments"]:
    
    print("Preparing experiment: %s" % experiment["name"])
    
    # Get and set the job container image for this experiment
    IMAGE = experiment["image"]
    !ks param set train image {IMAGE}

    # Set the gym learning environment on which to train
    ENVIRONMENT = experiment["env"]
    !ks param set train env {ENVIRONMENT}
    
    !ks param set train namespace {NAMESPACE}

    # Set the algorithm and network part to use for policy and value networks
    !ks param set train algorithm "agents.ppo.PPOAlgorithm"
    !ks param set train network "agents.scripts.networks.feed_forward_gaussian"

    # Run in training mode with 30 CPUs and 30 agents for 15M steps
    !ks param set train run_mode train
    !ks param set train num_cpu 30
    !ks param set train num_agents 30
    !ks param set train steps 15e6

    !ks param set train update_every 60
    !ks param set train eval_episodes 25

    for replica_id in range(experiment["num_replicas"]):

        # Construct a unique name for the training job based on experiment["name"]
        now=datetime.datetime.now()
        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        BASE_NAME = experiment["name"]
        TRAIN_JOB_NAME=BASE_NAME + "-" + JOB_SALT
        !ks param set train name {TRAIN_JOB_NAME}

        # Construct a log dir path for this experiment
        LOG_DIR="{0}/{1}".format(STUDY_LOGS_ROOT, TRAIN_JOB_NAME)
        !ks param set train log_dir {LOG_DIR}

        if "replicas" not in experiment:
            experiment["replicas"] = []
        experiment["replicas"].append({"log_dir": LOG_DIR})
        
        print("Preparing replica %s of %s for experiment %s" % (replica_id + 1, experiment["num_replicas"], experiment["name"]))
        !ks apply default -c train

Preparing study: replicated-kuka-demo...
Preparing experiment: kuka
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:0405-1658-39bf"' for component 'train'
INFO  Parameter 'env' successfully set to '"KukaBulletEnv-v0"' for component 'train'
INFO  Parameter 'namespace' successfully set to '"kubeflow"' for component 'train'
INFO  Parameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'train'
INFO  Parameter 'network' successfully set to '"agents.scripts.networks.feed_forward_gaussian"' for component 'train'
INFO  Parameter 'run_mode' successfully set to '"train"' for component 'train'
INFO  Parameter 'num_cpu' successfully set to '30' for component 'train'
INFO  Parameter 'num_agents' successfully set to '30' for component 'train'
INFO  Parameter 'steps' successfully set to '15e6' for component 'train'
INFO  Parameter 'update_every' successfully se

The following shows our updated study config with the log directories to which each of our experiments will be writing TensorBoard logs.

In [137]:
study

{'experiments': [{'env': 'KukaBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:0405-1658-39bf',
   'name': 'kuka',
   'num_replicas': 4,
   'replicas': [{'log_dir': '/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-5ed3'},
    {'log_dir': '/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-ae54'},
    {'log_dir': '/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-e6f1'},
    {'log_dir': '/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-545d'}]}],
 'name': 'replicated-kuka-demo'}
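
Since later steps walk this structure repeatedly, a small generator can flatten it into (experiment, log directory) pairs. This is a sketch only: `iter_log_dirs` is a hypothetical helper, and the example study is a trimmed copy of the output above.

```python
def iter_log_dirs(study):
    """Yield (experiment name, log dir) for every replica recorded in a study."""
    for experiment in study["experiments"]:
        # "replicas" is only present once jobs have been submitted.
        for replica in experiment.get("replicas", []):
            yield experiment["name"], replica["log_dir"]

example_study = {
    "name": "replicated-kuka-demo",
    "experiments": [{
        "name": "kuka",
        "replicas": [
            {"log_dir": "/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-5ed3"},
            {"log_dir": "/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-ae54"},
        ],
    }],
}

for name, log_dir in iter_log_dirs(example_study):
    print(name, log_dir)
```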

#### Monitoring training

With the following we can list the TFJobs currently running in our namespace (`NAMESPACE`) and verify that the step above created the jobs we expected.

In [138]:
!kubectl get tfjobs -n {NAMESPACE}

NAME                  AGE
kuka-0405-1707-545d   4s
kuka-0405-1707-5ed3   12s
kuka-0405-1707-ae54   10s
kuka-0405-1707-e6f1   6s


We can display the IDs and status of all pods in the namespace of our training jobs via the following:

In [148]:
!kubectl get pods -n {NAMESPACE} --show-all

NAME                                      READY     STATUS    RESTARTS   AGE
ambassador-56c8966c67-9pvbm               2/2       Running   0          36d
ambassador-56c8966c67-mvqjc               2/2       Running   0          31d
ambassador-56c8966c67-sch5s               2/2       Running   0          36d
argo-ui-5d7fbb58d4-m6lmm                  1/1       Running   0          31d
jupyter-cwbeitel                          1/1       Running   0          16d
kuka-0405-1707-545d-master-b1pj-0-26qj4   1/1       Running   0          1m
kuka-0405-1707-5ed3-master-oa84-0-td6bq   1/1       Running   0          1m
kuka-0405-1707-ae54-master-h5wu-0-j8zv2   1/1       Running   0          1m
kuka-0405-1707-e6f1-master-wg99-0-9jjz4   1/1       Running   0          1m
nfs-1-provisioner-7497d85d76-j9492        1/1       Running   0          31d
nfs-2-provisioner-59bf96c5d4-g22wx        1/1       Running   0          31d
tf-hub-0                                  1/1       Running   0          36d
tf-

The following can be used to print logs from the last TFJob submitted above:

In [149]:
TRAIN_JOB_NAME

'kuka-0405-1707-545d'

In [150]:
import subprocess
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + TRAIN_JOB_NAME,
                                      "-o", "jsonpath='{.items[*].metadata.name}'"]).decode("utf-8")
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

'kuka-0405-1707-545d-master-b1pj-0-26qj4'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Start a new run and write summaries and checkpoints to /mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-545d.
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': '/mnt/nfs-1/train_dirs/studies/replicated-kuka-demo/kuka-0405-1707-545d',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7f543588f488>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.AdamOptimizer'

Awesome! It looks like our training job started up just fine. Next we'll want to be able to look at things with TensorBoard.

##### Launching TensorBoard

If you're running this demo in `gcr.io/kubeflow/tensorflow-notebook-cpu` or any other environment with the TensorHub extension installed then you're only a few clicks away from monitoring your job with TensorBoard.

But first we need to know that this extension is configured to run the command `tensorboard --logdir /home/jovyan`, so for our logs on NFS to appear (at least symbolically) within this tree we'll need to create a symbolic link as follows:

In [None]:
!ln -s {NFS_MOUNT_PATH}/train_dirs /home/jovyan/logs
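
Re-running the `ln -s` cell fails once the link exists, so an idempotent variant in Python may be convenient. The following is a minimal sketch demonstrated in a scratch directory rather than on the real NFS mount; `ensure_symlink` is a hypothetical helper.

```python
import os
import tempfile

def ensure_symlink(target, link_path):
    """Create link_path -> target unless an identical link already exists."""
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False  # already in place, nothing to do
        os.remove(link_path)  # stale link pointing elsewhere
    elif os.path.exists(link_path):
        raise ValueError("Refusing to replace non-link path: %s" % link_path)
    os.symlink(target, link_path)
    return True

# Stand-ins for {NFS_MOUNT_PATH}/train_dirs and /home/jovyan/logs.
scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "train_dirs")
os.makedirs(target)
link = os.path.join(scratch, "logs")

print(ensure_symlink(target, link))  # first call creates the link
print(ensure_symlink(target, link))  # second call finds it and does nothing
```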

Now we're ready to open TensorBoard. This can be done from the "New" dropdown on `http://localhost:8000/user/{your-username}/tree` by selecting "Tensorboard" which will open TensorBoard in a new tab. Note that you may need to grant permission for this through your pop-up blocker if you have one.

For reference below we have a plot of the mean_score variable with respect to number of training steps for a study parameterized as above.

![](tboard_mean_score.png)

As you can see there is variability both between and within replicas but some still seem to reach stably high performance.

### Rendering

Let's take a look and see what our agent looks like when it's performing the robotic-arm-control-picking-up-blocks task. This will involve running render jobs in batch that will use parameters from the most recent checkpoint to restore a version of the model and then capture video sequences as the task is performed.

#### Initiating render jobs

Here we'll create a render job for each of the experiments in our study. This will generate multiple MP4 videos of the agent performing the task for each replica run with renders landing in ${LOG_DIR}/render/[some_unique_subdir_name].

In [102]:
import datetime
import uuid
import os

os.chdir(os.path.join(APP_ROOT, "app"))

for experiment in study["experiments"]:
    for replica in experiment["replicas"]:
        
        LOG_DIR = replica["log_dir"]
        IMAGE = experiment["image"]
        
        now=datetime.datetime.now()

        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        RENDER_JOB_NAME="render-" + JOB_SALT
        !ks param set render name {RENDER_JOB_NAME}

        !ks param set render log_dir {LOG_DIR}
        !ks param set render image {IMAGE}
        
        !ks apply default -c render

INFO  Parameter 'name' successfully set to '"render-0319-2043-cccf"' for component 'render'
INFO  Parameter 'log_dir' successfully set to '"/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1734-9cff"' for component 'render'
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:0319-1806-6614"' for component 'render'
INFO  Updating tfjobs kubeflow.render-0319-2043-cccf
INFO  Creating non-existent tfjobs kubeflow.render-0319-2043-cccf
INFO  Parameter 'name' successfully set to '"render-0319-2043-a832"' for component 'render'
INFO  Parameter 'log_dir' successfully set to '"/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1734-6554"' for component 'render'
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:0319-1806-6614"' for component 'render'
INFO  Updating tfjobs kubeflow.render-0319-2043-a832
INFO  Creating

Each render will take about 15 minutes to run. We can see whether the renders are complete by checking the third column (SUCCESSFUL) in the following:

In [57]:
!kubectl get jobs -n {NAMESPACE} --show-all

NAME                                  DESIRED   SUCCESSFUL   AGE
kuka-0319-1734-6554-master-a43s-0     1         0            14m
kuka-0319-1734-9cff-master-dbpf-0     1         0            14m
kuka-0319-1735-222e-master-raq6-0     1         0            13m
kuka-0319-1735-f24e-master-gx11-0     1         0            14m
render-0319-1748-3227-master-rncq-0   1         0            41s
render-0319-1748-7e6f-master-kv1m-0   1         0            34s
render-0319-1748-9983-master-bdrg-0   1         0            36s
render-0319-1748-d6d4-master-r8g6-0   1         0            39s
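
Rather than reading the table by eye, the check can be sketched in a few lines of Python. This assumes the column layout shown above (which may differ between kubectl versions); `unfinished_jobs` is a hypothetical helper, and the second sample row has been edited to show a completed job.

```python
def unfinished_jobs(table_text):
    """Return names of jobs whose SUCCESSFUL count is below DESIRED."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    desired_col = header.index("DESIRED")
    successful_col = header.index("SUCCESSFUL")
    pending = []
    for line in lines[1:]:
        fields = line.split()
        if int(fields[successful_col]) < int(fields[desired_col]):
            pending.append(fields[0])
    return pending

# Sample input: first row copied from the output above, second row edited.
sample = """\
NAME                                  DESIRED   SUCCESSFUL   AGE
kuka-0319-1734-6554-master-a43s-0     1         0            14m
render-0319-1748-3227-master-rncq-0   1         1            41s
"""
print(unfinished_jobs(sample))
```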


First let's get the path of the directory containing renders for an experiment of interest. As a reminder, here's the structure of our study:

In [82]:
study

{'experiments': [{'env': 'KukaBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:0319-1806-6614',
   'name': 'kuka',
   'num_replicas': 4,
   'replicas': [{'log_dir': '/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1734-9cff'},
    {'log_dir': '/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1734-6554'},
    {'log_dir': '/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1735-f24e'},
    {'log_dir': '/mnt/nfs-1/train_dirs/kubeflow-rl/studies/replicated-kuka-demo-1/kuka-0319-1735-222e'}]}],
 'name': 'replicated-kuka-demo-1'}

Let's take a look at the renders for the first replica of the first experiment. If we've run multiple renders each render job will be stored in a separate subdirectory of render/ named according to the render ID, e.g.:

In [188]:
kuka_experiment_logdir = study["experiments"][0]["replicas"][0]["log_dir"]
!ls {kuka_experiment_logdir}/render

0319-1815-77cc	0319-2043-c21d


When OpenAI Gym writes renders to disk it includes a file manifest and some basic statistics. We'll use the following functions to load and display these for all of the renders that have been performed for the log dir we have chosen.

In [186]:
import datetime
import json
import os
import pprint

def load_gym_metadata_for_render_path(path):
    """Load the render manifest and stats from a render output path.
    
    path: a path to which renders and metadata were written by an OpenAI Gym monitor call.
    
    """
    manifest = None
    stats = None
    for filename in os.listdir(path):
        file_path = os.path.join(path, filename)
        if "manifest" in filename:
            # Parse the first line of the manifest file as JSON
            with open(file_path, "r") as f:
                manifest = json.loads(f.readline())
        elif "episode_batch" in filename:
            with open(file_path, "r") as f:
                stats = json.loads(f.readline())
                # Add a human-readable form of the initial reset timestamp
                stats["readable_timestamp"] = datetime.datetime.fromtimestamp(
                    float(stats["initial_reset_timestamp"])).strftime('%Y-%m-%d %H:%M:%S')
                    
    return manifest, stats

In [187]:
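def list_renders(logdir):
    """Collect the manifest and stats for every render found under logdir/render."""
    renders = {}
    renders_root = os.path.join(logdir, "render")
    # Each immediate subdirectory of render/ holds the output of one render job
    for render_id in os.listdir(renders_root):
        render_dir = os.path.join(renders_root, render_id)
        if not os.path.isdir(render_dir):
            continue
        manifest, stats = load_gym_metadata_for_render_path(render_dir)
        renders[render_id] = {
            "manifest": manifest,
            "stats": stats
        }

    return renders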

renders = list_renders(kuka_experiment_logdir)
pprint.pprint(renders)

{'0319-1815-77cc': {'manifest': {'env_info': {'env_id': 'KukaBulletEnv-v0',
                                              'gym_version': '0.9.4'},
                                 'stats': 'openaigym.episode_batch.0.45.stats.json',
                                 'videos': [['openaigym.video.0.45.video000000.mp4',
                                             'openaigym.video.0.45.video000000.meta.json'],
                                            ['openaigym.video.0.45.video000001.mp4',
                                             'openaigym.video.0.45.video000001.meta.json'],
                                            ['openaigym.video.0.45.video000002.mp4',
                                             'openaigym.video.0.45.video000002.meta.json'],
                                            ['openaigym.video.0.45.video000003.mp4',
                                             'openaigym.video.0.45.video000003.meta.json'],
                                            ['openaigym.vide
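
To pick a file for the next step, it helps to pull just the video names out of a manifest. The sketch below assumes the manifest layout shown in the output above, where each `videos` entry pairs an MP4 file with its metadata file; `video_files` is a hypothetical helper and the example manifest is a trimmed copy of that output.

```python
def video_files(manifest):
    """Return the MP4 file names listed in a Gym monitor manifest."""
    # Each entry pairs a video file with its metadata file.
    return [video for video, _meta in manifest.get("videos", [])]

example_manifest = {
    "env_info": {"env_id": "KukaBulletEnv-v0", "gym_version": "0.9.4"},
    "stats": "openaigym.episode_batch.0.45.stats.json",
    "videos": [
        ["openaigym.video.0.45.video000000.mp4",
         "openaigym.video.0.45.video000000.meta.json"],
        ["openaigym.video.0.45.video000001.mp4",
         "openaigym.video.0.45.video000001.meta.json"],
    ],
}
print(video_files(example_manifest))
```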

#### Inspecting the result

Lastly we'll pick one of the renders above to display inline in the notebook by providing a render ID and filename (ending in mp4). This file path will then be passed to a snippet that will load and display the content of the video as an HTML widget.

In [None]:
def get_render_path(log_dir, render_id, filename):
    return os.path.join(log_dir, "render", render_id, filename)

render_path = get_render_path(kuka_experiment_logdir, "0319-2043-c21d", "openaigym.video.0.44.video000000.mp4")

In [191]:
import io
import base64
from IPython.display import HTML

# Read the video bytes, closing the file once done
with io.open(render_path, 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

### Great job! 🎉🎉🎉

That concludes this demonstration. If you have suggestions on how to make it better, please feel free to [open an issue](https://github.com/kubeflow/examples) and let us know.

If this is your first time working with these technologies you might be interested in some suggestions of good next steps. Here are some ideas:
- Try training with some other learning environments (from the ID fields [here](https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/__init__.py)) and tweet your results! E.g.
    - RacecarBulletEnv-v0
    - MinitaurBulletDuckEnv-v0
    - HalfCheetahBulletEnv-v0
- Take a shot at implementing your own gym learning environment and repeat the above.