## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow).

The task the agent will be learning to perform is to operate a Kuka Robotics arm simulated in the OpenAI Gym Bullet Physics 'KukaBulletEnv-v0' environment. Feel free to [skip to the end](http://localhost:8888/notebooks/kubeflow-rl/apps/agents_ppo/demo.ipynb#Rendering-the-model) to see what this will look like!

### Setup

We need a Google Cloud Storage (GCS) bucket to store job logs, plus a unique subdirectory of that bucket for the logs of this particular run. In the cells below we first create the GCS bucket and then generate the log directory path used in a later step.

Set the variables below to a project and bucket suitable for your use.

In [4]:
# GCP project to use
PROJECT="kubeflow-rl"

# Bucket to use
BUCKET=PROJECT+"-kf"

# K8s cluster to use
CLUSTER="kubeflow-5fc52116"
ZONE="us-east1-d"
NAMESPACE="rl"

# Find the path to the workspace ml-app dir
import os
def find_workspace_root(cwd=None):
    """Walk up from cwd until a directory containing app.yaml is found."""
    cwd = os.path.abspath(cwd or os.getcwd())
    while True:
        if "app.yaml" in os.listdir(cwd):
            return cwd
        parent = os.path.dirname(cwd)
        if parent == cwd:
            # Reached the filesystem root without finding app.yaml
            raise RuntimeError("No app.yaml found in any parent directory")
        cwd = parent
APP_ROOT=find_workspace_root()

# Needed for launching tensorboard
SECRET_NAME = "gcp-credentials"
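The unique per-run log directory described above can be built from a timestamp plus a short random suffix, which is the same scheme the training cells use later. The following is a minimal sketch; the helper names `make_job_name` and `make_log_dir` are hypothetical and not part of the original notebook, and the fixed date and suffix are only there to make the example reproducible.

```python
import datetime
import uuid


def make_job_name(base_name, now=None, salt=None):
    """Return base_name plus a MMDD-HHMM timestamp and a 4-character suffix."""
    now = now or datetime.datetime.now()
    salt = salt or uuid.uuid4().hex[:4]
    return "{0}-{1}-{2}".format(base_name, now.strftime("%m%d-%H%M"), salt)


def make_log_dir(bucket, study_name, job_name):
    """Return the GCS path under which a single job writes its logs."""
    return "gs://{0}/studies/{1}/{2}".format(bucket, study_name, job_name)


# Fixed inputs (assumed values) so the result is deterministic.
example = make_log_dir(
    "kubeflow-rl-kf",
    "replicated-env-comparison",
    make_job_name("kuka", datetime.datetime(2018, 2, 8, 10, 10), "4f3a"),
)
print(example)
```

Calling `make_job_name("kuka")` without the optional arguments yields a fresh name on every call, so repeated runs do not overwrite each other's logs.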

**Attention:** You will need GCP credentials to access the cluster and GCP resources.

If you're running this on your local machine you can authenticate for the GCP project you specified above in the usual way (i.e. `gcloud auth login` followed by `gcloud config set project <your-project-name>`); in this case, skip the following step.

If you're running this on JupyterLab you can do the following to provide the right credentials:

- Create a service account with the appropriate roles and download the private key
- Use JupyterLab to upload the service account key file to your pod
- Set the path to your service account key file in the cell below and then execute it to activate the service account
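Before activating the service account it can help to confirm that the uploaded file really is a service account key. The sketch below checks a few fields that GCP service account JSON keys contain; the helper name `check_service_account_key` and the choice of which fields to check are assumptions made for illustration, not part of the original notebook.

```python
import json

# Fields checked by this sketch (a subset of what a key file contains).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path) as f:
        key = json.load(f)
    problems = ["missing field: " + name
                for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

For example, `check_service_account_key(KEY_FILE)` could be run just before the `gcloud auth activate-service-account` cell, stopping early if the returned list is not empty.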

In [None]:
KEY_FILE="/Users/cb/Downloads/kubeflow-rl-ec0f4f646339.json"
!gcloud auth activate-service-account --key-file={KEY_FILE}

In [None]:
!gsutil mb -p {PROJECT} gs://{BUCKET}

In [None]:
!gcloud container clusters --project={PROJECT} --zone={ZONE} get-credentials {CLUSTER}

In [None]:
!kubectl create namespace {NAMESPACE}

Download and install ksonnet if needed

In [None]:
!if ! [[ $(which ks) ]]; then mkdir -p ${HOME}/bin && curl -L -o ${HOME}/bin/ks "https://github.com/ksonnet/ksonnet/releases/download/v0.8.0/ks-linux-amd64" && chmod a+rx ${HOME}/bin/ks; fi

If you are running on GCP (or possibly another cloud), you will probably need to create a Kubernetes secret holding the credentials your job will use.

In [None]:
SECRET_FILE_NAME="secret.json"
!kubectl create -n {NAMESPACE} secret generic {SECRET_NAME} --from-file={SECRET_FILE_NAME}={KEY_FILE}

### Training

The objective of the training phase is to learn the parameterization of our model that confers a high level of performance on the provided task. Here we'll launch and monitor a job.

#### Launching the TFJob

We'll use [ksonnet](https://ksonnet.io/) to parameterize and apply a TFJob configuration (i.e. run a job). Here you can change the image to be a custom job image, such as one built and deployed with build.sh, or use the one provided here if you only want to change parameters. Below we'll display the templated job YAML for reference.
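To make the parameterization step concrete, the sketch below turns a dictionary of settings into the `ks param set <component> <name> <value>` commands that this notebook issues one at a time. The helper `ks_param_commands` is hypothetical; it only builds the argument lists and does not run anything.

```python
def ks_param_commands(component, params):
    """Build one `ks param set` argument list per parameter, sorted by name."""
    return [["ks", "param", "set", component, name, str(value)]
            for name, value in sorted(params.items())]


commands = ks_param_commands(
    "agents", {"env": "KukaBulletEnv-v0", "num_agents": 30})
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.check_call` from within the ksonnet app directory, which makes failures raise an exception instead of being lost in cell output.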

In [None]:
# Check your cluster and see if that matches one of the existing ksonnet environments
# You want the kubernetes master server to be the same as the server listed for the ks environment
!kubectl cluster-info
!ks env list

In [64]:
image = "gcr.io/kubeflow-rl/agents:tf1.4.1-0208-0959-a276"

study = {"name": "replicated-env-comparison",
         "experiments": [{"name": "kuka",
                          "image": image,
                          "env": "KukaBulletEnv-v0",
                          "num_replicas": 4},
                         {"name": "pendulum",
                          "image": image,
                          "env": "InvertedPendulumBulletEnv-v0",
                          "num_replicas": 4},
#                          {"name": "cartpole",
#                           "image": image,
#                           "env": "CartPoleBulletEnv-v0",
#                           "num_replicas": 4},
#                          {"name": "racecar",
#                           "image": image,
#                           "env": "RacecarBulletEnv-v0",
#                           "num_replicas": 4}
                        ]
        }


STUDY_LOGS_ROOT = "gs://{0}/studies/{1}".format(BUCKET, study["name"])

print(STUDY_LOGS_ROOT)

gs://kubeflow-rl-kf/studies/replicated-env-comparison
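Because every replica of every experiment becomes its own training job, it is worth checking how many jobs a study will launch before applying it. Here is a minimal sketch under the assumption that a study has the same shape as the dictionary above; the helper `planned_jobs` and the trimmed-down `sample_study` are illustrative only.

```python
def planned_jobs(study):
    """Return (experiment name, replica index) pairs, one per job to launch."""
    return [(experiment["name"], replica_id)
            for experiment in study["experiments"]
            for replica_id in range(experiment["num_replicas"])]


# A reduced copy of the study structure, for illustration.
sample_study = {
    "name": "replicated-env-comparison",
    "experiments": [
        {"name": "kuka", "env": "KukaBulletEnv-v0", "num_replicas": 4},
        {"name": "pendulum", "env": "InvertedPendulumBulletEnv-v0",
         "num_replicas": 4},
    ],
}

jobs = planned_jobs(sample_study)
print("%d jobs planned" % len(jobs))
```

With 30 CPUs requested per job in the training cell, multiplying the job count by the per-job request gives a rough estimate of the cluster capacity the study needs.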


In [65]:
import datetime
import uuid
import os
import pprint

os.chdir(APP_ROOT)

print("Preparing study: %s..." % study["name"])

for experiment in study["experiments"]:
    
    print("Preparing experiment: %s" % experiment["name"])
    
    # Get and set the job container image for this experiment
    IMAGE = experiment["image"]
    !ks param set agents image {IMAGE}

    # Set the gym learning environment on which to train
    ENVIRONMENT = experiment["env"]
    !ks param set agents env {ENVIRONMENT}

    # Set the algorithm and network part to use for policy and value networks
    !ks param set agents algorithm "agents.ppo.PPOAlgorithm"
    !ks param set agents network "agents.scripts.networks.feed_forward_gaussian"

    # Run in training mode with 30 CPUs and 30 agents for 15M steps
    !ks param set agents run_mode train
    !ks param set agents num_cpu 30
    !ks param set agents num_agents 30
    !ks param set agents steps 15e6

    !ks param set agents update_every 60
    !ks param set agents eval_episodes 25

    for replica_id in range(experiment["num_replicas"]):

        # Construct a unique name for the training job based on experiment["name"]
        now=datetime.datetime.now()
        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        BASE_NAME = experiment["name"]
        TRAIN_JOB_NAME=BASE_NAME + "-" + JOB_SALT
        !ks param set agents name {TRAIN_JOB_NAME}

        # Construct a log dir path for this experiment
        LOG_DIR="{0}/{1}".format(STUDY_LOGS_ROOT, TRAIN_JOB_NAME)
        !ks param set agents log_dir {LOG_DIR}

        if "replicas" not in experiment:
            experiment["replicas"] = []
        experiment["replicas"].append({"log_dir": LOG_DIR})
        
        print("Preparing replica %s of %s for experiment %s" % (replica_id + 1, experiment["num_replicas"], experiment["name"]))
        !ks apply gke -c agents


Preparing study: replicated-env-comparison...
Preparing experiment: kuka
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:tf1.4.1-0208-0959-a276"' for component 'agents'
INFO  Parameter 'env' successfully set to '"KukaBulletEnv-v0"' for component 'agents'
INFO  Parameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents'
INFO  Parameter 'network' successfully set to '"agents.scripts.networks.feed_forward_gaussian"' for component 'agents'
INFO  Parameter 'run_mode' successfully set to '"train"' for component 'agents'
INFO  Parameter 'num_cpu' successfully set to '30' for component 'agents'
INFO  Parameter 'num_agents' successfully set to '30' for component 'agents'
INFO  Parameter 'steps' successfully set to '15e6' for component 'agents'
INFO  Parameter 'update_every' successfully set to '60' for component 'agents'
INFO  Parameter 'eval_episode

In [66]:
study

{'experiments': [{'env': 'KukaBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:tf1.4.1-0208-0959-a276',
   'name': 'kuka',
   'num_replicas': 4,
   'replicas': [{'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-4f3a'},
    {'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-10df'},
    {'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-7964'},
    {'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-922d'}]},
  {'env': 'InvertedPendulumBulletEnv-v0',
   'image': 'gcr.io/kubeflow-rl/agents:tf1.4.1-0208-0959-a276',
   'name': 'pendulum',
   'num_replicas': 4,
   'replicas': [{'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/pendulum-0208-1010-5e21'},
    {'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/pendulum-0208-1010-9de4'},
    {'log_dir': 'gs://kubeflow-rl-kf/studies/replicated-env-comparison/pendulum-0208-1010-7b7f'},
   

Now we can list the TFJobs to confirm that the jobs have been created.

In [29]:
!kubectl get tfjobs -n {NAMESPACE}

No resources found.


#### Monitoring training

The IDs, status, and other metadata of pods involved in the training job can be displayed using the following:

In [30]:
!kubectl get pods -n {NAMESPACE} --show-all

NAME                                         READY     STATUS        RESTARTS   AGE
racecar-0207-1613-4a0f-master-82d4-0-kpn69   0/1       Terminating   0          6s
racecar-0207-1613-d464-master-eeyb-0-nrtsn   0/1       Terminating   0          6s
tboard-0206-1115-6e4a-tb-58dd946cc6-8dcvk    1/1       Running       0          1d
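When many jobs are running, a per-status summary is easier to read than the raw table. The sketch below parses the text output shown above into dictionaries and counts pods by status. The helper `parse_pod_table` is hypothetical and assumes that no column value contains whitespace, which holds for this output but not for every `kubectl` column.

```python
from collections import Counter


def parse_pod_table(text):
    """Parse whitespace-separated `kubectl get pods` output into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample copied from the output above.
sample = """
NAME                                         READY     STATUS        RESTARTS   AGE
racecar-0207-1613-4a0f-master-82d4-0-kpn69   0/1       Terminating   0          6s
racecar-0207-1613-d464-master-eeyb-0-nrtsn   0/1       Terminating   0          6s
tboard-0206-1115-6e4a-tb-58dd946cc6-8dcvk    1/1       Running       0          1d
"""

pods = parse_pod_table(sample)
status_counts = Counter(pod["STATUS"] for pod in pods)
print(dict(status_counts))
```

For anything beyond a quick look, requesting structured output with `kubectl get pods -o json` and parsing it with the `json` module is more robust than splitting table columns.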


In [46]:
TRAIN_JOB_NAME

'kuka-mts-tiny-0206-1404-ae16'

Obtain the name of the master pod and print its logs.

In [None]:
import subprocess
# Look up the pod(s) labeled with this job's name. No shell is involved here,
# so the jsonpath expression must not be wrapped in quotes.
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + TRAIN_JOB_NAME,
                                      "-o", "jsonpath={.items[*].metadata.name}"]).decode("utf-8").strip()
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

#### Launching tensorboard

In [74]:
HPARAM_SET="tboard"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TBOARD_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
SECRET_NAME="gcp-credentials"
SECRET_FILE_NAME="secret.json"
NAMESPACE="rl"

!ks param set tensorboard name {TBOARD_JOB_NAME}
!ks param set tensorboard namespace {NAMESPACE}
!ks param set tensorboard log_dir {STUDY_LOGS_ROOT}
!ks param set tensorboard secret {SECRET_NAME}
!ks param set tensorboard secret_file_name {SECRET_FILE_NAME}
!ks show default -c tensorboard

!ks apply gke -c tensorboard

INFO  Parameter 'name' successfully set to '"tboard-0208-1157-67b3"' for component 'tensorboard'
INFO  Parameter 'namespace' successfully set to '"rl"' for component 'tensorboard'
INFO  Parameter 'log_dir' successfully set to '"gs://kubeflow-rl-kf/studies/replicated-env-comparison"' for component 'tensorboard'
INFO  Parameter 'secret' successfully set to '"gcp-credentials"' for component 'tensorboard'
INFO  Parameter 'secret_file_name' successfully set to '"secret.json"' for component 'tensorboard'
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: tboard-0208-1157-67b3-tb
  namespace: rl
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorboard
        tb-job: tboard-0208-1157-67b3
      name: tboard-0208-1157-67b3
      namespace: rl
    spec:
      containers:
      - command:
        - /usr/local/bin/tensorboard
        - --logdir=gs://kubeflow-rl-kf/studies/replicated-env-comparison
        - --po

### Connecting to Tensorboard

To connect to TensorBoard, start `kubectl proxy` and then open the URL printed by the next cell.

In [75]:
PROXY_PORT=8001
# url=("http://127.0.0.1:{proxy_port}/api/v1/proxy/namespaces/{namespace}/services/{service_name}:80/".format(
#     proxy_port=PROXY_PORT, namespace=NAMESPACE, service_name=TRAIN_JOB_NAME + "-tb"))
url=("http://127.0.0.1:{proxy_port}/api/v1/proxy/namespaces/{namespace}/services/{service_name}:80/".format(
    proxy_port=PROXY_PORT, namespace=NAMESPACE, service_name=TBOARD_JOB_NAME + "-tb"))
print(url)

http://127.0.0.1:8001/api/v1/proxy/namespaces/rl/services/tboard-0208-1157-67b3-tb:80/
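The URL above only responds while `kubectl proxy` is running on the chosen port. As a minimal sketch, the hypothetical helper below checks whether anything is accepting TCP connections on that port; it does not verify that the listener is actually `kubectl proxy`.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=8001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if not proxy_is_listening():
    print("Nothing is listening on port 8001; start `kubectl proxy` first.")
```

The default port of 8001 matches `PROXY_PORT` in the cell above; pass a different `port` if the proxy was started with another one.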


In [None]:
# TODO: Include a screen capture of what tboard looks like for this run

### Deleting jobs

In [None]:
!kubectl delete tfjobs -n {NAMESPACE} {TRAIN_JOB_NAME}
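The cell above deletes only the job named by the most recent value of `TRAIN_JOB_NAME`. Since each replica's log directory ends with its job name, the names of all jobs in a study can be recovered from the `study` dictionary. The following is a sketch under that assumption; `job_names` is a hypothetical helper and `sample_study` is a shortened copy of the structure shown earlier.

```python
import posixpath


def job_names(study):
    """Return the job name (last path component of log_dir) of every replica."""
    names = []
    for experiment in study["experiments"]:
        for replica in experiment.get("replicas", []):
            names.append(posixpath.basename(replica["log_dir"].rstrip("/")))
    return names


sample_study = {"experiments": [
    {"name": "kuka", "replicas": [
        {"log_dir": "gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-4f3a"},
        {"log_dir": "gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0208-1010-10df"},
    ]},
    {"name": "pendulum"},  # no replicas launched yet
]}

names = job_names(sample_study)
print(names)
```

Looping over `job_names(study)` and running the delete command once per name would remove every job in the study rather than just the last one.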

### Rendering the model

#### Initiating render jobs

Here we'll create a render job for each of the experiments in our study.

In [53]:
import datetime
import uuid
import os

os.chdir(APP_ROOT)

for experiment in study["experiments"]:

    for replica in experiment["replicas"]:

        LOG_DIR = replica["log_dir"]
        IMAGE = experiment["image"]
        
        now=datetime.datetime.now()

        JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
        RENDER_JOB_NAME="render-" + JOB_SALT
        !ks param set agents_render name {RENDER_JOB_NAME}

        !ks param set agents_render log_dir {LOG_DIR}
        !ks param set agents_render image {IMAGE}
        
        # Apply the render component, i.e. the one parameterized above
        !ks apply gke -c agents_render

INFO  Parameter 'name' successfully set to '"render-0207-1634-7523"' for component 'agents_render'
INFO  Parameter 'log_dir' successfully set to '"gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0207-1626-2caf"' for component 'agents_render'
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:tf1.4.1-0207-1615-e05a"' for component 'agents_render'
---
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  name: render-0207-1634-7523
  namespace: rl
spec:
  replicaSpecs:
  - replicas: 1
    template:
      spec:
        containers:
        - args:
          - --run_mode=render
          - --logdir=gs://kubeflow-rl-kf/studies/replicated-env-comparison/kuka-0207-1626-2caf
          - --num_agents=1
          image: gcr.io/kubeflow-rl/agents:tf1.4.1-0207-1615-e05a
          name: tensorflow
          resources:
            limits:
              cpu: 4
            requests:
              cpu: 4
        restartPolicy: OnFailure
  

INFO  Parameter 'name' successfully set to '"render-0207-1635-4dff"' for component 'agents_render'
INFO  Parameter 'log_dir' successfully set to '"gs://kubeflow-rl-kf/studies/replicated-env-comparison/cartpole-0207-1626-98d4"' for component 'agents_render'
INFO  Parameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:tf1.4.1-0207-1615-e05a"' for component 'agents_render'
---
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  name: render-0207-1635-4dff
  namespace: rl
spec:
  replicaSpecs:
  - replicas: 1
    template:
      spec:
        containers:
        - args:
          - --run_mode=render
          - --logdir=gs://kubeflow-rl-kf/studies/replicated-env-comparison/cartpole-0207-1626-98d4
          - --num_agents=1
          image: gcr.io/kubeflow-rl/agents:tf1.4.1-0207-1615-e05a
          name: tensorflow
          resources:
            limits:
              cpu: 4
            requests:
              cpu: 4
        restartPolicy: OnFa

In [53]:
%%bash
kubectl get pods -n rl --show-all

NAME                                               READY     STATUS             RESTARTS   AGE
kuka-mts-tiny-0206-1404-ae16-master-2vgb-0-4nv4g   0/1       CrashLoopBackOff   3          2m
kuka-mts-tiny-0206-1406-29a2-master-tx86-0-2k9g6   0/1       Error              1          25s
tboard-0206-1115-6e4a-tb-58dd946cc6-8dcvk          1/1       Running            0          2h


Now let's take a look at one of those renders:

In [None]:
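kuka_experiment = study["experiments"][0]
# Use the log dir of the first kuka replica rather than whichever LOG_DIR was set last
LOG_DIR = kuka_experiment["replicas"][0]["log_dir"]
!gsutil ls {LOG_DIR}/render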

In [8]:
!mkdir -p /tmp/agents-render
RENDER_DIR="gs://kubeflow-rl-kf/jobs/agents/pybullet-kuka-0205-1949-e9eb/render/0206-1637-f3a1"
!gsutil cp `gsutil ls {RENDER_DIR} | grep mp4 | head -n4 | tail -n1` /tmp/agents-render/render.mp4

Copying gs://kubeflow-rl-kf/jobs/agents/pybullet-kuka-0205-1949-e9eb/render/0206-1637-f3a1/openaigym.video.0.44.video000003.mp4...
- [1 files][223.9 KiB/223.9 KiB]                                                
Operation completed over 1 objects/223.9 KiB.                                    
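The shell pipeline above (`grep mp4 | head -n4 | tail -n1`) selects the fourth video in the listing, or the last one if there are fewer than four. The same selection can be sketched in Python as follows; `pick_video` is a hypothetical helper and the listing is made-up sample data in the style of the render directory, not real output.

```python
def pick_video(listing, n=4):
    """Return the n-th line containing 'mp4', or the last such line if fewer."""
    mp4s = [line.strip() for line in listing.splitlines() if "mp4" in line]
    if not mp4s:
        return None
    return mp4s[:n][-1]


# Made-up listing for illustration; real names come from `gsutil ls`.
sample_listing = """
gs://example-bucket/render/video000000.meta.json
gs://example-bucket/render/video000000.mp4
gs://example-bucket/render/video000001.mp4
gs://example-bucket/render/video000002.mp4
gs://example-bucket/render/video000003.mp4
gs://example-bucket/render/video000004.mp4
"""

chosen = pick_video(sample_listing)
print(chosen)
```

Choosing a later episode rather than the first is a judgment call; changing `n` picks a different one.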


#### Inspecting the result

When the job is complete there will be a subdirectory of the log dir named "render" with a number of short videos of episodes of the agent performing the grasping task. Here's an example of what one of those looks like in a well-trained model.

In [9]:
import io
import base64
from IPython.display import HTML

mp4_path = '/tmp/agents-render/render.mp4'

# Read the video in binary mode and close the file afterwards
with io.open(mp4_path, 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

### Great job! 🎉🎉🎉

If this is your first time working with these technologies you might be interested in some suggestions of good next steps. Here are some ideas:
- Try training with some other learning environments (from the ID fields [here](https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/__init__.py)) and tweet your results! E.g.
    - RacecarBulletEnv-v0
    - MinitaurBulletDuckEnv-v0
    - HalfCheetahBulletEnv-v0
- Take a shot at implementing your own gym learning environment and repeat the above.