## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow).

The task the agent will learn is to control a simulated KUKA robotic arm so that it grasps a block, using the `KukaBulletEnv-v0` environment (built on the Bullet physics engine via PyBullet and exposed through the OpenAI Gym interface). Feel free to [skip to the end](#Rendering-the-model) to see what this will look like!

### Setup

We need a Google Cloud Storage (GCS) bucket to store job logs, plus a unique subdirectory of that bucket for the logs of this particular run. The cells below first create the bucket with `gsutil mb`; the log directory path (`LOG_DIR`) is generated later, when the training job is launched.

Set the variables below to a project, bucket, cluster, zone, and namespace suitable for your use.
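Bucket names are global and must follow the Cloud Storage naming rules, so a bad `BUCKET` value otherwise only shows up as a `gsutil mb` failure. The hypothetical helpers below (not part of this project) check a simplified subset of those rules, assuming the name contains no dots (dotted names have extra rules), and build a per-run log path in the same `gs://<bucket>/jobs/<job-name>` layout that `LOG_DIR` uses in the training step.

```python
import re

# Simplified subset of the Cloud Storage naming rules (assumption: no dots in
# the name; dotted names and other reserved words have extra rules).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def check_bucket_name(name):
    """Return a list of problems with `name`; an empty list means it passed."""
    problems = []
    if not _BUCKET_RE.fullmatch(name):
        problems.append("use 3-63 characters from a-z, 0-9, '-' and '_', "
                        "starting and ending with a letter or digit")
    if name.startswith("goog") or "google" in name:
        problems.append("must not start with 'goog' or contain 'google'")
    return problems


def make_log_dir(bucket, job_name):
    """Per-run log directory, in the same layout as LOG_DIR in the training step."""
    return "gs://{0}/jobs/{1}".format(bucket, job_name)


print(check_bucket_name("kubeflow-rl-kf"))  # prints [] (the name passes)
print(make_log_dir("kubeflow-rl-kf", "pybullet-kuka-0202-1053-3a98"))
```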

In [24]:
```python
import os

# GCP project to use
PROJECT = "kubeflow-rl"

# Bucket to use
BUCKET = PROJECT + "-kf"

# K8s cluster to use
CLUSTER = "kubeflow-5fc52116"
ZONE = "us-east1-d"
NAMESPACE = "rl"


def find_workspace_root(cwd=None):
    """Walk up from cwd until a directory containing a WORKSPACE file is found."""
    start = os.path.abspath(cwd or os.getcwd())
    cwd = start
    while True:
        if "WORKSPACE" in os.listdir(cwd):
            return cwd
        parent = os.path.dirname(cwd)
        if parent == cwd:  # reached the filesystem root without finding it
            raise RuntimeError("No WORKSPACE file found above " + start)
        cwd = parent


# Find the path to the workspace ml-app dir
ML_APP_DIR = os.path.join(find_workspace_root(), "ml-app")

# Name of the Kubernetes secret holding the GCP credentials (also needed for launching TensorBoard)
SECRET_NAME = "gcp-credentials"
```

**Attention:** You will need GCP credentials to access the cluster and GCP resources.

If you're running this on your local machine, you can authenticate for the GCP project specified above in the usual way (i.e. `gcloud auth login` followed by `gcloud config set project <your-project-name>`); in this case, skip the following step.

If you're running this in JupyterLab (in a pod on your cluster), you can provide the right credentials as follows:

- Create a service account with the appropriate roles and download its private key (JSON). For example, it needs `storage.buckets.create` to create the bucket and `container.clusters.get` to fetch cluster credentials; the 403 errors recorded below show what happens without them
- Use JupyterLab to upload the service account key file to your pod
- Set the path to the key file in the cell below, then run it to activate the service account
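Before activating the key, it can help to confirm that the file really is a service-account key and to see which account it belongs to. This is a minimal sketch that assumes the standard JSON key layout downloaded from the GCP console (a `"type": "service_account"` entry and a `client_email` field); the helper name is hypothetical.

```python
import json


def service_account_email(key_file):
    """Return the client_email of a service-account JSON key file.

    Assumes the standard key layout from the GCP console; raises ValueError
    if the file is not a service-account key.
    """
    with open(key_file) as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise ValueError("{} is not a service-account key".format(key_file))
    return key["client_email"]

# e.g. print(service_account_email(KEY_FILE)) once KEY_FILE is set below
```

The email it returns should match the account that `gcloud auth activate-service-account` reports.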

In [18]:
```python
# Path to your service account key file
KEY_FILE = "/Users/cb/Downloads/kubeflow-rl-ec0f4f646339.json"
!gcloud auth activate-service-account --key-file={KEY_FILE}
```

```
Activated service account credentials for: [cwbeitel-kubeflow-rl@kubeflow-rl.iam.gserviceaccount.com]
```


In [19]:
```python
!gsutil mb -p {PROJECT} gs://{BUCKET}
```

```
Creating gs://kubeflow-rl-kf/...
AccessDeniedException: 403 cwbeitel-kubeflow-rl@kubeflow-rl.iam.gserviceaccount.com does not have storage.buckets.create access to project 991277910492.
```


In [20]:
```python
!gcloud container clusters --project={PROJECT} --zone={ZONE} get-credentials {CLUSTER}
```

```
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission for "projects/kubeflow-rl/zones/us-east1-d/clusters/kubeflow-5fc52116".
```


In [3]:
```python
!kubectl create namespace {NAMESPACE}
```

```
Error from server (AlreadyExists): namespaces "rl" already exists
```


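Download and install ksonnet if it isn't already installed. Note that `${HOME}/bin` must be on your `PATH` for the `ks` commands below to find it.

In [11]:
```python
# `command -v` is POSIX, so this also works when /bin/sh is not bash
!if ! command -v ks >/dev/null 2>&1; then mkdir -p ${HOME}/bin && curl -L -o ${HOME}/bin/ks "https://github.com/ksonnet/ksonnet/releases/download/v0.8.0/ks-linux-amd64" && chmod a+rx ${HOME}/bin/ks; fi
```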
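The `!ks ...` cells only work if the `ks` binary is on the notebook kernel's `PATH`, and the install command above puts it in `${HOME}/bin`, which may not be. The sketch below uses a hypothetical helper (not part of this project) to locate `ks` and, if it is only found in `~/bin`, appends that directory to `PATH` for this kernel; shell commands started with `!` inherit `os.environ`.

```python
import os
import shutil


def find_ks(extra_dirs=("~/bin",)):
    """Return the path of the `ks` binary, or None if it cannot be found.

    Checks PATH first, then `extra_dirs` (by default ~/bin, where the install
    command above puts it).
    """
    found = shutil.which("ks")
    if found:
        return found
    search = os.pathsep.join(os.path.expanduser(d) for d in extra_dirs)
    return shutil.which("ks", path=search)


ks_path = find_ks()
if ks_path and shutil.which("ks") is None:
    # Found in ~/bin but not on PATH: add its directory so `!ks ...` cells work
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + os.path.dirname(ks_path)
print(ks_path)
```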

If you are running on GCP (or possibly another cloud), you will probably need to store your service account key in a Kubernetes secret so that your training job can use it:

In [57]:
```python
SECRET_FILE_NAME = "secret.json"
!kubectl create -n {NAMESPACE} secret generic {SECRET_NAME} --from-file={SECRET_FILE_NAME}={KEY_FILE}
```

```
secret "gcp-credentials" created
```
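`kubectl create secret generic --from-file=KEY=FILE` stores the file's contents base64-encoded under the secret's `data` field, keyed by the name before the `=`; when the secret is mounted as a volume, that key (`secret.json` here) becomes the file name. As a sketch, with a hypothetical helper name, a roughly equivalent Secret manifest can be built in Python to see exactly what the secret contains or to apply it with `kubectl apply -f`:

```python
import base64


def secret_manifest(name, namespace, key_file, file_key="secret.json"):
    """Build a Secret roughly equivalent to
    `kubectl create secret generic NAME -n NAMESPACE --from-file=FILE_KEY=KEY_FILE`.
    """
    with open(key_file, "rb") as f:
        # Secret `data` values are base64-encoded file contents
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",  # the type `kubectl create secret generic` uses
        "metadata": {"name": name, "namespace": namespace},
        "data": {file_key: encoded},
    }

# e.g. secret_manifest(SECRET_NAME, NAMESPACE, KEY_FILE, SECRET_FILE_NAME)
```

Since the manifest embeds the private key, treat it as a secret and do not commit it.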


### Training

The goal of the training phase is to learn model parameters that perform well on the task. Here we'll launch a training job and then monitor it.

#### Launching the TFJob

We'll use [ksonnet](https://ksonnet.io/) to parameterize and apply a TFJob configuration (i.e. run a job). You can set `IMAGE` to a custom job image, such as one built and deployed with `build.sh`, or keep the one provided here if you only want to change parameters. The `ks show` command below prints the templated job YAML for reference.

In [4]:
```python
# Check your cluster and see if it matches one of the existing ksonnet environments.
# The Kubernetes master server should be the same as the server listed for the ks environment.
!kubectl cluster-info
!ks env list
```

```
Kubernetes master is running at https://35.185.119.177
GLBCDefaultBackend is running at https://35.185.119.177/api/v1/namespaces/kube-system/services/default-http-backend/proxy
Heapster is running at https://35.185.119.177/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://35.185.119.177/api/v1/namespaces/kube-system/services/kube-dns/proxy
kubernetes-dashboard is running at https://35.185.119.177/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
ERROR lstat /Users/cb/environments: no such file or directory
```

`ks env list` has to be run from inside the ksonnet app directory; the `lstat .../environments` error above means this cell ran before the notebook changed into `ML_APP_DIR` (the next cell does that).


In [6]:
```python
import datetime
import uuid
import os

# ks commands must be run from inside the ksonnet app directory
os.chdir(ML_APP_DIR)

# Prefix for the job name
HPARAM_SET = "pybullet-kuka"

now = datetime.datetime.now()

# Unique suffix: timestamp plus 4 random hex characters
JOB_SALT = now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TRAIN_JOB_NAME = HPARAM_SET + "-" + JOB_SALT
LOG_DIR = "gs://{0}/jobs/{1}".format(BUCKET, TRAIN_JOB_NAME)

IMAGE = "gcr.io/kubeflow-rl/agents:agents-0202-1020-01ec"

# ksonnet environment to show and apply the job in (see `ks env list`)
KS_ENV = "gke"

!ks param set agents env "KukaBulletEnv-v0"

!ks param set agents run_mode train
!ks param set agents gcp_project {PROJECT}
!ks param set agents num_cpu 31
!ks param set agents num_agents 30
!ks param set agents sync_replicas False
!ks param set agents steps 4e7
!ks param set agents update_every 30
!ks param set agents max_length 1000
!ks param set agents eval_episodes 25

# Trigger an async render job every 10 minutes
!ks param set agents render_secs 600

!ks param set agents algorithm "agents.algorithms.ppo.ppo.PPO"
!ks param set agents network "agents.scripts.networks.feed_forward_gaussian"

!ks param set agents job_tag {JOB_SALT}
!ks param set agents logdir {LOG_DIR}
!ks param set agents name {TRAIN_JOB_NAME}
!ks param set agents image {IMAGE}

# Show the templated job YAML for the same environment we apply to
!ks show {KS_ENV} -c agents

!ks apply {KS_ENV} -c agents
```

```
INFO  Parameter 'env' successfully set to '"KukaBulletEnv-v0"' for component 'agents'
INFO  Parameter 'run_mode' successfully set to '"train"' for component 'agents'
INFO  Parameter 'gcp_project' successfully set to '"kubeflow-rl"' for component 'agents'
INFO  Parameter 'num_cpu' successfully set to '31' for component 'agents'
INFO  Parameter 'num_agents' successfully set to '30' for component 'agents'
INFO  Parameter 'sync_replicas' successfully set to '"False"' for component 'agents'
INFO  Parameter 'steps' successfully set to '4e7' for component 'agents'
INFO  Parameter 'update_every' successfully set to '30' for component 'agents'
INFO  Parameter 'max_length' successfully set to '1000' for component 'agents'
INFO  Parameter 'eval_episodes' successfully set to '25' for component 'agents'
INFO  Parameter 'render_secs' successfully set to '600' for component 'agents'
INFO  Param
```

Now we can fetch the TFJob we just created, as YAML, to confirm that it exists:

In [7]:
```python
!kubectl get tfjobs -n {NAMESPACE} -o yaml {TRAIN_JOB_NAME}
```

```yaml
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  clusterName: ""
  creationTimestamp: 2018-02-02T18:53:36Z
  generation: 0
  name: pybullet-kuka-0202-1053-3a98
  namespace: rl
  resourceVersion: "4043397"
  selfLink: /apis/tensorflow.org/v1alpha1/namespaces/rl/tfjobs/pybullet-kuka-0202-1053-3a98
  uid: 5ff5fc69-084a-11e8-b604-42010af00218
spec:
  RuntimeId: zu80
  replicaSpecs:
  - IsDefaultPS: false
    replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - args:
          - --run_mode=train
          - --logdir=gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98
          - --hparam_set_id=pybullet_kuka_ff
          - --run_base_tag=0e90193e
          - --sync_replicas=False
          - --num_gpus=0
          - --algorithm=agents.algorithms.ppo.ppo.PPO
          - --num_agents=30
          - --eval_episodes=25
          - --env=KukaBulletEnv-v0
          - --max_length=1000
```
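To check that the parameters actually reached the container, the `--key=value` arguments in the TfJob spec can be turned into a dict. This sketch assumes the v1alpha1 layout shown above (`spec.replicaSpecs[].template.spec.containers[].args`); both helper names are hypothetical.

```python
def parse_flags(args):
    """Turn ["--key=value", "--flag", ...] into {"key": "value", "flag": True}."""
    flags = {}
    for arg in args:
        if not arg.startswith("--"):
            continue  # ignore positional arguments
        key, sep, value = arg[2:].partition("=")  # split on the first "=" only
        flags[key] = value if sep else True
    return flags


def tfjob_flags(tfjob, replica=0, container=0):
    """Flags of one container in a v1alpha1 TfJob dict (layout as in the YAML above)."""
    pod_spec = tfjob["spec"]["replicaSpecs"][replica]["template"]["spec"]
    return parse_flags(pod_spec["containers"][container].get("args", []))

# e.g. with tfjob = json.loads(output of `kubectl get tfjobs -n NAMESPACE -o json TRAIN_JOB_NAME`):
# flags = tfjob_flags(tfjob); print(flags["num_agents"], flags["env"])
```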

#### Monitoring training

The names, status, and other metadata of the pods involved in the training job can be displayed using the following (pods from earlier runs may still show up as `Terminating`):

In [8]:
```python
!kubectl get pods -n {NAMESPACE} --show-all
```

```
NAME                                               READY     STATUS        RESTARTS   AGE
pybullet-kuka-0202-1044-fe76-master-v79e-0-79mc2   1/1       Terminating   0          9m
pybullet-kuka-0202-1052-f773-master-qcpn-0-f8brv   1/1       Terminating   0          1m
pybullet-kuka-0202-1053-3a98-master-zu80-0-cdn9s   1/1       Running       0          2s
ubuntu                                             1/1       Running       0          9d
```
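The table above mixes pods from earlier runs with the new one. As a small sketch (hypothetical helpers), the pods of one job can be picked out of this plain-text output, assuming whitespace-separated columns with no spaces inside values, which holds for the default columns shown. Label selectors, as used further below, are the more robust way to do this.

```python
def parse_pod_table(text):
    """Parse `kubectl get pods` plain-text output into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = [h.lower() for h in lines[0].split()]  # name, ready, status, ...
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pods_for_job(pods, job_name, status=None):
    """Pods whose name starts with the TFJob name, optionally filtered by status."""
    return [p for p in pods
            if p["name"].startswith(job_name + "-")
            and (status is None or p["status"] == status)]
```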


The full specification and status of a specific pod, here the running master pod listed above, can be displayed as YAML:

In [9]:
```python
!kubectl -n {NAMESPACE} get pods -o yaml pybullet-kuka-0202-1053-3a98-master-zu80-0-cdn9s
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"rl","name":"pybullet-kuka-0202-1053-3a98-master-zu80-0","uid":"5ffa2649-084a-11e8-b604-42010af00218","apiVersion":"batch","resourceVersion":"4043392"}}
  creationTimestamp: 2018-02-02T18:53:36Z
  generateName: pybullet-kuka-0202-1053-3a98-master-zu80-0-
  labels:
    controller-uid: 5ffa2649-084a-11e8-b604-42010af00218
    job-name: pybullet-kuka-0202-1053-3a98-master-zu80-0
    job_type: MASTER
    runtime_id: zu80
    task_index: "0"
    tensorflow.org: ""
    tf_job_name: pybullet-kuka-0202-1053-3a98
  name: pybullet-kuka-0202-1053-3a98-master-zu80-0-cdn9s
  namespace: rl
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: pybullet-kuka-0202-1053-3a98-master-zu80-0
    uid: 5ffa2649-084a-11e8-b604-42010af00218
  resourceV
```
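The `tf_job_name` and `job_type` labels in this pod spec are what make it possible to find a job's master pod without copying names by hand. As a sketch, assuming the label names shown above, the same lookup can be done in Python on `kubectl get pods -o json` output, which avoids the quoting pitfalls of passing a jsonpath template through `subprocess`; the helper name is hypothetical.

```python
import json


def master_pod_name(pods_json, job_name):
    """Name of the MASTER pod of a TFJob, from `kubectl get pods -o json` output.

    Relies on the tf_job_name / job_type labels shown in the pod YAML above;
    returns None if no pod matches.
    """
    for pod in json.loads(pods_json)["items"]:
        labels = pod["metadata"].get("labels", {})
        if (labels.get("tf_job_name") == job_name
                and labels.get("job_type") == "MASTER"):
            return pod["metadata"]["name"]
    return None

# Example usage (requires kubectl access to the cluster):
# out = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "-o", "json"]).decode("utf-8")
# print(master_pod_name(out, TRAIN_JOB_NAME))
```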

In [10]:
```python
TRAIN_JOB_NAME
```

'pybullet-kuka-0202-1053-3a98'

Obtain the name of the master pod and print its logs (add the `--follow` flag to `kubectl logs` to stream them instead):

In [17]:
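```python
import subprocess

# Set explicitly (e.g. to inspect an earlier job after a kernel restart); replace with your job's name
TRAIN_JOB_NAME = "pybullet-kuka-0202-1053-3a98"

# Select the job's master pod by label. No shell is involved here, so the
# jsonpath template must not be wrapped in quotes (they would end up in the output).
master_pod = subprocess.check_output(
    ["kubectl", "-n", NAMESPACE, "get", "pods",
     "--selector=tf_job_name=" + TRAIN_JOB_NAME + ",job_type=MASTER",
     "-o", "jsonpath={.items[*].metadata.name}"]).decode("utf-8").strip()
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}
```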

'pybullet-kuka-0202-1053-3a98-master-zu80-0-cdn9s'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Start a new run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98.
{'algorithm': <class 'agents.algorithms.ppo.ppo.PPO'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7ff0334ae938>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.AdamOpt

2018-02-02 18:59:14.574057: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0954871774]
2018-02-02 18:59:14.574331: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [5.21509401e-06]
2018-02-02 18:59:14.584562: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00536976848]
2018-02-02 18:59:14.584691: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 18:59:17.230531: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.195469692][-0.357547522]
2018-02-02 18:59:17.230651: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [3.81992641e-06]
2018-02-02 18:59:19.758928: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.286263466]
2018-02-02 18:59:19.758954: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.115149826]
2018-02-02 18:59:19.759241: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [3.47672926e-06]
2018-02-02 18:59:19.769215: I tensorflow/core/kerne

2018-02-02 19:09:55.200745: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.267291844]
2018-02-02 19:09:55.200745: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0852061212]
2018-02-02 19:09:55.205789: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [2.09618347e-13]
2018-02-02 19:09:55.221462: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0036235291]
2018-02-02 19:09:55.221624: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 650010, global step 1460010).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 650010, global step 1460010).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 810000, global step 1485000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 810000, global step 1485000)

Phase eval (phase step 800010, global step 1790010).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 800010, global step 1790010).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 990000, global step 1815000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 990000, global step 1815000).
2018-02-02 19:14:44.118432: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.317059517][-0.939750791]
2018-02-02 19:14:44.118532: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [4.9691138e-05]
2018-02-02 19:14:46.510082: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.136948913]
2018-02-02 19:14:46.510088: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.216507345]
2018-02-02 19:14:46.510428: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [6.30382433e-17]
2018-02-02 19:14:46.520556: I 

2018-02-02 19:30:38.743090: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.015271985]
2018-02-02 19:30:38.743234: I tensorflow/core/kernels/logging_ops.cc:79] increase penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 1350000, global step 3000000).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 1350000, global step 3000000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 1650020, global step 3025020).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 1650020, global step 3025020).
2018-02-02 19:31:01.635995: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.405077934][-0.0922234952]
2018-02-02 19:31:01.636131: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [5.31222358e-05]
2018-02-02 19:31:04.338249: I tensorflow/core/kernels/logging_ops.cc:

2018-02-02 19:48:30.469129: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00657164445]
2018-02-02 19:48:30.469155: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.305036634]
2018-02-02 19:48:30.469438: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [8.94109729e-33]
2018-02-02 19:48:30.479478: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00958037842]
2018-02-02 19:48:35.395780: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.268074244][0.50729692]
2018-02-02 19:48:35.395875: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-7.52690612e-05]
2018-02-02 19:48:37.850721: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.299663693]
2018-02-02 19:48:37.850721: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00379255693]
2018-02-02 19:48:37.850942: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [8.94109729e-33]
2018-02-02 19:48:37.861113: I tensorflow/c

2018-02-02 19:54:57.186825: I tensorflow/core/kernels/logging_ops.cc:79] increase penalty [0]
2018-02-02 19:55:03.310159: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.283685356][0.24664259]
2018-02-02 19:55:03.310263: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [4.02623809e-05]
2018-02-02 19:55:03.328991: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 19:55:03.422998: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:03.533004: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:03.645436: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:03.747281: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:03.847582: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:03.949469: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 19:55:05.877002: I tensorflow/core/kern

2018-02-02 20:29:08.489183: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00944741815]
2018-02-02 20:29:14.802887: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.230364919][0.132991582]
2018-02-02 20:29:14.802981: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-2.36327178e-05]
2018-02-02 20:29:17.384867: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.134470209]
2018-02-02 20:29:17.384867: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.256918132]
2018-02-02 20:29:17.385198: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 20:29:17.395440: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00212047272]
2018-02-02 20:29:17.395578: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 3000000, global step 6630000).
INFO:tensorflow:
-----------------------------

--------------------------------------------------
Phase train (phase step 4020010, global step 7370010).
2018-02-02 20:43:42.208068: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.26285395][0.0290236808]
2018-02-02 20:43:42.208189: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-6.93154334e-07]
2018-02-02 20:43:44.959828: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.131670848]
2018-02-02 20:43:44.959850: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0540414751]
2018-02-02 20:43:44.960187: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 20:43:44.969375: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00331098936]
2018-02-02 20:43:44.969549: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 20:43:50.534795: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.230599493][-0.00779377716]
2018-02-02 20:43:50.534918: I tensorflow/

2018-02-02 21:04:37.990212: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 21:04:38.101670: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 21:04:38.220295: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 21:04:38.330214: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 21:04:38.433716: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 21:04:38.525294: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00715286331]
2018-02-02 21:04:38.525305: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.00752672786]
2018-02-02 21:04:38.525764: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:04:38.535552: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00544620957]
2018-02-02 21:04:38.535683: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 21:04:42.516550: I tensorflow/core/kernels/logg

2018-02-02 21:20:35.190105: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:20:35.201462: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0110295657]
2018-02-02 21:20:41.195869: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.247830257][0.0319034122]
2018-02-02 21:20:41.195999: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [2.0436335e-05]
2018-02-02 21:20:43.901141: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0903443545]
2018-02-02 21:20:43.901141: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.230923355]
2018-02-02 21:20:43.901655: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:20:43.910733: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00020276432]
2018-02-02 21:20:43.910904: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 21:20:49.157512: I tensorflow/core/kernels/logging_ops.cc:79] retur

2018-02-02 21:36:47.294883: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:36:47.303928: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00410241261]
2018-02-02 21:36:47.304068: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 21:36:51.409080: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.26916787][-0.216700524]
2018-02-02 21:36:51.409240: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [1.62967681e-05]
2018-02-02 21:36:54.122981: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000288262148]
2018-02-02 21:36:54.123093: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0529118255]
2018-02-02 21:36:54.123405: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:36:54.132147: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00184849917]
2018-02-02 21:36:54.132280: I tensorflow/core/kernels/logging_ops.cc:79] d

2018-02-02 21:49:40.742034: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.282291025][-0.256064683]
2018-02-02 21:49:40.742142: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-1.11310319e-05]
2018-02-02 21:49:43.480368: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000360182632]
2018-02-02 21:49:43.480445: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0279867835]
2018-02-02 21:49:43.480874: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 21:49:43.490496: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.000966904219]
2018-02-02 21:49:43.490633: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 21:49:48.038085: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.287356883][-0.261728168]
2018-02-02 21:49:48.038182: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [5.44319164e-06]
2018-02-02 21:49:50.693660: I ten

2018-02-02 22:03:13.824712: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:03:13.833557: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00218604156]
2018-02-02 22:03:13.833712: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 22:03:17.322919: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.296214253][-0.207344532]
2018-02-02 22:03:17.323027: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [8.8704428e-06]
2018-02-02 22:03:20.054967: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000152226945]
2018-02-02 22:03:20.054970: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0754663348]
2018-02-02 22:03:20.055453: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:03:20.065805: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00611503888]
2018-02-02 22:03:20.065974: I tensorflow/core/kernels/logging_ops.cc:79] d

2018-02-02 22:13:12.322172: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000458887807]
2018-02-02 22:13:12.322552: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:13:12.337954: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0157861747]
2018-02-02 22:13:12.338110: I tensorflow/core/kernels/logging_ops.cc:79] increase penalty [0]
2018-02-02 22:13:15.782006: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.299086481][-0.471434891]
2018-02-02 22:13:15.782119: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-3.8535436e-05]
2018-02-02 22:13:18.514329: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000361238344]
2018-02-02 22:13:18.514338: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.191375241]
2018-02-02 22:13:18.514818: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:13:18.523701: I tensorflow/core/kernels/logging_ops.cc:79] k

2018-02-02 22:21:08.180875: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.287435889]
2018-02-02 22:21:08.186175: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:21:08.197613: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00538725592]
2018-02-02 22:21:08.197751: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 22:21:13.412676: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.368635416][-0.738683879]
2018-02-02 22:21:13.412798: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [2.78031039e-05]
2018-02-02 22:21:16.048719: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.12931174]
2018-02-02 22:21:16.048738: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0871603936]
2018-02-02 22:21:16.049299: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:21:16.063211: I tensorflow/core/kernels/logging_ops.cc:79] kl ch

2018-02-02 22:29:53.083903: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:29:53.190327: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:29:53.306107: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 22:29:53.414421: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 22:29:54.252734: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.0231122132]
2018-02-02 22:29:54.252742: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.270178854]
2018-02-02 22:29:54.253127: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:29:54.262411: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00892410707]
2018-02-02 22:30:00.049746: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.349620938][-0.744181335]
2018-02-02 22:30:00.049859: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [3.20551553e-05]
2018-02-02 2

2018-02-02 22:39:25.782588: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.36224547][-0.636083484]
2018-02-02 22:39:25.782687: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-2.3047065e-05]
2018-02-02 22:39:28.330665: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.150071129]
2018-02-02 22:39:28.330706: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.150163636]
2018-02-02 22:39:28.331065: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:39:28.344618: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0137295304]
2018-02-02 22:39:28.344764: I tensorflow/core/kernels/logging_ops.cc:79] increase penalty [0]
2018-02-02 22:39:32.679115: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.322449684][-0.765838206]
2018-02-02 22:39:32.679278: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [2.78261177e-05]
2018-02-02 22:39:34.272080: I tensorflow/c

2018-02-02 22:48:33.175717: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.315401614][-0.55364722]
2018-02-02 22:48:33.175829: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-1.11373902e-05]
2018-02-02 22:48:33.804427: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 22:48:33.903123: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:48:34.005122: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:48:34.117691: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:48:34.222148: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 22:48:34.637652: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 22:48:34.740776: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:48:34.839655: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [4]
2018-02-02 22:48:34.947219: I tensorflow/core/kernels/

successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successful

2018-02-02 22:58:43.674010: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [4]
2018-02-02 22:58:43.777998: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [4]
2018-02-02 22:58:43.880572: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 22:58:43.993085: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 22:58:44.087917: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.136125103]
2018-02-02 22:58:44.087948: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.191504404]
2018-02-02 22:58:44.088419: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 22:58:44.104966: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0127277011]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 6775020, global step 14935020).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 6775020, global step 14935020

2018-02-02 23:07:21.725445: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.311476707][-0.691235244]
2018-02-02 23:07:21.725576: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [0.000102823127]
2018-02-02 23:07:24.574130: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00444578752]
2018-02-02 23:07:24.574168: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.24616459]
2018-02-02 23:07:24.583234: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:07:24.594702: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0105993245]
2018-02-02 23:07:29.866634: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.302982688][-0.663573802]
2018-02-02 23:07:29.866755: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-9.71379632e-05]
2018-02-02 23:07:30.206491: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:07:30.316544: I tensorflow/core

2018-02-02 23:12:39.738499: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.289257944][-0.606097639]
2018-02-02 23:12:39.738600: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-5.76782213e-07]
2018-02-02 23:12:42.551544: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00043753616]
2018-02-02 23:12:42.551582: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.268144727]
2018-02-02 23:12:42.552021: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:12:42.562871: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00858636945]
2018-02-02 23:12:46.470344: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.298841536][-0.605040967]
2018-02-02 23:12:46.470491: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-5.19610076e-05]
2018-02-02 23:12:49.219194: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000846914132]
2018-02-02 23:12:49.219216: 

2018-02-02 23:18:42.870271: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-2.33478545e-06]
2018-02-02 23:18:45.088669: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:18:45.182474: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:18:45.284894: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 23:18:45.399321: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:18:45.482129: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.124889687]
2018-02-02 23:18:45.482129: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0387751646]
2018-02-02 23:18:45.482494: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:18:45.492775: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0122483019]
2018-02-02 23:18:50.293637: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.271086842][-0.454357296]
2018-02-02 2

successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successful

2018-02-02 23:27:47.039410: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:27:47.142530: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:27:47.248375: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 23:27:47.346353: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:27:47.454112: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:27:47.555306: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:27:47.650342: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:27:47.746923: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:27:47.836782: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0148014696]
2018-02-02 23:27:47.836753: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.375497699]
2018-02-02 23:27:47.837058: I tensorflow/core/kernels/logging_ops.cc:79] current penal

2018-02-02 23:32:32.414067: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00692634797]
2018-02-02 23:32:32.414218: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 7650000, global step 16860000).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 7650000, global step 16860000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 9210020, global step 16885020).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 9210020, global step 16885020).
2018-02-02 23:32:51.014702: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.276031315][-0.673824191]
2018-02-02 23:32:51.014821: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-7.73417196e-06]
2018-02-02 23:32:52.557460: I tensorflow/core/kernels/logging_o

2018-02-02 23:35:52.044441: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:35:52.144018: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:35:52.244658: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:35:52.335124: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.247009888]
2018-02-02 23:35:52.335197: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0573866144]
2018-02-02 23:35:52.335690: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:35:52.346044: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0119623486]
2018-02-02 23:35:59.019186: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.331325769][-0.610848546]
2018-02-02 23:35:59.019288: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [1.87454225e-06]
2018-02-02 23:36:01.094586: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23

successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successfully grasped a block!!!
successful

2018-02-02 23:47:35.173786: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-1.52233124e-05]
2018-02-02 23:47:36.032909: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [9]
2018-02-02 23:47:36.135981: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [10]
2018-02-02 23:47:36.239907: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-02 23:47:36.926149: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-02 23:47:37.054081: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:47:37.153296: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [8]
2018-02-02 23:47:37.261259: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [8]
2018-02-02 23:47:37.373693: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [6]
2018-02-02 23:47:37.476119: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-02 23:47:37.569466: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! 

2018-02-02 23:54:15.053919: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:54:15.062715: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00670346105]
2018-02-02 23:54:15.062876: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-02 23:54:20.438019: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.0560243502][-0.312948048]
2018-02-02 23:54:20.438129: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-1.27583189e-05]
2018-02-02 23:54:23.126885: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.769846499]
2018-02-02 23:54:23.126978: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0805598572]
2018-02-02 23:54:23.127237: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-02 23:54:23.136570: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00488104764]
2018-02-02 23:54:23.136709: I tensorflow/core/kernels/logging_ops.cc:79] d

2018-02-03 00:00:41.610603: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-03 00:00:41.715518: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-03 00:00:41.818303: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [4]
2018-02-03 00:00:41.914567: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [3]
2018-02-03 00:00:42.012800: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-03 00:00:42.111832: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:00:42.218111: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:00:42.298172: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.370822608]
2018-02-03 00:00:42.298214: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0159595944]
2018-02-03 00:00:42.298551: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:00:42.309502: I tensorflow/core/kernels/logging_ops.cc:79] kl cha

2018-02-03 00:08:10.972459: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [2.62451181e-06]
2018-02-03 00:08:13.478166: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00803325139]
2018-02-03 00:08:13.478166: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.043941725]
2018-02-03 00:08:13.478413: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:08:13.495439: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00236220378]
2018-02-03 00:08:13.495581: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 8550000, global step 18840000).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 8550000, global step 18840000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 10290020, global step 18865020).
INFO:tensorfl

--------------------------------------------------
Phase train (phase step 10470020, global step 19195020).
2018-02-03 00:14:16.777493: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.274140179][-0.331187904]
2018-02-03 00:14:16.777619: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [1.00151065e-05]
2018-02-03 00:14:19.679497: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000881535874]
2018-02-03 00:14:19.679528: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0157908835]
2018-02-03 00:14:19.679958: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:14:19.695675: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00451852428]
2018-02-03 00:14:19.695850: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-03 00:14:22.766661: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.272028148][-0.31178996]
2018-02-03 00:14:22.766754: I tensorflow

2018-02-03 00:20:38.780820: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:20:38.883634: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:20:38.988039: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:20:39.440870: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0299307946]
2018-02-03 00:20:39.440875: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.106952131]
2018-02-03 00:20:39.445994: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:20:39.464968: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00839829072]
2018-02-03 00:20:43.928738: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.284059107][-0.281150252]
2018-02-03 00:20:43.928845: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-8.22448726e-07]
2018-02-03 00:20:46.810810: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.124310635]

2018-02-03 00:27:35.109296: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.00277085672]
2018-02-03 00:27:35.109359: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [0.0564997867]
2018-02-03 00:27:35.114686: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:27:35.130469: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00426436681]
2018-02-03 00:27:35.130702: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 9075000, global step 19995000).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 9075000, global step 19995000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 10920020, global step 20020020).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 10920020, global step 20020020)

2018-02-03 00:34:32.992024: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:34:33.111085: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:34:33.215029: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000319794635]
2018-02-03 00:34:33.215047: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.00130177755]
2018-02-03 00:34:33.215494: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:34:33.224774: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0104411924]
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 9275010, global step 20435010).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 9275010, global step 20435010).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 11160000, global step 20460000).
INFO:tensorflow:
----------------------

2018-02-03 00:40:48.380023: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-03 00:40:51.643254: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.275353909][-0.237977326]
2018-02-03 00:40:51.643371: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [7.31913246e-07]
2018-02-03 00:40:54.419312: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:40:54.529345: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000188208927]
2018-02-03 00:40:54.529407: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0459408239]
2018-02-03 00:40:54.529780: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:40:54.540275: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0125636077]
2018-02-03 00:40:57.850125: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.275451779][-0.233227536]
2018-02-03 00:40:57.850222: I tensorflow/core/kernels/loggin

Phase eval (phase step 9650010, global step 21260010).
INFO:tensorflow:
--------------------------------------------------
Phase eval (phase step 9650010, global step 21260010).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 11610000, global step 21285000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 11610000, global step 21285000).
2018-02-03 00:46:28.837641: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.279120147][-0.205362603]
2018-02-03 00:46:28.837759: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-2.26572665e-05]
2018-02-03 00:46:31.332936: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-03 00:46:31.456660: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [5]
2018-02-03 00:46:31.576370: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [1]
2018-02-03 00:46:31.687612: I tensorflow/core/kernels/loggin

--------------------------------------------------
Phase train (phase step 11850010, global step 21725010).
2018-02-03 00:53:05.104958: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.280502856][-0.210607037]
2018-02-03 00:53:05.105085: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-2.39898673e-05]
2018-02-03 00:53:07.900181: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000105762527]
2018-02-03 00:53:07.900189: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0723978654]
2018-02-03 00:53:07.900556: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:53:07.916865: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.00319240801]
2018-02-03 00:53:07.917093: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-03 00:53:10.407704: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.280264616][-0.210547656]
2018-02-03 00:53:10.407803: I tensorf

2018-02-03 00:58:22.705823: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-3.16507976e-05]
2018-02-03 00:58:25.460478: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-03 00:58:25.590742: I tensorflow/core/kernels/logging_ops.cc:79] kl cutoff! [2]
2018-02-03 00:58:25.685236: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [8.78626e-05]
2018-02-03 00:58:25.685266: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.119917586]
2018-02-03 00:58:25.685683: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 00:58:25.704623: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0174744818]
2018-02-03 00:58:25.704809: I tensorflow/core/kernels/logging_ops.cc:79] increase penalty [0]
2018-02-03 00:58:27.887408: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.285906762][-0.190020472]
2018-02-03 00:58:27.887520: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: 

--------------------------------------------------
Phase train (phase step 12240000, global step 22440000).
2018-02-03 01:03:53.836754: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.290788919][-0.185660839]
2018-02-03 01:03:53.836906: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [4.16903167e-05]
2018-02-03 01:03:56.973137: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000120710945]
2018-02-03 01:03:56.973178: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0992927551]
2018-02-03 01:03:56.977699: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 01:03:56.988647: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.0059877797]
2018-02-03 01:03:56.988827: I tensorflow/core/kernels/logging_ops.cc:79] decrease penalty [0]
2018-02-03 01:04:00.675435: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.292191595][-0.188124701]
2018-02-03 01:04:00.675566: I tensorflo

--------------------------------------------------
Phase eval (phase step 10325010, global step 22745010).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 12420000, global step 22770000).
INFO:tensorflow:
--------------------------------------------------
Phase train (phase step 12420000, global step 22770000).
2018-02-03 01:09:23.773733: I tensorflow/core/kernels/logging_ops.cc:79] return and value: [-0.318865389][-0.208427191]
2018-02-03 01:09:23.773845: I tensorflow/core/kernels/logging_ops.cc:79] normalized advantage: [-6.44938154e-06]
2018-02-03 01:09:26.866150: I tensorflow/core/kernels/logging_ops.cc:79] value loss: [0.000354202057]
2018-02-03 01:09:26.866216: I tensorflow/core/kernels/logging_ops.cc:79] policy loss: [-0.0358680785]
2018-02-03 01:09:26.866643: I tensorflow/core/kernels/logging_ops.cc:79] current penalty: [0]
2018-02-03 01:09:26.884976: I tensorflow/core/kernels/logging_ops.cc:79] kl change: [0.01136728]
2
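The raw training log above is hard to follow by eye. Below is a minimal sketch for extracting the `value loss`, `policy loss` and `kl change` values and the phase banners from a saved `kubectl logs` dump. It assumes the exact line format shown above. The regexes and the helper name `parse_training_log` are illustrative and are not part of tensorflow/agents.

```python
import re

# Matches the tf.Print lines emitted during PPO updates, e.g.
# "... logging_ops.cc:79] value loss: [0.000354202057]"
METRIC_RE = re.compile(
    r"logging_ops\.cc:\d+\] (value loss|policy loss|kl change): \[([-+0-9.eE]+)\]")
# Matches phase banners such as "Phase eval (phase step 9075000, global step 19995000)."
PHASE_RE = re.compile(r"Phase (train|eval) \(phase step (\d+), global step (\d+)\)")


def parse_training_log(text):
    """Collects PPO metrics and phase banners from a raw log dump (hypothetical helper)."""
    metrics = {"value loss": [], "policy loss": [], "kl change": []}
    phases = []
    seen_phases = set()
    for line in text.splitlines():
        match = METRIC_RE.search(line)
        if match:
            metrics[match.group(1)].append(float(match.group(2)))
            continue
        match = PHASE_RE.search(line)
        if match:
            phase = (match.group(1), int(match.group(2)), int(match.group(3)))
            # Each banner appears twice in the output above, so drop repeats.
            if phase not in seen_phases:
                seen_phases.add(phase)
                phases.append(phase)
    return metrics, phases
```

Lines cut off by the notebook's output limit simply do not match and are skipped. The resulting lists can be plotted to compare runs, although TensorBoard (launched further down) is the more convenient view.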

#### Debug

Retry with the last version of agents known to work properly (commit 459c4f88ece996eac3489e6e97a6ee0b30bdd6b3):

In [79]:
import datetime
import uuid

IMAGE="gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"

HPARAM_SET="pybullet-kuka"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TRAIN_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
LOG_DIR="gs://{0}/jobs/{1}".format(BUCKET, TRAIN_JOB_NAME)

!ks param set agents algorithm "agents.ppo.PPOAlgorithm"
!ks param set agents run_mode train
!ks param set agents num_cpu 31
!ks param set agents job_tag {JOB_SALT}
!ks param set agents logdir {LOG_DIR}
!ks param set agents name {TRAIN_JOB_NAME}
!ks param set agents image {IMAGE}
!ks param set agents save_checkpoint_secs 600
!ks apply gke -c agents

[34mINFO  [0mParameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents'
[34mINFO  [0mParameter 'run_mode' successfully set to '"train"' for component 'agents'
[34mINFO  [0mParameter 'num_cpu' successfully set to '31' for component 'agents'
[34mINFO  [0mParameter 'job_tag' successfully set to '"0205-1044-6f0c"' for component 'agents'
[34mINFO  [0mParameter 'logdir' successfully set to '"gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1044-6f0c"' for component 'agents'
[34mINFO  [0mParameter 'name' successfully set to '"pybullet-kuka-0205-1044-6f0c"' for component 'agents'
[34mINFO  [0mParameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"' for component 'agents'
[34mINFO  [0mParameter 'save_checkpoint_secs' successfully set to '600' for component 'agents'
[34mINFO  [0mUpdating tfjobs rl.pybullet-kuka-0205-1044-6f0c
[34mINFO  [0mCreating non-existent tfjobs rl.pybullet-kuka-0205-1044-6f0c


In [80]:
TRAIN_JOB_NAME

'pybullet-kuka-0205-1044-6f0c'
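The launch cells in this section differ only in a few parameters (for example `update_every`). Here is a minimal sketch of factoring out the naming scheme and the `ks param set` calls. `make_job_names` and `ks_param_commands` are hypothetical helpers. They build the same job salt, job name and GCS log dir as the cells here, but only return argument lists and do not run anything.

```python
import datetime
import uuid


def make_job_names(hparam_set, bucket, now=None, suffix=None):
    """Builds (job_salt, job_name, log_dir) the same way the launch cells do."""
    now = now or datetime.datetime.now()
    suffix = suffix or uuid.uuid4().hex[0:4]
    job_salt = now.strftime("%m%d-%H%M") + "-" + suffix
    job_name = hparam_set + "-" + job_salt
    log_dir = "gs://{0}/jobs/{1}".format(bucket, job_name)
    return job_salt, job_name, log_dir


def ks_param_commands(component, params):
    """Returns one `ks param set` argument list per parameter (not executed here)."""
    return [["ks", "param", "set", component, key, str(value)]
            for key, value in params.items()]
```

From inside `ML_APP_DIR`, each list could be passed to `subprocess.check_call`, followed by `subprocess.check_call(["ks", "apply", "gke", "-c", "agents"])`. Because the arguments go straight to the process without a shell, values such as `agents.ppo.PPOAlgorithm` need no extra quoting.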

In [86]:
import subprocess
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + TRAIN_JOB_NAME,
                                      "-o", "jsonpath='{.items[*].metadata.name}'"]).decode("utf-8")
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

'pybullet-kuka-0205-1044-6f0c-master-8li0-0-pmpqz'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Start a new run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1044-6f0c.
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1044-6f0c',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7fb1c6cbf848>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.Ada
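The printed pod name above is wrapped in single quotes. That happens because the template is passed to `subprocess` as `"jsonpath='...'"`, and no shell is involved to strip the quotes. The `!kubectl logs` line still works because the notebook shell removes them. If a job ever has more than one pod, jsonpath returns all the names separated by spaces. The following minimal sketch normalises that output. The `-master-` filter is an assumption based on the pod names seen here, and both helpers are hypothetical.

```python
def parse_pod_names(jsonpath_output):
    """Splits kubectl jsonpath output into pod names, dropping stray single quotes."""
    names = []
    for token in jsonpath_output.split():
        cleaned = token.strip("'")
        if cleaned:
            names.append(cleaned)
    return names


def pick_master_pod(pod_names):
    """Chooses the master replica by name (assumes the '-master-' naming seen above)."""
    masters = [name for name in pod_names if "-master-" in name]
    if not masters:
        raise ValueError("no master pod found in %r" % (pod_names,))
    return masters[0]
```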

In [90]:
# Standard run, replica 2

import datetime
import uuid

IMAGE="gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"

HPARAM_SET="pybullet-kuka"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TRAIN_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
LOG_DIR="gs://{0}/jobs/{1}".format(BUCKET, TRAIN_JOB_NAME)

!ks param set agents algorithm "agents.ppo.PPOAlgorithm"
!ks param set agents run_mode train
!ks param set agents num_cpu 31
!ks param set agents job_tag {JOB_SALT}
!ks param set agents logdir {LOG_DIR}
!ks param set agents name {TRAIN_JOB_NAME}
!ks param set agents image {IMAGE}
!ks param set agents save_checkpoint_secs 600
!ks apply gke -c agents

[34mINFO  [0mParameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents'
[34mINFO  [0mParameter 'run_mode' successfully set to '"train"' for component 'agents'
[34mINFO  [0mParameter 'num_cpu' successfully set to '31' for component 'agents'
[34mINFO  [0mParameter 'job_tag' successfully set to '"0205-1049-ceb9"' for component 'agents'
[34mINFO  [0mParameter 'logdir' successfully set to '"gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1049-ceb9"' for component 'agents'
[34mINFO  [0mParameter 'name' successfully set to '"pybullet-kuka-0205-1049-ceb9"' for component 'agents'
[34mINFO  [0mParameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"' for component 'agents'
[34mINFO  [0mParameter 'save_checkpoint_secs' successfully set to '600' for component 'agents'
[34mINFO  [0mUpdating tfjobs rl.pybullet-kuka-0205-1049-ceb9
[34mINFO  [0mCreating non-existent tfjobs rl.pybullet-kuka-0205-1049-ceb9


In [91]:
TRAIN_JOB_NAME

'pybullet-kuka-0205-1049-ceb9'

In [87]:
# Debug with update_every=60

import datetime
import uuid

IMAGE="gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"

HPARAM_SET="pybullet-kuka"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TRAIN_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
LOG_DIR="gs://{0}/jobs/{1}".format(BUCKET, TRAIN_JOB_NAME)

!ks param set agents algorithm "agents.ppo.PPOAlgorithm"
!ks param set agents run_mode train
!ks param set agents num_cpu 31
!ks param set agents job_tag {JOB_SALT}
!ks param set agents logdir {LOG_DIR}
!ks param set agents name {TRAIN_JOB_NAME}
!ks param set agents update_every 60
!ks param set agents image {IMAGE}
!ks param set agents save_checkpoint_secs 600
!ks apply gke -c agents

[34mINFO  [0mParameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents'
[34mINFO  [0mParameter 'run_mode' successfully set to '"train"' for component 'agents'
[34mINFO  [0mParameter 'num_cpu' successfully set to '31' for component 'agents'
[34mINFO  [0mParameter 'job_tag' successfully set to '"0205-1047-eaa8"' for component 'agents'
[34mINFO  [0mParameter 'logdir' successfully set to '"gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1047-eaa8"' for component 'agents'
[34mINFO  [0mParameter 'name' successfully set to '"pybullet-kuka-0205-1047-eaa8"' for component 'agents'
[34mINFO  [0mParameter 'update_every' successfully set to '60' for component 'agents'
[34mINFO  [0mParameter 'image' successfully set to '"gcr.io/kubeflow-rl/agents:agents-0205-1039-a8d6"' for component 'agents'
[34mINFO  [0mParameter 'save_checkpoint_secs' successfully set to '600' for component 'agents'
[34mINFO  [0mUpdating tfjobs rl.pybullet-kuka-0205-1047-eaa8
[34

In [89]:
TRAIN_JOB_NAME

'pybullet-kuka-0205-1047-eaa8'

In [88]:
import subprocess
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + TRAIN_JOB_NAME,
                                      "-o", "jsonpath='{.items[*].metadata.name}'"]).decode("utf-8")
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

'pybullet-kuka-0205-1047-eaa8-master-bjxn-0-sm6mm'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Start a new run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1047-eaa8.
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl-kf/jobs/pybullet-kuka-0205-1047-eaa8',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7fc73f9bd848>,
 'normalize_ranges': True,
 'num_agents': 30,
 'num_gpus': 0,
 'optimizer': <class 'tensorflow.python.training.adam.Ada
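The hyperparameter dump above lists `kl_target`, `kl_cutoff_factor` and `kl_cutoff_coef`. These drive the `increase penalty` / `decrease penalty` and `kl cutoff!` lines in the training log. Below is a minimal plain-Python sketch of the adaptive KL penalty, a variant of the heuristic from the PPO paper. The 1.3 / 0.7 thresholds and the 1.5 factor are assumptions about how tensorflow/agents implements it, not values read from this image. They do agree with the log above, though: `kl change: [0.0174744818]` is followed by `increase penalty`, `kl change: [0.00692634797]` by `decrease penalty`, and values between 0.007 and 0.013 trigger neither. The cutoff part assumes `kl cutoff! [N]` counts KL values above `kl_target * kl_cutoff_factor`. For simplicity, the quadratic penalty is summed here.

```python
def adjust_kl_penalty(penalty, kl_change, kl_target=0.01):
    """Adaptive KL penalty step; returns (new_penalty, log message or None)."""
    if kl_change > 1.3 * kl_target:
        return penalty * 1.5, "increase penalty"
    if kl_change < 0.7 * kl_target:
        return penalty / 1.5, "decrease penalty"
    return penalty, None


def kl_cutoff(kls, kl_target=0.01, kl_cutoff_factor=2, kl_cutoff_coef=1000):
    """Counts KL values above kl_target * kl_cutoff_factor and sums their quadratic penalty."""
    threshold = kl_target * kl_cutoff_factor
    over = [kl for kl in kls if kl > threshold]
    penalty = sum(kl_cutoff_coef * (kl - threshold) ** 2 for kl in over)
    return len(over), penalty
```

Under this reading, a run where `kl cutoff!` fires often (as around 23:47 above, with counts up to 10) is taking policy steps much larger than `kl_target`. A lower `learning_rate` is one knob to try in that case.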

#### Launching tensorboard

In [72]:
import datetime
import uuid

HPARAM_SET="tboard"
now=datetime.datetime.now()
JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
TBOARD_JOB_NAME=HPARAM_SET + "-" + JOB_SALT
# LOG_DIR="gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98"
SECRET_NAME="gcp-credentials"
SECRET_FILE_NAME="secret.json"
NAMESPACE="rl"

#!ks param set tensorboard name {TRAIN_JOB_NAME}
!ks param set tensorboard name {TBOARD_JOB_NAME}
!ks param set tensorboard namespace {NAMESPACE}
!ks param set tensorboard log_dir {LOG_DIR}
!ks param set tensorboard secret {SECRET_NAME}
!ks param set tensorboard secret_file_name {SECRET_FILE_NAME}
!ks show default -c tensorboard

!ks apply gke -c tensorboard

[34mINFO  [0mParameter 'name' successfully set to '"tboard-0204-1444-7582"' for component 'tensorboard'
[34mINFO  [0mParameter 'namespace' successfully set to '"rl"' for component 'tensorboard'
[34mINFO  [0mParameter 'log_dir' successfully set to '"gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38"' for component 'tensorboard'
[34mINFO  [0mParameter 'secret' successfully set to '"gcp-credentials"' for component 'tensorboard'
[34mINFO  [0mParameter 'secret_file_name' successfully set to '"secret.json"' for component 'tensorboard'
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: tboard-0204-1444-7582-tb
  namespace: rl
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorboard
        tb-job: tboard-0204-1444-7582
      name: tboard-0204-1444-7582
      namespace: rl
    spec:
      containers:
      - command:
        - /usr/local/bin/tensorboard
        - --logdir=gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38
        - --po

### Connecting to Tensorboard

To connect to TensorBoard, start `kubectl proxy` and then open the URL printed by the next cell.

In [73]:
PROXY_PORT=8001
# url=("http://127.0.0.1:{proxy_port}/api/v1/proxy/namespaces/{namespace}/services/{service_name}:80/".format(
#     proxy_port=PROXY_PORT, namespace=NAMESPACE, service_name=TRAIN_JOB_NAME + "-tb"))
url=("http://127.0.0.1:{proxy_port}/api/v1/proxy/namespaces/{namespace}/services/{service_name}:80/".format(
    proxy_port=PROXY_PORT, namespace=NAMESPACE, service_name=TBOARD_JOB_NAME + "-tb"))
print(url)

http://127.0.0.1:8001/api/v1/proxy/namespaces/rl/services/tboard-0204-1444-7582-tb:80/
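The cell above uses the older `/api/v1/proxy/namespaces/...` path. Current Kubernetes API servers serve service proxies under `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/` instead, so if the printed URL returns a 404 on a newer cluster, the other form is worth trying. A small sketch that builds either form (`tensorboard_url` is a hypothetical helper):

```python
def tensorboard_url(namespace, service_name, proxy_port=8001, port=80, legacy=False):
    """Build a kubectl-proxy URL for a service (hypothetical helper)."""
    base = "http://127.0.0.1:{}".format(proxy_port)
    if legacy:
        # Older proxy path, the form used in the cell above
        return "{}/api/v1/proxy/namespaces/{}/services/{}:{}/".format(
            base, namespace, service_name, port)
    # Service proxy path served by current Kubernetes API servers
    return "{}/api/v1/namespaces/{}/services/{}:{}/proxy/".format(
        base, namespace, service_name, port)


print(tensorboard_url("rl", "tboard-0204-1444-7582-tb"))
print(tensorboard_url("rl", "tboard-0204-1444-7582-tb", legacy=True))
```

With `legacy=True` the helper reproduces the URL printed above exactly.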


### Deleting jobs

In [34]:
!kubectl delete tfjobs -n {NAMESPACE} {TRAIN_JOB_NAME}

tfjob "pybullet-kuka-0123-1000-3ad9" deleted
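The command above removes only the training TFJob. The render TFJob and the TensorBoard Deployment and Service created in this notebook stay around until they are deleted too. A sketch of a cleanup helper (`delete_commands` is hypothetical); it assumes, as the Deployment name and proxy URL above suggest, that the TensorBoard Deployment and Service are both named `<TBOARD_JOB_NAME>-tb`:

```python
import subprocess


def delete_commands(namespace, train_job=None, render_job=None, tboard_job=None):
    """Build kubectl delete argv lists for resources created in this notebook."""
    cmds = []
    for job in (train_job, render_job):
        if job:
            cmds.append(["kubectl", "delete", "tfjobs", "-n", namespace, job,
                         "--ignore-not-found"])
    if tboard_job:
        # Assumption: Deployment and Service share the "<name>-tb" name
        cmds.append(["kubectl", "delete", "deployment,service", "-n", namespace,
                     tboard_job + "-tb", "--ignore-not-found"])
    return cmds


# To actually delete, run each command, e.g.:
# for cmd in delete_commands(NAMESPACE, TRAIN_JOB_NAME, RENDER_JOB_NAME, TBOARD_JOB_NAME):
#     subprocess.check_call(cmd)
```

`--ignore-not-found` keeps the loop from failing when a resource was already removed.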


### Rendering the model

#### Initiating a rendering job directly

Launching a rendering job is as simple as the following:

In [64]:
import datetime
import uuid
import os

os.chdir(ML_APP_DIR)

now=datetime.datetime.now()

JOB_SALT=now.strftime("%m%d-%H%M") + "-" + uuid.uuid4().hex[0:4]
RENDER_JOB_NAME="render-" + JOB_SALT

# IMAGE="gcr.io/kubeflow-rl/agents:agents-0202-1020-01ec"
# DEBUG:
IMAGE="gcr.io/kubeflow-rl/agents:agents-0204-1005-9e57"

!ks param set agents name {RENDER_JOB_NAME}

# Rendering needs far fewer resources than training, so request fewer CPUs
!ks param set agents num_cpu 4

# By default the logdir for the last submitted job will be rendered. To render for
# an arbitrary logdir, specify it here.
# !ks param set agents logdir gs://kubeflow-rl-kf/jobs/pybullet-kuka-0202-1053-3a98

!ks param set agents run_mode render
!ks apply gke -c agents

INFO  Parameter 'name' successfully set to '"render-0204-1113-c80b"' for component 'agents'
INFO  Parameter 'num_cpu' successfully set to '4' for component 'agents'
INFO  Parameter 'run_mode' successfully set to '"render"' for component 'agents'
INFO  Updating tfjobs rl.render-0204-1113-c80b
INFO  Creating non-existent tfjobs rl.render-0204-1113-c80b


In [65]:
%%bash
kubectl get pods -n rl --show-all

NAME                                               READY     STATUS              RESTARTS   AGE
pybullet-kuka-0204-1028-bc38-master-kilp-0-m97zv   1/1       Running             0          44m
render-0204-1029-cd74-master-q4z0-0-dr9hx          0/1       Completed           0          43m
render-0204-1113-c80b-master-78uh-0-48cwt          0/1       ContainerCreating   0          1s
tboard-0204-0911-f303-tb-688bbb9cbb-kw6rm          1/1       Running             0          2h
tboard-0204-1029-9727-tb-77f888cb99-9f4mg          1/1       Running             0          44m
ubuntu                                             1/1       Running             0          11d
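The job names in the listing above (for example `render-0204-1113-c80b`) come from the `JOB_SALT` scheme in the rendering cell: a prefix, the date and time as `MMDD-HHMM`, and four random hex characters. A sketch that reproduces that scheme and rejects names Kubernetes would refuse (`make_job_name` is a hypothetical helper). It applies the conservative DNS-1123 label rule: at most 63 characters of lowercase letters, digits and `-`, starting and ending with a letter or digit, which also keeps the name usable as the `tf_job_name` label value:

```python
import datetime
import re
import uuid

# DNS-1123 label: lowercase alphanumerics and '-', alphanumeric at both ends
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def make_job_name(prefix, now=None, salt=None):
    """Build '<prefix>-MMDD-HHMM-<4 hex>' and check it is a valid label."""
    now = now or datetime.datetime.now()
    salt = salt or uuid.uuid4().hex[0:4]
    name = "{}-{}-{}".format(prefix, now.strftime("%m%d-%H%M"), salt)
    if len(name) > 63 or not DNS1123_LABEL.match(name):
        raise ValueError("not a valid DNS-1123 label: %r" % name)
    return name


print(make_job_name("render", datetime.datetime(2018, 2, 4, 11, 13), "c80b"))
print(make_job_name("render"))  # random salt, current time
```

The first call reproduces the render job name used above; a prefix such as `Render_Job` would raise `ValueError` instead of failing later at `ks apply` time.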


In [66]:
import subprocess
master_pod = subprocess.check_output(["kubectl", "-n", NAMESPACE, "get", "pods", "--selector=tf_job_name=" + RENDER_JOB_NAME,
                                      "-o", "jsonpath='{.items[*].metadata.name}'"]).decode("utf-8")
print(master_pod)
!kubectl logs -n {NAMESPACE} {master_pod}

'render-0204-1113-c80b-master-78uh-0-48cwt'
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Resume run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38.
INFO:tensorflow:Resume run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38.
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'init_output_factor': 0.1,
 'init_std': 0.35,
 'kl_cutoff_coef': 1000,
 'kl_cutoff_factor': 2,
 'kl_init_penalty': 1,
 'kl_target': 0.01,
 'learning_rate': 0.0001,
 'log_device_placement': False,
 'logdir': 'gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38',
 'max_length': 1000,
 'network': <function feed_forward_gaussian at 0x7f1a5ea7b7d0>,
 'normaliz
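The next step needs the render job to have finished. Instead of re-running `kubectl get pods` by hand, you could poll the job's pods until none of them is still pending or running. This is a sketch with hypothetical helpers; it reuses the `tf_job_name` label selector from the cell above and reads each pod's `status.phase` (a pod shown as `Completed` in the earlier listing normally has phase `Succeeded`). With the older kubectl used in this notebook you may need to pass `--show-all`, as in the earlier `get pods` cell, so that completed pods are still returned:

```python
import json
import subprocess
import time


def pod_phases(pods_json):
    """Map pod name -> status.phase from `kubectl get pods -o json` output."""
    items = json.loads(pods_json).get("items", [])
    return {p["metadata"]["name"]: p["status"]["phase"] for p in items}


def wait_for_job_pods(namespace, job_name, timeout_s=1800, poll_s=15, extra_args=()):
    """Poll pods labelled tf_job_name=<job_name> until all have finished."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        cmd = ["kubectl", "-n", namespace, "get", "pods",
               "--selector=tf_job_name=" + job_name, "-o", "json"] + list(extra_args)
        phases = pod_phases(subprocess.check_output(cmd).decode("utf-8"))
        if phases and all(p in ("Succeeded", "Failed") for p in phases.values()):
            return phases
        time.sleep(poll_s)
    raise RuntimeError("timed out waiting for pods of " + job_name)


# Example (not run here):
# wait_for_job_pods(NAMESPACE, RENDER_JOB_NAME, extra_args=["--show-all"])
```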

Once the render job has finished, list the contents of the `render` subdirectory it created:

In [67]:
!gsutil ls {LOG_DIR}/render

gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38/render/0204-1829-3a10/
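Each render run writes a new timestamped subdirectory, so the one to copy from changes from run to run. A sketch that picks the newest subdirectory from the `gsutil ls` output; `latest_render_dir` is a hypothetical helper, and it assumes the `MMDD-HHMM-<salt>/` naming seen in the listing, so sorting on the `MMDD-HHMM` prefix orders runs by time within a single year:

```python
def latest_render_dir(ls_output):
    """Return the newest 'MMDD-HHMM-<salt>/' subdirectory from gsutil ls output."""
    dirs = [line.strip() for line in ls_output.splitlines()
            if line.strip().endswith("/")]
    if not dirs:
        raise ValueError("no render subdirectories found")

    def timestamp(path):
        name = path.rstrip("/").rsplit("/", 1)[-1]
        return name[:9]  # the "MMDD-HHMM" prefix

    return max(dirs, key=timestamp)


# In the notebook, the listing could come from:
# listing = subprocess.check_output(["gsutil", "ls", LOG_DIR + "/render"]).decode("utf-8")
listing = (
    "gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38/render/0204-1829-3a10/\n"
    "gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38/render/0204-1913-30b5/\n"
)
print(latest_render_dir(listing))
```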


In [70]:
!mkdir -p /tmp/agents-render
# Replace 0204-1913-30b5 with a render subdirectory listed by the previous cell
!gsutil cp `gsutil ls {LOG_DIR}/render/0204-1913-30b5 | grep mp4 | head -n1` /tmp/agents-render/render.mp4

Copying gs://kubeflow-rl-kf/jobs/pybullet-kuka-0204-1028-bc38/render/0204-1913-30b5/openaigym.video.0.44.video000000.mp4...
- [1 files][215.5 KiB/215.5 KiB]                                                
Operation completed over 1 objects/215.5 KiB.                                    
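If the `gsutil ls ... | grep mp4` pipeline matches nothing, the copy fails and the next cell would try to embed a missing file. A cheap heuristic check before embedding: MP4 files (ISO base media format) normally begin with an `ftyp` box, whose 4-byte type follows the 4-byte box size. `looks_like_mp4` is a hypothetical helper and only a sanity check, not a full validation:

```python
import os


def looks_like_mp4(path):
    """Heuristic: ISO base media files normally start with an 'ftyp' box."""
    if not os.path.exists(path) or os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3 hold the box size, bytes 4-7 the box type
    return header[4:8] == b"ftyp"


print(looks_like_mp4("/tmp/agents-render/render.mp4"))
```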


#### Inspecting the result

When the job is complete, the log dir will contain a subdirectory named `render` holding several short videos of episodes of the agent performing the grasping task. Here's what one of those looks like for a well-trained model.

In [71]:
import base64
from IPython.display import HTML

mp4_path = '/tmp/agents-render/render.mp4'

# Read the clip and embed it inline as a base64 data URI
with open(mp4_path, 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))
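The cell above inlines the whole clip into the notebook as a base64 data URI, which makes the saved notebook self-contained but roughly a third larger than the video itself. The same idea as a reusable function; `video_tag` is a hypothetical helper that only builds the HTML string, so its result can be passed to `HTML(...)` as above:

```python
import base64


def video_tag(mp4_bytes, mime="video/mp4"):
    """Return an HTML <video> element embedding the clip as a base64 data URI."""
    encoded = base64.b64encode(mp4_bytes).decode("ascii")
    return ('<video controls>'
            '<source src="data:{0};base64,{1}" type="{0}" />'
            '</video>').format(mime, encoded)


# In the notebook:
# with open('/tmp/agents-render/render.mp4', 'rb') as f:
#     HTML(video_tag(f.read()))
html = video_tag(b"\x00\x01")
```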

### Great job! 🎉🎉🎉

If this is your first time working with these technologies you might be interested in some suggestions of good next steps. Here are some ideas:
- Try training in other environments (use one of the environment IDs registered [here](https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/__init__.py)) and tweet your results! For example:
    - RacecarBulletEnv-v0
    - MinitaurBulletDuckEnv-v0
    - HalfCheetahBulletEnv-v0
- Take a shot at implementing your own gym learning environment and repeat the above.
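As a starting point for your own environment, here is a toy example that follows the classic gym contract used at the time of this tutorial: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. It is a hypothetical sketch with no gym dependency so it runs anywhere. A real environment would subclass `gym.Env`, declare `action_space` and `observation_space` (e.g. `gym.spaces.Box`), and be registered with `gym.envs.registration.register` so it has an ID. The hyperparameters printed by the render job include `'env': 'KukaBulletEnv-v0'`, which suggests the trainer looks environments up by that ID. Newer gym and Gymnasium releases changed the contract (`reset` returns `(obs, info)` and `done` is split into `terminated` and `truncated`):

```python
import random


class ReachTargetEnv(object):
    """Toy 1-D task: move a point along a line until it reaches a target."""

    def __init__(self, target=5.0, max_steps=50, seed=None):
        self.target = target
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.position = 0.0
        self.steps = 0

    def reset(self):
        # Start near the origin and return the initial observation
        self.position = self.rng.uniform(-1.0, 1.0)
        self.steps = 0
        return [self.position]

    def step(self, action):
        # Clip the action to [-1, 1], like a bounded continuous control
        move = max(-1.0, min(1.0, float(action)))
        self.position += move
        self.steps += 1
        distance = abs(self.target - self.position)
        reward = -distance  # closer to the target means higher reward
        done = distance < 0.1 or self.steps >= self.max_steps
        return [self.position], reward, done, {"distance": distance}


env = ReachTargetEnv(seed=0)
obs = env.reset()
done = False
while not done:
    # A hand-written policy: step straight toward the target
    obs, reward, done, info = env.step(env.target - obs[0])
print(env.steps, round(info["distance"], 3))
```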