### Disclaimer

Distribution authorized to U.S. Government agencies and their contractors. Other requests for this document shall be referred to the MIT Lincoln Laboratory Technology Office.

This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

© 2019 Massachusetts Institute of Technology.

The software/firmware is provided to you on an As-Is basis.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.


### Treasure Hunt Challenge

This notebook uses [Stable Baselines](https://stable-baselines.readthedocs.io/en/master/) to train an agent for the [GOSEEK-Challenge](https://github.mit.edu/TESS/goseek-challenge). 

Proximal Policy Optimization (PPO) is used to train an agent defined by a CNN-LSTM network. The agent's observations consist of RGB, segmentation, and depth images and the agent's relative pose. These observations, along with the reward function, are defined in the [GoSeekFullPerception](https://github.mit.edu/TESS/tesse-gym/blob/master/src/tesse_gym/tasks/goseek/goseek_full_perception.py#L30) [gym environment](https://gym.openai.com/).


__Contents__
- [Configure Environment](#Configuration)
- [Define Model](#Define-the-Model)
- [Train Model](#Train-the-Model)
- [Visualize Results](#Visualize-Results)

In [1]:
from pathlib import Path

import numpy as np  # used below for np.sqrt in the network definition

from gym import spaces
from stable_baselines.common.policies import CnnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv, DummyVecEnv
from stable_baselines import PPO2
from tesse.msgs import *

from tesse_gym import get_network_config
from tesse_gym.tasks.goseek import GoSeekFullPerception, decode_observations




# Configuration

#### Set sim path

In [2]:
filename = Path("../../goseek-challenge/simulator/goseek-v0.1.4.x86_64")
assert filename.exists(), "Must set a valid path!"

#### Set environment parameters


__Note__ The cell below sets `n_environments` to 5 and `total_timesteps` to 500001. Smaller values (for example, 2 environments and 1e5 timesteps) shorten training during initial use, while setting `total_timesteps` to 3e6 and `n_environments` to 4 should produce an agent that approximates our baseline.

In [3]:
n_environments = 5  # number of environments to train over
total_timesteps = 500001  # number of training timesteps
scene_id = [1, 2, 3, 4, 5]  # list all available scenes
n_targets = 30  # number of targets spawned in each scene
target_found_reward = 2  # reward per found target
episode_length = 400


def make_unity_env(filename, num_env):
    """ Create a wrapped Unity environment. """

    def make_env(rank):
        def _thunk():
            env = GoSeekFullPerception(
                str(filename),
                network_config=get_network_config(worker_id=rank),
                n_targets=n_targets,
                episode_length=episode_length,
                scene_id=scene_id[rank % len(scene_id)],  # round-robin scene per worker
                target_found_reward=target_found_reward,
            )
            return env

        return _thunk

    return SubprocVecEnv([make_env(i) for i in range(num_env)])

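The expression `scene_id[rank % len(scene_id)]` above gives each worker one fixed scene, cycling through the list when there are more workers than scenes. A minimal pure-Python sketch of that mapping follows; the helper name `assign_scenes` is hypothetical and is not part of tesse-gym.

```python
def assign_scenes(scene_ids, num_env):
    """Map each worker rank to a scene, cycling through scene_ids."""
    return {rank: scene_ids[rank % len(scene_ids)] for rank in range(num_env)}


# With 5 scenes and 5 workers, every scene is used exactly once.
print(assign_scenes([1, 2, 3, 4, 5], 5))
# With more workers than scenes, the assignment wraps around.
print(assign_scenes([1, 2, 3, 4, 5], 7))
```
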
#### Launch environments.

In [4]:
env = make_unity_env(filename, n_environments)

LOAD_MODEL = False

# Define the Model 

The following network assumes an observation consisting of RGB, segmentation, and depth images along with the agent's pose relative to its starting point. Images are processed by a CNN (the custom `attention_cnn` defined below, built from Stable Baselines layers). The resulting feature vector is concatenated with the pose vector and given to an LSTM.
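
To make the observation layout concrete, the sketch below computes the length of one flattened observation and splits it into image and pose parts. It uses plain Python lists rather than tensors. The 3 + 1 + 1 channel split (RGB, segmentation, depth) is an assumption based on the description above, and the helper names are hypothetical.

```python
# Default image layout used in this notebook: 240 x 320 pixels, 5 channels
# (assumed 3 RGB + 1 segmentation + 1 depth), followed by a 3-element pose.
HEIGHT, WIDTH, CHANNELS, POSE_DIM = 240, 320, 5, 3


def observation_length(height=HEIGHT, width=WIDTH, channels=CHANNELS, pose_dim=POSE_DIM):
    """Length of one flattened observation: all pixel values plus the pose."""
    return height * width * channels + pose_dim


def split_observation(flat_obs, pose_dim=POSE_DIM):
    """Split a flat observation into (flattened images, pose)."""
    return flat_obs[:-pose_dim], flat_obs[-pose_dim:]


flat = list(range(observation_length()))
pixels, pose = split_observation(flat)
print(observation_length(), len(pixels), len(pose))
```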

In [5]:
import tensorflow as tf
from stable_baselines.common.policies import nature_cnn

#### Define network to consume images and pose

In [6]:

def decode_tensor_observations(observation, img_shape=(-1, 240, 320, 5)):
    """ Decode observation tensor into images and poses.

    Args:
        observation (tf.Tensor): Shape (N, H*W*C + 3) batch of observations,
            each a flattened image stack concatenated with a pose vector.
        img_shape (Tuple[int, int, int, int]): Shape of the stacked images, (N, H, W, C).
            Default value is (-1, 240, 320, 5).

    Returns:
        Tuple[tf.Tensor, tf.Tensor]: Tensors with the following information
            - Tensor of shape (N, H, W, 2) containing only the last two image
                channels (segmentation and depth, given the RGB, segmentation,
                depth channel order); the RGB channels are dropped.
            - Tensor of shape (N, 3) containing (x, y, heading) relative to starting point.
                (x, y) are in meters, heading is given in degrees in the range [-180, 180].
    """
    # Reshape the flattened images and keep only the last two channels.
    imgs = tf.reshape(observation[:, :-3], img_shape)[..., -2:]
    pose = observation[:, -3:]
    return imgs, pose

In [7]:

from stable_baselines.a2c.utils import conv, linear, conv_to_fc

def attention_cnn(scaled_images, **kwargs):
    """Nature CNN with region-sensitive module"""
        
    def linear2d(input_tensor, num_hidden, scope):
        b, h, w = input_tensor.shape
        with tf.variable_scope(scope):
            tensors = []
            for i in range(h):
                weight = tf.get_variable("w"+str(i), [w, num_hidden], initializer=tf.initializers.orthogonal())
                bias = tf.get_variable("b"+str(i), [num_hidden], initializer=tf.constant_initializer(0.0))
                tensors.append(tf.matmul(input_tensor[:,i,:], weight) + bias)
            return tf.stack(tensors, axis=1)
        
    def attention_block(tensor, g, scope):
        b, h, w, f = tensor.shape
        ls = tf.reshape(tensor, (-1, h*w, f))
        print("ls",ls.get_shape())
        g_size = g.get_shape()[-1].value
        print("g", g.get_shape())

        with tf.variable_scope(scope):
            # Project each spatial location's features to the size of g.
            lsat = linear2d(ls, num_hidden=g_size, scope='lsat')  # (-1, h*w, g_size)
            lsat = tf.nn.relu(lsat)
            print("lsat", lsat.get_shape())
            # TODO: is including the batch dimension also correct?
            # Repeat the global feature g for every spatial location.
            g_tiled = tf.tile(tf.reshape(g, (-1, 1, g_size)), [1, h*w, 1])  # (-1, h*w, g_size)
            # Dot product between local and global features at each location.
            compatibility = tf.reduce_sum(tf.multiply(lsat, g_tiled), axis=-1, keepdims=True)  # (-1, h*w, 1)
            print("compatibility", compatibility.get_shape())
            # Softmax over the spatial locations (axis=1).
            attention = tf.nn.softmax(compatibility, axis=1, name="attention_softmax")  # (-1, h*w, 1)
            attention_tiled = tf.tile(attention, [1, 1, f])  # (-1, h*w, f)
            print("attention", attention_tiled.get_shape())
            weighted_ls = attention_tiled * ls
            return weighted_ls, attention


    # Convolutional feature extractor.
    c1 = tf.nn.relu(conv(scaled_images, 'c1', n_filters=16, filter_size=8, stride=2, init_scale=np.sqrt(2), **kwargs))
    c2 = tf.nn.relu(conv(c1, 'c2', n_filters=24, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    c3 = tf.nn.relu(conv(c2, 'c3', n_filters=32, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))

    # Global feature vector computed from the final conv layer.
    g = tf.nn.relu(linear(conv_to_fc(c3), n_hidden=64, scope="g", init_scale=np.sqrt(2)))

    # Attend over the spatial locations of c3 using g, then flatten.
    g3, attn3_layer = attention_block(c3, g, 'attn3')
    g3 = conv_to_fc(g3)

    lastln = tf.nn.relu(linear(g3, 'fc1', n_hidden=256, init_scale=np.sqrt(2)))

    return lastln

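The attention block above scores every spatial location by the dot product between its projected features and the global vector `g`, normalizes the scores with a softmax over locations, and scales the local features by the resulting weights. The sketch below reproduces that idea on plain Python lists. It is a simplification under stated assumptions: the learned per-location projection (`linear2d`) and the ReLU are omitted, so the same vectors are used for scoring and weighting, and the function name `attend` is hypothetical.

```python
import math


def attend(local_feats, g):
    """Weight local feature vectors by softmax(dot(local, g)) over locations.

    local_feats: list of vectors, one per spatial location, each of len(g).
    g: global feature vector.
    Returns (weighted_feats, attention), where attention sums to 1.
    """
    scores = [sum(a * b for a, b in zip(vec, g)) for vec in local_feats]
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    weighted = [[w * x for x in vec] for w, vec in zip(attention, local_feats)]
    return weighted, attention


weighted, attention = attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], g=[2.0, 0.0])
print(attention)
```

Locations whose features align with `g` receive larger weights; here the first and third locations score equally and the second scores lowest.
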
In [8]:
def image_and_pose_network(observation, **kwargs):
    """ Network to process image and pose data.

    Use the attention_cnn defined above to process images. The resulting
    feature vector is then combined with the pose estimate and given to an
    LSTM (LSTM defined in PPO2 below).

    Args:
        observation (tf.Tensor): 2D tensor of shape (N, H*W*C + 3) containing
            flattened image and pose data.

    Returns:
        tf.Tensor: Feature vector.
    """
    orig_imgs, pose = decode_tensor_observations(observation)
    # Downsample to 40x40 with nearest-neighbor interpolation (method=1).
    scaled_imgs = tf.image.resize_images(orig_imgs, [40, 40], method=1)
    image_features = attention_cnn(scaled_imgs)
    return tf.concat((image_features, pose), axis=-1)

#### Register custom network

Outputs of the network defined above will be fed into an LSTM defined below in PPO2.
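
As a sanity check on the feature sizes, the sketch below walks a 40 x 40 input through the three convolutions. It assumes unpadded ('VALID') convolutions, which is consistent with the 25 spatial locations printed when the model is built (`ls (5, 25, 32)`). The helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output length of a 'VALID' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


size = 40  # images are resized to 40 x 40 before the CNN
for kernel, stride in [(8, 2), (4, 2), (3, 1)]:  # c1, c2, c3
    size = conv_out(size, kernel, stride)

locations = size * size    # spatial positions the attention block weighs
attended = locations * 32  # c3 has 32 filters, flattened by conv_to_fc
feature_len = 256 + 3      # fc1 output plus the (x, y, heading) pose
print(size, locations, attended, feature_len)
```

Under these assumptions the vector returned by `image_and_pose_network` has 259 elements per observation.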

In [9]:
if tf.test.gpu_device_name(): 
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

Default GPU Device:/device:GPU:0


In [10]:
policy_kwargs = {'cnn_extractor': image_and_pose_network}

In [11]:

if LOAD_MODEL:
    MODEL_WEIGHTS_PATH = Path("results/goseek-ppo-realattention-forwardreward-fartargets/final_model.pkl")
    # Check that the weights file actually exists; a non-empty string is always truthy.
    assert MODEL_WEIGHTS_PATH.exists(), "Must give a valid model weights path!"
    model = PPO2.load(str(MODEL_WEIGHTS_PATH), env=env, tensorboard_log="./tensorboard/", gamma=0.995, learning_rate=0.0002)
else:
    model = PPO2(
        CnnLstmPolicy,
        env,
        n_steps=100,
        verbose=1,
        tensorboard_log="./tensorboard/",
        nminibatches=5,
        gamma=0.995,
        learning_rate=0.00025,
        policy_kwargs=policy_kwargs,
    )

ls (5, 25, 32)
g (5, 64)
lsat (5, 25, 64)
compatibility (5, 25, 1)
attention (5, 25, 32)
ls (100, 25, 32)
g (100, 64)
lsat (100, 25, 64)
compatibility (100, 25, 1)
attention (100, 25, 32)

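The batch sizes implied by the settings above can be checked with simple arithmetic. The sketch below assumes, as in Stable Baselines PPO2, that one update collects `n_steps` timesteps from every environment and splits them into `nminibatches` minibatches. The results agree with the shapes printed at model construction (batches of 5 and 100) and with the 500 timesteps per update reported in the training log.

```python
n_environments = 5
n_steps = 100
nminibatches = 5
total_timesteps = 500001

batch_size = n_environments * n_steps        # timesteps collected per update
minibatch_size = batch_size // nminibatches  # samples per gradient step
n_updates = total_timesteps // batch_size    # whole updates within the budget
print(batch_size, minibatch_size, n_updates)
```
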

# Train the Model

#### Define logging directory and callback function to save checkpoints

In [12]:
log_dir = Path("results/goseek-ppo-realattention-forwardreward-50fartargets")
log_dir.mkdir(parents=True, exist_ok=True)

total_updates = 0
def save_checkpoint_callback(local_vars, global_vars):
    """ Save a checkpoint on every 1000th invocation of the callback. """
    global total_updates
    total_updates += 1
    if total_updates % 1000 == 0:
        # local_vars["self"] is the model being trained.
        local_vars["self"].save(str(log_dir / f"{total_updates:06d}.pkl"))
        print("Saving model")

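The checkpoint logic can be exercised without a simulator. The sketch below keeps the call counter in a closure instead of a global and uses a stand-in model that only records the paths it is asked to save. `make_checkpoint_callback` and `FakeModel` are hypothetical names for illustration, not Stable Baselines APIs.

```python
from pathlib import Path


def make_checkpoint_callback(log_dir, every=1000):
    """Build a callback that saves the model on every `every`-th call."""
    calls = 0

    def callback(local_vars, global_vars):
        nonlocal calls
        calls += 1
        if calls % every == 0:
            local_vars["self"].save(str(Path(log_dir) / f"{calls:06d}.pkl"))

    return callback


class FakeModel:
    """Stand-in for a trained model; records the paths it was asked to save."""

    def __init__(self):
        self.saved = []

    def save(self, path):
        self.saved.append(path)


model = FakeModel()
callback = make_checkpoint_callback("results/demo", every=3)
for _ in range(7):
    callback({"self": model}, {})
print([Path(p).name for p in model.saved])
```
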
In [13]:
if LOAD_MODEL:
    model.learn(total_timesteps=total_timesteps, tb_log_name='50fartargets', callback=save_checkpoint_callback, reset_num_timesteps=False)
else:
    model.learn(total_timesteps=total_timesteps, callback=save_checkpoint_callback)

# TODO:
# - add a small penalty (-0.01) for using the grab action, because the agent grabs too much
# - DONE: add a small reward (0.01) for moving forward, because the agent rotates too much
# - visualize attention?

--------------------------------------
| approxkl           | 0.0010325776  |
| clipfrac           | 0.0           |
| explained_variance | -0.0137       |
| fps                | 8             |
| n_updates          | 1             |
| policy_entropy     | 1.3843722     |
| policy_loss        | -0.0061120847 |
| serial_timesteps   | 100           |
| time_elapsed       | 7.01e-05      |
| total_timesteps    | 500           |
| value_loss         | 0.032731924   |
--------------------------------------
-------------------------------------
| approxkl           | 0.0023111817 |
| clipfrac           | 0.0125       |
| explained_variance | 0.278        |
| fps                | 15           |
| n_updates          | 2            |
| policy_entropy     | 1.370815     |
| policy_loss        | -0.002890921 |
| serial_timesteps   | 200          |
| time_elapsed       | 58           |
| total_timesteps    | 1000         |
| value_loss         | 0.022559594  |
-------------------------------------

-------------------------------------
| approxkl           | 0.0064087496 |
| clipfrac           | 0.0995       |
| explained_variance | 0.136        |
| fps                | 17           |
| n_updates          | 18           |
| policy_entropy     | 0.7588175    |
| policy_loss        | -0.008281929 |
| serial_timesteps   | 1800         |
| time_elapsed       | 484          |
| total_timesteps    | 9000         |
| value_loss         | 0.096004926  |
-------------------------------------
-------------------------------------
| approxkl           | 0.015476751  |
| clipfrac           | 0.16199997   |
| explained_variance | 0.554        |
| fps                | 18           |
| n_updates          | 19           |
| policy_entropy     | 0.6355119    |
| policy_loss        | 0.0031921086 |
| serial_timesteps   | 1900         |
| time_elapsed       | 513          |
| total_timesteps    | 9500         |
| value_loss         | 0.039673887  |
-------------------------------------
Saving model

--------------------------------------
| approxkl           | 0.00017743678 |
| clipfrac           | 0.0015        |
| explained_variance | 0.976         |
| fps                | 20            |
| n_updates          | 35            |
| policy_entropy     | 0.02910234    |
| policy_loss        | -0.0006334331 |
| serial_timesteps   | 3500          |
| time_elapsed       | 908           |
| total_timesteps    | 17500         |
| value_loss         | 0.00016268584 |
--------------------------------------
-------------------------------------
| approxkl           | 0.0010090873 |
| clipfrac           | 0.003        |
| explained_variance | 0.981        |
| fps                | 19           |
| n_updates          | 36           |
| policy_entropy     | 0.047466487  |
| policy_loss        | -0.001282389 |
| serial_timesteps   | 3600         |
| time_elapsed       | 932          |
| total_timesteps    | 18000        |
| value_loss         | 4.873485e-05 |
-------------------------------------

-------------------------------------
| approxkl           | 0.0011158899 |
| clipfrac           | 0.0235       |
| explained_variance | 0.0353       |
| fps                | 20           |
| n_updates          | 52           |
| policy_entropy     | 0.17394587   |
| policy_loss        | -0.003653658 |
| serial_timesteps   | 5200         |
| time_elapsed       | 1.32e+03     |
| total_timesteps    | 26000        |
| value_loss         | 0.18828496   |
-------------------------------------
----------------------------------------
| approxkl           | 3.445618e-05    |
| clipfrac           | 0.0             |
| explained_variance | 0.129           |
| fps                | 19              |
| n_updates          | 53              |
| policy_entropy     | 0.19568841      |
| policy_loss        | -0.000103112754 |
| serial_timesteps   | 5300            |
| time_elapsed       | 1.35e+03        |
| total_timesteps    | 26500           |
| value_loss         | 0.07528128      |
----------------------------------------

--------------------------------------
| approxkl           | 0.014897658   |
| clipfrac           | 0.033999998   |
| explained_variance | 0.435         |
| fps                | 26            |
| n_updates          | 69            |
| policy_entropy     | 0.19078429    |
| policy_loss        | -7.811752e-05 |
| serial_timesteps   | 6900          |
| time_elapsed       | 1.74e+03      |
| total_timesteps    | 34500         |
| value_loss         | 0.057495814   |
--------------------------------------
Saving model
--------------------------------------
| approxkl           | 0.0030762542  |
| clipfrac           | 0.019499999   |
| explained_variance | 0.99          |
| fps                | 25            |
| n_updates          | 70            |
| policy_entropy     | 0.12756538    |
| policy_loss        | -0.0018031945 |
| serial_timesteps   | 7000          |
| time_elapsed       | 1.76e+03      |
| total_timesteps    | 35000         |
| value_loss         | 0.0004824095  |
--------------------------------------

-------------------------------------
| approxkl           | 0.0034596995 |
| clipfrac           | 0.021000002  |
| explained_variance | 0.349        |
| fps                | 19           |
| n_updates          | 85           |
| policy_entropy     | 0.18974888   |
| policy_loss        | -0.003814437 |
| serial_timesteps   | 8500         |
| time_elapsed       | 2.13e+03     |
| total_timesteps    | 42500        |
| value_loss         | 0.06150656   |
-------------------------------------
-------------------------------------
| approxkl           | 0.0027672742 |
| clipfrac           | 0.026000002  |
| explained_variance | 0.798        |
| fps                | 20           |
| n_updates          | 86           |
| policy_entropy     | 0.17109895   |
| policy_loss        | -0.005189655 |
| serial_timesteps   | 8600         |
| time_elapsed       | 2.15e+03     |
| total_timesteps    | 43000        |
| value_loss         | 0.0020027512 |
-------------------------------------

--------------------------------------
| approxkl           | 0.0011930363  |
| clipfrac           | 0.0145        |
| explained_variance | 0.902         |
| fps                | 20            |
| n_updates          | 102           |
| policy_entropy     | 0.12280337    |
| policy_loss        | -0.0014813818 |
| serial_timesteps   | 10200         |
| time_elapsed       | 2.54e+03      |
| total_timesteps    | 51000         |
| value_loss         | 0.0028552392  |
--------------------------------------
--------------------------------------
| approxkl           | 9.549391e-06  |
| clipfrac           | 0.0           |
| explained_variance | 0.97          |
| fps                | 19            |
| n_updates          | 103           |
| policy_entropy     | 0.10175176    |
| policy_loss        | 0.00019012878 |
| serial_timesteps   | 10300         |
| time_elapsed       | 2.57e+03      |
| total_timesteps    | 51500         |
| value_loss         | 0.0005879775  |
--------------------------------------

---------------------------------------
| approxkl           | 0.00088838517  |
| clipfrac           | 0.0064999997   |
| explained_variance | 0.956          |
| fps                | 26             |
| n_updates          | 119            |
| policy_entropy     | 0.1482716      |
| policy_loss        | -0.00013736721 |
| serial_timesteps   | 11900          |
| time_elapsed       | 2.93e+03       |
| total_timesteps    | 59500          |
| value_loss         | 0.001019506    |
---------------------------------------
Saving model
--------------------------------------
| approxkl           | 0.003351309   |
| clipfrac           | 0.026999999   |
| explained_variance | 0.954         |
| fps                | 21            |
| n_updates          | 120           |
| policy_entropy     | 0.1007532     |
| policy_loss        | -0.0043566898 |
| serial_timesteps   | 12000         |
| time_elapsed       | 2.95e+03      |
| total_timesteps    | 60000         |
| value_loss         | 0.0006230802  |
--------------------------------------

---------------------------------------
| approxkl           | 0.00017164327  |
| clipfrac           | 0.0035         |
| explained_variance | 0.986          |
| fps                | 21             |
| n_updates          | 136            |
| policy_entropy     | 0.12820135     |
| policy_loss        | -0.00017837527 |
| serial_timesteps   | 13600          |
| time_elapsed       | 3.34e+03       |
| total_timesteps    | 68000          |
| value_loss         | 0.0006412569   |
---------------------------------------
--------------------------------------
| approxkl           | 0.0020018362  |
| clipfrac           | 0.02          |
| explained_variance | 0.0353        |
| fps                | 20            |
| n_updates          | 137           |
| policy_entropy     | 0.21316667    |
| policy_loss        | -0.0020749182 |
| serial_timesteps   | 13700         |
| time_elapsed       | 3.36e+03      |
| total_timesteps    | 68500         |
| value_loss         | 0.066955194   |
--------------------------------------

-------------------------------------
| approxkl           | 0.0035826224 |
| clipfrac           | 0.018000001  |
| explained_variance | 0.106        |
| fps                | 17           |
| n_updates          | 153          |
| policy_entropy     | 0.17628913   |
| policy_loss        | -0.001630065 |
| serial_timesteps   | 15300        |
| time_elapsed       | 3.76e+03     |
| total_timesteps    | 76500        |
| value_loss         | 0.086496204  |
-------------------------------------
--------------------------------------
| approxkl           | 0.0005521535  |
| clipfrac           | 0.0105        |
| explained_variance | 0.961         |
| fps                | 25            |
| n_updates          | 154           |
| policy_entropy     | 0.08824446    |
| policy_loss        | -0.0038073403 |
| serial_timesteps   | 15400         |
| time_elapsed       | 3.79e+03      |
| total_timesteps    | 77000         |
| value_loss         | 0.0014018777  |
--------------------------------------

Saving model
-------------------------------------
| approxkl           | 0.0038323519 |
| clipfrac           | 0.013499999  |
| explained_variance | 0.977        |
| fps                | 20           |
| n_updates          | 170          |
| policy_entropy     | 0.1652705    |
| policy_loss        | -0.005482073 |
| serial_timesteps   | 17000        |
| time_elapsed       | 4.17e+03     |
| total_timesteps    | 85000        |
| value_loss         | 0.0013977264 |
-------------------------------------
--------------------------------------
| approxkl           | 0.00083511847 |
| clipfrac           | 0.0084999995  |
| explained_variance | 0.946         |
| fps                | 19            |
| n_updates          | 171           |
| policy_entropy     | 0.18090495    |
| policy_loss        | 0.00070975174 |
| serial_timesteps   | 17100         |
| time_elapsed       | 4.19e+03      |
| total_timesteps    | 85500         |
| value_loss         | 0.0014519272  |
--------------------------------------

--------------------------------------
| approxkl           | 0.00025567258 |
| clipfrac           | 0.0015        |
| explained_variance | 0.942         |
| fps                | 22            |
| n_updates          | 187           |
| policy_entropy     | 0.12344458    |
| policy_loss        | -0.0005386348 |
| serial_timesteps   | 18700         |
| time_elapsed       | 4.58e+03      |
| total_timesteps    | 93500         |
| value_loss         | 0.0015547571  |
--------------------------------------
--------------------------------------
| approxkl           | 0.0004808703  |
| clipfrac           | 0.006         |
| explained_variance | 0.946         |
| fps                | 20            |
| n_updates          | 188           |
| policy_entropy     | 0.10712816    |
| policy_loss        | -0.0012034492 |
| serial_timesteps   | 18800         |
| time_elapsed       | 4.6e+03       |
| total_timesteps    | 94000         |
| value_loss         | 0.0014131644  |
--------------------------------------

--------------------------------------
| approxkl           | 2.1432825e-05 |
| clipfrac           | 0.0           |
| explained_variance | 0.932         |
| fps                | 27            |
| n_updates          | 204           |
| policy_entropy     | 0.0385629     |
| policy_loss        | -0.0008812713 |
| serial_timesteps   | 20400         |
| time_elapsed       | 4.97e+03      |
| total_timesteps    | 102000        |
| value_loss         | 0.000761618   |
--------------------------------------
--------------------------------------
| approxkl           | 0.01854948    |
| clipfrac           | 0.11749999    |
| explained_variance | 0.267         |
| fps                | 26            |
| n_updates          | 205           |
| policy_entropy     | 0.28900486    |
| policy_loss        | -0.0019442432 |
| serial_timesteps   | 20500         |
| time_elapsed       | 4.99e+03      |
| total_timesteps    | 102500        |
| value_loss         | 0.0604826     |
--------------------------------------

--------------------------------------
| approxkl           | 0.0005627909  |
| clipfrac           | 0.0059999996  |
| explained_variance | 0.104         |
| fps                | 18            |
| n_updates          | 221           |
| policy_entropy     | 0.13599391    |
| policy_loss        | -0.0014203321 |
| serial_timesteps   | 22100         |
| time_elapsed       | 5.35e+03      |
| total_timesteps    | 110500        |
| value_loss         | 0.08333637    |
--------------------------------------
--------------------------------------
| approxkl           | 0.00045970027 |
| clipfrac           | 0.0064999997  |
| explained_variance | 0.904         |
| fps                | 21            |
| n_updates          | 222           |
| policy_entropy     | 0.113048315   |
| policy_loss        | -8.26268e-05  |
| serial_timesteps   | 22200         |
| time_elapsed       | 5.38e+03      |
| total_timesteps    | 111000        |
| value_loss         | 0.00094247516 |
--------------------------------------

--------------------------------------
| approxkl           | 0.00037599087 |
| clipfrac           | 0.0084999995  |
| explained_variance | 0.967         |
| fps                | 24            |
| n_updates          | 238           |
| policy_entropy     | 0.09232046    |
| policy_loss        | 7.841727e-05  |
| serial_timesteps   | 23800         |
| time_elapsed       | 5.78e+03      |
| total_timesteps    | 119000        |
| value_loss         | 0.0006612323  |
--------------------------------------
--------------------------------------
| approxkl           | 0.0001718559  |
| clipfrac           | 0.0054999995  |
| explained_variance | 0.975         |
| fps                | 25            |
| n_updates          | 239           |
| policy_entropy     | 0.067893565   |
| policy_loss        | -0.0004454575 |
| serial_timesteps   | 23900         |
| time_elapsed       | 5.8e+03       |
| total_timesteps    | 119500        |
| value_loss         | 0.00063164847 |
--------------------------------------

---------------------------------------
| approxkl           | 0.000869836    |
| clipfrac           | 0.0019999999   |
| explained_variance | 0.937          |
| fps                | 20             |
| n_updates          | 255            |
| policy_entropy     | 0.024140043    |
| policy_loss        | -0.00068750855 |
| serial_timesteps   | 25500          |
| time_elapsed       | 6.17e+03       |
| total_timesteps    | 127500         |
| value_loss         | 0.0009536225   |
---------------------------------------
----------------------------------------
| approxkl           | 3.0048724e-09   |
| clipfrac           | 0.0             |
| explained_variance | 0.922           |
| fps                | 20              |
| n_updates          | 256             |
| policy_entropy     | 0.017261889     |
| policy_loss        | -1.05857836e-07 |
| serial_timesteps   | 25600           |
| time_elapsed       | 6.19e+03        |
| total_timesteps    | 128000          |
| value_loss         | 0.0009

---------------------------------------
| approxkl           | 7.0402384e-06  |
| clipfrac           | 0.0            |
| explained_variance | 0.638          |
| fps                | 19             |
| n_updates          | 272            |
| policy_entropy     | 0.03924849     |
| policy_loss        | -0.00013388728 |
| serial_timesteps   | 27200          |
| time_elapsed       | 6.57e+03       |
| total_timesteps    | 136000         |
| value_loss         | 0.00081585365  |
---------------------------------------
--------------------------------------
| approxkl           | 0.0037276074  |
| clipfrac           | 0.014499998   |
| explained_variance | 0.302         |
| fps                | 19            |
| n_updates          | 273           |
| policy_entropy     | 0.16009687    |
| policy_loss        | -0.0009558603 |
| serial_timesteps   | 27300         |
| time_elapsed       | 6.6e+03       |
| total_timesteps    | 136500        |
| value_loss         | 0.046813466   |
--------------------------------------

--------------------------------------
| approxkl           | 0.0039678933  |
| clipfrac           | 0.046000004   |
| explained_variance | -0.02         |
| fps                | 18            |
| n_updates          | 289           |
| policy_entropy     | 0.41257232    |
| policy_loss        | -0.0046493122 |
| serial_timesteps   | 28900         |
| time_elapsed       | 6.98e+03      |
| total_timesteps    | 144500        |
| value_loss         | 0.64664876    |
--------------------------------------
Saving model
---------------------------------------
| approxkl           | 0.013148954    |
| clipfrac           | 0.109000005    |
| explained_variance | 0.605          |
| fps                | 19             |
| n_updates          | 290            |
| policy_entropy     | 0.42922014     |
| policy_loss        | -8.9844645e-05 |
| serial_timesteps   | 29000          |
| time_elapsed       | 7.01e+03       |
| total_timesteps    | 145000         |
| value_loss         | 0.006800206    |
---------------------------------------


--------------------------------------
| approxkl           | 0.00053457136 |
| clipfrac           | 0.0054999995  |
| explained_variance | 0.813         |
| fps                | 20            |
| n_updates          | 306           |
| policy_entropy     | 0.11544037    |
| policy_loss        | -0.0023006287 |
| serial_timesteps   | 30600         |
| time_elapsed       | 7.41e+03      |
| total_timesteps    | 153000        |
| value_loss         | 0.0010017443  |
--------------------------------------
--------------------------------------
| approxkl           | 0.0002505317  |
| clipfrac           | 0.005499999   |
| explained_variance | 0.927         |
| fps                | 20            |
| n_updates          | 307           |
| policy_entropy     | 0.07827816    |
| policy_loss        | 0.00043868698 |
| serial_timesteps   | 30700         |
| time_elapsed       | 7.43e+03      |
| total_timesteps    | 153500        |
| value_loss         | 0.0002497424  |
--------------------------------------

--------------------------------------
| approxkl           | 0.00020036768 |
| clipfrac           | 0.003         |
| explained_variance | 0.787         |
| fps                | 20            |
| n_updates          | 323           |
| policy_entropy     | 0.10661515    |
| policy_loss        | -0.0009649981 |
| serial_timesteps   | 32300         |
| time_elapsed       | 7.82e+03      |
| total_timesteps    | 161500        |
| value_loss         | 0.000578376   |
--------------------------------------
---------------------------------------
| approxkl           | 0.00027956202  |
| clipfrac           | 0.0054999995   |
| explained_variance | 0.877          |
| fps                | 21             |
| n_updates          | 324            |
| policy_entropy     | 0.073711365    |
| policy_loss        | -0.00068674044 |
| serial_timesteps   | 32400          |
| time_elapsed       | 7.84e+03       |
| total_timesteps    | 162000         |
| value_loss         | 0.00072693813  |
---------------------------------------

Saving model
---------------------------------------
| approxkl           | 0.001416981    |
| clipfrac           | 0.017          |
| explained_variance | 0.546          |
| fps                | 19             |
| n_updates          | 340            |
| policy_entropy     | 0.14748327     |
| policy_loss        | -0.00093542447 |
| serial_timesteps   | 34000          |
| time_elapsed       | 8.21e+03       |
| total_timesteps    | 170000         |
| value_loss         | 0.0006179906   |
---------------------------------------
--------------------------------------
| approxkl           | 0.0010604735  |
| clipfrac           | 0.012         |
| explained_variance | 0.07          |
| fps                | 19            |
| n_updates          | 341           |
| policy_entropy     | 0.1533697     |
| policy_loss        | -0.0014149236 |
| serial_timesteps   | 34100         |
| time_elapsed       | 8.24e+03      |
| total_timesteps    | 170500        |
| value_loss         | 0.06271891    |
--------------------------------------

--------------------------------------
| approxkl           | 0.004117512   |
| clipfrac           | 0.0255        |
| explained_variance | 0.219         |
| fps                | 19            |
| n_updates          | 357           |
| policy_entropy     | 0.1590908     |
| policy_loss        | -0.0026232954 |
| serial_timesteps   | 35700         |
| time_elapsed       | 8.6e+03       |
| total_timesteps    | 178500        |
| value_loss         | 0.05929154    |
--------------------------------------
--------------------------------------
| approxkl           | 0.00052830507 |
| clipfrac           | 0.0095        |
| explained_variance | 0.0241        |
| fps                | 20            |
| n_updates          | 358           |
| policy_entropy     | 0.11734761    |
| policy_loss        | 0.00045851967 |
| serial_timesteps   | 35800         |
| time_elapsed       | 8.63e+03      |
| total_timesteps    | 179000        |
| value_loss         | 0.034649983   |
--------------------------------------

--------------------------------------
| approxkl           | 0.0018294677  |
| clipfrac           | 0.018         |
| explained_variance | 0.37          |
| fps                | 26            |
| n_updates          | 374           |
| policy_entropy     | 0.106010035   |
| policy_loss        | -0.0040279715 |
| serial_timesteps   | 37400         |
| time_elapsed       | 9.01e+03      |
| total_timesteps    | 187000        |
| value_loss         | 0.0010439096  |
--------------------------------------
--------------------------------------
| approxkl           | 0.019444318   |
| clipfrac           | 0.013499999   |
| explained_variance | 0.901         |
| fps                | 26            |
| n_updates          | 375           |
| policy_entropy     | 0.064448915   |
| policy_loss        | -0.0029825338 |
| serial_timesteps   | 37500         |
| time_elapsed       | 9.03e+03      |
| total_timesteps    | 187500        |
| value_loss         | 0.00073673186 |
--------------------------------------

---------------------------------------
| approxkl           | 0.00017610218  |
| clipfrac           | 0.0015         |
| explained_variance | 0.911          |
| fps                | 18             |
| n_updates          | 391            |
| policy_entropy     | 0.0025740457   |
| policy_loss        | -0.00046521565 |
| serial_timesteps   | 39100          |
| time_elapsed       | 9.41e+03       |
| total_timesteps    | 195500         |
| value_loss         | 0.0006890384   |
---------------------------------------
----------------------------------------
| approxkl           | 4.0680504e-11   |
| clipfrac           | 0.0             |
| explained_variance | 0.912           |
| fps                | 20              |
| n_updates          | 392             |
| policy_entropy     | 0.0017771067    |
| policy_loss        | -1.05381005e-07 |
| serial_timesteps   | 39200           |
| time_elapsed       | 9.44e+03        |
| total_timesteps    | 196000          |
| value_loss         | 0.0005

---------------------------------------
| approxkl           | 3.7030886e-12  |
| clipfrac           | 0.0            |
| explained_variance | 0.676          |
| fps                | 20             |
| n_updates          | 408            |
| policy_entropy     | 0.0027490102   |
| policy_loss        | -1.3160705e-07 |
| serial_timesteps   | 40800          |
| time_elapsed       | 9.81e+03       |
| total_timesteps    | 204000         |
| value_loss         | 0.00024018763  |
---------------------------------------
---------------------------------------
| approxkl           | 4.058276e-12   |
| clipfrac           | 0.0            |
| explained_variance | 0.709          |
| fps                | 20             |
| n_updates          | 409            |
| policy_entropy     | 0.002744893    |
| policy_loss        | -4.0531155e-08 |
| serial_timesteps   | 40900          |
| time_elapsed       | 9.83e+03       |
| total_timesteps    | 204500         |
| value_loss         | 0.00021126722  |
---------------------------------------


--------------------------------------
| approxkl           | 7.367131e-12  |
| clipfrac           | 0.0           |
| explained_variance | 0.807         |
| fps                | 20            |
| n_updates          | 424           |
| policy_entropy     | 0.0029730373  |
| policy_loss        | -6.818772e-08 |
| serial_timesteps   | 42400         |
| time_elapsed       | 1.02e+04      |
| total_timesteps    | 212000        |
| value_loss         | 0.00022593075 |
--------------------------------------
---------------------------------------
| approxkl           | 3.875093e-12   |
| clipfrac           | 0.0            |
| explained_variance | 0.834          |
| fps                | 21             |
| n_updates          | 425            |
| policy_entropy     | 0.003016461    |
| policy_loss        | -1.7166135e-08 |
| serial_timesteps   | 42500          |
| time_elapsed       | 1.02e+04       |
| total_timesteps    | 212500         |
| value_loss         | 0.00016431515  |
---------------------------------------

--------------------------------------
| approxkl           | 0.0035755155  |
| clipfrac           | 0.013000001   |
| explained_variance | 0.916         |
| fps                | 22            |
| n_updates          | 441           |
| policy_entropy     | 0.05053454    |
| policy_loss        | -0.0019239262 |
| serial_timesteps   | 44100         |
| time_elapsed       | 1.06e+04      |
| total_timesteps    | 220500        |
| value_loss         | 0.00054677034 |
--------------------------------------
--------------------------------------
| approxkl           | 0.0006467075  |
| clipfrac           | 0.0025        |
| explained_variance | 0.0451        |
| fps                | 19            |
| n_updates          | 442           |
| policy_entropy     | 0.06701659    |
| policy_loss        | -0.0006211037 |
| serial_timesteps   | 44200         |
| time_elapsed       | 1.06e+04      |
| total_timesteps    | 221000        |
| value_loss         | 0.17163703    |
--------------------------------------

--------------------------------------
| approxkl           | 0.010445381   |
| clipfrac           | 0.0775        |
| explained_variance | 0.043         |
| fps                | 19            |
| n_updates          | 458           |
| policy_entropy     | 0.2710513     |
| policy_loss        | -0.0060658455 |
| serial_timesteps   | 45800         |
| time_elapsed       | 1.1e+04       |
| total_timesteps    | 229000        |
| value_loss         | 0.15500446    |
--------------------------------------
--------------------------------------
| approxkl           | 0.0043443004  |
| clipfrac           | 0.014500001   |
| explained_variance | 0.947         |
| fps                | 20            |
| n_updates          | 459           |
| policy_entropy     | 0.10837654    |
| policy_loss        | -0.0014604282 |
| serial_timesteps   | 45900         |
| time_elapsed       | 1.1e+04       |
| total_timesteps    | 229500        |
| value_loss         | 0.0022399784  |
--------------------------------------

--------------------------------------
| approxkl           | 0.00067500625 |
| clipfrac           | 0.005         |
| explained_variance | 0.924         |
| fps                | 19            |
| n_updates          | 475           |
| policy_entropy     | 0.043694764   |
| policy_loss        | 0.0005456341  |
| serial_timesteps   | 47500         |
| time_elapsed       | 1.14e+04      |
| total_timesteps    | 237500        |
| value_loss         | 0.00067613344 |
--------------------------------------
--------------------------------------
| approxkl           | 0.0003975493  |
| clipfrac           | 0.0009999999  |
| explained_variance | 0.928         |
| fps                | 20            |
| n_updates          | 476           |
| policy_entropy     | 0.03843221    |
| policy_loss        | -0.0003497944 |
| serial_timesteps   | 47600         |
| time_elapsed       | 1.14e+04      |
| total_timesteps    | 238000        |
| value_loss         | 0.0010309148  |
--------------------------------------

-------------------------------------
| approxkl           | 0.0053177355 |
| clipfrac           | 0.049499996  |
| explained_variance | 0.0237       |
| fps                | 19           |
| n_updates          | 492          |
| policy_entropy     | 0.8738949    |
| policy_loss        | -0.004573512 |
| serial_timesteps   | 49200        |
| time_elapsed       | 1.19e+04     |
| total_timesteps    | 246000       |
| value_loss         | 0.5641848    |
-------------------------------------
-------------------------------------
| approxkl           | 0.015777312  |
| clipfrac           | 0.18749999   |
| explained_variance | 0.0538       |
| fps                | 22           |
| n_updates          | 493          |
| policy_entropy     | 1.090381     |
| policy_loss        | -0.008771414 |
| serial_timesteps   | 49300        |
| time_elapsed       | 1.19e+04     |
| total_timesteps    | 246500       |
| value_loss         | 0.13821873   |
-------------------------------------

------------------------------------
| approxkl           | 0.09448703  |
| clipfrac           | 0.15550001  |
| explained_variance | 0.641       |
| fps                | 18          |
| n_updates          | 509         |
| policy_entropy     | 0.9150747   |
| policy_loss        | 0.011708167 |
| serial_timesteps   | 50900       |
| time_elapsed       | 1.23e+04    |
| total_timesteps    | 254500      |
| value_loss         | 0.01358484  |
------------------------------------
Saving model
---------------------------------------
| approxkl           | 0.02189039     |
| clipfrac           | 0.1345         |
| explained_variance | 0.237          |
| fps                | 18             |
| n_updates          | 510            |
| policy_entropy     | 1.110802       |
| policy_loss        | -0.00015137959 |
| serial_timesteps   | 51000          |
| time_elapsed       | 1.23e+04       |
| total_timesteps    | 255000         |
| value_loss         | 0.039645214    |
---------------------------------------

--------------------------------------
| approxkl           | 0.0018581639  |
| clipfrac           | 0.01          |
| explained_variance | -0.064        |
| fps                | 18            |
| n_updates          | 526           |
| policy_entropy     | 1.1525887     |
| policy_loss        | -0.0050822897 |
| serial_timesteps   | 52600         |
| time_elapsed       | 1.27e+04      |
| total_timesteps    | 263000        |
| value_loss         | 0.67203504    |
--------------------------------------
--------------------------------------
| approxkl           | 0.023280697   |
| clipfrac           | 0.12999998    |
| explained_variance | 0.0175        |
| fps                | 19            |
| n_updates          | 527           |
| policy_entropy     | 1.1027858     |
| policy_loss        | 0.00058240007 |
| serial_timesteps   | 52700         |
| time_elapsed       | 1.28e+04      |
| total_timesteps    | 263500        |
| value_loss         | 0.5749899     |
--------------------------------------

-------------------------------------
| approxkl           | 0.00627701   |
| clipfrac           | 0.0815       |
| explained_variance | 0.135        |
| fps                | 18           |
| n_updates          | 543          |
| policy_entropy     | 0.91315114   |
| policy_loss        | -0.008285305 |
| serial_timesteps   | 54300        |
| time_elapsed       | 1.32e+04     |
| total_timesteps    | 271500       |
| value_loss         | 1.269329     |
-------------------------------------
-------------------------------------
| approxkl           | 0.0049136053 |
| clipfrac           | 0.049999997  |
| explained_variance | 0.558        |
| fps                | 18           |
| n_updates          | 544          |
| policy_entropy     | 0.829697     |
| policy_loss        | -0.004686732 |
| serial_timesteps   | 54400        |
| time_elapsed       | 1.32e+04     |
| total_timesteps    | 272000       |
| value_loss         | 0.25130996   |
-------------------------------------

Saving model
-------------------------------------
| approxkl           | 0.031523194  |
| clipfrac           | 0.145        |
| explained_variance | 0.483        |
| fps                | 19           |
| n_updates          | 560          |
| policy_entropy     | 0.89460975   |
| policy_loss        | -0.007844002 |
| serial_timesteps   | 56000        |
| time_elapsed       | 1.36e+04     |
| total_timesteps    | 280000       |
| value_loss         | 0.12028011   |
-------------------------------------
-------------------------------------
| approxkl           | 0.021949928  |
| clipfrac           | 0.137        |
| explained_variance | 0.331        |
| fps                | 18           |
| n_updates          | 561          |
| policy_entropy     | 0.7697894    |
| policy_loss        | 0.0038673454 |
| serial_timesteps   | 56100        |
| time_elapsed       | 1.36e+04     |
| total_timesteps    | 280500       |
| value_loss         | 0.18047714   |
-------------------------------------

--------------------------------------
| approxkl           | 0.01472359    |
| clipfrac           | 0.11849998    |
| explained_variance | 0.112         |
| fps                | 19            |
| n_updates          | 577           |
| policy_entropy     | 0.9582472     |
| policy_loss        | -0.0042027053 |
| serial_timesteps   | 57700         |
| time_elapsed       | 1.4e+04       |
| total_timesteps    | 288500        |
| value_loss         | 0.20557562    |
--------------------------------------
-------------------------------------
| approxkl           | 0.008753966  |
| clipfrac           | 0.096999995  |
| explained_variance | -0.0221      |
| fps                | 18           |
| n_updates          | 578          |
| policy_entropy     | 0.89594      |
| policy_loss        | -0.010323724 |
| serial_timesteps   | 57800        |
| time_elapsed       | 1.41e+04     |
| total_timesteps    | 289000       |
| value_loss         | 0.059654813  |
-------------------------------------

-------------------------------------
| approxkl           | 0.0043728766 |
| clipfrac           | 0.038999997  |
| explained_variance | -0.0674      |
| fps                | 18           |
| n_updates          | 594          |
| policy_entropy     | 0.6742344    |
| policy_loss        | -0.006292475 |
| serial_timesteps   | 59400        |
| time_elapsed       | 1.45e+04     |
| total_timesteps    | 297000       |
| value_loss         | 0.8692444    |
-------------------------------------
--------------------------------------
| approxkl           | 0.013372712   |
| clipfrac           | 0.11649998    |
| explained_variance | 0.0232        |
| fps                | 19            |
| n_updates          | 595           |
| policy_entropy     | 0.72248685    |
| policy_loss        | -0.0063251844 |
| serial_timesteps   | 59500         |
| time_elapsed       | 1.45e+04      |
| total_timesteps    | 297500        |
| value_loss         | 0.79574513    |
--------------------------------------

-------------------------------------
| approxkl           | 0.008841174  |
| clipfrac           | 0.07399999   |
| explained_variance | -0.0551      |
| fps                | 24           |
| n_updates          | 611          |
| policy_entropy     | 0.57252777   |
| policy_loss        | -0.004041128 |
| serial_timesteps   | 61100        |
| time_elapsed       | 1.49e+04     |
| total_timesteps    | 305500       |
| value_loss         | 0.22768378   |
-------------------------------------
-------------------------------------
| approxkl           | 0.004578703  |
| clipfrac           | 0.067999996  |
| explained_variance | 0.498        |
| fps                | 18           |
| n_updates          | 612          |
| policy_entropy     | 0.60181296   |
| policy_loss        | -0.006773678 |
| serial_timesteps   | 61200        |
| time_elapsed       | 1.49e+04     |
| total_timesteps    | 306000       |
| value_loss         | 0.10980289   |
-------------------------------------

--------------------------------------
| approxkl           | 0.013118188   |
| clipfrac           | 0.0895        |
| explained_variance | 0.266         |
| fps                | 18            |
| n_updates          | 628           |
| policy_entropy     | 0.73097134    |
| policy_loss        | -0.0038040045 |
| serial_timesteps   | 62800         |
| time_elapsed       | 1.54e+04      |
| total_timesteps    | 314000        |
| value_loss         | 1.0451138     |
--------------------------------------
-------------------------------------
| approxkl           | 0.003975869  |
| clipfrac           | 0.023        |
| explained_variance | 0.804        |
| fps                | 18           |
| n_updates          | 629          |
| policy_entropy     | 0.6140913    |
| policy_loss        | -0.004672522 |
| serial_timesteps   | 62900        |
| time_elapsed       | 1.54e+04     |
| total_timesteps    | 314500       |
| value_loss         | 0.039062746  |
-------------------------------------

-------------------------------------
| approxkl           | 0.007996492  |
| clipfrac           | 0.051999997  |
| explained_variance | 0.302        |
| fps                | 19           |
| n_updates          | 645          |
| policy_entropy     | 0.66026014   |
| policy_loss        | -0.010460887 |
| serial_timesteps   | 64500        |
| time_elapsed       | 1.58e+04     |
| total_timesteps    | 322500       |
| value_loss         | 0.050637245  |
-------------------------------------
-------------------------------------
| approxkl           | 0.03155849   |
| clipfrac           | 0.0785       |
| explained_variance | -0.0887      |
| fps                | 17           |
| n_updates          | 646          |
| policy_entropy     | 0.7579549    |
| policy_loss        | -0.010631366 |
| serial_timesteps   | 64600        |
| time_elapsed       | 1.58e+04     |
| total_timesteps    | 323000       |
| value_loss         | 0.17399979   |
-------------------------------------

-------------------------------------
| approxkl           | 0.006792707  |
| clipfrac           | 0.053000003  |
| explained_variance | -0.197       |
| fps                | 17           |
| n_updates          | 662          |
| policy_entropy     | 0.65762174   |
| policy_loss        | -0.007662059 |
| serial_timesteps   | 66200        |
| time_elapsed       | 1.62e+04     |
| total_timesteps    | 331000       |
| value_loss         | 0.10417414   |
-------------------------------------
-------------------------------------
| approxkl           | 0.0021056568 |
| clipfrac           | 0.0205       |
| explained_variance | 0.0373       |
| fps                | 19           |
| n_updates          | 663          |
| policy_entropy     | 0.65356183   |
| policy_loss        | -0.005068233 |
| serial_timesteps   | 66300        |
| time_elapsed       | 1.62e+04     |
| total_timesteps    | 331500       |
| value_loss         | 0.9606916    |
-------------------------------------

-------------------------------------
| approxkl           | 0.005684143  |
| clipfrac           | 0.054500002  |
| explained_variance | 0.158        |
| fps                | 19           |
| n_updates          | 679          |
| policy_entropy     | 0.42087692   |
| policy_loss        | -0.008756387 |
| serial_timesteps   | 67900        |
| time_elapsed       | 1.66e+04     |
| total_timesteps    | 339500       |
| value_loss         | 0.3078845    |
-------------------------------------
Saving model
--------------------------------------
| approxkl           | 0.02133676    |
| clipfrac           | 0.045499995   |
| explained_variance | 0.297         |
| fps                | 18            |
| n_updates          | 680           |
| policy_entropy     | 0.51681054    |
| policy_loss        | -0.0026660468 |
| serial_timesteps   | 68000         |
| time_elapsed       | 1.67e+04      |
| total_timesteps    | 340000        |
| value_loss         | 0.2628602     |
--------------------------------------

--------------------------------------
| approxkl           | 0.027464833   |
| clipfrac           | 0.049499996   |
| explained_variance | 0.0145        |
| fps                | 18            |
| n_updates          | 696           |
| policy_entropy     | 0.56706196    |
| policy_loss        | -0.0028983238 |
| serial_timesteps   | 69600         |
| time_elapsed       | 1.71e+04      |
| total_timesteps    | 348000        |
| value_loss         | 0.36173686    |
--------------------------------------
-------------------------------------
| approxkl           | 0.020885777  |
| clipfrac           | 0.20650001   |
| explained_variance | 0.385        |
| fps                | 19           |
| n_updates          | 697          |
| policy_entropy     | 0.59157574   |
| policy_loss        | -0.011117291 |
| serial_timesteps   | 69700        |
| time_elapsed       | 1.71e+04     |
| total_timesteps    | 348500       |
| value_loss         | 0.012750288  |
-------------------------------------

-------------------------------------
| approxkl           | 0.006266517  |
| clipfrac           | 0.06649999   |
| explained_variance | 0.344        |
| fps                | 19           |
| n_updates          | 713          |
| policy_entropy     | 0.8567851    |
| policy_loss        | -0.008332621 |
| serial_timesteps   | 71300        |
| time_elapsed       | 1.75e+04     |
| total_timesteps    | 356500       |
| value_loss         | 0.010305479  |
-------------------------------------
-------------------------------------
| approxkl           | 0.011683598  |
| clipfrac           | 0.1295       |
| explained_variance | 0.0339       |
| fps                | 18           |
| n_updates          | 714          |
| policy_entropy     | 0.86053735   |
| policy_loss        | -0.007631832 |
| serial_timesteps   | 71400        |
| time_elapsed       | 1.75e+04     |
| total_timesteps    | 357000       |
| value_loss         | 0.13460766   |
-------------------------------------

Saving model
-------------------------------------
| approxkl           | 0.00456138   |
| clipfrac           | 0.0455       |
| explained_variance | -0.0279      |
| fps                | 18           |
| n_updates          | 730          |
| policy_entropy     | 0.7620359    |
| policy_loss        | -0.007928001 |
| serial_timesteps   | 73000        |
| time_elapsed       | 1.79e+04     |
| total_timesteps    | 365000       |
| value_loss         | 0.23339963   |
-------------------------------------
--------------------------------------
| approxkl           | 0.061421882   |
| clipfrac           | 0.062         |
| explained_variance | 0.103         |
| fps                | 19            |
| n_updates          | 731           |
| policy_entropy     | 0.6544247     |
| policy_loss        | -0.0017254989 |
| serial_timesteps   | 73100         |
| time_elapsed       | 1.8e+04       |
| total_timesteps    | 365500        |
| value_loss         | 0.28870746    |
--------------------------------------

--------------------------------------
| approxkl           | 0.022822622   |
| clipfrac           | 0.103999995   |
| explained_variance | 0.0535        |
| fps                | 20            |
| n_updates          | 747           |
| policy_entropy     | 0.8061773     |
| policy_loss        | -0.0056020166 |
| serial_timesteps   | 74700         |
| time_elapsed       | 1.84e+04      |
| total_timesteps    | 373500        |
| value_loss         | 0.60192144    |
--------------------------------------
--------------------------------------
| approxkl           | 0.030930256   |
| clipfrac           | 0.10700001    |
| explained_variance | 0.819         |
| fps                | 18            |
| n_updates          | 748           |
| policy_entropy     | 0.8372632     |
| policy_loss        | -0.0074078925 |
| serial_timesteps   | 74800         |
| time_elapsed       | 1.84e+04      |
| total_timesteps    | 374000        |
| value_loss         | 0.009281615   |
--------------------------------------

-------------------------------------
| approxkl           | 0.010965602  |
| clipfrac           | 0.1635       |
| explained_variance | 0.188        |
| fps                | 19           |
| n_updates          | 764          |
| policy_entropy     | 0.8589385    |
| policy_loss        | -0.008483929 |
| serial_timesteps   | 76400        |
| time_elapsed       | 1.88e+04     |
| total_timesteps    | 382000       |
| value_loss         | 0.28462452   |
-------------------------------------
-------------------------------------
| approxkl           | 0.0071701417 |
| clipfrac           | 0.070999995  |
| explained_variance | 0.297        |
| fps                | 20           |
| n_updates          | 765          |
| policy_entropy     | 0.679472     |
| policy_loss        | -0.005830643 |
| serial_timesteps   | 76500        |
| time_elapsed       | 1.88e+04     |
| total_timesteps    | 382500       |
| value_loss         | 1.3428513    |
-------------------------------------

-------------------------------------
| approxkl           | 0.039519414  |
| clipfrac           | 0.103        |
| explained_variance | 0.374        |
| fps                | 20           |
| n_updates          | 781          |
| policy_entropy     | 0.7148165    |
| policy_loss        | -0.002762323 |
| serial_timesteps   | 78100        |
| time_elapsed       | 1.92e+04     |
| total_timesteps    | 390500       |
| value_loss         | 0.092090875  |
-------------------------------------
-------------------------------------
| approxkl           | 0.027548686  |
| clipfrac           | 0.1295       |
| explained_variance | 0.455        |
| fps                | 19           |
| n_updates          | 782          |
| policy_entropy     | 0.724751     |
| policy_loss        | -0.006439267 |
| serial_timesteps   | 78200        |
| time_elapsed       | 1.93e+04     |
| total_timesteps    | 391000       |
| value_loss         | 0.1361318    |
-------------------------------------

--------------------------------------
| approxkl           | 0.013035039   |
| clipfrac           | 0.10649999    |
| explained_variance | 0.147         |
| fps                | 19            |
| n_updates          | 798           |
| policy_entropy     | 0.66978776    |
| policy_loss        | -0.0061303815 |
| serial_timesteps   | 79800         |
| time_elapsed       | 1.96e+04      |
| total_timesteps    | 399000        |
| value_loss         | 0.43842635    |
--------------------------------------
-------------------------------------
| approxkl           | 0.017964939  |
| clipfrac           | 0.21400002   |
| explained_variance | 0.173        |
| fps                | 19           |
| n_updates          | 799          |
| policy_entropy     | 0.6707222    |
| policy_loss        | -0.017985191 |
| serial_timesteps   | 79900        |
| time_elapsed       | 1.97e+04     |
| total_timesteps    | 399500       |
| value_loss         | 0.14866078   |
-------------------------------------

--------------------------------------
| approxkl           | 0.0024065252  |
| clipfrac           | 0.029499998   |
| explained_variance | 0.312         |
| fps                | 19            |
| n_updates          | 815           |
| policy_entropy     | 0.583508      |
| policy_loss        | -0.0038011007 |
| serial_timesteps   | 81500         |
| time_elapsed       | 2.01e+04      |
| total_timesteps    | 407500        |
| value_loss         | 0.12565738    |
--------------------------------------
-------------------------------------
| approxkl           | 0.0076291123 |
| clipfrac           | 0.086500004  |
| explained_variance | 0.323        |
| fps                | 20           |
| n_updates          | 816          |
| policy_entropy     | 0.5449096    |
| policy_loss        | -0.013290852 |
| serial_timesteps   | 81600        |
| time_elapsed       | 2.01e+04     |
| total_timesteps    | 408000       |
| value_loss         | 0.17521222   |
-------------------------------------

--------------------------------------
| approxkl           | 0.0047215084  |
| clipfrac           | 0.052999996   |
| explained_variance | 0.546         |
| fps                | 20            |
| n_updates          | 832           |
| policy_entropy     | 0.54422176    |
| policy_loss        | -0.0045846165 |
| serial_timesteps   | 83200         |
| time_elapsed       | 2.05e+04      |
| total_timesteps    | 416000        |
| value_loss         | 0.8966137     |
--------------------------------------
-------------------------------------
| approxkl           | 0.005468254  |
| clipfrac           | 0.070999995  |
| explained_variance | 0.922        |
| fps                | 23           |
| n_updates          | 833          |
| policy_entropy     | 0.5107771    |
| policy_loss        | -0.004268557 |
| serial_timesteps   | 83300        |
| time_elapsed       | 2.05e+04     |
| total_timesteps    | 416500       |
| value_loss         | 0.022172024  |
-------------------------------------

--------------------------------------
| approxkl           | 0.0036291871  |
| clipfrac           | 0.033999994   |
| explained_variance | 0.628         |
| fps                | 27            |
| n_updates          | 849           |
| policy_entropy     | 0.23961917    |
| policy_loss        | -0.0033881045 |
| serial_timesteps   | 84900         |
| time_elapsed       | 2.09e+04      |
| total_timesteps    | 424500        |
| value_loss         | 0.029304426   |
--------------------------------------
Saving model
-------------------------------------
| approxkl           | 0.0016330795 |
| clipfrac           | 0.021000002  |
| explained_variance | 0.939        |
| fps                | 21           |
| n_updates          | 850          |
| policy_entropy     | 0.18226604   |
| policy_loss        | -0.002646619 |
| serial_timesteps   | 85000        |
| time_elapsed       | 2.09e+04     |
| total_timesteps    | 425000       |
| value_loss         | 0.0025431125 |
-------------------------------------

--------------------------------------
| approxkl           | 0.0011044845  |
| clipfrac           | 0.01          |
| explained_variance | 0.744         |
| fps                | 20            |
| n_updates          | 866           |
| policy_entropy     | 0.1638222     |
| policy_loss        | -0.0030224286 |
| serial_timesteps   | 86600         |
| time_elapsed       | 2.13e+04      |
| total_timesteps    | 433000        |
| value_loss         | 0.032250904   |
--------------------------------------
--------------------------------------
| approxkl           | 0.0044616316  |
| clipfrac           | 0.047999997   |
| explained_variance | 0.435         |
| fps                | 19            |
| n_updates          | 867           |
| policy_entropy     | 0.31474066    |
| policy_loss        | -0.0049971687 |
| serial_timesteps   | 86700         |
| time_elapsed       | 2.13e+04      |
| total_timesteps    | 433500        |
| value_loss         | 0.09700108    |
--------------------------------------

-------------------------------------
| approxkl           | 0.001674328  |
| clipfrac           | 0.019        |
| explained_variance | 0.276        |
| fps                | 20           |
| n_updates          | 883          |
| policy_entropy     | 0.28643584   |
| policy_loss        | -0.004526796 |
| serial_timesteps   | 88300        |
| time_elapsed       | 2.17e+04     |
| total_timesteps    | 441500       |
| value_loss         | 0.12623906   |
-------------------------------------
-------------------------------------
| approxkl           | 0.0014352255 |
| clipfrac           | 0.0205       |
| explained_variance | 0.661        |
| fps                | 25           |
| n_updates          | 884          |
| policy_entropy     | 0.3316875    |
| policy_loss        | -0.00232047  |
| serial_timesteps   | 88400        |
| time_elapsed       | 2.17e+04     |
| total_timesteps    | 442000       |
| value_loss         | 0.02866436   |
-------------------------------------

Saving model
--------------------------------------
| approxkl           | 0.0070994617  |
| clipfrac           | 0.088999994   |
| explained_variance | 0.547         |
| fps                | 22            |
| n_updates          | 900           |
| policy_entropy     | 0.6838742     |
| policy_loss        | -0.0056820028 |
| serial_timesteps   | 90000         |
| time_elapsed       | 2.21e+04      |
| total_timesteps    | 450000        |
| value_loss         | 0.29622984    |
--------------------------------------
--------------------------------------
| approxkl           | 0.004019804   |
| clipfrac           | 0.048499998   |
| explained_variance | 0.538         |
| fps                | 20            |
| n_updates          | 901           |
| policy_entropy     | 0.5045891     |
| policy_loss        | -0.0019802828 |
| serial_timesteps   | 90100         |
| time_elapsed       | 2.21e+04      |
| total_timesteps    | 450500        |
| value_loss         | 0.26145914    |
--------------------------------------

--------------------------------------
| approxkl           | 0.0018534018  |
| clipfrac           | 0.026999999   |
| explained_variance | 0.693         |
| fps                | 19            |
| n_updates          | 917           |
| policy_entropy     | 0.28530756    |
| policy_loss        | -0.0040391935 |
| serial_timesteps   | 91700         |
| time_elapsed       | 2.25e+04      |
| total_timesteps    | 458500        |
| value_loss         | 0.06769319    |
--------------------------------------
-------------------------------------
| approxkl           | 0.0026286137 |
| clipfrac           | 0.02         |
| explained_variance | 0.967        |
| fps                | 19           |
| n_updates          | 918          |
| policy_entropy     | 0.33134925   |
| policy_loss        | -0.005569921 |
| serial_timesteps   | 91800        |
| time_elapsed       | 2.25e+04     |
| total_timesteps    | 459000       |
| value_loss         | 0.011103304  |
-------------------------------------

--------------------------------------
| approxkl           | 0.0010499273  |
| clipfrac           | 0.01          |
| explained_variance | 0.965         |
| fps                | 25            |
| n_updates          | 934           |
| policy_entropy     | 0.17051837    |
| policy_loss        | -0.0023886126 |
| serial_timesteps   | 93400         |
| time_elapsed       | 2.29e+04      |
| total_timesteps    | 467000        |
| value_loss         | 0.0012784199  |
--------------------------------------
--------------------------------------
| approxkl           | 0.010029952   |
| clipfrac           | 0.094         |
| explained_variance | 0.456         |
| fps                | 22            |
| n_updates          | 935           |
| policy_entropy     | 0.5411286     |
| policy_loss        | -0.0049098767 |
| serial_timesteps   | 93500         |
| time_elapsed       | 2.29e+04      |
| total_timesteps    | 467500        |
| value_loss         | 0.50176615    |
--------------------------------------

--------------------------------------
| approxkl           | 0.023682943   |
| clipfrac           | 0.083         |
| explained_variance | 0.097         |
| fps                | 20            |
| n_updates          | 951           |
| policy_entropy     | 0.34489673    |
| policy_loss        | -0.0065740338 |
| serial_timesteps   | 95100         |
| time_elapsed       | 2.33e+04      |
| total_timesteps    | 475500        |
| value_loss         | 0.08176492    |
--------------------------------------
-------------------------------------
| approxkl           | 0.0021764874 |
| clipfrac           | 0.0215       |
| explained_variance | 0.735        |
| fps                | 22           |
| n_updates          | 952          |
| policy_entropy     | 0.20321998   |
| policy_loss        | -0.005298433 |
| serial_timesteps   | 95200        |
| time_elapsed       | 2.33e+04     |
| total_timesteps    | 476000       |
| value_loss         | 0.0016041184 |
-------------------------------------

--------------------------------------
| approxkl           | 0.007992472   |
| clipfrac           | 0.085999995   |
| explained_variance | 0.155         |
| fps                | 18            |
| n_updates          | 968           |
| policy_entropy     | 0.71762145    |
| policy_loss        | -0.0062879412 |
| serial_timesteps   | 96800         |
| time_elapsed       | 2.37e+04      |
| total_timesteps    | 484000        |
| value_loss         | 0.55290496    |
--------------------------------------
-------------------------------------
| approxkl           | 0.004245664  |
| clipfrac           | 0.058499992  |
| explained_variance | -0.0836      |
| fps                | 20           |
| n_updates          | 969          |
| policy_entropy     | 0.46703792   |
| policy_loss        | -0.003601852 |
| serial_timesteps   | 96900        |
| time_elapsed       | 2.37e+04     |
| total_timesteps    | 484500       |
| value_loss         | 0.13017297   |
-------------------------------------

--------------------------------------
| approxkl           | 0.0008786831  |
| clipfrac           | 0.013499999   |
| explained_variance | 0.439         |
| fps                | 28            |
| n_updates          | 985           |
| policy_entropy     | 0.21047473    |
| policy_loss        | -0.0027629987 |
| serial_timesteps   | 98500         |
| time_elapsed       | 2.41e+04      |
| total_timesteps    | 492500        |
| value_loss         | 0.028892815   |
--------------------------------------
--------------------------------------
| approxkl           | 0.0011054     |
| clipfrac           | 0.013499999   |
| explained_variance | 0.0516        |
| fps                | 20            |
| n_updates          | 986           |
| policy_entropy     | 0.2664786     |
| policy_loss        | -0.0021086163 |
| serial_timesteps   | 98600         |
| time_elapsed       | 2.41e+04      |
| total_timesteps    | 493000        |
| value_loss         | 0.42919302    |
--------------------------------------

KeyboardInterrupt: 

In [None]:
model.save(str(Path(log_dir) / "final_model.pkl"))

# Visualize Results

__Note__: Stable-Baselines requires that policy input dimensions be consistent across training and testing. Thus, the number of environments used for training must be a multiple of the number of environments used for visualization. During inference, the observation batch is tiled up to the training batch size, and only the actions for the first copy are kept.
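
The duplication described in this note can be sketched with NumPy alone. This is a minimal illustration, assuming observations arrive as an array of shape `(n_vis_envs, obs_dim)` and that the training environment count is a whole multiple of the visualization environment count (the condition the assertion later in this notebook checks). `tile_observations` and `fake_actions` are hypothetical names used only here; they are not part of Stable-Baselines or tesse-gym.

```python
import numpy as np


def tile_observations(obs, n_train_envs):
    """Repeat a batch of observations until it has n_train_envs rows.

    Hypothetical helper for illustration; not part of Stable-Baselines.
    """
    n_vis_envs = obs.shape[0]
    if n_train_envs % n_vis_envs != 0:
        raise ValueError(
            f"n_train_envs ({n_train_envs}) must be a multiple of "
            f"the number of visualization environments ({n_vis_envs})"
        )
    # Stack whole copies of the batch along the environment axis.
    return np.concatenate((n_train_envs // n_vis_envs) * [obs])


# Toy data: 2 visualization environments, 3 values per observation.
obs = np.arange(6, dtype=np.float32).reshape(2, 3)
batch = tile_observations(obs, n_train_envs=4)
print(batch.shape)

# The policy returns one action per row of the tiled batch; only the first
# n_vis_envs actions correspond to real environments.
fake_actions = np.arange(batch.shape[0])  # stand-in for the policy output
actions = fake_actions[: obs.shape[0]]
print(actions)
```

Tiling whole copies keeps each real environment in the same row position, so slicing the first `n_vis_envs` actions recovers one action per real environment.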

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

#### Load model

In [None]:
MODEL_WEIGHTS_PATH = "results/goseek-ppo-realattention-forwardreward-50fartargets/final_model.pkl"
assert MODEL_WEIGHTS_PATH, "Must give a model weights path!"

model = PPO2.load(str(MODEL_WEIGHTS_PATH))
n_train_envs = model.act_model.initial_state.shape[0]

#### Visualize all observed images

In [None]:
obs = env.reset()
img_shape = (-1, 240, 320, 5)
imgs = np.reshape(obs[:, :-3], img_shape)[..., -2:]
print(imgs.shape)
rgb, segmentation, depth, pose = decode_observations(obs)
lstm_state = None

print(pose)

assert (
    n_train_envs % obs.shape[0] == 0
), (
    f"The number of training environments ({n_train_envs}) must be a multiple "
    f"of the number of visualization environments ({obs.shape[0]})"
)

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].imshow(rgb[0])
ax[1].imshow(segmentation[0])
ax[2].imshow(depth[0])

print(rgb[0].shape)
print(segmentation[0].shape)
print(depth[0].shape)

#### Run an episode and plot the first-person agent view

In [None]:
import cv2

# TODO:
# - check that the state is correct (segmentation and values of classes)
# - check the robot pose values
n_vis_envs = obs.shape[0]
fig, ax = plt.subplots(1, n_vis_envs)
ax = [ax] if n_vis_envs == 1 else ax

for step in range(episode_length):
    # Tile the observations up to the training batch size expected by the policy.
    actions, lstm_state = model.predict(
        np.concatenate((n_train_envs // n_vis_envs) * [obs]),
        state=lstm_state,
        deterministic=False,
    )

    # Keep only the actions for the real (non-duplicated) environments.
    actions = actions[:n_vis_envs]

    obs, reward, done, _ = env.step(actions)
    rgb, segmentation, depth, pose = decode_observations(obs)
    print(reward)

    for env_idx in range(n_vis_envs):
        print(np.max(depth[env_idx]), np.min(depth[env_idx]))
        ax[env_idx].cla()  # clear this environment's axes before redrawing
        # Show a 40x40 nearest-neighbor downsampled depth image.
        ax[env_idx].imshow(
            cv2.resize(depth[env_idx], dsize=(40, 40), interpolation=cv2.INTER_NEAREST),
            vmin=0.0,
            vmax=1.0,
        )

    fig.canvas.draw()

obs = env.reset()
rgb, segmentation, depth, pose = decode_observations(obs)
lstm_state = None