# PPO in Stable Baselines

In single-agent PPO, `MlpPolicy` was used in `PPO1` as follows:

```
model = PPO1(MlpPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
             optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)
```
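To make `clip_param=0.2` concrete, the following is a minimal pure-Python sketch of the clipped surrogate objective that PPO maximizes, written for a single sample. The function name `clipped_surrogate` and its arguments are illustrative and not part of Stable Baselines; `PPO1` evaluates the same expression over batches in TensorFlow.

```python
def clipped_surrogate(ratio, advantage, clip_param=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage A.
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + eps) * A.
print(clipped_surrogate(1.5, 1.0))
# A small ratio with a negative advantage is capped at (1 - eps) * A.
print(clipped_surrogate(0.5, -1.0))
# A large ratio with a negative advantage is not clipped (the pessimistic bound wins).
print(clipped_surrogate(1.5, -1.0))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy ratio outside `[1 - clip_param, 1 + clip_param]` when doing so would increase the objective.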

`MlpPolicy` is defined in `stable_baselines/common/policies.py`. It inherits from `FeedForwardPolicy`, which in turn inherits from `ActorCriticPolicy`.

`FeedForwardPolicy`'s `__init__` contains the following:
```
if net_arch is None:
    if layers is None:
        layers = [64, 64]
    net_arch = [dict(vf=layers, pi=layers)]

with tf.variable_scope("model", reuse=reuse):
    if feature_extraction == "cnn":
        pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
    else:
        pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

    self._value_fn = linear(vf_latent, 'vf', 1)

    self._proba_distribution, self._policy, self.q_value = \
        self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)
```

Since `MlpPolicy` uses `feature_extraction="mlp"`, look into `mlp_extractor` ([here](https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/policies.py)).

`mlp_extractor` constructs an MLP that receives observations as input and outputs a latent representation for the policy network and for the value network. The number and size of the hidden layers, and how many of them are shared between the policy and value networks, can be specified using `net_arch`.
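As an illustration of how a `net_arch` list is interpreted, here is a minimal sketch that separates the shared layer sizes from the policy-only and value-only ones. The helper `split_net_arch` is hypothetical and only mirrors the parsing logic described here; it builds no TensorFlow layers.

```python
def split_net_arch(net_arch):
    """Split a net_arch list into (shared, policy_only, value_only) layer sizes."""
    shared, policy_only, value_only = [], [], []
    for layer in net_arch:
        if isinstance(layer, int):
            # Leading integers describe layers shared by policy and value networks.
            shared.append(layer)
        else:
            assert isinstance(layer, dict), "net_arch may only contain ints and dicts"
            # A missing key means there are no non-shared layers for that network.
            policy_only = list(layer.get("pi", []))
            value_only = list(layer.get("vf", []))
            break  # the dict ends the shared part of the network
    return shared, policy_only, value_only


# One shared layer, then separate value and policy heads.
print(split_net_arch([55, dict(vf=[255, 255], pi=[128])]))
# The FeedForwardPolicy default: no shared layers, two layers of 64 for each network.
print(split_net_arch([dict(vf=[64, 64], pi=[64, 64])]))
```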

`mlp_extractor` iterates through `net_arch` and creates each layer with `latent = act_fun(linear(latent, ...))`. Therefore, look into `act_fun` and into `linear`, which lives in `stable_baselines.common.tf_layers`.

`FeedForwardPolicy`'s default for `act_fun` is `tf.tanh`. `linear` is defined as:

```
def linear(input_tensor, scope, n_hidden, *, init_scale=1.0, init_bias=0.0):
    """
    Creates a fully connected layer for TensorFlow
    :param input_tensor: (TensorFlow Tensor) The input tensor for the fully connected layer
    :param scope: (str) The TensorFlow variable scope
    :param n_hidden: (int) The number of hidden neurons
    :param init_scale: (int) The initialization scale
    :param init_bias: (int) The initialization offset bias
    :return: (TensorFlow Tensor) fully connected layer
    """
    with tf.variable_scope(scope):
        n_input = input_tensor.get_shape()[1].value
        weight = tf.get_variable("w", [n_input, n_hidden], initializer=ortho_init(init_scale))
        bias = tf.get_variable("b", [n_hidden], initializer=tf.constant_initializer(init_bias))
        return tf.matmul(input_tensor, weight) + bias
```
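Stripped of TensorFlow, `act_fun(linear(latent, ...))` is an affine map followed by an elementwise `tanh`. The sketch below reproduces that computation in plain Python with hand-picked weights; the names `linear_forward` and `tanh_layer` are illustrative, and the orthogonal initialization (`ortho_init`) is deliberately left out.

```python
import math


def linear_forward(inputs, weight, bias):
    """Compute inputs @ weight + bias for a batch of row vectors (lists of floats).

    weight has shape [n_input][n_hidden] and bias has shape [n_hidden],
    matching the "w" and "b" variables created in `linear`.
    """
    n_hidden = len(bias)
    return [
        [sum(x * weight[i][j] for i, x in enumerate(row)) + bias[j] for j in range(n_hidden)]
        for row in inputs
    ]


def tanh_layer(inputs, weight, bias):
    """Apply tanh elementwise to the affine output, like act_fun(linear(...))."""
    return [[math.tanh(v) for v in row] for row in linear_forward(inputs, weight, bias)]


batch = [[1.0, 2.0], [0.0, -1.0]]              # two observations with two features each
weight = [[0.5, -1.0, 0.0], [0.25, 0.0, 1.0]]  # shape [n_input=2, n_hidden=3]
bias = [0.0, 0.0, 0.0]                         # init_bias=0.0, as in the default

print(linear_forward(batch, weight, bias))
print(tanh_layer(batch, weight, bias))
```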

Therefore, to transform this model into a Bayesian neural network, each `linear` layer needs to be replaced by a variational layer such as `DenseVariational` (the code below uses `tfp.layers.DenseFlipout`). We can do this by modifying `FeedForwardPolicy` (which `MlpPolicy` inherits from) to use a new `bnn_extractor`, then creating a `BnnPolicy` to replace `MlpPolicy`.
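To show what changes when a weight becomes a distribution, here is a minimal plain-Python sketch of a mean-field Gaussian weight: it is sampled as `mu + sigma * eps`, and a KL term penalizes its distance from a standard normal prior. The helper names and the softplus parameterization of `sigma` are assumptions made for illustration. This is not the TensorFlow Probability implementation, and the Flipout estimator uses a different sampling scheme to reduce gradient variance.

```python
import math
import random


def sample_weight(mu, rho, rng):
    """Draw w = mu + sigma * eps with eps ~ N(0, 1) and sigma = softplus(rho) > 0."""
    sigma = math.log1p(math.exp(rho))
    return mu + sigma * rng.gauss(0.0, 1.0), sigma


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single weight."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - math.log(sigma)


rng = random.Random(0)
params = [(0.1, -3.0), (-0.4, -2.0), (0.0, 0.5)]  # hypothetical (mu, rho) pairs

total_kl = 0.0
for mu, rho in params:
    weight, sigma = sample_weight(mu, rho, rng)
    total_kl += kl_to_standard_normal(mu, sigma)
    print("sampled weight {:+.4f} (sigma={:.4f})".format(weight, sigma))

print("total KL penalty: {:.4f}".format(total_kl))
```

In variational training, a KL term of this kind is added to the task loss, so the posterior over weights is pulled toward the prior unless the data justify moving away from it.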

In [44]:
from tensorflow.keras import backend as K
from tensorflow.keras import activations, initializers
from tensorflow.keras.layers import Layer

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfp.__version__

'0.8.0'

In [45]:
def bnn_extractor(flat_observations, net_arch, act_fun):
    """
    Constructs a variational network that receives observations as an input and outputs a latent representation for the policy and
    a value network. The ``net_arch`` parameter allows to specify the amount and size of the hidden layers and how many
    of them are shared between the policy network and the value network. It is assumed to be a list with the following
    structure:
    1. An arbitrary length (zero allowed) number of integers each specifying the number of units in a shared layer.
       If the number of ints is zero, there will be no shared layers.
    2. An optional dict, to specify the following non-shared layers for the value network and the policy network.
       It is formatted like ``dict(vf=[<value layer sizes>], pi=[<policy layer sizes>])``.
       If it is missing any of the keys (pi or vf), no non-shared layers (empty list) is assumed.
    For example to construct a network with one shared layer of size 55 followed by two non-shared layers for the value
    network of size 255 and a single non-shared layer of size 128 for the policy network, the following layers_spec
    would be used: ``[55, dict(vf=[255, 255], pi=[128])]``. A simple shared network topology with two layers of size 128
    would be specified as [128, 128].
    :param flat_observations: (tf.Tensor) The observations to base policy and value function on.
    :param net_arch: ([int or dict]) The specification of the policy and value networks.
        See above for details on its formatting.
    :param act_fun: (tf function) The activation function to use for the networks.
    :return: (tf.Tensor, tf.Tensor) latent_policy, latent_value of the specified network.
        If all layers are shared, then ``latent_policy == latent_value``
    """
    latent = flat_observations
    policy_only_layers = []  # Layer sizes of the network that only belongs to the policy network
    value_only_layers = []  # Layer sizes of the network that only belongs to the value network
    # KL divergence between the weight posterior q and the prior p; the third argument is unused
    kernel_divergence_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p)

    # Iterate through the shared layers and build the shared parts of the network
    for idx, layer in enumerate(net_arch):
        if isinstance(layer, int):  # Check that this is a shared layer
            layer_size = layer
            # Original MLP layer, kept for reference:
            # latent = act_fun(linear(latent, "shared_fc{}".format(idx), layer_size, init_scale=np.sqrt(2)))
            # Note: the layer applies its own ReLU, and act_fun is then applied on top of it.
            latent = act_fun(tfp.layers.DenseFlipout(
                layer_size, name="shared_fc{}".format(idx), activation='relu',
                kernel_divergence_fn=kernel_divergence_fn)(latent))
        else:
            assert isinstance(layer, dict), "Error: the net_arch list can only contain ints and dicts"
            if 'pi' in layer:
                assert isinstance(layer['pi'], list), "Error: net_arch[-1]['pi'] must contain a list of integers."
                policy_only_layers = layer['pi']

            if 'vf' in layer:
                assert isinstance(layer['vf'], list), "Error: net_arch[-1]['vf'] must contain a list of integers."
                value_only_layers = layer['vf']
            break  # From here on the network splits up in policy and value network

    # Build the non-shared part of the network
    latent_policy = latent
    latent_value = latent
    for idx, (pi_layer_size, vf_layer_size) in enumerate(zip_longest(policy_only_layers, value_only_layers)):
        if pi_layer_size is not None:
            assert isinstance(pi_layer_size, int), "Error: net_arch[-1]['pi'] must only contain integers."
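            # Original MLP layer, kept for reference:
            # latent_policy = act_fun(linear(latent_policy, "pi_fc{}".format(idx), pi_layer_size, init_scale=np.sqrt(2)))
            # Feed the previous policy layer's output (not the shared latent) so stacked layers chain
            latent_policy = act_fun(tfp.layers.DenseFlipout(
                pi_layer_size, name="pi_fc{}".format(idx), activation='relu',
                kernel_divergence_fn=kernel_divergence_fn)(latent_policy))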

        if vf_layer_size is not None:
            assert isinstance(vf_layer_size, int), "Error: net_arch[-1]['vf'] must only contain integers."
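            # Original MLP layer, kept for reference:
            # latent_value = act_fun(linear(latent_value, "vf_fc{}".format(idx), vf_layer_size, init_scale=np.sqrt(2)))
            # Feed the previous value layer's output (not the shared latent) so stacked layers chain
            latent_value = act_fun(tfp.layers.DenseFlipout(
                vf_layer_size, name="vf_fc{}".format(idx), activation='relu',
                kernel_divergence_fn=kernel_divergence_fn)(latent_value))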

    return latent_policy, latent_value

In [46]:
from stable_baselines.common.policies import ActorCriticPolicy, mlp_extractor, nature_cnn

class FeedForwardPolicy(ActorCriticPolicy):
    """
    Policy object that implements actor critic, using a feed forward neural network.
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param layers: ([int]) (deprecated, use net_arch instead) The size of the Neural network for the policy
        (if None, default to [64, 64])
    :param net_arch: (list) Specification of the actor-critic policy network architecture (see mlp_extractor
        documentation for details).
    :param act_fun: (tf.func) the activation function to use in the neural network.
    :param cnn_extractor: (function (TensorFlow Tensor, ``**kwargs``): (TensorFlow Tensor)) the CNN feature extraction
    :param feature_extraction: (str) The feature extraction type ("cnn" or "mlp")
    :param kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, layers=None, net_arch=None,
                 act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
        super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
                                                scale=(feature_extraction == "cnn"))

        self._kwargs_check(feature_extraction, kwargs)

        if layers is not None:
            warnings.warn("Usage of the `layers` parameter is deprecated! Use net_arch instead "
                          "(it has a different semantics though).", DeprecationWarning)
            if net_arch is not None:
                warnings.warn("The new `net_arch` parameter overrides the deprecated `layers` parameter!",
                              DeprecationWarning)

        if net_arch is None:
            if layers is None:
                layers = [64, 64]
            net_arch = [dict(vf=layers, pi=layers)]

        with tf.variable_scope("model", reuse=reuse):
            if feature_extraction == "cnn":
                pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
            elif feature_extraction == "bnn":
                pi_latent, vf_latent = bnn_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)
            else:
                pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

            self._value_fn = linear(vf_latent, 'vf', 1)

            self._proba_distribution, self._policy, self.q_value = \
                self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)

        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=False):
        if deterministic:
            action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        return action, value, self.initial_state, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})

In [47]:
import warnings
from itertools import zip_longest
from abc import ABC, abstractmethod

import numpy as np
import tensorflow as tf
from gym.spaces import Discrete

from stable_baselines.common.tf_util import batch_to_seq, seq_to_batch
from stable_baselines.common.tf_layers import conv, linear, conv_to_fc, lstm
from stable_baselines.common.distributions import make_proba_dist_type, CategoricalProbabilityDistribution, \
    MultiCategoricalProbabilityDistribution, DiagGaussianProbabilityDistribution, BernoulliProbabilityDistribution
from stable_baselines.common.input import observation_input
from stable_baselines.common.policies import nature_cnn

In [48]:
class BnnPolicy(FeedForwardPolicy):
    """
    Policy object that implements actor critic, using a Bayesian neural net (2 layers of 64)
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param _kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
        super(BnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                        feature_extraction="bnn", **_kwargs)

# Single-Agent PPO with BNN

In [49]:
#!/usr/bin/env python3

# Train single CPU PPO1 on slimevolley.
# Should solve it (beat existing AI on average over 1000 trials) in 3 hours on single CPU, within 3M steps.

import os
import gym
import slimevolleygym
from slimevolleygym import SurvivalRewardEnv

from stable_baselines.ppo1 import PPO1
from stable_baselines.common.policies import MlpPolicy
from stable_baselines import logger
from stable_baselines.common.callbacks import EvalCallback

NUM_TIMESTEPS = int(5e6)
SEED = 721
EVAL_FREQ = 250000
EVAL_EPISODES = 10  # was 1000
LOGDIR = "bnn_ppo1" # moved to zoo afterwards.

logger.configure(folder=LOGDIR)

env = gym.make("SlimeVolley-v0")
env.seed(SEED)

Logging to bnn_ppo1


[721]

In [50]:
# take mujoco hyperparams (but doubled timesteps_per_actorbatch to cover more steps.)
model = PPO1(BnnPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
                 optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)

eval_callback = EvalCallback(env, best_model_save_path=LOGDIR, log_path=LOGDIR, eval_freq=EVAL_FREQ, n_eval_episodes=EVAL_EPISODES)

model.learn(total_timesteps=NUM_TIMESTEPS, callback=eval_callback)

model.save(os.path.join(LOGDIR, "final_model")) # probably never get to this point.

env.close()

********** Iteration 0 ************

Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     7.84e-05 |       0.00000 |       0.04411 |      1.73e-05 |       2.07942
     -0.00062 |       0.00000 |       0.03292 |       0.00029 |       2.07914
     -0.00160 |       0.00000 |       0.02951 |       0.00108 |       2.07834
     -0.00268 |       0.00000 |       0.02699 |       0.00231 |       2.07711
     -0.00385 |       0.00000 |       0.02678 |       0.00383 |       2.07558
     -0.00404 |       0.00000 |       0.02534 |       0.00593 |       2.07349
     -0.00434 |       0.00000 |       0.02458 |       0.00717 |       2.07226
     -0.00456 |       0.00000 |       0.02396 |       0.00804 |       2.07140
     -0.00506 |       0.00000 |       0.02358 |       0.00846 |       2.07099
     -0.00584 |       0.00000 |       0.02300 |       0.00945 |       2.07000
Evaluating losses...
     -0.00610 |       0.00000 |       0.02276 |       0.01018 |       2.06928
-----------------------------

********** Iteration 11 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -5.77e-05 |       0.00000 |       0.03078 |       0.00104 |       1.99237
     -0.00254 |       0.00000 |       0.02879 |       0.00186 |       1.98336
     -0.00310 |       0.00000 |       0.02836 |       0.00368 |       1.97321
     -0.00386 |       0.00000 |       0.02837 |       0.00452 |       1.96979
     -0.00529 |       0.00000 |       0.02705 |       0.00472 |       1.97222
     -0.00484 |       0.00000 |       0.02707 |       0.00474 |       1.96891
     -0.00565 |       0.00000 |       0.02647 |       0.00553 |       1.96812
     -0.00596 |       0.00000 |       0.02634 |       0.00537 |       1.96939
     -0.00653 |       0.00000 |       0.02597 |       0.00599 |       1.97178
     -0.00703 |       0.00000 |       0.02573 |       0.00636 |       1.97357
Evaluating losses...
     -0.00783 |       0.00000 |       0.02531 |       0.00617 |       

     -0.00910 |       0.00000 |       0.01785 |       0.00613 |       1.91403
     -0.00937 |       0.00000 |       0.01788 |       0.00595 |       1.91431
Evaluating losses...
     -0.00894 |       0.00000 |       0.01790 |       0.00573 |       1.91716
----------------------------------
| EpLenMean       | 636          |
| EpRewMean       | -4.8         |
| EpThisIter      | 6            |
| EpisodesSoFar   | 110          |
| TimeElapsed     | 78.1         |
| TimestepsSoFar  | 69632        |
| ev_tdlam_before | 0.809        |
| loss_ent        | 1.9171613    |
| loss_kl         | 0.0057316576 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.008941463 |
| loss_vf_loss    | 0.017901668  |
----------------------------------
********** Iteration 17 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00057 |       0.00000 |       0.01882 |       0.00151 |       1.91192
     -0.00119 |       0.00000 |       0.01809 | 

********** Iteration 22 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     5.54e-06 |       0.00000 |       0.01664 |       0.00165 |       1.85567
     -0.00161 |       0.00000 |       0.01589 |       0.00253 |       1.86750
     -0.00100 |       0.00000 |       0.01545 |       0.00346 |       1.87287
     -0.00461 |       0.00000 |       0.01566 |       0.00397 |       1.87390
     -0.00481 |       0.00000 |       0.01535 |       0.00383 |       1.87255
     -0.00329 |       0.00000 |       0.01511 |       0.00502 |       1.87527
     -0.00538 |       0.00000 |       0.01521 |       0.00496 |       1.87157
     -0.00494 |       0.00000 |       0.01505 |       0.00497 |       1.86742
     -0.00595 |       0.00000 |       0.01517 |       0.00529 |       1.87275
     -0.00675 |       0.00000 |       0.01484 |       0.00564 |       1.86767
Evaluating losses...
     -0.00733 |       0.00000 |       0.01484 |       0.00510 |       

     -0.00283 |       0.00000 |       0.01419 |       0.00412 |       1.79265
     -0.00473 |       0.00000 |       0.01378 |       0.00458 |       1.78855
     -0.00370 |       0.00000 |       0.01411 |       0.00519 |       1.78633
Evaluating losses...
     -0.00442 |       0.00000 |       0.01393 |       0.00481 |       1.79248
----------------------------------
| EpLenMean       | 636          |
| EpRewMean       | -4.87        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 182          |
| TimeElapsed     | 127          |
| TimestepsSoFar  | 114688       |
| ev_tdlam_before | 0.851        |
| loss_ent        | 1.7924844    |
| loss_kl         | 0.0048053386 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.004416044 |
| loss_vf_loss    | 0.013930206  |
----------------------------------
********** Iteration 28 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00032 |       0.00000 |       0.01494 | 

********** Iteration 33 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00055 |       0.00000 |       0.01798 |       0.00246 |       1.73365
     -0.00344 |       0.00000 |       0.01748 |       0.00324 |       1.72626
     -0.00415 |       0.00000 |       0.01689 |       0.00367 |       1.72482
     -0.00294 |       0.00000 |       0.01669 |       0.00431 |       1.72312
     -0.00572 |       0.00000 |       0.01641 |       0.00526 |       1.71910
     -0.00574 |       0.00000 |       0.01594 |       0.00587 |       1.71467
     -0.00360 |       0.00000 |       0.01642 |       0.00672 |       1.71035
     -0.00605 |       0.00000 |       0.01609 |       0.00663 |       1.72161
     -0.00548 |       0.00000 |       0.01621 |       0.00675 |       1.72048
     -0.00649 |       0.00000 |       0.01618 |       0.00734 |       1.71392
Evaluating losses...
     -0.00911 |       0.00000 |       0.01578 |       0.00743 |       

     -0.00554 |       0.00000 |       0.02094 |       0.00477 |       1.70039
     -0.00224 |       0.00000 |       0.02085 |       0.00506 |       1.70171
Evaluating losses...
     -0.00539 |       0.00000 |       0.02050 |       0.00524 |       1.69585
-----------------------------------
| EpLenMean       | 623           |
| EpRewMean       | -4.88         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 253           |
| TimeElapsed     | 176           |
| TimestepsSoFar  | 159744        |
| ev_tdlam_before | 0.788         |
| loss_ent        | 1.6958477     |
| loss_kl         | 0.0052383365  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0053919386 |
| loss_vf_loss    | 0.020498253   |
-----------------------------------
********** Iteration 39 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -2.98e-05 |       0.00000 |       0.01657 |       0.00254 |       1.72152
     -0.00215 |       0.00000 |   

********** Iteration 44 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00102 |       0.00000 |       0.01840 |       0.00282 |       1.71875
     -0.00203 |       0.00000 |       0.01816 |       0.00386 |       1.70174
     -0.00231 |       0.00000 |       0.01747 |       0.00513 |       1.69645
     -0.00561 |       0.00000 |       0.01725 |       0.00612 |       1.69425
     -0.00584 |       0.00000 |       0.01717 |       0.00579 |       1.69484
     -0.00620 |       0.00000 |       0.01700 |       0.00598 |       1.69828
     -0.00714 |       0.00000 |       0.01664 |       0.00680 |       1.69563
     -0.00341 |       0.00000 |       0.01696 |       0.00611 |       1.70060
     -0.00396 |       0.00000 |       0.01652 |       0.00662 |       1.70017
     -0.00429 |       0.00000 |       0.01648 |       0.00667 |       1.70325
Evaluating losses...
     -0.00703 |       0.00000 |       0.01606 |       0.00666 |       

     -0.00467 |       0.00000 |       0.01621 |       0.00567 |       1.69410
     -0.00395 |       0.00000 |       0.01643 |       0.00560 |       1.69563
     -0.00610 |       0.00000 |       0.01599 |       0.00612 |       1.69353
Evaluating losses...
     -0.00682 |       0.00000 |       0.01616 |       0.00596 |       1.69443
-----------------------------------
| EpLenMean       | 625           |
| EpRewMean       | -4.88         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 325           |
| TimeElapsed     | 225           |
| TimestepsSoFar  | 204800        |
| ev_tdlam_before | 0.841         |
| loss_ent        | 1.6944258     |
| loss_kl         | 0.00595619    |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0068150396 |
| loss_vf_loss    | 0.016163463   |
-----------------------------------
********** Iteration 50 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00130 |       0.00000 |   

********** Iteration 55 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00156 |       0.00000 |       0.02081 |       0.00351 |       1.56040
      0.00035 |       0.00000 |       0.01969 |       0.00383 |       1.55758
     -0.00216 |       0.00000 |       0.01922 |       0.00402 |       1.55852
     -0.00290 |       0.00000 |       0.01894 |       0.00445 |       1.55392
     -0.00406 |       0.00000 |       0.01878 |       0.00520 |       1.55691
     -0.00432 |       0.00000 |       0.01850 |       0.00525 |       1.55334
     -0.00226 |       0.00000 |       0.01819 |       0.00493 |       1.55103
     -0.00284 |       0.00000 |       0.01819 |       0.00555 |       1.56191
     -0.00373 |       0.00000 |       0.01797 |       0.00515 |       1.55248
     -0.00472 |       0.00000 |       0.01799 |       0.00550 |       1.55432
Evaluating losses...
     -0.00451 |       0.00000 |       0.01757 |       0.00546 |       

     -0.00472 |       0.00000 |       0.01596 |       0.00690 |       1.53061
     -0.00544 |       0.00000 |       0.01610 |       0.00708 |       1.52609
     -0.00565 |       0.00000 |       0.01547 |       0.00720 |       1.53220
Evaluating losses...
     -0.00862 |       0.00000 |       0.01560 |       0.00703 |       1.53360
---------------------------------
| EpLenMean       | 626         |
| EpRewMean       | -4.87       |
| EpThisIter      | 7           |
| EpisodesSoFar   | 397         |
| TimeElapsed     | 276         |
| TimestepsSoFar  | 249856      |
| ev_tdlam_before | 0.851       |
| loss_ent        | 1.5335972   |
| loss_kl         | 0.007028835 |
| loss_pol_entpen | 0.0         |
| loss_pol_surr   | -0.00862473 |
| loss_vf_loss    | 0.015596065 |
---------------------------------
********** Iteration 61 ************
Eval num_timesteps=249856, episode_reward=-4.70 +/- 0.64
Episode length: 655.40 +/- 52.34
New best mean reward!
Optimizing...
     pol_surr |    pol_entpe

********** Iteration 66 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00057 |       0.00000 |       0.02280 |       0.00347 |       1.50622
     -0.00107 |       0.00000 |       0.02139 |       0.00429 |       1.50112
     -0.00274 |       0.00000 |       0.02071 |       0.00441 |       1.49088
     -0.00113 |       0.00000 |       0.02056 |       0.00512 |       1.49773
     -0.00489 |       0.00000 |       0.02030 |       0.00508 |       1.49203
     -0.00199 |       0.00000 |       0.02013 |       0.00564 |       1.48741
     -0.00644 |       0.00000 |       0.01992 |       0.00554 |       1.48494
     -0.00391 |       0.00000 |       0.01967 |       0.00549 |       1.48438
     -0.00510 |       0.00000 |       0.01944 |       0.00574 |       1.48124
     -0.00498 |       0.00000 |       0.01883 |       0.00579 |       1.47860
Evaluating losses...
     -0.00280 |       0.00000 |       0.01868 |       0.00582 |       

     -0.00307 |       0.00000 |       0.01573 |       0.00681 |       1.50949
     -0.00603 |       0.00000 |       0.01576 |       0.00675 |       1.50566
     -0.00711 |       0.00000 |       0.01539 |       0.00702 |       1.50827
Evaluating losses...
     -0.00444 |       0.00000 |       0.01523 |       0.00728 |       1.50938
----------------------------------
| EpLenMean       | 646          |
| EpRewMean       | -4.78        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 466          |
| TimeElapsed     | 335          |
| TimestepsSoFar  | 294912       |
| ev_tdlam_before | 0.829        |
| loss_ent        | 1.5093814    |
| loss_kl         | 0.0072758407 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.004444112 |
| loss_vf_loss    | 0.015226175  |
----------------------------------
********** Iteration 72 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00034 |       0.00000 |       0.01719 | 

********** Iteration 77 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00255 |       0.00000 |       0.01747 |       0.00464 |       1.46486
     -0.00084 |       0.00000 |       0.01648 |       0.00509 |       1.46664
     -0.00190 |       0.00000 |       0.01600 |       0.00537 |       1.46865
     -0.00191 |       0.00000 |       0.01564 |       0.00517 |       1.46708
     -0.00206 |       0.00000 |       0.01515 |       0.00591 |       1.47571
     -0.00289 |       0.00000 |       0.01493 |       0.00600 |       1.47456
     -0.00398 |       0.00000 |       0.01468 |       0.00636 |       1.47465
     -0.00362 |       0.00000 |       0.01455 |       0.00625 |       1.47447
     -0.00583 |       0.00000 |       0.01441 |       0.00665 |       1.47730
     -0.00334 |       0.00000 |       0.01436 |       0.00638 |       1.47295
Evaluating losses...
     -0.00561 |       0.00000 |       0.01402 |       0.00683 |       

     -0.00309 |       0.00000 |       0.01878 |       0.00687 |       1.47195
     -0.00469 |       0.00000 |       0.01857 |       0.00687 |       1.47700
Evaluating losses...
     -0.00443 |       0.00000 |       0.01842 |       0.00703 |       1.47413
----------------------------------
| EpLenMean       | 668          |
| EpRewMean       | -4.78        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 534          |
| TimeElapsed     | 390          |
| TimestepsSoFar  | 339968       |
| ev_tdlam_before | 0.791        |
| loss_ent        | 1.474127     |
| loss_kl         | 0.0070288116 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.004431707 |
| loss_vf_loss    | 0.018421797  |
----------------------------------
********** Iteration 83 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00126 |       0.00000 |       0.01619 |       0.00448 |       1.44902
     -0.00251 |       0.00000 |       0.01547 | 

********** Iteration 88 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00094 |       0.00000 |       0.01695 |       0.00397 |       1.36201
    -6.74e-05 |       0.00000 |       0.01557 |       0.00485 |       1.36901
     -0.00157 |       0.00000 |       0.01486 |       0.00514 |       1.36942
     -0.00307 |       0.00000 |       0.01445 |       0.00559 |       1.36992
     -0.00309 |       0.00000 |       0.01443 |       0.00561 |       1.36987
     -0.00245 |       0.00000 |       0.01368 |       0.00573 |       1.36866
     -0.00230 |       0.00000 |       0.01384 |       0.00603 |       1.37088
     -0.00225 |       0.00000 |       0.01356 |       0.00631 |       1.36976
     -0.00534 |       0.00000 |       0.01294 |       0.00640 |       1.36830
     -0.00498 |       0.00000 |       0.01298 |       0.00661 |       1.37004
Evaluating losses...
     -0.00601 |       0.00000 |       0.01308 |       0.00679 |       

     -0.00339 |       0.00000 |       0.01216 |       0.00774 |       1.25490
     -0.00349 |       0.00000 |       0.01183 |       0.00731 |       1.25115
     -0.00562 |       0.00000 |       0.01165 |       0.00777 |       1.25375
Evaluating losses...
     -0.00583 |       0.00000 |       0.01156 |       0.00836 |       1.25526
-----------------------------------
| EpLenMean       | 650           |
| EpRewMean       | -4.85         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 604           |
| TimeElapsed     | 442           |
| TimestepsSoFar  | 385024        |
| ev_tdlam_before | 0.856         |
| loss_ent        | 1.2552624     |
| loss_kl         | 0.008356126   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0058278665 |
| loss_vf_loss    | 0.011560908   |
-----------------------------------
********** Iteration 94 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00169 |       0.00000 |   

********** Iteration 99 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00092 |       0.00000 |       0.02137 |       0.00458 |       1.26740
     -0.00020 |       0.00000 |       0.02059 |       0.00529 |       1.26551
     -0.00083 |       0.00000 |       0.01983 |       0.00548 |       1.26399
     -0.00291 |       0.00000 |       0.01931 |       0.00602 |       1.26524
     -0.00330 |       0.00000 |       0.01880 |       0.00575 |       1.26773
     -0.00401 |       0.00000 |       0.01889 |       0.00619 |       1.26781
     -0.00371 |       0.00000 |       0.01820 |       0.00604 |       1.26858
     -0.00336 |       0.00000 |       0.01768 |       0.00596 |       1.26939
     -0.00424 |       0.00000 |       0.01768 |       0.00641 |       1.26783
     -0.00394 |       0.00000 |       0.01782 |       0.00676 |       1.26888
Evaluating losses...
     -0.00636 |       0.00000 |       0.01729 |       0.00664 |       

     -0.00347 |       0.00000 |       0.01283 |       0.00649 |       1.27804
     -0.00130 |       0.00000 |       0.01236 |       0.00618 |       1.27636
     -0.00201 |       0.00000 |       0.01241 |       0.00670 |       1.27488
Evaluating losses...
     -0.00251 |       0.00000 |       0.01260 |       0.00654 |       1.27628
----------------------------------
| EpLenMean       | 661          |
| EpRewMean       | -4.86        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 672          |
| TimeElapsed     | 495          |
| TimestepsSoFar  | 430080       |
| ev_tdlam_before | 0.844        |
| loss_ent        | 1.2762773    |
| loss_kl         | 0.006542188  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002511823 |
| loss_vf_loss    | 0.012603762  |
----------------------------------
********** Iteration 105 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00065 |       0.00000 |       0.01439 |

********** Iteration 110 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00048 |       0.00000 |       0.01786 |       0.00381 |       1.18328
      0.00012 |       0.00000 |       0.01693 |       0.00502 |       1.18824
     -0.00153 |       0.00000 |       0.01635 |       0.00543 |       1.18770
     -0.00231 |       0.00000 |       0.01623 |       0.00556 |       1.18508
     -0.00261 |       0.00000 |       0.01580 |       0.00523 |       1.18262
     -0.00254 |       0.00000 |       0.01571 |       0.00637 |       1.18239
     -0.00431 |       0.00000 |       0.01536 |       0.00573 |       1.18270
     -0.00326 |       0.00000 |       0.01528 |       0.00677 |       1.18261
     -0.00528 |       0.00000 |       0.01531 |       0.00646 |       1.17968
     -0.00433 |       0.00000 |       0.01501 |       0.00728 |       1.18308
Evaluating losses...
     -0.00577 |       0.00000 |       0.01494 |       0.00688 |      

     -0.00307 |       0.00000 |       0.01415 |       0.00705 |       1.18104
     -0.00513 |       0.00000 |       0.01397 |       0.00730 |       1.17890
     -0.00439 |       0.00000 |       0.01397 |       0.00737 |       1.18091
Evaluating losses...
     -0.00796 |       0.00000 |       0.01388 |       0.00779 |       1.18002
----------------------------------
| EpLenMean       | 699          |
| EpRewMean       | -4.85        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 735          |
| TimeElapsed     | 544          |
| TimestepsSoFar  | 475136       |
| ev_tdlam_before | 0.793        |
| loss_ent        | 1.1800176    |
| loss_kl         | 0.007790352  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.007957622 |
| loss_vf_loss    | 0.013883371  |
----------------------------------
********** Iteration 116 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00239 |       0.00000 |       0.01119 |

********** Iteration 121 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00018 |       0.00000 |       0.01426 |       0.00413 |       1.10332
      0.00306 |       0.00000 |       0.01284 |       0.00474 |       1.09778
      0.00077 |       0.00000 |       0.01276 |       0.00456 |       1.09477
     -0.00171 |       0.00000 |       0.01216 |       0.00514 |       1.09264
      0.00044 |       0.00000 |       0.01177 |       0.00564 |       1.08843
     -0.00290 |       0.00000 |       0.01152 |       0.00568 |       1.08410
     -0.00288 |       0.00000 |       0.01141 |       0.00581 |       1.08385
     -0.00331 |       0.00000 |       0.01140 |       0.00627 |       1.08319
     -0.00352 |       0.00000 |       0.01113 |       0.00621 |       1.08259
     -0.00272 |       0.00000 |       0.01123 |       0.00626 |       1.08193
Evaluating losses...
     -0.00327 |       0.00000 |       0.01104 |       0.00600 |      

     -0.00473 |       0.00000 |       0.01461 |       0.00664 |       1.04234
     -0.00379 |       0.00000 |       0.01461 |       0.00726 |       1.04195
     -0.00531 |       0.00000 |       0.01436 |       0.00736 |       1.04557
     -0.00389 |       0.00000 |       0.01369 |       0.00707 |       1.04262
Evaluating losses...
     -0.00663 |       0.00000 |       0.01403 |       0.00746 |       1.04448
----------------------------------
| EpLenMean       | 744          |
| EpRewMean       | -4.85        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 795          |
| TimeElapsed     | 597          |
| TimestepsSoFar  | 520192       |
| ev_tdlam_before | 0.812        |
| loss_ent        | 1.0444777    |
| loss_kl         | 0.0074611516 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.006627579 |
| loss_vf_loss    | 0.014029428  |
----------------------------------
********** Iteration 127 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |

********** Iteration 132 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00158 |       0.00000 |       0.01753 |       0.00358 |       0.96628
      0.00071 |       0.00000 |       0.01670 |       0.00522 |       0.96524
     -0.00084 |       0.00000 |       0.01633 |       0.00525 |       0.96210
     -0.00246 |       0.00000 |       0.01618 |       0.00515 |       0.95797
     -0.00301 |       0.00000 |       0.01569 |       0.00550 |       0.95583
     -0.00333 |       0.00000 |       0.01552 |       0.00570 |       0.95545
     -0.00116 |       0.00000 |       0.01514 |       0.00607 |       0.95190
     -0.00451 |       0.00000 |       0.01494 |       0.00586 |       0.95001
     -0.00353 |       0.00000 |       0.01498 |       0.00582 |       0.94848
     -0.00500 |       0.00000 |       0.01459 |       0.00595 |       0.94737
Evaluating losses...
     -0.00445 |       0.00000 |       0.01453 |       0.00565 |      

     -0.00123 |       0.00000 |       0.01448 |       0.00529 |       1.03338
     -0.00209 |       0.00000 |       0.01393 |       0.00593 |       1.03294
     -0.00274 |       0.00000 |       0.01423 |       0.00563 |       1.03438
Evaluating losses...
     -0.00558 |       0.00000 |       0.01411 |       0.00567 |       1.03311
-----------------------------------
| EpLenMean       | 769           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 853           |
| TimeElapsed     | 645           |
| TimestepsSoFar  | 565248        |
| ev_tdlam_before | 0.767         |
| loss_ent        | 1.033108      |
| loss_kl         | 0.0056715906  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0055776583 |
| loss_vf_loss    | 0.014110147   |
-----------------------------------
********** Iteration 138 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00239 |       0.00000 |  

********** Iteration 143 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00125 |       0.00000 |       0.01479 |       0.00358 |       0.95214
     -0.00140 |       0.00000 |       0.01351 |       0.00384 |       0.95293
    -4.37e-05 |       0.00000 |       0.01313 |       0.00395 |       0.95373
     -0.00374 |       0.00000 |       0.01279 |       0.00470 |       0.95490
     -0.00017 |       0.00000 |       0.01259 |       0.00494 |       0.95081
     -0.00304 |       0.00000 |       0.01259 |       0.00530 |       0.95227
     -0.00196 |       0.00000 |       0.01223 |       0.00492 |       0.95150
     -0.00275 |       0.00000 |       0.01208 |       0.00519 |       0.95331
     -0.00311 |       0.00000 |       0.01192 |       0.00496 |       0.95177
     -0.00189 |       0.00000 |       0.01217 |       0.00561 |       0.95141
Evaluating losses...
     -0.00423 |       0.00000 |       0.01164 |       0.00559 |      

     -0.00247 |       0.00000 |       0.01453 |       0.00492 |       0.87369
     -0.00429 |       0.00000 |       0.01429 |       0.00489 |       0.87116
     -0.00282 |       0.00000 |       0.01436 |       0.00518 |       0.87130
Evaluating losses...
     -0.00453 |       0.00000 |       0.01406 |       0.00519 |       0.86974
----------------------------------
| EpLenMean       | 740          |
| EpRewMean       | -4.89        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 914          |
| TimeElapsed     | 694          |
| TimestepsSoFar  | 610304       |
| ev_tdlam_before | 0.803        |
| loss_ent        | 0.86974496   |
| loss_kl         | 0.0051927217 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.004526261 |
| loss_vf_loss    | 0.01406482   |
----------------------------------
********** Iteration 149 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00014 |       0.00000 |       0.01443 |

********** Iteration 154 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00257 |       0.00000 |       0.01506 |       0.00346 |       0.90584
     -0.00078 |       0.00000 |       0.01421 |       0.00372 |       0.90340
     -0.00120 |       0.00000 |       0.01403 |       0.00383 |       0.90255
     -0.00117 |       0.00000 |       0.01331 |       0.00423 |       0.89990
     -0.00023 |       0.00000 |       0.01309 |       0.00440 |       0.89957
    -9.34e-05 |       0.00000 |       0.01311 |       0.00458 |       0.89694
     -0.00285 |       0.00000 |       0.01279 |       0.00479 |       0.89770
     -0.00206 |       0.00000 |       0.01247 |       0.00486 |       0.89862
     -0.00317 |       0.00000 |       0.01281 |       0.00516 |       0.89543
     -0.00149 |       0.00000 |       0.01223 |       0.00559 |       0.89354
Evaluating losses...
     -0.00129 |       0.00000 |       0.01221 |       0.00572 |      

     -0.00120 |       0.00000 |       0.01086 |       0.00388 |       0.76682
     -0.00369 |       0.00000 |       0.01077 |       0.00397 |       0.76747
     -0.00122 |       0.00000 |       0.01057 |       0.00425 |       0.76609
Evaluating losses...
     -0.00230 |       0.00000 |       0.01045 |       0.00417 |       0.76615
-----------------------------------
| EpLenMean       | 781           |
| EpRewMean       | -4.84         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 970           |
| TimeElapsed     | 743           |
| TimestepsSoFar  | 655360        |
| ev_tdlam_before | 0.838         |
| loss_ent        | 0.7661504     |
| loss_kl         | 0.0041683493  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0023020217 |
| loss_vf_loss    | 0.01045118    |
-----------------------------------
********** Iteration 160 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00244 |       0.00000 |  

********** Iteration 165 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00224 |       0.00000 |       0.01585 |       0.00340 |       0.79936
      0.00100 |       0.00000 |       0.01505 |       0.00350 |       0.79991
     -0.00111 |       0.00000 |       0.01474 |       0.00433 |       0.79967
     -0.00215 |       0.00000 |       0.01423 |       0.00427 |       0.80104
     -0.00130 |       0.00000 |       0.01373 |       0.00443 |       0.79994
     -0.00379 |       0.00000 |       0.01397 |       0.00503 |       0.79916
     -0.00427 |       0.00000 |       0.01338 |       0.00477 |       0.79843
     -0.00269 |       0.00000 |       0.01335 |       0.00509 |       0.79936
     -0.00308 |       0.00000 |       0.01330 |       0.00527 |       0.79937
     -0.00291 |       0.00000 |       0.01313 |       0.00506 |       0.79730
Evaluating losses...
     -0.00379 |       0.00000 |       0.01283 |       0.00503 |      

     -0.00283 |       0.00000 |       0.01367 |       0.00463 |       0.76875
     -0.00308 |       0.00000 |       0.01367 |       0.00472 |       0.76823
     -0.00430 |       0.00000 |       0.01308 |       0.00493 |       0.77005
Evaluating losses...
     -0.00576 |       0.00000 |       0.01319 |       0.00489 |       0.76982
-----------------------------------
| EpLenMean       | 804           |
| EpRewMean       | -4.79         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 1026          |
| TimeElapsed     | 792           |
| TimestepsSoFar  | 700416        |
| ev_tdlam_before | 0.773         |
| loss_ent        | 0.7698214     |
| loss_kl         | 0.004887154   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0057573635 |
| loss_vf_loss    | 0.013188232   |
-----------------------------------
********** Iteration 171 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00133 |       0.00000 |  

********** Iteration 176 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00211 |       0.00000 |       0.01978 |       0.00339 |       0.78560
      0.00377 |       0.00000 |       0.01879 |       0.00341 |       0.78527
      0.00060 |       0.00000 |       0.01766 |       0.00337 |       0.78723
      0.00049 |       0.00000 |       0.01719 |       0.00381 |       0.78333
      0.00154 |       0.00000 |       0.01678 |       0.00377 |       0.78405
     -0.00178 |       0.00000 |       0.01636 |       0.00419 |       0.78596
     -0.00288 |       0.00000 |       0.01605 |       0.00429 |       0.78661
     -0.00635 |       0.00000 |       0.01599 |       0.00421 |       0.78670
     -0.00055 |       0.00000 |       0.01571 |       0.00480 |       0.78767
     -0.00046 |       0.00000 |       0.01556 |       0.00454 |       0.78612
Evaluating losses...
     -0.00137 |       0.00000 |       0.01520 |       0.00472 |      

     -0.00319 |       0.00000 |       0.01483 |       0.00543 |       0.76901
     -0.00141 |       0.00000 |       0.01444 |       0.00541 |       0.77117
     -0.00475 |       0.00000 |       0.01443 |       0.00557 |       0.77018
Evaluating losses...
     -0.00366 |       0.00000 |       0.01421 |       0.00562 |       0.77037
----------------------------------
| EpLenMean       | 843          |
| EpRewMean       | -4.81        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 1077         |
| TimeElapsed     | 841          |
| TimestepsSoFar  | 745472       |
| ev_tdlam_before | 0.759        |
| loss_ent        | 0.7703748    |
| loss_kl         | 0.0056214817 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003658347 |
| loss_vf_loss    | 0.014213044  |
----------------------------------
********** Iteration 182 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00231 |       0.00000 |       0.01516 |

********** Iteration 187 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00074 |       0.00000 |       0.01961 |       0.00349 |       0.78429
     -0.00073 |       0.00000 |       0.01831 |       0.00410 |       0.77751
      0.00078 |       0.00000 |       0.01762 |       0.00440 |       0.77396
     -0.00154 |       0.00000 |       0.01677 |       0.00469 |       0.77018
     -0.00170 |       0.00000 |       0.01661 |       0.00526 |       0.76553
     -0.00229 |       0.00000 |       0.01609 |       0.00520 |       0.76484
     -0.00239 |       0.00000 |       0.01619 |       0.00510 |       0.76297
     -0.00440 |       0.00000 |       0.01557 |       0.00528 |       0.76096
     -0.00216 |       0.00000 |       0.01548 |       0.00566 |       0.75849
     -0.00490 |       0.00000 |       0.01543 |       0.00576 |       0.75775
Evaluating losses...
     -0.00351 |       0.00000 |       0.01532 |       0.00577 |      

     -0.00048 |       0.00000 |       0.01348 |       0.00477 |       0.75537
     -0.00261 |       0.00000 |       0.01331 |       0.00513 |       0.75476
     -0.00369 |       0.00000 |       0.01335 |       0.00529 |       0.75492
Evaluating losses...
     -0.00337 |       0.00000 |       0.01298 |       0.00499 |       0.75693
-----------------------------------
| EpLenMean       | 866           |
| EpRewMean       | -4.87         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 1130          |
| TimeElapsed     | 896           |
| TimestepsSoFar  | 790528        |
| ev_tdlam_before | 0.739         |
| loss_ent        | 0.75693154    |
| loss_kl         | 0.0049872394  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0033654312 |
| loss_vf_loss    | 0.012979542   |
-----------------------------------
********** Iteration 193 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00230 |       0.00000 |  

********** Iteration 198 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00047 |       0.00000 |       0.01649 |       0.00315 |       0.63289
      0.00012 |       0.00000 |       0.01528 |       0.00391 |       0.63237
     -0.00287 |       0.00000 |       0.01446 |       0.00391 |       0.62932
      0.00073 |       0.00000 |       0.01386 |       0.00395 |       0.62729
     -0.00108 |       0.00000 |       0.01350 |       0.00448 |       0.62595
     -0.00232 |       0.00000 |       0.01315 |       0.00453 |       0.62502
     -0.00145 |       0.00000 |       0.01299 |       0.00509 |       0.62329
     -0.00121 |       0.00000 |       0.01267 |       0.00523 |       0.62471
     -0.00417 |       0.00000 |       0.01258 |       0.00502 |       0.62382
     -0.00402 |       0.00000 |       0.01243 |       0.00554 |       0.62102
Evaluating losses...
     -0.00526 |       0.00000 |       0.01232 |       0.00520 |      

     -0.00152 |       0.00000 |       0.01036 |       0.00513 |       0.70863
     -0.00119 |       0.00000 |       0.01025 |       0.00499 |       0.70855
     -0.00180 |       0.00000 |       0.01031 |       0.00530 |       0.70822
Evaluating losses...
     -0.00383 |       0.00000 |       0.01011 |       0.00543 |       0.70848
----------------------------------
| EpLenMean       | 889          |
| EpRewMean       | -4.88        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 1179         |
| TimeElapsed     | 947          |
| TimestepsSoFar  | 835584       |
| ev_tdlam_before | 0.781        |
| loss_ent        | 0.7084828    |
| loss_kl         | 0.0054324903 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003828904 |
| loss_vf_loss    | 0.010109531  |
----------------------------------
********** Iteration 204 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00157 |       0.00000 |       0.01857 |

********** Iteration 209 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00305 |       0.00000 |       0.01777 |       0.00310 |       0.65414
      0.00166 |       0.00000 |       0.01661 |       0.00344 |       0.65264
     7.04e-05 |       0.00000 |       0.01589 |       0.00346 |       0.65144
     -0.00215 |       0.00000 |       0.01528 |       0.00368 |       0.65221
     -0.00147 |       0.00000 |       0.01464 |       0.00378 |       0.65182
     -0.00412 |       0.00000 |       0.01463 |       0.00371 |       0.65119
     -0.00231 |       0.00000 |       0.01413 |       0.00397 |       0.65141
     -0.00287 |       0.00000 |       0.01394 |       0.00404 |       0.65202
     -0.00013 |       0.00000 |       0.01387 |       0.00380 |       0.65086
     -0.00253 |       0.00000 |       0.01336 |       0.00386 |       0.64898
Evaluating losses...
     -0.00373 |       0.00000 |       0.01346 |       0.00405 |      

     -0.00073 |       0.00000 |       0.01301 |       0.00437 |       0.71420
     -0.00157 |       0.00000 |       0.01278 |       0.00466 |       0.71415
     -0.00222 |       0.00000 |       0.01273 |       0.00466 |       0.71564
Evaluating losses...
     -0.00316 |       0.00000 |       0.01269 |       0.00463 |       0.71437
----------------------------------
| EpLenMean       | 959          |
| EpRewMean       | -4.86        |
| EpThisIter      | 4            |
| EpisodesSoFar   | 1223         |
| TimeElapsed     | 997          |
| TimestepsSoFar  | 880640       |
| ev_tdlam_before | 0.682        |
| loss_ent        | 0.714374     |
| loss_kl         | 0.004631463  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003156901 |
| loss_vf_loss    | 0.012694398  |
----------------------------------
********** Iteration 215 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00436 |       0.00000 |       0.01431 |

********** Iteration 220 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00019 |       0.00000 |       0.01915 |       0.00354 |       0.72394
      0.00066 |       0.00000 |       0.01789 |       0.00384 |       0.72312
     -0.00028 |       0.00000 |       0.01724 |       0.00455 |       0.71844
      0.00062 |       0.00000 |       0.01674 |       0.00462 |       0.71752
    -5.20e-05 |       0.00000 |       0.01644 |       0.00482 |       0.71475
     -0.00286 |       0.00000 |       0.01602 |       0.00457 |       0.71651
     -0.00246 |       0.00000 |       0.01565 |       0.00520 |       0.71283
     -0.00207 |       0.00000 |       0.01537 |       0.00555 |       0.71270
     -0.00308 |       0.00000 |       0.01506 |       0.00509 |       0.71301
     -0.00196 |       0.00000 |       0.01484 |       0.00510 |       0.71145
Evaluating losses...
     -0.00291 |       0.00000 |       0.01460 |       0.00539 |      

      0.00052 |       0.00000 |       0.01033 |       0.00515 |       0.70909
     -0.00314 |       0.00000 |       0.00995 |       0.00486 |       0.70941
     -0.00188 |       0.00000 |       0.01015 |       0.00505 |       0.70810
Evaluating losses...
      0.00062 |       0.00000 |       0.01003 |       0.00531 |       0.70891
-----------------------------------
| EpLenMean       | 1.06e+03      |
| EpRewMean       | -4.85         |
| EpThisIter      | 3             |
| EpisodesSoFar   | 1262          |
| TimeElapsed     | 1.05e+03      |
| TimestepsSoFar  | 925696        |
| ev_tdlam_before | 0.682         |
| loss_ent        | 0.70891327    |
| loss_kl         | 0.0053060604  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00062218774 |
| loss_vf_loss    | 0.01003488    |
-----------------------------------
********** Iteration 226 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00274 |       0.00000 |  

********** Iteration 231 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00181 |       0.00000 |       0.01914 |       0.00374 |       0.71025
      0.00179 |       0.00000 |       0.01824 |       0.00413 |       0.70908
      0.00059 |       0.00000 |       0.01748 |       0.00473 |       0.70735
     -0.00060 |       0.00000 |       0.01728 |       0.00465 |       0.70792
     -0.00052 |       0.00000 |       0.01729 |       0.00535 |       0.70697
     -0.00173 |       0.00000 |       0.01707 |       0.00494 |       0.70516
     -0.00294 |       0.00000 |       0.01664 |       0.00528 |       0.70644
     -0.00308 |       0.00000 |       0.01621 |       0.00554 |       0.70520
     -0.00386 |       0.00000 |       0.01650 |       0.00594 |       0.70352
     -0.00376 |       0.00000 |       0.01598 |       0.00615 |       0.70432
Evaluating losses...
     -0.00376 |       0.00000 |       0.01591 |       0.00580 |      

     -0.00087 |       0.00000 |       0.01353 |       0.00534 |       0.68665
     -0.00122 |       0.00000 |       0.01346 |       0.00512 |       0.68521
     -0.00482 |       0.00000 |       0.01335 |       0.00516 |       0.68691
Evaluating losses...
     -0.00063 |       0.00000 |       0.01314 |       0.00492 |       0.68550
-----------------------------------
| EpLenMean       | 1.12e+03      |
| EpRewMean       | -4.8          |
| EpThisIter      | 3             |
| EpisodesSoFar   | 1303          |
| TimeElapsed     | 1.1e+03       |
| TimestepsSoFar  | 970752        |
| ev_tdlam_before | 0.689         |
| loss_ent        | 0.6854957     |
| loss_kl         | 0.0049170665  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0006293794 |
| loss_vf_loss    | 0.013139513   |
-----------------------------------
********** Iteration 237 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00165 |       0.00000 |  

********** Iteration 242 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00153 |       0.00000 |       0.01952 |       0.00401 |       0.68263
      0.00325 |       0.00000 |       0.01724 |       0.00437 |       0.68570
      0.00057 |       0.00000 |       0.01619 |       0.00475 |       0.68601
      0.00042 |       0.00000 |       0.01558 |       0.00454 |       0.68435
      0.00187 |       0.00000 |       0.01521 |       0.00465 |       0.68452
     -0.00105 |       0.00000 |       0.01500 |       0.00481 |       0.68588
     -0.00179 |       0.00000 |       0.01498 |       0.00522 |       0.68519
     -0.00422 |       0.00000 |       0.01455 |       0.00522 |       0.68479
     -0.00382 |       0.00000 |       0.01421 |       0.00551 |       0.68556
     -0.00128 |       0.00000 |       0.01380 |       0.00581 |       0.68765
Evaluating losses...
     -0.00457 |       0.00000 |       0.01364 |       0.00599 |      

     -0.00269 |       0.00000 |       0.01355 |       0.00503 |       0.68650
     -0.00204 |       0.00000 |       0.01356 |       0.00476 |       0.68633
      0.00058 |       0.00000 |       0.01315 |       0.00528 |       0.68487
     -0.00075 |       0.00000 |       0.01326 |       0.00514 |       0.68566
Evaluating losses...
     -0.00346 |       0.00000 |       0.01302 |       0.00517 |       0.68601
-----------------------------------
| EpLenMean       | 1.23e+03      |
| EpRewMean       | -4.74         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1335          |
| TimeElapsed     | 1.15e+03      |
| TimestepsSoFar  | 1015808       |
| ev_tdlam_before | 0.614         |
| loss_ent        | 0.6860084     |
| loss_kl         | 0.005166316   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0034551737 |
| loss_vf_loss    | 0.013019777   |
-----------------------------------
********** Iteration 248 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 253 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00593 |       0.00000 |       0.01813 |       0.00362 |       0.62333
      0.00169 |       0.00000 |       0.01633 |       0.00428 |       0.62411
     -0.00010 |       0.00000 |       0.01567 |       0.00415 |       0.62595
      0.00072 |       0.00000 |       0.01539 |       0.00452 |       0.62525
      0.00153 |       0.00000 |       0.01527 |       0.00444 |       0.62504
     -0.00171 |       0.00000 |       0.01488 |       0.00444 |       0.62629
     -0.00265 |       0.00000 |       0.01454 |       0.00442 |       0.62623
     -0.00084 |       0.00000 |       0.01461 |       0.00486 |       0.62638
     -0.00205 |       0.00000 |       0.01406 |       0.00483 |       0.62564
    -4.31e-05 |       0.00000 |       0.01425 |       0.00498 |       0.62601
Evaluating losses...
     -0.00174 |       0.00000 |       0.01382 |       0.00478 |      

     -0.00210 |       0.00000 |       0.01324 |       0.00541 |       0.67859
     -0.00113 |       0.00000 |       0.01302 |       0.00552 |       0.67789
     -0.00184 |       0.00000 |       0.01298 |       0.00594 |       0.67930
Evaluating losses...
     -0.00332 |       0.00000 |       0.01294 |       0.00624 |       0.67924
-----------------------------------
| EpLenMean       | 1.36e+03      |
| EpRewMean       | -4.69         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1361          |
| TimeElapsed     | 1.2e+03       |
| TimestepsSoFar  | 1060864       |
| ev_tdlam_before | 0.56          |
| loss_ent        | 0.6792375     |
| loss_kl         | 0.006240464   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0033193477 |
| loss_vf_loss    | 0.012941858   |
-----------------------------------
********** Iteration 259 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00226 |       0.00000 |  

********** Iteration 264 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00265 |       0.00000 |       0.01121 |       0.00435 |       0.76955
      0.00236 |       0.00000 |       0.01053 |       0.00448 |       0.76778
      0.00195 |       0.00000 |       0.01004 |       0.00508 |       0.76533
      0.00221 |       0.00000 |       0.00979 |       0.00514 |       0.76564
     -0.00074 |       0.00000 |       0.00964 |       0.00550 |       0.76681
     -0.00062 |       0.00000 |       0.00955 |       0.00591 |       0.76558
      0.00138 |       0.00000 |       0.00941 |       0.00567 |       0.76469
     -0.00208 |       0.00000 |       0.00930 |       0.00595 |       0.76516
     -0.00146 |       0.00000 |       0.00908 |       0.00675 |       0.76495
     -0.00220 |       0.00000 |       0.00868 |       0.00600 |       0.76615
Evaluating losses...
     -0.00288 |       0.00000 |       0.00888 |       0.00608 |      

     -0.00070 |       0.00000 |       0.00596 |       0.00549 |       0.74584
     -0.00025 |       0.00000 |       0.00601 |       0.00545 |       0.74610
     -0.00160 |       0.00000 |       0.00601 |       0.00569 |       0.74721
Evaluating losses...
     -0.00327 |       0.00000 |       0.00590 |       0.00574 |       0.74663
-----------------------------------
| EpLenMean       | 1.58e+03      |
| EpRewMean       | -4.72         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1382          |
| TimeElapsed     | 1.25e+03      |
| TimestepsSoFar  | 1105920       |
| ev_tdlam_before | 0.572         |
| loss_ent        | 0.7466275     |
| loss_kl         | 0.0057401448  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0032660395 |
| loss_vf_loss    | 0.0059023863  |
-----------------------------------
********** Iteration 270 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00114 |       0.00000 |  

********** Iteration 275 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00698 |       0.00000 |       0.01509 |       0.00462 |       0.76536
      0.00106 |       0.00000 |       0.01311 |       0.00481 |       0.76393
     -0.00105 |       0.00000 |       0.01210 |       0.00494 |       0.76328
      0.00073 |       0.00000 |       0.01167 |       0.00523 |       0.76295
     -0.00034 |       0.00000 |       0.01128 |       0.00523 |       0.76042
     -0.00121 |       0.00000 |       0.01119 |       0.00560 |       0.76036
     -0.00243 |       0.00000 |       0.01075 |       0.00624 |       0.75949
     -0.00359 |       0.00000 |       0.01058 |       0.00606 |       0.75905
     -0.00062 |       0.00000 |       0.01052 |       0.00590 |       0.75942
     -0.00108 |       0.00000 |       0.01032 |       0.00638 |       0.76040
Evaluating losses...
     -0.00254 |       0.00000 |       0.01013 |       0.00661 |      

     -0.00323 |       0.00000 |       0.01045 |       0.00468 |       0.57131
     -0.00374 |       0.00000 |       0.01031 |       0.00471 |       0.57120
     -0.00393 |       0.00000 |       0.01013 |       0.00478 |       0.57091
Evaluating losses...
     -0.00449 |       0.00000 |       0.00988 |       0.00486 |       0.57092
-----------------------------------
| EpLenMean       | 1.81e+03      |
| EpRewMean       | -4.65         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1402          |
| TimeElapsed     | 1.3e+03       |
| TimestepsSoFar  | 1150976       |
| ev_tdlam_before | 0.588         |
| loss_ent        | 0.57092476    |
| loss_kl         | 0.004855207   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0044915415 |
| loss_vf_loss    | 0.00987622    |
-----------------------------------
********** Iteration 281 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00221 |       0.00000 |  

********** Iteration 286 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00037 |       0.00000 |       0.01009 |       0.00419 |       0.70031
     -0.00046 |       0.00000 |       0.00954 |       0.00651 |       0.70080
     -0.00081 |       0.00000 |       0.00928 |       0.00556 |       0.69699
     -0.00202 |       0.00000 |       0.00914 |       0.00665 |       0.70006
     -0.00025 |       0.00000 |       0.00889 |       0.00705 |       0.70076
     -0.00125 |       0.00000 |       0.00878 |       0.00621 |       0.70041
     -0.00205 |       0.00000 |       0.00832 |       0.00711 |       0.70152
     -0.00079 |       0.00000 |       0.00859 |       0.00637 |       0.69987
     -0.00279 |       0.00000 |       0.00839 |       0.00719 |       0.69948
     -0.00546 |       0.00000 |       0.00837 |       0.00702 |       0.69946
Evaluating losses...
     -0.00194 |       0.00000 |       0.00809 |       0.00754 |      

     -0.00415 |       0.00000 |       0.00994 |       0.00519 |       0.68941
     -0.00289 |       0.00000 |       0.00994 |       0.00536 |       0.69043
     -0.00190 |       0.00000 |       0.00977 |       0.00540 |       0.69302
Evaluating losses...
     -0.00437 |       0.00000 |       0.00950 |       0.00574 |       0.69175
-----------------------------------
| EpLenMean       | 2.03e+03      |
| EpRewMean       | -4.53         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1420          |
| TimeElapsed     | 1.35e+03      |
| TimestepsSoFar  | 1196032       |
| ev_tdlam_before | 0.569         |
| loss_ent        | 0.6917506     |
| loss_kl         | 0.005740494   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0043706265 |
| loss_vf_loss    | 0.009500708   |
-----------------------------------
********** Iteration 292 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00397 |       0.00000 |  

********** Iteration 297 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00283 |       0.00000 |       0.00786 |       0.00377 |       0.64197
     -0.00044 |       0.00000 |       0.00712 |       0.00442 |       0.64403
     -0.00044 |       0.00000 |       0.00680 |       0.00454 |       0.64770
     -0.00156 |       0.00000 |       0.00644 |       0.00446 |       0.64649
     -0.00188 |       0.00000 |       0.00610 |       0.00492 |       0.64600
     -0.00323 |       0.00000 |       0.00611 |       0.00524 |       0.64879
     -0.00257 |       0.00000 |       0.00595 |       0.00550 |       0.64865
     -0.00410 |       0.00000 |       0.00581 |       0.00580 |       0.64669
     -0.00252 |       0.00000 |       0.00580 |       0.00583 |       0.64681
     -0.00252 |       0.00000 |       0.00584 |       0.00603 |       0.64930
Evaluating losses...
     -0.00328 |       0.00000 |       0.00566 |       0.00576 |      

     -0.00429 |       0.00000 |       0.00850 |       0.00570 |       0.70651
     -0.00433 |       0.00000 |       0.00833 |       0.00601 |       0.70385
     -0.00463 |       0.00000 |       0.00857 |       0.00619 |       0.70363
Evaluating losses...
     -0.00438 |       0.00000 |       0.00820 |       0.00593 |       0.70482
-----------------------------------
| EpLenMean       | 2.19e+03      |
| EpRewMean       | -4.43         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1438          |
| TimeElapsed     | 1.4e+03       |
| TimestepsSoFar  | 1241088       |
| ev_tdlam_before | 0.603         |
| loss_ent        | 0.70482165    |
| loss_kl         | 0.0059267893  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0043801577 |
| loss_vf_loss    | 0.008204806   |
-----------------------------------
********** Iteration 303 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00357 |       0.00000 |  

********** Iteration 308 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00083 |       0.00000 |       0.00873 |       0.00579 |       0.80564
     -0.00178 |       0.00000 |       0.00800 |       0.00890 |       0.81021
     -0.00286 |       0.00000 |       0.00766 |       0.00864 |       0.80923
     -0.00161 |       0.00000 |       0.00748 |       0.00860 |       0.80960
     -0.00215 |       0.00000 |       0.00738 |       0.00777 |       0.81117
     -0.00481 |       0.00000 |       0.00713 |       0.00883 |       0.81279
     -0.00426 |       0.00000 |       0.00695 |       0.00807 |       0.81402
     -0.00549 |       0.00000 |       0.00693 |       0.00891 |       0.81455
     -0.00492 |       0.00000 |       0.00681 |       0.00889 |       0.81439
     -0.00441 |       0.00000 |       0.00679 |       0.01056 |       0.81496
Evaluating losses...
     -0.00794 |       0.00000 |       0.00664 |       0.00960 |      

     -0.00466 |       0.00000 |       0.00771 |       0.00673 |       0.71161
     -0.00301 |       0.00000 |       0.00747 |       0.00654 |       0.71048
     -0.00418 |       0.00000 |       0.00750 |       0.00597 |       0.71248
Evaluating losses...
     -0.00379 |       0.00000 |       0.00731 |       0.00637 |       0.71154
----------------------------------
| EpLenMean       | 2.4e+03      |
| EpRewMean       | -4.22        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 1455         |
| TimeElapsed     | 1.46e+03     |
| TimestepsSoFar  | 1286144      |
| ev_tdlam_before | 0.635        |
| loss_ent        | 0.71153754   |
| loss_kl         | 0.0063732103 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003789871 |
| loss_vf_loss    | 0.007313012  |
----------------------------------
********** Iteration 314 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00147 |       0.00000 |       0.00436 |

********** Iteration 319 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00204 |       0.00000 |       0.00904 |       0.00387 |       0.70272
      0.00228 |       0.00000 |       0.00836 |       0.00421 |       0.69842
     -0.00029 |       0.00000 |       0.00793 |       0.00521 |       0.69412
     -0.00038 |       0.00000 |       0.00790 |       0.00552 |       0.69225
     -0.00160 |       0.00000 |       0.00788 |       0.00568 |       0.69263
     -0.00256 |       0.00000 |       0.00756 |       0.00520 |       0.69615
     -0.00265 |       0.00000 |       0.00750 |       0.00487 |       0.69765
     -0.00193 |       0.00000 |       0.00743 |       0.00543 |       0.69689
     -0.00325 |       0.00000 |       0.00734 |       0.00595 |       0.69525
     -0.00225 |       0.00000 |       0.00735 |       0.00621 |       0.69298
Evaluating losses...
     -0.00404 |       0.00000 |       0.00712 |       0.00570 |      

     -0.00221 |       0.00000 |       0.00788 |       0.00586 |       0.69557
     -0.00346 |       0.00000 |       0.00780 |       0.00640 |       0.69523
     -0.00270 |       0.00000 |       0.00755 |       0.00632 |       0.69418
Evaluating losses...
     -0.00272 |       0.00000 |       0.00753 |       0.00631 |       0.69390
-----------------------------------
| EpLenMean       | 2.51e+03      |
| EpRewMean       | -4.04         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1470          |
| TimeElapsed     | 1.51e+03      |
| TimestepsSoFar  | 1331200       |
| ev_tdlam_before | 0.52          |
| loss_ent        | 0.69389576    |
| loss_kl         | 0.006309024   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0027211246 |
| loss_vf_loss    | 0.0075253234  |
-----------------------------------
********** Iteration 325 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00401 |       0.00000 |  

********** Iteration 330 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00183 |       0.00000 |       0.00830 |       0.00409 |       0.72738
     -0.00075 |       0.00000 |       0.00799 |       0.00486 |       0.72369
    -9.60e-05 |       0.00000 |       0.00766 |       0.00578 |       0.72093
     -0.00213 |       0.00000 |       0.00736 |       0.00556 |       0.72014
     -0.00176 |       0.00000 |       0.00718 |       0.00643 |       0.71885
     -0.00394 |       0.00000 |       0.00709 |       0.00606 |       0.71930
     -0.00336 |       0.00000 |       0.00682 |       0.00555 |       0.71864
     -0.00293 |       0.00000 |       0.00682 |       0.00605 |       0.71682
     -0.00351 |       0.00000 |       0.00669 |       0.00580 |       0.71755
     -0.00433 |       0.00000 |       0.00634 |       0.00635 |       0.71743
Evaluating losses...
     -0.00306 |       0.00000 |       0.00631 |       0.00636 |      

     -0.00241 |       0.00000 |       0.00520 |       0.00646 |       0.65363
     -0.00370 |       0.00000 |       0.00525 |       0.00646 |       0.65316
     -0.00397 |       0.00000 |       0.00504 |       0.00631 |       0.65307
Evaluating losses...
     -0.00392 |       0.00000 |       0.00491 |       0.00634 |       0.65123
-----------------------------------
| EpLenMean       | 2.6e+03       |
| EpRewMean       | -3.83         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1486          |
| TimeElapsed     | 1.56e+03      |
| TimestepsSoFar  | 1376256       |
| ev_tdlam_before | 0.371         |
| loss_ent        | 0.6512323     |
| loss_kl         | 0.006341909   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0039226357 |
| loss_vf_loss    | 0.0049059493  |
-----------------------------------
********** Iteration 336 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00188 |       0.00000 |  

********** Iteration 341 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00063 |       0.00000 |       0.01371 |       0.00388 |       0.65956
      0.00144 |       0.00000 |       0.01230 |       0.00405 |       0.66045
     -0.00094 |       0.00000 |       0.01169 |       0.00432 |       0.65853
     -0.00060 |       0.00000 |       0.01122 |       0.00455 |       0.65805
     -0.00106 |       0.00000 |       0.01078 |       0.00472 |       0.65707
     -0.00275 |       0.00000 |       0.01075 |       0.00477 |       0.65847
     -0.00196 |       0.00000 |       0.01031 |       0.00516 |       0.65856
     -0.00333 |       0.00000 |       0.01047 |       0.00495 |       0.65982
     -0.00284 |       0.00000 |       0.01003 |       0.00506 |       0.65956
     -0.00196 |       0.00000 |       0.01020 |       0.00547 |       0.65929
Evaluating losses...
     -0.00294 |       0.00000 |       0.00989 |       0.00528 |      

      0.00016 |       0.00000 |       0.00206 |       0.00507 |       0.66132
     -0.00189 |       0.00000 |       0.00196 |       0.00539 |       0.66206
     -0.00069 |       0.00000 |       0.00193 |       0.00603 |       0.66357
Evaluating losses...
     -0.00364 |       0.00000 |       0.00183 |       0.00578 |       0.66329
-----------------------------------
| EpLenMean       | 2.7e+03       |
| EpRewMean       | -3.47         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1502          |
| TimeElapsed     | 1.61e+03      |
| TimestepsSoFar  | 1421312       |
| ev_tdlam_before | 0.454         |
| loss_ent        | 0.66328925    |
| loss_kl         | 0.00577654    |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0036399923 |
| loss_vf_loss    | 0.0018276573  |
-----------------------------------
********** Iteration 347 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00228 |       0.00000 |  

********** Iteration 352 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00071 |       0.00000 |       0.00499 |       0.00502 |       0.73401
     -0.00141 |       0.00000 |       0.00436 |       0.00654 |       0.73674
     -0.00179 |       0.00000 |       0.00388 |       0.00647 |       0.73727
     -0.00358 |       0.00000 |       0.00370 |       0.00563 |       0.73628
     -0.00165 |       0.00000 |       0.00344 |       0.00698 |       0.73671
     -0.00420 |       0.00000 |       0.00343 |       0.00630 |       0.73743
     -0.00439 |       0.00000 |       0.00333 |       0.00648 |       0.73506
     -0.00614 |       0.00000 |       0.00310 |       0.00621 |       0.73431
     -0.00385 |       0.00000 |       0.00315 |       0.00718 |       0.73377
     -0.00437 |       0.00000 |       0.00304 |       0.00652 |       0.73461
Evaluating losses...
     -0.00511 |       0.00000 |       0.00305 |       0.00610 |      

     -0.00440 |       0.00000 |       0.00882 |       0.00615 |       0.69630
     -0.00492 |       0.00000 |       0.00886 |       0.00632 |       0.69507
     -0.00384 |       0.00000 |       0.00866 |       0.00733 |       0.69742
Evaluating losses...
     -0.00403 |       0.00000 |       0.00844 |       0.00682 |       0.69769
-----------------------------------
| EpLenMean       | 2.77e+03      |
| EpRewMean       | -3.18         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1518          |
| TimeElapsed     | 1.66e+03      |
| TimestepsSoFar  | 1466368       |
| ev_tdlam_before | 0.47          |
| loss_ent        | 0.69769496    |
| loss_kl         | 0.006817547   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0040291543 |
| loss_vf_loss    | 0.008435046   |
-----------------------------------
********** Iteration 358 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00029 |       0.00000 |  

********** Iteration 363 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00180 |       0.00000 |       0.00300 |       0.00369 |       0.59093
     -0.00166 |       0.00000 |       0.00267 |       0.00476 |       0.59319
     -0.00084 |       0.00000 |       0.00257 |       0.00523 |       0.59416
      0.00071 |       0.00000 |       0.00247 |       0.00647 |       0.59436
     -0.00183 |       0.00000 |       0.00238 |       0.00562 |       0.59675
     -0.00154 |       0.00000 |       0.00226 |       0.00614 |       0.59619
     -0.00402 |       0.00000 |       0.00228 |       0.00595 |       0.59815
     -0.00425 |       0.00000 |       0.00216 |       0.00662 |       0.60029
     -0.00428 |       0.00000 |       0.00211 |       0.00586 |       0.60023
     -0.00322 |       0.00000 |       0.00212 |       0.00670 |       0.60184
Evaluating losses...
     -0.00470 |       0.00000 |       0.00199 |       0.00659 |      

********** Iteration 374 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.00908 |       0.00308 |       0.61278
     -0.00141 |       0.00000 |       0.00829 |       0.00390 |       0.61340
     -0.00200 |       0.00000 |       0.00794 |       0.00387 |       0.61183
     -0.00331 |       0.00000 |       0.00765 |       0.00467 |       0.61173
     -0.00263 |       0.00000 |       0.00740 |       0.00427 |       0.61018
     -0.00329 |       0.00000 |       0.00729 |       0.00431 |       0.60956
     -0.00439 |       0.00000 |       0.00704 |       0.00479 |       0.60956
     -0.00378 |       0.00000 |       0.00689 |       0.00478 |       0.60899
     -0.00368 |       0.00000 |       0.00682 |       0.00476 |       0.60899
     -0.00482 |       0.00000 |       0.00679 |       0.00483 |       0.61036
Evaluating losses...
     -0.00352 |       0.00000 |       0.00661 |       0.00473 |      

     -0.00376 |       0.00000 |       0.00540 |       0.00489 |       0.63550
     -0.00237 |       0.00000 |       0.00510 |       0.00524 |       0.63707
     -0.00268 |       0.00000 |       0.00518 |       0.00505 |       0.63635
Evaluating losses...
     -0.00284 |       0.00000 |       0.00495 |       0.00577 |       0.63636
-----------------------------------
| EpLenMean       | 2.9e+03       |
| EpRewMean       | -2.67         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1547          |
| TimeElapsed     | 1.77e+03      |
| TimestepsSoFar  | 1556480       |
| ev_tdlam_before | 0.576         |
| loss_ent        | 0.6363559     |
| loss_kl         | 0.005771026   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0028383941 |
| loss_vf_loss    | 0.0049519297  |
-----------------------------------
********** Iteration 380 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00074 |       0.00000 |  

********** Iteration 385 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00172 |       0.00000 |       0.00633 |       0.00336 |       0.67554
      0.00057 |       0.00000 |       0.00551 |       0.00342 |       0.67725
     -0.00099 |       0.00000 |       0.00513 |       0.00344 |       0.67797
      0.00040 |       0.00000 |       0.00475 |       0.00370 |       0.67872
     -0.00134 |       0.00000 |       0.00432 |       0.00377 |       0.67884
     -0.00066 |       0.00000 |       0.00430 |       0.00389 |       0.67930
     -0.00088 |       0.00000 |       0.00416 |       0.00415 |       0.67974
     -0.00020 |       0.00000 |       0.00399 |       0.00438 |       0.67846
     -0.00154 |       0.00000 |       0.00382 |       0.00454 |       0.67890
     -0.00184 |       0.00000 |       0.00383 |       0.00472 |       0.67795
Evaluating losses...
     -0.00327 |       0.00000 |       0.00374 |       0.00433 |      

     -0.00162 |       0.00000 |       0.00385 |       0.00388 |       0.62222
     -0.00251 |       0.00000 |       0.00390 |       0.00403 |       0.62151
     -0.00423 |       0.00000 |       0.00372 |       0.00398 |       0.62203
Evaluating losses...
     -0.00339 |       0.00000 |       0.00360 |       0.00397 |       0.62227
-----------------------------------
| EpLenMean       | 2.9e+03       |
| EpRewMean       | -2.44         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1564          |
| TimeElapsed     | 1.82e+03      |
| TimestepsSoFar  | 1601536       |
| ev_tdlam_before | 0.503         |
| loss_ent        | 0.62227196    |
| loss_kl         | 0.0039655548  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0033927737 |
| loss_vf_loss    | 0.0036014947  |
-----------------------------------
********** Iteration 391 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00237 |       0.00000 |  

********** Iteration 396 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00087 |       0.00000 |       0.00568 |       0.00358 |       0.71818
      0.00023 |       0.00000 |       0.00510 |       0.00439 |       0.71579
     -0.00243 |       0.00000 |       0.00480 |       0.00521 |       0.71393
      0.00022 |       0.00000 |       0.00460 |       0.00465 |       0.71553
     -0.00251 |       0.00000 |       0.00443 |       0.00568 |       0.71462
     -0.00203 |       0.00000 |       0.00427 |       0.00528 |       0.71550
     -0.00419 |       0.00000 |       0.00426 |       0.00497 |       0.71597
     -0.00384 |       0.00000 |       0.00409 |       0.00567 |       0.71616
     -0.00327 |       0.00000 |       0.00406 |       0.00497 |       0.71690
     -0.00449 |       0.00000 |       0.00412 |       0.00540 |       0.71659
Evaluating losses...
     -0.00290 |       0.00000 |       0.00396 |       0.00549 |      

     -0.00220 |       0.00000 |       0.00728 |       0.00440 |       0.63744
     -0.00212 |       0.00000 |       0.00695 |       0.00444 |       0.63997
     -0.00251 |       0.00000 |       0.00681 |       0.00470 |       0.64091
Evaluating losses...
     -0.00215 |       0.00000 |       0.00672 |       0.00474 |       0.64037
-----------------------------------
| EpLenMean       | 2.92e+03      |
| EpRewMean       | -2.06         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1579          |
| TimeElapsed     | 1.87e+03      |
| TimestepsSoFar  | 1646592       |
| ev_tdlam_before | 0.374         |
| loss_ent        | 0.6403723     |
| loss_kl         | 0.004744348   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0021471214 |
| loss_vf_loss    | 0.00671939    |
-----------------------------------
********** Iteration 402 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00101 |       0.00000 |  

********** Iteration 407 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00013 |       0.00000 |       0.00839 |       0.00309 |       0.64979
     -0.00202 |       0.00000 |       0.00724 |       0.00394 |       0.64702
     -0.00119 |       0.00000 |       0.00690 |       0.00432 |       0.64710
     -0.00340 |       0.00000 |       0.00650 |       0.00450 |       0.64684
     -0.00304 |       0.00000 |       0.00627 |       0.00517 |       0.64638
     -0.00355 |       0.00000 |       0.00611 |       0.00528 |       0.64726
     -0.00382 |       0.00000 |       0.00605 |       0.00466 |       0.64852
     -0.00410 |       0.00000 |       0.00592 |       0.00489 |       0.64795
     -0.00351 |       0.00000 |       0.00561 |       0.00498 |       0.64948
     -0.00524 |       0.00000 |       0.00564 |       0.00522 |       0.64875
Evaluating losses...
     -0.00480 |       0.00000 |       0.00558 |       0.00500 |      

     -0.00256 |       0.00000 |       0.00297 |       0.00544 |       0.67221
     -0.00217 |       0.00000 |       0.00291 |       0.00563 |       0.67391
     -0.00143 |       0.00000 |       0.00275 |       0.00605 |       0.67283
Evaluating losses...
     -0.00328 |       0.00000 |       0.00294 |       0.00517 |       0.67133
-----------------------------------
| EpLenMean       | 2.94e+03      |
| EpRewMean       | -1.98         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1594          |
| TimeElapsed     | 1.92e+03      |
| TimestepsSoFar  | 1691648       |
| ev_tdlam_before | 0.704         |
| loss_ent        | 0.6713252     |
| loss_kl         | 0.0051680794  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0032840907 |
| loss_vf_loss    | 0.002944162   |
-----------------------------------
********** Iteration 413 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -6.84e-05 |       0.00000 |  

********** Iteration 418 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00136 |       0.00000 |       0.01223 |       0.00323 |       0.64064
     -0.00110 |       0.00000 |       0.01129 |       0.00424 |       0.63949
      0.00037 |       0.00000 |       0.01039 |       0.00489 |       0.63765
     -0.00053 |       0.00000 |       0.00996 |       0.00545 |       0.63585
     -0.00378 |       0.00000 |       0.00970 |       0.00464 |       0.63503
     -0.00134 |       0.00000 |       0.00959 |       0.00547 |       0.63483
     -0.00418 |       0.00000 |       0.00925 |       0.00512 |       0.63480
     -0.00376 |       0.00000 |       0.00894 |       0.00551 |       0.63467
     -0.00446 |       0.00000 |       0.00893 |       0.00556 |       0.63512
     -0.00502 |       0.00000 |       0.00891 |       0.00507 |       0.63489
Evaluating losses...
     -0.00420 |       0.00000 |       0.00857 |       0.00568 |      

     -0.00369 |       0.00000 |       0.00727 |       0.00394 |       0.56854
     -0.00215 |       0.00000 |       0.00708 |       0.00391 |       0.56742
     -0.00344 |       0.00000 |       0.00692 |       0.00435 |       0.56825
Evaluating losses...
     -0.00349 |       0.00000 |       0.00677 |       0.00487 |       0.56832
-----------------------------------
| EpLenMean       | 2.97e+03      |
| EpRewMean       | -1.91         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1609          |
| TimeElapsed     | 1.97e+03      |
| TimestepsSoFar  | 1736704       |
| ev_tdlam_before | 0.553         |
| loss_ent        | 0.56832266    |
| loss_kl         | 0.004872963   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0034860787 |
| loss_vf_loss    | 0.006771838   |
-----------------------------------
********** Iteration 424 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00106 |       0.00000 |  

********** Iteration 429 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00244 |       0.00000 |       0.00394 |       0.00292 |       0.61585
      0.00142 |       0.00000 |       0.00365 |       0.00427 |       0.61254
     -0.00198 |       0.00000 |       0.00335 |       0.00353 |       0.61287
     -0.00148 |       0.00000 |       0.00318 |       0.00351 |       0.61230
     -0.00169 |       0.00000 |       0.00311 |       0.00369 |       0.61358
     -0.00134 |       0.00000 |       0.00302 |       0.00397 |       0.61281
     -0.00148 |       0.00000 |       0.00298 |       0.00427 |       0.61097
     -0.00243 |       0.00000 |       0.00295 |       0.00417 |       0.61271
     -0.00189 |       0.00000 |       0.00292 |       0.00429 |       0.61191
     -0.00245 |       0.00000 |       0.00281 |       0.00424 |       0.61116
Evaluating losses...
     -0.00239 |       0.00000 |       0.00268 |       0.00407 |      

     -0.00157 |       0.00000 |       0.00595 |       0.00402 |       0.62337
     -0.00278 |       0.00000 |       0.00576 |       0.00460 |       0.62386
     -0.00314 |       0.00000 |       0.00568 |       0.00479 |       0.62223
Evaluating losses...
     -0.00205 |       0.00000 |       0.00547 |       0.00520 |       0.62258
-----------------------------------
| EpLenMean       | 2.99e+03      |
| EpRewMean       | -1.9          |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1623          |
| TimeElapsed     | 2.03e+03      |
| TimestepsSoFar  | 1781760       |
| ev_tdlam_before | 0.439         |
| loss_ent        | 0.622576      |
| loss_kl         | 0.0052015954  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0020463641 |
| loss_vf_loss    | 0.005473883   |
-----------------------------------
********** Iteration 435 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00152 |       0.00000 |  

********** Iteration 440 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00293 |       0.00000 |       0.00126 |       0.00291 |       0.64375
      0.00118 |       0.00000 |       0.00087 |       0.00312 |       0.64211
      0.00018 |       0.00000 |       0.00074 |       0.00357 |       0.64205
     -0.00124 |       0.00000 |       0.00065 |       0.00352 |       0.64090
     -0.00011 |       0.00000 |       0.00059 |       0.00387 |       0.64142
     -0.00152 |       0.00000 |       0.00057 |       0.00430 |       0.64002
     -0.00176 |       0.00000 |       0.00052 |       0.00406 |       0.64110
     -0.00150 |       0.00000 |       0.00050 |       0.00432 |       0.64188
     -0.00120 |       0.00000 |       0.00050 |       0.00424 |       0.64150
     -0.00258 |       0.00000 |       0.00045 |       0.00489 |       0.64103
Evaluating losses...
     -0.00323 |       0.00000 |       0.00044 |       0.00452 |      

     -0.00261 |       0.00000 |       0.00199 |       0.00396 |       0.69051
     -0.00306 |       0.00000 |       0.00192 |       0.00407 |       0.69086
     -0.00236 |       0.00000 |       0.00191 |       0.00447 |       0.69089
Evaluating losses...
     -0.00296 |       0.00000 |       0.00188 |       0.00437 |       0.69035
-----------------------------------
| EpLenMean       | 2.98e+03      |
| EpRewMean       | -1.78         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1638          |
| TimeElapsed     | 2.08e+03      |
| TimestepsSoFar  | 1826816       |
| ev_tdlam_before | 0.527         |
| loss_ent        | 0.6903462     |
| loss_kl         | 0.0043710554  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0029582893 |
| loss_vf_loss    | 0.0018790159  |
-----------------------------------
********** Iteration 446 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00108 |       0.00000 |  

********** Iteration 451 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00110 |       0.00000 |       0.00344 |       0.00344 |       0.73869
      0.00090 |       0.00000 |       0.00289 |       0.00347 |       0.73836
     -0.00171 |       0.00000 |       0.00269 |       0.00363 |       0.73785
    -7.36e-05 |       0.00000 |       0.00242 |       0.00397 |       0.73944
     -0.00216 |       0.00000 |       0.00238 |       0.00413 |       0.73818
     -0.00203 |       0.00000 |       0.00221 |       0.00428 |       0.73734
     -0.00212 |       0.00000 |       0.00226 |       0.00430 |       0.73812
     -0.00215 |       0.00000 |       0.00220 |       0.00450 |       0.73860
     -0.00244 |       0.00000 |       0.00209 |       0.00420 |       0.73786
     -0.00144 |       0.00000 |       0.00204 |       0.00452 |       0.73764
Evaluating losses...
     -0.00345 |       0.00000 |       0.00196 |       0.00466 |      

     -0.00187 |       0.00000 |       0.00193 |       0.00347 |       0.68274
     -0.00249 |       0.00000 |       0.00195 |       0.00358 |       0.68242
     -0.00142 |       0.00000 |       0.00207 |       0.00343 |       0.68314
Evaluating losses...
     -0.00216 |       0.00000 |       0.00206 |       0.00383 |       0.68347
----------------------------------
| EpLenMean       | 2.98e+03     |
| EpRewMean       | -1.72        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 1653         |
| TimeElapsed     | 2.13e+03     |
| TimestepsSoFar  | 1871872      |
| ev_tdlam_before | 0.522        |
| loss_ent        | 0.6834721    |
| loss_kl         | 0.0038298168 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002164602 |
| loss_vf_loss    | 0.0020609398 |
----------------------------------
********** Iteration 457 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00429 |       0.00000 |       0.00502 |

********** Iteration 462 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00099 |       0.00000 |       0.00462 |       0.00273 |       0.63578
      0.00336 |       0.00000 |       0.00403 |       0.00319 |       0.63491
      0.00022 |       0.00000 |       0.00380 |       0.00585 |       0.63075
     -0.00058 |       0.00000 |       0.00352 |       0.00396 |       0.63045
     -0.00080 |       0.00000 |       0.00351 |       0.00400 |       0.63165
     -0.00096 |       0.00000 |       0.00347 |       0.00352 |       0.63213
     -0.00211 |       0.00000 |       0.00323 |       0.00408 |       0.63127
     -0.00245 |       0.00000 |       0.00319 |       0.00414 |       0.63059
     -0.00145 |       0.00000 |       0.00328 |       0.00413 |       0.63100
     -0.00234 |       0.00000 |       0.00328 |       0.00412 |       0.63106
Evaluating losses...
     -0.00216 |       0.00000 |       0.00317 |       0.00477 |      

     -0.00236 |       0.00000 |       0.00191 |       0.00403 |       0.66369
     -0.00311 |       0.00000 |       0.00190 |       0.00346 |       0.66407
     -0.00216 |       0.00000 |       0.00183 |       0.00361 |       0.66512
Evaluating losses...
     -0.00282 |       0.00000 |       0.00180 |       0.00370 |       0.66471
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -1.54        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 1668         |
| TimeElapsed     | 2.18e+03     |
| TimestepsSoFar  | 1916928      |
| ev_tdlam_before | 0.0591       |
| loss_ent        | 0.6647096    |
| loss_kl         | 0.003699225  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002820317 |
| loss_vf_loss    | 0.001798268  |
----------------------------------
********** Iteration 468 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00072 |       0.00000 |       0.00540 |

********** Iteration 473 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00148 |       0.00000 |       0.00671 |       0.00229 |       0.55363
     -0.00025 |       0.00000 |       0.00571 |       0.00304 |       0.55517
     -0.00182 |       0.00000 |       0.00549 |       0.00326 |       0.55490
     -0.00133 |       0.00000 |       0.00518 |       0.00318 |       0.55327
     -0.00111 |       0.00000 |       0.00501 |       0.00367 |       0.55393
     -0.00245 |       0.00000 |       0.00481 |       0.00353 |       0.55393
     -0.00233 |       0.00000 |       0.00468 |       0.00387 |       0.55290
     -0.00253 |       0.00000 |       0.00447 |       0.00368 |       0.55402
     -0.00318 |       0.00000 |       0.00445 |       0.00392 |       0.55368
     -0.00326 |       0.00000 |       0.00442 |       0.00359 |       0.55268
Evaluating losses...
     -0.00401 |       0.00000 |       0.00438 |       0.00396 |      

     -0.00086 |       0.00000 |       0.00151 |       0.00333 |       0.61445
     -0.00163 |       0.00000 |       0.00145 |       0.00338 |       0.61399
     -0.00152 |       0.00000 |       0.00144 |       0.00313 |       0.61495
Evaluating losses...
     -0.00184 |       0.00000 |       0.00136 |       0.00340 |       0.61439
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -1.52         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1683          |
| TimeElapsed     | 2.23e+03      |
| TimestepsSoFar  | 1961984       |
| ev_tdlam_before | 0.161         |
| loss_ent        | 0.61438745    |
| loss_kl         | 0.0033999374  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0018409123 |
| loss_vf_loss    | 0.0013552145  |
-----------------------------------
********** Iteration 479 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00117 |       0.00000 |  

********** Iteration 484 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00095 |       0.00000 |       0.00784 |       0.00220 |       0.53681
      0.00132 |       0.00000 |       0.00638 |       0.00242 |       0.53603
      0.00123 |       0.00000 |       0.00583 |       0.00349 |       0.53647
      0.00057 |       0.00000 |       0.00549 |       0.00288 |       0.53516
      0.00042 |       0.00000 |       0.00513 |       0.00378 |       0.53425
     -0.00118 |       0.00000 |       0.00492 |       0.00374 |       0.53513
      0.00044 |       0.00000 |       0.00479 |       0.00343 |       0.53410
     -0.00148 |       0.00000 |       0.00451 |       0.00372 |       0.53316
     -0.00144 |       0.00000 |       0.00446 |       0.00388 |       0.53344
     -0.00244 |       0.00000 |       0.00444 |       0.00377 |       0.53246
Evaluating losses...
     -0.00208 |       0.00000 |       0.00412 |       0.00407 |      

     -0.00062 |       0.00000 |       0.00369 |       0.00453 |       0.72970
     -0.00179 |       0.00000 |       0.00348 |       0.00458 |       0.72910
     -0.00272 |       0.00000 |       0.00347 |       0.00439 |       0.72986
     -0.00171 |       0.00000 |       0.00358 |       0.00465 |       0.73030
Evaluating losses...
     -0.00262 |       0.00000 |       0.00333 |       0.00466 |       0.73001
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -1.43         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1698          |
| TimeElapsed     | 2.29e+03      |
| TimestepsSoFar  | 2007040       |
| ev_tdlam_before | 0.498         |
| loss_ent        | 0.7300084     |
| loss_kl         | 0.004662621   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0026244572 |
| loss_vf_loss    | 0.00333196    |
-----------------------------------
********** Iteration 490 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 495 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00191 |       0.00000 |       0.00552 |       0.00315 |       0.66213
     -0.00245 |       0.00000 |       0.00494 |       0.00405 |       0.66276
     -0.00413 |       0.00000 |       0.00462 |       0.00473 |       0.66209
     -0.00563 |       0.00000 |       0.00451 |       0.00522 |       0.66270
     -0.00577 |       0.00000 |       0.00438 |       0.00526 |       0.66279
     -0.00515 |       0.00000 |       0.00420 |       0.00539 |       0.66320
     -0.00590 |       0.00000 |       0.00410 |       0.00612 |       0.66304
     -0.00578 |       0.00000 |       0.00406 |       0.00572 |       0.66228
     -0.00657 |       0.00000 |       0.00413 |       0.00569 |       0.66251
     -0.00738 |       0.00000 |       0.00400 |       0.00617 |       0.66302
Evaluating losses...
     -0.00743 |       0.00000 |       0.00397 |       0.00592 |      

     -0.00127 |       0.00000 |       0.00066 |       0.00342 |       0.59727
     -0.00056 |       0.00000 |       0.00063 |       0.00358 |       0.59549
     -0.00265 |       0.00000 |       0.00062 |       0.00341 |       0.59456
Evaluating losses...
     -0.00248 |       0.00000 |       0.00058 |       0.00353 |       0.59457
----------------------------------
| EpLenMean       | 3.02e+03     |
| EpRewMean       | -1.26        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 1713         |
| TimeElapsed     | 2.34e+03     |
| TimestepsSoFar  | 2052096      |
| ev_tdlam_before | -0.575       |
| loss_ent        | 0.59456706   |
| loss_kl         | 0.0035263554 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002480633 |
| loss_vf_loss    | 0.0005819    |
----------------------------------
********** Iteration 501 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00015 |       0.00000 |       0.00282 |

********** Iteration 506 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00247 |       0.00000 |       0.00709 |       0.00232 |       0.63865
      0.00052 |       0.00000 |       0.00645 |       0.00303 |       0.64018
      0.00135 |       0.00000 |       0.00601 |       0.00305 |       0.63954
     -0.00165 |       0.00000 |       0.00591 |       0.00349 |       0.64194
     -0.00234 |       0.00000 |       0.00563 |       0.00338 |       0.64058
     -0.00058 |       0.00000 |       0.00563 |       0.00364 |       0.64095
      0.00043 |       0.00000 |       0.00552 |       0.00355 |       0.64230
     -0.00188 |       0.00000 |       0.00540 |       0.00382 |       0.64282
     -0.00168 |       0.00000 |       0.00528 |       0.00397 |       0.64269
     -0.00024 |       0.00000 |       0.00516 |       0.00392 |       0.64286
Evaluating losses...
     -0.00448 |       0.00000 |       0.00506 |       0.00403 |      

     -0.00153 |       0.00000 |       0.00080 |       0.00344 |       0.65798
     -0.00064 |       0.00000 |       0.00066 |       0.00366 |       0.65836
     -0.00227 |       0.00000 |       0.00078 |       0.00361 |       0.65801
Evaluating losses...
     -0.00224 |       0.00000 |       0.00073 |       0.00338 |       0.65875
-----------------------------------
| EpLenMean       | 3e+03         |
| EpRewMean       | -1.19         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1728          |
| TimeElapsed     | 2.39e+03      |
| TimestepsSoFar  | 2097152       |
| ev_tdlam_before | 0.591         |
| loss_ent        | 0.658753      |
| loss_kl         | 0.0033808788  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0022446017 |
| loss_vf_loss    | 0.0007278108  |
-----------------------------------
********** Iteration 512 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00064 |       0.00000 |  

********** Iteration 517 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00016 |       0.00000 |       0.00357 |       0.00255 |       0.70816
      0.00076 |       0.00000 |       0.00318 |       0.00286 |       0.70709
      0.00052 |       0.00000 |       0.00310 |       0.00345 |       0.70435
      0.00039 |       0.00000 |       0.00291 |       0.00333 |       0.70501
     -0.00122 |       0.00000 |       0.00272 |       0.00326 |       0.70508
     -0.00214 |       0.00000 |       0.00257 |       0.00318 |       0.70447
     -0.00070 |       0.00000 |       0.00267 |       0.00345 |       0.70511
     -0.00186 |       0.00000 |       0.00255 |       0.00365 |       0.70532
     -0.00119 |       0.00000 |       0.00255 |       0.00385 |       0.70354
     -0.00218 |       0.00000 |       0.00243 |       0.00387 |       0.70329
Evaluating losses...
     -0.00208 |       0.00000 |       0.00250 |       0.00376 |      

     -0.00152 |       0.00000 |       0.00055 |       0.00416 |       0.69710
     -0.00193 |       0.00000 |       0.00058 |       0.00420 |       0.69710
     -0.00028 |       0.00000 |       0.00060 |       0.00566 |       0.69568
Evaluating losses...
     -0.00124 |       0.00000 |       0.00058 |       0.00593 |       0.69532
-----------------------------------
| EpLenMean       | 3e+03         |
| EpRewMean       | -1.15         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1743          |
| TimeElapsed     | 2.44e+03      |
| TimestepsSoFar  | 2142208       |
| ev_tdlam_before | 0.478         |
| loss_ent        | 0.6953191     |
| loss_kl         | 0.0059345216  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012388832 |
| loss_vf_loss    | 0.0005821511  |
-----------------------------------
********** Iteration 523 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00150 |       0.00000 |  

********** Iteration 528 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00096 |       0.00000 |       0.00433 |       0.00310 |       0.64422
     -0.00126 |       0.00000 |       0.00406 |       0.00318 |       0.64433
     -0.00119 |       0.00000 |       0.00395 |       0.00345 |       0.64431
     -0.00270 |       0.00000 |       0.00382 |       0.00337 |       0.64465
     -0.00197 |       0.00000 |       0.00381 |       0.00395 |       0.64455
     -0.00381 |       0.00000 |       0.00367 |       0.00364 |       0.64505
     -0.00167 |       0.00000 |       0.00370 |       0.00371 |       0.64496
     -0.00361 |       0.00000 |       0.00362 |       0.00372 |       0.64360
     -0.00460 |       0.00000 |       0.00356 |       0.00422 |       0.64580
     -0.00336 |       0.00000 |       0.00358 |       0.00425 |       0.64567
Evaluating losses...
     -0.00371 |       0.00000 |       0.00350 |       0.00369 |      

     -0.00277 |       0.00000 |       0.00092 |       0.00328 |       0.69808
     -0.00309 |       0.00000 |       0.00085 |       0.00330 |       0.69776
     -0.00402 |       0.00000 |       0.00092 |       0.00336 |       0.69866
Evaluating losses...
     -0.00237 |       0.00000 |       0.00085 |       0.00336 |       0.69866
-----------------------------------
| EpLenMean       | 3e+03         |
| EpRewMean       | -1.12         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1758          |
| TimeElapsed     | 2.49e+03      |
| TimestepsSoFar  | 2187264       |
| ev_tdlam_before | 0.72          |
| loss_ent        | 0.69866145    |
| loss_kl         | 0.0033565536  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0023657435 |
| loss_vf_loss    | 0.0008486516  |
-----------------------------------
********** Iteration 534 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00017 |       0.00000 |  

********** Iteration 539 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00395 |       0.00000 |       0.00369 |       0.00243 |       0.64010
      0.00030 |       0.00000 |       0.00332 |       0.00248 |       0.64223
     -0.00228 |       0.00000 |       0.00306 |       0.00308 |       0.64304
     -0.00097 |       0.00000 |       0.00285 |       0.00353 |       0.64450
    -3.10e-05 |       0.00000 |       0.00279 |       0.00391 |       0.64439
     -0.00162 |       0.00000 |       0.00267 |       0.00483 |       0.64530
     -0.00110 |       0.00000 |       0.00259 |       0.00488 |       0.64567
     -0.00158 |       0.00000 |       0.00263 |       0.00595 |       0.64656
     -0.00274 |       0.00000 |       0.00245 |       0.00516 |       0.64613
     -0.00249 |       0.00000 |       0.00238 |       0.00518 |       0.64476
Evaluating losses...
     -0.00281 |       0.00000 |       0.00225 |       0.00535 |      

     -0.00012 |       0.00000 |       0.00077 |       0.00267 |       0.64104
     -0.00023 |       0.00000 |       0.00067 |       0.00268 |       0.64150
     -0.00021 |       0.00000 |       0.00060 |       0.00287 |       0.64159
Evaluating losses...
     -0.00024 |       0.00000 |       0.00058 |       0.00288 |       0.64182
------------------------------------
| EpLenMean       | 3e+03          |
| EpRewMean       | -1.08          |
| EpThisIter      | 1              |
| EpisodesSoFar   | 1773           |
| TimeElapsed     | 2.54e+03       |
| TimestepsSoFar  | 2232320        |
| ev_tdlam_before | 0.131          |
| loss_ent        | 0.6418201      |
| loss_kl         | 0.002875855    |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00023662625 |
| loss_vf_loss    | 0.00058013166  |
------------------------------------
********** Iteration 545 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00025 |    

********** Iteration 550 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00163 |       0.00000 |       0.00048 |       0.00182 |       0.53591
      0.00045 |       0.00000 |       0.00036 |       0.00181 |       0.53594
      0.00080 |       0.00000 |       0.00032 |       0.00191 |       0.53563
     -0.00018 |       0.00000 |       0.00030 |       0.00202 |       0.53506
     -0.00088 |       0.00000 |       0.00026 |       0.00218 |       0.53579
     -0.00103 |       0.00000 |       0.00024 |       0.00221 |       0.53466
      0.00060 |       0.00000 |       0.00022 |       0.00210 |       0.53456
     -0.00220 |       0.00000 |       0.00022 |       0.00216 |       0.53530
     -0.00130 |       0.00000 |       0.00021 |       0.00222 |       0.53532
     -0.00165 |       0.00000 |       0.00022 |       0.00243 |       0.53495
Evaluating losses...
     -0.00234 |       0.00000 |       0.00019 |       0.00252 |      

     -0.00081 |       0.00000 |       0.00169 |       0.00273 |       0.67992
     -0.00066 |       0.00000 |       0.00160 |       0.00290 |       0.67942
     -0.00030 |       0.00000 |       0.00152 |       0.00326 |       0.67910
Evaluating losses...
     -0.00138 |       0.00000 |       0.00148 |       0.00318 |       0.67993
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.97         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1788          |
| TimeElapsed     | 2.61e+03      |
| TimestepsSoFar  | 2277376       |
| ev_tdlam_before | 0.222         |
| loss_ent        | 0.6799328     |
| loss_kl         | 0.0031811092  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0013815896 |
| loss_vf_loss    | 0.0014848612  |
-----------------------------------
********** Iteration 556 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00196 |       0.00000 |  

********** Iteration 561 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00107 |       0.00000 |       0.00784 |       0.00199 |       0.59488
     -0.00087 |       0.00000 |       0.00647 |       0.00246 |       0.59429
     -0.00142 |       0.00000 |       0.00588 |       0.00281 |       0.59374
     -0.00238 |       0.00000 |       0.00572 |       0.00272 |       0.59285
     -0.00068 |       0.00000 |       0.00557 |       0.00289 |       0.59234
     -0.00210 |       0.00000 |       0.00551 |       0.00298 |       0.59130
     -0.00200 |       0.00000 |       0.00526 |       0.00333 |       0.59113
     -0.00286 |       0.00000 |       0.00512 |       0.00312 |       0.59080
     -0.00252 |       0.00000 |       0.00501 |       0.00327 |       0.59122
     -0.00231 |       0.00000 |       0.00490 |       0.00310 |       0.59187
Evaluating losses...
     -0.00219 |       0.00000 |       0.00481 |       0.00322 |      

     -0.00487 |       0.00000 |       0.00300 |       0.00315 |       0.56576
     -0.00409 |       0.00000 |       0.00305 |       0.00366 |       0.56522
     -0.00434 |       0.00000 |       0.00296 |       0.00325 |       0.56532
Evaluating losses...
     -0.00556 |       0.00000 |       0.00297 |       0.00337 |       0.56602
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.97         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1803          |
| TimeElapsed     | 2.66e+03      |
| TimestepsSoFar  | 2322432       |
| ev_tdlam_before | 0.429         |
| loss_ent        | 0.5660234     |
| loss_kl         | 0.003367504   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0055606924 |
| loss_vf_loss    | 0.0029659495  |
-----------------------------------
********** Iteration 567 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00293 |       0.00000 |  

********** Iteration 572 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00020 |       0.00000 |       0.00257 |       0.00204 |       0.61416
     -0.00070 |       0.00000 |       0.00228 |       0.00249 |       0.61432
      0.00039 |       0.00000 |       0.00217 |       0.00224 |       0.61545
     -0.00192 |       0.00000 |       0.00207 |       0.00277 |       0.61630
     -0.00273 |       0.00000 |       0.00204 |       0.00261 |       0.61735
     -0.00141 |       0.00000 |       0.00190 |       0.00255 |       0.61739
     -0.00169 |       0.00000 |       0.00192 |       0.00267 |       0.61845
     -0.00282 |       0.00000 |       0.00186 |       0.00305 |       0.61879
     -0.00236 |       0.00000 |       0.00178 |       0.00297 |       0.61910
     -0.00229 |       0.00000 |       0.00183 |       0.00294 |       0.61978
Evaluating losses...
     -0.00227 |       0.00000 |       0.00170 |       0.00306 |      

     -0.00244 |       0.00000 |       0.00451 |       0.00364 |       0.68514
     -0.00202 |       0.00000 |       0.00450 |       0.00461 |       0.68537
     -0.00253 |       0.00000 |       0.00447 |       0.00503 |       0.68542
Evaluating losses...
     -0.00321 |       0.00000 |       0.00440 |       0.00513 |       0.68588
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.96         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1818          |
| TimeElapsed     | 2.71e+03      |
| TimestepsSoFar  | 2367488       |
| ev_tdlam_before | 0.455         |
| loss_ent        | 0.6858848     |
| loss_kl         | 0.0051320796  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0032143202 |
| loss_vf_loss    | 0.004395789   |
-----------------------------------
********** Iteration 578 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00080 |       0.00000 |  

********** Iteration 583 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00122 |       0.00000 |       0.00041 |       0.00268 |       0.75555
      0.00195 |       0.00000 |       0.00033 |       0.00275 |       0.75697
      0.00117 |       0.00000 |       0.00030 |       0.00271 |       0.75693
      0.00010 |       0.00000 |       0.00026 |       0.00286 |       0.75762
     -0.00087 |       0.00000 |       0.00026 |       0.00302 |       0.75642
     -0.00015 |       0.00000 |       0.00023 |       0.00314 |       0.75639
     -0.00084 |       0.00000 |       0.00021 |       0.00312 |       0.75615
     -0.00315 |       0.00000 |       0.00021 |       0.00314 |       0.75668
     -0.00089 |       0.00000 |       0.00020 |       0.00331 |       0.75632
     -0.00145 |       0.00000 |       0.00021 |       0.00350 |       0.75556
Evaluating losses...
     -0.00179 |       0.00000 |       0.00019 |       0.00322 |      

     -0.00057 |       0.00000 |       0.00020 |       0.00236 |       0.64638
     -0.00080 |       0.00000 |       0.00018 |       0.00228 |       0.64753
     -0.00130 |       0.00000 |       0.00017 |       0.00240 |       0.64669
Evaluating losses...
     -0.00053 |       0.00000 |       0.00018 |       0.00259 |       0.64653
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.92         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1833          |
| TimeElapsed     | 2.76e+03      |
| TimestepsSoFar  | 2412544       |
| ev_tdlam_before | -4.2          |
| loss_ent        | 0.6465289     |
| loss_kl         | 0.0025852593  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0005313145 |
| loss_vf_loss    | 0.0001769169  |
-----------------------------------
********** Iteration 589 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00090 |       0.00000 |  

********** Iteration 594 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 |       0.00322 |       0.00190 |       0.63334
      0.00049 |       0.00000 |       0.00273 |       0.00193 |       0.63285
      0.00017 |       0.00000 |       0.00256 |       0.00211 |       0.63416
     -0.00015 |       0.00000 |       0.00234 |       0.00250 |       0.63466
    -7.97e-05 |       0.00000 |       0.00226 |       0.00235 |       0.63481
     -0.00075 |       0.00000 |       0.00216 |       0.00224 |       0.63538
     -0.00073 |       0.00000 |       0.00211 |       0.00244 |       0.63517
     7.65e-05 |       0.00000 |       0.00219 |       0.00259 |       0.63512
      0.00038 |       0.00000 |       0.00208 |       0.00260 |       0.63533
     -0.00115 |       0.00000 |       0.00198 |       0.00267 |       0.63631
Evaluating losses...
     -0.00116 |       0.00000 |       0.00193 |       0.00263 |      

     -0.00269 |       0.00000 |       0.00188 |       0.00288 |       0.64061
     -0.00150 |       0.00000 |       0.00178 |       0.00305 |       0.64180
     -0.00277 |       0.00000 |       0.00182 |       0.00367 |       0.64242
Evaluating losses...
     -0.00295 |       0.00000 |       0.00176 |       0.00372 |       0.64288
----------------------------------
| EpLenMean       | 3.02e+03     |
| EpRewMean       | -0.81        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 1848         |
| TimeElapsed     | 2.8e+03      |
| TimestepsSoFar  | 2457600      |
| ev_tdlam_before | 0.106        |
| loss_ent        | 0.6428751    |
| loss_kl         | 0.0037191506 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002953806 |
| loss_vf_loss    | 0.0017577832 |
----------------------------------
********** Iteration 600 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00030 |       0.00000 |       0.00171 |

********** Iteration 605 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00088 |       0.00000 |       0.00559 |       0.00203 |       0.70047
      0.00048 |       0.00000 |       0.00491 |       0.00220 |       0.70014
     -0.00174 |       0.00000 |       0.00472 |       0.00267 |       0.70026
     -0.00065 |       0.00000 |       0.00467 |       0.00267 |       0.70013
     -0.00145 |       0.00000 |       0.00456 |       0.00259 |       0.69985
     -0.00226 |       0.00000 |       0.00455 |       0.00260 |       0.69940
     -0.00198 |       0.00000 |       0.00437 |       0.00272 |       0.69865
     -0.00265 |       0.00000 |       0.00437 |       0.00275 |       0.69939
     -0.00358 |       0.00000 |       0.00427 |       0.00273 |       0.69920
     -0.00321 |       0.00000 |       0.00414 |       0.00286 |       0.69801
Evaluating losses...
     -0.00442 |       0.00000 |       0.00419 |       0.00277 |      

     -0.00162 |       0.00000 |       0.00325 |       0.00274 |       0.69401
     -0.00031 |       0.00000 |       0.00332 |       0.00332 |       0.69396
     -0.00135 |       0.00000 |       0.00327 |       0.00331 |       0.69249
     -0.00118 |       0.00000 |       0.00322 |       0.00335 |       0.69430
Evaluating losses...
     -0.00206 |       0.00000 |       0.00314 |       0.00333 |       0.69326
----------------------------------
| EpLenMean       | 3.02e+03     |
| EpRewMean       | -0.75        |
| EpThisIter      | 0            |
| EpisodesSoFar   | 1862         |
| TimeElapsed     | 2.87e+03     |
| TimestepsSoFar  | 2502656      |
| ev_tdlam_before | 0.601        |
| loss_ent        | 0.69326353   |
| loss_kl         | 0.0033284167 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002059911 |
| loss_vf_loss    | 0.0031363722 |
----------------------------------
********** Iteration 611 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |

********** Iteration 616 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -6.71e-05 |       0.00000 |       0.00300 |       0.00283 |       0.64102
     -0.00179 |       0.00000 |       0.00237 |       0.00339 |       0.64107
     -0.00105 |       0.00000 |       0.00221 |       0.00360 |       0.64175
     -0.00253 |       0.00000 |       0.00206 |       0.00391 |       0.64182
     -0.00208 |       0.00000 |       0.00200 |       0.00399 |       0.64146
     -0.00390 |       0.00000 |       0.00203 |       0.00474 |       0.64138
     -0.00398 |       0.00000 |       0.00190 |       0.00404 |       0.64169
     -0.00462 |       0.00000 |       0.00184 |       0.00408 |       0.64238
     -0.00394 |       0.00000 |       0.00190 |       0.00475 |       0.64293
     -0.00497 |       0.00000 |       0.00183 |       0.00462 |       0.64276
Evaluating losses...
     -0.00498 |       0.00000 |       0.00173 |       0.00390 |      

     -0.00182 |       0.00000 |       0.00429 |       0.00312 |       0.64807
     -0.00231 |       0.00000 |       0.00415 |       0.00353 |       0.64905
     -0.00319 |       0.00000 |       0.00427 |       0.00342 |       0.64891
Evaluating losses...
     -0.00412 |       0.00000 |       0.00411 |       0.00342 |       0.64920
-----------------------------------
| EpLenMean       | 3.03e+03      |
| EpRewMean       | -0.75         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1877          |
| TimeElapsed     | 2.91e+03      |
| TimestepsSoFar  | 2547712       |
| ev_tdlam_before | 0.56          |
| loss_ent        | 0.6491975     |
| loss_kl         | 0.0034218444  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0041245176 |
| loss_vf_loss    | 0.0041092355  |
-----------------------------------
********** Iteration 622 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00115 |       0.00000 |  

********** Iteration 627 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00231 |       0.00000 |       0.00028 |       0.00190 |       0.66687
      0.00099 |       0.00000 |       0.00022 |       0.00223 |       0.66705
      0.00107 |       0.00000 |       0.00021 |       0.00237 |       0.66583
      0.00114 |       0.00000 |       0.00017 |       0.00225 |       0.66439
      0.00022 |       0.00000 |       0.00015 |       0.00255 |       0.66447
     -0.00083 |       0.00000 |       0.00015 |       0.00238 |       0.66497
      0.00134 |       0.00000 |       0.00013 |       0.00264 |       0.66483
      0.00012 |       0.00000 |       0.00013 |       0.00262 |       0.66396
     -0.00021 |       0.00000 |       0.00012 |       0.00265 |       0.66299
     -0.00199 |       0.00000 |       0.00011 |       0.00292 |       0.66275
Evaluating losses...
     -0.00058 |       0.00000 |       0.00013 |       0.00253 |      

     -0.00165 |       0.00000 |       0.00073 |       0.00240 |       0.63654
     -0.00129 |       0.00000 |       0.00069 |       0.00252 |       0.63626
     -0.00182 |       0.00000 |       0.00062 |       0.00276 |       0.63682
Evaluating losses...
     -0.00199 |       0.00000 |       0.00062 |       0.00273 |       0.63645
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.82         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1892          |
| TimeElapsed     | 2.96e+03      |
| TimestepsSoFar  | 2592768       |
| ev_tdlam_before | 0.102         |
| loss_ent        | 0.6364483     |
| loss_kl         | 0.0027323556  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0019910727 |
| loss_vf_loss    | 0.00062377454 |
-----------------------------------
********** Iteration 633 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00201 |       0.00000 |  

********** Iteration 638 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00077 |       0.00000 |       0.00224 |       0.00223 |       0.74334
     -0.00124 |       0.00000 |       0.00157 |       0.00288 |       0.74188
     -0.00166 |       0.00000 |       0.00139 |       0.00311 |       0.73980
     -0.00328 |       0.00000 |       0.00133 |       0.00299 |       0.73874
     -0.00278 |       0.00000 |       0.00128 |       0.00299 |       0.73690
     -0.00376 |       0.00000 |       0.00125 |       0.00328 |       0.73545
     -0.00307 |       0.00000 |       0.00123 |       0.00311 |       0.73727
     -0.00299 |       0.00000 |       0.00123 |       0.00288 |       0.73622
     -0.00352 |       0.00000 |       0.00116 |       0.00326 |       0.73528
     -0.00394 |       0.00000 |       0.00113 |       0.00340 |       0.73376
Evaluating losses...
     -0.00415 |       0.00000 |       0.00108 |       0.00362 |      

     -0.00053 |       0.00000 |       0.00025 |       0.00279 |       0.74339
     5.25e-05 |       0.00000 |       0.00023 |       0.00268 |       0.74214
     -0.00097 |       0.00000 |       0.00023 |       0.00268 |       0.74198
Evaluating losses...
     -0.00074 |       0.00000 |       0.00022 |       0.00279 |       0.74197
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.81          |
| EpThisIter      | 1              |
| EpisodesSoFar   | 1907           |
| TimeElapsed     | 3.01e+03       |
| TimestepsSoFar  | 2637824        |
| ev_tdlam_before | -1.98          |
| loss_ent        | 0.74197227     |
| loss_kl         | 0.0027869288   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00074474217 |
| loss_vf_loss    | 0.00021945233  |
------------------------------------
********** Iteration 644 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00018 |    

********** Iteration 649 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00059 |       0.00000 |       0.00445 |       0.00136 |       0.58682
     -0.00014 |       0.00000 |       0.00411 |       0.00202 |       0.58350
     -0.00022 |       0.00000 |       0.00371 |       0.00215 |       0.58334
     1.57e-05 |       0.00000 |       0.00365 |       0.00209 |       0.58254
     -0.00045 |       0.00000 |       0.00349 |       0.00244 |       0.58139
     -0.00016 |       0.00000 |       0.00346 |       0.00219 |       0.58139
     -0.00086 |       0.00000 |       0.00325 |       0.00203 |       0.58163
     -0.00089 |       0.00000 |       0.00318 |       0.00231 |       0.58082
     -0.00118 |       0.00000 |       0.00325 |       0.00238 |       0.58002
     -0.00184 |       0.00000 |       0.00314 |       0.00238 |       0.58056
Evaluating losses...
     -0.00153 |       0.00000 |       0.00307 |       0.00219 |      

     -0.00175 |       0.00000 |       0.00138 |       0.00268 |       0.70324
     -0.00219 |       0.00000 |       0.00127 |       0.00270 |       0.70370
     -0.00223 |       0.00000 |       0.00131 |       0.00257 |       0.70301
Evaluating losses...
     -0.00121 |       0.00000 |       0.00127 |       0.00268 |       0.70251
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.76         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1923          |
| TimeElapsed     | 3.06e+03      |
| TimestepsSoFar  | 2682880       |
| ev_tdlam_before | 0.64          |
| loss_ent        | 0.7025063     |
| loss_kl         | 0.0026789443  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012113142 |
| loss_vf_loss    | 0.001271389   |
-----------------------------------
********** Iteration 655 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00171 |       0.00000 |  

********** Iteration 660 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -4.32e-05 |       0.00000 |       0.00332 |       0.00219 |       0.65813
     -0.00154 |       0.00000 |       0.00272 |       0.00286 |       0.65803
     -0.00045 |       0.00000 |       0.00249 |       0.00250 |       0.65875
     -0.00207 |       0.00000 |       0.00225 |       0.00282 |       0.65817
     -0.00034 |       0.00000 |       0.00214 |       0.00288 |       0.65869
     -0.00069 |       0.00000 |       0.00206 |       0.00309 |       0.65858
     -0.00146 |       0.00000 |       0.00196 |       0.00286 |       0.65802
     -0.00188 |       0.00000 |       0.00196 |       0.00268 |       0.65904
     -0.00103 |       0.00000 |       0.00191 |       0.00300 |       0.65878
     -0.00198 |       0.00000 |       0.00190 |       0.00308 |       0.65935
Evaluating losses...
     -0.00170 |       0.00000 |       0.00188 |       0.00282 |      

     -0.00043 |       0.00000 |       0.00110 |       0.00218 |       0.61439
     -0.00059 |       0.00000 |       0.00106 |       0.00216 |       0.61491
     -0.00054 |       0.00000 |       0.00106 |       0.00209 |       0.61469
Evaluating losses...
     -0.00168 |       0.00000 |       0.00102 |       0.00218 |       0.61473
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.77         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 1938          |
| TimeElapsed     | 3.11e+03      |
| TimestepsSoFar  | 2727936       |
| ev_tdlam_before | 0.549         |
| loss_ent        | 0.61472654    |
| loss_kl         | 0.0021776464  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0016757074 |
| loss_vf_loss    | 0.0010154112  |
-----------------------------------
********** Iteration 666 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00014 |       0.00000 |  

********** Iteration 671 ************
Eval num_timesteps=2748416, episode_reward=-3.30 +/- 1.19
Episode length: 2862.00 +/- 279.65
New best mean reward!
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00121 |       0.00000 |       0.00209 |       0.00299 |       0.66613
     -0.00091 |       0.00000 |       0.00189 |       0.00274 |       0.66642
     -0.00088 |       0.00000 |       0.00180 |       0.00312 |       0.66640
     -0.00198 |       0.00000 |       0.00174 |       0.00289 |       0.66726
     -0.00013 |       0.00000 |       0.00168 |       0.00285 |       0.66803
     -0.00210 |       0.00000 |       0.00169 |       0.00266 |       0.66877
     -0.00309 |       0.00000 |       0.00165 |       0.00278 |       0.66863
     -0.00182 |       0.00000 |       0.00167 |       0.00282 |       0.66918
     -0.00208 |       0.00000 |       0.00157 |       0.00270 |       0.66967
     -0.00172 |       0.00000 |       0.00159 |      

     -0.00140 |       0.00000 |       0.00410 |       0.00209 |       0.61165
     -0.00136 |       0.00000 |       0.00397 |       0.00233 |       0.61191
     -0.00095 |       0.00000 |       0.00382 |       0.00242 |       0.61117
     -0.00106 |       0.00000 |       0.00378 |       0.00234 |       0.61171
     -0.00109 |       0.00000 |       0.00362 |       0.00223 |       0.61212
Evaluating losses...
     -0.00190 |       0.00000 |       0.00353 |       0.00221 |       0.61202
-----------------------------------
| EpLenMean       | 3.03e+03      |
| EpRewMean       | -0.88         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1952          |
| TimeElapsed     | 3.18e+03      |
| TimestepsSoFar  | 2772992       |
| ev_tdlam_before | 0.375         |
| loss_ent        | 0.6120248     |
| loss_kl         | 0.0022062673  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0018984398 |
| loss_vf_loss    | 0.0035329247  |
-----------------------------------
*******

********** Iteration 682 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -6.12e-05 |       0.00000 |       0.00400 |       0.00142 |       0.57505
     -0.00059 |       0.00000 |       0.00373 |       0.00148 |       0.57422
     -0.00114 |       0.00000 |       0.00352 |       0.00164 |       0.57436
     -0.00031 |       0.00000 |       0.00333 |       0.00191 |       0.57412
     -0.00041 |       0.00000 |       0.00309 |       0.00203 |       0.57410
     -0.00157 |       0.00000 |       0.00305 |       0.00184 |       0.57374
     -0.00074 |       0.00000 |       0.00294 |       0.00183 |       0.57341
     -0.00160 |       0.00000 |       0.00284 |       0.00179 |       0.57334
     -0.00106 |       0.00000 |       0.00273 |       0.00196 |       0.57354
     -0.00097 |       0.00000 |       0.00272 |       0.00201 |       0.57350
Evaluating losses...
     -0.00139 |       0.00000 |       0.00264 |       0.00205 |      

     -0.00251 |       0.00000 |       0.00273 |       0.00276 |       0.65916
     -0.00195 |       0.00000 |       0.00264 |       0.00355 |       0.65806
     -0.00241 |       0.00000 |       0.00261 |       0.00308 |       0.65845
Evaluating losses...
     -0.00208 |       0.00000 |       0.00260 |       0.00313 |       0.65883
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.82        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 1967         |
| TimeElapsed     | 3.23e+03     |
| TimestepsSoFar  | 2818048      |
| ev_tdlam_before | 0.437        |
| loss_ent        | 0.6588274    |
| loss_kl         | 0.0031294266 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002083507 |
| loss_vf_loss    | 0.0026022552 |
----------------------------------
********** Iteration 688 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00092 |       0.00000 |       0.00188 |

********** Iteration 693 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -6.66e-06 |       0.00000 |       0.00356 |       0.00198 |       0.58385
     -0.00111 |       0.00000 |       0.00298 |       0.00226 |       0.58483
     -0.00171 |       0.00000 |       0.00296 |       0.00236 |       0.58579
     -0.00182 |       0.00000 |       0.00274 |       0.00245 |       0.58642
     -0.00340 |       0.00000 |       0.00272 |       0.00253 |       0.58645
     -0.00254 |       0.00000 |       0.00259 |       0.00259 |       0.58692
     -0.00246 |       0.00000 |       0.00253 |       0.00315 |       0.58778
     -0.00273 |       0.00000 |       0.00255 |       0.00266 |       0.58745
     -0.00276 |       0.00000 |       0.00244 |       0.00266 |       0.58705
     -0.00315 |       0.00000 |       0.00247 |       0.00286 |       0.58682
Evaluating losses...
     -0.00371 |       0.00000 |       0.00255 |       0.00275 |      

     -0.00028 |       0.00000 |       0.00268 |       0.00197 |       0.58738
     -0.00025 |       0.00000 |       0.00254 |       0.00212 |       0.58794
     -0.00149 |       0.00000 |       0.00262 |       0.00208 |       0.58730
Evaluating losses...
     -0.00177 |       0.00000 |       0.00259 |       0.00202 |       0.58709
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.76         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1982          |
| TimeElapsed     | 3.28e+03      |
| TimestepsSoFar  | 2863104       |
| ev_tdlam_before | 0.484         |
| loss_ent        | 0.5870899     |
| loss_kl         | 0.0020177802  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0017702768 |
| loss_vf_loss    | 0.002585489   |
-----------------------------------
********** Iteration 699 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |  

********** Iteration 704 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00198 |       0.00000 |       0.00501 |       0.00168 |       0.57963
     -0.00033 |       0.00000 |       0.00441 |       0.00218 |       0.57807
      0.00022 |       0.00000 |       0.00414 |       0.00232 |       0.57793
     -0.00054 |       0.00000 |       0.00400 |       0.00219 |       0.57827
     -0.00046 |       0.00000 |       0.00383 |       0.00238 |       0.57866
     -0.00046 |       0.00000 |       0.00387 |       0.00255 |       0.57846
    -3.33e-05 |       0.00000 |       0.00370 |       0.00251 |       0.57877
     -0.00122 |       0.00000 |       0.00369 |       0.00265 |       0.57792
     -0.00071 |       0.00000 |       0.00357 |       0.00263 |       0.57833
     -0.00070 |       0.00000 |       0.00362 |       0.00271 |       0.57827
Evaluating losses...
     -0.00151 |       0.00000 |       0.00359 |       0.00300 |      

     -0.00129 |       0.00000 |       0.00015 |       0.00148 |       0.54319
     -0.00180 |       0.00000 |       0.00015 |       0.00169 |       0.54372
     -0.00071 |       0.00000 |       0.00015 |       0.00152 |       0.54400
Evaluating losses...
     -0.00132 |       0.00000 |       0.00014 |       0.00178 |       0.54420
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.7          |
| EpThisIter      | 1             |
| EpisodesSoFar   | 1997          |
| TimeElapsed     | 3.32e+03      |
| TimestepsSoFar  | 2908160       |
| ev_tdlam_before | -4.27         |
| loss_ent        | 0.5442036     |
| loss_kl         | 0.0017843565  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0013162371 |
| loss_vf_loss    | 0.00013783418 |
-----------------------------------
********** Iteration 710 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00053 |       0.00000 |  

********** Iteration 715 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00034 |       0.00000 |       0.00326 |       0.00149 |       0.57509
     -0.00059 |       0.00000 |       0.00276 |       0.00155 |       0.57419
      0.00012 |       0.00000 |       0.00258 |       0.00184 |       0.57411
     -0.00110 |       0.00000 |       0.00235 |       0.00176 |       0.57483
     -0.00213 |       0.00000 |       0.00235 |       0.00178 |       0.57434
     -0.00191 |       0.00000 |       0.00224 |       0.00192 |       0.57341
     -0.00232 |       0.00000 |       0.00212 |       0.00193 |       0.57277
     -0.00144 |       0.00000 |       0.00199 |       0.00201 |       0.57289
     -0.00190 |       0.00000 |       0.00203 |       0.00211 |       0.57446
     -0.00078 |       0.00000 |       0.00201 |       0.00219 |       0.57267
Evaluating losses...
     -0.00212 |       0.00000 |       0.00194 |       0.00230 |      

     -0.00063 |       0.00000 |       0.00030 |       0.00194 |       0.65648
     -0.00067 |       0.00000 |       0.00028 |       0.00192 |       0.65671
     -0.00124 |       0.00000 |       0.00028 |       0.00186 |       0.65582
Evaluating losses...
     -0.00103 |       0.00000 |       0.00028 |       0.00186 |       0.65624
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.61         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2012          |
| TimeElapsed     | 3.37e+03      |
| TimestepsSoFar  | 2953216       |
| ev_tdlam_before | -0.191        |
| loss_ent        | 0.6562433     |
| loss_kl         | 0.001857764   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010264791 |
| loss_vf_loss    | 0.00028090188 |
-----------------------------------
********** Iteration 721 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00141 |       0.00000 |  

********** Iteration 726 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00059 |       0.00000 |       0.00636 |       0.00180 |       0.57472
      0.00012 |       0.00000 |       0.00544 |       0.00203 |       0.57419
     -0.00130 |       0.00000 |       0.00532 |       0.00184 |       0.57424
     -0.00116 |       0.00000 |       0.00491 |       0.00196 |       0.57461
     -0.00151 |       0.00000 |       0.00496 |       0.00212 |       0.57401
     -0.00139 |       0.00000 |       0.00492 |       0.00190 |       0.57506
     -0.00132 |       0.00000 |       0.00476 |       0.00194 |       0.57617
     -0.00178 |       0.00000 |       0.00466 |       0.00206 |       0.57513
     -0.00152 |       0.00000 |       0.00461 |       0.00221 |       0.57418
     -0.00172 |       0.00000 |       0.00458 |       0.00204 |       0.57504
Evaluating losses...
     -0.00163 |       0.00000 |       0.00452 |       0.00205 |      

     -0.00170 |       0.00000 |       0.00185 |       0.00271 |       0.63564
     -0.00142 |       0.00000 |       0.00180 |       0.00234 |       0.63682
     -0.00257 |       0.00000 |       0.00186 |       0.00236 |       0.63657
Evaluating losses...
     -0.00241 |       0.00000 |       0.00174 |       0.00229 |       0.63721
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.54        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2027         |
| TimeElapsed     | 3.42e+03     |
| TimestepsSoFar  | 2998272      |
| ev_tdlam_before | 0.48         |
| loss_ent        | 0.6372087    |
| loss_kl         | 0.0022927793 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002414072 |
| loss_vf_loss    | 0.0017366211 |
----------------------------------
********** Iteration 732 ************
Eval num_timesteps=2998272, episode_reward=-0.70 +/- 1.55
Episode length: 3000.00 +/- 0.00
New best mean reward!
Optimizing...
     pol_sur

********** Iteration 737 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00072 |       0.00000 |       0.00255 |       0.00142 |       0.57885
      0.00064 |       0.00000 |       0.00234 |       0.00139 |       0.57999
      0.00029 |       0.00000 |       0.00218 |       0.00140 |       0.58020
      0.00035 |       0.00000 |       0.00217 |       0.00159 |       0.58056
     -0.00030 |       0.00000 |       0.00208 |       0.00150 |       0.58113
      0.00023 |       0.00000 |       0.00201 |       0.00154 |       0.58177
     -0.00014 |       0.00000 |       0.00192 |       0.00172 |       0.58203
      0.00011 |       0.00000 |       0.00188 |       0.00169 |       0.58251
     -0.00045 |       0.00000 |       0.00180 |       0.00183 |       0.58224
    -6.22e-05 |       0.00000 |       0.00182 |       0.00171 |       0.58172
Evaluating losses...
      0.00013 |       0.00000 |       0.00175 |       0.00174 |      

     -0.00116 |       0.00000 |       0.00141 |       0.00258 |       0.61740
     -0.00102 |       0.00000 |       0.00133 |       0.00239 |       0.61705
     -0.00124 |       0.00000 |       0.00125 |       0.00229 |       0.61783
Evaluating losses...
     -0.00204 |       0.00000 |       0.00127 |       0.00218 |       0.61756
-----------------------------------
| EpLenMean       | 3.03e+03      |
| EpRewMean       | -0.45         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2042          |
| TimeElapsed     | 3.49e+03      |
| TimestepsSoFar  | 3043328       |
| ev_tdlam_before | 0.311         |
| loss_ent        | 0.61756134    |
| loss_kl         | 0.0021805027  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0020437175 |
| loss_vf_loss    | 0.0012748982  |
-----------------------------------
********** Iteration 743 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00061 |       0.00000 |  

********** Iteration 748 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00095 |       0.00000 |       0.00597 |       0.00158 |       0.67583
      0.00036 |       0.00000 |       0.00527 |       0.00191 |       0.67552
      0.00091 |       0.00000 |       0.00497 |       0.00197 |       0.67513
     -0.00121 |       0.00000 |       0.00484 |       0.00213 |       0.67447
     -0.00184 |       0.00000 |       0.00446 |       0.00200 |       0.67490
     -0.00043 |       0.00000 |       0.00431 |       0.00222 |       0.67453
     -0.00152 |       0.00000 |       0.00422 |       0.00224 |       0.67569
     -0.00184 |       0.00000 |       0.00414 |       0.00246 |       0.67685
     -0.00209 |       0.00000 |       0.00397 |       0.00253 |       0.67634
     -0.00245 |       0.00000 |       0.00397 |       0.00248 |       0.67658
Evaluating losses...
     -0.00207 |       0.00000 |       0.00378 |       0.00242 |      

     -0.00069 |       0.00000 |       0.00556 |       0.00216 |       0.65332
     -0.00076 |       0.00000 |       0.00545 |       0.00237 |       0.65406
     -0.00062 |       0.00000 |       0.00532 |       0.00223 |       0.65442
Evaluating losses...
     -0.00101 |       0.00000 |       0.00524 |       0.00217 |       0.65395
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.42         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2057          |
| TimeElapsed     | 3.54e+03      |
| TimestepsSoFar  | 3088384       |
| ev_tdlam_before | 0.37          |
| loss_ent        | 0.65394604    |
| loss_kl         | 0.0021728533  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010059902 |
| loss_vf_loss    | 0.005241574   |
-----------------------------------
********** Iteration 754 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 |  

********** Iteration 759 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.00638 |       0.00150 |       0.54762
     -0.00014 |       0.00000 |       0.00577 |       0.00202 |       0.54988
     -0.00202 |       0.00000 |       0.00544 |       0.00173 |       0.54957
     -0.00095 |       0.00000 |       0.00520 |       0.00184 |       0.54946
     -0.00088 |       0.00000 |       0.00502 |       0.00206 |       0.54874
     -0.00074 |       0.00000 |       0.00492 |       0.00184 |       0.54939
     -0.00151 |       0.00000 |       0.00480 |       0.00184 |       0.54979
     -0.00139 |       0.00000 |       0.00466 |       0.00206 |       0.54933
     -0.00226 |       0.00000 |       0.00459 |       0.00209 |       0.54876
     -0.00254 |       0.00000 |       0.00455 |       0.00194 |       0.54898
Evaluating losses...
     -0.00216 |       0.00000 |       0.00455 |       0.00175 |      

     -0.00106 |       0.00000 |       0.00138 |       0.00166 |       0.60507
     -0.00140 |       0.00000 |       0.00132 |       0.00188 |       0.60576
     -0.00156 |       0.00000 |       0.00130 |       0.00187 |       0.60560
Evaluating losses...
     -0.00182 |       0.00000 |       0.00121 |       0.00205 |       0.60534
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.39         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2072          |
| TimeElapsed     | 3.59e+03      |
| TimestepsSoFar  | 3133440       |
| ev_tdlam_before | 0.4           |
| loss_ent        | 0.60533834    |
| loss_kl         | 0.0020488545  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0018224386 |
| loss_vf_loss    | 0.0012059092  |
-----------------------------------
********** Iteration 765 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00014 |       0.00000 |  

********** Iteration 770 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00013 |       0.00000 |       0.00213 |       0.00118 |       0.57262
      0.00074 |       0.00000 |       0.00196 |       0.00121 |       0.57303
      0.00054 |       0.00000 |       0.00186 |       0.00131 |       0.57210
      0.00014 |       0.00000 |       0.00177 |       0.00123 |       0.57202
      0.00046 |       0.00000 |       0.00172 |       0.00126 |       0.57280
      0.00034 |       0.00000 |       0.00167 |       0.00129 |       0.57195
      0.00014 |       0.00000 |       0.00165 |       0.00132 |       0.57107
      0.00036 |       0.00000 |       0.00156 |       0.00137 |       0.57258
     -0.00069 |       0.00000 |       0.00153 |       0.00135 |       0.57165
     6.82e-06 |       0.00000 |       0.00153 |       0.00144 |       0.57172
Evaluating losses...
      0.00017 |       0.00000 |       0.00159 |       0.00142 |      

      0.00010 |       0.00000 |      8.63e-05 |       0.00163 |       0.62107
     -0.00019 |       0.00000 |      9.32e-05 |       0.00164 |       0.62164
     -0.00068 |       0.00000 |      8.46e-05 |       0.00173 |       0.62096
Evaluating losses...
     -0.00115 |       0.00000 |      7.87e-05 |       0.00164 |       0.62116
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.37         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2087          |
| TimeElapsed     | 3.64e+03      |
| TimestepsSoFar  | 3178496       |
| ev_tdlam_before | -4.07         |
| loss_ent        | 0.621161      |
| loss_kl         | 0.0016426631  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.001145342  |
| loss_vf_loss    | 7.8662546e-05 |
-----------------------------------
********** Iteration 776 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00161 |       0.00000 |  

********** Iteration 781 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00117 |       0.00000 |       0.00225 |       0.00156 |       0.67297
      0.00014 |       0.00000 |       0.00204 |       0.00152 |       0.67260
      0.00028 |       0.00000 |       0.00191 |       0.00189 |       0.67011
     -0.00028 |       0.00000 |       0.00179 |       0.00198 |       0.67001
     -0.00071 |       0.00000 |       0.00171 |       0.00193 |       0.66932
      0.00013 |       0.00000 |       0.00165 |       0.00208 |       0.66815
     -0.00036 |       0.00000 |       0.00156 |       0.00194 |       0.66876
     -0.00079 |       0.00000 |       0.00155 |       0.00211 |       0.66830
     -0.00111 |       0.00000 |       0.00156 |       0.00190 |       0.66868
     -0.00083 |       0.00000 |       0.00152 |       0.00209 |       0.66764
Evaluating losses...
     -0.00046 |       0.00000 |       0.00140 |       0.00203 |      

     -0.00184 |       0.00000 |       0.01071 |       0.00217 |       0.57964
     -0.00124 |       0.00000 |       0.01054 |       0.00218 |       0.57973
     -0.00127 |       0.00000 |       0.01045 |       0.00192 |       0.58025
Evaluating losses...
     -0.00168 |       0.00000 |       0.01021 |       0.00192 |       0.57975
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.39         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2102          |
| TimeElapsed     | 3.69e+03      |
| TimestepsSoFar  | 3223552       |
| ev_tdlam_before | 0.574         |
| loss_ent        | 0.5797491     |
| loss_kl         | 0.0019166495  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0016841606 |
| loss_vf_loss    | 0.010213162   |
-----------------------------------
********** Iteration 787 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00105 |       0.00000 |  

********** Iteration 792 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00037 |       0.00000 |       0.00213 |       0.00190 |       0.63824
    -2.81e-05 |       0.00000 |       0.00194 |       0.00169 |       0.63803
     -0.00068 |       0.00000 |       0.00193 |       0.00165 |       0.63728
     -0.00126 |       0.00000 |       0.00186 |       0.00168 |       0.63717
     -0.00074 |       0.00000 |       0.00178 |       0.00155 |       0.63666
     -0.00075 |       0.00000 |       0.00178 |       0.00153 |       0.63592
     -0.00049 |       0.00000 |       0.00171 |       0.00147 |       0.63544
     -0.00166 |       0.00000 |       0.00173 |       0.00154 |       0.63541
     -0.00122 |       0.00000 |       0.00164 |       0.00144 |       0.63541
     -0.00147 |       0.00000 |       0.00163 |       0.00158 |       0.63599
Evaluating losses...
     -0.00192 |       0.00000 |       0.00160 |       0.00157 |      

     -0.00012 |       0.00000 |       0.00028 |       0.00183 |       0.69442
      0.00011 |       0.00000 |       0.00025 |       0.00187 |       0.69424
     -0.00021 |       0.00000 |       0.00026 |       0.00184 |       0.69496
     -0.00033 |       0.00000 |       0.00023 |       0.00197 |       0.69469
Evaluating losses...
     -0.00109 |       0.00000 |       0.00022 |       0.00201 |       0.69454
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.4          |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2117          |
| TimeElapsed     | 3.76e+03      |
| TimestepsSoFar  | 3268608       |
| ev_tdlam_before | -1.9          |
| loss_ent        | 0.6945421     |
| loss_kl         | 0.0020087557  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010919214 |
| loss_vf_loss    | 0.00021758514 |
-----------------------------------
********** Iteration 798 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 803 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |       0.00444 |       0.00130 |       0.66523
      0.00038 |       0.00000 |       0.00378 |       0.00164 |       0.66566
      0.00072 |       0.00000 |       0.00357 |       0.00165 |       0.66504
     -0.00070 |       0.00000 |       0.00347 |       0.00201 |       0.66513
     -0.00096 |       0.00000 |       0.00330 |       0.00195 |       0.66544
     -0.00102 |       0.00000 |       0.00329 |       0.00217 |       0.66529
     -0.00180 |       0.00000 |       0.00311 |       0.00219 |       0.66576
     -0.00142 |       0.00000 |       0.00313 |       0.00249 |       0.66620
     -0.00105 |       0.00000 |       0.00302 |       0.00262 |       0.66553
     -0.00145 |       0.00000 |       0.00312 |       0.00251 |       0.66610
Evaluating losses...
     -0.00260 |       0.00000 |       0.00299 |       0.00256 |      

     -0.00104 |       0.00000 |       0.00303 |       0.00206 |       0.71496
     -0.00146 |       0.00000 |       0.00291 |       0.00191 |       0.71533
     -0.00126 |       0.00000 |       0.00293 |       0.00186 |       0.71518
Evaluating losses...
     -0.00229 |       0.00000 |       0.00288 |       0.00198 |       0.71528
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.33         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2132          |
| TimeElapsed     | 3.81e+03      |
| TimestepsSoFar  | 3313664       |
| ev_tdlam_before | 0.482         |
| loss_ent        | 0.71528035    |
| loss_kl         | 0.001982643   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0022900794 |
| loss_vf_loss    | 0.002883211   |
-----------------------------------
********** Iteration 809 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00054 |       0.00000 |  

********** Iteration 814 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00192 |       0.00000 |       0.00332 |       0.00124 |       0.60880
     -0.00043 |       0.00000 |       0.00302 |       0.00123 |       0.60887
     -0.00037 |       0.00000 |       0.00285 |       0.00125 |       0.60837
     -0.00067 |       0.00000 |       0.00269 |       0.00134 |       0.60764
     -0.00045 |       0.00000 |       0.00272 |       0.00152 |       0.60840
     -0.00035 |       0.00000 |       0.00261 |       0.00152 |       0.60837
     -0.00101 |       0.00000 |       0.00253 |       0.00155 |       0.60799
     -0.00055 |       0.00000 |       0.00248 |       0.00164 |       0.60798
     -0.00042 |       0.00000 |       0.00251 |       0.00178 |       0.60779
     -0.00099 |       0.00000 |       0.00234 |       0.00162 |       0.60858
Evaluating losses...
     -0.00045 |       0.00000 |       0.00238 |       0.00173 |      

     -0.00029 |       0.00000 |       0.00246 |       0.00187 |       0.71026
     -0.00043 |       0.00000 |       0.00239 |       0.00192 |       0.70955
     -0.00043 |       0.00000 |       0.00221 |       0.00191 |       0.70968
Evaluating losses...
     -0.00052 |       0.00000 |       0.00224 |       0.00204 |       0.70903
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.31          |
| EpThisIter      | 2              |
| EpisodesSoFar   | 2147           |
| TimeElapsed     | 3.85e+03       |
| TimestepsSoFar  | 3358720        |
| ev_tdlam_before | 0.295          |
| loss_ent        | 0.70902896     |
| loss_kl         | 0.0020363624   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00051997043 |
| loss_vf_loss    | 0.0022432008   |
------------------------------------
********** Iteration 820 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00035 |    

********** Iteration 825 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00107 |       0.00000 |       0.00197 |       0.00138 |       0.66177
      0.00051 |       0.00000 |       0.00194 |       0.00147 |       0.66281
     5.51e-05 |       0.00000 |       0.00180 |       0.00156 |       0.66183
     -0.00020 |       0.00000 |       0.00182 |       0.00153 |       0.66157
     -0.00067 |       0.00000 |       0.00184 |       0.00156 |       0.66136
     -0.00113 |       0.00000 |       0.00175 |       0.00168 |       0.66127
     -0.00114 |       0.00000 |       0.00168 |       0.00151 |       0.66056
     -0.00049 |       0.00000 |       0.00169 |       0.00147 |       0.66069
     -0.00049 |       0.00000 |       0.00174 |       0.00150 |       0.66037
     -0.00053 |       0.00000 |       0.00167 |       0.00162 |       0.66099
Evaluating losses...
     -0.00079 |       0.00000 |       0.00168 |       0.00144 |      

      0.00052 |       0.00000 |       0.00117 |       0.00166 |       0.63748
     -0.00012 |       0.00000 |       0.00112 |       0.00151 |       0.63662
     -0.00020 |       0.00000 |       0.00106 |       0.00179 |       0.63714
     -0.00041 |       0.00000 |       0.00107 |       0.00162 |       0.63706
Evaluating losses...
      0.00079 |       0.00000 |       0.00101 |       0.00182 |       0.63850
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.31        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2162         |
| TimeElapsed     | 3.9e+03      |
| TimestepsSoFar  | 3403776      |
| ev_tdlam_before | 0.649        |
| loss_ent        | 0.63849986   |
| loss_kl         | 0.0018179814 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0007908207 |
| loss_vf_loss    | 0.001010802  |
----------------------------------
********** Iteration 831 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |

********** Iteration 836 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00054 |       0.00000 |       0.00239 |       0.00138 |       0.74542
      0.00118 |       0.00000 |       0.00229 |       0.00168 |       0.74627
     -0.00056 |       0.00000 |       0.00223 |       0.00152 |       0.74720
     -0.00025 |       0.00000 |       0.00218 |       0.00149 |       0.74672
     -0.00032 |       0.00000 |       0.00214 |       0.00161 |       0.74774
     -0.00051 |       0.00000 |       0.00212 |       0.00177 |       0.74857
     -0.00101 |       0.00000 |       0.00210 |       0.00180 |       0.74977
     -0.00135 |       0.00000 |       0.00205 |       0.00179 |       0.74915
      0.00032 |       0.00000 |       0.00204 |       0.00181 |       0.75071
     -0.00022 |       0.00000 |       0.00198 |       0.00192 |       0.74982
Evaluating losses...
     -0.00080 |       0.00000 |       0.00201 |       0.00184 |      

      0.00039 |       0.00000 |       0.00014 |       0.00144 |       0.63815
     -0.00070 |       0.00000 |       0.00012 |       0.00153 |       0.63757
      0.00035 |       0.00000 |       0.00011 |       0.00161 |       0.63669
     8.12e-05 |       0.00000 |       0.00012 |       0.00153 |       0.63788
Evaluating losses...
     1.69e-05 |       0.00000 |       0.00011 |       0.00142 |       0.63778
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.34         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2177          |
| TimeElapsed     | 3.95e+03      |
| TimestepsSoFar  | 3448832       |
| ev_tdlam_before | -2.25         |
| loss_ent        | 0.6377848     |
| loss_kl         | 0.001424163   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 1.6860664e-05 |
| loss_vf_loss    | 0.00010661453 |
-----------------------------------
********** Iteration 842 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 847 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00057 |       0.00000 |       0.00051 |       0.00115 |       0.66726
      0.00151 |       0.00000 |       0.00040 |       0.00121 |       0.66700
      0.00067 |       0.00000 |       0.00035 |       0.00124 |       0.66661
      0.00041 |       0.00000 |       0.00029 |       0.00126 |       0.66690
      0.00041 |       0.00000 |       0.00028 |       0.00131 |       0.66619
     -0.00056 |       0.00000 |       0.00026 |       0.00133 |       0.66689
      0.00076 |       0.00000 |       0.00025 |       0.00138 |       0.66676
      0.00039 |       0.00000 |       0.00023 |       0.00146 |       0.66676
     -0.00076 |       0.00000 |       0.00023 |       0.00145 |       0.66612
     -0.00029 |       0.00000 |       0.00022 |       0.00151 |       0.66571
Evaluating losses...
    -6.47e-05 |       0.00000 |       0.00021 |       0.00155 |      

      0.00059 |       0.00000 |       0.00017 |       0.00146 |       0.67683
     -0.00033 |       0.00000 |       0.00017 |       0.00157 |       0.67667
    -6.24e-05 |       0.00000 |       0.00015 |       0.00154 |       0.67624
     -0.00074 |       0.00000 |       0.00015 |       0.00156 |       0.67596
Evaluating losses...
     -0.00037 |       0.00000 |       0.00014 |       0.00154 |       0.67539
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.34          |
| EpThisIter      | 2              |
| EpisodesSoFar   | 2192           |
| TimeElapsed     | 4e+03          |
| TimestepsSoFar  | 3493888        |
| ev_tdlam_before | -0.363         |
| loss_ent        | 0.67539066     |
| loss_kl         | 0.0015399143   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00037205074 |
| loss_vf_loss    | 0.00014116966  |
------------------------------------
********** Iteration 853 ************
Optimizing...
     pol_surr |    

********** Iteration 858 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00127 |       0.00000 |       0.00213 |       0.00111 |       0.63364
      0.00047 |       0.00000 |       0.00195 |       0.00116 |       0.63393
      0.00069 |       0.00000 |       0.00178 |       0.00116 |       0.63353
      0.00079 |       0.00000 |       0.00172 |       0.00117 |       0.63293
      0.00012 |       0.00000 |       0.00171 |       0.00118 |       0.63346
     -0.00062 |       0.00000 |       0.00155 |       0.00122 |       0.63375
      0.00035 |       0.00000 |       0.00150 |       0.00119 |       0.63376
      0.00068 |       0.00000 |       0.00148 |       0.00116 |       0.63268
     4.60e-06 |       0.00000 |       0.00148 |       0.00114 |       0.63232
     -0.00013 |       0.00000 |       0.00136 |       0.00121 |       0.63287
Evaluating losses...
     7.58e-05 |       0.00000 |       0.00134 |       0.00119 |      

     -0.00098 |       0.00000 |       0.00326 |       0.00155 |       0.66098
     -0.00084 |       0.00000 |       0.00324 |       0.00152 |       0.66113
     -0.00066 |       0.00000 |       0.00318 |       0.00146 |       0.66091
Evaluating losses...
     -0.00108 |       0.00000 |       0.00310 |       0.00151 |       0.66063
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.31         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2206          |
| TimeElapsed     | 4.07e+03      |
| TimestepsSoFar  | 3538944       |
| ev_tdlam_before | 0.676         |
| loss_ent        | 0.66063064    |
| loss_kl         | 0.0015095493  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010753749 |
| loss_vf_loss    | 0.003096781   |
-----------------------------------
********** Iteration 864 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 |  

********** Iteration 869 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00207 |       0.00000 |       0.00199 |       0.00111 |       0.64173
      0.00081 |       0.00000 |       0.00193 |       0.00114 |       0.64145
      0.00064 |       0.00000 |       0.00184 |       0.00113 |       0.64259
      0.00155 |       0.00000 |       0.00177 |       0.00120 |       0.64310
      0.00035 |       0.00000 |       0.00173 |       0.00117 |       0.64215
      0.00075 |       0.00000 |       0.00174 |       0.00130 |       0.64400
      0.00019 |       0.00000 |       0.00168 |       0.00140 |       0.64272
    -1.64e-05 |       0.00000 |       0.00169 |       0.00136 |       0.64264
     -0.00024 |       0.00000 |       0.00162 |       0.00131 |       0.64263
     -0.00024 |       0.00000 |       0.00164 |       0.00141 |       0.64237
Evaluating losses...
      0.00037 |       0.00000 |       0.00161 |       0.00137 |      

     -0.00076 |       0.00000 |       0.00445 |       0.00171 |       0.63911
     -0.00142 |       0.00000 |       0.00444 |       0.00168 |       0.63886
     -0.00123 |       0.00000 |       0.00427 |       0.00175 |       0.63926
Evaluating losses...
     -0.00118 |       0.00000 |       0.00431 |       0.00181 |       0.63872
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.35         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2222          |
| TimeElapsed     | 4.12e+03      |
| TimestepsSoFar  | 3584000       |
| ev_tdlam_before | 0.663         |
| loss_ent        | 0.63872194    |
| loss_kl         | 0.0018120265  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011818985 |
| loss_vf_loss    | 0.0043052547  |
-----------------------------------
********** Iteration 875 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00108 |       0.00000 |  

********** Iteration 880 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00164 |       0.00000 |       0.00022 |       0.00103 |       0.60991
      0.00200 |       0.00000 |       0.00019 |       0.00116 |       0.61069
      0.00051 |       0.00000 |       0.00016 |       0.00121 |       0.61112
      0.00113 |       0.00000 |       0.00016 |       0.00130 |       0.61133
      0.00011 |       0.00000 |       0.00014 |       0.00121 |       0.61104
      0.00014 |       0.00000 |       0.00014 |       0.00127 |       0.61058
     -0.00011 |       0.00000 |       0.00013 |       0.00136 |       0.61095
      0.00072 |       0.00000 |       0.00013 |       0.00138 |       0.61174
      0.00012 |       0.00000 |       0.00012 |       0.00136 |       0.61118
    -2.43e-05 |       0.00000 |       0.00012 |       0.00150 |       0.61151
Evaluating losses...
     -0.00084 |       0.00000 |       0.00011 |       0.00140 |      

     2.73e-05 |       0.00000 |       0.00512 |       0.00177 |       0.65346
     -0.00085 |       0.00000 |       0.00511 |       0.00174 |       0.65359
      0.00025 |       0.00000 |       0.00508 |       0.00163 |       0.65335
Evaluating losses...
     -0.00036 |       0.00000 |       0.00496 |       0.00195 |       0.65316
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.37          |
| EpThisIter      | 2              |
| EpisodesSoFar   | 2237           |
| TimeElapsed     | 4.17e+03       |
| TimestepsSoFar  | 3629056        |
| ev_tdlam_before | 0.624          |
| loss_ent        | 0.6531601      |
| loss_kl         | 0.001946777    |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00035577893 |
| loss_vf_loss    | 0.0049635684   |
------------------------------------
********** Iteration 886 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00154 |    

********** Iteration 891 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00147 |       0.00000 |       0.00394 |       0.00116 |       0.64347
      0.00061 |       0.00000 |       0.00342 |       0.00125 |       0.64320
      0.00015 |       0.00000 |       0.00322 |       0.00117 |       0.64368
     -0.00030 |       0.00000 |       0.00300 |       0.00128 |       0.64392
      0.00027 |       0.00000 |       0.00284 |       0.00136 |       0.64351
     -0.00044 |       0.00000 |       0.00283 |       0.00150 |       0.64352
    -4.71e-05 |       0.00000 |       0.00278 |       0.00168 |       0.64299
      0.00045 |       0.00000 |       0.00264 |       0.00169 |       0.64345
     -0.00076 |       0.00000 |       0.00258 |       0.00162 |       0.64325
     -0.00077 |       0.00000 |       0.00251 |       0.00170 |       0.64373
Evaluating losses...
     -0.00032 |       0.00000 |       0.00251 |       0.00164 |      

      0.00021 |       0.00000 |       0.00086 |       0.00114 |       0.63009
      0.00017 |       0.00000 |       0.00083 |       0.00115 |       0.63022
      0.00033 |       0.00000 |       0.00085 |       0.00119 |       0.63074
Evaluating losses...
     5.70e-05 |       0.00000 |       0.00078 |       0.00122 |       0.63055
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.37        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2252         |
| TimeElapsed     | 4.22e+03     |
| TimestepsSoFar  | 3674112      |
| ev_tdlam_before | 0.674        |
| loss_ent        | 0.6305457    |
| loss_kl         | 0.0012156738 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 5.701033e-05 |
| loss_vf_loss    | 0.0007832689 |
----------------------------------
********** Iteration 897 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00122 |       0.00000 |       0.00481 |

********** Iteration 902 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00011 |       0.00000 |       0.00168 |       0.00120 |       0.65647
      0.00015 |       0.00000 |       0.00158 |       0.00127 |       0.65702
     -0.00058 |       0.00000 |       0.00154 |       0.00105 |       0.65612
     -0.00047 |       0.00000 |       0.00144 |       0.00107 |       0.65581
     -0.00018 |       0.00000 |       0.00137 |       0.00107 |       0.65609
     -0.00053 |       0.00000 |       0.00135 |       0.00114 |       0.65667
      0.00044 |       0.00000 |       0.00130 |       0.00116 |       0.65708
     -0.00080 |       0.00000 |       0.00120 |       0.00147 |       0.65686
     -0.00099 |       0.00000 |       0.00120 |       0.00131 |       0.65686
     -0.00083 |       0.00000 |       0.00122 |       0.00139 |       0.65717
Evaluating losses...
     -0.00130 |       0.00000 |       0.00115 |       0.00111 |      

     -0.00086 |       0.00000 |       0.00433 |       0.00138 |       0.64981
     -0.00026 |       0.00000 |       0.00436 |       0.00148 |       0.64956
     -0.00034 |       0.00000 |       0.00412 |       0.00153 |       0.65043
Evaluating losses...
     -0.00032 |       0.00000 |       0.00405 |       0.00136 |       0.64974
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.38         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2267          |
| TimeElapsed     | 4.27e+03      |
| TimestepsSoFar  | 3719168       |
| ev_tdlam_before | 0.459         |
| loss_ent        | 0.64973736    |
| loss_kl         | 0.0013559782  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0003180944 |
| loss_vf_loss    | 0.004047859   |
-----------------------------------
********** Iteration 908 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00149 |       0.00000 |  

********** Iteration 913 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00174 |       0.00000 |       0.00405 |       0.00102 |       0.64512
      0.00092 |       0.00000 |       0.00356 |       0.00119 |       0.64642
      0.00107 |       0.00000 |       0.00334 |       0.00140 |       0.64546
     2.80e-05 |       0.00000 |       0.00320 |       0.00144 |       0.64471
    -3.78e-05 |       0.00000 |       0.00309 |       0.00144 |       0.64482
      0.00030 |       0.00000 |       0.00297 |       0.00145 |       0.64543
      0.00045 |       0.00000 |       0.00297 |       0.00145 |       0.64530
     -0.00016 |       0.00000 |       0.00289 |       0.00144 |       0.64589
     -0.00028 |       0.00000 |       0.00291 |       0.00151 |       0.64530
    -4.51e-05 |       0.00000 |       0.00285 |       0.00142 |       0.64606
Evaluating losses...
     -0.00079 |       0.00000 |       0.00276 |       0.00168 |      

     -0.00034 |       0.00000 |       0.00086 |       0.00113 |       0.68302
     -0.00066 |       0.00000 |       0.00086 |       0.00113 |       0.68268
     -0.00122 |       0.00000 |       0.00082 |       0.00124 |       0.68232
     -0.00128 |       0.00000 |       0.00082 |       0.00119 |       0.68234
     -0.00086 |       0.00000 |       0.00078 |       0.00114 |       0.68209
Evaluating losses...
     -0.00145 |       0.00000 |       0.00078 |       0.00120 |       0.68184
----------------------------------
| EpLenMean       | 3.02e+03     |
| EpRewMean       | -0.32        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2281         |
| TimeElapsed     | 4.33e+03     |
| TimestepsSoFar  | 3764224      |
| ev_tdlam_before | 0.582        |
| loss_ent        | 0.6818397    |
| loss_kl         | 0.0011951765 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.001449859 |
| loss_vf_loss    | 0.0007751256 |
----------------------------------
********** Iteration 

********** Iteration 924 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00075 |       0.00000 |       0.00351 |       0.00101 |       0.64994
      0.00064 |       0.00000 |       0.00314 |       0.00099 |       0.65024
      0.00114 |       0.00000 |       0.00296 |       0.00099 |       0.65039
      0.00061 |       0.00000 |       0.00286 |       0.00106 |       0.65041
      0.00049 |       0.00000 |       0.00278 |       0.00109 |       0.65062
      0.00090 |       0.00000 |       0.00269 |       0.00110 |       0.65117
     -0.00012 |       0.00000 |       0.00263 |       0.00111 |       0.65110
     4.47e-05 |       0.00000 |       0.00263 |       0.00113 |       0.65078
      0.00019 |       0.00000 |       0.00258 |       0.00112 |       0.65097
      0.00065 |       0.00000 |       0.00255 |       0.00109 |       0.65092
Evaluating losses...
     7.62e-05 |       0.00000 |       0.00248 |       0.00113 |      

     -0.00055 |       0.00000 |       0.00112 |       0.00118 |       0.62940
    -3.44e-05 |       0.00000 |       0.00109 |       0.00114 |       0.62971
    -6.47e-05 |       0.00000 |       0.00105 |       0.00125 |       0.62950
Evaluating losses...
     -0.00075 |       0.00000 |       0.00104 |       0.00127 |       0.62952
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.32         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2296          |
| TimeElapsed     | 4.38e+03      |
| TimestepsSoFar  | 3809280       |
| ev_tdlam_before | 0.301         |
| loss_ent        | 0.62952274    |
| loss_kl         | 0.0012721113  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0007514005 |
| loss_vf_loss    | 0.0010441036  |
-----------------------------------
********** Iteration 930 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00022 |       0.00000 |  

********** Iteration 935 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00173 |       0.00000 |      8.31e-05 |       0.00101 |       0.68789
      0.00217 |       0.00000 |      7.82e-05 |       0.00109 |       0.68619
      0.00199 |       0.00000 |      8.08e-05 |       0.00105 |       0.68733
      0.00169 |       0.00000 |      7.43e-05 |       0.00106 |       0.68695
      0.00143 |       0.00000 |      7.37e-05 |       0.00111 |       0.68692
      0.00115 |       0.00000 |      6.40e-05 |       0.00112 |       0.68673
      0.00088 |       0.00000 |      6.94e-05 |       0.00118 |       0.68688
      0.00168 |       0.00000 |      6.43e-05 |       0.00117 |       0.68759
      0.00077 |       0.00000 |      6.69e-05 |       0.00121 |       0.68658
      0.00139 |       0.00000 |      5.96e-05 |       0.00124 |       0.68695
Evaluating losses...
     3.51e-05 |       0.00000 |      5.78e-05 |       0.00121 |      

     -0.00039 |       0.00000 |       0.00533 |       0.00143 |       0.64845
     -0.00065 |       0.00000 |       0.00511 |       0.00142 |       0.64836
     -0.00130 |       0.00000 |       0.00515 |       0.00150 |       0.64876
Evaluating losses...
     -0.00071 |       0.00000 |       0.00516 |       0.00147 |       0.64863
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.28          |
| EpThisIter      | 1              |
| EpisodesSoFar   | 2311           |
| TimeElapsed     | 4.43e+03       |
| TimestepsSoFar  | 3854336        |
| ev_tdlam_before | 0.519          |
| loss_ent        | 0.64863193     |
| loss_kl         | 0.0014714799   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00071236386 |
| loss_vf_loss    | 0.005157629    |
------------------------------------
********** Iteration 941 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00058 |    

********** Iteration 946 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00262 |       0.00000 |       0.00588 |       0.00085 |       0.57886
      0.00026 |       0.00000 |       0.00527 |       0.00088 |       0.57866
      0.00017 |       0.00000 |       0.00479 |       0.00098 |       0.57876
      0.00065 |       0.00000 |       0.00457 |       0.00095 |       0.57875
     -0.00077 |       0.00000 |       0.00438 |       0.00096 |       0.57827
      0.00021 |       0.00000 |       0.00429 |       0.00103 |       0.57728
     -0.00026 |       0.00000 |       0.00415 |       0.00105 |       0.57819
     -0.00016 |       0.00000 |       0.00389 |       0.00104 |       0.57795
      0.00013 |       0.00000 |       0.00399 |       0.00105 |       0.57737
      0.00028 |       0.00000 |       0.00380 |       0.00108 |       0.57757
Evaluating losses...
    -4.23e-05 |       0.00000 |       0.00357 |       0.00108 |      

     -0.00044 |       0.00000 |       0.00279 |       0.00101 |       0.62478
    -2.83e-05 |       0.00000 |       0.00279 |       0.00107 |       0.62512
     -0.00033 |       0.00000 |       0.00268 |       0.00105 |       0.62560
Evaluating losses...
      0.00031 |       0.00000 |       0.00260 |       0.00113 |       0.62577
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.31         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2326          |
| TimeElapsed     | 4.48e+03      |
| TimestepsSoFar  | 3899392       |
| ev_tdlam_before | 0.445         |
| loss_ent        | 0.62577415    |
| loss_kl         | 0.0011349339  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00030985125 |
| loss_vf_loss    | 0.0025969322  |
-----------------------------------
********** Iteration 952 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00115 |       0.00000 |  

********** Iteration 957 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00128 |       0.00000 |       0.00148 |       0.00078 |       0.59599
      0.00121 |       0.00000 |       0.00116 |       0.00083 |       0.59654
      0.00089 |       0.00000 |       0.00100 |       0.00087 |       0.59566
      0.00065 |       0.00000 |       0.00085 |       0.00090 |       0.59567
     -0.00028 |       0.00000 |       0.00082 |       0.00092 |       0.59607
      0.00025 |       0.00000 |       0.00077 |       0.00093 |       0.59545
      0.00052 |       0.00000 |       0.00071 |       0.00094 |       0.59603
     -0.00058 |       0.00000 |       0.00070 |       0.00091 |       0.59538
      0.00010 |       0.00000 |       0.00067 |       0.00099 |       0.59559
    -3.67e-05 |       0.00000 |       0.00067 |       0.00104 |       0.59530
Evaluating losses...
     -0.00086 |       0.00000 |       0.00062 |       0.00098 |      

      0.00053 |       0.00000 |       0.00014 |       0.00113 |       0.66541
      0.00073 |       0.00000 |       0.00014 |       0.00109 |       0.66635
     -0.00033 |       0.00000 |       0.00014 |       0.00109 |       0.66564
      0.00034 |       0.00000 |       0.00013 |       0.00108 |       0.66531
Evaluating losses...
      0.00012 |       0.00000 |       0.00013 |       0.00109 |       0.66581
------------------------------------
| EpLenMean       | 3.01e+03       |
| EpRewMean       | -0.24          |
| EpThisIter      | 1              |
| EpisodesSoFar   | 2341           |
| TimeElapsed     | 4.53e+03       |
| TimestepsSoFar  | 3944448        |
| ev_tdlam_before | -1.22          |
| loss_ent        | 0.66580564     |
| loss_kl         | 0.0010861419   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | 0.000116681564 |
| loss_vf_loss    | 0.00012521914  |
------------------------------------
********** Iteration 963 ************
Optimizing...
     pol_surr |    

********** Iteration 968 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00105 |       0.00000 |       0.00314 |       0.00095 |       0.62340
      0.00038 |       0.00000 |       0.00278 |       0.00100 |       0.62249
      0.00061 |       0.00000 |       0.00252 |       0.00101 |       0.62175
      0.00023 |       0.00000 |       0.00248 |       0.00114 |       0.62094
      0.00066 |       0.00000 |       0.00231 |       0.00101 |       0.62179
      0.00056 |       0.00000 |       0.00223 |       0.00111 |       0.62123
     -0.00047 |       0.00000 |       0.00214 |       0.00108 |       0.62170
      0.00066 |       0.00000 |       0.00203 |       0.00110 |       0.62158
     6.91e-05 |       0.00000 |       0.00216 |       0.00123 |       0.62049
      0.00018 |       0.00000 |       0.00205 |       0.00116 |       0.62115
Evaluating losses...
     -0.00039 |       0.00000 |       0.00202 |       0.00111 |      

      0.00042 |       0.00000 |       0.00127 |       0.00101 |       0.64684
      0.00043 |       0.00000 |       0.00125 |       0.00101 |       0.64581
      0.00053 |       0.00000 |       0.00128 |       0.00103 |       0.64617
      0.00012 |       0.00000 |       0.00128 |       0.00105 |       0.64604
Evaluating losses...
      0.00042 |       0.00000 |       0.00119 |       0.00101 |       0.64670
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.31         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2356          |
| TimeElapsed     | 4.58e+03      |
| TimestepsSoFar  | 3989504       |
| ev_tdlam_before | 0.388         |
| loss_ent        | 0.64669716    |
| loss_kl         | 0.0010084728  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00042011973 |
| loss_vf_loss    | 0.0011871761  |
-----------------------------------
********** Iteration 974 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 979 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00078 |       0.00000 |       0.00128 |       0.00117 |       0.63008
      0.00063 |       0.00000 |       0.00116 |       0.00110 |       0.62923
     -0.00013 |       0.00000 |       0.00114 |       0.00127 |       0.62890
     -0.00019 |       0.00000 |       0.00107 |       0.00111 |       0.62836
      0.00029 |       0.00000 |       0.00109 |       0.00102 |       0.62849
     -0.00012 |       0.00000 |       0.00107 |       0.00128 |       0.62853
      0.00019 |       0.00000 |       0.00105 |       0.00113 |       0.62791
      0.00041 |       0.00000 |       0.00103 |       0.00111 |       0.62777
      0.00014 |       0.00000 |       0.00106 |       0.00113 |       0.62819
      0.00020 |       0.00000 |       0.00105 |       0.00119 |       0.62830
Evaluating losses...
     -0.00074 |       0.00000 |       0.00105 |       0.00118 |      

     -0.00016 |       0.00000 |       0.00115 |       0.00114 |       0.67005
     -0.00014 |       0.00000 |       0.00111 |       0.00111 |       0.67049
     -0.00050 |       0.00000 |       0.00107 |       0.00114 |       0.67001
Evaluating losses...
     -0.00052 |       0.00000 |       0.00108 |       0.00114 |       0.67017
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.21         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2371          |
| TimeElapsed     | 4.65e+03      |
| TimestepsSoFar  | 4034560       |
| ev_tdlam_before | 0.601         |
| loss_ent        | 0.67017186    |
| loss_kl         | 0.0011352487  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0005163541 |
| loss_vf_loss    | 0.0010762715  |
-----------------------------------
********** Iteration 985 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00079 |       0.00000 |  

********** Iteration 990 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00183 |       0.00000 |       0.00238 |       0.00090 |       0.63908
      0.00170 |       0.00000 |       0.00227 |       0.00099 |       0.63910
      0.00084 |       0.00000 |       0.00221 |       0.00097 |       0.63918
      0.00199 |       0.00000 |       0.00221 |       0.00120 |       0.64028
      0.00108 |       0.00000 |       0.00217 |       0.00131 |       0.64046
      0.00017 |       0.00000 |       0.00210 |       0.00118 |       0.64020
      0.00128 |       0.00000 |       0.00212 |       0.00114 |       0.63974
      0.00092 |       0.00000 |       0.00208 |       0.00118 |       0.64010
      0.00039 |       0.00000 |       0.00206 |       0.00109 |       0.64007
      0.00024 |       0.00000 |       0.00205 |       0.00116 |       0.64016
Evaluating losses...
    -7.95e-05 |       0.00000 |       0.00201 |       0.00114 |      

     7.30e-05 |       0.00000 |       0.00238 |       0.00119 |       0.60663
      0.00016 |       0.00000 |       0.00229 |       0.00124 |       0.60653
     5.87e-05 |       0.00000 |       0.00227 |       0.00121 |       0.60649
Evaluating losses...
     -0.00014 |       0.00000 |       0.00223 |       0.00117 |       0.60657
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.22         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2386          |
| TimeElapsed     | 4.7e+03       |
| TimestepsSoFar  | 4079616       |
| ev_tdlam_before | 0.615         |
| loss_ent        | 0.60657376    |
| loss_kl         | 0.0011694083  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0001363249 |
| loss_vf_loss    | 0.0022310049  |
-----------------------------------
********** Iteration 996 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00065 |       0.00000 |  

********** Iteration 1001 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00210 |       0.00000 |       0.00035 |       0.00095 |       0.64420
      0.00138 |       0.00000 |       0.00027 |       0.00106 |       0.64300
      0.00100 |       0.00000 |       0.00024 |       0.00106 |       0.64311
      0.00145 |       0.00000 |       0.00023 |       0.00108 |       0.64262
      0.00079 |       0.00000 |       0.00023 |       0.00104 |       0.64219
      0.00059 |       0.00000 |       0.00021 |       0.00106 |       0.64237
      0.00064 |       0.00000 |       0.00019 |       0.00118 |       0.64222
      0.00085 |       0.00000 |       0.00020 |       0.00117 |       0.64223
      0.00057 |       0.00000 |       0.00019 |       0.00112 |       0.64200
      0.00056 |       0.00000 |       0.00017 |       0.00115 |       0.64255
Evaluating losses...
      0.00061 |       0.00000 |       0.00016 |       0.00110 |     

      0.00086 |       0.00000 |      5.65e-05 |       0.00095 |       0.64115
      0.00122 |       0.00000 |      6.74e-05 |       0.00099 |       0.64105
      0.00137 |       0.00000 |      5.61e-05 |       0.00106 |       0.64105
Evaluating losses...
      0.00131 |       0.00000 |      5.90e-05 |       0.00101 |       0.64152
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.22        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2401         |
| TimeElapsed     | 4.75e+03     |
| TimestepsSoFar  | 4124672      |
| ev_tdlam_before | -3.41        |
| loss_ent        | 0.6415207    |
| loss_kl         | 0.0010056073 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0013143044 |
| loss_vf_loss    | 5.903113e-05 |
----------------------------------
********** Iteration 1007 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00081 |       0.00000 |       0.00327 

********** Iteration 1012 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00106 |       0.00000 |       0.00183 |       0.00090 |       0.63671
      0.00074 |       0.00000 |       0.00173 |       0.00089 |       0.63668
      0.00050 |       0.00000 |       0.00172 |       0.00091 |       0.63720
      0.00058 |       0.00000 |       0.00166 |       0.00091 |       0.63686
      0.00103 |       0.00000 |       0.00165 |       0.00097 |       0.63710
      0.00070 |       0.00000 |       0.00159 |       0.00091 |       0.63717
      0.00072 |       0.00000 |       0.00156 |       0.00089 |       0.63692
      0.00072 |       0.00000 |       0.00155 |       0.00097 |       0.63722
      0.00039 |       0.00000 |       0.00153 |       0.00098 |       0.63758
      0.00078 |       0.00000 |       0.00154 |       0.00095 |       0.63772
Evaluating losses...
      0.00059 |       0.00000 |       0.00149 |       0.00092 |     

      0.00154 |       0.00000 |       0.00243 |       0.00091 |       0.61063
      0.00020 |       0.00000 |       0.00237 |       0.00092 |       0.61090
      0.00068 |       0.00000 |       0.00229 |       0.00093 |       0.61142
Evaluating losses...
      0.00071 |       0.00000 |       0.00228 |       0.00090 |       0.61109
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.25         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2416          |
| TimeElapsed     | 4.79e+03      |
| TimestepsSoFar  | 4169728       |
| ev_tdlam_before | 0.658         |
| loss_ent        | 0.6110943     |
| loss_kl         | 0.00090340135 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0007080514  |
| loss_vf_loss    | 0.0022796353  |
-----------------------------------
********** Iteration 1018 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00153 |       0.00000 | 

********** Iteration 1023 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00029 |       0.00000 |       0.00232 |       0.00070 |       0.53256
      0.00100 |       0.00000 |       0.00217 |       0.00070 |       0.53226
      0.00050 |       0.00000 |       0.00199 |       0.00072 |       0.53253
      0.00088 |       0.00000 |       0.00194 |       0.00076 |       0.53221
      0.00057 |       0.00000 |       0.00184 |       0.00075 |       0.53219
      0.00070 |       0.00000 |       0.00187 |       0.00076 |       0.53191
      0.00020 |       0.00000 |       0.00177 |       0.00080 |       0.53175
      0.00058 |       0.00000 |       0.00171 |       0.00077 |       0.53199
      0.00063 |       0.00000 |       0.00179 |       0.00081 |       0.53130
      0.00030 |       0.00000 |       0.00167 |       0.00084 |       0.53131
Evaluating losses...
      0.00024 |       0.00000 |       0.00166 |       0.00081 |     

      0.00141 |       0.00000 |       0.00154 |       0.00095 |       0.66898
      0.00036 |       0.00000 |       0.00153 |       0.00101 |       0.66873
      0.00020 |       0.00000 |       0.00151 |       0.00097 |       0.66874
Evaluating losses...
      0.00047 |       0.00000 |       0.00148 |       0.00092 |       0.66904
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.11         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2431          |
| TimeElapsed     | 4.84e+03      |
| TimestepsSoFar  | 4214784       |
| ev_tdlam_before | 0.392         |
| loss_ent        | 0.6690424     |
| loss_kl         | 0.0009160983  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00046860124 |
| loss_vf_loss    | 0.0014823506  |
-----------------------------------
********** Iteration 1029 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00099 |       0.00000 | 

********** Iteration 1034 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00100 |       0.00000 |       0.00166 |       0.00079 |       0.62542
      0.00080 |       0.00000 |       0.00151 |       0.00081 |       0.62512
      0.00078 |       0.00000 |       0.00138 |       0.00083 |       0.62486
      0.00053 |       0.00000 |       0.00138 |       0.00088 |       0.62504
      0.00073 |       0.00000 |       0.00127 |       0.00082 |       0.62498
      0.00108 |       0.00000 |       0.00128 |       0.00086 |       0.62520
      0.00098 |       0.00000 |       0.00119 |       0.00085 |       0.62496
      0.00067 |       0.00000 |       0.00121 |       0.00084 |       0.62498
      0.00075 |       0.00000 |       0.00118 |       0.00085 |       0.62501
      0.00078 |       0.00000 |       0.00112 |       0.00090 |       0.62468
Evaluating losses...
      0.00041 |       0.00000 |       0.00113 |       0.00091 |     

********** Iteration 1045 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00099 |       0.00000 |       0.00299 |       0.00088 |       0.60938
      0.00015 |       0.00000 |       0.00278 |       0.00091 |       0.60974
      0.00041 |       0.00000 |       0.00269 |       0.00091 |       0.60899
      0.00045 |       0.00000 |       0.00261 |       0.00090 |       0.60933
      0.00083 |       0.00000 |       0.00262 |       0.00092 |       0.60917
      0.00051 |       0.00000 |       0.00256 |       0.00090 |       0.60904
      0.00047 |       0.00000 |       0.00258 |       0.00093 |       0.60902
      0.00015 |       0.00000 |       0.00253 |       0.00093 |       0.60913
      0.00029 |       0.00000 |       0.00253 |       0.00090 |       0.60865
      0.00042 |       0.00000 |       0.00247 |       0.00106 |       0.60877
Evaluating losses...
    -2.45e-05 |       0.00000 |       0.00242 |       0.00092 |     

      0.00056 |       0.00000 |       0.00150 |       0.00102 |       0.61916
      0.00120 |       0.00000 |       0.00151 |       0.00094 |       0.61921
      0.00114 |       0.00000 |       0.00143 |       0.00103 |       0.61939
Evaluating losses...
      0.00059 |       0.00000 |       0.00147 |       0.00097 |       0.61918
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.12        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2461         |
| TimeElapsed     | 4.96e+03     |
| TimestepsSoFar  | 4304896      |
| ev_tdlam_before | 0.688        |
| loss_ent        | 0.619183     |
| loss_kl         | 0.0009734887 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0005851679 |
| loss_vf_loss    | 0.0014679733 |
----------------------------------
********** Iteration 1051 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00172 |       0.00000 |       0.00186 

********** Iteration 1056 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00186 |       0.00000 |       0.00058 |       0.00085 |       0.61506
      0.00173 |       0.00000 |       0.00045 |       0.00091 |       0.61519
      0.00080 |       0.00000 |       0.00038 |       0.00087 |       0.61516
      0.00047 |       0.00000 |       0.00035 |       0.00094 |       0.61508
      0.00028 |       0.00000 |       0.00029 |       0.00096 |       0.61506
      0.00036 |       0.00000 |       0.00027 |       0.00097 |       0.61469
      0.00064 |       0.00000 |       0.00024 |       0.00098 |       0.61488
      0.00020 |       0.00000 |       0.00023 |       0.00108 |       0.61516
      0.00077 |       0.00000 |       0.00022 |       0.00104 |       0.61500
      0.00036 |       0.00000 |       0.00018 |       0.00105 |       0.61468
Evaluating losses...
      0.00055 |       0.00000 |       0.00018 |       0.00104 |     

      0.00064 |       0.00000 |       0.00282 |       0.00084 |       0.58919
      0.00085 |       0.00000 |       0.00281 |       0.00101 |       0.58888
      0.00053 |       0.00000 |       0.00286 |       0.00093 |       0.58872
Evaluating losses...
      0.00097 |       0.00000 |       0.00282 |       0.00093 |       0.58894
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.11        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2476         |
| TimeElapsed     | 5.01e+03     |
| TimestepsSoFar  | 4349952      |
| ev_tdlam_before | 0.589        |
| loss_ent        | 0.5889391    |
| loss_kl         | 0.000934634  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0009683077 |
| loss_vf_loss    | 0.0028230888 |
----------------------------------
********** Iteration 1062 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00213 |       0.00000 |       0.00122 

********** Iteration 1067 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00183 |       0.00000 |       0.00024 |       0.00085 |       0.64807
      0.00136 |       0.00000 |       0.00018 |       0.00085 |       0.64791
      0.00214 |       0.00000 |       0.00018 |       0.00083 |       0.64799
      0.00169 |       0.00000 |       0.00017 |       0.00088 |       0.64770
      0.00199 |       0.00000 |       0.00016 |       0.00086 |       0.64773
      0.00178 |       0.00000 |       0.00014 |       0.00088 |       0.64774
      0.00169 |       0.00000 |       0.00014 |       0.00092 |       0.64800
      0.00186 |       0.00000 |       0.00014 |       0.00095 |       0.64800
      0.00136 |       0.00000 |       0.00013 |       0.00090 |       0.64763
      0.00138 |       0.00000 |       0.00013 |       0.00093 |       0.64763
Evaluating losses...
      0.00143 |       0.00000 |       0.00013 |       0.00093 |     

      0.00097 |       0.00000 |       0.00477 |       0.00089 |       0.57827
      0.00061 |       0.00000 |       0.00471 |       0.00087 |       0.57812
      0.00094 |       0.00000 |       0.00466 |       0.00086 |       0.57836
Evaluating losses...
      0.00060 |       0.00000 |       0.00472 |       0.00084 |       0.57820
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.14        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2491         |
| TimeElapsed     | 5.06e+03     |
| TimestepsSoFar  | 4395008      |
| ev_tdlam_before | 0.458        |
| loss_ent        | 0.57819766   |
| loss_kl         | 0.0008350963 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0006048712 |
| loss_vf_loss    | 0.004715803  |
----------------------------------
********** Iteration 1073 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00265 |       0.00000 |       0.00040 

********** Iteration 1078 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00132 |       0.00000 |       0.00207 |       0.00077 |       0.62524
      0.00102 |       0.00000 |       0.00202 |       0.00075 |       0.62538
      0.00109 |       0.00000 |       0.00201 |       0.00073 |       0.62514
      0.00165 |       0.00000 |       0.00196 |       0.00081 |       0.62500
      0.00126 |       0.00000 |       0.00194 |       0.00079 |       0.62522
      0.00114 |       0.00000 |       0.00193 |       0.00081 |       0.62502
      0.00129 |       0.00000 |       0.00191 |       0.00081 |       0.62545
      0.00099 |       0.00000 |       0.00188 |       0.00082 |       0.62495
      0.00092 |       0.00000 |       0.00188 |       0.00082 |       0.62531
      0.00101 |       0.00000 |       0.00187 |       0.00083 |       0.62525
Evaluating losses...
      0.00067 |       0.00000 |       0.00186 |       0.00080 |     

      0.00104 |       0.00000 |       0.00128 |       0.00076 |       0.58051
      0.00146 |       0.00000 |       0.00126 |       0.00085 |       0.58066
      0.00085 |       0.00000 |       0.00127 |       0.00082 |       0.58058
Evaluating losses...
      0.00011 |       0.00000 |       0.00128 |       0.00080 |       0.58064
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.09         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2506          |
| TimeElapsed     | 5.11e+03      |
| TimestepsSoFar  | 4440064       |
| ev_tdlam_before | 0.565         |
| loss_ent        | 0.58063704    |
| loss_kl         | 0.0007986714  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00011023891 |
| loss_vf_loss    | 0.0012805536  |
-----------------------------------
********** Iteration 1084 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00166 |       0.00000 | 

********** Iteration 1089 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00210 |       0.00000 |       0.00295 |       0.00091 |       0.66061
      0.00166 |       0.00000 |       0.00272 |       0.00088 |       0.66069
      0.00115 |       0.00000 |       0.00260 |       0.00089 |       0.66031
      0.00176 |       0.00000 |       0.00252 |       0.00089 |       0.66064
      0.00089 |       0.00000 |       0.00245 |       0.00094 |       0.65970
      0.00153 |       0.00000 |       0.00243 |       0.00088 |       0.65967
      0.00127 |       0.00000 |       0.00240 |       0.00088 |       0.65988
      0.00183 |       0.00000 |       0.00226 |       0.00089 |       0.66022
      0.00140 |       0.00000 |       0.00230 |       0.00092 |       0.65959
      0.00099 |       0.00000 |       0.00226 |       0.00093 |       0.65932
Evaluating losses...
      0.00150 |       0.00000 |       0.00221 |       0.00095 |     

      0.00097 |       0.00000 |       0.00209 |       0.00086 |       0.67616
      0.00118 |       0.00000 |       0.00207 |       0.00091 |       0.67584
      0.00152 |       0.00000 |       0.00206 |       0.00085 |       0.67577
Evaluating losses...
      0.00108 |       0.00000 |       0.00207 |       0.00088 |       0.67574
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.11        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2521         |
| TimeElapsed     | 5.16e+03     |
| TimestepsSoFar  | 4485120      |
| ev_tdlam_before | 0.467        |
| loss_ent        | 0.67573756   |
| loss_kl         | 0.0008842952 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0010820694 |
| loss_vf_loss    | 0.0020653827 |
----------------------------------
********** Iteration 1095 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00200 |       0.00000 |       0.00028 

********** Iteration 1100 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00172 |       0.00000 |       0.00345 |       0.00074 |       0.59444
      0.00152 |       0.00000 |       0.00284 |       0.00077 |       0.59338
      0.00100 |       0.00000 |       0.00256 |       0.00084 |       0.59307
      0.00100 |       0.00000 |       0.00233 |       0.00077 |       0.59313
      0.00130 |       0.00000 |       0.00228 |       0.00077 |       0.59322
      0.00111 |       0.00000 |       0.00212 |       0.00077 |       0.59330
      0.00124 |       0.00000 |       0.00199 |       0.00087 |       0.59317
      0.00116 |       0.00000 |       0.00204 |       0.00086 |       0.59273
      0.00041 |       0.00000 |       0.00199 |       0.00084 |       0.59253
      0.00043 |       0.00000 |       0.00188 |       0.00079 |       0.59233
Evaluating losses...
      0.00149 |       0.00000 |       0.00181 |       0.00084 |     

      0.00079 |       0.00000 |       0.00139 |       0.00079 |       0.58411
      0.00068 |       0.00000 |       0.00140 |       0.00079 |       0.58368
      0.00103 |       0.00000 |       0.00145 |       0.00082 |       0.58405
Evaluating losses...
      0.00073 |       0.00000 |       0.00133 |       0.00082 |       0.58398
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.2          |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2536          |
| TimeElapsed     | 5.23e+03      |
| TimestepsSoFar  | 4530176       |
| ev_tdlam_before | 0.599         |
| loss_ent        | 0.58398014    |
| loss_kl         | 0.00081813824 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.000730239   |
| loss_vf_loss    | 0.0013323602  |
-----------------------------------
********** Iteration 1106 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00237 |       0.00000 | 

********** Iteration 1111 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00151 |       0.00000 |       0.00230 |       0.00069 |       0.56994
      0.00118 |       0.00000 |       0.00201 |       0.00068 |       0.56971
      0.00204 |       0.00000 |       0.00205 |       0.00068 |       0.56964
      0.00145 |       0.00000 |       0.00192 |       0.00072 |       0.56951
      0.00173 |       0.00000 |       0.00188 |       0.00069 |       0.56950
      0.00129 |       0.00000 |       0.00188 |       0.00074 |       0.56922
      0.00155 |       0.00000 |       0.00175 |       0.00070 |       0.56944
      0.00082 |       0.00000 |       0.00172 |       0.00069 |       0.56943
      0.00101 |       0.00000 |       0.00171 |       0.00075 |       0.56953
      0.00092 |       0.00000 |       0.00165 |       0.00072 |       0.56953
Evaluating losses...
      0.00122 |       0.00000 |       0.00168 |       0.00072 |     

      0.00143 |       0.00000 |       0.00039 |       0.00079 |       0.59742
      0.00122 |       0.00000 |       0.00039 |       0.00072 |       0.59705
      0.00118 |       0.00000 |       0.00038 |       0.00072 |       0.59711
Evaluating losses...
      0.00114 |       0.00000 |       0.00036 |       0.00080 |       0.59713
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.12         |
| EpThisIter      | 2             |
| EpisodesSoFar   | 2551          |
| TimeElapsed     | 5.28e+03      |
| TimestepsSoFar  | 4575232       |
| ev_tdlam_before | -0.043        |
| loss_ent        | 0.5971337     |
| loss_kl         | 0.0008002684  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0011413621  |
| loss_vf_loss    | 0.00035845986 |
-----------------------------------
********** Iteration 1117 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00150 |       0.00000 | 

********** Iteration 1122 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00117 |       0.00000 |       0.00534 |       0.00063 |       0.55830
      0.00184 |       0.00000 |       0.00524 |       0.00072 |       0.55811
      0.00116 |       0.00000 |       0.00504 |       0.00069 |       0.55818
      0.00134 |       0.00000 |       0.00496 |       0.00067 |       0.55798
      0.00127 |       0.00000 |       0.00498 |       0.00071 |       0.55810
      0.00104 |       0.00000 |       0.00486 |       0.00070 |       0.55805
      0.00088 |       0.00000 |       0.00484 |       0.00071 |       0.55820
      0.00118 |       0.00000 |       0.00477 |       0.00073 |       0.55809
      0.00077 |       0.00000 |       0.00476 |       0.00070 |       0.55794
      0.00083 |       0.00000 |       0.00470 |       0.00069 |       0.55806
Evaluating losses...
      0.00065 |       0.00000 |       0.00465 |       0.00070 |     

      0.00090 |       0.00000 |       0.00344 |       0.00079 |       0.59684
      0.00072 |       0.00000 |       0.00335 |       0.00077 |       0.59761
      0.00044 |       0.00000 |       0.00335 |       0.00080 |       0.59770
Evaluating losses...
      0.00098 |       0.00000 |       0.00327 |       0.00078 |       0.59758
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.13        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2566         |
| TimeElapsed     | 5.32e+03     |
| TimestepsSoFar  | 4620288      |
| ev_tdlam_before | 0.383        |
| loss_ent        | 0.59758043   |
| loss_kl         | 0.0007793774 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0009779531 |
| loss_vf_loss    | 0.003274918  |
----------------------------------
********** Iteration 1128 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00176 |       0.00000 |       0.00302 

********** Iteration 1133 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00154 |       0.00000 |       0.00554 |       0.00081 |       0.62299
      0.00121 |       0.00000 |       0.00528 |       0.00091 |       0.62331
      0.00154 |       0.00000 |       0.00525 |       0.00095 |       0.62334
      0.00112 |       0.00000 |       0.00506 |       0.00099 |       0.62326
      0.00108 |       0.00000 |       0.00509 |       0.00094 |       0.62289
      0.00088 |       0.00000 |       0.00496 |       0.00090 |       0.62291
      0.00105 |       0.00000 |       0.00491 |       0.00092 |       0.62328
      0.00122 |       0.00000 |       0.00484 |       0.00093 |       0.62323
      0.00109 |       0.00000 |       0.00486 |       0.00106 |       0.62310
      0.00044 |       0.00000 |       0.00485 |       0.00102 |       0.62288
Evaluating losses...
      0.00022 |       0.00000 |       0.00475 |       0.00094 |     

********** Iteration 1144 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00311 |       0.00000 |       0.00019 |       0.00071 |       0.60269
      0.00298 |       0.00000 |       0.00017 |       0.00068 |       0.60303
      0.00249 |       0.00000 |       0.00016 |       0.00074 |       0.60321
      0.00315 |       0.00000 |       0.00016 |       0.00074 |       0.60337
      0.00288 |       0.00000 |       0.00014 |       0.00071 |       0.60312
      0.00280 |       0.00000 |       0.00014 |       0.00071 |       0.60306
      0.00245 |       0.00000 |       0.00014 |       0.00069 |       0.60274
      0.00252 |       0.00000 |       0.00013 |       0.00074 |       0.60323
      0.00289 |       0.00000 |       0.00013 |       0.00072 |       0.60327
      0.00263 |       0.00000 |       0.00012 |       0.00074 |       0.60350
Evaluating losses...
      0.00277 |       0.00000 |       0.00012 |       0.00071 |     

      0.00250 |       0.00000 |       0.00179 |       0.00071 |       0.59993
      0.00170 |       0.00000 |       0.00183 |       0.00073 |       0.59995
      0.00179 |       0.00000 |       0.00175 |       0.00075 |       0.59974
Evaluating losses...
      0.00164 |       0.00000 |       0.00180 |       0.00079 |       0.59962
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.09        |
| EpThisIter      | 2            |
| EpisodesSoFar   | 2596         |
| TimeElapsed     | 5.42e+03     |
| TimestepsSoFar  | 4710400      |
| ev_tdlam_before | 0.463        |
| loss_ent        | 0.5996242    |
| loss_kl         | 0.0007882624 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.00163971   |
| loss_vf_loss    | 0.0018010648 |
----------------------------------
********** Iteration 1150 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00306 |       0.00000 |       0.00046 

********** Iteration 1155 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00147 |       0.00000 |       0.00453 |       0.00076 |       0.66180
      0.00154 |       0.00000 |       0.00419 |       0.00076 |       0.66201
      0.00159 |       0.00000 |       0.00416 |       0.00076 |       0.66168
      0.00141 |       0.00000 |       0.00393 |       0.00079 |       0.66196
      0.00154 |       0.00000 |       0.00389 |       0.00076 |       0.66212
      0.00154 |       0.00000 |       0.00390 |       0.00082 |       0.66204
      0.00132 |       0.00000 |       0.00372 |       0.00080 |       0.66201
      0.00111 |       0.00000 |       0.00370 |       0.00082 |       0.66216
      0.00147 |       0.00000 |       0.00369 |       0.00084 |       0.66192
      0.00139 |       0.00000 |       0.00356 |       0.00082 |       0.66269
Evaluating losses...
      0.00152 |       0.00000 |       0.00364 |       0.00084 |     

      0.00370 |       0.00000 |       0.00014 |       0.00079 |       0.65204
      0.00381 |       0.00000 |       0.00013 |       0.00080 |       0.65177
      0.00408 |       0.00000 |       0.00013 |       0.00079 |       0.65185
      0.00380 |       0.00000 |       0.00013 |       0.00075 |       0.65191
Evaluating losses...
      0.00356 |       0.00000 |       0.00012 |       0.00077 |       0.65212
-----------------------------------
| EpLenMean       | 3.02e+03      |
| EpRewMean       | -0.16         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2610          |
| TimeElapsed     | 5.49e+03      |
| TimestepsSoFar  | 4755456       |
| ev_tdlam_before | -0.55         |
| loss_ent        | 0.6521239     |
| loss_kl         | 0.0007745949  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0035643424  |
| loss_vf_loss    | 0.00011918558 |
-----------------------------------
********** Iteration 1161 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent

********** Iteration 1166 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00405 |       0.00000 |       0.00033 |       0.00093 |       0.72699
      0.00371 |       0.00000 |       0.00030 |       0.00095 |       0.72741
      0.00365 |       0.00000 |       0.00027 |       0.00089 |       0.72719
      0.00306 |       0.00000 |       0.00029 |       0.00090 |       0.72722
      0.00339 |       0.00000 |       0.00024 |       0.00095 |       0.72761
      0.00280 |       0.00000 |       0.00024 |       0.00093 |       0.72719
      0.00339 |       0.00000 |       0.00023 |       0.00093 |       0.72704
      0.00352 |       0.00000 |       0.00023 |       0.00100 |       0.72724
      0.00352 |       0.00000 |       0.00023 |       0.00096 |       0.72691
      0.00308 |       0.00000 |       0.00021 |       0.00095 |       0.72712
Evaluating losses...
      0.00313 |       0.00000 |       0.00022 |       0.00097 |     

      0.00486 |       0.00000 |       0.00011 |       0.00087 |       0.70815
      0.00410 |       0.00000 |       0.00011 |       0.00087 |       0.70831
      0.00411 |       0.00000 |       0.00011 |       0.00086 |       0.70817
Evaluating losses...
      0.00439 |       0.00000 |       0.00012 |       0.00086 |       0.70798
------------------------------------
| EpLenMean       | 3.02e+03       |
| EpRewMean       | -0.13          |
| EpThisIter      | 1              |
| EpisodesSoFar   | 2625           |
| TimeElapsed     | 5.54e+03       |
| TimestepsSoFar  | 4800512        |
| ev_tdlam_before | -0.748         |
| loss_ent        | 0.70798177     |
| loss_kl         | 0.00085616903  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | 0.004385234    |
| loss_vf_loss    | 0.000119269665 |
------------------------------------
********** Iteration 1172 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00142 |   

********** Iteration 1177 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00210 |       0.00000 |       0.00193 |       0.00078 |       0.65281
      0.00154 |       0.00000 |       0.00190 |       0.00080 |       0.65294
      0.00187 |       0.00000 |       0.00186 |       0.00082 |       0.65303
      0.00229 |       0.00000 |       0.00186 |       0.00080 |       0.65292
      0.00270 |       0.00000 |       0.00186 |       0.00084 |       0.65286
      0.00212 |       0.00000 |       0.00188 |       0.00083 |       0.65313
      0.00228 |       0.00000 |       0.00185 |       0.00081 |       0.65324
      0.00168 |       0.00000 |       0.00189 |       0.00080 |       0.65310
      0.00368 |       0.00000 |       0.00182 |       0.00087 |       0.65286
      0.00206 |       0.00000 |       0.00185 |       0.00084 |       0.65271
Evaluating losses...
      0.00189 |       0.00000 |       0.00183 |       0.00086 |     

      0.00248 |       0.00000 |       0.00093 |       0.00083 |       0.63308
      0.00235 |       0.00000 |       0.00093 |       0.00077 |       0.63334
      0.00254 |       0.00000 |       0.00090 |       0.00081 |       0.63322
Evaluating losses...
      0.00304 |       0.00000 |       0.00094 |       0.00076 |       0.63326
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.21         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2640          |
| TimeElapsed     | 5.59e+03      |
| TimestepsSoFar  | 4845568       |
| ev_tdlam_before | 0.781         |
| loss_ent        | 0.63326406    |
| loss_kl         | 0.00075876614 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0030377018  |
| loss_vf_loss    | 0.00093523855 |
-----------------------------------
********** Iteration 1183 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00162 |       0.00000 | 

********** Iteration 1188 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00384 |       0.00000 |       0.00022 |       0.00079 |       0.63505
      0.00412 |       0.00000 |       0.00019 |       0.00077 |       0.63463
      0.00461 |       0.00000 |       0.00018 |       0.00082 |       0.63492
      0.00449 |       0.00000 |       0.00018 |       0.00079 |       0.63464
      0.00417 |       0.00000 |       0.00017 |       0.00077 |       0.63506
      0.00489 |       0.00000 |       0.00017 |       0.00077 |       0.63480
      0.00370 |       0.00000 |       0.00015 |       0.00076 |       0.63486
      0.00396 |       0.00000 |       0.00017 |       0.00078 |       0.63498
      0.00465 |       0.00000 |       0.00016 |       0.00078 |       0.63517
      0.00368 |       0.00000 |       0.00016 |       0.00081 |       0.63492
Evaluating losses...
      0.00360 |       0.00000 |       0.00016 |       0.00078 |     

      0.00408 |       0.00000 |       0.00016 |       0.00079 |       0.67003
      0.00392 |       0.00000 |       0.00016 |       0.00079 |       0.67020
      0.00391 |       0.00000 |       0.00016 |       0.00077 |       0.67070
Evaluating losses...
      0.00427 |       0.00000 |       0.00014 |       0.00077 |       0.67054
-----------------------------------
| EpLenMean       | 3.01e+03      |
| EpRewMean       | -0.24         |
| EpThisIter      | 1             |
| EpisodesSoFar   | 2655          |
| TimeElapsed     | 5.64e+03      |
| TimestepsSoFar  | 4890624       |
| ev_tdlam_before | -0.538        |
| loss_ent        | 0.6705402     |
| loss_kl         | 0.0007684392  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.004265221   |
| loss_vf_loss    | 0.00014295861 |
-----------------------------------
********** Iteration 1194 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00327 |       0.00000 | 

********** Iteration 1199 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00194 |       0.00000 |       0.00187 |       0.00072 |       0.61599
      0.00191 |       0.00000 |       0.00188 |       0.00076 |       0.61616
      0.00183 |       0.00000 |       0.00186 |       0.00074 |       0.61581
      0.00147 |       0.00000 |       0.00194 |       0.00074 |       0.61563
      0.00180 |       0.00000 |       0.00181 |       0.00078 |       0.61633
      0.00170 |       0.00000 |       0.00186 |       0.00074 |       0.61627
      0.00169 |       0.00000 |       0.00182 |       0.00072 |       0.61623
      0.00185 |       0.00000 |       0.00177 |       0.00075 |       0.61573
      0.00162 |       0.00000 |       0.00183 |       0.00073 |       0.61629
      0.00186 |       0.00000 |       0.00180 |       0.00072 |       0.61569
Evaluating losses...
      0.00203 |       0.00000 |       0.00172 |       0.00075 |     

      0.00171 |       0.00000 |       0.00422 |       0.00068 |       0.58428
      0.00221 |       0.00000 |       0.00414 |       0.00069 |       0.58423
      0.00177 |       0.00000 |       0.00405 |       0.00070 |       0.58394
Evaluating losses...
      0.00160 |       0.00000 |       0.00407 |       0.00071 |       0.58439
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.18        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2670         |
| TimeElapsed     | 5.69e+03     |
| TimestepsSoFar  | 4935680      |
| ev_tdlam_before | 0.321        |
| loss_ent        | 0.5843869    |
| loss_kl         | 0.0007061221 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0016013669 |
| loss_vf_loss    | 0.0040657544 |
----------------------------------
********** Iteration 1205 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00218 |       0.00000 |       0.00215 

********** Iteration 1210 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00264 |       0.00000 |       0.00203 |       0.00068 |       0.57522
      0.00277 |       0.00000 |       0.00198 |       0.00063 |       0.57532
      0.00254 |       0.00000 |       0.00198 |       0.00068 |       0.57504
      0.00325 |       0.00000 |       0.00198 |       0.00066 |       0.57526
      0.00279 |       0.00000 |       0.00199 |       0.00066 |       0.57538
      0.00368 |       0.00000 |       0.00194 |       0.00069 |       0.57542
      0.00287 |       0.00000 |       0.00195 |       0.00064 |       0.57553
      0.00252 |       0.00000 |       0.00194 |       0.00066 |       0.57516
      0.00330 |       0.00000 |       0.00190 |       0.00068 |       0.57519
      0.00261 |       0.00000 |       0.00193 |       0.00066 |       0.57526
Evaluating losses...
      0.00259 |       0.00000 |       0.00194 |       0.00067 |     

      0.00264 |       0.00000 |       0.00340 |       0.00076 |       0.65949
      0.00300 |       0.00000 |       0.00334 |       0.00078 |       0.65951
      0.00270 |       0.00000 |       0.00341 |       0.00076 |       0.65923
Evaluating losses...
      0.00327 |       0.00000 |       0.00336 |       0.00077 |       0.65876
----------------------------------
| EpLenMean       | 3.01e+03     |
| EpRewMean       | -0.19        |
| EpThisIter      | 1            |
| EpisodesSoFar   | 2685         |
| TimeElapsed     | 5.74e+03     |
| TimestepsSoFar  | 4980736      |
| ev_tdlam_before | 0.351        |
| loss_ent        | 0.6587631    |
| loss_kl         | 0.0007678833 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0032675634 |
| loss_vf_loss    | 0.0033641139 |
----------------------------------
********** Iteration 1216 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00342 |       0.00000 |       0.00048 

In [51]:
obs = env.reset()
done = False
total_reward = 0

while not done:
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
    env.render()
env.close()
print("score:", total_reward)

score: -1
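
A single episode gives a noisy score, so it can help to average over several. The helper below is a minimal sketch of that idea, not part of the original notebook: the names `evaluate` and `summarize` are hypothetical, and it assumes the same `model` and `env` objects as above, with the older Gym `step` API that returns a four-tuple. It passes `deterministic=True` to `predict`, which removes action-sampling noise from the comparison.

```python
def evaluate(model, env, n_episodes=10, deterministic=True):
    """Run n_episodes and return the list of per-episode total rewards.

    Assumes `model.predict(obs, deterministic=...)` returns (action, state)
    and `env.step(action)` returns (obs, reward, done, info).
    """
    scores = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            action, _states = model.predict(obs, deterministic=deterministic)
            obs, reward, done, info = env.step(action)
            total_reward += reward
        scores.append(total_reward)
    return scores


def summarize(scores):
    """Return (mean, min, max) of a non-empty list of episode scores."""
    mean = sum(scores) / len(scores)
    return mean, min(scores), max(scores)
```

Calling `summarize(evaluate(model, env))` then gives a mean score that is easier to compare against `EpRewMean` in the training log than one rollout.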


In [53]:
import base64
from IPython.display import HTML

def embed_mp4(filename):
    """Embeds an mp4 file in the notebook."""
    # Read the file inside a context manager so the handle is closed promptly.
    with open(filename, 'rb') as f:
        video = f.read()
    b64 = base64.b64encode(video)
    tag = '''
    <video width="640" height="480" controls>
    <source src="data:video/mp4;base64,{0}" type="video/mp4">
    Your browser does not support the video tag.
    </video>'''.format(b64.decode())

    return HTML(tag)

In [54]:
import imageio

num_episodes = 5
video_filename = 'bnn-ppo.mp4'

with imageio.get_writer(video_filename, fps=60) as video:
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        total_reward = 0
        video.append_data(env.render('rgb_array'))
        
        while not done:
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            video.append_data(env.render('rgb_array'))
        
        print("score:", total_reward)

embed_mp4(video_filename)



score: -1
score: 0
score: 0
score: 0
score: -1
