# PPO in Stable Baselines

In the single-agent PPO setup, `PPO1` was instantiated with `MlpPolicy` as follows:

```
model = PPO1(MlpPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
                 optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)

```

`MlpPolicy` is defined in `stable_baselines/common/policies.py`. It inherits from `FeedForwardPolicy`, which in turn inherits from `ActorCriticPolicy`.

`FeedForwardPolicy`'s `__init__` contains the following:
```
if net_arch is None:
    if layers is None:
        layers = [64, 64]
    net_arch = [dict(vf=layers, pi=layers)]

with tf.variable_scope("model", reuse=reuse):
    if feature_extraction == "cnn":
        pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
    else:
        pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

    self._value_fn = linear(vf_latent, 'vf', 1)

    self._proba_distribution, self._policy, self.q_value = \
        self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)
```

Since `MlpPolicy` uses `feature_extraction="mlp"`, the relevant code path is `mlp_extractor` ([here](https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/policies.py)).

`mlp_extractor` constructs an MLP that receives observations as input and outputs latent representations for the policy network and the value network. The number and size of the hidden layers, and how many of them are shared between the policy and value networks, are specified with `net_arch`.
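To make the `net_arch` convention concrete, here is a minimal sketch of the parsing step only. `split_net_arch` is a hypothetical helper written for illustration; it is not part of Stable Baselines and builds no layers. It only mirrors how the list is split into shared, policy-only, and value-only layer sizes.

```python
def split_net_arch(net_arch):
    """Split a net_arch spec into (shared, policy_only, value_only) layer sizes.

    Hypothetical helper for illustration; it mirrors the parsing logic only.
    """
    shared, policy_only, value_only = [], [], []
    for layer in net_arch:
        if isinstance(layer, int):
            # Leading integers describe layers shared by policy and value nets.
            shared.append(layer)
        else:
            assert isinstance(layer, dict), "net_arch may only contain ints and dicts"
            # A missing key means "no non-shared layers" for that branch.
            policy_only = list(layer.get("pi", []))
            value_only = list(layer.get("vf", []))
            break  # the network splits here; later entries are not read
    return shared, policy_only, value_only


# Default used by FeedForwardPolicy when neither net_arch nor layers is given.
default_arch = [dict(vf=[64, 64], pi=[64, 64])]
print(split_net_arch(default_arch))
# One shared layer, then two value-only layers and one policy-only layer.
print(split_net_arch([55, dict(vf=[255, 255], pi=[128])]))
```

With the default spec there are no shared layers at all, so the policy and value branches each get their own two hidden layers of 64 units.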

`mlp_extractor` iterates through `net_arch` and creates each layer with `latent = act_fun(linear(latent, ...))`. Therefore, look into `act_fun` and `linear`; the latter is defined in `stable_baselines.common.tf_layers`.

`FeedForwardPolicy`'s default for `act_fun` is `tf.tanh`. `linear` is defined as:

```
def linear(input_tensor, scope, n_hidden, *, init_scale=1.0, init_bias=0.0):
    """
    Creates a fully connected layer for TensorFlow
    :param input_tensor: (TensorFlow Tensor) The input tensor for the fully connected layer
    :param scope: (str) The TensorFlow variable scope
    :param n_hidden: (int) The number of hidden neurons
    :param init_scale: (int) The initialization scale
    :param init_bias: (int) The initialization offset bias
    :return: (TensorFlow Tensor) fully connected layer
    """
    with tf.variable_scope(scope):
        n_input = input_tensor.get_shape()[1].value
        weight = tf.get_variable("w", [n_input, n_hidden], initializer=ortho_init(init_scale))
        bias = tf.get_variable("b", [n_hidden], initializer=tf.constant_initializer(init_bias))
        return tf.matmul(input_tensor, weight) + bias
```
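As a rough illustration of what one hidden layer `act_fun(linear(...))` computes, the sketch below reproduces the affine map `input @ w + b` followed by `tanh` on plain Python lists. The names `linear_py` and `tanh_layer` and the example weights are made up for this sketch; the real layer uses TensorFlow variables with an orthogonal initializer.

```python
import math


def linear_py(inputs, weight, bias):
    """Compute inputs @ weight + bias for a batch of row vectors.

    inputs: list of rows, each of length n_input
    weight: n_input x n_hidden nested list
    bias:   list of length n_hidden
    """
    n_hidden = len(bias)
    return [
        [sum(x * weight[i][j] for i, x in enumerate(row)) + bias[j]
         for j in range(n_hidden)]
        for row in inputs
    ]


def tanh_layer(inputs, weight, bias):
    """One hidden layer: tanh applied element-wise to the affine output."""
    return [[math.tanh(v) for v in row] for row in linear_py(inputs, weight, bias)]


batch = [[1.0, 2.0], [0.0, -1.0]]                # 2 observations, 2 features
weight = [[0.5, -1.0, 0.0], [0.25, 0.0, 1.0]]    # shape [n_input=2, n_hidden=3]
bias = [0.0, 0.0, 0.0]

pre_activation = linear_py(batch, weight, bias)
hidden = tanh_layer(batch, weight, bias)
print(pre_activation)
print(hidden)
```

The weights here are single fixed numbers. That is the part a Bayesian layer changes: each weight becomes a distribution.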

Therefore, to turn this model into a Bayesian neural network, the hidden `linear` layers need to be replaced with `tfp.layers.DenseVariational` layers. We can do this by modifying `FeedForwardPolicy` (which `MlpPolicy` inherits from) to support a new `bnn_extractor`, then creating a `BnnPolicy` to replace `MlpPolicy`.
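The practical difference between a deterministic dense layer and a variational one can be sketched without TensorFlow. The example below is an illustration under simplifying assumptions (a single output unit, a fixed mean-field Gaussian over the weights, no training); `sample_weights` and `dense` are hypothetical names, not TensorFlow Probability API. Each forward pass draws fresh weights, so the same input gives a different output every time, scattered around the deterministic value.

```python
import random


def sample_weights(loc, scale, rng):
    """Reparameterised draw: w = loc + scale * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(loc, scale)]


def dense(x, w, b):
    """Single-unit dense layer: dot(x, w) + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


rng = random.Random(0)      # seeded so the sketch is repeatable
loc = [0.5, -0.2]           # posterior means of the two weights
scale = [0.1, 0.1]          # posterior standard deviations
x = [1.0, 2.0]

# Deterministic layer: always uses the mean weights.
deterministic = dense(x, loc, 0.0)

# Variational layer: a new weight sample on every forward pass.
draws = [dense(x, sample_weights(loc, scale, rng), 0.0) for _ in range(1000)]
mean_draw = sum(draws) / len(draws)

print(deterministic)
print(min(draws), mean_draw, max(draws))
```

In a policy network this means the action distribution itself becomes stochastic in the weights, in addition to the usual sampling of actions.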

In [157]:
from tensorflow.keras import backend as K
from tensorflow.keras import activations, initializers
from tensorflow.keras.layers import Layer

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfp.__version__

'0.8.0'

In [158]:
# Specify the surrogate posterior over `keras.layers.Dense` `kernel` and `bias`.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                     scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

# Specify the prior over `keras.layers.Dense` `kernel` and `bias`.
def prior_trainable(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1),
            reinterpreted_batch_ndims=1)),
    ])
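Two details of the cell above are easy to miss, so here is a small pure-Python sketch of them. It assumes the raw variables start at zero and uses hypothetical helper names (`unpack_posterior`, `kl_to_unit_scale_prior`); it is not how TensorFlow Probability computes these quantities internally. First, the constant `c = log(expm1(1))` is chosen so that `softplus(c) = 1`, which puts the initial posterior scale at about 1. Second, the penalty that pulls the posterior toward the prior is a KL divergence, which has a closed form for independent Gaussians when the prior scale is 1.

```python
import math


def softplus(x):
    """softplus(x) = log(1 + exp(x)); always positive."""
    return math.log1p(math.exp(x))


# Same constant as in posterior_mean_field: softplus(c) == 1.
c = math.log(math.expm1(1.0))


def unpack_posterior(t, n):
    """Split 2*n raw variables into (loc, scale), as posterior_mean_field does."""
    loc = list(t[:n])
    scale = [1e-5 + softplus(c + raw) for raw in t[n:]]
    return loc, scale


def kl_to_unit_scale_prior(loc_q, scale_q, loc_p):
    """KL( N(loc_q, scale_q) || N(loc_p, 1) ), summed over independent weights."""
    return sum(
        -math.log(s) + 0.5 * (s * s + (m - p) ** 2) - 0.5
        for m, s, p in zip(loc_q, scale_q, loc_p)
    )


# Raw variables at zero: loc is 0 and scale is 1e-5 + softplus(c).
loc, scale = unpack_posterior([0.0] * 6, n=3)
print(loc, scale)

# Posterior equal to the prior gives zero KL; moving the mean adds a penalty.
print(kl_to_unit_scale_prior([0.3], [1.0], [0.3]))
print(kl_to_unit_scale_prior([1.3], [1.0], [0.3]))
```

Note that the prior here is trainable only in its mean (`loc`); its scale is fixed at 1.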

In [159]:
def bnn_extractor(flat_observations, net_arch, act_fun):
    """
    Constructs a variational network that receives observations as an input and outputs a latent representation for the policy and
    a value network. The ``net_arch`` parameter allows to specify the amount and size of the hidden layers and how many
    of them are shared between the policy network and the value network. It is assumed to be a list with the following
    structure:
    1. An arbitrary length (zero allowed) number of integers each specifying the number of units in a shared layer.
       If the number of ints is zero, there will be no shared layers.
    2. An optional dict, to specify the following non-shared layers for the value network and the policy network.
       It is formatted like ``dict(vf=[<value layer sizes>], pi=[<policy layer sizes>])``.
       If it is missing any of the keys (pi or vf), no non-shared layers (empty list) is assumed.
    For example to construct a network with one shared layer of size 55 followed by two non-shared layers for the value
    network of size 255 and a single non-shared layer of size 128 for the policy network, the following layers_spec
    would be used: ``[55, dict(vf=[255, 255], pi=[128])]``. A simple shared network topology with two layers of size 128
    would be specified as [128, 128].
    :param flat_observations: (tf.Tensor) The observations to base policy and value function on.
    :param net_arch: ([int or dict]) The specification of the policy and value networks.
        See above for details on its formatting.
    :param act_fun: (tf function) The activation function to use for the networks.
    :return: (tf.Tensor, tf.Tensor) latent_policy, latent_value of the specified network.
        If all layers are shared, then ``latent_policy == latent_value``
    """
    latent = flat_observations
    policy_only_layers = []  # Layer sizes of the network that only belongs to the policy network
    value_only_layers = []  # Layer sizes of the network that only belongs to the value network

    # Iterate through the shared layers and build the shared parts of the network
    for idx, layer in enumerate(net_arch):
        if isinstance(layer, int):  # Check that this is a shared layer
            layer_size = layer
#             latent = act_fun(linear(latent, "shared_fc{}".format(idx), layer_size, init_scale=np.sqrt(2)))
            latent = act_fun(tfp.layers.DenseVariational(layer_size, posterior_mean_field, prior_trainable, kl_weight=1)(latent))
        else:
            assert isinstance(layer, dict), "Error: the net_arch list can only contain ints and dicts"
            if 'pi' in layer:
                assert isinstance(layer['pi'], list), "Error: net_arch[-1]['pi'] must contain a list of integers."
                policy_only_layers = layer['pi']

            if 'vf' in layer:
                assert isinstance(layer['vf'], list), "Error: net_arch[-1]['vf'] must contain a list of integers."
                value_only_layers = layer['vf']
            break  # From here on the network splits up in policy and value network

    # Build the non-shared part of the network
    latent_policy = latent
    latent_value = latent
    for idx, (pi_layer_size, vf_layer_size) in enumerate(zip_longest(policy_only_layers, value_only_layers)):
        if pi_layer_size is not None:
            assert isinstance(pi_layer_size, int), "Error: net_arch[-1]['pi'] must only contain integers."
#             latent_policy = act_fun(linear(latent_policy, "pi_fc{}".format(idx), pi_layer_size, init_scale=np.sqrt(2)))
            latent_policy = act_fun(tfp.layers.DenseVariational(pi_layer_size, posterior_mean_field, prior_trainable, kl_weight=1)(latent_policy))

        if vf_layer_size is not None:
            assert isinstance(vf_layer_size, int), "Error: net_arch[-1]['vf'] must only contain integers."
#             latent_value = act_fun(linear(latent_value, "vf_fc{}".format(idx), vf_layer_size, init_scale=np.sqrt(2)))
            latent_value = act_fun(tfp.layers.DenseVariational(vf_layer_size, posterior_mean_field, prior_trainable, kl_weight=1)(latent_value))

    return latent_policy, latent_value
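The policy and value branches may have different depths, which is why the loop uses `zip_longest` and checks each size against `None`. A quick sketch with the layer sizes from the docstring example:

```python
from itertools import zip_longest

policy_only_layers = [128]        # pi=[128]
value_only_layers = [255, 255]    # vf=[255, 255]

# zip_longest pads the shorter list with None instead of stopping early.
pairs = list(zip_longest(policy_only_layers, value_only_layers))

for idx, (pi_layer_size, vf_layer_size) in enumerate(pairs):
    build_pi = pi_layer_size is not None
    build_vf = vf_layer_size is not None
    print(idx, pi_layer_size, vf_layer_size, build_pi, build_vf)
```

In the second iteration only the value branch gets a new layer; the policy latent is left unchanged.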

In [160]:
class FeedForwardPolicy(ActorCriticPolicy):
    """
    Policy object that implements actor critic, using a feed forward neural network.
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param layers: ([int]) (deprecated, use net_arch instead) The size of the Neural network for the policy
        (if None, default to [64, 64])
    :param net_arch: (list) Specification of the actor-critic policy network architecture (see mlp_extractor
        documentation for details).
    :param act_fun: (tf.func) the activation function to use in the neural network.
    :param cnn_extractor: (function (TensorFlow Tensor, ``**kwargs``): (TensorFlow Tensor)) the CNN feature extraction
    :param feature_extraction: (str) The feature extraction type ("cnn" or "mlp")
    :param kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, layers=None, net_arch=None,
                 act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
        super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
                                                scale=(feature_extraction == "cnn"))

        self._kwargs_check(feature_extraction, kwargs)

        if layers is not None:
            warnings.warn("Usage of the `layers` parameter is deprecated! Use net_arch instead "
                          "(it has a different semantics though).", DeprecationWarning)
            if net_arch is not None:
                warnings.warn("The new `net_arch` parameter overrides the deprecated `layers` parameter!",
                              DeprecationWarning)

        if net_arch is None:
            if layers is None:
                layers = [64, 64]
            net_arch = [dict(vf=layers, pi=layers)]

        with tf.variable_scope("model", reuse=reuse):
            if feature_extraction == "cnn":
                pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
            elif feature_extraction == "bnn":
                pi_latent, vf_latent = bnn_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)
            else:
                pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

            self._value_fn = linear(vf_latent, 'vf', 1)

            self._proba_distribution, self._policy, self.q_value = \
                self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)

        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=False):
        if deterministic:
            action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        return action, value, self.initial_state, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})

In [161]:
import warnings
from itertools import zip_longest
from abc import ABC, abstractmethod

import numpy as np
import tensorflow as tf
from gym.spaces import Discrete

from stable_baselines.common.tf_util import batch_to_seq, seq_to_batch
from stable_baselines.common.tf_layers import conv, linear, conv_to_fc, lstm
from stable_baselines.common.distributions import make_proba_dist_type, CategoricalProbabilityDistribution, \
    MultiCategoricalProbabilityDistribution, DiagGaussianProbabilityDistribution, BernoulliProbabilityDistribution
from stable_baselines.common.input import observation_input
from stable_baselines.common.policies import ActorCriticPolicy, mlp_extractor, nature_cnn

In [162]:
class BnnPolicy(FeedForwardPolicy):
    """
    Policy object that implements actor critic, using a Bayesian neural net (2 layers of 64)
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param _kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
        super(BnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                        feature_extraction="bnn", **_kwargs)

# Single-Agent PPO with BNN

In [163]:
#!/usr/bin/env python3

# Train single CPU PPO1 on slimevolley.
# Should solve it (beat existing AI on average over 1000 trials) in 3 hours on single CPU, within 3M steps.

import os
import gym
import slimevolleygym
from slimevolleygym import SurvivalRewardEnv

from stable_baselines.ppo1 import PPO1
from stable_baselines.common.policies import MlpPolicy
from stable_baselines import logger
from stable_baselines.common.callbacks import EvalCallback

NUM_TIMESTEPS = int(5e6)
SEED = 721
EVAL_FREQ = 250000
EVAL_EPISODES = 10  # was 1000
LOGDIR = "bnn_ppo1" # moved to zoo afterwards.

logger.configure(folder=LOGDIR)

env = gym.make("SlimeVolley-v0")
env.seed(SEED)

Logging to bnn_ppo1


[721]

In [164]:
# take mujoco hyperparams (but doubled timesteps_per_actorbatch to cover more steps.)
model = PPO1(BnnPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
                 optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)

eval_callback = EvalCallback(env, best_model_save_path=LOGDIR, log_path=LOGDIR, eval_freq=EVAL_FREQ, n_eval_episodes=EVAL_EPISODES)

model.learn(total_timesteps=NUM_TIMESTEPS, callback=eval_callback)

model.save(os.path.join(LOGDIR, "final_model")) # probably never get to this point.

env.close()
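A few numbers help when reading the training log that follows. Each PPO1 iteration collects `timesteps_per_actorbatch=4096` steps, so `TimestepsSoFar` should advance in multiples of 4096. The sketch below also shows how a linear schedule would scale the step size, assuming the multiplier decays from 1 to 0 over `total_timesteps`; that formula is an assumption about `schedule='linear'`, and `linear_lr` is a hypothetical helper, not a Stable Baselines function.

```python
TIMESTEPS_PER_ACTORBATCH = 4096
NUM_TIMESTEPS = int(5e6)
OPTIM_STEPSIZE = 3e-4


def timesteps_after(iterations_completed):
    """Environment steps collected after a number of completed iterations."""
    return iterations_completed * TIMESTEPS_PER_ACTORBATCH


def linear_lr(timesteps_so_far):
    """Assumed linear schedule: step size decays from OPTIM_STEPSIZE to 0."""
    multiplier = max(1.0 - timesteps_so_far / NUM_TIMESTEPS, 0.0)
    return OPTIM_STEPSIZE * multiplier


# Iterations 0..5 completed corresponds to the summary printed before "Iteration 6".
print(timesteps_after(6))
# Iterations 0..60 completed corresponds to the summary printed before "Iteration 61".
print(timesteps_after(61))
# Step size at the start, at the first evaluation, and at the end of training.
print(linear_lr(0), linear_lr(timesteps_after(61)), linear_lr(NUM_TIMESTEPS))
# Full iterations needed to cover the whole training budget.
print(NUM_TIMESTEPS // TIMESTEPS_PER_ACTORBATCH)
```

These values line up with the `TimestepsSoFar` entries in the log (24576 and 249856).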

********** Iteration 0 ************


  "{} != {}".format(self.training_env, self.eval_env))


Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00074 |       0.00000 |       1.01718 |       0.00014 |       2.07934
     3.04e-05 |       0.00000 |       0.88093 |       0.00028 |       2.07920
     -0.00026 |       0.00000 |       0.87572 |       0.00045 |       2.07903
      0.00029 |       0.00000 |       0.76330 |       0.00054 |       2.07893
     -0.00033 |       0.00000 |       0.73383 |       0.00065 |       2.07883
     -0.00106 |       0.00000 |       0.60832 |       0.00084 |       2.07862
     -0.00165 |       0.00000 |       0.54448 |       0.00106 |       2.07842
      0.00029 |       0.00000 |       0.49235 |       0.00124 |       2.07823
     -0.00087 |       0.00000 |       0.48766 |       0.00134 |       2.07813
      0.00092 |       0.00000 |       0.43227 |       0.00147 |       2.07802
Evaluating losses...
     -0.00190 |       0.00000 |       0.41145 |       0.00165 |       2.07782
-----------------------------

      0.00083 |       0.00000 |       0.05020 |       0.00456 |       2.07654
     -0.00032 |       0.00000 |       0.05025 |       0.00439 |       2.07652
     -0.00081 |       0.00000 |       0.04994 |       0.00462 |       2.07635
Evaluating losses...
      0.00092 |       0.00000 |       0.05043 |       0.00448 |       2.07632
----------------------------------
| EpLenMean       | 564          |
| EpRewMean       | -4.93        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 43           |
| TimeElapsed     | 38.9         |
| TimestepsSoFar  | 24576        |
| ev_tdlam_before | -0.0109      |
| loss_ent        | 2.0763159    |
| loss_kl         | 0.0044785896 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0009182282 |
| loss_vf_loss    | 0.05043229   |
----------------------------------
********** Iteration 6 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00154 |       0.00000 |       0.05140 |  

********** Iteration 11 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -2.73e-05 |       0.00000 |       0.05114 |       0.00394 |       2.07346
     1.51e-05 |       0.00000 |       0.05079 |       0.00401 |       2.07348
      0.00199 |       0.00000 |       0.05112 |       0.00391 |       2.07330
     -0.00140 |       0.00000 |       0.05097 |       0.00395 |       2.07327
     -0.00157 |       0.00000 |       0.05087 |       0.00435 |       2.07307
     -0.00209 |       0.00000 |       0.05091 |       0.00419 |       2.07221
     -0.00035 |       0.00000 |       0.05038 |       0.00442 |       2.07200
    -3.37e-05 |       0.00000 |       0.05076 |       0.00467 |       2.07215
      0.00111 |       0.00000 |       0.05128 |       0.00460 |       2.07206
      0.00123 |       0.00000 |       0.05153 |       0.00443 |       2.07171
Evaluating losses...
      0.00044 |       0.00000 |       0.05117 |       0.00454 |       

      0.00290 |       0.00000 |       0.04766 |       0.00470 |       2.07566
      0.00268 |       0.00000 |       0.04717 |       0.00439 |       2.07591
     -0.00097 |       0.00000 |       0.04651 |       0.00482 |       2.07619
      0.00124 |       0.00000 |       0.04655 |       0.00463 |       2.07629
Evaluating losses...
      0.00115 |       0.00000 |       0.04779 |       0.00462 |       2.07606
----------------------------------
| EpLenMean       | 604          |
| EpRewMean       | -4.88        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 116          |
| TimeElapsed     | 111          |
| TimestepsSoFar  | 69632        |
| ev_tdlam_before | -0.00448     |
| loss_ent        | 2.076061     |
| loss_kl         | 0.0046223225 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0011482688 |
| loss_vf_loss    | 0.04778724   |
----------------------------------
********** Iteration 17 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss | 

********** Iteration 22 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00036 |       0.00000 |       0.04540 |       0.00453 |       2.07570
     -0.00065 |       0.00000 |       0.04598 |       0.00451 |       2.07557
      0.00215 |       0.00000 |       0.04563 |       0.00479 |       2.07505
      0.00151 |       0.00000 |       0.04553 |       0.00450 |       2.07520
     -0.00074 |       0.00000 |       0.04614 |       0.00453 |       2.07520
     -0.00155 |       0.00000 |       0.04539 |       0.00477 |       2.07464
     -0.00040 |       0.00000 |       0.04627 |       0.00469 |       2.07481
      0.00080 |       0.00000 |       0.04532 |       0.00505 |       2.07464
     2.44e-05 |       0.00000 |       0.04540 |       0.00507 |       2.07394
      0.00055 |       0.00000 |       0.04542 |       0.00494 |       2.07365
Evaluating losses...
      0.00094 |       0.00000 |       0.04543 |       0.00500 |       

      0.00102 |       0.00000 |       0.04525 |       0.00396 |       2.07366
      0.00028 |       0.00000 |       0.04499 |       0.00397 |       2.07354
Evaluating losses...
      0.00148 |       0.00000 |       0.04462 |       0.00364 |       2.07369
----------------------------------
| EpLenMean       | 611          |
| EpRewMean       | -4.9         |
| EpThisIter      | 6            |
| EpisodesSoFar   | 190          |
| TimeElapsed     | 183          |
| TimestepsSoFar  | 114688       |
| ev_tdlam_before | -0.0127      |
| loss_ent        | 2.0736928    |
| loss_kl         | 0.0036431474 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0014762175 |
| loss_vf_loss    | 0.044619907  |
----------------------------------
********** Iteration 28 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00220 |       0.00000 |       0.04338 |       0.00411 |       2.07352
      0.00042 |       0.00000 |       0.04284 | 

********** Iteration 33 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00183 |       0.00000 |       0.04816 |       0.00428 |       2.07414
      0.00070 |       0.00000 |       0.04851 |       0.00455 |       2.07419
      0.00067 |       0.00000 |       0.04827 |       0.00479 |       2.07358
     -0.00165 |       0.00000 |       0.04794 |       0.00423 |       2.07378
      0.00262 |       0.00000 |       0.04754 |       0.00453 |       2.07382
     -0.00190 |       0.00000 |       0.04786 |       0.00423 |       2.07383
      0.00059 |       0.00000 |       0.04770 |       0.00437 |       2.07409
     -0.00308 |       0.00000 |       0.04740 |       0.00435 |       2.07340
      0.00166 |       0.00000 |       0.04745 |       0.00453 |       2.07363
     -0.00095 |       0.00000 |       0.04845 |       0.00426 |       2.07320
Evaluating losses...
     -0.00112 |       0.00000 |       0.04818 |       0.00475 |       

     -0.00075 |       0.00000 |       0.04497 |       0.00400 |       2.07032
     -0.00142 |       0.00000 |       0.04440 |       0.00403 |       2.07015
      0.00188 |       0.00000 |       0.04511 |       0.00407 |       2.07068
Evaluating losses...
     -0.00192 |       0.00000 |       0.04494 |       0.00401 |       2.06991
-----------------------------------
| EpLenMean       | 608           |
| EpRewMean       | -4.85         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 263           |
| TimeElapsed     | 253           |
| TimestepsSoFar  | 159744        |
| ev_tdlam_before | 0.078         |
| loss_ent        | 2.069914      |
| loss_kl         | 0.004014659   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0019240864 |
| loss_vf_loss    | 0.04494209    |
-----------------------------------
********** Iteration 39 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00121 |       0.00000 |   

********** Iteration 44 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 |       0.03610 |       0.00402 |       2.06791
      0.00132 |       0.00000 |       0.03616 |       0.00414 |       2.06811
     -0.00020 |       0.00000 |       0.03556 |       0.00406 |       2.06811
      0.00161 |       0.00000 |       0.03512 |       0.00403 |       2.06733
      0.00022 |       0.00000 |       0.03599 |       0.00406 |       2.06765
     9.90e-05 |       0.00000 |       0.03546 |       0.00433 |       2.06731
     -0.00106 |       0.00000 |       0.03625 |       0.00406 |       2.06790
     -0.00069 |       0.00000 |       0.03474 |       0.00434 |       2.06747
      0.00169 |       0.00000 |       0.03493 |       0.00441 |       2.06805
     -0.00230 |       0.00000 |       0.03498 |       0.00404 |       2.06706
Evaluating losses...
      0.00052 |       0.00000 |       0.03504 |       0.00429 |       

     -0.00168 |       0.00000 |       0.02310 |       0.00477 |       2.05857
      0.00067 |       0.00000 |       0.02163 |       0.00478 |       2.05827
     -0.00066 |       0.00000 |       0.02273 |       0.00471 |       2.05722
Evaluating losses...
      0.00013 |       0.00000 |       0.02215 |       0.00481 |       2.05786
-----------------------------------
| EpLenMean       | 601           |
| EpRewMean       | -4.85         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 339           |
| TimeElapsed     | 323           |
| TimestepsSoFar  | 204800        |
| ev_tdlam_before | 0.714         |
| loss_ent        | 2.057865      |
| loss_kl         | 0.004809305   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00012547546 |
| loss_vf_loss    | 0.022150118   |
-----------------------------------
********** Iteration 50 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00170 |       0.00000 |   

********** Iteration 55 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -2.67e-05 |       0.00000 |       0.02495 |       0.00397 |       2.04569
     -0.00140 |       0.00000 |       0.02495 |       0.00406 |       2.04480
      0.00021 |       0.00000 |       0.02537 |       0.00398 |       2.04469
     -0.00012 |       0.00000 |       0.02528 |       0.00432 |       2.04349
      0.00020 |       0.00000 |       0.02475 |       0.00437 |       2.04322
     -0.00112 |       0.00000 |       0.02433 |       0.00446 |       2.04430
      0.00026 |       0.00000 |       0.02463 |       0.00413 |       2.04310
      0.00077 |       0.00000 |       0.02532 |       0.00435 |       2.04417
      0.00197 |       0.00000 |       0.02470 |       0.00441 |       2.04283
     -0.00039 |       0.00000 |       0.02487 |       0.00449 |       2.04236
Evaluating losses...
     9.64e-05 |       0.00000 |       0.02504 |       0.00452 |       

      0.00151 |       0.00000 |       0.03200 |       0.00390 |       2.04837
      0.00072 |       0.00000 |       0.03238 |       0.00396 |       2.04850
     2.21e-05 |       0.00000 |       0.03213 |       0.00393 |       2.04846
Evaluating losses...
     -0.00095 |       0.00000 |       0.03121 |       0.00407 |       2.04843
-----------------------------------
| EpLenMean       | 607           |
| EpRewMean       | -4.86         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 412           |
| TimeElapsed     | 396           |
| TimestepsSoFar  | 249856        |
| ev_tdlam_before | 0.667         |
| loss_ent        | 2.048428      |
| loss_kl         | 0.004071019   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009456637 |
| loss_vf_loss    | 0.031209854   |
-----------------------------------
********** Iteration 61 ************
Eval num_timesteps=249856, episode_reward=-5.00 +/- 0.00
Episode length: 647.20 +/- 88.13
New best mean reward!
Optimizing...


********** Iteration 66 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00021 |       0.00000 |       0.02738 |       0.00443 |       2.03635
      0.00067 |       0.00000 |       0.02811 |       0.00412 |       2.03679
      0.00251 |       0.00000 |       0.02827 |       0.00410 |       2.03658
     -0.00199 |       0.00000 |       0.02738 |       0.00428 |       2.03716
      0.00286 |       0.00000 |       0.02820 |       0.00422 |       2.03738
     -0.00050 |       0.00000 |       0.02814 |       0.00430 |       2.03709
      0.00041 |       0.00000 |       0.02817 |       0.00447 |       2.03769
      0.00171 |       0.00000 |       0.02772 |       0.00424 |       2.03805
      0.00222 |       0.00000 |       0.02841 |       0.00420 |       2.03729
      0.00227 |       0.00000 |       0.02786 |       0.00439 |       2.03842
Evaluating losses...
     -0.00036 |       0.00000 |       0.02797 |       0.00409 |       

    -8.33e-05 |       0.00000 |       0.02581 |       0.00419 |       2.02504
      0.00280 |       0.00000 |       0.02646 |       0.00437 |       2.02574
     -0.00056 |       0.00000 |       0.02543 |       0.00462 |       2.02557
Evaluating losses...
     -0.00128 |       0.00000 |       0.02534 |       0.00418 |       2.02680
-----------------------------------
| EpLenMean       | 646           |
| EpRewMean       | -4.79         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 482           |
| TimeElapsed     | 473           |
| TimestepsSoFar  | 294912        |
| ev_tdlam_before | 0.736         |
| loss_ent        | 2.0268033     |
| loss_kl         | 0.004178507   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012822751 |
| loss_vf_loss    | 0.02534422    |
-----------------------------------
********** Iteration 72 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00015 |       0.00000 |   

********** Iteration 77 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00041 |       0.00000 |       0.03731 |       0.00433 |       2.01733
      0.00137 |       0.00000 |       0.03737 |       0.00433 |       2.01742
     -0.00079 |       0.00000 |       0.03716 |       0.00436 |       2.01846
      0.00143 |       0.00000 |       0.03747 |       0.00439 |       2.01795
      0.00072 |       0.00000 |       0.03658 |       0.00428 |       2.01765
     -0.00056 |       0.00000 |       0.03615 |       0.00458 |       2.01865
     -0.00063 |       0.00000 |       0.03672 |       0.00460 |       2.01979
      0.00079 |       0.00000 |       0.03673 |       0.00441 |       2.02095
     -0.00343 |       0.00000 |       0.03659 |       0.00444 |       2.02020
      0.00024 |       0.00000 |       0.03635 |       0.00440 |       2.02088
Evaluating losses...
      0.00128 |       0.00000 |       0.03691 |       0.00429 |       

      0.00112 |       0.00000 |       0.02375 |       0.00401 |       2.00988
     2.34e-05 |       0.00000 |       0.02411 |       0.00395 |       2.00785
      0.00226 |       0.00000 |       0.02486 |       0.00387 |       2.01045
Evaluating losses...
     -0.00079 |       0.00000 |       0.02451 |       0.00396 |       2.01058
------------------------------------
| EpLenMean       | 638            |
| EpRewMean       | -4.81          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 553            |
| TimeElapsed     | 544            |
| TimestepsSoFar  | 339968         |
| ev_tdlam_before | 0.767          |
| loss_ent        | 2.0105813      |
| loss_kl         | 0.0039596464   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00078561105 |
| loss_vf_loss    | 0.024507962    |
------------------------------------
********** Iteration 83 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00130 |     

********** Iteration 88 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00043 |       0.00000 |       0.02501 |       0.00352 |       2.00558
      0.00084 |       0.00000 |       0.02555 |       0.00364 |       2.00630
      0.00185 |       0.00000 |       0.02541 |       0.00382 |       2.00744
     -0.00052 |       0.00000 |       0.02565 |       0.00382 |       2.00676
      0.00079 |       0.00000 |       0.02498 |       0.00363 |       2.00904
     -0.00129 |       0.00000 |       0.02550 |       0.00390 |       2.00819
      0.00262 |       0.00000 |       0.02477 |       0.00371 |       2.00903
     -0.00149 |       0.00000 |       0.02505 |       0.00386 |       2.00907
      0.00087 |       0.00000 |       0.02462 |       0.00383 |       2.00927
      0.00121 |       0.00000 |       0.02531 |       0.00390 |       2.01098
Evaluating losses...
     -0.00129 |       0.00000 |       0.02475 |       0.00371 |       

      0.00170 |       0.00000 |       0.02574 |       0.00388 |       2.00603
      0.00090 |       0.00000 |       0.02616 |       0.00386 |       2.00857
     -0.00105 |       0.00000 |       0.02573 |       0.00395 |       2.00636
Evaluating losses...
      0.00121 |       0.00000 |       0.02554 |       0.00406 |       2.00832
----------------------------------
| EpLenMean       | 619          |
| EpRewMean       | -4.86        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 626          |
| TimeElapsed     | 615          |
| TimestepsSoFar  | 385024       |
| ev_tdlam_before | 0.713        |
| loss_ent        | 2.008319     |
| loss_kl         | 0.004055469  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0012132622 |
| loss_vf_loss    | 0.025543312  |
----------------------------------
********** Iteration 94 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00072 |       0.00000 |       0.02505 | 

********** Iteration 99 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.02129 |       0.00348 |       2.02135
     -0.00071 |       0.00000 |       0.02052 |       0.00364 |       2.02248
      0.00241 |       0.00000 |       0.02080 |       0.00366 |       2.02260
      0.00032 |       0.00000 |       0.02116 |       0.00390 |       2.02344
     -0.00019 |       0.00000 |       0.02077 |       0.00369 |       2.02450
     -0.00045 |       0.00000 |       0.02076 |       0.00376 |       2.02591
      0.00166 |       0.00000 |       0.02037 |       0.00351 |       2.02301
    -3.64e-05 |       0.00000 |       0.02130 |       0.00353 |       2.02558
     -0.00048 |       0.00000 |       0.02100 |       0.00358 |       2.02544
     -0.00063 |       0.00000 |       0.02090 |       0.00364 |       2.02632
Evaluating losses...
     -0.00098 |       0.00000 |       0.02053 |       0.00392 |       

      0.00014 |       0.00000 |       0.02132 |       0.00359 |       2.01092
      0.00027 |       0.00000 |       0.02109 |       0.00376 |       2.01004
     -0.00226 |       0.00000 |       0.02178 |       0.00378 |       2.01167
Evaluating losses...
     -0.00020 |       0.00000 |       0.02125 |       0.00384 |       2.01149
------------------------------------
| EpLenMean       | 612            |
| EpRewMean       | -4.9           |
| EpThisIter      | 7              |
| EpisodesSoFar   | 700            |
| TimeElapsed     | 686            |
| TimestepsSoFar  | 430080         |
| ev_tdlam_before | 0.791          |
| loss_ent        | 2.0114899      |
| loss_kl         | 0.0038389058   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00020430377 |
| loss_vf_loss    | 0.021253424    |
------------------------------------
********** Iteration 105 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00062 |    

********** Iteration 110 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00283 |       0.00000 |       0.02463 |       0.00379 |       1.99925
     -0.00031 |       0.00000 |       0.02484 |       0.00392 |       1.99998
     5.60e-05 |       0.00000 |       0.02426 |       0.00394 |       1.99984
      0.00456 |       0.00000 |       0.02432 |       0.00416 |       2.00010
    -6.19e-05 |       0.00000 |       0.02379 |       0.00379 |       1.99917
     -0.00105 |       0.00000 |       0.02404 |       0.00393 |       2.00044
     -0.00092 |       0.00000 |       0.02396 |       0.00378 |       1.99983
     -0.00230 |       0.00000 |       0.02456 |       0.00394 |       2.00042
      0.00213 |       0.00000 |       0.02352 |       0.00413 |       2.00135
     -0.00158 |       0.00000 |       0.02389 |       0.00414 |       2.00106
Evaluating losses...
     -0.00121 |       0.00000 |       0.02413 |       0.00404 |      

     -0.00078 |       0.00000 |       0.02611 |       0.00395 |       2.00533
     3.62e-06 |       0.00000 |       0.02597 |       0.00378 |       2.00551
     -0.00112 |       0.00000 |       0.02585 |       0.00407 |       2.00555
Evaluating losses...
     4.43e-05 |       0.00000 |       0.02553 |       0.00400 |       2.00533
-----------------------------------
| EpLenMean       | 634           |
| EpRewMean       | -4.82         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 771           |
| TimeElapsed     | 755           |
| TimestepsSoFar  | 475136        |
| ev_tdlam_before | 0.777         |
| loss_ent        | 2.0053287     |
| loss_kl         | 0.0040001743  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 4.4297893e-05 |
| loss_vf_loss    | 0.025527893   |
-----------------------------------
********** Iteration 116 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00374 |       0.00000 |  

********** Iteration 121 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00087 |       0.00000 |       0.02530 |       0.00377 |       2.00857
      0.00114 |       0.00000 |       0.02447 |       0.00370 |       2.00793
      0.00315 |       0.00000 |       0.02432 |       0.00401 |       2.00829
      0.00054 |       0.00000 |       0.02416 |       0.00378 |       2.00912
      0.00124 |       0.00000 |       0.02436 |       0.00366 |       2.00967
      0.00061 |       0.00000 |       0.02450 |       0.00382 |       2.00900
      0.00097 |       0.00000 |       0.02434 |       0.00366 |       2.00898
      0.00060 |       0.00000 |       0.02410 |       0.00379 |       2.00878
      0.00117 |       0.00000 |       0.02474 |       0.00384 |       2.00915
      0.00114 |       0.00000 |       0.02418 |       0.00376 |       2.00938
Evaluating losses...
     -0.00038 |       0.00000 |       0.02409 |       0.00378 |      

      0.00279 |       0.00000 |       0.02564 |       0.00368 |       2.00309
      0.00054 |       0.00000 |       0.02573 |       0.00372 |       2.00306
     -0.00198 |       0.00000 |       0.02551 |       0.00408 |       2.00154
      0.00143 |       0.00000 |       0.02489 |       0.00399 |       2.00285
Evaluating losses...
     -0.00061 |       0.00000 |       0.02547 |       0.00390 |       2.00377
-----------------------------------
| EpLenMean       | 636           |
| EpRewMean       | -4.82         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 841           |
| TimeElapsed     | 831           |
| TimestepsSoFar  | 520192        |
| ev_tdlam_before | 0.765         |
| loss_ent        | 2.0037653     |
| loss_kl         | 0.0038961964  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0006074859 |
| loss_vf_loss    | 0.025471402   |
-----------------------------------
********** Iteration 127 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 132 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00080 |       0.00000 |       0.01960 |       0.00356 |       2.00208
      0.00131 |       0.00000 |       0.01957 |       0.00360 |       2.00281
      0.00056 |       0.00000 |       0.01984 |       0.00349 |       2.00250
      0.00193 |       0.00000 |       0.01884 |       0.00336 |       2.00341
      0.00146 |       0.00000 |       0.01945 |       0.00350 |       2.00242
      0.00488 |       0.00000 |       0.01928 |       0.00331 |       2.00092
      0.00272 |       0.00000 |       0.01966 |       0.00345 |       2.00133
      0.00028 |       0.00000 |       0.01944 |       0.00328 |       2.00145
     -0.00071 |       0.00000 |       0.01931 |       0.00344 |       2.00078
      0.00143 |       0.00000 |       0.01920 |       0.00369 |       2.00014
Evaluating losses...
      0.00035 |       0.00000 |       0.01903 |       0.00344 |      

      0.00101 |       0.00000 |       0.02127 |       0.00364 |       2.00549
     -0.00156 |       0.00000 |       0.02125 |       0.00365 |       2.00776
     -0.00146 |       0.00000 |       0.02166 |       0.00375 |       2.00756
Evaluating losses...
     -0.00297 |       0.00000 |       0.02137 |       0.00352 |       2.00751
-----------------------------------
| EpLenMean       | 605           |
| EpRewMean       | -4.86         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 915           |
| TimeElapsed     | 901           |
| TimestepsSoFar  | 565248        |
| ev_tdlam_before | 0.778         |
| loss_ent        | 2.0075085     |
| loss_kl         | 0.0035223917  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0029667842 |
| loss_vf_loss    | 0.02137308    |
-----------------------------------
********** Iteration 138 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00366 |       0.00000 |  

********** Iteration 143 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00150 |       0.00000 |       0.02146 |       0.00321 |       2.01379
     -0.00061 |       0.00000 |       0.02088 |       0.00324 |       2.01418
     -0.00062 |       0.00000 |       0.02098 |       0.00348 |       2.01341
      0.00099 |       0.00000 |       0.02142 |       0.00333 |       2.01555
      0.00162 |       0.00000 |       0.02091 |       0.00347 |       2.01398
      0.00079 |       0.00000 |       0.02095 |       0.00358 |       2.01557
      0.00110 |       0.00000 |       0.02111 |       0.00332 |       2.01629
     -0.00192 |       0.00000 |       0.02172 |       0.00341 |       2.01694
      0.00028 |       0.00000 |       0.02068 |       0.00343 |       2.01565
     -0.00150 |       0.00000 |       0.02179 |       0.00362 |       2.01562
Evaluating losses...
     -0.00134 |       0.00000 |       0.02095 |       0.00363 |      

     -0.00070 |       0.00000 |       0.02292 |       0.00345 |       2.01216
      0.00191 |       0.00000 |       0.02264 |       0.00339 |       2.01291
      0.00192 |       0.00000 |       0.02285 |       0.00315 |       2.01261
Evaluating losses...
     -0.00143 |       0.00000 |       0.02233 |       0.00317 |       2.01226
-----------------------------------
| EpLenMean       | 625           |
| EpRewMean       | -4.88         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 987           |
| TimeElapsed     | 970           |
| TimestepsSoFar  | 610304        |
| ev_tdlam_before | 0.772         |
| loss_ent        | 2.0122554     |
| loss_kl         | 0.0031660174  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0014332605 |
| loss_vf_loss    | 0.02232574    |
-----------------------------------
********** Iteration 149 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00152 |       0.00000 |  

********** Iteration 154 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00109 |       0.00000 |       0.01692 |       0.00329 |       2.01572
      0.00225 |       0.00000 |       0.01710 |       0.00355 |       2.01494
     -0.00052 |       0.00000 |       0.01671 |       0.00338 |       2.01499
     -0.00146 |       0.00000 |       0.01686 |       0.00325 |       2.01467
      0.00025 |       0.00000 |       0.01715 |       0.00306 |       2.01642
     6.71e-05 |       0.00000 |       0.01733 |       0.00329 |       2.01576
      0.00116 |       0.00000 |       0.01719 |       0.00321 |       2.01613
      0.00093 |       0.00000 |       0.01720 |       0.00347 |       2.01496
      0.00111 |       0.00000 |       0.01696 |       0.00317 |       2.01609
     -0.00129 |       0.00000 |       0.01696 |       0.00315 |       2.01466
Evaluating losses...
     -0.00034 |       0.00000 |       0.01702 |       0.00328 |      

      0.00143 |       0.00000 |       0.02179 |       0.00332 |       2.01219
     2.84e-05 |       0.00000 |       0.02161 |       0.00335 |       2.01185
    -2.50e-05 |       0.00000 |       0.02155 |       0.00320 |       2.01288
Evaluating losses...
      0.00018 |       0.00000 |       0.02189 |       0.00311 |       2.01272
-----------------------------------
| EpLenMean       | 643           |
| EpRewMean       | -4.8          |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1056          |
| TimeElapsed     | 1.04e+03      |
| TimestepsSoFar  | 655360        |
| ev_tdlam_before | 0.814         |
| loss_ent        | 2.0127237     |
| loss_kl         | 0.0031119671  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00017723814 |
| loss_vf_loss    | 0.021885067   |
-----------------------------------
********** Iteration 160 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00025 |       0.00000 |  

********** Iteration 165 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00091 |       0.00000 |       0.02693 |       0.00343 |       2.01920
      0.00099 |       0.00000 |       0.02679 |       0.00333 |       2.01990
     -0.00043 |       0.00000 |       0.02691 |       0.00338 |       2.01904
      0.00145 |       0.00000 |       0.02611 |       0.00338 |       2.01931
     -0.00016 |       0.00000 |       0.02645 |       0.00330 |       2.01996
     -0.00175 |       0.00000 |       0.02666 |       0.00343 |       2.02042
     -0.00183 |       0.00000 |       0.02657 |       0.00350 |       2.02162
     -0.00090 |       0.00000 |       0.02691 |       0.00361 |       2.02169
     -0.00233 |       0.00000 |       0.02621 |       0.00363 |       2.02195
     -0.00065 |       0.00000 |       0.02607 |       0.00395 |       2.02238
Evaluating losses...
     -0.00018 |       0.00000 |       0.02657 |       0.00386 |      

     -0.00192 |       0.00000 |       0.02344 |       0.00373 |       2.01237
      0.00115 |       0.00000 |       0.02325 |       0.00397 |       2.01208
     -0.00305 |       0.00000 |       0.02301 |       0.00370 |       2.01219
Evaluating losses...
      0.00059 |       0.00000 |       0.02332 |       0.00422 |       2.01136
----------------------------------
| EpLenMean       | 614          |
| EpRewMean       | -4.86        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 1131         |
| TimeElapsed     | 1.11e+03     |
| TimestepsSoFar  | 700416       |
| ev_tdlam_before | 0.776        |
| loss_ent        | 2.0113585    |
| loss_kl         | 0.0042213043 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0005923668 |
| loss_vf_loss    | 0.023322314  |
----------------------------------
********** Iteration 171 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00087 |       0.00000 |       0.02101 |

********** Iteration 176 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00039 |       0.00000 |       0.02059 |       0.00379 |       2.00181
      0.00039 |       0.00000 |       0.02068 |       0.00381 |       2.00235
      0.00030 |       0.00000 |       0.02065 |       0.00379 |       2.00332
      0.00326 |       0.00000 |       0.02042 |       0.00346 |       2.00243
     -0.00093 |       0.00000 |       0.02038 |       0.00350 |       2.00348
      0.00106 |       0.00000 |       0.02033 |       0.00367 |       2.00443
      0.00158 |       0.00000 |       0.02049 |       0.00362 |       2.00423
     -0.00113 |       0.00000 |       0.02075 |       0.00386 |       2.00348
     -0.00075 |       0.00000 |       0.02002 |       0.00371 |       2.00144
      0.00070 |       0.00000 |       0.02030 |       0.00384 |       2.00468
Evaluating losses...
      0.00137 |       0.00000 |       0.02060 |       0.00346 |      

      0.00201 |       0.00000 |       0.01920 |       0.00388 |       2.00168
      0.00193 |       0.00000 |       0.01902 |       0.00341 |       2.00309
      0.00087 |       0.00000 |       0.01925 |       0.00379 |       2.00068
Evaluating losses...
      0.00153 |       0.00000 |       0.01880 |       0.00360 |       2.00352
----------------------------------
| EpLenMean       | 610          |
| EpRewMean       | -4.92        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 1204         |
| TimeElapsed     | 1.18e+03     |
| TimestepsSoFar  | 745472       |
| ev_tdlam_before | 0.802        |
| loss_ent        | 2.0035229    |
| loss_kl         | 0.0036037965 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0015287611 |
| loss_vf_loss    | 0.018799677  |
----------------------------------
********** Iteration 182 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |       0.01710 |

********** Iteration 187 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00066 |       0.00000 |       0.02067 |       0.00325 |       1.99904
     -0.00119 |       0.00000 |       0.02068 |       0.00319 |       2.00140
     -0.00057 |       0.00000 |       0.02071 |       0.00341 |       1.99857
     -0.00052 |       0.00000 |       0.02019 |       0.00359 |       1.99956
     -0.00045 |       0.00000 |       0.02032 |       0.00344 |       1.99838
     -0.00247 |       0.00000 |       0.02046 |       0.00362 |       1.99908
     -0.00153 |       0.00000 |       0.02050 |       0.00360 |       1.99786
      0.00049 |       0.00000 |       0.02060 |       0.00366 |       1.99869
      0.00139 |       0.00000 |       0.02045 |       0.00387 |       1.99750
      0.00034 |       0.00000 |       0.02078 |       0.00355 |       1.99763
Evaluating losses...
      0.00114 |       0.00000 |       0.02024 |       0.00371 |      

      0.00088 |       0.00000 |       0.01868 |       0.00284 |       2.00920
      0.00131 |       0.00000 |       0.01872 |       0.00280 |       2.00863
     -0.00234 |       0.00000 |       0.01883 |       0.00318 |       2.00971
Evaluating losses...
     -0.00342 |       0.00000 |       0.01868 |       0.00275 |       2.01014
-----------------------------------
| EpLenMean       | 633           |
| EpRewMean       | -4.93         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1275          |
| TimeElapsed     | 1.25e+03      |
| TimestepsSoFar  | 790528        |
| ev_tdlam_before | 0.811         |
| loss_ent        | 2.0101404     |
| loss_kl         | 0.0027456852  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0034231772 |
| loss_vf_loss    | 0.018675748   |
-----------------------------------
********** Iteration 193 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00090 |       0.00000 |  

********** Iteration 198 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00150 |       0.00000 |       0.02641 |       0.00313 |       2.00374
     -0.00038 |       0.00000 |       0.02624 |       0.00325 |       2.00548
      0.00082 |       0.00000 |       0.02582 |       0.00332 |       2.00168
     5.87e-06 |       0.00000 |       0.02590 |       0.00334 |       2.00093
     -0.00031 |       0.00000 |       0.02582 |       0.00346 |       1.99846
      0.00039 |       0.00000 |       0.02615 |       0.00348 |       1.99924
     -0.00224 |       0.00000 |       0.02592 |       0.00343 |       1.99808
     -0.00038 |       0.00000 |       0.02579 |       0.00350 |       1.99921
     -0.00017 |       0.00000 |       0.02599 |       0.00359 |       1.99815
     -0.00032 |       0.00000 |       0.02603 |       0.00356 |       1.99580
Evaluating losses...
      0.00117 |       0.00000 |       0.02576 |       0.00355 |      

      0.00290 |       0.00000 |       0.02183 |       0.00372 |       1.98632
     1.97e-05 |       0.00000 |       0.02176 |       0.00337 |       1.98581
Evaluating losses...
     -0.00112 |       0.00000 |       0.02170 |       0.00371 |       1.98554
-----------------------------------
| EpLenMean       | 656           |
| EpRewMean       | -4.82         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1343          |
| TimeElapsed     | 1.32e+03      |
| TimestepsSoFar  | 835584        |
| ev_tdlam_before | 0.803         |
| loss_ent        | 1.985536      |
| loss_kl         | 0.0037120457  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011218976 |
| loss_vf_loss    | 0.021702224   |
-----------------------------------
********** Iteration 204 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00160 |       0.00000 |       0.02288 |       0.00314 |       1.98673
      0.00054 |       0.00000 |  

********** Iteration 209 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 |       0.02676 |       0.00353 |       1.98773
      0.00217 |       0.00000 |       0.02630 |       0.00332 |       1.98699
      0.00155 |       0.00000 |       0.02658 |       0.00331 |       1.98655
     -0.00026 |       0.00000 |       0.02711 |       0.00351 |       1.98656
      0.00204 |       0.00000 |       0.02662 |       0.00320 |       1.98592
      0.00171 |       0.00000 |       0.02630 |       0.00337 |       1.98484
      0.00159 |       0.00000 |       0.02609 |       0.00347 |       1.98385
     3.47e-05 |       0.00000 |       0.02610 |       0.00324 |       1.98632
      0.00164 |       0.00000 |       0.02654 |       0.00322 |       1.98569
      0.00076 |       0.00000 |       0.02626 |       0.00338 |       1.98562
Evaluating losses...
     -0.00011 |       0.00000 |       0.02620 |       0.00325 |      

     -0.00015 |       0.00000 |       0.02620 |       0.00331 |       1.98737
      0.00254 |       0.00000 |       0.02682 |       0.00336 |       1.98806
     -0.00146 |       0.00000 |       0.02615 |       0.00344 |       1.98701
Evaluating losses...
     -0.00035 |       0.00000 |       0.02605 |       0.00348 |       1.98635
----------------------------------
| EpLenMean       | 651          |
| EpRewMean       | -4.8         |
| EpThisIter      | 6            |
| EpisodesSoFar   | 1413         |
| TimeElapsed     | 1.39e+03     |
| TimestepsSoFar  | 880640       |
| ev_tdlam_before | 0.787        |
| loss_ent        | 1.986348     |
| loss_kl         | 0.0034811315 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.000353095 |
| loss_vf_loss    | 0.026046634  |
----------------------------------
********** Iteration 215 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00236 |       0.00000 |       0.01788 |

********** Iteration 220 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00061 |       0.00000 |       0.01936 |       0.00319 |       1.98960
      0.00150 |       0.00000 |       0.01907 |       0.00298 |       1.98866
     4.96e-05 |       0.00000 |       0.01908 |       0.00331 |       1.98898
     -0.00174 |       0.00000 |       0.01905 |       0.00315 |       1.98779
     -0.00062 |       0.00000 |       0.01896 |       0.00316 |       1.98730
      0.00116 |       0.00000 |       0.01923 |       0.00333 |       1.98708
     -0.00011 |       0.00000 |       0.01901 |       0.00325 |       1.98759
      0.00079 |       0.00000 |       0.01940 |       0.00338 |       1.98752
     -0.00126 |       0.00000 |       0.01906 |       0.00337 |       1.98697
      0.00194 |       0.00000 |       0.01900 |       0.00364 |       1.98490
Evaluating losses...
      0.00136 |       0.00000 |       0.01934 |       0.00344 |      

      0.00019 |       0.00000 |       0.02179 |       0.00314 |       1.97120
     -0.00171 |       0.00000 |       0.02189 |       0.00314 |       1.96865
      0.00070 |       0.00000 |       0.02197 |       0.00308 |       1.97060
Evaluating losses...
      0.00184 |       0.00000 |       0.02194 |       0.00356 |       1.96806
----------------------------------
| EpLenMean       | 636          |
| EpRewMean       | -4.88        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 1484         |
| TimeElapsed     | 1.46e+03     |
| TimestepsSoFar  | 925696       |
| ev_tdlam_before | 0.789        |
| loss_ent        | 1.9680605    |
| loss_kl         | 0.0035593119 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0018398701 |
| loss_vf_loss    | 0.021940231  |
----------------------------------
********** Iteration 226 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00047 |       0.00000 |       0.01687 |

********** Iteration 231 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00068 |       0.00000 |       0.02358 |       0.00330 |       1.96781
     -0.00083 |       0.00000 |       0.02282 |       0.00319 |       1.96816
      0.00189 |       0.00000 |       0.02258 |       0.00325 |       1.96769
      0.00040 |       0.00000 |       0.02278 |       0.00328 |       1.96990
      0.00158 |       0.00000 |       0.02276 |       0.00311 |       1.96793
      0.00190 |       0.00000 |       0.02262 |       0.00292 |       1.96799
      0.00137 |       0.00000 |       0.02288 |       0.00307 |       1.96978
      0.00198 |       0.00000 |       0.02290 |       0.00279 |       1.96967
     -0.00139 |       0.00000 |       0.02308 |       0.00310 |       1.97034
      0.00089 |       0.00000 |       0.02258 |       0.00306 |       1.97139
Evaluating losses...
      0.00097 |       0.00000 |       0.02246 |       0.00291 |      

      0.00073 |       0.00000 |       0.02083 |       0.00347 |       1.96104
      0.00041 |       0.00000 |       0.02104 |       0.00342 |       1.95947
      0.00048 |       0.00000 |       0.02091 |       0.00321 |       1.96128
Evaluating losses...
    -9.76e-05 |       0.00000 |       0.02082 |       0.00342 |       1.95804
------------------------------------
| EpLenMean       | 654            |
| EpRewMean       | -4.86          |
| EpThisIter      | 5              |
| EpisodesSoFar   | 1552           |
| TimeElapsed     | 1.53e+03       |
| TimestepsSoFar  | 970752         |
| ev_tdlam_before | 0.82           |
| loss_ent        | 1.9580418      |
| loss_kl         | 0.0034190398   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -9.7645854e-05 |
| loss_vf_loss    | 0.020819036    |
------------------------------------
********** Iteration 237 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00210 |    

********** Iteration 242 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00109 |       0.00000 |       0.02100 |       0.00297 |       1.95934
     -0.00012 |       0.00000 |       0.02062 |       0.00307 |       1.95977
     -0.00160 |       0.00000 |       0.02054 |       0.00297 |       1.95861
      0.00066 |       0.00000 |       0.02054 |       0.00327 |       1.96099
     -0.00048 |       0.00000 |       0.02050 |       0.00283 |       1.96059
      0.00011 |       0.00000 |       0.02048 |       0.00291 |       1.96007
      0.00101 |       0.00000 |       0.02064 |       0.00301 |       1.96095
     -0.00028 |       0.00000 |       0.02081 |       0.00294 |       1.96214
      0.00155 |       0.00000 |       0.02054 |       0.00295 |       1.96058
      0.00116 |       0.00000 |       0.02053 |       0.00297 |       1.96076
Evaluating losses...
     -0.00190 |       0.00000 |       0.02023 |       0.00310 |      

      0.00058 |       0.00000 |       0.01785 |       0.00283 |       1.96015
      0.00089 |       0.00000 |       0.01800 |       0.00305 |       1.95866
     -0.00027 |       0.00000 |       0.01785 |       0.00302 |       1.95800
     -0.00065 |       0.00000 |       0.01785 |       0.00303 |       1.95730
Evaluating losses...
      0.00054 |       0.00000 |       0.01753 |       0.00300 |       1.95888
-----------------------------------
| EpLenMean       | 644           |
| EpRewMean       | -4.88         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1623          |
| TimeElapsed     | 1.61e+03      |
| TimestepsSoFar  | 1015808       |
| ev_tdlam_before | 0.811         |
| loss_ent        | 1.9588841     |
| loss_kl         | 0.003003659   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00053792424 |
| loss_vf_loss    | 0.017526997   |
-----------------------------------
********** Iteration 248 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 253 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00206 |       0.00000 |       0.01777 |       0.00261 |       1.95319
     -0.00028 |       0.00000 |       0.01810 |       0.00242 |       1.95386
     -0.00207 |       0.00000 |       0.01763 |       0.00268 |       1.95272
      0.00303 |       0.00000 |       0.01771 |       0.00271 |       1.95280
     -0.00064 |       0.00000 |       0.01751 |       0.00242 |       1.95274
      0.00084 |       0.00000 |       0.01780 |       0.00267 |       1.95348
     -0.00071 |       0.00000 |       0.01769 |       0.00269 |       1.95342
      0.00084 |       0.00000 |       0.01769 |       0.00280 |       1.95216
     9.53e-05 |       0.00000 |       0.01778 |       0.00249 |       1.95446
      0.00026 |       0.00000 |       0.01763 |       0.00270 |       1.95433
Evaluating losses...
      0.00145 |       0.00000 |       0.01752 |       0.00274 |      

      0.00019 |       0.00000 |       0.02301 |       0.00321 |       1.96062
      0.00165 |       0.00000 |       0.02344 |       0.00323 |       1.96044
Evaluating losses...
      0.00054 |       0.00000 |       0.02320 |       0.00321 |       1.96035
----------------------------------
| EpLenMean       | 639          |
| EpRewMean       | -4.82        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 1693         |
| TimeElapsed     | 1.68e+03     |
| TimestepsSoFar  | 1060864      |
| ev_tdlam_before | 0.802        |
| loss_ent        | 1.9603541    |
| loss_kl         | 0.0032103006 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0005442671 |
| loss_vf_loss    | 0.023204084  |
----------------------------------
********** Iteration 259 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00129 |       0.00000 |       0.01992 |       0.00307 |       1.96193
     -0.00033 |       0.00000 |       0.01992 |

********** Iteration 264 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00033 |       0.00000 |       0.01795 |       0.00267 |       1.95371
      0.00143 |       0.00000 |       0.01789 |       0.00256 |       1.95568
     -0.00012 |       0.00000 |       0.01781 |       0.00255 |       1.95511
     -0.00039 |       0.00000 |       0.01774 |       0.00269 |       1.95476
     -0.00048 |       0.00000 |       0.01744 |       0.00271 |       1.95394
      0.00188 |       0.00000 |       0.01757 |       0.00266 |       1.95429
     -0.00049 |       0.00000 |       0.01761 |       0.00262 |       1.95527
      0.00035 |       0.00000 |       0.01736 |       0.00266 |       1.95423
      0.00222 |       0.00000 |       0.01731 |       0.00266 |       1.95407
     -0.00059 |       0.00000 |       0.01731 |       0.00275 |       1.95508
Evaluating losses...
     -0.00084 |       0.00000 |       0.01703 |       0.00253 |      

     -0.00036 |       0.00000 |       0.01869 |       0.00305 |       1.95031
     -0.00156 |       0.00000 |       0.01825 |       0.00287 |       1.94855
    -7.05e-05 |       0.00000 |       0.01851 |       0.00319 |       1.95015
Evaluating losses...
     -0.00022 |       0.00000 |       0.01830 |       0.00294 |       1.94862
------------------------------------
| EpLenMean       | 641            |
| EpRewMean       | -4.85          |
| EpThisIter      | 5              |
| EpisodesSoFar   | 1763           |
| TimeElapsed     | 1.75e+03       |
| TimestepsSoFar  | 1105920        |
| ev_tdlam_before | 0.803          |
| loss_ent        | 1.9486231      |
| loss_kl         | 0.0029427046   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00021558831 |
| loss_vf_loss    | 0.018299991    |
------------------------------------
********** Iteration 270 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00043 |    

********** Iteration 275 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.02423 |       0.00257 |       1.94443
      0.00067 |       0.00000 |       0.02426 |       0.00268 |       1.94301
      0.00069 |       0.00000 |       0.02405 |       0.00258 |       1.94399
     -0.00117 |       0.00000 |       0.02425 |       0.00278 |       1.94289
      0.00180 |       0.00000 |       0.02413 |       0.00276 |       1.94235
      0.00243 |       0.00000 |       0.02413 |       0.00279 |       1.94229
      0.00014 |       0.00000 |       0.02403 |       0.00281 |       1.94291
      0.00136 |       0.00000 |       0.02401 |       0.00284 |       1.94303
      0.00029 |       0.00000 |       0.02385 |       0.00271 |       1.94493
     -0.00074 |       0.00000 |       0.02385 |       0.00273 |       1.94530
Evaluating losses...
      0.00057 |       0.00000 |       0.02351 |       0.00278 |      

     6.01e-05 |       0.00000 |       0.01629 |       0.00279 |       1.94119
     -0.00037 |       0.00000 |       0.01611 |       0.00269 |       1.94135
     4.63e-05 |       0.00000 |       0.01624 |       0.00281 |       1.94300
     -0.00100 |       0.00000 |       0.01624 |       0.00291 |       1.94231
Evaluating losses...
     -0.00103 |       0.00000 |       0.01605 |       0.00289 |       1.94114
-----------------------------------
| EpLenMean       | 646           |
| EpRewMean       | -4.87         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1832          |
| TimeElapsed     | 1.82e+03      |
| TimestepsSoFar  | 1150976       |
| ev_tdlam_before | 0.821         |
| loss_ent        | 1.9411387     |
| loss_kl         | 0.0028912956  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010303764 |
| loss_vf_loss    | 0.01604839    |
-----------------------------------
********** Iteration 281 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 286 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00168 |       0.00000 |       0.01865 |       0.00316 |       1.92667
      0.00228 |       0.00000 |       0.01862 |       0.00309 |       1.93022
      0.00212 |       0.00000 |       0.01821 |       0.00270 |       1.93005
     -0.00017 |       0.00000 |       0.01845 |       0.00285 |       1.92671
      0.00134 |       0.00000 |       0.01817 |       0.00281 |       1.92888
      0.00167 |       0.00000 |       0.01839 |       0.00291 |       1.92816
      0.00070 |       0.00000 |       0.01813 |       0.00287 |       1.92901
      0.00029 |       0.00000 |       0.01834 |       0.00291 |       1.92930
     -0.00152 |       0.00000 |       0.01825 |       0.00288 |       1.93000
      0.00028 |       0.00000 |       0.01817 |       0.00294 |       1.92797
Evaluating losses...
     -0.00020 |       0.00000 |       0.01791 |       0.00297 |      

     -0.00178 |       0.00000 |       0.02532 |       0.00306 |       1.92406
      0.00026 |       0.00000 |       0.02509 |       0.00316 |       1.92421
     -0.00031 |       0.00000 |       0.02554 |       0.00322 |       1.92673
Evaluating losses...
      0.00079 |       0.00000 |       0.02524 |       0.00319 |       1.92571
----------------------------------
| EpLenMean       | 637          |
| EpRewMean       | -4.83        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 1903         |
| TimeElapsed     | 1.89e+03     |
| TimestepsSoFar  | 1196032      |
| ev_tdlam_before | 0.788        |
| loss_ent        | 1.9257114    |
| loss_kl         | 0.0031923365 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0007863034 |
| loss_vf_loss    | 0.025244808  |
----------------------------------
********** Iteration 292 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00362 |       0.00000 |       0.01922 |

********** Iteration 297 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00081 |       0.00000 |       0.01622 |       0.00243 |       1.93754
      0.00086 |       0.00000 |       0.01592 |       0.00246 |       1.93733
      0.00027 |       0.00000 |       0.01598 |       0.00232 |       1.93771
     -0.00164 |       0.00000 |       0.01591 |       0.00271 |       1.93647
     -0.00103 |       0.00000 |       0.01578 |       0.00266 |       1.93803
      0.00141 |       0.00000 |       0.01617 |       0.00264 |       1.93652
      0.00041 |       0.00000 |       0.01584 |       0.00251 |       1.93677
     -0.00110 |       0.00000 |       0.01592 |       0.00275 |       1.93667
      0.00092 |       0.00000 |       0.01566 |       0.00263 |       1.93533
     -0.00077 |       0.00000 |       0.01596 |       0.00248 |       1.93556
Evaluating losses...
      0.00026 |       0.00000 |       0.01567 |       0.00261 |      

     -0.00223 |       0.00000 |       0.01704 |       0.00259 |       1.93678
      0.00032 |       0.00000 |       0.01681 |       0.00263 |       1.93523
      0.00021 |       0.00000 |       0.01716 |       0.00256 |       1.93761
Evaluating losses...
      0.00133 |       0.00000 |       0.01675 |       0.00267 |       1.93645
----------------------------------
| EpLenMean       | 632          |
| EpRewMean       | -4.84        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 1975         |
| TimeElapsed     | 1.97e+03     |
| TimestepsSoFar  | 1241088      |
| ev_tdlam_before | 0.837        |
| loss_ent        | 1.9364473    |
| loss_kl         | 0.002665264  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0013302949 |
| loss_vf_loss    | 0.016753301  |
----------------------------------
********** Iteration 303 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00032 |       0.00000 |       0.01872 |

********** Iteration 308 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00048 |       0.00000 |       0.01844 |       0.00284 |       1.95227
     -0.00136 |       0.00000 |       0.01852 |       0.00278 |       1.95084
     -0.00033 |       0.00000 |       0.01859 |       0.00300 |       1.95036
      0.00047 |       0.00000 |       0.01839 |       0.00300 |       1.95083
      0.00064 |       0.00000 |       0.01856 |       0.00282 |       1.95012
     -0.00037 |       0.00000 |       0.01849 |       0.00280 |       1.95067
     -0.00046 |       0.00000 |       0.01817 |       0.00300 |       1.95045
      0.00083 |       0.00000 |       0.01819 |       0.00303 |       1.94935
      0.00148 |       0.00000 |       0.01832 |       0.00295 |       1.94836
      0.00127 |       0.00000 |       0.01816 |       0.00283 |       1.95070
Evaluating losses...
      0.00072 |       0.00000 |       0.01829 |       0.00304 |      

     -0.00233 |       0.00000 |       0.01677 |       0.00263 |       1.93548
     -0.00041 |       0.00000 |       0.01636 |       0.00264 |       1.93533
      0.00046 |       0.00000 |       0.01666 |       0.00264 |       1.93506
Evaluating losses...
     -0.00135 |       0.00000 |       0.01653 |       0.00284 |       1.93485
-----------------------------------
| EpLenMean       | 616           |
| EpRewMean       | -4.86         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 2049          |
| TimeElapsed     | 2.04e+03      |
| TimestepsSoFar  | 1286144       |
| ev_tdlam_before | 0.84          |
| loss_ent        | 1.9348502     |
| loss_kl         | 0.002841522   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0013453625 |
| loss_vf_loss    | 0.016530441   |
-----------------------------------
********** Iteration 314 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     3.04e-05 |       0.00000 |  

********** Iteration 319 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00057 |       0.00000 |       0.02163 |       0.00256 |       1.93947
      0.00103 |       0.00000 |       0.02190 |       0.00258 |       1.94082
      0.00117 |       0.00000 |       0.02169 |       0.00249 |       1.94172
     -0.00044 |       0.00000 |       0.02170 |       0.00255 |       1.94184
     -0.00212 |       0.00000 |       0.02138 |       0.00257 |       1.94218
     -0.00201 |       0.00000 |       0.02127 |       0.00249 |       1.94314
      0.00096 |       0.00000 |       0.02138 |       0.00276 |       1.94238
      0.00243 |       0.00000 |       0.02131 |       0.00250 |       1.94294
      0.00095 |       0.00000 |       0.02128 |       0.00269 |       1.94354
     7.42e-05 |       0.00000 |       0.02156 |       0.00256 |       1.94469
Evaluating losses...
     -0.00228 |       0.00000 |       0.02162 |       0.00250 |      

     -0.00049 |       0.00000 |       0.01968 |       0.00257 |       1.93755
     2.72e-06 |       0.00000 |       0.01969 |       0.00264 |       1.93555
      0.00201 |       0.00000 |       0.01955 |       0.00271 |       1.93408
Evaluating losses...
     -0.00098 |       0.00000 |       0.01938 |       0.00267 |       1.93329
------------------------------------
| EpLenMean       | 630            |
| EpRewMean       | -4.81          |
| EpThisIter      | 7              |
| EpisodesSoFar   | 2118           |
| TimeElapsed     | 2.11e+03       |
| TimestepsSoFar  | 1331200        |
| ev_tdlam_before | 0.841          |
| loss_ent        | 1.9332871      |
| loss_kl         | 0.0026712066   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00097550405 |
| loss_vf_loss    | 0.019379396    |
------------------------------------
********** Iteration 325 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |    

********** Iteration 330 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00157 |       0.00000 |       0.01737 |       0.00237 |       1.92317
      0.00025 |       0.00000 |       0.01749 |       0.00236 |       1.92543
     -0.00129 |       0.00000 |       0.01755 |       0.00241 |       1.92346
      0.00051 |       0.00000 |       0.01729 |       0.00240 |       1.92551
     -0.00045 |       0.00000 |       0.01762 |       0.00257 |       1.92672
      0.00219 |       0.00000 |       0.01741 |       0.00240 |       1.92828
     -0.00127 |       0.00000 |       0.01766 |       0.00273 |       1.92740
     -0.00062 |       0.00000 |       0.01732 |       0.00254 |       1.92633
     -0.00017 |       0.00000 |       0.01722 |       0.00249 |       1.92851
     -0.00073 |       0.00000 |       0.01741 |       0.00279 |       1.92776
Evaluating losses...
     -0.00155 |       0.00000 |       0.01745 |       0.00268 |      

     -0.00136 |       0.00000 |       0.01611 |       0.00270 |       1.92703
      0.00092 |       0.00000 |       0.01644 |       0.00296 |       1.92747
     9.36e-06 |       0.00000 |       0.01620 |       0.00292 |       1.92486
Evaluating losses...
      0.00124 |       0.00000 |       0.01661 |       0.00268 |       1.92598
----------------------------------
| EpLenMean       | 648          |
| EpRewMean       | -4.81        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 2187         |
| TimeElapsed     | 2.19e+03     |
| TimestepsSoFar  | 1376256      |
| ev_tdlam_before | 0.828        |
| loss_ent        | 1.9259762    |
| loss_kl         | 0.0026849404 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0012350904 |
| loss_vf_loss    | 0.016608039  |
----------------------------------
********** Iteration 336 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00034 |       0.00000 |       0.02102 |

********** Iteration 341 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00048 |       0.00000 |       0.01523 |       0.00252 |       1.92000
      0.00254 |       0.00000 |       0.01496 |       0.00247 |       1.91850
     -0.00088 |       0.00000 |       0.01514 |       0.00250 |       1.91856
     -0.00055 |       0.00000 |       0.01493 |       0.00258 |       1.92044
      0.00043 |       0.00000 |       0.01501 |       0.00266 |       1.92017
     -0.00063 |       0.00000 |       0.01494 |       0.00248 |       1.92061
      0.00121 |       0.00000 |       0.01505 |       0.00265 |       1.92010
      0.00091 |       0.00000 |       0.01509 |       0.00232 |       1.91932
      0.00228 |       0.00000 |       0.01485 |       0.00242 |       1.91930
     -0.00167 |       0.00000 |       0.01493 |       0.00263 |       1.91908
Evaluating losses...
     -0.00168 |       0.00000 |       0.01470 |       0.00258 |      

      0.00051 |       0.00000 |       0.02045 |       0.00261 |       1.92302
      0.00038 |       0.00000 |       0.02075 |       0.00253 |       1.92106
      0.00025 |       0.00000 |       0.02082 |       0.00250 |       1.92258
Evaluating losses...
      0.00014 |       0.00000 |       0.02078 |       0.00256 |       1.92208
-----------------------------------
| EpLenMean       | 649           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2257          |
| TimeElapsed     | 2.26e+03      |
| TimestepsSoFar  | 1421312       |
| ev_tdlam_before | 0.82          |
| loss_ent        | 1.9220754     |
| loss_kl         | 0.0025633774  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00013638637 |
| loss_vf_loss    | 0.020781985   |
-----------------------------------
********** Iteration 347 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00064 |       0.00000 |  

********** Iteration 352 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00020 |       0.00000 |       0.01624 |       0.00241 |       1.90385
      0.00188 |       0.00000 |       0.01602 |       0.00246 |       1.90157
      0.00089 |       0.00000 |       0.01609 |       0.00239 |       1.90105
      0.00048 |       0.00000 |       0.01603 |       0.00231 |       1.90154
     3.24e-05 |       0.00000 |       0.01588 |       0.00241 |       1.90155
      0.00091 |       0.00000 |       0.01598 |       0.00230 |       1.90080
      0.00086 |       0.00000 |       0.01593 |       0.00233 |       1.89988
      0.00018 |       0.00000 |       0.01585 |       0.00237 |       1.89964
     -0.00072 |       0.00000 |       0.01579 |       0.00225 |       1.90183
     -0.00130 |       0.00000 |       0.01574 |       0.00231 |       1.90098
Evaluating losses...
     -0.00013 |       0.00000 |       0.01602 |       0.00230 |      

     -0.00016 |       0.00000 |       0.02401 |       0.00215 |       1.88393
      0.00023 |       0.00000 |       0.02379 |       0.00231 |       1.88859
     -0.00032 |       0.00000 |       0.02358 |       0.00224 |       1.88587
Evaluating losses...
     -0.00081 |       0.00000 |       0.02372 |       0.00223 |       1.88529
------------------------------------
| EpLenMean       | 649            |
| EpRewMean       | -4.83          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 2327           |
| TimeElapsed     | 2.33e+03       |
| TimestepsSoFar  | 1466368        |
| ev_tdlam_before | 0.798          |
| loss_ent        | 1.8852862      |
| loss_kl         | 0.0022340817   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00081008486 |
| loss_vf_loss    | 0.023717044    |
------------------------------------
********** Iteration 358 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00182 |    

********** Iteration 363 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00011 |       0.00000 |       0.02049 |       0.00245 |       1.89014
     -0.00059 |       0.00000 |       0.02019 |       0.00234 |       1.88847
     -0.00042 |       0.00000 |       0.01984 |       0.00225 |       1.88736
     -0.00020 |       0.00000 |       0.01973 |       0.00240 |       1.88824
      0.00059 |       0.00000 |       0.01974 |       0.00224 |       1.88578
     -0.00108 |       0.00000 |       0.01945 |       0.00232 |       1.88472
      0.00011 |       0.00000 |       0.01948 |       0.00258 |       1.88241
     -0.00100 |       0.00000 |       0.01975 |       0.00232 |       1.88369
      0.00039 |       0.00000 |       0.01924 |       0.00237 |       1.87951
      0.00175 |       0.00000 |       0.01931 |       0.00241 |       1.88018
Evaluating losses...
      0.00073 |       0.00000 |       0.01938 |       0.00246 |      

     -0.00030 |       0.00000 |       0.01723 |       0.00225 |       1.87239
      0.00235 |       0.00000 |       0.01704 |       0.00223 |       1.87322
     -0.00023 |       0.00000 |       0.01698 |       0.00232 |       1.87293
      0.00045 |       0.00000 |       0.01716 |       0.00234 |       1.87203
Evaluating losses...
     -0.00025 |       0.00000 |       0.01685 |       0.00225 |       1.87108
------------------------------------
| EpLenMean       | 647            |
| EpRewMean       | -4.85          |
| EpThisIter      | 8              |
| EpisodesSoFar   | 2397           |
| TimeElapsed     | 2.4e+03        |
| TimestepsSoFar  | 1511424        |
| ev_tdlam_before | 0.849          |
| loss_ent        | 1.8710767      |
| loss_kl         | 0.0022474928   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00024951028 |
| loss_vf_loss    | 0.01684824     |
------------------------------------
********** Iteration 369 ************
Optimizing...
     pol_surr |    

********** Iteration 374 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00035 |       0.00000 |       0.01535 |       0.00230 |       1.87745
      0.00039 |       0.00000 |       0.01521 |       0.00238 |       1.87652
     -0.00010 |       0.00000 |       0.01502 |       0.00230 |       1.87583
      0.00136 |       0.00000 |       0.01516 |       0.00222 |       1.87579
      0.00032 |       0.00000 |       0.01511 |       0.00224 |       1.87584
      0.00157 |       0.00000 |       0.01516 |       0.00227 |       1.87465
     -0.00063 |       0.00000 |       0.01510 |       0.00245 |       1.87270
      0.00015 |       0.00000 |       0.01511 |       0.00223 |       1.87135
      0.00189 |       0.00000 |       0.01500 |       0.00234 |       1.87341
      0.00088 |       0.00000 |       0.01507 |       0.00227 |       1.87365
Evaluating losses...
     -0.00127 |       0.00000 |       0.01495 |       0.00217 |      

     6.53e-06 |       0.00000 |       0.02081 |       0.00238 |       1.87782
      0.00129 |       0.00000 |       0.02059 |       0.00226 |       1.87722
Evaluating losses...
      0.00075 |       0.00000 |       0.02055 |       0.00258 |       1.87750
----------------------------------
| EpLenMean       | 639          |
| EpRewMean       | -4.91        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2467         |
| TimeElapsed     | 2.47e+03     |
| TimestepsSoFar  | 1556480      |
| ev_tdlam_before | 0.824        |
| loss_ent        | 1.877504     |
| loss_kl         | 0.0025792308 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0007549409 |
| loss_vf_loss    | 0.02054607   |
----------------------------------
********** Iteration 380 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00301 |       0.00000 |       0.02086 |       0.00239 |       1.87844
      0.00140 |       0.00000 |       0.02073 |

********** Iteration 385 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00023 |       0.00000 |       0.01664 |       0.00213 |       1.87057
      0.00131 |       0.00000 |       0.01641 |       0.00223 |       1.86961
     -0.00040 |       0.00000 |       0.01653 |       0.00213 |       1.86997
     -0.00227 |       0.00000 |       0.01642 |       0.00228 |       1.86908
     -0.00040 |       0.00000 |       0.01630 |       0.00235 |       1.86808
      0.00088 |       0.00000 |       0.01619 |       0.00231 |       1.86953
     -0.00035 |       0.00000 |       0.01626 |       0.00224 |       1.86771
      0.00016 |       0.00000 |       0.01617 |       0.00242 |       1.86748
      0.00191 |       0.00000 |       0.01631 |       0.00236 |       1.86975
      0.00054 |       0.00000 |       0.01642 |       0.00238 |       1.86715
Evaluating losses...
      0.00045 |       0.00000 |       0.01631 |       0.00230 |      

     -0.00248 |       0.00000 |       0.02000 |       0.00257 |       1.87426
      0.00037 |       0.00000 |       0.01965 |       0.00267 |       1.87389
      0.00096 |       0.00000 |       0.01966 |       0.00244 |       1.87685
Evaluating losses...
      0.00026 |       0.00000 |       0.01951 |       0.00250 |       1.87242
----------------------------------
| EpLenMean       | 620          |
| EpRewMean       | -4.88        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2539         |
| TimeElapsed     | 2.54e+03     |
| TimestepsSoFar  | 1601536      |
| ev_tdlam_before | 0.829        |
| loss_ent        | 1.8724245    |
| loss_kl         | 0.0024957252 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0002627545 |
| loss_vf_loss    | 0.019507136  |
----------------------------------
********** Iteration 391 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00054 |       0.00000 |       0.01631 |

********** Iteration 396 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00185 |       0.00000 |       0.01701 |       0.00209 |       1.85416
     -0.00087 |       0.00000 |       0.01708 |       0.00194 |       1.85539
      0.00182 |       0.00000 |       0.01712 |       0.00194 |       1.85389
      0.00171 |       0.00000 |       0.01707 |       0.00203 |       1.85285
      0.00072 |       0.00000 |       0.01702 |       0.00187 |       1.85286
     -0.00038 |       0.00000 |       0.01689 |       0.00210 |       1.85229
     -0.00038 |       0.00000 |       0.01692 |       0.00225 |       1.84860
    -1.23e-06 |       0.00000 |       0.01708 |       0.00223 |       1.84930
     -0.00027 |       0.00000 |       0.01690 |       0.00207 |       1.84888
     -0.00077 |       0.00000 |       0.01689 |       0.00224 |       1.84964
Evaluating losses...
      0.00067 |       0.00000 |       0.01683 |       0.00238 |      

      0.00026 |       0.00000 |       0.01950 |       0.00203 |       1.83493
      0.00064 |       0.00000 |       0.01957 |       0.00201 |       1.83293
Evaluating losses...
      0.00077 |       0.00000 |       0.01921 |       0.00213 |       1.83263
----------------------------------
| EpLenMean       | 639          |
| EpRewMean       | -4.9         |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2607         |
| TimeElapsed     | 2.61e+03     |
| TimestepsSoFar  | 1646592      |
| ev_tdlam_before | 0.819        |
| loss_ent        | 1.8326302    |
| loss_kl         | 0.0021274386 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0007744557 |
| loss_vf_loss    | 0.019209588  |
----------------------------------
********** Iteration 402 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00055 |       0.00000 |       0.01823 |       0.00222 |       1.83316
     8.39e-05 |       0.00000 |       0.01810 |

********** Iteration 407 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00022 |       0.00000 |       0.01943 |       0.00190 |       1.82662
     3.35e-05 |       0.00000 |       0.01909 |       0.00195 |       1.82764
    -8.01e-05 |       0.00000 |       0.01892 |       0.00196 |       1.82717
     -0.00038 |       0.00000 |       0.01879 |       0.00213 |       1.82141
     -0.00079 |       0.00000 |       0.01873 |       0.00198 |       1.82549
     -0.00039 |       0.00000 |       0.01883 |       0.00205 |       1.82433
     -0.00160 |       0.00000 |       0.01870 |       0.00213 |       1.82337
     -0.00104 |       0.00000 |       0.01847 |       0.00220 |       1.82236
      0.00086 |       0.00000 |       0.01876 |       0.00202 |       1.82239
     -0.00057 |       0.00000 |       0.01850 |       0.00233 |       1.82146
Evaluating losses...
      0.00018 |       0.00000 |       0.01888 |       0.00205 |      

     -0.00070 |       0.00000 |       0.02323 |       0.00205 |       1.83198
      0.00157 |       0.00000 |       0.02333 |       0.00208 |       1.83068
      0.00037 |       0.00000 |       0.02327 |       0.00214 |       1.82839
      0.00182 |       0.00000 |       0.02332 |       0.00211 |       1.83068
Evaluating losses...
     -0.00047 |       0.00000 |       0.02319 |       0.00210 |       1.83183
------------------------------------
| EpLenMean       | 643            |
| EpRewMean       | -4.81          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 2678           |
| TimeElapsed     | 2.68e+03       |
| TimestepsSoFar  | 1691648        |
| ev_tdlam_before | 0.811          |
| loss_ent        | 1.831832       |
| loss_kl         | 0.0021000535   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00047304138 |
| loss_vf_loss    | 0.023188435    |
------------------------------------
********** Iteration 413 ************
Optimizing...
     pol_surr |    

********** Iteration 418 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     5.81e-05 |       0.00000 |       0.01617 |       0.00215 |       1.82237
      0.00205 |       0.00000 |       0.01623 |       0.00210 |       1.82206
      0.00112 |       0.00000 |       0.01606 |       0.00203 |       1.82085
      0.00156 |       0.00000 |       0.01609 |       0.00211 |       1.82085
      0.00140 |       0.00000 |       0.01600 |       0.00198 |       1.82053
      0.00118 |       0.00000 |       0.01603 |       0.00206 |       1.82099
     6.25e-05 |       0.00000 |       0.01615 |       0.00203 |       1.82227
      0.00184 |       0.00000 |       0.01605 |       0.00199 |       1.82077
     -0.00105 |       0.00000 |       0.01602 |       0.00183 |       1.82168
     -0.00015 |       0.00000 |       0.01616 |       0.00197 |       1.81985
Evaluating losses...
     -0.00018 |       0.00000 |       0.01596 |       0.00209 |      

     -0.00039 |       0.00000 |       0.01430 |       0.00184 |       1.81070
     -0.00022 |       0.00000 |       0.01406 |       0.00196 |       1.80971
      0.00029 |       0.00000 |       0.01394 |       0.00200 |       1.80889
Evaluating losses...
     -0.00120 |       0.00000 |       0.01394 |       0.00196 |       1.81070
-----------------------------------
| EpLenMean       | 620           |
| EpRewMean       | -4.88         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2750          |
| TimeElapsed     | 2.75e+03      |
| TimestepsSoFar  | 1736704       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 1.810703      |
| loss_kl         | 0.0019553343  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012014027 |
| loss_vf_loss    | 0.013935835   |
-----------------------------------
********** Iteration 424 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00129 |       0.00000 |  

********** Iteration 429 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00118 |       0.00000 |       0.01894 |       0.00189 |       1.81184
      0.00060 |       0.00000 |       0.01901 |       0.00191 |       1.81230
      0.00047 |       0.00000 |       0.01876 |       0.00192 |       1.81159
      0.00058 |       0.00000 |       0.01898 |       0.00180 |       1.81135
      0.00047 |       0.00000 |       0.01887 |       0.00195 |       1.80843
      0.00096 |       0.00000 |       0.01872 |       0.00179 |       1.80885
     -0.00100 |       0.00000 |       0.01876 |       0.00180 |       1.80790
     5.97e-05 |       0.00000 |       0.01894 |       0.00194 |       1.80966
    -9.25e-05 |       0.00000 |       0.01856 |       0.00188 |       1.80751
      0.00148 |       0.00000 |       0.01862 |       0.00186 |       1.80898
Evaluating losses...
     -0.00023 |       0.00000 |       0.01869 |       0.00185 |      

     -0.00071 |       0.00000 |       0.01694 |       0.00220 |       1.80273
      0.00119 |       0.00000 |       0.01718 |       0.00210 |       1.80394
      0.00165 |       0.00000 |       0.01710 |       0.00226 |       1.80413
Evaluating losses...
      0.00074 |       0.00000 |       0.01703 |       0.00210 |       1.80519
----------------------------------
| EpLenMean       | 627          |
| EpRewMean       | -4.94        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2823         |
| TimeElapsed     | 2.83e+03     |
| TimestepsSoFar  | 1781760      |
| ev_tdlam_before | 0.843        |
| loss_ent        | 1.8051945    |
| loss_kl         | 0.002099226  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0007397239 |
| loss_vf_loss    | 0.017029688  |
----------------------------------
********** Iteration 435 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00145 |       0.00000 |       0.01434 |

********** Iteration 440 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00036 |       0.00000 |       0.02065 |       0.00186 |       1.81006
      0.00060 |       0.00000 |       0.02049 |       0.00171 |       1.80854
      0.00155 |       0.00000 |       0.02050 |       0.00170 |       1.80768
      0.00075 |       0.00000 |       0.02023 |       0.00178 |       1.80732
      0.00239 |       0.00000 |       0.02036 |       0.00169 |       1.80530
     -0.00030 |       0.00000 |       0.02022 |       0.00180 |       1.80612
     6.93e-05 |       0.00000 |       0.02040 |       0.00160 |       1.80503
     -0.00034 |       0.00000 |       0.02017 |       0.00186 |       1.80661
      0.00091 |       0.00000 |       0.02053 |       0.00173 |       1.80492
      0.00020 |       0.00000 |       0.02043 |       0.00175 |       1.80475
Evaluating losses...
      0.00071 |       0.00000 |       0.02023 |       0.00175 |      

      0.00174 |       0.00000 |       0.01498 |       0.00194 |       1.79214
      0.00062 |       0.00000 |       0.01493 |       0.00190 |       1.79127
     1.21e-05 |       0.00000 |       0.01485 |       0.00191 |       1.79100
Evaluating losses...
      0.00131 |       0.00000 |       0.01518 |       0.00195 |       1.79122
----------------------------------
| EpLenMean       | 655          |
| EpRewMean       | -4.87        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 2891         |
| TimeElapsed     | 2.9e+03      |
| TimestepsSoFar  | 1826816      |
| ev_tdlam_before | 0.864        |
| loss_ent        | 1.7912164    |
| loss_kl         | 0.0019509994 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0013123078 |
| loss_vf_loss    | 0.015175166  |
----------------------------------
********** Iteration 446 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.01807 |

********** Iteration 451 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00030 |       0.00000 |       0.01661 |       0.00168 |       1.79836
     -0.00184 |       0.00000 |       0.01629 |       0.00171 |       1.79669
     -0.00022 |       0.00000 |       0.01654 |       0.00177 |       1.79737
     -0.00052 |       0.00000 |       0.01612 |       0.00185 |       1.79533
      0.00156 |       0.00000 |       0.01624 |       0.00171 |       1.79406
     -0.00038 |       0.00000 |       0.01632 |       0.00164 |       1.79201
     -0.00054 |       0.00000 |       0.01603 |       0.00177 |       1.79117
      0.00095 |       0.00000 |       0.01624 |       0.00178 |       1.79276
    -4.46e-05 |       0.00000 |       0.01609 |       0.00180 |       1.79218
     -0.00035 |       0.00000 |       0.01616 |       0.00183 |       1.79074
Evaluating losses...
     -0.00044 |       0.00000 |       0.01616 |       0.00196 |      

     -0.00030 |       0.00000 |       0.01771 |       0.00189 |       1.77335
      0.00047 |       0.00000 |       0.01762 |       0.00192 |       1.77185
     -0.00033 |       0.00000 |       0.01761 |       0.00189 |       1.77271
Evaluating losses...
      0.00011 |       0.00000 |       0.01752 |       0.00190 |       1.77164
-----------------------------------
| EpLenMean       | 631           |
| EpRewMean       | -4.89         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2963          |
| TimeElapsed     | 2.98e+03      |
| TimestepsSoFar  | 1871872       |
| ev_tdlam_before | 0.834         |
| loss_ent        | 1.7716389     |
| loss_kl         | 0.0019006657  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00011089223 |
| loss_vf_loss    | 0.017521165   |
-----------------------------------
********** Iteration 457 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -2.28e-05 |       0.00000 |  

********** Iteration 462 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00145 |       0.00000 |       0.01848 |       0.00159 |       1.76543
      0.00081 |       0.00000 |       0.01825 |       0.00173 |       1.76291
      0.00053 |       0.00000 |       0.01813 |       0.00166 |       1.76474
     -0.00038 |       0.00000 |       0.01792 |       0.00166 |       1.76469
     6.53e-05 |       0.00000 |       0.01779 |       0.00165 |       1.76771
     -0.00068 |       0.00000 |       0.01810 |       0.00181 |       1.76746
     -0.00167 |       0.00000 |       0.01800 |       0.00175 |       1.76853
    -4.09e-05 |       0.00000 |       0.01786 |       0.00184 |       1.76937
      0.00129 |       0.00000 |       0.01772 |       0.00190 |       1.77023
      0.00030 |       0.00000 |       0.01810 |       0.00185 |       1.77267
Evaluating losses...
      0.00033 |       0.00000 |       0.01775 |       0.00190 |      

      0.00026 |       0.00000 |       0.01827 |       0.00190 |       1.76776
      0.00041 |       0.00000 |       0.01838 |       0.00188 |       1.76896
     7.69e-05 |       0.00000 |       0.01859 |       0.00186 |       1.76782
Evaluating losses...
      0.00102 |       0.00000 |       0.01845 |       0.00176 |       1.76910
----------------------------------
| EpLenMean       | 629          |
| EpRewMean       | -4.88        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 3035         |
| TimeElapsed     | 3.05e+03     |
| TimestepsSoFar  | 1916928      |
| ev_tdlam_before | 0.838        |
| loss_ent        | 1.7690955    |
| loss_kl         | 0.0017575225 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0010225642 |
| loss_vf_loss    | 0.018448126  |
----------------------------------
********** Iteration 468 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00172 |       0.00000 |       0.01524 |

********** Iteration 473 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 |       0.02390 |       0.00163 |       1.75878
      0.00146 |       0.00000 |       0.02359 |       0.00163 |       1.75939
      0.00012 |       0.00000 |       0.02340 |       0.00165 |       1.75877
      0.00040 |       0.00000 |       0.02353 |       0.00169 |       1.75890
     -0.00121 |       0.00000 |       0.02329 |       0.00169 |       1.76142
     -0.00074 |       0.00000 |       0.02318 |       0.00164 |       1.75973
     -0.00061 |       0.00000 |       0.02307 |       0.00184 |       1.75579
      0.00032 |       0.00000 |       0.02315 |       0.00181 |       1.75934
     -0.00082 |       0.00000 |       0.02329 |       0.00192 |       1.75951
     -0.00273 |       0.00000 |       0.02309 |       0.00197 |       1.75743
Evaluating losses...
     9.57e-05 |       0.00000 |       0.02332 |       0.00205 |      

     -0.00074 |       0.00000 |       0.01810 |       0.00198 |       1.74559
      0.00065 |       0.00000 |       0.01803 |       0.00192 |       1.74580
      0.00108 |       0.00000 |       0.01802 |       0.00192 |       1.74656
Evaluating losses...
     -0.00018 |       0.00000 |       0.01784 |       0.00191 |       1.74607
------------------------------------
| EpLenMean       | 641            |
| EpRewMean       | -4.82          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 3105           |
| TimeElapsed     | 3.11e+03       |
| TimestepsSoFar  | 1961984        |
| ev_tdlam_before | 0.832          |
| loss_ent        | 1.7460696      |
| loss_kl         | 0.001908341    |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00018130359 |
| loss_vf_loss    | 0.017841298    |
------------------------------------
********** Iteration 479 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -1.73e-05 |    

********** Iteration 484 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00064 |       0.00000 |       0.02383 |       0.00139 |       1.74879
     -0.00015 |       0.00000 |       0.02374 |       0.00141 |       1.74766
     -0.00043 |       0.00000 |       0.02344 |       0.00138 |       1.74674
    -6.89e-05 |       0.00000 |       0.02356 |       0.00139 |       1.74627
     -0.00064 |       0.00000 |       0.02353 |       0.00137 |       1.74499
      0.00072 |       0.00000 |       0.02344 |       0.00154 |       1.74435
      0.00217 |       0.00000 |       0.02341 |       0.00155 |       1.74445
     -0.00011 |       0.00000 |       0.02335 |       0.00150 |       1.74349
      0.00032 |       0.00000 |       0.02347 |       0.00161 |       1.74287
     -0.00067 |       0.00000 |       0.02325 |       0.00164 |       1.74228
Evaluating losses...
     -0.00054 |       0.00000 |       0.02302 |       0.00163 |      

     2.69e-05 |       0.00000 |       0.02037 |       0.00152 |       1.73802
      0.00140 |       0.00000 |       0.02047 |       0.00148 |       1.73972
      0.00144 |       0.00000 |       0.02064 |       0.00145 |       1.73971
      0.00067 |       0.00000 |       0.02042 |       0.00161 |       1.74009
Evaluating losses...
     -0.00023 |       0.00000 |       0.02046 |       0.00160 |       1.73983
----------------------------------
| EpLenMean       | 647          |
| EpRewMean       | -4.88        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 3174         |
| TimeElapsed     | 3.19e+03     |
| TimestepsSoFar  | 2007040      |
| ev_tdlam_before | 0.822        |
| loss_ent        | 1.7398281    |
| loss_kl         | 0.0016034185 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.000227493 |
| loss_vf_loss    | 0.020464256  |
----------------------------------
********** Iteration 490 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |

********** Iteration 495 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00016 |       0.00000 |       0.01490 |       0.00153 |       1.73777
     -0.00022 |       0.00000 |       0.01482 |       0.00159 |       1.73611
     -0.00078 |       0.00000 |       0.01471 |       0.00153 |       1.73713
      0.00032 |       0.00000 |       0.01472 |       0.00149 |       1.73558
      0.00020 |       0.00000 |       0.01464 |       0.00154 |       1.73418
      0.00016 |       0.00000 |       0.01465 |       0.00161 |       1.73230
      0.00011 |       0.00000 |       0.01444 |       0.00156 |       1.73485
      0.00053 |       0.00000 |       0.01455 |       0.00161 |       1.73268
      0.00070 |       0.00000 |       0.01452 |       0.00177 |       1.73013
      0.00077 |       0.00000 |       0.01455 |       0.00168 |       1.72976
Evaluating losses...
      0.00052 |       0.00000 |       0.01448 |       0.00166 |      

      0.00062 |       0.00000 |       0.01981 |       0.00172 |       1.73175
      0.00038 |       0.00000 |       0.01960 |       0.00162 |       1.73062
      0.00028 |       0.00000 |       0.01948 |       0.00182 |       1.73010
Evaluating losses...
     -0.00033 |       0.00000 |       0.01966 |       0.00177 |       1.72965
-----------------------------------
| EpLenMean       | 635           |
| EpRewMean       | -4.86         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3246          |
| TimeElapsed     | 3.26e+03      |
| TimestepsSoFar  | 2052096       |
| ev_tdlam_before | 0.835         |
| loss_ent        | 1.7296523     |
| loss_kl         | 0.0017654448  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0003280734 |
| loss_vf_loss    | 0.01965644    |
-----------------------------------
********** Iteration 501 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00185 |       0.00000 |  

********** Iteration 506 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00138 |       0.00000 |       0.02303 |       0.00137 |       1.73198
     7.66e-05 |       0.00000 |       0.02250 |       0.00144 |       1.73250
     -0.00131 |       0.00000 |       0.02240 |       0.00143 |       1.73088
     -0.00044 |       0.00000 |       0.02230 |       0.00149 |       1.72895
     -0.00039 |       0.00000 |       0.02239 |       0.00146 |       1.72958
      0.00067 |       0.00000 |       0.02211 |       0.00155 |       1.72695
      0.00027 |       0.00000 |       0.02225 |       0.00157 |       1.72758
     -0.00034 |       0.00000 |       0.02206 |       0.00150 |       1.72786
      0.00028 |       0.00000 |       0.02209 |       0.00160 |       1.72557
     -0.00115 |       0.00000 |       0.02230 |       0.00157 |       1.72582
Evaluating losses...
     -0.00099 |       0.00000 |       0.02217 |       0.00171 |      

      0.00014 |       0.00000 |       0.01959 |       0.00169 |       1.71994
     -0.00134 |       0.00000 |       0.01967 |       0.00165 |       1.71908
     -0.00071 |       0.00000 |       0.01994 |       0.00150 |       1.71838
Evaluating losses...
     -0.00086 |       0.00000 |       0.01986 |       0.00166 |       1.71878
------------------------------------
| EpLenMean       | 626            |
| EpRewMean       | -4.86          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 3318           |
| TimeElapsed     | 3.34e+03       |
| TimestepsSoFar  | 2097152        |
| ev_tdlam_before | 0.804          |
| loss_ent        | 1.7187824      |
| loss_kl         | 0.0016602458   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00086207874 |
| loss_vf_loss    | 0.01986285     |
------------------------------------
********** Iteration 512 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00030 |    

********** Iteration 517 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00039 |       0.00000 |       0.01704 |       0.00145 |       1.74455
     -0.00063 |       0.00000 |       0.01677 |       0.00150 |       1.74362
      0.00039 |       0.00000 |       0.01701 |       0.00153 |       1.74329
     -0.00096 |       0.00000 |       0.01676 |       0.00158 |       1.74232
     -0.00120 |       0.00000 |       0.01667 |       0.00145 |       1.74192
     -0.00102 |       0.00000 |       0.01677 |       0.00144 |       1.74273
     -0.00029 |       0.00000 |       0.01667 |       0.00161 |       1.74060
     -0.00048 |       0.00000 |       0.01687 |       0.00162 |       1.73995
     -0.00051 |       0.00000 |       0.01665 |       0.00157 |       1.73959
      0.00081 |       0.00000 |       0.01685 |       0.00156 |       1.74181
Evaluating losses...
      0.00023 |       0.00000 |       0.01678 |       0.00153 |      

     -0.00087 |       0.00000 |       0.01678 |       0.00136 |       1.73441
      0.00058 |       0.00000 |       0.01679 |       0.00141 |       1.73759
      0.00037 |       0.00000 |       0.01683 |       0.00142 |       1.73456
Evaluating losses...
     -0.00076 |       0.00000 |       0.01665 |       0.00158 |       1.73527
-----------------------------------
| EpLenMean       | 654           |
| EpRewMean       | -4.75         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3387          |
| TimeElapsed     | 3.41e+03      |
| TimestepsSoFar  | 2142208       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 1.7352734     |
| loss_kl         | 0.0015809978  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0007620853 |
| loss_vf_loss    | 0.016648369   |
-----------------------------------
********** Iteration 523 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00035 |       0.00000 |  

********** Iteration 528 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00106 |       0.00000 |       0.02096 |       0.00149 |       1.73084
      0.00114 |       0.00000 |       0.02067 |       0.00153 |       1.72956
      0.00048 |       0.00000 |       0.02078 |       0.00153 |       1.72931
     -0.00013 |       0.00000 |       0.02098 |       0.00160 |       1.72908
     -0.00016 |       0.00000 |       0.02073 |       0.00157 |       1.72893
     -0.00020 |       0.00000 |       0.02080 |       0.00146 |       1.72673
      0.00093 |       0.00000 |       0.02062 |       0.00157 |       1.72470
      0.00111 |       0.00000 |       0.02052 |       0.00157 |       1.72521
      0.00102 |       0.00000 |       0.02108 |       0.00161 |       1.72511
     -0.00193 |       0.00000 |       0.02054 |       0.00171 |       1.72533
Evaluating losses...
      0.00082 |       0.00000 |       0.02046 |       0.00167 |      

      0.00104 |       0.00000 |       0.01738 |       0.00133 |       1.73957
      0.00157 |       0.00000 |       0.01734 |       0.00132 |       1.73890
      0.00052 |       0.00000 |       0.01744 |       0.00132 |       1.73851
Evaluating losses...
    -2.66e-05 |       0.00000 |       0.01733 |       0.00132 |       1.73802
------------------------------------
| EpLenMean       | 629            |
| EpRewMean       | -4.8           |
| EpThisIter      | 7              |
| EpisodesSoFar   | 3461           |
| TimeElapsed     | 3.48e+03       |
| TimestepsSoFar  | 2187264        |
| ev_tdlam_before | 0.842          |
| loss_ent        | 1.7380182      |
| loss_kl         | 0.001317431    |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -2.6564812e-05 |
| loss_vf_loss    | 0.01732855     |
------------------------------------
********** Iteration 534 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00256 |    

********** Iteration 539 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00080 |       0.00000 |       0.01783 |       0.00145 |       1.72368
      0.00119 |       0.00000 |       0.01777 |       0.00159 |       1.71855
     -0.00020 |       0.00000 |       0.01788 |       0.00147 |       1.72056
     -0.00034 |       0.00000 |       0.01790 |       0.00153 |       1.71918
     -0.00021 |       0.00000 |       0.01769 |       0.00156 |       1.71632
      0.00015 |       0.00000 |       0.01772 |       0.00146 |       1.71835
      0.00020 |       0.00000 |       0.01784 |       0.00163 |       1.71775
      0.00166 |       0.00000 |       0.01761 |       0.00143 |       1.71703
      0.00033 |       0.00000 |       0.01762 |       0.00151 |       1.71617
     -0.00077 |       0.00000 |       0.01744 |       0.00172 |       1.71445
Evaluating losses...
     -0.00105 |       0.00000 |       0.01760 |       0.00159 |      

      0.00096 |       0.00000 |       0.01639 |       0.00158 |       1.71729
      0.00081 |       0.00000 |       0.01633 |       0.00168 |       1.71534
Evaluating losses...
      0.00082 |       0.00000 |       0.01638 |       0.00152 |       1.71790
----------------------------------
| EpLenMean       | 595          |
| EpRewMean       | -4.85        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 3537         |
| TimeElapsed     | 3.55e+03     |
| TimestepsSoFar  | 2232320      |
| ev_tdlam_before | 0.85         |
| loss_ent        | 1.7178998    |
| loss_kl         | 0.0015189444 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.000816677  |
| loss_vf_loss    | 0.016381025  |
----------------------------------
********** Iteration 545 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00068 |       0.00000 |       0.01959 |       0.00155 |       1.71700
      0.00076 |       0.00000 |       0.01963 |

********** Iteration 550 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00065 |       0.00000 |       0.01785 |       0.00123 |       1.68392
     -0.00038 |       0.00000 |       0.01740 |       0.00131 |       1.68317
     -0.00017 |       0.00000 |       0.01728 |       0.00134 |       1.68347
      0.00101 |       0.00000 |       0.01745 |       0.00131 |       1.68545
     -0.00048 |       0.00000 |       0.01714 |       0.00133 |       1.68730
     -0.00075 |       0.00000 |       0.01709 |       0.00135 |       1.68702
      0.00113 |       0.00000 |       0.01733 |       0.00136 |       1.68610
    -6.54e-05 |       0.00000 |       0.01722 |       0.00134 |       1.68612
      0.00085 |       0.00000 |       0.01716 |       0.00142 |       1.68856
     2.06e-05 |       0.00000 |       0.01711 |       0.00133 |       1.68655
Evaluating losses...
      0.00018 |       0.00000 |       0.01706 |       0.00130 |      

     -0.00032 |       0.00000 |       0.02189 |       0.00151 |       1.67267
      0.00168 |       0.00000 |       0.02181 |       0.00156 |       1.67281
      0.00146 |       0.00000 |       0.02184 |       0.00157 |       1.67031
Evaluating losses...
      0.00134 |       0.00000 |       0.02197 |       0.00164 |       1.66931
----------------------------------
| EpLenMean       | 628          |
| EpRewMean       | -4.92        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 3607         |
| TimeElapsed     | 3.63e+03     |
| TimestepsSoFar  | 2277376      |
| ev_tdlam_before | 0.814        |
| loss_ent        | 1.6693085    |
| loss_kl         | 0.0016366784 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0013436559 |
| loss_vf_loss    | 0.021971341  |
----------------------------------
********** Iteration 556 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00084 |       0.00000 |       0.01503 |

********** Iteration 561 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00031 |       0.00000 |       0.01439 |       0.00132 |       1.68006
     5.77e-05 |       0.00000 |       0.01427 |       0.00125 |       1.67947
    -1.70e-05 |       0.00000 |       0.01409 |       0.00127 |       1.68012
      0.00085 |       0.00000 |       0.01419 |       0.00129 |       1.68067
      0.00050 |       0.00000 |       0.01421 |       0.00125 |       1.67954
    -1.63e-05 |       0.00000 |       0.01406 |       0.00125 |       1.67965
      0.00062 |       0.00000 |       0.01407 |       0.00126 |       1.68045
      0.00046 |       0.00000 |       0.01414 |       0.00128 |       1.67959
      0.00029 |       0.00000 |       0.01417 |       0.00135 |       1.68010
      0.00067 |       0.00000 |       0.01389 |       0.00129 |       1.67850
Evaluating losses...
     -0.00138 |       0.00000 |       0.01395 |       0.00131 |      

      0.00092 |       0.00000 |       0.01542 |       0.00149 |       1.65793
      0.00080 |       0.00000 |       0.01540 |       0.00141 |       1.66013
     6.68e-05 |       0.00000 |       0.01560 |       0.00140 |       1.65542
      0.00109 |       0.00000 |       0.01540 |       0.00141 |       1.65800
Evaluating losses...
    -2.82e-06 |       0.00000 |       0.01544 |       0.00145 |       1.65712
------------------------------------
| EpLenMean       | 632            |
| EpRewMean       | -4.88          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 3677           |
| TimeElapsed     | 3.7e+03        |
| TimestepsSoFar  | 2322432        |
| ev_tdlam_before | 0.831          |
| loss_ent        | 1.6571215      |
| loss_kl         | 0.0014538844   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -2.8249342e-06 |
| loss_vf_loss    | 0.015441917    |
------------------------------------
********** Iteration 567 ************
Optimizing...
     pol_surr |    

********** Iteration 572 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00119 |       0.00000 |       0.01750 |       0.00099 |       1.64803
     -0.00083 |       0.00000 |       0.01737 |       0.00103 |       1.64913
      0.00058 |       0.00000 |       0.01745 |       0.00103 |       1.64950
      0.00064 |       0.00000 |       0.01743 |       0.00110 |       1.64840
     -0.00179 |       0.00000 |       0.01749 |       0.00100 |       1.64888
     -0.00028 |       0.00000 |       0.01705 |       0.00106 |       1.64969
      0.00067 |       0.00000 |       0.01720 |       0.00105 |       1.65010
      0.00061 |       0.00000 |       0.01732 |       0.00112 |       1.64643
     -0.00031 |       0.00000 |       0.01721 |       0.00107 |       1.64838
     2.67e-05 |       0.00000 |       0.01698 |       0.00111 |       1.64837
Evaluating losses...
     -0.00115 |       0.00000 |       0.01717 |       0.00110 |      

      0.00078 |       0.00000 |       0.01375 |       0.00130 |       1.65628
     -0.00184 |       0.00000 |       0.01377 |       0.00133 |       1.65494
      0.00044 |       0.00000 |       0.01364 |       0.00135 |       1.65336
Evaluating losses...
      0.00022 |       0.00000 |       0.01366 |       0.00138 |       1.65435
----------------------------------
| EpLenMean       | 624          |
| EpRewMean       | -4.89        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 3751         |
| TimeElapsed     | 3.77e+03     |
| TimestepsSoFar  | 2367488      |
| ev_tdlam_before | 0.853        |
| loss_ent        | 1.6543485    |
| loss_kl         | 0.0013791755 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0002150354 |
| loss_vf_loss    | 0.013659084  |
----------------------------------
********** Iteration 578 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00060 |       0.00000 |       0.01505 |

********** Iteration 583 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00050 |       0.00000 |       0.02543 |       0.00137 |       1.65594
     -0.00021 |       0.00000 |       0.02439 |       0.00145 |       1.65442
     -0.00121 |       0.00000 |       0.02399 |       0.00135 |       1.65532
      0.00242 |       0.00000 |       0.02371 |       0.00133 |       1.65396
      0.00184 |       0.00000 |       0.02378 |       0.00134 |       1.65263
     -0.00060 |       0.00000 |       0.02390 |       0.00144 |       1.65465
      0.00146 |       0.00000 |       0.02380 |       0.00132 |       1.65505
     -0.00078 |       0.00000 |       0.02380 |       0.00130 |       1.65339
      0.00176 |       0.00000 |       0.02336 |       0.00138 |       1.65454
     -0.00030 |       0.00000 |       0.02355 |       0.00130 |       1.65454
Evaluating losses...
      0.00087 |       0.00000 |       0.02356 |       0.00132 |      

      0.00171 |       0.00000 |       0.01454 |       0.00135 |       1.62258
     -0.00028 |       0.00000 |       0.01448 |       0.00124 |       1.62236
      0.00106 |       0.00000 |       0.01455 |       0.00134 |       1.62086
Evaluating losses...
    -5.78e-05 |       0.00000 |       0.01445 |       0.00138 |       1.62127
------------------------------------
| EpLenMean       | 615            |
| EpRewMean       | -4.84          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 3824           |
| TimeElapsed     | 3.84e+03       |
| TimestepsSoFar  | 2412544        |
| ev_tdlam_before | 0.849          |
| loss_ent        | 1.6212678      |
| loss_kl         | 0.0013849301   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -5.7751313e-05 |
| loss_vf_loss    | 0.014450476    |
------------------------------------
********** Iteration 589 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00015 |    

********** Iteration 594 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00010 |       0.00000 |       0.01648 |       0.00093 |       1.64628
     -0.00156 |       0.00000 |       0.01610 |       0.00096 |       1.64742
     -0.00053 |       0.00000 |       0.01598 |       0.00098 |       1.64339
      0.00092 |       0.00000 |       0.01610 |       0.00103 |       1.64093
      0.00119 |       0.00000 |       0.01616 |       0.00107 |       1.63996
      0.00112 |       0.00000 |       0.01597 |       0.00094 |       1.64092
      0.00033 |       0.00000 |       0.01602 |       0.00100 |       1.64057
      0.00048 |       0.00000 |       0.01578 |       0.00101 |       1.64026
     -0.00031 |       0.00000 |       0.01576 |       0.00101 |       1.64008
      0.00026 |       0.00000 |       0.01586 |       0.00106 |       1.63830
Evaluating losses...
      0.00014 |       0.00000 |       0.01578 |       0.00100 |      

      0.00088 |       0.00000 |       0.01531 |       0.00128 |       1.64926
      0.00087 |       0.00000 |       0.01554 |       0.00130 |       1.64778
      0.00077 |       0.00000 |       0.01547 |       0.00123 |       1.64788
Evaluating losses...
      0.00137 |       0.00000 |       0.01523 |       0.00124 |       1.64707
----------------------------------
| EpLenMean       | 614          |
| EpRewMean       | -4.81        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 3896         |
| TimeElapsed     | 3.91e+03     |
| TimestepsSoFar  | 2457600      |
| ev_tdlam_before | 0.854        |
| loss_ent        | 1.6470721    |
| loss_kl         | 0.0012419955 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0013662443 |
| loss_vf_loss    | 0.015233156  |
----------------------------------
********** Iteration 600 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     6.58e-07 |       0.00000 |       0.02187 |

********** Iteration 605 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00092 |       0.00000 |       0.02168 |       0.00116 |       1.66152
     9.17e-05 |       0.00000 |       0.02102 |       0.00110 |       1.66328
      0.00031 |       0.00000 |       0.02115 |       0.00106 |       1.66196
     -0.00030 |       0.00000 |       0.02098 |       0.00111 |       1.66051
      0.00067 |       0.00000 |       0.02097 |       0.00113 |       1.66139
     -0.00011 |       0.00000 |       0.02092 |       0.00115 |       1.66050
     2.42e-05 |       0.00000 |       0.02122 |       0.00116 |       1.66189
      0.00067 |       0.00000 |       0.02077 |       0.00116 |       1.65953
     -0.00018 |       0.00000 |       0.02069 |       0.00117 |       1.65943
     -0.00010 |       0.00000 |       0.02065 |       0.00127 |       1.65790
Evaluating losses...
     -0.00032 |       0.00000 |       0.02058 |       0.00126 |      

     -0.00024 |       0.00000 |       0.02227 |       0.00141 |       1.66132
      0.00026 |       0.00000 |       0.02240 |       0.00133 |       1.66086
      0.00102 |       0.00000 |       0.02216 |       0.00146 |       1.66359
      0.00012 |       0.00000 |       0.02208 |       0.00135 |       1.66379
Evaluating losses...
      0.00142 |       0.00000 |       0.02220 |       0.00144 |       1.66446
----------------------------------
| EpLenMean       | 647          |
| EpRewMean       | -4.76        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 3966         |
| TimeElapsed     | 3.99e+03     |
| TimestepsSoFar  | 2502656      |
| ev_tdlam_before | 0.808        |
| loss_ent        | 1.6644571    |
| loss_kl         | 0.0014442576 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0014221965 |
| loss_vf_loss    | 0.022197444  |
----------------------------------
********** Iteration 611 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |

********** Iteration 616 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00055 |       0.00000 |       0.01738 |       0.00081 |       1.64915
      0.00070 |       0.00000 |       0.01722 |       0.00086 |       1.65190
    -5.77e-05 |       0.00000 |       0.01718 |       0.00086 |       1.64978
      0.00027 |       0.00000 |       0.01713 |       0.00087 |       1.65180
     -0.00027 |       0.00000 |       0.01720 |       0.00083 |       1.65234
      0.00048 |       0.00000 |       0.01717 |       0.00088 |       1.65245
      0.00024 |       0.00000 |       0.01698 |       0.00087 |       1.65278
     -0.00044 |       0.00000 |       0.01692 |       0.00092 |       1.65299
      0.00076 |       0.00000 |       0.01723 |       0.00098 |       1.65324
     4.77e-05 |       0.00000 |       0.01697 |       0.00089 |       1.65384
Evaluating losses...
      0.00049 |       0.00000 |       0.01697 |       0.00088 |      

      0.00038 |       0.00000 |       0.01471 |       0.00104 |       1.63464
     3.60e-05 |       0.00000 |       0.01467 |       0.00111 |       1.63488
      0.00041 |       0.00000 |       0.01467 |       0.00117 |       1.63443
Evaluating losses...
     -0.00171 |       0.00000 |       0.01481 |       0.00117 |       1.63316
-----------------------------------
| EpLenMean       | 628           |
| EpRewMean       | -4.9          |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4038          |
| TimeElapsed     | 4.06e+03      |
| TimestepsSoFar  | 2547712       |
| ev_tdlam_before | 0.854         |
| loss_ent        | 1.6331592     |
| loss_kl         | 0.0011722161  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0017118724 |
| loss_vf_loss    | 0.01480558    |
-----------------------------------
********** Iteration 622 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -6.49e-05 |       0.00000 |  

********** Iteration 627 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00025 |       0.00000 |       0.01741 |       0.00107 |       1.62386
      0.00035 |       0.00000 |       0.01698 |       0.00109 |       1.62028
      0.00049 |       0.00000 |       0.01692 |       0.00111 |       1.62092
     -0.00012 |       0.00000 |       0.01687 |       0.00119 |       1.61962
     6.33e-05 |       0.00000 |       0.01684 |       0.00114 |       1.61788
      0.00108 |       0.00000 |       0.01664 |       0.00123 |       1.61663
     -0.00051 |       0.00000 |       0.01685 |       0.00143 |       1.61512
     7.98e-05 |       0.00000 |       0.01678 |       0.00145 |       1.61352
      0.00023 |       0.00000 |       0.01666 |       0.00127 |       1.61440
     -0.00066 |       0.00000 |       0.01669 |       0.00147 |       1.61226
Evaluating losses...
     -0.00054 |       0.00000 |       0.01649 |       0.00126 |      

      0.00018 |       0.00000 |       0.01655 |       0.00118 |       1.60119
      0.00138 |       0.00000 |       0.01654 |       0.00119 |       1.60114
    -3.79e-05 |       0.00000 |       0.01654 |       0.00120 |       1.60341
Evaluating losses...
      0.00023 |       0.00000 |       0.01641 |       0.00118 |       1.60008
-----------------------------------
| EpLenMean       | 636           |
| EpRewMean       | -4.81         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4108          |
| TimeElapsed     | 4.13e+03      |
| TimestepsSoFar  | 2592768       |
| ev_tdlam_before | 0.845         |
| loss_ent        | 1.6000792     |
| loss_kl         | 0.001183875   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00022701803 |
| loss_vf_loss    | 0.016412849   |
-----------------------------------
********** Iteration 633 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00014 |       0.00000 |  

********** Iteration 638 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00115 |       0.00000 |       0.01472 |       0.00078 |       1.62951
      0.00088 |       0.00000 |       0.01456 |       0.00078 |       1.63089
     -0.00089 |       0.00000 |       0.01451 |       0.00081 |       1.62929
      0.00014 |       0.00000 |       0.01434 |       0.00082 |       1.63083
      0.00043 |       0.00000 |       0.01430 |       0.00090 |       1.63214
      0.00071 |       0.00000 |       0.01428 |       0.00081 |       1.63204
     -0.00035 |       0.00000 |       0.01443 |       0.00077 |       1.63238
      0.00117 |       0.00000 |       0.01431 |       0.00085 |       1.63237
    -4.73e-05 |       0.00000 |       0.01424 |       0.00078 |       1.62923
      0.00137 |       0.00000 |       0.01420 |       0.00079 |       1.63198
Evaluating losses...
    -8.16e-06 |       0.00000 |       0.01428 |       0.00080 |      

      0.00117 |       0.00000 |       0.01678 |       0.00107 |       1.64747
      0.00089 |       0.00000 |       0.01675 |       0.00115 |       1.64916
      0.00036 |       0.00000 |       0.01677 |       0.00120 |       1.64937
Evaluating losses...
      0.00054 |       0.00000 |       0.01660 |       0.00119 |       1.64676
-----------------------------------
| EpLenMean       | 651           |
| EpRewMean       | -4.82         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4178          |
| TimeElapsed     | 4.2e+03       |
| TimestepsSoFar  | 2637824       |
| ev_tdlam_before | 0.836         |
| loss_ent        | 1.6467574     |
| loss_kl         | 0.0011930568  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00053844566 |
| loss_vf_loss    | 0.016600013   |
-----------------------------------
********** Iteration 644 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00099 |       0.00000 |  

********** Iteration 649 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00093 |       0.00000 |       0.01834 |       0.00086 |       1.63275
      0.00034 |       0.00000 |       0.01847 |       0.00089 |       1.63072
     -0.00013 |       0.00000 |       0.01817 |       0.00090 |       1.63360
     -0.00010 |       0.00000 |       0.01812 |       0.00090 |       1.63515
      0.00071 |       0.00000 |       0.01780 |       0.00095 |       1.63580
     -0.00091 |       0.00000 |       0.01787 |       0.00099 |       1.63608
     -0.00204 |       0.00000 |       0.01795 |       0.00096 |       1.63672
    -8.67e-05 |       0.00000 |       0.01785 |       0.00112 |       1.63793
     -0.00051 |       0.00000 |       0.01780 |       0.00109 |       1.63735
    -5.53e-05 |       0.00000 |       0.01785 |       0.00108 |       1.63874
Evaluating losses...
     -0.00061 |       0.00000 |       0.01770 |       0.00118 |      

     -0.00021 |       0.00000 |       0.01474 |       0.00107 |       1.63504
      0.00078 |       0.00000 |       0.01482 |       0.00118 |       1.63382
    -4.37e-05 |       0.00000 |       0.01478 |       0.00111 |       1.63316
Evaluating losses...
      0.00076 |       0.00000 |       0.01463 |       0.00106 |       1.63550
-----------------------------------
| EpLenMean       | 640           |
| EpRewMean       | -4.85         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4248          |
| TimeElapsed     | 4.27e+03      |
| TimestepsSoFar  | 2682880       |
| ev_tdlam_before | 0.845         |
| loss_ent        | 1.635505      |
| loss_kl         | 0.0010621262  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00076226896 |
| loss_vf_loss    | 0.0146329515  |
-----------------------------------
********** Iteration 655 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00074 |       0.00000 |  

********** Iteration 660 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00052 |       0.00000 |       0.01573 |       0.00078 |       1.61411
     -0.00026 |       0.00000 |       0.01538 |       0.00074 |       1.61606
      0.00095 |       0.00000 |       0.01549 |       0.00076 |       1.61504
     -0.00020 |       0.00000 |       0.01548 |       0.00078 |       1.61386
     -0.00023 |       0.00000 |       0.01515 |       0.00076 |       1.61389
     -0.00030 |       0.00000 |       0.01529 |       0.00078 |       1.61372
     -0.00037 |       0.00000 |       0.01540 |       0.00079 |       1.61253
      0.00033 |       0.00000 |       0.01532 |       0.00087 |       1.61404
     -0.00016 |       0.00000 |       0.01505 |       0.00087 |       1.61374
      0.00030 |       0.00000 |       0.01519 |       0.00081 |       1.61297
Evaluating losses...
     -0.00040 |       0.00000 |       0.01532 |       0.00086 |      

      0.00051 |       0.00000 |       0.02109 |       0.00092 |       1.61766
      0.00098 |       0.00000 |       0.02083 |       0.00090 |       1.61507
      0.00047 |       0.00000 |       0.02099 |       0.00090 |       1.61658
     -0.00064 |       0.00000 |       0.02079 |       0.00086 |       1.61668
Evaluating losses...
     -0.00071 |       0.00000 |       0.02111 |       0.00091 |       1.61724
-----------------------------------
| EpLenMean       | 631           |
| EpRewMean       | -4.85         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4319          |
| TimeElapsed     | 4.34e+03      |
| TimestepsSoFar  | 2727936       |
| ev_tdlam_before | 0.825         |
| loss_ent        | 1.6172429     |
| loss_kl         | 0.0009095543  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0007097223 |
| loss_vf_loss    | 0.021113733   |
-----------------------------------
********** Iteration 666 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 671 ************
Eval num_timesteps=2748416, episode_reward=-4.80 +/- 0.40
Episode length: 702.00 +/- 223.28
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00046 |       0.00000 |       0.01719 |       0.00079 |       1.60578
      0.00093 |       0.00000 |       0.01703 |       0.00077 |       1.60820
      0.00019 |       0.00000 |       0.01674 |       0.00086 |       1.60928
    -2.61e-06 |       0.00000 |       0.01672 |       0.00091 |       1.61098
     -0.00079 |       0.00000 |       0.01681 |       0.00088 |       1.61156
     -0.00059 |       0.00000 |       0.01690 |       0.00094 |       1.61086
     -0.00038 |       0.00000 |       0.01681 |       0.00089 |       1.61351
     -0.00031 |       0.00000 |       0.01659 |       0.00097 |       1.61366
     6.16e-05 |       0.00000 |       0.01656 |       0.00100 |       1.61383
      0.00020 |       0.00000 |       0.01666 |       0.00098 |       1.6171

     3.56e-05 |       0.00000 |       0.01726 |       0.00103 |       1.59931
      0.00043 |       0.00000 |       0.01721 |       0.00112 |       1.59751
     -0.00012 |       0.00000 |       0.01716 |       0.00107 |       1.59815
     -0.00081 |       0.00000 |       0.01723 |       0.00106 |       1.59778
Evaluating losses...
     -0.00041 |       0.00000 |       0.01706 |       0.00116 |       1.59795
------------------------------------
| EpLenMean       | 640            |
| EpRewMean       | -4.91          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4388           |
| TimeElapsed     | 4.43e+03       |
| TimestepsSoFar  | 2772992        |
| ev_tdlam_before | 0.836          |
| loss_ent        | 1.5979469      |
| loss_kl         | 0.0011611014   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00041022003 |
| loss_vf_loss    | 0.017058704    |
------------------------------------
********** Iteration 677 ************
Optimizing...
     pol_surr |    

********** Iteration 682 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00047 |       0.00000 |       0.01468 |       0.00091 |       1.60390
     5.57e-05 |       0.00000 |       0.01433 |       0.00085 |       1.60457
      0.00079 |       0.00000 |       0.01440 |       0.00085 |       1.60578
      0.00093 |       0.00000 |       0.01445 |       0.00083 |       1.60700
     -0.00060 |       0.00000 |       0.01438 |       0.00094 |       1.60681
     -0.00115 |       0.00000 |       0.01441 |       0.00107 |       1.60840
      0.00059 |       0.00000 |       0.01455 |       0.00098 |       1.60964
      0.00014 |       0.00000 |       0.01456 |       0.00095 |       1.60889
     -0.00103 |       0.00000 |       0.01442 |       0.00096 |       1.61105
     -0.00062 |       0.00000 |       0.01429 |       0.00103 |       1.60999
Evaluating losses...
      0.00063 |       0.00000 |       0.01424 |       0.00102 |      

     -0.00082 |       0.00000 |       0.02276 |       0.00091 |       1.60588
     -0.00040 |       0.00000 |       0.02305 |       0.00103 |       1.60347
     -0.00063 |       0.00000 |       0.02287 |       0.00099 |       1.60403
Evaluating losses...
     2.62e-05 |       0.00000 |       0.02264 |       0.00095 |       1.60188
-----------------------------------
| EpLenMean       | 633           |
| EpRewMean       | -4.9          |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4460          |
| TimeElapsed     | 4.5e+03       |
| TimestepsSoFar  | 2818048       |
| ev_tdlam_before | 0.818         |
| loss_ent        | 1.6018845     |
| loss_kl         | 0.0009536542  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 2.6187976e-05 |
| loss_vf_loss    | 0.02263634    |
-----------------------------------
********** Iteration 688 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00093 |       0.00000 |  

********** Iteration 693 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00052 |       0.00000 |       0.02307 |       0.00074 |       1.60409
     -0.00034 |       0.00000 |       0.02296 |       0.00075 |       1.60416
     -0.00026 |       0.00000 |       0.02284 |       0.00082 |       1.60177
      0.00124 |       0.00000 |       0.02327 |       0.00078 |       1.60180
     -0.00026 |       0.00000 |       0.02319 |       0.00084 |       1.60256
     -0.00046 |       0.00000 |       0.02300 |       0.00094 |       1.60117
     -0.00056 |       0.00000 |       0.02294 |       0.00098 |       1.59880
     -0.00134 |       0.00000 |       0.02259 |       0.00101 |       1.60006
     -0.00021 |       0.00000 |       0.02277 |       0.00099 |       1.59904
      0.00017 |       0.00000 |       0.02302 |       0.00099 |       1.60043
Evaluating losses...
     -0.00050 |       0.00000 |       0.02294 |       0.00100 |      

    -7.69e-05 |       0.00000 |       0.02307 |       0.00079 |       1.59587
     -0.00024 |       0.00000 |       0.02288 |       0.00081 |       1.59658
      0.00019 |       0.00000 |       0.02303 |       0.00086 |       1.59644
Evaluating losses...
     -0.00017 |       0.00000 |       0.02314 |       0.00082 |       1.59638
------------------------------------
| EpLenMean       | 613            |
| EpRewMean       | -4.79          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4534           |
| TimeElapsed     | 4.64e+03       |
| TimestepsSoFar  | 2863104        |
| ev_tdlam_before | 0.813          |
| loss_ent        | 1.596377       |
| loss_kl         | 0.0008168062   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00017271237 |
| loss_vf_loss    | 0.023140732    |
------------------------------------
********** Iteration 699 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00090 |    

********** Iteration 704 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00048 |       0.00000 |       0.01364 |       0.00071 |       1.60575
     -0.00082 |       0.00000 |       0.01364 |       0.00076 |       1.60815
      0.00042 |       0.00000 |       0.01347 |       0.00072 |       1.60780
      0.00080 |       0.00000 |       0.01344 |       0.00074 |       1.60730
     -0.00034 |       0.00000 |       0.01330 |       0.00076 |       1.60919
     -0.00078 |       0.00000 |       0.01326 |       0.00069 |       1.60943
     -0.00068 |       0.00000 |       0.01336 |       0.00078 |       1.61158
    -4.16e-05 |       0.00000 |       0.01327 |       0.00079 |       1.61440
     3.55e-05 |       0.00000 |       0.01339 |       0.00072 |       1.61050
     -0.00078 |       0.00000 |       0.01341 |       0.00083 |       1.61388
Evaluating losses...
     -0.00030 |       0.00000 |       0.01330 |       0.00083 |      

     -0.00023 |       0.00000 |       0.01491 |       0.00087 |       1.59620
     -0.00085 |       0.00000 |       0.01498 |       0.00080 |       1.59615
      0.00128 |       0.00000 |       0.01477 |       0.00090 |       1.59501
Evaluating losses...
     -0.00050 |       0.00000 |       0.01499 |       0.00084 |       1.59452
------------------------------------
| EpLenMean       | 607            |
| EpRewMean       | -4.79          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4608           |
| TimeElapsed     | 4.94e+03       |
| TimestepsSoFar  | 2908160        |
| ev_tdlam_before | 0.871          |
| loss_ent        | 1.5945218      |
| loss_kl         | 0.0008395914   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00050228694 |
| loss_vf_loss    | 0.014988074    |
------------------------------------
********** Iteration 710 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00044 |    

********** Iteration 715 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00021 |       0.00000 |       0.01631 |       0.00067 |       1.59224
     -0.00131 |       0.00000 |       0.01643 |       0.00065 |       1.59280
      0.00035 |       0.00000 |       0.01622 |       0.00067 |       1.59398
    -1.63e-05 |       0.00000 |       0.01640 |       0.00064 |       1.59475
     -0.00019 |       0.00000 |       0.01621 |       0.00067 |       1.59404
     -0.00013 |       0.00000 |       0.01630 |       0.00065 |       1.59558
      0.00035 |       0.00000 |       0.01624 |       0.00067 |       1.59553
     -0.00078 |       0.00000 |       0.01588 |       0.00072 |       1.59712
      0.00013 |       0.00000 |       0.01618 |       0.00068 |       1.59604
    -9.95e-05 |       0.00000 |       0.01612 |       0.00077 |       1.59815
Evaluating losses...
     5.36e-06 |       0.00000 |       0.01625 |       0.00075 |      

      0.00033 |       0.00000 |       0.01648 |       0.00101 |       1.58621
     -0.00089 |       0.00000 |       0.01618 |       0.00094 |       1.58732
     -0.00013 |       0.00000 |       0.01625 |       0.00109 |       1.58629
Evaluating losses...
     -0.00097 |       0.00000 |       0.01633 |       0.00104 |       1.58733
-----------------------------------
| EpLenMean       | 616           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4681          |
| TimeElapsed     | 5.19e+03      |
| TimestepsSoFar  | 2953216       |
| ev_tdlam_before | 0.85          |
| loss_ent        | 1.5873263     |
| loss_kl         | 0.00104483    |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009731385 |
| loss_vf_loss    | 0.01632616    |
-----------------------------------
********** Iteration 721 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00105 |       0.00000 |  

********** Iteration 726 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     1.60e-05 |       0.00000 |       0.01931 |       0.00078 |       1.60519
      0.00028 |       0.00000 |       0.01925 |       0.00084 |       1.60412
      0.00076 |       0.00000 |       0.01912 |       0.00083 |       1.60242
     -0.00064 |       0.00000 |       0.01919 |       0.00084 |       1.60172
     3.19e-05 |       0.00000 |       0.01913 |       0.00092 |       1.60000
     -0.00018 |       0.00000 |       0.01919 |       0.00082 |       1.59901
     -0.00015 |       0.00000 |       0.01915 |       0.00089 |       1.59936
     -0.00064 |       0.00000 |       0.01901 |       0.00092 |       1.59711
      0.00044 |       0.00000 |       0.01911 |       0.00084 |       1.59710
      0.00023 |       0.00000 |       0.01910 |       0.00093 |       1.59603
Evaluating losses...
      0.00045 |       0.00000 |       0.01906 |       0.00092 |      

      0.00117 |       0.00000 |       0.01366 |       0.00093 |       1.60199
     -0.00077 |       0.00000 |       0.01356 |       0.00102 |       1.60160
      0.00037 |       0.00000 |       0.01358 |       0.00090 |       1.60074
Evaluating losses...
     8.87e-05 |       0.00000 |       0.01353 |       0.00089 |       1.60091
----------------------------------
| EpLenMean       | 646          |
| EpRewMean       | -4.79        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 4753         |
| TimeElapsed     | 5.47e+03     |
| TimestepsSoFar  | 2998272      |
| ev_tdlam_before | 0.866        |
| loss_ent        | 1.6009091    |
| loss_kl         | 0.0008851342 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 8.866016e-05 |
| loss_vf_loss    | 0.01353225   |
----------------------------------
********** Iteration 732 ************
Eval num_timesteps=2998272, episode_reward=-4.40 +/- 0.92
Episode length: 619.40 +/- 109.86
New best mean reward!
Optimizing...
     pol_su

********** Iteration 737 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00091 |       0.00000 |       0.01591 |       0.00068 |       1.59015
      0.00034 |       0.00000 |       0.01561 |       0.00070 |       1.59053
      0.00016 |       0.00000 |       0.01550 |       0.00068 |       1.59093
     -0.00019 |       0.00000 |       0.01530 |       0.00068 |       1.59224
     -0.00051 |       0.00000 |       0.01531 |       0.00069 |       1.59064
      0.00038 |       0.00000 |       0.01497 |       0.00074 |       1.59190
      0.00123 |       0.00000 |       0.01485 |       0.00072 |       1.59154
      0.00092 |       0.00000 |       0.01481 |       0.00072 |       1.59217
      0.00034 |       0.00000 |       0.01518 |       0.00071 |       1.59097
     -0.00043 |       0.00000 |       0.01474 |       0.00071 |       1.59236
Evaluating losses...
      0.00035 |       0.00000 |       0.01495 |       0.00071 |      

      0.00030 |       0.00000 |       0.01436 |       0.00074 |       1.60488
     -0.00049 |       0.00000 |       0.01435 |       0.00074 |       1.60507
     -0.00118 |       0.00000 |       0.01450 |       0.00083 |       1.60581
Evaluating losses...
      0.00045 |       0.00000 |       0.01407 |       0.00081 |       1.60715
-----------------------------------
| EpLenMean       | 627           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4825          |
| TimeElapsed     | 5.78e+03      |
| TimestepsSoFar  | 3043328       |
| ev_tdlam_before | 0.865         |
| loss_ent        | 1.6071502     |
| loss_kl         | 0.00081473275 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00044506788 |
| loss_vf_loss    | 0.014065403   |
-----------------------------------
********** Iteration 743 ************


KeyboardInterrupt: 

In [165]:
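# Roll out a single episode with the trained policy and sum its rewards.
obs = env.reset()
done = False
total_reward = 0

while not done:
    # predict() returns the action and the policy state (unused for an MLP policy)
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
    env.render()

print("score:", total_reward)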

score: -5
