# PPO in Stable Baselines

In single-agent PPO, `MlpPolicy` was used in `PPO1` as follows:

```
model = PPO1(MlpPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
                 optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)

```

`MlpPolicy` is found in `stable_baselines/common/policies.py`, inheriting `FeedForwardPolicy`, which inherits from `ActorCriticPolicy`.

In `FeedForwardPolicy`'s `__init__`, there contains the following:
```
if net_arch is None:
    if layers is None:
        layers = [64, 64]
    net_arch = [dict(vf=layers, pi=layers)]

with tf.variable_scope("model", reuse=reuse):
    if feature_extraction == "cnn":
        pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
    else:
        pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

    self._value_fn = linear(vf_latent, 'vf', 1)

    self._proba_distribution, self._policy, self.q_value = \
        self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)
```

Since `MlpPolicy` uses `feature_extraction="mlp"`, look into `mlp_extractor` (here)[https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/policies.py].

`mlp_extractor` constructs a MLP that receive observations as input and outputs a latent representation for the policy and a value network. Amount and size of hidden layers and how many shared between policy and value network can be spcified using `net_arch`.

In `mlp_extractor`, it iterates through `net_arch` and creates layers, specifically using `latent = act_fun(linear(latent, ...))`. Therfore, look into `act_fun` and `linear`, which belongs in stable_baselines.common.tf_layers.

`FeedForwardPolicy`'s default for `act_fun` is `tf.tanh`. Linear contains:

```
def linear(input_tensor, scope, n_hidden, *, init_scale=1.0, init_bias=0.0):
    """
    Creates a fully connected layer for TensorFlow
    :param input_tensor: (TensorFlow Tensor) The input tensor for the fully connected layer
    :param scope: (str) The TensorFlow variable scope
    :param n_hidden: (int) The number of hidden neurons
    :param init_scale: (int) The initialization scale
    :param init_bias: (int) The initialization offset bias
    :return: (TensorFlow Tensor) fully connected layer
    """
    with tf.variable_scope(scope):
        n_input = input_tensor.get_shape()[1].value
        weight = tf.get_variable("w", [n_input, n_hidden], initializer=ortho_init(init_scale))
        bias = tf.get_variable("b", [n_hidden], initializer=tf.constant_initializer(init_bias))
        return tf.matmul(input_tensor, weight) + bias
```

Therefore, to transform this model into a Bayesian neural network, the linear layer needs to be changed into DenseVariational instead of a linear layer. We can do this by modifying `FeedForwardPolicy` (which `MlpPolicy` inherits) with a new `bnn_extractor`, then creating a `BnnPolicy` to replace `MlpPolicy`.

In [6]:
from tensorflow.keras import backend as K
from tensorflow.keras import activations, initializers
from tensorflow.keras.layers import Layer

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfp.__version__

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



'0.8.0'

In [38]:
def bnn_extractor(flat_observations, net_arch, act_fun):
    """
    Constructs an variational layer that receives observations as an input and outputs a latent representation for the policy and
    a value network. The ``net_arch`` parameter allows to specify the amount and size of the hidden layers and how many
    of them are shared between the policy network and the value network. It is assumed to be a list with the following
    structure:
    1. An arbitrary length (zero allowed) number of integers each specifying the number of units in a shared layer.
       If the number of ints is zero, there will be no shared layers.
    2. An optional dict, to specify the following non-shared layers for the value network and the policy network.
       It is formatted like ``dict(vf=[<value layer sizes>], pi=[<policy layer sizes>])``.
       If it is missing any of the keys (pi or vf), no non-shared layers (empty list) is assumed.
    For example to construct a network with one shared layer of size 55 followed by two non-shared layers for the value
    network of size 255 and a single non-shared layer of size 128 for the policy network, the following layers_spec
    would be used: ``[55, dict(vf=[255, 255], pi=[128])]``. A simple shared network topology with two layers of size 128
    would be specified as [128, 128].
    :param flat_observations: (tf.Tensor) The observations to base policy and value function on.
    :param net_arch: ([int or dict]) The specification of the policy and value networks.
        See above for details on its formatting.
    :param act_fun: (tf function) The activation function to use for the networks.
    :return: (tf.Tensor, tf.Tensor) latent_policy, latent_value of the specified network.
        If all layers are shared, then ``latent_policy == latent_value``
    """
    latent = flat_observations
    policy_only_layers = []  # Layer sizes of the network that only belongs to the policy network
    value_only_layers = []  # Layer sizes of the network that only belongs to the value network
    kernel_divergence_fn=lambda q, p, _: tfp.distributions.kl_divergence(q, p)

    # Iterate through the shared layers and build the shared parts of the network
    for idx, layer in enumerate(net_arch):
        if isinstance(layer, int):  # Check that this is a shared layer
            layer_size = layer
#             latent = act_fun(linear(latent, "shared_fc{}".format(idx), layer_size, init_scale=np.sqrt(2)))
            latent = act_fun(tfp.layers.DenseFlipout(layer_size, name="shared_fc{}".format(idx), activation='relu', kernel_divergence_fn=kernel_divergence_fn)(latent))
        else:
            assert isinstance(layer, dict), "Error: the net_arch list can only contain ints and dicts"
            if 'pi' in layer:
                assert isinstance(layer['pi'], list), "Error: net_arch[-1]['pi'] must contain a list of integers."
                policy_only_layers = layer['pi']

            if 'vf' in layer:
                assert isinstance(layer['vf'], list), "Error: net_arch[-1]['vf'] must contain a list of integers."
                value_only_layers = layer['vf']
            break  # From here on the network splits up in policy and value network

    # Build the non-shared part of the network
    latent_policy = latent
    latent_value = latent
    for idx, (pi_layer_size, vf_layer_size) in enumerate(zip_longest(policy_only_layers, value_only_layers)):
        if pi_layer_size is not None:
            assert isinstance(pi_layer_size, int), "Error: net_arch[-1]['pi'] must only contain integers."
#             latent_policy = act_fun(linear(latent_policy, "pi_fc{}".format(idx), pi_layer_size, init_scale=np.sqrt(2)))
            latent_policy = act_fun(tfp.layers.DenseFlipout(pi_layer_size, name="pi_fc{}".format(idx), activation='relu', kernel_divergence_fn=kernel_divergence_fn)(latent))

        if vf_layer_size is not None:
            assert isinstance(vf_layer_size, int), "Error: net_arch[-1]['vf'] must only contain integers."
#             latent_value = act_fun(linear(latent_value, "vf_fc{}".format(idx), vf_layer_size, init_scale=np.sqrt(2)))
            latent_value = act_fun(tfp.layers.DenseFlipout(vf_layer_size, name="vf_fc{}".format(idx), activation='relu', kernel_divergence_fn=kernel_divergence_fn)(latent))

    return latent_policy, latent_value

In [39]:
from stable_baselines.common.policies import ActorCriticPolicy, nature_cnn

class FeedForwardPolicy(ActorCriticPolicy):
    """
    Policy object that implements actor critic, using a feed forward neural network.
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param layers: ([int]) (deprecated, use net_arch instead) The size of the Neural network for the policy
        (if None, default to [64, 64])
    :param net_arch: (list) Specification of the actor-critic policy network architecture (see mlp_extractor
        documentation for details).
    :param act_fun: (tf.func) the activation function to use in the neural network.
    :param cnn_extractor: (function (TensorFlow Tensor, ``**kwargs``): (TensorFlow Tensor)) the CNN feature extraction
    :param feature_extraction: (str) The feature extraction type ("cnn" or "mlp")
    :param kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, layers=None, net_arch=None,
                 act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
        super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
                                                scale=(feature_extraction == "cnn"))

        self._kwargs_check(feature_extraction, kwargs)

        if layers is not None:
            warnings.warn("Usage of the `layers` parameter is deprecated! Use net_arch instead "
                          "(it has a different semantics though).", DeprecationWarning)
            if net_arch is not None:
                warnings.warn("The new `net_arch` parameter overrides the deprecated `layers` parameter!",
                              DeprecationWarning)

        if net_arch is None:
            if layers is None:
                layers = [64, 64]
            net_arch = [dict(vf=layers, pi=layers)]

        with tf.variable_scope("model", reuse=reuse):
            if feature_extraction == "cnn":
                pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
            elif feature_extraction == "bnn":
                pi_latent, vf_latent = bnn_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)
            else:
                pi_latent, vf_latent = mlp_extractor(tf.layers.flatten(self.processed_obs), net_arch, act_fun)

            self._value_fn = linear(vf_latent, 'vf', 1)

            self._proba_distribution, self._policy, self.q_value = \
                self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)

        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=False):
        if deterministic:
            action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                                   {self.obs_ph: obs})
        return action, value, self.initial_state, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})

In [40]:
import warnings
from itertools import zip_longest
from abc import ABC, abstractmethod

import numpy as np
import tensorflow as tf
from gym.spaces import Discrete

from stable_baselines.common.tf_util import batch_to_seq, seq_to_batch
from stable_baselines.common.tf_layers import conv, linear, conv_to_fc, lstm
from stable_baselines.common.distributions import make_proba_dist_type, CategoricalProbabilityDistribution, \
    MultiCategoricalProbabilityDistribution, DiagGaussianProbabilityDistribution, BernoulliProbabilityDistribution
from stable_baselines.common.input import observation_input
from stable_baselines.common.policies import nature_cnn

In [41]:
class BnnPolicy(FeedForwardPolicy):
    """
    Policy object that implements actor critic, using a Bayesian neural net (2 layers of 64)
    :param sess: (TensorFlow session) The current TensorFlow session
    :param ob_space: (Gym Space) The observation space of the environment
    :param ac_space: (Gym Space) The action space of the environment
    :param n_env: (int) The number of environments to run
    :param n_steps: (int) The number of steps to run for each environment
    :param n_batch: (int) The number of batch to run (n_envs * n_steps)
    :param reuse: (bool) If the policy is reusable or not
    :param _kwargs: (dict) Extra keyword arguments for the nature CNN feature extraction
    """

    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
        super(BnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                        feature_extraction="bnn", **_kwargs)

# Single-Agent PPO with BNN

In [42]:
#!/usr/bin/env python3

# Train single CPU PPO1 on slimevolley.
# Should solve it (beat existing AI on average over 1000 trials) in 3 hours on single CPU, within 3M steps.

import os
import gym
import slimevolleygym
from slimevolleygym import SurvivalRewardEnv

from stable_baselines.ppo1 import PPO1
from stable_baselines.common.policies import MlpPolicy
from stable_baselines import logger
from stable_baselines.common.callbacks import EvalCallback

NUM_TIMESTEPS = int(5e6)
SEED = 721
EVAL_FREQ = 250000
EVAL_EPISODES = 10  # was 1000
LOGDIR = "bnn_ppo1" # moved to zoo afterwards.

logger.configure(folder=LOGDIR)

env = gym.make("SlimeVolley-v0")
env.seed(SEED)

Logging to bnn_ppo1


[721]

In [43]:
# take mujoco hyperparams (but doubled timesteps_per_actorbatch to cover more steps.)
model = PPO1(BnnPolicy, env, timesteps_per_actorbatch=4096, clip_param=0.2, entcoeff=0.0, optim_epochs=10,
                 optim_stepsize=3e-4, optim_batchsize=64, gamma=0.99, lam=0.95, schedule='linear', verbose=2)

eval_callback = EvalCallback(env, best_model_save_path=LOGDIR, log_path=LOGDIR, eval_freq=EVAL_FREQ, n_eval_episodes=EVAL_EPISODES)

model.learn(total_timesteps=NUM_TIMESTEPS, callback=eval_callback)

model.save(os.path.join(LOGDIR, "final_model")) # probably never get to this point.

env.close()



Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where









  "{} != {}".format(self.training_env, self.eval_env))


********** Iteration 0 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00012 |       0.00000 |       0.05447 |      3.00e-05 |       2.07941
     -0.00056 |       0.00000 |       0.02884 |       0.00018 |       2.07928
     -0.00126 |       0.00000 |       0.02618 |       0.00059 |       2.07889
     -0.00201 |       0.00000 |       0.02503 |       0.00098 |       2.07850
     -0.00333 |       0.00000 |       0.02386 |       0.00199 |       2.07752
     -0.00427 |       0.00000 |       0.02256 |       0.00365 |       2.07588
     -0.00490 |       0.00000 |       0.02186 |       0.00474 |       2.07480
     -0.00523 |       0.00000 |       0.02107 |       0.00449 |       2.07505
     -0.00570 |       0.00000 |       0.02039 |       0.00459 |       2.07494
     -0.00574 |       0.00000 |       0.02034 |       0.00535 |       2.07418
Evaluating losses...
     -0.00687 |       0.00000 |       0.02030 |       0.00543 |       2

     -0.00441 |       0.00000 |       0.01676 |       0.00385 |       2.01647
     -0.00309 |       0.00000 |       0.01702 |       0.00359 |       2.01728
     -0.00466 |       0.00000 |       0.01666 |       0.00397 |       2.01500
Evaluating losses...
     -0.00507 |       0.00000 |       0.01677 |       0.00405 |       2.01364
-----------------------------------
| EpLenMean       | 552           |
| EpRewMean       | -4.93         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 44            |
| TimeElapsed     | 52            |
| TimestepsSoFar  | 24576         |
| ev_tdlam_before | 0.838         |
| loss_ent        | 2.0136447     |
| loss_kl         | 0.0040452033  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0050681904 |
| loss_vf_loss    | 0.016771859   |
-----------------------------------
********** Iteration 6 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00093 |       0.00000 |    

********** Iteration 11 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00060 |       0.00000 |       0.01954 |       0.00107 |       1.97353
      0.00059 |       0.00000 |       0.01851 |       0.00161 |       1.98444
     -0.00216 |       0.00000 |       0.01848 |       0.00222 |       1.98945
     -0.00218 |       0.00000 |       0.01795 |       0.00276 |       1.99286
     -0.00164 |       0.00000 |       0.01776 |       0.00293 |       1.99448
     -0.00282 |       0.00000 |       0.01747 |       0.00264 |       1.99053
     -0.00373 |       0.00000 |       0.01751 |       0.00310 |       1.99198
     -0.00394 |       0.00000 |       0.01740 |       0.00351 |       1.99099
     -0.00387 |       0.00000 |       0.01717 |       0.00377 |       1.99145
     -0.00530 |       0.00000 |       0.01706 |       0.00372 |       1.98979
Evaluating losses...
     -0.00430 |       0.00000 |       0.01686 |       0.00378 |       

     -0.00609 |       0.00000 |       0.02070 |       0.00583 |       1.97736
     -0.00702 |       0.00000 |       0.02115 |       0.00464 |       1.97612
     -0.00653 |       0.00000 |       0.02023 |       0.00546 |       1.97290
Evaluating losses...
     -0.00744 |       0.00000 |       0.02001 |       0.00539 |       1.97305
----------------------------------
| EpLenMean       | 624          |
| EpRewMean       | -4.81        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 113          |
| TimeElapsed     | 136          |
| TimestepsSoFar  | 69632        |
| ev_tdlam_before | 0.796        |
| loss_ent        | 1.9730539    |
| loss_kl         | 0.0053886888 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.007444583 |
| loss_vf_loss    | 0.020012245  |
----------------------------------
********** Iteration 17 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00030 |       0.00000 |       0.01471 | 

********** Iteration 22 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00035 |       0.00000 |       0.02044 |       0.00161 |       1.91817
     -0.00137 |       0.00000 |       0.02003 |       0.00295 |       1.90329
     -0.00191 |       0.00000 |       0.02010 |       0.00387 |       1.89839
     -0.00176 |       0.00000 |       0.01946 |       0.00355 |       1.90264
     -0.00276 |       0.00000 |       0.01938 |       0.00452 |       1.89579
     -0.00463 |       0.00000 |       0.01923 |       0.00462 |       1.89580
     -0.00342 |       0.00000 |       0.01934 |       0.00421 |       1.89788
     -0.00522 |       0.00000 |       0.01916 |       0.00482 |       1.89652
     -0.00406 |       0.00000 |       0.01879 |       0.00465 |       1.89763
     -0.00503 |       0.00000 |       0.01935 |       0.00450 |       1.89867
Evaluating losses...
     -0.00612 |       0.00000 |       0.01880 |       0.00534 |       

     -0.00285 |       0.00000 |       0.01438 |       0.00526 |       1.84578
     -0.00300 |       0.00000 |       0.01439 |       0.00476 |       1.84351
     -0.00414 |       0.00000 |       0.01368 |       0.00481 |       1.84514
Evaluating losses...
     -0.00206 |       0.00000 |       0.01409 |       0.00495 |       1.84685
----------------------------------
| EpLenMean       | 622          |
| EpRewMean       | -4.87        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 187          |
| TimeElapsed     | 217          |
| TimestepsSoFar  | 114688       |
| ev_tdlam_before | 0.836        |
| loss_ent        | 1.8468512    |
| loss_kl         | 0.004946707  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002058799 |
| loss_vf_loss    | 0.0140924025 |
----------------------------------
********** Iteration 28 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 |       0.01430 | 

********** Iteration 33 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00160 |       0.00000 |       0.01825 |       0.00223 |       1.78826
     -0.00043 |       0.00000 |       0.01773 |       0.00399 |       1.79825
     -0.00248 |       0.00000 |       0.01752 |       0.00388 |       1.79472
     -0.00126 |       0.00000 |       0.01735 |       0.00481 |       1.79587
     -0.00188 |       0.00000 |       0.01710 |       0.00556 |       1.79856
     -0.00187 |       0.00000 |       0.01683 |       0.00530 |       1.79315
     -0.00406 |       0.00000 |       0.01680 |       0.00502 |       1.79332
     -0.00488 |       0.00000 |       0.01672 |       0.00567 |       1.79525
     -0.00208 |       0.00000 |       0.01661 |       0.00611 |       1.80120
     -0.00499 |       0.00000 |       0.01615 |       0.00615 |       1.79544
Evaluating losses...
     -0.00496 |       0.00000 |       0.01645 |       0.00572 |       

     -0.00343 |       0.00000 |       0.01356 |       0.00562 |       1.72466
     -0.00347 |       0.00000 |       0.01360 |       0.00564 |       1.72496
Evaluating losses...
     -0.00398 |       0.00000 |       0.01337 |       0.00532 |       1.72583
----------------------------------
| EpLenMean       | 622          |
| EpRewMean       | -4.83        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 257          |
| TimeElapsed     | 303          |
| TimestepsSoFar  | 159744       |
| ev_tdlam_before | 0.841        |
| loss_ent        | 1.7258316    |
| loss_kl         | 0.0053211935 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.00397831  |
| loss_vf_loss    | 0.01336605   |
----------------------------------
********** Iteration 39 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00054 |       0.00000 |       0.01491 |       0.00219 |       1.71044
     -0.00024 |       0.00000 |       0.01411 | 

********** Iteration 44 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -9.26e-05 |       0.00000 |       0.01463 |       0.00230 |       1.57592
     -0.00050 |       0.00000 |       0.01429 |       0.00454 |       1.58083
     -0.00359 |       0.00000 |       0.01392 |       0.00392 |       1.58798
     -0.00448 |       0.00000 |       0.01406 |       0.00524 |       1.59111
     -0.00344 |       0.00000 |       0.01386 |       0.00475 |       1.57764
     -0.00386 |       0.00000 |       0.01367 |       0.00534 |       1.57985
     -0.00471 |       0.00000 |       0.01375 |       0.00508 |       1.57999
     -0.00386 |       0.00000 |       0.01368 |       0.00549 |       1.58815
     -0.00412 |       0.00000 |       0.01346 |       0.00568 |       1.59063
     -0.00478 |       0.00000 |       0.01365 |       0.00559 |       1.58143
Evaluating losses...
     -0.00470 |       0.00000 |       0.01370 |       0.00583 |       

     -0.00379 |       0.00000 |       0.01280 |       0.00505 |       1.45273
     -0.00308 |       0.00000 |       0.01253 |       0.00515 |       1.45250
Evaluating losses...
     -0.00522 |       0.00000 |       0.01259 |       0.00515 |       1.44878
-----------------------------------
| EpLenMean       | 602           |
| EpRewMean       | -4.83         |
| EpThisIter      | 8             |
| EpisodesSoFar   | 333           |
| TimeElapsed     | 393           |
| TimestepsSoFar  | 204800        |
| ev_tdlam_before | 0.85          |
| loss_ent        | 1.4487821     |
| loss_kl         | 0.005148874   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0052174525 |
| loss_vf_loss    | 0.012592102   |
-----------------------------------
********** Iteration 50 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00084 |       0.00000 |       0.02326 |       0.00216 |       1.46989
     -0.00026 |       0.00000 |   

********** Iteration 55 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00160 |       0.00000 |       0.01889 |       0.00247 |       1.42257
     -0.00096 |       0.00000 |       0.01756 |       0.00267 |       1.42763
     -0.00054 |       0.00000 |       0.01668 |       0.00298 |       1.42361
     -0.00273 |       0.00000 |       0.01666 |       0.00332 |       1.42982
     -0.00198 |       0.00000 |       0.01627 |       0.00368 |       1.42412
     -0.00132 |       0.00000 |       0.01607 |       0.00430 |       1.42391
     -0.00360 |       0.00000 |       0.01604 |       0.00444 |       1.42792
     -0.00302 |       0.00000 |       0.01578 |       0.00462 |       1.42272
     -0.00278 |       0.00000 |       0.01569 |       0.00490 |       1.42565
     -0.00381 |       0.00000 |       0.01576 |       0.00525 |       1.43704
Evaluating losses...
     -0.00588 |       0.00000 |       0.01547 |       0.00521 |       

     -0.00322 |       0.00000 |       0.01974 |       0.00440 |       1.44880
     -0.00114 |       0.00000 |       0.01964 |       0.00448 |       1.46522
Evaluating losses...
     -0.00283 |       0.00000 |       0.01945 |       0.00422 |       1.46119
-----------------------------------
| EpLenMean       | 617           |
| EpRewMean       | -4.79         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 406           |
| TimeElapsed     | 487           |
| TimestepsSoFar  | 249856        |
| ev_tdlam_before | 0.807         |
| loss_ent        | 1.4611857     |
| loss_kl         | 0.0042226287  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0028302127 |
| loss_vf_loss    | 0.019452691   |
-----------------------------------
********** Iteration 61 ************
Eval num_timesteps=249856, episode_reward=-5.00 +/- 0.00
Episode length: 630.30 +/- 103.05
New best mean reward!
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent

********** Iteration 66 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00055 |       0.00000 |       0.01740 |       0.00316 |       1.45605
     -0.00266 |       0.00000 |       0.01670 |       0.00468 |       1.45597
     -0.00310 |       0.00000 |       0.01579 |       0.00637 |       1.45728
     -0.00508 |       0.00000 |       0.01569 |       0.00615 |       1.45342
     -0.00551 |       0.00000 |       0.01551 |       0.00681 |       1.45894
     -0.00552 |       0.00000 |       0.01554 |       0.00741 |       1.45613
     -0.00469 |       0.00000 |       0.01554 |       0.00620 |       1.45077
     -0.00722 |       0.00000 |       0.01489 |       0.00697 |       1.44578
     -0.00636 |       0.00000 |       0.01500 |       0.00686 |       1.44915
     -0.00615 |       0.00000 |       0.01433 |       0.00724 |       1.44921
Evaluating losses...
     -0.00656 |       0.00000 |       0.01472 |       0.00650 |       

     -0.00338 |       0.00000 |       0.01414 |       0.00532 |       1.37746
     -0.00376 |       0.00000 |       0.01424 |       0.00607 |       1.38718
     -0.00444 |       0.00000 |       0.01390 |       0.00607 |       1.39156
Evaluating losses...
     -0.00225 |       0.00000 |       0.01407 |       0.00616 |       1.38875
-----------------------------------
| EpLenMean       | 630           |
| EpRewMean       | -4.78         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 478           |
| TimeElapsed     | 587           |
| TimestepsSoFar  | 294912        |
| ev_tdlam_before | 0.854         |
| loss_ent        | 1.388754      |
| loss_kl         | 0.006156838   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0022504632 |
| loss_vf_loss    | 0.014070863   |
-----------------------------------
********** Iteration 72 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00197 |       0.00000 |   

********** Iteration 77 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00032 |       0.00000 |       0.01765 |       0.00268 |       1.39418
      0.00047 |       0.00000 |       0.01673 |       0.00294 |       1.38078
     -0.00315 |       0.00000 |       0.01623 |       0.00370 |       1.37623
     -0.00164 |       0.00000 |       0.01594 |       0.00367 |       1.37803
     -0.00107 |       0.00000 |       0.01565 |       0.00387 |       1.37676
     -0.00288 |       0.00000 |       0.01511 |       0.00428 |       1.37871
     -0.00392 |       0.00000 |       0.01554 |       0.00462 |       1.36822
     -0.00278 |       0.00000 |       0.01522 |       0.00486 |       1.37212
     -0.00272 |       0.00000 |       0.01471 |       0.00493 |       1.36160
     -0.00458 |       0.00000 |       0.01481 |       0.00472 |       1.37100
Evaluating losses...
     -0.00446 |       0.00000 |       0.01470 |       0.00490 |       

     -0.00603 |       0.00000 |       0.01354 |       0.00631 |       1.18527
     -0.00709 |       0.00000 |       0.01305 |       0.00507 |       1.20467
     -0.00530 |       0.00000 |       0.01288 |       0.00621 |       1.19438
Evaluating losses...
     -0.00694 |       0.00000 |       0.01292 |       0.00573 |       1.19757
-----------------------------------
| EpLenMean       | 643           |
| EpRewMean       | -4.77         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 548           |
| TimeElapsed     | 680           |
| TimestepsSoFar  | 339968        |
| ev_tdlam_before | 0.862         |
| loss_ent        | 1.1975687     |
| loss_kl         | 0.0057330346  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0069440836 |
| loss_vf_loss    | 0.012918954   |
-----------------------------------
********** Iteration 83 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00040 |       0.00000 |   

********** Iteration 88 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00097 |       0.00000 |       0.01511 |       0.00239 |       1.14684
     -0.00017 |       0.00000 |       0.01367 |       0.00292 |       1.13845
     -0.00131 |       0.00000 |       0.01328 |       0.00300 |       1.13919
     -0.00236 |       0.00000 |       0.01297 |       0.00360 |       1.14133
     -0.00353 |       0.00000 |       0.01275 |       0.00409 |       1.13321
     -0.00111 |       0.00000 |       0.01255 |       0.00408 |       1.13442
     -0.00082 |       0.00000 |       0.01259 |       0.00444 |       1.12509
     -0.00200 |       0.00000 |       0.01216 |       0.00429 |       1.13409
     -0.00323 |       0.00000 |       0.01213 |       0.00471 |       1.13120
     -0.00193 |       0.00000 |       0.01221 |       0.00471 |       1.13465
Evaluating losses...
     -0.00470 |       0.00000 |       0.01202 |       0.00489 |       

     -0.00549 |       0.00000 |       0.01204 |       0.00494 |       1.11864
     -0.00521 |       0.00000 |       0.01206 |       0.00471 |       1.11914
Evaluating losses...
     -0.00626 |       0.00000 |       0.01126 |       0.00465 |       1.12281
----------------------------------
| EpLenMean       | 622          |
| EpRewMean       | -4.82        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 621          |
| TimeElapsed     | 778          |
| TimestepsSoFar  | 385024       |
| ev_tdlam_before | 0.855        |
| loss_ent        | 1.1228142    |
| loss_kl         | 0.0046473127 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.006261877 |
| loss_vf_loss    | 0.011255524  |
----------------------------------
********** Iteration 94 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00135 |       0.00000 |       0.01441 |       0.00304 |       1.10380
     -0.00033 |       0.00000 |       0.01370 | 

********** Iteration 99 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     4.47e-05 |       0.00000 |       0.01334 |       0.00192 |       1.01686
      0.00049 |       0.00000 |       0.01260 |       0.00235 |       1.01869
     -0.00077 |       0.00000 |       0.01259 |       0.00251 |       1.02553
     -0.00039 |       0.00000 |       0.01244 |       0.00295 |       1.02678
     -0.00194 |       0.00000 |       0.01219 |       0.00340 |       1.03526
     -0.00022 |       0.00000 |       0.01211 |       0.00332 |       1.02879
     -0.00196 |       0.00000 |       0.01216 |       0.00304 |       1.02028
     -0.00226 |       0.00000 |       0.01194 |       0.00397 |       1.02786
     -0.00276 |       0.00000 |       0.01159 |       0.00349 |       1.02169
     -0.00231 |       0.00000 |       0.01215 |       0.00388 |       1.02123
Evaluating losses...
     -0.00593 |       0.00000 |       0.01167 |       0.00388 |       

     -0.00249 |       0.00000 |       0.01048 |       0.00547 |       1.00462
     -0.00290 |       0.00000 |       0.00956 |       0.00498 |       1.01079
     -0.00318 |       0.00000 |       0.00992 |       0.00525 |       1.01472
Evaluating losses...
     -0.00379 |       0.00000 |       0.00988 |       0.00540 |       1.01387
-----------------------------------
| EpLenMean       | 610           |
| EpRewMean       | -4.92         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 695           |
| TimeElapsed     | 875           |
| TimestepsSoFar  | 430080        |
| ev_tdlam_before | 0.873         |
| loss_ent        | 1.0138727     |
| loss_kl         | 0.0053965235  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0037912484 |
| loss_vf_loss    | 0.009880482   |
-----------------------------------
********** Iteration 105 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00066 |       0.00000 |  

********** Iteration 110 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00137 |       0.00000 |       0.01438 |       0.00205 |       0.98606
     -0.00036 |       0.00000 |       0.01351 |       0.00223 |       0.97284
     -0.00063 |       0.00000 |       0.01340 |       0.00250 |       0.97810
     -0.00277 |       0.00000 |       0.01307 |       0.00246 |       0.97301
      0.00013 |       0.00000 |       0.01279 |       0.00291 |       0.97206
     -0.00135 |       0.00000 |       0.01278 |       0.00278 |       0.98144
     -0.00124 |       0.00000 |       0.01271 |       0.00322 |       0.98031
     -0.00173 |       0.00000 |       0.01246 |       0.00344 |       0.97429
     -0.00320 |       0.00000 |       0.01221 |       0.00355 |       0.98107
     -0.00488 |       0.00000 |       0.01226 |       0.00364 |       0.98323
Evaluating losses...
     -0.00407 |       0.00000 |       0.01213 |       0.00383 |      

     -0.00330 |       0.00000 |       0.01425 |       0.00394 |       0.91633
     -0.00314 |       0.00000 |       0.01401 |       0.00407 |       0.91153
     -0.00401 |       0.00000 |       0.01388 |       0.00423 |       0.90884
Evaluating losses...
     -0.00472 |       0.00000 |       0.01370 |       0.00438 |       0.91281
-----------------------------------
| EpLenMean       | 640           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 765           |
| TimeElapsed     | 967           |
| TimestepsSoFar  | 475136        |
| ev_tdlam_before | 0.866         |
| loss_ent        | 0.91281116    |
| loss_kl         | 0.0043798955  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0047186296 |
| loss_vf_loss    | 0.013695717   |
-----------------------------------
********** Iteration 116 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00089 |       0.00000 |  

********** Iteration 121 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00080 |       0.00000 |       0.01925 |       0.00195 |       0.84252
      0.00038 |       0.00000 |       0.01820 |       0.00221 |       0.84924
     -0.00050 |       0.00000 |       0.01764 |       0.00251 |       0.86114
     -0.00034 |       0.00000 |       0.01698 |       0.00265 |       0.85556
      0.00049 |       0.00000 |       0.01685 |       0.00262 |       0.85060
     -0.00010 |       0.00000 |       0.01601 |       0.00277 |       0.85377
     -0.00243 |       0.00000 |       0.01612 |       0.00292 |       0.85779
     -0.00137 |       0.00000 |       0.01569 |       0.00360 |       0.85341
     -0.00151 |       0.00000 |       0.01564 |       0.00307 |       0.84850
     -0.00251 |       0.00000 |       0.01526 |       0.00315 |       0.84063
Evaluating losses...
     -0.00273 |       0.00000 |       0.01509 |       0.00302 |      

     -0.00250 |       0.00000 |       0.01210 |       0.00299 |       0.78932
     -0.00273 |       0.00000 |       0.01220 |       0.00289 |       0.78571
     -0.00260 |       0.00000 |       0.01187 |       0.00295 |       0.77546
     -0.00330 |       0.00000 |       0.01177 |       0.00296 |       0.78016
Evaluating losses...
     -0.00332 |       0.00000 |       0.01160 |       0.00325 |       0.78312
-----------------------------------
| EpLenMean       | 619           |
| EpRewMean       | -4.83         |
| EpThisIter      | 8             |
| EpisodesSoFar   | 838           |
| TimeElapsed     | 1.08e+03      |
| TimestepsSoFar  | 520192        |
| ev_tdlam_before | 0.854         |
| loss_ent        | 0.78312355    |
| loss_kl         | 0.0032517735  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0033189491 |
| loss_vf_loss    | 0.011602477   |
-----------------------------------
********** Iteration 127 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 132 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00044 |       0.00000 |       0.01494 |       0.00181 |       0.77547
     8.35e-05 |       0.00000 |       0.01403 |       0.00223 |       0.75844
     -0.00147 |       0.00000 |       0.01347 |       0.00274 |       0.74924
     -0.00142 |       0.00000 |       0.01310 |       0.00274 |       0.74989
     -0.00149 |       0.00000 |       0.01296 |       0.00278 |       0.75137
     -0.00162 |       0.00000 |       0.01258 |       0.00310 |       0.74519
     -0.00250 |       0.00000 |       0.01253 |       0.00272 |       0.75581
     -0.00184 |       0.00000 |       0.01219 |       0.00311 |       0.75511
     -0.00250 |       0.00000 |       0.01211 |       0.00304 |       0.75927
     -0.00349 |       0.00000 |       0.01207 |       0.00311 |       0.75182
Evaluating losses...
     -0.00376 |       0.00000 |       0.01171 |       0.00330 |      

     -0.00107 |       0.00000 |       0.01123 |       0.00270 |       0.72372
     -0.00120 |       0.00000 |       0.01122 |       0.00279 |       0.72375
Evaluating losses...
     -0.00143 |       0.00000 |       0.01109 |       0.00266 |       0.72472
-----------------------------------
| EpLenMean       | 595           |
| EpRewMean       | -4.86         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 914           |
| TimeElapsed     | 1.17e+03      |
| TimestepsSoFar  | 565248        |
| ev_tdlam_before | 0.877         |
| loss_ent        | 0.724722      |
| loss_kl         | 0.002658327   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0014285623 |
| loss_vf_loss    | 0.0110899     |
-----------------------------------
********** Iteration 138 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00056 |       0.00000 |       0.01332 |       0.00174 |       0.72711
      0.00081 |       0.00000 |  

********** Iteration 143 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00105 |       0.00000 |       0.01242 |       0.00202 |       0.71496
      0.00015 |       0.00000 |       0.01164 |       0.00208 |       0.71646
     -0.00084 |       0.00000 |       0.01122 |       0.00229 |       0.70602
    -4.91e-05 |       0.00000 |       0.01057 |       0.00220 |       0.71395
     -0.00206 |       0.00000 |       0.01057 |       0.00239 |       0.71500
     -0.00094 |       0.00000 |       0.01031 |       0.00253 |       0.70484
     -0.00338 |       0.00000 |       0.01029 |       0.00261 |       0.70921
     -0.00093 |       0.00000 |       0.00990 |       0.00261 |       0.70901
     -0.00232 |       0.00000 |       0.00990 |       0.00275 |       0.71285
     -0.00134 |       0.00000 |       0.00991 |       0.00295 |       0.70209
Evaluating losses...
     -0.00401 |       0.00000 |       0.00959 |       0.00298 |      

     -0.00217 |       0.00000 |       0.01205 |       0.00344 |       0.66016
     -0.00257 |       0.00000 |       0.01236 |       0.00333 |       0.66603
     -0.00391 |       0.00000 |       0.01210 |       0.00314 |       0.67134
Evaluating losses...
     -0.00378 |       0.00000 |       0.01186 |       0.00316 |       0.67770
----------------------------------
| EpLenMean       | 622          |
| EpRewMean       | -4.9         |
| EpThisIter      | 8            |
| EpisodesSoFar   | 985          |
| TimeElapsed     | 1.26e+03     |
| TimestepsSoFar  | 610304       |
| ev_tdlam_before | 0.862        |
| loss_ent        | 0.6776973    |
| loss_kl         | 0.003155386  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003775988 |
| loss_vf_loss    | 0.011862356  |
----------------------------------
********** Iteration 149 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00036 |       0.00000 |       0.01349 |

********** Iteration 154 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00052 |       0.00000 |       0.01597 |       0.00197 |       0.61375
     -0.00080 |       0.00000 |       0.01509 |       0.00210 |       0.62185
      0.00017 |       0.00000 |       0.01439 |       0.00243 |       0.62838
     -0.00069 |       0.00000 |       0.01387 |       0.00241 |       0.62508
     -0.00066 |       0.00000 |       0.01361 |       0.00265 |       0.62281
     -0.00296 |       0.00000 |       0.01339 |       0.00275 |       0.61953
     -0.00316 |       0.00000 |       0.01318 |       0.00296 |       0.62090
     -0.00118 |       0.00000 |       0.01317 |       0.00283 |       0.61890
     -0.00186 |       0.00000 |       0.01270 |       0.00302 |       0.62250
     -0.00231 |       0.00000 |       0.01269 |       0.00301 |       0.62078
Evaluating losses...
     -0.00252 |       0.00000 |       0.01253 |       0.00302 |      

     -0.00196 |       0.00000 |       0.01175 |       0.00245 |       0.62130
     -0.00240 |       0.00000 |       0.01170 |       0.00244 |       0.63113
     -0.00214 |       0.00000 |       0.01157 |       0.00228 |       0.63455
Evaluating losses...
     -0.00309 |       0.00000 |       0.01137 |       0.00244 |       0.62682
----------------------------------
| EpLenMean       | 630          |
| EpRewMean       | -4.81        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 1056         |
| TimeElapsed     | 1.36e+03     |
| TimestepsSoFar  | 655360       |
| ev_tdlam_before | 0.873        |
| loss_ent        | 0.6268201    |
| loss_kl         | 0.0024428354 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.003089152 |
| loss_vf_loss    | 0.011366056  |
----------------------------------
********** Iteration 160 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00103 |       0.00000 |       0.01406 |

********** Iteration 165 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00062 |       0.00000 |       0.01703 |       0.00161 |       0.54865
     -0.00012 |       0.00000 |       0.01602 |       0.00185 |       0.55739
     -0.00080 |       0.00000 |       0.01538 |       0.00176 |       0.55077
     7.30e-05 |       0.00000 |       0.01526 |       0.00194 |       0.55464
     -0.00072 |       0.00000 |       0.01484 |       0.00232 |       0.56486
     -0.00246 |       0.00000 |       0.01452 |       0.00245 |       0.56258
     -0.00164 |       0.00000 |       0.01459 |       0.00266 |       0.56227
     -0.00256 |       0.00000 |       0.01445 |       0.00275 |       0.56891
     -0.00214 |       0.00000 |       0.01405 |       0.00285 |       0.56768
     -0.00170 |       0.00000 |       0.01391 |       0.00291 |       0.56825
Evaluating losses...
     -0.00329 |       0.00000 |       0.01375 |       0.00285 |      

     -0.00114 |       0.00000 |       0.01235 |       0.00230 |       0.52668
     -0.00185 |       0.00000 |       0.01208 |       0.00242 |       0.52247
     -0.00100 |       0.00000 |       0.01191 |       0.00265 |       0.53217
Evaluating losses...
     -0.00230 |       0.00000 |       0.01173 |       0.00247 |       0.53123
-----------------------------------
| EpLenMean       | 645           |
| EpRewMean       | -4.81         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1125          |
| TimeElapsed     | 1.44e+03      |
| TimestepsSoFar  | 700416        |
| ev_tdlam_before | 0.879         |
| loss_ent        | 0.5312348     |
| loss_kl         | 0.0024654737  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0022981353 |
| loss_vf_loss    | 0.011732877   |
-----------------------------------
********** Iteration 171 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00039 |       0.00000 |  

********** Iteration 176 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00068 |       0.00000 |       0.01622 |       0.00175 |       0.52570
     -0.00194 |       0.00000 |       0.01513 |       0.00255 |       0.53782
     -0.00069 |       0.00000 |       0.01480 |       0.00218 |       0.53391
     -0.00105 |       0.00000 |       0.01413 |       0.00215 |       0.53295
     -0.00035 |       0.00000 |       0.01395 |       0.00240 |       0.53607
     -0.00163 |       0.00000 |       0.01358 |       0.00238 |       0.53131
     -0.00249 |       0.00000 |       0.01358 |       0.00250 |       0.53188
     -0.00149 |       0.00000 |       0.01336 |       0.00244 |       0.52962
     -0.00262 |       0.00000 |       0.01313 |       0.00251 |       0.53360
     -0.00346 |       0.00000 |       0.01279 |       0.00257 |       0.52483
Evaluating losses...
     -0.00317 |       0.00000 |       0.01280 |       0.00238 |      

     -0.00106 |       0.00000 |       0.01342 |       0.00234 |       0.48149
     -0.00152 |       0.00000 |       0.01326 |       0.00234 |       0.48265
     -0.00316 |       0.00000 |       0.01274 |       0.00235 |       0.48535
Evaluating losses...
     -0.00254 |       0.00000 |       0.01268 |       0.00230 |       0.48029
----------------------------------
| EpLenMean       | 639          |
| EpRewMean       | -4.81        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 1197         |
| TimeElapsed     | 1.53e+03     |
| TimestepsSoFar  | 745472       |
| ev_tdlam_before | 0.869        |
| loss_ent        | 0.4802869    |
| loss_kl         | 0.002302913  |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002539461 |
| loss_vf_loss    | 0.012683244  |
----------------------------------
********** Iteration 182 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00044 |       0.00000 |       0.01660 |

********** Iteration 187 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00050 |       0.00000 |       0.01308 |       0.00121 |       0.47903
      0.00087 |       0.00000 |       0.01239 |       0.00162 |       0.48806
    -3.91e-05 |       0.00000 |       0.01192 |       0.00142 |       0.48465
     -0.00131 |       0.00000 |       0.01170 |       0.00184 |       0.49617
     -0.00048 |       0.00000 |       0.01146 |       0.00178 |       0.49441
     -0.00231 |       0.00000 |       0.01138 |       0.00189 |       0.49418
     -0.00171 |       0.00000 |       0.01113 |       0.00189 |       0.48964
     -0.00234 |       0.00000 |       0.01107 |       0.00203 |       0.48918
     -0.00318 |       0.00000 |       0.01103 |       0.00197 |       0.48828
     -0.00242 |       0.00000 |       0.01096 |       0.00230 |       0.48500
Evaluating losses...
     -0.00300 |       0.00000 |       0.01071 |       0.00213 |      

     -0.00180 |       0.00000 |       0.01364 |       0.00204 |       0.48463
     -0.00305 |       0.00000 |       0.01369 |       0.00215 |       0.48235
     -0.00225 |       0.00000 |       0.01337 |       0.00220 |       0.48809
Evaluating losses...
     -0.00313 |       0.00000 |       0.01348 |       0.00206 |       0.48518
-----------------------------------
| EpLenMean       | 651           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1265          |
| TimeElapsed     | 1.62e+03      |
| TimestepsSoFar  | 790528        |
| ev_tdlam_before | 0.846         |
| loss_ent        | 0.48517835    |
| loss_kl         | 0.0020607347  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0031281859 |
| loss_vf_loss    | 0.0134817995  |
-----------------------------------
********** Iteration 193 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00084 |       0.00000 |  

********** Iteration 198 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00115 |       0.00000 |       0.01362 |       0.00147 |       0.40607
     -0.00015 |       0.00000 |       0.01277 |       0.00170 |       0.41545
      0.00011 |       0.00000 |       0.01263 |       0.00168 |       0.41186
      0.00068 |       0.00000 |       0.01234 |       0.00215 |       0.40567
      0.00011 |       0.00000 |       0.01229 |       0.00215 |       0.41504
     -0.00094 |       0.00000 |       0.01190 |       0.00189 |       0.41741
     -0.00103 |       0.00000 |       0.01166 |       0.00210 |       0.41841
     -0.00080 |       0.00000 |       0.01168 |       0.00225 |       0.42135
     -0.00220 |       0.00000 |       0.01145 |       0.00213 |       0.41152
     -0.00038 |       0.00000 |       0.01155 |       0.00225 |       0.40989
Evaluating losses...
     -0.00249 |       0.00000 |       0.01129 |       0.00221 |      

     -0.00250 |       0.00000 |       0.01351 |       0.00228 |       0.40769
     -0.00282 |       0.00000 |       0.01317 |       0.00243 |       0.40745
     -0.00231 |       0.00000 |       0.01311 |       0.00257 |       0.40968
Evaluating losses...
     -0.00212 |       0.00000 |       0.01305 |       0.00251 |       0.41440
-----------------------------------
| EpLenMean       | 670           |
| EpRewMean       | -4.86         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1332          |
| TimeElapsed     | 1.7e+03       |
| TimestepsSoFar  | 835584        |
| ev_tdlam_before | 0.831         |
| loss_ent        | 0.41439882    |
| loss_kl         | 0.0025105807  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0021220949 |
| loss_vf_loss    | 0.013048327   |
-----------------------------------
********** Iteration 204 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |  

********** Iteration 209 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00033 |       0.00000 |       0.01745 |       0.00166 |       0.39825
      0.00093 |       0.00000 |       0.01588 |       0.00181 |       0.40412
     -0.00054 |       0.00000 |       0.01533 |       0.00164 |       0.39559
     -0.00013 |       0.00000 |       0.01475 |       0.00163 |       0.39741
     -0.00047 |       0.00000 |       0.01451 |       0.00175 |       0.39739
     -0.00209 |       0.00000 |       0.01413 |       0.00189 |       0.40107
     -0.00129 |       0.00000 |       0.01399 |       0.00201 |       0.39716
     -0.00139 |       0.00000 |       0.01382 |       0.00196 |       0.39433
     -0.00032 |       0.00000 |       0.01348 |       0.00208 |       0.38989
     -0.00223 |       0.00000 |       0.01322 |       0.00200 |       0.39281
Evaluating losses...
     -0.00196 |       0.00000 |       0.01305 |       0.00200 |      

     -0.00152 |       0.00000 |       0.01390 |       0.00242 |       0.33117
     -0.00279 |       0.00000 |       0.01362 |       0.00234 |       0.33089
     -0.00256 |       0.00000 |       0.01346 |       0.00228 |       0.32939
Evaluating losses...
     -0.00197 |       0.00000 |       0.01353 |       0.00264 |       0.32957
-----------------------------------
| EpLenMean       | 678           |
| EpRewMean       | -4.84         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1398          |
| TimeElapsed     | 1.78e+03      |
| TimestepsSoFar  | 880640        |
| ev_tdlam_before | 0.864         |
| loss_ent        | 0.32957184    |
| loss_kl         | 0.0026361542  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0019689943 |
| loss_vf_loss    | 0.013529919   |
-----------------------------------
********** Iteration 215 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00095 |       0.00000 |  

********** Iteration 220 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00125 |       0.00000 |       0.01845 |       0.00120 |       0.28024
     -0.00061 |       0.00000 |       0.01696 |       0.00133 |       0.27695
     -0.00123 |       0.00000 |       0.01645 |       0.00165 |       0.27712
     -0.00154 |       0.00000 |       0.01595 |       0.00153 |       0.27673
     -0.00167 |       0.00000 |       0.01538 |       0.00174 |       0.27314
     -0.00234 |       0.00000 |       0.01508 |       0.00166 |       0.27328
     -0.00226 |       0.00000 |       0.01457 |       0.00181 |       0.27364
     -0.00258 |       0.00000 |       0.01455 |       0.00188 |       0.27496
     -0.00243 |       0.00000 |       0.01411 |       0.00186 |       0.27652
     -0.00288 |       0.00000 |       0.01398 |       0.00191 |       0.27570
Evaluating losses...
     -0.00209 |       0.00000 |       0.01400 |       0.00183 |      

     -0.00181 |       0.00000 |       0.01062 |       0.00167 |       0.27145
     -0.00125 |       0.00000 |       0.01045 |       0.00169 |       0.27111
     -0.00162 |       0.00000 |       0.01020 |       0.00187 |       0.27068
Evaluating losses...
     -0.00123 |       0.00000 |       0.01032 |       0.00192 |       0.27169
-----------------------------------
| EpLenMean       | 669           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1466          |
| TimeElapsed     | 1.87e+03      |
| TimestepsSoFar  | 925696        |
| ev_tdlam_before | 0.889         |
| loss_ent        | 0.2716937     |
| loss_kl         | 0.0019176039  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012304444 |
| loss_vf_loss    | 0.010319784   |
-----------------------------------
********** Iteration 226 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00125 |       0.00000 |  

********** Iteration 231 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00075 |       0.00000 |       0.01876 |       0.00117 |       0.24849
     1.57e-05 |       0.00000 |       0.01791 |       0.00141 |       0.24690
     -0.00161 |       0.00000 |       0.01732 |       0.00151 |       0.24674
     -0.00193 |       0.00000 |       0.01690 |       0.00152 |       0.24263
     -0.00046 |       0.00000 |       0.01679 |       0.00180 |       0.24039
     -0.00187 |       0.00000 |       0.01671 |       0.00205 |       0.23890
     -0.00078 |       0.00000 |       0.01630 |       0.00230 |       0.23861
     -0.00226 |       0.00000 |       0.01629 |       0.00243 |       0.24000
     -0.00204 |       0.00000 |       0.01618 |       0.00230 |       0.23905
     -0.00261 |       0.00000 |       0.01565 |       0.00222 |       0.23956
Evaluating losses...
     -0.00197 |       0.00000 |       0.01574 |       0.00222 |      

     -0.00074 |       0.00000 |       0.01306 |       0.00158 |       0.25891
     -0.00108 |       0.00000 |       0.01280 |       0.00175 |       0.26180
     -0.00056 |       0.00000 |       0.01259 |       0.00176 |       0.26059
Evaluating losses...
     -0.00154 |       0.00000 |       0.01276 |       0.00163 |       0.25862
-----------------------------------
| EpLenMean       | 687           |
| EpRewMean       | -4.77         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1531          |
| TimeElapsed     | 1.95e+03      |
| TimestepsSoFar  | 970752        |
| ev_tdlam_before | 0.865         |
| loss_ent        | 0.258623      |
| loss_kl         | 0.0016303916  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0015438027 |
| loss_vf_loss    | 0.012757241   |
-----------------------------------
********** Iteration 237 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00090 |       0.00000 |  

********** Iteration 242 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00162 |       0.00000 |       0.01716 |       0.00135 |       0.25004
     -0.00061 |       0.00000 |       0.01606 |       0.00131 |       0.24710
      0.00060 |       0.00000 |       0.01573 |       0.00149 |       0.24665
     -0.00119 |       0.00000 |       0.01531 |       0.00165 |       0.24400
     -0.00062 |       0.00000 |       0.01507 |       0.00173 |       0.24412
     -0.00085 |       0.00000 |       0.01465 |       0.00203 |       0.24323
     -0.00075 |       0.00000 |       0.01449 |       0.00166 |       0.24647
      0.00021 |       0.00000 |       0.01414 |       0.00151 |       0.24808
     -0.00210 |       0.00000 |       0.01407 |       0.00179 |       0.24477
     -0.00168 |       0.00000 |       0.01419 |       0.00192 |       0.24281
Evaluating losses...
     -0.00238 |       0.00000 |       0.01376 |       0.00168 |      

     -0.00028 |       0.00000 |       0.01495 |       0.00145 |       0.25007
      0.00035 |       0.00000 |       0.01465 |       0.00143 |       0.24875
     -0.00166 |       0.00000 |       0.01458 |       0.00153 |       0.25070
     -0.00027 |       0.00000 |       0.01428 |       0.00161 |       0.25113
     -0.00280 |       0.00000 |       0.01411 |       0.00169 |       0.25009
Evaluating losses...
     -0.00112 |       0.00000 |       0.01391 |       0.00152 |       0.24942
-----------------------------------
| EpLenMean       | 654           |
| EpRewMean       | -4.85         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1601          |
| TimeElapsed     | 2.04e+03      |
| TimestepsSoFar  | 1015808       |
| ev_tdlam_before | 0.85          |
| loss_ent        | 0.24942113    |
| loss_kl         | 0.001520328   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011176245 |
| loss_vf_loss    | 0.013913788   |
-----------------------------------
*******

********** Iteration 253 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.01409 |       0.00132 |       0.27212
     -0.00016 |       0.00000 |       0.01333 |       0.00139 |       0.27271
      0.00056 |       0.00000 |       0.01275 |       0.00170 |       0.26981
     -0.00062 |       0.00000 |       0.01237 |       0.00171 |       0.26730
     -0.00011 |       0.00000 |       0.01224 |       0.00176 |       0.26935
     -0.00104 |       0.00000 |       0.01212 |       0.00163 |       0.26887
     -0.00177 |       0.00000 |       0.01205 |       0.00171 |       0.27051
     -0.00031 |       0.00000 |       0.01172 |       0.00178 |       0.27040
     -0.00036 |       0.00000 |       0.01187 |       0.00180 |       0.26951
    -5.02e-05 |       0.00000 |       0.01190 |       0.00211 |       0.26820
Evaluating losses...
     -0.00068 |       0.00000 |       0.01135 |       0.00179 |      

     -0.00033 |       0.00000 |       0.01469 |       0.00187 |       0.28189
     -0.00198 |       0.00000 |       0.01436 |       0.00198 |       0.28137
     -0.00101 |       0.00000 |       0.01440 |       0.00200 |       0.28135
Evaluating losses...
     -0.00262 |       0.00000 |       0.01393 |       0.00192 |       0.27989
----------------------------------
| EpLenMean       | 674          |
| EpRewMean       | -4.81        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 1667         |
| TimeElapsed     | 2.11e+03     |
| TimestepsSoFar  | 1060864      |
| ev_tdlam_before | 0.861        |
| loss_ent        | 0.27988824   |
| loss_kl         | 0.0019189971 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002622656 |
| loss_vf_loss    | 0.01392527   |
----------------------------------
********** Iteration 259 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00161 |       0.00000 |       0.01392 |

********** Iteration 264 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00032 |       0.00000 |       0.01642 |       0.00123 |       0.26997
     -0.00046 |       0.00000 |       0.01455 |       0.00135 |       0.27383
     -0.00086 |       0.00000 |       0.01366 |       0.00135 |       0.27436
     -0.00092 |       0.00000 |       0.01324 |       0.00144 |       0.27347
     -0.00053 |       0.00000 |       0.01295 |       0.00161 |       0.27466
     -0.00104 |       0.00000 |       0.01255 |       0.00158 |       0.27410
     -0.00035 |       0.00000 |       0.01249 |       0.00181 |       0.27611
     -0.00109 |       0.00000 |       0.01234 |       0.00198 |       0.27665
     -0.00091 |       0.00000 |       0.01226 |       0.00200 |       0.27759
     -0.00163 |       0.00000 |       0.01206 |       0.00197 |       0.27683
Evaluating losses...
     -0.00228 |       0.00000 |       0.01168 |       0.00193 |      

     -0.00060 |       0.00000 |       0.01135 |       0.00152 |       0.25623
     -0.00172 |       0.00000 |       0.01108 |       0.00158 |       0.25518
     -0.00114 |       0.00000 |       0.01087 |       0.00149 |       0.25424
Evaluating losses...
     -0.00165 |       0.00000 |       0.01082 |       0.00144 |       0.25501
-----------------------------------
| EpLenMean       | 676           |
| EpRewMean       | -4.76         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1735          |
| TimeElapsed     | 2.18e+03      |
| TimestepsSoFar  | 1105920       |
| ev_tdlam_before | 0.869         |
| loss_ent        | 0.2550118     |
| loss_kl         | 0.001444487   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0016451941 |
| loss_vf_loss    | 0.010824587   |
-----------------------------------
********** Iteration 270 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00041 |       0.00000 |  

********** Iteration 275 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00059 |       0.00000 |       0.01681 |       0.00126 |       0.22969
     -0.00065 |       0.00000 |       0.01582 |       0.00135 |       0.23092
     -0.00160 |       0.00000 |       0.01523 |       0.00132 |       0.23407
     -0.00119 |       0.00000 |       0.01477 |       0.00148 |       0.23240
     -0.00149 |       0.00000 |       0.01463 |       0.00132 |       0.23090
     -0.00124 |       0.00000 |       0.01395 |       0.00170 |       0.23084
     -0.00116 |       0.00000 |       0.01382 |       0.00145 |       0.23100
     -0.00093 |       0.00000 |       0.01359 |       0.00172 |       0.23140
     -0.00209 |       0.00000 |       0.01362 |       0.00204 |       0.23125
     -0.00086 |       0.00000 |       0.01328 |       0.00165 |       0.22908
Evaluating losses...
     -0.00260 |       0.00000 |       0.01309 |       0.00179 |      

      0.00035 |       0.00000 |       0.01335 |       0.00150 |       0.25567
     -0.00078 |       0.00000 |       0.01332 |       0.00161 |       0.25577
      0.00064 |       0.00000 |       0.01291 |       0.00161 |       0.25651
Evaluating losses...
     -0.00092 |       0.00000 |       0.01288 |       0.00156 |       0.25566
-----------------------------------
| EpLenMean       | 669           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1802          |
| TimeElapsed     | 2.25e+03      |
| TimestepsSoFar  | 1150976       |
| ev_tdlam_before | 0.857         |
| loss_ent        | 0.25566226    |
| loss_kl         | 0.0015593364  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009220876 |
| loss_vf_loss    | 0.012875417   |
-----------------------------------
********** Iteration 281 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00100 |       0.00000 |  

********** Iteration 286 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00110 |       0.00000 |       0.01260 |       0.00119 |       0.25511
      0.00139 |       0.00000 |       0.01173 |       0.00137 |       0.25374
      0.00067 |       0.00000 |       0.01140 |       0.00160 |       0.25439
     -0.00058 |       0.00000 |       0.01100 |       0.00155 |       0.25362
      0.00063 |       0.00000 |       0.01069 |       0.00139 |       0.25420
      0.00048 |       0.00000 |       0.01044 |       0.00144 |       0.25271
     -0.00149 |       0.00000 |       0.01029 |       0.00149 |       0.25190
     -0.00045 |       0.00000 |       0.01042 |       0.00150 |       0.25505
     -0.00136 |       0.00000 |       0.01015 |       0.00167 |       0.25456
     -0.00188 |       0.00000 |       0.00993 |       0.00160 |       0.25475
Evaluating losses...
     -0.00145 |       0.00000 |       0.00996 |       0.00152 |      

     -0.00194 |       0.00000 |       0.01433 |       0.00142 |       0.20357
     -0.00208 |       0.00000 |       0.01389 |       0.00149 |       0.20426
     -0.00127 |       0.00000 |       0.01367 |       0.00152 |       0.20227
Evaluating losses...
     -0.00185 |       0.00000 |       0.01352 |       0.00148 |       0.20262
-----------------------------------
| EpLenMean       | 690           |
| EpRewMean       | -4.83         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 1869          |
| TimeElapsed     | 2.34e+03      |
| TimestepsSoFar  | 1196032       |
| ev_tdlam_before | 0.861         |
| loss_ent        | 0.20261891    |
| loss_kl         | 0.0014844706  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0018473028 |
| loss_vf_loss    | 0.013517784   |
-----------------------------------
********** Iteration 292 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00193 |       0.00000 |  

********** Iteration 297 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00088 |       0.00000 |       0.01252 |       0.00085 |       0.19818
      0.00055 |       0.00000 |       0.01172 |       0.00100 |       0.19566
      0.00044 |       0.00000 |       0.01150 |       0.00115 |       0.19492
      0.00026 |       0.00000 |       0.01125 |       0.00120 |       0.19545
     -0.00041 |       0.00000 |       0.01104 |       0.00119 |       0.19572
      0.00026 |       0.00000 |       0.01070 |       0.00118 |       0.19606
     -0.00139 |       0.00000 |       0.01064 |       0.00120 |       0.19590
     -0.00102 |       0.00000 |       0.01051 |       0.00112 |       0.19749
     -0.00090 |       0.00000 |       0.01051 |       0.00124 |       0.19533
      0.00013 |       0.00000 |       0.01047 |       0.00114 |       0.19473
Evaluating losses...
     -0.00065 |       0.00000 |       0.01044 |       0.00125 |      

     -0.00245 |       0.00000 |       0.01154 |       0.00215 |       0.19676
     -0.00252 |       0.00000 |       0.01137 |       0.00214 |       0.19624
     -0.00346 |       0.00000 |       0.01118 |       0.00206 |       0.19741
Evaluating losses...
     -0.00348 |       0.00000 |       0.01118 |       0.00201 |       0.19736
-----------------------------------
| EpLenMean       | 668           |
| EpRewMean       | -4.72         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 1934          |
| TimeElapsed     | 2.42e+03      |
| TimestepsSoFar  | 1241088       |
| ev_tdlam_before | 0.872         |
| loss_ent        | 0.1973588     |
| loss_kl         | 0.002013463   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0034786002 |
| loss_vf_loss    | 0.011177938   |
-----------------------------------
********** Iteration 303 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00116 |       0.00000 |  

********** Iteration 308 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00022 |       0.00000 |       0.01228 |       0.00101 |       0.18030
     -0.00011 |       0.00000 |       0.01149 |       0.00105 |       0.18205
     -0.00080 |       0.00000 |       0.01116 |       0.00112 |       0.18025
     -0.00082 |       0.00000 |       0.01093 |       0.00118 |       0.18136
      0.00053 |       0.00000 |       0.01082 |       0.00133 |       0.18033
     -0.00173 |       0.00000 |       0.01039 |       0.00140 |       0.18246
     -0.00116 |       0.00000 |       0.01033 |       0.00139 |       0.18080
     -0.00141 |       0.00000 |       0.01023 |       0.00138 |       0.17870
     -0.00076 |       0.00000 |       0.01018 |       0.00137 |       0.17839
     -0.00049 |       0.00000 |       0.00988 |       0.00141 |       0.17994
Evaluating losses...
     -0.00231 |       0.00000 |       0.00977 |       0.00139 |      

     -0.00032 |       0.00000 |       0.01086 |       0.00111 |       0.18275
     9.15e-05 |       0.00000 |       0.01063 |       0.00099 |       0.18228
      0.00037 |       0.00000 |       0.01058 |       0.00114 |       0.18254
Evaluating losses...
     -0.00114 |       0.00000 |       0.01040 |       0.00126 |       0.18284
-----------------------------------
| EpLenMean       | 664           |
| EpRewMean       | -4.85         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2002          |
| TimeElapsed     | 2.52e+03      |
| TimestepsSoFar  | 1286144       |
| ev_tdlam_before | 0.867         |
| loss_ent        | 0.18284376    |
| loss_kl         | 0.0012599423  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011374073 |
| loss_vf_loss    | 0.010395747   |
-----------------------------------
********** Iteration 314 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00037 |       0.00000 |  

********** Iteration 319 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |       0.01224 |       0.00082 |       0.17435
     -0.00136 |       0.00000 |       0.01174 |       0.00084 |       0.17502
     -0.00111 |       0.00000 |       0.01129 |       0.00095 |       0.17601
     -0.00027 |       0.00000 |       0.01086 |       0.00097 |       0.17393
     -0.00107 |       0.00000 |       0.01077 |       0.00095 |       0.17470
     -0.00075 |       0.00000 |       0.01071 |       0.00100 |       0.17528
     -0.00020 |       0.00000 |       0.01057 |       0.00113 |       0.17555
     -0.00151 |       0.00000 |       0.01040 |       0.00121 |       0.17679
     -0.00122 |       0.00000 |       0.01026 |       0.00121 |       0.17826
     -0.00141 |       0.00000 |       0.01022 |       0.00124 |       0.17584
Evaluating losses...
     -0.00169 |       0.00000 |       0.01021 |       0.00127 |      

     -0.00109 |       0.00000 |       0.01129 |       0.00137 |       0.20417
     -0.00046 |       0.00000 |       0.01103 |       0.00133 |       0.20317
     -0.00098 |       0.00000 |       0.01099 |       0.00140 |       0.20443
Evaluating losses...
     -0.00179 |       0.00000 |       0.01078 |       0.00150 |       0.20395
----------------------------------
| EpLenMean       | 677          |
| EpRewMean       | -4.83        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2068         |
| TimeElapsed     | 2.61e+03     |
| TimestepsSoFar  | 1331200      |
| ev_tdlam_before | 0.887        |
| loss_ent        | 0.2039518    |
| loss_kl         | 0.0014967871 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.001786385 |
| loss_vf_loss    | 0.010775999  |
----------------------------------
********** Iteration 325 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00205 |       0.00000 |       0.01412 |

********** Iteration 330 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00098 |       0.00000 |       0.01420 |       0.00102 |       0.21112
      0.00052 |       0.00000 |       0.01319 |       0.00117 |       0.21299
     -0.00086 |       0.00000 |       0.01278 |       0.00107 |       0.21365
     -0.00127 |       0.00000 |       0.01262 |       0.00111 |       0.21206
     4.91e-05 |       0.00000 |       0.01235 |       0.00105 |       0.21248
     -0.00191 |       0.00000 |       0.01212 |       0.00120 |       0.21156
     -0.00058 |       0.00000 |       0.01215 |       0.00126 |       0.21217
     -0.00121 |       0.00000 |       0.01201 |       0.00128 |       0.21312
     -0.00086 |       0.00000 |       0.01193 |       0.00140 |       0.21065
     -0.00193 |       0.00000 |       0.01179 |       0.00121 |       0.20922
Evaluating losses...
     -0.00177 |       0.00000 |       0.01163 |       0.00128 |      

     -0.00099 |       0.00000 |       0.01277 |       0.00116 |       0.18360
     -0.00070 |       0.00000 |       0.01256 |       0.00113 |       0.18256
     -0.00128 |       0.00000 |       0.01237 |       0.00125 |       0.18312
Evaluating losses...
     -0.00266 |       0.00000 |       0.01227 |       0.00115 |       0.18351
----------------------------------
| EpLenMean       | 719          |
| EpRewMean       | -4.72        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2131         |
| TimeElapsed     | 2.69e+03     |
| TimestepsSoFar  | 1376256      |
| ev_tdlam_before | 0.879        |
| loss_ent        | 0.18351309   |
| loss_kl         | 0.0011473558 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.00266114  |
| loss_vf_loss    | 0.01226513   |
----------------------------------
********** Iteration 336 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.01795 |

********** Iteration 341 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00019 |       0.00000 |       0.02016 |       0.00096 |       0.18136
     4.99e-05 |       0.00000 |       0.01844 |       0.00091 |       0.17836
     -0.00102 |       0.00000 |       0.01738 |       0.00096 |       0.18034
     -0.00022 |       0.00000 |       0.01671 |       0.00101 |       0.18033
     -0.00154 |       0.00000 |       0.01665 |       0.00104 |       0.17951
      0.00012 |       0.00000 |       0.01603 |       0.00107 |       0.17926
     -0.00173 |       0.00000 |       0.01581 |       0.00104 |       0.18078
     -0.00144 |       0.00000 |       0.01560 |       0.00121 |       0.18054
     -0.00151 |       0.00000 |       0.01564 |       0.00128 |       0.17928
     -0.00232 |       0.00000 |       0.01516 |       0.00128 |       0.18064
Evaluating losses...
     -0.00193 |       0.00000 |       0.01513 |       0.00124 |      

     -0.00159 |       0.00000 |       0.01343 |       0.00254 |       0.21264
     -0.00181 |       0.00000 |       0.01299 |       0.00201 |       0.21152
     -0.00236 |       0.00000 |       0.01319 |       0.00238 |       0.21019
Evaluating losses...
     -0.00251 |       0.00000 |       0.01290 |       0.00214 |       0.21026
-----------------------------------
| EpLenMean       | 687           |
| EpRewMean       | -4.74         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2197          |
| TimeElapsed     | 2.79e+03      |
| TimestepsSoFar  | 1421312       |
| ev_tdlam_before | 0.848         |
| loss_ent        | 0.21025963    |
| loss_kl         | 0.002136074   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0025125248 |
| loss_vf_loss    | 0.012899701   |
-----------------------------------
********** Iteration 347 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00078 |       0.00000 |  

********** Iteration 352 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00174 |       0.00000 |       0.01449 |       0.00113 |       0.17762
     -0.00067 |       0.00000 |       0.01296 |       0.00121 |       0.17561
    -8.85e-05 |       0.00000 |       0.01242 |       0.00132 |       0.17413
     -0.00114 |       0.00000 |       0.01208 |       0.00176 |       0.17336
     -0.00114 |       0.00000 |       0.01182 |       0.00145 |       0.17448
     -0.00038 |       0.00000 |       0.01170 |       0.00158 |       0.17382
     -0.00180 |       0.00000 |       0.01143 |       0.00152 |       0.17441
     -0.00251 |       0.00000 |       0.01128 |       0.00151 |       0.17512
     -0.00165 |       0.00000 |       0.01101 |       0.00167 |       0.17440
     -0.00164 |       0.00000 |       0.01090 |       0.00149 |       0.17584
Evaluating losses...
     -0.00214 |       0.00000 |       0.01077 |       0.00156 |      

     -0.00077 |       0.00000 |       0.01207 |       0.00116 |       0.16775
     -0.00088 |       0.00000 |       0.01207 |       0.00140 |       0.16719
     -0.00197 |       0.00000 |       0.01199 |       0.00120 |       0.16724
Evaluating losses...
     -0.00070 |       0.00000 |       0.01152 |       0.00125 |       0.16684
-----------------------------------
| EpLenMean       | 681           |
| EpRewMean       | -4.79         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 2262          |
| TimeElapsed     | 2.87e+03      |
| TimestepsSoFar  | 1466368       |
| ev_tdlam_before | 0.865         |
| loss_ent        | 0.16684417    |
| loss_kl         | 0.0012470173  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0006978087 |
| loss_vf_loss    | 0.011520282   |
-----------------------------------
********** Iteration 358 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00048 |       0.00000 |  

********** Iteration 363 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00189 |       0.00000 |       0.01930 |       0.00080 |       0.19402
    -7.04e-05 |       0.00000 |       0.01784 |       0.00092 |       0.19289
      0.00046 |       0.00000 |       0.01691 |       0.00092 |       0.19362
     -0.00048 |       0.00000 |       0.01614 |       0.00107 |       0.19354
     -0.00085 |       0.00000 |       0.01557 |       0.00112 |       0.19480
     -0.00136 |       0.00000 |       0.01538 |       0.00118 |       0.19531
     -0.00151 |       0.00000 |       0.01509 |       0.00135 |       0.19335
     -0.00018 |       0.00000 |       0.01491 |       0.00133 |       0.19395
     -0.00149 |       0.00000 |       0.01471 |       0.00137 |       0.19453
     -0.00074 |       0.00000 |       0.01433 |       0.00165 |       0.19357
Evaluating losses...
     -0.00141 |       0.00000 |       0.01407 |       0.00136 |      

     -0.00162 |       0.00000 |       0.01218 |       0.00117 |       0.19294
     -0.00033 |       0.00000 |       0.01181 |       0.00142 |       0.19241
     -0.00202 |       0.00000 |       0.01192 |       0.00144 |       0.19266
     -0.00250 |       0.00000 |       0.01180 |       0.00150 |       0.19426
Evaluating losses...
     -0.00183 |       0.00000 |       0.01143 |       0.00137 |       0.19461
-----------------------------------
| EpLenMean       | 684           |
| EpRewMean       | -4.8          |
| EpThisIter      | 5             |
| EpisodesSoFar   | 2329          |
| TimeElapsed     | 2.96e+03      |
| TimestepsSoFar  | 1511424       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 0.19460517    |
| loss_kl         | 0.0013715562  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0018320079 |
| loss_vf_loss    | 0.0114274835  |
-----------------------------------
********** Iteration 369 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 374 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     9.63e-05 |       0.00000 |       0.01286 |       0.00080 |       0.17422
     -0.00081 |       0.00000 |       0.01219 |       0.00108 |       0.17857
     -0.00043 |       0.00000 |       0.01190 |       0.00096 |       0.17699
     -0.00036 |       0.00000 |       0.01166 |       0.00115 |       0.17776
     -0.00057 |       0.00000 |       0.01148 |       0.00106 |       0.17719
     -0.00168 |       0.00000 |       0.01119 |       0.00112 |       0.17629
     -0.00150 |       0.00000 |       0.01121 |       0.00110 |       0.17714
     -0.00189 |       0.00000 |       0.01119 |       0.00124 |       0.17686
     -0.00084 |       0.00000 |       0.01086 |       0.00114 |       0.17600
     -0.00062 |       0.00000 |       0.01094 |       0.00136 |       0.17887
Evaluating losses...
     -0.00250 |       0.00000 |       0.01072 |       0.00146 |      

     -0.00150 |       0.00000 |       0.01143 |       0.00125 |       0.18075
     -0.00142 |       0.00000 |       0.01121 |       0.00124 |       0.18107
     -0.00202 |       0.00000 |       0.01128 |       0.00146 |       0.18112
Evaluating losses...
     -0.00200 |       0.00000 |       0.01111 |       0.00152 |       0.17941
----------------------------------
| EpLenMean       | 700          |
| EpRewMean       | -4.8         |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2393         |
| TimeElapsed     | 3.05e+03     |
| TimestepsSoFar  | 1556480      |
| ev_tdlam_before | 0.872        |
| loss_ent        | 0.17941384   |
| loss_kl         | 0.0015208054 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.002000825 |
| loss_vf_loss    | 0.011109149  |
----------------------------------
********** Iteration 380 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00105 |       0.00000 |       0.01823 |

********** Iteration 385 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00158 |       0.00000 |       0.01323 |       0.00088 |       0.18243
      0.00067 |       0.00000 |       0.01245 |       0.00097 |       0.18093
      0.00067 |       0.00000 |       0.01213 |       0.00097 |       0.17939
      0.00027 |       0.00000 |       0.01178 |       0.00087 |       0.18049
     -0.00131 |       0.00000 |       0.01145 |       0.00093 |       0.17948
     -0.00012 |       0.00000 |       0.01126 |       0.00100 |       0.17896
     -0.00096 |       0.00000 |       0.01120 |       0.00105 |       0.17790
     -0.00106 |       0.00000 |       0.01110 |       0.00100 |       0.17831
     -0.00050 |       0.00000 |       0.01099 |       0.00118 |       0.17627
     -0.00095 |       0.00000 |       0.01074 |       0.00120 |       0.17782
Evaluating losses...
     -0.00132 |       0.00000 |       0.01071 |       0.00118 |      

      0.00113 |       0.00000 |       0.01169 |       0.00102 |       0.18387
     -0.00034 |       0.00000 |       0.01153 |       0.00095 |       0.18322
     -0.00021 |       0.00000 |       0.01145 |       0.00110 |       0.18181
Evaluating losses...
     -0.00079 |       0.00000 |       0.01124 |       0.00117 |       0.18206
-----------------------------------
| EpLenMean       | 666           |
| EpRewMean       | -4.89         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2461          |
| TimeElapsed     | 3.14e+03      |
| TimestepsSoFar  | 1601536       |
| ev_tdlam_before | 0.886         |
| loss_ent        | 0.18205819    |
| loss_kl         | 0.0011722365  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0007889954 |
| loss_vf_loss    | 0.011242621   |
-----------------------------------
********** Iteration 391 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00087 |       0.00000 |  

********** Iteration 396 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00037 |       0.00000 |       0.01320 |       0.00068 |       0.17262
     -0.00054 |       0.00000 |       0.01249 |       0.00090 |       0.17475
     3.62e-06 |       0.00000 |       0.01215 |       0.00111 |       0.17841
     -0.00119 |       0.00000 |       0.01197 |       0.00114 |       0.17878
     -0.00028 |       0.00000 |       0.01171 |       0.00123 |       0.17844
     -0.00097 |       0.00000 |       0.01167 |       0.00134 |       0.17855
     -0.00049 |       0.00000 |       0.01138 |       0.00142 |       0.17904
     -0.00146 |       0.00000 |       0.01128 |       0.00125 |       0.17850
     -0.00089 |       0.00000 |       0.01135 |       0.00155 |       0.18014
     -0.00131 |       0.00000 |       0.01117 |       0.00157 |       0.18001
Evaluating losses...
     -0.00170 |       0.00000 |       0.01104 |       0.00159 |      

     -0.00148 |       0.00000 |       0.01296 |       0.00133 |       0.15688
     -0.00255 |       0.00000 |       0.01267 |       0.00127 |       0.15644
     -0.00186 |       0.00000 |       0.01267 |       0.00128 |       0.15785
Evaluating losses...
     -0.00201 |       0.00000 |       0.01223 |       0.00142 |       0.15691
-----------------------------------
| EpLenMean       | 645           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2530          |
| TimeElapsed     | 3.21e+03      |
| TimestepsSoFar  | 1646592       |
| ev_tdlam_before | 0.87          |
| loss_ent        | 0.15691333    |
| loss_kl         | 0.0014156642  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0020111192 |
| loss_vf_loss    | 0.012234911   |
-----------------------------------
********** Iteration 402 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00034 |       0.00000 |  

********** Iteration 407 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00106 |       0.00000 |       0.01294 |       0.00087 |       0.17144
     -0.00055 |       0.00000 |       0.01177 |       0.00088 |       0.17194
     -0.00163 |       0.00000 |       0.01124 |       0.00097 |       0.17203
     -0.00031 |       0.00000 |       0.01091 |       0.00104 |       0.17221
     -0.00208 |       0.00000 |       0.01054 |       0.00128 |       0.17227
     -0.00109 |       0.00000 |       0.01049 |       0.00138 |       0.17369
     -0.00207 |       0.00000 |       0.01026 |       0.00137 |       0.17388
     -0.00266 |       0.00000 |       0.01007 |       0.00151 |       0.17231
     -0.00219 |       0.00000 |       0.00997 |       0.00150 |       0.17288
     -0.00137 |       0.00000 |       0.00987 |       0.00173 |       0.17152
Evaluating losses...
     -0.00340 |       0.00000 |       0.00974 |       0.00185 |      

     -0.00208 |       0.00000 |       0.01195 |       0.00094 |       0.14266
     -0.00068 |       0.00000 |       0.01165 |       0.00095 |       0.14252
     -0.00180 |       0.00000 |       0.01132 |       0.00105 |       0.14239
Evaluating losses...
     -0.00142 |       0.00000 |       0.01138 |       0.00100 |       0.14209
-----------------------------------
| EpLenMean       | 683           |
| EpRewMean       | -4.79         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 2595          |
| TimeElapsed     | 3.28e+03      |
| TimestepsSoFar  | 1691648       |
| ev_tdlam_before | 0.883         |
| loss_ent        | 0.14208874    |
| loss_kl         | 0.000998962   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0014188441 |
| loss_vf_loss    | 0.011380831   |
-----------------------------------
********** Iteration 413 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00033 |       0.00000 |  

********** Iteration 418 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00015 |       0.00000 |       0.01622 |       0.00065 |       0.15416
     -0.00010 |       0.00000 |       0.01537 |       0.00073 |       0.15379
     6.70e-05 |       0.00000 |       0.01480 |       0.00067 |       0.15324
    -1.44e-06 |       0.00000 |       0.01451 |       0.00071 |       0.15231
     -0.00010 |       0.00000 |       0.01421 |       0.00082 |       0.15171
     -0.00092 |       0.00000 |       0.01390 |       0.00077 |       0.15250
     -0.00064 |       0.00000 |       0.01363 |       0.00073 |       0.15288
     -0.00084 |       0.00000 |       0.01352 |       0.00079 |       0.15288
     -0.00090 |       0.00000 |       0.01316 |       0.00077 |       0.15196
     -0.00108 |       0.00000 |       0.01323 |       0.00084 |       0.15136
Evaluating losses...
     -0.00135 |       0.00000 |       0.01300 |       0.00080 |      

     -0.00087 |       0.00000 |       0.01230 |       0.00101 |       0.16383
     -0.00166 |       0.00000 |       0.01216 |       0.00101 |       0.16358
     -0.00136 |       0.00000 |       0.01217 |       0.00108 |       0.16455
Evaluating losses...
     -0.00112 |       0.00000 |       0.01199 |       0.00129 |       0.16324
-----------------------------------
| EpLenMean       | 688           |
| EpRewMean       | -4.78         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2660          |
| TimeElapsed     | 3.34e+03      |
| TimestepsSoFar  | 1736704       |
| ev_tdlam_before | 0.887         |
| loss_ent        | 0.16324426    |
| loss_kl         | 0.0012868166  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011231797 |
| loss_vf_loss    | 0.011986441   |
-----------------------------------
********** Iteration 424 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00076 |       0.00000 |  

********** Iteration 429 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00192 |       0.00000 |       0.01811 |       0.00080 |       0.16782
      0.00067 |       0.00000 |       0.01656 |       0.00090 |       0.16887
      0.00037 |       0.00000 |       0.01563 |       0.00092 |       0.16778
     -0.00024 |       0.00000 |       0.01515 |       0.00091 |       0.16846
     -0.00088 |       0.00000 |       0.01473 |       0.00097 |       0.16934
     -0.00036 |       0.00000 |       0.01438 |       0.00110 |       0.16852
     -0.00143 |       0.00000 |       0.01387 |       0.00118 |       0.16846
     -0.00150 |       0.00000 |       0.01365 |       0.00112 |       0.16911
     -0.00159 |       0.00000 |       0.01360 |       0.00120 |       0.16856
     -0.00237 |       0.00000 |       0.01321 |       0.00116 |       0.16977
Evaluating losses...
     -0.00243 |       0.00000 |       0.01315 |       0.00117 |      

     -0.00068 |       0.00000 |       0.01154 |       0.00105 |       0.16982
     -0.00106 |       0.00000 |       0.01138 |       0.00110 |       0.16968
     -0.00081 |       0.00000 |       0.01122 |       0.00115 |       0.16973
Evaluating losses...
     -0.00090 |       0.00000 |       0.01101 |       0.00109 |       0.16958
-----------------------------------
| EpLenMean       | 683           |
| EpRewMean       | -4.92         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2725          |
| TimeElapsed     | 3.41e+03      |
| TimestepsSoFar  | 1781760       |
| ev_tdlam_before | 0.886         |
| loss_ent        | 0.16958325    |
| loss_kl         | 0.001090423   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009029943 |
| loss_vf_loss    | 0.011013535   |
-----------------------------------
********** Iteration 435 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00071 |       0.00000 |  

********** Iteration 440 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00052 |       0.00000 |       0.01371 |       0.00072 |       0.16667
     -0.00033 |       0.00000 |       0.01243 |       0.00073 |       0.16681
    -8.65e-05 |       0.00000 |       0.01175 |       0.00086 |       0.16627
     -0.00066 |       0.00000 |       0.01126 |       0.00080 |       0.16598
     -0.00088 |       0.00000 |       0.01091 |       0.00085 |       0.16561
     -0.00093 |       0.00000 |       0.01081 |       0.00081 |       0.16522
     -0.00108 |       0.00000 |       0.01061 |       0.00091 |       0.16580
     -0.00140 |       0.00000 |       0.01040 |       0.00096 |       0.16578
     -0.00014 |       0.00000 |       0.01024 |       0.00091 |       0.16537
     -0.00138 |       0.00000 |       0.01036 |       0.00109 |       0.16605
Evaluating losses...
     -0.00190 |       0.00000 |       0.00995 |       0.00102 |      

     -0.00204 |       0.00000 |       0.01555 |       0.00102 |       0.14400
     -0.00267 |       0.00000 |       0.01544 |       0.00105 |       0.14424
     -0.00281 |       0.00000 |       0.01526 |       0.00139 |       0.14386
Evaluating losses...
     -0.00219 |       0.00000 |       0.01484 |       0.00109 |       0.14401
-----------------------------------
| EpLenMean       | 673           |
| EpRewMean       | -4.82         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 2792          |
| TimeElapsed     | 3.47e+03      |
| TimestepsSoFar  | 1826816       |
| ev_tdlam_before | 0.865         |
| loss_ent        | 0.14401078    |
| loss_kl         | 0.0010875792  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0021942393 |
| loss_vf_loss    | 0.014839614   |
-----------------------------------
********** Iteration 446 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00077 |       0.00000 |  

********** Iteration 451 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00029 |       0.00000 |       0.01638 |       0.00068 |       0.17239
      0.00044 |       0.00000 |       0.01503 |       0.00070 |       0.17199
    -4.07e-05 |       0.00000 |       0.01444 |       0.00077 |       0.17266
    -3.59e-07 |       0.00000 |       0.01415 |       0.00082 |       0.17341
    -6.33e-05 |       0.00000 |       0.01360 |       0.00083 |       0.17332
     4.97e-05 |       0.00000 |       0.01357 |       0.00085 |       0.17397
     -0.00073 |       0.00000 |       0.01338 |       0.00096 |       0.17466
     3.50e-05 |       0.00000 |       0.01307 |       0.00094 |       0.17441
     -0.00095 |       0.00000 |       0.01292 |       0.00096 |       0.17476
     -0.00190 |       0.00000 |       0.01284 |       0.00106 |       0.17578
Evaluating losses...
     -0.00167 |       0.00000 |       0.01256 |       0.00103 |      

     -0.00199 |       0.00000 |       0.01175 |       0.00113 |       0.15337
     -0.00216 |       0.00000 |       0.01159 |       0.00113 |       0.15343
     -0.00101 |       0.00000 |       0.01145 |       0.00136 |       0.15371
Evaluating losses...
     -0.00229 |       0.00000 |       0.01124 |       0.00145 |       0.15388
----------------------------------
| EpLenMean       | 697          |
| EpRewMean       | -4.77        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 2856         |
| TimeElapsed     | 3.53e+03     |
| TimestepsSoFar  | 1871872      |
| ev_tdlam_before | 0.888        |
| loss_ent        | 0.15388018   |
| loss_kl         | 0.0014518917 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | -0.00229031  |
| loss_vf_loss    | 0.011241007  |
----------------------------------
********** Iteration 457 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00098 |       0.00000 |       0.01777 |

********** Iteration 462 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00041 |       0.00000 |       0.01349 |       0.00062 |       0.16537
      0.00020 |       0.00000 |       0.01266 |       0.00076 |       0.16521
      0.00072 |       0.00000 |       0.01219 |       0.00079 |       0.16593
     -0.00078 |       0.00000 |       0.01207 |       0.00094 |       0.16663
     -0.00093 |       0.00000 |       0.01177 |       0.00106 |       0.16659
     -0.00043 |       0.00000 |       0.01153 |       0.00086 |       0.16721
     8.81e-05 |       0.00000 |       0.01125 |       0.00098 |       0.16742
     -0.00069 |       0.00000 |       0.01135 |       0.00106 |       0.16845
     -0.00067 |       0.00000 |       0.01098 |       0.00093 |       0.16817
     -0.00110 |       0.00000 |       0.01120 |       0.00101 |       0.16732
Evaluating losses...
     -0.00133 |       0.00000 |       0.01099 |       0.00110 |      

     -0.00042 |       0.00000 |       0.01142 |       0.00074 |       0.14525
     -0.00094 |       0.00000 |       0.01123 |       0.00083 |       0.14666
     -0.00127 |       0.00000 |       0.01113 |       0.00090 |       0.14626
Evaluating losses...
     -0.00085 |       0.00000 |       0.01114 |       0.00097 |       0.14643
-----------------------------------
| EpLenMean       | 663           |
| EpRewMean       | -4.82         |
| EpThisIter      | 8             |
| EpisodesSoFar   | 2925          |
| TimeElapsed     | 3.59e+03      |
| TimestepsSoFar  | 1916928       |
| ev_tdlam_before | 0.877         |
| loss_ent        | 0.14643341    |
| loss_kl         | 0.0009694664  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0008487826 |
| loss_vf_loss    | 0.011138834   |
-----------------------------------
********** Iteration 468 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00036 |       0.00000 |  

********** Iteration 473 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00053 |       0.00000 |       0.01303 |       0.00054 |       0.15517
      0.00014 |       0.00000 |       0.01172 |       0.00058 |       0.15333
      0.00022 |       0.00000 |       0.01131 |       0.00059 |       0.15419
     -0.00026 |       0.00000 |       0.01103 |       0.00078 |       0.15181
     -0.00056 |       0.00000 |       0.01071 |       0.00076 |       0.15225
     -0.00042 |       0.00000 |       0.01066 |       0.00072 |       0.15327
     -0.00064 |       0.00000 |       0.01061 |       0.00074 |       0.15316
     -0.00159 |       0.00000 |       0.01044 |       0.00074 |       0.15421
     -0.00101 |       0.00000 |       0.01040 |       0.00077 |       0.15380
     -0.00151 |       0.00000 |       0.01032 |       0.00073 |       0.15330
Evaluating losses...
     -0.00150 |       0.00000 |       0.01010 |       0.00071 |      

     -0.00127 |       0.00000 |       0.01041 |       0.00095 |       0.14624
     -0.00018 |       0.00000 |       0.01014 |       0.00092 |       0.14585
     -0.00114 |       0.00000 |       0.01004 |       0.00094 |       0.14647
Evaluating losses...
     -0.00151 |       0.00000 |       0.00997 |       0.00101 |       0.14650
-----------------------------------
| EpLenMean       | 666           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 2992          |
| TimeElapsed     | 3.65e+03      |
| TimestepsSoFar  | 1961984       |
| ev_tdlam_before | 0.889         |
| loss_ent        | 0.14650242    |
| loss_kl         | 0.0010123343  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0015130453 |
| loss_vf_loss    | 0.009969694   |
-----------------------------------
********** Iteration 479 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00028 |       0.00000 |  

********** Iteration 484 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00032 |       0.00000 |       0.01327 |       0.00052 |       0.13859
      0.00024 |       0.00000 |       0.01246 |       0.00055 |       0.14058
     -0.00105 |       0.00000 |       0.01207 |       0.00080 |       0.14159
    -3.36e-05 |       0.00000 |       0.01175 |       0.00081 |       0.14137
     -0.00147 |       0.00000 |       0.01156 |       0.00090 |       0.14161
     -0.00073 |       0.00000 |       0.01114 |       0.00080 |       0.14176
     -0.00077 |       0.00000 |       0.01115 |       0.00086 |       0.14191
     -0.00172 |       0.00000 |       0.01084 |       0.00092 |       0.14185
     -0.00163 |       0.00000 |       0.01077 |       0.00104 |       0.14215
     -0.00151 |       0.00000 |       0.01075 |       0.00099 |       0.14197
Evaluating losses...
     -0.00204 |       0.00000 |       0.01060 |       0.00104 |      

     -0.00016 |       0.00000 |       0.00893 |       0.00072 |       0.15718
     -0.00077 |       0.00000 |       0.00896 |       0.00088 |       0.15562
     -0.00054 |       0.00000 |       0.00883 |       0.00090 |       0.15574
     -0.00035 |       0.00000 |       0.00873 |       0.00092 |       0.15600
Evaluating losses...
     -0.00139 |       0.00000 |       0.00863 |       0.00096 |       0.15545
-----------------------------------
| EpLenMean       | 696           |
| EpRewMean       | -4.76         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3056          |
| TimeElapsed     | 3.72e+03      |
| TimestepsSoFar  | 2007040       |
| ev_tdlam_before | 0.897         |
| loss_ent        | 0.15545279    |
| loss_kl         | 0.00096269033 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0013886241 |
| loss_vf_loss    | 0.008626948   |
-----------------------------------
********** Iteration 490 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 495 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     6.77e-05 |       0.00000 |       0.01718 |       0.00060 |       0.15963
     -0.00014 |       0.00000 |       0.01604 |       0.00068 |       0.16014
      0.00049 |       0.00000 |       0.01548 |       0.00080 |       0.16114
     -0.00013 |       0.00000 |       0.01512 |       0.00074 |       0.16034
     -0.00087 |       0.00000 |       0.01475 |       0.00092 |       0.16087
     -0.00121 |       0.00000 |       0.01444 |       0.00085 |       0.16020
     -0.00034 |       0.00000 |       0.01439 |       0.00094 |       0.16112
     -0.00076 |       0.00000 |       0.01419 |       0.00096 |       0.16086
     -0.00181 |       0.00000 |       0.01392 |       0.00124 |       0.16218
     -0.00091 |       0.00000 |       0.01391 |       0.00122 |       0.16220
Evaluating losses...
     -0.00147 |       0.00000 |       0.01362 |       0.00120 |      

     -0.00105 |       0.00000 |       0.01307 |       0.00074 |       0.12500
     -0.00218 |       0.00000 |       0.01287 |       0.00080 |       0.12506
     -0.00205 |       0.00000 |       0.01281 |       0.00078 |       0.12595
Evaluating losses...
     -0.00168 |       0.00000 |       0.01268 |       0.00078 |       0.12639
-----------------------------------
| EpLenMean       | 682           |
| EpRewMean       | -4.74         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3122          |
| TimeElapsed     | 3.8e+03       |
| TimestepsSoFar  | 2052096       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 0.12638569    |
| loss_kl         | 0.00078039244 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.00168196   |
| loss_vf_loss    | 0.012675851   |
-----------------------------------
********** Iteration 501 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00069 |       0.00000 |  

********** Iteration 506 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00056 |       0.00000 |       0.01474 |       0.00061 |       0.15729
      0.00053 |       0.00000 |       0.01324 |       0.00055 |       0.15565
     -0.00016 |       0.00000 |       0.01253 |       0.00061 |       0.15484
     -0.00099 |       0.00000 |       0.01193 |       0.00064 |       0.15544
     -0.00038 |       0.00000 |       0.01161 |       0.00066 |       0.15443
     -0.00126 |       0.00000 |       0.01147 |       0.00083 |       0.15343
     -0.00098 |       0.00000 |       0.01125 |       0.00084 |       0.15403
     -0.00127 |       0.00000 |       0.01118 |       0.00088 |       0.15426
     -0.00133 |       0.00000 |       0.01086 |       0.00090 |       0.15436
     -0.00124 |       0.00000 |       0.01078 |       0.00093 |       0.15480
Evaluating losses...
     -0.00201 |       0.00000 |       0.01055 |       0.00105 |      

     -0.00012 |       0.00000 |       0.01271 |       0.00067 |       0.14033
     -0.00049 |       0.00000 |       0.01254 |       0.00073 |       0.14050
     -0.00027 |       0.00000 |       0.01226 |       0.00067 |       0.14063
     -0.00087 |       0.00000 |       0.01219 |       0.00067 |       0.14025
Evaluating losses...
     -0.00092 |       0.00000 |       0.01192 |       0.00077 |       0.14055
-----------------------------------
| EpLenMean       | 659           |
| EpRewMean       | -4.82         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3190          |
| TimeElapsed     | 3.87e+03      |
| TimestepsSoFar  | 2097152       |
| ev_tdlam_before | 0.879         |
| loss_ent        | 0.14055064    |
| loss_kl         | 0.0007663062  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009244969 |
| loss_vf_loss    | 0.011924324   |
-----------------------------------
********** Iteration 512 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 517 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00069 |       0.00000 |       0.01654 |       0.00046 |       0.12463
     -0.00061 |       0.00000 |       0.01545 |       0.00044 |       0.12320
     -0.00102 |       0.00000 |       0.01513 |       0.00047 |       0.12305
     -0.00057 |       0.00000 |       0.01478 |       0.00056 |       0.12275
     -0.00080 |       0.00000 |       0.01442 |       0.00057 |       0.12250
     -0.00067 |       0.00000 |       0.01428 |       0.00060 |       0.12216
     -0.00113 |       0.00000 |       0.01404 |       0.00069 |       0.12219
     -0.00193 |       0.00000 |       0.01385 |       0.00077 |       0.12245
     -0.00140 |       0.00000 |       0.01365 |       0.00077 |       0.12161
     -0.00045 |       0.00000 |       0.01368 |       0.00091 |       0.12174
Evaluating losses...
     -0.00147 |       0.00000 |       0.01348 |       0.00083 |      

     -0.00087 |       0.00000 |       0.01150 |       0.00074 |       0.14280
     -0.00108 |       0.00000 |       0.01139 |       0.00072 |       0.14287
     -0.00060 |       0.00000 |       0.01129 |       0.00076 |       0.14259
Evaluating losses...
     -0.00099 |       0.00000 |       0.01125 |       0.00075 |       0.14266
-----------------------------------
| EpLenMean       | 669           |
| EpRewMean       | -4.82         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3257          |
| TimeElapsed     | 3.93e+03      |
| TimestepsSoFar  | 2142208       |
| ev_tdlam_before | 0.884         |
| loss_ent        | 0.14266166    |
| loss_kl         | 0.0007491651  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009890466 |
| loss_vf_loss    | 0.011245452   |
-----------------------------------
********** Iteration 523 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00081 |       0.00000 |  

********** Iteration 528 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00104 |       0.00000 |       0.01468 |       0.00058 |       0.13298
     3.91e-05 |       0.00000 |       0.01410 |       0.00057 |       0.13232
     -0.00098 |       0.00000 |       0.01369 |       0.00058 |       0.13184
     -0.00055 |       0.00000 |       0.01337 |       0.00070 |       0.13107
     4.09e-05 |       0.00000 |       0.01314 |       0.00058 |       0.13114
     -0.00064 |       0.00000 |       0.01303 |       0.00074 |       0.13321
     -0.00100 |       0.00000 |       0.01293 |       0.00086 |       0.13063
     -0.00117 |       0.00000 |       0.01266 |       0.00083 |       0.13084
     -0.00142 |       0.00000 |       0.01280 |       0.00083 |       0.13215
     -0.00139 |       0.00000 |       0.01264 |       0.00096 |       0.13191
Evaluating losses...
     -0.00178 |       0.00000 |       0.01242 |       0.00096 |      

     -0.00053 |       0.00000 |       0.01221 |       0.00079 |       0.15352
     -0.00094 |       0.00000 |       0.01210 |       0.00083 |       0.15314
     -0.00146 |       0.00000 |       0.01191 |       0.00080 |       0.15452
Evaluating losses...
     -0.00205 |       0.00000 |       0.01190 |       0.00088 |       0.15535
-----------------------------------
| EpLenMean       | 675           |
| EpRewMean       | -4.75         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3323          |
| TimeElapsed     | 4e+03         |
| TimestepsSoFar  | 2187264       |
| ev_tdlam_before | 0.865         |
| loss_ent        | 0.15534534    |
| loss_kl         | 0.0008806244  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0020506962 |
| loss_vf_loss    | 0.011899397   |
-----------------------------------
********** Iteration 534 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00046 |       0.00000 |  

********** Iteration 539 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     6.09e-05 |       0.00000 |       0.01368 |       0.00055 |       0.14006
      0.00082 |       0.00000 |       0.01264 |       0.00062 |       0.13981
     -0.00049 |       0.00000 |       0.01212 |       0.00073 |       0.13971
     -0.00017 |       0.00000 |       0.01211 |       0.00070 |       0.14022
     -0.00016 |       0.00000 |       0.01181 |       0.00074 |       0.14023
     -0.00043 |       0.00000 |       0.01163 |       0.00078 |       0.13971
     -0.00021 |       0.00000 |       0.01140 |       0.00077 |       0.14060
     -0.00120 |       0.00000 |       0.01134 |       0.00081 |       0.14057
     -0.00091 |       0.00000 |       0.01111 |       0.00074 |       0.14126
     -0.00136 |       0.00000 |       0.01097 |       0.00073 |       0.14107
Evaluating losses...
     -0.00154 |       0.00000 |       0.01092 |       0.00074 |      

     -0.00086 |       0.00000 |       0.01159 |       0.00071 |       0.14226
     -0.00111 |       0.00000 |       0.01152 |       0.00073 |       0.14239
     -0.00109 |       0.00000 |       0.01150 |       0.00082 |       0.14354
Evaluating losses...
     -0.00147 |       0.00000 |       0.01119 |       0.00090 |       0.14312
-----------------------------------
| EpLenMean       | 694           |
| EpRewMean       | -4.73         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3388          |
| TimeElapsed     | 4.09e+03      |
| TimestepsSoFar  | 2232320       |
| ev_tdlam_before | 0.89          |
| loss_ent        | 0.14311509    |
| loss_kl         | 0.0009019322  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0014706485 |
| loss_vf_loss    | 0.011192437   |
-----------------------------------
********** Iteration 545 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00090 |       0.00000 |  

********** Iteration 550 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00054 |       0.00000 |       0.01489 |       0.00041 |       0.12642
     -0.00023 |       0.00000 |       0.01406 |       0.00052 |       0.12480
      0.00012 |       0.00000 |       0.01359 |       0.00052 |       0.12379
     -0.00053 |       0.00000 |       0.01305 |       0.00058 |       0.12342
     -0.00034 |       0.00000 |       0.01270 |       0.00065 |       0.12323
     -0.00042 |       0.00000 |       0.01256 |       0.00065 |       0.12391
     -0.00041 |       0.00000 |       0.01229 |       0.00062 |       0.12398
     -0.00098 |       0.00000 |       0.01195 |       0.00065 |       0.12388
     -0.00149 |       0.00000 |       0.01177 |       0.00062 |       0.12465
     -0.00125 |       0.00000 |       0.01160 |       0.00068 |       0.12367
Evaluating losses...
     -0.00103 |       0.00000 |       0.01133 |       0.00072 |      

     8.33e-05 |       0.00000 |       0.01164 |       0.00071 |       0.13225
     -0.00034 |       0.00000 |       0.01151 |       0.00063 |       0.13346
     -0.00068 |       0.00000 |       0.01122 |       0.00059 |       0.13421
Evaluating losses...
     -0.00074 |       0.00000 |       0.01119 |       0.00060 |       0.13427
------------------------------------
| EpLenMean       | 665            |
| EpRewMean       | -4.78          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 3456           |
| TimeElapsed     | 4.18e+03       |
| TimestepsSoFar  | 2277376        |
| ev_tdlam_before | 0.875          |
| loss_ent        | 0.13427046     |
| loss_kl         | 0.000602075    |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00074174313 |
| loss_vf_loss    | 0.011192157    |
------------------------------------
********** Iteration 556 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     7.79e-05 |    

********** Iteration 561 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00030 |       0.00000 |       0.01473 |       0.00045 |       0.12675
      0.00078 |       0.00000 |       0.01280 |       0.00044 |       0.12620
      0.00032 |       0.00000 |       0.01212 |       0.00055 |       0.12624
     -0.00043 |       0.00000 |       0.01183 |       0.00052 |       0.12632
     -0.00022 |       0.00000 |       0.01138 |       0.00064 |       0.12525
     -0.00022 |       0.00000 |       0.01124 |       0.00053 |       0.12495
     -0.00043 |       0.00000 |       0.01114 |       0.00059 |       0.12531
     -0.00100 |       0.00000 |       0.01102 |       0.00057 |       0.12440
     -0.00114 |       0.00000 |       0.01085 |       0.00055 |       0.12478
     -0.00099 |       0.00000 |       0.01083 |       0.00054 |       0.12431
Evaluating losses...
     -0.00124 |       0.00000 |       0.01067 |       0.00055 |      

     -0.00076 |       0.00000 |       0.01333 |       0.00083 |       0.13176
     -0.00149 |       0.00000 |       0.01301 |       0.00079 |       0.13149
     -0.00108 |       0.00000 |       0.01289 |       0.00088 |       0.13092
Evaluating losses...
     -0.00138 |       0.00000 |       0.01276 |       0.00096 |       0.13150
-----------------------------------
| EpLenMean       | 658           |
| EpRewMean       | -4.86         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 3524          |
| TimeElapsed     | 4.27e+03      |
| TimestepsSoFar  | 2322432       |
| ev_tdlam_before | 0.876         |
| loss_ent        | 0.131496      |
| loss_kl         | 0.0009607383  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0013763419 |
| loss_vf_loss    | 0.012755769   |
-----------------------------------
********** Iteration 567 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00044 |       0.00000 |  

********** Iteration 572 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00050 |       0.00000 |       0.01648 |       0.00051 |       0.12154
     -0.00033 |       0.00000 |       0.01512 |       0.00061 |       0.12096
     -0.00013 |       0.00000 |       0.01447 |       0.00063 |       0.12155
      0.00040 |       0.00000 |       0.01395 |       0.00073 |       0.12028
     -0.00064 |       0.00000 |       0.01375 |       0.00073 |       0.12014
     9.84e-05 |       0.00000 |       0.01361 |       0.00087 |       0.12030
     -0.00063 |       0.00000 |       0.01324 |       0.00067 |       0.12082
     -0.00026 |       0.00000 |       0.01315 |       0.00077 |       0.12136
     -0.00046 |       0.00000 |       0.01294 |       0.00086 |       0.12061
     -0.00088 |       0.00000 |       0.01304 |       0.00095 |       0.12087
Evaluating losses...
     -0.00116 |       0.00000 |       0.01277 |       0.00080 |      

     -0.00023 |       0.00000 |       0.01156 |       0.00059 |       0.11902
     -0.00037 |       0.00000 |       0.01151 |       0.00071 |       0.11979
     -0.00090 |       0.00000 |       0.01136 |       0.00064 |       0.11929
Evaluating losses...
     -0.00109 |       0.00000 |       0.01122 |       0.00072 |       0.11942
-----------------------------------
| EpLenMean       | 692           |
| EpRewMean       | -4.79         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3589          |
| TimeElapsed     | 4.36e+03      |
| TimestepsSoFar  | 2367488       |
| ev_tdlam_before | 0.88          |
| loss_ent        | 0.11942049    |
| loss_kl         | 0.00072094135 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0010866947 |
| loss_vf_loss    | 0.011215593   |
-----------------------------------
********** Iteration 578 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00084 |       0.00000 |  

********** Iteration 583 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |       0.01160 |       0.00043 |       0.12256
     -0.00011 |       0.00000 |       0.01107 |       0.00043 |       0.12363
     -0.00035 |       0.00000 |       0.01054 |       0.00051 |       0.12386
      0.00010 |       0.00000 |       0.01019 |       0.00051 |       0.12441
     -0.00023 |       0.00000 |       0.01012 |       0.00063 |       0.12554
     -0.00092 |       0.00000 |       0.00997 |       0.00056 |       0.12531
     -0.00053 |       0.00000 |       0.00979 |       0.00055 |       0.12500
     -0.00021 |       0.00000 |       0.00957 |       0.00064 |       0.12559
     -0.00044 |       0.00000 |       0.00950 |       0.00066 |       0.12530
     -0.00117 |       0.00000 |       0.00947 |       0.00068 |       0.12590
Evaluating losses...
     -0.00133 |       0.00000 |       0.00941 |       0.00067 |      

     -0.00112 |       0.00000 |       0.01096 |       0.00064 |       0.12786
     -0.00142 |       0.00000 |       0.01077 |       0.00053 |       0.12817
     -0.00122 |       0.00000 |       0.01073 |       0.00056 |       0.12848
Evaluating losses...
     -0.00115 |       0.00000 |       0.01045 |       0.00060 |       0.12841
-----------------------------------
| EpLenMean       | 685           |
| EpRewMean       | -4.84         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3655          |
| TimeElapsed     | 4.44e+03      |
| TimestepsSoFar  | 2412544       |
| ev_tdlam_before | 0.883         |
| loss_ent        | 0.12840688    |
| loss_kl         | 0.00059675425 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011547403 |
| loss_vf_loss    | 0.010446272   |
-----------------------------------
********** Iteration 589 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     6.13e-05 |       0.00000 |  

********** Iteration 594 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00058 |       0.00000 |       0.01929 |       0.00052 |       0.13495
      0.00032 |       0.00000 |       0.01712 |       0.00059 |       0.13489
     -0.00036 |       0.00000 |       0.01676 |       0.00061 |       0.13533
     -0.00034 |       0.00000 |       0.01627 |       0.00073 |       0.13512
     -0.00056 |       0.00000 |       0.01589 |       0.00068 |       0.13396
     -0.00114 |       0.00000 |       0.01585 |       0.00072 |       0.13405
     -0.00076 |       0.00000 |       0.01547 |       0.00079 |       0.13439
     -0.00161 |       0.00000 |       0.01537 |       0.00076 |       0.13397
     -0.00079 |       0.00000 |       0.01528 |       0.00080 |       0.13372
     -0.00026 |       0.00000 |       0.01516 |       0.00075 |       0.13372
Evaluating losses...
     -0.00132 |       0.00000 |       0.01489 |       0.00090 |      

      0.00053 |       0.00000 |       0.01206 |       0.00057 |       0.12740
     -0.00050 |       0.00000 |       0.01193 |       0.00061 |       0.12683
     6.64e-05 |       0.00000 |       0.01177 |       0.00063 |       0.12700
Evaluating losses...
     -0.00077 |       0.00000 |       0.01164 |       0.00065 |       0.12591
-----------------------------------
| EpLenMean       | 668           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 3724          |
| TimeElapsed     | 4.52e+03      |
| TimestepsSoFar  | 2457600       |
| ev_tdlam_before | 0.876         |
| loss_ent        | 0.12590767    |
| loss_kl         | 0.0006491497  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0007662971 |
| loss_vf_loss    | 0.011635625   |
-----------------------------------
********** Iteration 600 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00051 |       0.00000 |  

********** Iteration 605 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     2.02e-05 |       0.00000 |       0.01533 |       0.00044 |       0.12924
     -0.00043 |       0.00000 |       0.01399 |       0.00062 |       0.12847
     -0.00026 |       0.00000 |       0.01351 |       0.00060 |       0.12882
     -0.00043 |       0.00000 |       0.01308 |       0.00053 |       0.12942
     -0.00066 |       0.00000 |       0.01251 |       0.00072 |       0.12959
     -0.00052 |       0.00000 |       0.01242 |       0.00065 |       0.12950
     -0.00091 |       0.00000 |       0.01221 |       0.00066 |       0.12924
     -0.00055 |       0.00000 |       0.01189 |       0.00075 |       0.12952
     -0.00121 |       0.00000 |       0.01172 |       0.00094 |       0.12940
     -0.00109 |       0.00000 |       0.01171 |       0.00080 |       0.13018
Evaluating losses...
     -0.00112 |       0.00000 |       0.01164 |       0.00083 |      

     -0.00064 |       0.00000 |       0.01506 |       0.00051 |       0.11838
     -0.00073 |       0.00000 |       0.01488 |       0.00052 |       0.11843
     -0.00072 |       0.00000 |       0.01479 |       0.00060 |       0.11759
     -0.00076 |       0.00000 |       0.01446 |       0.00059 |       0.11761
Evaluating losses...
     -0.00036 |       0.00000 |       0.01408 |       0.00057 |       0.11818
-----------------------------------
| EpLenMean       | 679           |
| EpRewMean       | -4.86         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 3790          |
| TimeElapsed     | 4.62e+03      |
| TimestepsSoFar  | 2502656       |
| ev_tdlam_before | 0.882         |
| loss_ent        | 0.11818278    |
| loss_kl         | 0.0005732741  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0003602706 |
| loss_vf_loss    | 0.014077596   |
-----------------------------------
********** Iteration 611 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 616 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00030 |       0.00000 |       0.01555 |       0.00044 |       0.12615
      0.00044 |       0.00000 |       0.01474 |       0.00042 |       0.12563
    -6.88e-05 |       0.00000 |       0.01403 |       0.00044 |       0.12572
      0.00105 |       0.00000 |       0.01400 |       0.00047 |       0.12589
     -0.00017 |       0.00000 |       0.01357 |       0.00051 |       0.12535
     -0.00040 |       0.00000 |       0.01344 |       0.00047 |       0.12590
     3.45e-05 |       0.00000 |       0.01315 |       0.00056 |       0.12566
     -0.00089 |       0.00000 |       0.01314 |       0.00054 |       0.12485
     -0.00027 |       0.00000 |       0.01286 |       0.00062 |       0.12517
     -0.00027 |       0.00000 |       0.01295 |       0.00064 |       0.12600
Evaluating losses...
     -0.00091 |       0.00000 |       0.01260 |       0.00061 |      

     -0.00058 |       0.00000 |       0.00928 |       0.00057 |       0.10368
     -0.00067 |       0.00000 |       0.00928 |       0.00055 |       0.10418
     -0.00047 |       0.00000 |       0.00909 |       0.00061 |       0.10423
Evaluating losses...
     -0.00092 |       0.00000 |       0.00911 |       0.00057 |       0.10428
------------------------------------
| EpLenMean       | 659            |
| EpRewMean       | -4.8           |
| EpThisIter      | 7              |
| EpisodesSoFar   | 3860           |
| TimeElapsed     | 4.71e+03       |
| TimestepsSoFar  | 2547712        |
| ev_tdlam_before | 0.889          |
| loss_ent        | 0.104279324    |
| loss_kl         | 0.0005667298   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00091602956 |
| loss_vf_loss    | 0.009109519    |
------------------------------------
********** Iteration 622 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00014 |    

********** Iteration 627 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00176 |       0.00000 |       0.01995 |       0.00046 |       0.13400
      0.00039 |       0.00000 |       0.01834 |       0.00048 |       0.13456
     -0.00033 |       0.00000 |       0.01752 |       0.00052 |       0.13487
      0.00010 |       0.00000 |       0.01730 |       0.00066 |       0.13538
    -2.84e-05 |       0.00000 |       0.01683 |       0.00066 |       0.13490
     -0.00106 |       0.00000 |       0.01654 |       0.00066 |       0.13354
     -0.00061 |       0.00000 |       0.01635 |       0.00069 |       0.13597
     -0.00087 |       0.00000 |       0.01614 |       0.00069 |       0.13615
      0.00017 |       0.00000 |       0.01561 |       0.00070 |       0.13526
     -0.00147 |       0.00000 |       0.01584 |       0.00067 |       0.13565
Evaluating losses...
     -0.00121 |       0.00000 |       0.01548 |       0.00074 |      

     -0.00051 |       0.00000 |       0.01023 |       0.00051 |       0.11800
     -0.00023 |       0.00000 |       0.01017 |       0.00054 |       0.11852
     -0.00038 |       0.00000 |       0.01013 |       0.00056 |       0.11883
Evaluating losses...
     -0.00098 |       0.00000 |       0.00996 |       0.00057 |       0.11789
-----------------------------------
| EpLenMean       | 686           |
| EpRewMean       | -4.76         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3924          |
| TimeElapsed     | 4.78e+03      |
| TimestepsSoFar  | 2592768       |
| ev_tdlam_before | 0.896         |
| loss_ent        | 0.11789429    |
| loss_kl         | 0.0005720066  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009762759 |
| loss_vf_loss    | 0.009964071   |
-----------------------------------
********** Iteration 633 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00054 |       0.00000 |  

********** Iteration 638 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -3.09e-05 |       0.00000 |       0.01524 |       0.00040 |       0.11643
    -1.99e-06 |       0.00000 |       0.01411 |       0.00057 |       0.11593
     -0.00020 |       0.00000 |       0.01355 |       0.00051 |       0.11632
     -0.00043 |       0.00000 |       0.01324 |       0.00056 |       0.11606
     -0.00021 |       0.00000 |       0.01286 |       0.00064 |       0.11561
     -0.00068 |       0.00000 |       0.01280 |       0.00063 |       0.11551
     8.84e-06 |       0.00000 |       0.01257 |       0.00064 |       0.11568
     -0.00061 |       0.00000 |       0.01243 |       0.00073 |       0.11572
     -0.00043 |       0.00000 |       0.01212 |       0.00068 |       0.11656
     -0.00049 |       0.00000 |       0.01208 |       0.00062 |       0.11691
Evaluating losses...
     -0.00069 |       0.00000 |       0.01189 |       0.00068 |      

     -0.00099 |       0.00000 |       0.01290 |       0.00051 |       0.09928
     -0.00070 |       0.00000 |       0.01299 |       0.00048 |       0.09948
     -0.00104 |       0.00000 |       0.01271 |       0.00051 |       0.09965
Evaluating losses...
     -0.00099 |       0.00000 |       0.01266 |       0.00050 |       0.09974
-----------------------------------
| EpLenMean       | 667           |
| EpRewMean       | -4.79         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 3993          |
| TimeElapsed     | 4.87e+03      |
| TimestepsSoFar  | 2637824       |
| ev_tdlam_before | 0.869         |
| loss_ent        | 0.09974465    |
| loss_kl         | 0.0004999247  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0009920092 |
| loss_vf_loss    | 0.012659931   |
-----------------------------------
********** Iteration 644 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00101 |       0.00000 |  

********** Iteration 649 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00146 |       0.00000 |       0.01574 |       0.00035 |       0.10319
     8.99e-05 |       0.00000 |       0.01474 |       0.00036 |       0.10440
     -0.00019 |       0.00000 |       0.01433 |       0.00043 |       0.10532
      0.00014 |       0.00000 |       0.01384 |       0.00048 |       0.10565
     -0.00036 |       0.00000 |       0.01351 |       0.00051 |       0.10550
     -0.00020 |       0.00000 |       0.01330 |       0.00049 |       0.10551
    -7.60e-05 |       0.00000 |       0.01313 |       0.00045 |       0.10542
     -0.00022 |       0.00000 |       0.01307 |       0.00051 |       0.10482
     -0.00098 |       0.00000 |       0.01278 |       0.00045 |       0.10597
     -0.00077 |       0.00000 |       0.01268 |       0.00049 |       0.10550
Evaluating losses...
     -0.00104 |       0.00000 |       0.01252 |       0.00050 |      

      0.00026 |       0.00000 |       0.01256 |       0.00058 |       0.12146
     -0.00033 |       0.00000 |       0.01249 |       0.00060 |       0.12126
     -0.00083 |       0.00000 |       0.01224 |       0.00061 |       0.12147
Evaluating losses...
     -0.00081 |       0.00000 |       0.01211 |       0.00066 |       0.12201
-----------------------------------
| EpLenMean       | 671           |
| EpRewMean       | -4.75         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4059          |
| TimeElapsed     | 4.96e+03      |
| TimestepsSoFar  | 2682880       |
| ev_tdlam_before | 0.869         |
| loss_ent        | 0.12200642    |
| loss_kl         | 0.00066015864 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0008082669 |
| loss_vf_loss    | 0.012106168   |
-----------------------------------
********** Iteration 655 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00081 |       0.00000 |  

********** Iteration 660 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00015 |       0.00000 |       0.01228 |       0.00038 |       0.11138
     -0.00025 |       0.00000 |       0.01175 |       0.00047 |       0.11112
      0.00062 |       0.00000 |       0.01159 |       0.00041 |       0.11146
      0.00040 |       0.00000 |       0.01133 |       0.00048 |       0.11032
     5.50e-05 |       0.00000 |       0.01130 |       0.00044 |       0.11128
     -0.00048 |       0.00000 |       0.01109 |       0.00044 |       0.11195
     -0.00027 |       0.00000 |       0.01110 |       0.00043 |       0.11213
     -0.00079 |       0.00000 |       0.01089 |       0.00044 |       0.11222
     -0.00063 |       0.00000 |       0.01070 |       0.00056 |       0.11162
     -0.00068 |       0.00000 |       0.01063 |       0.00056 |       0.11198
Evaluating losses...
     -0.00108 |       0.00000 |       0.01062 |       0.00054 |      

     8.79e-05 |       0.00000 |       0.01215 |       0.00050 |       0.12052
     -0.00011 |       0.00000 |       0.01191 |       0.00053 |       0.12187
     -0.00013 |       0.00000 |       0.01178 |       0.00062 |       0.12209
Evaluating losses...
     -0.00040 |       0.00000 |       0.01173 |       0.00061 |       0.12176
------------------------------------
| EpLenMean       | 689            |
| EpRewMean       | -4.8           |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4124           |
| TimeElapsed     | 5.05e+03       |
| TimestepsSoFar  | 2727936        |
| ev_tdlam_before | 0.876          |
| loss_ent        | 0.12175978     |
| loss_kl         | 0.00060766155  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00039595552 |
| loss_vf_loss    | 0.011733       |
------------------------------------
********** Iteration 666 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00103 |    

********** Iteration 671 ************
Eval num_timesteps=2748416, episode_reward=-4.70 +/- 0.46
Episode length: 731.90 +/- 166.00
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00074 |       0.00000 |       0.01841 |       0.00034 |       0.10469
      0.00065 |       0.00000 |       0.01679 |       0.00035 |       0.10478
      0.00038 |       0.00000 |       0.01601 |       0.00041 |       0.10602
      0.00024 |       0.00000 |       0.01551 |       0.00038 |       0.10598
     -0.00053 |       0.00000 |       0.01505 |       0.00038 |       0.10565
    -1.14e-05 |       0.00000 |       0.01483 |       0.00045 |       0.10562
     -0.00021 |       0.00000 |       0.01443 |       0.00048 |       0.10621
     -0.00025 |       0.00000 |       0.01401 |       0.00051 |       0.10675
     -0.00057 |       0.00000 |       0.01388 |       0.00045 |       0.10662
     -0.00083 |       0.00000 |       0.01374 |       0.00055 |       0.1070

     -0.00033 |       0.00000 |       0.01446 |       0.00063 |       0.13474
     -0.00019 |       0.00000 |       0.01402 |       0.00060 |       0.13367
     4.87e-05 |       0.00000 |       0.01408 |       0.00066 |       0.13394
     -0.00124 |       0.00000 |       0.01380 |       0.00068 |       0.13402
Evaluating losses...
     -0.00130 |       0.00000 |       0.01349 |       0.00074 |       0.13500
-----------------------------------
| EpLenMean       | 702           |
| EpRewMean       | -4.85         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 4188          |
| TimeElapsed     | 5.14e+03      |
| TimestepsSoFar  | 2772992       |
| ev_tdlam_before | 0.862         |
| loss_ent        | 0.13499618    |
| loss_kl         | 0.00073874684 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0012993361 |
| loss_vf_loss    | 0.013493057   |
-----------------------------------
********** Iteration 677 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 682 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00034 |       0.00000 |       0.01949 |       0.00042 |       0.13191
     -0.00026 |       0.00000 |       0.01795 |       0.00042 |       0.13278
      0.00080 |       0.00000 |       0.01724 |       0.00046 |       0.13347
      0.00059 |       0.00000 |       0.01667 |       0.00045 |       0.13310
     -0.00019 |       0.00000 |       0.01627 |       0.00045 |       0.13321
      0.00018 |       0.00000 |       0.01608 |       0.00055 |       0.13425
    -6.93e-05 |       0.00000 |       0.01553 |       0.00052 |       0.13506
      0.00037 |       0.00000 |       0.01552 |       0.00055 |       0.13473
     -0.00074 |       0.00000 |       0.01533 |       0.00057 |       0.13555
      0.00041 |       0.00000 |       0.01513 |       0.00062 |       0.13597
Evaluating losses...
      0.00018 |       0.00000 |       0.01471 |       0.00062 |      

     -0.00093 |       0.00000 |       0.01456 |       0.00052 |       0.10715
     -0.00084 |       0.00000 |       0.01448 |       0.00059 |       0.10766
     -0.00106 |       0.00000 |       0.01435 |       0.00060 |       0.10711
Evaluating losses...
     -0.00149 |       0.00000 |       0.01402 |       0.00058 |       0.10634
-----------------------------------
| EpLenMean       | 716           |
| EpRewMean       | -4.77         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4252          |
| TimeElapsed     | 5.24e+03      |
| TimestepsSoFar  | 2818048       |
| ev_tdlam_before | 0.881         |
| loss_ent        | 0.10634023    |
| loss_kl         | 0.0005796447  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0014909003 |
| loss_vf_loss    | 0.01401598    |
-----------------------------------
********** Iteration 688 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00022 |       0.00000 |  

********** Iteration 693 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00082 |       0.00000 |       0.01313 |       0.00038 |       0.12283
      0.00011 |       0.00000 |       0.01269 |       0.00042 |       0.12336
     -0.00051 |       0.00000 |       0.01235 |       0.00043 |       0.12295
      0.00033 |       0.00000 |       0.01205 |       0.00047 |       0.12355
     -0.00035 |       0.00000 |       0.01191 |       0.00047 |       0.12323
    -2.95e-06 |       0.00000 |       0.01173 |       0.00049 |       0.12360
     -0.00046 |       0.00000 |       0.01153 |       0.00053 |       0.12401
     -0.00014 |       0.00000 |       0.01136 |       0.00054 |       0.12318
     -0.00041 |       0.00000 |       0.01126 |       0.00050 |       0.12333
     -0.00065 |       0.00000 |       0.01117 |       0.00052 |       0.12347
Evaluating losses...
     -0.00047 |       0.00000 |       0.01101 |       0.00056 |      

     -0.00090 |       0.00000 |       0.01419 |       0.00053 |       0.11229
     -0.00130 |       0.00000 |       0.01401 |       0.00059 |       0.11191
     -0.00106 |       0.00000 |       0.01404 |       0.00058 |       0.11200
     -0.00106 |       0.00000 |       0.01395 |       0.00059 |       0.11151
Evaluating losses...
     -0.00119 |       0.00000 |       0.01355 |       0.00053 |       0.11190
-----------------------------------
| EpLenMean       | 659           |
| EpRewMean       | -4.81         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4322          |
| TimeElapsed     | 5.33e+03      |
| TimestepsSoFar  | 2863104       |
| ev_tdlam_before | 0.875         |
| loss_ent        | 0.11190263    |
| loss_kl         | 0.00052880286 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0011907322 |
| loss_vf_loss    | 0.013554587   |
-----------------------------------
********** Iteration 699 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 704 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00062 |       0.00000 |       0.01341 |       0.00032 |       0.10193
      0.00033 |       0.00000 |       0.01240 |       0.00033 |       0.10171
     -0.00049 |       0.00000 |       0.01177 |       0.00032 |       0.10212
    -8.87e-05 |       0.00000 |       0.01135 |       0.00042 |       0.10135
     -0.00023 |       0.00000 |       0.01105 |       0.00039 |       0.10193
     -0.00039 |       0.00000 |       0.01081 |       0.00039 |       0.10212
     2.26e-06 |       0.00000 |       0.01069 |       0.00040 |       0.10279
     -0.00021 |       0.00000 |       0.01042 |       0.00043 |       0.10258
     -0.00018 |       0.00000 |       0.01024 |       0.00047 |       0.10289
     -0.00060 |       0.00000 |       0.01012 |       0.00047 |       0.10296
Evaluating losses...
     -0.00078 |       0.00000 |       0.00992 |       0.00046 |      

     -0.00016 |       0.00000 |       0.01224 |       0.00057 |       0.11553
     -0.00079 |       0.00000 |       0.01213 |       0.00060 |       0.11514
     -0.00062 |       0.00000 |       0.01204 |       0.00059 |       0.11514
Evaluating losses...
     -0.00082 |       0.00000 |       0.01184 |       0.00053 |       0.11490
-----------------------------------
| EpLenMean       | 663           |
| EpRewMean       | -4.87         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4388          |
| TimeElapsed     | 5.41e+03      |
| TimestepsSoFar  | 2908160       |
| ev_tdlam_before | 0.885         |
| loss_ent        | 0.11490347    |
| loss_kl         | 0.00053396495 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0008195017 |
| loss_vf_loss    | 0.011838398   |
-----------------------------------
********** Iteration 710 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00011 |       0.00000 |  

********** Iteration 715 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00077 |       0.00000 |       0.01506 |       0.00038 |       0.12217
     -0.00068 |       0.00000 |       0.01430 |       0.00038 |       0.12137
     -0.00061 |       0.00000 |       0.01376 |       0.00040 |       0.12188
     -0.00130 |       0.00000 |       0.01361 |       0.00040 |       0.12181
     -0.00084 |       0.00000 |       0.01322 |       0.00045 |       0.12106
     -0.00083 |       0.00000 |       0.01301 |       0.00048 |       0.12133
     -0.00097 |       0.00000 |       0.01288 |       0.00050 |       0.12102
     -0.00106 |       0.00000 |       0.01252 |       0.00052 |       0.12115
     -0.00077 |       0.00000 |       0.01251 |       0.00050 |       0.12108
     -0.00083 |       0.00000 |       0.01241 |       0.00054 |       0.12071
Evaluating losses...
     -0.00127 |       0.00000 |       0.01218 |       0.00052 |      

     -0.00042 |       0.00000 |       0.01129 |       0.00044 |       0.10176
     -0.00057 |       0.00000 |       0.01103 |       0.00049 |       0.10194
     -0.00029 |       0.00000 |       0.01107 |       0.00058 |       0.10164
Evaluating losses...
     -0.00048 |       0.00000 |       0.01080 |       0.00054 |       0.10159
------------------------------------
| EpLenMean       | 659            |
| EpRewMean       | -4.79          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4456           |
| TimeElapsed     | 5.5e+03        |
| TimestepsSoFar  | 2953216        |
| ev_tdlam_before | 0.887          |
| loss_ent        | 0.10159148     |
| loss_kl         | 0.0005377279   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00048264815 |
| loss_vf_loss    | 0.01079866     |
------------------------------------
********** Iteration 721 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00052 |    

********** Iteration 726 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00024 |       0.00000 |       0.01674 |       0.00025 |       0.10083
     -0.00013 |       0.00000 |       0.01573 |       0.00028 |       0.10092
      0.00055 |       0.00000 |       0.01538 |       0.00028 |       0.10113
    -2.64e-05 |       0.00000 |       0.01482 |       0.00028 |       0.10044
     -0.00031 |       0.00000 |       0.01480 |       0.00028 |       0.10047
      0.00050 |       0.00000 |       0.01441 |       0.00032 |       0.10045
     5.93e-06 |       0.00000 |       0.01422 |       0.00036 |       0.10014
     -0.00045 |       0.00000 |       0.01401 |       0.00037 |       0.10041
     -0.00014 |       0.00000 |       0.01395 |       0.00034 |       0.10085
     -0.00033 |       0.00000 |       0.01377 |       0.00043 |       0.10042
Evaluating losses...
     -0.00043 |       0.00000 |       0.01368 |       0.00038 |      

     -0.00065 |       0.00000 |       0.01288 |       0.00044 |       0.11528
     -0.00024 |       0.00000 |       0.01292 |       0.00041 |       0.11547
     -0.00107 |       0.00000 |       0.01258 |       0.00041 |       0.11519
Evaluating losses...
    -3.78e-05 |       0.00000 |       0.01243 |       0.00040 |       0.11524
------------------------------------
| EpLenMean       | 663            |
| EpRewMean       | -4.77          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4524           |
| TimeElapsed     | 5.59e+03       |
| TimestepsSoFar  | 2998272        |
| ev_tdlam_before | 0.873          |
| loss_ent        | 0.11524486     |
| loss_kl         | 0.00040135768  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -3.7795515e-05 |
| loss_vf_loss    | 0.012427969    |
------------------------------------
********** Iteration 732 ************
Eval num_timesteps=2998272, episode_reward=-4.70 +/- 0.46
Episode length: 613.80 +/- 130.35
Optimizing...
     

********** Iteration 737 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -3.24e-05 |       0.00000 |       0.01459 |       0.00033 |       0.11319
      0.00082 |       0.00000 |       0.01383 |       0.00039 |       0.11324
      0.00053 |       0.00000 |       0.01337 |       0.00035 |       0.11298
     -0.00029 |       0.00000 |       0.01308 |       0.00039 |       0.11360
     -0.00029 |       0.00000 |       0.01298 |       0.00048 |       0.11339
     -0.00039 |       0.00000 |       0.01269 |       0.00044 |       0.11351
     -0.00059 |       0.00000 |       0.01241 |       0.00049 |       0.11382
     -0.00073 |       0.00000 |       0.01235 |       0.00044 |       0.11229
     -0.00015 |       0.00000 |       0.01217 |       0.00048 |       0.11261
     -0.00061 |       0.00000 |       0.01201 |       0.00050 |       0.11288
Evaluating losses...
      0.00012 |       0.00000 |       0.01198 |       0.00057 |      

    -7.99e-05 |       0.00000 |       0.01161 |       0.00038 |       0.09870
     3.23e-05 |       0.00000 |       0.01157 |       0.00039 |       0.09905
     -0.00042 |       0.00000 |       0.01149 |       0.00036 |       0.09874
Evaluating losses...
     -0.00023 |       0.00000 |       0.01139 |       0.00040 |       0.09903
------------------------------------
| EpLenMean       | 666            |
| EpRewMean       | -4.84          |
| EpThisIter      | 7              |
| EpisodesSoFar   | 4591           |
| TimeElapsed     | 5.68e+03       |
| TimestepsSoFar  | 3043328        |
| ev_tdlam_before | 0.891          |
| loss_ent        | 0.09903034     |
| loss_kl         | 0.00040218115  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00023369794 |
| loss_vf_loss    | 0.011394417    |
------------------------------------
********** Iteration 743 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00027 |    

********** Iteration 748 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00061 |       0.00000 |       0.01562 |       0.00031 |       0.12435
      0.00032 |       0.00000 |       0.01511 |       0.00037 |       0.12458
      0.00047 |       0.00000 |       0.01466 |       0.00034 |       0.12394
     7.65e-05 |       0.00000 |       0.01421 |       0.00036 |       0.12390
     -0.00035 |       0.00000 |       0.01421 |       0.00037 |       0.12390
     -0.00022 |       0.00000 |       0.01391 |       0.00039 |       0.12418
     4.11e-05 |       0.00000 |       0.01382 |       0.00037 |       0.12391
     -0.00018 |       0.00000 |       0.01361 |       0.00039 |       0.12388
     -0.00024 |       0.00000 |       0.01351 |       0.00044 |       0.12415
     -0.00022 |       0.00000 |       0.01333 |       0.00044 |       0.12370
Evaluating losses...
     -0.00040 |       0.00000 |       0.01325 |       0.00041 |      

     -0.00063 |       0.00000 |       0.01420 |       0.00041 |       0.10816
     -0.00028 |       0.00000 |       0.01420 |       0.00046 |       0.10791
     -0.00058 |       0.00000 |       0.01390 |       0.00042 |       0.10801
     -0.00044 |       0.00000 |       0.01370 |       0.00039 |       0.10850
Evaluating losses...
     -0.00014 |       0.00000 |       0.01353 |       0.00040 |       0.10842
------------------------------------
| EpLenMean       | 696            |
| EpRewMean       | -4.74          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 4655           |
| TimeElapsed     | 5.77e+03       |
| TimestepsSoFar  | 3088384        |
| ev_tdlam_before | 0.866          |
| loss_ent        | 0.1084156      |
| loss_kl         | 0.0004004985   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00013942376 |
| loss_vf_loss    | 0.013530022    |
------------------------------------
********** Iteration 754 ************
Optimizing...
     pol_surr |    

********** Iteration 759 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     9.31e-06 |       0.00000 |       0.01170 |       0.00030 |       0.10788
     -0.00011 |       0.00000 |       0.01078 |       0.00044 |       0.10767
     5.94e-05 |       0.00000 |       0.01045 |       0.00043 |       0.10875
     -0.00013 |       0.00000 |       0.01016 |       0.00043 |       0.10830
     -0.00021 |       0.00000 |       0.01002 |       0.00047 |       0.10836
      0.00013 |       0.00000 |       0.00991 |       0.00039 |       0.10810
     -0.00046 |       0.00000 |       0.00980 |       0.00051 |       0.10801
     -0.00037 |       0.00000 |       0.00977 |       0.00046 |       0.10760
     -0.00048 |       0.00000 |       0.00972 |       0.00054 |       0.10770
     -0.00048 |       0.00000 |       0.00957 |       0.00039 |       0.10739
Evaluating losses...
     -0.00055 |       0.00000 |       0.00951 |       0.00046 |      

     -0.00043 |       0.00000 |       0.01347 |       0.00040 |       0.11686
      0.00035 |       0.00000 |       0.01328 |       0.00040 |       0.11746
     -0.00089 |       0.00000 |       0.01321 |       0.00044 |       0.11757
      0.00038 |       0.00000 |       0.01303 |       0.00039 |       0.11804
Evaluating losses...
      0.00030 |       0.00000 |       0.01289 |       0.00043 |       0.11853
-----------------------------------
| EpLenMean       | 696           |
| EpRewMean       | -4.71         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 4723          |
| TimeElapsed     | 5.86e+03      |
| TimestepsSoFar  | 3133440       |
| ev_tdlam_before | 0.884         |
| loss_ent        | 0.1185275     |
| loss_kl         | 0.00043396902 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00030265073 |
| loss_vf_loss    | 0.012891349   |
-----------------------------------
********** Iteration 765 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 770 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00104 |       0.00000 |       0.01726 |       0.00037 |       0.12159
      0.00037 |       0.00000 |       0.01630 |       0.00037 |       0.12112
      0.00020 |       0.00000 |       0.01575 |       0.00038 |       0.12028
      0.00045 |       0.00000 |       0.01543 |       0.00039 |       0.12105
      0.00050 |       0.00000 |       0.01517 |       0.00045 |       0.11964
     -0.00013 |       0.00000 |       0.01501 |       0.00048 |       0.11854
      0.00023 |       0.00000 |       0.01470 |       0.00047 |       0.11887
      0.00024 |       0.00000 |       0.01450 |       0.00052 |       0.11901
      0.00039 |       0.00000 |       0.01442 |       0.00056 |       0.11839
     -0.00016 |       0.00000 |       0.01419 |       0.00052 |       0.11940
Evaluating losses...
      0.00024 |       0.00000 |       0.01404 |       0.00048 |      

      0.00033 |       0.00000 |       0.01304 |       0.00055 |       0.12784
     -0.00017 |       0.00000 |       0.01293 |       0.00061 |       0.12756
    -5.40e-05 |       0.00000 |       0.01280 |       0.00060 |       0.12770
Evaluating losses...
     -0.00024 |       0.00000 |       0.01277 |       0.00059 |       0.12764
-----------------------------------
| EpLenMean       | 681           |
| EpRewMean       | -4.71         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 4788          |
| TimeElapsed     | 5.95e+03      |
| TimestepsSoFar  | 3178496       |
| ev_tdlam_before | 0.871         |
| loss_ent        | 0.12763897    |
| loss_kl         | 0.000592215   |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0002357486 |
| loss_vf_loss    | 0.012773669   |
-----------------------------------
********** Iteration 776 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 |  

********** Iteration 781 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00063 |       0.00000 |       0.01241 |       0.00033 |       0.11831
      0.00050 |       0.00000 |       0.01176 |       0.00034 |       0.11789
     -0.00017 |       0.00000 |       0.01156 |       0.00036 |       0.11799
      0.00050 |       0.00000 |       0.01134 |       0.00037 |       0.11807
      0.00041 |       0.00000 |       0.01112 |       0.00039 |       0.11807
     -0.00039 |       0.00000 |       0.01119 |       0.00043 |       0.11862
      0.00035 |       0.00000 |       0.01087 |       0.00041 |       0.11823
     3.33e-05 |       0.00000 |       0.01074 |       0.00038 |       0.11844
      0.00028 |       0.00000 |       0.01067 |       0.00046 |       0.11823
     -0.00033 |       0.00000 |       0.01055 |       0.00048 |       0.11817
Evaluating losses...
     -0.00036 |       0.00000 |       0.01039 |       0.00044 |      

     -0.00029 |       0.00000 |       0.01012 |       0.00034 |       0.10767
     -0.00031 |       0.00000 |       0.01005 |       0.00036 |       0.10820
     -0.00050 |       0.00000 |       0.00987 |       0.00040 |       0.10737
Evaluating losses...
     -0.00042 |       0.00000 |       0.00981 |       0.00042 |       0.10767
-----------------------------------
| EpLenMean       | 693           |
| EpRewMean       | -4.83         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 4853          |
| TimeElapsed     | 6.04e+03      |
| TimestepsSoFar  | 3223552       |
| ev_tdlam_before | 0.893         |
| loss_ent        | 0.107671924   |
| loss_kl         | 0.00041659604 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0004204488 |
| loss_vf_loss    | 0.0098097045  |
-----------------------------------
********** Iteration 787 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00061 |       0.00000 |  

********** Iteration 792 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
    -7.80e-06 |       0.00000 |       0.01590 |       0.00036 |       0.11072
      0.00046 |       0.00000 |       0.01441 |       0.00036 |       0.11082
      0.00020 |       0.00000 |       0.01406 |       0.00038 |       0.11087
     -0.00013 |       0.00000 |       0.01377 |       0.00041 |       0.11042
     9.02e-05 |       0.00000 |       0.01343 |       0.00047 |       0.10958
      0.00078 |       0.00000 |       0.01320 |       0.00048 |       0.10912
     -0.00053 |       0.00000 |       0.01302 |       0.00051 |       0.10925
     -0.00026 |       0.00000 |       0.01292 |       0.00045 |       0.11010
     -0.00084 |       0.00000 |       0.01276 |       0.00045 |       0.11046
     -0.00031 |       0.00000 |       0.01257 |       0.00047 |       0.10957
Evaluating losses...
     -0.00060 |       0.00000 |       0.01247 |       0.00048 |      

      0.00031 |       0.00000 |       0.01562 |       0.00039 |       0.10999
     -0.00038 |       0.00000 |       0.01555 |       0.00044 |       0.11024
     -0.00021 |       0.00000 |       0.01527 |       0.00040 |       0.10985
     -0.00053 |       0.00000 |       0.01517 |       0.00048 |       0.10967
     -0.00027 |       0.00000 |       0.01495 |       0.00039 |       0.10935
Evaluating losses...
      0.00057 |       0.00000 |       0.01492 |       0.00042 |       0.10928
-----------------------------------
| EpLenMean       | 668           |
| EpRewMean       | -4.8          |
| EpThisIter      | 5             |
| EpisodesSoFar   | 4920          |
| TimeElapsed     | 6.15e+03      |
| TimestepsSoFar  | 3268608       |
| ev_tdlam_before | 0.858         |
| loss_ent        | 0.10928448    |
| loss_kl         | 0.00042305083 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0005687368  |
| loss_vf_loss    | 0.014922765   |
-----------------------------------
*******

********** Iteration 803 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00052 |       0.00000 |       0.01496 |       0.00030 |       0.09614
      0.00042 |       0.00000 |       0.01423 |       0.00042 |       0.09618
     -0.00047 |       0.00000 |       0.01359 |       0.00050 |       0.09604
     -0.00081 |       0.00000 |       0.01320 |       0.00046 |       0.09604
     -0.00081 |       0.00000 |       0.01288 |       0.00052 |       0.09616
     -0.00068 |       0.00000 |       0.01268 |       0.00053 |       0.09595
     -0.00088 |       0.00000 |       0.01262 |       0.00056 |       0.09602
     -0.00082 |       0.00000 |       0.01229 |       0.00065 |       0.09635
     -0.00102 |       0.00000 |       0.01226 |       0.00068 |       0.09603
     -0.00106 |       0.00000 |       0.01216 |       0.00062 |       0.09578
Evaluating losses...
     -0.00091 |       0.00000 |       0.01189 |       0.00068 |      

     -0.00076 |       0.00000 |       0.01121 |       0.00046 |       0.10241
     -0.00027 |       0.00000 |       0.01118 |       0.00042 |       0.10182
     -0.00044 |       0.00000 |       0.01103 |       0.00048 |       0.10181
Evaluating losses...
    -6.64e-05 |       0.00000 |       0.01093 |       0.00045 |       0.10168
-----------------------------------
| EpLenMean       | 694           |
| EpRewMean       | -4.75         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 4986          |
| TimeElapsed     | 6.22e+03      |
| TimestepsSoFar  | 3313664       |
| ev_tdlam_before | 0.894         |
| loss_ent        | 0.10168047    |
| loss_kl         | 0.00044846386 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -6.644125e-05 |
| loss_vf_loss    | 0.0109305065  |
-----------------------------------
********** Iteration 809 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00092 |       0.00000 |  

********** Iteration 814 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00133 |       0.00000 |       0.01594 |       0.00036 |       0.10389
      0.00072 |       0.00000 |       0.01456 |       0.00035 |       0.10413
     3.45e-05 |       0.00000 |       0.01394 |       0.00038 |       0.10474
      0.00039 |       0.00000 |       0.01359 |       0.00040 |       0.10444
     -0.00022 |       0.00000 |       0.01330 |       0.00039 |       0.10528
    -7.89e-05 |       0.00000 |       0.01305 |       0.00043 |       0.10538
      0.00027 |       0.00000 |       0.01291 |       0.00045 |       0.10553
      0.00032 |       0.00000 |       0.01269 |       0.00048 |       0.10564
    -2.75e-05 |       0.00000 |       0.01261 |       0.00045 |       0.10543
     -0.00047 |       0.00000 |       0.01249 |       0.00046 |       0.10445
Evaluating losses...
      0.00049 |       0.00000 |       0.01222 |       0.00041 |      

     -0.00020 |       0.00000 |       0.01222 |       0.00048 |       0.10782
     -0.00025 |       0.00000 |       0.01203 |       0.00049 |       0.10774
     -0.00014 |       0.00000 |       0.01192 |       0.00055 |       0.10819
Evaluating losses...
     -0.00070 |       0.00000 |       0.01173 |       0.00052 |       0.10789
------------------------------------
| EpLenMean       | 693            |
| EpRewMean       | -4.69          |
| EpThisIter      | 5              |
| EpisodesSoFar   | 5050           |
| TimeElapsed     | 6.29e+03       |
| TimestepsSoFar  | 3358720        |
| ev_tdlam_before | 0.875          |
| loss_ent        | 0.107894026    |
| loss_kl         | 0.00052110443  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00070486753 |
| loss_vf_loss    | 0.011732847    |
------------------------------------
********** Iteration 820 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00121 |    

********** Iteration 825 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00062 |       0.00000 |       0.01252 |       0.00024 |       0.08345
    -6.04e-05 |       0.00000 |       0.01178 |       0.00027 |       0.08392
     -0.00025 |       0.00000 |       0.01141 |       0.00031 |       0.08388
    -9.72e-05 |       0.00000 |       0.01131 |       0.00032 |       0.08363
    -4.31e-05 |       0.00000 |       0.01121 |       0.00034 |       0.08314
      0.00019 |       0.00000 |       0.01101 |       0.00032 |       0.08334
     -0.00018 |       0.00000 |       0.01086 |       0.00039 |       0.08341
     -0.00036 |       0.00000 |       0.01069 |       0.00036 |       0.08316
     -0.00015 |       0.00000 |       0.01075 |       0.00034 |       0.08295
     -0.00030 |       0.00000 |       0.01054 |       0.00035 |       0.08275
Evaluating losses...
     -0.00030 |       0.00000 |       0.01064 |       0.00037 |      

     -0.00034 |       0.00000 |       0.01409 |       0.00052 |       0.10873
     -0.00089 |       0.00000 |       0.01389 |       0.00057 |       0.10857
     -0.00043 |       0.00000 |       0.01383 |       0.00054 |       0.10861
    -4.24e-05 |       0.00000 |       0.01365 |       0.00049 |       0.10869
Evaluating losses...
      0.00012 |       0.00000 |       0.01347 |       0.00056 |       0.10820
-----------------------------------
| EpLenMean       | 692           |
| EpRewMean       | -4.71         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 5117          |
| TimeElapsed     | 6.35e+03      |
| TimestepsSoFar  | 3403776       |
| ev_tdlam_before | 0.849         |
| loss_ent        | 0.108199775   |
| loss_kl         | 0.0005644445  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00011842139 |
| loss_vf_loss    | 0.013468639   |
-----------------------------------
********** Iteration 831 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 836 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00041 |       0.00000 |       0.01216 |       0.00032 |       0.10539
      0.00058 |       0.00000 |       0.01164 |       0.00036 |       0.10571
      0.00073 |       0.00000 |       0.01136 |       0.00032 |       0.10512
      0.00050 |       0.00000 |       0.01107 |       0.00036 |       0.10531
      0.00082 |       0.00000 |       0.01111 |       0.00034 |       0.10501
      0.00045 |       0.00000 |       0.01079 |       0.00035 |       0.10530
      0.00026 |       0.00000 |       0.01079 |       0.00034 |       0.10475
      0.00027 |       0.00000 |       0.01070 |       0.00036 |       0.10492
     7.95e-05 |       0.00000 |       0.01050 |       0.00035 |       0.10504
      0.00027 |       0.00000 |       0.01055 |       0.00037 |       0.10485
Evaluating losses...
      0.00036 |       0.00000 |       0.01037 |       0.00034 |      

     -0.00058 |       0.00000 |       0.01022 |       0.00045 |       0.10264
     4.92e-05 |       0.00000 |       0.01008 |       0.00055 |       0.10324
      0.00022 |       0.00000 |       0.01002 |       0.00055 |       0.10315
Evaluating losses...
    -6.84e-05 |       0.00000 |       0.00992 |       0.00053 |       0.10285
------------------------------------
| EpLenMean       | 674            |
| EpRewMean       | -4.75          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 5181           |
| TimeElapsed     | 6.42e+03       |
| TimestepsSoFar  | 3448832        |
| ev_tdlam_before | 0.906          |
| loss_ent        | 0.10284989     |
| loss_kl         | 0.00052688504  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -6.8361755e-05 |
| loss_vf_loss    | 0.009924311    |
------------------------------------
********** Iteration 842 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00048 |    

********** Iteration 847 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00072 |       0.00000 |       0.01537 |       0.00032 |       0.10289
     -0.00021 |       0.00000 |       0.01456 |       0.00030 |       0.10304
      0.00066 |       0.00000 |       0.01422 |       0.00034 |       0.10222
     4.36e-05 |       0.00000 |       0.01393 |       0.00037 |       0.10270
     -0.00048 |       0.00000 |       0.01349 |       0.00043 |       0.10283
     2.44e-05 |       0.00000 |       0.01337 |       0.00038 |       0.10233
     -0.00042 |       0.00000 |       0.01326 |       0.00037 |       0.10257
     -0.00059 |       0.00000 |       0.01308 |       0.00039 |       0.10286
     -0.00049 |       0.00000 |       0.01295 |       0.00039 |       0.10258
     4.46e-05 |       0.00000 |       0.01283 |       0.00040 |       0.10265
Evaluating losses...
     -0.00044 |       0.00000 |       0.01279 |       0.00038 |      

     -0.00055 |       0.00000 |       0.01150 |       0.00044 |       0.08477
     -0.00037 |       0.00000 |       0.01140 |       0.00046 |       0.08476
     -0.00082 |       0.00000 |       0.01112 |       0.00046 |       0.08467
     -0.00052 |       0.00000 |       0.01114 |       0.00050 |       0.08496
Evaluating losses...
     -0.00070 |       0.00000 |       0.01096 |       0.00046 |       0.08495
------------------------------------
| EpLenMean       | 680            |
| EpRewMean       | -4.79          |
| EpThisIter      | 8              |
| EpisodesSoFar   | 5248           |
| TimeElapsed     | 6.48e+03       |
| TimestepsSoFar  | 3493888        |
| ev_tdlam_before | 0.877          |
| loss_ent        | 0.08494892     |
| loss_kl         | 0.00045667062  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00070354133 |
| loss_vf_loss    | 0.010962611    |
------------------------------------
********** Iteration 853 ************
Optimizing...
     pol_surr |    

********** Iteration 858 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00015 |       0.00000 |       0.01350 |       0.00031 |       0.10706
    -6.83e-07 |       0.00000 |       0.01290 |       0.00033 |       0.10674
     8.98e-05 |       0.00000 |       0.01269 |       0.00036 |       0.10630
      0.00037 |       0.00000 |       0.01231 |       0.00033 |       0.10674
      0.00023 |       0.00000 |       0.01214 |       0.00038 |       0.10649
      0.00015 |       0.00000 |       0.01200 |       0.00034 |       0.10686
      0.00030 |       0.00000 |       0.01184 |       0.00034 |       0.10729
    -4.87e-05 |       0.00000 |       0.01167 |       0.00037 |       0.10694
      0.00050 |       0.00000 |       0.01161 |       0.00038 |       0.10735
     1.50e-05 |       0.00000 |       0.01151 |       0.00037 |       0.10730
Evaluating losses...
     3.42e-05 |       0.00000 |       0.01139 |       0.00037 |      

     -0.00014 |       0.00000 |       0.01453 |       0.00032 |       0.09560
     -0.00020 |       0.00000 |       0.01435 |       0.00034 |       0.09577
     -0.00026 |       0.00000 |       0.01432 |       0.00036 |       0.09587
     -0.00029 |       0.00000 |       0.01439 |       0.00037 |       0.09605
Evaluating losses...
     -0.00041 |       0.00000 |       0.01386 |       0.00044 |       0.09616
------------------------------------
| EpLenMean       | 668            |
| EpRewMean       | -4.77          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 5315           |
| TimeElapsed     | 6.54e+03       |
| TimestepsSoFar  | 3538944        |
| ev_tdlam_before | 0.867          |
| loss_ent        | 0.09616237     |
| loss_kl         | 0.0004393103   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00040739874 |
| loss_vf_loss    | 0.013863877    |
------------------------------------
********** Iteration 864 ************
Optimizing...
     pol_surr |    

********** Iteration 869 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     1.12e-05 |       0.00000 |       0.01419 |       0.00027 |       0.09585
      0.00034 |       0.00000 |       0.01330 |       0.00034 |       0.09605
     -0.00019 |       0.00000 |       0.01289 |       0.00039 |       0.09662
    -4.67e-05 |       0.00000 |       0.01247 |       0.00050 |       0.09670
     4.84e-05 |       0.00000 |       0.01224 |       0.00042 |       0.09635
     -0.00034 |       0.00000 |       0.01204 |       0.00044 |       0.09649
     -0.00044 |       0.00000 |       0.01189 |       0.00045 |       0.09604
     -0.00041 |       0.00000 |       0.01175 |       0.00049 |       0.09592
     -0.00073 |       0.00000 |       0.01160 |       0.00045 |       0.09579
     -0.00048 |       0.00000 |       0.01140 |       0.00049 |       0.09650
Evaluating losses...
     -0.00058 |       0.00000 |       0.01138 |       0.00045 |      

     -0.00015 |       0.00000 |       0.01318 |       0.00036 |       0.10293
     -0.00071 |       0.00000 |       0.01313 |       0.00042 |       0.10271
    -7.28e-05 |       0.00000 |       0.01302 |       0.00041 |       0.10231
Evaluating losses...
     -0.00024 |       0.00000 |       0.01282 |       0.00043 |       0.10273
------------------------------------
| EpLenMean       | 682            |
| EpRewMean       | -4.75          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 5381           |
| TimeElapsed     | 6.63e+03       |
| TimestepsSoFar  | 3584000        |
| ev_tdlam_before | 0.886          |
| loss_ent        | 0.10273111     |
| loss_kl         | 0.0004284762   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00024280953 |
| loss_vf_loss    | 0.012823971    |
------------------------------------
********** Iteration 875 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00013 |    

********** Iteration 880 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00080 |       0.00000 |       0.01437 |       0.00030 |       0.10303
      0.00053 |       0.00000 |       0.01331 |       0.00035 |       0.10365
      0.00011 |       0.00000 |       0.01294 |       0.00031 |       0.10333
      0.00043 |       0.00000 |       0.01246 |       0.00034 |       0.10359
     -0.00013 |       0.00000 |       0.01221 |       0.00033 |       0.10333
      0.00063 |       0.00000 |       0.01205 |       0.00033 |       0.10411
     -0.00012 |       0.00000 |       0.01187 |       0.00037 |       0.10361
     7.84e-05 |       0.00000 |       0.01174 |       0.00037 |       0.10377
     -0.00010 |       0.00000 |       0.01149 |       0.00039 |       0.10335
    -9.81e-05 |       0.00000 |       0.01151 |       0.00043 |       0.10403
Evaluating losses...
     -0.00015 |       0.00000 |       0.01142 |       0.00040 |      

      0.00042 |       0.00000 |       0.01083 |       0.00031 |       0.09931
      0.00033 |       0.00000 |       0.01068 |       0.00032 |       0.09907
      0.00024 |       0.00000 |       0.01063 |       0.00031 |       0.09916
     -0.00018 |       0.00000 |       0.01048 |       0.00034 |       0.09943
Evaluating losses...
     7.84e-05 |       0.00000 |       0.01046 |       0.00031 |       0.10012
-----------------------------------
| EpLenMean       | 684           |
| EpRewMean       | -4.76         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 5446          |
| TimeElapsed     | 6.72e+03      |
| TimestepsSoFar  | 3629056       |
| ev_tdlam_before | 0.899         |
| loss_ent        | 0.10012235    |
| loss_kl         | 0.00030565177 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 7.842848e-05  |
| loss_vf_loss    | 0.01046362    |
-----------------------------------
********** Iteration 886 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 891 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00072 |       0.00000 |       0.01277 |       0.00030 |       0.10155
      0.00044 |       0.00000 |       0.01205 |       0.00025 |       0.10123
      0.00065 |       0.00000 |       0.01159 |       0.00031 |       0.10158
      0.00042 |       0.00000 |       0.01124 |       0.00030 |       0.10123
     -0.00013 |       0.00000 |       0.01100 |       0.00027 |       0.10105
      0.00028 |       0.00000 |       0.01089 |       0.00028 |       0.10082
    -2.11e-05 |       0.00000 |       0.01074 |       0.00033 |       0.10088
      0.00043 |       0.00000 |       0.01054 |       0.00036 |       0.10110
      0.00037 |       0.00000 |       0.01042 |       0.00030 |       0.10065
      0.00030 |       0.00000 |       0.01038 |       0.00034 |       0.10025
Evaluating losses...
     -0.00012 |       0.00000 |       0.01020 |       0.00034 |      

      0.00020 |       0.00000 |       0.01327 |       0.00035 |       0.09473
    -5.04e-05 |       0.00000 |       0.01324 |       0.00035 |       0.09504
     -0.00012 |       0.00000 |       0.01303 |       0.00036 |       0.09482
      0.00065 |       0.00000 |       0.01307 |       0.00037 |       0.09463
Evaluating losses...
     -0.00026 |       0.00000 |       0.01278 |       0.00038 |       0.09467
------------------------------------
| EpLenMean       | 703            |
| EpRewMean       | -4.82          |
| EpThisIter      | 6              |
| EpisodesSoFar   | 5511           |
| TimeElapsed     | 6.81e+03       |
| TimestepsSoFar  | 3674112        |
| ev_tdlam_before | 0.882          |
| loss_ent        | 0.094669156    |
| loss_kl         | 0.0003824315   |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00026347674 |
| loss_vf_loss    | 0.0127837425   |
------------------------------------
********** Iteration 897 ************
Optimizing...
     pol_surr |    

********** Iteration 902 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     8.02e-05 |       0.00000 |       0.01591 |       0.00029 |       0.10873
      0.00056 |       0.00000 |       0.01491 |       0.00027 |       0.10849
      0.00072 |       0.00000 |       0.01398 |       0.00032 |       0.10781
     -0.00013 |       0.00000 |       0.01368 |       0.00030 |       0.10825
     -0.00069 |       0.00000 |       0.01321 |       0.00033 |       0.10856
      0.00052 |       0.00000 |       0.01304 |       0.00031 |       0.10825
      0.00050 |       0.00000 |       0.01287 |       0.00031 |       0.10822
     -0.00042 |       0.00000 |       0.01246 |       0.00034 |       0.10817
     3.71e-05 |       0.00000 |       0.01254 |       0.00033 |       0.10809
     4.40e-05 |       0.00000 |       0.01226 |       0.00032 |       0.10811
Evaluating losses...
     -0.00013 |       0.00000 |       0.01219 |       0.00033 |      

     7.19e-05 |       0.00000 |       0.01475 |       0.00031 |       0.09939
      0.00024 |       0.00000 |       0.01454 |       0.00032 |       0.09973
      0.00034 |       0.00000 |       0.01441 |       0.00033 |       0.10004
Evaluating losses...
     7.32e-05 |       0.00000 |       0.01420 |       0.00033 |       0.09999
-----------------------------------
| EpLenMean       | 677           |
| EpRewMean       | -4.78         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 5578          |
| TimeElapsed     | 6.89e+03      |
| TimestepsSoFar  | 3719168       |
| ev_tdlam_before | 0.85          |
| loss_ent        | 0.09999351    |
| loss_kl         | 0.00033004617 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 7.321057e-05  |
| loss_vf_loss    | 0.014199345   |
-----------------------------------
********** Iteration 908 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00040 |       0.00000 |  

********** Iteration 913 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00044 |       0.00000 |       0.01789 |       0.00037 |       0.11001
      0.00131 |       0.00000 |       0.01689 |       0.00034 |       0.11042
      0.00077 |       0.00000 |       0.01632 |       0.00034 |       0.11050
      0.00047 |       0.00000 |       0.01607 |       0.00034 |       0.11001
      0.00025 |       0.00000 |       0.01580 |       0.00040 |       0.11071
      0.00043 |       0.00000 |       0.01569 |       0.00035 |       0.11015
      0.00017 |       0.00000 |       0.01530 |       0.00034 |       0.11062
      0.00019 |       0.00000 |       0.01521 |       0.00034 |       0.11047
      0.00027 |       0.00000 |       0.01492 |       0.00039 |       0.11098
      0.00040 |       0.00000 |       0.01498 |       0.00036 |       0.11036
Evaluating losses...
      0.00017 |       0.00000 |       0.01495 |       0.00037 |      

    -7.37e-05 |       0.00000 |       0.01286 |       0.00034 |       0.09902
      0.00028 |       0.00000 |       0.01266 |       0.00037 |       0.09891
      0.00032 |       0.00000 |       0.01241 |       0.00035 |       0.09972
      0.00052 |       0.00000 |       0.01228 |       0.00035 |       0.09919
     -0.00055 |       0.00000 |       0.01217 |       0.00035 |       0.09880
Evaluating losses...
     4.29e-05 |       0.00000 |       0.01214 |       0.00036 |       0.09926
----------------------------------
| EpLenMean       | 718          |
| EpRewMean       | -4.75        |
| EpThisIter      | 5            |
| EpisodesSoFar   | 5638         |
| TimeElapsed     | 6.99e+03     |
| TimestepsSoFar  | 3764224      |
| ev_tdlam_before | 0.879        |
| loss_ent        | 0.09925646   |
| loss_kl         | 0.0003586442 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 4.286866e-05 |
| loss_vf_loss    | 0.012139119  |
----------------------------------
********** Iteration 

********** Iteration 924 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00118 |       0.00000 |       0.01488 |       0.00024 |       0.09551
      0.00045 |       0.00000 |       0.01415 |       0.00031 |       0.09507
    -9.25e-05 |       0.00000 |       0.01379 |       0.00040 |       0.09444
    -3.77e-05 |       0.00000 |       0.01356 |       0.00037 |       0.09455
     -0.00013 |       0.00000 |       0.01332 |       0.00037 |       0.09472
    -8.51e-05 |       0.00000 |       0.01306 |       0.00035 |       0.09428
     -0.00022 |       0.00000 |       0.01288 |       0.00038 |       0.09436
     -0.00062 |       0.00000 |       0.01266 |       0.00042 |       0.09419
     -0.00040 |       0.00000 |       0.01250 |       0.00035 |       0.09436
     -0.00016 |       0.00000 |       0.01237 |       0.00038 |       0.09428
Evaluating losses...
     -0.00052 |       0.00000 |       0.01241 |       0.00040 |      

      0.00012 |       0.00000 |       0.01135 |       0.00030 |       0.09352
      0.00015 |       0.00000 |       0.01132 |       0.00030 |       0.09386
     3.15e-05 |       0.00000 |       0.01129 |       0.00030 |       0.09413
Evaluating losses...
     -0.00043 |       0.00000 |       0.01114 |       0.00028 |       0.09411
------------------------------------
| EpLenMean       | 685            |
| EpRewMean       | -4.81          |
| EpThisIter      | 7              |
| EpisodesSoFar   | 5706           |
| TimeElapsed     | 7.08e+03       |
| TimestepsSoFar  | 3809280        |
| ev_tdlam_before | 0.896          |
| loss_ent        | 0.09410663     |
| loss_kl         | 0.00028094323  |
| loss_pol_entpen | 0.0            |
| loss_pol_surr   | -0.00042849226 |
| loss_vf_loss    | 0.011135091    |
------------------------------------
********** Iteration 930 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00089 |    

********** Iteration 935 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00061 |       0.00000 |       0.02210 |       0.00026 |       0.09377
      0.00034 |       0.00000 |       0.02055 |       0.00027 |       0.09354
      0.00065 |       0.00000 |       0.01967 |       0.00029 |       0.09315
     -0.00012 |       0.00000 |       0.01917 |       0.00031 |       0.09337
      0.00032 |       0.00000 |       0.01875 |       0.00030 |       0.09295
     8.92e-05 |       0.00000 |       0.01826 |       0.00030 |       0.09300
     5.21e-05 |       0.00000 |       0.01820 |       0.00030 |       0.09322
      0.00017 |       0.00000 |       0.01791 |       0.00031 |       0.09294
      0.00019 |       0.00000 |       0.01745 |       0.00032 |       0.09309
      0.00058 |       0.00000 |       0.01751 |       0.00036 |       0.09293
Evaluating losses...
      0.00050 |       0.00000 |       0.01741 |       0.00033 |      

     3.80e-05 |       0.00000 |       0.01305 |       0.00032 |       0.09743
     -0.00025 |       0.00000 |       0.01293 |       0.00033 |       0.09706
     -0.00024 |       0.00000 |       0.01284 |       0.00033 |       0.09685
Evaluating losses...
     -0.00059 |       0.00000 |       0.01271 |       0.00032 |       0.09718
-----------------------------------
| EpLenMean       | 668           |
| EpRewMean       | -4.82         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 5774          |
| TimeElapsed     | 7.17e+03      |
| TimestepsSoFar  | 3854336       |
| ev_tdlam_before | 0.873         |
| loss_ent        | 0.09718215    |
| loss_kl         | 0.00032054365 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -0.0005945279 |
| loss_vf_loss    | 0.01270693    |
-----------------------------------
********** Iteration 941 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00026 |       0.00000 |  

********** Iteration 946 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00051 |       0.00000 |       0.01358 |       0.00024 |       0.09159
      0.00040 |       0.00000 |       0.01327 |       0.00024 |       0.09168
      0.00074 |       0.00000 |       0.01319 |       0.00026 |       0.09121
      0.00064 |       0.00000 |       0.01295 |       0.00024 |       0.09163
     -0.00011 |       0.00000 |       0.01270 |       0.00026 |       0.09143
    -7.94e-05 |       0.00000 |       0.01262 |       0.00028 |       0.09096
      0.00066 |       0.00000 |       0.01258 |       0.00029 |       0.09099
      0.00043 |       0.00000 |       0.01245 |       0.00027 |       0.09119
     6.91e-05 |       0.00000 |       0.01237 |       0.00029 |       0.09105
      0.00021 |       0.00000 |       0.01232 |       0.00026 |       0.09098
Evaluating losses...
      0.00031 |       0.00000 |       0.01229 |       0.00028 |      

      0.00017 |       0.00000 |       0.01463 |       0.00029 |       0.09859
      0.00011 |       0.00000 |       0.01440 |       0.00033 |       0.09846
      0.00082 |       0.00000 |       0.01442 |       0.00029 |       0.09831
Evaluating losses...
     6.72e-06 |       0.00000 |       0.01419 |       0.00029 |       0.09847
-----------------------------------
| EpLenMean       | 646           |
| EpRewMean       | -4.86         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 5845          |
| TimeElapsed     | 7.26e+03      |
| TimestepsSoFar  | 3899392       |
| ev_tdlam_before | 0.868         |
| loss_ent        | 0.09847218    |
| loss_kl         | 0.00029395713 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 6.7236833e-06 |
| loss_vf_loss    | 0.014189464   |
-----------------------------------
********** Iteration 952 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00101 |       0.00000 |  

********** Iteration 957 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00023 |       0.00000 |       0.01570 |       0.00028 |       0.10190
      0.00061 |       0.00000 |       0.01511 |       0.00029 |       0.10192
      0.00079 |       0.00000 |       0.01484 |       0.00031 |       0.10185
      0.00082 |       0.00000 |       0.01449 |       0.00029 |       0.10237
      0.00139 |       0.00000 |       0.01437 |       0.00030 |       0.10297
      0.00018 |       0.00000 |       0.01411 |       0.00032 |       0.10222
      0.00089 |       0.00000 |       0.01406 |       0.00033 |       0.10175
      0.00043 |       0.00000 |       0.01395 |       0.00033 |       0.10157
      0.00091 |       0.00000 |       0.01394 |       0.00033 |       0.10172
      0.00047 |       0.00000 |       0.01373 |       0.00034 |       0.10170
Evaluating losses...
      0.00033 |       0.00000 |       0.01367 |       0.00031 |      

      0.00046 |       0.00000 |       0.01229 |       0.00034 |       0.10992
      0.00020 |       0.00000 |       0.01235 |       0.00033 |       0.10968
      0.00040 |       0.00000 |       0.01222 |       0.00033 |       0.10968
Evaluating losses...
      0.00019 |       0.00000 |       0.01209 |       0.00032 |       0.10961
-----------------------------------
| EpLenMean       | 683           |
| EpRewMean       | -4.81         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 5908          |
| TimeElapsed     | 7.36e+03      |
| TimestepsSoFar  | 3944448       |
| ev_tdlam_before | 0.883         |
| loss_ent        | 0.10961055    |
| loss_kl         | 0.00032429773 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00018876581 |
| loss_vf_loss    | 0.012086617   |
-----------------------------------
********** Iteration 963 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00033 |       0.00000 |  

********** Iteration 968 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00112 |       0.00000 |       0.01638 |       0.00032 |       0.12359
      0.00048 |       0.00000 |       0.01567 |       0.00032 |       0.12359
      0.00069 |       0.00000 |       0.01526 |       0.00036 |       0.12386
      0.00031 |       0.00000 |       0.01479 |       0.00033 |       0.12387
      0.00027 |       0.00000 |       0.01466 |       0.00033 |       0.12355
      0.00071 |       0.00000 |       0.01450 |       0.00038 |       0.12382
      0.00031 |       0.00000 |       0.01427 |       0.00036 |       0.12366
      0.00019 |       0.00000 |       0.01423 |       0.00036 |       0.12372
      0.00058 |       0.00000 |       0.01403 |       0.00037 |       0.12385
      0.00030 |       0.00000 |       0.01397 |       0.00038 |       0.12356
Evaluating losses...
     9.31e-05 |       0.00000 |       0.01381 |       0.00036 |      

      0.00058 |       0.00000 |       0.01192 |       0.00025 |       0.08901
      0.00098 |       0.00000 |       0.01194 |       0.00027 |       0.08910
      0.00045 |       0.00000 |       0.01191 |       0.00028 |       0.08883
Evaluating losses...
      0.00024 |       0.00000 |       0.01182 |       0.00029 |       0.08916
-----------------------------------
| EpLenMean       | 725           |
| EpRewMean       | -4.7          |
| EpThisIter      | 7             |
| EpisodesSoFar   | 5971          |
| TimeElapsed     | 7.45e+03      |
| TimestepsSoFar  | 3989504       |
| ev_tdlam_before | 0.899         |
| loss_ent        | 0.0891611     |
| loss_kl         | 0.00028735065 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00024430966 |
| loss_vf_loss    | 0.01182216    |
-----------------------------------
********** Iteration 974 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00053 |       0.00000 |  

********** Iteration 979 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00078 |       0.00000 |       0.01635 |       0.00023 |       0.09139
      0.00086 |       0.00000 |       0.01559 |       0.00027 |       0.09123
      0.00085 |       0.00000 |       0.01533 |       0.00030 |       0.09140
      0.00054 |       0.00000 |       0.01505 |       0.00026 |       0.09118
     6.63e-05 |       0.00000 |       0.01477 |       0.00027 |       0.09123
      0.00020 |       0.00000 |       0.01461 |       0.00024 |       0.09120
      0.00048 |       0.00000 |       0.01439 |       0.00029 |       0.09082
      0.00074 |       0.00000 |       0.01428 |       0.00027 |       0.09107
      0.00044 |       0.00000 |       0.01413 |       0.00028 |       0.09153
      0.00028 |       0.00000 |       0.01401 |       0.00029 |       0.09168
Evaluating losses...
    -8.88e-05 |       0.00000 |       0.01393 |       0.00028 |      

      0.00021 |       0.00000 |       0.01251 |       0.00030 |       0.09554
      0.00049 |       0.00000 |       0.01233 |       0.00033 |       0.09591
      0.00041 |       0.00000 |       0.01225 |       0.00035 |       0.09562
      0.00016 |       0.00000 |       0.01209 |       0.00033 |       0.09566
Evaluating losses...
      0.00024 |       0.00000 |       0.01202 |       0.00033 |       0.09564
-----------------------------------
| EpLenMean       | 730           |
| EpRewMean       | -4.69         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 6031          |
| TimeElapsed     | 7.55e+03      |
| TimestepsSoFar  | 4034560       |
| ev_tdlam_before | 0.88          |
| loss_ent        | 0.09563667    |
| loss_kl         | 0.0003258066  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00024432968 |
| loss_vf_loss    | 0.01202056    |
-----------------------------------
********** Iteration 985 ************
Optimizing...
     pol_surr |    pol_entpen |  

********** Iteration 990 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00092 |       0.00000 |       0.01390 |       0.00025 |       0.09809
      0.00027 |       0.00000 |       0.01355 |       0.00025 |       0.09746
      0.00025 |       0.00000 |       0.01339 |       0.00029 |       0.09741
      0.00015 |       0.00000 |       0.01333 |       0.00027 |       0.09769
     8.66e-05 |       0.00000 |       0.01310 |       0.00030 |       0.09730
     -0.00028 |       0.00000 |       0.01313 |       0.00029 |       0.09705
    -5.52e-05 |       0.00000 |       0.01292 |       0.00035 |       0.09654
     -0.00038 |       0.00000 |       0.01276 |       0.00040 |       0.09642
     -0.00025 |       0.00000 |       0.01258 |       0.00037 |       0.09633
     -0.00010 |       0.00000 |       0.01252 |       0.00041 |       0.09633
Evaluating losses...
     -0.00028 |       0.00000 |       0.01257 |       0.00042 |      

      0.00055 |       0.00000 |       0.01289 |       0.00024 |       0.08944
      0.00047 |       0.00000 |       0.01270 |       0.00027 |       0.08936
     6.40e-05 |       0.00000 |       0.01256 |       0.00024 |       0.08970
Evaluating losses...
      0.00017 |       0.00000 |       0.01250 |       0.00024 |       0.08960
-----------------------------------
| EpLenMean       | 714           |
| EpRewMean       | -4.75         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6096          |
| TimeElapsed     | 7.64e+03      |
| TimestepsSoFar  | 4079616       |
| ev_tdlam_before | 0.881         |
| loss_ent        | 0.08959835    |
| loss_kl         | 0.00023662994 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00016883336 |
| loss_vf_loss    | 0.0124985045  |
-----------------------------------
********** Iteration 996 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00078 |       0.00000 |  

********** Iteration 1001 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00036 |       0.00000 |       0.01623 |       0.00020 |       0.08097
      0.00063 |       0.00000 |       0.01556 |       0.00021 |       0.08121
      0.00047 |       0.00000 |       0.01502 |       0.00022 |       0.08077
      0.00066 |       0.00000 |       0.01486 |       0.00021 |       0.08066
      0.00058 |       0.00000 |       0.01454 |       0.00023 |       0.08086
      0.00049 |       0.00000 |       0.01425 |       0.00021 |       0.08099
      0.00032 |       0.00000 |       0.01409 |       0.00023 |       0.08098
      0.00023 |       0.00000 |       0.01387 |       0.00022 |       0.08117
      0.00025 |       0.00000 |       0.01371 |       0.00025 |       0.08131
      0.00061 |       0.00000 |       0.01348 |       0.00026 |       0.08118
Evaluating losses...
      0.00013 |       0.00000 |       0.01340 |       0.00021 |     

      0.00064 |       0.00000 |       0.01283 |       0.00039 |       0.09824
     -0.00018 |       0.00000 |       0.01280 |       0.00038 |       0.09815
      0.00023 |       0.00000 |       0.01268 |       0.00038 |       0.09853
Evaluating losses...
    -8.14e-05 |       0.00000 |       0.01260 |       0.00041 |       0.09842
-----------------------------------
| EpLenMean       | 689           |
| EpRewMean       | -4.79         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 6160          |
| TimeElapsed     | 7.73e+03      |
| TimestepsSoFar  | 4124672       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 0.098416284   |
| loss_kl         | 0.00041400935 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | -8.141401e-05 |
| loss_vf_loss    | 0.0125997355  |
-----------------------------------
********** Iteration 1007 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00067 |       0.00000 | 

********** Iteration 1012 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00032 |       0.00000 |       0.01510 |       0.00027 |       0.10775
      0.00079 |       0.00000 |       0.01457 |       0.00031 |       0.10822
      0.00116 |       0.00000 |       0.01433 |       0.00029 |       0.10772
      0.00045 |       0.00000 |       0.01383 |       0.00027 |       0.10754
      0.00039 |       0.00000 |       0.01369 |       0.00029 |       0.10754
      0.00100 |       0.00000 |       0.01339 |       0.00027 |       0.10759
      0.00027 |       0.00000 |       0.01337 |       0.00029 |       0.10800
      0.00078 |       0.00000 |       0.01309 |       0.00030 |       0.10766
     9.82e-05 |       0.00000 |       0.01300 |       0.00028 |       0.10761
      0.00021 |       0.00000 |       0.01281 |       0.00030 |       0.10799
Evaluating losses...
     5.93e-05 |       0.00000 |       0.01260 |       0.00031 |     

      0.00067 |       0.00000 |       0.01405 |       0.00026 |       0.09222
      0.00041 |       0.00000 |       0.01401 |       0.00025 |       0.09208
      0.00076 |       0.00000 |       0.01392 |       0.00026 |       0.09181
Evaluating losses...
      0.00054 |       0.00000 |       0.01386 |       0.00028 |       0.09183
-----------------------------------
| EpLenMean       | 698           |
| EpRewMean       | -4.78         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6226          |
| TimeElapsed     | 7.81e+03      |
| TimestepsSoFar  | 4169728       |
| ev_tdlam_before | 0.866         |
| loss_ent        | 0.0918286     |
| loss_kl         | 0.0002776458  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00053640833 |
| loss_vf_loss    | 0.013859432   |
-----------------------------------
********** Iteration 1018 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00049 |       0.00000 | 

********** Iteration 1023 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00042 |       0.00000 |       0.01542 |       0.00020 |       0.08481
      0.00084 |       0.00000 |       0.01496 |       0.00021 |       0.08480
      0.00048 |       0.00000 |       0.01456 |       0.00022 |       0.08494
      0.00059 |       0.00000 |       0.01418 |       0.00023 |       0.08508
      0.00031 |       0.00000 |       0.01400 |       0.00024 |       0.08545
      0.00054 |       0.00000 |       0.01383 |       0.00026 |       0.08555
      0.00044 |       0.00000 |       0.01364 |       0.00023 |       0.08532
      0.00035 |       0.00000 |       0.01354 |       0.00026 |       0.08502
      0.00025 |       0.00000 |       0.01345 |       0.00021 |       0.08488
      0.00059 |       0.00000 |       0.01351 |       0.00026 |       0.08516
Evaluating losses...
      0.00075 |       0.00000 |       0.01324 |       0.00027 |     

      0.00054 |       0.00000 |       0.01062 |       0.00028 |       0.09022
      0.00045 |       0.00000 |       0.01062 |       0.00030 |       0.09046
      0.00070 |       0.00000 |       0.01049 |       0.00026 |       0.09036
Evaluating losses...
      0.00050 |       0.00000 |       0.01049 |       0.00030 |       0.09020
-----------------------------------
| EpLenMean       | 659           |
| EpRewMean       | -4.78         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 6295          |
| TimeElapsed     | 7.9e+03       |
| TimestepsSoFar  | 4214784       |
| ev_tdlam_before | 0.891         |
| loss_ent        | 0.0902026     |
| loss_kl         | 0.0003048713  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00050258706 |
| loss_vf_loss    | 0.010492776   |
-----------------------------------
********** Iteration 1029 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00037 |       0.00000 | 

********** Iteration 1034 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00051 |       0.00000 |       0.01312 |       0.00023 |       0.08894
      0.00091 |       0.00000 |       0.01235 |       0.00023 |       0.08927
      0.00058 |       0.00000 |       0.01216 |       0.00022 |       0.08941
      0.00044 |       0.00000 |       0.01171 |       0.00025 |       0.08939
      0.00070 |       0.00000 |       0.01148 |       0.00026 |       0.08959
     2.39e-06 |       0.00000 |       0.01135 |       0.00026 |       0.08934
      0.00034 |       0.00000 |       0.01119 |       0.00026 |       0.08908
      0.00020 |       0.00000 |       0.01102 |       0.00024 |       0.08919
      0.00030 |       0.00000 |       0.01089 |       0.00024 |       0.08941
    -3.83e-05 |       0.00000 |       0.01077 |       0.00027 |       0.08939
Evaluating losses...
      0.00029 |       0.00000 |       0.01076 |       0.00022 |     

      0.00041 |       0.00000 |       0.01314 |       0.00020 |       0.07572
      0.00022 |       0.00000 |       0.01310 |       0.00021 |       0.07589
      0.00019 |       0.00000 |       0.01298 |       0.00020 |       0.07557
      0.00022 |       0.00000 |       0.01290 |       0.00020 |       0.07582
Evaluating losses...
      0.00050 |       0.00000 |       0.01274 |       0.00024 |       0.07548
-----------------------------------
| EpLenMean       | 672           |
| EpRewMean       | -4.87         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 6361          |
| TimeElapsed     | 7.99e+03      |
| TimestepsSoFar  | 4259840       |
| ev_tdlam_before | 0.884         |
| loss_ent        | 0.07548081    |
| loss_kl         | 0.00024016957 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0004980662  |
| loss_vf_loss    | 0.012738672   |
-----------------------------------
********** Iteration 1040 ************
Optimizing...
     pol_surr |    pol_entpen | 

********** Iteration 1045 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00058 |       0.00000 |       0.01232 |       0.00024 |       0.09197
      0.00083 |       0.00000 |       0.01189 |       0.00022 |       0.09184
      0.00040 |       0.00000 |       0.01159 |       0.00023 |       0.09211
      0.00053 |       0.00000 |       0.01132 |       0.00024 |       0.09201
      0.00065 |       0.00000 |       0.01115 |       0.00025 |       0.09215
      0.00069 |       0.00000 |       0.01104 |       0.00021 |       0.09197
      0.00035 |       0.00000 |       0.01094 |       0.00023 |       0.09208
      0.00073 |       0.00000 |       0.01083 |       0.00025 |       0.09190
      0.00025 |       0.00000 |       0.01067 |       0.00023 |       0.09189
      0.00040 |       0.00000 |       0.01060 |       0.00023 |       0.09180
Evaluating losses...
      0.00046 |       0.00000 |       0.01057 |       0.00023 |     

      0.00053 |       0.00000 |       0.01230 |       0.00022 |       0.08607
      0.00057 |       0.00000 |       0.01218 |       0.00022 |       0.08588
      0.00029 |       0.00000 |       0.01202 |       0.00022 |       0.08621
Evaluating losses...
      0.00036 |       0.00000 |       0.01194 |       0.00020 |       0.08589
-----------------------------------
| EpLenMean       | 697           |
| EpRewMean       | -4.85         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6425          |
| TimeElapsed     | 8.09e+03      |
| TimestepsSoFar  | 4304896       |
| ev_tdlam_before | 0.885         |
| loss_ent        | 0.0858883     |
| loss_kl         | 0.00020200516 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0003612698  |
| loss_vf_loss    | 0.011935248   |
-----------------------------------
********** Iteration 1051 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00042 |       0.00000 | 

********** Iteration 1056 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00065 |       0.00000 |       0.01499 |       0.00022 |       0.08422
      0.00063 |       0.00000 |       0.01441 |       0.00022 |       0.08409
      0.00026 |       0.00000 |       0.01402 |       0.00020 |       0.08403
      0.00043 |       0.00000 |       0.01383 |       0.00022 |       0.08423
      0.00062 |       0.00000 |       0.01353 |       0.00022 |       0.08421
      0.00070 |       0.00000 |       0.01341 |       0.00023 |       0.08399
      0.00026 |       0.00000 |       0.01326 |       0.00023 |       0.08401
     9.95e-05 |       0.00000 |       0.01315 |       0.00020 |       0.08412
      0.00052 |       0.00000 |       0.01297 |       0.00021 |       0.08420
      0.00047 |       0.00000 |       0.01281 |       0.00022 |       0.08409
Evaluating losses...
      0.00023 |       0.00000 |       0.01268 |       0.00021 |     

      0.00053 |       0.00000 |       0.01038 |       0.00025 |       0.08736
      0.00064 |       0.00000 |       0.01017 |       0.00026 |       0.08720
      0.00034 |       0.00000 |       0.01024 |       0.00027 |       0.08734
Evaluating losses...
     6.31e-05 |       0.00000 |       0.01015 |       0.00025 |       0.08719
-----------------------------------
| EpLenMean       | 688           |
| EpRewMean       | -4.77         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6490          |
| TimeElapsed     | 8.18e+03      |
| TimestepsSoFar  | 4349952       |
| ev_tdlam_before | 0.907         |
| loss_ent        | 0.08718636    |
| loss_kl         | 0.00025464935 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 6.312414e-05  |
| loss_vf_loss    | 0.010147874   |
-----------------------------------
********** Iteration 1062 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00076 |       0.00000 | 

********** Iteration 1067 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00064 |       0.00000 |       0.01520 |       0.00020 |       0.08634
      0.00080 |       0.00000 |       0.01483 |       0.00023 |       0.08607
      0.00082 |       0.00000 |       0.01445 |       0.00025 |       0.08626
      0.00019 |       0.00000 |       0.01419 |       0.00021 |       0.08633
      0.00043 |       0.00000 |       0.01399 |       0.00026 |       0.08660
      0.00027 |       0.00000 |       0.01380 |       0.00024 |       0.08655
      0.00043 |       0.00000 |       0.01363 |       0.00024 |       0.08643
      0.00087 |       0.00000 |       0.01352 |       0.00024 |       0.08663
      0.00028 |       0.00000 |       0.01340 |       0.00024 |       0.08645
      0.00033 |       0.00000 |       0.01322 |       0.00023 |       0.08692
Evaluating losses...
      0.00034 |       0.00000 |       0.01326 |       0.00021 |     

      0.00030 |       0.00000 |       0.01309 |       0.00025 |       0.09160
      0.00025 |       0.00000 |       0.01299 |       0.00024 |       0.09152
      0.00023 |       0.00000 |       0.01289 |       0.00023 |       0.09146
Evaluating losses...
      0.00036 |       0.00000 |       0.01282 |       0.00024 |       0.09121
-----------------------------------
| EpLenMean       | 709           |
| EpRewMean       | -4.74         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6553          |
| TimeElapsed     | 8.27e+03      |
| TimestepsSoFar  | 4395008       |
| ev_tdlam_before | 0.895         |
| loss_ent        | 0.091205806   |
| loss_kl         | 0.0002381654  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00035636488 |
| loss_vf_loss    | 0.012824744   |
-----------------------------------
********** Iteration 1073 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 | 

********** Iteration 1078 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00081 |       0.00000 |       0.01544 |       0.00022 |       0.09034
      0.00031 |       0.00000 |       0.01461 |       0.00022 |       0.09030
      0.00084 |       0.00000 |       0.01417 |       0.00020 |       0.09036
      0.00044 |       0.00000 |       0.01383 |       0.00022 |       0.09044
      0.00059 |       0.00000 |       0.01359 |       0.00024 |       0.09029
      0.00048 |       0.00000 |       0.01350 |       0.00024 |       0.09033
      0.00035 |       0.00000 |       0.01319 |       0.00022 |       0.09062
      0.00074 |       0.00000 |       0.01315 |       0.00021 |       0.09083
      0.00053 |       0.00000 |       0.01313 |       0.00022 |       0.09067
      0.00037 |       0.00000 |       0.01301 |       0.00023 |       0.09083
Evaluating losses...
      0.00033 |       0.00000 |       0.01288 |       0.00024 |     

      0.00022 |       0.00000 |       0.01372 |       0.00023 |       0.07738
      0.00021 |       0.00000 |       0.01374 |       0.00017 |       0.07746
      0.00025 |       0.00000 |       0.01367 |       0.00020 |       0.07737
Evaluating losses...
      0.00026 |       0.00000 |       0.01360 |       0.00023 |       0.07771
-----------------------------------
| EpLenMean       | 677           |
| EpRewMean       | -4.79         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 6621          |
| TimeElapsed     | 8.36e+03      |
| TimestepsSoFar  | 4440064       |
| ev_tdlam_before | 0.875         |
| loss_ent        | 0.077713884   |
| loss_kl         | 0.0002312171  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00025618562 |
| loss_vf_loss    | 0.01360411    |
-----------------------------------
********** Iteration 1084 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00063 |       0.00000 | 

********** Iteration 1089 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00080 |       0.00000 |       0.01123 |       0.00020 |       0.09092
      0.00031 |       0.00000 |       0.01089 |       0.00021 |       0.09106
      0.00121 |       0.00000 |       0.01061 |       0.00023 |       0.09101
      0.00074 |       0.00000 |       0.01049 |       0.00019 |       0.09087
      0.00106 |       0.00000 |       0.01042 |       0.00023 |       0.09067
      0.00081 |       0.00000 |       0.01036 |       0.00024 |       0.09103
      0.00055 |       0.00000 |       0.01021 |       0.00026 |       0.09036
      0.00110 |       0.00000 |       0.01008 |       0.00024 |       0.09036
      0.00091 |       0.00000 |       0.01008 |       0.00027 |       0.09048
      0.00072 |       0.00000 |       0.01000 |       0.00024 |       0.09062
Evaluating losses...
      0.00092 |       0.00000 |       0.00989 |       0.00024 |     

      0.00085 |       0.00000 |       0.01064 |       0.00024 |       0.09024
      0.00074 |       0.00000 |       0.01061 |       0.00022 |       0.09038
      0.00063 |       0.00000 |       0.01054 |       0.00026 |       0.09018
Evaluating losses...
      0.00078 |       0.00000 |       0.01047 |       0.00025 |       0.09007
-----------------------------------
| EpLenMean       | 693           |
| EpRewMean       | -4.73         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 6685          |
| TimeElapsed     | 8.41e+03      |
| TimestepsSoFar  | 4485120       |
| ev_tdlam_before | 0.888         |
| loss_ent        | 0.09006618    |
| loss_kl         | 0.00024913266 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00077968556 |
| loss_vf_loss    | 0.010468809   |
-----------------------------------
********** Iteration 1095 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00082 |       0.00000 | 

********** Iteration 1100 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00100 |       0.00000 |       0.01443 |       0.00021 |       0.08979
      0.00094 |       0.00000 |       0.01428 |       0.00021 |       0.08962
      0.00081 |       0.00000 |       0.01402 |       0.00023 |       0.08944
      0.00053 |       0.00000 |       0.01400 |       0.00019 |       0.08944
      0.00053 |       0.00000 |       0.01383 |       0.00021 |       0.08928
      0.00055 |       0.00000 |       0.01363 |       0.00026 |       0.08914
      0.00077 |       0.00000 |       0.01356 |       0.00021 |       0.08934
      0.00052 |       0.00000 |       0.01344 |       0.00024 |       0.08906
      0.00051 |       0.00000 |       0.01341 |       0.00024 |       0.08938
      0.00039 |       0.00000 |       0.01329 |       0.00023 |       0.08928
Evaluating losses...
      0.00054 |       0.00000 |       0.01323 |       0.00023 |     

      0.00042 |       0.00000 |       0.01124 |       0.00018 |       0.07531
      0.00051 |       0.00000 |       0.01120 |       0.00017 |       0.07533
      0.00046 |       0.00000 |       0.01113 |       0.00021 |       0.07537
Evaluating losses...
      0.00031 |       0.00000 |       0.01109 |       0.00021 |       0.07529
-----------------------------------
| EpLenMean       | 714           |
| EpRewMean       | -4.81         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6749          |
| TimeElapsed     | 8.46e+03      |
| TimestepsSoFar  | 4530176       |
| ev_tdlam_before | 0.898         |
| loss_ent        | 0.07528651    |
| loss_kl         | 0.00020640144 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00030827709 |
| loss_vf_loss    | 0.0110947     |
-----------------------------------
********** Iteration 1106 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00169 |       0.00000 | 

********** Iteration 1111 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00106 |       0.00000 |       0.01633 |       0.00019 |       0.08868
      0.00083 |       0.00000 |       0.01604 |       0.00020 |       0.08853
      0.00142 |       0.00000 |       0.01576 |       0.00022 |       0.08843
      0.00045 |       0.00000 |       0.01558 |       0.00021 |       0.08840
      0.00086 |       0.00000 |       0.01520 |       0.00021 |       0.08836
      0.00024 |       0.00000 |       0.01530 |       0.00021 |       0.08835
      0.00060 |       0.00000 |       0.01523 |       0.00021 |       0.08846
      0.00064 |       0.00000 |       0.01478 |       0.00022 |       0.08845
      0.00075 |       0.00000 |       0.01483 |       0.00023 |       0.08818
      0.00092 |       0.00000 |       0.01481 |       0.00021 |       0.08839
Evaluating losses...
      0.00089 |       0.00000 |       0.01482 |       0.00024 |     

      0.00074 |       0.00000 |       0.01403 |       0.00021 |       0.08741
      0.00051 |       0.00000 |       0.01397 |       0.00021 |       0.08706
      0.00068 |       0.00000 |       0.01396 |       0.00022 |       0.08710
Evaluating losses...
      0.00059 |       0.00000 |       0.01390 |       0.00020 |       0.08707
-----------------------------------
| EpLenMean       | 685           |
| EpRewMean       | -4.8          |
| EpThisIter      | 5             |
| EpisodesSoFar   | 6816          |
| TimeElapsed     | 8.51e+03      |
| TimestepsSoFar  | 4575232       |
| ev_tdlam_before | 0.875         |
| loss_ent        | 0.08707449    |
| loss_kl         | 0.00020015848 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0005924642  |
| loss_vf_loss    | 0.013899901   |
-----------------------------------
********** Iteration 1117 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 | 

********** Iteration 1122 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00076 |       0.00000 |       0.01413 |       0.00021 |       0.08953
      0.00104 |       0.00000 |       0.01375 |       0.00021 |       0.08977
      0.00075 |       0.00000 |       0.01333 |       0.00020 |       0.08987
      0.00063 |       0.00000 |       0.01310 |       0.00019 |       0.09010
      0.00091 |       0.00000 |       0.01298 |       0.00020 |       0.08994
      0.00063 |       0.00000 |       0.01287 |       0.00022 |       0.09014
      0.00065 |       0.00000 |       0.01276 |       0.00021 |       0.09008
      0.00056 |       0.00000 |       0.01266 |       0.00022 |       0.09013
      0.00104 |       0.00000 |       0.01259 |       0.00024 |       0.09033
      0.00087 |       0.00000 |       0.01262 |       0.00023 |       0.09035
Evaluating losses...
      0.00079 |       0.00000 |       0.01239 |       0.00022 |     

      0.00054 |       0.00000 |       0.01406 |       0.00020 |       0.07665
      0.00071 |       0.00000 |       0.01414 |       0.00018 |       0.07648
      0.00048 |       0.00000 |       0.01397 |       0.00019 |       0.07665
Evaluating losses...
      0.00059 |       0.00000 |       0.01409 |       0.00017 |       0.07666
-----------------------------------
| EpLenMean       | 679           |
| EpRewMean       | -4.8          |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6882          |
| TimeElapsed     | 8.56e+03      |
| TimestepsSoFar  | 4620288       |
| ev_tdlam_before | 0.892         |
| loss_ent        | 0.0766636     |
| loss_kl         | 0.00016765851 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00058643776 |
| loss_vf_loss    | 0.014090679   |
-----------------------------------
********** Iteration 1128 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00055 |       0.00000 | 

********** Iteration 1133 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00048 |       0.00000 |       0.01223 |       0.00019 |       0.08388
      0.00081 |       0.00000 |       0.01203 |       0.00018 |       0.08401
      0.00040 |       0.00000 |       0.01185 |       0.00019 |       0.08393
      0.00055 |       0.00000 |       0.01179 |       0.00018 |       0.08399
      0.00062 |       0.00000 |       0.01160 |       0.00018 |       0.08409
      0.00060 |       0.00000 |       0.01149 |       0.00018 |       0.08406
      0.00069 |       0.00000 |       0.01141 |       0.00019 |       0.08422
      0.00056 |       0.00000 |       0.01147 |       0.00020 |       0.08399
      0.00067 |       0.00000 |       0.01139 |       0.00021 |       0.08420
      0.00029 |       0.00000 |       0.01126 |       0.00021 |       0.08399
Evaluating losses...
      0.00056 |       0.00000 |       0.01138 |       0.00020 |     

      0.00085 |       0.00000 |       0.01282 |       0.00019 |       0.08440
      0.00054 |       0.00000 |       0.01272 |       0.00021 |       0.08446
      0.00087 |       0.00000 |       0.01267 |       0.00019 |       0.08482
Evaluating losses...
      0.00048 |       0.00000 |       0.01271 |       0.00022 |       0.08468
-----------------------------------
| EpLenMean       | 683           |
| EpRewMean       | -4.76         |
| EpThisIter      | 6             |
| EpisodesSoFar   | 6947          |
| TimeElapsed     | 8.61e+03      |
| TimestepsSoFar  | 4665344       |
| ev_tdlam_before | 0.903         |
| loss_ent        | 0.08468132    |
| loss_kl         | 0.00022093649 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00048443885 |
| loss_vf_loss    | 0.012705399   |
-----------------------------------
********** Iteration 1139 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00084 |       0.00000 | 

********** Iteration 1144 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00120 |       0.00000 |       0.01568 |       0.00019 |       0.08279
      0.00218 |       0.00000 |       0.01543 |       0.00019 |       0.08277
      0.00081 |       0.00000 |       0.01518 |       0.00017 |       0.08313
      0.00089 |       0.00000 |       0.01505 |       0.00022 |       0.08305
      0.00122 |       0.00000 |       0.01499 |       0.00020 |       0.08307
      0.00089 |       0.00000 |       0.01482 |       0.00021 |       0.08322
      0.00044 |       0.00000 |       0.01490 |       0.00021 |       0.08342
      0.00096 |       0.00000 |       0.01469 |       0.00022 |       0.08330
      0.00071 |       0.00000 |       0.01460 |       0.00021 |       0.08338
      0.00035 |       0.00000 |       0.01448 |       0.00021 |       0.08337
Evaluating losses...
      0.00087 |       0.00000 |       0.01430 |       0.00022 |     

      0.00075 |       0.00000 |       0.01301 |       0.00020 |       0.08616
      0.00117 |       0.00000 |       0.01285 |       0.00019 |       0.08620
      0.00082 |       0.00000 |       0.01285 |       0.00022 |       0.08618
Evaluating losses...
      0.00085 |       0.00000 |       0.01285 |       0.00022 |       0.08629
----------------------------------
| EpLenMean       | 679          |
| EpRewMean       | -4.73        |
| EpThisIter      | 7            |
| EpisodesSoFar   | 7014         |
| TimeElapsed     | 8.66e+03     |
| TimestepsSoFar  | 4710400      |
| ev_tdlam_before | 0.882        |
| loss_ent        | 0.08629009   |
| loss_kl         | 0.0002173365 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.0008534854 |
| loss_vf_loss    | 0.012850197  |
----------------------------------
********** Iteration 1150 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00081 |       0.00000 |       0.01656 

********** Iteration 1155 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00133 |       0.00000 |       0.01563 |       0.00021 |       0.08387
      0.00090 |       0.00000 |       0.01539 |       0.00021 |       0.08393
      0.00086 |       0.00000 |       0.01512 |       0.00022 |       0.08367
      0.00088 |       0.00000 |       0.01519 |       0.00019 |       0.08393
      0.00103 |       0.00000 |       0.01496 |       0.00020 |       0.08393
      0.00105 |       0.00000 |       0.01496 |       0.00019 |       0.08389
      0.00159 |       0.00000 |       0.01475 |       0.00019 |       0.08396
      0.00088 |       0.00000 |       0.01482 |       0.00023 |       0.08372
      0.00140 |       0.00000 |       0.01471 |       0.00022 |       0.08385
      0.00085 |       0.00000 |       0.01462 |       0.00020 |       0.08377
Evaluating losses...
      0.00078 |       0.00000 |       0.01461 |       0.00022 |     

      0.00066 |       0.00000 |       0.01289 |       0.00022 |       0.08777
      0.00063 |       0.00000 |       0.01284 |       0.00020 |       0.08808
      0.00066 |       0.00000 |       0.01274 |       0.00022 |       0.08782
      0.00070 |       0.00000 |       0.01267 |       0.00023 |       0.08797
Evaluating losses...
      0.00053 |       0.00000 |       0.01255 |       0.00020 |       0.08779
-----------------------------------
| EpLenMean       | 661           |
| EpRewMean       | -4.83         |
| EpThisIter      | 5             |
| EpisodesSoFar   | 7081          |
| TimeElapsed     | 8.71e+03      |
| TimestepsSoFar  | 4755456       |
| ev_tdlam_before | 0.893         |
| loss_ent        | 0.087789774   |
| loss_kl         | 0.00019773688 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0005296958  |
| loss_vf_loss    | 0.012546426   |
-----------------------------------
********** Iteration 1161 ************
Optimizing...
     pol_surr |    pol_entpen | 

********** Iteration 1166 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00086 |       0.00000 |       0.01447 |       0.00020 |       0.08467
      0.00051 |       0.00000 |       0.01435 |       0.00020 |       0.08472
      0.00065 |       0.00000 |       0.01436 |       0.00019 |       0.08463
      0.00070 |       0.00000 |       0.01422 |       0.00021 |       0.08456
      0.00095 |       0.00000 |       0.01422 |       0.00022 |       0.08469
      0.00076 |       0.00000 |       0.01405 |       0.00019 |       0.08480
      0.00083 |       0.00000 |       0.01403 |       0.00021 |       0.08467
      0.00135 |       0.00000 |       0.01401 |       0.00023 |       0.08475
      0.00117 |       0.00000 |       0.01392 |       0.00022 |       0.08451
      0.00081 |       0.00000 |       0.01369 |       0.00022 |       0.08440
Evaluating losses...
      0.00098 |       0.00000 |       0.01381 |       0.00023 |     

      0.00096 |       0.00000 |       0.01542 |       0.00024 |       0.09581
      0.00094 |       0.00000 |       0.01535 |       0.00021 |       0.09560
      0.00124 |       0.00000 |       0.01519 |       0.00024 |       0.09585
      0.00109 |       0.00000 |       0.01526 |       0.00023 |       0.09571
Evaluating losses...
      0.00088 |       0.00000 |       0.01507 |       0.00022 |       0.09603
----------------------------------
| EpLenMean       | 697          |
| EpRewMean       | -4.75        |
| EpThisIter      | 6            |
| EpisodesSoFar   | 7146         |
| TimeElapsed     | 8.76e+03     |
| TimestepsSoFar  | 4800512      |
| ev_tdlam_before | 0.87         |
| loss_ent        | 0.096028835  |
| loss_kl         | 0.0002155645 |
| loss_pol_entpen | 0.0          |
| loss_pol_surr   | 0.000880338  |
| loss_vf_loss    | 0.015065014  |
----------------------------------
********** Iteration 1172 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss 

********** Iteration 1177 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 |       0.01356 |       0.00018 |       0.08592
      0.00116 |       0.00000 |       0.01347 |       0.00018 |       0.08555
      0.00065 |       0.00000 |       0.01330 |       0.00020 |       0.08568
      0.00117 |       0.00000 |       0.01330 |       0.00021 |       0.08589
      0.00096 |       0.00000 |       0.01330 |       0.00019 |       0.08557
      0.00101 |       0.00000 |       0.01316 |       0.00017 |       0.08561
      0.00068 |       0.00000 |       0.01317 |       0.00019 |       0.08556
      0.00096 |       0.00000 |       0.01301 |       0.00018 |       0.08560
      0.00141 |       0.00000 |       0.01298 |       0.00021 |       0.08548
      0.00117 |       0.00000 |       0.01301 |       0.00021 |       0.08543
Evaluating losses...
      0.00068 |       0.00000 |       0.01294 |       0.00018 |     

      0.00061 |       0.00000 |       0.01149 |       0.00017 |       0.07212
      0.00066 |       0.00000 |       0.01144 |       0.00016 |       0.07201
      0.00066 |       0.00000 |       0.01145 |       0.00019 |       0.07206
Evaluating losses...
      0.00063 |       0.00000 |       0.01144 |       0.00019 |       0.07205
-----------------------------------
| EpLenMean       | 676           |
| EpRewMean       | -4.77         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 7214          |
| TimeElapsed     | 8.81e+03      |
| TimestepsSoFar  | 4845568       |
| ev_tdlam_before | 0.901         |
| loss_ent        | 0.07204836    |
| loss_kl         | 0.0001898561  |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.00063069235 |
| loss_vf_loss    | 0.011438007   |
-----------------------------------
********** Iteration 1183 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00085 |       0.00000 | 

********** Iteration 1188 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00089 |       0.00000 |       0.01156 |       0.00019 |       0.08256
      0.00066 |       0.00000 |       0.01131 |       0.00019 |       0.08247
      0.00084 |       0.00000 |       0.01131 |       0.00017 |       0.08255
      0.00130 |       0.00000 |       0.01128 |       0.00019 |       0.08259
      0.00090 |       0.00000 |       0.01127 |       0.00020 |       0.08244
      0.00086 |       0.00000 |       0.01118 |       0.00018 |       0.08270
      0.00095 |       0.00000 |       0.01109 |       0.00016 |       0.08265
      0.00083 |       0.00000 |       0.01102 |       0.00020 |       0.08249
      0.00090 |       0.00000 |       0.01120 |       0.00020 |       0.08260
      0.00091 |       0.00000 |       0.01108 |       0.00020 |       0.08269
Evaluating losses...
      0.00084 |       0.00000 |       0.01099 |       0.00020 |     

      0.00152 |       0.00000 |       0.01490 |       0.00021 |       0.08845
      0.00094 |       0.00000 |       0.01488 |       0.00020 |       0.08852
      0.00111 |       0.00000 |       0.01481 |       0.00023 |       0.08870
Evaluating losses...
      0.00113 |       0.00000 |       0.01485 |       0.00022 |       0.08875
-----------------------------------
| EpLenMean       | 669           |
| EpRewMean       | -4.86         |
| EpThisIter      | 4             |
| EpisodesSoFar   | 7281          |
| TimeElapsed     | 8.86e+03      |
| TimestepsSoFar  | 4890624       |
| ev_tdlam_before | 0.868         |
| loss_ent        | 0.08875399    |
| loss_kl         | 0.00022062495 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0011267677  |
| loss_vf_loss    | 0.0148496255  |
-----------------------------------
********** Iteration 1194 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00074 |       0.00000 | 

********** Iteration 1199 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00107 |       0.00000 |       0.01409 |       0.00015 |       0.06951
      0.00091 |       0.00000 |       0.01403 |       0.00018 |       0.06941
      0.00099 |       0.00000 |       0.01400 |       0.00015 |       0.06954
      0.00078 |       0.00000 |       0.01392 |       0.00015 |       0.06948
      0.00079 |       0.00000 |       0.01404 |       0.00018 |       0.06943
      0.00099 |       0.00000 |       0.01388 |       0.00016 |       0.06931
      0.00087 |       0.00000 |       0.01394 |       0.00017 |       0.06935
      0.00083 |       0.00000 |       0.01404 |       0.00016 |       0.06937
      0.00065 |       0.00000 |       0.01377 |       0.00016 |       0.06906
      0.00110 |       0.00000 |       0.01389 |       0.00018 |       0.06928
Evaluating losses...
      0.00117 |       0.00000 |       0.01382 |       0.00016 |     

      0.00072 |       0.00000 |       0.01232 |       0.00015 |       0.07719
      0.00097 |       0.00000 |       0.01241 |       0.00019 |       0.07712
      0.00087 |       0.00000 |       0.01239 |       0.00017 |       0.07703
Evaluating losses...
      0.00092 |       0.00000 |       0.01230 |       0.00017 |       0.07714
-----------------------------------
| EpLenMean       | 668           |
| EpRewMean       | -4.82         |
| EpThisIter      | 7             |
| EpisodesSoFar   | 7350          |
| TimeElapsed     | 8.91e+03      |
| TimestepsSoFar  | 4935680       |
| ev_tdlam_before | 0.891         |
| loss_ent        | 0.07714387    |
| loss_kl         | 0.00017476903 |
| loss_pol_entpen | 0.0           |
| loss_pol_surr   | 0.0009189572  |
| loss_vf_loss    | 0.012295365   |
-----------------------------------
********** Iteration 1205 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
      0.00114 |       0.00000 | 

KeyboardInterrupt: 

In [165]:
obs = env.reset()
done = False
total_reward = 0

while not done:
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
    env.render()
env.close()
print("score:", total_reward)

score: -5
