# Tutorial 03: Running RLlib Experiments

This tutorial walks you through the process of running traffic simulations in Flow with trainable RLlib-powered agents. Autonomous agents will learn to maximize a certain reward over the rollouts, using the [**RLlib**](https://ray.readthedocs.io/en/latest/rllib.html) library ([citation](https://arxiv.org/abs/1712.09381)) ([installation instructions](https://flow.readthedocs.io/en/latest/flow_setup.html#optional-install-ray-rllib)). Simulations of this form will depict the propensity of RL agents to influence the traffic of a human fleet in order to make the whole fleet more efficient (for some given metrics). 

In this tutorial, we simulate an initially perturbed single-lane ring road into which we introduce a single autonomous vehicle. After some training, the autonomous vehicle learns to dissipate the "phantom jams" that form and propagate when only human driver dynamics are involved.

## 1. Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *network* and an *environment*. Networks describe the features of the transportation network used in simulation. This includes the positions and properties of the nodes and edges constituting the lanes and junctions, as well as properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the network. Moreover, custom environments may be used to modify the dynamical features of a network. Finally, in the RL case, it is in the *environment* that the state/action spaces and the reward function are defined.
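
To make the environment's role concrete, here is a minimal sketch of a Gym-style interface in plain Python. It is not Flow's `Env` class: `ToyRingEnv`, its single-speed state, and its reward are hypothetical and exist only to illustrate the initialize/reset/advance cycle and where the action bounds and reward live.

```python
class ToyRingEnv:
    """Hypothetical stand-in for an environment; not part of Flow."""

    def __init__(self, horizon=5, sim_step=0.1, max_accel=1.0):
        self.horizon = horizon      # number of steps in one rollout
        self.sim_step = sim_step    # seconds advanced per step
        self.max_accel = max_accel  # bound of the (toy) action space
        self.reset()

    def reset(self):
        """Return the simulation to its initial state and give the first observation."""
        self.t = 0
        self.speed = 0.0
        return [self.speed]

    def step(self, accel):
        """Advance one step: apply the clipped action, then report state and reward."""
        accel = max(-self.max_accel, min(self.max_accel, accel))
        self.speed = max(0.0, self.speed + accel * self.sim_step)
        self.t += 1
        reward = self.speed            # toy reward: favor higher speed
        done = self.t >= self.horizon  # the rollout ends at the horizon
        return [self.speed], reward, done, {}


env = ToyRingEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, _ = env.step(1.0)
    total_reward += reward
```

The RL algorithm only ever sees `reset` and `step`, which is why the state/action spaces and the reward are defined in the environment rather than in the network.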

## 2. Setting up a Network
Flow contains a plethora of pre-designed networks used to replicate highways, intersections, and merges in both closed and open settings. All these networks are located in flow/networks. For this tutorial, which involves a single lane ring road, we will use the network `RingNetwork`.

### 2.1 Setting up Network Parameters

The network mentioned at the start of this section, as well as all other networks in Flow, are parameterized by the following arguments: 
* name
* vehicles
* net_params
* initial_config

These parameters are explained in detail in `tutorial01_sumo.ipynb`. Moreover, all parameters excluding vehicles (covered in section 2.2) do not change from the previous tutorial. Accordingly, we specify them nearly as we have before, and leave further explanations of the parameters to `tutorial01_sumo.ipynb`.

We begin by choosing the network the experiment will be trained on. We use one of Flow's built-in networks, located in `flow.networks`. A list of all available networks can be printed by uncommenting the `print` statement in the script below.

In [1]:
import flow.networks as networks

# print(networks.__all__)

In this tutorial, we choose to use the ring road network. The network class is then:

In [2]:
from flow.networks import RingNetwork

# ring road network class
network_name = RingNetwork

One key difference between SUMO and RLlib experiments is that, in RLlib experiments, the network classes do not need to be instantiated; instead, users simply name the network class they wish to use. Later on, an environment setup module will import the correct network class based on the provided names.

In [3]:
# input parameter classes to the network class
from flow.core.params import NetParams, InitialConfig

# name of the network
name = "c_mpg+plus"

# network-specific parameters
from flow.networks.ring import ADDITIONAL_NET_PARAMS
net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)

# initial configuration to vehicles
initial_config = InitialConfig(spacing="uniform", perturbation=1)

### 2.2 Adding Trainable Autonomous Vehicles
The `Vehicles` class stores state information on all vehicles in the network. This class is used to identify the dynamical features of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various `get` methods within this class.

The dynamics of vehicles in the `Vehicles` class can be governed either by SUMO or by the dynamical models located in flow/controllers. For human-driven vehicles, we use the IDM model for acceleration behavior, with exogenous Gaussian acceleration noise with a standard deviation of 0.2 m/s² to induce perturbations that produce stop-and-go behavior. In addition, we use the `ContinuousRouter` routing controller so that the vehicles maintain their routes in closed networks.
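
As a rough illustration of the car-following model named above, the sketch below implements the standard Intelligent Driver Model (IDM) acceleration formula with optional Gaussian noise. It is a standalone sketch, not Flow's `IDMController`; the function name `idm_accel` and all default parameter values are assumptions chosen for illustration.

```python
import math
import random


def idm_accel(v, v_lead, headway, v0=30.0, T=1.0, a=1.0, b=1.5,
              delta=4, s0=2.0, noise_std=0.0):
    """Intelligent Driver Model acceleration with optional Gaussian noise.

    Parameter values are illustrative assumptions, not read from Flow.
    """
    # desired gap: standstill distance + time-headway term + braking term
    s_star = s0 + max(0.0, v * T + v * (v - v_lead) / (2 * math.sqrt(a * b)))
    accel = a * (1 - (v / v0) ** delta - (s_star / headway) ** 2)
    # exogenous acceleration noise perturbs otherwise uniform flow
    return accel + random.gauss(0.0, noise_std)


# free road: a stationary vehicle far from its leader accelerates at about `a`
free = idm_accel(v=0.0, v_lead=0.0, headway=1e6)
# standstill at the minimum gap: the model returns zero acceleration
jam = idm_accel(v=0.0, v_lead=0.0, headway=2.0)
# same state with a noise standard deviation of 0.2 m/s^2
noisy = idm_accel(v=0.0, v_lead=0.0, headway=2.0, noise_std=0.2)
```

Without noise, vehicles started at uniform spacing can settle into a steady flow; the noise term is what keeps nudging them out of it.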

As we have done in `tutorial01_sumo.ipynb`, human-driven vehicles are defined in the `VehicleParams` class as follows:

In [4]:
# vehicles class
from flow.core.params import VehicleParams

# vehicles dynamics models
from flow.controllers import IDMController, ContinuousRouter

vehicles = VehicleParams()
#vehicles.add("human",
#             acceleration_controller=(IDMController, {}),
#             routing_controller=(ContinuousRouter, {}),
#             num_vehicles=10)

The `vehicles.add` call above describes only the human-driven vehicles; note that it is commented out in this notebook, so no human-driven vehicles are added in the run recorded here. We now add a trainable autonomous vehicle whose actions are dictated by an RL agent. This is done by specifying an `RLController` as the acceleration controller of the vehicle.

In [5]:
from flow.controllers import RLController

Note that this controller serves primarily as a placeholder that marks the vehicle as a component of the RL agent, meaning that lane changing and routing actions can also be specified by the RL agent for this vehicle.

We finally add the vehicle as follows, while again using the `ContinuousRouter` to perpetually maintain the vehicle within the network.

In [6]:
# from flow.energy_models.toyota_energy import TacomaEnergy
# vehicles.add(veh_id="rl",
#              acceleration_controller=(RLController, {}),
#              routing_controller=(ContinuousRouter, {}),
#              initial_speed =20,
#              energy_model = TacomaEnergy,
#              num_vehicles=1)


vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             initial_speed=0,
             num_vehicles=1)

## 3. Setting up an Environment

Several environments in Flow exist to train RL agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. The environment lets us view the cumulative reward that simulation rollouts receive, and it specifies the state/action spaces.

SUMO environments in Flow are parametrized by three components:
* `SumoParams`
* `EnvParams`
* `Network`

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of any simulation step and whether to render the GUI when running the experiment. For this example, we consider a simulation step length of 0.1s and deactivate the GUI. 

**Note:** For training purposes, it is highly recommended to deactivate the GUI in order to avoid slowing down the simulation. To do so, specify `render=False`.

In [7]:
from flow.core.params import SumoParams

sim_params = SumoParams(sim_step=0.1, render=False)

### 3.2 EnvParams

`EnvParams` specifies environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. For the environment `WaveAttenuationPOEnv`, these parameters are used to dictate bounds on the accelerations of the autonomous vehicles, as well as the range of ring lengths (and accordingly network densities) the agent is trained on.

Finally, it is important to specify here the *horizon* of the experiment, which is the duration of one episode in simulation steps (during which the RL agent acquires data).

In [8]:
from flow.core.params import EnvParams

# Define horizon as a variable to ensure consistent use across notebook
HORIZON = 2500

env_params = EnvParams(
    # length of one rollout
    horizon=HORIZON,

    additional_params={
        # maximum acceleration of autonomous vehicles
        "max_accel": 4,
        # maximum deceleration of autonomous vehicles
        "max_decel": -4,
        # bounds on the ranges of ring road lengths the autonomous vehicle 
        # is trained on
        "ring_length": [220, 270],
    },
)
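
The short sketch below works through what these settings imply. The simulated duration of a rollout follows directly from `sim_step` and `HORIZON`. The sampling rule in `sample_ring_length` is an assumption (a uniform integer draw within the bounds), consistent with the integer `ring length` values printed in the training output later in this notebook, but it is not taken from the environment's source.

```python
import random

SIM_STEP = 0.1            # seconds per step, as in SumoParams above
HORIZON = 2500            # steps per rollout, as in EnvParams above
RING_LENGTH = [220, 270]  # bounds in meters, as in additional_params above

# simulated time covered by one rollout
rollout_seconds = HORIZON * SIM_STEP


def sample_ring_length(bounds, rng=random):
    """Assumed behavior: draw an integer length uniformly within the bounds."""
    low, high = bounds
    return rng.randint(low, high)


rng = random.Random(0)  # seeded only to make the sketch repeatable
lengths = [sample_ring_length(RING_LENGTH, rng) for _ in range(5)]
```

Each rollout therefore covers about 250 seconds of simulated traffic, on a ring whose length (and hence vehicle density) can change from one reset to the next.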

### 3.3 Initializing a Gym Environment

Now, we have to specify our Gym environment and the algorithm that our RL agents will use. As with the network, we choose one of Flow's built-in environments, a list of which is provided by the script below.

In [9]:
import flow.envs as flowenvs

print(flowenvs.__all__)

['Env', 'AccelEnv', 'LaneChangeAccelEnv', 'LaneChangeAccelPOEnv', 'TrafficLightGridTestEnv', 'MergePOEnv', 'BottleneckEnv', 'BottleneckAccelEnv', 'WaveAttenuationEnv', 'WaveAttenuationPOEnv', 'EnergyOptEnv', 'EnergyOptSPDEnv', 'TrafficLightGridEnv', 'TrafficLightGridPOEnv', 'TrafficLightGridBenchmarkEnv', 'BottleneckDesiredVelocityEnv', 'TestEnv', 'BayBridgeEnv', 'SingleStraightRoad', 'BottleNeckAccelEnv', 'DesiredVelocityEnv', 'PO_TrafficLightGridEnv', 'GreenWaveTestEnv']


This tutorial is written around `WaveAttenuationPOEnv`, which is used to train autonomous vehicles to attenuate the formation and propagation of waves in a partially observable, variable-density ring road. The run recorded here instead selects `EnergyOptSPDEnv`; the `WaveAttenuationPOEnv` import is kept, commented out, in the cell after it. To create the Gym environment, the only necessary parameters are the environment class plus the previously defined variables. These are defined as follows:

In [10]:
from flow.envs import EnergyOptSPDEnv

env_name = EnergyOptSPDEnv

In [11]:
# from flow.envs import WaveAttenuationPOEnv

# env_name = WaveAttenuationPOEnv

### 3.4 Setting up Flow Parameters

RLlib experiments generate a `params.json` file for each experiment run. For these experiments, the parameters defining the Flow network and environment must be stored as well. As such, in this section we define the dictionary `flow_params`, which contains the variables required by the utility function `make_create_env`. `make_create_env` is a higher-order function which returns a function `create_env` that initializes a Gym environment corresponding to the Flow network specified.

In [12]:
# Creating flow_params. Make sure the dictionary keys are as specified. 
flow_params = dict(
    # name of the experiment
    exp_tag=name,
    # name of the flow environment the experiment is running on
    env_name=env_name,
    # name of the network class the experiment uses
    network=network_name,
    # simulator that is used by the experiment
    simulator='traci',
    # simulation-related parameters
    sim=sim_params,
    # environment related parameters (see flow.core.params.EnvParams)
    env=env_params,
    # network-related parameters (see flow.core.params.NetParams and
    # the network's documentation or ADDITIONAL_NET_PARAMS component)
    net=net_params,
    # vehicles to be placed in the network at the start of a rollout 
    # (see flow.core.params.VehicleParams)
    veh=vehicles,
    # (optional) parameters affecting the positioning of vehicles upon 
    # initialization/reset (see flow.core.params.InitialConfig)
    initial=initial_config
)
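
The higher-order pattern behind `make_create_env` can be sketched in plain Python as follows. This is not Flow's implementation: `make_env_factory` and `DummyEnv` are hypothetical, and the `<class name>-v<version>` naming is an assumption that is consistent with the trial name `PPO_EnergyOptSPDEnv-v0_0` in the training output.

```python
def make_env_factory(params, version=0):
    """Hypothetical analogue of a higher-order environment factory."""
    env_class = params["env_name"]
    gym_name = "{}-v{}".format(env_class.__name__, version)

    def create_env(*_):
        # the closure captures `params`, so the caller needs no arguments
        return env_class(params["env"])

    return create_env, gym_name


class DummyEnv:
    """Placeholder environment class used only for this illustration."""

    def __init__(self, env_params):
        self.env_params = env_params


create_env, gym_name = make_env_factory(
    {"env_name": DummyEnv, "env": {"horizon": 2500}})
env = create_env()
```

Returning a function rather than an environment instance lets each parallel worker build its own copy of the environment when it starts.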

## 4. Running RL Experiments in Ray

### 4.1 Import 

First, we must import modules required to run experiments in Ray. The `json` package is required to store the Flow experiment parameters in the `params.json` file, as is `FlowParamsEncoder`. Ray-related imports are required: the PPO algorithm agent, `ray.tune`'s experiment runner, and environment helper methods `register_env` and `make_create_env`.

In [13]:
import json

import ray
# get_agent_class has moved between Ray releases, so try both locations
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder

Instructions for updating:
non-resource variables are not supported in the long term


### 4.2 Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [14]:
# number of parallel workers
N_CPUS = 6
# number of rollouts per training iteration
N_ROLLOUTS = 10
#ray.shutdown()
ray.init(num_cpus=N_CPUS)

2020-08-01 13:32:27,891	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-08-01_13-32-27_891271_4020/logs.
2020-08-01 13:32:28,010	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:46875 to respond...
2020-08-01 13:32:28,146	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:17416 to respond...
2020-08-01 13:32:28,153	INFO services.py:809 -- Starting Redis shard with 3.3 GB max memory.
2020-08-01 13:32:28,245	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-08-01_13-32-27_891271_4020/logs.
2020-08-01 13:32:28,249	INFO services.py:1475 -- Starting the Plasma object store with 4.96 GB memory using /dev/shm.


{'node_ip_address': '127.0.1.1',
 'redis_address': '127.0.1.1:46875',
 'object_store_address': '/tmp/ray/session_2020-08-01_13-32-27_891271_4020/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-08-01_13-32-27_891271_4020/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2020-08-01_13-32-27_891271_4020'}

### 4.3 Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent is configured with `N_CPUS - 1` parallel workers, a batch size corresponding to `N_ROLLOUTS` rollouts (each of which has length `HORIZON` steps), a discount rate $\gamma$ of 0.9999, two hidden layers of size 16, Generalized Advantage Estimation with $\lambda$ of 0.97, and other parameters as set below.
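
For readers unfamiliar with Generalized Advantage Estimation, the following is a minimal sketch of the estimator in plain Python, using the `gamma` and `lambda` values assigned in the configuration cell. It is a textbook formulation, not RLlib's internal code; the function name `gae_advantages` and the toy rewards and value estimates are made up for illustration.

```python
def gae_advantages(rewards, values, gamma=0.9999, lam=0.97):
    """Compute GAE advantages for one rollout.

    `values` has one more entry than `rewards`: the value estimate of the
    state reached after the final step (0.0 if the rollout terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # work backwards so each step reuses the discounted sum of later residuals
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# toy rollout with made-up rewards and value estimates
adv = gae_advantages(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0, 0.0])
```

A $\lambda$ close to 1 weights long-horizon returns more heavily (lower bias, higher variance), while a smaller $\lambda$ leans more on the value function.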

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified in section 3 is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`. 
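
The idea of mapping objects to string representations can be sketched with a custom `json.JSONEncoder`. The `ParamsEncoder` below is hypothetical and only illustrates the mechanism; Flow's `FlowParamsEncoder` may handle more cases and differ in detail.

```python
import json


class ParamsEncoder(json.JSONEncoder):
    """Hypothetical encoder; Flow's FlowParamsEncoder may differ in detail."""

    def default(self, obj):
        # classes (e.g. a network or environment class) are stored by name
        if isinstance(obj, type):
            return obj.__name__
        # other objects are stored as a dictionary of their attributes
        if hasattr(obj, "__dict__"):
            return vars(obj)
        return super().default(obj)


class ToyParams:
    """Placeholder parameter object used only for this illustration."""

    def __init__(self, horizon):
        self.horizon = horizon


class ToyEnv:
    """Placeholder environment class used only for this illustration."""


params = {"env_name": ToyEnv, "env": ToyParams(horizon=2500)}
params_json = json.dumps(params, cls=ParamsEncoder, sort_keys=True, indent=4)
decoded = json.loads(params_json)
```

Because the result is ordinary JSON, it can be stored alongside the rest of the configuration and read back when replaying a trained policy.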

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym. 

In [15]:
# The algorithm or model to train. This may refer to the name of a built-in
# algorithm (e.g. RLlib's DQN or PPO), or a user-defined trainable function
# or class registered in the tune registry.
alg_run = "PPO"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size
config["gamma"] = 0.9999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE lambda parameter
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])  # SGD minibatch size
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 10  # number of SGD iterations
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
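
The arithmetic implied by this configuration can be checked with a few lines of plain Python. The constants are copied from the cells above; `rollouts_per_worker` assumes the rollouts are split evenly across workers, and `effective_horizon` is the usual rule of thumb $1/(1-\gamma)$ rather than anything RLlib computes.

```python
N_CPUS = 6
N_ROLLOUTS = 10
HORIZON = 2500

num_workers = N_CPUS - 1                 # as set in the config above
train_batch_size = HORIZON * N_ROLLOUTS  # steps collected per iteration
sgd_minibatch_size = min(16 * 1024, train_batch_size)

# rollouts each worker contributes per iteration if the load is split evenly
rollouts_per_worker = N_ROLLOUTS / num_workers

# rough "effective horizon" of the discount factor, 1 / (1 - gamma)
gamma = 0.9999
effective_horizon = 1 / (1 - gamma)
```

Each iteration thus samples 25,000 steps and uses minibatches of 16,384, which matches the `num_steps_sampled: 50000` and `num_steps_trained: 32768` reported after two iterations in the training output.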

### 4.4 Running Experiments

Here, we use the `run_experiments` function from `ray.tune`. The function takes a dictionary with one key, a name corresponding to the experiment, and one value, itself a dictionary containing parameters for training.
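
Before launching, it can help to estimate what the stopping and checkpoint settings in the call that follows imply. The sketch assumes that a periodic checkpoint is written every `checkpoint_freq` training iterations and that each iteration samples `HORIZON * N_ROLLOUTS` steps; the variable names are illustrative.

```python
CHECKPOINT_FREQ = 20
MAX_ITERATIONS = 2000
TRAIN_BATCH_SIZE = 2500 * 10  # HORIZON * N_ROLLOUTS

# iterations at which a periodic checkpoint would be written
checkpoint_iters = list(
    range(CHECKPOINT_FREQ, MAX_ITERATIONS + 1, CHECKPOINT_FREQ))

# upper bound on environment steps sampled if the stop condition is reached
total_steps = MAX_ITERATIONS * TRAIN_BATCH_SIZE
```

Under these assumptions a full run writes 100 periodic checkpoints and samples up to 50 million environment steps, so it is worth lowering `training_iteration` for a first trial.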

In [None]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 20,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 2000,  # number of iterations to stop after
        },
    },
})

2020-08-01 13:32:28,580	INFO trial_runner.py:176 -- Starting a new experiment.
2020-08-01 13:32:28,678	ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/6 CPUs, 0/0 GPUs
Memory usage on this node: 5.4/16.5 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 6/6 CPUs, 0/0 GPUs
Memory usage on this node: 5.5/16.5 GB
Result logdir: /home/solom/ray_results/c_mpg+plus
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_EnergyOptSPDEnv-v0_0:	RUNNING

(pid=4072) Instructions for updating:
(pid=4072) non-resource variables are not supported in the long term
(pid=4072) 2020-08-01 13:32:33,142	INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
(pid=4072) 2020-08-01 13:32:33.143728: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=4072) 2020-08-01 13:32:33.174619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Freq

(pid=4072) Instructions for updating:
(pid=4072) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=4075) 2020-08-01 13:32:39,795	INFO rollout_worker.py:319 -- Creating policy evaluation worker 1 on CPU (please ignore any CUDA init errors)
(pid=4074) 2020-08-01 13:32:39,817	INFO rollout_worker.py:319 -- Creating policy evaluation worker 5 on CPU (please ignore any CUDA init errors)
(pid=4074) 2020-08-01 13:32:39.818925: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=4074) 2020-08-01 13:32:39.834003: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz
(pid=4074) 2020-08-01 13:32:39.834622: I tensorflow/compiler/

(pid=4074) Instructions for updating:
(pid=4074) Use `tf.cast` instead.
(pid=4073) Instructions for updating:
(pid=4073) Use `tf.cast` instead.
(pid=4070) Instructions for updating:
(pid=4070) Use keras.layers.Dense instead.
(pid=4070) Instructions for updating:
(pid=4070) Please use `layer.__call__` method instead.
(pid=4075) Instructions for updating:
(pid=4075) Use `tf.cast` instead.

(pid=4071) ring length: 221
(pid=4075) 2020-08-01 13:32:42,955	INFO sampler.py:304 -- Raw obs from env: { 0: { 'agent0': np.ndarray((3,), dtype=float64, min=0.0, max=0.0, mean=0.0)}}
(pid=4075) 2020-08-01 13:32:42,955	INFO sampler.py:305 -- Info return from env: {0: {'agent0': None}}
(pid=4075) 2020-08-01 13:32:42,956	INFO sampler.py:403 -- Preprocessed obs: np.ndarray((3,), dtype=float64, min=0.0, max=0.0, mean=0.0)
(pid=4075) 2020-08-01 13:32:42,956	INFO sampler.py:407 -- Filtered obs: np.ndarray((3,), dtype=float64, min=0.0, max=0.0, mean=0.0)
(pid=4075) 2020-08-01 13:32:42,958	INFO sampler.py:521 -- Inputs to compute_actions():
(pid=4075)
(pid=4075) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',

(pid=4074) ring length: 231
(pid=4073) ring length: 264
(pid=4070) ring length: 248
(pid=4072) 2020-08-01 13:33:15,074	INFO tf_policy.py:355 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc1/kernel:0' shape=(3, 16) dtype=float32_ref>
(pid=4072) 2020-08-01 13:33:15,074	INFO tf_policy.py:355 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc1/bias:0' shape=(16,) dtype=float32_ref>
(pid=4072) 2020-08-01 13:33:15,074	INFO tf_policy.py:355 -- Optimizing variable <tf.Variable 'default_policy/default_model/fc2/kernel:0' sh

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-33-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 995.4633572904609
  episode_reward_mean: 332.7187618695875
  episode_reward_min: 110.46818039213686
  episodes_this_iter: 10
  episodes_total: 20
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 326.088
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.4196662902832031
        entropy_coeff: 0.0
        kl: 2.6073284971062094e-06
        policy_loss: 0.0025956081226468086
        total_loss: 33.444496154785156
        vf_explained_var: 0.0006463527679443359
        vf_loss: 33.44189453125
    load_time_ms: 35.661
    num_steps_sampled: 50000
    num_steps_trained: 32768
    sample_time_ms: 34125.399
    update_time_ms: 365.753
  iterations_since_restore: 2
  node_ip: 127.0.1.1
  num_healthy_workers: 5

(pid=4071) ring length: 268
(pid=4074) ring length: 241
(pid=4075) ring length: 223
(pid=4073) ring length: 227
(pid=4070) ring length: 248
(pid=4071) ring length: 259

(pid=4073) ring length: 261
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-36-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 995.4633572904609
  episode_reward_mean: 363.3653381744934
  episode_reward_min: 110.46818039213686
  episodes_this_iter: 10
  episodes_total: 70
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 138.57
    learner:
      default_policy:
        cur_kl_coeff: 0.0031250000465661287
        cur_lr: 4.999999873689376e-05
        entropy: 1.4208669662475586
        entropy_coeff: 0.0
        kl: 1.7284037312492728e-07
        policy_loss: -0.002934410236775875
        total_loss: 23.917640686035156
        vf_explained_var: 0.0023058652877807617
        vf_loss: 23.920581817626953
    load_time_ms: 13.721
    num_steps_sampled: 1

(pid=4071) ring length: 234
(pid=4074) ring length: 234
(pid=4075) ring length: 237
(pid=4073) ring length: 239
(pid=4070) ring length: 220
(pid=4071) ring length: 267

(pid=4070) ring length: 270
(pid=4073) ring length: 249
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-39-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 991.3524779667771
  episode_reward_mean: 411.61716306558014
  episode_reward_min: 154.78896996502326
  episodes_this_iter: 10
  episodes_total: 120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 63.827
    learner:
      default_policy:
        cur_kl_coeff: 9.765625145519152e-05
        cur_lr: 4.999999873689376e-05
        entropy: 1.4223897457122803
        entropy_coeff: 0.0
        kl: 5.465681169880554e-07
        policy_loss: -0.0005645753117278218
        

(pid=4071) ring length: 235
(pid=4073) ring length: 230
(pid=4075) ring length: 262
(pid=4074) ring length: 221
(pid=4070) ring length: 233
(pid=4071) ring length: 253

(pid=4075) ring length: 227
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-41-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 853.986734965989
  episode_reward_mean: 413.5766228229147
  episode_reward_min: 129.35095706368992
  episodes_this_iter: 10
  episodes_total: 170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.927
    learner:
      default_policy:
        cur_kl_coeff: 3.051757857974735e-06
        cur_lr: 4.999999873689376e-05
        entropy: 1.4238572120666504
        entropy_coeff: 0.0
        kl: 9.302602848038077e-07
        policy_loss: -0.0058134631253778934
        total_loss: 34.32814025878906
        vf_explained_var: 0.006093025207519531
        vf_loss: 34.33395767211914
    load_time_ms: 4.283
    num_steps_sampled: 42500

(pid=4074) ring length: 225
(pid=4071) ring length: 251
(pid=4070) ring length: 270
(pid=4073) ring length: 238
(pid=4075) ring length: 229
(pid=4074) ring length: 236

(pid=4075) ring length: 255
(pid=4070) ring length: 222
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-43-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 853.986734965989
  episode_reward_mean: 373.45412759493075
  episode_reward_min: 103.71822054790232
  episodes_this_iter: 10
  episodes_total: 220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.561
    learner:
      default_policy:
        cur_kl_coeff: 9.536743306171047e-08
        cur_lr: 4.999999873689376e-05
        entropy: 1.4224908351898193
        entropy_coeff: 0.0
        kl: 6.710451998515055e-07
        policy_loss: 0.000302795204333961
        tot

(pid=4075) ring length: 230
(pid=4074) ring length: 233
(pid=4073) ring length: 240
(pid=4071) ring length: 249
(pid=4070) ring length: 250
(pid=4074) ring length: 227

(pid=4075) ring length: 222
(pid=4073) ring length: 221
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-45-52
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 837.6359708377563
  episode_reward_mean: 361.5861128763104
  episode_reward_min: 103.71822054790232
  episodes_this_iter: 10
  episodes_total: 270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.004
    learner:
      default_policy:
        cur_kl_coeff: 2.9802322831784522e-09
        cur_lr: 4.999999873689376e-05
        entropy: 1.4225256443023682
        entropy_coeff: 0.0
        kl: 1.5037949196994305e-07
        policy_loss: -0.0004076235927641392
       

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 237
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 240
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 261
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-48-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 991.9094269124618
  episode_reward_mean: 374.6480141098802
  episode_reward_min: 120.17419463095483
  episodes_this_iter: 10
  episodes_total: 320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.99
    learner:
      default_policy:
        cur_kl_coeff: 9.313225884932663e-11
        cur_lr: 4.999999873689376e-05
        entropy: 1.4225409030914307
        entropy_coeff: 0.0
        kl: 1.8267910490976647e-06
        policy_loss: 0.0011985194869339466
        total_loss: 33.80421447753906
        vf_explained_var: 0.014064490795135498
        vf_loss: 33.80300521850586
    load_time_ms: 3.73
    num_steps_sampled: 800000
    num_steps_trained: 524288
    sample_time_ms: 26349.858
    update_time_ms: 3.243
  iterations_since_restore: 32
  node_ip: 127.0.1.1
  num_healthy_workers: 

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 244
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 239
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-50-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 991.9094269124618
  episode_reward_mean: 405.94359559644886
  episode_reward_min: 124.21567771946498
  episodes_this_iter: 10
  episodes_total: 370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.842
    learner:
      default_policy:
        cur_kl_coeff: 2.9103830890414573e-12
        cur_lr: 4.999999873689376e-05
        entropy: 1.4236741065979004
        entropy_coeff: 0.0
        kl: 2.272281562909484e-07
        policy_loss: 0.0015260661020874977
        total_loss: 37.93656539916992
        vf_explained_var: 0.01767951250076294
        vf_loss: 37.935035705566406
    load_time_ms: 3.585
    num_steps_sampled: 925000
    num_steps_trained: 606208
    sample_time_ms: 26334.381
    update_time_ms: 3.031
  iterations_since_restore: 37
  node_ip: 127.0.1.1
  num_healthy_worker

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 257
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 232
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-52-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 881.0515808135598
  episode_reward_mean: 405.9692355244213
  episode_reward_min: 120.25838265751354
  episodes_this_iter: 10
  episodes_total: 420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.175
    learner:
      default_policy:
        cur_kl_coeff: 9.094947153254554e-14
        cur_lr: 4.99999987

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 262
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 241
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 245
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-54-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1110.9097663438672
  episode_reward_mean: 411.8742777315024
  episode_reward_min: 120.25838265751354
  episodes_this_iter: 10
  episodes_total: 470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.146
    learner:
      default_policy:
        cur_kl_coeff: 2.842170985392048e-15
        cur_lr: 4.999999873689376e-05
        entropy: 1.4262306690216064
        entropy_coeff: 0.0
        kl: 1.2338205124251544e-06
        policy_loss: -0.0013148499419912696
       

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 248
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-56-53
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1110.9097663438672
  episode_reward_mean: 417.27997861706996
  episode_reward_min: 132.37464052327778
  episodes_this_iter: 10
  episodes_total: 520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.291
    learner:
      default_policy:
        cur_kl_coeff: 8.88178432935015e-17
        cur_lr: 4.999999873689376e-05
        entropy: 1.4251155853271484
        entropy_coeff: 0.0
        kl: 6.925201887497678e-07
        policy_loss: 0.0012128979433327913
        total_loss: 31.294353485107422
        vf_explained_var: 0.02934318780899048
        vf_loss: 31.293140411376953
    load_time_ms: 3.461
    num_steps_sampled: 130

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_13-59-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 960.0391725524254
  episode_reward_mean: 436.31604835557243
  episode_reward_min: 122.82557978580952
  episodes_this_iter: 10
  episodes_total: 570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.287
    learner:
      default_policy:
        cur_kl_coeff: 2.775557602921922e-18
        cur_lr: 4.999999873689376e-05
        entropy: 1.4259159564971924
        entropy_coeff: 0.0
        kl: 4.688845365308225e-07
        policy_loss: -0.006538291461765766
        t

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 266
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 258
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 252
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 263
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 250
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 232
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-01-19
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1017.0845328225946
  episode_reward_mean: 466.73425506297076
  episode_reward_min: 122.82557978580952
  episodes_this_iter: 10
  episodes_total: 620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.977
    learner:
      default_policy:
        cur_kl_coeff: 8.673617509131006e-20
        cur_lr: 4.999999

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 257
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-03-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1095.4124503501332
  episode_reward_mean: 484.01947728313985
  episode_reward_min: 170.37232921304715
  episodes_this_iter: 10
  episodes_total: 670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.438
    learner:
      default_policy:
        cur_kl_coeff: 2.7105054716034394e-21
        cur_lr: 4.99999

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 231
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 224
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 240
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-05-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1095.4124503501332
  episode_reward_mean: 531.0530000814394
  episode_reward_min: 109.0517034215303
  episodes_this_iter: 10
  episodes_total: 720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.491
    learner:
      default_policy:
        cur_kl_coeff: 8.470329598760748e-23
        cur_lr: 4.999999873689376e-05
        entropy: 1.4425580501556396
        entropy_coeff: 0.0
        kl: 9.896248229779303e-07
        policy_loss: 0.013759504072368145
        tot

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-07-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1062.7504816927944
  episode_reward_mean: 521.2859287899511
  episode_reward_min: 109.0517034215303
  episodes_this_iter: 10
  episodes_total: 770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.302
    learner:
      default_policy:
        cur_kl_coeff: 2.646977999612734e-24
        cur_lr: 4.999999873689376e-05
        entropy: 1.440189003944397
        entropy_coeff: 0.0
        kl: 3.7568934203591198e-06
        policy_loss: 0.0011915724026039243
        total_loss: 38.88322448730469
        vf_explained_var: 0.06088459491729736
        vf_loss: 38.88204574584961
    load_time_ms: 3.707
    num_steps_sampled: 1925000
    num_steps_trained: 1261568
    sample_time_ms: 26431.744
    update_time_ms: 3.327
  iterations_since_restore: 77
  node_ip: 127.0.1.1
  num_healthy_workers

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 269
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 231
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 251
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-10-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 999.1089494374401
  episode_reward_mean: 515.5867198739974
  episode_reward_min: 149.58668523650178
  episodes_this_iter: 10
  episodes_total: 820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.501
    learner:
      default_policy:
        cur_kl_coeff: 8.271806248789793e-26
        cur_lr: 4.99999987

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 231
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 238
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-12-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1115.3371078206078
  episode_reward_mean: 548.9499335589076
  episode_reward_min: 189.3772733849516
  episodes_this_iter: 10
  episodes_total: 870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.394
    learner:
      default_policy:
        cur_kl_coeff: 2.5849394527468104e-27
        cur_lr: 4.999999873689376e-05
        entropy: 1.4336878061294556
        entropy_coeff: 0.0
        kl: 4.177891241852194e-06
        policy_loss: -7.594120688736439e-06
        total_loss: 65.92720794677734
        vf_explained_var: 0.06704765558242798
        vf_loss: 65.92721557617188
    load_time_ms: 3.837
    num_steps_sampled: 2175

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 256
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 240
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-14-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1115.3371078206078
  episode_reward_mean: 550.121846497393
  episode_reward_min: 150.80534346813548
  episodes_this_iter: 10
  episodes_total: 920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.49
    learner:
      default_policy:
        cur_kl_coeff: 8.077935789833782e-29
        cur_lr: 4.999999873689376e-05
        entropy: 1.4330683946609497
        entropy_coeff: 0.0
        kl: 1.911521394504234e-06
        policy_loss: 0.003944362048059702
        tota

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 239
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-16-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1099.439390404955
  episode_reward_mean: 602.6563597507366
  episode_reward_min: 150.80534346813548
  episodes_this_iter: 10
  episodes_total: 970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.92
    learner:
      default_policy:
        cur_kl_coeff: 2.524354934323057e-30
        cur_lr: 4.999999873689376e-05
        entropy: 1.4429272413253784
        entropy_coeff: 0.0
        kl: 3.914985427400097e-06
        policy_loss: 0.003071992192417383
        total_loss: 83.23689270019531
        vf_explained_var: 0.07643020153045654
        vf_loss: 83.23384094238281
    load_time_ms: 3.758
    num_steps_sampled: 2425000


[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 244
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 240
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-19-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1099.439390404955
  episode_reward_mean: 632.1106195967354
  episode_reward_min: 197.8200211763301
  episodes_this_iter: 10
  episodes_total: 1020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.62
    learner:
      default_policy:
        cur_kl_coeff: 7.888609169759553e-32
        cur_lr: 4.999999873689376e-05
        entropy: 1.4444386959075928
        entropy_coeff: 0.0
        kl: 2.934084477601573e-07
        policy_loss: 0.00768383638933301
        total

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 266
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-21-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1138.0593244819522
  episode_reward_mean: 606.0480478067278
  episode_reward_min: 192.74237266005716
  episodes_this_iter: 10
  episodes_total: 1070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.133
    learner:
      default_policy:
        cur_kl_coeff: 2.4651903655498604e-33
        cur_lr: 4.99999

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 259
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 247
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-23-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1138.0593244819522
  episode_reward_mean: 590.5324092892628
  episode_reward_min: 186.1628928692035
  episodes_this_iter: 10
  episodes_total: 1120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.729
    learner:
      default_policy:
        cur_kl_coeff: 7.703719892343314e-35
        cur_lr: 4.999999873689376e-05
        entropy: 1.4398165941238403
        entropy_coeff: 0.0
        kl: 4.6987770474515855e-07
        policy_loss: -0.009647654369473457
        total_loss: 74.28848266601562
        vf_explained_var: 0.08731287717819214
        vf_loss: 74.29812622070312
    load_time_ms: 3.759
    num_steps_sampled: 2800

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 252
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 244
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 223
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-25-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1098.1012796117752
  episode_reward_mean: 577.6962487395074
  episode_reward_min: 176.93952612788294
  episodes_this_iter: 10
  episodes_total: 1170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.245
    learner:
      default_policy:
        cur_kl_coeff: 2.4074124663572855e-36
        cur_lr: 4.999999873689376e-05
        entropy: 1.4391422271728516
        entropy_coeff: 0.0
        kl: 2.568285708548501e-07
        policy_loss: -0.007104228250682354
       

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 269
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-27-50
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1073.5753685253203
  episode_reward_mean: 589.3965818316977
  episode_reward_min: 153.33953917537625
  episodes_this_iter: 10
  episodes_total: 1220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.762
    learner:
      default_policy:
        cur_kl_coeff: 7.523163957366517e-38
        cur_lr: 4.999999873689376e-05
        entropy: 1.4366886615753174
        entropy_coeff: 0.0
        kl: 9.587201930116862e-07
        policy_loss: 0.004066852852702141
        t

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 252
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-30-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1076.4585852443288
  episode_reward_mean: 592.0245204011695
  episode_reward_min: 153.33953917537625
  episodes_this_iter: 10
  episodes_total: 1270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.609
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4397087097167969
        entropy_coeff: 0.0
        kl: 4.235094820614904e-06
        policy_loss: -0.0028119164053350687
        total_loss: 68.88311767578125
        vf_explained_var: 0.09350407123565674
        vf_loss: 68.88594055175781
    load_time_ms: 3.942
    num_steps_sampled: 3175000
    num_steps_trained: 2080768
    sample_time_ms: 26471.967
    update_time_ms: 3.296
  iterations_since_restore: 127
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 245
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 256
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 245
[2m[36m(pid=4071)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-32-19
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1077.3689348977312
  episode_reward_mean: 589.9746818437669
  episode_reward_min: 178.04663233606058
  episodes_this_iter: 10
  episodes_total: 1320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.932
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4429326057434082
        entropy_coeff: 0.0
        kl: 1.0948679118882865e-06
        policy_loss: 0.0036908253096044064
        total_loss: 76.59208679199219
        vf_explained_var: 0.09652847051620483
        vf_loss: 76.58840942382812
    load_time_ms: 3.73
    num_steps_sampled: 3300000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-34-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1122.4898264816663
  episode_reward_mean: 622.3133175352209
  episode_reward_min: 178.04663233606058
  episodes_this_iter: 10
  episodes_total: 1370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.414
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.439616322517395
        entropy_coeff: 0.0
        kl: 3.7740501284133643e-06
        policy_loss: -0.002254190854728222
        total_loss: 76.784

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 231
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 247
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-36-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1122.4898264816663
  episode_reward_mean: 678.0582870390126
  episode_reward_min: 113.65932217346212
  episodes_this_iter: 10
  episodes_total: 1420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.603
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4424042701721191
        entropy_coeff: 0.0
        kl: 1.9974613678641617e-06
        policy_loss: 0.006758060771971941
        total_loss: 121.68

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 239
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 231
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-39-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1111.5804651269812
  episode_reward_mean: 670.1878901968189
  episode_reward_min: 113.65932217346212
  episodes_this_iter: 10
  episodes_total: 1470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.861
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4345248937606812
        entropy_coeff: 0.0
        kl: 4.077017365489155e-06
        policy_loss: -0.006832224316895008
        total_loss: 83.52012634277344
        vf_explained_var: 0.0952376127243042
        vf_loss: 83.5269775390625
    load_time_ms: 3.929
    num_steps_sampled: 3675000
    num_steps_tr

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 238
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 262
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 223
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-41-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1070.5764953946125
  episode_reward_mean: 609.4705457603042
  episode_reward_min: 169.8753791931462
  episodes_this_iter: 10
  episodes_total: 1520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.462
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4328347444534302
        entropy_coeff: 0.0
        kl: 1.4947592717362568e-05
        policy_loss: -0.007001215126365423
        total_loss: 75.68748474121094
        vf_explained_var: 0.10211479663848877
        vf_loss: 75.6944580078125
    load_time_ms: 4.016
    num_steps_sampled: 3800000
    num_steps_t

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 259
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 224
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-43-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1031.6274291178738
  episode_reward_mean: 619.4023821421099
  episode_reward_min: 225.47273569485165
  episodes_this_iter: 10
  episodes_total: 1570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.213
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4370183944702148
        entropy_coeff: 0.0
        kl: 2.04320895136334e-06
        policy_loss: 0.003031957196071744
        total_loss: 69.54139709472656
        vf_explained_var: 0.10332447290420532
        vf_loss: 69.53836059570312
    load_time_ms: 3.858
    num_steps_sampled: 3925000
    num_steps_trained: 2572288
    sample_time_ms: 28990.858
    update_time_ms: 3.545
  iterations_since_restore: 157
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-46-24
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1018.2232922614347
  episode_reward_mean: 603.7392000844666
  episode_reward_min: 203.37549985204035
  episodes_this_iter: 10
  episodes_total: 1620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.057
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4384655952453613
        entropy_coeff: 0.0
        kl: 2.558128471719101e-06
        policy_loss: 0.006205377168953419
        total_loss: 78.58840942382812
        vf_explained_var: 0.10568773746490479
        vf_loss: 78.58219909667969
    load_time_ms: 3.83
    num_steps_sampled: 4050000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 253
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 225
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 247
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-48-49
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1002.1180248251167
  episode_reward_mean: 567.7687254171764
  episode_reward_min: 203.37549985204035
  episodes_this_iter: 10
  episodes_total: 1670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.85
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4374020099639893
        entropy_coeff: 0.0
        kl: 5.096702807350084e-06
        policy_loss: -0.00503364484757185
        total_loss: 72.27679443359375
        vf_explained_var: 0.10829269886016846
        vf_loss: 72.28182220458984
    load_time_ms: 3.876
    num_steps_sampled: 4175000
    num_steps_trained: 2736128
    sample_time_ms: 28947.921
    update_time_ms: 3.571
  iterations_since_restore: 167
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-51-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1085.37068399397
  episode_reward_mean: 595.9297282651996
  episode_reward_min: 187.75683124785866
  episodes_this_iter: 10
  episodes_total: 1720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.806
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4383655786514282
        entropy_coeff: 0.0
        kl: 5.607176717603579e-06
        policy_loss: 0.006821447983384132
        total_loss: 79.38516235351562
        vf_explained_var: 0.10410559177398682
        vf_loss: 79.37832641601562
    load_time_ms: 4.043
    num_steps_sampled: 4300000
    num_steps_tra

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 224
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 236
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-53-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1085.37068399397
  episode_reward_mean: 632.1903470677266
  episode_reward_min: 187.75683124785866
  episodes_this_iter: 10
  episodes_total: 1770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.51
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4433766603469849
        entropy_coeff: 0.0
        kl: 2.67466384684667e-06
        policy_loss: 0.008406012319028378
        total_loss: 77.05418395996094
        vf_explained_var: 0.10540872812271118
        vf_loss: 77.0457763671875
    load_time_ms: 4.116
    num_steps_sampled: 4425000
    num_steps_traine

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 249
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-56-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1015.1381537764365
  episode_reward_mean: 671.6998006460958
  episode_reward_min: 198.0419325588641
  episodes_this_iter: 10
  episodes_total: 1820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.606
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4464253187179565
        entropy_coeff: 0.0
        kl: 4.273970262147486e-06
        policy_loss: 0.007750427350401878
        total_loss: 81.15304

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 227
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 259
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 225
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_14-58-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1046.8733304845764
  episode_reward_mean: 662.3048010734501
  episode_reward_min: 184.59664417283
  episodes_this_iter: 10
  episodes_total: 1870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.647
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4466633796691895
        entropy_coeff: 0.0
        kl: 1.3012442650506273e-05
        policy_loss: -0.004045561887323856
        total_loss: 87.38645

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 222
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 259
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-00-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1046.8733304845764
  episode_reward_mean: 675.6340179723516
  episode_reward_min: 184.59664417283
  episodes_this_iter: 10
  episodes_total: 1920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.441
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4570964574813843
        entropy_coeff: 0.0
        kl: 1.620572220417671e-05
        policy_loss: 0.008048759773373604
        total_loss: 86.05427551269531
        vf_explained_var: 0.1084146499633789
        vf_loss: 86.04622650146484
    load_time_ms: 4.061
    num_steps_sampled: 4800000
    num_steps_trained: 3145728
    sample_time_ms: 29095.544
    update_time_ms: 3.698
  iterations_since_restore: 192
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_est

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 240
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 265
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-03-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1161.591359433492
  episode_reward_mean: 764.2976221149338
  episode_reward_min: 331.0926075217308
  episodes_this_iter: 10
  episodes_total: 1970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.886
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4620531797409058
        entropy_coeff: 0.0
        kl: 1.8172504496760666e-05
        policy_loss: 0.005959328263998032
        total_loss: 90.70404052734375
        vf_explained_var: 0.10543882846832275
        vf_loss: 90.69806671142578
    load_time_ms: 4.173
    num_steps_sampled: 4925000
    num_steps_trained: 3227648
    sample_time_ms: 29094.599
    update_time_ms: 3.561
  iterations_since_restore: 197
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 237
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-05-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1161.591359433492
  episode_reward_mean: 747.0486284399532
  episode_reward_min: 278.66418860902763
  episodes_this_iter: 10
  episodes_total: 2020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.202
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4680876731872559
        entropy_coeff: 0.0
        kl: 4.018471372546628e-06
        policy_loss: 0.0013767275959253311
        total_loss: 94.41060638427734
        vf_explained_var: 0.10485237836837769
        vf_loss: 94.40927124023438
    load_time_ms: 4.145
    num_steps_sampled: 5050000
    num_steps_trained: 3309568
    sample_time_ms: 29097.119
    update_time_ms: 3.396
  iterations_since_restore: 202
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 239
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-08-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1127.407176414733
  episode_reward_mean: 711.3088001527784
  episode_reward_min: 235.19658596749048
  episodes_this_iter: 10
  episodes_total: 2070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.973
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4646973609924316
        entropy_coeff: 0.0
        kl: 5.378788046073169e-07
        policy_loss: -0.01042831689119339
        total_loss: 94.13491821289062
        vf_explained_var: 0.10190063714981079
        vf_loss: 94.14533233642578
    load_time_ms: 4.058
    num_steps_sampled: 5175000
    num_steps_trained: 3391488
    sample_time_ms: 29033.499
    update_time_ms: 3.425
  iterations_since_restore: 207
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 240
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 227
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-10-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1127.407176414733
  episode_reward_mean: 738.5740954239653
  episode_reward_min: 235.19658596749048
  episodes_this_iter: 10
  episodes_total: 2120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.372
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4666451215744019
        entropy_coeff: 0.0
        kl: 4.491957952268422e-06
        policy_loss: -0.004494006745517254
        total_loss: 107.5338363647461
        vf_explained_var: 0.09866553544998169
        vf_loss: 107.538330078125
    load_time_ms: 4.081
    num_steps_sampled: 5300000
    num_steps_trained: 3473408
    sample_time_ms: 29026.78
    update_time_ms: 3.586
  iterations_since_restore: 212
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_e

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 257
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-13-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1121.211767978737
  episode_reward_mean: 758.6479986958486
  episode_reward_min: 263.2440653555007
  episodes_this_iter: 10
  episodes_total: 2170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.712
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4592373371124268
        entropy_coeff: 0.0
        kl: 4.7551002353429794e-06
        policy_loss: -0.005651869345456362
        total_loss: 99.11241149902344
        vf_explained_var: 0.10112786293029785
        vf_loss: 99.11804962158203
    load_time_ms: 4.305
    num_steps_sampled: 5425000
    num_steps_t

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 243
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 248
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-15-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1121.211767978737
  episode_reward_mean: 747.3086059560773
  episode_reward_min: 234.89661225494964
  episodes_this_iter: 10
  episodes_total: 2220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.533
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4545350074768066
        entropy_coeff: 0.0
        kl: 4.806490323971957e-06
        policy_loss: -0.001954193226993084
        total_loss: 85.64656066894531
        vf_explained_var: 0.10481947660446167
        vf_loss: 85.64849853515625
    load_time_ms: 4.264
    num_steps_sampled: 5550000
    num_steps_trained: 3637248
    sample_time_ms: 29084.361
    update_time_ms: 3.528
  iterations_since_restore: 222
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 226
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-17-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1080.1134815125765
  episode_reward_mean: 727.0795453422975
  episode_reward_min: 234.89661225494964
  episodes_this_iter: 10
  episodes_total: 2270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.597
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4593775272369385
        entropy_coeff: 0.0
        kl: 1.781499304343015e-05
        policy_loss: 0.00011977064423263073
        total_loss: 91.83

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 247
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 253
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-20-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1128.302605487626
  episode_reward_mean: 753.6556890606272
  episode_reward_min: 287.49580455154796
  episodes_this_iter: 10
  episodes_total: 2320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.567
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4596275091171265
        entropy_coeff: 0.0
        kl: 7.232800271594897e-06
        policy_loss: -0.0026736543513834476
        total_loss: 85.3672103881836
        vf_explained_var: 0.10337823629379272
        vf_loss: 85.36988830566406
    load_time_ms: 4.11
    num_steps_sampled: 5800000
    num_steps_trained: 3801088
    sample_time_ms: 29063.253
    update_time_ms: 3.514
  iterations_since_restore: 232
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 224
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-22-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1145.3982499007718
  episode_reward_mean: 731.4812145496096
  episode_reward_min: 183.0958907949513
  episodes_this_iter: 10
  episodes_total: 2370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.144
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.463472604751587
        entropy_coeff: 0.0
        kl: 3.653298335848376e-06
        policy_loss: -0.0007430408149957657
        total_loss: 101.08573913574219
        vf_explained_var: 0.09658247232437134
        vf_loss: 101.08648681640625
    load_time_ms: 4.026
    num_steps_sampled: 5925000
    num_steps_trained: 3883008
    sample_time_ms: 29082.171
    update_time_ms: 3.567
  iterations_since_restore: 237
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 225
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-25-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1145.3982499007718
  episode_reward_mean: 738.8707073305145
  episode_reward_min: 183.0958907949513
  episodes_this_iter: 10
  episodes_total: 2420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.558
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4681063890457153
        entropy_coeff: 0.0
        kl: 6.163008947623894e-06
        policy_loss: 0.0013890051050111651
        total_loss: 81.56130981445312
        vf_explained_var: 0.10498416423797607
        vf_loss: 81.5599365234375
    load_time_ms: 4.017
    num_steps_sampled: 6050000
    num_steps_trained: 3964928
    sample_time_ms: 29046.936
    update_time_ms: 3.546
  iterations_since_restore: 242
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 257
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 258
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 253
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 233
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 223
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-27-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.239485038589
  episode_reward_mean: 784.1726668327086
  episode_reward_min: 289.73263620947927
  episodes_this_iter: 10
  episodes_total: 2470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.591
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.475178599357605
        entropy_coeff: 0.0
        kl: 9.7725132945925e-06
        policy_loss: -0.000812494195997715
        total_loss: 118.805572

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 257
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 229
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 254
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-30-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1124.3973213851
  episode_reward_mean: 812.0162045612145
  episode_reward_min: 429.3702104923248
  episodes_this_iter: 10
  episodes_total: 2520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.794
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4792166948318481
        entropy_coeff: 0.0
        kl: 2.7677753678290173e-05
        policy_loss: 0.0062624616548419
        total_loss: 113.23341369

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 244
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 255
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-32-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1124.3973213851
  episode_reward_mean: 819.3004196674582
  episode_reward_min: 389.1316056420129
  episodes_this_iter: 10
  episodes_total: 2570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.809
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4789811372756958
        entropy_coeff: 0.0
        kl: 2.4631644919281825e-06
        policy_loss: 0.00010372744873166084
        total_loss: 105.97840118408203
        vf_explained_var: 0.08816653490066528
        vf_loss: 105.97831726074219
    load_time_ms: 3.979
    num_steps_sampled: 6425000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 251
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 250
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 259
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 230
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-34-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1098.8901127230279
  episode_reward_mean: 777.5352803985618
  episode_reward_min: 325.79285930941944
  episodes_this_iter: 10
  episodes_total: 2620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.687
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4734077453613281
        entropy_coeff: 0.0
        kl: 2.6243797037750483e-06
        policy_loss: 0.0021505369804799557
        total_loss: 90.95497131347656
        vf_explained_var: 0.09787631034851074
        vf_loss: 90.95280456542969
    load_time_ms: 4.119
    num_steps_sampled: 6550000
    num_steps_trained: 4292608
    sample_time_ms: 29146.624
    update_time_ms: 3.747
  iterations_since_restore: 262
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 261
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-37-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1089.060887242102
  episode_reward_mean: 782.4029907878257
  episode_reward_min: 268.52626093316536
  episodes_this_iter: 10
  episodes_total: 2670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.54
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4708263874053955
        entropy_coeff: 0.0
        kl: 2.4427645257674158e-05
        policy_loss: -0.00365739269182086
        total_loss: 95.21281433105469
        vf_explained_var: 0.09557580947875977
        vf_loss: 95.21646118164062
    load_time_ms: 4.193
    num_steps_sampled: 6675000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-39-49
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1089.060887242102
  episode_reward_mean: 806.589801258265
  episode_reward_min: 268.52626093316536
  episodes_this_iter: 10
  episodes_total: 2720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.104
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4704596996307373
        entropy_coeff: 0.0
        kl: 5.5517739383503795e-06
        policy_loss: -0.002971178386360407
        total_loss: 100.375

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 221
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 229
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-42-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1058.8167928556265
  episode_reward_mean: 827.5866900466659
  episode_reward_min: 399.81193761246584
  episodes_this_iter: 10
  episodes_total: 2770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.859
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4695281982421875
        entropy_coeff: 0.0
        kl: 1.52598659042269e-06
        policy_loss: 0.0018185346852988005
        total_loss: 112.50907897949219
        vf_explained_var: 0.08856165409088135
        vf_loss: 112.50727844238281
    load_time_ms: 4.142
    num_steps_sampled: 6925000
    num_steps_trained: 4538368
    sample_time_ms: 29164.154
    update_time_ms: 3.508
  iterations_since_restore: 277
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 235
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 221
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-44-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1105.5528690870467
  episode_reward_mean: 849.1451085175531
  episode_reward_min: 399.81193761246584
  episodes_this_iter: 10
  episodes_total: 2820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.956
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.476797103881836
        entropy_coeff: 0.0
        kl: 1.6523081285413355e-06
        policy_loss: 0.0019007152877748013
        total_loss: 136.5423583984375
        vf_explained_var: 0.07178640365600586
        vf_loss: 136.54046630859375
    load_time_ms: 4.022
    num_steps_sampled: 7050000
    num_steps_trained: 4620288
    sample_time_ms: 29206.483
    update_time_ms: 3.526
  iterations_since_restore: 282
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-47-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1105.5528690870467
  episode_reward_mean: 852.8320710546967
  episode_reward_min: 425.5598987603328
  episodes_this_iter: 10
  episodes_total: 2870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.473
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.474578619003296
        entropy_coeff: 0.0
        kl: 1.1001167877111584e-05
        policy_loss: 0.0006173797883093357
        total_loss: 123.857

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 230
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 262
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-50-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1049.3623721637227
  episode_reward_mean: 871.084326717852
  episode_reward_min: 507.4717513839814
  episodes_this_iter: 10
  episodes_total: 2920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 66.556
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4721853733062744
        entropy_coeff: 0.0
        kl: 5.1312636060174555e-06
        policy_loss: -0.0005031698383390903
        total_loss: 113.05

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-53-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1090.2677177599269
  episode_reward_mean: 864.6358652604218
  episode_reward_min: 594.9729650947589
  episodes_this_iter: 10
  episodes_total: 2970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 110.117
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.47605299949646
        entropy_coeff: 0.0
        kl: 2.908705937443301e-05
        policy_loss: 0.005522112362086773
        total_loss: 110.65201568603516
        vf_explained_var: 0.08424127101898193
        vf_loss: 110.64646911621094
    load_time_ms: 9.588
    num_steps_sampled: 7425000
    num_steps_t

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 254
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 256
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_15-56-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1090.2677177599269
  episode_reward_mean: 846.7303556356244
  episode_reward_min: 479.1018058167627
  episodes_this_iter: 10
  episodes_total: 3020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 121.03
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4797593355178833
        entropy_coeff: 0.0
        kl: 1.9084825908066705e-05
        policy_loss: -0.001268761814571917
        total_loss: 108.04281616210938
        vf_explained_var: 0.08421772718429565
        vf_loss: 108.04408264160156
    load_time_ms: 10.172
    num_steps_sampled: 7550000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 228
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 230
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-00-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1082.5142186553037
  episode_reward_mean: 857.5359939103739
  episode_reward_min: 479.1018058167627
  episodes_this_iter: 10
  episodes_total: 3070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 120.402
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.482770562171936
        entropy_coeff: 0.0
        kl: 2.9377588361967355e-06
        policy_loss: 0.005665109492838383
        total_loss: 107.496

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 234
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 234
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-03-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1081.4028627731716
  episode_reward_mean: 865.7309688799078
  episode_reward_min: 550.1790005310496
  episodes_this_iter: 10
  episodes_total: 3120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 118.939
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4813812971115112
        entropy_coeff: 0.0
        kl: 7.325434125959873e-07
        policy_loss: -0.001310769934207201
        total_loss: 115.93473815917969
        vf_explained_var: 0.07960790395736694
        vf_loss: 115.9360580444336
    load_time_ms: 10.075
    num_steps_sampled: 7800000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 234
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 263
[2m[36m(pid=4074)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-06-54
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1093.668052682609
  episode_reward_mean: 871.7778618314057
  episode_reward_min: 528.0865822955135
  episodes_this_iter: 10
  episodes_total: 3170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 115.374
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4871902465820312
        entropy_coeff: 0.0
        kl: 8.033475751290098e-07
        policy_loss: 0.007392001338303089
        total_loss: 110.09776306152344
        vf_explained_var: 0.08041763305664062
        vf_loss: 110.09039306640625
    load_time_ms: 10.102
    num_steps_sampled: 7925000
    num_steps

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 224
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 248
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 252
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-10-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1093.668052682609
  episode_reward_mean: 875.0863461681788
  episode_reward_min: 528.0865822955135
  episodes_this_iter: 10
  episodes_total: 3220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 110.67
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4860621690750122
        entropy_coeff: 0.0
        kl: 9.985706128645688e-06
        policy_loss: -0.008291278965771198
        total_loss: 114.49127197265625
        vf_explained_var: 0.07964122295379639
        vf_loss: 114.49958038330078
    load_time_ms: 9.379
    num_steps_sampled: 8050000
    num_steps_trained: 5275648
    sample_time_ms: 39605.755
    update_time_ms: 10.842
  iterations_since_restore: 322
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 257
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 255
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 232
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 238
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-13-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1134.8650608853077
  episode_reward_mean: 866.0470325309344
  episode_reward_min: 593.5989637149323
  episodes_this_iter: 10
  episodes_total: 3270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 108.277
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.483271837234497
        entropy_coeff: 0.0
        kl: 3.0556511774193496e-06
        policy_loss: 0.0036361566744744778
        total_loss: 117.09437561035156
        vf_explained_var: 0.07683467864990234
        vf_loss: 117.09072875976562
    load_time_ms: 8.991
    num_steps_sampled: 8175000
    num_step

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 266
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 255
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 257
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-16-50
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1134.8650608853077
  episode_reward_mean: 875.1763752915008
  episode_reward_min: 545.6218695997994
  episodes_this_iter: 10
  episodes_total: 3320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 112.317
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4978229999542236
        entropy_coeff: 0.0
        kl: 3.223262319806963e-05
        policy_loss: -0.007005210965871811
        total_loss: 121.55451202392578
        vf_explained_var: 0.07083529233932495
        vf_loss: 121.5615234375
    load_time_ms: 9.465
    num_steps_sampled: 8300000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 266
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-20-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1073.5424672755928
  episode_reward_mean: 887.4611242102067
  episode_reward_min: 545.6218695997994
  episodes_this_iter: 10
  episodes_total: 3370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.427
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4889006614685059
        entropy_coeff: 0.0
        kl: 1.4446450222749263e-06
        policy_loss: -0.0024605272337794304
        total_loss: 112.

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 228
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 238
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 254
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 240
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-23-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1064.6497096651321
  episode_reward_mean: 872.9740257581236
  episode_reward_min: 616.9487751889759
  episodes_this_iter: 10
  episodes_total: 3420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.736
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4952600002288818
        entropy_coeff: 0.0
        kl: 9.011007932713255e-06
        policy_loss: -0.0017127082683146
        total_loss: 108.8077

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 238
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-26-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1134.8003822596415
  episode_reward_mean: 879.5968620446208
  episode_reward_min: 446.95024951823933
  episodes_this_iter: 10
  episodes_total: 3470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 107.394
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4979530572891235
        entropy_coeff: 0.0
        kl: 1.657694883760996e-05
        policy_loss: 0.004617837257683277
        total_loss: 118.47638702392578
        vf_explained_var: 0.06781262159347534
        vf_loss: 118.47178649902344
    load_time_ms: 8.806
    num_steps_sampled: 8675000
    num_steps_trained: 5685248
    sample_time_ms: 39651.126
    update_time_ms: 10.492
  iterations_since_restore: 347
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 260
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 224
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-30-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1134.8003822596415
  episode_reward_mean: 889.7690654949934
  episode_reward_min: 446.95024951823933
  episodes_this_iter: 10
  episodes_total: 3520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 107.747
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4950567483901978
        entropy_coeff: 0.0
        kl: 1.3401022442849353e-05
        policy_loss: -0.00014104880392551422
        total_loss: 118.76969909667969
        vf_explained_var: 0.06523561477661133
        vf_loss: 118.76982116699219
    load_time_ms: 8.876
    num_steps_sampled: 8800000
    num_steps_trained: 5767168
    sample_time_ms: 39664.222
    update_time_ms: 10.538
  iterations_since_restore: 352
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 231
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 257
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-33-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1050.1048857535293
  episode_reward_mean: 878.477092737341
  episode_reward_min: 638.2254261655827
  episodes_this_iter: 10
  episodes_total: 3570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 115.353
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4880268573760986
        entropy_coeff: 0.0
        kl: 4.844136128667742e-06
        policy_loss: -0.0005686702206730843
        total_loss: 115.57044982910156
        vf_explained_var: 0.06840628385543823
        vf_loss: 115.57101440429688
    load_time_ms: 9.987
    num_steps_sampled: 8925000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-36-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1019.1263900030692
  episode_reward_mean: 880.0532490969812
  episode_reward_min: 694.471970918632
  episodes_this_iter: 10
  episodes_total: 3620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.901
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4840784072875977
        entropy_coeff: 0.0
        kl: 5.668534868163988e-06
        policy_loss: 6.685219705104828e-05
        total_loss: 116.35731506347656
        vf_explained_var: 0.06426048278808594
        vf_loss: 116.35726165771484
    load_time_ms: 10.1
    num_steps_sampled: 9050000
    num_steps_trained: 5931008
    sample_time_ms: 39719.611
    update_time_ms: 9.425
  iterations_since_restore: 362
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 237
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 259
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-40-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1021.4052521611403
  episode_reward_mean: 885.9500140307816
  episode_reward_min: 501.0873355344222
  episodes_this_iter: 10
  episodes_total: 3670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.995
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4953117370605469
        entropy_coeff: 0.0
        kl: 1.423817957402207e-05
        policy_loss: 0.0032138219103217125
        total_loss: 125.80020141601562
        vf_explained_var: 0.05625641345977783
        vf_loss: 125.79701232910156
    load_time_ms: 10.585
    num_steps_sampled: 9175000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 250
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 241
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-43-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.6668421870959
  episode_reward_mean: 883.7528396942674
  episode_reward_min: 501.0873355344222
  episodes_this_iter: 10
  episodes_total: 3720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 122.369
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4975883960723877
        entropy_coeff: 0.0
        kl: 3.896009729942307e-06
        policy_loss: 0.0013006494846194983
        total_loss: 112.61

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 266
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 235
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 232
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-46-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.6668421870959
  episode_reward_mean: 873.7154471561892
  episode_reward_min: 578.7608510131579
  episodes_this_iter: 10
  episodes_total: 3770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 118.205
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5031917095184326
        entropy_coeff: 0.0
        kl: 9.056977432919666e-06
        policy_loss: 0.006500140763819218
        total_loss: 108.67475128173828
        vf_explained_var: 0.07037752866744995
        vf_loss: 108.66825103759766
    load_time_ms: 9.61
    num_steps_sampled: 9425000
    num_steps_trained: 6176768
    sample_time_ms: 39624.387
    update_time_ms: 9.756
  iterations_since_restore: 377
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 233
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 251
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-50-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1019.655515878164
  episode_reward_mean: 876.874656330132
  episode_reward_min: 651.3849615408643
  episodes_this_iter: 10
  episodes_total: 3820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.211
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5022673606872559
        entropy_coeff: 0.0
        kl: 6.531980034196749e-06
        policy_loss: 0.0023050974123179913
        total_loss: 104.6854

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 230
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 224
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-53-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1020.9499623083357
  episode_reward_mean: 887.5984528485357
  episode_reward_min: 679.4880686041306
  episodes_this_iter: 10
  episodes_total: 3870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 113.784
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5028560161590576
        entropy_coeff: 0.0
        kl: 4.426772648002952e-06
        policy_loss: -0.0018025576137006283
        total_loss: 117.4508056640625
        vf_explained_var: 0.05704158544540405
        vf_loss: 117.45262145996094
    load_time_ms: 9.655
    num_steps_sampled: 9675000
    num_steps_trained: 6340608
    sample_time_ms: 39800.575
    update_time_ms: 9.826
  iterations_since_restore: 387
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 222
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 255
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 228
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 262
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_16-56-43
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1020.9499623083357
  episode_reward_mean: 892.7812684031322
  episode_reward_min: 677.1202065923561
  episodes_this_iter: 10
  episodes_total: 3920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 118.09
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5043118000030518
        entropy_coeff: 0.0
        kl: 1.5543300833087415e-05
        policy_loss: -0.00013596750795841217
        total_loss: 114.18440246582031
        vf_explained_var: 0.058480024337768555
        vf_loss: 114.18455505371094
    load_time_ms: 10.024
    num_steps_sampled: 9800000
    num_steps_trained: 6422528
    sample_time_ms: 39787.943
    update_time_ms: 10.533
  iterations_since_restore: 392
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 236
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-00-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1014.2927116010092
  episode_reward_mean: 896.0855200000464
  episode_reward_min: 677.1202065923561
  episodes_this_iter: 10
  episodes_total: 3970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 123.814
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4983361959457397
        entropy_coeff: 0.0
        kl: 1.027055986924097e-06
        policy_loss: 0.005461211781948805
        total_loss: 112.95411682128906
        vf_explained_var: 0.05769389867782593
        vf_loss: 112.94864654541016
    load_time_ms: 10.25
    num_steps_sampled: 9925000
    num_steps

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-03-29
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1014.2927116010092
  episode_reward_mean: 891.3709341855085
  episode_reward_min: 701.7682007676893
  episodes_this_iter: 10
  episodes_total: 4020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 118.033
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5072462558746338
        entropy_coeff: 0.0
        kl: 1.4157729310682043e-05
        policy_loss: 0.003910827450454235
        total_loss: 119.55567932128906
        vf_explained_var: 0.049672722816467285
        vf_loss: 119.55172729492188
    load_time_ms: 9.653
    num_steps_sampled: 10050000
    num_st

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 263
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-07-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 999.7641271357332
  episode_reward_mean: 889.0394711881047
  episode_reward_min: 726.8276473518057
  episodes_this_iter: 10
  episodes_total: 4070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 118.47
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5072911977767944
        entropy_coeff: 0.0
        kl: 1.534691546112299e-05
        policy_loss: -0.01122160255908966
        total_loss: 112.56224060058594
        vf_explained_var: 0.05370593070983887
        vf_loss: 112.5734634399414
    load_time_ms: 10.777
    num_steps_sampled: 10175000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 256
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 256
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-10-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1015.1470118364691
  episode_reward_mean: 890.8388939021382
  episode_reward_min: 734.7893643763257
  episodes_this_iter: 10
  episodes_total: 4120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 123.235
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.5082666873931885
        entropy_coeff: 0.0
        kl: 2.127255720552057e-05
        policy_loss: 0.0064214738085865974
        total_loss: 113.16532897949219
        vf_explained_var: 0.04830127954483032
        vf_loss: 113.15890502929688
    load_time_ms: 11.504
    num_steps_sampled: 10300000
    num_steps_trained: 6750208
    sample_time_ms: 40923.468
    update_time_ms: 12.092
  iterations_since_restore: 412
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 224
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 247
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-13-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1020.7223853534326
  episode_reward_mean: 890.6336350780965
  episode_reward_min: 734.7893643763257
  episodes_this_iter: 10
  episodes_total: 4170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 112.048
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.506566047668457
        entropy_coeff: 0.0
        kl: 1.8849146727006882e-05
        policy_loss: -0.004874879494309425
        total_loss: 109.54867553710938
        vf_explained_var: 0.053676068782806396
        vf_loss: 109.55354309082031
    load_time_ms: 10.133
    num_steps_sampled: 10425000
    num_s

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 245
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-17-19
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1020.7223853534326
  episode_reward_mean: 887.588669476939
  episode_reward_min: 750.3071980221096
  episodes_this_iter: 10
  episodes_total: 4220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 115.018
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4974453449249268
        entropy_coeff: 0.0
        kl: 3.4502092603361234e-05
        policy_loss: -0.00512941088527441
        total_loss: 117.09292602539062
        vf_explained_var: 0.040937960147857666
        vf_loss: 117.09809112548828
    load_time_ms: 9.506
    num_steps_sampled: 10550000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 225
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-21-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 979.6361055633884
  episode_reward_mean: 881.6621538431366
  episode_reward_min: 745.7142616331456
  episodes_this_iter: 10
  episodes_total: 4270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 130.189
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4820873737335205
        entropy_coeff: 0.0
        kl: 1.8793165509123355e-05
        policy_loss: 0.0022917063906788826
        total_loss: 118.42768859863281
        vf_explained_var: 0.03941386938095093
        vf_loss: 118.4253921508789
    load_time_ms: 11.024
    num_steps_sampled: 10675000
    num_steps_trained: 6995968
    sample_time_ms: 44455.802
    update_time_ms: 10.525
  iterations_since_restore: 427
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 231
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-24-53
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 976.7339540947928
  episode_reward_mean: 883.0822893318339
  episode_reward_min: 711.6157672116545
  episodes_this_iter: 10
  episodes_total: 4320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 143.76
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4801267385482788
        entropy_coeff: 0.0
        kl: 6.9440502556972206e-06
        policy_loss: -0.004689800553023815
        total_loss: 110.5906982421875
        vf_explained_var: 0.048009276390075684
        vf_loss: 110.59538269042969
    load_time_ms: 13.145
    num_steps_sampled: 10800000
    num_steps_trained: 7077888
    sample_time_ms: 45153.524
    update_time_ms: 11.025
  iterations_since_restore: 432
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 248
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 244
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-28-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 981.8613119466631
  episode_reward_mean: 890.505384341455
  episode_reward_min: 711.6157672116545
  episodes_this_iter: 10
  episodes_total: 4370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 132.354
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4778923988342285
        entropy_coeff: 0.0
        kl: 1.927539415191859e-05
        policy_loss: -0.005472528748214245
        total_loss: 116.56300354003906
        vf_explained_var: 0.041332244873046875
        vf_loss: 116.56847381591797
    load_time_ms: 11.782
    num_steps_sampled: 10925000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 253
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-31-43
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 981.8613119466631
  episode_reward_mean: 891.655354060784
  episode_reward_min: 742.4484159073779
  episodes_this_iter: 10
  episodes_total: 4420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 116.377
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.479263424873352
        entropy_coeff: 0.0
        kl: 1.8145776266464964e-05
        policy_loss: 0.006165489554405212
        total_loss: 110.11553955078125
        vf_explained_var: 0.0456465482711792
        vf_loss: 110.109375
    load_time_ms: 9.583
    num_steps_sampled: 11050000
    num_steps_trained: 

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 260
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-35-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 976.4459431633495
  episode_reward_mean: 887.8473581413247
  episode_reward_min: 775.5451085546749
  episodes_this_iter: 10
  episodes_total: 4470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 114.258
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4739965200424194
        entropy_coeff: 0.0
        kl: 5.785088433185592e-06
        policy_loss: -0.000604754313826561
        total_loss: 112.73039245605469
        vf_explained_var: 0.04345893859863281
        vf_loss: 112.73100280761719
    load_time_ms: 9.089
    num_steps_sampled: 11175000
    num_steps_trained: 7323648
    sample_time_ms: 41034.002
    update_time_ms: 10.321
  iterations_since_restore: 447
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-38-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 979.6137947049842
  episode_reward_mean: 887.7171524217185
  episode_reward_min: 715.0518808824146
  episodes_this_iter: 10
  episodes_total: 4520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 105.678
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4680380821228027
        entropy_coeff: 0.0
        kl: 3.552316775312647e-05
        policy_loss: 0.0014657513238489628
        total_loss: 113.138427734375
        vf_explained_var: 0.045590221881866455
        vf_loss: 113.1369400024414
    load_time_ms: 8.69
    num_steps_sampled: 11300000
    num_steps_trained: 7405568
    sample_time_ms: 40828.774
    update_time_ms: 10.461
  iterations_since_restore: 452
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 243
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-41-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 984.5696907249707
  episode_reward_mean: 893.1883777736671
  episode_reward_min: 715.0518808824146
  episodes_this_iter: 10
  episodes_total: 4570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 75.471
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4682016372680664
        entropy_coeff: 0.0
        kl: 1.7808670236263424e-06
        policy_loss: -8.050980977714062e-05
        total_loss: 112.97315979003906
        vf_explained_var: 0.04083573818206787
        vf_loss: 112.97322845458984
    load_time_ms: 6.683
    num_steps_sampled: 11425000
    num_steps_trained: 7487488
    sample_time_ms: 36415.841
    update_time_ms: 6.969
  iterations_since_restore: 457
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 235
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 266
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-43-47
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 984.5696907249707
  episode_reward_mean: 894.3933103848275
  episode_reward_min: 774.9967843061257
  episodes_this_iter: 10
  episodes_total: 4620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.532
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4632900953292847
        entropy_coeff: 0.0
        kl: 3.4701795811997727e-05
        policy_loss: 0.0025427555665373802
        total_loss: 114.38331604003906
        vf_explained_var: 0.03472769260406494
        vf_loss: 114.38079071044922
    load_time_ms: 4.73
    num_steps_sampled: 11550000
    num_steps_trained: 7569408
    sample_time_ms: 31304.591
    update_time_ms: 4.217
  iterations_since_restore: 462
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 244
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-46-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 993.7779563652756
  episode_reward_mean: 894.7935480295408
  episode_reward_min: 774.9967843061257
  episodes_this_iter: 10
  episodes_total: 4670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.023
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.458489179611206
        entropy_coeff: 0.0
        kl: 8.54510726640001e-05
        policy_loss: -0.0026542851701378822
        total_loss: 116.76948547363281
        vf_explained_var: 0.03291124105453491
        vf_loss: 116.77214050292969
    load_time_ms: 4.738
    num_steps_sampled: 11675000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 248
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 226
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 228
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-49-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 993.7779563652756
  episode_reward_mean: 895.006665402158
  episode_reward_min: 730.1863589360613
  episodes_this_iter: 10
  episodes_total: 4720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 55.35
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4569616317749023
        entropy_coeff: 0.0
        kl: 3.4478012821637094e-05
        policy_loss: -0.0059349811635911465
        total_loss: 109.19464874267578
        vf_explained_var: 0.04185837507247925
        vf_loss: 109.20057678222656
    load_time_ms: 4.656
    num_steps_sampled: 11800000
    num_steps

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 247
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-51-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 966.868666313989
  episode_reward_mean: 895.4156681413926
  episode_reward_min: 730.1863589360613
  episodes_this_iter: 10
  episodes_total: 4770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 52.338
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4522593021392822
        entropy_coeff: 0.0
        kl: 1.2849355698563159e-05
        policy_loss: 0.0006808624602854252
        total_loss: 117.52272033691406
        vf_explained_var: 0.028389573097229004
        vf_loss: 117.52203369140625
    load_time_ms: 4.173
    num_steps_sampled: 11925000
    num_steps_trained: 7815168
    sample_time_ms: 31221.408
    update_time_ms: 4.034
  iterations_since_restore: 477
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-54-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 966.868666313989
  episode_reward_mean: 896.1996032467192
  episode_reward_min: 796.0749432379713
  episodes_this_iter: 10
  episodes_total: 4820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.607
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4509241580963135
        entropy_coeff: 0.0
        kl: 3.6380479286890477e-06
        policy_loss: 0.00027855916414409876
        total_loss: 113.66510009765625
        vf_explained_var: 0.031890034675598145
        vf_loss: 113.66481018066406
    load_time_ms: 4.052
    num_steps_sampled: 12050000
    num_steps_trained: 7897088
    sample_time_ms: 29743.229
    update_time_ms: 3.539
  iterations_since_restore: 482
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 259
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 230
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-56-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 970.1945739387396
  episode_reward_mean: 894.4328985789994
  episode_reward_min: 796.0749432379713
  episodes_this_iter: 10
  episodes_total: 4870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.888
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4581196308135986
        entropy_coeff: 0.0
        kl: 1.1559826816665009e-05
        policy_loss: 0.0001715170219540596
        total_loss: 113.8304443359375
        vf_explained_var: 0.0331074595451355
        vf_loss: 113.83027648925781
    load_time_ms: 4.046
    num_steps_sampled: 12175000
    num_steps_trained: 7979008
    sample_time_ms: 29740.534
    update_time_ms: 3.845
  iterations_since_restore: 487
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 238
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 233
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_17-59-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 970.2640876726241
  episode_reward_mean: 895.4720623103005
  episode_reward_min: 800.8957819885854
  episodes_this_iter: 10
  episodes_total: 4920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.639
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.455203652381897
        entropy_coeff: 0.0
        kl: 2.1420928533189e-06
        policy_loss: -0.0020385473035275936
        total_loss: 112.636245

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 253
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 254
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-01-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 970.2640876726241
  episode_reward_mean: 893.8672615213112
  episode_reward_min: 800.8957819885854
  episodes_this_iter: 10
  episodes_total: 4970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.153
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4509601593017578
        entropy_coeff: 0.0
        kl: 2.9609542252728716e-05
        policy_loss: 0.005708618555217981
        total_loss: 109.44730377197266
        vf_explained_var: 0.033890724182128906
        vf_loss: 109.44157409667969
    load_time_ms: 4.025
    num_steps_sampled: 12425000
    num_steps_trained: 8142848
    sample_time_ms: 29819.691
    update_time_ms: 3.35
  iterations_since_restore: 497
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 255
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 254
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-04-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 967.0121973768069
  episode_reward_mean: 889.8712079095576
  episode_reward_min: 805.620831630857
  episodes_this_iter: 10
  episodes_total: 5020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.11
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4466441869735718
        entropy_coeff: 0.0
        kl: 1.4841360098216683e-05
        policy_loss: 0.004286300390958786
        total_loss: 108.50192260742188
        vf_explained_var: 0.03499966859817505
        vf_loss: 108.49761962890625
    load_time_ms: 4.273
    num_steps_sampled: 12550000
    num_steps_trained: 8224768
    sample_time_ms: 29754.293
    update_time_ms: 3.427
  iterations_since_restore: 502
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 269
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 239
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 224
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-06-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 955.9141241623964
  episode_reward_mean: 883.5595653683371
  episode_reward_min: 781.6704339010781
  episodes_this_iter: 10
  episodes_total: 5070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.444
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4499304294586182
        entropy_coeff: 0.0
        kl: 3.732065306394361e-05
        policy_loss: 0.0023090012837201357
        total_loss: 106.35911560058594
        vf_explained_var: 0.03424787521362305
        vf_loss: 106.3568115234375
    load_time_ms: 4.253
    num_steps_sampled: 12675000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 246
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-09-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 947.3325512404332
  episode_reward_mean: 885.0632973950175
  episode_reward_min: 781.6704339010781
  episodes_this_iter: 10
  episodes_total: 5120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.019
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4456210136413574
        entropy_coeff: 0.0
        kl: 4.553457984002307e-07
        policy_loss: 0.0017816754989326
        total_loss: 113.1922607421875
        vf_explained_var: 0.02583855390548706
        vf_loss: 113.19049072265625
    load_time_ms: 4.146
    num_steps_sampled: 12800000
    num_steps_trained: 8388608
    sample_time_ms: 29807.815
    update_time_ms: 3.676
  iterations_since_restore: 512
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_e

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 244
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 220
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-11-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 963.5177818955107
  episode_reward_mean: 891.0573387086174
  episode_reward_min: 810.923754408774
  episodes_this_iter: 10
  episodes_total: 5170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.777
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.437934398651123
        entropy_coeff: 0.0
        kl: 1.4852532331133261e-05
        policy_loss: 0.0034991400316357613
        total_loss: 111.88066101074219
        vf_explained_var: 0.027760207653045654
        vf_loss: 111.87712860107422
    load_time_ms: 4.15
    num_steps_sampled: 12925000
    num_steps_trained: 8470528
    sample_time_ms: 29832.769
    update_time_ms: 3.648
  iterations_since_restore: 517
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-14-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 963.5177818955107
  episode_reward_mean: 891.7458446441977
  episode_reward_min: 804.8483986052646
  episodes_this_iter: 10
  episodes_total: 5220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.045
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4337353706359863
        entropy_coeff: 0.0
        kl: 8.442686521448195e-06
        policy_loss: -0.008814390748739243
        total_loss: 115.11021423339844
        vf_explained_var: 0.026336669921875
        vf_loss: 115.11904907226562
    load_time_ms: 4.15
    num_steps_sampled: 13050000
    num_steps_trained: 8552448
    sample_time_ms: 29838.543
    update_time_ms: 3.788
  iterations_since_restore: 522
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 226
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 223
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-16-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 962.5932965909681
  episode_reward_mean: 889.2587887010741
  episode_reward_min: 795.8082200651425
  episodes_this_iter: 10
  episodes_total: 5270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.532
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4364497661590576
        entropy_coeff: 0.0
        kl: 1.0153016773983836e-06
        policy_loss: 0.001565688056871295
        total_loss: 108.05134582519531
        vf_explained_var: 0.031667232513427734
        vf_loss: 108.0497817993164
    load_time_ms: 4.008
    num_steps_sampled: 13175000
    num_steps_trained: 8634368
    sample_time_ms: 29842.562
    update_time_ms: 3.703
  iterations_since_restore: 527
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 245
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 251
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-19-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 957.7779705476127
  episode_reward_mean: 888.3175562517404
  episode_reward_min: 795.8082200651425
  episodes_this_iter: 10
  episodes_total: 5320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.078
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4387606382369995
        entropy_coeff: 0.0
        kl: 1.1071471817558631e-05
        policy_loss: -0.0033715011086314917
        total_loss: 113.49839782714844
        vf_explained_var: 0.026702284812927246
        vf_loss: 113.50175476074219
    load_time_ms: 3.978
    num_steps_sampled: 13300000
    num_st

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 243
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-21-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 946.896209938778
  episode_reward_mean: 889.8806074526717
  episode_reward_min: 789.2506966013835
  episodes_this_iter: 10
  episodes_total: 5370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.313
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4349992275238037
        entropy_coeff: 0.0
        kl: 1.5334226191043854e-05
        policy_loss: -0.0036339983344078064
        total_loss: 113.0341796875
        vf_explained_var: 0.02292180061340332
        vf_loss: 113.03780364990234
    load_time_ms: 4.302
    num_steps_sampled: 13425000
    num_steps_trained: 8798208
    sample_time_ms: 30169.131
    update_time_ms: 3.721
  iterations_since_restore: 537
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 258
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 256
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 241
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 268
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 250
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-24-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 946.896209938778
  episode_reward_mean: 891.5086678089726
  episode_reward_min: 789.2506966013835
  episodes_this_iter: 10
  episodes_total: 5420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.109
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.43038010597229
        entropy_coeff: 0.0
        kl: 1.1700343748088926e-05
        policy_loss: -4.417030140757561e-05
        total_loss: 111.38436889648438
        vf_explained_var: 0.024078369140625
        vf_loss: 111.38442993164062
    load_time_ms: 4.372
    num_steps_sampled: 13550000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 238
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-26-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 943.7074778753922
  episode_reward_mean: 889.2714676041506
  episode_reward_min: 790.7344079260964
  episodes_this_iter: 10
  episodes_total: 5470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.349
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4303696155548096
        entropy_coeff: 0.0
        kl: 2.708031388465315e-06
        policy_loss: -0.000743647338822484
        total_loss: 111.5155029296875
        vf_explained_var: 0.023411333560943604
        vf_loss: 111.51625061035156
    load_time_ms: 4.006
    num_steps_sampled: 13675000
    num_steps_trained: 8962048
    sample_time_ms: 29788.479
    update_time_ms: 3.442
  iterations_since_restore: 547
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 250
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-29-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.3714448113002
  episode_reward_mean: 883.450654041575
  episode_reward_min: 761.6431270829739
  episodes_this_iter: 10
  episodes_total: 5520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.862
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4240648746490479
        entropy_coeff: 0.0
        kl: 3.707552605192177e-05
        policy_loss: 0.0027077053673565388
        total_loss: 108.31806945800781
        vf_explained_var: 0.02631378173828125
        vf_loss: 108.31533813476562
    load_time_ms: 3.795
    num_steps_sampled: 13800000
    num_steps_trained: 9043968
    sample_time_ms: 29814.702
    update_time_ms: 3.38
  iterations_since_restore: 552
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-31-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 952.1786198144315
  episode_reward_mean: 885.843385197523
  episode_reward_min: 761.6431270829739
  episodes_this_iter: 10
  episodes_total: 5570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.917
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4234832525253296
        entropy_coeff: 0.0
        kl: 1.6821923054521903e-05
        policy_loss: -0.0013912837021052837
        total_loss: 110.75762939453125
        vf_explained_var: 0.01923346519470215
        vf_loss: 110.759033203125
    load_time_ms: 4.027
    num_steps_sampled: 13925000
    num_steps_

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 253
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 251
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-34-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 952.1786198144315
  episode_reward_mean: 889.6730844616467
  episode_reward_min: 805.7976934755421
  episodes_this_iter: 10
  episodes_total: 5620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.484
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.424883484840393
        entropy_coeff: 0.0
        kl: 2.665583451744169e-06
        policy_loss: -0.002634040080010891
        total_loss: 110.23660278320312
        vf_explained_var: 0.023199498653411865
        vf_loss: 110.23919677734375
    load_time_ms: 3.981
    num_steps_sampled: 14050000
    num_steps

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 239
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-36-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 951.1827087041964
  episode_reward_mean: 891.1845988820893
  episode_reward_min: 805.7976934755421
  episodes_this_iter: 10
  episodes_total: 5670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.373
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4278759956359863
        entropy_coeff: 0.0
        kl: 1.009239349514246e-05
        policy_loss: -0.00010225176811218262
        total_loss: 112.2599105834961
        vf_explained_var: 0.02003300189971924
        vf_loss: 112.26001739501953
    load_time_ms: 3.809
    num_steps_sampled: 14175000
    num_step

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 232
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-39-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.1367324608794
  episode_reward_mean: 890.0079477985353
  episode_reward_min: 801.7808536490547
  episodes_this_iter: 10
  episodes_total: 5720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.049
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.429154634475708
        entropy_coeff: 0.0
        kl: 7.0935144321993e-05
        policy_loss: 0.004016554914414883
        total_loss: 107.13809204101562
        vf_explained_var: 0.024419009685516357
        vf_loss: 107.13410186767578
    load_time_ms: 3.931
    num_steps_sampled: 14300000
    num_steps_trained: 9371648
    sample_time_ms: 29876.64
    update_time_ms: 3.493
  iterations_since_restore: 572
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_e

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 241
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-41-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.1367324608794
  episode_reward_mean: 887.8854497871235
  episode_reward_min: 801.7808536490547
  episodes_this_iter: 10
  episodes_total: 5770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.444
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4244133234024048
        entropy_coeff: 0.0
        kl: 1.395611252519302e-05
        policy_loss: -0.0006528337253257632
        total_loss: 111.56248474121094
        vf_explained_var: 0.017748355865478516
        vf_loss: 111.5631332397461
    load_time_ms: 4.082
    num_steps_sampled: 14425000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 239
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-44-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 953.5351259410244
  episode_reward_mean: 889.4909103148528
  episode_reward_min: 824.141362589466
  episodes_this_iter: 10
  episodes_total: 5820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.285
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4286229610443115
        entropy_coeff: 0.0
        kl: 6.658307393081486e-05
        policy_loss: 0.010433089919388294
        total_loss: 107.81124114990234
        vf_explained_var: 0.02139383554458618
        vf_loss: 107.80081939697266
    load_time_ms: 4.035
    num_steps_sampled: 14550000
    num_steps_t

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-46-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 953.5351259410244
  episode_reward_mean: 888.7088758809505
  episode_reward_min: 824.141362589466
  episodes_this_iter: 10
  episodes_total: 5870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.816
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4243342876434326
        entropy_coeff: 0.0
        kl: 1.1978045222349465e-05
        policy_loss: -4.641176201403141e-05
        total_loss: 110.916

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-49-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.2792103520342
  episode_reward_mean: 887.7716095068175
  episode_reward_min: 827.2312264339815
  episodes_this_iter: 10
  episodes_total: 5920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.075
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4313013553619385
        entropy_coeff: 0.0
        kl: 1.7191461665788665e-05
        policy_loss: 0.0013324604369699955
        total_loss: 112.240478515625
        vf_explained_var: 0.017113804817199707
        vf_loss: 112.23912048339844
    load_time_ms: 4.068
    num_steps_sampled: 14800000
    num_steps_trained: 9699328
    sample_time_ms: 29863.724
    update_time_ms: 3.627
  iterations_since_restore: 592
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 225
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 238
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-51-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 935.7842750361385
  episode_reward_mean: 889.8275738522461
  episode_reward_min: 827.2312264339815
  episodes_this_iter: 10
  episodes_total: 5970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.255
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4360995292663574
        entropy_coeff: 0.0
        kl: 5.3423467761604115e-05
        policy_loss: 0.0009979240130633116
        total_loss: 110.86405944824219
        vf_explained_var: 0.017178595066070557
        vf_loss: 110.86306762695312
    load_time_ms: 4.297
    num_steps_sampled: 14925000
    num_steps_trained: 9781248
    sample_time_ms: 29872.594
    update_time_ms: 3.658
  iterations_since_restore: 597
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 235
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 256
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 223
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-54-01
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 947.8538766135548
  episode_reward_mean: 891.4034323563136
  episode_reward_min: 824.8838735582356
  episodes_this_iter: 10
  episodes_total: 6020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.21
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.450544834136963
        entropy_coeff: 0.0
        kl: 7.359314622590318e-06
        policy_loss: 0.005322844255715609
        total_loss: 110.5172424

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 250
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 238
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 245
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 239
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-56-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 963.4530360213964
  episode_reward_mean: 885.7977673747585
  episode_reward_min: 824.8838735582356
  episodes_this_iter: 10
  episodes_total: 6070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.359
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.458065152168274
        entropy_coeff: 0.0
        kl: 4.275482569937594e-05
        policy_loss: 0.0060569606721401215
        total_loss: 108.08198547363281
        vf_explained_var: 0.019002318382263184
        vf_loss: 108.07593536376953
    load_time_ms: 4.052
    num_steps_sampled: 15175000
    num_steps_trained: 9945088
    sample_time_ms: 29859.922
    update_time_ms: 3.788
  iterations_since_restore: 607
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 241
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 258
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 259
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 265
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_18-59-01
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 963.4530360213964
  episode_reward_mean: 880.3383442101741
  episode_reward_min: 820.4419311119633
  episodes_this_iter: 10
  episodes_total: 6120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.175
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4639402627944946
        entropy_coeff: 0.0
        kl: 3.922628820873797e-06
        policy_loss: -0.00259901350364089
        total_loss: 108.64933

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 223
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 227
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-01-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.7314464578845
  episode_reward_mean: 881.3556242721567
  episode_reward_min: 820.4419311119633
  episodes_this_iter: 10
  episodes_total: 6170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.789
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4549496173858643
        entropy_coeff: 0.0
        kl: 3.512375405989587e-05
        policy_loss: -0.0027146562933921814
        total_loss: 107.280

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-04-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.7314464578845
  episode_reward_mean: 880.401017740135
  episode_reward_min: 829.9315845218316
  episodes_this_iter: 10
  episodes_total: 6220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.947
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4669106006622314
        entropy_coeff: 0.0
        kl: 1.789269663277082e-05
        policy_loss: 0.0034214418847113848
        total_loss: 105.05511474609375
        vf_explained_var: 0.01974797248840332
        vf_loss: 105.05171966552734
    load_time_ms: 4.023
    num_steps_sampled: 15550000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 247
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 260
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 265
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-06-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 924.8804870013336
  episode_reward_mean: 883.0631676682028
  episode_reward_min: 829.9315845218316
  episodes_this_iter: 10
  episodes_total: 6270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.905
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4677555561065674
        entropy_coeff: 0.0
        kl: 1.6882229829207063e-05
        policy_loss: -0.005184045992791653
        total_loss: 109.033

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 243
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 244
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-08-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 950.6160881935524
  episode_reward_mean: 888.5973601702807
  episode_reward_min: 841.293117843733
  episodes_this_iter: 10
  episodes_total: 6320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 54.527
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4623701572418213
        entropy_coeff: 0.0
        kl: 3.902565367752686e-06
        policy_loss: 0.008525815792381763
        total_loss: 109.5494384765625
        vf_explained_var: 0.013649702072143555
        vf_loss: 109.54090118408203
    load_time_ms: 4.398
    num_steps_sampled: 15800000
    num_steps_trained: 10354688
    sample_time_ms: 29854.442
    update_time_ms: 3.425
  iterations_since_restore: 632
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 225
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-11-29
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 950.6160881935524
  episode_reward_mean: 889.9688461330957
  episode_reward_min: 846.064120227463
  episodes_this_iter: 10
  episodes_total: 6370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 50.546
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4639711380004883
        entropy_coeff: 0.0
        kl: 1.9469251128612086e-05
        policy_loss: 0.004498487338423729
        total_loss: 108.23847961425781
        vf_explained_var: 0.013893842697143555
        vf_loss: 108.23397064208984
    load_time_ms: 4.532
    num_steps_sampled: 15925000
    num_steps_trained: 10436608
    sample_time_ms: 29883.072
    update_time_ms: 3.62
  iterations_since_restore: 637
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 229
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 221
[2m[36m(pid=4074)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-13-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.0041610949264
  episode_reward_mean: 889.1999533017138
  episode_reward_min: 834.1039191226756
  episodes_this_iter: 10
  episodes_total: 6420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.386
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4575705528259277
        entropy_coeff: 0.0
        kl: 1.3335051335161552e-05
        policy_loss: -0.00468745781108737
        total_loss: 110.5563735961914
        vf_explained_var: 0.01298433542251587
        vf_loss: 110.56106567382812
    load_time_ms: 4.267
    num_steps_sampled: 16050000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 265
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 262
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 232
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 258
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-16-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 933.2043011897231
  episode_reward_mean: 888.7415205652618
  episode_reward_min: 811.0336810313207
  episodes_this_iter: 10
  episodes_total: 6470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.933
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4439160823822021
        entropy_coeff: 0.0
        kl: 2.3658347345190123e-05
        policy_loss: 0.0026955490466207266
        total_loss: 110.63960266113281
        vf_explained_var: 0.01439368724822998
        vf_loss: 110.63691711425781
    load_time_ms: 4.067
    num_steps_sampled: 16175000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-18-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.2810549694295
  episode_reward_mean: 890.2325409048568
  episode_reward_min: 811.0336810313207
  episodes_this_iter: 10
  episodes_total: 6520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.67
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4360520839691162
        entropy_coeff: 0.0
        kl: 6.342436245176941e-05
        policy_loss: -0.008070030249655247
        total_loss: 111.12774658203125
        vf_explained_var: 0.013856172561645508
        vf_loss: 111.13581085205078
    load_time_ms: 4.008
    num_steps_sampled: 16300000
    num_steps

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 252
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 220
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-21-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 959.7890439558901
  episode_reward_mean: 893.4506568033805
  episode_reward_min: 818.6605399231996
  episodes_this_iter: 10
  episodes_total: 6570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.394
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4494328498840332
        entropy_coeff: 0.0
        kl: 7.302165613509715e-06
        policy_loss: -0.0035199543926864862
        total_loss: 110.197

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 223
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-24-11
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 959.7890439558901
  episode_reward_mean: 891.3660847006727
  episode_reward_min: 818.6605399231996
  episodes_this_iter: 10
  episodes_total: 6620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.466
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4474693536758423
        entropy_coeff: 0.0
        kl: 1.9457416783552617e-05
        policy_loss: -0.0013570988085120916
        total_loss: 110.122314453125
        vf_explained_var: 0.01416933536529541
        vf_loss: 110.1236801147461
    load_time_ms: 4.573
    num_steps_sampled: 16550000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 241
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-26-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 926.0250224284988
  episode_reward_mean: 887.9760187922527
  episode_reward_min: 814.456730186884
  episodes_this_iter: 10
  episodes_total: 6670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.545
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4557126760482788
        entropy_coeff: 0.0
        kl: 1.392634294461459e-05
        policy_loss: -0.0008269329555332661
        total_loss: 111.96582794189453
        vf_explained_var: 0.012774050235748291
        vf_loss: 111.96669006347656
    load_time_ms: 3.955
    num_steps_sampled: 16675000
    num_step

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 253
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-29-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.6491192612312
  episode_reward_mean: 890.6497671960034
  episode_reward_min: 814.456730186884
  episodes_this_iter: 10
  episodes_total: 6720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.553
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4550683498382568
        entropy_coeff: 0.0
        kl: 1.160365718533285e-05
        policy_loss: 0.005949420388787985
        total_loss: 111.60189056396484
        vf_explained_var: 0.011361837387084961
        vf_loss: 111.595947265625
    load_time_ms: 3.997
    num_steps_sampled: 16800000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 243
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-31-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.6491192612312
  episode_reward_mean: 890.8526362929783
  episode_reward_min: 842.8247925342713
  episodes_this_iter: 10
  episodes_total: 6770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.715
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4496700763702393
        entropy_coeff: 0.0
        kl: 1.611055267858319e-05
        policy_loss: 0.009804710745811462
        total_loss: 109.9283676147461
        vf_explained_var: 0.011923432350158691
        vf_loss: 109.91856384277344
    load_time_ms: 4.36
    num_steps_sampled: 16925000
    num_steps_t

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 266
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 244
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 233
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-34-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 937.6342612165919
  episode_reward_mean: 888.4168205675525
  episode_reward_min: 830.2112268814504
  episodes_this_iter: 10
  episodes_total: 6820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.939
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4561854600906372
        entropy_coeff: 0.0
        kl: 1.1029660527128726e-05
        policy_loss: 0.004114036913961172
        total_loss: 108.65746307373047
        vf_explained_var: 0.013236165046691895
        vf_loss: 108.65335083007812
    load_time_ms: 4.478
    num_steps_sampled: 17050000
    num_steps_trained: 11173888
    sample_time_ms: 29790.3
    update_time_ms: 3.422
  iterations_since_restore: 682
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 256
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 245
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-36-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.8228941306166
  episode_reward_mean: 888.9907673780774
  episode_reward_min: 825.5143989778747
  episodes_this_iter: 10
  episodes_total: 6870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.957
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4570811986923218
        entropy_coeff: 0.0
        kl: 2.5848796212812886e-05
        policy_loss: -0.0046967677772045135
        total_loss: 111.79664611816406
        vf_explained_var: 0.010928988456726074
        vf_loss: 111.80131530761719
    load_time_ms: 4.356
    num_steps_sampled: 17175000
    num_steps_trained: 11255808
    sample_time_ms: 29818.096
    update_time_ms: 3.444
  iterations_since_restore: 687
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 258
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 255
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-39-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.8228941306166
  episode_reward_mean: 890.3375779163746
  episode_reward_min: 825.5143989778747
  episodes_this_iter: 10
  episodes_total: 6920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.59
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4562506675720215
        entropy_coeff: 0.0
        kl: 1.5690991858718917e-05
        policy_loss: -0.005611165426671505
        total_loss: 110.68473052978516
        vf_explained_var: 0.010321438312530518
        vf_loss: 110.69035339355469
    load_time_ms: 4.193
    num_steps_sampled: 17300000
    num_steps_trained: 11337728
    sample_time_ms: 29834.663
    update_time_ms: 3.501
  iterations_since_restore: 692
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 220
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 239
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 232
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 231
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-41-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 956.9085247375224
  episode_reward_mean: 890.8710990758904
  episode_reward_min: 827.9281439395982
  episodes_this_iter: 10
  episodes_total: 6970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.445
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.458362102508545
        entropy_coeff: 0.0
        kl: 2.4624940124340355e-05
        policy_loss: 0.001548693748190999
        total_loss: 109.12310791015625
        vf_explained_var: 0.011958837509155273
        vf_loss: 109.12158203125
    load_time_ms: 4.374
    num_steps_sampled: 17425000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-44-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 956.9085247375224
  episode_reward_mean: 889.6583151697114
  episode_reward_min: 838.8572878671106
  episodes_this_iter: 10
  episodes_total: 7020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.479
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4598995447158813
        entropy_coeff: 0.0
        kl: 2.368484274484217e-05
        policy_loss: -0.0022484618239104748
        total_loss: 109.18962097167969
        vf_explained_var: 0.010963976383209229
        vf_loss: 109.19185638427734
    load_time_ms: 4.395
    num_steps_sampled: 17550000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 252
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-46-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 946.7306788332918
  episode_reward_mean: 889.720754923484
  episode_reward_min: 850.1261675659989
  episodes_this_iter: 10
  episodes_total: 7070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.296
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4640038013458252
        entropy_coeff: 0.0
        kl: 0.00010770858352771029
        policy_loss: 0.008483469486236572
        total_loss: 109.67718505859375
        vf_explained_var: 0.01086115837097168
        vf_loss: 109.66871643066406
    load_time_ms: 4.37
    num_steps_sampled: 17675000
    num_steps_t

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 262
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 238
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-49-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 946.7306788332918
  episode_reward_mean: 892.1821978902211
  episode_reward_min: 842.4805059329992
  episodes_this_iter: 10
  episodes_total: 7120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.446
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.470043420791626
        entropy_coeff: 0.0
        kl: 2.3978453100426123e-05
        policy_loss: -0.0041494364850223064
        total_loss: 109.48286437988281
        vf_explained_var: 0.01053541898727417
        vf_loss: 109.48701477050781
    load_time_ms: 4.343
    num_steps_sampled: 17800000
    num_steps_trained: 11665408
    sample_time_ms: 29871.571
    update_time_ms: 3.672
  iterations_since_restore: 712
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 238
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 230
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-51-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.764196039574
  episode_reward_mean: 890.6401705815937
  episode_reward_min: 836.5658184999382
  episodes_this_iter: 10
  episodes_total: 7170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.592
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4764854907989502
        entropy_coeff: 0.0
        kl: 6.417951226467267e-06
        policy_loss: -0.0004199041286483407
        total_loss: 106.59455871582031
        vf_explained_var: 0.013624608516693115
        vf_loss: 106.59496307373047
    load_time_ms: 3.934
    num_steps_sampled: 17925000
    num_step

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 260
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-54-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.764196039574
  episode_reward_mean: 891.4296870688141
  episode_reward_min: 836.5658184999382
  episodes_this_iter: 10
  episodes_total: 7220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.538
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4734227657318115
        entropy_coeff: 0.0
        kl: 1.1774351150961593e-05
        policy_loss: 0.004559155087918043
        total_loss: 107.67541

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 239
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-56-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.3881397914107
  episode_reward_mean: 893.183083564222
  episode_reward_min: 845.8801275083712
  episodes_this_iter: 10
  episodes_total: 7270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.627
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4746429920196533
        entropy_coeff: 0.0
        kl: 3.6797246139030904e-06
        policy_loss: 0.0020404502283781767
        total_loss: 110.42606353759766
        vf_explained_var: 0.011562108993530273
        vf_loss: 110.42401885986328
    load_time_ms: 4.392
    num_steps_sampled: 18175000
    num_steps_trained: 11911168
    sample_time_ms: 29882.874
    update_time_ms: 3.694
  iterations_since_restore: 727
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 220
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_19-59-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 934.4365630879297
  episode_reward_mean: 891.8139674577947
  episode_reward_min: 830.0429950522008
  episodes_this_iter: 10
  episodes_total: 7320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.823
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4701377153396606
        entropy_coeff: 0.0
        kl: 2.176268753828481e-05
        policy_loss: 0.00024170009419322014
        total_loss: 111.588

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 249
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-01-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 934.4365630879297
  episode_reward_mean: 891.4129631766498
  episode_reward_min: 830.0429950522008
  episodes_this_iter: 10
  episodes_total: 7370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.771
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4691795110702515
        entropy_coeff: 0.0
        kl: 3.2099997042678297e-06
        policy_loss: -0.00298607861623168
        total_loss: 111.5499

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 221
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-04-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 926.749183720267
  episode_reward_mean: 889.5018390622895
  episode_reward_min: 831.0635356270705
  episodes_this_iter: 10
  episodes_total: 7420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.438
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.456835150718689
        entropy_coeff: 0.0
        kl: 2.0354491425678134e-05
        policy_loss: 0.005368770100176334
        total_loss: 108.72843933105469
        vf_explained_var: 0.010794579982757568
        vf_loss: 108.72306823730469
    load_time_ms: 4.306
    num_steps_sampled: 18550000
    num_steps_trained: 12156928
    sample_time_ms: 29885.965
    update_time_ms: 3.449
  iterations_since_restore: 742
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 252
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 266
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 245
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-06-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 937.786668448991
  episode_reward_mean: 890.7192902948108
  episode_reward_min: 831.0635356270705
  episodes_this_iter: 10
  episodes_total: 7470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.225
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4596832990646362
        entropy_coeff: 0.0
        kl: 1.4392426237463951e-06
        policy_loss: -0.0008940086700022221
        total_loss: 110.61296081542969
        vf_explained_var: 0.008821964263916016
        vf_loss: 110.61385345458984
    load_time_ms: 3.882
    num_steps_sampled: 18675000
    num_steps_trained: 12238848
    sample_time_ms: 29883.29
    update_time_ms: 3.413
  iterations_since_restore: 747
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 241
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-09-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 937.786668448991
  episode_reward_mean: 889.8216767363797
  episode_reward_min: 822.3397926757519
  episodes_this_iter: 10
  episodes_total: 7520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.159
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.455355167388916
        entropy_coeff: 0.0
        kl: 1.6830363165354356e-06
        policy_loss: 0.002679006662219763
        total_loss: 110.55022430419922
        vf_explained_var: 0.009227991104125977
        vf_loss: 110.54756164550781
    load_time_ms: 3.839
    num_steps_sampled: 18800000
    num_steps_trained: 12320768
    sample_time_ms: 29941.186
    update_time_ms: 3.595
  iterations_since_restore: 752
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 257
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-11-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.7731140459338
  episode_reward_mean: 891.4479947881877
  episode_reward_min: 822.3397926757519
  episodes_this_iter: 10
  episodes_total: 7570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.074
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4511265754699707
        entropy_coeff: 0.0
        kl: 1.770524977473542e-05
        policy_loss: -0.003156851977109909
        total_loss: 111.9530

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 232
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 262
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-14-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.4522679639632
  episode_reward_mean: 894.4288343280637
  episode_reward_min: 850.6354241807993
  episodes_this_iter: 10
  episodes_total: 7620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.215
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4505484104156494
        entropy_coeff: 0.0
        kl: 1.404789145453833e-05
        policy_loss: -0.0024324862752109766
        total_loss: 109.39305114746094
        vf_explained_var: 0.008806228637695312
        vf_loss: 109.3955078125
    load_time_ms: 4.021
    num_steps_sampled: 19050000
    num_steps_trained: 12484608
    sample_time_ms: 29943.877
    update_time_ms: 3.518
  iterations_since_restore: 762
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 241
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 256
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-16-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.4522679639632
  episode_reward_mean: 891.5942638317389
  episode_reward_min: 840.6079467224331
  episodes_this_iter: 10
  episodes_total: 7670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.667
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4533569812774658
        entropy_coeff: 0.0
        kl: 1.2093623809050769e-05
        policy_loss: 0.0003654877655208111
        total_loss: 106.3953857421875
        vf_explained_var: 0.011006712913513184
        vf_loss: 106.39498901367188
    load_time_ms: 4.334
    num_steps_sampled: 19175000
    num_steps_trained: 12566528
    sample_time_ms: 29949.765
    update_time_ms: 3.4
  iterations_since_restore: 767
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 240
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-19-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 929.3151497375172
  episode_reward_mean: 892.8087459613769
  episode_reward_min: 837.3379713015655
  episodes_this_iter: 10
  episodes_total: 7720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.919
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4498207569122314
        entropy_coeff: 0.0
        kl: 1.900794086395763e-05
        policy_loss: -0.0005752237048000097
        total_loss: 108.80987548828125
        vf_explained_var: 0.008967161178588867
        vf_loss: 108.81045532226562
    load_time_ms: 4.348
    num_steps_sampled: 19300000
    num_steps_trained: 12648448
    sample_time_ms: 29947.774
    update_time_ms: 3.473
  iterations_since_restore: 772
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 226
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 231
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-21-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.5049116991736
  episode_reward_mean: 892.5497189450251
  episode_reward_min: 837.3379713015655
  episodes_this_iter: 10
  episodes_total: 7770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.057
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4572360515594482
        entropy_coeff: 0.0
        kl: 3.7782556319143623e-06
        policy_loss: 0.0011447207070887089
        total_loss: 108.95221710205078
        vf_explained_var: 0.009687542915344238
        vf_loss: 108.95103454589844
    load_time_ms: 4.205
    num_steps_sampled: 19425000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 269
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-24-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.6962268172347
  episode_reward_mean: 890.2731070305463
  episode_reward_min: 824.0591503970862
  episodes_this_iter: 10
  episodes_total: 7820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.623
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4458496570587158
        entropy_coeff: 0.0
        kl: 2.7243037038715556e-05
        policy_loss: 0.0012462171725928783
        total_loss: 110.21320343017578
        vf_explained_var: 0.008632421493530273
        vf_loss: 110.21194458007812
    load_time_ms: 4.017
    num_steps_sampled: 19550000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 223
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 227
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 259
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-26-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 949.8847940705446
  episode_reward_mean: 893.0078422228748
  episode_reward_min: 824.0591503970862
  episodes_this_iter: 10
  episodes_total: 7870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.294
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4399244785308838
        entropy_coeff: 0.0
        kl: 1.2308795703575015e-05
        policy_loss: -0.004429616965353489
        total_loss: 113.09461975097656
        vf_explained_var: 0.007854282855987549
        vf_loss: 113.09907531738281
    load_time_ms: 3.988
    num_steps_sampled: 19675000
    num_ste

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 224
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 232
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 249
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-29-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 949.8847940705446
  episode_reward_mean: 894.2429943899529
  episode_reward_min: 852.5553845260029
  episodes_this_iter: 10
  episodes_total: 7920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.208
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.438624620437622
        entropy_coeff: 0.0
        kl: 3.6554971302393824e-05
        policy_loss: -0.0030717789195477962
        total_loss: 110.125

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 264
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 236
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-31-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 945.7547407671155
  episode_reward_mean: 892.3099764911544
  episode_reward_min: 830.7210410674484
  episodes_this_iter: 10
  episodes_total: 7970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.357
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4428291320800781
        entropy_coeff: 0.0
        kl: 4.691234425990842e-05
        policy_loss: -1.9917264580726624e-05
        total_loss: 109.13876342773438
        vf_explained_var: 0.0075647830963134766
        vf_loss: 109.1387939453125
    load_time_ms: 3.981
    num_steps_sampled: 19925000
    num_steps_trained: 13058048
    sample_time_ms: 29971.992
    update_time_ms: 3.476
  iterations_since_restore: 797
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 257
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-34-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.7351209078472
  episode_reward_mean: 891.598462563482
  episode_reward_min: 820.6683109062135
  episodes_this_iter: 10
  episodes_total: 8020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.277
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4424288272857666
        entropy_coeff: 0.0
        kl: 2.8372625820338726e-05
        policy_loss: -0.0029073553159832954
        total_loss: 106.34100341796875
        vf_explained_var: 0.008783817291259766
        vf_loss: 106.34391784667969
    load_time_ms: 3.848
    num_steps_sampled: 20050000
    num_steps_trained: 13139968
    sample_time_ms: 29970.102
    update_time_ms: 3.606
  iterations_since_restore: 802
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 258
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 222
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-36-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.7351209078472
  episode_reward_mean: 890.9290837194297
  episode_reward_min: 820.6683109062135
  episodes_this_iter: 10
  episodes_total: 8070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.174
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.445044994354248
        entropy_coeff: 0.0
        kl: 1.120503293350339e-05
        policy_loss: -0.0032871016301214695
        total_loss: 109.95149230957031
        vf_explained_var: 0.005578815937042236
        vf_loss: 109.9547348022461
    load_time_ms: 3.876
    num_steps_sampled: 20175000
    num_steps

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 220
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 248
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-39-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 929.0451859134516
  episode_reward_mean: 892.7512111171006
  episode_reward_min: 817.3742965284765
  episodes_this_iter: 10
  episodes_total: 8120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.159
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4478137493133545
        entropy_coeff: 0.0
        kl: 4.029880074085668e-06
        policy_loss: 0.0011525051668286324
        total_loss: 108.9035

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 254
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-42-21
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 929.0451859134516
  episode_reward_mean: 892.0287657688463
  episode_reward_min: 817.3742965284765
  episodes_this_iter: 10
  episodes_total: 8170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 56.429
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4487800598144531
        entropy_coeff: 0.0
        kl: 7.565369742223993e-06
        policy_loss: 0.0008207423379644752
        total_loss: 109.82176208496094
        vf_explained_var: 0.0063735246658325195
        vf_loss: 109.82093048095703
    load_time_ms: 4.885
    num_steps_sampled: 20425000
    num_steps_trained: 13385728
    sample_time_ms: 34258.57
    update_time_ms: 4.173
  iterations_since_restore: 817
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 225
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 259
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 228
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-45-21
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 930.4231704955296
  episode_reward_mean: 889.3306688504911
  episode_reward_min: 842.9937826396178
  episodes_this_iter: 10
  episodes_total: 8220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 63.099
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4543001651763916
        entropy_coeff: 0.0
        kl: 7.08504012436606e-06
        policy_loss: 0.004021816421300173
        total_loss: 109.087631

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-48-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 930.4231704955296
  episode_reward_mean: 890.6108964238795
  episode_reward_min: 839.9323207666584
  episodes_this_iter: 10
  episodes_total: 8270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 61.542
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4560338258743286
        entropy_coeff: 0.0
        kl: 2.9740738682448864e-05
        policy_loss: -0.0046554752625525
        total_loss: 107.94534301757812
        vf_explained_var: 0.006886184215545654
        vf_loss: 107.95001220703125
    load_time_ms: 5.241
    num_steps_sampled: 20675000
    num_steps

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 256
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-50-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 926.7715478561959
  episode_reward_mean: 890.9521203788292
  episode_reward_min: 839.9323207666584
  episodes_this_iter: 10
  episodes_total: 8320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 52.648
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4504449367523193
        entropy_coeff: 0.0
        kl: 3.662767994683236e-05
        policy_loss: -0.00581996887922287
        total_loss: 108.81353759765625
        vf_explained_var: 0.005042076110839844
        vf_loss: 108.81938171386719
    load_time_ms: 4.648
    num_steps_sampled: 20800000
    num_steps

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-53-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 916.0731062286312
  episode_reward_mean: 887.9001464145545
  episode_reward_min: 840.4944554281338
  episodes_this_iter: 10
  episodes_total: 8370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.715
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4635124206542969
        entropy_coeff: 0.0
        kl: 2.569067873992026e-07
        policy_loss: -0.003789836773648858
        total_loss: 108.30436706542969
        vf_explained_var: 0.005074799060821533
        vf_loss: 108.30816650390625
    load_time_ms: 4.166
    num_steps_sampled: 20925000
    num_steps_trained: 13713408
    sample_time_ms: 29980.454
    update_time_ms: 3.525
  iterations_since_restore: 837
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 256
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 229
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-55-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 916.7289586467899
  episode_reward_mean: 886.9481711525576
  episode_reward_min: 840.4944554281338
  episodes_this_iter: 10
  episodes_total: 8420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.155
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4614695310592651
        entropy_coeff: 0.0
        kl: 1.685671304585412e-05
        policy_loss: -0.002398415468633175
        total_loss: 107.97167205810547
        vf_explained_var: 0.004780709743499756
        vf_loss: 107.97405242919922
    load_time_ms: 4.189
    num_steps_sampled: 21050000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 251
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_20-58-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 920.1199821099092
  episode_reward_mean: 887.7388827671671
  episode_reward_min: 837.1961764137271
  episodes_this_iter: 10
  episodes_total: 8470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.765
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.460424542427063
        entropy_coeff: 0.0
        kl: 4.0920174797065556e-05
        policy_loss: -0.005389563739299774
        total_loss: 106.13841247558594
        vf_explained_var: 0.00546950101852417
        vf_loss: 106.143798828125
    load_time_ms: 3.868
    num_steps_sampled: 21175000
    num_steps_trained: 13877248
    sample_time_ms: 30038.59
    update_time_ms: 3.257
  iterations_since_restore: 847
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 239
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-00-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 920.1199821099092
  episode_reward_mean: 886.8877946966466
  episode_reward_min: 837.1961764137271
  episodes_this_iter: 10
  episodes_total: 8520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.47
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4554133415222168
        entropy_coeff: 0.0
        kl: 2.1227431716397405e-05
        policy_loss: -0.0026914402842521667
        total_loss: 108.37378692626953
        vf_explained_var: 0.00476759672164917
        vf_loss: 108.3764877319336
    load_time_ms: 3.981
    num_steps_sampled: 21300000
    num_steps

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 220
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-03-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.0769001785105
  episode_reward_mean: 886.0106116777429
  episode_reward_min: 851.7956657670172
  episodes_this_iter: 10
  episodes_total: 8570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.193
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.445845603942871
        entropy_coeff: 0.0
        kl: 1.2849563063355163e-05
        policy_loss: 0.004029836505651474
        total_loss: 106.70097351074219
        vf_explained_var: 0.004614889621734619
        vf_loss: 106.69693756103516
    load_time_ms: 4.191
    num_steps_sampled: 21425000
    num_steps

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 260
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 231
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 256
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-05-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.0769001785105
  episode_reward_mean: 886.1195453994125
  episode_reward_min: 851.7956657670172
  episodes_this_iter: 10
  episodes_total: 8620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.134
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4358851909637451
        entropy_coeff: 0.0
        kl: 1.7813999875215814e-05
        policy_loss: -0.0033553713001310825
        total_loss: 106.46

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 260
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 239
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 256
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 258
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-08-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 914.7166660721866
  episode_reward_mean: 885.8374652074395
  episode_reward_min: 847.1308383424747
  episodes_this_iter: 10
  episodes_total: 8670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.411
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4500597715377808
        entropy_coeff: 0.0
        kl: 1.3671928172698244e-05
        policy_loss: 0.0064080567099153996
        total_loss: 108.150

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 247
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-10-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 914.7166660721866
  episode_reward_mean: 884.2033986283728
  episode_reward_min: 847.1308383424747
  episodes_this_iter: 10
  episodes_total: 8720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.659
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4589682817459106
        entropy_coeff: 0.0
        kl: 1.3507160474546254e-05
        policy_loss: 0.0004938615020364523
        total_loss: 107.24642944335938
        vf_explained_var: 0.004094958305358887
        vf_loss: 107.24596405029297
    load_time_ms: 4.62
    num_steps_sampled: 21800000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 230
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-13-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 913.0983015571404
  episode_reward_mean: 882.0687779584991
  episode_reward_min: 847.1451122689512
  episodes_this_iter: 10
  episodes_total: 8770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.369
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4687273502349854
        entropy_coeff: 0.0
        kl: 1.1735537555068731e-06
        policy_loss: 0.0008133207447826862
        total_loss: 105.68231964111328
        vf_explained_var: 0.00412595272064209
        vf_loss: 105.68150329589844
    load_time_ms: 4.883
    num_steps_sampled: 21925000
    num_step

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 242
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 240
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 229
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 255
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-15-47
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 913.0983015571404
  episode_reward_mean: 884.2552784149661
  episode_reward_min: 847.66267512364
  episodes_this_iter: 10
  episodes_total: 8820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.254
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4616541862487793
        entropy_coeff: 0.0
        kl: 2.7128917281515896e-06
        policy_loss: -0.006448336876928806
        total_loss: 107.24794006347656
        vf_explained_var: 0.004291832447052002
        vf_loss: 107.25439453125
    load_time_ms: 4.224
    num_steps_sampled: 22050000
    num_steps_tr

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 236
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-18-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 921.9159638540633
  episode_reward_mean: 885.857226486914
  episode_reward_min: 830.3075550837123
  episodes_this_iter: 10
  episodes_total: 8870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.004
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4709171056747437
        entropy_coeff: 0.0
        kl: 2.4826622393447906e-05
        policy_loss: -0.003935860935598612
        total_loss: 108.1826

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-20-49
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 921.9159638540633
  episode_reward_mean: 886.9604213863164
  episode_reward_min: 830.3075550837123
  episodes_this_iter: 10
  episodes_total: 8920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.91
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4661146402359009
        entropy_coeff: 0.0
        kl: 1.2223463272675872e-05
        policy_loss: -0.003523557912558317
        total_loss: 106.31452941894531
        vf_explained_var: 0.004429042339324951
        vf_loss: 106.31803894042969
    load_time_ms: 4.277
    num_steps_sampled: 22300000
    num_steps_trained: 14614528
    sample_time_ms: 30136.948
    update_time_ms: 3.656
  iterations_since_restore: 892
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 252
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 221
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-23-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 927.7589656108134
  episode_reward_mean: 889.1260824837545
  episode_reward_min: 851.8473250292606
  episodes_this_iter: 10
  episodes_total: 8970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.988
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4694445133209229
        entropy_coeff: 0.0
        kl: 4.7440709749935195e-05
        policy_loss: -0.00022571836598217487
        total_loss: 107.54662322998047
        vf_explained_var: 0.004324555397033691
        vf_loss: 107.54685974121094
    load_time_ms: 3.951
    num_steps_sampled: 22425000
    num_steps_trained: 14696448
    sample_time_ms: 30140.715
    update_time_ms: 3.503
  iterations_since_restore: 897
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 251
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 242
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 262
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-25-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 927.7589656108134
  episode_reward_mean: 892.2397205390479
  episode_reward_min: 851.8473250292606
  episodes_this_iter: 10
  episodes_total: 9020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.142
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.466025710105896
        entropy_coeff: 0.0
        kl: 1.3287739420775324e-05
        policy_loss: 0.002642042236402631
        total_loss: 110.47323608398438
        vf_explained_var: 0.0033844709396362305
        vf_loss: 110.47061157226562
    load_time_ms: 3.952
    num_steps_sampled: 22550000
    num_steps_trained: 14778368
    sample_time_ms: 30117.163
    update_time_ms: 3.389
  iterations_since_restore: 902
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 224
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-28-21
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 929.1704756277135
  episode_reward_mean: 894.1719100898682
  episode_reward_min: 845.8053320218587
  episodes_this_iter: 10
  episodes_total: 9070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.398
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.463438868522644
        entropy_coeff: 0.0
        kl: 1.0581552487565205e-05
        policy_loss: 0.006769713945686817
        total_loss: 109.77338

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 261
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 222
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 264
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 226
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 263
[2m[36m(pid=4074)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-30-53
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 935.6089420398645
  episode_reward_mean: 895.4539862956801
  episode_reward_min: 845.8053320218587
  episodes_this_iter: 10
  episodes_total: 9120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.924
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.460524082183838
        entropy_coeff: 0.0
        kl: 3.302357072243467e-06
        policy_loss: -0.005709449760615826
        total_loss: 111.48435

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 233
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-33-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 935.6089420398645
  episode_reward_mean: 894.9537352146311
  episode_reward_min: 855.7579733100382
  episodes_this_iter: 10
  episodes_total: 9170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.659
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4587879180908203
        entropy_coeff: 0.0
        kl: 1.829044776968658e-05
        policy_loss: 0.004774793051183224
        total_loss: 110.66624450683594
        vf_explained_var: 0.004021048545837402
        vf_loss: 110.66146850585938
    load_time_ms: 4.208
    num_steps_sampled: 22925000
    num_steps

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 246
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-35-54
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.7387091498941
  episode_reward_mean: 896.4523553391824
  episode_reward_min: 848.6186698255754
  episodes_this_iter: 10
  episodes_total: 9220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.509
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4545445442199707
        entropy_coeff: 0.0
        kl: 4.2895735532511026e-06
        policy_loss: 0.010256798006594181
        total_loss: 109.7249

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 242
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 230
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 240
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 260
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 225
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-38-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.7387091498941
  episode_reward_mean: 899.1563114047706
  episode_reward_min: 834.5173429067794
  episodes_this_iter: 10
  episodes_total: 9270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.666
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.462737798690796
        entropy_coeff: 0.0
        kl: 2.05271462618839e-05
        policy_loss: 0.00686594657599926
        total_loss: 112.54955291

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 250
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 249
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 256
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 223
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-40-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.9882964528115
  episode_reward_mean: 898.9574411893642
  episode_reward_min: 819.8597817162939
  episodes_this_iter: 10
  episodes_total: 9320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.831
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4613251686096191
        entropy_coeff: 0.0
        kl: 4.896366590401158e-06
        policy_loss: 0.0009204751113429666
        total_loss: 112.7311

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 268
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 237
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 250
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 220
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-43-27
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.9882964528115
  episode_reward_mean: 900.6163887753905
  episode_reward_min: 819.8597817162939
  episodes_this_iter: 10
  episodes_total: 9370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.788
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.465512990951538
        entropy_coeff: 0.0
        kl: 9.987576049752533e-06
        policy_loss: -0.005014277063310146
        total_loss: 111.7080078125
        vf_explained_var: 0.003749072551727295
        vf_loss: 111.71298217773438
    load_time_ms: 3.996
    num_steps_sampled: 23425000
    num_steps_trained: 15351808
    sample_time_ms: 30095.481
    update_time_ms: 3.687
  iterations_since_restore: 937
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 253
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 262
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 254
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-45-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 953.6909140619377
  episode_reward_mean: 901.0269505396354
  episode_reward_min: 857.647239701258
  episodes_this_iter: 10
  episodes_total: 9420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.684
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.465626835823059
        entropy_coeff: 0.0
        kl: 5.375695764087141e-07
        policy_loss: -0.0009990772232413292
        total_loss: 111.93147277832031
        vf_explained_var: 0.003832697868347168
        vf_loss: 111.93246459960938
    load_time_ms: 4.071
    num_steps_sampled: 23550000
    num_steps_trained: 15433728
    sample_time_ms: 30075.891
    update_time_ms: 3.628
  iterations_since_restore: 942
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 257
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-48-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 953.6909140619377
  episode_reward_mean: 899.067456413396
  episode_reward_min: 846.027328744214
  episodes_this_iter: 10
  episodes_total: 9470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.082
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4697084426879883
        entropy_coeff: 0.0
        kl: 4.128972068428993e-05
        policy_loss: 0.004571081139147282
        total_loss: 108.2684326171875
        vf_explained_var: 0.004965245723724365
        vf_loss: 108.26384735107422
    load_time_ms: 3.905
    num_steps_sampled: 23675000
    num_steps_trained: 15515648
    sample_time_ms: 30068.222
    update_time_ms: 3.399
  iterations_since_restore: 947
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 228
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 257
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 254
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-50-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.778019224543
  episode_reward_mean: 894.9670843061949
  episode_reward_min: 841.0692000953142
  episodes_this_iter: 10
  episodes_total: 9520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.904
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4709299802780151
        entropy_coeff: 0.0
        kl: 2.390897861914709e-06
        policy_loss: 0.005770962685346603
        total_loss: 109.02546691894531
        vf_explained_var: 0.004007875919342041
        vf_loss: 109.01972961425781
    load_time_ms: 3.773
    num_steps_sampled: 23800000
    num_steps_trained: 15597568
    sample_time_ms: 30084.552
    update_time_ms: 3.518
  iterations_since_restore: 952
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 247
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-53-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.8214250671315
  episode_reward_mean: 895.6162849055542
  episode_reward_min: 841.0692000953142
  episodes_this_iter: 10
  episodes_total: 9570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.739
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.471621036529541
        entropy_coeff: 0.0
        kl: 2.165419573429972e-06
        policy_loss: -0.0022974847815930843
        total_loss: 110.91526794433594
        vf_explained_var: 0.0030297040939331055
        vf_loss: 110.91758728027344
    load_time_ms: 5.645
    num_steps_sampled: 23925000
    num_steps_trained: 15679488
    sample_time_ms: 30089.737
    update_time_ms: 3.648
  iterations_since_restore: 957
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 247
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-56-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.8214250671315
  episode_reward_mean: 897.6134813306011
  episode_reward_min: 848.3614952322106
  episodes_this_iter: 10
  episodes_total: 9620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.835
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4683830738067627
        entropy_coeff: 0.0
        kl: 9.49691457208246e-06
        policy_loss: -0.0006592022255063057
        total_loss: 112.825439453125
        vf_explained_var: 0.0032269954681396484
        vf_loss: 112.82608795166016
    load_time_ms: 5.688
    num_steps_sampled: 24050000
    num_steps

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 254
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 261
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_21-58-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.9095966495048
  episode_reward_mean: 896.1842067162835
  episode_reward_min: 848.3614952322106
  episodes_this_iter: 10
  episodes_total: 9670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.828
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4630835056304932
        entropy_coeff: 0.0
        kl: 6.966882210690528e-06
        policy_loss: 0.004836151842027903
        total_loss: 108.96778106689453
        vf_explained_var: 0.003500819206237793
        vf_loss: 108.96290588378906
    load_time_ms: 4.12
    num_steps_sampled: 24175000
    num_steps_trained: 15843328
    sample_time_ms: 30159.933
    update_time_ms: 3.607
  iterations_since_restore: 967
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_poli

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-01-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.9095966495048
  episode_reward_mean: 895.2476750521852
  episode_reward_min: 848.5749973196725
  episodes_this_iter: 10
  episodes_total: 9720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.873
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4682910442352295
        entropy_coeff: 0.0
        kl: 6.234058673726395e-06
        policy_loss: 0.006692764814943075
        total_loss: 109.40255737304688
        vf_explained_var: 0.0034924745559692383
        vf_loss: 109.39585876464844
    load_time_ms: 4.233
    num_steps_sampled: 24300000
    num_step

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 236
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 239
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 222
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-03-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 936.256101801468
  episode_reward_mean: 895.6433374927932
  episode_reward_min: 851.100050524138
  episodes_this_iter: 10
  episodes_total: 9770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.711
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4617390632629395
        entropy_coeff: 0.0
        kl: 3.070977982133627e-06
        policy_loss: 0.0036837677471339703
        total_loss: 111.76783752441406
        vf_explained_var: 0.0026755332946777344
        vf_loss: 111.76414489746094
    load_time_ms: 4.142
    num_steps_sampled: 24425000
    num_steps

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 255
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 249
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 266
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 249
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-06-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.9926554216403
  episode_reward_mean: 894.1780775392633
  episode_reward_min: 837.1998081589654
  episodes_this_iter: 10
  episodes_total: 9820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.316
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.456777811050415
        entropy_coeff: 0.0
        kl: 7.5936768553219736e-06
        policy_loss: 0.0026436427142471075
        total_loss: 109.0953

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 233
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 226
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-08-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 940.9926554216403
  episode_reward_mean: 893.55441927676
  episode_reward_min: 837.1998081589654
  episodes_this_iter: 10
  episodes_total: 9870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.109
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4605567455291748
        entropy_coeff: 0.0
        kl: 1.0309078788850456e-05
        policy_loss: 0.002635571174323559
        total_loss: 111.28988647460938
        vf_explained_var: 0.0033189058303833008
        vf_loss: 111.2872543334961
    load_time_ms: 4.546
    num_steps_sampled: 24675000
    num_steps_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 242
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 259
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 263
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-11-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.989686510229
  episode_reward_mean: 897.1159813184793
  episode_reward_min: 856.7172179351668
  episodes_this_iter: 10
  episodes_total: 9920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.399
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4600841999053955
        entropy_coeff: 0.0
        kl: 2.7069967472925782e-05
        policy_loss: 0.0038531559985131025
        total_loss: 108.8291

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 240
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 240
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 231
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 247
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-13-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.989686510229
  episode_reward_mean: 897.2963457404345
  episode_reward_min: 849.8017506888262
  episodes_this_iter: 10
  episodes_total: 9970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.226
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4563394784927368
        entropy_coeff: 0.0
        kl: 2.2684482246404514e-05
        policy_loss: -0.002484634518623352
        total_loss: 110.8426

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 257
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-16-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.7875835801906
  episode_reward_mean: 896.5895343956158
  episode_reward_min: 835.0897069275716
  episodes_this_iter: 10
  episodes_total: 10020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.418
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4566503763198853
        entropy_coeff: 0.0
        kl: 1.2296677596168593e-05
        policy_loss: -0.0023963197600096464
        total_loss: 111.4486083984375
        vf_explained_var: 0.0025101304054260254
        vf_loss: 111.45096588134766
    load_time_ms: 4.131
    num_steps_sampled: 25050000
    num_s

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 228
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 249
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 259
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-18-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.7875835801906
  episode_reward_mean: 896.5656176020467
  episode_reward_min: 835.0897069275716
  episodes_this_iter: 10
  episodes_total: 10070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.709
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.468174934387207
        entropy_coeff: 0.0
        kl: 4.249798075761646e-05
        policy_loss: 0.0032711385283619165
        total_loss: 109.5625

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 259
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 258
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-21-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.9687975388247
  episode_reward_mean: 896.9140359307156
  episode_reward_min: 865.6488459852029
  episodes_this_iter: 10
  episodes_total: 10120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.657
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4680275917053223
        entropy_coeff: 0.0
        kl: 4.873487341683358e-06
        policy_loss: -0.004552221391350031
        total_loss: 110.667

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 257
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 246
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 252
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-23-43
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 939.9687975388247
  episode_reward_mean: 894.863227190574
  episode_reward_min: 821.9354753439466
  episodes_this_iter: 10
  episodes_total: 10170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.866
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.467738151550293
        entropy_coeff: 0.0
        kl: 1.4273609849624336e-07
        policy_loss: -0.01042267121374607
        total_loss: 110.4568099975586
        vf_explained_var: 0.0030802488327026367
        vf_loss: 110.46722412109375
    load_time_ms: 4.117
    num_steps_sampled: 25425000
    num_steps_trained: 16662528
    sample_time_ms: 30134.103
    update_time_ms: 3.545
  iterations_since_restore: 1017
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 260
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 231
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 233
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 265
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-26-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.3381197348152
  episode_reward_mean: 893.1146452082364
  episode_reward_min: 821.9354753439466
  episodes_this_iter: 10
  episodes_total: 10220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.744
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.463978886604309
        entropy_coeff: 0.0
        kl: 2.196164132328704e-06
        policy_loss: -0.0007959723006933928
        total_loss: 110.42098999023438
        vf_explained_var: 0.002663135528564453
        vf_loss: 110.42178344726562
    load_time_ms: 4.011
    num_steps_sampled: 25550000
    num_steps_trained: 16744448
    sample_time_ms: 30103.949
    update_time_ms: 3.675
  iterations_since_restore: 1022
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 268
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 239
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 257
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-28-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.3381197348152
  episode_reward_mean: 895.2427402731445
  episode_reward_min: 849.1367416402207
  episodes_this_iter: 10
  episodes_total: 10270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.156
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4704469442367554
        entropy_coeff: 0.0
        kl: 8.342936780536547e-06
        policy_loss: -0.00048578158020973206
        total_loss: 110.6

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 270
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 251
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 267
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 258
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 232
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-31-15
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.4234899826213
  episode_reward_mean: 897.2090546034997
  episode_reward_min: 846.778347645467
  episodes_this_iter: 10
  episodes_total: 10320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.346
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4744064807891846
        entropy_coeff: 0.0
        kl: 1.6709451301721856e-05
        policy_loss: 0.0024537513963878155
        total_loss: 110.343

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 222
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 228
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-33-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 947.1431508105848
  episode_reward_mean: 899.2377853949366
  episode_reward_min: 846.778347645467
  episodes_this_iter: 10
  episodes_total: 10370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.949
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.477766752243042
        entropy_coeff: 0.0
        kl: 1.6437217709608376e-05
        policy_loss: 0.004837939515709877
        total_loss: 110.10484

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 238
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 235
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-36-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 947.1431508105848
  episode_reward_mean: 896.2422060149956
  episode_reward_min: 846.7838964409916
  episodes_this_iter: 10
  episodes_total: 10420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.908
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4733641147613525
        entropy_coeff: 0.0
        kl: 3.84942686650902e-05
        policy_loss: 0.0025238459929823875
        total_loss: 108.84012603759766
        vf_explained_var: 0.002279222011566162
        vf_loss: 108.83761596679688
    load_time_ms: 4.047
    num_steps_sampled: 26050000
    num_steps_trained: 17072128
    sample_time_ms: 30130.854
    update_time_ms: 3.487
  iterations_since_restore: 1042
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 226
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-38-49
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.6299554135156
  episode_reward_mean: 894.0573674584417
  episode_reward_min: 855.6842025610633
  episodes_this_iter: 10
  episodes_total: 10470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.979
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4666571617126465
        entropy_coeff: 0.0
        kl: 2.3436474293703213e-05
        policy_loss: -0.006312631536275148
        total_loss: 109.35165405273438
        vf_explained_var: 0.002385854721069336
        vf_loss: 109.35794830322266
    load_time_ms: 4.117
    num_steps_sampled: 26175000
    num_steps_trained: 17154048
    sample_time_ms: 30172.415
    update_time_ms: 3.42
  iterations_since_restore: 1047
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 233
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 222
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 250
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-41-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.0856026566114
  episode_reward_mean: 895.2639899540062
  episode_reward_min: 836.2932818247613
  episodes_this_iter: 10
  episodes_total: 10520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.761
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4701972007751465
        entropy_coeff: 0.0
        kl: 4.080964208696969e-05
        policy_loss: -0.004523596726357937
        total_loss: 110.4002914428711
        vf_explained_var: 0.002057015895843506
        vf_loss: 110.40480041503906
    load_time_ms: 4.069
    num_steps_sampled: 26300000
    num_steps_trained: 17235968
    sample_time_ms: 30163.997
    update_time_ms: 3.483
  iterations_since_restore: 1052
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 266
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 267
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 230
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-43-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 942.0856026566114
  episode_reward_mean: 893.8366226944985
  episode_reward_min: 836.2932818247613
  episodes_this_iter: 10
  episodes_total: 10570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.658
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.466890573501587
        entropy_coeff: 0.0
        kl: 2.710093031055294e-05
        policy_loss: -0.0051876395009458065
        total_loss: 107.61302947998047
        vf_explained_var: 0.002581775188446045
        vf_loss: 107.61819458007812
    load_time_ms: 3.999
    num_steps_sampled: 26425000
    num_ste

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 253
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 255
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 251
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 268
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 246
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-46-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 933.0041176063734
  episode_reward_mean: 892.3146958936314
  episode_reward_min: 843.5731825274324
  episodes_this_iter: 10
  episodes_total: 10620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.638
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.466705322265625
        entropy_coeff: 0.0
        kl: 8.499322575516999e-06
        policy_loss: -0.0006312481127679348
        total_loss: 109.087

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 230
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 251
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 239
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 227
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-48-54
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 933.7088933913936
  episode_reward_mean: 893.1707329286278
  episode_reward_min: 848.381927871806
  episodes_this_iter: 10
  episodes_total: 10670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.439
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.473682165145874
        entropy_coeff: 0.0
        kl: 7.890103006502613e-06
        policy_loss: 0.006811268627643585
        total_loss: 108.38981628417969
        vf_explained_var: 0.0019838809967041016
        vf_loss: 108.38299560546875
    load_time_ms: 3.88
    num_steps_sampled: 26675000
    num_steps_

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 259
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 251
(pid=4074) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 227
(pid=4070) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 267
(pid=4075) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 254
(pid=4073) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 223
(pid=4071) -----------------------
(pid=4074)

(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 222
(pid=4073) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-51-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 933.7088933913936
  episode_reward_mean: 892.4762324052505
  episode_reward_min: 856.7945055138488
  episodes_this_iter: 10
  episodes_total: 10720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.795
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4791903495788574
        entropy_coeff: 0.0
        kl: 1.5255671314662322e-05
        policy_loss: -0.006676746532320976
        total_loss: 108.47700500488281
        vf_explained_var: 0.0019662976264953613
        vf_loss: 108.48367309570312
    load_time_ms: 4.046
    num_steps_sampled: 26800000
    num_s

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 263
(pid=4071) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 243
(pid=4073) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 267
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 258
(pid=4074) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 253
(pid=4070) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 269
(pid=4071) -----------------------
(pid=4075)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-53-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 923.40650734752
  episode_reward_mean: 889.5412651167072
  episode_reward_min: 856.7945055138488
  episodes_this_iter: 10
  episodes_total: 10770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.76
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4811162948608398
        entropy_coeff: 0.0
        kl: 4.320565494708717e-06
        policy_loss: -0.0003388151526451111
        total_loss: 107.76701354980469
        vf_explained_var: 0.0018948912620544434
        vf_loss: 107.76734924316406
    load_time_ms: 4.089
    num_steps_sampled: 26925000
    num_steps_trained: 17645568
    sample_time_ms: 30186.125
    update_time_ms: 3.76
  iterations_since_restore: 1077
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_po

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 257
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 248
(pid=4075) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 248
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 266
(pid=4073) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 258
(pid=4074) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 239
(pid=4071) -----------------------
(pid=4075)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-56-27
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 922.8219546759387
  episode_reward_mean: 888.7097215857045
  episode_reward_min: 858.7700058764935
  episodes_this_iter: 10
  episodes_total: 10820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.923
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4727556705474854
        entropy_coeff: 0.0
        kl: 5.398815483204089e-05
        policy_loss: 0.0047072176821529865
        total_loss: 109.07106018066406
        vf_explained_var: 0.0015498995780944824
        vf_loss: 109.0663833618164
    load_time_ms: 4.106
    num_steps_sampled: 27050000
    num_steps_trained: 17727488
    sample_time_ms: 30166.936
    update_time_ms: 3.465
  iterations_since_restore: 1082
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 227
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 227
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 247
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 241
(pid=4073) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 257
(pid=4070) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 248
(pid=4071) -----------------------
(pid=4075)

(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 250
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 267
(pid=4073) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_22-59-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.7082077198991
  episode_reward_mean: 890.6804986350202
  episode_reward_min: 840.7540315360387
  episodes_this_iter: 10
  episodes_total: 10870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.703
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4730558395385742
        entropy_coeff: 0.0
        kl: 1.383026028634049e-05
        policy_loss: -0.004220335278660059
        total_loss: 107.674

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 267
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 229
(pid=4075) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 255
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 264
(pid=4073) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 243
(pid=4074) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 270
(pid=4071) -----------------------
(pid=4075)

(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 247
(pid=4073) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-01-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.7082077198991
  episode_reward_mean: 892.5522606215162
  episode_reward_min: 840.7540315360387
  episodes_this_iter: 10
  episodes_total: 10920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.218
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4710190296173096
        entropy_coeff: 0.0
        kl: 2.7434762159828097e-06
        policy_loss: -0.00011550867930054665
        total_loss: 109.22967529296875
        vf_explained_var: 0.0015541315078735352
        vf_loss: 109.22979736328125
    load_time_ms: 4.174
    num_steps_sampled: 27300000
    num

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 254
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 234
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 225
(pid=4074) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 244
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 229
(pid=4073) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 227
(pid=4071) -----------------------
(pid=4075)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-04-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.7988727681787
  episode_reward_mean: 891.2805052133189
  episode_reward_min: 854.2213101349022
  episodes_this_iter: 10
  episodes_total: 10970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.453
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4794117212295532
        entropy_coeff: 0.0
        kl: 9.187908290186897e-06
        policy_loss: -0.005104871932417154
        total_loss: 107.97209167480469
        vf_explained_var: 0.0016950368881225586
        vf_loss: 107.97720336914062
    load_time_ms: 4.157
    num_steps_sampled: 27425000
    num_steps_trained: 17973248
    sample_time_ms: 30138.64
    update_time_ms: 3.479
  iterations_since_restore: 1097
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 266
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 226
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 270
(pid=4073) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 265
(pid=4075) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 262
(pid=4070) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 254
(pid=4071) -----------------------
(pid=4075)

(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 267
(pid=4073) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 259
(pid=4075) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-06-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 931.9191250513578
  episode_reward_mean: 891.3292596564161
  episode_reward_min: 851.5942559511468
  episodes_this_iter: 10
  episodes_total: 11020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.944
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4801535606384277
        entropy_coeff: 0.0
        kl: 1.4421031664824113e-05
        policy_loss: -0.00622634869068861
        total_loss: 109.910

(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 230
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 263
(pid=4074) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 229
(pid=4071) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 252
(pid=4073) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 265
(pid=4070) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 236
(pid=4074) -----------------------
(pid=4071)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-09-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 934.5389114936692
  episode_reward_mean: 894.0959294241572
  episode_reward_min: 851.5942559511468
  episodes_this_iter: 10
  episodes_total: 11070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.219
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4817451238632202
        entropy_coeff: 0.0
        kl: 1.3569275324698538e-05
        policy_loss: -0.003379462519660592
        total_loss: 110.39244079589844
        vf_explained_var: 0.0015574097633361816
        vf_loss: 110.39580535888672
    load_time_ms: 4.103
    num_steps_sampled: 27675000
    num_steps_trained: 18137088
    sample_time_ms: 30194.061
    update_time_ms: 3.756
  iterations_since_restore: 1107
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 260
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 226
(pid=4074) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 226
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 261
(pid=4073) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 256
(pid=4075) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 259
(pid=4071) -----------------------
(pid=4074)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-11-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 934.5389114936692
  episode_reward_mean: 891.6288662302018
  episode_reward_min: 842.084718096505
  episodes_this_iter: 10
  episodes_total: 11120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.012
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.48710298538208
        entropy_coeff: 0.0
        kl: 1.5082932804943994e-05
        policy_loss: -0.0011578514240682125
        total_loss: 107.32150268554688
        vf_explained_var: 0.0015996098518371582
        vf_loss: 107.32266235351562
    load_time_ms: 4.12
    num_steps_sampled: 27800000
    num_steps_trained: 18219008
    sample_time_ms: 30183.759
    update_time_ms: 3.669
  iterations_since_restore: 1112
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 230
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 262
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 226
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 253
(pid=4073) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 243
(pid=4070) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 225
(pid=4071) -----------------------
(pid=4075)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-14-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.398927184843
  episode_reward_mean: 889.7761391220841
  episode_reward_min: 842.084718096505
  episodes_this_iter: 10
  episodes_total: 11170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.679
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4930942058563232
        entropy_coeff: 0.0
        kl: 1.4128516340861097e-05
        policy_loss: -0.0034599686041474342
        total_loss: 108.0129623413086
        vf_explained_var: 0.0015916824340820312
        vf_loss: 108.01641845703125
    load_time_ms: 4.149
    num_steps_sampled: 27925000
    num_steps_trained: 18300928
    sample_time_ms: 30178.054
    update_time_ms: 3.692
  iterations_since_restore: 1117
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 258
(pid=4071) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 269
(pid=4075) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 225
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 247
(pid=4073) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 229
(pid=4070) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 222
(pid=4071) -----------------------
(pid=4075)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-16-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.398927184843
  episode_reward_mean: 890.2212793990565
  episode_reward_min: 849.9855265670635
  episodes_this_iter: 10
  episodes_total: 11220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.289
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.486161708831787
        entropy_coeff: 0.0
        kl: 3.943809133488685e-05
        policy_loss: 0.0016149573493748903
        total_loss: 108.82202911376953
        vf_explained_var: 0.0012983083724975586
        vf_loss: 108.82042694091797
    load_time_ms: 4.187
    num_steps_sampled: 28050000
    num_steps_trained: 18382848
    sample_time_ms: 30196.952
    update_time_ms: 3.427
  iterations_since_restore: 1122
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 268
(pid=4071) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 238
(pid=4070) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 249
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 244
(pid=4073) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 249
(pid=4075) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 238
(pid=4071) -----------------------
(pid=4075)

(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 266
(pid=4070) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-19-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 927.2797591465187
  episode_reward_mean: 891.0879125829597
  episode_reward_min: 849.9855265670635
  episodes_this_iter: 10
  episodes_total: 11270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.257
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4863510131835938
        entropy_coeff: 0.0
        kl: 9.256862540496513e-06
        policy_loss: 0.0007451698184013367
        total_loss: 107.88817596435547
        vf_explained_var: 0.0015344619750976562
        vf_loss: 107.88743591308594
    load_time_ms: 4.077
    num_steps_sampled: 28175000
    num_st

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 250
(pid=4071) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 229
(pid=4070) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 257
(pid=4075) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 260
(pid=4073) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 261
(pid=4074) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 244
(pid=4071) -----------------------
(pid=4074)

(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 255
(pid=4073) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-21-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.8452107029412
  episode_reward_mean: 891.5134581498318
  episode_reward_min: 841.7062156981439
  episodes_this_iter: 10
  episodes_total: 11320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.874
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4887577295303345
        entropy_coeff: 0.0
        kl: 4.508685378823429e-06
        policy_loss: 0.0020916862413287163
        total_loss: 106.89347839355469
        vf_explained_var: 0.0014549493789672852
        vf_loss: 106.89137268066406
    load_time_ms: 3.944
    num_steps_sampled: 28300000
    num_st

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 251
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 235
(pid=4074) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 254
(pid=4073) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 224
(pid=4070) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 263
(pid=4075) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 227
(pid=4071) -----------------------
(pid=4074)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-24-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 932.8452107029412
  episode_reward_mean: 891.1858880902332
  episode_reward_min: 841.7062156981439
  episodes_this_iter: 10
  episodes_total: 11370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.337
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4791724681854248
        entropy_coeff: 0.0
        kl: 1.3145338016329333e-05
        policy_loss: 0.002422049641609192
        total_loss: 108.86583709716797
        vf_explained_var: 0.0012278556823730469
        vf_loss: 108.8634262084961
    load_time_ms: 4.001
    num_steps_sampled: 28425000
    num_steps_trained: 18628608
    sample_time_ms: 30219.606
    update_time_ms: 3.503
  iterations_since_restore: 1137
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 268
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 265
(pid=4074) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 246
(pid=4075) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 242
(pid=4070) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 240
(pid=4073) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 254
(pid=4071) -----------------------
(pid=4074)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-26-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 926.8614192645247
  episode_reward_mean: 894.00013035043
  episode_reward_min: 849.2454538127256
  episodes_this_iter: 10
  episodes_total: 11420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.447
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4642759561538696
        entropy_coeff: 0.0
        kl: 1.576249997015111e-05
        policy_loss: 0.003702439134940505
        total_loss: 108.11571502685547
        vf_explained_var: 0.0014157891273498535
        vf_loss: 108.11201477050781
    load_time_ms: 4.01
    num_steps_sampled: 28550000
    num_steps_trained: 18710528
    sample_time_ms: 30220.982
    update_time_ms: 3.381
  iterations_since_restore: 1142
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_pol

(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 222
(pid=4071) -----------------------
(pid=4074)
(pid=4074) -----------------------
(pid=4074) ring length: 220
(pid=4074) -----------------------
(pid=4070)
(pid=4070) -----------------------
(pid=4070) ring length: 269
(pid=4070) -----------------------
(pid=4075)
(pid=4075) -----------------------
(pid=4075) ring length: 252
(pid=4075) -----------------------
(pid=4073)
(pid=4073) -----------------------
(pid=4073) ring length: 259
(pid=4073) -----------------------
(pid=4071)
(pid=4071) -----------------------
(pid=4071) ring length: 267
(pid=4071) -----------------------
(pid=4074)

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-29-15
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 936.6377795896617
  episode_reward_mean: 894.6915976418603
  episode_reward_min: 857.008621593516
  episodes_this_iter: 10
  episodes_total: 11470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.557
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4758011102676392
        entropy_coeff: 0.0
        kl: 1.4664990885648876e-05
        policy_loss: -0.006203575059771538
        total_loss: 109.86013793945312
        vf_explained_var: 0.0011271238327026367
        vf_loss: 109.86637878417969
    load_time_ms: 5.739
    num_steps_sampled: 28675000
    num_steps_trained: 18792448
    sample_time_ms: 30189.766
    update_time_ms: 3.613
  iterations_since_restore: 1147
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 227
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-31-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 936.6377795896617
  episode_reward_mean: 892.6024347946535
  episode_reward_min: 850.5862077053791
  episodes_this_iter: 10
  episodes_total: 11520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.228
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4734141826629639
        entropy_coeff: 0.0
        kl: 3.456461854511872e-06
        policy_loss: -0.0005771178985014558
        total_loss: 109.73283386230469
        vf_explained_var: 0.0012204647064208984
        vf_loss: 109.73339080810547
    load_time_ms: 5.999
    num_steps_sampled: 28800000
    num_steps_trained: 18874368
    sample_time_ms: 30157.75
    update_time_ms: 3.579
  iterations_since_restore: 1152
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 239
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 235
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 262
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 234
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-34-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 933.1361302930707
  episode_reward_mean: 892.813816804769
  episode_reward_min: 850.5862077053791
  episodes_this_iter: 10
  episodes_total: 11570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.549
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.467043399810791
        entropy_coeff: 0.0
        kl: 1.9864637579303235e-06
        policy_loss: 0.007978642359375954
        total_loss: 109.4501953125
        vf_explained_var: 0.0011892318725585938
        vf_loss: 109.44219207763672
    load_time_ms: 4.231
    num_steps_sampled: 28925000
    num_steps_tr

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 229
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 232
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 242
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 234
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 269
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-36-50
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 928.4549559559935
  episode_reward_mean: 893.7974343058254
  episode_reward_min: 862.9829455319192
  episodes_this_iter: 10
  episodes_total: 11620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.355
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.47258722782135
        entropy_coeff: 0.0
        kl: 4.1501931264065206e-05
        policy_loss: -0.0023713232949376106
        total_loss: 110.70848083496094
        vf_explained_var: 0.0010927915573120117
        vf_loss: 110.71089172363281
    load_time_ms: 4.06
    num_steps_sampled: 29050000
    num_steps_trained: 19038208
    sample_time_ms: 30236.789
    update_time_ms: 3.418
  iterations_since_restore: 1162
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 238
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 221
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 253
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-39-21
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 929.0767206657488
  episode_reward_mean: 895.3691976943172
  episode_reward_min: 862.7450333293174
  episodes_this_iter: 10
  episodes_total: 11670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.44
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4713200330734253
        entropy_coeff: 0.0
        kl: 1.4653993275715038e-05
        policy_loss: -0.0025103571824729443
        total_loss: 110.38084411621094
        vf_explained_var: 0.0010178685188293457
        vf_loss: 110.38336944580078
    load_time_ms: 4.099
    num_steps_sampled: 29175000
    num_s

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 246
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 233
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 234
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-41-52
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 930.1065910659765
  episode_reward_mean: 896.1762414563381
  episode_reward_min: 862.7450333293174
  episodes_this_iter: 10
  episodes_total: 11720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.272
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4673722982406616
        entropy_coeff: 0.0
        kl: 9.832729119807482e-08
        policy_loss: -0.00267479894682765
        total_loss: 108.70127868652344
        vf_explained_var: 0.0012244582176208496
        vf_loss: 108.7039794921875
    load_time_ms: 4.042
    num_steps_sampled: 29300000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 250
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 256
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 245
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-44-24
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 930.1065910659765
  episode_reward_mean: 894.4082272851634
  episode_reward_min: 852.3739181505848
  episodes_this_iter: 10
  episodes_total: 11770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.713
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4690783023834229
        entropy_coeff: 0.0
        kl: 1.7907404981087893e-06
        policy_loss: -0.0022736499086022377
        total_loss: 110.48814392089844
        vf_explained_var: 0.0011488795280456543
        vf_loss: 110.49041748046875
    load_time_ms: 4.14
    num_steps_sampled: 29425000
    num_steps_trained: 19283968
    sample_time_ms: 30209.618
    update_time_ms: 3.5
  iterations_since_restore: 1177
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 235
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 260
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 249
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 266
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 254
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-46-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.9393457921367
  episode_reward_mean: 893.575715034798
  episode_reward_min: 852.3739181505848
  episodes_this_iter: 10
  episodes_total: 11820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.299
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4642870426177979
        entropy_coeff: 0.0
        kl: 1.8244507373310626e-05
        policy_loss: -0.003380813170224428
        total_loss: 109.050

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 243
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 233
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 247
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-49-27
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.9393457921367
  episode_reward_mean: 891.9573014032408
  episode_reward_min: 854.0230276479418
  episodes_this_iter: 10
  episodes_total: 11870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.32
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4614238739013672
        entropy_coeff: 0.0
        kl: 1.0464467777637765e-05
        policy_loss: -0.0035624299198389053
        total_loss: 107.2874526977539
        vf_explained_var: 0.0010535717010498047
        vf_loss: 107.291015625
    load_time_ms: 4.052
    num_steps_sampled: 29675000
    num_steps_t

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 265
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 261
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 224
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-51-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 935.8204779423178
  episode_reward_mean: 890.6306237028418
  episode_reward_min: 848.6362113972958
  episodes_this_iter: 10
  episodes_total: 11920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.438
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.466447114944458
        entropy_coeff: 0.0
        kl: 1.4830988220637664e-05
        policy_loss: -0.003530866000801325
        total_loss: 109.229

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 240
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 252
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 245
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 241
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-54-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 915.789780086422
  episode_reward_mean: 890.5076333934783
  episode_reward_min: 848.6362113972958
  episodes_this_iter: 10
  episodes_total: 11970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.006
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4622975587844849
        entropy_coeff: 0.0
        kl: 1.3519813364837319e-05
        policy_loss: -0.004743038211017847
        total_loss: 108.691

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 269
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 261
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 250
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 226
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 246
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-57-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 915.6047900514735
  episode_reward_mean: 888.3674487028593
  episode_reward_min: 849.3447175354823
  episodes_this_iter: 10
  episodes_total: 12020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.113
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4596407413482666
        entropy_coeff: 0.0
        kl: 2.1212457795627415e-05
        policy_loss: -0.001249802066013217
        total_loss: 106.95

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 240
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 268
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 255
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 243
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-01_23-59-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 915.6047900514735
  episode_reward_mean: 888.3871443444467
  episode_reward_min: 849.3447175354823
  episodes_this_iter: 10
  episodes_total: 12070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.11
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.464489459991455
        entropy_coeff: 0.0
        kl: 9.788681927602738e-05
        policy_loss: -0.0015353788621723652
        total_loss: 108.29403686523438
        vf_explained_var: 0.0007625222206115723
        vf_loss: 108.2955551147461
    load_time_ms: 4.475
    num_steps_sampled: 30175000
    num_step

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 261
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 229
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 224
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-02-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.2724574547317
  episode_reward_mean: 890.0873431830812
  episode_reward_min: 863.9940172101193
  episodes_this_iter: 10
  episodes_total: 12120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.93
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4731065034866333
        entropy_coeff: 0.0
        kl: 6.810700142523274e-06
        policy_loss: -0.0036956253461539745
        total_loss: 109.17266082763672
        vf_explained_var: 0.0007653236389160156
        vf_loss: 109.17634582519531
    load_time_ms: 4.454
    num_steps_sampled: 30300000
    num_steps_trained: 19857408
    sample_time_ms: 30303.561
    update_time_ms: 3.755
  iterations_since_restore: 1212
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 255
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 225
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 258
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-04-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.2724574547317
  episode_reward_mean: 891.6174169922938
  episode_reward_min: 857.1408601425218
  episodes_this_iter: 10
  episodes_total: 12170
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.177
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4821321964263916
        entropy_coeff: 0.0
        kl: 2.47869138547685e-05
        policy_loss: -0.010208100080490112
        total_loss: 108.08425903320312
        vf_explained_var: 0.0010071992874145508
        vf_loss: 108.09449768066406
    load_time_ms: 4.291
    num_steps_sampled: 30425000
    num_steps_trained: 19939328
    sample_time_ms: 30238.496
    update_time_ms: 3.644
  iterations_since_restore: 1217
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 226
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 223
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 248
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 240
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-07-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 922.6207039978224
  episode_reward_mean: 891.0794320613159
  episode_reward_min: 837.2920441836424
  episodes_this_iter: 10
  episodes_total: 12220
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.667
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4926857948303223
        entropy_coeff: 0.0
        kl: 1.0805379133671522e-06
        policy_loss: -0.0033133362885564566
        total_loss: 110.2

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 244
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 266
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 221
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 228
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 251
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-09-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 922.2057861623272
  episode_reward_mean: 889.9505597777676
  episode_reward_min: 837.2920441836424
  episodes_this_iter: 10
  episodes_total: 12270
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.988
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4924983978271484
        entropy_coeff: 0.0
        kl: 3.883569297613576e-05
        policy_loss: -0.0027559897862374783
        total_loss: 107.48362731933594
        vf_explained_var: 0.0008466243743896484
        vf_loss: 107.48637390136719
    load_time_ms: 4.002
    num_steps_sampled: 30675000
    num_steps_trained: 20103168
    sample_time_ms: 30297.178
    update_time_ms: 3.524
  iterations_since_restore: 1227
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 262
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 222
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 240
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-12-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 922.6539167350901
  episode_reward_mean: 889.6792450169929
  episode_reward_min: 858.627513942131
  episodes_this_iter: 10
  episodes_total: 12320
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.204
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4922728538513184
        entropy_coeff: 0.0
        kl: 1.1196349078090861e-05
        policy_loss: -0.0010523684322834015
        total_loss: 107.84915924072266
        vf_explained_var: 0.0007156133651733398
        vf_loss: 107.85022735595703
    load_time_ms: 4.045
    num_steps_sampled: 30800000
    num_steps_trained: 20185088
    sample_time_ms: 30285.846
    update_time_ms: 3.639
  iterations_since_restore: 1232
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 226
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 242
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 260
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 270
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 244
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 260
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-14-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 922.6539167350901
  episode_reward_mean: 889.653045748248
  episode_reward_min: 858.627513942131
  episodes_this_iter: 10
  episodes_total: 12370
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.397
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4895074367523193
        entropy_coeff: 0.0
        kl: 2.2822892788099125e-05
        policy_loss: 0.00020274543203413486
        total_loss: 106.984

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 231
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 254
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 253
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 238
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 267
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-17-16
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 918.6333152816186
  episode_reward_mean: 889.9864830209973
  episode_reward_min: 862.5838296025619
  episodes_this_iter: 10
  episodes_total: 12420
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.007
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4954442977905273
        entropy_coeff: 0.0
        kl: 2.5598357751732692e-05
        policy_loss: 0.0016693423967808485
        total_loss: 106.76832580566406
        vf_explained_var: 0.0006453990936279297
        vf_loss: 106.76669311523438
    load_time_ms: 3.935
    num_steps_sampled: 31050000
    num_steps_trained: 20348928
    sample_time_ms: 30257.924
    update_time_ms: 3.503
  iterations_since_restore: 1242
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 269
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 244
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 254
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 225
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 248
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 268
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-19-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 918.6333152816186
  episode_reward_mean: 891.3836844664195
  episode_reward_min: 857.8766276517254
  episodes_this_iter: 10
  episodes_total: 12470
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.81
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.495627522468567
        entropy_coeff: 0.0
        kl: 2.8819587896578014e-05
        policy_loss: -0.0009105151984840631
        total_loss: 109.182

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 234
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 266
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 266
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 227
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 241
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 228
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 265
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-22-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 923.8440465237662
  episode_reward_mean: 890.8574025424499
  episode_reward_min: 857.8766276517254
  episodes_this_iter: 10
  episodes_total: 12520
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.29
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.485556960105896
        entropy_coeff: 0.0
        kl: 1.367459844914265e-05
        policy_loss: 0.0028321496210992336
        total_loss: 105.99662

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 223
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 252
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 261
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 237
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-24-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 923.8440465237662
  episode_reward_mean: 888.188743506563
  episode_reward_min: 863.4040436164996
  episodes_this_iter: 10
  episodes_total: 12570
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.902
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4852197170257568
        entropy_coeff: 0.0
        kl: 5.729443728341721e-05
        policy_loss: -0.001582837663590908
        total_loss: 106.8359

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 257
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 245
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 242
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 263
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 230
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-27-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.0409911891516
  episode_reward_mean: 887.781143380037
  episode_reward_min: 857.7560707062855
  episodes_this_iter: 10
  episodes_total: 12620
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.602
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4911555051803589
        entropy_coeff: 0.0
        kl: 1.3589840818895027e-05
        policy_loss: -0.003125135088339448
        total_loss: 108.06868743896484
        vf_explained_var: 0.0006009936332702637
        vf_loss: 108.07182312011719
    load_time_ms: 4.052
    num_steps_sampled: 31550000
    num_st

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 248
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 227
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 255
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 233
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 256
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-29-55
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 915.3529988499025
  episode_reward_mean: 888.6581135865407
  episode_reward_min: 857.7560707062855
  episodes_this_iter: 10
  episodes_total: 12670
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.647
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4873043298721313
        entropy_coeff: 0.0
        kl: 4.2318646592320874e-05
        policy_loss: 0.002874040976166725
        total_loss: 107.80144500732422
        vf_explained_var: 0.0005486011505126953
        vf_loss: 107.798583984375
    load_time_ms: 4.095
    num_steps_sampled: 31675000
    num_steps_trained: 20758528
    sample_time_ms: 30254.096
    update_time_ms: 3.915
  iterations_since_restore: 1267
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 223
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 262
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 239
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 265
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 241
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 270
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 232
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-32-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 915.3529988499025
  episode_reward_mean: 888.9155964668226
  episode_reward_min: 858.6051835349166
  episodes_this_iter: 10
  episodes_total: 12720
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.704
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4827523231506348
        entropy_coeff: 0.0
        kl: 2.621032763272524e-06
        policy_loss: -0.0012992528500035405
        total_loss: 107.91

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 233
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 242
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 245
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 243
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 220
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 264
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 264
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-34-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 911.2693216039746
  episode_reward_mean: 886.5771958856614
  episode_reward_min: 839.8407281156909
  episodes_this_iter: 10
  episodes_total: 12770
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.924
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4941470623016357
        entropy_coeff: 0.0
        kl: 2.5556870241416618e-05
        policy_loss: -0.0020049321465194225
        total_loss: 106.1

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 245
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 248
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 248
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 266
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 263
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 253
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 263
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 239
[2m[36m(pid=4073)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-37-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 911.2693216039746
  episode_reward_mean: 885.6869671318868
  episode_reward_min: 839.8407281156909
  episodes_this_iter: 10
  episodes_total: 12820
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.33
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4878813028335571
        entropy_coeff: 0.0
        kl: 5.047630475019105e-05
        policy_loss: 0.006403600797057152
        total_loss: 106.92420

[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 237
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 269
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 262
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 256
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 235
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-40-01
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 913.1066216434386
  episode_reward_mean: 885.3153109661039
  episode_reward_min: 852.6828501086178
  episodes_this_iter: 10
  episodes_total: 12870
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.085
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4765956401824951
        entropy_coeff: 0.0
        kl: 2.829117511282675e-05
        policy_loss: 0.009221200831234455
        total_loss: 107.17097473144531
        vf_explained_var: 0.000537574291229248
        vf_loss: 107.16175842285156
    load_time_ms: 4.433
    num_steps_sampled: 32175000
    num_steps_trained: 21086208
    sample_time_ms: 30267.327
    update_time_ms: 3.493
  iterations_since_restore: 1287
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off_p

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 235
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 229
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 237
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 248
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 264
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 260
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-42-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 913.1066216434386
  episode_reward_mean: 886.5633894043631
  episode_reward_min: 851.6576055234049
  episodes_this_iter: 10
  episodes_total: 12920
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.668
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4698125123977661
        entropy_coeff: 0.0
        kl: 9.47052103583701e-06
        policy_loss: -0.0056367916986346245
        total_loss: 107.88153839111328
        vf_explained_var: 0.0004741549491882324
        vf_loss: 107.88716125488281
    load_time_ms: 4.111
    num_steps_sampled: 32300000
    num_steps_trained: 21168128
    sample_time_ms: 30262.145
    update_time_ms: 3.539
  iterations_since_restore: 1292
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  off

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 238
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 270
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 247
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 236
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 249
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 239
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-45-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 909.4349115073521
  episode_reward_mean: 886.3499867403854
  episode_reward_min: 851.6576055234049
  episodes_this_iter: 10
  episodes_total: 12970
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.044
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4789507389068604
        entropy_coeff: 0.0
        kl: 3.170775016769767e-06
        policy_loss: 0.0009230609284713864
        total_loss: 107.478

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 226
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 252
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 250
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 270
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 252
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-47-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 909.2007481807298
  episode_reward_mean: 884.0210865579538
  episode_reward_min: 859.7619788707277
  episodes_this_iter: 10
  episodes_total: 13020
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.269
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4812750816345215
        entropy_coeff: 0.0
        kl: 7.2703478508628905e-06
        policy_loss: -0.003852135967463255
        total_loss: 105.65706634521484
        vf_explained_var: 0.0005078911781311035
        vf_loss: 105.66089630126953
    load_time_ms: 4.318
    num_steps_sampled: 32550000
    num_steps_trained: 21331968
    sample_time_ms: 30323.545
    update_time_ms: 3.909
  iterations_since_restore: 1302
  node_ip: 127.0.1.1
  num_healthy_workers: 5
  of

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 222
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 228
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 230
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 261
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 267
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 221
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2

[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 260
[2m[36m(pid=4070)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-50-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 912.3777167878226
  episode_reward_mean: 884.141026636787
  episode_reward_min: 845.0264093927243
  episodes_this_iter: 10
  episodes_total: 13070
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.983
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4848697185516357
        entropy_coeff: 0.0
        kl: 2.8611939342226833e-06
        policy_loss: 0.006908298470079899
        total_loss: 106.29987335205078
        vf_explained_var: 0.00048810243606567383
        vf_loss: 106.29295349121094
    load_time_ms: 4.257
    num_steps_sampled: 32675000
    num_st

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 257
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 232
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 262
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 227
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 259
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 255
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 226
[2m[36m(pid=4075)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-08-02_00-52-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 912.3777167878226
  episode_reward_mean: 884.1048439059689
  episode_reward_min: 845.0264093927243
  episodes_this_iter: 10
  episodes_total: 13120
  experiment_id: be2ab74f1b7f47e9b669beba01c0cb38
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.108
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4851856231689453
        entropy_coeff: 0.0
        kl: 3.0300518119474873e-05
        policy_loss: -0.00723490072414279
        total_loss: 106.640

[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 242
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4074)[0m 
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4074)[0m ring length: 227
[2m[36m(pid=4074)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4075)[0m ring length: 267
[2m[36m(pid=4075)[0m -----------------------
[2m[36m(pid=4073)[0m 
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4073)[0m ring length: 244
[2m[36m(pid=4073)[0m -----------------------
[2m[36m(pid=4070)[0m 
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4070)[0m ring length: 247
[2m[36m(pid=4070)[0m -----------------------
[2m[36m(pid=4071)[0m 
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4071)[0m ring length: 250
[2m[36m(pid=4071)[0m -----------------------
[2m[36m(pid=4075)[0m 
[2

### 4.5 Visualizing the results

The simulation results are saved in the `ray_results/training_example` directory (we defined `training_example` at the start of this tutorial). By default, the `ray_results` folder is located in your home directory, at `~/ray_results`.

You can run `tensorboard --logdir=~/ray_results/training_example` (install TensorBoard with `pip install tensorboard`) to visualize the data logged by your simulation.

For more instructions about visualizing, please see `tutorial05_visualize.ipynb`. 
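
The logged metrics can also be inspected without TensorBoard. The sketch below assumes that each trial directory under `ray_results` contains a `result.json` file with one JSON record per training iteration, holding the same fields as the `Result for ...` blocks printed during training. The helper name `load_metric` is hypothetical and is not part of Flow or Ray.

```python
import json


def load_metric(result_file, key="episode_reward_mean"):
    """Return the values of `key` from a result file, one per iteration.

    Assumes `result_file` holds one JSON object per line; blank lines
    and records that do not contain `key` are skipped.
    """
    values = []
    with open(result_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if key in record:
                values.append(record[key])
    return values
```

Calling `load_metric(path)` on the `result.json` file of a trial returns a list of mean episode rewards that you can plot or compare across runs.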

### 4.6 Restart from a checkpoint / Transfer learning

To do transfer learning, or to resume a previous training run, you need to start the simulation from a saved checkpoint. To do so, add a `restore` entry to the experiment specification passed to `run_experiments`, as follows:

```python
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "restore": "/ray_results/experiment/dir/checkpoint_50/checkpoint-50",
        "checkpoint_freq": 1,
        "checkpoint_at_end": True,
        "max_failures": 999,
        "stop": {
            "training_iteration": 1,
        },
    },
})
```

The `"restore"` path should point to the checkpoint file itself, such that the `[restore].tune_metadata` file exists.
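
When checkpoints are saved often, finding the most recent one by hand is tedious. The sketch below assumes the directory layout used in the example path above, `checkpoint_N/checkpoint-N` inside a trial directory. The names `latest_checkpoint` and `trial_dir` are hypothetical and are not part of Flow or Ray.

```python
import os
import re


def latest_checkpoint(trial_dir):
    """Return the path of the newest checkpoint file in `trial_dir`.

    Assumes checkpoints are stored as `checkpoint_N/checkpoint-N`.
    Returns None if no such file is found.
    """
    best_iter, best_path = -1, None
    for name in os.listdir(trial_dir):
        match = re.fullmatch(r"checkpoint_(\d+)", name)
        if match is None:
            continue
        iteration = int(match.group(1))
        candidate = os.path.join(
            trial_dir, name, "checkpoint-{}".format(iteration))
        # Keep only checkpoint folders that actually contain the file.
        if os.path.isfile(candidate) and iteration > best_iter:
            best_iter, best_path = iteration, candidate
    return best_path
```

The returned path can then be used as the value of `"restore"`. Iterations are compared numerically, so `checkpoint_50` ranks above `checkpoint_9`.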

There is also a `"resume"` parameter that you can set to `True` if you only want to continue training the same experiment from a previously saved checkpoint.

In [None]:
# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "/ray_results/ray_results/c_mpg+plus/PPO_EnergyOptSPDEnv-v0_0_2020-07-30_18-03-31jr4g5y86/checkpoint_1160", 
#         "checkpoint_freq": 20,
#         "resume" : True,
#         "checkpoint_at_end": True,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 2000,
#         },
#     },
# })

In [None]:
from flow.core.vehicles import Vehicles