# Tutorial 03: Running RLlib Experiments

This tutorial walks you through the process of running traffic simulations in Flow with trainable RLlib-powered agents. Autonomous agents will learn to maximize a certain reward over the rollouts, using the [**RLlib**](https://ray.readthedocs.io/en/latest/rllib.html) library ([citation](https://arxiv.org/abs/1712.09381)) ([installation instructions](https://flow.readthedocs.io/en/latest/flow_setup.html#optional-install-ray-rllib)). Simulations of this form will depict the propensity of RL agents to influence the traffic of a human fleet in order to make the whole fleet more efficient (for some given metrics). 

In this tutorial, we simulate an initially perturbed single-lane ring road into which we introduce a single autonomous vehicle. After some training, the autonomous vehicle learns to dissipate the "phantom jams" that form and propagate when only human driver dynamics are involved.

## 1. Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *network* and an *environment*. Networks describe the features of the transportation network used in simulation. This includes the positions and properties of the nodes and edges constituting the lanes and junctions, as well as properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the network. Moreover, custom environments may be used to modify the dynamical features of a network. Finally, in the RL case, it is in the *environment* that the state/action spaces and the reward function are defined.

## 2. Setting up a Network
Flow contains a plethora of pre-designed networks used to replicate highways, intersections, and merges in both closed and open settings. All these networks are located in flow/networks. For this tutorial, which involves a single lane ring road, we will use the network `RingNetwork`.

### 2.1 Setting up Network Parameters

The network mentioned at the start of this section, as well as all other networks in Flow, are parameterized by the following arguments: 
* name
* vehicles
* net_params
* initial_config

These parameters are explained in detail in `tutorial01_sumo.ipynb`. Moreover, all parameters excluding vehicles (covered in section 2.2) do not change from the previous tutorial. Accordingly, we specify them nearly as we have before, and leave further explanations of the parameters to `tutorial01_sumo.ipynb`.

We begin by choosing the network the experiment will be trained on. We use one of Flow's built-in networks, located in `flow.networks`. A list of all available networks can be found by running the script below.

In [1]:
import flow.networks as networks

print(networks.__all__)

In this tutorial, we choose to use the ring road network. The network class is then:

In [2]:
from flow.networks import RingNetwork

# ring road network class
network_name = RingNetwork

One key difference between SUMO and RLlib experiments is that, in RLlib experiments, the network classes do not need to be defined; instead users should simply name the network class they wish to use. Later on, an environment setup module will import the correct network class based on the provided names.

In [3]:
# input parameter classes to the network class
from flow.core.params import NetParams, InitialConfig

# name of the network
name = "training_example16"

# network-specific parameters
from flow.networks.ring import ADDITIONAL_NET_PARAMS
net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)

# initial configuration to vehicles
initial_config = InitialConfig(spacing="uniform", perturbation=1)

### 2.2 Adding Trainable Autonomous Vehicles
The `Vehicles` class stores state information on all vehicles in the network. This class is used to identify the dynamical features of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various `get` methods within this class.

The dynamics of vehicles in the `Vehicles` class can be dictated either by SUMO or by the dynamical methods located in `flow/controllers`. For human-driven vehicles, we use the IDM model for acceleration behavior, with exogenous Gaussian acceleration noise with a standard deviation of 0.2 m/s^2 to induce perturbations that produce stop-and-go behavior. In addition, we use the `ContinuousRouter` routing controller so that the vehicles maintain their routes in closed networks.
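
To make the car-following behavior above concrete, here is a minimal sketch of the standard Intelligent Driver Model acceleration with additive Gaussian noise. It is not Flow's `IDMController` implementation: the function name `idm_accel` and all default parameter values are assumptions chosen for illustration, and only the 0.2 m/s^2 noise level comes from the text above.

```python
import math
import random


def idm_accel(v, lead_v, headway, v0=30.0, T=1.0, a=1.0, b=1.5,
              delta=4, s0=2.0, noise_std=0.0, rng=random):
    """Return an IDM acceleration (m/s^2) with optional Gaussian noise.

    v, lead_v: speeds of the follower and its leader (m/s)
    headway:   bumper-to-bumper gap to the leader (m)
    All default parameter values are illustrative assumptions.
    """
    # desired gap grows with speed and with the closing rate to the leader
    closing = v * (v - lead_v) / (2.0 * math.sqrt(a * b))
    s_star = s0 + max(0.0, v * T + closing)
    accel = a * (1.0 - (v / v0) ** delta - (s_star / headway) ** 2)
    # exogenous perturbation; a std of 0.2 m/s^2 is the value quoted above
    if noise_std > 0:
        accel += rng.gauss(0.0, noise_std)
    return accel


# a vehicle closing on a slower leader brakes; noise perturbs the result
print(idm_accel(v=10.0, lead_v=5.0, headway=15.0))
print(idm_accel(v=10.0, lead_v=5.0, headway=15.0, noise_std=0.2))
```

The noise term is what keeps a ring of otherwise identical human drivers from settling into uniform flow, which is why it matters for reproducing stop-and-go waves.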

As we have done in `tutorial01_sumo.ipynb`, human-driven vehicles are defined in the `VehicleParams` class as follows:

In [4]:
# vehicles class
from flow.core.params import VehicleParams

# vehicles dynamics models
from flow.controllers import IDMController, ContinuousRouter

vehicles = VehicleParams()
#vehicles.add("human",
#             acceleration_controller=(IDMController, {}),
#             routing_controller=(ContinuousRouter, {}),
#             num_vehicles=10)

The above addition to the `Vehicles` class only accounts for 21 of the 22 vehicles that are placed in the network. We now add an additional trainable autonomous vehicle whose actions are dictated by an RL agent. This is done by specifying an `RLController` as the acceleration controller of the vehicle.

In [5]:
from flow.controllers import RLController

Note that this controller serves primarily as a placeholder that marks the vehicle as a component of the RL agent, meaning that lane changing and routing actions can also be specified by the RL agent for this vehicle.

We finally add the vehicle as follows, while again using the `ContinuousRouter` to perpetually maintain the vehicle within the network.

In [6]:
# from flow.energy_models.toyota_energy import TacomaEnergy
# vehicles.add(veh_id="rl",
#              acceleration_controller=(RLController, {}),
#              routing_controller=(ContinuousRouter, {}),
#              initial_speed =20,
#              energy_model = TacomaEnergy,
#              num_vehicles=1)


vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             initial_speed=15,
             num_vehicles=1)

## 3. Setting up an Environment

Several environments in Flow exist to train RL agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. An environment lets us view the cumulative reward that simulation rollouts receive and specify the state/action spaces.

SUMO environments in Flow are parametrized by three components:
* `SumoParams`
* `EnvParams`
* `Network`

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of each simulation step and whether to render the GUI when running the experiment. For this example, we consider a simulation step length of 0.1s and deactivate the GUI.

**Note:** For training purposes, it is highly recommended to deactivate the GUI in order to avoid slowing down the simulation. To do so, specify `render=False`.

In [7]:
from flow.core.params import SumoParams

sim_params = SumoParams(sim_step=0.1, render=False)

### 3.2 EnvParams

`EnvParams` specifies environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. For the environment `WaveAttenuationPOEnv`, these parameters are used to dictate bounds on the accelerations of the autonomous vehicles, as well as the range of ring lengths (and accordingly network densities) the agent is trained on.

Finally, it is important to specify here the *horizon* of the experiment, which is the duration of one episode in simulation steps (during which the RL agent acquires data).

In [8]:
from flow.core.params import EnvParams

# Define horizon as a variable to ensure consistent use across notebook
HORIZON = 2000

env_params = EnvParams(
    # length of one rollout
    horizon=HORIZON,

    additional_params={
        # maximum acceleration of autonomous vehicles
        "max_accel": 1,
        # maximum deceleration of autonomous vehicles
        "max_decel": 1,
        # bounds on the ranges of ring road lengths the autonomous vehicle 
        # is trained on
        "ring_length": [220, 270],
    },
)
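
As a quick sanity check on these values, the sketch below converts the horizon into simulated time and shows how acceleration bounds of this kind are typically applied. The 0.1 s step is the `sim_step` set in section 3.1; the helper `clip_accel` is hypothetical and only illustrates the idea of bounding an action, not the environment's actual code.

```python
HORIZON = 2000      # steps per rollout, as above
SIM_STEP = 0.1      # seconds per step, from SumoParams in section 3.1

# simulated duration of one rollout, in seconds
episode_seconds = HORIZON * SIM_STEP
print(episode_seconds)


def clip_accel(action, max_accel=1.0, max_decel=1.0):
    """Bound a requested acceleration to [-max_decel, max_accel].

    Hypothetical helper, shown only to illustrate the two bounds.
    """
    return max(-max_decel, min(max_accel, action))


print(clip_accel(2.5), clip_accel(-3.0), clip_accel(0.4))
```

With these settings, one rollout covers roughly 200 s of simulated traffic.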

### 3.3 Initializing a Gym Environment

Now, we have to specify our Gym environment and the algorithm that our RL agents will use. As with the network, we choose to use one of Flow's built-in environments, a list of which is provided by the script below.

In [9]:
import flow.envs as flowenvs

print(flowenvs.__all__)

['Env', 'AccelEnv', 'LaneChangeAccelEnv', 'LaneChangeAccelPOEnv', 'TrafficLightGridTestEnv', 'MergePOEnv', 'BottleneckEnv', 'BottleneckAccelEnv', 'WaveAttenuationEnv', 'WaveAttenuationPOEnv', 'EnergyOptEnv', 'EnergyOptPOEnv', 'TrafficLightGridEnv', 'TrafficLightGridPOEnv', 'TrafficLightGridBenchmarkEnv', 'BottleneckDesiredVelocityEnv', 'TestEnv', 'BayBridgeEnv', 'SingleStraightRoad', 'BottleNeckAccelEnv', 'DesiredVelocityEnv', 'PO_TrafficLightGridEnv', 'GreenWaveTestEnv']


This tutorial is built around the environment `WaveAttenuationPOEnv`, which is used to train autonomous vehicles to attenuate the formation and propagation of waves in a partially observable, variable-density ring road. The cells below select `EnergyOptPOEnv` instead and leave the `WaveAttenuationPOEnv` lines commented out as an alternative. To create the Gym environment, the only necessary parameters are the environment name plus the previously defined variables. These are defined as follows:

In [10]:
from flow.envs import EnergyOptPOEnv

env_name = EnergyOptPOEnv

In [11]:
# from flow.envs import WaveAttenuationPOEnv

# env_name = WaveAttenuationPOEnv

### 3.4 Setting up Flow Parameters

RLlib experiments generate a `params.json` file for each experiment run. For RLlib experiments, the parameters defining the Flow network and environment must be stored as well. As such, in this section we define the dictionary `flow_params`, which contains the variables required by the utility function `make_create_env`. `make_create_env` is a higher-order function which returns a function `create_env` that initializes a Gym environment corresponding to the Flow network specified.

In [12]:
# Creating flow_params. Make sure the dictionary keys are as specified. 
flow_params = dict(
    # name of the experiment
    exp_tag=name,
    # name of the flow environment the experiment is running on
    env_name=env_name,
    # name of the network class the experiment uses
    network=network_name,
    # simulator that is used by the experiment
    simulator='traci',
    # simulation-related parameters
    sim=sim_params,
    # environment related parameters (see flow.core.params.EnvParams)
    env=env_params,
    # network-related parameters (see flow.core.params.NetParams and
    # the network's documentation or ADDITIONAL_NET_PARAMS component)
    net=net_params,
    # vehicles to be placed in the network at the start of a rollout 
    # (see flow.core.params.VehicleParams)
    veh=vehicles,
    # (optional) parameters affecting the positioning of vehicles upon 
    # initialization/reset (see flow.core.params.InitialConfig)
    initial=initial_config
)
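
The higher-order pattern described for `make_create_env` can be sketched in a few lines of plain Python. Everything here is a stand-in: `make_env_factory` and `ToyEnv` are hypothetical names, and the `<class name>-v<version>` identifier format is an assumption modeled on the trial name that appears in the training output, not a statement about Flow's internals.

```python
def make_env_factory(params, version=0):
    """Return (create_env, env_id) for the given parameter dictionary.

    Hypothetical stand-in for the pattern used by make_create_env.
    """
    env_cls = params["env_name"]
    env_id = "{}-v{}".format(env_cls.__name__, version)

    def create_env(*_):
        # the closure remembers `params`, so callers need no arguments
        return env_cls(horizon=params["horizon"])

    return create_env, env_id


class ToyEnv:
    """Placeholder environment used only for this sketch."""

    def __init__(self, horizon):
        self.horizon = horizon


create_toy_env, toy_id = make_env_factory({"env_name": ToyEnv, "horizon": 2000})
toy_env = create_toy_env()
print(toy_id, toy_env.horizon)
```

Returning a factory rather than an environment instance lets each rollout worker build its own copy of the environment on demand.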

## 4. Running RL Experiments in Ray

### 4.1 Import 

First, we must import modules required to run experiments in Ray. The `json` package is required to store the Flow experiment parameters in the `params.json` file, as is `FlowParamsEncoder`. Ray-related imports are required: the PPO algorithm agent, `ray.tune`'s experiment runner, and environment helper methods `register_env` and `make_create_env`.

In [13]:
import json

import ray
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
# from ray.rllib.agents.agent import get_agent_class
#from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder



### 4.2 Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [14]:
# number of parallel workers
N_CPUS = 6
# number of rollouts per training iteration
N_ROLLOUTS = 1
#ray.shutdown()
ray.init(num_cpus=N_CPUS)

2020-07-28 18:03:34,383	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-28_18-03-34_382327_12178/logs.
2020-07-28 18:03:34,529	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:23649 to respond...
2020-07-28 18:03:34,699	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:31066 to respond...
2020-07-28 18:03:34,713	INFO services.py:809 -- Starting Redis shard with 3.3 GB max memory.
2020-07-28 18:03:34,831	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-28_18-03-34_382327_12178/logs.
2020-07-28 18:03:34,844	INFO services.py:1475 -- Starting the Plasma object store with 4.96 GB memory using /dev/shm.


{'node_ip_address': '192.168.100.38',
 'redis_address': '192.168.100.38:23649',
 'object_store_address': '/tmp/ray/session_2020-07-28_18-03-34_382327_12178/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-07-28_18-03-34_382327_12178/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2020-07-28_18-03-34_382327_12178'}
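
To see what these constants imply, the following sketch works through the arithmetic using the same expressions that appear in the configuration cell of section 4.3. The variable names mirror the notebook's; the comment about why one CPU is held back is an assumption rather than something stated in this tutorial.

```python
N_CPUS = 6
N_ROLLOUTS = 1
HORIZON = 2000  # steps per rollout, defined in section 3.2

# assumption: one CPU is left for the process that drives training
num_workers = N_CPUS - 1
# environment steps collected per training iteration
train_batch_size = HORIZON * N_ROLLOUTS
# the minibatch is capped by the batch itself when the batch is small
sgd_minibatch_size = min(16 * 1024, train_batch_size)

print(num_workers, train_batch_size, sgd_minibatch_size)
```

With a single rollout per iteration, the whole batch fits into one SGD minibatch.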

### 4.3 Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent is given the number of parallel workers specified above, a batch size corresponding to `N_ROLLOUTS` rollouts (each of which has length `HORIZON` steps), a discount rate $\gamma$ of 0.999, two hidden layers of size 16, Generalized Advantage Estimation with $\lambda$ of 0.97, and other parameters as set below.
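
Generalized Advantage Estimation combines the discount rate and $\lambda$ in a single backward recursion: $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ and $A_t = \delta_t + \gamma \lambda A_{t+1}$. The sketch below is a plain-Python illustration of that recursion, not RLlib's implementation; the function name `gae_advantages`, its arguments, and the toy inputs are assumptions.

```python
def gae_advantages(rewards, values, last_value=0.0, gamma=0.999, lam=0.97):
    """Compute GAE advantages for one rollout.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # walk the rollout backwards so each step can reuse the next advantage
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# toy rollout: constant reward, constant value estimate
print(gae_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
```

Setting `lam=0` reduces each advantage to the one-step TD error, while `lam=1` sums discounted TD errors to the end of the rollout.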

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified in section 3 is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`. 
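
The idea of a custom encoder can be sketched with the standard library alone. The class below is not `FlowParamsEncoder`; `SketchParamsEncoder` and `DummyParams` are hypothetical names, and the rule of writing classes by name and instances by their attributes is an assumption made for illustration.

```python
import json


class SketchParamsEncoder(json.JSONEncoder):
    """Encode classes by name and plain objects by their attributes."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself
        if isinstance(obj, type):
            return obj.__name__
        if hasattr(obj, "__dict__"):
            return vars(obj)
        return super().default(obj)


class DummyParams:
    """Placeholder parameter object used only for this sketch."""

    def __init__(self, horizon):
        self.horizon = horizon


encoded = json.dumps({"env": DummyParams(2000), "network": DummyParams},
                     cls=SketchParamsEncoder, sort_keys=True)
print(encoded)
```

Storing a readable string instead of live objects is what allows a saved configuration to be inspected and replayed later.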

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym. 

In [15]:
# The algorithm or model to train. This may refer to the name of a
# built-in algorithm (e.g. RLlib's DQN or PPO), or a user-defined
# trainable function or class registered in the tune registry.
alg_run = "PPO"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size
config["gamma"] = 0.999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE parameter lambda
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])  # SGD minibatch size
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 10  # number of SGD iterations
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
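
The `kl_target` setting interacts with an adaptive penalty coefficient, which is reported as `cur_kl_coeff` in the training output. A minimal sketch of a commonly used adaptive rule follows. The thresholds and scale factors (halve the coefficient when the measured KL is below half the target, multiply it by 1.5 when it is above twice the target) are stated here as assumptions about RLlib's PPO rather than taken from this tutorial, and `update_kl_coeff` is a hypothetical name.

```python
def update_kl_coeff(kl_coeff, sampled_kl, kl_target=0.02):
    """Return the penalty coefficient for the next iteration (assumed rule)."""
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5   # policy moved too far: penalize more
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5   # policy barely moved: penalize less
    return kl_coeff


# start from the coefficient logged at the first iteration and apply
# four updates with a KL far below the target
coeff = 0.2
for _ in range(4):
    coeff = update_kl_coeff(coeff, sampled_kl=1e-5)
print(coeff)
```

Under this rule, four consecutive iterations with a very small KL bring the coefficient from 0.2 to about 0.0125, which is consistent with the `cur_kl_coeff` value logged at the fifth iteration of the run below.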

### 4.4 Running Experiments

Here, we use the `run_experiments` function from `ray.tune`. The function takes a dictionary with one key, a name corresponding to the experiment, and one value, itself a dictionary containing parameters for training.

In [None]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 20,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 1500,  # number of iterations to stop after
        },
    },
})

2020-07-28 18:03:35,618	INFO trial_runner.py:176 -- Starting a new experiment.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/6 CPUs, 0/0 GPUs
Memory usage on this node: 9.9/16.5 GB



2020-07-28 18:03:35,856	ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 6/6 CPUs, 0/0 GPUs
Memory usage on this node: 9.9/16.5 GB
Result logdir: /home/solom/ray_results/training_example16
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_EnergyOptPOEnv-v0_0:	RUNNING

(pid=12225) Instructions for updating:
(pid=12225) non-resource variables are not supported in the long term
(pid=12225) 2020-07-28 18:03:44,154	INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
(pid=12225) 2020-07-28 18:03:44.156763: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=12225) 2020-07-28 18:03:44.193624: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz
(pid=12225) 2020-07-28 18:03:44.194590: I tensorflow/compiler/xla/service/servic

(pid=12225) 2020-07-28 18:03:53,823	INFO trainable.py:105 -- _setup took 11.591 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=12225) Instructions for updating:
(pid=12225) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=12223) 2020-07-28 18:03:55,246	INFO rollout_worker.py:319 -- Creating policy evaluation worker 4 on CPU (please ignore any CUDA init errors)
(pid=12223) 2020-07-28 18:03:55.251179: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=12223) 2020-07-28 18:03:55.293705: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz

(pid=12228) Instructions for updating:
(pid=12228) Use keras.layers.Dense instead.
(pid=12228) Instructions for updating:
(pid=12228) Please use `layer.__call__` method instead.
(pid=12224) Instructions for updating:
(pid=12224) Use `tf.cast` instead.
(pid=12226) Instructions for updating:
(pid=12226) Use `tf.cast` instead.
(pid=12227) Instructions for updating:
(pid=12227) Use `tf.cast` instead.
(pid=12227) Instructions fo

(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 250
(pid=12227) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 221
(pid=12226) -----------------------
(pid=12228) 2020-07-28 18:03:59,075	INFO rollout_worker.py:451 -- Generating sample batch of size 200
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 243
(pid=12228) -----------------------
(pid=12228) 2020-07-28 18:04:00,472	INFO sampler.py:304 -- Raw obs from env: { 0: { 'agent0': np.ndarray((3,), dtype=float64, min=0.0, max=1.0, mean=0.333)}}
(pid=12228) 2020-07-28 18:04:00,473	INFO sampler.py:305 -- Info return from env: {0: {'agent0': None}}
(pid=12228) 2020-07-28 18:04:00,475	INFO sampler.py:403 -- Preprocessed obs: np.ndarray

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-04-06
  done: false
  episode_len_mean: .nan
  episode_reward_max: .nan
  episode_reward_mean: .nan
  episode_reward_min: .nan
  episodes_this_iter: 0
  episodes_total: 0
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 1149.347
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.4152172803878784
        entropy_coeff: 0.0
        kl: 1.309937215410173e-05
        policy_loss: -0.0005203213659115136
        total_loss: 0.05580121651291847
        vf_explained_var: -0.0004712343215942383
        vf_loss: 0.05631893128156662
    load_time_ms: 111.918
    num_steps_sampled: 2000
    num_steps_trained: 2000
    sample_time_ms: 10188.242
    update_time_ms: 1117.106
  iterations_since_restore: 1
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_policy_estimator: {}
  pe

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-04-46
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 18.02937122375481
  episode_reward_mean: 16.690167357372573
  episode_reward_min: 15.130968498161673
  episodes_this_iter: 5
  episodes_total: 5
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 330.119
    learner:
      default_policy:
        cur_kl_coeff: 0.012500000186264515
        cur_lr: 4.999999873689376e-05
        entropy: 1.4133905172348022
        entropy_coeff: 0.0
        kl: 4.735887159768026e-06
        policy_loss: -0.0002447795995976776
        total_loss: 0.05529431998729706
        vf_explained_var: 0.010105490684509277
        vf_loss: 0.05553904548287392
    load_time_ms: 26.157
    num_steps_sampled: 10000
    num_steps_trained: 10000
    sample_time_ms: 9926.402
    update_time_ms: 236.969
  iterations_since_restore: 5
  node_ip: 192.168.100.38
  num_healthy_wo

(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 229
(pid=12228) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 263
(pid=12227) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 241
(pid=12226) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 248
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 248
(pid=12224) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-05-36
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 243
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 259
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 265
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 253
(pid=12228) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 266
(pid=12226) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-06-07
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 233
(pid=12224) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 232
(pid=12226) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 236
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 245
(pid=12223) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 227
(pid=12228) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-06-36
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-07-17
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 15.282105929802892
  episode_reward_min: 12.19972212733805
  episodes_this_iter: 0
  episodes_total: 25
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 76.878
    learner:
      default_policy:
        cur_kl_coeff: 5.9604645663569045e-09
        cur_lr: 4.999999873689376e-05
        entropy: 1.4118396043777466
        entropy_coeff: 0.0
        kl: 5.449861419037916e-06
        policy_loss: -0.00017861365631688386
        total_loss: 0.05522125959396362
        vf_explained_var: 0.03642106056213379
        vf_loss: 0.055399879813194275
    load_time_ms: 3.354
    num_steps_sampled: 52000
    num_steps_trained: 52000
    sample_time_ms: 6167.939
    update_time_ms: 17.268
  iterations_since_restore: 26
  node_ip: 192.168.100.38
  num_healthy_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-07-47
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 14.981943367587132
  episode_reward_min: 10.495285209635023
  episodes_this_iter: 0
  episodes_total: 30
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 66.518
    learner:
      default_policy:
        cur_kl_coeff: 4.6566129424663316e-11
        cur_lr: 4.999999873689376e-05
        entropy: 1.4045116901397705
        entropy_coeff: 0.0
        kl: 1.784759842848871e-05
        policy_loss: -0.00015057182463351637
        total_loss: 0.03326907008886337
        vf_explained_var: 0.20881366729736328
        vf_loss: 0.033419638872146606
    load_time_ms: 2.897
    num_steps_sampled: 66000
    num_steps_trained: 66000
    sample_time_ms: 5673.762
    update_time_ms: 12.442
  iterations_since_restore: 33
  node_ip: 192.168.100.38
  num_healthy

(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 269
(pid=12228) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 241
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 235
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 252
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 226
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-08-15
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-08-48
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 13.922003668134936
  episode_reward_min: 9.136506222477985
  episodes_this_iter: 0
  episodes_total: 45
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 55.206
    learner:
      default_policy:
        cur_kl_coeff: 5.684341970784096e-15
        cur_lr: 4.999999873689376e-05
        entropy: 1.3891853094100952
        entropy_coeff: 0.0
        kl: 2.6477635401533917e-05
        policy_loss: -0.0003484373155515641
        total_loss: 0.051303230226039886
        vf_explained_var: 0.13506770133972168
        vf_loss: 0.05165165662765503
    load_time_ms: 2.255
    num_steps_sampled: 92000
    num_steps_trained: 92000
    sample_time_ms: 4888.617
    update_time_ms: 11.256
  iterations_since_restore: 46
  node_ip: 192.168.100.38
  num_healthy_w

(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 235
(pid=12227) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-09-19
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 13.528851129274859
  episode_reward_min: 8.412621555413866
  episodes_this_iter: 5
  episodes_total: 50
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 67.577
    learner:
      default_policy:
        cur_kl_coeff: 3.55271373174006e-16
        cur_lr: 4.999999873689376e-05
        entropy: 1.4040744304656982
        entropy_coeff: 0.0
        kl: 5.6251884927860374e-08
        policy_loss: 6.031989869370591e-06
        total_loss: 0.0020828761626034975
        vf_explained_var: 0.6025709509849548
        vf_loss: 0.0020768397953361273
    load_time_ms: 2.821
    num_steps_sample

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 245
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 230
(pid=12224) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 226
(pid=12228) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 265
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 231
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-09-58
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-10-26
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 13.05239482810396
  episode_reward_min: 6.989441015264951
  episodes_this_iter: 0
  episodes_total: 55
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 78.9
    learner:
      default_policy:
        cur_kl_coeff: 6.938894007304805e-19
        cur_lr: 4.999999873689376e-05
        entropy: 1.403085470199585
        entropy_coeff: 0.0
        kl: 1.775324278696644e-07
        policy_loss: -6.426811069104588e-06
        total_loss: 0.0034143361262977123
        vf_explained_var: 0.7336118221282959
        vf_loss: 0.0034207659773528576
    load_time_ms: 4.3
    num_steps_sampled: 118000
    num_steps_trained: 118000
    sample_time_ms: 7477.338
    update_time_ms: 19.869
  iterations_since_restore: 59
  node_ip: 192.168.100.38
  num_healthy_work

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-11-00
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 12.715686925453197
  episode_reward_min: 6.989441015264951
  episodes_this_iter: 0
  episodes_total: 60
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 79.059
    learner:
      default_policy:
        cur_kl_coeff: 4.336808754565503e-20
        cur_lr: 4.999999873689376e-05
        entropy: 1.3985285758972168
        entropy_coeff: 0.0
        kl: 1.7587542515684618e-06
        policy_loss: 1.4713048585690558e-05
        total_loss: 0.009067738428711891
        vf_explained_var: 0.651542067527771
        vf_loss: 0.009053037501871586
    load_time_ms: 3.478
    num_steps_sampled: 126000
    num_steps_trained: 126000
    sample_time_ms: 7608.066
    update_time_ms: 19.85
  iterations_since_restore: 63
  node_ip: 192.168.100.38
  num_healthy_w

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-11-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 12.398475509974306
  episode_reward_min: 6.989441015264951
  episodes_this_iter: 0
  episodes_total: 65
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 71.22
    learner:
      default_policy:
        cur_kl_coeff: 2.7105054716034394e-21
        cur_lr: 4.999999873689376e-05
        entropy: 1.394382357597351
        entropy_coeff: 0.0
        kl: 8.54802146932343e-06
        policy_loss: -0.00029419874772429466
        total_loss: 0.023401085287332535
        vf_explained_var: 0.36425840854644775
        vf_loss: 0.02369528077542782
    load_time_ms: 3.21
    num_steps_sampled: 134000
    num_steps_trained: 134000
    sample_time_ms: 7376.343
    update_time_ms: 19.016
  iterations_since_restore: 67
  node_ip: 192.168.100.38
  num_healthy_wo

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-11-58
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  episode_reward_mean: 12.174293778410942
  episode_reward_min: 6.989441015264951
  episodes_this_iter: 0
  episodes_total: 70
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 50.267
    learner:
      default_policy:
        cur_kl_coeff: 2.117582399690187e-23
        cur_lr: 4.999999873689376e-05
        entropy: 1.4047181606292725
        entropy_coeff: 0.0
        kl: 2.4636983653181233e-06
        policy_loss: -7.564735278720036e-05
        total_loss: 0.00211337860673666
        vf_explained_var: 0.8239113092422485
        vf_loss: 0.002189035527408123
    load_time_ms: 2.293
    num_steps_sampled: 148000
    num_steps_trained: 148000
    sample_time_ms: 4891.725
    update_time_ms: 12.585
  iterations_since_restore: 74
  node_ip: 192.168.100.38
  num_healthy_

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 260
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 255
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 249
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 258
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 257
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-12-25
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 223
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 230
(pid=12223) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 268
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 234
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 225
(pid=12227) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-12-52
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 235
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 239
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 255
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 264
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 262
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-13-16
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 19.010186783077017
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 234
(pid=12226) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 256
(pid=12227) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 258
(pid=12224) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 226
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 245
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-13-41
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 17.904615709036594
  e

(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 249
(pid=12223) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 230
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 226
(pid=12224) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 254
(pid=12228) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 238
(pid=12227) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-14-07
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 16.944436589809737
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 236
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 256
(pid=12224) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 261
(pid=12223) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 261
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 236
(pid=12228) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-14-31
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 16.944436589809737
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 246
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 250
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 246
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 266
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 220
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-14-55
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.484351732238368
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 257
(pid=12226) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 248
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 242
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 266
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 232
(pid=12228) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-15-21
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.301667742452855
  e

(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 233
(pid=12223) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 268
(pid=12226) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 222
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 248
(pid=12228) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 270
(pid=12224) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-15-46
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  e

(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 249
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 244
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 236
(pid=12223) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 261
(pid=12226) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 242
(pid=12228) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-16-12
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-16-36
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 8.495304945941516
  episode_reward_min: 4.9059291346737846
  episodes_this_iter: 0
  episodes_total: 175
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 29.53
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3906606435775757
        entropy_coeff: 0.0
        kl: 4.351139068603516e-06
        policy_loss: -0.00041080964729189873
        total_loss: 0.001323311822488904
        vf_explained_var: 0.9175403714179993
        vf_loss: 0.001734117860905826
    load_time_ms: 1.502
    num_steps_sampled: 358000
    num_steps_trained: 358000
    sample_time_ms: 2707.775
    update_time_ms: 4.929
  iterations_since_restore: 179
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-17-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 8.686049198062273
  episode_reward_min: 4.9059291346737846
  episodes_this_iter: 0
  episodes_total: 185
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.286
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3809325695037842
        entropy_coeff: 0.0
        kl: 3.2930374800343998e-06
        policy_loss: -0.00024164009664673358
        total_loss: 0.0028820906300097704
        vf_explained_var: 0.9242920875549316
        vf_loss: 0.0031237308867275715
    load_time_ms: 1.249
    num_steps_sampled: 378000
    num_steps_trained: 378000
    sample_time_ms: 2680.725
    update_time_ms: 5.715
  iterations_since_restore: 189
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-17-25
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 8.883354358977039
  episode_reward_min: 4.9059291346737846
  episodes_this_iter: 0
  episodes_total: 195
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 27.612
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3503713607788086
        entropy_coeff: 0.0
        kl: 3.919899427273776e-06
        policy_loss: -2.923250212916173e-05
        total_loss: 0.013699603267014027
        vf_explained_var: 0.7593136429786682
        vf_loss: 0.01372882816940546
    load_time_ms: 1.335
    num_steps_sampled: 394000
    num_steps_trained: 394000
    sample_time_ms: 2606.088
    update_time_ms: 4.853
  iterations_since_restore: 197
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 243
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 225
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 235
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 268
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 265
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-17-50
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 236
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 259
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 249
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 229
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 252
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-18-15
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-18-40
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 9.272343476364355
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 220
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.073
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3663586378097534
        entropy_coeff: 0.0
        kl: 1.2244283880136209e-06
        policy_loss: -0.0001047668483806774
        total_loss: 0.003347322577610612
        vf_explained_var: 0.9140421748161316
        vf_loss: 0.0034520833287388086
    load_time_ms: 1.318
    num_steps_sampled: 448000
    num_steps_trained: 448000
    sample_time_ms: 2783.435
    update_time_ms: 5.217
  iterations_since_restore: 224
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-19-05
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 9.423203887598879
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 230
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 33.353
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3506733179092407
        entropy_coeff: 0.0
        kl: 1.527869790152181e-05
        policy_loss: -0.000334568991092965
        total_loss: 0.0049047451466321945
        vf_explained_var: 0.863880455493927
        vf_loss: 0.005239300429821014
    load_time_ms: 1.386
    num_steps_sampled: 466000
    num_steps_trained: 466000
    sample_time_ms: 2777.805
    update_time_ms: 7.428
  iterations_since_restore: 233
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-19-30
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 9.535045224494612
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 240
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.968
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3223072290420532
        entropy_coeff: 0.0
        kl: 6.1237215049914084e-06
        policy_loss: -7.431364065269008e-05
        total_loss: 0.013711275532841682
        vf_explained_var: 0.7548965215682983
        vf_loss: 0.013785598799586296
    load_time_ms: 1.189
    num_steps_sampled: 484000
    num_steps_trained: 484000
    sample_time_ms: 2674.786
    update_time_ms: 5.079
  iterations_since_restore: 242
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-19-58
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.783670247790553
  episode_reward_mean: 9.630563260384422
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 250
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 27.862
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3077542781829834
        entropy_coeff: 0.0
        kl: 4.384040948934853e-06
        policy_loss: 2.2202015315997414e-05
        total_loss: 0.01636091060936451
        vf_explained_var: 0.6483395099639893
        vf_loss: 0.01633869856595993
    load_time_ms: 1.486
    num_steps_sampled: 504000
    num_steps_trained: 504000
    sample_time_ms: 2727.204
    update_time_ms: 4.914
  iterations_since_restore: 252
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 247
(pid=12228) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 265
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 245
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 237
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 267
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-20-23
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.73245530941248
  ep

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 243
(pid=12226) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 236
(pid=12227) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 254
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 220
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 224
(pid=12224) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-20-48
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.73245530941248
  ep

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 225
(pid=12226) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 221
(pid=12228) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 269
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 252
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 268
(pid=12224) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-21-15
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.73245530941248
  ep

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-21-37
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.942752698236877
  episode_reward_mean: 9.828639323254789
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 285
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.703
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3136897087097168
        entropy_coeff: 0.0
        kl: 8.641302883916069e-06
        policy_loss: 9.656906513555441e-06
        total_loss: 0.005540479440242052
        vf_explained_var: 0.8644092679023743
        vf_loss: 0.005530829541385174
    load_time_ms: 1.477
    num_steps_sampled: 576000
    num_steps_trained: 576000
    sample_time_ms: 2860.036
    update_time_ms: 6.703
  iterations_since_restore: 288
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-22-04
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.942752698236877
  episode_reward_mean: 9.82223577750308
  episode_reward_min: 6.286269036276553
  episodes_this_iter: 0
  episodes_total: 295
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.455
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3014402389526367
        entropy_coeff: 0.0
        kl: 7.2931943577714264e-06
        policy_loss: -0.0001557421637699008
        total_loss: 0.010173271410167217
        vf_explained_var: 0.7765362858772278
        vf_loss: 0.010329012759029865
    load_time_ms: 1.498
    num_steps_sampled: 594000
    num_steps_trained: 594000
    sample_time_ms: 2840.029
    update_time_ms: 4.412
  iterations_since_restore: 297
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 242
(pid=12228) -----------------------
(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 254
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 259
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 245
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 239
(pid=12223) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-22-27
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.942752698236877
  e

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 260
(pid=12226) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 256
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 265
(pid=12227) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 238
(pid=12223) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 263
(pid=12228) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-22-55
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.942752698236877
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-23-18
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.942752698236877
  episode_reward_mean: 9.996166323888177
  episode_reward_min: 7.881919441098772
  episodes_this_iter: 0
  episodes_total: 320
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.686
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3259600400924683
        entropy_coeff: 0.0
        kl: 3.401726416996098e-06
        policy_loss: -7.2801711212378e-05
        total_loss: 0.004016408231109381
        vf_explained_var: 0.8873124122619629
        vf_loss: 0.00408921018242836
    load_time_ms: 1.523
    num_steps_sampled: 646000
    num_steps_trained: 646000
    sample_time_ms: 2914.193
    update_time_ms: 5.357
  iterations_since_restore: 323
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_poli

(pid=12226)
(pid=12226) -----------------------
(pid=12226) ring length: 224
(pid=12226) -----------------------
(pid=12228)
(pid=12228) -----------------------
(pid=12228) ring length: 239
(pid=12228) -----------------------
(pid=12223)
(pid=12223) -----------------------
(pid=12223) ring length: 267
(pid=12223) -----------------------
(pid=12224)
(pid=12224) -----------------------
(pid=12224) ring length: 247
(pid=12224) -----------------------
(pid=12227)
(pid=12227) -----------------------
(pid=12227) ring length: 258
(pid=12227) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-23-42
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-24-04
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.091950527941167
  episode_reward_min: 8.113644238183573
  episodes_this_iter: 0
  episodes_total: 335
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.623
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3295817375183105
        entropy_coeff: 0.0
        kl: 6.899893378431443e-06
        policy_loss: -0.0005579695571213961
        total_loss: 0.0018169613322243094
        vf_explained_var: 0.9119032621383667
        vf_loss: 0.002374940551817417
    load_time_ms: 1.542
    num_steps_sampled: 678000
    num_steps_trained: 678000
    sample_time_ms: 2820.704
    update_time_ms: 5.739
  iterations_since_restore: 339
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-24-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.156296465034995
  episode_reward_min: 8.113644238183573
  episodes_this_iter: 0
  episodes_total: 345
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 29.819
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2760460376739502
        entropy_coeff: 0.0
        kl: 9.229957868228666e-06
        policy_loss: -0.00010706615285016596
        total_loss: 0.01032605953514576
        vf_explained_var: 0.6838303804397583
        vf_loss: 0.010433118790388107
    load_time_ms: 1.471
    num_steps_sampled: 694000
    num_steps_trained: 694000
    sample_time_ms: 2899.196
    update_time_ms: 7.893
  iterations_since_restore: 347
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-24-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.141309899712615
  episode_reward_min: 8.113644238183573
  episodes_this_iter: 0
  episodes_total: 355
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.548
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2755227088928223
        entropy_coeff: 0.0
        kl: 4.208433529129252e-05
        policy_loss: -0.0002897338999900967
        total_loss: 0.008003738708794117
        vf_explained_var: 0.8432930111885071
        vf_loss: 0.008293463848531246
    load_time_ms: 1.141
    num_steps_sampled: 714000
    num_steps_trained: 714000
    sample_time_ms: 2802.682
    update_time_ms: 8.571
  iterations_since_restore: 357
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 229
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 235
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 247
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 241
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 221
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-25-20
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-25-43
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.215500755829797
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 370
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.129
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2985124588012695
        entropy_coeff: 0.0
        kl: 3.736883400051738e-06
        policy_loss: -0.00011372566223144531
        total_loss: 0.003012657631188631
        vf_explained_var: 0.8800949454307556
        vf_loss: 0.0031263851560652256
    load_time_ms: 1.328
    num_steps_sampled: 746000
    num_steps_trained: 746000
    sample_time_ms: 2832.003
    update_time_ms: 5.665
  iterations_since_restore: 373
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 237
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 266
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 239
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 222
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 232
[2m[36m(pid=12227)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-26-06
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  e

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 240
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 231
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 267
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 221
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 247
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-26-30
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-26-56
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.260653985312743
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 395
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 38.183
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2953087091445923
        entropy_coeff: 0.0
        kl: 3.1198858323477907e-06
        policy_loss: -0.00028148459387011826
        total_loss: 0.0023430194705724716
        vf_explained_var: 0.8902881741523743
        vf_loss: 0.002624501008540392
    load_time_ms: 1.713
    num_steps_sampled: 798000
    num_steps_trained: 798000
    sample_time_ms: 2813.696
    update_time_ms: 5.447
  iterations_since_restore: 399
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-27-19
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.007756407862903
  episode_reward_mean: 10.313135728971986
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 405
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 29.336
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2298613786697388
        entropy_coeff: 0.0
        kl: 6.807741738157347e-05
        policy_loss: -0.0005650825332850218
        total_loss: 0.009365366771817207
        vf_explained_var: 0.7334765791893005
        vf_loss: 0.00993044301867485
    load_time_ms: 1.154
    num_steps_sampled: 814000
    num_steps_trained: 814000
    sample_time_ms: 2851.119
    update_time_ms: 5.025
  iterations_since_restore: 407
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 250
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 248
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 243
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 224
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 223
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-27-43
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-28-07
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.332306618867172
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 420
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.983
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.266073226928711
        entropy_coeff: 0.0
        kl: 1.3576239325630013e-05
        policy_loss: -0.0008230981766246259
        total_loss: 0.0012534226989373565
        vf_explained_var: 0.8660944700241089
        vf_loss: 0.002076522447168827
    load_time_ms: 1.229
    num_steps_sampled: 848000
    num_steps_trained: 848000
    sample_time_ms: 2766.628
    update_time_ms: 5.787
  iterations_since_restore: 424
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-28-30
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.347194092988042
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 430
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.642
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1955031156539917
        entropy_coeff: 0.0
        kl: 2.2128582713776268e-05
        policy_loss: -4.335308040026575e-05
        total_loss: 0.008188245818018913
        vf_explained_var: 0.798556923866272
        vf_loss: 0.008231599815189838
    load_time_ms: 1.253
    num_steps_sampled: 864000
    num_steps_trained: 864000
    sample_time_ms: 2780.721
    update_time_ms: 6.477
  iterations_since_restore: 432
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 244
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 247
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 221
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 230
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 270
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-28-56
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-29-19
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.3829252224365
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 445
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.426
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2553986310958862
        entropy_coeff: 0.0
        kl: 2.908569513238035e-05
        policy_loss: -0.000546055322047323
        total_loss: 0.002014508470892906
        vf_explained_var: 0.9253324270248413
        vf_loss: 0.002560562454164028
    load_time_ms: 1.287
    num_steps_sampled: 898000
    num_steps_trained: 898000
    sample_time_ms: 2901.821
    update_time_ms: 7.405
  iterations_since_restore: 449
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-29-45
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.494612668529564
  episode_reward_min: 8.191959177835225
  episodes_this_iter: 0
  episodes_total: 455
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 29.57
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2107268571853638
        entropy_coeff: 0.0
        kl: 5.375898035708815e-05
        policy_loss: -6.701373786199838e-05
        total_loss: 0.003630230436101556
        vf_explained_var: 0.755587100982666
        vf_loss: 0.003697242122143507
    load_time_ms: 1.162
    num_steps_sampled: 916000
    num_steps_trained: 916000
    sample_time_ms: 2778.051
    update_time_ms: 5.241
  iterations_since_restore: 458
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 232
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 257
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 226
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 270
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 222
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-30-09
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-30-33
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.628396466982446
  episode_reward_min: 8.753233756311687
  episodes_this_iter: 0
  episodes_total: 470
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.927
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.2547922134399414
        entropy_coeff: 0.0
        kl: 6.532490260724444e-06
        policy_loss: -0.0003827858017757535
        total_loss: 0.001405161339789629
        vf_explained_var: 0.9323396682739258
        vf_loss: 0.001787954824976623
    load_time_ms: 1.18
    num_steps_sampled: 948000
    num_steps_trained: 948000
    sample_time_ms: 3057.354
    update_time_ms: 4.698
  iterations_since_restore: 474
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-31-04
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.695126907719441
  episode_reward_min: 8.96486702911218
  episodes_this_iter: 0
  episodes_total: 480
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.955
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1555930376052856
        entropy_coeff: 0.0
        kl: 8.180859731510282e-05
        policy_loss: -0.00018458938575349748
        total_loss: 0.006763671990483999
        vf_explained_var: 0.6630679965019226
        vf_loss: 0.006948278285562992
    load_time_ms: 1.929
    num_steps_sampled: 964000
    num_steps_trained: 964000
    sample_time_ms: 3700.08
    update_time_ms: 9.405
  iterations_since_restore: 482
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 258
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 263
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 249
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 256
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 255
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-31-35
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-32-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.730374735405322
  episode_reward_min: 8.96486702911218
  episodes_this_iter: 0
  episodes_total: 495
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 38.755
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1910805702209473
        entropy_coeff: 0.0
        kl: 1.7138301700470038e-05
        policy_loss: 0.0001434726727893576
        total_loss: 0.0030568866059184074
        vf_explained_var: 0.8259185552597046
        vf_loss: 0.002913400763645768
    load_time_ms: 1.956
    num_steps_sampled: 996000
    num_steps_trained: 996000
    sample_time_ms: 3587.248
    update_time_ms: 8.177
  iterations_since_restore: 498
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 231
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 260
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 247
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 228
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 263
[2m[36m(pid=12226)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-32-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-32-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.308646449322124
  episode_reward_mean: 10.771708637871589
  episode_reward_min: 8.96486702911218
  episodes_this_iter: 0
  episodes_total: 510
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 35.207
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1954296827316284
        entropy_coeff: 0.0
        kl: 1.4335631931317039e-05
        policy_loss: -0.0005642354371957481
        total_loss: 0.001617103349417448
        vf_explained_var: 0.872527003288269
        vf_loss: 0.002181329997256398
    load_time_ms: 1.772
    num_steps_sampled: 1028000
    num_steps_trained: 1028000
    sample_time_ms: 3495.859
    update_time_ms: 9.203
  iterations_since_restore: 514
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-33-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  episode_reward_mean: 10.796843142411984
  episode_reward_min: 8.96486702911218
  episodes_this_iter: 0
  episodes_total: 520
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 40.112
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.12739098072052
        entropy_coeff: 0.0
        kl: 9.6652984211687e-05
        policy_loss: -0.00039534259121865034
        total_loss: 0.005460167769342661
        vf_explained_var: 0.7375046014785767
        vf_loss: 0.005855516530573368
    load_time_ms: 1.734
    num_steps_sampled: 1044000
    num_steps_trained: 1044000
    sample_time_ms: 3570.83
    update_time_ms: 10.304
  iterations_since_restore: 522
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 250
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 253
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 244
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 232
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 224
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-34-00
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-34-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  episode_reward_mean: 10.819850191632804
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 535
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.898
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1739423274993896
        entropy_coeff: 0.0
        kl: 1.1069804713770282e-05
        policy_loss: -0.00033226393861696124
        total_loss: 0.0015288409776985645
        vf_explained_var: 0.9183663129806519
        vf_loss: 0.0018610936822369695
    load_time_ms: 2.181
    num_steps_sampled: 1076000
    num_steps_trained: 1076000
    sample_time_ms: 3659.823
    update_time_ms: 9.436
  iterations_since_restore: 538
  node_ip: 192.168.100.38
  num_healthy_workers: 5


[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 243
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 264
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 262
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 244
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 243
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-34-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-35-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  episode_reward_mean: 10.866376579055208
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 550
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 38.535
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1689461469650269
        entropy_coeff: 0.0
        kl: 1.016747955873143e-05
        policy_loss: -0.0005288133397698402
        total_loss: 0.001108469907194376
        vf_explained_var: 0.9302800893783569
        vf_loss: 0.0016372936079278588
    load_time_ms: 2.072
    num_steps_sampled: 1108000
    num_steps_trained: 1108000
    sample_time_ms: 3677.358
    update_time_ms: 8.729
  iterations_since_restore: 554
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-35-58
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  episode_reward_mean: 10.861048736404191
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 560
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 50.458
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.0619112253189087
        entropy_coeff: 0.0
        kl: 7.410821126541123e-05
        policy_loss: -9.069537918549031e-05
        total_loss: 0.005380129907280207
        vf_explained_var: 0.7626311182975769
        vf_loss: 0.005470830947160721
    load_time_ms: 2.192
    num_steps_sampled: 1124000
    num_steps_trained: 1124000
    sample_time_ms: 3605.108
    update_time_ms: 9.438
  iterations_since_restore: 562
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 258
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 237
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 247
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 263
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 225
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-36-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-36-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.916973527527926
  episode_reward_mean: 10.880201340450558
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 575
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 50.986
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1165120601654053
        entropy_coeff: 0.0
        kl: 1.5685081962146796e-05
        policy_loss: -9.241962106898427e-05
        total_loss: 0.0023114834912121296
        vf_explained_var: 0.874774694442749
        vf_loss: 0.002403904916718602
    load_time_ms: 1.885
    num_steps_sampled: 1156000
    num_steps_trained: 1156000
    sample_time_ms: 3559.859
    update_time_ms: 8.078
  iterations_since_restore: 578
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 261
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 249
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 254
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 240
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 270
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-37-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-38-01
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 10.966213159991035
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 590
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.415
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1580332517623901
        entropy_coeff: 0.0
        kl: 2.281883280375041e-05
        policy_loss: -0.0009098167647607625
        total_loss: 0.0005026512080803514
        vf_explained_var: 0.9347376823425293
        vf_loss: 0.0014124676818028092
    load_time_ms: 2.449
    num_steps_sampled: 1188000
    num_steps_trained: 1188000
    sample_time_ms: 3895.954
    update_time_ms: 10.257
  iterations_since_restore: 594
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 221
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 228
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 241
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 270
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 225
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-38-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-39-02
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 11.008221139743242
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 605
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 37.344
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1054939031600952
        entropy_coeff: 0.0
        kl: 6.216192559804767e-05
        policy_loss: -0.0012294092448428273
        total_loss: 0.0009706182754598558
        vf_explained_var: 0.8902115821838379
        vf_loss: 0.002200038405135274
    load_time_ms: 1.597
    num_steps_sampled: 1218000
    num_steps_trained: 1218000
    sample_time_ms: 3757.145
    update_time_ms: 11.257
  iterations_since_restore: 609
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-39-36
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 11.025822996502804
  episode_reward_min: 9.263280905548632
  episodes_this_iter: 0
  episodes_total: 615
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.272
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.0386401414871216
        entropy_coeff: 0.0
        kl: 8.808487473288551e-05
        policy_loss: -0.0006604738300666213
        total_loss: 0.00473405234515667
        vf_explained_var: 0.7253221869468689
        vf_loss: 0.005394541192799807
    load_time_ms: 2.182
    num_steps_sampled: 1234000
    num_steps_trained: 1234000
    sample_time_ms: 3938.617
    update_time_ms: 8.851
  iterations_since_restore: 617
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-40-05
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 11.071077894450582
  episode_reward_min: 9.948785406980893
  episodes_this_iter: 0
  episodes_total: 620
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.451
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1073118448257446
        entropy_coeff: 0.0
        kl: 1.596587935637217e-05
        policy_loss: -0.0005080108530819416
        total_loss: 0.001596189453266561
        vf_explained_var: 0.8822010159492493
        vf_loss: 0.0021041971631348133
    load_time_ms: 1.729
    num_steps_sampled: 1248000
    num_steps_trained: 1248000
    sample_time_ms: 4302.559
    update_time_ms: 8.671
  iterations_since_restore: 624
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 244
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 228
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 224
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 248
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 258
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-40-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-41-00
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 11.132452844927426
  episode_reward_min: 10.261786008758232
  episodes_this_iter: 0
  episodes_total: 635
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.598
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.0131961107254028
        entropy_coeff: 0.0
        kl: 5.177843559067696e-05
        policy_loss: -9.477186540607363e-05
        total_loss: 0.00568698113784194
        vf_explained_var: 0.7410953044891357
        vf_loss: 0.0057817683555185795
    load_time_ms: 2.086
    num_steps_sampled: 1274000
    num_steps_trained: 1274000
    sample_time_ms: 4083.386
    update_time_ms: 10.664
  iterations_since_restore: 637
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-41-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 11.980456165225856
  episode_reward_mean: 11.161403739429076
  episode_reward_min: 10.386016247427902
  episodes_this_iter: 0
  episodes_total: 640
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.945
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.1192197799682617
        entropy_coeff: 0.0
        kl: 1.746356429066509e-05
        policy_loss: -0.0007401139591820538
        total_loss: 0.0008207604987546802
        vf_explained_var: 0.8992769718170166
        vf_loss: 0.001560875796712935
    load_time_ms: 2.182
    num_steps_sampled: 1288000
    num_steps_trained: 1288000
    sample_time_ms: 4414.837
    update_time_ms: 12.209
  iterations_since_restore: 644
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 236
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 252
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 267
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 220
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 223
[2m[36m(pid=12226)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-42-01
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-42-30
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.196483602947092
  episode_reward_min: 10.386016247427902
  episodes_this_iter: 0
  episodes_total: 655
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.468
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9866384267807007
        entropy_coeff: 0.0
        kl: 0.0001584009878570214
        policy_loss: -0.00056317332200706
        total_loss: 0.004776307847350836
        vf_explained_var: 0.7894218564033508
        vf_loss: 0.005339473951607943
    load_time_ms: 2.152
    num_steps_sampled: 1314000
    num_steps_trained: 1314000
    sample_time_ms: 4263.128
    update_time_ms: 16.181
  iterations_since_restore: 657
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-42-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.205345784808536
  episode_reward_min: 10.386016247427902
  episodes_this_iter: 0
  episodes_total: 660
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.198
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.0663671493530273
        entropy_coeff: 0.0
        kl: 1.2332171536399983e-05
        policy_loss: -0.00043609191197901964
        total_loss: 0.0012464523315429688
        vf_explained_var: 0.9161909818649292
        vf_loss: 0.0016825358616188169
    load_time_ms: 2.341
    num_steps_sampled: 1328000
    num_steps_trained: 1328000
    sample_time_ms: 4259.591
    update_time_ms: 12.111
  iterations_since_restore: 664
  node_ip: 192.168.100.38
  num_healthy_workers: 

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 234
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 266
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 264
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 259
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 227
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-43-25
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-43-54
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.279591299676849
  episode_reward_min: 10.432605221020813
  episodes_this_iter: 0
  episodes_total: 675
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.068
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9188870787620544
        entropy_coeff: 0.0
        kl: 7.92648788774386e-05
        policy_loss: 9.483337635174394e-05
        total_loss: 0.0071290950290858746
        vf_explained_var: 0.7133393883705139
        vf_loss: 0.007034269627183676
    load_time_ms: 2.341
    num_steps_sampled: 1354000
    num_steps_trained: 1354000
    sample_time_ms: 4016.672
    update_time_ms: 10.898
  iterations_since_restore: 677
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 241
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 256
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 248
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 261
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 250
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-44-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-44-56
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.363480573591925
  episode_reward_min: 10.432605221020813
  episodes_this_iter: 0
  episodes_total: 690
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.003
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9084152579307556
        entropy_coeff: 0.0
        kl: 6.816696986788884e-05
        policy_loss: -0.000668325403239578
        total_loss: 0.0050969733856618404
        vf_explained_var: 0.6726292371749878
        vf_loss: 0.005765300709754229
    load_time_ms: 1.677
    num_steps_sampled: 1384000
    num_steps_trained: 1384000
    sample_time_ms: 3945.402
    update_time_ms: 11.394
  iterations_since_restore: 692
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-45-26
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.420245979382807
  episode_reward_min: 10.467659099019484
  episodes_this_iter: 0
  episodes_total: 695
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.908
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.0213563442230225
        entropy_coeff: 0.0
        kl: 8.209264342440292e-05
        policy_loss: -0.0013211669865995646
        total_loss: 0.000104282378742937
        vf_explained_var: 0.9214306473731995
        vf_loss: 0.0014254519483074546
    load_time_ms: 2.06
    num_steps_sampled: 1398000
    num_steps_trained: 1398000
    sample_time_ms: 4050.658
    update_time_ms: 9.685
  iterations_since_restore: 699
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 228
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 239
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 269
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 263
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 263
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-45-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-46-24
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.44908959314663
  episode_reward_min: 10.467659099019484
  episodes_this_iter: 0
  episodes_total: 710
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.464
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9809752702713013
        entropy_coeff: 0.0
        kl: 1.2155115655332338e-05
        policy_loss: 2.0783423678949475e-05
        total_loss: 0.0018291816813871264
        vf_explained_var: 0.9045804738998413
        vf_loss: 0.0018083920003846288
    load_time_ms: 1.823
    num_steps_sampled: 1426000
    num_steps_trained: 1426000
    sample_time_ms: 3780.459
    update_time_ms: 9.792
  iterations_since_restore: 713
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-46-50
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.469027609719987
  episode_reward_min: 10.484247135767813
  episodes_this_iter: 0
  episodes_total: 715
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 56.953
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9501033425331116
        entropy_coeff: 0.0
        kl: 2.737087015702855e-05
        policy_loss: 0.0001471242867410183
        total_loss: 0.002416536444798112
        vf_explained_var: 0.8759782314300537
        vf_loss: 0.0022694135550409555
    load_time_ms: 2.012
    num_steps_sampled: 1436000
    num_steps_trained: 1436000
    sample_time_ms: 4563.213
    update_time_ms: 13.114
  iterations_since_restore: 718
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  



Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-47-07
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.48127542256214
  episode_reward_min: 10.484247135767813
  episodes_this_iter: 5
  episodes_total: 720
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 67.153
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9796096086502075
        entropy_coeff: 0.0
        kl: 5.359616989153437e-05
        policy_loss: -0.00048567354679107666
        total_loss: 0.35053306818008423
        vf_explained_var: -0.0008293390274047852
        vf_loss: 0.3510187566280365
    load_time_ms: 2.651
    num_steps_sampled: 1440000
    num_steps_trained: 1440000
    sample_time_ms: 5418.356
    update_time_ms: 13.517
  iterations_since_restore: 720
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-47-43
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.494070526053143
  episode_reward_min: 10.484247135767813
  episodes_this_iter: 0
  episodes_total: 725
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 72.725
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.7110881209373474
        entropy_coeff: 0.0
        kl: 0.0002654731215443462
        policy_loss: -0.0017384715611115098
        total_loss: 0.02794583886861801
        vf_explained_var: 0.48598527908325195
        vf_loss: 0.029684288427233696
    load_time_ms: 3.109
    num_steps_sampled: 1452000
    num_steps_trained: 1452000
    sample_time_ms: 6294.796
    update_time_ms: 12.952
  iterations_since_restore: 726
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-48-26
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.523187763359974
  episode_reward_min: 10.484247135767813
  episodes_this_iter: 5
  episodes_total: 730
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 85.98
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.016942024230957
        entropy_coeff: 0.0
        kl: 5.08302446178277e-06
        policy_loss: 9.782886627363041e-05
        total_loss: 0.3474511206150055
        vf_explained_var: 0.0051610469818115234
        vf_loss: 0.3473533093929291
    load_time_ms: 3.837
    num_steps_sampled: 1460000
    num_steps_trained: 1460000
    sample_time_ms: 7739.181
    update_time_ms: 19.092
  iterations_since_restore: 730
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-48-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.524594901324498
  episode_reward_min: 10.796809491242296
  episodes_this_iter: 0
  episodes_total: 735
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 70.674
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.872529149055481
        entropy_coeff: 0.0
        kl: 8.63372115418315e-05
        policy_loss: -0.00037224864354357123
        total_loss: 0.003922241739928722
        vf_explained_var: 0.8007694482803345
        vf_loss: 0.004294487182050943
    load_time_ms: 3.131
    num_steps_sampled: 1474000
    num_steps_trained: 1474000
    sample_time_ms: 6366.047
    update_time_ms: 18.574
  iterations_since_restore: 737
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-49-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.300825743297942
  episode_reward_mean: 11.537115754633126
  episode_reward_min: 10.796809491242296
  episodes_this_iter: 0
  episodes_total: 740
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.265
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.9745312929153442
        entropy_coeff: 0.0
        kl: 0.0001464660163037479
        policy_loss: -0.0019881678745150566
        total_loss: -0.000716398237273097
        vf_explained_var: 0.9196460843086243
        vf_loss: 0.0012717553181573749
    load_time_ms: 2.048
    num_steps_sampled: 1488000
    num_steps_trained: 1488000
    sample_time_ms: 4270.433
    update_time_ms: 12.731
  iterations_since_restore: 744
  node_ip: 192.168.100.38
  num_healthy_workers: 5


[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 266
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 269
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 233
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 230
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 230
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-49-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.289611994824325
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-50-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.289611994824325
  episode_reward_mean: 11.534384824863153
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 755
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.114
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.8834840059280396
        entropy_coeff: 0.0
        kl: 0.00012229735148139298
        policy_loss: -0.00018183994689024985
        total_loss: 0.001808023895137012
        vf_explained_var: 0.8994019031524658
        vf_loss: 0.001989854034036398
    load_time_ms: 1.885
    num_steps_sampled: 1516000
    num_steps_trained: 1516000
    sample_time_ms: 3924.41
    update_time_ms: 11.187
  iterations_since_restore: 758
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 265
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 238
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 265
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 226
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 240
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-50-56
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-51-27
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  episode_reward_mean: 11.584623095682781
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 770
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 38.883
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.7865175008773804
        entropy_coeff: 0.0
        kl: 0.00015837588580325246
        policy_loss: -0.0005753154982812703
        total_loss: 0.003707763273268938
        vf_explained_var: 0.7724772691726685
        vf_loss: 0.0042830840684473515
    load_time_ms: 1.739
    num_steps_sampled: 1544000
    num_steps_trained: 1544000
    sample_time_ms: 4173.512
    update_time_ms: 9.447
  iterations_since_restore: 772
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 233
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 244
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 252
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 250
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 261
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-51-59
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-52-30
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  episode_reward_mean: 11.565138003673404
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 785
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.29
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.757593035697937
        entropy_coeff: 0.0
        kl: 1.4876872228342108e-05
        policy_loss: -6.848478369647637e-05
        total_loss: 0.004142005927860737
        vf_explained_var: 0.818937361240387
        vf_loss: 0.004210484679788351
    load_time_ms: 2.341
    num_steps_sampled: 1574000
    num_steps_trained: 1574000
    sample_time_ms: 4449.745
    update_time_ms: 11.509
  iterations_since_restore: 787
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-53-01
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  episode_reward_mean: 11.568058865578273
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 790
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 59.78
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.8347997069358826
        entropy_coeff: 0.0
        kl: 1.475527915317798e-05
        policy_loss: 2.9659271604032256e-05
        total_loss: 0.0014853152679279447
        vf_explained_var: 0.7587697505950928
        vf_loss: 0.0014556607929989696
    load_time_ms: 2.564
    num_steps_sampled: 1588000
    num_steps_trained: 1588000
    sample_time_ms: 4391.505
    update_time_ms: 12.703
  iterations_since_restore: 794
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 254
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 228
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 260
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 250
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 247
[2m[36m(pid=12228)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-53-29
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-54-02
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  episode_reward_mean: 11.61542246652297
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 805
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.025
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.787118673324585
        entropy_coeff: 0.0
        kl: 4.007193274446763e-05
        policy_loss: -0.0003584318037610501
        total_loss: 0.0011401643278077245
        vf_explained_var: 0.9106580018997192
        vf_loss: 0.001498601515777409
    load_time_ms: 1.858
    num_steps_sampled: 1616000
    num_steps_trained: 1616000
    sample_time_ms: 4103.235
    update_time_ms: 9.363
  iterations_since_restore: 808
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 222
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 239
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 234
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 248
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 237
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-54-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.331507497431044
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-55-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.69540982231876
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 820
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.089
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.7947023510932922
        entropy_coeff: 0.0
        kl: 0.00019503245130181313
        policy_loss: -0.00226816744543612
        total_loss: -0.0012293376494199038
        vf_explained_var: 0.90693598985672
        vf_loss: 0.0010388264199718833
    load_time_ms: 1.517
    num_steps_sampled: 1648000
    num_steps_trained: 1648000
    sample_time_ms: 3607.699
    update_time_ms: 7.656
  iterations_since_restore: 824
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-55-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.726963797680076
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 830
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.325
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.7660039067268372
        entropy_coeff: 0.0
        kl: 9.543684427626431e-05
        policy_loss: -0.0009494552505202591
        total_loss: 0.000337472913088277
        vf_explained_var: 0.808148205280304
        vf_loss: 0.0012869187630712986
    load_time_ms: 1.645
    num_steps_sampled: 1668000
    num_steps_trained: 1668000
    sample_time_ms: 2385.684
    update_time_ms: 5.065
  iterations_since_restore: 834
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-55-49
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.78564986905858
  episode_reward_min: 10.821563491381733
  episodes_this_iter: 0
  episodes_total: 840
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.762
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.6589390635490417
        entropy_coeff: 0.0
        kl: 3.1277537004825717e-07
        policy_loss: -1.482677453168435e-05
        total_loss: 0.0035374618601053953
        vf_explained_var: 0.8388810157775879
        vf_loss: 0.0035522817634046078
    load_time_ms: 1.703
    num_steps_sampled: 1684000
    num_steps_trained: 1684000
    sample_time_ms: 2554.57
    update_time_ms: 4.019
  iterations_since_restore: 842
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-56-17
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.862397866740114
  episode_reward_min: 11.12026813692992
  episodes_this_iter: 0
  episodes_total: 850
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.755
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.7122856974601746
        entropy_coeff: 0.0
        kl: 0.0002790876606013626
        policy_loss: -0.0007005395600572228
        total_loss: 0.0010314054088667035
        vf_explained_var: 0.8048441410064697
        vf_loss: 0.0017319354228675365
    load_time_ms: 1.153
    num_steps_sampled: 1706000
    num_steps_trained: 1706000
    sample_time_ms: 2500.567
    update_time_ms: 4.685
  iterations_since_restore: 853
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  o

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-56-43
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.912306131211606
  episode_reward_min: 11.12026813692992
  episodes_this_iter: 0
  episodes_total: 860
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.22
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.6738499999046326
        entropy_coeff: 0.0
        kl: 1.3426422810880467e-05
        policy_loss: -5.7386398111702874e-05
        total_loss: 0.0015888996422290802
        vf_explained_var: 0.7604209780693054
        vf_loss: 0.0016462845960631967
    load_time_ms: 1.077
    num_steps_sampled: 1726000
    num_steps_trained: 1726000
    sample_time_ms: 2523.761
    update_time_ms: 4.889
  iterations_since_restore: 863
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-57-08
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 11.946877132990334
  episode_reward_min: 11.12026813692992
  episodes_this_iter: 0
  episodes_total: 870
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.661
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.6902309060096741
        entropy_coeff: 0.0
        kl: 5.940616119914921e-06
        policy_loss: -0.00014409732830245048
        total_loss: 0.0011915421346202493
        vf_explained_var: 0.81326824426651
        vf_loss: 0.0013356332201510668
    load_time_ms: 1.101
    num_steps_sampled: 1746000
    num_steps_trained: 1746000
    sample_time_ms: 2492.549
    update_time_ms: 5.759
  iterations_since_restore: 873
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 230
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 234
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 267
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 259
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 247
[2m[36m(pid=12228)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-57-33
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  ep

[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 250
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 264
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 257
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 233
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 240
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-57-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  ep

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 228
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 248
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 264
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 244
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 240
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-58-23
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  ep

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 248
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 233
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 266
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 256
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 247
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-58-49
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  ep

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-59-15
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.48621766096005
  episode_reward_mean: 12.007296053012078
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 915
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.231
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.5851531028747559
        entropy_coeff: 0.0
        kl: 2.3793696527718566e-05
        policy_loss: -3.0522824090439826e-05
        total_loss: 0.0014120254199951887
        vf_explained_var: 0.7712337970733643
        vf_loss: 0.0014425499830394983
    load_time_ms: 1.303
    num_steps_sampled: 1838000
    num_steps_trained: 1838000
    sample_time_ms: 2859.264
    update_time_ms: 6.376
  iterations_since_restore: 919
  node_ip: 192.168.100.38
  num_healthy_workers: 5


Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_18-59-42
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.0474317994106
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 925
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.14
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.6200413703918457
        entropy_coeff: 0.0
        kl: 0.0003924739430658519
        policy_loss: -0.0031070217955857515
        total_loss: -0.0023075228091329336
        vf_explained_var: 0.8977749943733215
        vf_loss: 0.0007995054475031793
    load_time_ms: 1.155
    num_steps_sampled: 1858000
    num_steps_trained: 1858000
    sample_time_ms: 2658.636
    update_time_ms: 4.945
  iterations_since_restore: 929
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-00-09
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.052701077741016
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 935
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.034
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.18900282680988312
        entropy_coeff: 0.0
        kl: 0.011531932279467583
        policy_loss: -0.004829156678169966
        total_loss: 0.038986608386039734
        vf_explained_var: 0.5993626117706299
        vf_loss: 0.04381575807929039
    load_time_ms: 1.837
    num_steps_sampled: 1872000
    num_steps_trained: 1872000
    sample_time_ms: 3331.021
    update_time_ms: 6.475
  iterations_since_restore: 936
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-00-33
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.062407249009858
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 940
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 57.154
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.15907132625579834
        entropy_coeff: 0.0
        kl: 9.229341230820864e-05
        policy_loss: -0.00021300697699189186
        total_loss: 0.04463575407862663
        vf_explained_var: 0.556377649307251
        vf_loss: 0.04484875500202179
    load_time_ms: 2.172
    num_steps_sampled: 1882000
    num_steps_trained: 1882000
    sample_time_ms: 4419.763
    update_time_ms: 9.982
  iterations_since_restore: 941
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 226
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 229
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 245
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 238
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 243
[2m[36m(pid=12224)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-00-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  ep

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-01-22
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.0763061910769
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 955
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 36.828
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.5260812640190125
        entropy_coeff: 0.0
        kl: 0.0002730650012381375
        policy_loss: -0.0012349624885246158
        total_loss: -0.0002650032110977918
        vf_explained_var: 0.8390589356422424
        vf_loss: 0.000969946850091219
    load_time_ms: 1.567
    num_steps_sampled: 1918000
    num_steps_trained: 1918000
    sample_time_ms: 2711.636
    update_time_ms: 7.397
  iterations_since_restore: 959
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-01-46
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.099914384335657
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 965
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.998
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.46354952454566956
        entropy_coeff: 0.0
        kl: 1.4927982761037129e-07
        policy_loss: -3.440380169195123e-06
        total_loss: 0.0012650227872654796
        vf_explained_var: 0.7885802388191223
        vf_loss: 0.0012684569228440523
    load_time_ms: 1.176
    num_steps_sampled: 1936000
    num_steps_trained: 1936000
    sample_time_ms: 2590.0
    update_time_ms: 4.512
  iterations_since_restore: 968
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-02-09
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.119739666852514
  episode_reward_min: 11.582827318007082
  episodes_this_iter: 0
  episodes_total: 975
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.316
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.4602167308330536
        entropy_coeff: 0.0
        kl: 0.0003184604283887893
        policy_loss: 0.0008915166836231947
        total_loss: 0.001883010845631361
        vf_explained_var: 0.8984900116920471
        vf_loss: 0.000991489039734006
    load_time_ms: 1.058
    num_steps_sampled: 1956000
    num_steps_trained: 1956000
    sample_time_ms: 2345.594
    update_time_ms: 4.461
  iterations_since_restore: 978
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 262
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 265
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 256
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 255
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 262
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-02-32
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  ep

[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 254
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 227
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 228
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 233
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 247
[2m[36m(pid=12228)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-02-57
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  ep

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-03-21
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.193568164173232
  episode_reward_min: 11.802681132572525
  episodes_this_iter: 0
  episodes_total: 1000
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.849
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.4198755919933319
        entropy_coeff: 0.0
        kl: 0.00022808140784036368
        policy_loss: -0.0010047964751720428
        total_loss: -2.371311211391003e-06
        vf_explained_var: 0.8558553457260132
        vf_loss: 0.0010024361545220017
    load_time_ms: 1.185
    num_steps_sampled: 2008000
    num_steps_trained: 2008000
    sample_time_ms: 2680.745
    update_time_ms: 6.104
  iterations_since_restore: 1004
  node_ip: 192.168.100.38
  num_healthy_workers: 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-03-45
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.70688362756533
  episode_reward_mean: 12.231338292683272
  episode_reward_min: 11.802681132572525
  episodes_this_iter: 0
  episodes_total: 1010
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.268
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.4022929072380066
        entropy_coeff: 0.0
        kl: 2.832484278769698e-05
        policy_loss: -2.5658964659669437e-05
        total_loss: 0.0010808194056153297
        vf_explained_var: 0.7182184457778931
        vf_loss: 0.0011064793216064572
    load_time_ms: 1.057
    num_steps_sampled: 2028000
    num_steps_trained: 2028000
    sample_time_ms: 2340.575
    update_time_ms: 3.962
  iterations_since_restore: 1014
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-04-09
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.656212212773392
  episode_reward_mean: 12.235620389423946
  episode_reward_min: 11.802681132572525
  episodes_this_iter: 0
  episodes_total: 1020
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.002
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.3625120222568512
        entropy_coeff: 0.0
        kl: 0.0005893782945349813
        policy_loss: -0.0003905754128936678
        total_loss: 0.000905220047570765
        vf_explained_var: 0.7204900979995728
        vf_loss: 0.001295790309086442
    load_time_ms: 1.049
    num_steps_sampled: 2048000
    num_steps_trained: 2048000
    sample_time_ms: 2313.514
    update_time_ms: 4.192
  iterations_since_restore: 1024
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-04-33
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.300161875597466
  episode_reward_min: 11.802681132572525
  episodes_this_iter: 0
  episodes_total: 1030
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.765
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.3863304555416107
        entropy_coeff: 0.0
        kl: 0.0006703626131638885
        policy_loss: -0.0015458643902093172
        total_loss: -0.0007134447223506868
        vf_explained_var: 0.8850376605987549
        vf_loss: 0.0008324201917275786
    load_time_ms: 1.097
    num_steps_sampled: 2068000
    num_steps_trained: 2068000
    sample_time_ms: 2356.756
    update_time_ms: 4.505
  iterations_since_restore: 1034
  node_ip: 192.168.100.38
  num_healthy_workers: 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-04-56
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.322244785994293
  episode_reward_min: 11.948573781850294
  episodes_this_iter: 0
  episodes_total: 1040
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.104
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.36212098598480225
        entropy_coeff: 0.0
        kl: 0.00044381406041793525
        policy_loss: -0.0016202469123527408
        total_loss: -0.0008667869842611253
        vf_explained_var: 0.8004154562950134
        vf_loss: 0.0007534671458415687
    load_time_ms: 1.047
    num_steps_sampled: 2088000
    num_steps_trained: 2088000
    sample_time_ms: 2307.036
    update_time_ms: 4.049
  iterations_since_restore: 1044
  node_ip: 192.168.100.38
  num_healthy_workers

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-05-22
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.311679942405187
  episode_reward_min: 11.948573781850294
  episodes_this_iter: 0
  episodes_total: 1050
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 29.258
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.35266777873039246
        entropy_coeff: 0.0
        kl: 0.0035834484733641148
        policy_loss: -0.006728469859808683
        total_loss: -0.006096876226365566
        vf_explained_var: 0.8111460208892822
        vf_loss: 0.0006316041690297425
    load_time_ms: 1.186
    num_steps_sampled: 2108000
    num_steps_trained: 2108000
    sample_time_ms: 2469.44
    update_time_ms: 6.047
  iterations_since_restore: 1054
  node_ip: 192.168.100.38
  num_healthy_workers: 5


Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-05-47
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.30872489442151
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1060
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 34.625
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.2555045485496521
        entropy_coeff: 0.0
        kl: 6.071391544537619e-05
        policy_loss: -0.00016200923710130155
        total_loss: 0.00076046132016927
        vf_explained_var: 0.7118685245513916
        vf_loss: 0.0009224663954228163
    load_time_ms: 1.322
    num_steps_sampled: 2128000
    num_steps_trained: 2128000
    sample_time_ms: 2489.907
    update_time_ms: 6.286
  iterations_since_restore: 1064
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-06-12
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.339275137420946
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1070
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.017
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.24361567199230194
        entropy_coeff: 0.0
        kl: 5.101412625663215e-06
        policy_loss: -1.814937604649458e-05
        total_loss: 0.0008973274379968643
        vf_explained_var: 0.667052149772644
        vf_loss: 0.0009154768195003271
    load_time_ms: 1.3
    num_steps_sampled: 2148000
    num_steps_trained: 2148000
    sample_time_ms: 2431.663
    update_time_ms: 6.808
  iterations_since_restore: 1074
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-06-38
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.358827404663343
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1080
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.154
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.27564239501953125
        entropy_coeff: 0.0
        kl: 0.00015632495342288166
        policy_loss: -0.0009680175571702421
        total_loss: -0.0002771291765384376
        vf_explained_var: 0.816662073135376
        vf_loss: 0.0006908811046741903
    load_time_ms: 1.06
    num_steps_sampled: 2168000
    num_steps_trained: 2168000
    sample_time_ms: 2560.997
    update_time_ms: 6.078
  iterations_since_restore: 1084
  node_ip: 192.168.100.38
  num_healthy_workers: 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-07-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.367371466233077
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1090
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.919
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.23978041112422943
        entropy_coeff: 0.0
        kl: 0.0006606184178963304
        policy_loss: -0.000588800641708076
        total_loss: 0.0002138693380402401
        vf_explained_var: 0.7841747999191284
        vf_loss: 0.0008026500581763685
    load_time_ms: 1.42
    num_steps_sampled: 2188000
    num_steps_trained: 2188000
    sample_time_ms: 2482.7
    update_time_ms: 4.232
  iterations_since_restore: 1094
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-07-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.378002865848321
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1100
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.72
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.22103171050548553
        entropy_coeff: 0.0
        kl: 0.00021198215836193413
        policy_loss: -0.00025480843032710254
        total_loss: 0.0007609081221744418
        vf_explained_var: 0.4440661072731018
        vf_loss: 0.0010157182114198804
    load_time_ms: 1.126
    num_steps_sampled: 2208000
    num_steps_trained: 2208000
    sample_time_ms: 2404.2
    update_time_ms: 4.224
  iterations_since_restore: 1104
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-07-52
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.412251314175574
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1110
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.382
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.20195020735263824
        entropy_coeff: 0.0
        kl: 5.045124999014661e-05
        policy_loss: -0.00033683489891700447
        total_loss: 0.000604450237005949
        vf_explained_var: 0.6362971663475037
        vf_loss: 0.0009412856888957322
    load_time_ms: 1.134
    num_steps_sampled: 2228000
    num_steps_trained: 2228000
    sample_time_ms: 2330.018
    update_time_ms: 4.305
  iterations_since_restore: 1114
  node_ip: 192.168.100.38
  num_healthy_workers: 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-08-16
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 13.146919375391235
  episode_reward_mean: 12.437152442883633
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1120
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.774
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.20160479843616486
        entropy_coeff: 0.0
        kl: 0.0004649074107874185
        policy_loss: 0.0005066166049800813
        total_loss: 0.001359191839583218
        vf_explained_var: 0.8685117363929749
        vf_loss: 0.0008525826851837337
    load_time_ms: 1.078
    num_steps_sampled: 2248000
    num_steps_trained: 2248000
    sample_time_ms: 2348.355
    update_time_ms: 4.071
  iterations_since_restore: 1124
  node_ip: 192.168.100.38
  num_healthy_workers: 5


Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-08-39
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.391875606104616
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1130
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.129
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.22579941153526306
        entropy_coeff: 0.0
        kl: 0.0007365649216808379
        policy_loss: -0.0017729087267071009
        total_loss: -0.0009373717475682497
        vf_explained_var: 0.6891589164733887
        vf_loss: 0.0008355215541087091
    load_time_ms: 1.129
    num_steps_sampled: 2268000
    num_steps_trained: 2268000
    sample_time_ms: 2338.826
    update_time_ms: 4.194
  iterations_since_restore: 1134
  node_ip: 192.168.100.38
  num_healthy_workers: 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-09-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.405457828245723
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1140
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.86
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.2420586347579956
        entropy_coeff: 0.0
        kl: 0.0005943040596321225
        policy_loss: -0.0014786854153499007
        total_loss: -0.0006948185036890209
        vf_explained_var: 0.7409272789955139
        vf_loss: 0.0007838792516849935
    load_time_ms: 1.091
    num_steps_sampled: 2288000
    num_steps_trained: 2288000
    sample_time_ms: 2340.234
    update_time_ms: 3.978
  iterations_since_restore: 1144
  node_ip: 192.168.100.38
  num_healthy_workers: 5


Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-09-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.413482962599915
  episode_reward_min: 11.937700817122522
  episodes_this_iter: 0
  episodes_total: 1150
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.184
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.20962820947170258
        entropy_coeff: 0.0
        kl: 0.00019491421699058264
        policy_loss: -0.0004481411015149206
        total_loss: 0.000337779987603426
        vf_explained_var: 0.7414171695709229
        vf_loss: 0.0007859198958612978
    load_time_ms: 1.079
    num_steps_sampled: 2308000
    num_steps_trained: 2308000
    sample_time_ms: 2397.498
    update_time_ms: 3.965
  iterations_since_restore: 1154
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-09-52
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.429356469571548
  episode_reward_min: 12.016594478977373
  episodes_this_iter: 0
  episodes_total: 1160
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.207
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.17554892599582672
        entropy_coeff: 0.0
        kl: 3.262481186538935e-05
        policy_loss: -9.542464977130294e-05
        total_loss: 0.0008706283406354487
        vf_explained_var: 0.5956748127937317
        vf_loss: 0.0009660400100983679
    load_time_ms: 1.068
    num_steps_sampled: 2328000
    num_steps_trained: 2328000
    sample_time_ms: 2356.604
    update_time_ms: 4.366
  iterations_since_restore: 1164
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-10-15
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.432058471253347
  episode_reward_min: 12.016594478977373
  episodes_this_iter: 0
  episodes_total: 1170
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.788
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.21579395234584808
        entropy_coeff: 0.0
        kl: 0.00027980023878626525
        policy_loss: -0.0008929882314987481
        total_loss: -0.00014412069867830724
        vf_explained_var: 0.7007656097412109
        vf_loss: 0.0007488891133107245
    load_time_ms: 1.104
    num_steps_sampled: 2348000
    num_steps_trained: 2348000
    sample_time_ms: 2329.579
    update_time_ms: 4.068
  iterations_since_restore: 1174
  node_ip: 192.168.100.38
  num_healthy_workers

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-10-39
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.42777051656411
  episode_reward_min: 12.016594478977373
  episodes_this_iter: 0
  episodes_total: 1180
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.266
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.19950540363788605
        entropy_coeff: 0.0
        kl: 0.00038791823317296803
        policy_loss: -0.000666761421598494
        total_loss: 0.00016998767387121916
        vf_explained_var: 0.7294909358024597
        vf_loss: 0.0008367541013285518
    load_time_ms: 1.105
    num_steps_sampled: 2368000
    num_steps_trained: 2368000
    sample_time_ms: 2314.928
    update_time_ms: 4.456
  iterations_since_restore: 1184
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-11-03
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.46171841144857
  episode_reward_min: 12.124696183017663
  episodes_this_iter: 0
  episodes_total: 1190
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.348
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.21038872003555298
        entropy_coeff: 0.0
        kl: 8.802687807474285e-05
        policy_loss: -7.142019603634253e-05
        total_loss: 0.000635449425317347
        vf_explained_var: 0.7062204480171204
        vf_loss: 0.0007068866398185492
    load_time_ms: 1.099
    num_steps_sampled: 2388000
    num_steps_trained: 2388000
    sample_time_ms: 2342.445
    update_time_ms: 4.059
  iterations_since_restore: 1194
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-11-27
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.453035737907305
  episode_reward_min: 12.12794616039066
  episodes_this_iter: 0
  episodes_total: 1200
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.832
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.1769646555185318
        entropy_coeff: 0.0
        kl: 0.0006397590623237193
        policy_loss: -0.0013655085349455476
        total_loss: -0.000589628703892231
        vf_explained_var: 0.7647919058799744
        vf_loss: 0.0007758848951198161
    load_time_ms: 1.098
    num_steps_sampled: 2408000
    num_steps_trained: 2408000
    sample_time_ms: 2340.411
    update_time_ms: 4.108
  iterations_since_restore: 1204
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-11-50
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.83804018639208
  episode_reward_mean: 12.42549517734954
  episode_reward_min: 12.12794616039066
  episodes_this_iter: 0
  episodes_total: 1210
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.832
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.16028404235839844
        entropy_coeff: 0.0
        kl: 0.0012250315630808473
        policy_loss: -0.002798727946355939
        total_loss: -0.0019248452736064792
        vf_explained_var: 0.6782882809638977
        vf_loss: 0.0008738908800296485
    load_time_ms: 1.169
    num_steps_sampled: 2428000
    num_steps_trained: 2428000
    sample_time_ms: 2336.918
    update_time_ms: 3.899
  iterations_since_restore: 1214
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-12-14
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.729962182386172
  episode_reward_mean: 12.417671549111404
  episode_reward_min: 12.12794616039066
  episodes_this_iter: 0
  episodes_total: 1220
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.131
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.1761072427034378
        entropy_coeff: 0.0
        kl: 4.842647831537761e-05
        policy_loss: -9.986400255002081e-05
        total_loss: 0.0006285276613198221
        vf_explained_var: 0.8477863073348999
        vf_loss: 0.0007283860468305647
    load_time_ms: 1.134
    num_steps_sampled: 2448000
    num_steps_trained: 2448000
    sample_time_ms: 2337.838
    update_time_ms: 4.157
  iterations_since_restore: 1224
  node_ip: 192.168.100.38
  num_healthy_workers: 5


Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-12-40
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.729962182386172
  episode_reward_mean: 12.40630834790464
  episode_reward_min: 12.12794616039066
  episodes_this_iter: 0
  episodes_total: 1230
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 33.268
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.1259351372718811
        entropy_coeff: 0.0
        kl: 0.001748613896779716
        policy_loss: -0.003213921096175909
        total_loss: -0.0023208041675388813
        vf_explained_var: 0.7858640551567078
        vf_loss: 0.0008931102929636836
    load_time_ms: 1.337
    num_steps_sampled: 2466000
    num_steps_trained: 2466000
    sample_time_ms: 2758.828
    update_time_ms: 7.388
  iterations_since_restore: 1233
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-13-04
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.748563142154454
  episode_reward_mean: 12.421977135778796
  episode_reward_min: 12.12794616039066
  episodes_this_iter: 0
  episodes_total: 1240
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.401
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.05739126726984978
        entropy_coeff: 0.0
        kl: 0.0001321038289461285
        policy_loss: -0.0009486046037636697
        total_loss: 0.0003256797790527344
        vf_explained_var: 0.6746050119400024
        vf_loss: 0.0012742914259433746
    load_time_ms: 1.102
    num_steps_sampled: 2486000
    num_steps_trained: 2486000
    sample_time_ms: 2352.684
    update_time_ms: 4.046
  iterations_since_restore: 1243
  node_ip: 192.168.100.38
  num_healthy_workers: 5

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-13-28
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.976854022894397
  episode_reward_mean: 12.477831567181711
  episode_reward_min: 12.176705114728456
  episodes_this_iter: 0
  episodes_total: 1250
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.168
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.0888102576136589
        entropy_coeff: 0.0
        kl: 0.0003189712588209659
        policy_loss: -0.0003102002083323896
        total_loss: 0.0005822009989060462
        vf_explained_var: 0.8029764294624329
        vf_loss: 0.00089240912348032
    load_time_ms: 1.083
    num_steps_sampled: 2506000
    num_steps_trained: 2506000
    sample_time_ms: 2355.403
    update_time_ms: 4.266
  iterations_since_restore: 1253
  node_ip: 192.168.100.38
  num_healthy_workers: 5
 

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-13-53
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.976854022894397
  episode_reward_mean: 12.476730748860835
  episode_reward_min: 12.176705114728456
  episodes_this_iter: 0
  episodes_total: 1260
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.488
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: -0.008798339404165745
        entropy_coeff: 0.0
        kl: 0.0010435049189254642
        policy_loss: -0.0002738580806180835
        total_loss: 0.0018786363070830703
        vf_explained_var: 0.9410486221313477
        vf_loss: 0.002152503002434969
    load_time_ms: 1.237
    num_steps_sampled: 2524000
    num_steps_trained: 2524000
    sample_time_ms: 2632.794
    update_time_ms: 5.052
  iterations_since_restore: 1262
  node_ip: 192.168.100.38
  num_healthy_workers:

[2m[36m(pid=12227)[0m 
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12227)[0m ring length: 253
[2m[36m(pid=12227)[0m -----------------------
[2m[36m(pid=12224)[0m 
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12224)[0m ring length: 246
[2m[36m(pid=12224)[0m -----------------------
[2m[36m(pid=12228)[0m 
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12228)[0m ring length: 251
[2m[36m(pid=12228)[0m -----------------------
[2m[36m(pid=12226)[0m 
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12226)[0m ring length: 245
[2m[36m(pid=12226)[0m -----------------------
[2m[36m(pid=12223)[0m 
[2m[36m(pid=12223)[0m -----------------------
[2m[36m(pid=12223)[0m ring length: 263
[2m[36m(pid=12223)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-14-17
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.976854022894397
  e

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-28_19-14-44
  done: false
  episode_len_mean: 2000.0
  episode_reward_max: 12.976854022894397
  episode_reward_mean: 12.459019901169814
  episode_reward_min: 12.176705114728456
  episodes_this_iter: 0
  episodes_total: 1275
  experiment_id: 67a41ba09bf0458c86f090b701c4ea07
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 35.426
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 0.06845808774232864
        entropy_coeff: 0.0
        kl: 0.001724923960864544
        policy_loss: -0.0015596385346725583
        total_loss: -0.0006107635563239455
        vf_explained_var: 0.801252007484436
        vf_loss: 0.0009488716023042798
    load_time_ms: 1.568
    num_steps_sampled: 2556000
    num_steps_trained: 2556000
    sample_time_ms: 3422.355
    update_time_ms: 7.918
  iterations_since_restore: 1278
  node_ip: 192.168.100.38
  num_healthy_workers: 5

### 4.5 Visualizing the results

The simulation results are saved in the `ray_results/training_example` directory (`training_example` is the experiment tag we defined at the start of this tutorial). By default, the `ray_results` folder is located in your home directory, at `~/ray_results`.

To visualize the metrics logged during training, run `tensorboard --logdir=~/ray_results/training_example` (TensorBoard can be installed with `pip install tensorboard`).

For more instructions about visualizing, please see `tutorial05_visualize.ipynb`. 
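If you prefer to inspect the learning curve without TensorBoard, the logged metrics can also be read directly from disk. The following is a minimal sketch, assuming each trial folder under `~/ray_results/training_example` contains a `progress.csv` file with `training_iteration` and `episode_reward_mean` columns (the file Tune normally writes alongside its other logs). The helper name `read_reward_curve` is hypothetical and not part of Flow or RLlib.

```python
import csv


def read_reward_curve(progress_csv):
    """Return (training_iteration, episode_reward_mean) pairs from a progress.csv.

    Assumption: the CSV has `training_iteration` and `episode_reward_mean`
    columns. Rows where either value is missing or not a number are skipped.
    """
    curve = []
    with open(progress_csv, newline="") as f:
        for row in csv.DictReader(f):
            try:
                iteration = int(row["training_iteration"])
                reward = float(row["episode_reward_mean"])
            except (KeyError, TypeError, ValueError):
                continue
            # NaN is the only float that is not equal to itself; early
            # iterations may report NaN before any episode has finished.
            if reward == reward:
                curve.append((iteration, reward))
    return curve
```

Calling `read_reward_curve` on the `progress.csv` of a trial folder returns a list that can be printed or passed to any plotting library to check whether `episode_reward_mean` is still improving.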

### 4.6 Restart from a checkpoint / Transfer learning

If you wish to do transfer learning, or to resume a previous training run, you need to start the simulation from a previously saved checkpoint. To do so, add a `"restore"` entry to the experiment dictionary passed to `run_experiments`, as follows:

```python
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "restore": "/ray_results/experiment/dir/checkpoint_50/checkpoint-50",
        "checkpoint_freq": 1,
        "checkpoint_at_end": True,
        "max_failures": 999,
        "stop": {
            "training_iteration": 1,
        },
    },
})
```

The `"restore"` path should be such that the `[restore]/.tune_metadata` file exists.
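Typing the checkpoint path by hand is error-prone, so it can help to look it up programmatically. The sketch below assumes the layout seen in the paths of this section, namely a trial folder containing `checkpoint_<N>/checkpoint-<N>` entries, and returns the one with the largest `N`. The helper name `find_latest_checkpoint` is hypothetical and not part of Flow or RLlib.

```python
import re
from pathlib import Path


def find_latest_checkpoint(trial_dir):
    """Return the path of the highest-numbered checkpoint in trial_dir, or None.

    Assumption: checkpoints are stored as <trial_dir>/checkpoint_<N>/checkpoint-<N>.
    """
    best_iter, best_path = -1, None
    for ckpt_dir in Path(trial_dir).glob("checkpoint_*"):
        match = re.fullmatch(r"checkpoint_(\d+)", ckpt_dir.name)
        if match is None or not ckpt_dir.is_dir():
            continue
        # Compare iteration numbers as integers, since a plain string
        # comparison would rank "checkpoint_50" above "checkpoint_400".
        iteration = int(match.group(1))
        candidate = ckpt_dir / f"checkpoint-{iteration}"
        if candidate.exists() and iteration > best_iter:
            best_iter, best_path = iteration, candidate
    return None if best_path is None else str(best_path)
```

The returned string can be used as the value of `"restore"` in the experiment dictionary; a return value of `None` means no checkpoint matching the assumed layout was found.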

There is also a `"resume"` parameter, which you can set to `True` if you simply want to continue training the same experiment from its most recently saved checkpoint.

In [None]:
# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "/ray_results/training_example13/PPO_EnergyOptPOEnv-v0_0_2020-07-23_13-30-07yze28sum/checkpoint_400/checkpoint-400", 
#         "checkpoint_freq": 20,
#         "checkpoint_at_end": True,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 700,
#         },
#     },
# })

In [None]:
from flow.core.vehicles import Vehicles