# Tutorial 03: Running RLlib Experiments

This tutorial walks you through the process of running traffic simulations in Flow with trainable RLlib-powered agents. Autonomous agents will learn to maximize a given reward over the rollouts, using the [**RLlib**](https://ray.readthedocs.io/en/latest/rllib.html) library ([citation](https://arxiv.org/abs/1712.09381)) ([installation instructions](https://flow.readthedocs.io/en/latest/flow_setup.html#optional-install-ray-rllib)). Simulations of this form illustrate how RL agents can influence the traffic of a human-driven fleet in order to make the whole fleet more efficient (for some given metrics).

In this tutorial, we simulate an initially perturbed single-lane ring road, into which we introduce a single autonomous vehicle. After some training, the autonomous vehicle learns to dissipate the "phantom jams" that form and propagate when only human driver dynamics are involved.

## 1. Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *network* and an *environment*. Networks describe the features of the transportation network used in simulation. This includes the positions and properties of nodes and edges constituting the lanes and junctions, as well as properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the network. Moreover, custom environments may be used to modify the dynamical features of a network. Finally, in the RL case, it is in the *environment* that the state/action spaces and the reward function are defined.
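
To make the environment's role concrete, here is a minimal, framework-free sketch of the reset/step interface described above. `ToyRingEnv`, its one-number state, and its reward are hypothetical and are not Flow classes; they only illustrate where the state, the action bounds, and the reward live.

```python
class ToyRingEnv:
    """Hypothetical stand-in for a Flow environment (not part of Flow)."""

    def __init__(self, horizon=5, max_accel=1.0, sim_step=0.1):
        self.horizon = horizon      # number of steps in one rollout
        self.max_accel = max_accel  # bound on the action (acceleration)
        self.sim_step = sim_step    # seconds advanced per step
        self.speed = 0.0
        self.t = 0

    def reset(self):
        """Start a new rollout and return the initial observation."""
        self.speed = 0.0
        self.t = 0
        return [self.speed]

    def step(self, action):
        """Advance one step; return (observation, reward, done, info)."""
        # clip the agent's action to the allowed acceleration range
        accel = max(-self.max_accel, min(self.max_accel, action))
        self.speed = max(0.0, self.speed + accel * self.sim_step)
        self.t += 1
        reward = self.speed  # toy reward: faster is better
        done = self.t >= self.horizon
        return [self.speed], reward, done, {}


env = ToyRingEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, _ = env.step(2.0)  # request more than max_accel
    total_reward += reward
```

The loop at the end plays the part of the RL library: it only ever talks to the environment through `reset` and `step`.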

## 2. Setting up a Network
Flow contains a plethora of pre-designed networks used to replicate highways, intersections, and merges in both closed and open settings. All these networks are located in flow/networks. For this tutorial, which involves a single lane ring road, we will use the network `RingNetwork`.

### 2.1 Setting up Network Parameters

The network mentioned at the start of this section, as well as all other networks in Flow, are parameterized by the following arguments: 
* name
* vehicles
* net_params
* initial_config

These parameters are explained in detail in `tutorial01_sumo.ipynb`. Moreover, all parameters excluding vehicles (covered in section 2.2) do not change from the previous tutorial. Accordingly, we specify them nearly as we have before, and leave further explanations of the parameters to `tutorial01_sumo.ipynb`.

We begin by choosing the network the experiment will be trained on. We use one of Flow's builtin networks, located in `flow.networks`. A list of all available networks can be found by running the script below.

In [1]:
import flow.networks as networks

print(networks.__all__)

In this tutorial, we choose to use the ring road network. The network class is then:

In [2]:
from flow.networks import RingNetwork

# ring road network class
network_name = RingNetwork

One key difference between SUMO and RLlib experiments is that, in RLlib experiments, the network classes do not need to be defined; instead users should simply name the network class they wish to use. Later on, an environment setup module will import the correct network class based on the provided names.

In [3]:
# input parameter classes to the network class
from flow.core.params import NetParams, InitialConfig

# name of the network
name = "c_mpg+plus"

# network-specific parameters
from flow.networks.ring import ADDITIONAL_NET_PARAMS
net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)

# initial configuration to vehicles
initial_config = InitialConfig(spacing="uniform", perturbation=1)
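
As a rough illustration of what `spacing="uniform"` with a nonzero `perturbation` means, the sketch below spreads vehicles evenly around a ring and then jitters each position. It assumes the perturbation acts as the standard deviation of Gaussian noise on each position; the helper `uniform_positions` and the ring length of 230 m are illustrative and not Flow's implementation.

```python
import random


def uniform_positions(num_vehicles, ring_length, perturbation=0.0, seed=None):
    """Evenly spaced start positions on a ring, each jittered by Gaussian noise."""
    rng = random.Random(seed)
    gap = ring_length / num_vehicles
    positions = []
    for i in range(num_vehicles):
        noise = rng.gauss(0.0, perturbation) if perturbation > 0 else 0.0
        # wrap around so every position stays on the ring
        positions.append((i * gap + noise) % ring_length)
    return positions


exact = uniform_positions(num_vehicles=4, ring_length=230.0)
jittered = uniform_positions(num_vehicles=4, ring_length=230.0,
                             perturbation=1.0, seed=0)
```

Without perturbation the vehicles sit exactly one gap apart; with it, the small offsets give the initial disturbance that the tutorial relies on.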

### 2.2 Adding Trainable Autonomous Vehicles
The `Vehicles` class stores state information on all vehicles in the network. This class is used to identify the dynamical features of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various `get` methods within this class.

The dynamics of vehicles in the `Vehicles` class can either be defined by SUMO or by the dynamical methods located in flow/controllers. For human-driven vehicles, we use the IDM model for acceleration behavior, with exogenous Gaussian acceleration noise with a standard deviation of 0.2 m/s² to induce perturbations that produce stop-and-go behavior. In addition, we use the `ContinuousRouter` routing controller so that the vehicles may maintain their routes in closed networks.
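
For readers unfamiliar with the IDM, the following sketch implements the standard Intelligent Driver Model acceleration with additive Gaussian noise. The parameter values (`v0`, `T`, `a`, `b`, `delta`, `s0`) are illustrative assumptions rather than values taken from this tutorial, and `idm_accel` is a hypothetical helper, not Flow's `IDMController`.

```python
import math
import random


def idm_accel(v, lead_v, headway, v0=30.0, T=1.0, a=1.0, b=1.5,
              delta=4, s0=2.0, noise_std=0.0, rng=random):
    """Intelligent Driver Model acceleration for one following vehicle.

    v: own speed (m/s); lead_v: leader speed (m/s); headway: gap (m).
    """
    # desired gap grows with speed and with the closing rate on the leader
    s_star = s0 + max(0.0, v * T + v * (v - lead_v) / (2 * math.sqrt(a * b)))
    accel = a * (1 - (v / v0) ** delta - (s_star / headway) ** 2)
    if noise_std > 0:
        accel += rng.gauss(0.0, noise_std)  # exogenous acceleration noise
    return accel


free_road = idm_accel(v=0.0, lead_v=0.0, headway=1e6)    # close to +a
tailgating = idm_accel(v=10.0, lead_v=5.0, headway=5.0)  # strongly negative
noisy = idm_accel(v=10.0, lead_v=10.0, headway=20.0, noise_std=0.2)
```

The noise term is what keeps a ring of identical drivers from settling into perfectly uniform flow.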

As we have done in `tutorial01_sumo.ipynb`, human-driven vehicles are defined in the `VehicleParams` class as follows:

In [4]:
# vehicles class
from flow.core.params import VehicleParams

# vehicles dynamics models
from flow.controllers import IDMController, ContinuousRouter

vehicles = VehicleParams()
#vehicles.add("human",
#             acceleration_controller=(IDMController, {}),
#             routing_controller=(ContinuousRouter, {}),
#             num_vehicles=10)

The above addition to the `Vehicles` class only accounts for 21 of the 22 vehicles that are placed in the network. We now add an additional trainable autonomous vehicle whose actions are dictated by an RL agent. This is done by specifying an `RLController` as the acceleration controller of the vehicle.

In [5]:
from flow.controllers import RLController

Note that this controller serves primarily as a placeholder that marks the vehicle as a component of the RL agent, meaning that lane changing and routing actions can also be specified by the RL agent for this vehicle.

We finally add the vehicle as follows, while again using the `ContinuousRouter` to perpetually maintain the vehicle within the network.

In [6]:
# from flow.energy_models.toyota_energy import TacomaEnergy
# vehicles.add(veh_id="rl",
#              acceleration_controller=(RLController, {}),
#              routing_controller=(ContinuousRouter, {}),
#              initial_speed =20,
#              energy_model = TacomaEnergy,
#              num_vehicles=1)


vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             initial_speed=0,
             num_vehicles=1)

## 3. Setting up an Environment

Several environments in Flow exist to train RL agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. The use of an environment allows us to view the cumulative reward that simulation rollouts receive, and to specify the state/action spaces.

SUMO environments in Flow are parametrized by three components:
* `SumoParams`
* `EnvParams`
* `Network`

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of any simulation step and whether to render the GUI when running the experiment. For this example, we consider a simulation step length of 0.1s and deactivate the GUI. 

**Note:** For training purposes, it is highly recommended to deactivate the GUI in order to avoid slowing down the whole run. To do so, specify `render=False`.

In [7]:
from flow.core.params import SumoParams

sim_params = SumoParams(sim_step=0.1, render=False)

### 3.2 EnvParams

`EnvParams` specifies environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. For the environment `WaveAttenuationPOEnv`, these parameters are used to dictate bounds on the accelerations of the autonomous vehicles, as well as the range of ring lengths (and accordingly network densities) the agent is trained on.

Finally, it is important to specify here the *horizon* of the experiment, which is the duration of one episode in simulation steps (during which the RL agent acquires data).

In [8]:
from flow.core.params import EnvParams

# Define horizon as a variable to ensure consistent use across notebook
HORIZON = 2500

env_params = EnvParams(
    # length of one rollout
    horizon=HORIZON,

    additional_params={
        # maximum acceleration of autonomous vehicles
        "max_accel": 4,
        # maximum deceleration of autonomous vehicles
        "max_decel": -4,
        # bounds on the ranges of ring road lengths the autonomous vehicle 
        # is trained on
        "ring_length": [220, 270],
    },
)
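
The horizon is counted in simulation steps, so its meaning in simulated time depends on `sim_step`. The short calculation below converts it to simulated seconds using the values from this tutorial. The ring-length draw is only an assumption about how the range might be used (a uniform integer draw, which is consistent with the integer lengths printed in the training log), not a statement of the environment's implementation.

```python
import random

HORIZON = 2500            # steps per rollout, as set above
SIM_STEP = 0.1            # seconds per step, from SumoParams
RING_LENGTH = [220, 270]  # bounds on the ring length, in meters

# simulated time covered by one rollout
rollout_seconds = HORIZON * SIM_STEP

# assumed sampling scheme: one integer length per rollout, bounds inclusive
rng = random.Random(0)
sampled_length = rng.randint(RING_LENGTH[0], RING_LENGTH[1])

print(f"one rollout covers about {rollout_seconds:.0f} simulated seconds")
```

With these values, each rollout spans roughly 250 simulated seconds.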

### 3.3 Initializing a Gym Environment

Now, we have to specify our Gym environment and the algorithm that our RL agents will use. As with the network, we choose one of Flow's built-in environments, a list of which is provided by the script below.

In [9]:
import flow.envs as flowenvs

print(flowenvs.__all__)

['Env', 'AccelEnv', 'LaneChangeAccelEnv', 'LaneChangeAccelPOEnv', 'TrafficLightGridTestEnv', 'MergePOEnv', 'BottleneckEnv', 'BottleneckAccelEnv', 'WaveAttenuationEnv', 'WaveAttenuationPOEnv', 'EnergyOptEnv', 'EnergyOptSPDEnv', 'TrafficLightGridEnv', 'TrafficLightGridPOEnv', 'TrafficLightGridBenchmarkEnv', 'BottleneckDesiredVelocityEnv', 'TestEnv', 'BayBridgeEnv', 'SingleStraightRoad', 'BottleNeckAccelEnv', 'DesiredVelocityEnv', 'PO_TrafficLightGridEnv', 'GreenWaveTestEnv']


This tutorial was written around the environment `WaveAttenuationPOEnv`, which is used to train autonomous vehicles to attenuate the formation and propagation of waves in a partially observable variable density ring road. The run recorded here selects `EnergyOptSPDEnv` instead; the `WaveAttenuationPOEnv` choice is left commented out in the cell that follows it. To create the Gym environment, the only necessary parameters are the environment name plus the previously defined variables. These are defined as follows:

In [10]:
from flow.envs import EnergyOptSPDEnv

env_name = EnergyOptSPDEnv

In [11]:
# from flow.envs import WaveAttenuationPOEnv

# env_name = WaveAttenuationPOEnv

### 3.4 Setting up Flow Parameters

RLlib experiments generate a `params.json` file for each experiment run. For these experiments, the parameters defining the Flow network and environment must be stored as well. As such, in this section we define the dictionary `flow_params`, which contains the variables required by the utility function `make_create_env`. `make_create_env` is a higher-order function which returns a function `create_env` that initializes a Gym environment corresponding to the Flow network specified.

In [12]:
# Creating flow_params. Make sure the dictionary keys are as specified. 
flow_params = dict(
    # name of the experiment
    exp_tag=name,
    # name of the flow environment the experiment is running on
    env_name=env_name,
    # name of the network class the experiment uses
    network=network_name,
    # simulator that is used by the experiment
    simulator='traci',
    # simulation-related parameters
    sim=sim_params,
    # environment related parameters (see flow.core.params.EnvParams)
    env=env_params,
    # network-related parameters (see flow.core.params.NetParams and
    # the network's documentation or ADDITIONAL_NET_PARAMS component)
    net=net_params,
    # vehicles to be placed in the network at the start of a rollout 
    # (see flow.core.params.VehicleParams)
    veh=vehicles,
    # (optional) parameters affecting the positioning of vehicles upon 
    # initialization/reset (see flow.core.params.InitialConfig)
    initial=initial_config
)
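
The higher-order pattern behind `make_create_env` can be sketched without Flow installed. In the hypothetical `make_create_env_sketch` below, the returned name follows the `<EnvName>-v<version>` pattern that appears in the trial name of the training output (`EnergyOptSPDEnv-v0`); the stub classes and the body of `create_env` are illustrative assumptions, not Flow's actual code.

```python
class StubNetwork:
    """Placeholder for a network class such as RingNetwork."""

    def __init__(self, name):
        self.name = name


class StubEnv:
    """Placeholder for an environment class."""

    def __init__(self, env_params, network):
        self.env_params = env_params
        self.network = network


def make_create_env_sketch(params, version=0):
    """Return (create_env, env_name) without building anything yet."""
    env_name = "{}-v{}".format(params["env_name"].__name__, version)

    def create_env(*_):
        # classes are only instantiated here, when a worker asks for an env
        network = params["network"](name=params["exp_tag"])
        return params["env_name"](env_params=params["env"], network=network)

    return create_env, env_name


stub_params = dict(exp_tag="demo", env_name=StubEnv, network=StubNetwork,
                   env={"horizon": 2500})
create_env, gym_name = make_create_env_sketch(stub_params, version=0)
env = create_env()
```

Deferring construction this way is what lets each parallel worker build its own copy of the environment from the same parameter dictionary.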

## 4. Running RL Experiments in Ray

### 4.1 Import 

First, we must import modules required to run experiments in Ray. The `json` package is required to store the Flow experiment parameters in the `params.json` file, as is `FlowParamsEncoder`. Ray-related imports are required: the PPO algorithm agent, `ray.tune`'s experiment runner, and environment helper methods `register_env` and `make_create_env`.

In [13]:
import json

import ray
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
# from ray.rllib.agents.agent import get_agent_class
#from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder


### 4.2 Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [14]:
# number of parallel workers
N_CPUS = 6
# number of rollouts per training iteration
N_ROLLOUTS = 1
#ray.shutdown()
ray.init(num_cpus=N_CPUS)

2020-07-30 18:03:30,973	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-30_18-03-30_972842_25964/logs.
2020-07-30 18:03:31,092	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:55213 to respond...
2020-07-30 18:03:31,235	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:58697 to respond...
2020-07-30 18:03:31,241	INFO services.py:809 -- Starting Redis shard with 3.3 GB max memory.
2020-07-30 18:03:31,298	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-30_18-03-30_972842_25964/logs.
2020-07-30 18:03:31,301	INFO services.py:1475 -- Starting the Plasma object store with 4.96 GB memory using /dev/shm.


{'node_ip_address': '192.168.100.38',
 'redis_address': '192.168.100.38:55213',
 'object_store_address': '/tmp/ray/session_2020-07-30_18-03-30_972842_25964/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-07-30_18-03-30_972842_25964/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2020-07-30_18-03-30_972842_25964'}

### 4.3 Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent is configured with the number of parallel workers, a batch size corresponding to `N_ROLLOUTS` rollouts (each of which has length `HORIZON` steps), a discount rate $\gamma$ of 0.99999, two hidden layers of size 16, Generalized Advantage Estimation with $\lambda$ of 0.97, and other parameters as set below.
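
Generalized Advantage Estimation combines the discount $\gamma$ and the parameter $\lambda$ into a single backward recursion over a rollout. The function below is a minimal, framework-free sketch of that recursion; `gae_advantages` is a hypothetical helper for illustration and is not RLlib's implementation.

```python
def gae_advantages(rewards, values, last_value, gamma, lam):
    """Compute GAE advantages for one rollout.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # one-step temporal-difference error
        delta = rewards[t] + gamma * next_value - values[t]
        # exponentially weighted sum of future TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
                     last_value=0.0, gamma=0.5, lam=0.5)
```

With `lam=1` the recursion reduces to discounted returns minus the value baseline, and with `lam=0` it reduces to the one-step TD error.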

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified in section 3 is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`. 
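
The idea of mapping parameter objects to strings can be sketched with a custom `json.JSONEncoder`. The encoder below is an illustration under simple assumptions, not the actual `FlowParamsEncoder`: it stores classes by name and plain objects by their attribute dictionary. `SketchEncoder`, `DemoParams`, and `DemoEnv` are hypothetical names.

```python
import json


class SketchEncoder(json.JSONEncoder):
    """Serialize classes by name and simple objects by their attributes."""

    def default(self, obj):
        if isinstance(obj, type):
            return obj.__name__      # e.g. a network or environment class
        if hasattr(obj, "__dict__"):
            return vars(obj)         # e.g. a parameter container
        return super().default(obj)  # fall back to the standard error


class DemoParams:
    def __init__(self, horizon):
        self.horizon = horizon


class DemoEnv:
    pass


demo = dict(exp_tag="demo", env_name=DemoEnv, env=DemoParams(horizon=2500))
demo_json = json.dumps(demo, cls=SketchEncoder, sort_keys=True, indent=4)
restored = json.loads(demo_json)
```

Loading the string back yields plain dictionaries and names, which is enough to look up the classes again and rebuild the experiment.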

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym. 

In [15]:
# The algorithm or model to train. This may refer to the name of a
# built-in algorithm (e.g. RLlib's DQN or PPO), or a user-defined
# trainable function or class registered in the tune registry.
alg_run = "PPO"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size
config["gamma"] = 0.99999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE lambda parameter
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])  # SGD minibatch size
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 10  # number of SGD iterations
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
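
The sizes implied by this configuration follow from a few lines of arithmetic, reproduced here with the constants defined earlier in the notebook. The "effective horizon" `1 / (1 - gamma)` is a common rule of thumb and is added here for intuition rather than taken from the tutorial.

```python
N_CPUS = 6
N_ROLLOUTS = 1
HORIZON = 2500
GAMMA = 0.99999

# presumably one CPU is left free for the driver process
num_workers = N_CPUS - 1
train_batch_size = HORIZON * N_ROLLOUTS   # steps per training iteration
sgd_minibatch_size = min(16 * 1024, train_batch_size)

# weight that the discount places on a reward received at the last step
final_step_weight = GAMMA ** HORIZON
# rule-of-thumb number of steps over which rewards still matter
effective_horizon = 1.0 / (1.0 - GAMMA)

print(num_workers, train_batch_size, sgd_minibatch_size)
print(round(final_step_weight, 3), round(effective_horizon))
```

Because the minibatch size equals the batch size here, each SGD pass uses the whole batch at once, and a discount this close to 1 leaves rewards almost undiscounted within a single rollout.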

### 4.4 Running Experiments

Here, we use the `run_experiments` function from `ray.tune`. The function takes a dictionary with one key, a name corresponding to the experiment, and one value, itself a dictionary containing parameters for training.

In [16]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 20,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 1500,  # number of iterations to stop after
        },
    },
})
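
As a quick check on the settings above, the following sketch lists the iterations at which checkpoints would be written, assuming a checkpoint is saved whenever the iteration count is a multiple of `checkpoint_freq`. `checkpoint_iterations` is a hypothetical helper, not a Ray function.

```python
def checkpoint_iterations(stop_iteration, checkpoint_freq, checkpoint_at_end):
    """Iterations at which a checkpoint would be saved (assumed schedule)."""
    iterations = list(range(checkpoint_freq, stop_iteration + 1,
                            checkpoint_freq))
    # the final iteration is added only if it is not already covered
    if checkpoint_at_end and (not iterations or iterations[-1] != stop_iteration):
        iterations.append(stop_iteration)
    return iterations


saved = checkpoint_iterations(stop_iteration=1500, checkpoint_freq=20,
                              checkpoint_at_end=True)
```

Under that assumption a full run of 1500 iterations leaves 75 checkpoints, the last of which coincides with the final iteration.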

2020-07-30 18:03:31,610	INFO trial_runner.py:176 -- Starting a new experiment.
2020-07-30 18:03:31,736	ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/6 CPUs, 0/0 GPUs
Memory usage on this node: 9.1/16.5 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 6/6 CPUs, 0/0 GPUs
Memory usage on this node: 9.1/16.5 GB
Result logdir: /home/solom/ray_results/c_mpg+plus
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_EnergyOptSPDEnv-v0_0:	RUNNING

[2m[36m(pid=26009)[0m Instructions for updating:
[2m[36m(pid=26009)[0m non-resource variables are not supported in the long term
[2m[36m(pid=26009)[0m 2020-07-30 18:03:35,444	INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
[2m[36m(pid=26009)[0m 2020-07-30 18:03:35.445359: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[2m[36m(pid=26009)[0m 2020-07-30 18:03:35.468962: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU

[2m[36m(pid=26007)[0m Instructions for updating:
[2m[36m(pid=26007)[0m non-resource variables are not supported in the long term
[2m[36m(pid=26009)[0m Instructions for updating:
[2m[36m(pid=26009)[0m Prefer Variable.assign which has equivalent behavior in 2.X.
[2m[36m(pid=26009)[0m Instructions for updating:
[2m[36m(pid=26009)[0m Prefer Variable.assign which has equivalent behavior in 2.X.
[2m[36m(pid=26006)[0m 2020-07-30 18:03:40,571	INFO rollout_worker.py:319 -- Creating policy evaluation worker 2 on CPU (please ignore any CUDA init errors)
[2m[36m(pid=26006)[0m 2020-07-30 18:03:40.572980: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[2m[36m(pid=26005)[0m 2020-07-30 18:03:40,586	INFO rollout_worker.py:319 -- Creating policy evaluation worker 4 on CPU (please ignore any CUDA init errors)
[2m[36m(pid=26005)[0m 2020-07-30 18:03:40.587869: I tensorflow/core/p

[2m[36m(pid=26008)[0m Instructions for updating:
[2m[36m(pid=26008)[0m Use `tf.cast` instead.
[2m[36m(pid=26008)[0m Instructions for updating:
[2m[36m(pid=26008)[0m Use `tf.cast` instead.
[2m[36m(pid=26004)[0m Instructions for updating:
[2m[36m(pid=26004)[0m Use `tf.cast` instead.
[2m[36m(pid=26004)[0m Instructions for updating:
[2m[36m(pid=26004)[0m Use `tf.cast` instead.
[2m[36m(pid=26007)[0m 2020-07-30 18:03:40,860	INFO rollout_worker.py:319 -- Creating policy evaluation worker 5 on CPU (please ignore any CUDA init errors)
[2m[36m(pid=26007)[0m 2020-07-30 18:03:40.862046: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[2m[36m(pid=26007)[0m 2020-07-30 18:03:40.869928: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz
[2m[36m(pid=26007)[0m 2020-07-30 18:03:40.870138: I tensorflow/compiler/xla/service/service.cc:168

[2m[36m(pid=26008)[0m Instructions for updating:
[2m[36m(pid=26008)[0m Prefer Variable.assign which has equivalent behavior in 2.X.
[2m[36m(pid=26008)[0m Instructions for updating:
[2m[36m(pid=26008)[0m Prefer Variable.assign which has equivalent behavior in 2.X.
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 236
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 2020-07-30 18:03:42,986	INFO sampler.py:304 -- Raw obs from env: { 0: { 'agent0': np.ndarray((3,), dtype=float64, min=0.0, max=0.0, mean=0.0)}}
[2m[36m(pid=26004)[0m 2020-07-30 18:03:42,986	INFO sampler.py:305 -- Info return from env: {0: {'agent0': None}}
[2m[36m(pid=26004)[0m 2020-07-30 18:03:42,986	INFO sampler.py:403 -- Preprocessed obs: np.ndarray((3,), dtype=float64, min=0.0, max=0.0, mean=0.0)
[2m[36m(pid=26004)[0m 2020-07-30 18:03:42,986	INFO sampler.py:407 -- Filtered obs: np.ndarray((3,), dtype=float64, min=0

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-03-46
  done: false
  episode_len_mean: .nan
  episode_reward_max: .nan
  episode_reward_mean: .nan
  episode_reward_min: .nan
  episodes_this_iter: 0
  episodes_total: 0
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 322.373
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.4187803268432617
        entropy_coeff: 0.0
        kl: 2.979278690418141e-07
        policy_loss: 3.0692412110511214e-05
        total_loss: 5.060368537902832
        vf_explained_var: 0.00028771162033081055
        vf_loss: 5.060336589813232
    load_time_ms: 54.955
    num_steps_sampled: 2600
    num_steps_trained: 2500
    sample_time_ms: 5727.582
    update_time_ms: 582.942
  iterations_since_restore: 1
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_policy_estimator: {}
  perf:
   

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 262
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-04-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 606.4036123591301
  episode_reward_mean: 433.86099541364916
  episode_reward_min: 166.59669214702404
  episodes_this_iter: 1
  episodes_total: 6
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 56.735
    learner:
      default_policy:
        cur_kl_coeff: 0.0007812500116415322
        cur_lr: 4.999999873689376e-05
        entropy: 1.4176993370056152
        entropy_coeff: 0.0
        kl: 4.283428154394642e-07
        policy_loss: 0.0008164261817000806
        total_loss: 29.64784049987793
        vf_explained_var: 0.0015592575073242188
        vf_loss: 29.647022247314453
    load_time_ms: 7.137
    num_steps_sampled: 2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-04-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 645.844145864241
  episode_reward_mean: 415.16444618551344
  episode_reward_min: 157.54734207482142
  episodes_this_iter: 0
  episodes_total: 15
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.071
    learner:
      default_policy:
        cur_kl_coeff: 3.051757857974735e-06
        cur_lr: 4.999999873689376e-05
        entropy: 1.417754888534546
        entropy_coeff: 0.0
        kl: 1.8644332300254973e-08
        policy_loss: 0.007586339022964239
        total_loss: 14.60401725769043
        vf_explained_var: 0.002976536750793457
        vf_loss: 14.596430778503418
    load_time_ms: 1.137
    num_steps_sampled: 44200
    num_steps_trained: 42500
    sample_time_ms: 3430.385
    update_time_ms: 4.018
  iterations_since_restore: 17
  node_ip: 192.168.100.38
  num_healthy_workers:

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 243
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 237
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 224
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-05-07
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 645.844145864241
  episode_reward_mean: 411.88563695834983
  episode_reward_min: 157.54734207482142
  episodes_this_iter: 1
  episodes_total: 24
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.779
    learner:
      default_policy:
        cur_kl_coeff: 1.1920929132713809e-08
        cur_lr:

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 265
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-05-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 645.844145864241
  episode_reward_mean: 378.1078199796122
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 0
  episodes_total: 31
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.42
    learner:
      default_policy:
        cur_kl_coeff: 4.6566129424663316e-11
        cur_lr: 4.999999873689376e-05
        entropy: 1.4204174280166626
        entropy_coeff: 0.0
        kl: 2.8325320045041735e-07
        policy_loss: -0.0034448641818016768
        total_loss: 38.959476470947266
        vf_explained_var: 0.007436871528625488
        vf_loss: 38.96292495727539
    load_time_ms: 1.183
    num_steps_sampled: 85

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 262
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 234
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-06-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 699.3911588084176
  episode_reward_mean: 394.09942336822934
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 0
  episodes_total: 40
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 27.601
    learner:
      default_policy:
        cur_kl_coeff: 1.8189894306509108e-13
        cur_lr: 4.999999873689376e-05
        entropy: 1.420275330543518
        entropy_coeff: 0.0
        kl: 6.652355182268366e-07
        policy_loss: -0.0015470476355403662
   

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 235
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 241
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-06-32
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 810.6955632493704
  episode_reward_mean: 402.0604182469674
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 2
  episodes_total: 49
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.034
    learner:
      default_policy:
        cur_kl_coeff: 7.10542746348012e-16
        cur_lr: 4.999999873689376e-05
        entropy: 1.418236494064331
        entropy_coeff: 0.0
        kl: 2.459764516515861e-07
        policy_loss: 0.007801887579262257
        

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 238
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 264
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-07-01
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 810.6955632493704
  episode_reward_mean: 403.4845149153205
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 0
  episodes_total: 57
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.763
    learner:
      default_policy:
        cur_kl_coeff: 2.775557602921922e-18
        cur_lr: 4.999999873689376e-05
        entropy: 1.4179662466049194
        entropy_coeff: 0.0
        kl: 2.2464990934167872e-07
        policy_loss: -0.0013116314075887203
   

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 245
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 267
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-07-29
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 810.6955632493704
  episode_reward_mean: 399.21722428795874
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 1
  episodes_total: 65
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.14
    learner:
      default_policy:
        cur_kl_coeff: 1.0842021886413758e-20
        cur_lr: 4.999999873689376e-05
        entropy: 1.418475866317749
        entropy_coeff: 0.0
        kl: 1.6974210836906423e-07
        policy_loss: 0.0012597679160535336
    

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 267
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-07-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 844.6505203111441
  episode_reward_mean: 402.02783271417786
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 1
  episodes_total: 73
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.062
    learner:
      default_policy:
        cur_kl_coeff: 4.235164799380374e-23
        cur_lr: 4.999999873689376e-05
        entropy: 1.4191551208496094
        entropy_coeff: 0.0
        kl: 3.993511299427155e-09
        policy_loss: 0.003869878826662898
        total_loss: 28.761343002319336
        vf_explained_var: 0.03168320655822754
        vf_loss: 28.757471084594727
    load_time_ms: 1.15
    num_steps_sampled: 1898

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 228
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 244
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-08-24
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 844.6505203111441
  episode_reward_mean: 400.9345396430285
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 0
  episodes_total: 82
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.835
    learner:
      default_policy:
        cur_kl_coeff: 1.6543612497579586e-25
        cur_lr: 4.999999873689376e-05
        entropy: 1.4201650619506836
        entropy_coeff: 0.0
        kl: 7.462263056368101e-07
        policy_loss: -0.005568448919802904
    

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 266
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 235
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 230
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 261
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 222
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-08-52
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 844.6505203111441
  e

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-09-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 844.6505203111441
  episode_reward_mean: 411.4094007191923
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 0
  episodes_total: 97
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.875
    learner:
      default_policy:
        cur_kl_coeff: 2.524354934323057e-30
        cur_lr: 4.999999873689376e-05
        entropy: 1.4264366626739502
        entropy_coeff: 0.0
        kl: 2.8186082090542186e-06
        policy_loss: 0.002745038131251931
        total_loss: 130.25436401367188
        vf_explained_var: 0.020709216594696045
        vf_loss: 130.25161743164062
    load_time_ms: 1.226
    num_steps_sampled: 252200
    num_steps_trained: 242500
    sample_time_ms: 3729.164
    update_time_ms: 4.249
  iterations_since_restore: 97
  node_ip: 192.168.100.38
  num_healthy_worke

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 237
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 265
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 258
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 223
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-09-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  episode_reward_mean: 427.08630874149196
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 2
  episodes_total: 107
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 249
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-10-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  episode_reward_mean: 430.1586340049557
  episode_reward_min: 141.4551062513581
  episodes_this_iter: 1
  episodes_total: 113
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 34.995
    learner:
      default_policy:
        cur_kl_coeff: 7.703719892343314e-35
        cur_lr: 4.999999873689376e-05
        entropy: 1.4259274005889893
        entropy_coeff: 0.0
        kl: 3.385543934086854e-08
        policy_loss: -0.002658007200807333
        total_loss: 26.185544967651367
        vf_explained_var: 0.05540728569030762
        vf_loss: 26.188203811645508
    load_time_ms: 1.453
    num_steps_sampled: 29

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-10-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  episode_reward_mean: 421.59691756358944
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 0
  episodes_total: 122
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.262
    learner:
      default_policy:
        cur_kl_coeff: 3.009265582946607e-37
        cur_lr: 4.999999873689376e-05
        entropy: 1.4235337972640991
        entropy_coeff: 0.0
        kl: 1.1503696129011587e-07
        policy_loss: 0.0037824944593012333
        total_loss: 37.507999420166016
        vf_explained_var: 0.056506216526031494
        vf_loss: 37.50421142578125
    load_time_ms: 1.26
    num_steps_sampled: 312000
    num_steps_trained: 300000
    sample_time_ms: 3326.868
    update_time_ms: 4.755
  iterations_since_restore: 120
  node_ip: 192.168.100.38
  num_healthy_wor

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 254
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 237
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 225
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 252
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 268
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-11-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  e

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 223
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 253
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 223
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-11-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  episode_reward_mean: 439.47240116150846
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 3
  episodes_total: 140
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.229
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 243
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 256
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-12-11
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 944.7236355929094
  episode_reward_mean: 441.6490813382301
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 0
  episodes_total: 147
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 31.554
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4250637292861938
        entropy_coeff: 0.0
        kl: 2.615451819565351e-07
        policy_loss: -0.006610901094973087
        total_loss: 17

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 230
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 268
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-12-43
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 450.6808989764182
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 2
  episodes_total: 156
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 37.19
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4268112182617188
        entropy_coeff: 0.0
        kl: 1.9496203549351776e-06
        policy_loss: 0.005153419449925423
        total_loss: 33

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 223
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-13-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 455.0471220293254
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 1
  episodes_total: 164
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 37.944
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4245350360870361
        entropy_coeff: 0.0
        kl: 2.97749039646078e-07
        policy_loss: -0.0009983980562537909
        total_loss: 15.455641746520996
        vf_explained_var: 0.06123560667037964
        vf_loss: 15.456639289855957
    load_time_ms: 1.751
    num_steps_sampled: 416000
    num_step

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 254
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 267
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 256
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-13-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 455.8553208354314
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 1
  episodes_total: 173
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 27.722
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 240
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-14-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 457.3949958463992
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 0
  episodes_total: 180
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.019
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.432767391204834
        entropy_coeff: 0.0
        kl: 1.1447191354818642e-06
        policy_loss: 0.0007305109174922109
        total_loss: 28.37211799621582
        vf_explained_var: 0.09784317016601562
        vf_loss: 28.37139320373535
    load_time_ms: 1.244
    num_steps_sampled: 457600
    num_steps_

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 236
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 224
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 237
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-14-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 445.40238721496587
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 2
  episodes_total: 190
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.871
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 244
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 261
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-15-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 448.45902002785306
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 1
  episodes_total: 197
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.881
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4329769611358643
        entropy_coeff: 0.0
        kl: 1.1086464013487785e-07
        policy_loss: -0.0027601660694926977
        total_loss

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 269
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-15-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 435.66959386294496
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 0
  episodes_total: 205
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.043
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.434146523475647
        entropy_coeff: 0.0
        kl: 5.862236207576643e-07
        policy_loss: -0.0010466730454936624
        total_loss: 32.2055549621582
        vf_explained_var: 0.08178472518920898
        vf_loss: 32.20658874511719
    load_time_ms: 1.334
    num_steps_sampled: 520000
    num_steps_

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 226
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 267
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 262
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-16-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 441.23081008772397
  episode_reward_min: 137.5418868105293
  episodes_this_iter: 3
  episodes_total: 214
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 34.283
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-16-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 447.086147952142
  episode_reward_min: 182.48059621587115
  episodes_this_iter: 0
  episodes_total: 220
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.177
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4323540925979614
        entropy_coeff: 0.0
        kl: 7.525682690356916e-07
        policy_loss: -0.0026543999556452036
        total_loss: 23.405513763427734
        vf_explained_var: 0.050208091735839844
        vf_loss: 23.408172607421875
    load_time_ms: 1.425
    num_steps_sampled: 556400
    num_steps_trained: 535000
    sample_time_ms: 4859.271
    update_time_ms: 7.074
  iterations_since_restore: 214
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 249
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 233
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 252
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-17-27
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 443.5294797852762
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 3
  episodes_total: 228
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 32.826
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-17-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 448.80890517634924
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 0
  episodes_total: 235
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.435
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4340417385101318
        entropy_coeff: 0.0
        kl: 1.5485287008232262e-07
        policy_loss: 0.0017892224714159966
        total_loss: 17.0643310546875
        vf_explained_var: 0.07920634746551514
        vf_loss: 17.062541961669922
    load_time_ms: 1.271
    num_steps_sampled: 595400
    num_steps_trained: 572500
    sample_time_ms: 3806.484
    update_time_ms: 4.347
  iterations_since_restore: 229
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 232
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 227
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 225
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-18-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 448.2423284345486
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 2
  episodes_total: 245
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.277
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 262
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 237
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-18-55
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1006.7023529932075
  episode_reward_mean: 446.66333123377
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 0
  episodes_total: 252
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.924
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4314594268798828
        entropy_coeff: 0.0
        kl: 2.365756017752574e-06
        policy_loss: -0.000920503051020205
        total_loss: 32

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 223
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 253
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 262
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 242
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-19-19
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 443.6450554947971
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 2
  episodes_total: 262
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 262
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 223
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 235
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-19-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 440.6532893495581
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 3
  episodes_total: 270
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.916
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 228
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 266
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-20-15
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 439.85642930054894
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 0
  episodes_total: 277
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.263
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4302641153335571
        entropy_coeff: 0.0
        kl: 9.344577733827464e-07
        policy_loss: 0.0035372532438486814
        total_loss: 

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 220
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 267
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 245
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 237
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-20-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 444.2331397366043
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 3
  episodes_total: 287
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 226
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 248
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-21-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 445.75286322784393
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 1
  episodes_total: 294
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.976
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4282559156417847
        entropy_coeff: 0.0
        kl: 6.12020514267897e-08
        policy_loss: -0.0027464861050248146
        total_loss: 

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 240
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 262
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-21-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 453.74429895344326
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 1
  episodes_total: 302
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.857
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.428588628768921
        entropy_coeff: 0.0
        kl: 9.990692433348158e-07
        policy_loss: -0.0005307972896844149
        total_loss: 

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 224
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 267
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 221
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-22-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 462.58711124775306
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 1
  episodes_total: 311
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.463
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 233
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 264
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-22-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 461.81903864794435
  episode_reward_min: 176.89235923777426
  episodes_this_iter: 1
  episodes_total: 319
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.536
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4247205257415771
        entropy_coeff: 0.0
        kl: 2.6718139451986644e-06
        policy_loss: -0.010387009009718895
        total_loss:

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 235
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 258
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-23-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 471.2861951843066
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 327
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.403
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4239568710327148
        entropy_coeff: 0.0
        kl: 6.218671728674963e-07
        policy_loss: 0.00014238452422432601
        total_loss: 

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 259
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 238
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 223
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-23-28
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 941.9759148877148
  episode_reward_mean: 463.87192595160116
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 1
  episodes_total: 336
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.492
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 262
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-23-54
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.1085166386691
  episode_reward_mean: 466.4260262789077
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 343
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.886
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4237358570098877
        entropy_coeff: 0.0
        kl: 1.9528866346263385e-07
        policy_loss: 0.004837001208215952
        total_loss: 22.46617317199707
        vf_explained_var: 0.11602914333343506
        vf_loss: 22.461332321166992
    load_time_ms: 1.106
    num_steps_sampled: 865800
    num_steps

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 247
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 226
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-24-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.1085166386691
  episode_reward_mean: 466.0481030337503
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 2
  episodes_total: 353
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.815
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4247266054153442
        entropy_coeff: 0.0
        kl: 1.313090365329117e-07
        policy_loss: 0.0018207002431154251
        total_loss: 2

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 235
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 254
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 248
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-24-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.1085166386691
  episode_reward_mean: 460.02104734576494
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 3
  episodes_total: 361
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.007
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-25-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.1085166386691
  episode_reward_mean: 467.68544290518497
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 368
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.361
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4281625747680664
        entropy_coeff: 0.0
        kl: 2.5011300408550596e-07
        policy_loss: 0.006285396404564381
        total_loss: 61.55594253540039
        vf_explained_var: 0.03614509105682373
        vf_loss: 61.54967498779297
    load_time_ms: 1.252
    num_steps_sampled: 928200
    num_steps_trained: 892500
    sample_time_ms: 3137.449
    update_time_ms: 3.814
  iterations_since_restore: 357
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_polic

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 230
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 268
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 266
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-25-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 938.1085166386691
  episode_reward_mean: 485.3435772520638
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 378
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.946
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 257
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 229
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 248
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-26-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 481.38462729496007
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 2
  episodes_total: 386
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.002
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 237
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-26-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 492.09291068904474
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 393
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.576
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.42753267288208
        entropy_coeff: 0.0
        kl: 2.676296162462677e-06
        policy_loss: -0.0034210665617138147
        total_loss: 46.505210876464844
        vf_explained_var: 0.04785865545272827
        vf_loss: 46.50862503051758
    load_time_ms: 1.117
    num_steps_sampled: 990600
    num_steps

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-26-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 491.30646632991795
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 0
  episodes_total: 402
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.317
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4252004623413086
        entropy_coeff: 0.0
        kl: 1.3252497410576325e-07
        policy_loss: 0.00018208782421424985
        total_loss: 47.992950439453125
        vf_explained_var: 0.06497371196746826
        vf_loss: 47.99277877807617
    load_time_ms: 1.065
    num_steps_sampled: 1011400
    num_steps_trained: 972500
    sample_time_ms: 3308.739
    update_time_ms: 3.79
  iterations_since_restore: 389
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 267
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 222
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 261
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 239
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 263
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-27-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  e

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 259
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-27-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 488.03823982416475
  episode_reward_min: 120.10138577388487
  episodes_this_iter: 1
  episodes_total: 418
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.523
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4284358024597168
        entropy_coeff: 0.0
        kl: 3.8232803944993066e-07
        policy_loss: 0.003887576749548316
        total_loss: 42.6652946472168
        vf_explained_var: 0.04603421688079834
        vf_loss: 42.66141128540039
    load_time_ms: 1.075
    num_steps_sampled: 1053000
    num_steps

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 227
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-28-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 482.1151931016437
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 0
  episodes_total: 427
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.762
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.426820158958435
        entropy_coeff: 0.0
        kl: 2.4256706865344313e-07
        policy_loss: -0.0035062015522271395
        total_loss: 23.24089241027832
        vf_explained_var: 0.06976455450057983
        vf_loss: 23.244396209716797
    load_time_ms: 1.031
    num_steps_sampled: 1073800
    num_ste

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 261
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 267
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 222
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 235
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-28-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 487.21526015161623
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 4
  episodes_total: 437
  experiment_id: 0227ad3d84eb4f9da7b6d7d

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 230
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-29-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 887.7077352087381
  episode_reward_mean: 485.10934634522835
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 1
  episodes_total: 443
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.743
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4493145942687988
        entropy_coeff: 0.0
        kl: 2.8808831302740145e-06
        policy_loss: -0.0004802923067472875
        total_loss: 98.69927215576172
        vf_explained_var: 0.011673986911773682
        vf_loss: 98.69975280761719
    load_time_ms: 1.264
    num_steps_sampled: 1115400
    num_s

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 224
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-29-38
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 916.2031883288246
  episode_reward_mean: 501.93821518632626
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 0
  episodes_total: 452
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.236
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4284179210662842
        entropy_coeff: 0.0
        kl: 6.776094210181327e-07
        policy_loss: 0.005109558347612619
        total_loss: 33.850311279296875
        vf_explained_var: 0.0673142671585083
        vf_loss: 33.8452033996582
    load_time_ms: 1.053
    num_steps_sampled: 1136200
    num_steps_

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 258
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 263
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 251
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 233
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-30-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 519.0718806644009
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 0
  episodes_total: 461
  experiment_id: 0227ad3d84eb4f9da7b6d7d

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 243
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 258
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 235
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-30-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 516.3273215557368
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 2
  episodes_total: 469
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.847
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-30-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 517.3703364887816
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 0
  episodes_total: 476
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.71
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4323277473449707
        entropy_coeff: 0.0
        kl: 5.373477733883192e-07
        policy_loss: -0.006173980887979269
        total_loss: 35.0867919921875
        vf_explained_var: 0.015174448490142822
        vf_loss: 35.09296417236328
    load_time_ms: 1.155
    num_steps_sampled: 1198600
    num_steps_trained: 1152500
    sample_time_ms: 3309.24
    update_time_ms: 3.44
  iterations_since_restore: 461
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_policy

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 261
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 251
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-31-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 515.2055178053968
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 1
  episodes_total: 486
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.651
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4252684116363525
        entropy_coeff: 0.0
        kl: 2.2394657150925923e-07
        policy_loss: 0.0007853957358747721
        total_loss:

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 240
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 221
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 224
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-31-53
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 505.77788404535016
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 2
  episodes_total: 494
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 21.616
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-32-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 508.81192740342755
  episode_reward_min: 143.86014310313982
  episodes_this_iter: 0
  episodes_total: 501
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.027
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.434259295463562
        entropy_coeff: 0.0
        kl: 1.1985301284767047e-07
        policy_loss: -0.003753701690584421
        total_loss: 38.06419372558594
        vf_explained_var: 0.015060961246490479
        vf_loss: 38.06795120239258
    load_time_ms: 1.212
    num_steps_sampled: 1261000
    num_steps_trained: 1212500
    sample_time_ms: 3259.029
    update_time_ms: 3.251
  iterations_since_restore: 485
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 239
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 244
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 257
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-32-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1041.0804019138245
  episode_reward_mean: 508.72140427406043
  episode_reward_min: 194.0640247316132
  episodes_this_iter: 0
  episodes_total: 511
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.696
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 230
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 242
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 260
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 221
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-33-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 512.1591667245526
  episode_reward_min: 194.0640247316132
  episodes_this_iter: 3
  episodes_total: 520
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 252
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 224
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-33-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 532.5655666915121
  episode_reward_min: 194.0640247316132
  episodes_this_iter: 1
  episodes_total: 527
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.585
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4347350597381592
        entropy_coeff: 0.0
        kl: 8.350610869456432e-08
        policy_loss: 0.005312837194651365
        total_loss: 58

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 226
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 240
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 247
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 264
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-34-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 531.3597693990722
  episode_reward_min: 183.50029495949883
  episodes_this_iter: 1
  episodes_total: 536
  experiment_id: 0227ad3d84eb4f9da7b6d7d

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 265
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-34-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 532.4798515835613
  episode_reward_min: 183.50029495949883
  episodes_this_iter: 1
  episodes_total: 543
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.268
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.433532953262329
        entropy_coeff: 0.0
        kl: 1.574063276166271e-06
        policy_loss: 0.006345752160996199
        total_loss: 72.91104125976562
        vf_explained_var: 0.013590455055236816
        vf_loss: 72.90470886230469
    load_time_ms: 1.07
    num_steps_sampled: 1365000
    num_steps_

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 221
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 251
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-34-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 532.1263697668717
  episode_reward_min: 183.50029495949883
  episodes_this_iter: 0
  episodes_total: 552
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.682
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4240005016326904
        entropy_coeff: 0.0
        kl: 4.580378458740597e-07
        policy_loss: 0.005997353699058294
        total_loss: 2

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 235
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 220
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 258
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 221
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-35-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 523.889319667694
  episode_reward_min: 183.50029495949883
  episodes_this_iter: 2
  episodes_total: 561
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 263
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-35-50
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 534.2565385151291
  episode_reward_min: 183.50029495949883
  episodes_this_iter: 1
  episodes_total: 568
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.6
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4373960494995117
        entropy_coeff: 0.0
        kl: 1.0489940223123995e-06
        policy_loss: -0.0006176370661705732
        total_loss: 79.52926635742188
        vf_explained_var: 0.008176267147064209
        vf_loss: 79.5298843383789
    load_time_ms: 1.137
    num_steps_sampled: 1427400
    num_step

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 262
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 247
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-36-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 536.3671267555004
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 0
  episodes_total: 577
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.272
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4326014518737793
        entropy_coeff: 0.0
        kl: 7.876634526837734e-07
        policy_loss: 0.0027517187409102917
        total_loss: 

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 269
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 260
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 269
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 270
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-36-46
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 550.500411832922
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 2
  episodes_total: 586
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 264
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-37-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 554.5744027306067
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 1
  episodes_total: 593
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.687
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4320425987243652
        entropy_coeff: 0.0
        kl: 1.1217593964829575e-07
        policy_loss: 0.0036707737017422915
        total_loss: 64.28057861328125
        vf_explained_var: 0.022569775581359863
        vf_loss: 64.27691650390625
    load_time_ms: 1.21
    num_steps_sampled: 1489800
    num_ste

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 246
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 247
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 260
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-37-43
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 561.7912385559628
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 0
  episodes_total: 602
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.58
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 268
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 264
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 253
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-38-10
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1166.0592011823494
  episode_reward_mean: 563.5863388370049
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 2
  episodes_total: 611
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.835
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 268
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 265
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 260
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-38-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1097.5016173988267
  episode_reward_mean: 568.2872090906942
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 2
  episodes_total: 619
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.118
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 236
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-39-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 563.9843691538802
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 1
  episodes_total: 627
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.892
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4301327466964722
        entropy_coeff: 0.0
        kl: 2.538108901717351e-06
        policy_loss: -0.0042144074104726315
        total_loss: 50.46963119506836
        vf_explained_var: 0.014225184917449951
        vf_loss: 50.47385025024414
    load_time_ms: 1.011
    num_steps_sampled: 1573000
    num_st

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 254
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 247
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 250
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-39-27
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 573.4416944164387
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 3
  episodes_total: 636
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.862
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 227
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 249
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-39-52
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 575.4317631199788
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 0
  episodes_total: 643
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.702
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4296042919158936
        entropy_coeff: 0.0
        kl: 2.613901983750111e-07
        policy_loss: -0.0009032919770106673
        total_loss:

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 252
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 229
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 221
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-40-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 577.5554015548217
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 3
  episodes_total: 653
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.889
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 223
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 226
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-40-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 589.4466846549533
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 1
  episodes_total: 660
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.176
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4331398010253906
        entropy_coeff: 0.0
        kl: 4.968619123246754e-06
        policy_loss: 0.004202247131615877
        total_loss: 5

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 234
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-41-04
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 589.9866846505244
  episode_reward_min: 176.78854378406191
  episodes_this_iter: 0
  episodes_total: 668
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.685
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4355480670928955
        entropy_coeff: 0.0
        kl: 6.385350388882216e-06
        policy_loss: 0.005772717297077179
        total_loss: 51.283565521240234
        vf_explained_var: 0.02564483880996704
        vf_loss: 51.27779769897461
    load_time_ms: 0.984
    num_steps_sampled: 1677000
    num_step

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 264
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 242
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-41-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 589.4381865754574
  episode_reward_min: 181.5620350593278
  episodes_this_iter: 1
  episodes_total: 677
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.026
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4369568824768066
        entropy_coeff: 0.0
        kl: 1.3879060816179845e-06
        policy_loss: -0.004373582545667887
        total_loss: 

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 265
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 239
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-41-54
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1030.5710320502847
  episode_reward_mean: 578.6812066994577
  episode_reward_min: 181.5620350593278
  episodes_this_iter: 1
  episodes_total: 685
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.891
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.452365756034851
        entropy_coeff: 0.0
        kl: 1.997494791794452e-06
        policy_loss: 0.0038588172756135464
        total_loss: 71

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 266
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 255
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-42-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 588.2596494204563
  episode_reward_min: 181.5620350593278
  episodes_this_iter: 0
  episodes_total: 693
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.633
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4407126903533936
        entropy_coeff: 0.0
        kl: 3.907608970621368e-06
        policy_loss: 0.00018505763728171587
        total_loss: 

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 240
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 226
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 237
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 224
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-42-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 585.4494856600112
  episode_reward_min: 181.5620350593278
  episodes_this_iter: 3
  episodes_total: 703
  experiment_id: 0227ad3d84eb4f9da7b6d7d2

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-43-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 586.9448413775974
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 0
  episodes_total: 709
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.966
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4494105577468872
        entropy_coeff: 0.0
        kl: 9.847879312019359e-08
        policy_loss: 0.0013699167175218463
        total_loss: 47.19916534423828
        vf_explained_var: 0.014732301235198975
        vf_loss: 47.197792053222656
    load_time_ms: 1.007
    num_steps_sampled: 1781000
    num_steps_trained: 1712500
    sample_time_ms: 3047.273
    update_time_ms: 2.924
  iterations_since_restore: 685
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 268
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 234
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 247
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-43-30
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 590.5473195339654
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 0
  episodes_total: 719
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.831
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 244
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 245
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-43-55
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 586.0800937218356
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 1
  episodes_total: 726
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.832
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4722659587860107
        entropy_coeff: 0.0
        kl: 3.887720231432468e-05
        policy_loss: -0.006441743578761816
        total_loss: 

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 222
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-44-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 591.0411915162366
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 1
  episodes_total: 735
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.662
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4513003826141357
        entropy_coeff: 0.0
        kl: 1.1512756827869453e-06
        policy_loss: -0.004034173674881458
        total_loss: 52.833892822265625
        vf_explained_var: 0.0027039647102355957
        vf_loss: 52.83792495727539
    load_time_ms: 0.999
    num_steps_sampled: 1843400
    num_

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 252
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 225
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 256
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-44-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 596.9746536637092
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 1
  episodes_total: 744
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.863
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 246
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 239
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 254
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-45-09
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 593.3701024585479
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 2
  episodes_total: 752
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 20.168
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-45-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 579.9880660792297
  episode_reward_min: 166.18092327538963
  episodes_this_iter: 0
  episodes_total: 759
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.829
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4445997476577759
        entropy_coeff: 0.0
        kl: 1.3299227248353418e-06
        policy_loss: 0.007813814096152782
        total_loss: 47.53923416137695
        vf_explained_var: 0.013983309268951416
        vf_loss: 47.531402587890625
    load_time_ms: 1.059
    num_steps_sampled: 1905800
    num_steps_trained: 1832500
    sample_time_ms: 3062.46
    update_time_ms: 3.028
  iterations_since_restore: 733
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 257
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 242
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 237
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 222
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-45-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 564.455292915999
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 769
  experiment_id: 0227ad3d84eb4f9da7b6d7d23

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 269
[2m[36m(pid=26008)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-46-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 557.124248918019
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 776
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.446
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4454809427261353
        entropy_coeff: 0.0
        kl: 1.3970375221106224e-06
        policy_loss: -0.0019358398858457804
        total_loss: 56.95645523071289
        vf_explained_var: 0.0017172694206237793
        vf_loss: 56.95839309692383
    load_time_ms: 1.021
    num_steps_sampled: 1947400
    num_st

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 268
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-46-47
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1065.7183618542192
  episode_reward_mean: 569.3748786002021
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 785
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.568
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4487985372543335
        entropy_coeff: 0.0
        kl: 1.063466015693848e-06
        policy_loss: 0.007790709845721722
        total_loss: 92.8300552368164
        vf_explained_var: 0.0030120015144348145
        vf_loss: 92.82225036621094
    load_time_ms: 1.017
    num_steps_sampled: 1968200
    num_steps

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 257
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 267
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 231
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 241
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-47-11
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 566.4869656612233
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 794
  experiment_id: 0227ad3d84eb4f9da7b6d7d23

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-47-34
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 570.6203179985139
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 0
  episodes_total: 800
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.763
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4518568515777588
        entropy_coeff: 0.0
        kl: 1.7695665519568138e-06
        policy_loss: -0.004352667834609747
        total_loss: 108.45480346679688
        vf_explained_var: 0.002406895160675049
        vf_loss: 108.45915222167969
    load_time_ms: 1.022
    num_steps_sampled: 2009800
    num_steps_trained: 1932500
    sample_time_ms: 2914.097
    update_time_ms: 3.138
  iterations_since_restore: 773
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 236
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-47-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 574.3781013155218
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 810
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.032
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4319393634796143
        entropy_coeff: 0.0
        kl: 1.078724878311732e-07
        policy_loss: 0.0028505437076091766
        total_loss: 29.712797164916992
        vf_explained_var: 0.01044541597366333
        vf_loss: 29.709949493408203
    load_time_ms: 1.049
    num_steps_sampled: 2030600
    num_step

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 223
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 247
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 226
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-48-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 554.9184726470761
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 0
  episodes_total: 818
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.581
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 261
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-48-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 546.3474784050721
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 826
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.256
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.436368703842163
        entropy_coeff: 0.0
        kl: 4.821062020710087e-07
        policy_loss: 0.0016856346046552062
        total_loss: 64.57587432861328
        vf_explained_var: 0.0047234296798706055
        vf_loss: 64.5741958618164
    load_time_ms: 0.971
    num_steps_sampled: 2072200
    num_steps_

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 249
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-49-12
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 546.6226490531454
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 1
  episodes_total: 835
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.566
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4437764883041382
        entropy_coeff: 0.0
        kl: 1.2049674751324346e-06
        policy_loss: -0.0007596589275635779
        total_loss: 88.73332214355469
        vf_explained_var: 0.004251599311828613
        vf_loss: 88.73408508300781
    load_time_ms: 1.243
    num_steps_sampled: 2093000
    num_ste

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 268
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 239
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 225
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 244
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-49-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 551.967480256529
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 0
  episodes_total: 844
  experiment_id: 0227ad3d84eb4f9da7b6d7d230

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 251
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 269
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 234
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-50-00
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 546.3141103808077
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 3
  episodes_total: 852
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.784
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-50-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 549.4199836039231
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 0
  episodes_total: 859
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.817
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4547549486160278
        entropy_coeff: 0.0
        kl: 3.7232637168926885e-06
        policy_loss: -6.396093522198498e-05
        total_loss: 69.86652374267578
        vf_explained_var: 0.00798046588897705
        vf_loss: 69.86659240722656
    load_time_ms: 1.033
    num_steps_sampled: 2155400
    num_steps_trained: 2072500
    sample_time_ms: 2943.514
    update_time_ms: 2.996
  iterations_since_restore: 829
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 252
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 222
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-50-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 560.2359057612924
  episode_reward_min: 154.2684189978663
  episodes_this_iter: 2
  episodes_total: 866
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 28.61
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4696091413497925
        entropy_coeff: 0.0
        kl: 9.097647307498846e-06
        policy_loss: 0.0018460266292095184
        total_loss: 85.

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 233
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-51-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 565.2775625849129
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 1
  episodes_total: 871
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.665
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4723131656646729
        entropy_coeff: 0.0
        kl: 1.8788885427056812e-05
        policy_loss: -0.0027995810378342867
        total_loss: 66.2655029296875
        vf_explained_var: 0.006810903549194336
        vf_loss: 66.26830291748047
    load_time_ms: 2.161
    num_steps_sampled: 2184000
    num_step

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 231
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 266
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 236
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 264
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-51-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 571.0999141706243
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 4
  episodes_total: 879
  experiment_id: 0227ad3d84eb4f9da7b6d7d23

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 250
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 263
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-52-22
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1092.076866493051
  episode_reward_mean: 556.3924004983744
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 1
  episodes_total: 885
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.998
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4612616300582886
        entropy_coeff: 0.0
        kl: 4.646706656785682e-06
        policy_loss: 0.004016960505396128
        total_loss: 49.

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 231
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 234
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 240
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 239
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-52-59
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.42198971329
  episode_reward_mean: 538.2794185577345
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 2
  episodes_total: 894
  experiment_id: 0227ad3d84eb4f9da7b6d7d230f

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 238
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-53-36
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.42198971329
  episode_reward_mean: 529.0997429798966
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 0
  episodes_total: 900
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 43.479
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4796706438064575
        entropy_coeff: 0.0
        kl: 1.0391139767307322e-05
        policy_loss: -0.002339128404855728
        total_loss: 73.27437591552734
        vf_explained_var: 0.008684396743774414
        vf_loss: 73.27671813964844
    load_time_ms: 1.826
    num_steps_sampled: 2259400
    num_steps_

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 232
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 256
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 268
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-54-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 529.2374120708265
  episode_reward_min: 222.2270392332099
  episodes_this_iter: 3
  episodes_total: 909
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 41.994
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 225
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-54-42
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 535.0497073120238
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 1
  episodes_total: 915
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.062
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.475329875946045
        entropy_coeff: 0.0
        kl: 7.710576142017089e-07
        policy_loss: -0.0070840646512806416
        total_loss: 55.29148864746094
        vf_explained_var: 0.0052032470703125
        vf_loss: 55.29857635498047
    load_time_ms: 2.1
    num_steps_sampled: 2295800
    num_steps_tr

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 252
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 240
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-55-14
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 531.6034682654843
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 2
  episodes_total: 923
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 52.717
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4624446630477905
        entropy_coeff: 0.0
        kl: 2.067089077684159e-08
        policy_loss: -0.007774475030601025
        total_loss: 3

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-55-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 535.8596441664117
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 929
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 66.243
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4570392370224
        entropy_coeff: 0.0
        kl: 6.695032084280683e-07
        policy_loss: 0.0007311344961635768
        total_loss: 27.675203323364258
        vf_explained_var: 0.009439706802368164
        vf_loss: 27.674474716186523
    load_time_ms: 2.437
    num_steps_sampled: 2329600
    num_steps_trained: 2240000
    sample_time_ms: 4987.685
    update_time_ms: 12.791
  iterations_since_restore: 896
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-56-17
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 522.6561855060929
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 934
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.459
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.487586498260498
        entropy_coeff: 0.0
        kl: 4.402160811878275e-06
        policy_loss: -0.00012686531408689916
        total_loss: 99.04553985595703
        vf_explained_var: 0.0002987980842590332
        vf_loss: 99.04568481445312
    load_time_ms: 1.863
    num_steps_sampled: 2345200
    num_steps_trained: 2255000
    sample_time_ms: 4795.536
    update_time_ms: 10.757
  iterations_since_restore: 902
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 227
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 262
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 243
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-56-48
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 519.9008530471881
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 3
  episodes_total: 943
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.145
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-57-25
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 512.3082799827229
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 949
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 49.902
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4630309343338013
        entropy_coeff: 0.0
        kl: 6.224417575140251e-06
        policy_loss: 0.008299537003040314
        total_loss: 59.817054748535156
        vf_explained_var: 0.005020856857299805
        vf_loss: 59.80875778198242
    load_time_ms: 2.003
    num_steps_sampled: 2379000
    num_steps_trained: 2287500
    sample_time_ms: 5146.973
    update_time_ms: 9.959
  iterations_since_restore: 915
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-57-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 512.2427827297993
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 954
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.168
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4634253978729248
        entropy_coeff: 0.0
        kl: 1.506853095634142e-06
        policy_loss: -0.004430784378200769
        total_loss: 74.28563690185547
        vf_explained_var: 0.001753687858581543
        vf_loss: 74.2900619506836
    load_time_ms: 2.263
    num_steps_sampled: 2392000
    num_steps_trained: 2300000
    sample_time_ms: 5185.358
    update_time_ms: 9.427
  iterations_since_restore: 920
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_poli

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-58-20
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 521.3326338175866
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 959
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.145
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4656790494918823
        entropy_coeff: 0.0
        kl: 1.964878947546822e-06
        policy_loss: -0.0074799200519919395
        total_loss: 49.08163070678711
        vf_explained_var: 0.0026724934577941895
        vf_loss: 49.08910369873047
    load_time_ms: 2.214
    num_steps_sampled: 2407600
    num_steps_trained: 2315000
    sample_time_ms: 4906.21
    update_time_ms: 9.826
  iterations_since_restore: 926
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 256
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 261
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 223
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 241
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 225
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-58-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  e

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-59-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 507.27213452430266
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 974
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 55.41
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.454683542251587
        entropy_coeff: 0.0
        kl: 2.531290022034227e-07
        policy_loss: -0.001128487172536552
        total_loss: 40.16108703613281
        vf_explained_var: 0.0023383498191833496
        vf_loss: 40.1622200012207
    load_time_ms: 2.503
    num_steps_sampled: 2441400
    num_steps_trained: 2347500
    sample_time_ms: 4892.533
    update_time_ms: 10.96
  iterations_since_restore: 939
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_poli

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_18-59-58
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 506.0299580695004
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 979
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.806
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4616079330444336
        entropy_coeff: 0.0
        kl: 2.363681801398343e-07
        policy_loss: -0.0070132422260940075
        total_loss: 34.11901092529297
        vf_explained_var: 0.005145907402038574
        vf_loss: 34.12602996826172
    load_time_ms: 2.245
    num_steps_sampled: 2457000
    num_steps_trained: 2362500
    sample_time_ms: 5078.785
    update_time_ms: 10.555
  iterations_since_restore: 945
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 262
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 261
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 233
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-00-29
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 982.2050207458504
  episode_reward_mean: 498.9683380533624
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 3
  episodes_total: 987
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.192
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 235
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-01-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.6535183330809
  episode_reward_mean: 498.02852326448766
  episode_reward_min: 160.77137044531065
  episodes_this_iter: 0
  episodes_total: 994
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.711
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4538400173187256
        entropy_coeff: 0.0
        kl: 5.04660590650019e-07
        policy_loss: -0.0007401317125186324
        total_loss: 41.400115966796875
        vf_explained_var: 0.007057070732116699
        vf_loss: 41.400856018066406
    load_time_ms: 2.322
    num_steps_sampled: 2490800
    num_s

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 257
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-01-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.6535183330809
  episode_reward_mean: 500.47863679847194
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 1
  episodes_total: 1000
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 50.098
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.477843999862671
        entropy_coeff: 0.0
        kl: 2.069354110290078e-07
        policy_loss: -0.0022425903007388115
        total_loss: 62.28451919555664
        vf_explained_var: 0.0015954971313476562
        vf_loss: 62.28676986694336
    load_time_ms: 2.544
    num_steps_sampled: 2509000
    num_s

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 260
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 257
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 250
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-02-08
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.6535183330809
  episode_reward_mean: 482.54543269927467
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 3
  episodes_total: 1009
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.232
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 252
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 220
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 239
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-02-33
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.6535183330809
  episode_reward_mean: 484.2471859661588
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 3
  episodes_total: 1014
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 46.822
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 228
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-03-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 919.6535183330809
  episode_reward_mean: 484.8171165660219
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 0
  episodes_total: 1019
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.774
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4618752002716064
        entropy_coeff: 0.0
        kl: 2.5172232653858373e-07
        policy_loss: -0.007302435114979744
        total_loss: 35.06182098388672
        vf_explained_var: 0.004560351371765137
        vf_loss: 35.069129943847656
    load_time_ms: 2.238
    num_steps_sampled: 2553200
    num_s

[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 263
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 247
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-03-40
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 886.6247928152688
  episode_reward_mean: 484.3421903775717
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 2
  episodes_total: 1026
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 56.279
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4740577936172485
        entropy_coeff: 0.0
        kl: 3.052640067835455e-06
        policy_loss: -0.0046426765620708466
        total_loss:

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-04-18
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 886.6247928152688
  episode_reward_mean: 494.6209776537586
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 0
  episodes_total: 1034
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.653
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4611786603927612
        entropy_coeff: 0.0
        kl: 2.1331072730390588e-06
        policy_loss: 0.0078694187104702
        total_loss: 27.451318740844727
        vf_explained_var: 0.004077315330505371
        vf_loss: 27.443452835083008
    load_time_ms: 2.2
    num_steps_sampled: 2592200
    num_steps_trained: 2492500
    sample_time_ms: 4810.608
    update_time_ms: 10.458
  iterations_since_restore: 997
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_pol

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 244
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 236
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 237
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-04-53
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 860.5125826951629
  episode_reward_mean: 486.92340998512987
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 2
  episodes_total: 1042
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 55.575
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 249
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 225
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 235
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-05-24
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 860.5125826951629
  episode_reward_mean: 490.70432841663626
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 3
  episodes_total: 1049
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 55.681
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=26008)[0m 
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26008)[0m ring length: 260
[2m[36m(pid=26008)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 257
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 265
[2m[36m(pid=26007)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-05-56
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 860.5125826951629
  episode_reward_mean: 481.78721765764743
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 1
  episodes_total: 1055
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.357
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 244
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-06-31
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 884.3356614709331
  episode_reward_mean: 470.9206947263417
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 1
  episodes_total: 1061
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 62.179
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4791516065597534
        entropy_coeff: 0.0
        kl: 6.927728577466041e-07
        policy_loss: -0.005036517512053251
        total_loss: 65.89998626708984
        vf_explained_var: 0.00048792362213134766
        vf_loss: 65.90502166748047
    load_time_ms: 2.814
    num_steps_sampled: 2659800
    num_s

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 256
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 258
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26005)[0m 
[2m[36m(pid=26005)[0m -----------------------
[2m[36m(pid=26005)[0m ring length: 248
[2m[36m(pid=26005)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-07-05
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 483.5383163272789
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 3
  episodes_total: 1070
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 44.239
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=26007)[0m 
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26007)[0m ring length: 267
[2m[36m(pid=26007)[0m -----------------------
[2m[36m(pid=26006)[0m 
[2m[36m(pid=26006)[0m -----------------------
[2m[36m(pid=26006)[0m ring length: 248
[2m[36m(pid=26006)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-07-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 490.3574359486194
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 0
  episodes_total: 1075
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.441
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4545080661773682
        entropy_coeff: 0.0
        kl: 2.033138343904284e-06
        policy_loss: 0.0013847054215148091
        total_loss: 

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-08-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 493.3273013492686
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 0
  episodes_total: 1080
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.038
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4717575311660767
        entropy_coeff: 0.0
        kl: 6.15882868260087e-07
        policy_loss: -0.006254149135202169
        total_loss: 58.7424201965332
        vf_explained_var: 0.0017327666282653809
        vf_loss: 58.748695373535156
    load_time_ms: 2.169
    num_steps_sampled: 2709200
    num_steps_trained: 2605000
    sample_time_ms: 5170.159
    update_time_ms: 10.191
  iterations_since_restore: 1042
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_

[2m[36m(pid=26004)[0m 
[2m[36m(pid=26004)[0m -----------------------
[2m[36m(pid=26004)[0m ring length: 225
[2m[36m(pid=26004)[0m -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-08-35
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 509.34831404164436
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 1
  episodes_total: 1086
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.912
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4768149852752686
        entropy_coeff: 0.0
        kl: 5.914855137234554e-06
        policy_loss: 0.0008180229342542589
        total_loss: 60.18553161621094
        vf_explained_var: 0.0016716718673706055
        vf_loss: 60.18471145629883
    load_time_ms: 2.275
    num_steps_sampled: 2722200
    num_s

(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 262
(pid=26005) -----------------------
(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 248
(pid=26008) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 229
(pid=26007) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-09-06
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 517.0484926392343
  episode_reward_min: 158.68399777397497
  episodes_this_iter: 3
  episodes_total: 1094
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 42.57
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-09-39
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 514.8490380765403
  episode_reward_min: 221.34208940978556
  episodes_this_iter: 0
  episodes_total: 1099
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.797
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.462918758392334
        entropy_coeff: 0.0
        kl: 6.6292523115407676e-06
        policy_loss: 0.0036004860885441303
        total_loss: 28.372106552124023
        vf_explained_var: 0.0039196014404296875
        vf_loss: 28.368499755859375
    load_time_ms: 1.838
    num_steps_sampled: 2756000
    num_steps_trained: 2650000
    sample_time_ms: 4762.468
    update_time_ms: 9.021
  iterations_since_restore: 1060
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 220
(pid=26005) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 256
(pid=26007) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-10-15
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 515.8016345849051
  episode_reward_min: 221.34208940978556
  episodes_this_iter: 2
  episodes_total: 1108
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 53.208
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4633792638778687
        entropy_coeff: 0.0
        kl: 1.192069021271891e-06
        policy_loss: 0.0051371073350310326
        total_loss: 

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-10-45
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 509.55486187321725
  episode_reward_min: 221.34208940978556
  episodes_this_iter: 0
  episodes_total: 1114
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 54.829
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4498661756515503
        entropy_coeff: 0.0
        kl: 8.182525590427758e-08
        policy_loss: -0.0038257979322224855
        total_loss: 22.701953887939453
        vf_explained_var: 0.005640268325805664
        vf_loss: 22.70578384399414
    load_time_ms: 2.421
    num_steps_sampled: 2789800
    num_steps_trained: 2682500
    sample_time_ms: 4855.298
    update_time_ms: 9.896
  iterations_since_restore: 1073
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-11-15
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 507.2827189782731
  episode_reward_min: 229.4382216915332
  episodes_this_iter: 0
  episodes_total: 1119
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 56.492
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4556206464767456
        entropy_coeff: 0.0
        kl: 6.407737487279519e-07
        policy_loss: -0.003947692923247814
        total_loss: 28.51411247253418
        vf_explained_var: 0.002434968948364258
        vf_loss: 28.51805877685547
    load_time_ms: 2.446
    num_steps_sampled: 2805400
    num_steps_trained: 2697500
    sample_time_ms: 4977.813
    update_time_ms: 10.376
  iterations_since_restore: 1079
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_p

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-11-41
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 498.3240792356711
  episode_reward_min: 229.4382216915332
  episodes_this_iter: 0
  episodes_total: 1124
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 60.915
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.459098219871521
        entropy_coeff: 0.0
        kl: 1.1163712088091415e-06
        policy_loss: -0.0026378303300589323
        total_loss: 36.006103515625
        vf_explained_var: 0.002684295177459717
        vf_loss: 36.00874328613281
    load_time_ms: 2.533
    num_steps_sampled: 2818400
    num_steps_trained: 2710000
    sample_time_ms: 4965.855
    update_time_ms: 11.029
  iterations_since_restore: 1084
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off_po

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-12-02
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 500.60566125313284
  episode_reward_min: 206.60878524365535
  episodes_this_iter: 0
  episodes_total: 1129
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 59.927
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.453578233718872
        entropy_coeff: 0.0
        kl: 7.0095063087194376e-09
        policy_loss: 0.0013931564753875136
        total_loss: 22.361312866210938
        vf_explained_var: 0.002789914608001709
        vf_loss: 22.35991668701172
    load_time_ms: 2.599
    num_steps_sampled: 2828800
    num_steps_trained: 2720000
    sample_time_ms: 5052.557
    update_time_ms: 10.737
  iterations_since_restore: 1088
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  of

(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 224
(pid=26007) -----------------------
(pid=26004)
(pid=26004) -----------------------
(pid=26004) ring length: 233
(pid=26004) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-12-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 483.5906785013027
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 2
  episodes_total: 1136
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 59.741
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4667267799377441
        entropy_coeff: 0.0
        kl: 1.6900301602618129e-07
        policy_loss: -0.0019981125369668007
        total_loss:

(pid=26004)
(pid=26004) -----------------------
(pid=26004) ring length: 246
(pid=26004) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 230
(pid=26007) -----------------------
(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 260
(pid=26008) -----------------------
(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 260
(pid=26006) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-13-03
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 487.47450965349685
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 4
  episodes_total: 1143
  experiment_id: 0227ad3d84eb4f9da7b6d7d

(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 269
(pid=26005) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-13-37
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 486.8769024530631
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 0
  episodes_total: 1149
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 48.275
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4577770233154297
        entropy_coeff: 0.0
        kl: 1.6273021401502774e-06
        policy_loss: -0.0018339782254770398
        total_loss: 42.886695861816406
        vf_explained_var: 0.0008455514907836914
        vf_loss: 42.88853454589844
    load_time_ms: 2.198
    num_steps_sampled: 2878200
    num_

(pid=26004)
(pid=26004) -----------------------
(pid=26004) ring length: 227
(pid=26004) -----------------------
(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 257
(pid=26008) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-14-13
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 485.2526390961122
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 2
  episodes_total: 1156
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 57.011
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4625846147537231
        entropy_coeff: 0.0
        kl: 7.347106816268933e-07
        policy_loss: -0.0015849132323637605
        total_loss: 

(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 239
(pid=26008) -----------------------
(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 253
(pid=26005) -----------------------
(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 229
(pid=26006) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 251
(pid=26007) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-14-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 925.9889134036155
  episode_reward_mean: 489.07994700659214
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 4
  episodes_total: 1164
  experiment_id: 0227ad3d84eb4f9da7b6d7d

(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 251
(pid=26007) -----------------------
(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 267
(pid=26005) -----------------------
(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 255
(pid=26006) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-15-11
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 493.50351841718395
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 3
  episodes_total: 1169
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 59.461
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-15-44
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 492.49119496874204
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 0
  episodes_total: 1174
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 45.977
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.456620693206787
        entropy_coeff: 0.0
        kl: 4.016208549728617e-06
        policy_loss: -0.0002871963370125741
        total_loss: 47.87030792236328
        vf_explained_var: 0.002409636974334717
        vf_loss: 47.87059020996094
    load_time_ms: 2.096
    num_steps_sampled: 2943200
    num_steps_trained: 2830000
    sample_time_ms: 4854.267
    update_time_ms: 10.282
  iterations_since_restore: 1132
  node_ip: 192.168.100.38
  num_healthy_workers: 5
  off

(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 242
(pid=26005) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 251
(pid=26007) -----------------------
(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 229
(pid=26006) -----------------------
(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 255
(pid=26008) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-16-21
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 479.6972252324823
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 2
  episodes_total: 1184
  experiment_id: 0227ad3d84eb4f9da7b6d7d

(pid=26004)
(pid=26004) -----------------------
(pid=26004) ring length: 237
(pid=26004) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-16-51
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 472.40898511954055
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 1
  episodes_total: 1190
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 51.366
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.449819564819336
        entropy_coeff: 0.0
        kl: 4.381346570880851e-06
        policy_loss: -0.003025740385055542
        total_loss: 50.88562774658203
        vf_explained_var: 0.0004588961601257324
        vf_loss: 50.88866424560547
    load_time_ms: 2.244
    num_steps_sampled: 2979600
    num_st

(pid=26008)
(pid=26008) -----------------------
(pid=26008) ring length: 243
(pid=26008) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-17-26
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 472.8754459912702
  episode_reward_min: 151.4130383424371
  episodes_this_iter: 1
  episodes_total: 1196
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.223
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4553993940353394
        entropy_coeff: 0.0
        kl: 4.347968115325784e-06
        policy_loss: -0.003063187701627612
        total_loss: 45.05433654785156
        vf_explained_var: 0.00037407875061035156
        vf_loss: 45.057411193847656
    load_time_ms: 2.02
    num_steps_sampled: 2997800
    num_s

(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 242
(pid=26006) -----------------------
(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 244
(pid=26007) -----------------------
(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 268
(pid=26005) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-17-55
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 473.8131959305023
  episode_reward_min: 140.2443830298491
  episodes_this_iter: 3
  episodes_total: 1204
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 39.902
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

(pid=26007)
(pid=26007) -----------------------
(pid=26007) ring length: 220
(pid=26007) -----------------------
(pid=26006)
(pid=26006) -----------------------
(pid=26006) ring length: 257
(pid=26006) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-18-23
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 469.54169434684974
  episode_reward_min: 140.2443830298491
  episodes_this_iter: 0
  episodes_total: 1209
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 38.579
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.443091630935669
        entropy_coeff: 0.0
        kl: 5.369901714402658e-07
        policy_loss: 0.0017986358143389225
        total_loss: 

(pid=26004)
(pid=26004) -----------------------
(pid=26004) ring length: 255
(pid=26004) -----------------------
(pid=26005)
(pid=26005) -----------------------
(pid=26005) ring length: 227
(pid=26005) -----------------------
Result for PPO_EnergyOptSPDEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-30_19-18-57
  done: false
  episode_len_mean: 2500.0
  episode_reward_max: 1035.1014877303473
  episode_reward_mean: 474.64640647136594
  episode_reward_min: 140.2443830298491
  episodes_this_iter: 1
  episodes_total: 1216
  experiment_id: 0227ad3d84eb4f9da7b6d7d230fb1c49
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 47.66
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.4475589990615845
        entropy_coeff: 0.0
        kl: 5.376339231588645e-07
        policy_loss: 0.008351102471351624
        total_loss: 3

2020-07-30 19:19:03,582	ERROR import_thread.py:89 -- ImportThread: Connection closed by server.
2020-07-30 19:19:03,584	ERROR worker.py:1716 -- listen_error_messages_raylet: Connection closed by server.
2020-07-30 19:19:03,585	ERROR worker.py:1616 -- print_logs: Connection closed by server.


KeyboardInterrupt: 

### 4.5 Visualizing the results

The simulation results are saved in the `ray_results/training_example` directory (`training_example` is the name we defined at the start of this tutorial). By default, the `ray_results` folder is located in your home directory, at `~/ray_results`.

To visualize the metrics logged during training, install TensorBoard with `pip install tensorboard` and run `tensorboard --logdir=~/ray_results/training_example`.

For more instructions about visualizing, please see `tutorial05_visualize.ipynb`. 
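
If you would rather inspect the numbers directly than open TensorBoard, the sketch below reads the per-iteration metrics from disk. It assumes that each trial folder inside the experiment directory contains a `progress.csv` file with an `episode_reward_mean` column (the same metric printed in the training log above); the helper name `load_reward_curve` is hypothetical and is not part of Flow or RLlib.

```python
import csv
import glob
import os


def load_reward_curve(experiment_dir, column="episode_reward_mean"):
    """Read one metric column from every <trial>/progress.csv file.

    Returns a dict mapping each trial folder name to a list of floats.
    Assumes the layout <experiment_dir>/<trial>/progress.csv.
    """
    experiment_dir = os.path.expanduser(experiment_dir)
    pattern = os.path.join(experiment_dir, "*", "progress.csv")
    curves = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        # Skip iterations where no episode had finished yet (empty or nan).
        values = [
            float(row[column])
            for row in rows
            if row.get(column) not in (None, "", "nan")
        ]
        curves[os.path.basename(os.path.dirname(path))] = values
    return curves


if __name__ == "__main__":
    results = load_reward_curve("~/ray_results/training_example")
    for trial, rewards in results.items():
        if rewards:
            print(trial, "iterations:", len(rewards), "last:", rewards[-1])
```

If the directory does not exist or holds no trials, the function returns an empty dictionary, so it is safe to call before any training has run.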

### 4.6 Restart from a checkpoint / Transfer learning

If you wish to do transfer learning, or to resume a previous training run, you will need to start the simulation from a previous checkpoint. To do so, add a `"restore"` entry to the experiment dictionary passed to `run_experiments`, as follows:

```python
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "restore": "/ray_results/experiment/dir/checkpoint_50/checkpoint-50",
        "checkpoint_freq": 1,
        "checkpoint_at_end": True,
        "max_failures": 999,
        "stop": {
            "training_iteration": 1,
        },
    },
})
```

The `"restore"` path must point to a checkpoint file whose accompanying metadata file, `[restore].tune_metadata`, exists.

There is also a `"resume"` parameter that you can set to `True` if you only want to continue training the same experiment from its last saved checkpoint.
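
To avoid typing the checkpoint path by hand, you can look it up. The following is a minimal sketch that assumes checkpoints follow the `checkpoint_<N>/checkpoint-<N>` layout seen in the paths used in this section; `latest_checkpoint` is a hypothetical helper, not a Flow or Ray function.

```python
import os
import re


def latest_checkpoint(trial_dir):
    """Return the checkpoint file with the highest iteration, or None.

    Assumes the layout <trial_dir>/checkpoint_<N>/checkpoint-<N>.
    """
    trial_dir = os.path.expanduser(trial_dir)
    if not os.path.isdir(trial_dir):
        return None
    best_iter, best_path = -1, None
    for name in os.listdir(trial_dir):
        match = re.fullmatch(r"checkpoint_(\d+)", name)
        if match is None:
            continue
        iteration = int(match.group(1))
        candidate = os.path.join(
            trial_dir, name, "checkpoint-{}".format(iteration))
        # Ignore checkpoint folders whose checkpoint file is missing.
        if os.path.isfile(candidate) and iteration > best_iter:
            best_iter, best_path = iteration, candidate
    return best_path
```

The returned string can then be used as the value of `"restore"`; when the function returns `None`, there is nothing to restore from and the entry should be left out.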

In [None]:
# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "/ray_results/training_example13/PPO_EnergyOptPOEnv-v0_0_2020-07-23_13-30-07yze28sum/checkpoint_400/checkpoint-400", 
#         "checkpoint_freq": 20,
#         "checkpoint_at_end": True,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 700,
#         },
#     },
# })

In [None]:
from flow.core.vehicles import Vehicles