# Tutorial 03: Running RLlib Experiments

This tutorial walks you through the process of running traffic simulations in Flow with trainable RLlib-powered agents. Autonomous agents will learn to maximize a certain reward over the rollouts, using the [**RLlib**](https://ray.readthedocs.io/en/latest/rllib.html) library ([citation](https://arxiv.org/abs/1712.09381)) ([installation instructions](https://flow.readthedocs.io/en/latest/flow_setup.html#optional-install-ray-rllib)). Simulations of this form will depict the propensity of RL agents to influence the traffic of a human fleet in order to make the whole fleet more efficient (for some given metrics). 

In this tutorial, we simulate an initially perturbed single-lane ring road into which we introduce a single autonomous vehicle. We observe that, after some training, the autonomous vehicle learns to dissipate the "phantom jams" that form and propagate when only human driver dynamics are involved.

## 1. Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *network* and an *environment*. Networks describe the features of the transportation network used in simulation. This includes the positions and properties of the nodes and edges constituting the lanes and junctions, as well as the properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the network. Moreover, custom environments may be used to modify the dynamical features of a network. Finally, in the RL case, it is in the *environment* that the state/action spaces and the reward function are defined.
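
To make the division of labour described above concrete, here is a minimal sketch in plain Python. `ToyRingEnv` is a hypothetical stand-in, not a Flow class: it only illustrates how an environment can hold the network geometry, expose `reset` and `step`, bound the actions, and define a reward. The dynamics and the reward are illustrative assumptions.

```python
import random


class ToyRingEnv:
    """Hypothetical stand-in for a Flow environment (not Flow's API)."""

    def __init__(self, ring_length=230.0, horizon=5, sim_step=0.1, max_accel=1.0):
        # "Network"-like data: the geometry the vehicle moves on.
        self.ring_length = ring_length
        # "Environment"-like data: episode length, step size, action bounds.
        self.horizon = horizon
        self.sim_step = sim_step
        self.max_accel = max_accel
        self.reset()

    def reset(self):
        """Start a new rollout and return the initial observation."""
        self.t = 0
        self.pos = random.uniform(0.0, self.ring_length)
        self.speed = 0.0
        return (self.pos, self.speed)

    def step(self, accel):
        """Advance one simulation step with a bounded acceleration action."""
        accel = max(-self.max_accel, min(self.max_accel, accel))
        self.speed = max(0.0, self.speed + accel * self.sim_step)
        self.pos = (self.pos + self.speed * self.sim_step) % self.ring_length
        self.t += 1
        reward = self.speed  # toy reward: faster is better
        done = self.t >= self.horizon
        return (self.pos, self.speed), reward, done, {}


# One rollout with a constant "accelerate" action.
env = ToyRingEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, _ = env.step(1.0)
    total_reward += reward
```

The state/action spaces and the reward live entirely inside the environment class, which is the property the rest of this tutorial relies on.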

## 2. Setting up a Network
Flow contains a plethora of pre-designed networks used to replicate highways, intersections, and merges in both closed and open settings. All these networks are located in flow/networks. For this tutorial, which involves a single lane ring road, we will use the network `RingNetwork`.

### 2.1 Setting up Network Parameters

The network mentioned at the start of this section, like all other networks in Flow, is parameterized by the following arguments:
* name
* vehicles
* net_params
* initial_config

These parameters are explained in detail in `tutorial01_sumo.ipynb`. Moreover, all parameters excluding vehicles (covered in section 2.2) do not change from the previous tutorial. Accordingly, we specify them nearly as we have before, and leave further explanations of the parameters to `tutorial01_sumo.ipynb`.

We begin by choosing the network the experiment will be trained on. We use one of Flow's builtin networks, located in `flow.networks`. A list of all available networks can be found by running the script below.

In [1]:
import flow.networks as networks

print(networks.__all__)

In this tutorial, we choose to use the ring road network. The network class is then:

In [2]:
from flow.networks import RingNetwork

# ring road network class
network_name = RingNetwork

One key difference between SUMO and RLlib experiments is that, in RLlib experiments, the network classes do not need to be defined; instead users should simply name the network class they wish to use. Later on, an environment setup module will import the correct network class based on the provided names.

In [3]:
# input parameter classes to the network class
from flow.core.params import NetParams, InitialConfig

# name of the network
name = "training_example15"

# network-specific parameters
from flow.networks.ring import ADDITIONAL_NET_PARAMS
net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)

# initial configuration to vehicles
initial_config = InitialConfig(spacing="uniform", perturbation=1)

### 2.2 Adding Trainable Autonomous Vehicles
The `Vehicles` class stores state information on all vehicles in the network. This class is used to identify the dynamical features of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various `get` methods within this class.

The dynamics of vehicles in the `Vehicles` class can be defined either by SUMO or by the dynamical models located in `flow/controllers`. For human-driven vehicles, we use the IDM model for acceleration behavior, with exogenous Gaussian acceleration noise with a standard deviation of 0.2 m/s² to induce perturbations that produce stop-and-go behavior. In addition, we use the `ContinuousRouter` routing controller so that the vehicles maintain their routes in closed networks.
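
As a rough illustration of the car-following behavior mentioned above, the sketch below implements the standard Intelligent Driver Model (IDM) acceleration equation with optional Gaussian noise. It is not Flow's `IDMController`; the function name `idm_accel` and all default parameter values (`v0`, `T`, `a`, `b`, `delta`, `s0`) are assumptions chosen for illustration.

```python
import math
import random


def idm_accel(v, lead_v, headway, v0=30.0, T=1.0, a=1.0, b=1.5,
              delta=4, s0=2.0, noise_std=0.0, rng=random):
    """IDM acceleration (m/s^2) with optional Gaussian noise.

    All default parameter values are illustrative assumptions.
    """
    # Desired gap grows with speed and with the closing rate to the leader.
    s_star = s0 + max(0.0, v * T + v * (v - lead_v) / (2.0 * math.sqrt(a * b)))
    accel = a * (1.0 - (v / v0) ** delta - (s_star / headway) ** 2)
    if noise_std > 0:
        # Exogenous perturbation, e.g. a standard deviation of 0.2 m/s^2.
        accel += rng.gauss(0.0, noise_std)
    return accel


rng = random.Random(0)
quiet = idm_accel(10.0, 10.0, 20.0)
noisy = idm_accel(10.0, 10.0, 20.0, noise_std=0.2, rng=rng)
```

With a large headway the model accelerates toward the desired speed `v0`; with a short headway the squared gap term dominates and the vehicle brakes, which is how small perturbations can be amplified into stop-and-go waves.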

As we have done in `tutorial01_sumo.ipynb`, human-driven vehicles are defined in the `VehicleParams` class as follows:

In [4]:
# vehicles class
from flow.core.params import VehicleParams

# vehicles dynamics models
from flow.controllers import IDMController, ContinuousRouter

vehicles = VehicleParams()
#vehicles.add("human",
#             acceleration_controller=(IDMController, {}),
#             routing_controller=(ContinuousRouter, {}),
#             num_vehicles=10)

The above addition to the `Vehicles` class is meant to account for 21 of the 22 vehicles that are placed in the network (note that the call in the cell above is commented out and sets `num_vehicles=10`). We now add an additional trainable autonomous vehicle whose actions are dictated by an RL agent. This is done by specifying an `RLController` as the acceleration controller of the vehicle.

In [5]:
from flow.controllers import RLController

Note that this controller serves primarily as a placeholder that marks the vehicle as a component of the RL agent, meaning that lane changing and routing actions can also be specified by the RL agent for this vehicle.

We finally add the vehicle as follows, while again using the `ContinuousRouter` to perpetually maintain the vehicle within the network.

In [6]:
# from flow.energy_models.toyota_energy import TacomaEnergy
# vehicles.add(veh_id="rl",
#              acceleration_controller=(RLController, {}),
#              routing_controller=(ContinuousRouter, {}),
#              initial_speed =20,
#              energy_model = TacomaEnergy,
#              num_vehicles=1)


vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             initial_speed=20,
             num_vehicles=1)

## 3. Setting up an Environment

Several environments in Flow exist to train RL agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. Using an environment lets us specify the state/action spaces and view the cumulative reward that simulation rollouts receive.

SUMO environments in Flow are parametrized by three components:
* `SumoParams`
* `EnvParams`
* `Network`

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of any simulation step and whether to render the GUI when running the experiment. For this example, we consider a simulation step length of 0.1s and deactivate the GUI. 

**Note:** For training purposes, it is highly recommended to deactivate the GUI to avoid slowing down the simulation. To do so, simply specify `render=False`.

In [7]:
from flow.core.params import SumoParams

sim_params = SumoParams(sim_step=0.1, render=False)

### 3.2 EnvParams

`EnvParams` specifies environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. For the environment `WaveAttenuationPOEnv`, these parameters are used to dictate bounds on the accelerations of the autonomous vehicles, as well as the range of ring lengths (and accordingly network densities) the agent is trained on.

Finally, it is important to specify here the *horizon* of the experiment, which is the duration of one episode in simulation steps (during which the RL agent acquires data).

In [8]:
from flow.core.params import EnvParams

# Define horizon as a variable to ensure consistent use across notebook
HORIZON = 1000

env_params = EnvParams(
    # length of one rollout
    horizon=HORIZON,

    additional_params={
        # maximum acceleration of autonomous vehicles
        "max_accel": 1,
        # maximum deceleration of autonomous vehicles
        "max_decel": 1,
        # bounds on the ranges of ring road lengths the autonomous vehicle 
        # is trained on
        "ring_length": [220, 270],
    },
)
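
The following sketch works through what these settings imply, using only values that appear in this tutorial. Two parts are assumptions rather than documented behavior: that raw policy outputs are clamped to the acceleration bounds (`clip_accel` is a hypothetical helper), and that an integer ring length is drawn from the given range for each rollout (`sample_ring_length` is likewise hypothetical).

```python
import random

HORIZON = 1000                  # steps per rollout, as in this tutorial
SIM_STEP = 0.1                  # seconds per step, as passed to SumoParams
RING_LENGTH_RANGE = (220, 270)  # "ring_length" bounds from additional_params
MAX_ACCEL, MAX_DECEL = 1.0, 1.0

# Simulated time covered by one rollout, in seconds.
episode_seconds = HORIZON * SIM_STEP


def clip_accel(action):
    """Clamp a raw policy output to [-MAX_DECEL, MAX_ACCEL] (assumed behavior)."""
    return max(-MAX_DECEL, min(MAX_ACCEL, action))


def sample_ring_length(rng=random):
    """Pick an integer ring length within the bounds (assumed behavior)."""
    low, high = RING_LENGTH_RANGE
    return rng.randint(low, high)


length = sample_ring_length(random.Random(0))
```

Under these assumptions one rollout covers about 100 seconds of simulated traffic, and every requested acceleration stays within ±1 m/s².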

### 3.3 Initializing a Gym Environment

Now, we have to specify our Gym Environment and the algorithm that our RL agents will use. Similar to the network, we choose to use one of Flow's built-in environments, a list of which is provided by the script below.

In [9]:
import flow.envs as flowenvs

print(flowenvs.__all__)

['Env', 'AccelEnv', 'LaneChangeAccelEnv', 'LaneChangeAccelPOEnv', 'TrafficLightGridTestEnv', 'MergePOEnv', 'BottleneckEnv', 'BottleneckAccelEnv', 'WaveAttenuationEnv', 'WaveAttenuationPOEnv', 'EnergyOptEnv', 'EnergyOptPOEnv', 'TrafficLightGridEnv', 'TrafficLightGridPOEnv', 'TrafficLightGridBenchmarkEnv', 'BottleneckDesiredVelocityEnv', 'TestEnv', 'BayBridgeEnv', 'SingleStraightRoad', 'BottleNeckAccelEnv', 'DesiredVelocityEnv', 'PO_TrafficLightGridEnv', 'GreenWaveTestEnv']


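This tutorial is built around the environment `WaveAttenuationPOEnv`, which trains autonomous vehicles to attenuate the formation and propagation of waves on a partially observable, variable-density ring road. Note that the cells below select `EnergyOptPOEnv` instead and leave the `WaveAttenuationPOEnv` import commented out, so the training logs in section 4.4 refer to `EnergyOptPOEnv`. To create the Gym Environment, the only necessary parameters are the environment name plus the previously defined variables. These are defined as follows: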

In [10]:
from flow.envs import EnergyOptPOEnv

env_name = EnergyOptPOEnv

In [11]:
# from flow.envs import WaveAttenuationPOEnv

# env_name = WaveAttenuationPOEnv

### 3.4 Setting up Flow Parameters

RLlib experiments generate a `params.json` file for each experiment run. For these experiments, the parameters defining the Flow network and environment must be stored as well. As such, in this section we define the dictionary `flow_params`, which contains the variables required by the utility function `make_create_env`. `make_create_env` is a higher-order function which returns a function `create_env` that initializes a Gym environment corresponding to the Flow network specified.

In [12]:
# Creating flow_params. Make sure the dictionary keys are as specified. 
flow_params = dict(
    # name of the experiment
    exp_tag=name,
    # name of the flow environment the experiment is running on
    env_name=env_name,
    # name of the network class the experiment uses
    network=network_name,
    # simulator that is used by the experiment
    simulator='traci',
    # simulation-related parameters
    sim=sim_params,
    # environment related parameters (see flow.core.params.EnvParams)
    env=env_params,
    # network-related parameters (see flow.core.params.NetParams and
    # the network's documentation or ADDITIONAL_NET_PARAMS component)
    net=net_params,
    # vehicles to be placed in the network at the start of a rollout 
    # (see flow.core.params.VehicleParams)
    veh=vehicles,
    # (optional) parameters affecting the positioning of vehicles upon 
    # initialization/reset (see flow.core.params.InitialConfig)
    initial=initial_config
)
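
The higher-order pattern behind `make_create_env` can be sketched as follows. This is a simplified, hypothetical version (`make_create_env_sketch`, `DummyNetwork`, and `DummyEnv` are not part of Flow), and the real helper does considerably more. The `<EnvClass>-v<version>` naming is an assumption inferred from the trial name `PPO_EnergyOptPOEnv-v0_0` in the logs of section 4.4.

```python
def make_create_env_sketch(params, version=0):
    """Return (create_env, env_name) for the given parameter dict.

    Hypothetical simplification; Flow's real helper does more work.
    """
    env_cls = params["env_name"]
    network_cls = params["network"]
    env_name = "{}-v{}".format(env_cls.__name__, version)

    def create_env(*_):
        # Classes are instantiated only here, when an environment is requested.
        network = network_cls(params["exp_tag"])
        return env_cls(network)

    return create_env, env_name


class DummyNetwork:
    def __init__(self, name):
        self.name = name


class DummyEnv:
    def __init__(self, network):
        self.network = network


create_env, gym_name = make_create_env_sketch(
    {"exp_tag": "demo", "env_name": DummyEnv, "network": DummyNetwork})
env = create_env()
```

Passing classes rather than instances is what lets each worker build its own copy of the environment on demand.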

## 4. Running RL Experiments in Ray

### 4.1 Import 

First, we must import the modules required to run experiments in Ray. The `json` package is required to store the Flow experiment parameters in the `params.json` file, as is `FlowParamsEncoder`. The remaining imports are `get_agent_class`, used to look up the PPO agent class, `ray.tune`'s experiment runner `run_experiments`, and the environment helper methods `register_env` and `make_create_env`.

In [13]:
import json

import ray
# get_agent_class lives in different modules depending on the Ray release
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder

Instructions for updating:
non-resource variables are not supported in the long term


### 4.2 Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [14]:
# number of parallel workers
N_CPUS = 4
# number of rollouts per training iteration
N_ROLLOUTS = 1
#ray.shutdown()
ray.init(num_cpus=N_CPUS)

2020-07-23 18:17:23,170	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-23_18-17-23_169420_16993/logs.
2020-07-23 18:17:23,287	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:41576 to respond...
2020-07-23 18:17:23,436	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:17355 to respond...
2020-07-23 18:17:23,442	INFO services.py:809 -- Starting Redis shard with 3.3 GB max memory.
2020-07-23 18:17:23,518	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-07-23_18-17-23_169420_16993/logs.
2020-07-23 18:17:23,527	INFO services.py:1475 -- Starting the Plasma object store with 4.96 GB memory using /dev/shm.


{'node_ip_address': '192.168.100.38',
 'redis_address': '192.168.100.38:41576',
 'object_store_address': '/tmp/ray/session_2020-07-23_18-17-23_169420_16993/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-07-23_18-17-23_169420_16993/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2020-07-23_18-17-23_169420_16993'}
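
As a small worked example, the constants above translate into the following worker and batch sizes once they are combined with the configuration used in section 4.3. The interpretation of `num_sgd_iter` as the number of passes over the batch, and of one CPU being left for the trainer process, are assumptions about RLlib's behavior rather than something this tutorial states.

```python
import math

N_CPUS = 4
N_ROLLOUTS = 1
HORIZON = 1000

num_workers = N_CPUS - 1                 # assumed: one CPU left for the trainer
train_batch_size = HORIZON * N_ROLLOUTS  # environment steps per iteration
sgd_minibatch_size = min(16 * 1024, train_batch_size)
num_sgd_iter = 10

# Assumed: each SGD pass splits the batch into minibatches of this size.
minibatches_per_pass = math.ceil(train_batch_size / sgd_minibatch_size)
gradient_steps_per_iteration = num_sgd_iter * minibatches_per_pass
```

These numbers are consistent with the training output in section 4.4, which reports `num_healthy_workers: 3` and 1000 sampled steps after the first iteration.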

### 4.3 Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent is configured with the number of parallel workers, a batch size corresponding to `N_ROLLOUTS` rollouts (each of which has length `HORIZON` steps), a discount rate $\gamma$ of 0.999, two hidden layers of size 16, Generalized Advantage Estimation with a $\lambda$ of 0.97, and other parameters as set below.
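
To show how $\gamma$ and $\lambda$ enter the computation, here is a minimal pure-Python sketch of Generalized Advantage Estimation over a single trajectory. It follows the textbook recursion and is not RLlib's implementation; the function name `gae_advantages` and the toy inputs are assumptions for illustration.

```python
def gae_advantages(rewards, values, last_value, gamma=0.999, lam=0.97):
    """Generalized Advantage Estimation over one trajectory.

    rewards[t] and values[t] are per-step lists; last_value bootstraps
    the value of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
```

A $\lambda$ close to 1 weights long-horizon returns heavily (lower bias, higher variance), while a smaller $\lambda$ leans more on the value function's one-step estimates.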

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified in section 3 is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`. 

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym. 

In [15]:
# The algorithm or model to train. This may refer to the name of a
# built-in algorithm (e.g. RLlib's DQN or PPO), or a user-defined
# trainable function or class registered in the tune registry.
alg_run = "PPO"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size
config["gamma"] = 0.999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE lambda parameter
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])  # SGD minibatch size
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 10  # number of SGD iterations
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
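
The idea of mapping non-serializable objects to string representations, as `FlowParamsEncoder` does for `flow_params`, can be sketched with a custom `json.JSONEncoder`. The encoder below (`ClassNameEncoder`) and the dummy classes are hypothetical and much simpler than Flow's encoder: they only show the mechanism of overriding `default`.

```python
import json


class ClassNameEncoder(json.JSONEncoder):
    """Hypothetical encoder: classes become names, objects become dicts."""

    def default(self, obj):
        # Called by json only for objects it cannot serialize natively.
        if isinstance(obj, type):
            return obj.__name__
        if hasattr(obj, "__dict__"):
            return vars(obj)
        return super().default(obj)


class DummyNetwork:
    pass


class DummyParams:
    def __init__(self):
        self.horizon = 1000


params = {"exp_tag": "demo", "network": DummyNetwork, "env": DummyParams()}
params_json = json.dumps(params, cls=ClassNameEncoder, sort_keys=True, indent=4)
restored = json.loads(params_json)
```

Because the result is a plain string, it can be stored inside `env_config` and written to `params.json` alongside the rest of the configuration.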

### 4.4 Running Experiments

Here, we use the `run_experiments` function from `ray.tune`. The function takes a dictionary with one key, a name corresponding to the experiment, and one value, itself a dictionary containing parameters for training.
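
To get a feel for what the stopping and checkpoint settings used in this section imply, the sketch below does the arithmetic on a plain dictionary with the same keys. It assumes the run goes to completion and that each training iteration samples exactly `HORIZON * N_ROLLOUTS` steps; nothing here calls Ray.

```python
spec = {
    "run": "PPO",
    "checkpoint_freq": 20,
    "checkpoint_at_end": True,
    "stop": {"training_iteration": 700},
}
steps_per_iteration = 1000  # HORIZON * N_ROLLOUTS in this tutorial

total_iterations = spec["stop"]["training_iteration"]
# Environment steps sampled over the whole run (assumed: one batch per iteration).
total_env_steps = total_iterations * steps_per_iteration
# Checkpoints written at every multiple of checkpoint_freq.
periodic_checkpoints = total_iterations // spec["checkpoint_freq"]
```

The per-iteration figure matches the logs in this section, which report `num_steps_sampled: 113000` at `iterations_since_restore: 113`.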

In [16]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 20,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 700,  # number of iterations to stop after
        },
    },
})

2020-07-23 18:17:23,862	INFO trial_runner.py:176 -- Starting a new experiment.
2020-07-23 18:17:24,014	ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 7.8/16.5 GB





== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 7.9/16.5 GB
Result logdir: /home/solom/ray_results/training_example15
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_EnergyOptPOEnv-v0_0:	RUNNING

(pid=17036) Instructions for updating:
(pid=17036) non-resource variables are not supported in the long term
(pid=17036) 2020-07-23 18:17:28,824	INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
(pid=17036) 2020-07-23 18:17:28.826642: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=17036) 2020-07-23 18:17:28.863038: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz
(pid=17036) 2020-07-23 18:17:28.864174: I tensorflow/compiler/xla/service/servic

(pid=17034) 2020-07-23 18:17:34,999	INFO rollout_worker.py:319 -- Creating policy evaluation worker 2 on CPU (please ignore any CUDA init errors)
(pid=17034) 2020-07-23 18:17:35.002078: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=17034) 2020-07-23 18:17:35.034957: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 1999965000 Hz
(pid=17034) 2020-07-23 18:17:35.035318: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f22b8000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=17034) 2020-07-23 18:17:35.035370: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
(pid=17034) 2020-07-23 18:17:35.039243: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not lo

(pid=17033) Instructions for updating:
(pid=17033) Please use `layer.__call__` method instead.
(pid=17033) Instructions for updating:
(pid=17033) Please use `layer.__call__` method instead.
(pid=17033) Instructions for updating:
(pid=17033) Use `tf.cast` instead.
(pid=17033) Instructions for updating:
(pid=17033) Use `tf.cast` instead.
(pid=17035) 2020-07-23 18:17:35,261	INFO dynamic_tf_policy.py:324 -- Initializing loss function with dummy input:
(pid=17035)
(pid=17035) { 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(pid=17035)   'actions': <tf.Tensor 'default_policy/actions:0' shape=(?, 1) dtype=float32>,
(pid=17035)   'advantages': <tf.Tensor 'default_policy/advantages:0' shape=(?,) dtype=float32>,
(pid=17035)   'behaviour_logits': <tf.Tensor 'default_policy/behavi

(pid=17035) 2020-07-23 18:17:38,667	INFO sample_batch_builder.py:161 -- Trajectory fragment after postprocess_trajectory():
(pid=17035)
(pid=17035) { 'agent0': { 'data': { 'action_prob': np.ndarray((200,), dtype=float32, min=0.005, max=0.401, mean=0.295),
(pid=17035)                         'actions': np.ndarray((200, 1), dtype=float32, min=-2.876, max=2.908, mean=0.12),
(pid=17035)                         'advantages': np.ndarray((200,), dtype=float32, min=-0.108, max=0.0, mean=-0.077),
(pid=17035)                         'agent_index': np.ndarray((200,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=17035)                         'behaviour_logits': np.ndarray((200, 2), dtype=float32, min=-0.006, max=-0.005, mean=-0.006),
(pid=17035)                         'dones': np.ndarray((200,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(pid=17035)                         'eps_i

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-17-40
  done: false
  episode_len_mean: .nan
  episode_reward_max: .nan
  episode_reward_mean: .nan
  episode_reward_min: .nan
  episodes_this_iter: 0
  episodes_total: 0
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 435.806
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.4178375005722046
        entropy_coeff: 0.0
        kl: 2.4992763428599574e-05
        policy_loss: -0.0006492786342278123
        total_loss: 0.00418465631082654
        vf_explained_var: 0.0005594491958618164
        vf_loss: 0.004828924313187599
    load_time_ms: 64.083
    num_steps_sampled: 1000
    num_steps_trained: 1000
    sample_time_ms: 5238.806
    update_time_ms: 832.057
  iterations_since_restore: 1
  node_ip: 192.168.100.38
  num_healthy_workers: 3
  off_policy_estimator: {}
  perf:

(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 235
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 268
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 225
(pid=17035) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-18-10
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2.3628428581012884
  episode_reward_mean: -2.538489268219056
  episode_reward_min: -2.7812310064886363
  episodes_this_iter: 3
  episodes_total: 12
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.193
    learner:
      default_policy:
        cur_kl_coeff: 9.765625145519152e-05
        cur_l

(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 232
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 259
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 246
(pid=17035) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-18-43
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2.0926997608494196
  episode_reward_mean: -2.4115117555012326
  episode_reward_min: -2.7812310064886363
  episodes_this_iter: 3
  episodes_total: 24
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.703
    learner:
      default_policy:
        cur_kl_coeff: 2.3841858265427618e-08
        cur

(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 227
(pid=17035) -----------------------
(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 237
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 221
(pid=17034) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-19-14
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9531865851314698
  episode_reward_mean: -2.3003037368044748
  episode_reward_min: -2.7812310064886363
  episodes_this_iter: 3
  episodes_total: 36
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.053
    learner:
      default_policy:
        cur_kl_coeff: 5.8207661780829145e-12
        cur

(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 255
(pid=17035) -----------------------
(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 234
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 228
(pid=17034) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-19-38
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -2.233996989499085
  episode_reward_min: -2.7812310064886363
  episodes_this_iter: 3
  episodes_total: 45
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.028
    learner:
      default_policy:
        cur_kl_coeff: 1.1368683941568192e-14
        cur_

(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 252
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 226
(pid=17035) -----------------------
(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 263
(pid=17033) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-20-09
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -157.29696407551444
  episode_reward_min: -8836.99777249184
  episodes_this_iter: 3
  episodes_total: 57
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.366
    learner:
      default_policy:
        cur_kl_coeff: 2.775557602921922e-18
        cur_lr

(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 245
(pid=17034) -----------------------
(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 262
(pid=17033) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 258
(pid=17035) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-20-41
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -1859.779423661346
  episode_reward_min: -25251.147091211416
  episodes_this_iter: 3
  episodes_total: 69
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.594
    learner:
      default_policy:
        cur_kl_coeff: 6.776263679008599e-22
        cur_l

(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 222
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 231
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 268
(pid=17035) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-21-05
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -4501.990354205035
  episode_reward_min: -36600.106556821855
  episodes_this_iter: 3
  episodes_total: 78
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.757
    learner:
      default_policy:
        cur_kl_coeff: 1.323488999806367e-24
        cur_l

(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 261
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 232
(pid=17035) -----------------------
(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 244
(pid=17033) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-21-28
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -7497.848917845529
  episode_reward_min: -51117.029107478906
  episodes_this_iter: 3
  episodes_total: 87
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.469
    learner:
      default_policy:
        cur_kl_coeff: 2.5849394527468104e-27
        cur_

(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 237
(pid=17033) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 257
(pid=17034) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 222
(pid=17035) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-21-53
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -10880.038554039706
  episode_reward_min: -71316.28269522227
  episodes_this_iter: 3
  episodes_total: 96
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.815
    learner:
      default_policy:
        cur_kl_coeff: 5.048709868646114e-30
        cur_l

(pid=17033)
(pid=17033) -----------------------
(pid=17033) ring length: 233
(pid=17033) -----------------------
(pid=17035)
(pid=17035) -----------------------
(pid=17035) ring length: 257
(pid=17035) -----------------------
(pid=17034)
(pid=17034) -----------------------
(pid=17034) ring length: 264
(pid=17034) -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-22-17
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -14044.024211909378
  episode_reward_min: -71316.28269522227
  episodes_this_iter: 3
  episodes_total: 105
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.111
    learner:
      default_policy:
        cur_kl_coeff: 9.860761462199441e-33
        cur_

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-22-39
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -17119.18446300678
  episode_reward_min: -83938.92479503428
  episodes_this_iter: 0
  episodes_total: 111
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.747
    learner:
      default_policy:
        cur_kl_coeff: 3.851859946171657e-35
        cur_lr: 4.999999873689376e-05
        entropy: 1.317049264907837
        entropy_coeff: 0.0
        kl: 4.697501481132349e-06
        policy_loss: 3.307914812467061e-05
        total_loss: 2787387.0
        vf_explained_var: -7.081031799316406e-05
        vf_loss: 2787387.0
    load_time_ms: 1.265
    num_steps_sampled: 113000
    num_steps_trained: 113000
    sample_time_ms: 2753.583
    update_time_ms: 4.789
  iterations_since_restore: 113
  node_ip: 192.168.100.38
  num_healthy_workers: 3
  off_

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 258
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 262
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 243
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-22-59
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -21361.917056781054
  episode_reward_min: -83938.92479503428
  episodes_this_iter: 3
  episodes_total: 120
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 26.202
    learner:
      default_policy:
        cur_kl_coeff: 3.009265582946607e-37
        cur_

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 230
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 264
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 229
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-23-23
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -25478.581807981347
  episode_reward_min: -83938.92479503428
  episodes_this_iter: 3
  episodes_total: 129
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 23.746
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 235
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 268
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 269
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-23-48
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9232900454108732
  episode_reward_mean: -30416.511948211402
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 138
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.247
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 266
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 245
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 235
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-24-07
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1.9836056388508754
  episode_reward_mean: -33566.09483620264
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 0
  episodes_total: 144
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 22.981
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 270
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 222
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 237
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-24-29
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2.4744405330623827
  episode_reward_mean: -38017.38827542755
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 153
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.451
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 233
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 221
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 225
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-24-54
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2.837500077081288
  episode_reward_mean: -41339.28909470646
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 162
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 24.577
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 251
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 232
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 254
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-25-18
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5686.5466966780095
  episode_reward_mean: -43645.47183653359
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 171
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 30.454
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

Result for PPO_EnergyOptPOEnv-v0_0:
  date: 2020-07-23_18-25-40
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -10738.29325135081
  episode_reward_mean: -44243.99045346593
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 0
  episodes_total: 177
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 31.019
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.342880129814148
        entropy_coeff: 0.0
        kl: 2.666711793608556e-07
        policy_loss: -5.547714317799546e-05
        total_loss: 7033645.5
        vf_explained_var: -0.00011277198791503906
        vf_loss: 7033645.0
    load_time_ms: 1.34
    num_steps_sampled: 179000
    num_steps_trained: 179000
    sample_time_ms: 2748.324
    update_time_ms: 5.233
  iterations_since_restore: 179
  node_ip: 192.168.100.38
  num_healthy_workers: 3
  off_policy_estimator: {}
  perf:
    cpu_ut

Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-25-59
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -10738.29325135081
  episode_reward_mean: -44080.424879887054
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 0
  episodes_total: 183
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 25.613
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3435639142990112
        entropy_coeff: 0.0
        kl: 5.47140814433078e-07
        policy_loss: -6.418800330720842e-05
        total_loss: 17116314.0
        vf_explained_var: -6.0439109802246094e-05
        vf_loss: 17116314.0
    load_time_ms: 1.279
    num_steps_sampled: 185000
    num_steps_trained: 185000
    sample_time_ms: 3000.482
    update_time_ms: 5.247
  iterations_since_restore: 185
  node_ip: 192.168.100.38
  num_healthy_workers: 3
  off_policy_estimat

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 256
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 269
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 222
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-26-17
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -10743.118166463773
  episode_reward_mean: -44578.9328041109
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 0
  episodes_total: 192
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.725
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 260
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 260
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 262
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-26-30
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -10743.118166463773
  episode_reward_mean: -43121.326364789755
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 201
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.163
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 244
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 242
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 226
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-26-51
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -7590.736610838424
  episode_reward_mean: -42060.62228065729
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 210
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 19.337
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 229
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 223
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 245
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-27-13
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5060.286586672388
  episode_reward_mean: -40496.17905416081
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 3
  episodes_total: 219
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.286
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 234
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 249
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 241
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-27-29
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5060.286586672388
  episode_reward_mean: -39114.45187905513
  episode_reward_min: -86504.65034667363
  episodes_this_iter: 0
  episodes_total: 225
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 18.591
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 246
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 268
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 232
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-27-45
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5060.286586672388
  episode_reward_mean: -36582.85429417938
  episode_reward_min: -82056.9522993052
  episodes_this_iter: 3
  episodes_total: 234
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.154
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 230
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 261
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 251
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-28-01
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5060.286586672388
  episode_reward_mean: -34841.27707352413
  episode_reward_min: -71434.96612977947
  episodes_this_iter: 3
  episodes_total: 243
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.883
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 261
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 251
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 222
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-28-17
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2538.446241547162
  episode_reward_mean: -33017.95041358371
  episode_reward_min: -71434.96612977947
  episodes_this_iter: 0
  episodes_total: 252
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.166
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 244
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 242
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 237
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-28-33
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2532.9171776324088
  episode_reward_mean: -30354.74759440398
  episode_reward_min: -70060.45291573861
  episodes_this_iter: 0
  episodes_total: 261
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.119
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 247
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 246
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 265
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-28-51
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -2532.9171776324088
  episode_reward_mean: -28157.562452439663
  episode_reward_min: -70060.45291573861
  episodes_this_iter: 0
  episodes_total: 270
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.255
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 262
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 243
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 223
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-29-07
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -6.247634568097545
  episode_reward_mean: -26018.913064809854
  episode_reward_min: -70060.45291573861
  episodes_this_iter: 0
  episodes_total: 279
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.233
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 234
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 248
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-29-25
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -24226.64894041367
  episode_reward_min: -60620.93343373706
  episodes_this_iter: 1
  episodes_total: 285
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 17.835
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.326059103012085
        entropy_coeff: 0.0
        kl: 3.255903720855713e-06
        policy_loss: -0.00024079513968899846
        total_loss:

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 244
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 242
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 223
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-29-42
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -22300.292541159288
  episode_reward_min: -60620.93343373706
  episodes_this_iter: 0
  episodes_total: 294
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.804
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 252
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 238
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 259
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-29-59
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -20204.01945515842
  episode_reward_min: -60620.93343373706
  episodes_this_iter: 0
  episodes_total: 303
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.851
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 238
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 252
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 256
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-30-16
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -17798.739240315488
  episode_reward_min: -60620.93343373706
  episodes_this_iter: 0
  episodes_total: 312
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.198
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 222
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 221
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 220
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-30-34
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -15570.747308643959
  episode_reward_min: -54938.348922438425
  episodes_this_iter: 0
  episodes_total: 321
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.641
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 256
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 239
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 266
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-30-50
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -13972.884061955374
  episode_reward_min: -54938.348922438425
  episodes_this_iter: 0
  episodes_total: 330
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.902
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 269
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 265
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 229
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-31-07
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -11170.197513595158
  episode_reward_min: -54938.348922438425
  episodes_this_iter: 0
  episodes_total: 339
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.598
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 229
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 267
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 252
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-31-25
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -9073.765717382776
  episode_reward_min: -54938.348922438425
  episodes_this_iter: 0
  episodes_total: 348
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.682
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 238
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 259
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 249
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-31-41
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -7475.8665187159495
  episode_reward_min: -34098.93249779816
  episodes_this_iter: 0
  episodes_total: 357
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.68
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 248
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 259
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 239
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-31-57
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -6477.525331797149
  episode_reward_min: -34098.93249779816
  episodes_this_iter: 0
  episodes_total: 366
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.544
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 270
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 235
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 232
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-32-14
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.098577456046133
  episode_reward_mean: -5403.186843273583
  episode_reward_min: -31605.680819544545
  episodes_this_iter: 0
  episodes_total: 375
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.445
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 239
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 270
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 258
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-32-31
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.181812414910842
  episode_reward_mean: -4557.3826085204755
  episode_reward_min: -22757.221193421992
  episodes_this_iter: 0
  episodes_total: 384
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.706
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 253
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 224
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 226
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-32-48
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.181812414910842
  episode_reward_mean: -3875.164827031801
  episode_reward_min: -22757.221193421992
  episodes_this_iter: 0
  episodes_total: 393
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.256
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 258
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 225
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 258
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-33-04
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -3433.8064397848125
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 402
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.347
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 245
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 231
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 246
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-33-21
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -3446.0617522139546
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 411
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.2
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 256
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 223
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 254
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-33-37
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -3469.4906003346837
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 420
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.001
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 242
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 241
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 234
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-33-53
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -3393.480384775766
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 429
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.561
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 231
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 226
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 233
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-34-09
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -2957.7267777158936
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 438
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.903
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 225
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 253
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 240
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-34-26
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -3052.270442743962
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 447
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.47
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 240
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 248
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 224
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-34-44
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -2900.208286889105
  episode_reward_min: -21470.76033194405
  episodes_this_iter: 0
  episodes_total: 456
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.665
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 255
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 244
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 222
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-35-02
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.247490168390454
  episode_reward_mean: -2546.3469140304296
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 0
  episodes_total: 465
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.707
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 250
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 242
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 221
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-35-19
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.423347598780352
  episode_reward_mean: -2641.1722192658185
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 0
  episodes_total: 474
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.814
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 248
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 242
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 232
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-35-34
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.423347598780352
  episode_reward_mean: -2356.89901651875
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 0
  episodes_total: 483
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.831
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 267
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-35-48
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -2363.3695894920525
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 3
  episodes_total: 492
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.625
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689376e-05
        entropy: 1.3700737953186035
        entropy_coeff: 0.0
        kl: 5.160331511433469e-06
        policy_loss: -0.0003172159194946289
        total_loss: 23925.0078125
        vf_explained_var: -0.0002180337905883789
        vf_loss: 23925.005859375
    load_time_ms: 0.918
    num_steps_sampled: 492000
    num_steps

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 250
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 236
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 269
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-36-04
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -2135.544332498005
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 0
  episodes_total: 501
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.333
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 247
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 251
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 235
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-36-21
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -1674.6881668263043
  episode_reward_min: -20201.948432758036
  episodes_this_iter: 0
  episodes_total: 510
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.725
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 270
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 228
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 246
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-36-45
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -1541.9742872095592
  episode_reward_min: -17043.693723663153
  episodes_this_iter: 0
  episodes_total: 525
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.236
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 261
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 246
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 255
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-37-07
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -1352.603674163488
  episode_reward_min: -17043.693723663153
  episodes_this_iter: 0
  episodes_total: 537
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.865
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 223
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 229
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 248
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-37-29
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.620004704282471
  episode_reward_mean: -1226.4801460138085
  episode_reward_min: -10736.477853410228
  episodes_this_iter: 0
  episodes_total: 549
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.485
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.999999873689

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 245
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 225
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 231
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-37-51
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -1049.5663125516203
  episode_reward_min: -10736.477853410228
  episodes_this_iter: 3
  episodes_total: 561
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.24
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 238
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 251
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 223
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-38-13
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -935.813882538562
  episode_reward_min: -10736.477853410228
  episodes_this_iter: 0
  episodes_total: 573
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.652
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 258
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 238
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 269
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-38-36
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -752.5441389610221
  episode_reward_min: -10736.477853410228
  episodes_this_iter: 0
  episodes_total: 585
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.343
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 232
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 246
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 248
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-39-01
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -695.6430291625907
  episode_reward_min: -10738.435098189335
  episodes_this_iter: 0
  episodes_total: 600
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 16.061
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 241
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 259
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 229
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 247
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 262
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 257
[2m[36m(pid=17035)[0m -----------------------
Resu

[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 259
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 227
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 244
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-39-47
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -784.3115732329154
  episode_reward_min: -14525.788691568561
  episodes_this_iter: 0
  episodes_total: 627
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.06
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 251
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 268
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 260
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-40-10
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -607.5628118478327
  episode_reward_min: -14525.788691568561
  episodes_this_iter: 0
  episodes_total: 639
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.907
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 257
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 229
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 265
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-40-32
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.468003427410517
  episode_reward_mean: -626.5181867641603
  episode_reward_min: -14525.788691568561
  episodes_this_iter: 0
  episodes_total: 651
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.658
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 233
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 270
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 257
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-40-53
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.604987030729615
  episode_reward_mean: -639.3319648990032
  episode_reward_min: -14525.788691568561
  episodes_this_iter: 0
  episodes_total: 663
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 14.613
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 248
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 251
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 262
[2m[36m(pid=17033)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-41-16
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.604987030729615
  episode_reward_mean: -816.2363555732586
  episode_reward_min: -15803.393034681561
  episodes_this_iter: 0
  episodes_total: 675
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 15.731
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.9999998736893

[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 265
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 229
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 225
[2m[36m(pid=17035)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-41-38
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -5.94247897100012
  episode_reward_mean: -784.7040059759457
  episode_reward_min: -15803.393034681561
  episodes_this_iter: 0
  episodes_total: 687
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 12.983
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

[2m[36m(pid=17035)[0m 
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17035)[0m ring length: 239
[2m[36m(pid=17035)[0m -----------------------
[2m[36m(pid=17033)[0m 
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17033)[0m ring length: 247
[2m[36m(pid=17033)[0m -----------------------
[2m[36m(pid=17034)[0m 
[2m[36m(pid=17034)[0m -----------------------
[2m[36m(pid=17034)[0m ring length: 223
[2m[36m(pid=17034)[0m -----------------------
Result for PPO_EnergyOptPOEnv-v0_0:
  custom_metrics: {}
  date: 2020-07-23_18-41-58
  done: true
  episode_len_mean: 1000.0
  episode_reward_max: -6.010638117001379
  episode_reward_mean: -702.7642679412602
  episode_reward_min: -15803.393034681561
  episodes_this_iter: 0
  episodes_total: 699
  experiment_id: 7eca8b5e43da4c948c7b8f10416b7db0
  hostname: solom-XPS-13-9380
  info:
    grad_time_ms: 13.586
    learner:
      default_policy:
        cur_kl_coeff: 0.0
        cur_lr: 4.99999987368937

### 4.5 Visualizing the results

The simulation results are saved in the `ray_results/training_example` directory (we defined `training_example` at the start of this tutorial). By default, the `ray_results` folder is located in your home directory, at `~/ray_results`.

To visualize the data logged during training, run `tensorboard --logdir=~/ray_results/training_example` (TensorBoard can be installed with `pip install tensorboard`).

For more detailed visualization instructions, see `tutorial05_visualize.ipynb`.
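
If you prefer to inspect the training curve without TensorBoard, the sketch below reads the reward history of a single trial directly from disk. It assumes that the trial directory contains a `progress.csv` file with `training_iteration` and `episode_reward_mean` columns, which is what Ray Tune's CSV logger usually writes; the helper name `load_reward_curve` is hypothetical and not part of Flow or RLlib.

```python
import csv
from pathlib import Path


def load_reward_curve(trial_dir):
    """Return [(training_iteration, episode_reward_mean), ...] for one trial.

    Assumes `trial_dir` contains a `progress.csv` file with the two columns
    used below; this is an assumption about the on-disk layout.
    """
    progress_file = Path(trial_dir).expanduser() / "progress.csv"
    curve = []
    with progress_file.open(newline="") as f:
        for row in csv.DictReader(f):
            reward = row.get("episode_reward_mean", "")
            # Early iterations may not have a finished episode yet.
            if reward in ("", "nan"):
                continue
            curve.append((int(row["training_iteration"]), float(reward)))
    return curve
```

The returned list can be passed to any plotting library, or simply printed to check that the mean reward increases over the training iterations.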

### 4.6 Restart from a checkpoint / Transfer learning

To do transfer learning, or to resume a previous training run, you need to start the simulation from a previously saved checkpoint. To do so, add a `"restore"` entry to the experiment dictionary passed to `run_experiments`, as follows:

```python
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "restore": "/ray_results/experiment/dir/checkpoint_50/checkpoint-50",
        "checkpoint_freq": 1,
        "checkpoint_at_end": True,
        "max_failures": 999,
        "stop": {
            "training_iteration": 1,
        },
    },
})
```

The `"restore"` path must point to a checkpoint file whose Tune metadata is stored alongside it, i.e. the file `[restore].tune_metadata` must exist.
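
A quick check before launching a long run can avoid a failed restore. The sketch below is a hypothetical helper (`check_restore_path` is not part of Flow or Ray). It assumes the metadata file sits next to the checkpoint file and carries the same name with a `.tune_metadata` suffix; adjust the check if your Ray version lays out checkpoints differently.

```python
import os


def check_restore_path(restore_path):
    """Return the expanded path if it looks restorable, else raise an error."""
    path = os.path.expanduser(restore_path)
    if not os.path.isfile(path):
        raise FileNotFoundError("checkpoint file not found: " + path)
    # Assumed layout: <checkpoint file>.tune_metadata next to the checkpoint.
    metadata = path + ".tune_metadata"
    if not os.path.isfile(metadata):
        raise FileNotFoundError("missing Tune metadata file: " + metadata)
    return path
```

Calling this helper on the value you intend to pass as `"restore"` raises early, with a readable message, when the path is wrong.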

There is also a `"resume"` parameter that you can set to `True` if you simply want to continue training the same experiment from a previously saved checkpoint.
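
When restoring manually, you usually want the most recent checkpoint of a trial. The sketch below searches a trial directory for it, assuming the `checkpoint_<N>/checkpoint-<N>` layout seen in the example path above; `latest_checkpoint` is a hypothetical helper name, not an existing Flow or Ray function.

```python
import re
from pathlib import Path


def latest_checkpoint(trial_dir):
    """Return the path of the highest-numbered checkpoint file, or None."""
    best_iter, best_path = -1, None
    for child in Path(trial_dir).expanduser().glob("checkpoint_*"):
        match = re.fullmatch(r"checkpoint_(\d+)", child.name)
        if match is None or not child.is_dir():
            continue
        iteration = int(match.group(1))
        # Assumed layout: checkpoint_<N>/checkpoint-<N>
        candidate = child / "checkpoint-{}".format(iteration)
        if candidate.is_file() and iteration > best_iter:
            best_iter, best_path = iteration, str(candidate)
    return best_path
```

The returned string can be used as the `"restore"` value; a return value of `None` means that no checkpoint was found in the given directory.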

In [17]:
# trials = run_experiments({
#     flow_params["exp_tag"]: {
#         "run": alg_run,
#         "env": gym_name,
#         "config": {
#             **config
#         },
#         "restore": "/ray_results/training_example13/PPO_EnergyOptPOEnv-v0_0_2020-07-23_13-30-07yze28sum/checkpoint_400/checkpoint-400", 
#         "checkpoint_freq": 20,
#         "checkpoint_at_end": True,
#         "max_failures": 999,
#         "stop": {
#             "training_iteration": 700,
#         },
#     },
# })