# Tutorial ACS_UPB_LAB1: Running Sumo Simulations

__Credits: most of the credit for this notebook goes to https://github.com/flow-project/flow/tree/master/tutorials__

This tutorial walks through the process of running non-RL traffic simulations in Flow. Simulations of this form act as non-autonomous baselines and depict the behavior of human-driven vehicles on a network. Similar simulations may also be used to evaluate the performance of hand-designed controllers on a network. This tutorial focuses primarily on the former use case, while an example of the latter may be found in `exercise07_controllers.ipynb`.

In this exercise, we simulate an initially perturbed single-lane ring road. We observe in simulation that, as time advances, the initial perturbations do not dissipate, but instead propagate and grow until vehicles are forced to periodically stop and accelerate. For more information on this behavior, we refer the reader to [1].

## 1.1 Components of a Simulation
All simulations, both in the presence and absence of RL, require two components: a *network*, and an *environment*. Networks describe the features of the transportation network used in simulation. This includes the positions and properties of nodes and edges constituting the lanes and junctions, as well as properties of the vehicles, traffic lights, inflows, etc. in the network. Environments, on the other hand, initialize, reset, and advance simulations, and act as the primary interface between the reinforcement learning algorithm and the network. Moreover, custom environments may be used to modify the dynamical features of a network.

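## 1.2 Setting up the environment for the current lab (ENV1)
We load the configuration for Lab 1 (`ENV1`), from which the following cells read the network, vehicle, environment, and parameter settings.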

## 2. Setting up a Network
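Flow contains a plethora of pre-designed networks used to replicate highways, intersections, and merges in both closed and open settings. All these networks are located in `flow/networks`. The original Flow tutorial recreates a ring road by importing the `RingNetwork` class; in this lab, the network class is instead read from the lab configuration (`ENV["NETWORK"]`), and printing its name shows that it is `FigureEightNetwork`.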

In [1]:
from flow.envs.nemodrive_lab import ENV1 as ENV

# from flow.networks.figure_eight import FigureEightNetwork
network_name = ENV["NETWORK"]
print(network_name.__name__)

FigureEightNetwork


This network, as well as all other networks in Flow, is parametrized by the following arguments: 
* name
* vehicles
* net_params
* initial_config
* traffic_lights

These parameters allow a single network to be recycled for a multitude of different network settings. For example, `RingNetwork` may be used to create ring roads of variable length with a variable number of lanes and vehicles.

### 2.1 Name
The `name` argument is a string variable depicting the name of the network. This has no effect on the type of network created.

In [2]:
name = network_name.__name__

### 2.2 VehicleParams
The `VehicleParams` class stores state information on all vehicles in the network. This class is used to specify the dynamical behavior of a vehicle and whether it is controlled by a reinforcement learning agent. Moreover, information pertaining to the observations and reward function can be collected from various get methods within this class.

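The initial configuration of this class describes the number of vehicles in the network at the start of every simulation, as well as the properties of these vehicles. Conceptually, we begin by creating an empty `VehicleParams` object and then add vehicles to it. In this lab, calling `ENV["VEHICLES"]()` returns the already-populated object; the commented-out lines in the following cells reproduce the relevant code from `get_vehicles`.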

In [3]:
vehicles = ENV["VEHICLES"]()

# code in get_vehicles 
# from flow.core.params import VehicleParams

# vehicles = VehicleParams()

Once this object is created, vehicles may be introduced using the `add` method. This method specifies the types and quantities of vehicles at the start of a simulation rollout. For a description of the various arguments associated with the `add` method, we refer the reader to the following documentation ([VehicleParams.add](https://flow.readthedocs.io/en/latest/flow.core.html?highlight=vehicleparam#flow.core.params.VehicleParams)).

When adding vehicles, their dynamical behaviors may be specified either by the simulator (default), or by user-generated models. For longitudinal (acceleration) dynamics, several prominent car-following models are implemented in Flow. For this example, the acceleration behavior of all vehicles will be defined by the Intelligent Driver Model (IDM) [2].

In [4]:
# code in get_vehicles 
# from flow.controllers.car_following_models import IDMController
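For reference, the IDM [2] computes a vehicle's acceleration from its own speed $v$, its gap $s$ to the leader, and the approach rate $\Delta v$ (own speed minus leader speed): $\dot v = a\left[1 - (v/v_0)^\delta - (s^*/s)^2\right]$, with desired gap $s^* = s_0 + \max\left(0,\, vT + \frac{v\,\Delta v}{2\sqrt{ab}}\right)$. The function below is a minimal, standalone sketch of these equations, not Flow's `IDMController` code. The default parameter values ($v_0=30$ m/s, $T=1$ s, $a=1$ m/s^2, $b=1.5$ m/s^2, $\delta=4$, $s_0=2$ m) are our assumption of typical settings and may differ from those used in the lab.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.0, a=1.0, b=1.5, delta=4, s0=2.0):
    """Intelligent Driver Model acceleration in m/s^2.

    v      -- speed of the following vehicle (m/s)
    v_lead -- speed of the leading vehicle (m/s)
    gap    -- bumper-to-bumper distance to the leader (m), must be > 0
    The default parameter values are assumptions, not read from the lab config.
    """
    dv = v - v_lead  # approach rate: positive when closing in on the leader
    # desired minimum gap s*
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


print(idm_acceleration(0.0, 0.0, 1e9))  # free road, standing start: close to a
print(idm_acceleration(10.0, 0.0, 5.0))  # closing fast on a stopped leader: hard braking
```

A vehicle standing still far behind its leader accelerates at roughly $a$, while a fast vehicle approaching a stopped leader at a short gap brakes strongly; the interaction term $(s^*/s)^2$ is what lets small disturbances travel backwards through a platoon.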

Another controller we define is for the vehicle's routing behavior. For closed networks where the route for any vehicle is repeated, the `ContinuousRouter` controller is used to perpetually reroute all vehicles to the initial set route.

In [5]:
# code in get_vehicles 
# from flow.controllers.routing_controllers import ContinuousRouter
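The idea behind continuous rerouting can be sketched in a few lines: once a vehicle reaches the last edge of its current route, it is handed the route that starts on that edge, so it keeps looping indefinitely. The function and the `routes` mapping below are hypothetical illustrations written in the spirit of `ContinuousRouter`, not its actual implementation; the edge names are made up.

```python
def continuous_reroute(current_route, current_edge, routes):
    """Return a new route when the vehicle is on the last edge of its route, else None.

    routes -- hypothetical dict mapping each edge to the route that starts on it
    """
    if current_route and current_edge == current_route[-1]:
        return routes[current_edge]
    return None  # keep following the current route


# Hypothetical closed network made of four edges traversed in a loop
loop = ["bottom", "right", "top", "left"]
routes = {edge: loop[i:] + loop[:i] for i, edge in enumerate(loop)}

print(continuous_reroute(loop, "left", routes))   # ['left', 'bottom', 'right', 'top']
print(continuous_reroute(loop, "right", routes))  # None
```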

Finally, vehicles are added with the above acceleration and routing behavior. For example, the following commented-out code from `get_vehicles` adds 22 vehicles of type "human" to the `VehicleParams` object.

In [6]:
# (E.g. code in get_vehicles)
# vehicles.add("human",
#              acceleration_controller=(IDMController, {}),
#              routing_controller=(ContinuousRouter, {}),
#              num_vehicles=22)
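To make the role of `add` concrete, the class below is a deliberately simplified, hypothetical stand-in for `VehicleParams`: it records the controllers for each vehicle type and creates one identifier per vehicle. The class name, the `num_vehicles` property, and the `<type>_<index>` identifier scheme are assumptions for illustration only; controllers are passed as plain strings so that the sketch runs without Flow installed.

```python
class MiniVehicleParams:
    """Hypothetical, simplified stand-in for flow.core.params.VehicleParams."""

    def __init__(self):
        self.types = {}    # vehicle type -> controller settings
        self.ids = []      # one identifier per individual vehicle
        self._counts = {}  # vehicles created so far, per type

    def add(self, veh_id, acceleration_controller=None,
            routing_controller=None, num_vehicles=1):
        self.types[veh_id] = {
            "acceleration_controller": acceleration_controller,
            "routing_controller": routing_controller,
        }
        start = self._counts.get(veh_id, 0)
        # identifiers follow the hypothetical scheme "<type>_<index>"
        self.ids.extend(f"{veh_id}_{start + k}" for k in range(num_vehicles))
        self._counts[veh_id] = start + num_vehicles

    @property
    def num_vehicles(self):
        return len(self.ids)


vehicles_demo = MiniVehicleParams()
vehicles_demo.add("human",
                  acceleration_controller=("IDMController", {}),
                  routing_controller=("ContinuousRouter", {}),
                  num_vehicles=22)
print(vehicles_demo.num_vehicles, vehicles_demo.ids[0], vehicles_demo.ids[-1])  # 22 human_0 human_21
```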

### 2.3 NetParams

`NetParams` are network-specific parameters used to define the shape and properties of a network. Unlike most other parameters, `NetParams` may vary drastically depending on the specific network configuration, and accordingly most of its parameters are stored in `additional_params`. In order to determine which `additional_params` variables may be needed for a specific network, we refer to the `ADDITIONAL_NET_PARAMS` variable located in the network file.

In [7]:
# from flow.networks.ring import ADDITIONAL_NET_PARAMS

ADDITIONAL_NET_PARAMS = ENV["ADDITIONAL_NET_PARAMS"]

print(ADDITIONAL_NET_PARAMS)

{'radius_ring': 60, 'lanes': 2, 'speed_limit': 30, 'resolution': 40}


Printing the `ADDITIONAL_NET_PARAMS` dict supplied by the lab configuration (the parameters of the figure-eight network), we see that the required parameters are:

* **radius_ring**: radius of the circular components of the figure eight
* **lanes**: number of lanes
* **speed_limit**: speed limit for all edges
* **resolution**: resolution of the curved components


At times, other inputs may be needed from `NetParams` to recreate proper network features/behavior. These requirements can be found in the network's documentation. For the ring road, no attributes are needed aside from the `additional_params` terms. Furthermore, for this exercise, we use the parameters provided by the lab configuration when creating the `NetParams` object.

In [8]:
from flow.core.params import NetParams

net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)
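Flow's built-in networks (for example `RingNetwork`) check, as far as we know, that every key of their `ADDITIONAL_NET_PARAMS` is present in `net_params.additional_params` and raise an error otherwise. The helper below is a minimal sketch of such a check, not Flow's code; the required keys are copied from the dictionary printed above.

```python
def check_additional_params(additional_params, required_keys):
    """Raise KeyError naming the first required key missing from additional_params."""
    for key in required_keys:
        if key not in additional_params:
            raise KeyError(f'Network parameter "{key}" not supplied')


# keys of the ADDITIONAL_NET_PARAMS printed above
required = {"radius_ring": 60, "lanes": 2, "speed_limit": 30, "resolution": 40}

check_additional_params(dict(required), required)  # complete: passes silently

incomplete = {k: v for k, v in required.items() if k != "lanes"}
try:
    check_additional_params(incomplete, required)
except KeyError as err:
    print(err)  # 'Network parameter "lanes" not supplied'
```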

### 2.4 InitialConfig

`InitialConfig` specifies parameters that affect the positioning of vehicles in the network at the start of a simulation. These parameters can be used to limit the edges and number of lanes vehicles originally occupy, and provide a means of adding randomness to the starting positions of vehicles. The original ring-road tutorial introduces a small initial disturbance by setting the `perturbation` term in `InitialConfig` to 1 m; the lab configuration printed below instead uses `'spacing': 'random'` with a `perturbation` of 50.

In [9]:
from flow.core.params import InitialConfig
initial_config_param = ENV["INITIAL_CONFIG_PARAMS"]
print(initial_config_param)

initial_config = InitialConfig(**initial_config_param)

{'spacing': 'random', 'perturbation': 50}
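To illustrate what a perturbation does to starting positions, the function below spaces vehicles evenly along a closed road and then shifts each one by zero-mean Gaussian noise whose standard deviation is `perturbation`. This is a hypothetical sketch of the general idea, not Flow's placement code (in particular, Flow's `'random'` spacing mode may place vehicles differently); the road length of 230 m is an arbitrary example value.

```python
import random


def perturbed_start_positions(length, num_vehicles, perturbation, seed=None):
    """Evenly space vehicles on a closed road of `length` meters, then add
    Gaussian noise (standard deviation `perturbation`) to each position."""
    rng = random.Random(seed)
    spacing = length / num_vehicles
    positions = []
    for i in range(num_vehicles):
        x = i * spacing + rng.gauss(0.0, perturbation)
        positions.append(x % length)  # wrap around the closed road
    return positions


print(perturbed_start_positions(230.0, 22, perturbation=0.0)[:3])
print(perturbed_start_positions(230.0, 22, perturbation=1.0, seed=0)[:3])
```

With `perturbation=0` the vehicles sit exactly at multiples of `length / num_vehicles`; larger values break this symmetry, which is what seeds the disturbances discussed at the start of this tutorial.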


### 2.5 TrafficLightParams

`TrafficLightParams` are used to describe the positions and types of traffic lights in the network. These inputs are outside the scope of this tutorial, and instead are covered in `exercise06_traffic_lights.ipynb`. For our example, we create an empty `TrafficLightParams` object, thereby ensuring that none are placed on any nodes.

In [10]:
from flow.core.params import TrafficLightParams

traffic_lights = TrafficLightParams()

## 3. Setting up an Environment

Several environments in Flow exist to train autonomous agents of different forms (e.g. autonomous vehicles, traffic lights) to perform a variety of different tasks. These environments are often network- or task-specific; however, some can be deployed on an arbitrary set of networks as well. One such environment, `AccelEnv`, may be used to train a variable number of vehicles in a fully observable network with a *static* number of vehicles. In this lab, the environment class is read from the lab configuration (`ENV["ENVIRONMENT"]`); as printed below, it is `LaneChangeAccelEnv1`.

In [11]:
# from flow.envs.nemodrive_lab.env1_lab import LaneChangeAccelEnv1
env_name = ENV["ENVIRONMENT"]
print(env_name)

<class 'flow.envs.nemodrive_lab.env1_lab.LaneChangeAccelEnv1'>


Although we will not be training any autonomous agents in this exercise, the use of an environment allows us to view the cumulative reward that simulation rollouts receive in the absence of autonomy.

Environments in Flow are parametrized by three components:
* `EnvParams`
* `SumoParams`
* `Network`

### 3.1 SumoParams
`SumoParams` specifies simulation-specific variables. These variables include the length of a simulation step (in seconds) and whether to render the GUI when running the experiment. For this example, we consider a simulation step length of 0.1 s and activate the GUI. We also set `restart_instance=True`, which restarts the SUMO instance whenever the environment is reset.

Another useful parameter is `emission_path`, which specifies the directory where the emission output is written. Emission files contain detailed information about the simulation, for instance the position and speed of each vehicle at each time step. If no emission path is specified, no emission file is generated.

In [12]:
from flow.core.params import SumoParams

sumo_params = SumoParams(sim_step=0.1, render=True, emission_path='data', restart_instance=True)
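Since the experiments below pass `convert_to_csv=True`, the emission output can be post-processed with ordinary CSV tools. The sketch below averages the speed of each vehicle. It assumes the CSV has columns named `id` and `speed` (check the header of your own file, since we have not verified the exact column set), and it reads from an in-memory sample with made-up values rather than a real file from `data/`.

```python
import csv
import io
from collections import defaultdict


def mean_speed_per_vehicle(csv_file):
    """Average the `speed` column for each vehicle `id` in an emission CSV."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["id"]] += float(row["speed"])
        counts[row["id"]] += 1
    return {veh: totals[veh] / counts[veh] for veh in totals}


# In-memory sample with made-up values; for a real file use
# open(path, newline="") on the CSV written under emission_path.
sample = io.StringIO(
    "time,id,speed\n"
    "0.1,human_0,0.0\n"
    "0.2,human_0,1.0\n"
    "0.1,human_1,2.0\n"
)
print(mean_speed_per_vehicle(sample))  # {'human_0': 0.5, 'human_1': 2.0}
```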

### 3.2 EnvParams

`EnvParams` specify environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. Much like `NetParams`, the attributes associated with this parameter are mostly environment specific, and can be found in the environment's `ADDITIONAL_ENV_PARAMS` dictionary.

In [13]:
# from flow.envs.nemodrive_lab.env1_lab import ADDITIONAL_ENV1_PARAMS
ADDITIONAL_ENV_PARAMS = ENV["ADDITIONAL_ENV_PARAMS"]

print(ADDITIONAL_ENV_PARAMS)

{'max_accel': 3, 'max_decel': 3, 'lane_change_duration': 5, 'target_velocity': 10, 'sort_vehicles': False, 'forward_progress_gain': 0.1, 'collision_reward': -1, 'lane_change_reward': -0.1, 'frontal_collision_distance': 2.0, 'lateral_collision_distance': 3.0}


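Printing the `ADDITIONAL_ENV_PARAMS` dictionary of the lab environment, we see that it contains several entries: acceleration bounds (`max_accel`, `max_decel`), a lane-change setting (`lane_change_duration`), a `target_velocity`, a `sort_vehicles` flag, terms whose names indicate that they shape the reward (`forward_progress_gain`, `collision_reward`, `lane_change_reward`), and collision distance thresholds (`frontal_collision_distance`, `lateral_collision_distance`). We use these default values, together with the horizon from `ENV["HORIZON"]`, when generating the `EnvParams` object.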

In [14]:
from flow.core.params import EnvParams

env_params = EnvParams(additional_params=ADDITIONAL_ENV_PARAMS, horizon=ENV["HORIZON"])
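As the random agent later in this notebook shows, actions in this environment have the form `[acceleration, lane-change direction]`, with the acceleration drawn from `[-max_decel, max_accel]` and the direction from {-1, 0, 1}. The helper below is a hypothetical sketch of bounding such an action with the `max_accel` and `max_decel` values printed above; how `LaneChangeAccelEnv1` itself processes actions is not shown in this notebook.

```python
def bound_action(action, max_accel, max_decel):
    """Clip a hypothetical [acceleration, lane_direction] action to valid values."""
    acc, direction = action
    acc = min(max(acc, -max_decel), max_accel)  # acceleration in [-max_decel, max_accel]
    # lane direction clipped to [-1, 1], then rounded to -1, 0 or 1
    direction = int(round(min(max(direction, -1.0), 1.0)))
    return [acc, direction]


params = {"max_accel": 3, "max_decel": 3}  # values from ADDITIONAL_ENV_PARAMS above
print(bound_action([5.0, 0.7], params["max_accel"], params["max_decel"]))  # [3, 1]
```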

## 4. Setting up and Running the Experiment
Once the inputs to the network and environment classes are ready, we can set up an `Experiment` object.

In [15]:
from flow.core.experiment import Experiment

These objects may be used to simulate rollouts in the absence of reinforcement learning agents, as well as acquire behaviors and rewards that may be used as a baseline with which to compare the performance of the learning agent. In this case, we choose to run our experiment for one rollout consisting of 3000 steps (300 s).

**Note**: If rendering is enabled (`sumo_params.render = True`), remember to click on the <img style="display:inline;" src="img/play_button.png"> Play button after the GUI is rendered. The cells below set `sumo_params.render = False`, so no GUI is opened.

In [16]:
# create the network object
network = network_name(name="ring_example",
                       vehicles=vehicles,
                       net_params=net_params,
                       initial_config=initial_config,
                       traffic_lights=traffic_lights)



In [None]:
# create the environment object
sumo_params.render = False
env = env_name(env_params, sumo_params, network)

# create the experiment object
exp = Experiment(env)
_ = exp.run(1, 3000, convert_to_csv=True)


Run a "still" agent, which always returns the action `[0, 0]` (zero acceleration and no lane change).

In [None]:
sumo_params.render = False
env = env_name(env_params, sumo_params, network)

# create the experiment object
exp = Experiment(env)

rl_actions = lambda state: [0, 0]

_ = exp.run(1, 3000, convert_to_csv=True, rl_actions=rl_actions)

Run a random agent.

Use __FullExperiment__ to test an agent that expects _state, reward, done, info_.

In [None]:
from flow.core.experiment_with_reward import FullExperiment
import numpy as np

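class RandomAgent():
    """Agent that samples a random acceleration every step and a random
    lane-change direction every `change_lane_step_freq` steps."""

    def __init__(self, env):
        self.action_space = env.action_space
        self.max_decel = env.env_params.additional_params["max_decel"]
        self.max_accel = env.env_params.additional_params["max_accel"]
        self.change_lane_step_freq = 10
        self.num_steps = 0

    def act(self, state, reward, done, info):
        self.num_steps += 1
        d = 0  # lane-change direction: 0 keeps the current lane
        if self.num_steps % self.change_lane_step_freq == 0:
            d = np.random.randint(3) - 1  # one of -1, 0, 1

        # acceleration sampled uniformly in [-max_decel, max_accel]
        acc = np.random.uniform(-self.max_decel, self.max_accel)
        action = np.array([acc, d])

        # return the action itself; `yield` would turn act() into a generator
        return action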

sumo_params.render = False
env = env_name(env_params, sumo_params, network)

exp = FullExperiment(env)

agent = RandomAgent(env)

_ = exp.run(10, 3000, convert_to_csv=True, rl_actions=agent.act)
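To show where a callable with the signature `act(state, reward, done, info)` fits, here is a hypothetical, simplified rollout loop with a toy environment. It is not `FullExperiment`'s implementation (which is not shown in this notebook); `ToyEnv`, its reward, and `rollout` are made up for illustration. The loop calls the agent once per step and passes the returned value straight to `env.step`.

```python
import random


class ToyEnv:
    """Hypothetical environment exposing reset() and step(action)."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # initial state

    def step(self, action):
        self.t += 1
        reward = -abs(action[0])  # toy reward: penalize large accelerations
        done = self.t >= self.horizon
        return [float(self.t)], reward, done, {}


def rollout(env, act):
    """Run one episode, giving the agent (state, reward, done, info) each step."""
    state, reward, done, info = env.reset(), 0.0, False, {}
    steps, total_reward = 0, 0.0
    while not done:
        action = act(state, reward, done, info)
        state, reward, done, info = env.step(action)
        steps += 1
        total_reward += reward
    return steps, total_reward


def still_act(state, reward, done, info):
    return [0.0, 0]  # zero acceleration, no lane change


rng = random.Random(0)


def random_act(state, reward, done, info):
    return [rng.uniform(-3.0, 3.0), rng.choice([-1, 0, 1])]


print(rollout(ToyEnv(50), still_act))  # (50, 0.0)
print(rollout(ToyEnv(50), random_act))
```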


Feel free to experiment with all these problems and more!

## Bibliography
[1] Sugiyama, Yuki, et al. "Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam." New journal of physics 10.3 (2008): 033001.

[2] Treiber, Martin, Ansgar Hennecke, and Dirk Helbing. "Congested traffic states in empirical observations and microscopic simulations." Physical review E 62.2 (2000): 1805.

## 5 Setting up Flow Parameters

RLlib experiments generate a `params.json` file for each experiment run. For RLlib experiments, the parameters defining the Flow network and environment must be stored as well. As such, in this section we define the dictionary `flow_params`, which contains the variables required by the utility function `make_create_env`. `make_create_env` is a higher-order function which returns a function `create_env` that initializes a Gym environment corresponding to the Flow network specified.

In [17]:
# Creating flow_params. Make sure the dictionary keys are as specified. 
sumo_params.render = False
sumo_params.print_warnings=False
flow_params = dict(
    # name of the experiment
    exp_tag=name,
    # name of the flow environment the experiment is running on
    env_name=env_name,
    # name of the network class the experiment uses
    network=network_name,
    # simulator that is used by the experiment
    simulator='traci',
    # sumo-related parameters (see flow.core.params.SumoParams)
    sim=sumo_params,
    # environment related parameters (see flow.core.params.EnvParams)
    env=env_params,
    # network-related parameters (see flow.core.params.NetParams and
    # the network's documentation or ADDITIONAL_NET_PARAMS component)
    net=net_params,
    # vehicles to be placed in the network at the start of a rollout
    # (see flow.core.params.VehicleParams)
    veh=vehicles,
    # (optional) parameters affecting the positioning of vehicles upon 
    # initialization/reset (see flow.core.params.InitialConfig)
    initial=initial_config
)
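The higher-order pattern behind `make_create_env` can be sketched as follows: a factory closes over the experiment parameters and returns both an environment-creating function and a registration name. The function below is a hypothetical simplification, not Flow's implementation; the `<class name>-v<version>` naming mirrors the `LaneChangeAccelEnv1-v0` name visible in the training log further below, and `DummyEnv` is a placeholder class.

```python
def make_create_env_sketch(params, version=0):
    """Hypothetical simplification of a higher-order env factory."""
    env_cls = params["env_name"]
    gym_name = f"{env_cls.__name__}-v{version}"

    def create_env(*_):
        # RLlib may pass an env_config argument; this sketch ignores it and
        # builds a fresh environment from the captured `params` on every call.
        return env_cls(params)

    return create_env, gym_name


class DummyEnv:
    """Placeholder environment class used only for this illustration."""

    def __init__(self, params):
        self.params = params


create_env_demo, gym_name_demo = make_create_env_sketch({"env_name": DummyEnv, "exp_tag": "demo"})
print(gym_name_demo)  # DummyEnv-v0
```

Because `create_env` captures `params`, each worker process can build its own identically configured environment just by calling it.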

## 4 Running RL experiments in Ray

### 4.1 Import 

First, we must import the modules required to run experiments in Ray. The `json` package, together with `FlowParamsEncoder`, is required to store the Flow experiment parameters in the `params.json` file. The Ray-related imports are `get_agent_class` (used to look up the PPO agent class), `ray.tune`'s experiment runner `run_experiments`, and the environment helper methods `register_env` and `make_create_env`.

In [18]:
import json

import ray
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env

from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder



### 4.2 Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [19]:
# number of CPUs made available to Ray (N_CPUS - 1 of them become
# parallel rollout workers in the configuration below)
N_CPUS = 8
# number of rollouts per training iteration
N_ROLLOUTS = 20

ray.init(num_cpus=N_CPUS)

2019-11-29 12:28:31,294	INFO resource_spec.py:205 -- Starting Ray with 1.71 GiB memory available for workers and up to 0.87 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).


{'node_ip_address': '172.19.3.168',
 'redis_address': '172.19.3.168:38662',
 'object_store_address': '/tmp/ray/session_2019-11-29_12-28-31_293489_18475/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2019-11-29_12-28-31_293489_18475/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2019-11-29_12-28-31_293489_18475'}

### 4.3 Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent has the number of parallel workers specified, a batch size corresponding to `N_ROLLOUTS` rollouts of `HORIZON` steps each (here 20 × 100 = 2000 environment steps per training iteration), a discount rate $\gamma$ of 0.999, two hidden layers of size 16, and uses Generalized Advantage Estimation (GAE) with $\lambda = 0.97$; the remaining parameters are set below.
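For intuition about the `gamma` and `lambda` settings: GAE computes TD residuals $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ and advantages $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$, which can be evaluated with the backward recursion $\hat{A}_t = \delta_t + \gamma\lambda \hat{A}_{t+1}$. RLlib performs this computation internally when `use_gae` is set; the function below is an independent, minimal sketch for a single episode, not RLlib's code.

```python
def gae_advantages(rewards, values, gamma=0.999, lam=0.97):
    """Generalized Advantage Estimation for a single episode.

    rewards -- r_0 .. r_{T-1}
    values  -- V(s_0) .. V(s_T); the last entry is the bootstrap value
               (use 0.0 if the episode terminated)
    """
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running  # A_t = delta_t + gamma*lambda*A_{t+1}
        advantages[t] = running
    return advantages


print(gae_advantages([1, 1, 1], [0, 0, 0, 0], gamma=1.0, lam=1.0))  # [3.0, 2.0, 1.0]
```

Setting $\lambda = 0$ reduces the advantages to one-step TD residuals (low variance, more bias), while $\lambda = 1$ gives the discounted return minus the value baseline; $\lambda = 0.97$ sits close to the latter.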

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified in section 5 is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`.

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym. 

In [20]:
import copy

# The algorithm or model to train. This may refer to the name of a built-in
# algorithm (e.g. RLlib's DQN or PPO), or to a user-defined trainable function
# or class registered in the Tune registry.
alg_run = "PPO"
HORIZON = 100

agent_cls = get_agent_class(alg_run)
# deep-copy so that the nested dicts updated below ("model", "env_config")
# do not modify the algorithm's shared default configuration
config = copy.deepcopy(agent_cls._default_config)
config["num_workers"] = N_CPUS - 1  # number of parallel rollout workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size (environment steps)
config["gamma"] = 0.999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE lambda parameter
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])  # SGD minibatch size
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 500  # number of SGD passes over each training batch
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
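`json.dumps` cannot serialize classes or arbitrary objects by itself, which is why a custom encoder such as `FlowParamsEncoder` is passed via `cls=`. The encoder below is a hypothetical sketch of the general technique (overriding `json.JSONEncoder.default`), not `FlowParamsEncoder`'s actual behavior: it writes classes as their names and other objects as their public attributes. `SimStub` and `EnvStub` are placeholder classes.

```python
import json


class NameAndAttrsEncoder(json.JSONEncoder):
    """Hypothetical encoder: classes -> their names, objects -> public attributes."""

    def default(self, obj):
        if isinstance(obj, type):
            return obj.__name__
        if hasattr(obj, "__dict__"):
            return {k: v for k, v in vars(obj).items() if not k.startswith("_")}
        return super().default(obj)  # raises TypeError for unsupported types


class SimStub:
    def __init__(self):
        self.sim_step = 0.1
        self.render = False


class EnvStub:
    pass


print(json.dumps({"env_name": EnvStub, "sim": SimStub()},
                 cls=NameAndAttrsEncoder, sort_keys=True))
# {"env_name": "EnvStub", "sim": {"render": false, "sim_step": 0.1}}
```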

### 4.4 Running Experiments

Here, we use the `run_experiments` function from `ray.tune`. The function takes a dictionary with one key, a name corresponding to the experiment, and one value, itself a dictionary containing parameters for training.

In [None]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 1,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 500,  # number of iterations to stop after
        },
    },
})



== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs, 0.0/1.71 GiB heap, 0.0/0.59 GiB objects
Memory usage on this node: 12.2/15.6 GiB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 8/8 CPUs, 0/1 GPUs, 0.0/1.71 GiB heap, 0.0/0.59 GiB objects
Memory usage on this node: 12.3/15.6 GiB
Result logdir: /home/andrei/ray_results/FigureEightNetwork
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_LaneChangeAccelEnv1-v0_0:	RUNNING

[2m[36m(pid=18531)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=18531)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=18531)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=18531)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=18531)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=18531)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=18531)[0m 2019-11-29 12:28:38,471	I

[2m[36m(pid=18528)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=18528)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=18528)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=18528)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=18528)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=18528)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=18527)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=18527)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=18527)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=18527)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=18527)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=18527)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=18529)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m

[2m[36m(pid=18530)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=18530)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=18530)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=18530)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=18530)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=18530)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=18598)[0m Instructions for updating:
[2m[36m(pid=18598)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18598)[0m Instructions for updating:
[2m[36m(pid=18598)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18598)[0m Instructions for updating:
[2m[36m(pid=18598)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18598)[0m Instructions for updating:
[2m[36m(pid=18598)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18533)[0m Instructions for updatin

[2m[36m(pid=18529)[0m Instructions for updating:
[2m[36m(pid=18529)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18529)[0m Instructions for updating:
[2m[36m(pid=18529)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18525)[0m Instructions for updating:
[2m[36m(pid=18525)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18525)[0m Instructions for updating:
[2m[36m(pid=18525)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18530)[0m Instructions for updating:
[2m[36m(pid=18530)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18530)[0m Instructions for updating:
[2m[36m(pid=18530)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18530)[0m Instructions for updating:
[2m[36m(pid=18530)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18530)[0m Instructions for updating:
[2m[36m(pid=18530)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18

[2m[36m(pid=18533)[0m 2019-11-29 12:29:09,977	INFO sampler.py:555 -- Outputs of compute_actions():
[2m[36m(pid=18533)[0m 
[2m[36m(pid=18533)[0m { 'default_policy': ( { 'data': { 'batches': [ np.ndarray((1, 1), dtype=float32, min=-0.845, max=-0.845, mean=-0.845),
[2m[36m(pid=18533)[0m                                                np.ndarray((1,), dtype=int64, min=2.0, max=2.0, mean=2.0)]},
[2m[36m(pid=18533)[0m                         'type': 'TupleActions'},
[2m[36m(pid=18533)[0m                       [],
[2m[36m(pid=18533)[0m                       { 'action_logp': np.ndarray((1,), dtype=float32, min=-2.375, max=-2.375, mean=-2.375),
[2m[36m(pid=18533)[0m                         'action_prob': np.ndarray((1,), dtype=float32, min=0.093, max=0.093, mean=0.093),
[2m[36m(pid=18533)[0m                         'behaviour_logits': np.ndarray((1, 5), dtype=float32, min=-0.007, max=0.001, mean=-0.003),
[2m[36m(pid=18533)[0m                         'vf_preds': np.n

[2m[36m(pid=18533)[0m 2019-11-29 12:29:37,189	INFO rollout_worker.py:501 -- Completed sample batch:
[2m[36m(pid=18533)[0m 
[2m[36m(pid=18533)[0m { 'data': { 'action_logp': np.ndarray((200,), dtype=float32, min=-6.228, max=-2.01, mean=-2.524),
[2m[36m(pid=18533)[0m             'action_prob': np.ndarray((200,), dtype=float32, min=0.002, max=0.134, mean=0.093),
[2m[36m(pid=18533)[0m             'actions': np.ndarray((200, 2), dtype=float32, min=-2.883, max=2.395, mean=0.512),
[2m[36m(pid=18533)[0m             'advantages': np.ndarray((200,), dtype=float32, min=-18.425, max=-0.004, mean=-12.335),
[2m[36m(pid=18533)[0m             'agent_index': np.ndarray((200,), dtype=int64, min=0.0, max=0.0, mean=0.0),
[2m[36m(pid=18533)[0m             'behaviour_logits': np.ndarray((200, 5), dtype=float32, min=-0.007, max=0.002, mean=-0.002),
[2m[36m(pid=18533)[0m             'dones': np.ndarray((200,), dtype=bool, min=0.0, max=1.0, mean=0.01),
[2m[36m(pid=18533)[0m        

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-29-57
  done: false
  episode_len_mean: 100.0
  episode_reward_max: -26.34719212110256
  episode_reward_mean: -44.916006324565615
  episode_reward_min: -61.4768787661309
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4648.444
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 2.5102245807647705
        entropy_coeff: 0.0
        kl: 0.022121764719486237
        policy_loss: -0.019945034757256508
        total_loss: 116.4520492553711
        vf_explained_var: -0.0030957460403442383
        vf_loss: 116.46757507324219
    load_time_ms: 54.57
    num_steps_sampled: 2000
    num_steps_trained: 2000
    sample_time_ms: 44934.394
    update_time_ms: 1052.637
  iterations_since_restore: 1
  node_ip: 172.19.3.168
  num_healthy_workers: 7
 

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-31-51
  done: false
  episode_len_mean: 100.0
  episode_reward_max: -8.34198240392416
  episode_reward_mean: -35.879404366584836
  episode_reward_min: -66.28624914801476
  episodes_this_iter: 20
  episodes_total: 60
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-32-50
  done: false
  episode_len_mean: 100.0
  episode_reward_max: -5.328344142339765
  episode_reward_mean: -31.625615024137335
  episode_reward_min: -66.28624914801476
  episodes_this_iter: 20
  episodes_total: 80
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4094.018
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689

[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-33-35
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 15.158017599494153
  episode_reward_mean: -26.161431006171586
  episode_reward_min: -66.28624914801476
  episodes_this_iter: 20
  episodes_total: 100
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4049.51
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 2.1945958137512207
        entropy_coeff: 0.0
        kl: 0.01472463272511959
        policy_loss

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-34-16
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 15.90452108893653
  episode_reward_mean: -17.211868348119957
  episode_reward_min: -66.28624914801476
  episodes_this_iter: 20
  episodes_total: 120
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4009.868
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 2.0929043292999268
        entropy_coeff: 0.0
        kl: 0.02113816700875759
        policy_loss: -0.02397967502474785
        total_loss: 11.476655006408691
        vf_explained_var: -0.0028121471405029297
        vf_loss: 11.496408462524414
    load_time_ms: 10.432
    num_ste

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-35-09
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 20.34327139182502
  episode_reward_mean: -8.051775824262604
  episode_reward_min: -52.9019759524435
  episodes_this_iter: 20
  episodes_total: 140
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3986.982
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.9703090190887451
        entropy_coeff: 0.0
        kl: 0.015034849755465984
        policy_loss: -0.0187921654433012
        total_loss: 16.456390380859375
        vf_explained_var: -0.0008807182312011719
        vf_loss: 16.47217559814453
    load_time_ms: 9.196
    num_steps_sampled: 14000
    num_steps_trained: 14000
    sample_time_ms: 47602.014
    update_time_ms: 165.446
  iterations_since_restore: 7
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  o

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-37-02
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 35.61973604307993
  episode_reward_mean: 8.363436791600698
  episode_reward_min: -25.405275937993924
  episodes_this_iter: 20
  episodes_total: 180
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3952.928
    learner:
      default_policy:
        cur_

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-37-49
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 41.376996559768614
  episode_reward_mean: 15.30783730720419
  episode_reward_min: -18.77806822775423
  episodes_this_iter: 20
  episodes_total: 200
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3940.494
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.7687098979949951
        entropy_coeff: 0.0
        kl: 0.013500119559466839
        policy_loss: -0.01644487865269184
        total_loss: 63.03304672241211
        vf_explained_var: -6.031990051269531e-05
        vf_l

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-40-25
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 53.70135490741838
  episode_reward_mean: 28.889696411951192
  episode_reward_min: -1.3500653816626804
  episodes_this_iter: 20
  episodes_total: 240
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4127.142
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.6952638626098633
        entropy_coeff: 0.0
        kl: 0.005560711026191711
        policy_loss: -0.01044701598584652
        total_loss: 120.8608627319336
        vf_explained_var: -0.0005840063095092773
        vf_

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-41-18
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 63.6104931716222
  episode_reward_mean: 34.30575724443142
  episode_reward_min: 10.225918876947745
  episodes_this_iter: 20
  episodes_total: 260
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4124.212
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.632768154144287
        entropy_coeff: 0.0
        kl: 0.017541812732815742
        policy_loss: -0.020751789212226868
        total_loss: 120.96566772460938
        vf_explained_var: -0.0009244680404663086
        vf_loss: 120.98551940917969
    load_time_ms: 1.663
    num_steps_sampled: 26000
    num_steps_trained: 26000
    sample_time_ms: 52554.51
    update_time_ms: 17.821
  iterations_since_res

[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-43-29
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 69.62172119024027
  episode_reward_mean: 42.6468210543048
  episode_reward_min: 13.042185789614186
  episodes_this_iter: 20
  episodes_total: 300
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4208.234
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4771263599395752
        entropy_coeff: 0.0
        kl: 0.006525788456201553
        policy_loss: -0.008490786887705326
        total_loss: 153.4886932373047
        vf_explained_var: -0.00023043155670166016
        vf_loss: 153.4969024658203
    load_time_ms: 1.655
    num_steps_

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-44-20
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 69.62172119024027
  episode_reward_mean: 46.461551582387
  episode_reward_min: 21.053255194112914
  episodes_this_iter: 20
  episodes_total: 320
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4210.946
    learner:
      default_policy:
        cur_kl_coeff: 0.02500000037252903
        cur_lr: 4.999999873689376e-05
        entropy: 1.4660441875457764
        entropy_coeff: 0.0
        kl: 0.030356822535395622
        policy_loss: -0.023907341063022614
        total_loss: 172.08700561523438
        vf_explained_var: -8.237361907958984e-05
        vf_loss: 172.11012268066406
    load_time_ms: 1.792
    num_steps_sampled: 32000
    num_steps_trained: 32000
    sample_time_ms: 56159.705
    update_time_ms: 23.011
  iterations_since_re

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-46-21
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 82.93814308876811
  episode_reward_mean: 51.35160747509

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-46-57
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 82.93814308876811
  episode_reward_mean: 51.49143517782445
  episode_reward_min: 18.104187250056793
  episodes_this_iter: 20
  episodes_total: 380
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-47-38
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 82.93814308876811
  episode_reward_mean: 52.36707217797869
  episode_reward_min: 17.477804574977956
  episodes_this_iter: 20
  episodes_total: 400
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4167.586
    learner:
      default_policy:
        cur_kl_coeff: 0.03750000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.5722018480300903
        entropy_coeff: 0.0
        kl: 0.02452213689684868
        policy_loss: 

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-48-14
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 82.93814308876811
  episode_reward_mean: 53.55980255452563
  episode_reward_min: 16.150147618871205
  episodes_this_iter: 20
  episodes_total: 420
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3876.229
    learner:
      default_policy:
        cur_kl_coeff: 0.03750000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.5647329092025757
        entropy_coeff: 0.0
        kl: 0.051316037774086
        policy_loss: -0.03260316699743271
        total_loss: 220.68887329101562
        vf_explained_var: -2.7179718017578125e-05
        vf_los

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-51-10
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 95.7466449054983
  episode_reward_mean: 55.702535657248106
  episode_reward_min: 5.441135757873955
  episodes_this_iter: 20
  episodes_total: 480
  experiment_id: 4c66fa44b39046

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-52-11
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 95.7466449054983
  episode_reward_mean: 54.490519294409225
  episode_reward_min: 5.441135757873955
  episodes_this_iter: 20
  episodes_total: 500
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4774.724
    learner:
      default_policy:
        cur_kl

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-53-08
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 95.7466449054983
  episode_reward_mean: 53.13560963592458
  episode_reward_min: 1.8236177494159174
  episodes_this_iter: 20
  episodes_total: 520
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4762.712
    learner:
      default_policy:
        cur_kl_coeff: 0.08437500149011612
        cur_lr: 4.999999873689376

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-53-56
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 95.7466449054983
  episode_reward_mean: 53.570053817493054
  episode_reward_min: 1.8236177494159174
  episodes_this_iter: 20
  episodes_total: 540
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4764.562
    learner:
      default_policy:
        cur_kl_coeff: 0.12656250596046448
        cur_lr: 4.999999873689376e-05
        entropy: 1.6631723642349243
        entropy_coeff: 0.0
        kl: 0.036959510296583176
        policy_loss: -0.0411510244011879
        total_loss: 211.19143676757812
        vf_explained_var: -5.0067901611328125e-06
        vf_l

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-54-49
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 95.7466449054983
  episode_reward_mean: 55.91480591762933
  episode_reward_min: 1.8236177494159174
  episodes_this_iter: 20
  episodes_total: 560
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4766.926
    learner:
      default_policy:
        cur_kl_coeff: 0.12656250596046448
        cur_lr: 4.999999873689376e-05
        entropy: 1.5646562576293945
        entropy_coeff: 0.0
        kl: 0.051391810178756714
        policy_loss: -0.0532110296189785
        total_loss: 268.54803466796875
        vf_explained_var: -4.887580871582031e-06
        vf_loss: 268.59478759765625
    load_time_ms: 2.306
    num_steps_sampled: 56000
    num_steps_trained: 56000
    sample_time_ms: 45956.166
    update_time_ms: 18.974
  iterations_since_restore: 28
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  o

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-57-22
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 96.42519832168142
  episode_reward_mean: 60.88905733491

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-58-20
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 96.42519832168142
  episode_reward_mean: 61.102731835253664
  episode_reward_min: 4.598046647523791
  episodes_this_iter: 20
  episodes_total: 640
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4839.486
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.681554913520813
        entropy_coeff

[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-59-06
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 96.42519832168142
  episode_reward_mean: 59.361761182305926
  episode_reward_min: 4.598046647523791
  episodes_this_iter: 20
  episodes_total: 660
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4807.805
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.7242836952209473
        entropy_coeff: 0.0
        kl: 0.020288405939936638
        policy_loss: -0.03658849373459816
        total_loss: 221.5955352783203
        vf_explained_var: -1.430511474609375e-06
        vf_loss: 221.62631225585938
    load_time_ms: 2.793
    num_steps_s

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_12-59-43
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 103.57884159889323
  episode_reward_mean: 59.235142133975934
  episode_reward_min: 4.598046647523791
  episodes_this_iter: 20
  episodes_total: 680
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4792.322
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.722929835319519
        entropy_coeff: 0.0
        kl: 0.03869161382317543
        policy_loss: -0.04497072473168373
        total_loss: 218.55453491210938
        vf_explained_var: -1.0728836059570312e-06
        vf_loss: 218.58843994140625
    load_time_ms: 2.776
    num_steps_

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-02-57
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 103.57884159889323
  episode_reward_mean: 57.715387539155515
  episode_reward_min: 8.35284421401073
  episodes_this_

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-03-49
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.53126762268369
  episode_reward_mean: 57.22639395186495
  episode_reward_min: 8.35284421401073
  episodes_this_iter: 20
  episodes_total: 780
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4862.744
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-04-51
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.53126762268369
  episode_reward_mean: 57.928552273403376
  episode_reward_min: 8.35284421401073
  episodes_this_iter: 20
  episodes_total: 800
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4885.852
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.9176808595657349
        entropy_coeff: 0.0
        kl: 0.021466262638568878
        policy_loss: -

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-05-31
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.53126762268369
  episode_reward_mean: 60.86696364463542
  episode_reward_min: 12.554447731153578
  episodes_this_iter: 20
  episodes_total: 820
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4900.471
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.8016860485076904
        entropy_coeff: 0.0
        kl: 0.028350522741675377
        policy_loss: 

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-06-31
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.8995956543616
  episode_reward_mean: 62.1100146866053
  episode_reward_min: 12.554447731153578
  episodes_this_iter: 20
  episodes_total: 840
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4879.93
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.837691068649292
        entropy_coeff: 0.0
        kl: 0.02238737791776657
        policy_loss: -0.02945677377283573
        total_loss: 256.1331787109375
        vf_explained_var: 0.0
        vf_loss: 256.1562194824219
    load_time_ms: 1.759
    num_steps_sampled: 84000
    num_ste

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-08-14
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.89547027795885
  episode_reward_mean: 64.5396851680

[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-09-14
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.89547027795885
  episode_reward_mean: 65.74513469344697
  episode_reward_min: 2.9994564974065305
  episodes_this_iter: 20
  episodes_total: 900
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4385.525
    learner:
      default_policy:
        cur_

[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-10-03
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.89547027795885
  episode_reward_mean: 65.67859423556224
  episode_reward_min: 2.9994564974065305
  episodes_this_iter: 20
  episodes_total: 920
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3965.233
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.99999987368937

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-11-02
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.89547027795885
  episode_reward_mean: 64.49122583882492
  episode_reward_min: 2.9994564974065305
  episodes_this_iter: 20
  episodes_total: 940
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3957.567
    learner:
      default_policy:
        cur_

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-11-41
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 101.2086984304976
  episode_reward_mean: 62.34796099579001
  episode_reward_min: 8.110009955505936
  episodes_this_iter: 20
  episodes_total: 960
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4088.52
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 2.0221667289733887
        entropy_coeff: 0.0
        kl: 0.019602088257670403
        policy_loss: -0.029022926464676857
        total_loss: 190.57022094726562
        vf_explained_var: 0.0
        vf_loss: 190.593597412109

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-1

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-15-07
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.08751161858575
  episode_reward_mean: 58.439857956437336
  episode_reward_min: 3.848885476563702
  episodes_this_iter: 20
  episodes_total: 1040
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_tim

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-15-59
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.08751161858575
  episode_reward_mean: 59.42151652024894
  episode_reward_min: 3.848885476563702
  episodes_this_iter: 20
  episodes_total: 1060
  experiment_id: 4c66fa44b390

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-16-48
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 102.08751161858575
  episode_reward_mean: 58.25658333366767
  episode_reward_min: -11.722673805874322
  episodes_this_iter: 20
  episodes_total: 1080
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3894.693
    learner:
      default_policy:
        cu

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-17-27
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 105.69892412055239
  episode_reward_mean: 57.819663256713326
  episode_reward_min: -11.722673805874322
  episodes_this_iter: 20
  episodes_total: 1100
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3909.347
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.796750545501709
        entropy_coeff: 0.0
        kl: 0.0361044704914093
        policy_loss: -0.04230000078678131
        total_loss: 258.8918762207031
 

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-18-13
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 105.69892412055239
  episode_reward_mean: 56.977643037781924
  episode_reward_min: -11.722673805874322
  episodes_this_iter: 20
  episodes_total: 1120
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 3922.994
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.9116467237472534
        entropy_coeff: 0.0
        kl: 0.019577808678150177
        policy_loss: -0.030714016407728195
        total_loss: 209.0606231689453
        vf_explained_var: 5.960464477539063e-08
        vf_loss: 209.0858154296875
    load_time_ms: 1.677
    num_steps_sampled: 112000
    num_steps_trained: 112000
    sample_ti

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-19-17
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 105.69892412055239
  episode_reward_mean: 54.89390415910021
  episode_reward_min: -15.313626914852419
  episodes_this_iter: 20
  episodes_total: 1140
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4979.835
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.973544955253601
        entropy_coeff: 0.0
        kl: 0.0318862684071064
        policy_loss: 

[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-20-05
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 113.80970101060169
  episode_reward_mean: 56.52534100640262
  episode_reward_min: -15.313626914852419
  episodes_this_iter: 20
  episodes_total: 1160
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4837.321
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.8267658948898315
        entropy_coeff: 0.0
        kl: 0.03867810219526291
        policy_loss: -0.04811853542923927
        total_loss: 272.3883972167969
        vf_explained_var: 0.0
        vf_loss: 272.4255065917

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-20-53
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 113.80970101060169
  episode_reward_mean: 59.31961928839728
  episode_reward_min: -15.313626914852419
  episodes_this_iter: 20
  episodes_total: 1180
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4831.44
    learner:
      default_policy:
        cur_kl_coeff: 0.2847656309604645
        cur_lr: 4.999999873689376e-05
        entropy: 1.9099223613739014
        entropy_coeff: 0.0
        kl: 0.04980594664812088
        policy_loss: -0.04971517622470856
        total_loss: 276.2934875488281
        vf_explained_var: -1.1920928955078125e-07
        vf_loss: 276.3290100097656
    load_time_ms: 2.024
    num_steps_sampled: 118000
    num_steps_trained: 118000
    sample_time

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-21-52
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 113.80970101060169
  episode_reward_mean: 58.03544442719823
  episode_reward_min: -15.313626914852419
  episodes_this_iter: 20
  episodes_total: 1200
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 5034.783
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 2.0401570796966553
        entropy_coeff: 0.0
        kl: 0.025218291208148003
        policy_loss: -0.04416177421808243
        total_loss: 252.22447204589844
        vf_explained_var: -1.1920928955078125e-07
        vf_loss: 252.2578125
    load_time_ms: 2.669
    num_steps_sampled: 120000
    num_steps_trained: 120000
    sample_time_ms: 44769.16
    update_time_ms: 17.542
  iterations_since_restore: 60
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  of

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-22-33
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 113.80970101060169
  episode_reward_mean: 59.794021508303885
  episode_reward_min: -15.313626914852419
  episodes_this_iter: 20
  episodes_total: 1220
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 5034.179
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 1.6514610052108765
        entropy_coeff: 0.0
        kl: 0.020937694236636162
        policy_loss: -0.037048742175102234
        total_loss: 226.0435791015625
        vf_explained_var: 1.7881393432617188e-07
        vf_loss: 226.07168579101562
    load_time_ms: 2.561
    num_steps_sampled: 122000
    num_steps_trained: 122000
    sample_time_ms: 43400.597
    update_time_ms: 17.623
  iterations_si

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-23-25
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 113.80970101060169
  episode_reward_mean: 63.079910677776915
  episode_reward_min: 1.170507471565906
  episodes_this_iter: 20
  episodes_total: 1240
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 5584.754
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 2.1799867153167725
        entropy_coeff: 0.0
        kl: 0.017226792871952057
        policy_loss: -0.02834331803023815
        total_loss: 242.51043701171875
        vf_explained_var: 0.0
        vf_loss: 242.53138732910156
    load_time_ms: 3.817
    num_steps_sampled: 124000
    num_steps_trained: 124000
    sample_time_ms: 44126.702
    update_time_ms: 17.666
  iterations_since_restore: 62
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  off_policy_esti

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-2

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-26-56
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 103.24051853338115
  episode_reward_mean: 57.648447910566844
  episode_reward_min: 2.7927860018218444
  episodes_this_iter: 20
  episodes_total: 1320
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 5551.618
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-27-42
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 101.40424326461621
  episode_reward_mean: 56.62355490838718
  episode_reward_min: 2.7927860018218444
  episodes_this_iter: 20
  episodes_total: 1340
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4501.478
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 2.3965835571289062
        entropy_coeff: 0.0
        kl: 0.030709372833371162
        policy_loss: -0.04328332841396332
        total_loss: 235.24363708496094
        vf_explained_var: 0.0
        vf_loss: 235.273834228

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-28-44
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 106.05446898064763
  episode_reward_mean: 54.97963515321858
  episode_reward_min: 2.7927860018218444
  episodes_this_iter: 20
  episodes_total: 1360
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4526.108
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 2.0316920280456543
        entropy_coeff: 0.0
        kl: 0.028706876561045647
        policy_loss: -0.03757999464869499
        total_loss: 209.8843536376953
        vf_explained_var: -1.1920928955078125e-07
        vf_

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-29-30
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 106.05446898064763
  episode_reward_mean: 56.52604186347337
  episode_reward_min: 4.008517161822962
  episodes_this_iter: 20
  episodes_total: 1380
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4548.31
    learner:
      default_policy:
        cur_kl_coeff: 0.4271484315395355
        cur_lr: 4.999999873689376e-05
        entropy: 2.111539125442505
        entropy_coeff: 0.0
        kl: 0.04070242494344711
        policy_loss: -0.04782974347472191
        total_loss: 242.2086181640625
        vf_explained_var: 0.0
        vf_loss: 242.2390899658203
    load_time_ms: 4.005
    num_steps_sampled: 138000
    num_steps_trained: 138000
    sample_time_ms: 47116.764
    update_time_ms: 17.81
  iterations_since_restore: 69
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  off_policy_estimator: 

[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-31-09
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 106.05446898064763
  episode_reward_mean: 56.878947143538014
  episode_reward_min: 7.2687456928031375
  episodes_thi

[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-32-06
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 106.05446898064763
  episode_reward_mean: 56.541955713668614
  episode_reward_min: 6.434985195702133
  episodes_this

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-33-12
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 104.51505275695578
  episode_reward_mean: 57.593626525642904
  episode_reward_min: 6.434985195702133
  episodes_this_iter: 20
  episodes_total: 1460
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4258.977
    learner:
      default_policy:
        cur_kl_coeff: 0.6407226324081421
        cur_lr: 4.9999998736893

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-34-19
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 104.51505275695578
  episode_reward_mean: 55.06856035261469
  episode_reward_min: 6.434985195702133
  episodes_this_iter: 20
  episodes_total: 1480
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-35-04
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 104.51505275695578
  episode_reward_mean: 53.67813933296995
  episode_reward_min: 0.8465541499776424
  episodes_this_iter: 20
  episodes_total: 1500
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4279.138
    learner:
      default_policy:
        cur_kl_coeff: 0.6407226324081421
        cur_lr: 4.999999873689376e-05
        entropy: 2.482215642929077
        entropy_coeff: 0.0
        kl: 0.011844136752188206
        policy_loss: -0.02492212876677513
        total_loss: 231.77012634277344


[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-35-52
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 104.51505275695578
  episode_reward_mean: 54.46573521590566
  episode_reward_min: -2.987494852964023
  episodes_this_iter: 20
  episodes_total: 1520
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4277.774
    learner:
      default_policy:
        cur_kl_coeff: 0.6407226324081421
        cur_lr: 4.999999873689376e-05
        entropy: 1.8094652891159058
        entropy_coeff: 0.0
        kl: 0.011337812058627605
        policy_loss: -0.024505823850631714
        total_loss: 247.4650115966797
        vf_explained_var: 5.960464477539063e-08
        vf_loss: 247.48226928710938
    load_time_ms: 3.54
    num_steps_

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-36-57
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 97.25020458350082
  episode_reward_mean: 55.821874428729274
  episode_reward_min: -2.987494852964023
  episodes_this_iter: 20
  episodes_total: 1540
  experiment_id: 4c66fa44b39046a18a94e3a8a1253eac
  hostname: andrei
  info:
    grad_time_ms: 4270.379
    learner:
      default_policy:
        cur_kl_coeff: 0.6407226324081421
        cur_lr: 4.999999873689376e-05
        entropy: 1.9751026630401611
        entropy_coeff: 0.0
        kl: 0.015142806805670261
        policy_loss: -0.03171955794095993
        total_loss: 254.53602600097656
        vf_explained_var: -1.1920928955078125e-07
        vf_loss: 254.5580596923828
    load_time_ms: 3.533
    num_steps_sampled: 154000
    num_steps_trained: 154000
    sample_ti

[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18529)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18525)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18528)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18533)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18527)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18598)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=18530)[0m   veh_id, int(target_lane), 100000)


2019-11-29 13:39:25,411	ERROR trial_runner.py:569 -- Error processing event.
Traceback (most recent call last):
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(NoSuchProcess): ray_PPO:train() (pid=18531, host=andrei)
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 418, in train
    raise e
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 407, in train
    result = Trainable.train(self)
  File "/home/andrei/anaco

[2m[36m(pid=18531)[0m 2019-11-29 13:39:25,387	INFO trainer.py:415 -- Worker crashed during call to train(). To attempt to continue training without the failed worker, set `'ignore_worker_failures': True`.
[2m[36m(pid=18532)[0m 2019-11-29 13:39:25,501	INFO trainer.py:345 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=18532)[0m Instructions for updating:
[2m[36m(pid=18532)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18532)[0m Instructions for updating:
[2m[36m(pid=18532)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18532)[0m Instructions for updating:
[2m[36m(pid=18532)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18532)[0m Instructions for updating:
[2m[36m(pid=18532)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=18532)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=18532)[0m 2019-11-29 13:39:32,228	I

[2m[36m(pid=31377)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=31377)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=31377)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=31377)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=31377)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=31377)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=31381)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m(pid=31381)[0m   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
[2m[36m(pid=31381)[0m   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
[2m[36m(pid=31381)[0m   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
[2m[36m(pid=31381)[0m   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
[2m[36m(pid=31381)[0m   np_resource = np.dtype([("resource", np.ubyte, 1)])
[2m[36m(pid=31379)[0m   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[2m[36m

[2m[36m(pid=18532)[0m 2019-11-29 13:39:57,418	INFO trainable.py:102 -- _setup took 31.917 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


2019-11-29 13:39:58,117	ERROR ray_trial_executor.py:614 -- Error restoring runner for Trial PPO_LaneChangeAccelEnv1-v0_0.
Traceback (most recent call last):
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(NoSuchProcess): [36mray_PPO:train()[39m (pid=18531, host=andrei)
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 418, in train
    raise e
  File "/home/andrei/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 407, in train
    result = Tra

2019-11-29 13:40:00,135	INFO ray_trial_executor.py:224 -- Trying to start runner for Trial PPO_LaneChangeAccelEnv1-v0_0 without checkpoint.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 8/8 CPUs, 0/1 GPUs, 0.0/1.71 GiB heap, 0.0/0.59 GiB objects
Memory usage on this node: 13.6/15.6 GiB
Result logdir: /home/andrei/ray_results/FigureEightNetwork
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - PPO_LaneChangeAccelEnv1-v0_0:	RUNNING, 2 failures: /home/andrei/ray_results/FigureEightNetwork/PPO_LaneChangeAccelEnv1-v0_0_2019-11-29_12-28-34qmp3vm0h/error_2019-11-29_13-40-00.txt, [8 CPUs, 0 GPUs], [pid=18531], 4163 s, 79 iter, 158000 ts, 57.2 rew

[2m[36m(pid=31379)[0m 2019-11-29 13:40:07,591	INFO trainer.py:345 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=31299)[0m Instructions for updating:
[2m[36m(pid=31299)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31299)[0m Instructions for updating:
[2m[36m(pid=31299)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31298)[0m Instructions for updating:
[2m[36m(pid=3

[2m[36m(pid=31379)[0m Instructions for updating:
[2m[36m(pid=31379)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31379)[0m Instructions for updating:
[2m[36m(pid=31379)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31298)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31293)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31299)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31296)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31295)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31297)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31294)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31379)[0m   "Converting sparse IndexedSlices to a den

[2m[36m(pid=31375)[0m Instructions for updating:
[2m[36m(pid=31375)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31375)[0m Instructions for updating:
[2m[36m(pid=31375)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31377)[0m Instructions for updating:
[2m[36m(pid=31377)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31377)[0m Instructions for updating:
[2m[36m(pid=31377)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31380)[0m   "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[2m[36m(pid=31377)[0m Instructions for updating:
[2m[36m(pid=31377)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31377)[0m Instructions for updating:
[2m[36m(pid=31377)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31381)[0m Instructions for updating:
[2m[36m(pid=31381)[0m keep_dims is deprecated, use keepdims instead
[2m[36m(pid=31381)[0m Instructions for

[2m[36m(pid=31380)[0m 2019-11-29 13:40:52,592	INFO rollout_worker.py:467 -- Generating sample batch of size 200
[2m[36m(pid=31380)[0m 2019-11-29 13:40:54,291	INFO sampler.py:310 -- Raw obs from env: { 0: { 'agent0': np.ndarray((69,), dtype=float64, min=0.0, max=0.969, mean=0.288)}}
[2m[36m(pid=31380)[0m 2019-11-29 13:40:54,291	INFO sampler.py:311 -- Info return from env: {0: {'agent0': None}}
[2m[36m(pid=31380)[0m 2019-11-29 13:40:54,291	INFO sampler.py:409 -- Preprocessed obs: np.ndarray((69,), dtype=float64, min=0.0, max=0.969, mean=0.288)
[2m[36m(pid=31380)[0m 2019-11-29 13:40:54,291	INFO sampler.py:413 -- Filtered obs: np.ndarray((69,), dtype=float64, min=0.0, max=0.969, mean=0.288)
[2m[36m(pid=31380)[0m 2019-11-29 13:40:54,292	INFO sampler.py:528 -- Inputs to compute_actions():
[2m[36m(pid=31380)[0m 
[2m[36m(pid=31380)[0m { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
[2m[36m(pid=31380)[0m                                   'env_id': 0,
[2m[36m

[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m 2019-11-29 13:41:36,892	INFO rollout_worker.py:501 -- Completed sample batch:
[2m[36m(pid=31380)[0m 
[2m[36m(pid=31380)[0m { 'data': { 'action_logp': np.ndarray((200,), dtype=float32, min=-7.025, max=-2.013, mean=-2.497),
[2m[36m(pid=31380)[0m             'action_prob': np.ndarray((200,), dtype=float32, min=0.001, max=0.134, mean=0.096),
[2m[36m(pid=31380)[0m             'actions': np.ndarray((200, 2), dtype=float32, min=-3.174, max=2.859, mean=0.549),
[2m[36m(pid=31380)[0m             'advantages': np.ndarray((200,), dtype=float32, min=-18.048, max=0.115, mean=-8.217),
[2m[36m(pid=31380)[0m             'agent_index': np.ndarray((200,), dtype=int64, min=0.0, max=0.0, mean=0.0),
[2m[36m(pid=31380)[0m             'behaviour_logits': np.ndarray((200, 5), dtype=float32, min=-0.004, max=0.006, mean=0.0),
[2m[36m(pid=31380)[0m             'dones': np.ndarray((200,), dtype=bool, min=0

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-42-03
  done: false
  episode_len_mean: 100.0
  episode_reward_max: -17.918471781617242
  episode_reward_mean: -47.19255849827083
  episode_reward_min: -68.0528843532122
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 6381.157
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 2.4970741271972656
        entropy_coeff: 0.0
        kl: 0.01922648213803768
        policy_loss: -0.02002931386232376
        total_loss: 132.1475067138672
        vf_explained_var: -0.0008386373519897461
        vf_loss: 132.16368103027344
    load_time_ms: 65.35
    num_steps_sampled: 2000
    num_steps_trained: 2000
    sample_time_ms: 64547.756
    update_time_ms: 515.733
  iterations_since_restore: 1
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  of

[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-4

[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChang

[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-46-33
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 7.950909018221466
  episode_reward_mean: -27.8842329894

[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-47-35
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 12.043024078637945
  episode_reward_mean: -18.642782205204018
  episode_reward_min: -56.51138051934428
  episodes_this_iter: 20
  episodes_total: 120
  experiment_id: cae98feeab

[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-48-25
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 21.271069270581204
  episode_reward_mean: -9.393912750502484
  episode_reward_min: -48.40592683226851
  episodes_thi

[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-49-20
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 25.400581541516992
  episode_reward_mean: -0.8411986793645346
  episode_reward_min: -30.348394431782193
  episodes_this_iter: 20
  episodes_total: 160
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 5632.864
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.7964372634887695
        entropy

[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-50-51
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 40.17928222740801
  episode_reward_mean: 7.777977313881715
  episode_reward_min: -23.770967967200605
  episodes_this_iter: 20
  episodes_total: 180
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 5423.785
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.7352017164230347
        entropy_coeff: 0.0
        kl: 0.013960756361484528
        policy_loss: -0.016090918332338333
        total_loss: 48.81249237060547
        vf_explained_var: 0.006356418132781982
        vf_loss: 48.82579803466797
    load_time_ms: 10.185
    num_steps_

[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-52-20
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 55.47198941729193
  episode_reward_mean: 23.567991309653692
  episode_reward_min: -8.045387892062866
  episodes_this_iter: 20
  episodes_total: 220
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 4980.095
    learner:
      default_policy:
        cur_

[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-53-12
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 58.95038255645987
  episode_reward_mean: 30.607928499878813
  episode_reward_min: 1.931494296482292
  episodes_this_iter: 20
  episodes_total: 240
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 4626.874
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.6282875537872314
        entropy_coeff: 0.0
        kl: 0.012246652506291866
        policy_loss:

[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-53-55
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 58.95038255645987
  episode_reward_mean: 35.89291883033603
  episode_reward_min: 11.710142636754728
  episodes_this_iter: 20
  episodes_total: 260
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 4794.577
    learner:
      default_policy:
        cur_kl_coeff: 0.10000000149011612
        cur_lr: 4.999999873689376e-05
        entropy: 1.6777266263961792
        entropy_coeff: 0.0
        kl: 0.005335825029760599
        policy_loss: -0.008607623167335987
        total_loss: 123.18213653564453
        vf_explained_var: -5.4836273193359375e-06
        vf_loss: 123.190185546875
    load_time_ms: 2.485
    num_steps_sampled: 26000
    num_steps_trained: 26000
    sample_time_

[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-55-51
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 67.14691246189733
  episode_reward_mean: 43.858093678590286
  episode_reward_min: 21.854080152220853
  episodes_this_iter: 20
  episodes_total: 300
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 5060.648
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.5983189344406128
        entropy_coeff: 0.0
        kl: 0.013874181546270847
        policy_loss: -0.01525917835533619
        total_loss: 166.9679718017578
        vf_explained_var: -1.537799835205078e-05
        vf_loss: 166.9824676513672
    load_time_ms: 2.887
    num_steps_

[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m  Retrying in 1 seconds
[2m[36m(pid=31378)[0m  Retrying in 1 seconds
[2m[36m(pid=31376)[0m  Retrying in 1 seconds
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_13-59-56
  done: false
  episode_len_mean: 

[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-01-46
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 82.84219389822879
  episode_reward_mean: 54.83158592895733
  episode_reward_min: 25.236997970211853
  episodes_this_iter: 20
  episodes_total: 360
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 6563.083
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4890679121017456
        entropy_coeff: 0.0
        kl: 0.020386632531881332
        policy_loss:

[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-05-53
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 85.22436339239275
  episode_reward_mean: 62.36797052316449
  episode_reward_min: 20.444966141282894
  episodes_this_iter: 20
  episodes_total: 400
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 9010.987
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4722696542739868
        entropy_coeff: 0.0
        kl: 0.017953014001250267
        policy_loss: -0.01661653444170952
        total_loss: 289.29473876953125


Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-08-07
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 85.22436339239275
  episode_reward_mean: 63.88764761850232
  episode_reward_min: 20.444966141282894
  episodes_this_iter: 20
  episodes_total: 420
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 9840.153
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4774954319000244
        entropy_coeff: 0.0
        kl: 0.029261859133839607
        policy_loss: -0.023199034854769707
        total_loss: 274.9759521484375
        vf_explained_var: 6.020069122314453e-06
        vf_loss: 274.9976806640625
    load_time_ms: 5.834
    num_steps_sampled: 42000
    num_steps_trained: 42000
    sample_time_ms: 84734.07
    update_time_ms: 27.588
  iterations_since_restore: 21
  node_ip: 172.19.3.168
  num_healthy_workers: 7
  of

[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-12-44
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 88.23445497620473
  episode_reward_mean: 62.8472299342018
  episode_reward_min: 20.444966141282894
  episodes_this_iter: 20
  episodes_total: 460
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 12517.609
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4357510805130005
        entropy_coeff: 0.0
        kl: 0.02585897222161293
        policy_loss: 

[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-14-58
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 88.23445497620473
  episode_reward_mean: 61.47750356589303
  episode_reward_min: 20.444966141282894
  episodes_this_iter: 20
  episodes_total: 480
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 13849.519
    learner:
      default_policy:
        cur_kl_coeff: 0.05000000074505806
        cur_lr: 4.999999873689376e-05
        entropy: 1.4206792116165161
        entropy_coeff: 0.0
        kl: 0.041231557726860046
        policy_loss: -0.038519758731126785
        total_loss: 259.6058349609375
        vf_explained_var: -3.933906555175781e-06
        vf_loss: 259.6424255371094
    load_time_ms: 8.018
    num_steps

[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-19-35
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 89.4591208286454
  episode_reward_mean: 62.2429444474714
  episode_reward_min: 30.076818502515312
  episodes_this_iter: 20
  episodes_total: 520
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 14449.078
    learner:
      default_policy:
        cur_kl_coeff: 0.07500000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.465853214263916
        entropy_coeff: 0.0
        kl: 0.018046749755740166
        policy_loss: -0.02406899258494377
        total_loss: 303.9429931640625
        vf_explained_var: 6.556510925292969e-07
        vf_loss:

[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-24-10
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 91.5136966186264
  episode_reward_mean: 63.27369179572648
  episode_reward_min: 15.915210453547353
  episodes_this_iter: 20
  episodes_total: 560
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 16014.792
    learner:
      default_policy:
        cur_kl_coeff: 0.07500000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.4297116994857788
        entropy_coe

[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-26-31
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 91.5136966186264
  episode_reward_mean: 63.84550507745821
  episode_reward_min: 15.915210453547353
  episodes_this_iter: 20
  episodes_total: 580
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 15821.607
    learner:
      default_policy:
        cur_kl_coeff: 0.07500000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.533389925956726
        entropy_coeff: 0.0
        kl: 0.02443278208374977
        policy_loss: -0.026246650144457817
        total_loss: 267.92120361328125
        vf_explained_var: -1.5497207641601562e-06
        vf_loss: 267.9455871582031
    load_time_ms: 9.125
    num_steps_

[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-31-06
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 98.5540152626434
  episode_reward_mean: 63.465574419839086
  episode_reward_min: 15.915210453547353
  episodes_this_iter: 20
  episodes_total: 620
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 15834.477
    learner:
      default_policy:
        cur_kl_coeff: 0.07500000298023224
        cur_lr: 4.9999998736893

[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-33-24
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 98.5540152626434
  episode_reward_mean: 64.86306275922101
  episode_reward_min: 24.697104246948932
  episodes_this_iter: 20
  episodes_total: 640
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 15600.236
    learner:
      default_policy:
        cur_kl_coeff: 0.07500000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.423563003540039
        entropy_coeff: 0.0
        kl: 0.04048779234290123
        policy_loss: -0.03249954804778099
        total_loss: 261.354736328125
    

[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-37-53
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 98.5540152626434
  episode_reward_mean: 66.9635671399477
  episode_reward_min: 30.219323204052852
  episodes_this_iter: 20
  episodes_total: 680
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 15503.919
    learner:
      default_policy:
        cur_kl_coeff: 0.11249999701976776
        cur_lr: 4.999999873689376

[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-40-15
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 98.5540152626434
  episode_reward_mean: 67.57349239277754
  episode_reward_min: 30.219323204052852
  episodes_this_iter: 20
  episodes_total: 700
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 15869.812
    learner:
      default_policy:
        cur_kl_coeff: 0.11249999701976776
        cur_lr: 4.999999873689376e-05
        entropy: 1.4278461933135986
        entropy_coeff: 0.0
        kl: 0.02107337862253189
        policy_loss: -0.024580195546150208
        total_loss: 277.3083190917969
        vf_explained_var: 2.384185791015625e-07
        vf_loss: 277.3304748535156
    load_time_ms: 8.34
    num_steps_sampled: 70000
    num_steps_trained: 70000
    sample_time_ms: 

[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31380)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31375)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-44-50
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 97.10518647859351
  episode_reward_mean: 66.88881877717819
  episode_reward_min: 21.821240417772778
  episodes_this_iter: 20
  episodes_total: 740
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_

[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31378)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-47-15
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 97.10518647859351
  episode_reward_mean: 65.10343561034348
  episode_reward_min: 21.821240417772778
  episodes_this_iter: 20
  episodes_total: 760
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 16246.622
    learner:
      default_policy:
        cur_kl_coeff: 0.11249999701976776
        cur_lr: 4.999999873689376e-05
        entropy: 1.6986898183822632
        entropy_coeff: 0.0
        kl: 0.01829254999756813
        policy_loss: -0.021348971873521805
        total_loss: 258.3643798828125
        vf_explained_var: 0.0
        vf_loss: 258.3836669921

[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-49-35
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.18842208334817
  episode_reward_mean: 65.9513328724793
  episode_reward_min: 21.821240417772778
  episodes_this_iter: 20
  episodes_total: 780
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 16579.753
    learner:
      default_policy:
        cur_kl_coeff: 0.11249999701976776
        cur_lr: 4.999999873689376e-05
        entropy: 1.4991579055786133
        entropy_coeff: 0.0
        kl: 0.020995287224650383
        policy_loss: -0.020937873050570488
        total_loss: 322.9857482910156
        vf_explained_var: -1.1920928955078125e-07
        vf_loss: 323.0044250488281
    load_time_ms: 7.185
    num_steps_sampled: 78000
    num_steps_trained: 78000
    sample_time_ms: 121583.109
    update_time_ms: 63.04
  iterations_since_r
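The learner statistics in each block are not independent: as far as the printed numbers show, `total_loss` is PPO's combined objective, $L_{\text{total}} = L_{\text{policy}} + c_{vf} L_{vf} + \beta\,\mathrm{KL} - c_H H$, where $\beta$ is `cur_kl_coeff` and $c_H$ is `entropy_coeff`. The check below plugs in the values from the `episodes_total: 780` block. It assumes $c_{vf} = 1.0$ (`vf_loss_coeff`, RLlib's PPO default), which is not printed in this log.

```python
# Values copied from the episodes_total: 780 result block above.
policy_loss = -0.020937873050570488
vf_loss = 323.0044250488281
kl = 0.020995287224650383
cur_kl_coeff = 0.11249999701976776
entropy = 1.4991579055786133
entropy_coeff = 0.0

# Assumption: vf_loss_coeff = 1.0 (RLlib's PPO default; not shown in the log).
vf_loss_coeff = 1.0

# Recombine the logged components into the PPO total loss.
reconstructed = (policy_loss
                 + vf_loss_coeff * vf_loss
                 + cur_kl_coeff * kl
                 - entropy_coeff * entropy)
print(f"reconstructed total_loss = {reconstructed:.4f}")
```

The result lands within about 1e-4 of the logged `total_loss: 322.9857482910156`. The small gap is likely because the logged components are float32 averages over SGD minibatches, so the product of averages differs slightly from the average of products. The same check also shows that `vf_loss` dominates `total_loss` in this run, which matches the near-zero `vf_explained_var`.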

[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31377)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31376)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31381)[0m   veh_id, int(target_lane), 100000)
[2m[36m(pid=31382)[0m   veh_id, int(target_lane), 100000)
Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-54-12
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 99.18842208334817
  episode_reward_mean: 66.60545781121564
  episode_reward_min: 18.705916384815175
  episodes_this_iter: 20
  episodes_total: 820
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 17032.582
    learner:
      default_policy:
        cur_kl_coeff: 0.16875000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.6271650791168213
        entropy_coeff: 0.0
        kl: 0.03130567818880081
        policy_loss: -0.03293722867965698
        total_loss: 287.4142150878906
 

Result for PPO_LaneChangeAccelEnv1-v0_0:
  custom_metrics: {}
  date: 2019-11-29_14-56-22
  done: false
  episode_len_mean: 100.0
  episode_reward_max: 100.53493608293832
  episode_reward_mean: 69.13951650305646
  episode_reward_min: 18.705916384815175
  episodes_this_iter: 20
  episodes_total: 840
  experiment_id: cae98feeaba94a4fb99e19849f571f9d
  hostname: andrei
  info:
    grad_time_ms: 17218.555
    learner:
      default_policy:
        cur_kl_coeff: 0.16875000298023224
        cur_lr: 4.999999873689376e-05
        entropy: 1.519315481185913
        entropy_coeff: 0.0
        kl: 0.02827766351401806
        policy_loss: -0.029441189020872116
        total_loss: 324.5316467285156
        vf_explained_var: 5.960464477539063e-08
        vf_loss: 324.5562438964844
    load_time_ms: 10.102
    num_steps_sampled: 84000
    num_steps_trained: 84000
    sample_time_ms: 120363.166
    update_time_ms: 73.36
  iterations_since_restore: 42
  node_ip: 172.19.3.168
  num_healthy_workers: 7
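Between the `episodes_total: 780` and `episodes_total: 820` blocks, `cur_kl_coeff` moves from about 0.1125 to about 0.16875, a factor of 1.5 up to float32 rounding. This is consistent with the adaptive KL-penalty rule in RLlib's PPO as we understand it. The coefficient is multiplied by 1.5 when the sampled KL exceeds twice `kl_target` and by 0.5 when it drops below half of it. The sketch below assumes RLlib's default `kl_target = 0.01`, since the lab's actual setting is not shown here, and `update_kl_coeff` is a hypothetical name.

```python
def update_kl_coeff(kl_coeff: float, sampled_kl: float,
                    kl_target: float = 0.01) -> float:
    """Adaptive KL-penalty update, sketched after RLlib's PPO rule.

    kl_target=0.01 is an assumption (RLlib's default), not read from this log.
    """
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5  # policy moved too far: penalize KL more
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5  # policy barely moved: relax the penalty
    return kl_coeff  # KL within the target band: keep the coefficient


# kl from the episodes_total: 780 block, applied to that block's cur_kl_coeff.
new_coeff = update_kl_coeff(0.11249999701976776, 0.020995287224650383)
print(new_coeff)  # close to the 0.16875000298023224 logged in later blocks
```

With `kl` values hovering around 0.02 to 0.03 in these blocks, the penalty sits near the upper edge of the band. Under this rule the penalty would keep increasing while the policy keeps moving that far per iteration.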
  