# Understanding the effects of PyCIGAR hyperparameters on training and results

The goal of this notebook is to build intuition about the impact of the hyperparameters currently used to train a PPO agent in PyCIGAR.

Fixed settings in this experiment:
- PPO algorithm
- `ieee37busdata` network and load profile
- tracked device: `inverter_s701a`
- discrete single actions
- `CentralControlPVInverterEnv` environment


Hyperparameters identified:
- discount factor $\gamma$
- GAE lambda
- train batch size
- depth of NN
- widths of NN layers
- learning rate `lr` (or learning-rate schedule)
- loss factors (penalties)
    - oscillation
    - action change
    - deviation from the initial command
    
    

So far, results have been mostly qualitative, obtained by inspecting plots of the voltage, the $y$ value, the injected power, and the actions taken by a tracked device over the course of a test simulation. To compare different solutions on objective grounds, we need quantitative results. The next section lists statistics that reflect different aspects of both the solutions and the training processes.

## Statistics that summarize a training run

For each epoch:

- number of actions taken
- average magnitude of the actions
- total reward
- time of earliest action
- average Shannon entropy of the action distribution

For the whole training run:

- epoch after which the policy no longer changes
- average runtime of an epoch
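
The per-epoch statistics listed above could be computed from one evaluation episode roughly as follows. This is a minimal sketch, not the PyCIGAR implementation: the function names, the input format (a list of discrete command indices, a list of rewards, and per-step action probabilities), and the convention that an "action" is a step where the command changes are all assumptions made for illustration.

```python
import math


def shannon_entropy(probs):
    """Entropy (in nats) of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def summarize_episode(commands, rewards, action_probs):
    """Summarize one evaluation episode (hypothetical input format).

    commands:     discrete command index selected at each step
    rewards:      reward received at each step
    action_probs: policy probability vector at each step
    """
    # Assumption: a step counts as an "action" when the command differs
    # from the command of the previous step.
    changes = [
        (t, abs(new - old))
        for t, (old, new) in enumerate(zip(commands, commands[1:]), start=1)
        if new != old
    ]
    magnitudes = [m for _, m in changes]
    return {
        "n_actions": len(changes),
        "avg_magnitude": sum(magnitudes) / len(magnitudes) if magnitudes else 0.0,
        "total_reward": sum(rewards),
        # None when the device never changes its command during the episode.
        "earliest_action": changes[0][0] if changes else None,
        "avg_entropy": sum(shannon_entropy(p) for p in action_probs) / len(action_probs),
    }


stats = summarize_episode(
    commands=[2, 2, 3, 3, 1],
    rewards=[-1.0, -1.5, -0.5, -0.5, -2.0],
    action_probs=[[0.25, 0.25, 0.25, 0.25]] * 5,
)
print(stats)
```

Averaging these dictionaries over the evaluation rounds of an epoch would give one row of statistics per epoch.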


An additional qualitative result could be a GIF showing how the curves evolve over epochs.


## Methods to understand hyperparameters

Some methods that come to mind for building this intuition are:

- try extreme values, then compare the statistics
- change one hyperparameter at a time, then plot the statistics
- Bayesian optimization with the statistics as the objective
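
The "one hyperparameter at a time" approach can be sketched as below. The helper `one_at_a_time`, the dictionary keys, and the candidate values are hypothetical and only illustrate the idea; they are not part of PyCIGAR or Ray.

```python
from copy import deepcopy


def one_at_a_time(base, candidates):
    """Yield (name, value, config) where config differs from base in one entry only."""
    for name, values in candidates.items():
        for value in values:
            config = deepcopy(base)  # never mutate the shared baseline
            config[name] = value
            yield name, value, config


# Illustrative baseline and candidate values (assumptions, not tuned settings).
base = {"gamma": 0.99, "lambda": 0.95, "lr": 2e-4}
candidates = {"gamma": [0.0, 0.5, 1.0], "lr": [1e-5, 1e-3]}

sweep = list(one_at_a_time(base, candidates))
for name, value, config in sweep:
    print(name, value, config)
```

Because every configuration differs from the baseline in a single entry, any change in the statistics can be attributed to that hyperparameter, at the cost of ignoring interactions between hyperparameters.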


In [1]:
import os
import ray
from copy import deepcopy
os.chdir("/home/alex/ceds-cigar/rl/notebooks")
from understand_hyperparameters import base_config, get_train_fn, env_name, test_env

ray.init()

2020-03-02 11:50:01,583	INFO resource_spec.py:212 -- Starting Ray with 27.98 GiB memory available for workers and up to 13.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-03-02 11:50:02,119	INFO services.py:1078 -- View the Ray dashboard at localhost:8265


{'node_ip_address': '128.3.28.231',
 'redis_address': '128.3.28.231:23941',
 'object_store_address': '/tmp/ray/session_2020-03-02_11-50-01_560675_5436/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-03-02_11-50-01_560675_5436/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-03-02_11-50-01_560675_5436'}

In [2]:
N_WORKERS = 8
EPOCHS = 20
EVAL_ROUNDS = 3

In [3]:
base_config['num_workers'] = N_WORKERS
full_config = {
    'model_config': base_config,
    'pycigar_params': pycigar_params,
    'epochs': EPOCHS,
    'eval_rounds': EVAL_ROUNDS
}

## Gamma

Gamma is typically set to 0.95 or 0.99 in the literature. However, lower values of gamma may be beneficial in some scenarios.

A value of 0 indicates a greedy, short-term view: the choice of an action is determined only by the immediate reward.

A value of 1 indicates undiscounted rewards, where rewards of all future timesteps matter equally.
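
To make the two extremes concrete, the discounted return $G_t = \sum_{k \ge 0} \gamma^k r_{t+k}$ can be evaluated on a short reward sequence. The sketch below is purely illustrative: the function name and the reward values are made up and do not come from the experiment.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [-1.0, -2.0, -3.0]  # made-up rewards for three consecutive steps

greedy = discounted_return(rewards, 0.0)        # only the first reward counts
halfway = discounted_return(rewards, 0.5)       # later rewards are down-weighted
undiscounted = discounted_return(rewards, 1.0)  # plain sum of all rewards

print(greedy, halfway, undiscounted)
```

With $\gamma = 0$ the return reduces to the first reward, and with $\gamma = 1$ it is the plain sum of all rewards in the episode.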

In [4]:
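# Sweep gamma only; all other settings are taken from full_config.
# NOTE: run_hp_experiment is not imported in the first cell; it is assumed
# to be provided by the understand_hyperparameters helper module.
gamma_values = ray.tune.grid_search([0, 0.3, 0.5, 0.9, 1])
run_hp_experiment({'model_config': {'gamma': gamma_values}}, full_config, 'gamma')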

Trial name,status,loc,gamma
coop_train_fn_01ce5c76,RUNNING,,
coop_train_fn_01cf1a8a,PENDING,,
coop_train_fn_01cf9492,PENDING,,
coop_train_fn_01d00e2c,PENDING,,


[2m[36m(pid=5503)[0m 2020-03-02 11:50:05,655	INFO trainer.py:420 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=5503)[0m 2020-03-02 11:50:05,686	INFO trainer.py:580 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2m[36m(pid=5503)[0m   obj = yaml.load(type_)
  0%|          | 0/20 [00:00<?, ?it/s]
[2m[36m(pid=5496)[0m   obj = yaml.load(type_)
[2m[36m(pid=5506)[0m   obj = yaml.load(type_)
[2m[36m(pid=5499)[0m   obj = yaml.load(type_)
[2m[36m(pid=5502)[0m   obj = yaml.load(type_)
[2m[36m(pid=5510)[0m   obj = yaml.load(type_)
[2m[36m(pid=5497)[0m   obj = yaml.load(type_)
[2m[36m(pid=5505)[0m   obj = yaml.load(type_)
[2m[36m(pid=5504)[0m   obj = yaml.load(type_)
[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_11-52-36
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.2
  episode_reward_max: -26.14741461764519
  episode_reward_mean: -35.200422563277066
  episode_reward_min: -50.962902008956064
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 497.542
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5910038948059082
        entropy_coeff: 0.0
        kl: 0.01858106255531311
        policy_loss: -0.0375814214348793
        total_loss: 1.9940201044082642
        vf_explained_var: 0.0149829788133502
        vf_loss: 2.0278854370117188
    load_time_ms: 70.068
    num_steps_sampled: 500
    num_steps_trained: 384
    sample_time_ms: 87229.177
    update_time_ms: 773.479
  iterations_since_re

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-35.2004,88.6483,500.0,1.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_11-54-50
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 21.975609756097562
  episode_reward_max: -24.21453216211696
  episode_reward_mean: -33.75728942679512
  episode_reward_min: -50.962902008956064
  episodes_this_iter: 21
  episodes_total: 41
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 346.083
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5121212005615234
        entropy_coeff: 0.0
        kl: 0.043743833899497986
        policy_loss: -0.08773771673440933
        total_loss: 0.9034019112586975
        vf_explained_var: 0.014706333167850971
        vf_loss: 0.9823908805847168
    load_time_ms: 35.653
    num_steps_sampled: 1000
    num_steps_trained: 768
    sample_time_ms: 83431.825
    update_time_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-33.7573,168.519,1000.0,2.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_11-57-08
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.158730158730158
  episode_reward_max: -21.359299996041862
  episode_reward_mean: -32.88376685385413
  episode_reward_min: -50.962902008956064
  episodes_this_iter: 22
  episodes_total: 63
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 286.677
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.00019999999494757503
        entropy: 1.389791488647461
        entropy_coeff: 0.0
        kl: 0.03325387462973595
        policy_loss: -0.0944279208779335
        total_loss: 0.9427871108055115
        vf_explained_var: 0.02342834137380123
        vf_loss: 1.0272387266159058
    load_time_ms: 24.151
    num_steps_sampled: 1500
    num_steps_trained: 1152
    sample_time_ms: 83041.155
    update_time_ms: 262.562
  ite

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.8838,250.979,1500.0,3.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


 15%|█▌        | 3/20 [06:58<40:06, 141.54s/it]
[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_11-59-23
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.28235294117647
  episode_reward_max: -21.359299996041862
  episode_reward_mean: -33.04560660633808
  episode_reward_min: -50.962902008956064
  episodes_this_iter: 22
  episodes_total: 85
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 259.768
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.2639960050582886
        entropy_coeff: 0.0
        kl: 0.01931740529835224
        policy_loss: -0.03263290598988533
        total_loss: 1.4205936193466187
        vf_explained_var: 0.032050490379333496
        vf_loss: 1.4445337057113647
    load_time_ms: 18.624
    num_steps_sampled: 2000
    num_steps_trained: 1536
    sample_time_ms: 81929.012
    update_time_ms: 198.025
  i

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-33.0456,329.803,2000.0,4.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-01-37
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.27
  episode_reward_max: -19.016652756232936
  episode_reward_mean: -31.699474806509933
  episode_reward_min: -50.962902008956064
  episodes_this_iter: 24
  episodes_total: 109
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 242.659
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.1257505416870117
        entropy_coeff: 0.0
        kl: 0.031177064403891563
        policy_loss: -0.05140916630625725
        total_loss: 0.8746022582054138
        vf_explained_var: 0.25007015466690063
        vf_loss: 0.9119817614555359
    load_time_ms: 15.115
    num_steps_sampled: 2500
    num_steps_trained: 1920
    sample_time_ms: 81689.611
    update_time_ms: 160.049
  iterations_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-31.6995,410.746,2500.0,5.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-03-48
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.47
  episode_reward_max: -17.335606802947638
  episode_reward_mean: -30.03431862182384
  episode_reward_min: -45.347596624023026
  episodes_this_iter: 22
  episodes_total: 131
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 233.852
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.975621223449707
        entropy_coeff: 0.0
        kl: 0.02764529176056385
        policy_loss: -0.09632903337478638
        total_loss: 0.8219075202941895
        vf_explained_var: 0.33467116951942444
        vf_loss: 0.89957594871521
    load_time_ms: 12.777
    num_steps_sampled: 3000
    num_steps_trained: 2304
    sample_time_ms: 81520.548
    update_time_ms: 134.395
  iterations_since_r

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-30.0343,491.651,3000.0,6.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-06-02
  done: false
  earliest_action: 1.3333333333333333
  episode_len_mean: 22.23
  episode_reward_max: -17.335606802947638
  episode_reward_mean: -29.5668615273671
  episode_reward_min: -45.347596624023026
  episodes_this_iter: 22
  episodes_total: 153
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 227.57
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 0.8543381094932556
        entropy_coeff: 0.0
        kl: 0.013458329252898693
        policy_loss: -0.003101726295426488
        total_loss: 1.357533574104309
        vf_explained_var: 0.3683580458164215
        vf_loss: 1.3470087051391602
    load_time_ms: 11.147
    num_steps_sampled: 3500
    num_steps_trained: 2688
    sample_time_ms: 81497.913
    update_time_ms: 116.225
  i

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-29.5669,573.239,3500.0,7.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-08-14
  done: false
  earliest_action: 3.0
  episode_len_mean: 22.11
  episode_reward_max: -17.335606802947638
  episode_reward_mean: -29.40023383184648
  episode_reward_min: -45.347596624023026
  episodes_this_iter: 24
  episodes_total: 177
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 222.455
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 0.7503502368927002
        entropy_coeff: 0.0
        kl: 0.00674161734059453
        policy_loss: -0.022292306646704674
        total_loss: 1.1118663549423218
        vf_explained_var: 0.5012932419776917
        vf_loss: 1.1273326873779297
    load_time_ms: 9.976
    num_steps_sampled: 4000
    num_steps_trained: 3072
    sample_time_ms: 81459.993
    update_time_ms: 102.821
  iterations_sinc

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-29.4002,654.657,4000.0,8.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


 40%|████      | 8/20 [18:04<26:51, 134.30s/it]
[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-10-24
  done: false
  earliest_action: 3.3333333333333335
  episode_len_mean: 22.02
  episode_reward_max: -17.335606802947638
  episode_reward_mean: -29.936879949782544
  episode_reward_min: -42.45238887358206
  episodes_this_iter: 22
  episodes_total: 199
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 216.314
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 0.6134935021400452
        entropy_coeff: 0.0
        kl: 0.014106832444667816
        policy_loss: -0.03842109814286232
        total_loss: 0.8769027590751648
        vf_explained_var: 0.5820698738098145
        vf_loss: 0.9010407328605652
    load_time_ms: 9.146
    num_steps_sampled: 4500
    num_steps_trained: 3456
    sample_time_ms: 81078.653
    update_time_ms: 91.971
  i

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-29.9369,732.889,4500.0,9.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-12-37
  done: false
  earliest_action: 5.0
  episode_len_mean: 22.09
  episode_reward_max: -18.415245851063382
  episode_reward_mean: -31.031927227329064
  episode_reward_min: -42.45238887358206
  episodes_this_iter: 23
  episodes_total: 222
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 217.77
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 0.49770262837409973
        entropy_coeff: 0.0
        kl: 0.01063606608659029
        policy_loss: -0.032780472189188004
        total_loss: 0.8499128222465515
        vf_explained_var: 0.6412146687507629
        vf_loss: 0.8719242215156555
    load_time_ms: 8.406
    num_steps_sampled: 5000
    num_steps_trained: 3840
    sample_time_ms: 81146.296
    update_time_ms: 83.207
  iterations_since

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-31.0319,814.927,5000.0,10.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-14-47
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.03
  episode_reward_max: -20.725403123997395
  episode_reward_mean: -31.955370299272694
  episode_reward_min: -42.812555708332674
  episodes_this_iter: 21
  episodes_total: 243
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 189.988
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 0.41574302315711975
        entropy_coeff: 0.0
        kl: 0.0036263142246752977
        policy_loss: -0.004776048008352518
        total_loss: 0.6789969801902771
        vf_explained_var: 0.763480007648468
        vf_loss: 0.6801013946533203
    load_time_ms: 1.578
    num_steps_sampled: 5500
    num_steps_trained: 4224
    sample_time_ms: 80225.519
    update_time_ms: 6.333
  iterations_sin

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-31.9554,893.207,5500.0,11.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-17-02
  done: false
  earliest_action: 10.0
  episode_len_mean: 22.21
  episode_reward_max: -21.228887090697686
  episode_reward_mean: -32.35723316595798
  episode_reward_min: -42.812555708332674
  episodes_this_iter: 25
  episodes_total: 268
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 193.37
    learner:
      default_policy:
        cur_kl_coeff: 0.5062500238418579
        cur_lr: 0.00019999999494757503
        entropy: 0.3456374704837799
        entropy_coeff: 0.0
        kl: 0.003813599469140172
        policy_loss: -0.024464929476380348
        total_loss: 0.6799723505973816
        vf_explained_var: 0.7334599494934082
        vf_loss: 0.7025065422058105
    load_time_ms: 1.687
    num_steps_sampled: 6000
    num_steps_trained: 4608
    sample_time_ms: 80468.632
    update_time_ms: 5.8
  iterations_since_r

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.3572,975.543,6000.0,12.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


 60%|██████    | 12/20 [26:52<17:44, 133.08s/it]
[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-19-13
  done: false
  earliest_action: 4.0
  episode_len_mean: 22.16
  episode_reward_max: -21.46385812259072
  episode_reward_mean: -32.674584500140796
  episode_reward_min: -42.812555708332674
  episodes_this_iter: 21
  episodes_total: 289
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 200.351
    learner:
      default_policy:
        cur_kl_coeff: 0.25312501192092896
        cur_lr: 0.00019999999494757503
        entropy: 0.2638407051563263
        entropy_coeff: 0.0
        kl: 0.005388479679822922
        policy_loss: 0.007594509515911341
        total_loss: 0.5174431800842285
        vf_explained_var: 0.8120202422142029
        vf_loss: 0.5084847211837769
    load_time_ms: 1.836
    num_steps_sampled: 6500
    num_steps_trained: 4992
    sample_time_ms: 80360.24
    update_time_ms: 6.023
  iterations_since_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.6746,1057.0,6500.0,13.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-21-23
  done: false
  earliest_action: 2.0
  episode_len_mean: 21.92
  episode_reward_max: -21.46385812259072
  episode_reward_mean: -32.627879155945365
  episode_reward_min: -42.812555708332674
  episodes_this_iter: 24
  episodes_total: 313
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 202.034
    learner:
      default_policy:
        cur_kl_coeff: 0.25312501192092896
        cur_lr: 0.00019999999494757503
        entropy: 0.20656782388687134
        entropy_coeff: 0.0
        kl: 0.0034725964069366455
        policy_loss: -0.05842965841293335
        total_loss: 0.41697826981544495
        vf_explained_var: 0.8363442420959473
        vf_loss: 0.4745289087295532
    load_time_ms: 1.832
    num_steps_sampled: 7000
    num_steps_trained: 5376
    sample_time_ms: 80457.569
    update_time_ms: 6.034
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.6279,1136.8,7000.0,14.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,



[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-23-35
  done: false
  earliest_action: 17.0
  episode_len_mean: 22.0
  episode_reward_max: -21.46385812259072
  episode_reward_mean: -32.70384805164425
  episode_reward_min: -40.89376612020947
  episodes_this_iter: 21
  episodes_total: 334
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 204.445
    learner:
      default_policy:
        cur_kl_coeff: 0.12656250596046448
        cur_lr: 0.00019999999494757503
        entropy: 0.1572905033826828
        entropy_coeff: 0.0
        kl: 0.002373327501118183
        policy_loss: -0.0160618145018816
        total_loss: 0.4618871212005615
        vf_explained_var: 0.8352593779563904
        vf_loss: 0.4776485860347748
    load_time_ms: 1.873
    num_steps_sampled: 7500
    num_steps_trained: 5760
    sample_time_ms: 80517.899
    update_time_ms: 5.649
  iterations_since_re

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.7038,1218.38,7500.0,15.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-25-47
  done: false
  earliest_action: 6.0
  episode_len_mean: 21.91
  episode_reward_max: -22.366214059613743
  episode_reward_mean: -32.82340933956444
  episode_reward_min: -40.89376612020947
  episodes_this_iter: 24
  episodes_total: 358
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 204.006
    learner:
      default_policy:
        cur_kl_coeff: 0.06328125298023224
        cur_lr: 0.00019999999494757503
        entropy: 0.1248585656285286
        entropy_coeff: 0.0
        kl: 0.001402351539582014
        policy_loss: 0.045956745743751526
        total_loss: 0.4797727167606354
        vf_explained_var: 0.8499598503112793
        vf_loss: 0.43372729420661926
    load_time_ms: 2.02
    num_steps_sampled: 8000
    num_steps_trained: 6144
    sample_time_ms: 80482.025
    update_time_ms: 5.539
  iterations_since_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.8234,1298.91,8000.0,16.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-28-00
  done: false
  earliest_action: 10.5
  episode_len_mean: 21.79
  episode_reward_max: -22.366214059613743
  episode_reward_mean: -32.68067133723104
  episode_reward_min: -40.89376612020947
  episodes_this_iter: 23
  episodes_total: 381
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 203.477
    learner:
      default_policy:
        cur_kl_coeff: 0.03164062649011612
        cur_lr: 0.00019999999494757503
        entropy: 0.1302674561738968
        entropy_coeff: 0.0
        kl: 6.222242518560961e-05
        policy_loss: -0.04734555259346962
        total_loss: 0.2498888522386551
        vf_explained_var: 0.890816867351532
        vf_loss: 0.2972324788570404
    load_time_ms: 2.039
    num_steps_sampled: 8500
    num_steps_trained: 6528
    sample_time_ms: 80306.665
    update_time_ms: 5.357
  iterations_since

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-32.6807,1378.74,8500.0,17.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-30-10
  done: false
  earliest_action: 1.0
  episode_len_mean: 21.99
  episode_reward_max: -22.366214059613743
  episode_reward_mean: -33.11772463676864
  episode_reward_min: -40.89376612020947
  episodes_this_iter: 23
  episodes_total: 404
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 204.403
    learner:
      default_policy:
        cur_kl_coeff: 0.01582031324505806
        cur_lr: 0.00019999999494757503
        entropy: 0.11031019687652588
        entropy_coeff: 0.0
        kl: 0.0005548815825022757
        policy_loss: -0.06094267964363098
        total_loss: 0.21752674877643585
        vf_explained_var: 0.9029488563537598
        vf_loss: 0.27846065163612366
    load_time_ms: 2.031
    num_steps_sampled: 9000
    num_steps_trained: 6912
    sample_time_ms: 80214.263
    update_time_ms: 5.015
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-33.1177,1459.24,9000.0,18.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


 90%|█████████ | 18/20 [40:00<04:22, 131.48s/it]
[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-32-16
  done: false
  earliest_action: 6.0
  episode_len_mean: 22.01
  episode_reward_max: -22.366214059613743
  episode_reward_mean: -33.08666762999398
  episode_reward_min: -38.901760334294394
  episodes_this_iter: 22
  episodes_total: 426
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 209.249
    learner:
      default_policy:
        cur_kl_coeff: 0.00791015662252903
        cur_lr: 0.00019999999494757503
        entropy: 0.07934562116861343
        entropy_coeff: 0.0
        kl: 0.0012722046812996268
        policy_loss: -0.025250643491744995
        total_loss: 0.30372360348701477
        vf_explained_var: 0.8848587870597839
        vf_loss: 0.3289641737937927
    load_time_ms: 1.981
    num_steps_sampled: 9500
    num_steps_trained: 7296
    sample_time_ms: 80083.023
    update_time_ms: 4.948
  iterations_s

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-33.0867,1536.22,9500.0,19.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5503)[0m Running 3 evaluation rounds




Result for coop_train_fn_01ce5c76:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-34-25
  done: false
  earliest_action: .nan
  episode_len_mean: 21.92
  episode_reward_max: -22.366214059613743
  episode_reward_mean: -33.34174474601194
  episode_reward_min: -38.67411789699543
  episodes_this_iter: 23
  episodes_total: 449
  experiment_id: 55afac1d235b4e1d9556c9e7ff43c5e7
  experiment_tag: 0_gamma=0
  hostname: gigteam
  info:
    grad_time_ms: 204.382
    learner:
      default_policy:
        cur_kl_coeff: 0.003955078311264515
        cur_lr: 0.00019999999494757503
        entropy: 0.07030748575925827
        entropy_coeff: 0.0
        kl: 0.00016790260269772261
        policy_loss: 0.05911455675959587
        total_loss: 0.34987783432006836
        vf_explained_var: 0.9036698341369629
        vf_loss: 0.29076260328292847
    load_time_ms: 1.938
    num_steps_sampled: 10000
    num_steps_trained: 7680
    sample_time_ms: 79874.392
    update_time_ms: 4.996
  iterations

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,RUNNING,128.3.28.231:5503,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,PENDING,,,,,,
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=5500)[0m 2020-03-02 12:35:25,067	INFO trainer.py:420 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=5500)[0m 2020-03-02 12:35:25,212	INFO trainer.py:580 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2m[36m(pid=5500)[0m   obj = yaml.load(type_)
  0%|          | 0/20 [00:00<?, ?it/s]
[2m[36m(pid=5509)[0m   obj = yaml.load(type_)
[2m[36m(pid=5501)[0m   obj = yaml.load(type_)
[2m[36m(pid=5507)[0m   obj = yaml.load(type_)
[2m[36m(pid=5495)[0m   obj = yaml.load(type_)
[2m[36m(pid=5508)[0m   obj = yaml.load(type_)
[2m[36m(pid=5498)[0m   obj = yaml.load(type_)
[2m[36m(pid=6534)[0m   obj = yaml.load(type_)
[2m[36m(pid=6535)[0m   obj = yaml.load(type_)
[2m[36m(pid=5500)[0m Running 3 evaluation rounds




[2m[36m(pid=5500)[0m   out=out, **kwargs)
Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-37-57
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 21.85
  episode_reward_max: -26.237097113740926
  episode_reward_mean: -41.95244344178789
  episode_reward_min: -59.21981980489896
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 693.239
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5801324844360352
        entropy_coeff: 0.0
        kl: 0.03025939129292965
        policy_loss: -0.06496120989322662
        total_loss: 6.733348846435547
        vf_explained_var: 0.009931206703186035
        vf_loss: 6.792259216308594
    load_time_ms: 136.131
    num_steps_sampled: 500
    num_steps_trained: 384
    sample_time

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-41.9524,91.5177,500.0,1.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,
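
As a sanity check on the learner statistics, the `total_loss` reported for the first `gamma=0.3` iteration appears to be the sum of `policy_loss`, `vf_loss` and the KL penalty `cur_kl_coeff * kl` (the entropy term vanishes because `entropy_coeff` is 0). The value-loss coefficient of 1 used below is an assumption inferred from the logged numbers, not something the log states.

```python
# Learner statistics copied from the first gamma=0.3 result block above.
stats = {
    "cur_kl_coeff": 0.20000000298023224,
    "kl": 0.03025939129292965,
    "entropy": 1.5801324844360352,
    "entropy_coeff": 0.0,
    "policy_loss": -0.06496120989322662,
    "vf_loss": 6.792259216308594,
    "total_loss": 6.733348846435547,
}

VF_LOSS_COEFF = 1.0  # assumption: this coefficient is not shown in the log

# Rebuild the total from its parts: policy term + KL penalty + value term - entropy bonus.
reconstructed = (
    stats["policy_loss"]
    + stats["cur_kl_coeff"] * stats["kl"]
    + VF_LOSS_COEFF * stats["vf_loss"]
    - stats["entropy_coeff"] * stats["entropy"]
)
gap = abs(reconstructed - stats["total_loss"])
print(f"reconstructed={reconstructed:.6f}  logged={stats['total_loss']:.6f}  gap={gap:.2e}")
```

With these numbers the value-function term accounts for almost all of `total_loss`, so early changes in `total_loss` mostly reflect the critic rather than the policy.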

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-40-21
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 21.666666666666668
  episode_reward_max: -26.237097113740926
  episode_reward_mean: -38.43079806981094
  episode_reward_min: -59.21981980489896
  episodes_this_iter: 22
  episodes_total: 42
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 449.986
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.00019999999494757503
        entropy: 1.5273493528366089
        entropy_coeff: 0.0
        kl: 0.02943859063088894
        policy_loss: -0.061931684613227844
        total_loss: 2.883488893508911
        vf_explained_var: 0.006601790431886911
        vf_loss: 2.936589002609253
    load_time_ms: 69.246
    num_steps_sampled: 1000
    num_steps_trained: 768
    sample_time_ms: 87820.802
    update_time_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-38.4308,177.493,1000.0,2.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-42-40
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 21.96875
  episode_reward_max: -25.414640780580882
  episode_reward_mean: -37.814804130375364
  episode_reward_min: -59.21981980489896
  episodes_this_iter: 22
  episodes_total: 64
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 375.976
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.4155877828598022
        entropy_coeff: 0.0
        kl: 0.02847112901508808
        policy_loss: -0.09159404784440994
        total_loss: 2.5465452671051025
        vf_explained_var: 0.004904448986053467
        vf_loss: 2.6253273487091064
    load_time_ms: 46.823
    num_steps_sampled: 1500
    num_steps_trained: 1152
    sample_time_ms: 85445.659
    update_time_ms: 236

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-37.8148,258.46,1500.0,3.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds
Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-45-00
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.113636363636363
  episode_reward_max: -24.38440968780847
  episode_reward_mean: -36.24362689487369
  episode_reward_min: -59.21981980489896
  episodes_this_iter: 24
  episodes_total: 88
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 329.551
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.274842381477356
        entropy_coeff: 0.0
        kl: 0.036801066249608994
        policy_loss: -0.07075344026088715
        total_loss: 1.5519499778747559
        vf_explained_var: 0.008729477412998676
        vf_loss: 1.5978626012802124
    load_time_ms: 35.39
    num_steps_sampled: 2000
    num_steps_trained: 1536
    sample_



 20%|██        | 4/20 [09:32<38:16, 143.54s/it]


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-36.2436,339.827,2000.0,4.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-47-18
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.31
  episode_reward_max: -20.305713098243682
  episode_reward_mean: -33.56121242537268
  episode_reward_min: -56.93357960033278
  episodes_this_iter: 22
  episodes_total: 110
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 300.009
    learner:
      default_policy:
        cur_kl_coeff: 1.0125000476837158
        cur_lr: 0.00019999999494757503
        entropy: 1.166400671005249
        entropy_coeff: 0.0
        kl: 0.022984514012932777
        policy_loss: -0.06711231917142868
        total_loss: 0.6999108791351318
        vf_explained_var: 0.01576155424118042
        vf_loss: 0.7437513470649719
    load_time_ms: 28.532
    num_steps_sampled: 2500
    num_steps_trained: 1920
    sample_time_ms: 83767.37
    update_time_ms: 145.017
  iterations_sin

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-33.5612,421.401,2500.0,5.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-49-34
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 22.3
  episode_reward_max: -19.04997615802237
  episode_reward_mean: -31.161042622123272
  episode_reward_min: -47.833854070842705
  episodes_this_iter: 21
  episodes_total: 131
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 281.373
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 1.0677372217178345
        entropy_coeff: 0.0
        kl: 0.008459389209747314
        policy_loss: -0.012979731895029545
        total_loss: 1.136444091796875
        vf_explained_var: 0.09584251791238785
        vf_loss: 1.1365760564804077
    load_time_ms: 24.047
    num_steps_sampled: 3000
    num_steps_trained: 2304
    sample_time_ms: 82752.772
    update_time_ms: 122.07


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-31.161,499.315,3000.0,6.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-51-48
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.41
  episode_reward_max: -19.04997615802237
  episode_reward_mean: -28.859051918863166
  episode_reward_min: -44.26918655965513
  episodes_this_iter: 22
  episodes_total: 153
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 266.197
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 0.9792947769165039
        entropy_coeff: 0.0
        kl: 0.006952514406293631
        policy_loss: -0.04914891719818115
        total_loss: 0.9348328113555908
        vf_explained_var: 0.2167392373085022
        vf_loss: 0.9734225869178772
    load_time_ms: 20.819
    num_steps_sampled: 3500
    num_steps_trained: 2688
    sample_time_ms: 82168.358
    update_time_ms: 106.071
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-28.8591,578.194,3500.0,7.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

 35%|███▌      | 7/20 [16:20<29:57, 138.30s/it]
(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-54-05
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.54
  episode_reward_max: -19.04997615802237
  episode_reward_mean: -26.90013150849425
  episode_reward_min: -36.7413012430131
  episodes_this_iter: 23
  episodes_total: 176
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 253.977
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 0.9165277481079102
        entropy_coeff: 0.0
        kl: 0.006509026978164911
        policy_loss: -0.012756035663187504
        total_loss: 0.991804301738739
        vf_explained_var: 0.32714250683784485
        vf_loss: 0.9946746826171875
    load_time_ms: 18.421
    num_steps_sampled: 4000
    num_steps_trained: 3072
    sample_time_ms: 81747.911
    update_time_ms: 93.607
  

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-26.9001,657.204,4000.0,8.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-56-24
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.46
  episode_reward_max: -16.996564299973972
  episode_reward_mean: -25.835937727653025
  episode_reward_min: -35.691888368771544
  episodes_this_iter: 21
  episodes_total: 197
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 243.069
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 0.8467116355895996
        entropy_coeff: 0.0
        kl: 0.005760727915912867
        policy_loss: -0.012814961373806
        total_loss: 0.9732627868652344
        vf_explained_var: 0.3137587308883667
        vf_loss: 0.9773285984992981
    load_time_ms: 16.593
    num_steps_sampled: 4500
    num_steps_trained: 3456
    sample_time_ms: 81878.858
    update_time_ms: 83.933
  iterations_sin

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-25.8359,740.323,4500.0,9.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_12-58-38
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 22.38
  episode_reward_max: -16.996564299973972
  episode_reward_mean: -25.15643140195704
  episode_reward_min: -35.691888368771544
  episodes_this_iter: 23
  episodes_total: 220
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 237.713
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 0.8226820826530457
        entropy_coeff: 0.0
        kl: 0.006167533341795206
        policy_loss: -0.06232169270515442
        total_loss: 0.8658318519592285
        vf_explained_var: 0.32946762442588806
        vf_loss: 0.9187865257263184
    load_time_ms: 15.116
    num_steps_sampled: 5000
    num_steps_trained: 3840
    sample_time_ms: 81275.197
    update_time_ms: 76.093



Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-25.1564,816.389,5000.0,10.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-00-57
  done: false
  earliest_action: 1.3333333333333333
  episode_len_mean: 22.51
  episode_reward_max: -16.996564299973972
  episode_reward_mean: -24.841289657826746
  episode_reward_min: -37.117612473078836
  episodes_this_iter: 23
  episodes_total: 243
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 189.22
    learner:
      default_policy:
        cur_kl_coeff: 1.5187499523162842
        cur_lr: 0.00019999999494757503
        entropy: 0.7872444987297058
        entropy_coeff: 0.0
        kl: 0.004864901304244995
        policy_loss: -0.03188101947307587
        total_loss: 0.7225752472877502
        vf_explained_var: 0.42046985030174255
        vf_loss: 0.7470677495002747
    load_time_ms: 1.7
    num_steps_sampled: 5500
    num_steps_trained: 4224
    sample_time_ms: 80554.57
    update_time_ms: 7.181
  it

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-24.8413,899.344,5500.0,11.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-03-14
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 22.38
  episode_reward_max: -16.996564299973972
  episode_reward_mean: -24.144013230476954
  episode_reward_min: -37.117612473078836
  episodes_this_iter: 22
  episodes_total: 265
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 189.091
    learner:
      default_policy:
        cur_kl_coeff: 0.7593749761581421
        cur_lr: 0.00019999999494757503
        entropy: 0.7787302136421204
        entropy_coeff: 0.0
        kl: 0.009539013728499413
        policy_loss: -0.003762734355404973
        total_loss: 0.703486442565918
        vf_explained_var: 0.37332651019096375
        vf_loss: 0.7000053524971008
    load_time_ms: 1.568
    num_steps_sampled: 6000
    num_steps_trained: 4608
    sample_time_ms: 80142.322
    update_time_ms: 6.784


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-24.144,981.208,6000.0,12.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

 60%|██████    | 12/20 [27:45<18:19, 137.47s/it]
(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-05-31
  done: false
  earliest_action: 2.3333333333333335
  episode_len_mean: 22.38
  episode_reward_max: -16.92177077986154
  episode_reward_mean: -23.974056827722805
  episode_reward_min: -46.9207380294837
  episodes_this_iter: 22
  episodes_total: 287
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 186.796
    learner:
      default_policy:
        cur_kl_coeff: 0.7593749761581421
        cur_lr: 0.00019999999494757503
        entropy: 0.728602409362793
        entropy_coeff: 0.0
        kl: 0.007667269557714462
        policy_loss: -0.0688251480460167
        total_loss: 1.139266848564148
        vf_explained_var: 0.3704560101032257
        vf_loss: 1.2022695541381836
    load_time_ms: 1.503
    num_steps_sampled: 6500
    num_steps_trained: 4992
    sample_time_ms: 80087.553
    update_time_ms: 6.884
  itera

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-23.9741,1061.6,6500.0,13.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-07-50
  done: false
  earliest_action: 2.6666666666666665
  episode_len_mean: 22.27
  episode_reward_max: -16.054664584594374
  episode_reward_mean: -23.375414125631547
  episode_reward_min: -46.9207380294837
  episodes_this_iter: 22
  episodes_total: 309
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 186.016
    learner:
      default_policy:
        cur_kl_coeff: 0.7593749761581421
        cur_lr: 0.00019999999494757503
        entropy: 0.692413866519928
        entropy_coeff: 0.0
        kl: 0.004097541328519583
        policy_loss: 0.01352259237319231
        total_loss: 1.2744642496109009
        vf_explained_var: 0.3230039179325104
        vf_loss: 1.2578301429748535
    load_time_ms: 1.519
    num_steps_sampled: 7000
    num_steps_trained: 5376
    sample_time_ms: 80159.53
    update_time_ms: 7.098
  iter

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-23.3754,1143.69,7000.0,14.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-10-10
  done: false
  earliest_action: 3.6666666666666665
  episode_len_mean: 22.43
  episode_reward_max: -13.797280983745624
  episode_reward_mean: -22.68145317450649
  episode_reward_min: -46.9207380294837
  episodes_this_iter: 23
  episodes_total: 332
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 190.39
    learner:
      default_policy:
        cur_kl_coeff: 0.37968748807907104
        cur_lr: 0.00019999999494757503
        entropy: 0.6068693995475769
        entropy_coeff: 0.0
        kl: 0.012807823717594147
        policy_loss: -0.012596815824508667
        total_loss: 0.7382307052612305
        vf_explained_var: 0.37490081787109375
        vf_loss: 0.7459644675254822
    load_time_ms: 1.593
    num_steps_sampled: 7500
    num_steps_trained: 5760
    sample_time_ms: 80066.915
    update_time_ms: 6.735
  

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-22.6815,1224.38,7500.0,15.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-12-27
  done: false
  earliest_action: 3.3333333333333335
  episode_len_mean: 22.6
  episode_reward_max: -13.797280983745624
  episode_reward_mean: -22.287184171889557
  episode_reward_min: -46.9207380294837
  episodes_this_iter: 21
  episodes_total: 353
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 190.996
    learner:
      default_policy:
        cur_kl_coeff: 0.37968748807907104
        cur_lr: 0.00019999999494757503
        entropy: 0.5605766177177429
        entropy_coeff: 0.0
        kl: 0.01384887844324112
        policy_loss: -0.03046686016023159
        total_loss: 0.9016550183296204
        vf_explained_var: 0.37029802799224854
        vf_loss: 0.9268636703491211
    load_time_ms: 1.586
    num_steps_sampled: 8000
    num_steps_trained: 6144
    sample_time_ms: 80639.651
    update_time_ms: 7.285
  i

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-22.2872,1308.03,8000.0,16.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-14-46
  done: false
  earliest_action: 3.0
  episode_len_mean: 22.64
  episode_reward_max: -13.797280983745624
  episode_reward_mean: -21.446022619259878
  episode_reward_min: -36.75060776638952
  episodes_this_iter: 22
  episodes_total: 375
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 192.583
    learner:
      default_policy:
        cur_kl_coeff: 0.37968748807907104
        cur_lr: 0.00019999999494757503
        entropy: 0.4582558572292328
        entropy_coeff: 0.0
        kl: 0.0149412015452981
        policy_loss: -0.09140557050704956
        total_loss: 0.375042200088501
        vf_explained_var: 0.4970138967037201
        vf_loss: 0.46077480912208557
    load_time_ms: 1.58
    num_steps_sampled: 8500
    num_steps_trained: 6528
    sample_time_ms: 81041.624
    update_time_ms: 6.767
  iterations_since_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-21.446,1390.93,8500.0,17.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

 85%|████████▌ | 17/20 [39:17<06:54, 138.33s/it]
(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-17-00
  done: false
  earliest_action: 6.0
  episode_len_mean: 22.72
  episode_reward_max: -12.927432361188576
  episode_reward_mean: -20.37596787049672
  episode_reward_min: -31.18088213533773
  episodes_this_iter: 22
  episodes_total: 397
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 193.445
    learner:
      default_policy:
        cur_kl_coeff: 0.37968748807907104
        cur_lr: 0.00019999999494757503
        entropy: 0.3720685541629791
        entropy_coeff: 0.0
        kl: 0.008825796656310558
        policy_loss: 0.00873814057558775
        total_loss: 0.7533662915229797
        vf_explained_var: 0.4848872125148773
        vf_loss: 0.7412770390510559
    load_time_ms: 1.518
    num_steps_sampled: 9000
    num_steps_trained: 6912
    sample_time_ms: 80903.59
    update_time_ms: 6.814
  iterations_since_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-20.376,1468.57,9000.0,18.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-19-11
  done: false
  earliest_action: 13.0
  episode_len_mean: 22.71
  episode_reward_max: -12.927432361188576
  episode_reward_mean: -20.096377158841452
  episode_reward_min: -31.18088213533773
  episodes_this_iter: 22
  episodes_total: 419
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 196.768
    learner:
      default_policy:
        cur_kl_coeff: 0.37968748807907104
        cur_lr: 0.00019999999494757503
        entropy: 0.29895615577697754
        entropy_coeff: 0.0
        kl: 0.0043678912334144115
        policy_loss: -0.028675302863121033
        total_loss: 1.0686954259872437
        vf_explained_var: 0.44388699531555176
        vf_loss: 1.0957123041152954
    load_time_ms: 1.467
    num_steps_sampled: 9500
    num_steps_trained: 7296
    sample_time_ms: 80182.891
    update_time_ms: 6.582
  iteration

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-20.0964,1544.51,9500.0,19.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=5500) Running 3 evaluation rounds

Result for coop_train_fn_01cf1a8a:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-21-26
  done: false
  earliest_action: 6.0
  episode_len_mean: 22.66
  episode_reward_max: -12.927432361188576
  episode_reward_mean: -19.037689213552248
  episode_reward_min: -31.18088213533773
  episodes_this_iter: 22
  episodes_total: 441
  experiment_id: 97a6bfd1a6864ea79d671ec97a904c84
  experiment_tag: 1_gamma=0.3
  hostname: gigteam
  info:
    grad_time_ms: 195.547
    learner:
      default_policy:
        cur_kl_coeff: 0.18984374403953552
        cur_lr: 0.00019999999494757503
        entropy: 0.27212175726890564
        entropy_coeff: 0.0
        kl: 0.004637372680008411
        policy_loss: -0.015284478664398193
        total_loss: 0.6593372821807861
        vf_explained_var: 0.5169728398323059
        vf_loss: 0.6737414002418518
    load_time_ms: 1.384
    num_steps_sampled: 10000
    num_steps_trained: 7680
    sample_time_ms: 80486.434
    update_time_ms: 6.722
  iterations_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,RUNNING,128.3.28.231:5500,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,PENDING,,,,,,
coop_train_fn_01d00e2c,PENDING,,,,,,
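
The `cur_kl_coeff` values logged over the 20 iterations of the `gamma=0.3` trial look like the output of an adaptive KL penalty: the coefficient is multiplied by 1.5 when the measured `kl` exceeds twice a target and halved when it drops below half the target. The sketch below replays that rule on the logged `kl` values (rounded). The target of 0.01 is an assumption chosen to match the log, and `update_kl_coeff` is a hypothetical helper, not a function taken from the training code.

```python
KL_TARGET = 0.01  # assumption: the target is not printed in the log


def update_kl_coeff(coeff, kl, target=KL_TARGET):
    """Hypothetical helper: grow the penalty when KL is too large, shrink it when too small."""
    if kl > 2.0 * target:
        return coeff * 1.5
    if kl < 0.5 * target:
        return coeff * 0.5
    return coeff


# `kl` for iterations 1..19 of the gamma=0.3 trial, rounded from the log above.
logged_kl = [
    0.03026, 0.02944, 0.02847, 0.03680, 0.02298, 0.008459, 0.006953,
    0.006509, 0.005761, 0.006168, 0.004865, 0.009539, 0.007667, 0.004098,
    0.01281, 0.01385, 0.01494, 0.008826, 0.004368,
]
# `cur_kl_coeff` for iterations 1..20, rounded from the log above.
logged_coeff = [
    0.2, 0.3, 0.45, 0.675, 1.0125, 1.51875, 1.51875, 1.51875, 1.51875,
    1.51875, 1.51875, 0.759375, 0.759375, 0.759375, 0.3796875, 0.3796875,
    0.3796875, 0.3796875, 0.3796875, 0.18984375,
]

# The coefficient logged at iteration i is the one used during i,
# so the kl measured at i determines the coefficient at i + 1.
replayed = [logged_coeff[0]]
for kl in logged_kl:
    replayed.append(update_kl_coeff(replayed[-1], kl))

max_gap = max(abs(r - c) for r, c in zip(replayed, logged_coeff))
print(f"largest gap between replayed and logged coefficients: {max_gap:.2e}")
```

Under these assumptions the penalty grows during the first five iterations, while `kl` stays above 0.02, and is halved three times afterwards as the policy updates become smaller.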

(pid=6739) 2020-03-02 13:21:38,972	INFO trainer.py:420 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
(pid=6739) 2020-03-02 13:21:39,139	INFO trainer.py:580 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=6739)   obj = yaml.load(type_)
  0%|          | 0/20 [00:00<?, ?it/s]
(pid=6739) Running 3 evaluation rounds

Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-24-08
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 23.25
  episode_reward_max: -27.34245513279052
  episode_reward_mean: -35.79183901956628
  episode_reward_min: -42.585104552181164
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 575.4
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5931931734085083
        entropy_coeff: 0.0
        kl: 0.01584339514374733
        policy_loss: -0.029233479872345924
        total_loss: 7.073192596435547
        vf_explained_var: 0.009486973285675049
        vf_loss: 7.099256992340088
    load_time_ms: 82.656
    num_steps_sampled: 500
    num_steps_trained: 384
    sample_time_ms: 86561.963
    update_time_ms: 534.959
  it

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-35.7918,87.8393,500.0,1.0
coop_train_fn_01d00e2c,PENDING,,,,,,
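
For the quantitative comparison this notebook is after, the status table can be parsed directly. The following is a minimal sketch using only the standard library; the table text is copied from above and the variable names are illustrative. The `reward` column matches the `episode_reward_mean` of each trial's latest result.

```python
import csv
import io

status_table = """\
Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-35.7918,87.8393,500.0,1.0
coop_train_fn_01d00e2c,PENDING,,,,,,
"""

rows = list(csv.DictReader(io.StringIO(status_table)))

# Only finished trials have a final reward that is fair to compare.
finished = {
    float(row["gamma"]): float(row["reward"])
    for row in rows
    if row["status"] == "TERMINATED"
}
best_gamma = max(finished, key=finished.get)
print(finished)
print("best gamma among finished trials:", best_gamma)
```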

(pid=6739) Running 3 evaluation rounds

Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-26-24
  done: false
  earliest_action: 0.0
  episode_len_mean: 23.225
  episode_reward_max: -24.302050652745272
  episode_reward_mean: -35.14342651907225
  episode_reward_min: -47.61955865782558
  episodes_this_iter: 20
  episodes_total: 40
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 384.075
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5530061721801758
        entropy_coeff: 0.0
        kl: 0.02318497560918331
        policy_loss: -0.06544694304466248
        total_loss: 4.188205718994141
        vf_explained_var: 0.003958443645387888
        vf_loss: 4.2490153312683105
    load_time_ms: 41.985
    num_steps_sampled: 1000
    num_steps_trained: 768
    sample_time_ms: 82178.044
    update_time_ms: 271.344
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-35.1434,165.871,1000.0,2.0
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=6739) Running 3 evaluation rounds

Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-28-45
  done: false
  earliest_action: 1.0
  episode_len_mean: 22.918032786885245
  episode_reward_max: -20.00950866255737
  episode_reward_mean: -33.15136476003366
  episode_reward_min: -47.61955865782558
  episodes_this_iter: 21
  episodes_total: 61
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 307.495
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.00019999999494757503
        entropy: 1.4684337377548218
        entropy_coeff: 0.0
        kl: 0.02446693181991577
        policy_loss: -0.010401708073914051
        total_loss: 2.1114776134490967
        vf_explained_var: 0.004364351276308298
        vf_loss: 2.11453914642334
    load_time_ms: 28.397
    num_steps_sampled: 1500
    num_steps_trained: 1152
    sample_time_ms: 80980.439
    update_time_ms: 183.8
  ite

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-33.1514,244.654,1500.0,3.0
coop_train_fn_01d00e2c,PENDING,,,,,,

(pid=6739) Running 3 evaluation rounds

Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-31-04
  done: false
  earliest_action: 1.6666666666666667
  episode_len_mean: 22.80722891566265
  episode_reward_max: -18.640269422613834
  episode_reward_mean: -31.391312428442113
  episode_reward_min: -47.61955865782558
  episodes_this_iter: 22
  episodes_total: 83
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 279.263
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.334478497505188
        entropy_coeff: 0.0
        kl: 0.030477292835712433
        policy_loss: -0.05535120889544487
        total_loss: 1.193440318107605
        vf_explained_var: 0.007275938987731934
        vf_loss: 1.235076665878296
    load_time_ms: 21.746
    num_steps_sampled: 2000
    num_steps_trained: 1536
    sample_time_ms: 81793.338
    update_time_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-31.3913,329.126,2000.0,4.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-33-21
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 22.75
  episode_reward_max: -18.640269422613834
  episode_reward_mean: -29.939417465787077
  episode_reward_min: -47.61955865782558
  episodes_this_iter: 23
  episodes_total: 106
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 259.845
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.2110501527786255
        entropy_coeff: 0.0
        kl: 0.01844012551009655
        policy_loss: -0.10156457871198654
        total_loss: 1.382750153541565
        vf_explained_var: 0.010292510502040386
        vf_loss: 1.4718676805496216
    load_time_ms: 17.626
    num_steps_sampled: 2500
    num_steps_trained: 1920
    sample_time_ms: 81830.475
    update_time_ms: 114.635


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-29.9394,411.331,2500.0,5.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-35-38
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.38
  episode_reward_max: -16.99324777629094
  episode_reward_mean: -26.83687709108076
  episode_reward_min: -45.27972673305756
  episodes_this_iter: 24
  episodes_total: 130
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 248.868
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.058292031288147
        entropy_coeff: 0.0
        kl: 0.018055761232972145
        policy_loss: -0.06646206229925156
        total_loss: 0.9148402810096741
        vf_explained_var: 0.04821270704269409
        vf_loss: 0.9691147804260254
    load_time_ms: 14.903
    num_steps_sampled: 3000
    num_steps_trained: 2304
    sample_time_ms: 81669.319
    update_time_ms: 96.756
  i

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-26.8369,492.432,3000.0,6.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds
Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-37-53
  done: false
  earliest_action: 0.6666666666666666
  episode_len_mean: 22.15
  episode_reward_max: -16.99324777629094
  episode_reward_mean: -25.04225853385249
  episode_reward_min: -39.087906099198634
  episodes_this_iter: 23
  episodes_total: 153
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 243.175
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.0004693269729614
        entropy_coeff: 0.0
        kl: 0.008723118342459202
        policy_loss: -0.04276499152183533
        total_loss: 1.6825504302978516
        vf_explained_var: 0.04062684252858162
        vf_loss: 1.719427466392517
    load_time_ms: 13.086
    num_steps_sampled: 3500
    num_steps_trained: 2688
    sam



 35%|███▌      | 7/20 [16:10<29:54, 138.01s/it]


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-25.0423,571.845,3500.0,7.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-40-16
  done: false
  earliest_action: 1.6666666666666667
  episode_len_mean: 22.13
  episode_reward_max: -16.86308045604868
  episode_reward_mean: -24.019165888038664
  episode_reward_min: -37.01030040500499
  episodes_this_iter: 22
  episodes_total: 175
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 236.627
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.9385414123535156
        entropy_coeff: 0.0
        kl: 0.01734119839966297
        policy_loss: -0.01474743615835905
        total_loss: 1.6160640716552734
        vf_explained_var: 0.1986093968153
        vf_loss: 1.6191061735153198
    load_time_ms: 11.678
    num_steps_sampled: 4000
    num_steps_trained: 3072
    sample_time_ms: 81580.338
    update_time_ms: 74.532
  iter

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-24.0192,655.538,4000.0,8.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-42-29
  done: false
  earliest_action: 1.6666666666666667
  episode_len_mean: 22.25
  episode_reward_max: -15.93856016956791
  episode_reward_mean: -23.252609274193986
  episode_reward_min: -37.01030040500499
  episodes_this_iter: 21
  episodes_total: 196
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 231.122
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.87455815076828
        entropy_coeff: 0.0
        kl: 0.012947098352015018
        policy_loss: -0.03308115899562836
        total_loss: 1.3541666269302368
        vf_explained_var: 0.3005389869213104
        vf_loss: 1.3785085678100586
    load_time_ms: 10.627
    num_steps_sampled: 4500
    num_steps_trained: 3456
    sample_time_ms: 81006.361
    update_time_ms: 67.105
  it



Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-23.2526,732.181,4500.0,9.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-44-49
  done: false
  earliest_action: 2.6666666666666665
  episode_len_mean: 22.25
  episode_reward_max: -15.48536698474276
  episode_reward_mean: -22.269035549970866
  episode_reward_min: -36.22402284272192
  episodes_this_iter: 21
  episodes_total: 217
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 227.161
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.7784128785133362
        entropy_coeff: 0.0
        kl: 0.011022440157830715
        policy_loss: -0.05631129816174507
        total_loss: 1.311432957649231
        vf_explained_var: 0.2708139717578888
        vf_loss: 1.3603042364120483
    load_time_ms: 9.804
    num_steps_sampled: 5000
    num_steps_trained: 3840
    sample_time_ms: 81190.821
    update_time_ms: 61.205
  it

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-22.269,815.273,5000.0,10.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-47-10
  done: false
  earliest_action: 5.333333333333333
  episode_len_mean: 22.61
  episode_reward_max: -14.348642710806853
  episode_reward_mean: -21.684684470903512
  episode_reward_min: -36.22402284272192
  episodes_this_iter: 24
  episodes_total: 241
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 189.042
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.6196821331977844
        entropy_coeff: 0.0
        kl: 0.01702275313436985
        policy_loss: -0.1083969697356224
        total_loss: 0.7919361591339111
        vf_explained_var: 0.3791804313659668
        vf_loss: 0.888842761516571
    load_time_ms: 1.639
    num_steps_sampled: 5500
    num_steps_trained: 4224
    sample_time_ms: 80862.475
    update_time_ms: 8.34
  iterat

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-21.6847,898.78,5500.0,11.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-49-22
  done: false
  earliest_action: 11.0
  episode_len_mean: 22.53
  episode_reward_max: -14.348642710806853
  episode_reward_mean: -20.229387821198703
  episode_reward_min: -36.22402284272192
  episodes_this_iter: 23
  episodes_total: 264
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 188.11
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.5483207106590271
        entropy_coeff: 0.0
        kl: 0.009989027865231037
        policy_loss: 0.009933263063430786
        total_loss: 1.11760675907135
        vf_explained_var: 0.3620024621486664
        vf_loss: 1.1009310483932495
    load_time_ms: 1.665
    num_steps_sampled: 6000
    num_steps_trained: 4608
    sample_time_ms: 81047.725
    update_time_ms: 8.346
  iterations_since_r

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-20.2294,978.653,6000.0,12.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-51-41
  done: false
  earliest_action: 9.666666666666666
  episode_len_mean: 22.49
  episode_reward_max: -12.459379438678157
  episode_reward_mean: -19.21829922168603
  episode_reward_min: -27.799578884813524
  episodes_this_iter: 21
  episodes_total: 285
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 196.778
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.4928039610385895
        entropy_coeff: 0.0
        kl: 0.005617495626211166
        policy_loss: -0.006141558289527893
        total_loss: 1.215649962425232
        vf_explained_var: 0.3404777944087982
        vf_loss: 1.2179995775222778
    load_time_ms: 1.772
    num_steps_sampled: 6500
    num_steps_trained: 4992
    sample_time_ms: 81651.81
    update_time_ms: 8.242
  ite

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-19.2183,1063.57,6500.0,13.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-53-59
  done: false
  earliest_action: 8.0
  episode_len_mean: 22.49
  episode_reward_max: -12.34070874560998
  episode_reward_mean: -18.487800472673513
  episode_reward_min: -27.375048117959732
  episodes_this_iter: 24
  episodes_total: 309
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 198.448
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.40570154786109924
        entropy_coeff: 0.0
        kl: 0.007541047874838114
        policy_loss: -0.022046441212296486
        total_loss: 1.6561617851257324
        vf_explained_var: 0.32725098729133606
        vf_loss: 1.673117995262146
    load_time_ms: 1.842
    num_steps_sampled: 7000
    num_steps_trained: 5376
    sample_time_ms: 81376.041
    update_time_ms: 7.733
  iterations_sin

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-18.4878,1145.3,7000.0,14.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-56-15
  done: false
  earliest_action: 4.333333333333333
  episode_len_mean: 22.43
  episode_reward_max: -11.992595702385273
  episode_reward_mean: -17.619787692941593
  episode_reward_min: -23.631771038657938
  episodes_this_iter: 20
  episodes_total: 329
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 199.992
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.33701738715171814
        entropy_coeff: 0.0
        kl: 0.004415544215589762
        policy_loss: 0.023045016452670097
        total_loss: 1.5026010274887085
        vf_explained_var: 0.3333418369293213
        vf_loss: 1.4765753746032715
    load_time_ms: 1.898
    num_steps_sampled: 7500
    num_steps_trained: 5760
    sample_time_ms: 81145.458
    update_time_ms: 7.628
  

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-17.6198,1225.21,7500.0,15.0
coop_train_fn_01d00e2c,PENDING,,,,,,


 75%|███████▌  | 15/20 [34:32<11:25, 137.02s/it]
[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_13-58-30
  done: false
  earliest_action: 14.0
  episode_len_mean: 22.41
  episode_reward_max: -11.992595702385273
  episode_reward_mean: -17.220530188765693
  episode_reward_min: -23.631771038657938
  episodes_this_iter: 24
  episodes_total: 353
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 198.476
    learner:
      default_policy:
        cur_kl_coeff: 0.3375000059604645
        cur_lr: 0.00019999999494757503
        entropy: 0.2714349925518036
        entropy_coeff: 0.0
        kl: 0.004662792198359966
        policy_loss: -0.033737994730472565
        total_loss: 1.3233311176300049
        vf_explained_var: 0.30983880162239075
        vf_loss: 1.3554954528808594
    load_time_ms: 1.908
    num_steps_sampled: 8000
    num_steps_trained: 6144
    sample_time_ms: 80889.983
    update_time_ms: 7.315
  iterations_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-17.2205,1303.73,8000.0,16.0
coop_train_fn_01d00e2c,PENDING,,,,,,


 80%|████████  | 16/20 [36:47<09:05, 136.42s/it]
[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-00-42
  done: false
  earliest_action: 7.0
  episode_len_mean: 22.43
  episode_reward_max: -11.992595702385273
  episode_reward_mean: -16.948735919813316
  episode_reward_min: -23.631771038657938
  episodes_this_iter: 21
  episodes_total: 374
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 196.581
    learner:
      default_policy:
        cur_kl_coeff: 0.16875000298023224
        cur_lr: 0.00019999999494757503
        entropy: 0.22385959327220917
        entropy_coeff: 0.0
        kl: 0.001812795759178698
        policy_loss: -0.01662055030465126
        total_loss: 1.4684810638427734
        vf_explained_var: 0.3512478768825531
        vf_loss: 1.4847955703735352
    load_time_ms: 1.815
    num_steps_sampled: 8500
    num_steps_trained: 6528
    sample_time_ms: 80756.392
    update_time_ms: 7.221
  iterations_s

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-16.9487,1381.79,8500.0,17.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-02-55
  done: false
  earliest_action: 14.0
  episode_len_mean: 22.44
  episode_reward_max: -11.992595702385273
  episode_reward_mean: -16.518681964716617
  episode_reward_min: -23.631771038657938
  episodes_this_iter: 22
  episodes_total: 396
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 195.739
    learner:
      default_policy:
        cur_kl_coeff: 0.08437500149011612
        cur_lr: 0.00019999999494757503
        entropy: 0.19598829746246338
        entropy_coeff: 0.0
        kl: 0.0019192388281226158
        policy_loss: -0.02405036799609661
        total_loss: 1.2946568727493286
        vf_explained_var: 0.3436586856842041
        vf_loss: 1.3185453414916992
    load_time_ms: 1.829
    num_steps_sampled: 9000
    num_steps_trained: 6912
    sample_time_ms: 80286.222
    update_time_ms: 7.252
  iterations

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-16.5187,1460.78,9000.0,18.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-05-11
  done: false
  earliest_action: 12.666666666666666
  episode_len_mean: 22.47
  episode_reward_max: -11.960063276251432
  episode_reward_mean: -16.19390140315428
  episode_reward_min: -22.236248482403116
  episodes_this_iter: 23
  episodes_total: 419
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 195.497
    learner:
      default_policy:
        cur_kl_coeff: 0.04218750074505806
        cur_lr: 0.00019999999494757503
        entropy: 0.2015218883752823
        entropy_coeff: 0.0
        kl: 0.002602027030661702
        policy_loss: 0.0070009431801736355
        total_loss: 1.4241856336593628
        vf_explained_var: 0.36925575137138367
        vf_loss: 1.4170747995376587
    load_time_ms: 1.778
    num_steps_sampled: 9500
    num_steps_trained: 7296
    sample_time_ms: 80532.239
    update_time_ms: 7.637

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-16.1939,1539.89,9500.0,19.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=6739)[0m Running 3 evaluation rounds




Result for coop_train_fn_01cf9492:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-07-24
  done: false
  earliest_action: 9.333333333333334
  episode_len_mean: 22.46
  episode_reward_max: -11.960063276251432
  episode_reward_mean: -16.19265126902014
  episode_reward_min: -22.236248482403116
  episodes_this_iter: 22
  episodes_total: 441
  experiment_id: f139b618e7ba4f0a8e39a0e82a2a9bca
  experiment_tag: 2_gamma=0.5
  hostname: gigteam
  info:
    grad_time_ms: 195.742
    learner:
      default_policy:
        cur_kl_coeff: 0.02109375037252903
        cur_lr: 0.00019999999494757503
        entropy: 0.16981764137744904
        entropy_coeff: 0.0
        kl: 0.002314337296411395
        policy_loss: -0.0353802926838398
        total_loss: 1.3278182744979858
        vf_explained_var: 0.3507944643497467
        vf_loss: 1.363149642944336
    load_time_ms: 1.637
    num_steps_sampled: 10000
    num_steps_trained: 7680
    sample_time_ms: 79843.846
    update_time_ms: 7.437
  



Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000.0,20.0
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000.0,20.0
coop_train_fn_01cf9492,RUNNING,128.3.28.231:6739,0.5,-16.1927,1616.09,10000.0,20.0
coop_train_fn_01d00e2c,PENDING,,,,,,


[2m[36m(pid=7245)[0m 2020-03-02 14:08:26,754	INFO trainer.py:420 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
[2m[36m(pid=7245)[0m 2020-03-02 14:08:26,887	INFO trainer.py:580 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2m[36m(pid=7245)[0m   obj = yaml.load(type_)
  0%|          | 0/20 [00:00<?, ?it/s]
[2m[36m(pid=7261)[0m   obj = yaml.load(type_)
[2m[36m(pid=7252)[0m   obj = yaml.load(type_)
[2m[36m(pid=7247)[0m   obj = yaml.load(type_)
[2m[36m(pid=7257)[0m   obj = yaml.load(type_)
[2m[36m(pid=7254)[0m   obj = yaml.load(type_)
[2m[36m(pid=7266)[0m   obj = yaml.load(type_)
[2m[36m(pid=7244)[0m   obj = yaml.load(type_)
[2m[36m(pid=7271)[0m   obj = yaml.load(type_)
[2m[36m(pid=7245)[0m Running 3 evaluation rounds




[2m[36m(pid=7245)[0m   out=out, **kwargs)
Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-10-54
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.4
  episode_reward_max: -21.860658753701916
  episode_reward_mean: -32.50608367401371
  episode_reward_min: -38.966923510557166
  episodes_this_iter: 20
  episodes_total: 20
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 523.437
    learner:
      default_policy:
        cur_kl_coeff: 0.20000000298023224
        cur_lr: 0.00019999999494757503
        entropy: 1.5891427993774414
        entropy_coeff: 0.0
        kl: 0.020907701924443245
        policy_loss: -0.03277362138032913
        total_loss: 87.37005615234375
        vf_explained_var: 0.00032941499375738204
        vf_loss: 87.39864349365234
    load_time_ms: 53.834
    num_steps_sampled: 500
    num_steps_trained: 384
    sample_time_ms: 83664.63

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-32.5061,85.0463,500,1


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-13-19
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.7
  episode_reward_max: -21.860658753701916
  episode_reward_mean: -32.814437391447754
  episode_reward_min: -43.89036441297965
  episodes_this_iter: 20
  episodes_total: 40
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 362.849
    learner:
      default_policy:
        cur_kl_coeff: 0.30000001192092896
        cur_lr: 0.00019999999494757503
        entropy: 1.5768548250198364
        entropy_coeff: 0.0
        kl: 0.026138314977288246
        policy_loss: -0.03911885246634483
        total_loss: 71.55743408203125
        vf_explained_var: -5.7439010561211035e-05
        vf_loss: 71.58871459960938
    load_time_ms: 27.722
    num_steps_sampled: 1000
    num_steps_trained: 768
    sample_time_ms: 82321.361
    update_time_ms: 372.63

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-32.8144,166.27,1000,2


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-15-38
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.365079365079364
  episode_reward_max: -21.860658753701916
  episode_reward_mean: -32.13388119123995
  episode_reward_min: -44.506636824219385
  episodes_this_iter: 23
  episodes_total: 63
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 321.182
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.534854769706726
        entropy_coeff: 0.0
        kl: 0.01841040514409542
        policy_loss: -0.014685460366308689
        total_loss: 62.4134407043457
        vf_explained_var: -0.00022582213568966836
        vf_loss: 62.41984176635742
    load_time_ms: 19.069
    num_steps_sampled: 1500
    num_steps_trained: 1152
    sample_time_ms: 82242.706
    update_time_ms: 251.217


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-32.1339,248.653,1500,3


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-17-59
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.423529411764704
  episode_reward_max: -21.860658753701916
  episode_reward_mean: -31.739906942332052
  episode_reward_min: -44.506636824219385
  episodes_this_iter: 22
  episodes_total: 85
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 288.558
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.4838241338729858
        entropy_coeff: 0.0
        kl: 0.018272368237376213
        policy_loss: 0.006718821823596954
        total_loss: 46.4044075012207
        vf_explained_var: -0.0001036524772644043
        vf_loss: 46.389469146728516
    load_time_ms: 14.958
    num_steps_sampled: 2000
    num_steps_trained: 1536
    sample_time_ms: 82045.156
    update_

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-31.7399,330.34,2000,4


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-20-19
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.53
  episode_reward_max: -19.743417137825084
  episode_reward_mean: -31.121764312983352
  episode_reward_min: -44.506636824219385
  episodes_this_iter: 23
  episodes_total: 108
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 268.272
    learner:
      default_policy:
        cur_kl_coeff: 0.44999998807907104
        cur_lr: 0.00019999999494757503
        entropy: 1.3415993452072144
        entropy_coeff: 0.0
        kl: 0.03226913884282112
        policy_loss: -0.038231778889894485
        total_loss: 33.405242919921875
        vf_explained_var: -7.422765338560566e-05
        vf_loss: 33.428951263427734
    load_time_ms: 12.401
    num_steps_sampled: 2500
    num_steps_trained: 1920
    sample_time_ms: 81976.949
    update_time_ms: 1

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-31.1218,412.272,2500,5


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-22-43
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.5
  episode_reward_max: -19.743417137825084
  episode_reward_mean: -29.805474380166988
  episode_reward_min: -44.506636824219385
  episodes_this_iter: 23
  episodes_total: 131
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 258.745
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.2439489364624023
        entropy_coeff: 0.0
        kl: 0.01253785565495491
        policy_loss: -0.008412915281951427
        total_loss: 25.970869064331055
        vf_explained_var: -9.191036224365234e-05
        vf_loss: 25.97081756591797
    load_time_ms: 10.767
    num_steps_sampled: 3000
    num_steps_trained: 2304
    sample_time_ms: 82035.666
    update_time_ms: 129.851
  iterations

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-29.8055,494.86,3000,6


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-25-12
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.54
  episode_reward_max: -19.656675338693002
  episode_reward_mean: -28.34998053393219
  episode_reward_min: -44.506636824219385
  episodes_this_iter: 20
  episodes_total: 151
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 248.118
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.1336370706558228
        entropy_coeff: 0.0
        kl: 0.011593982577323914
        policy_loss: -0.019446147605776787
        total_loss: 17.215425491333008
        vf_explained_var: -0.00010903676593443379
        vf_loss: 17.2270450592041
    load_time_ms: 9.474
    num_steps_sampled: 3500
    num_steps_trained: 2688
    sample_time_ms: 81696.146
    update_time_ms: 112.968
  iterations

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-28.35,574.753,3500,7


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-27-38
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.52
  episode_reward_max: -18.80358384935892
  episode_reward_mean: -26.68416746457317
  episode_reward_min: -35.54042868094645
  episodes_this_iter: 23
  episodes_total: 174
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 242.027
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 1.00751793384552
        entropy_coeff: 0.0
        kl: 0.009654078632593155
        policy_loss: -0.03762245550751686
        total_loss: 12.285231590270996
        vf_explained_var: -4.851818084716797e-05
        vf_loss: 12.316337585449219
    load_time_ms: 8.473
    num_steps_sampled: 4000
    num_steps_trained: 3072
    sample_time_ms: 81697.212
    update_time_ms: 99.838
  iterations_sinc

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-26.6842,656.699,4000,8


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-29-58
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.55
  episode_reward_max: -17.719501602962
  episode_reward_mean: -25.245571873269313
  episode_reward_min: -35.54042868094645
  episodes_this_iter: 22
  episodes_total: 196
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 240.421
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.9067935347557068
        entropy_coeff: 0.0
        kl: 0.013483375310897827
        policy_loss: -0.022166654467582703
        total_loss: 11.271069526672363
        vf_explained_var: -7.557868957519531e-05
        vf_loss: 11.284136772155762
    load_time_ms: 7.666
    num_steps_sampled: 4500
    num_steps_trained: 3456
    sample_time_ms: 81771.992
    update_time_ms: 89.652
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-25.2456,739.348,4500,9


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-32-15
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.36
  episode_reward_max: -17.24307337448205
  episode_reward_mean: -23.526724458346283
  episode_reward_min: -31.65276965459898
  episodes_this_iter: 24
  episodes_total: 220
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 235.093
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.7433553338050842
        entropy_coeff: 0.0
        kl: 0.012055136263370514
        policy_loss: -0.029469260945916176
        total_loss: 8.090298652648926
        vf_explained_var: -4.845857620239258e-05
        vf_loss: 8.1116304397583
    load_time_ms: 7.018
    num_steps_sampled: 5000
    num_steps_trained: 3840
    sample_time_ms: 81736.483
    update_time_ms: 81.159
  iterations_sinc

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-23.5267,820.984,5000,10


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-34-26
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.39
  episode_reward_max: -16.023471841185273
  episode_reward_mean: -22.449063434831086
  episode_reward_min: -31.377963215798076
  episodes_this_iter: 21
  episodes_total: 241
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 201.383
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.6135984063148499
        entropy_coeff: 0.0
        kl: 0.007392928469926119
        policy_loss: 0.03730444610118866
        total_loss: 6.077453136444092
        vf_explained_var: -2.9881795853725635e-05
        vf_loss: 6.035158634185791
    load_time_ms: 1.809
    num_steps_sampled: 5500
    num_steps_trained: 4224
    sample_time_ms: 81199.656
    update_time_ms: 8.433


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-22.4491,899.506,5500,11


 55%|█████▌    | 11/20 [25:55<20:45, 138.37s/it]
[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-36-39
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.33
  episode_reward_max: -15.52587028139144
  episode_reward_mean: -21.141469058962162
  episode_reward_min: -30.03087194063059
  episodes_this_iter: 23
  episodes_total: 264
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 199.082
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.49649572372436523
        entropy_coeff: 0.0
        kl: 0.0067540486343204975
        policy_loss: -0.03696141019463539
        total_loss: 5.307623386383057
        vf_explained_var: -5.0902366638183594e-05
        vf_loss: 5.340025424957275
    load_time_ms: 1.79
    num_steps_sampled: 6000
    num_steps_trained: 4608
    sample_time_ms: 81191.509
    update_time_ms: 7.871
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-21.1415,980.615,6000,12


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-38-48
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.24
  episode_reward_max: -14.29417410113383
  episode_reward_mean: -20.37145830784644
  episode_reward_min: -30.03087194063059
  episodes_this_iter: 24
  episodes_total: 288
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 191.602
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.3900728225708008
        entropy_coeff: 0.0
        kl: 0.005979182198643684
        policy_loss: -0.02772604487836361
        total_loss: 3.915926933288574
        vf_explained_var: -5.467732626129873e-05
        vf_loss: 3.9396169185638428
    load_time_ms: 1.724
    num_steps_sampled: 6500
    num_steps_trained: 4992
    sample_time_ms: 80808.829
    update_time_ms: 7.895
  iterations_sinc

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-20.3715,1059.07,6500,13


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-41-03
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.21
  episode_reward_max: -14.29417410113383
  episode_reward_mean: -19.415182171286677
  episode_reward_min: -28.318070695780214
  episodes_this_iter: 21
  episodes_total: 309
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 198.098
    learner:
      default_policy:
        cur_kl_coeff: 0.675000011920929
        cur_lr: 0.00019999999494757503
        entropy: 0.30088624358177185
        entropy_coeff: 0.0
        kl: 0.004674558062106371
        policy_loss: -0.030573464930057526
        total_loss: 3.15602970123291
        vf_explained_var: -2.1219253540039062e-05
        vf_loss: 3.18344783782959
    load_time_ms: 1.645
    num_steps_sampled: 7000
    num_steps_trained: 5376
    sample_time_ms: 80820.517
    update_time_ms: 7.63
  iterations_sin



Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-19.4152,1140.94,7000,14


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-43-16
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.07
  episode_reward_max: -13.893676811115869
  episode_reward_mean: -18.693952427615585
  episode_reward_min: -28.318070695780214
  episodes_this_iter: 22
  episodes_total: 331
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 197.985
    learner:
      default_policy:
        cur_kl_coeff: 0.3375000059604645
        cur_lr: 0.00019999999494757503
        entropy: 0.22467757761478424
        entropy_coeff: 0.0
        kl: 0.0037405285984277725
        policy_loss: 0.004548497498035431
        total_loss: 3.3706798553466797
        vf_explained_var: -5.416075509856455e-05
        vf_loss: 3.3648688793182373
    load_time_ms: 1.644
    num_steps_sampled: 7500
    num_steps_trained: 5760
    sample_time_ms: 80520.685
    update_time_ms: 7.878
  iteratio

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-18.694,1219.87,7500,15


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-45-23
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.15
  episode_reward_max: -13.893676811115869
  episode_reward_mean: -18.172706943561952
  episode_reward_min: -25.183317269432905
  episodes_this_iter: 24
  episodes_total: 355
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 195.499
    learner:
      default_policy:
        cur_kl_coeff: 0.16875000298023224
        cur_lr: 0.00019999999494757503
        entropy: 0.17088718712329865
        entropy_coeff: 0.0
        kl: 0.0024468537885695696
        policy_loss: 0.023292794823646545
        total_loss: 2.9871370792388916
        vf_explained_var: -3.143151479889639e-05
        vf_loss: 2.9634313583374023
    load_time_ms: 1.565
    num_steps_sampled: 8000
    num_steps_trained: 6144
    sample_time_ms: 79818.267
    update_time_ms: 7.872
  iterati

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-18.1727,1295.4,8000,16


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-47-36
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.18
  episode_reward_max: -13.893676811115869
  episode_reward_mean: -17.872460380890196
  episode_reward_min: -25.183317269432905
  episodes_this_iter: 22
  episodes_total: 377
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 196.468
    learner:
      default_policy:
        cur_kl_coeff: 0.08437500149011612
        cur_lr: 0.00019999999494757503
        entropy: 0.1268981546163559
        entropy_coeff: 0.0
        kl: 0.0017918258672580123
        policy_loss: -0.018201613798737526
        total_loss: 2.8580949306488037
        vf_explained_var: -4.907448965241201e-05
        vf_loss: 2.876145124435425
    load_time_ms: 1.547
    num_steps_sampled: 8500
    num_steps_trained: 6528
    sample_time_ms: 79894.441
    update_time_ms: 7.335
  iteratio

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-17.8725,1376.06,8500,17


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-49-48
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.17
  episode_reward_max: -13.893676811115869
  episode_reward_mean: -17.422572415201262
  episode_reward_min: -24.048503331328586
  episodes_this_iter: 23
  episodes_total: 400
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 200.172
    learner:
      default_policy:
        cur_kl_coeff: 0.04218750074505806
        cur_lr: 0.00019999999494757503
        entropy: 0.1479969471693039
        entropy_coeff: 0.0
        kl: 0.0005421453970484436
        policy_loss: -0.02206662856042385
        total_loss: 2.356070041656494
        vf_explained_var: -7.359186565736309e-05
        vf_loss: 2.3781137466430664
    load_time_ms: 1.55
    num_steps_sampled: 9000
    num_steps_trained: 6912
    sample_time_ms: 79789.15
    update_time_ms: 7.5
  iterations_si

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-17.4226,1456.99,9000,18


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-52-00
  done: false
  earliest_action: 0.3333333333333333
  episode_len_mean: 22.38
  episode_reward_max: -14.814193866508758
  episode_reward_mean: -17.444565208562548
  episode_reward_min: -24.048503331328586
  episodes_this_iter: 20
  episodes_total: 420
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 195.789
    learner:
      default_policy:
        cur_kl_coeff: 0.02109375037252903
        cur_lr: 0.00019999999494757503
        entropy: 0.1298200935125351
        entropy_coeff: 0.0
        kl: 0.00037746247835457325
        policy_loss: -0.007249698042869568
        total_loss: 2.43117356300354
        vf_explained_var: -3.9895374357001856e-05
        vf_loss: 2.438415288925171
    load_time_ms: 1.642
    num_steps_sampled: 9500
    num_steps_trained: 7296
    sample_time_ms: 79553.57
    update_time_ms: 7.

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-17.4446,1537.22,9500,19


[2m[36m(pid=7245)[0m Running 3 evaluation rounds




Result for coop_train_fn_01d00e2c:
  avg_entropy: .nan
  custom_metrics: {}
  date: 2020-03-02_14-54-14
  done: false
  earliest_action: 0.0
  episode_len_mean: 22.24
  episode_reward_max: -14.002397741415791
  episode_reward_mean: -17.18398942825228
  episode_reward_min: -24.048503331328586
  episodes_this_iter: 25
  episodes_total: 445
  experiment_id: b6d1371fe6874c05951b2b75058d8cef
  experiment_tag: 3_gamma=0.9
  hostname: gigteam
  info:
    grad_time_ms: 195.074
    learner:
      default_policy:
        cur_kl_coeff: 0.010546875186264515
        cur_lr: 0.00019999999494757503
        entropy: 0.11284220963716507
        entropy_coeff: 0.0
        kl: 0.00027872747159563005
        policy_loss: -0.007401251699775457
        total_loss: 2.920513868331909
        vf_explained_var: -5.6544940889580175e-05
        vf_loss: 2.9279119968414307
    load_time_ms: 1.717
    num_steps_sampled: 10000
    num_steps_trained: 7680
    sample_time_ms: 79670.836
    update_time_ms: 7.42
  itera

Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,RUNNING,128.3.28.231:7245,0.9,-17.184,1620.04,10000,20


100%|██████████| 20/20 [45:43<00:00, 137.17s/it]


Trial name,status,loc,gamma,reward,total time (s),ts,iter
coop_train_fn_01ce5c76,TERMINATED,,0.0,-33.3417,1616.1,10000,20
coop_train_fn_01cf1a8a,TERMINATED,,0.3,-19.0377,1623.6,10000,20
coop_train_fn_01cf9492,TERMINATED,,0.5,-16.1927,1616.09,10000,20
coop_train_fn_01d00e2c,TERMINATED,,0.9,-17.184,1620.04,10000,20


2020-03-02 14:54:14,656	INFO tune.py:352 -- Returning an analysis object by default. You can call `analysis.trials` to retrieve a list of trials. This message will be removed in future versions of Tune.
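
The learner statistics logged for the `gamma=0.9` trial appear to satisfy `total_loss = policy_loss + cur_kl_coeff * kl + vf_loss - entropy_coeff * entropy`. The sketch below checks this against three logged iterations. The unit weight on `vf_loss` is an assumption inferred from the numbers, not something the log states, and `reconstruct_total_loss` is a hypothetical helper name.

```python
# Learner stats copied from the gamma=0.9 trial log (iterations 6, 7 and 15).
LOGGED_STATS = [
    {"iter": 6, "policy_loss": -0.008412915281951427, "kl": 0.01253785565495491,
     "kl_coeff": 0.675000011920929, "vf_loss": 25.97081756591797,
     "entropy": 1.2439489364624023, "entropy_coeff": 0.0,
     "total_loss": 25.970869064331055},
    {"iter": 7, "policy_loss": -0.019446147605776787, "kl": 0.011593982577323914,
     "kl_coeff": 0.675000011920929, "vf_loss": 17.2270450592041,
     "entropy": 1.1336370706558228, "entropy_coeff": 0.0,
     "total_loss": 17.215425491333008},
    {"iter": 15, "policy_loss": 0.004548497498035431, "kl": 0.0037405285984277725,
     "kl_coeff": 0.3375000059604645, "vf_loss": 3.3648688793182373,
     "entropy": 0.22467757761478424, "entropy_coeff": 0.0,
     "total_loss": 3.3706798553466797},
]


def reconstruct_total_loss(stats, vf_loss_coeff=1.0):
    """Recombine the logged loss terms; vf_loss_coeff=1.0 is an assumption."""
    return (
        stats["policy_loss"]
        + stats["kl_coeff"] * stats["kl"]
        + vf_loss_coeff * stats["vf_loss"]
        - stats["entropy_coeff"] * stats["entropy"]
    )


# Absolute gap between the recombined value and the logged total_loss.
residuals = {
    s["iter"]: abs(reconstruct_total_loss(s) - s["total_loss"]) for s in LOGGED_STATS
}
for iteration, residual in residuals.items():
    print(f"iter {iteration}: |reconstructed - logged| = {residual:.2e}")
```

Under this reading, `total_loss` is dominated by `vf_loss` throughout the trial, so its decrease mostly reflects the value function fit rather than a change in the policy objective.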

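In the `gamma=0.9` trial, `cur_kl_coeff` stays at 0.675 through iteration 14 and then halves at every later iteration, while `kl` drops below 0.005. This matches an adaptive KL penalty of the kind used with PPO. The sketch below replays such a rule on the logged `kl` values. The target of 0.01, the thresholds (2x and 0.5x the target) and the factors (1.5 and 0.5) are assumptions chosen because they reproduce the logged coefficients; `update_kl_coeff` is a hypothetical name.

```python
def update_kl_coeff(kl_coeff, sampled_kl, kl_target=0.01):
    """Assumed adaptive rule: grow the penalty when KL is large, shrink it when small."""
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5
    return kl_coeff


# `kl` values logged for iterations 6..19 of the gamma=0.9 trial.
logged_kl = [
    0.01253785565495491, 0.011593982577323914, 0.009654078632593155,
    0.013483375310897827, 0.012055136263370514, 0.007392928469926119,
    0.0067540486343204975, 0.005979182198643684, 0.004674558062106371,
    0.0037405285984277725, 0.0024468537885695696, 0.0017918258672580123,
    0.0005421453970484436, 0.00037746247835457325,
]

# `cur_kl_coeff` values logged for iterations 7..20 of the same trial.
logged_coeff = [0.675000011920929] * 8 + [
    0.3375000059604645, 0.16875000298023224, 0.08437500149011612,
    0.04218750074505806, 0.02109375037252903, 0.010546875186264515,
]

# Replay the rule starting from the coefficient logged at iteration 6.
coeff = 0.675000011920929
replayed = []
for kl in logged_kl:
    coeff = update_kl_coeff(coeff, kl)
    replayed.append(coeff)

# Iterations (7..20) where the replayed coefficient disagrees with the log.
mismatches = [
    i + 7 for i, (r, l) in enumerate(zip(replayed, logged_coeff)) if abs(r - l) > 1e-9
]
print("iterations where replay and log disagree:", mismatches)
```

If this reading is right, the shrinking coefficient together with entropy falling toward 0.11 suggests the policy barely changes over the last few iterations, which relates to the "epoch at which the policy does not change anymore" statistic listed earlier.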