# Solving RL problems with `ray[rllib]`

<img src="images/cartpole.jpg" width="500"></img>

## Step 1: Initialize `ray`

- `ray` is a package providing distributed computing primitives. `rllib` is built on `ray`.

In [1]:
import ray

ray.init()

{'node_ip_address': '192.168.0.90',
 'raylet_ip_address': '192.168.0.90',
 'redis_address': '192.168.0.90:6379',
 'object_store_address': '/tmp/ray/session_2022-01-03_13-36-30_217833_10780/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-01-03_13-36-30_217833_10780/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2022-01-03_13-36-30_217833_10780',
 'metrics_export_port': 61790,
 'node_id': '792c2cebf718247142f97bf17da59a5a2878c4f6d0078bf9f8316b56'}

## Step 2: Run an **experiment** to solve RL problems

An experiment involves four things:
- An **RL environment** (e.g. `CartPole-v1`)
- An **RL algorithm** to learn in that environment (e.g. Proximal Policy Optimization (PPO))
- **Configuration** (algorithm config, experiment config, environment config, etc.)
- An **experiment runner** (called `tune`)
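
As a rough illustration, the four pieces can be written out as plain Python values before they are handed to `tune`. This sketch is not part of the original notebook: `build_experiment` and `run_experiment` are hypothetical helper names, the `stop` threshold of 475 is an assumed target, and the explicit `lr` and `train_batch_size` values only restate what the PPO defaults appear to be in the training log further down (`cur_lr` around 5e-05 and 4000 timesteps per iteration).

```python
def build_experiment():
    """Return the pieces of an experiment as plain Python values."""
    algorithm = "PPO"                    # the RL algorithm
    config = {
        "env": "CartPole-v1",            # the RL environment
        "framework": "tf",               # algorithm config: deep learning backend
        "lr": 5e-5,                      # algorithm config: learning rate
        "train_batch_size": 4000,        # env steps collected per training iteration
    }
    stop = {"episode_reward_mean": 475}  # experiment config: assumed stopping target
    return algorithm, config, stop


def run_experiment():
    """Hand the pieces to the experiment runner (needs `ray[rllib]` installed)."""
    from ray import tune  # imported lazily so build_experiment() works without ray

    algorithm, config, stop = build_experiment()
    return tune.run(algorithm, config=config, stop=stop)


algorithm, config, stop = build_experiment()
print(algorithm, config["env"], stop)
```

Calling `run_experiment()` would start training. Without a `stop` condition, `tune.run` keeps training until it is interrupted.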

In [2]:
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",
        # Other configuration keys go here; any key left out falls back to its default.
    },
)

[2m[36m(PPO pid=11129)[0m 2022-01-03 13:39:11,432	INFO trainer.py:722 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also want to then set `eager_tracing=True` in order to reach similar execution speed as with static-graph mode.
[2m[36m(PPO pid=11129)[0m 2022-01-03 13:39:11,432	INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
[2m[36m(PPO pid=11129)[0m 2022-01-03 13:39:11,432	INFO trainer.py:743 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


Trial name,status,loc
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129






Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 4000
  custom_metrics: {}
  date: 2022-01-03_13-39-21
  done: false
  episode_len_mean: 21.64673913043478
  episode_media: {}
  episode_reward_max: 68.0
  episode_reward_mean: 21.64673913043478
  episode_reward_min: 8.0
  episodes_this_iter: 184
  episodes_total: 184
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.6663591265678406
          entropy_coeff: 0.0
          kl: 0.02796749770641327
          model: {}
          policy_loss: -0.044631849974393845
          total_loss: 192.7428741455078
          vf_explained_var: 0.016500303521752357
          vf_loss: 192.78192138671875
    num_agent_steps_sampled: 4000
    num_agent_steps_trained: 4000
    num_steps_sampled: 4000
    num_steps_trained: 4000

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,1,7.00473,4000,21.6467,68,8,21.6467
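
The status tables are comma-separated, so they can be read with the standard `csv` module. The following is a minimal sketch rather than anything `tune` provides: `parse_status` and the shortened field names are made up for illustration, and the table text is copied from the output above.

```python
import csv
import io

# Status table copied verbatim from the output above.
status_text = """\
Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,1,7.00473,4000,21.6467,68,8,21.6467
"""


def parse_status(text):
    """Parse a comma-separated status table into dicts with numeric fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "trial": raw["Trial name"],
            "status": raw["status"],
            "iter": int(raw["iter"]),
            "seconds": float(raw["total time (s)"]),
            "timesteps": int(raw["ts"]),
            "reward_mean": float(raw["reward"]),
        })
    return rows


rows = parse_status(status_text)
first = rows[0]
# Sampling-plus-training throughput for the first iteration.
steps_per_second = first["timesteps"] / first["seconds"]
print(first["iter"], first["reward_mean"], round(steps_per_second))
```

Here `ts` is the cumulative number of environment timesteps and `reward` is the mean episode reward, so the first iteration ran at roughly 570 timesteps per second.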


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-01-03_13-39-28
  done: false
  episode_len_mean: 43.97
  episode_media: {}
  episode_reward_max: 197.0
  episode_reward_mean: 43.97
  episode_reward_min: 8.0
  episodes_this_iter: 79
  episodes_total: 263
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.6105722188949585
          entropy_coeff: 0.0
          kl: 0.017157860100269318
          model: {}
          policy_loss: -0.03345935046672821
          total_loss: 474.6767272949219
          vf_explained_var: 0.07448326796293259
          vf_loss: 474.70501708984375
    num_agent_steps_sampled: 8000
    num_agent_steps_trained: 8000
    num_steps_sampled: 8000
    num_steps_trained: 8000
    num_steps_trained_th

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,2,13.8136,8000,43.97,197,8,43.97
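
The `learner_stats` values in each result can be cross-checked against one another. Assuming the usual PPO objective, `total_loss` should be approximately `policy_loss + cur_kl_coeff * kl + vf_loss_coeff * vf_loss - entropy_coeff * entropy`. The log does not print `vf_loss_coeff`, so the sketch below assumes it is 1.0; `ppo_total_loss` is an illustrative helper, not an RLlib function. The numbers are taken from the second result above.

```python
def ppo_total_loss(policy_loss, kl, kl_coeff, vf_loss, entropy,
                   entropy_coeff=0.0, vf_loss_coeff=1.0):
    """Combine the PPO loss terms (vf_loss_coeff=1.0 is an assumption)."""
    return (policy_loss
            + kl_coeff * kl
            + vf_loss_coeff * vf_loss
            - entropy_coeff * entropy)


# learner_stats from the result at 8000 timesteps.
stats = {
    "cur_kl_coeff": 0.30000001192092896,
    "entropy": 0.6105722188949585,
    "entropy_coeff": 0.0,
    "kl": 0.017157860100269318,
    "policy_loss": -0.03345935046672821,
    "total_loss": 474.6767272949219,
    "vf_loss": 474.70501708984375,
}

recomputed = ppo_total_loss(
    policy_loss=stats["policy_loss"],
    kl=stats["kl"],
    kl_coeff=stats["cur_kl_coeff"],
    vf_loss=stats["vf_loss"],
    entropy=stats["entropy"],
    entropy_coeff=stats["entropy_coeff"],
)
print(recomputed, stats["total_loss"])
```

The two values agree to within single-precision rounding. The value-function term dominates `total_loss`, which is why the total tracks `vf_loss` so closely in every result.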


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 12000
  custom_metrics: {}
  date: 2022-01-03_13-39-35
  done: false
  episode_len_mean: 71.13
  episode_media: {}
  episode_reward_max: 248.0
  episode_reward_mean: 71.13
  episode_reward_min: 11.0
  episodes_this_iter: 36
  episodes_total: 299
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5665245056152344
          entropy_coeff: 0.0
          kl: 0.010042975656688213
          model: {}
          policy_loss: -0.018269281834363937
          total_loss: 652.2549438476562
          vf_explained_var: 0.08383574336767197
          vf_loss: 652.2701416015625
    num_agent_steps_sampled: 12000
    num_agent_steps_trained: 12000
    num_steps_sampled: 12000
    num_steps_trained: 12000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,3,20.3305,12000,71.13,248,11,71.13


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 16000
  custom_metrics: {}
  date: 2022-01-03_13-39-41
  done: false
  episode_len_mean: 95.06
  episode_media: {}
  episode_reward_max: 271.0
  episode_reward_mean: 95.06
  episode_reward_min: 11.0
  episodes_this_iter: 23
  episodes_total: 322
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5408810973167419
          entropy_coeff: 0.0
          kl: 0.005866415333002806
          model: {}
          policy_loss: -0.01100907102227211
          total_loss: 570.3348388671875
          vf_explained_var: 0.2864651679992676
          vf_loss: 570.3441162109375
    num_agent_steps_sampled: 16000
    num_agent_steps_trained: 16000
    num_steps_sampled: 16000
    num_steps_trained: 16000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,4,27.0606,16000,95.06,271,11,95.06


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 20000
  custom_metrics: {}
  date: 2022-01-03_13-39-48
  done: false
  episode_len_mean: 126.15
  episode_media: {}
  episode_reward_max: 403.0
  episode_reward_mean: 126.15
  episode_reward_min: 11.0
  episodes_this_iter: 21
  episodes_total: 343
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5414741039276123
          entropy_coeff: 0.0
          kl: 0.006212709471583366
          model: {}
          policy_loss: -0.012014788575470448
          total_loss: 357.6759948730469
          vf_explained_var: 0.5299631953239441
          vf_loss: 357.6861572265625
    num_agent_steps_sampled: 20000
    num_agent_steps_trained: 20000
    num_steps_sampled: 20000
    num_steps_trained: 20000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,5,33.6944,20000,126.15,403,11,126.15


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 24000
  custom_metrics: {}
  date: 2022-01-03_13-39-55
  done: false
  episode_len_mean: 157.89
  episode_media: {}
  episode_reward_max: 476.0
  episode_reward_mean: 157.89
  episode_reward_min: 11.0
  episodes_this_iter: 13
  episodes_total: 356
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5308369994163513
          entropy_coeff: 0.0
          kl: 0.009827823378145695
          model: {}
          policy_loss: -0.017479119822382927
          total_loss: 382.46868896484375
          vf_explained_var: 0.4310585558414459
          vf_loss: 382.4832458496094
    num_agent_steps_sampled: 24000
    num_agent_steps_trained: 24000
    num_steps_sampled: 24000
    num_steps_trained: 24000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,6,40.2547,24000,157.89,476,11,157.89


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 28000
  custom_metrics: {}
  date: 2022-01-03_13-40-02
  done: false
  episode_len_mean: 194.74
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 194.74
  episode_reward_min: 20.0
  episodes_this_iter: 10
  episodes_total: 366
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5359815359115601
          entropy_coeff: 0.0
          kl: 0.003213357413187623
          model: {}
          policy_loss: -0.00845080055296421
          total_loss: 345.37432861328125
          vf_explained_var: 0.36814582347869873
          vf_loss: 345.38177490234375
    num_agent_steps_sampled: 28000
    num_agent_steps_trained: 28000
    num_steps_sampled: 28000
    num_steps_trained: 28000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,7,47.0663,28000,194.74,500,20,194.74


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 32000
  custom_metrics: {}
  date: 2022-01-03_13-40-08
  done: false
  episode_len_mean: 222.53
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 222.53
  episode_reward_min: 20.0
  episodes_this_iter: 8
  episodes_total: 374
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.15000000596046448
          cur_lr: 4.999999873689376e-05
          entropy: 0.5514851212501526
          entropy_coeff: 0.0
          kl: 0.0034927264787256718
          model: {}
          policy_loss: -0.006180190946906805
          total_loss: 327.150146484375
          vf_explained_var: 0.15332654118537903
          vf_loss: 327.1558532714844
    num_agent_steps_sampled: 32000
    num_agent_steps_trained: 32000
    num_steps_sampled: 32000
    num_steps_trained: 32000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,8,53.557,32000,222.53,500,20,222.53
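
The changes in `cur_kl_coeff` across the results above are consistent with an adaptive KL penalty: the coefficient is multiplied by 1.5 when the measured `kl` exceeds twice a target, and halved when `kl` falls below half the target. The sketch below assumes a target of 0.01, which the log does not print, and `update_kl_coeff` is an illustrative name rather than RLlib's own function. It replays the `kl` values from the first seven results.

```python
def update_kl_coeff(kl_coeff, sampled_kl, kl_target=0.01):
    """Adapt the KL penalty coefficient (kl_target=0.01 is an assumption)."""
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5   # policy moved too far: penalize KL more
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5   # policy barely moved: relax the penalty
    return kl_coeff             # within the tolerated band: keep as is


# `kl` values from the results at 4000 ... 28000 timesteps.
sampled_kls = [
    0.02796749770641327,
    0.017157860100269318,
    0.010042975656688213,
    0.005866415333002806,
    0.006212709471583366,
    0.009827823378145695,
    0.003213357413187623,
]

coeff = 0.2          # cur_kl_coeff reported in the first result
history = [coeff]
for kl in sampled_kls:
    coeff = update_kl_coeff(coeff, kl)
    history.append(coeff)

print([round(c, 4) for c in history])
```

The replayed coefficients line up with the logged `cur_kl_coeff` values: 0.2 in the first result, 0.3 from the second through the seventh, and 0.15 in the eighth.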


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 36000
  custom_metrics: {}
  date: 2022-01-03_13-40-15
  done: false
  episode_len_mean: 252.38
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 252.38
  episode_reward_min: 20.0
  episodes_this_iter: 8
  episodes_total: 382
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.5647332072257996
          entropy_coeff: 0.0
          kl: 0.007649588864296675
          model: {}
          policy_loss: -0.0074668750166893005
          total_loss: 374.8133850097656
          vf_explained_var: 0.173770010471344
          vf_loss: 374.82025146484375
    num_agent_steps_sampled: 36000
    num_agent_steps_trained: 36000
    num_steps_sampled: 36000
    num_steps_trained: 36000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,9,60.341,36000,252.38,500,20,252.38


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 40000
  custom_metrics: {}
  date: 2022-01-03_13-40-22
  done: false
  episode_len_mean: 285.96
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 285.96
  episode_reward_min: 22.0
  episodes_this_iter: 9
  episodes_total: 391
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.5340165495872498
          entropy_coeff: 0.0
          kl: 0.0032838049810379744
          model: {}
          policy_loss: -0.0001432914868928492
          total_loss: 567.168212890625
          vf_explained_var: -0.018672805279493332
          vf_loss: 567.1680908203125
    num_agent_steps_sampled: 40000
    num_agent_steps_trained: 40000
    num_steps_sampled: 40000
    num_steps_trained: 40000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,10,67.0795,40000,285.96,500,22,285.96


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 44000
  custom_metrics: {}
  date: 2022-01-03_13-40-28
  done: false
  episode_len_mean: 316.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 316.4
  episode_reward_min: 32.0
  episodes_this_iter: 8
  episodes_total: 399
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.03750000149011612
          cur_lr: 4.999999873689376e-05
          entropy: 0.569813072681427
          entropy_coeff: 0.0
          kl: 0.0026031602174043655
          model: {}
          policy_loss: -0.0005075505468994379
          total_loss: 531.6728515625
          vf_explained_var: -0.010295074433088303
          vf_loss: 531.6732788085938
    num_agent_steps_sampled: 44000
    num_agent_steps_trained: 44000
    num_steps_sampled: 44000
    num_steps_trained: 44000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,11,73.7405,44000,316.4,500,32,316.4


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 48000
  custom_metrics: {}
  date: 2022-01-03_13-40-35
  done: false
  episode_len_mean: 342.68
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 342.68
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 407
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.5800459384918213
          entropy_coeff: 0.0
          kl: 0.0032933764159679413
          model: {}
          policy_loss: -0.0005361199146136642
          total_loss: 502.0644226074219
          vf_explained_var: 0.035069212317466736
          vf_loss: 502.06488037109375
    num_agent_steps_sampled: 48000
    num_agent_steps_trained: 48000
    num_steps_sampled: 48000
    num_steps_trained: 48000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,12,80.2224,48000,342.68,500,39,342.68


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 52000
  custom_metrics: {}
  date: 2022-01-03_13-40-42
  done: false
  episode_len_mean: 366.17
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 366.17
  episode_reward_min: 39.0
  episodes_this_iter: 9
  episodes_total: 416
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.00937500037252903
          cur_lr: 4.999999873689376e-05
          entropy: 0.5499919056892395
          entropy_coeff: 0.0
          kl: 0.009402944706380367
          model: {}
          policy_loss: -0.005568421445786953
          total_loss: 437.74261474609375
          vf_explained_var: 0.031414251774549484
          vf_loss: 437.7480773925781
    num_agent_steps_sampled: 52000
    num_agent_steps_trained: 52000
    num_steps_sampled: 52000
    num_steps_trained: 52000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,13,86.8478,52000,366.17,500,39,366.17


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 56000
  custom_metrics: {}
  date: 2022-01-03_13-40-48
  done: false
  episode_len_mean: 394.07
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 394.07
  episode_reward_min: 39.0
  episodes_this_iter: 9
  episodes_total: 425
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.00937500037252903
          cur_lr: 4.999999873689376e-05
          entropy: 0.5567706823348999
          entropy_coeff: 0.0
          kl: 0.008467133156955242
          model: {}
          policy_loss: -0.004910553339868784
          total_loss: 415.7817077636719
          vf_explained_var: 0.004876916296780109
          vf_loss: 415.7865905761719
    num_agent_steps_sampled: 56000
    num_agent_steps_trained: 56000
    num_steps_sampled: 56000
    num_steps_trained: 56000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,14,93.6665,56000,394.07,500,39,394.07


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 60000
  custom_metrics: {}
  date: 2022-01-03_13-40-55
  done: false
  episode_len_mean: 420.11
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 420.11
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 433
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.00937500037252903
          cur_lr: 4.999999873689376e-05
          entropy: 0.5898487567901611
          entropy_coeff: 0.0
          kl: 0.0032153381034731865
          model: {}
          policy_loss: -0.001305061625316739
          total_loss: 503.4804382324219
          vf_explained_var: 0.03905607759952545
          vf_loss: 503.4817199707031
    num_agent_steps_sampled: 60000
    num_agent_steps_trained: 60000
    num_steps_sampled: 60000
    num_steps_trained: 60000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,15,100.277,60000,420.11,500,39,420.11


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 64000
  custom_metrics: {}
  date: 2022-01-03_13-41-02
  done: false
  episode_len_mean: 442.89
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 442.89
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 441
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.004687500186264515
          cur_lr: 4.999999873689376e-05
          entropy: 0.5899381041526794
          entropy_coeff: 0.0
          kl: 0.0036167073994874954
          model: {}
          policy_loss: 0.0004902082146145403
          total_loss: 545.7775268554688
          vf_explained_var: 0.08752446621656418
          vf_loss: 545.7770385742188
    num_agent_steps_sampled: 64000
    num_agent_steps_trained: 64000
    num_steps_sampled: 64000
    num_steps_trained: 64000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,16,107.026,64000,442.89,500,39,442.89


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 68000
  custom_metrics: {}
  date: 2022-01-03_13-41-08
  done: false
  episode_len_mean: 453.79
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 453.79
  episode_reward_min: 39.0
  episodes_this_iter: 10
  episodes_total: 451
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0023437500931322575
          cur_lr: 4.999999873689376e-05
          entropy: 0.5610551238059998
          entropy_coeff: 0.0
          kl: 0.00781817827373743
          model: {}
          policy_loss: -0.005211701150983572
          total_loss: 421.1143798828125
          vf_explained_var: 0.1872728019952774
          vf_loss: 421.11956787109375
    num_agent_steps_sampled: 68000
    num_agent_steps_trained: 68000
    num_steps_sampled: 68000
    num_steps_trained: 68000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,17,113.558,68000,453.79,500,39,453.79


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 72000
  custom_metrics: {}
  date: 2022-01-03_13-41-15
  done: false
  episode_len_mean: 464.29
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 464.29
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 459
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0023437500931322575
          cur_lr: 4.999999873689376e-05
          entropy: 0.5712805986404419
          entropy_coeff: 0.0
          kl: 0.004599640611559153
          model: {}
          policy_loss: -0.0021797791123390198
          total_loss: 355.96728515625
          vf_explained_var: 0.14474022388458252
          vf_loss: 355.96942138671875
    num_agent_steps_sampled: 72000
    num_agent_steps_trained: 72000
    num_steps_sampled: 72000
    num_steps_trained: 72000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,18,120.254,72000,464.29,500,39,464.29


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 76000
  custom_metrics: {}
  date: 2022-01-03_13-41-22
  done: false
  episode_len_mean: 469.34
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 469.34
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 467
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0011718750465661287
          cur_lr: 4.999999873689376e-05
          entropy: 0.5803890824317932
          entropy_coeff: 0.0
          kl: 0.005978395696729422
          model: {}
          policy_loss: -0.001656494103372097
          total_loss: 479.11749267578125
          vf_explained_var: -0.03898642212152481
          vf_loss: 479.119140625
    num_agent_steps_sampled: 76000
    num_agent_steps_trained: 76000
    num_steps_sampled: 76000
    num_steps_trained: 76000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,19,127.185,76000,469.34,500,39,469.34


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 80000
  custom_metrics: {}
  date: 2022-01-03_13-41-29
  done: false
  episode_len_mean: 471.59
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 471.59
  episode_reward_min: 39.0
  episodes_this_iter: 8
  episodes_total: 475
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0011718750465661287
          cur_lr: 4.999999873689376e-05
          entropy: 0.5981844067573547
          entropy_coeff: 0.0
          kl: 0.005319596733897924
          model: {}
          policy_loss: -0.0003460199513938278
          total_loss: 539.0498046875
          vf_explained_var: -0.055875200778245926
          vf_loss: 539.0501098632812
    num_agent_steps_sampled: 80000
    num_agent_steps_trained: 80000
    num_steps_sampled: 80000
    num_steps_trained: 80000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,20,133.772,80000,471.59,500,39,471.59


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 84000
  custom_metrics: {}
  date: 2022-01-03_13-41-35
  done: false
  episode_len_mean: 470.12
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 470.12
  episode_reward_min: 103.0
  episodes_this_iter: 10
  episodes_total: 485
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0011718750465661287
          cur_lr: 4.999999873689376e-05
          entropy: 0.5744273066520691
          entropy_coeff: 0.0
          kl: 0.003393118502572179
          model: {}
          policy_loss: 0.0013910011621192098
          total_loss: 588.284423828125
          vf_explained_var: -0.04091992974281311
          vf_loss: 588.2830200195312
    num_agent_steps_sampled: 84000
    num_agent_steps_trained: 84000
    num_steps_sampled: 84000
    num_steps_trained: 84000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,21,140.283,84000,470.12,500,103,470.12


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 88000
  custom_metrics: {}
  date: 2022-01-03_13-41-42
  done: false
  episode_len_mean: 467.65
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 467.65
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 493
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5612112283706665
          entropy_coeff: 0.0
          kl: 0.007541084196418524
          model: {}
          policy_loss: -0.005620027892291546
          total_loss: 317.69781494140625
          vf_explained_var: 0.22571057081222534
          vf_loss: 317.70343017578125
    num_agent_steps_sampled: 88000
    num_agent_steps_trained: 88000
    num_steps_sampled: 88000
    num_steps_trained: 88000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,22,146.911,88000,467.65,500,103,467.65


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 92000
  custom_metrics: {}
  date: 2022-01-03_13-41-49
  done: false
  episode_len_mean: 467.65
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 467.65
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 501
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5558075904846191
          entropy_coeff: 0.0
          kl: 0.0072825136594474316
          model: {}
          policy_loss: -0.001537785748951137
          total_loss: 480.6811828613281
          vf_explained_var: -0.06036938726902008
          vf_loss: 480.6827392578125
    num_agent_steps_sampled: 92000
    num_agent_steps_trained: 92000
    num_steps_sampled: 92000
    num_steps_trained: 92000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,23,153.674,92000,467.65,500,103,467.65


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 96000
  custom_metrics: {}
  date: 2022-01-03_13-41-55
  done: false
  episode_len_mean: 463.99
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 463.99
  episode_reward_min: 103.0
  episodes_this_iter: 10
  episodes_total: 511
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5813219547271729
          entropy_coeff: 0.0
          kl: 0.012521774508059025
          model: {}
          policy_loss: -0.00892722699791193
          total_loss: 342.2606201171875
          vf_explained_var: 0.2783268392086029
          vf_loss: 342.2695617675781
    num_agent_steps_sampled: 96000
    num_agent_steps_trained: 96000
    num_steps_sampled: 96000
    num_steps_trained: 96000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,24,160.356,96000,463.99,500,103,463.99


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 100000
  custom_metrics: {}
  date: 2022-01-03_13-42-02
  done: false
  episode_len_mean: 471.79
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 471.79
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 519
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5722456574440002
          entropy_coeff: 0.0
          kl: 0.0063661839812994
          model: {}
          policy_loss: -0.002771718893200159
          total_loss: 411.03179931640625
          vf_explained_var: 0.08734791725873947
          vf_loss: 411.0345764160156
    num_agent_steps_sampled: 100000
    num_agent_steps_trained: 100000
    num_steps_sampled: 100000
    num_steps_trained: 100000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,25,166.9,100000,471.79,500,103,471.79


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 104000
  custom_metrics: {}
  date: 2022-01-03_13-42-09
  done: false
  episode_len_mean: 471.86
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 471.86
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 527
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.551196277141571
          entropy_coeff: 0.0
          kl: 0.005431749392300844
          model: {}
          policy_loss: -0.001215668162330985
          total_loss: 454.0567626953125
          vf_explained_var: 0.03504559397697449
          vf_loss: 454.0579528808594
    num_agent_steps_sampled: 104000
    num_agent_steps_trained: 104000
    num_steps_sampled: 104000
    num_steps_trained: 104000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,26,173.644,104000,471.86,500,103,471.86


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 108000
  custom_metrics: {}
  date: 2022-01-03_13-42-16
  done: false
  episode_len_mean: 471.86
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 471.86
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 535
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5692477226257324
          entropy_coeff: 0.0
          kl: 0.005617249757051468
          model: {}
          policy_loss: -0.0013889643596485257
          total_loss: 428.63238525390625
          vf_explained_var: 0.10854756832122803
          vf_loss: 428.6337585449219
    num_agent_steps_sampled: 108000
    num_agent_steps_trained: 108000
    num_steps_sampled: 108000
    num_steps_trained: 108000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,27,180.352,108000,471.86,500,103,471.86


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 112000
  custom_metrics: {}
  date: 2022-01-03_13-42-22
  done: false
  episode_len_mean: 473.37
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 473.37
  episode_reward_min: 103.0
  episodes_this_iter: 8
  episodes_total: 543
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5218449831008911
          entropy_coeff: 0.0
          kl: 0.00573698990046978
          model: {}
          policy_loss: -0.0002649179077707231
          total_loss: 369.30255126953125
          vf_explained_var: 0.21435509622097015
          vf_loss: 369.3028259277344
    num_agent_steps_sampled: 112000
    num_agent_steps_trained: 112000
    num_steps_sampled: 112000
    num_steps_trained: 112000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,28,187.009,112000,473.37,500,103,473.37


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 116000
  custom_metrics: {}
  date: 2022-01-03_13-42-29
  done: false
  episode_len_mean: 484.09
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 484.09
  episode_reward_min: 105.0
  episodes_this_iter: 8
  episodes_total: 551
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5402454137802124
          entropy_coeff: 0.0
          kl: 0.007040547206997871
          model: {}
          policy_loss: -0.0027080182917416096
          total_loss: 311.1156311035156
          vf_explained_var: 0.2556372582912445
          vf_loss: 311.1183166503906
    num_agent_steps_sampled: 116000
    num_agent_steps_trained: 116000
    num_steps_sampled: 116000
    num_steps_trained: 116000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,29,193.661,116000,484.09,500,105,484.09


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 120000
  custom_metrics: {}
  date: 2022-01-03_13-42-36
  done: false
  episode_len_mean: 486.86
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 486.86
  episode_reward_min: 105.0
  episodes_this_iter: 8
  episodes_total: 559
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.5258514881134033
          entropy_coeff: 0.0
          kl: 0.004464397672563791
          model: {}
          policy_loss: 0.0003494231204967946
          total_loss: 510.3366394042969
          vf_explained_var: -0.06618722528219223
          vf_loss: 510.3363037109375
    num_agent_steps_sampled: 120000
    num_agent_steps_trained: 120000
    num_steps_sampled: 120000
    num_steps_trained: 120000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,30,200.442,120000,486.86,500,105,486.86


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 124000
  custom_metrics: {}
  date: 2022-01-03_13-42-43
  done: false
  episode_len_mean: 486.86
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 486.86
  episode_reward_min: 105.0
  episodes_this_iter: 8
  episodes_total: 567
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0002929687616415322
          cur_lr: 4.999999873689376e-05
          entropy: 0.5264471173286438
          entropy_coeff: 0.0
          kl: 0.003849866334348917
          model: {}
          policy_loss: 0.0006517216679640114
          total_loss: 545.7387084960938
          vf_explained_var: -0.01630960777401924
          vf_loss: 545.738037109375
    num_agent_steps_sampled: 124000
    num_agent_steps_trained: 124000
    num_steps_sampled: 124000
    num_steps_trained: 124000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,31,207.156,124000,486.86,500,105,486.86


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 128000
  custom_metrics: {}
  date: 2022-01-03_13-42-49
  done: false
  episode_len_mean: 486.86
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 486.86
  episode_reward_min: 105.0
  episodes_this_iter: 8
  episodes_total: 575
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0001464843808207661
          cur_lr: 4.999999873689376e-05
          entropy: 0.5416557192802429
          entropy_coeff: 0.0
          kl: 0.0035247569903731346
          model: {}
          policy_loss: -0.00046945587382651865
          total_loss: 545.5198974609375
          vf_explained_var: -0.0627073347568512
          vf_loss: 545.5203857421875
    num_agent_steps_sampled: 128000
    num_agent_steps_trained: 128000
    num_steps_sampled: 128000
    num_steps_trained: 128000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,32,213.948,128000,486.86,500,105,486.86


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 132000
  custom_metrics: {}
  date: 2022-01-03_13-42-56
  done: false
  episode_len_mean: 489.33
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 489.33
  episode_reward_min: 105.0
  episodes_this_iter: 8
  episodes_total: 583
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 7.324219041038305e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.50007563829422
          entropy_coeff: 0.0
          kl: 0.005137373693287373
          model: {}
          policy_loss: 4.097018245374784e-05
          total_loss: 515.6433715820312
          vf_explained_var: -0.06309071183204651
          vf_loss: 515.643310546875
    num_agent_steps_sampled: 132000
    num_agent_steps_trained: 132000
    num_steps_sampled: 132000
    num_steps_trained: 132000
    num_ste

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,33,220.656,132000,489.33,500,105,489.33


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 136000
  custom_metrics: {}
  date: 2022-01-03_13-43-03
  done: false
  episode_len_mean: 494.6
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 494.6
  episode_reward_min: 290.0
  episodes_this_iter: 8
  episodes_total: 591
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 7.324219041038305e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.49026063084602356
          entropy_coeff: 0.0
          kl: 0.0048093427903950214
          model: {}
          policy_loss: -0.001734600169584155
          total_loss: 528.8831176757812
          vf_explained_var: -0.15633000433444977
          vf_loss: 528.8848876953125
    num_agent_steps_sampled: 136000
    num_agent_steps_trained: 136000
    num_steps_sampled: 136000
    num_steps_trained: 136000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,34,227.128,136000,494.6,500,290,494.6


Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 140000
  custom_metrics: {}
  date: 2022-01-03_13-43-09
  done: false
  episode_len_mean: 495.75
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 495.75
  episode_reward_min: 290.0
  episodes_this_iter: 8
  episodes_total: 599
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.662109520519152e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.48286598920822144
          entropy_coeff: 0.0
          kl: 0.0036757809575647116
          model: {}
          policy_loss: 3.896733687724918e-05
          total_loss: 520.688720703125
          vf_explained_var: -0.17857114970684052
          vf_loss: 520.688720703125
    num_agent_steps_sampled: 140000
    num_agent_steps_trained: 140000
    num_steps_sampled: 140000
    num_steps_trained: 140000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,35,233.839,140000,495.75,500,290,495.75




Result for PPO_CartPole-v1_24a1e_00000:
  agent_timesteps_total: 144000
  custom_metrics: {}
  date: 2022-01-03_13-43-16
  done: false
  episode_len_mean: 497.59
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.59
  episode_reward_min: 290.0
  episodes_this_iter: 8
  episodes_total: 607
  experiment_id: a4b4f475ff524abc86d9f584e8fcba05
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.831054760259576e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.5355268120765686
          entropy_coeff: 0.0
          kl: 0.003950898069888353
          model: {}
          policy_loss: -0.0031078020110726357
          total_loss: 454.7011413574219
          vf_explained_var: 0.01892552711069584
          vf_loss: 454.7042236328125
    num_agent_steps_sampled: 144000
    num_agent_steps_trained: 144000
    num_steps_sampled: 144000
    num_steps_trained: 144000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_24a1e_00000,RUNNING,192.168.0.90:11129,36,240.56,144000,497.59,500,290,497.59


[2m[36m(PPO pid=11129)[0m 2022-01-03 13:43:16,714	ERROR worker.py:431 -- SystemExit was raised from the worker
[2m[36m(PPO pid=11129)[0m Traceback (most recent call last):
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 759, in ray._raylet.task_execution_handler
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 580, in ray._raylet.execute_task
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 618, in ray._raylet.execute_task
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 625, in ray._raylet.execute_task
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 629, in ray._raylet.execute_task
[2m[36m(PPO pid=11129)[0m   File "python/ray/_raylet.pyx", line 578, in ray._raylet.execute_task.function_executor
[2m[36m(PPO pid=11129)[0m   File "/home/dibya/miniconda3/envs/deep_rl_get_started_fast_python3.9/lib/python3.9/site-packages/ray/_private/function_manager.py", line 609, in actor_method_execu

<ray.tune.analysis.experiment_analysis.ExperimentAnalysis at 0x7fb371024640>

### Configuration

Configuration comes from three layers, applied in sequence so that each layer overrides the ones before it:

1. [Common config](https://docs.ray.io/en/master/rllib-training.html#common-parameters)
2. [Algorithm-specific config (overrides common config)](https://docs.ray.io/en/master/rllib-algorithms.html#ppo)
3. User-defined config (the `config` dict passed to `tune.run`; overrides both of the above)
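
The layering can be sketched with plain dictionaries. This is only an illustration of the override order, not RLlib's actual merge code: the three dictionaries and the `resolve_config` helper are hypothetical, and the keys and values in the first two layers are placeholders rather than RLlib's real defaults.

```python
# Hypothetical config layers. Keys and values are illustrative placeholders,
# not a copy of RLlib's real default dictionaries.
common_config = {"framework": "tf", "num_gpus": 0, "lr": 1e-4}
ppo_config = {"lr": 5e-5, "kl_coeff": 0.2}  # algorithm-specific layer
user_config = {"env": "CartPole-v1", "evaluation_interval": 2}


def resolve_config(*layers):
    """Merge config layers left to right; later layers override earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)  # shallow merge, enough for flat keys
    return resolved


final_config = resolve_config(common_config, ppo_config, user_config)
print(final_config["lr"])                   # value comes from ppo_config
print(final_config["evaluation_interval"])  # value comes from user_config
```

The sketch uses a shallow merge for brevity. Nested settings (for example a `model` sub-dictionary) would need a recursive merge so that overriding one nested key does not discard its siblings.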

### Anatomy of an experiment

<img src="images/ex/2.png" width="750"></img>

In [None]:
tune.run("PPO",
         config={"env": "CartPole-v1",
                 "evaluation_interval": 2,       # run an evaluation round every 2 training iterations
                 "evaluation_num_episodes": 20,  # episodes per evaluation round
                 "num_gpus": 0                   # do not allocate any GPU to the trainer process
                 }
         )

[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO trainer.py:722 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also want to then set `eager_tracing=True` in order to reach similar execution speed as with static-graph mode.
[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO trainer.py:743 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


Trial name,status,loc
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117




Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 4000
  custom_metrics: {}
  date: 2022-01-03_13-43-29
  done: false
  episode_len_mean: 22.429378531073446
  episode_media: {}
  episode_reward_max: 106.0
  episode_reward_mean: 22.429378531073446
  episode_reward_min: 8.0
  episodes_this_iter: 177
  episodes_total: 177
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.6658107042312622
          entropy_coeff: 0.0
          kl: 0.028826190158724785
          model: {}
          policy_loss: -0.04175363481044769
          total_loss: 234.28335571289062
          vf_explained_var: 0.026140272617340088
          vf_loss: 234.3193359375
    num_agent_steps_sampled: 4000
    num_agent_steps_trained: 4000
    num_steps_sampled: 4000
    num_steps_trained: 4000

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,1,6.7837,4000,22.4294,106,8,22.4294


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-01-03_13-43-38
  done: false
  episode_len_mean: 43.34
  episode_media: {}
  episode_reward_max: 179.0
  episode_reward_mean: 43.34
  episode_reward_min: 9.0
  episodes_this_iter: 83
  episodes_total: 260
  evaluation:
    custom_metrics: {}
    episode_len_mean: 74.05
    episode_media: {}
    episode_reward_max: 234.0
    episode_reward_mean: 74.05
    episode_reward_min: 11.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 32
      - 118
      - 157
      - 50
      - 38
      - 46
      - 70
      - 18
      - 62
      - 11
      - 81
      - 18
      - 234
      - 24
      - 97
      - 171
      - 21
      - 34
      - 106
      - 93
      episode_reward:
      - 32.0
      - 118.0
      - 157.0
      - 50.0
      - 38.0
      - 46.0
      - 70.0
      - 18.0
      - 62.0
      - 11.0
      - 81.0
      - 18.0
      - 234.0
      - 24.0
      - 97.0
   

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,2,16.07,8000,43.34,179,9,43.34


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 12000
  custom_metrics: {}
  date: 2022-01-03_13-43-45
  done: false
  episode_len_mean: 66.58
  episode_media: {}
  episode_reward_max: 206.0
  episode_reward_mean: 66.58
  episode_reward_min: 12.0
  episodes_this_iter: 46
  episodes_total: 306
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5684974789619446
          entropy_coeff: 0.0
          kl: 0.010665263049304485
          model: {}
          policy_loss: -0.019676998257637024
          total_loss: 515.532958984375
          vf_explained_var: 0.09138352423906326
          vf_loss: 515.5494384765625
    num_agent_steps_sampled: 12000
    num_agent_steps_trained: 12000
    num_steps_sampled: 12000
    num_steps_trained: 12000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,3,22.6103,12000,66.58,206,12,66.58


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 16000
  custom_metrics: {}
  date: 2022-01-03_13-44-02
  done: false
  episode_len_mean: 96.8
  episode_media: {}
  episode_reward_max: 360.0
  episode_reward_mean: 96.8
  episode_reward_min: 12.0
  episodes_this_iter: 22
  episodes_total: 328
  evaluation:
    custom_metrics: {}
    episode_len_mean: 293.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 293.0
    episode_reward_min: 91.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 91
      - 121
      - 221
      - 203
      - 271
      - 285
      - 325
      - 343
      - 275
      - 403
      - 385
      - 500
      - 268
      - 471
      - 299
      - 301
      - 307
      - 285
      - 238
      - 268
      episode_reward:
      - 91.0
      - 121.0
      - 221.0
      - 203.0
      - 271.0
      - 285.0
      - 325.0
      - 343.0
      - 275.0
      - 403.0
      - 385.0
      - 500.0
      - 268.0
      

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,4,39.7013,16000,96.8,360,12,96.8


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 20000
  custom_metrics: {}
  date: 2022-01-03_13-44-09
  done: false
  episode_len_mean: 131.82
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 131.82
  episode_reward_min: 14.0
  episodes_this_iter: 12
  episodes_total: 340
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5397278070449829
          entropy_coeff: 0.0
          kl: 0.006242944393306971
          model: {}
          policy_loss: -0.01558975875377655
          total_loss: 694.2406616210938
          vf_explained_var: 0.2767491638660431
          vf_loss: 694.2542724609375
    num_agent_steps_sampled: 20000
    num_agent_steps_trained: 20000
    num_steps_sampled: 20000
    num_steps_trained: 20000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,5,46.359,20000,131.82,500,14,131.82


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 24000
  custom_metrics: {}
  date: 2022-01-03_13-44-28
  done: false
  episode_len_mean: 163.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 163.4
  episode_reward_min: 14.0
  episodes_this_iter: 10
  episodes_total: 350
  evaluation:
    custom_metrics: {}
    episode_len_mean: 365.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 365.55
    episode_reward_min: 133.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 406
      - 500
      - 281
      - 404
      - 400
      - 279
      - 271
      - 458
      - 441
      - 304
      - 440
      - 397
      - 296
      - 500
      - 133
      - 262
      - 414
      - 473
      - 270
      - 382
      episode_reward:
      - 406.0
      - 500.0
      - 281.0
      - 404.0
      - 400.0
      - 279.0
      - 271.0
      - 458.0
      - 441.0
      - 304.0
      - 440.0
      - 397.0
      - 296.0

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,6,65.3898,24000,163.4,500,14,163.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 28000
  custom_metrics: {}
  date: 2022-01-03_13-44-34
  done: false
  episode_len_mean: 193.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 193.4
  episode_reward_min: 14.0
  episodes_this_iter: 10
  episodes_total: 360
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.15000000596046448
          cur_lr: 4.999999873689376e-05
          entropy: 0.5351145267486572
          entropy_coeff: 0.0
          kl: 0.00829379539936781
          model: {}
          policy_loss: -0.011822070926427841
          total_loss: 362.6712341308594
          vf_explained_var: 0.2681409418582916
          vf_loss: 362.6817932128906
    num_agent_steps_sampled: 28000
    num_agent_steps_trained: 28000
    num_steps_sampled: 28000
    num_steps_trained: 28000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,7,71.9424,28000,193.4,500,14,193.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 32000
  custom_metrics: {}
  date: 2022-01-03_13-44-58
  done: false
  episode_len_mean: 225.28
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 225.28
  episode_reward_min: 14.0
  episodes_this_iter: 9
  episodes_total: 369
  evaluation:
    custom_metrics: {}
    episode_len_mean: 468.1
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 468.1
    episode_reward_min: 316.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 431
      - 443
      - 316
      - 343
      - 474
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 499
      - 500
      - 500
      - 356
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 431.0
      - 443.0
      - 316.0
      - 343.0
      - 474.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,8,95.2527,32000,225.28,500,14,225.28


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 36000
  custom_metrics: {}
  date: 2022-01-03_13-45-04
  done: false
  episode_len_mean: 260.1
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 260.1
  episode_reward_min: 19.0
  episodes_this_iter: 9
  episodes_total: 378
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.48875856399536133
          entropy_coeff: 0.0
          kl: 0.005784942302852869
          model: {}
          policy_loss: -0.004727379884570837
          total_loss: 388.62091064453125
          vf_explained_var: 0.15254899859428406
          vf_loss: 388.6252136230469
    num_agent_steps_sampled: 36000
    num_agent_steps_trained: 36000
    num_steps_sampled: 36000
    num_steps_trained: 36000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,9,101.902,36000,260.1,500,19,260.1


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 40000
  custom_metrics: {}
  date: 2022-01-03_13-45-27
  done: false
  episode_len_mean: 294.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 294.48
  episode_reward_min: 19.0
  episodes_this_iter: 11
  episodes_total: 389
  evaluation:
    custom_metrics: {}
    episode_len_mean: 459.65
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 459.65
    episode_reward_min: 381.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 421
      - 500
      - 500
      - 500
      - 471
      - 500
      - 500
      - 456
      - 469
      - 458
      - 406
      - 396
      - 394
      - 449
      - 500
      - 381
      - 392
      - 500
      - 500
      - 500
      episode_reward:
      - 421.0
      - 500.0
      - 500.0
      - 500.0
      - 471.0
      - 500.0
      - 500.0
      - 456.0
      - 469.0
      - 458.0
      - 406.0
      - 396.0
      - 394

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,10,124.458,40000,294.48,500,19,294.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 44000
  custom_metrics: {}
  date: 2022-01-03_13-45-34
  done: false
  episode_len_mean: 324.49
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 324.49
  episode_reward_min: 27.0
  episodes_this_iter: 8
  episodes_total: 397
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.5341718196868896
          entropy_coeff: 0.0
          kl: 0.008021753281354904
          model: {}
          policy_loss: -0.004115492105484009
          total_loss: 140.74203491210938
          vf_explained_var: 0.48715221881866455
          vf_loss: 140.74554443359375
    num_agent_steps_sampled: 44000
    num_agent_steps_trained: 44000
    num_steps_sampled: 44000
    num_steps_trained: 44000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,11,131.243,44000,324.49,500,27,324.49


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 48000
  custom_metrics: {}
  date: 2022-01-03_13-45-57
  done: false
  episode_len_mean: 357.11
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 357.11
  episode_reward_min: 13.0
  episodes_this_iter: 9
  episodes_total: 406
  evaluation:
    custom_metrics: {}
    episode_len_mean: 492.4
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 492.4
    episode_reward_min: 380.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 380
      - 500
      - 500
      - 468
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,12,154.502,48000,357.11,500,13,357.11


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 52000
  custom_metrics: {}
  date: 2022-01-03_13-46-04
  done: false
  episode_len_mean: 381.1
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 381.1
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 414
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.03750000149011612
          cur_lr: 4.999999873689376e-05
          entropy: 0.5165870785713196
          entropy_coeff: 0.0
          kl: 0.005959213245660067
          model: {}
          policy_loss: -0.0020698525477200747
          total_loss: 180.06182861328125
          vf_explained_var: 0.5193233489990234
          vf_loss: 180.0636749267578
    num_agent_steps_sampled: 52000
    num_agent_steps_trained: 52000
    num_steps_sampled: 52000
    num_steps_trained: 52000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,13,161.283,52000,381.1,500,13,381.1


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 56000
  custom_metrics: {}
  date: 2022-01-03_13-46-28
  done: false
  episode_len_mean: 406.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 406.44
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 422
  evaluation:
    custom_metrics: {}
    episode_len_mean: 495.85
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 495.85
    episode_reward_min: 417.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 417
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 417.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,14,185.714,56000,406.44,500,13,406.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 60000
  custom_metrics: {}
  date: 2022-01-03_13-46-35
  done: false
  episode_len_mean: 432.59
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 432.59
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 430
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.03750000149011612
          cur_lr: 4.999999873689376e-05
          entropy: 0.5273802876472473
          entropy_coeff: 0.0
          kl: 0.00733661325648427
          model: {}
          policy_loss: 0.0002818219072651118
          total_loss: 304.7541809082031
          vf_explained_var: 0.2888195216655731
          vf_loss: 304.7536315917969
    num_agent_steps_sampled: 60000
    num_agent_steps_trained: 60000
    num_steps_sampled: 60000
    num_steps_trained: 60000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,15,192.548,60000,432.59,500,13,432.59


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 64000
  custom_metrics: {}
  date: 2022-01-03_13-46-56
  done: false
  episode_len_mean: 444.6
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 444.6
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 438
  evaluation:
    custom_metrics: {}
    episode_len_mean: 414.8
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 414.8
    episode_reward_min: 144.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 216
      - 417
      - 500
      - 500
      - 278
      - 500
      - 500
      - 386
      - 388
      - 379
      - 144
      - 367
      - 500
      - 500
      - 500
      - 500
      - 221
      - 500
      - 500
      - 500
      episode_reward:
      - 216.0
      - 417.0
      - 500.0
      - 500.0
      - 278.0
      - 500.0
      - 500.0
      - 386.0
      - 388.0
      - 379.0
      - 144.0
      - 367.0
      - 500.0
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,16,213.351,64000,444.6,500,13,444.6


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 68000
  custom_metrics: {}
  date: 2022-01-03_13-47-03
  done: false
  episode_len_mean: 453.71
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 453.71
  episode_reward_min: 13.0
  episodes_this_iter: 9
  episodes_total: 447
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.4969511032104492
          entropy_coeff: 0.0
          kl: 0.005400669761002064
          model: {}
          policy_loss: -0.011936291120946407
          total_loss: 270.889404296875
          vf_explained_var: 0.40124747157096863
          vf_loss: 270.9012451171875
    num_agent_steps_sampled: 68000
    num_agent_steps_trained: 68000
    num_steps_sampled: 68000
    num_steps_trained: 68000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,17,219.746,68000,453.71,500,13,453.71


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 72000
  custom_metrics: {}
  date: 2022-01-03_13-47-26
  done: false
  episode_len_mean: 462.61
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 462.61
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 455
  evaluation:
    custom_metrics: {}
    episode_len_mean: 493.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 493.9
    episode_reward_min: 378.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 378
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 378.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,18,243.262,72000,462.61,500,13,462.61


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 76000
  custom_metrics: {}
  date: 2022-01-03_13-47-32
  done: false
  episode_len_mean: 465.42
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 465.42
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 463
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.46219807863235474
          entropy_coeff: 0.0
          kl: 0.004400151781737804
          model: {}
          policy_loss: -0.002735947957262397
          total_loss: 140.81980895996094
          vf_explained_var: 0.56380695104599
          vf_loss: 140.82244873046875
    num_agent_steps_sampled: 76000
    num_agent_steps_trained: 76000
    num_steps_sampled: 76000
    num_steps_trained: 76000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,19,249.511,76000,465.42,500,13,465.42


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 80000
  custom_metrics: {}
  date: 2022-01-03_13-47-57
  done: false
  episode_len_mean: 468.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 468.27
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 471
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,20,273.924,80000,468.27,500,13,468.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 84000
  custom_metrics: {}
  date: 2022-01-03_13-48-03
  done: false
  episode_len_mean: 473.35
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 473.35
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 479
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.004687500186264515
          cur_lr: 4.999999873689376e-05
          entropy: 0.4432362914085388
          entropy_coeff: 0.0
          kl: 0.00505047058686614
          model: {}
          policy_loss: 0.0021547339856624603
          total_loss: 284.6266784667969
          vf_explained_var: 0.5598533153533936
          vf_loss: 284.6245422363281
    num_agent_steps_sampled: 84000
    num_agent_steps_trained: 84000
    num_steps_sampled: 84000
    num_steps_trained: 84000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,21,280.301,84000,473.35,500,13,473.35


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 88000
  custom_metrics: {}
  date: 2022-01-03_13-48-27
  done: false
  episode_len_mean: 481.06
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 481.06
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 487
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,22,303.856,88000,481.06,500,13,481.06


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 92000
  custom_metrics: {}
  date: 2022-01-03_13-48-33
  done: false
  episode_len_mean: 484.83
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 484.83
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 495
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0023437500931322575
          cur_lr: 4.999999873689376e-05
          entropy: 0.4058391749858856
          entropy_coeff: 0.0
          kl: 0.0034382410813122988
          model: {}
          policy_loss: -0.0004056684556417167
          total_loss: 167.3172149658203
          vf_explained_var: 0.7687280774116516
          vf_loss: 167.31761169433594
    num_agent_steps_sampled: 92000
    num_agent_steps_trained: 92000
    num_steps_sampled: 92000
    num_steps_trained: 92000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,23,310.252,92000,484.83,500,13,484.83


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 96000
  custom_metrics: {}
  date: 2022-01-03_13-48-58
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 503
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,24,334.689,96000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 100000
  custom_metrics: {}
  date: 2022-01-03_13-49-04
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 511
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.3711017966270447
          entropy_coeff: 0.0
          kl: 0.00191035820171237
          model: {}
          policy_loss: 0.0005888226442039013
          total_loss: 270.68115234375
          vf_explained_var: 0.4633130133152008
          vf_loss: 270.6805419921875
    num_agent_steps_sampled: 100000
    num_agent_steps_trained: 100000
    num_steps_sampled: 100000
    num_steps_trained: 100000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,25,341.086,100000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 104000
  custom_metrics: {}
  date: 2022-01-03_13-49-27
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 519
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,26,364.052,104000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 108000
  custom_metrics: {}
  date: 2022-01-03_13-49-34
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 527
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0001464843808207661
          cur_lr: 4.999999873689376e-05
          entropy: 0.36017468571662903
          entropy_coeff: 0.0
          kl: 0.0034243902191519737
          model: {}
          policy_loss: 0.0010374116245657206
          total_loss: 281.56884765625
          vf_explained_var: 0.37897738814353943
          vf_loss: 281.5677795410156
    num_agent_steps_sampled: 108000
    num_agent_steps_trained: 108000
    num_steps_sampled: 108000
    num_steps_trained: 108000
    num_ste

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,27,370.436,108000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 112000
  custom_metrics: {}
  date: 2022-01-03_13-49-57
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 535
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,28,394.092,112000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 116000
  custom_metrics: {}
  date: 2022-01-03_13-50-04
  done: false
  episode_len_mean: 492.87
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.87
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 543
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.662109520519152e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.3628605008125305
          entropy_coeff: 0.0
          kl: 0.0037267031148076057
          model: {}
          policy_loss: -0.0016175990458577871
          total_loss: 299.8031005859375
          vf_explained_var: 0.33579498529434204
          vf_loss: 299.80462646484375
    num_agent_steps_sampled: 116000
    num_agent_steps_trained: 116000
    num_steps_sampled: 116000
    num_steps_trained: 116000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,29,400.561,116000,492.87,500,136,492.87


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 120000
  custom_metrics: {}
  date: 2022-01-03_13-50-28
  done: false
  episode_len_mean: 494.29
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 494.29
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 551
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,30,424.419,120000,494.29,500,136,494.29


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 124000
  custom_metrics: {}
  date: 2022-01-03_13-50-34
  done: false
  episode_len_mean: 499.09
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.09
  episode_reward_min: 409.0
  episodes_this_iter: 8
  episodes_total: 559
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 9.15527380129788e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.3726036250591278
          entropy_coeff: 0.0
          kl: 0.0030164276249706745
          model: {}
          policy_loss: 0.0016188398003578186
          total_loss: 363.93804931640625
          vf_explained_var: 0.2855881154537201
          vf_loss: 363.93646240234375
    num_agent_steps_sampled: 124000
    num_agent_steps_trained: 124000
    num_steps_sampled: 124000
    num_steps_trained: 124000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,31,430.737,124000,499.09,500,409,499.09


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 128000
  custom_metrics: {}
  date: 2022-01-03_13-50-58
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 567
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.6
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.6
    episode_reward_min: 452.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 452
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,32,454.403,128000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 132000
  custom_metrics: {}
  date: 2022-01-03_13-51-04
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 575
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.28881845032447e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.3741539716720581
          entropy_coeff: 0.0
          kl: 0.0032545658759772778
          model: {}
          policy_loss: -0.0036664640065282583
          total_loss: 258.59686279296875
          vf_explained_var: 0.35744550824165344
          vf_loss: 258.6005554199219
    num_agent_steps_sampled: 132000
    num_agent_steps_trained: 132000
    num_steps_sampled: 132000
    num_steps_trained: 132000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,33,460.956,132000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 136000
  custom_metrics: {}
  date: 2022-01-03_13-51-28
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 583
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,34,484.884,136000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 140000
  custom_metrics: {}
  date: 2022-01-03_13-51-35
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 591
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.722046125811175e-07
          cur_lr: 4.999999873689376e-05
          entropy: 0.3170822858810425
          entropy_coeff: 0.0
          kl: 0.0040565794333815575
          model: {}
          policy_loss: -0.0009620698401704431
          total_loss: 325.4109802246094
          vf_explained_var: 0.35483792424201965
          vf_loss: 325.4119567871094
    num_agent_steps_sampled: 140000
    num_agent_steps_trained: 140000
    num_steps_sampled: 140000
    num_steps_trained: 140000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,35,491.372,140000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 144000
  custom_metrics: {}
  date: 2022-01-03_13-51-59
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 599
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,36,515.601,144000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 148000
  custom_metrics: {}
  date: 2022-01-03_13-52-06
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 607
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.4305115314527939e-07
          cur_lr: 4.999999873689376e-05
          entropy: 0.3192770779132843
          entropy_coeff: 0.0
          kl: 0.0038329511880874634
          model: {}
          policy_loss: -0.0026558642275631428
          total_loss: 372.1199951171875
          vf_explained_var: 0.2135278582572937
          vf_loss: 372.1226501464844
    num_agent_steps_sampled: 148000
    num_agent_steps_trained: 148000
    num_steps_sampled: 148000
    num_steps_trained: 148000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,37,522.101,148000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 152000
  custom_metrics: {}
  date: 2022-01-03_13-52-30
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 615
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,38,546.256,152000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 156000
  custom_metrics: {}
  date: 2022-01-03_13-52-37
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 623
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.5762788286319847e-08
          cur_lr: 4.999999873689376e-05
          entropy: 0.33596229553222656
          entropy_coeff: 0.0
          kl: 0.003063222859054804
          model: {}
          policy_loss: -0.0007319195428863168
          total_loss: 502.2861328125
          vf_explained_var: -0.01001955009996891
          vf_loss: 502.2868347167969
    num_agent_steps_sampled: 156000
    num_agent_steps_trained: 156000
    num_steps_sampled: 156000
    num_steps_trained: 156000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,39,552.779,156000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 160000
  custom_metrics: {}
  date: 2022-01-03_13-53-00
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 631
  evaluation:
    custom_metrics: {}
    episode_len_mean: 490.95
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 490.95
    episode_reward_min: 319.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 319
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 319.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,40,576.486,160000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 164000
  custom_metrics: {}
  date: 2022-01-03_13-53-07
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 639
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.940697071579962e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.35207512974739075
          entropy_coeff: 0.0
          kl: 0.0028695587534457445
          model: {}
          policy_loss: -0.005121050402522087
          total_loss: 273.1157531738281
          vf_explained_var: 0.08673982322216034
          vf_loss: 273.1208801269531
    num_agent_steps_sampled: 164000
    num_agent_steps_trained: 164000
    num_steps_sampled: 164000
    num_steps_trained: 164000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,41,582.98,164000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 168000
  custom_metrics: {}
  date: 2022-01-03_13-53-30
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 647
  evaluation:
    custom_metrics: {}
    episode_len_mean: 471.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 471.9
    episode_reward_min: 242.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 254
      - 500
      - 500
      - 500
      - 500
      - 242
      - 473
      - 500
      - 500
      - 469
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 254.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 242.0
      - 473.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,42,606.409,168000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 172000
  custom_metrics: {}
  date: 2022-01-03_13-53-37
  done: false
  episode_len_mean: 498.92
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.92
  episode_reward_min: 392.0
  episodes_this_iter: 9
  episodes_total: 656
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.2351742678949904e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.3735828697681427
          entropy_coeff: 0.0
          kl: 0.004065442830324173
          model: {}
          policy_loss: -0.0010430102702230215
          total_loss: 464.3570556640625
          vf_explained_var: 0.017288917675614357
          vf_loss: 464.3581237792969
    num_agent_steps_sampled: 172000
    num_agent_steps_trained: 172000
    num_steps_sampled: 172000
    num_steps_trained: 172000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,43,613.038,172000,498.92,500,392,498.92


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 176000
  custom_metrics: {}
  date: 2022-01-03_13-54-01
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 664
  evaluation:
    custom_metrics: {}
    episode_len_mean: 494.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 494.9
    episode_reward_min: 399.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 399
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 499
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 399.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,44,636.882,176000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 180000
  custom_metrics: {}
  date: 2022-01-03_13-54-07
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 672
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.587935669737476e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.3455185294151306
          entropy_coeff: 0.0
          kl: 0.0016453824937343597
          model: {}
          policy_loss: -0.001856086659245193
          total_loss: 263.1059265136719
          vf_explained_var: 0.15679922699928284
          vf_loss: 263.1077880859375
    num_agent_steps_sampled: 180000
    num_agent_steps_trained: 180000
    num_steps_sampled: 180000
    num_steps_trained: 180000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,45,643.397,180000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 184000
  custom_metrics: {}
  date: 2022-01-03_13-54-32
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 680
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,46,667.508,184000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 188000
  custom_metrics: {}
  date: 2022-01-03_13-54-38
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 688
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.396983917434369e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.34701216220855713
          entropy_coeff: 0.0
          kl: 0.0024114095140248537
          model: {}
          policy_loss: 0.00015137766604311764
          total_loss: 370.8667297363281
          vf_explained_var: -0.011806507594883442
          vf_loss: 370.8665771484375
    num_agent_steps_sampled: 188000
    num_agent_steps_trained: 188000
    num_steps_sampled: 188000
    num_steps_trained: 188000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,47,674.061,188000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 192000
  custom_metrics: {}
  date: 2022-01-03_13-55-02
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 696
  evaluation:
    custom_metrics: {}
    episode_len_mean: 496.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 496.55
    episode_reward_min: 431.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 431
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 431.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,48,698.238,192000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 196000
  custom_metrics: {}
  date: 2022-01-03_13-55-09
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 704
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.4924597935859225e-11
          cur_lr: 4.999999873689376e-05
          entropy: 0.28589412569999695
          entropy_coeff: 0.0
          kl: 0.0026714280247688293
          model: {}
          policy_loss: -0.001123060705140233
          total_loss: 340.09228515625
          vf_explained_var: 0.23291447758674622
          vf_loss: 340.0933532714844
    num_agent_steps_sampled: 196000
    num_agent_steps_trained: 196000
    num_steps_sampled: 196000
    num_steps_trained: 196000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,49,704.719,196000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 200000
  custom_metrics: {}
  date: 2022-01-03_13-55-33
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 712
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,50,728.358,200000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 204000
  custom_metrics: {}
  date: 2022-01-03_13-55-39
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 720
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.731149483964806e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.2651657164096832
          entropy_coeff: 0.0
          kl: 0.003235872834920883
          model: {}
          policy_loss: -0.00046385437599383295
          total_loss: 410.3289794921875
          vf_explained_var: 0.14666806161403656
          vf_loss: 410.32940673828125
    num_agent_steps_sampled: 204000
    num_agent_steps_trained: 204000
    num_steps_sampled: 204000
    num_steps_trained: 204000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,51,734.985,204000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 208000
  custom_metrics: {}
  date: 2022-01-03_13-56-04
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 728
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,52,759.192,208000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 212000
  custom_metrics: {}
  date: 2022-01-03_13-56-10
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 736
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.1827873709912016e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.28173762559890747
          entropy_coeff: 0.0
          kl: 0.0032018960919231176
          model: {}
          policy_loss: -0.0011629423825070262
          total_loss: 304.3319091796875
          vf_explained_var: 0.37079381942749023
          vf_loss: 304.33306884765625
    num_agent_steps_sampled: 212000
    num_agent_steps_trained: 212000
    num_steps_sampled: 212000
    num_steps_trained: 212000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,53,765.657,212000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 216000
  custom_metrics: {}
  date: 2022-01-03_13-56-34
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 744
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,54,789.91,216000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 220000
  custom_metrics: {}
  date: 2022-01-03_13-56-41
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 752
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.456968427478004e-13
          cur_lr: 4.999999873689376e-05
          entropy: 0.3077785074710846
          entropy_coeff: 0.0
          kl: 0.001683449256233871
          model: {}
          policy_loss: -0.00023939917446114123
          total_loss: 297.7902526855469
          vf_explained_var: 0.4101290702819824
          vf_loss: 297.79052734375
    num_agent_steps_sampled: 220000
    num_agent_steps_trained: 220000
    num_steps_sampled: 220000
    num_steps_trained: 220000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,55,796.346,220000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 224000
  custom_metrics: {}
  date: 2022-01-03_13-57-04
  done: false
  episode_len_mean: 497.99
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.99
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 760
  evaluation:
    custom_metrics: {}
    episode_len_mean: 476.3
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 476.3
    episode_reward_min: 352.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 352
      - 500
      - 500
      - 500
      - 359
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 353
      - 500
      - 500
      - 500
      - 500
      - 500
      - 462
      episode_reward:
      - 500.0
      - 352.0
      - 500.0
      - 500.0
      - 500.0
      - 359.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,56,819.885,224000,497.99,500,376,497.99


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 228000
  custom_metrics: {}
  date: 2022-01-03_13-57-11
  done: false
  episode_len_mean: 498.75
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.75
  episode_reward_min: 459.0
  episodes_this_iter: 8
  episodes_total: 768
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.728484213739002e-13
          cur_lr: 4.999999873689376e-05
          entropy: 0.27726444602012634
          entropy_coeff: 0.0
          kl: 0.002490389160811901
          model: {}
          policy_loss: -0.0012705748667940497
          total_loss: 212.70944213867188
          vf_explained_var: 0.4870787262916565
          vf_loss: 212.71072387695312
    num_agent_steps_sampled: 228000
    num_agent_steps_trained: 228000
    num_steps_sampled: 228000
    num_steps_trained: 228000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,57,826.372,228000,498.75,500,459,498.75


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 232000
  custom_metrics: {}
  date: 2022-01-03_13-57-35
  done: false
  episode_len_mean: 497.89
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.89
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 776
  evaluation:
    custom_metrics: {}
    episode_len_mean: 496.05
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 496.05
    episode_reward_min: 447.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 447
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 484
      - 490
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 447.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,58,849.877,232000,497.89,500,446,497.89


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 236000
  custom_metrics: {}
  date: 2022-01-03_13-57-41
  done: false
  episode_len_mean: 497.62
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.62
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 784
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 6.821210534347505e-14
          cur_lr: 4.999999873689376e-05
          entropy: 0.2768877446651459
          entropy_coeff: 0.0
          kl: 0.002392576541751623
          model: {}
          policy_loss: -0.0008955386583693326
          total_loss: 288.79681396484375
          vf_explained_var: 0.2627524137496948
          vf_loss: 288.7977294921875
    num_agent_steps_sampled: 236000
    num_agent_steps_trained: 236000
    num_steps_sampled: 236000
    num_steps_trained: 236000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,59,856.478,236000,497.62,500,446,497.62


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 240000
  custom_metrics: {}
  date: 2022-01-03_13-58-05
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 792
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,60,880.211,240000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 244000
  custom_metrics: {}
  date: 2022-01-03_13-58-11
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 800
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.4106052671737525e-14
          cur_lr: 4.999999873689376e-05
          entropy: 0.27059662342071533
          entropy_coeff: 0.0
          kl: 0.0037121737841516733
          model: {}
          policy_loss: -0.0013116763439029455
          total_loss: 359.6751708984375
          vf_explained_var: 0.24808254837989807
          vf_loss: 359.6764831542969
    num_agent_steps_sampled: 244000
    num_agent_steps_trained: 244000
    num_steps_sampled: 244000
    num_steps_trained: 244000
    n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,61,886.737,244000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 248000
  custom_metrics: {}
  date: 2022-01-03_13-58-36
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 808
  evaluation:
    custom_metrics: {}
    episode_len_mean: 498.6
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 498.6
    episode_reward_min: 472.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 472
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 472.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,62,911.055,248000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 252000
  custom_metrics: {}
  date: 2022-01-03_13-58-42
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 816
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.526513167934381e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.29696959257125854
          entropy_coeff: 0.0
          kl: 0.003980488050729036
          model: {}
          policy_loss: -0.002883958863094449
          total_loss: 318.4349060058594
          vf_explained_var: 0.21819797158241272
          vf_loss: 318.43780517578125
    num_agent_steps_sampled: 252000
    num_agent_steps_trained: 252000
    num_steps_sampled: 252000
    num_steps_trained: 252000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,63,917.518,252000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 256000
  custom_metrics: {}
  date: 2022-01-03_13-59-06
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 824
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,64,941.592,256000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 260000
  custom_metrics: {}
  date: 2022-01-03_13-59-13
  done: false
  episode_len_mean: 497.98
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.98
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 832
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.1316282919835953e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.3001687526702881
          entropy_coeff: 0.0
          kl: 0.005532457958906889
          model: {}
          policy_loss: -0.0025179916992783546
          total_loss: 415.4772644042969
          vf_explained_var: 0.09152212738990784
          vf_loss: 415.4797668457031
    num_agent_steps_sampled: 260000
    num_agent_steps_trained: 260000
    num_steps_sampled: 260000
    num_steps_trained: 260000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,65,948.114,260000,497.98,500,446,497.98


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 264000
  custom_metrics: {}
  date: 2022-01-03_13-59-37
  done: false
  episode_len_mean: 497.98
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.98
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 840
  evaluation:
    custom_metrics: {}
    episode_len_mean: 494.65
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 494.65
    episode_reward_min: 399.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 494
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 399
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 494.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,66,972.208,264000,497.98,500,446,497.98


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 268000
  custom_metrics: {}
  date: 2022-01-03_13-59-44
  done: false
  episode_len_mean: 497.75
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.75
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 848
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.0658141459917976e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.3097001016139984
          entropy_coeff: 0.0
          kl: 0.002771975938230753
          model: {}
          policy_loss: -0.003007206367328763
          total_loss: 239.74151611328125
          vf_explained_var: 0.2675352692604065
          vf_loss: 239.74452209472656
    num_agent_steps_sampled: 268000
    num_agent_steps_trained: 268000
    num_steps_sampled: 268000
    num_steps_trained: 268000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,67,979.171,268000,497.75,500,446,497.75


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 272000
  custom_metrics: {}
  date: 2022-01-03_14-00-10
  done: false
  episode_len_mean: 498.16
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.16
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 856
  evaluation:
    custom_metrics: {}
    episode_len_mean: 499.1
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 499.1
    episode_reward_min: 484.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 484
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 498
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 484.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,68,1005.16,272000,498.16,500,446,498.16


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 276000
  custom_metrics: {}
  date: 2022-01-03_14-00-17
  done: false
  episode_len_mean: 498.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.45
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 864
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.664535364979494e-16
          cur_lr: 4.999999873689376e-05
          entropy: 0.30517786741256714
          entropy_coeff: 0.0
          kl: 0.003519083373248577
          model: {}
          policy_loss: -0.003675314364954829
          total_loss: 180.7675018310547
          vf_explained_var: 0.4545365571975708
          vf_loss: 180.77117919921875
    num_agent_steps_sampled: 276000
    num_agent_steps_trained: 276000
    num_steps_sampled: 276000
    num_steps_trained: 276000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,69,1012,276000,498.45,500,446,498.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 280000
  custom_metrics: {}
  date: 2022-01-03_14-00-42
  done: false
  episode_len_mean: 499.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.45
  episode_reward_min: 473.0
  episodes_this_iter: 8
  episodes_total: 872
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,70,1036.44,280000,499.45,500,473,499.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 284000
  custom_metrics: {}
  date: 2022-01-03_14-00-48
  done: false
  episode_len_mean: 499.5
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.5
  episode_reward_min: 473.0
  episodes_this_iter: 8
  episodes_total: 880
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 6.661338412448735e-17
          cur_lr: 4.999999873689376e-05
          entropy: 0.33786264061927795
          entropy_coeff: 0.0
          kl: 0.0016626294236630201
          model: {}
          policy_loss: -0.0017250650562345982
          total_loss: 257.88360595703125
          vf_explained_var: 0.2730449438095093
          vf_loss: 257.88531494140625
    num_agent_steps_sampled: 284000
    num_agent_steps_trained: 284000
    num_steps_sampled: 284000
    num_steps_trained: 284000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,71,1043.02,284000,499.5,500,473,499.5


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 288000
  custom_metrics: {}
  date: 2022-01-03_14-01-13
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 888
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.55
    episode_reward_min: 451.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 451
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,72,1067.28,288000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 292000
  custom_metrics: {}
  date: 2022-01-03_14-01-20
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 896
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.6653346031121838e-17
          cur_lr: 4.999999873689376e-05
          entropy: 0.3299762010574341
          entropy_coeff: 0.0
          kl: 0.003860413795337081
          model: {}
          policy_loss: -0.0030521831940859556
          total_loss: 199.1850128173828
          vf_explained_var: 0.3740776479244232
          vf_loss: 199.1880645751953
    num_agent_steps_sampled: 292000
    num_agent_steps_trained: 292000
    num_steps_sampled: 292000
    num_steps_trained: 292000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,73,1074.24,292000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 296000
  custom_metrics: {}
  date: 2022-01-03_14-01-46
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 904
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,74,1100.56,296000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 300000
  custom_metrics: {}
  date: 2022-01-03_14-01-53
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 912
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.326673015560919e-18
          cur_lr: 4.999999873689376e-05
          entropy: 0.30977368354797363
          entropy_coeff: 0.0
          kl: 0.003388606710359454
          model: {}
          policy_loss: -0.001255176728591323
          total_loss: 306.4930725097656
          vf_explained_var: 0.17501209676265717
          vf_loss: 306.4943542480469
    num_agent_steps_sampled: 300000
    num_agent_steps_trained: 300000
    num_steps_sampled: 300000
    num_steps_trained: 300000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,75,1107.37,300000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 304000
  custom_metrics: {}
  date: 2022-01-03_14-02-17
  done: false
  episode_len_mean: 499.08
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.08
  episode_reward_min: 456.0
  episodes_this_iter: 9
  episodes_total: 921
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.8
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.8
    episode_reward_min: 479.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 497
      - 500
      - 500
      - 500
      - 480
      - 479
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 497.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,76,1131.99,304000,499.08,500,456,499.08


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 308000
  custom_metrics: {}
  date: 2022-01-03_14-02-24
  done: false
  episode_len_mean: 499.08
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.08
  episode_reward_min: 456.0
  episodes_this_iter: 8
  episodes_total: 929
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.0816682538902298e-18
          cur_lr: 4.999999873689376e-05
          entropy: 0.2569127678871155
          entropy_coeff: 0.0
          kl: 0.003834979608654976
          model: {}
          policy_loss: -0.0022537545301020145
          total_loss: 184.89852905273438
          vf_explained_var: 0.5317127108573914
          vf_loss: 184.90077209472656
    num_agent_steps_sampled: 308000
    num_agent_steps_trained: 308000
    num_steps_sampled: 308000
    num_steps_trained: 308000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,77,1138.27,308000,499.08,500,456,499.08


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 312000
  custom_metrics: {}
  date: 2022-01-03_14-02-48
  done: false
  episode_len_mean: 494.06
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 494.06
  episode_reward_min: 359.0
  episodes_this_iter: 9
  episodes_total: 938
  evaluation:
    custom_metrics: {}
    episode_len_mean: 484.45
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 484.45
    episode_reward_min: 407.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 477
      - 491
      - 500
      - 500
      - 468
      - 440
      - 500
      - 500
      - 421
      - 500
      - 500
      - 407
      - 485
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 477.0
      - 491.0
      - 500.0
      - 500.0
      - 468.0
      - 440.0
      - 500.0
      - 500.0
      - 42

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,78,1162.5,312000,494.06,500,359,494.06


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 316000
  custom_metrics: {}
  date: 2022-01-03_14-02-55
  done: false
  episode_len_mean: 492.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.45
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 946
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.204170634725574e-19
          cur_lr: 4.999999873689376e-05
          entropy: 0.2483358383178711
          entropy_coeff: 0.0
          kl: 0.003815557574853301
          model: {}
          policy_loss: -0.00015161468763835728
          total_loss: 176.90817260742188
          vf_explained_var: 0.5207358002662659
          vf_loss: 176.90834045410156
    num_agent_steps_sampled: 316000
    num_agent_steps_trained: 316000
    num_steps_sampled: 316000
    num_steps_trained: 316000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,79,1169.2,316000,492.45,500,359,492.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 320000
  custom_metrics: {}
  date: 2022-01-03_14-03-20
  done: false
  episode_len_mean: 492.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.45
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 954
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.55
    episode_reward_min: 451.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 451
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,80,1194.27,320000,492.45,500,359,492.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 324000
  custom_metrics: {}
  date: 2022-01-03_14-03-27
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 962
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.3010426586813936e-19
          cur_lr: 4.999999873689376e-05
          entropy: 0.26068851351737976
          entropy_coeff: 0.0
          kl: 0.004601421300321817
          model: {}
          policy_loss: -0.00252857175655663
          total_loss: 327.0229797363281
          vf_explained_var: 0.399458646774292
          vf_loss: 327.0255126953125
    num_agent_steps_sampled: 324000
    num_agent_steps_trained: 324000
    num_steps_sampled: 324000
    num_steps_trained: 324000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,81,1200.94,324000,492.44,500,359,492.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 328000
  custom_metrics: {}
  date: 2022-01-03_14-03-51
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 970
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.75
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.75
    episode_reward_min: 455.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 455
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 455.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,82,1225.19,328000,492.44,500,359,492.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 332000
  custom_metrics: {}
  date: 2022-01-03_14-03-58
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 978
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.252606646703484e-20
          cur_lr: 4.999999873689376e-05
          entropy: 0.2761077880859375
          entropy_coeff: 0.0
          kl: 0.002532893093302846
          model: {}
          policy_loss: -0.0015050852671265602
          total_loss: 263.3819885253906
          vf_explained_var: 0.48334449529647827
          vf_loss: 263.3835144042969
    num_agent_steps_sampled: 332000
    num_agent_steps_trained: 332000
    num_steps_sampled: 332000
    num_steps_trained: 332000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,83,1231.98,332000,492.44,500,359,492.44


