# Assignment 5

This is the starter code for Assignment 5. In this assignment, you will solve increasingly challenging tasks from the [Minigrid benchmark](https://minigrid.farama.org/). You can use any RL algorithm to solve these tasks.

* Problems 1 and 2 are mandatory for all students.
* Problem 3 is mandatory for COMP 552 but optional for COMP 442.
* Problem 4 is optional for all.

In [None]:
!pip install torch
!pip install gymnasium
!pip install minigrid
!pip install stable-baselines3
!pip install rl_zoo3
!pip install imitation

In [1]:
import gymnasium as gym
import minigrid
import numpy as np

np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

pygame 2.5.2 (SDL 2.28.3, Python 3.11.5)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [2]:
def compute_score(task, policy):
  num_episodes = 10
  cur_episode  = 0

  seed_by_episode = [42, 34, 50, 1, 9, 7, 43, 56, 90, 11]
  score_by_episode = np.zeros(num_episodes)

  while cur_episode < num_episodes:

    cumulative_reward = 0
    cur_seed = seed_by_episode[cur_episode]

    observation, info = task.reset(seed=cur_seed)
    done = False

    while not done:
      action = policy(observation)
      observation, reward, terminated, truncated, info = task.step(action)
      cumulative_reward += reward

      if terminated or truncated:
        done = True
        score_by_episode[cur_episode] = cumulative_reward
        cur_episode += 1

  score_mean = round(score_by_episode.mean(), 3)
  score_std  = round(score_by_episode.std(), 3)
  score_best = round(score_by_episode.max(), 3)

  print(f"Best score: {score_best}")
  print(f"Average score: {score_mean, score_std}")

  return score_by_episode

## Problem 1
Solve the [Minigrid Unlock](https://minigrid.farama.org/environments/minigrid/UnlockEnv/) task.

This problem is worth 5 points.

![](https://minigrid.farama.org/_images/UnlockEnv.gif)

In [3]:
! python -m rl_zoo3.train --algo ppo --env MiniGrid-Unlock-v0

Seed: 2577565073
Loading hyperparameters from: /opt/homebrew/anaconda3/lib/python3.11/site-packages/rl_zoo3/hyperparams/ppo.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 64),
             ('clip_range', 0.2),
             ('ent_coef', 0.0),
             ('env_wrapper', 'minigrid.wrappers.FlatObsWrapper'),
             ('gae_lambda', 0.95),
             ('gamma', 0.99),
             ('learning_rate', 0.00025),
             ('n_envs', 8),
             ('n_epochs', 10),
             ('n_steps', 128),
             ('n_timesteps', 100000.0),
             ('normalize', True),
             ('policy', 'MlpPolicy')])
Using 8 environments
Creating test environment
Normalization activated: {'gamma': 0.99, 'norm_reward': False, 'training': False}
Normalization activated: {'gamma': 0.99}
Using cpu device
Log path: logs/ppo/MiniGrid-Unlock-v0_1
-----------------------------
| time/              |      |
|    fps             | 5999 |
| 

In [10]:
from minigrid.wrappers import FlatObsWrapper
from stable_baselines3 import PPO

In [11]:
model = PPO.load("logs/ppo/MiniGrid-Unlock-v0_1/best_model.zip")

In [12]:


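def first_policy(observation):
  ######## PUT YOUR CODE HERE ########
  # Query the trained PPO model for an action. predict() samples from the
  # policy by default; pass deterministic=True for a greedy action instead.
  action, _states = model.predict(observation)
  ######## PUT YOUR CODE HERE ########
  return action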

In [14]:

first_task = gym.make("MiniGrid-Unlock-v0")
first_task = FlatObsWrapper(first_task)

compute_score(task=first_task, policy=first_policy)

Best score: 0.975
Average score: (0.804, 0.175)


array([0.97, 0.69, 0.76, 0.85, 0.35, 0.97, 0.91, 0.78, 0.83, 0.92])

### Discussion
I trained a PPO agent with RL Zoo (`rl_zoo3`), using its default hyperparameters for this environment. Training finishes in about a minute. Over the 10 evaluation episodes the average score is 0.804 (standard deviation 0.175) and the best score is 0.975, so in most episodes the agent finds the key and unlocks the door quickly.
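
To relate these scores to episode length, recall that Minigrid gives a sparse reward: a successful episode returns `1 - 0.9 * (step_count / max_steps)` and an unsuccessful one returns 0. The sketch below inverts that formula. It assumes a step limit of 288 for this task, which I have not verified in this notebook (check `task.unwrapped.max_steps`), and the helper names are hypothetical.

```python
# Assumed step limit for MiniGrid-Unlock-v0; confirm with task.unwrapped.max_steps.
MAX_STEPS = 288


def score_from_steps(steps, max_steps=MAX_STEPS):
    """Reward Minigrid gives for succeeding after `steps` steps."""
    return 1.0 - 0.9 * (steps / max_steps)


def steps_from_score(score, max_steps=MAX_STEPS):
    """Invert the reward formula to recover the episode length of a success."""
    return round((1.0 - score) / 0.9 * max_steps)


# Convert the best and worst per-episode scores into step counts.
for score in (0.975, 0.35):
    print(score, "->", steps_from_score(score), "steps")
```

Under that assumption, the best score of 0.975 corresponds to an 8-step episode, while the weakest episode (0.35) took roughly 200 steps.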

## Problem 2
Solve the [Minigrid Unlock and Pickup](https://minigrid.farama.org/environments/minigrid/UnlockPickupEnv/) task.

* This problem is worth 10 points for COMP 442 students.
* This problem is worth 5 points for COMP 552 students.

![](https://minigrid.farama.org/_images/UnlockPickupEnv.gif)

In [240]:
! cd Minigrid-master/minigrid/ ; python3 manual_control.py --env MiniGrid-UnlockPickup-v0 > ~/Documents/documents-main/school/rice/fall_2023/comp_552/HW5/log.txt

78621.18s - pydevd: Sending message related to process being replaced timed-out after 5 seconds
Traceback (most recent call last):
  File "/Users/kmm12/Documents/documents-main/school/rice/fall_2023/comp_552/HW5/Minigrid-master/minigrid/manual_control.py", line 181, in <module>
    manual_control.start()
  File "/Users/kmm12/Documents/documents-main/school/rice/fall_2023/comp_552/HW5/Minigrid-master/minigrid/manual_control.py", line 39, in start
    for event in pygame.event.get():
                 ^^^^^^^^^^^^^^^^^^
pygame.error: video system not initialized


In [244]:
import pickle

# Load the recorded demonstrations; the context manager closes the file.
with open('traj_list_2.pkl', 'rb') as pkl_file:
    traj_list = pickle.load(pkl_file)

In [245]:
from imitation.data.rollout import flatten_trajectories as ft

trans_list = ft(traj_list)
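
For context, flattening turns whole episodes into individual transitions. The sketch below illustrates that idea in plain NumPy rather than calling the library. `flatten_trajectory` and the toy data are hypothetical, and it assumes each trajectory stores one more observation than actions and ends its episode.

```python
import numpy as np


def flatten_trajectory(obs, acts):
    """Split one trajectory into per-step (obs, act, next_obs, done) arrays."""
    obs = np.asarray(obs)
    acts = np.asarray(acts)
    # A trajectory of T actions carries T + 1 observations.
    assert len(obs) == len(acts) + 1
    dones = np.zeros(len(acts), dtype=bool)
    dones[-1] = True  # assumption: the trajectory ends the episode
    return {
        "obs": obs[:-1],       # observation before each action
        "acts": acts,
        "next_obs": obs[1:],   # observation after each action
        "dones": dones,
    }


# Toy trajectory: 4 one-dimensional observations and 3 actions.
demo = flatten_trajectory(obs=[[0], [1], [2], [3]], acts=[2, 2, 5])
print(demo["obs"].shape, demo["next_obs"].shape, demo["dones"])
```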

In [257]:
import numpy as np
from imitation.policies.serialize import load_policy
from imitation.util.util import make_vec_env
from minigrid.wrappers import RGBImgPartialObsWrapper

SEED = 42

# second_task = gym.make("MiniGrid-UnlockPickup-v0")
# second_task = ImgObsWrapper(second_task)


# env = make_vec_env(
#     "MiniGrid-UnlockPickup-v0",
#     rng=np.random.default_rng(SEED),
#     n_envs=8)


second_task = make_vec_env(
    "MiniGrid-UnlockPickup-v0",
    rng=np.random.default_rng(SEED),
    n_envs=8,
    post_wrappers=[
        lambda env, _: FlatObsWrapper(env)
    ],  # needed for computing rollouts later
)

In [258]:
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.evaluation import evaluate_policy

learner = PPO(
    env=second_task,
    policy=MlpPolicy,
    batch_size=64,
    ent_coef=0.0,
    learning_rate=0.00025,
    gamma=0.99,
    n_epochs=10,
    clip_range=0.2,
    gae_lambda=0.95,
    seed=SEED,
)

reward_net = BasicRewardNet(
    observation_space=second_task.observation_space,
    action_space=second_task.action_space,
    normalize_input_layer=RunningNorm,
)

gail_trainer = GAIL(
    demonstrations=traj_list,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=512,
    n_disc_updates_per_round=8,
    venv=second_task,
    gen_algo=learner,
    reward_net=reward_net,
    allow_variable_horizon=True
)

# evaluate the learner before training
second_task.seed(SEED)
learner_rewards_before_training, _ = evaluate_policy(
    learner, second_task, 100, return_episode_rewards=True
)

# train the learner and evaluate again
gail_trainer.train(1_000_000)


second_task.seed(SEED)
learner_rewards_after_training, _ = evaluate_policy(
    learner, second_task, 100, return_episode_rewards=True
)


Running with `allow_variable_horizon` set to True. Some algorithms are biased towards shorter or longer episodes, which may significantly confound results. Additionally, even unbiased algorithms can exploit the information leak from the termination condition, producing spuriously high performance. See https://imitation.readthedocs.io/en/latest/getting-started/variable-horizon.html for more information.


round:   0%|          | 0/61 [00:00<?, ?it/s]

------------------------------------------
| raw/                        |          |
|    gen/rollout/ep_len_mean  | 288      |
|    gen/rollout/ep_rew_mean  | 0        |
|    gen/time/fps             | 1894     |
|    gen/time/iterations      | 1        |
|    gen/time/time_elapsed    | 8        |
|    gen/time/total_timesteps | 16384    |
------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.695    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
-

round:   2%|▏         | 1/61 [00:17<17:59, 18.00s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 179         |
|    gen/time/fps                    | 1919        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 32768       |
|    gen/train/approx_kl             | 0.011278102 |
|    gen/train/clip_fraction         | 0.112       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.94       |
|    gen/train/explained_variance    | -0.0427     |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.787       |
|    gen/train/n_updates             | 10          |
|    gen/train/policy_gradient_loss  | -0.00702    |
|    gen/train/value_loss            | 3.8    

round:   3%|▎         | 2/61 [00:36<17:50, 18.14s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1884        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 49152       |
|    gen/train/approx_kl             | 0.011101833 |
|    gen/train/clip_fraction         | 0.13        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.92       |
|    gen/train/explained_variance    | -0.00554    |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.58        |
|    gen/train/n_updates             | 20          |
|    gen/train/policy_gradient_loss  | -0.0107     |
|    gen/train/value_loss            | 5.7    

round:   5%|▍         | 3/61 [00:54<17:24, 18.01s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00153     |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1887        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 65536       |
|    gen/train/approx_kl             | 0.015735364 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.9        |
|    gen/train/explained_variance    | 0.329       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.27        |
|    gen/train/n_updates             | 30          |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 4.75   

round:   7%|▋         | 4/61 [01:12<17:09, 18.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00153     |
|    gen/rollout/ep_rew_wrapped_mean | 207         |
|    gen/time/fps                    | 1883        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 81920       |
|    gen/train/approx_kl             | 0.012665272 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.87       |
|    gen/train/explained_variance    | 0.565       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.17        |
|    gen/train/n_updates             | 40          |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 4.64   

round:   8%|▊         | 5/61 [01:30<16:48, 18.01s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 198         |
|    gen/time/fps                    | 1836        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 98304       |
|    gen/train/approx_kl             | 0.013797963 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.85       |
|    gen/train/explained_variance    | 0.753       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.8         |
|    gen/train/n_updates             | 50          |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 3.7    

round:  10%|▉         | 6/61 [01:49<16:55, 18.46s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1879        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 114688      |
|    gen/train/approx_kl             | 0.013399007 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.85       |
|    gen/train/explained_variance    | 0.77        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.619       |
|    gen/train/n_updates             | 60          |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 4.88   

round:  11%|█▏        | 7/61 [02:08<16:39, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 192         |
|    gen/time/fps                    | 1899        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 131072      |
|    gen/train/approx_kl             | 0.017824866 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.84       |
|    gen/train/explained_variance    | 0.574       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.88        |
|    gen/train/n_updates             | 70          |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 17.2   

round:  13%|█▎        | 8/61 [02:26<16:12, 18.35s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 285        |
|    gen/rollout/ep_rew_mean         | 0.0102     |
|    gen/rollout/ep_rew_wrapped_mean | 232        |
|    gen/time/fps                    | 1839       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 147456     |
|    gen/train/approx_kl             | 0.01382261 |
|    gen/train/clip_fraction         | 0.157      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.82      |
|    gen/train/explained_variance    | 0.676      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 28.1       |
|    gen/train/n_updates             | 80         |
|    gen/train/policy_gradient_loss  | -0.0149    |
|    gen/train/value_loss            | 29.4       |
------------

round:  15%|█▍        | 9/61 [02:44<15:56, 18.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0102      |
|    gen/rollout/ep_rew_wrapped_mean | 333         |
|    gen/time/fps                    | 1873        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 163840      |
|    gen/train/approx_kl             | 0.011295734 |
|    gen/train/clip_fraction         | 0.112       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.78       |
|    gen/train/explained_variance    | 0.654       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 45          |
|    gen/train/n_updates             | 90          |
|    gen/train/policy_gradient_loss  | -0.0128     |
|    gen/train/value_loss            | 166    

round:  16%|█▋        | 10/61 [03:02<15:33, 18.31s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 326         |
|    gen/time/fps                    | 1847        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 180224      |
|    gen/train/approx_kl             | 0.012342205 |
|    gen/train/clip_fraction         | 0.128       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.76       |
|    gen/train/explained_variance    | 0.816       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 62.4        |
|    gen/train/n_updates             | 100         |
|    gen/train/policy_gradient_loss  | -0.0126     |
|    gen/train/value_loss            | 121    

round:  18%|█▊        | 11/61 [03:21<15:25, 18.50s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 287        |
|    gen/rollout/ep_rew_mean         | 0.00588    |
|    gen/rollout/ep_rew_wrapped_mean | 272        |
|    gen/time/fps                    | 1858       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 196608     |
|    gen/train/approx_kl             | 0.01074544 |
|    gen/train/clip_fraction         | 0.116      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.73      |
|    gen/train/explained_variance    | 0.757      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 172        |
|    gen/train/n_updates             | 110        |
|    gen/train/policy_gradient_loss  | -0.0126    |
|    gen/train/value_loss            | 210        |
------------

round:  20%|█▉        | 12/61 [03:40<15:09, 18.56s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00862     |
|    gen/rollout/ep_rew_wrapped_mean | 228         |
|    gen/time/fps                    | 1869        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 212992      |
|    gen/train/approx_kl             | 0.014515139 |
|    gen/train/clip_fraction         | 0.157       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.69       |
|    gen/train/explained_variance    | 0.776       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 117         |
|    gen/train/n_updates             | 120         |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 117    

round:  21%|██▏       | 13/61 [03:58<14:46, 18.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0185      |
|    gen/rollout/ep_rew_wrapped_mean | 204         |
|    gen/time/fps                    | 1893        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 229376      |
|    gen/train/approx_kl             | 0.012595223 |
|    gen/train/clip_fraction         | 0.144       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.65       |
|    gen/train/explained_variance    | 0.77        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 21.5        |
|    gen/train/n_updates             | 130         |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 72.1   

round:  23%|██▎       | 14/61 [04:16<14:21, 18.34s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0109      |
|    gen/rollout/ep_rew_wrapped_mean | 202         |
|    gen/time/fps                    | 1841        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 245760      |
|    gen/train/approx_kl             | 0.013701243 |
|    gen/train/clip_fraction         | 0.164       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.59       |
|    gen/train/explained_variance    | 0.854       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 14          |
|    gen/train/n_updates             | 140         |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 37.2   

round:  25%|██▍       | 15/61 [04:35<14:06, 18.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1880        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 262144      |
|    gen/train/approx_kl             | 0.014477641 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.55       |
|    gen/train/explained_variance    | 0.793       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.24        |
|    gen/train/n_updates             | 150         |
|    gen/train/policy_gradient_loss  | -0.0126     |
|    gen/train/value_loss            | 19.6   

round:  26%|██▌       | 16/61 [04:53<13:45, 18.34s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0175      |
|    gen/rollout/ep_rew_wrapped_mean | 180         |
|    gen/time/fps                    | 1847        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 278528      |
|    gen/train/approx_kl             | 0.017754592 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.5        |
|    gen/train/explained_variance    | 0.795       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.02        |
|    gen/train/n_updates             | 160         |
|    gen/train/policy_gradient_loss  | -0.0138     |
|    gen/train/value_loss            | 11.7   

round:  28%|██▊       | 17/61 [05:12<13:33, 18.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0196      |
|    gen/rollout/ep_rew_wrapped_mean | 178         |
|    gen/time/fps                    | 1842        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 294912      |
|    gen/train/approx_kl             | 0.025699243 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.43       |
|    gen/train/explained_variance    | 0.619       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 17.8        |
|    gen/train/n_updates             | 170         |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 17.7   

round:  30%|██▉       | 18/61 [05:30<13:11, 18.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00788     |
|    gen/rollout/ep_rew_wrapped_mean | 168         |
|    gen/time/fps                    | 1822        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 311296      |
|    gen/train/approx_kl             | 0.047805317 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.38       |
|    gen/train/explained_variance    | 0.327       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.26        |
|    gen/train/n_updates             | 180         |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 10.9   

round:  31%|███       | 19/61 [05:48<12:54, 18.43s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 288        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 195        |
|    gen/time/fps                    | 1820       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 327680     |
|    gen/train/approx_kl             | 0.02225788 |
|    gen/train/clip_fraction         | 0.163      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.25      |
|    gen/train/explained_variance    | 0.567      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.84       |
|    gen/train/n_updates             | 190        |
|    gen/train/policy_gradient_loss  | -0.0107    |
|    gen/train/value_loss            | 10.8       |
------------

round:  33%|███▎      | 20/61 [06:07<12:35, 18.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00584     |
|    gen/rollout/ep_rew_wrapped_mean | 223         |
|    gen/time/fps                    | 1857        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 344064      |
|    gen/train/approx_kl             | 0.033189498 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.32       |
|    gen/train/explained_variance    | 0.759       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.57        |
|    gen/train/n_updates             | 200         |
|    gen/train/policy_gradient_loss  | -0.0172     |
|    gen/train/value_loss            | 17.3   

round:  34%|███▍      | 21/61 [06:25<12:15, 18.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0163      |
|    gen/rollout/ep_rew_wrapped_mean | 235         |
|    gen/time/fps                    | 1917        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 360448      |
|    gen/train/approx_kl             | 0.021003492 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.34       |
|    gen/train/explained_variance    | 0.847       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.14        |
|    gen/train/n_updates             | 210         |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 20     

round:  36%|███▌      | 22/61 [06:43<11:49, 18.20s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 283        |
|    gen/rollout/ep_rew_mean         | 0.0191     |
|    gen/rollout/ep_rew_wrapped_mean | 237        |
|    gen/time/fps                    | 1939       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 376832     |
|    gen/train/approx_kl             | 0.01720974 |
|    gen/train/clip_fraction         | 0.193      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.38      |
|    gen/train/explained_variance    | 0.878      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 3.16       |
|    gen/train/n_updates             | 220        |
|    gen/train/policy_gradient_loss  | -0.0175    |
|    gen/train/value_loss            | 20         |
------------

round:  38%|███▊      | 23/61 [07:01<11:26, 18.07s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 285        |
|    gen/rollout/ep_rew_mean         | 0.0126     |
|    gen/rollout/ep_rew_wrapped_mean | 233        |
|    gen/time/fps                    | 1936       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 393216     |
|    gen/train/approx_kl             | 0.01874625 |
|    gen/train/clip_fraction         | 0.187      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.33      |
|    gen/train/explained_variance    | 0.916      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.59       |
|    gen/train/n_updates             | 230        |
|    gen/train/policy_gradient_loss  | -0.0149    |
|    gen/train/value_loss            | 15.4       |
---------------------------------------------------

round:  39%|███▉      | 24/61 [07:19<11:06, 18.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0146      |
|    gen/rollout/ep_rew_wrapped_mean | 209         |
|    gen/time/fps                    | 1943        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 409600      |
|    gen/train/approx_kl             | 0.019535221 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.38       |
|    gen/train/explained_variance    | 0.943       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.1         |
|    gen/train/n_updates             | 240         |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 14.3        |
----------------------------------------------------

round:  41%|████      | 25/61 [07:36<10:46, 17.96s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0205      |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1921        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 425984      |
|    gen/train/approx_kl             | 0.021706197 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.32       |
|    gen/train/explained_variance    | 0.956       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.49        |
|    gen/train/n_updates             | 250         |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 17.1        |
----------------------------------------------------

round:  43%|████▎     | 26/61 [07:54<10:28, 17.95s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 281        |
|    gen/rollout/ep_rew_mean         | 0.0285     |
|    gen/rollout/ep_rew_wrapped_mean | 222        |
|    gen/time/fps                    | 1935       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 442368     |
|    gen/train/approx_kl             | 0.02026394 |
|    gen/train/clip_fraction         | 0.172      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.22      |
|    gen/train/explained_variance    | 0.95       |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.36       |
|    gen/train/n_updates             | 260        |
|    gen/train/policy_gradient_loss  | -0.0136    |
|    gen/train/value_loss            | 19.6       |
---------------------------------------------------

round:  44%|████▍     | 27/61 [08:12<10:09, 17.93s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0422      |
|    gen/rollout/ep_rew_wrapped_mean | 227         |
|    gen/time/fps                    | 1955        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 458752      |
|    gen/train/approx_kl             | 0.022334224 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.13       |
|    gen/train/explained_variance    | 0.944       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.1         |
|    gen/train/n_updates             | 270         |
|    gen/train/policy_gradient_loss  | -0.0147     |
|    gen/train/value_loss            | 22          |
----------------------------------------------------

round:  46%|████▌     | 28/61 [08:30<09:52, 17.94s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 276        |
|    gen/rollout/ep_rew_mean         | 0.0469     |
|    gen/rollout/ep_rew_wrapped_mean | 213        |
|    gen/time/fps                    | 1973       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 475136     |
|    gen/train/approx_kl             | 0.01712257 |
|    gen/train/clip_fraction         | 0.163      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.06      |
|    gen/train/explained_variance    | 0.935      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 8.62       |
|    gen/train/n_updates             | 280        |
|    gen/train/policy_gradient_loss  | -0.0144    |
|    gen/train/value_loss            | 27.8       |
---------------------------------------------------

round:  48%|████▊     | 29/61 [08:48<09:30, 17.84s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 278        |
|    gen/rollout/ep_rew_mean         | 0.0376     |
|    gen/rollout/ep_rew_wrapped_mean | 197        |
|    gen/time/fps                    | 1914       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 491520     |
|    gen/train/approx_kl             | 0.02158501 |
|    gen/train/clip_fraction         | 0.165      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.987     |
|    gen/train/explained_variance    | 0.936      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 3.27       |
|    gen/train/n_updates             | 290        |
|    gen/train/policy_gradient_loss  | -0.0123    |
|    gen/train/value_loss            | 19.7       |
---------------------------------------------------

round:  49%|████▉     | 30/61 [09:06<09:13, 17.85s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0167      |
|    gen/rollout/ep_rew_wrapped_mean | 221         |
|    gen/time/fps                    | 1912        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 507904      |
|    gen/train/approx_kl             | 0.016258324 |
|    gen/train/clip_fraction         | 0.157       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.05       |
|    gen/train/explained_variance    | 0.927       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 6.78        |
|    gen/train/n_updates             | 300         |
|    gen/train/policy_gradient_loss  | -0.0118     |
|    gen/train/value_loss            | 22.3        |
----------------------------------------------------

round:  51%|█████     | 31/61 [09:24<08:57, 17.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0249      |
|    gen/rollout/ep_rew_wrapped_mean | 220         |
|    gen/time/fps                    | 1955        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 524288      |
|    gen/train/approx_kl             | 0.021759408 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.987      |
|    gen/train/explained_variance    | 0.918       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.85        |
|    gen/train/n_updates             | 310         |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 16          |
----------------------------------------------------

round:  52%|█████▏    | 32/61 [09:41<08:35, 17.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0148      |
|    gen/rollout/ep_rew_wrapped_mean | 190         |
|    gen/time/fps                    | 1868        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 540672      |
|    gen/train/approx_kl             | 0.020234622 |
|    gen/train/clip_fraction         | 0.167       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.923      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.63        |
|    gen/train/n_updates             | 320         |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 14.6        |
----------------------------------------------------

round:  54%|█████▍    | 33/61 [09:59<08:20, 17.88s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0132      |
|    gen/rollout/ep_rew_wrapped_mean | 174         |
|    gen/time/fps                    | 1882        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 557056      |
|    gen/train/approx_kl             | 0.018859163 |
|    gen/train/clip_fraction         | 0.154       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.934      |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.07        |
|    gen/train/n_updates             | 330         |
|    gen/train/policy_gradient_loss  | -0.0114     |
|    gen/train/value_loss            | 6.3         |
----------------------------------------------------

round:  56%|█████▌    | 34/61 [10:17<08:04, 17.93s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0192      |
|    gen/rollout/ep_rew_wrapped_mean | 188         |
|    gen/time/fps                    | 1869        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 573440      |
|    gen/train/approx_kl             | 0.020109788 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.919      |
|    gen/train/explained_variance    | 0.937       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 17.7        |
|    gen/train/n_updates             | 340         |
|    gen/train/policy_gradient_loss  | -0.0112     |
|    gen/train/value_loss            | 11.3        |
----------------------------------------------------

round:  57%|█████▋    | 35/61 [10:36<07:48, 18.03s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0283      |
|    gen/rollout/ep_rew_wrapped_mean | 215         |
|    gen/time/fps                    | 1897        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 589824      |
|    gen/train/approx_kl             | 0.018752826 |
|    gen/train/clip_fraction         | 0.168       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.08       |
|    gen/train/explained_variance    | 0.9         |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 10.5        |
|    gen/train/n_updates             | 350         |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 30.2        |
----------------------------------------------------

round:  59%|█████▉    | 36/61 [10:54<07:30, 18.03s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 282        |
|    gen/rollout/ep_rew_mean         | 0.0244     |
|    gen/rollout/ep_rew_wrapped_mean | 215        |
|    gen/time/fps                    | 1914       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 606208     |
|    gen/train/approx_kl             | 0.02533369 |
|    gen/train/clip_fraction         | 0.171      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.942     |
|    gen/train/explained_variance    | 0.934      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 8.31       |
|    gen/train/n_updates             | 360        |
|    gen/train/policy_gradient_loss  | -0.015     |
|    gen/train/value_loss            | 25         |
---------------------------------------------------

round:  61%|██████    | 37/61 [11:12<07:12, 18.03s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0408      |
|    gen/rollout/ep_rew_wrapped_mean | 206         |
|    gen/time/fps                    | 1847        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 622592      |
|    gen/train/approx_kl             | 0.020781165 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.939      |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.68        |
|    gen/train/n_updates             | 370         |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 16.7        |
----------------------------------------------------

round:  62%|██████▏   | 38/61 [11:30<06:56, 18.13s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0149      |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1936        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 638976      |
|    gen/train/approx_kl             | 0.023781385 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.874      |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.23        |
|    gen/train/n_updates             | 380         |
|    gen/train/policy_gradient_loss  | -0.0151     |
|    gen/train/value_loss            | 12.5        |
----------------------------------------------------

round:  64%|██████▍   | 39/61 [11:48<06:37, 18.08s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.00984     |
|    gen/rollout/ep_rew_wrapped_mean | 176         |
|    gen/time/fps                    | 1899        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 655360      |
|    gen/train/approx_kl             | 0.025314387 |
|    gen/train/clip_fraction         | 0.194       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.95       |
|    gen/train/explained_variance    | 0.959       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.94        |
|    gen/train/n_updates             | 390         |
|    gen/train/policy_gradient_loss  | -0.0193     |
|    gen/train/value_loss            | 7.05        |
----------------------------------------------------

round:  66%|██████▌   | 40/61 [12:06<06:19, 18.09s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 285        |
|    gen/rollout/ep_rew_mean         | 0.00984    |
|    gen/rollout/ep_rew_wrapped_mean | 183        |
|    gen/time/fps                    | 1872       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 671744     |
|    gen/train/approx_kl             | 0.02362791 |
|    gen/train/clip_fraction         | 0.172      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.921     |
|    gen/train/explained_variance    | 0.941      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 1.65       |
|    gen/train/n_updates             | 400        |
|    gen/train/policy_gradient_loss  | -0.0128    |
|    gen/train/value_loss            | 9.59       |
---------------------------------------------------

round:  67%|██████▋   | 41/61 [12:25<06:03, 18.20s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 288        |
|    gen/rollout/ep_rew_mean         | 0.00144    |
|    gen/rollout/ep_rew_wrapped_mean | 184        |
|    gen/time/fps                    | 1910       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 688128     |
|    gen/train/approx_kl             | 0.02451703 |
|    gen/train/clip_fraction         | 0.203      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.961     |
|    gen/train/explained_variance    | 0.911      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.23       |
|    gen/train/n_updates             | 410        |
|    gen/train/policy_gradient_loss  | -0.0189    |
|    gen/train/value_loss            | 6.27       |
---------------------------------------------------

round:  69%|██████▉   | 42/61 [12:43<05:44, 18.14s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0212      |
|    gen/rollout/ep_rew_wrapped_mean | 187         |
|    gen/time/fps                    | 1931        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 704512      |
|    gen/train/approx_kl             | 0.022707704 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.974      |
|    gen/train/explained_variance    | 0.906       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.6         |
|    gen/train/n_updates             | 420         |
|    gen/train/policy_gradient_loss  | -0.0172     |
|    gen/train/value_loss            | 7.18        |
----------------------------------------------------

round:  70%|███████   | 43/61 [13:01<05:25, 18.08s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0268      |
|    gen/rollout/ep_rew_wrapped_mean | 203         |
|    gen/time/fps                    | 1875        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 720896      |
|    gen/train/approx_kl             | 0.022034056 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.885      |
|    gen/train/explained_variance    | 0.801       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.03        |
|    gen/train/n_updates             | 430         |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:  72%|███████▏  | 44/61 [13:19<05:07, 18.09s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 285        |
|    gen/rollout/ep_rew_mean         | 0.0157     |
|    gen/rollout/ep_rew_wrapped_mean | 219        |
|    gen/time/fps                    | 1919       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 737280     |
|    gen/train/approx_kl             | 0.02684091 |
|    gen/train/clip_fraction         | 0.211      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.02      |
|    gen/train/explained_variance    | 0.783      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 3.28       |
|    gen/train/n_updates             | 440        |
|    gen/train/policy_gradient_loss  | -0.0192    |
|    gen/train/value_loss            | 12.5       |
---------------------------------------------------

round:  74%|███████▍  | 45/61 [13:36<04:48, 18.02s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 283        |
|    gen/rollout/ep_rew_mean         | 0.0206     |
|    gen/rollout/ep_rew_wrapped_mean | 211        |
|    gen/time/fps                    | 1928       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 753664     |
|    gen/train/approx_kl             | 0.01940728 |
|    gen/train/clip_fraction         | 0.194      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1         |
|    gen/train/explained_variance    | 0.681      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.59       |
|    gen/train/n_updates             | 450        |
|    gen/train/policy_gradient_loss  | -0.0149    |
|    gen/train/value_loss            | 8.4        |
---------------------------------------------------

round:  75%|███████▌  | 46/61 [13:54<04:30, 18.01s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.012       |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1931        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 770048      |
|    gen/train/approx_kl             | 0.018209098 |
|    gen/train/clip_fraction         | 0.152       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.88       |
|    gen/train/explained_variance    | 0.807       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.5         |
|    gen/train/n_updates             | 460         |
|    gen/train/policy_gradient_loss  | -0.0122     |
|    gen/train/value_loss            | 9.05        |
----------------------------------------------------

round:  77%|███████▋  | 47/61 [14:12<04:10, 17.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0243      |
|    gen/rollout/ep_rew_wrapped_mean | 185         |
|    gen/time/fps                    | 1832        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 786432      |
|    gen/train/approx_kl             | 0.018271938 |
|    gen/train/clip_fraction         | 0.156       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.814      |
|    gen/train/explained_variance    | 0.82        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.4         |
|    gen/train/n_updates             | 470         |
|    gen/train/policy_gradient_loss  | -0.0113     |
|    gen/train/value_loss            | 5.05        |
----------------------------------------------------

round:  79%|███████▊  | 48/61 [14:30<03:54, 18.03s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 277       |
|    gen/rollout/ep_rew_mean         | 0.0428    |
|    gen/rollout/ep_rew_wrapped_mean | 151       |
|    gen/time/fps                    | 1873      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 802816    |
|    gen/train/approx_kl             | 0.0317891 |
|    gen/train/clip_fraction         | 0.206     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.853    |
|    gen/train/explained_variance    | 0.567     |
|    gen/train/learning_rate         | 0.00025   |
|    gen/train/loss                  | 2.52      |
|    gen/train/n_updates             | 480       |
|    gen/train/policy_gradient_loss  | -0.0231   |
|    gen/train/value_loss            | 14        |
--------------------------------------------------

round:  80%|████████  | 49/61 [14:49<03:36, 18.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0279      |
|    gen/rollout/ep_rew_wrapped_mean | 156         |
|    gen/time/fps                    | 1877        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 819200      |
|    gen/train/approx_kl             | 0.029193996 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.934      |
|    gen/train/explained_variance    | 0.705       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.7         |
|    gen/train/n_updates             | 490         |
|    gen/train/policy_gradient_loss  | -0.029      |
|    gen/train/value_loss            | 10.4        |
----------------------------------------------------

round:  82%|████████▏ | 50/61 [15:07<03:19, 18.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0277      |
|    gen/rollout/ep_rew_wrapped_mean | 185         |
|    gen/time/fps                    | 1885        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 835584      |
|    gen/train/approx_kl             | 0.024990994 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.911      |
|    gen/train/explained_variance    | 0.845       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.07        |
|    gen/train/n_updates             | 500         |
|    gen/train/policy_gradient_loss  | -0.0151     |
|    gen/train/value_loss            | 10.4        |
----------------------------------------------------

round:  84%|████████▎ | 51/61 [15:25<03:01, 18.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0393      |
|    gen/rollout/ep_rew_wrapped_mean | 199         |
|    gen/time/fps                    | 1889        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 851968      |
|    gen/train/approx_kl             | 0.021766894 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.881      |
|    gen/train/explained_variance    | 0.79        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.33        |
|    gen/train/n_updates             | 510         |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 16.3        |
----------------------------------------------------

round:  85%|████████▌ | 52/61 [15:43<02:43, 18.14s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 281       |
|    gen/rollout/ep_rew_mean         | 0.0281    |
|    gen/rollout/ep_rew_wrapped_mean | 197       |
|    gen/time/fps                    | 1912      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 868352    |
|    gen/train/approx_kl             | 0.0193111 |
|    gen/train/clip_fraction         | 0.175     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.844    |
|    gen/train/explained_variance    | 0.818     |
|    gen/train/learning_rate         | 0.00025   |
|    gen/train/loss                  | 4         |
|    gen/train/n_updates             | 520       |
|    gen/train/policy_gradient_loss  | -0.0136   |
|    gen/train/value_loss            | 15.3      |
--------------------------------------------------

round:  87%|████████▋ | 53/61 [16:01<02:25, 18.16s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0377      |
|    gen/rollout/ep_rew_wrapped_mean | 184         |
|    gen/time/fps                    | 1876        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 884736      |
|    gen/train/approx_kl             | 0.019510709 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.723      |
|    gen/train/explained_variance    | 0.862       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.66        |
|    gen/train/n_updates             | 530         |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 7.55        |
----------------------------------------------------

round:  89%|████████▊ | 54/61 [16:20<02:07, 18.16s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0217      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1915        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 901120      |
|    gen/train/approx_kl             | 0.039438147 |
|    gen/train/clip_fraction         | 0.236       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.79       |
|    gen/train/explained_variance    | 0.863       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 11          |
|    gen/train/n_updates             | 540         |
|    gen/train/policy_gradient_loss  | -0.0256     |
|    gen/train/value_loss            | 10.8        |
----------------------------------------------------

round:  90%|█████████ | 55/61 [16:38<01:49, 18.18s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0185      |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1964        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 917504      |
|    gen/train/approx_kl             | 0.031889506 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.826      |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.89        |
|    gen/train/n_updates             | 550         |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 7.01        |
----------------------------------------------------

round:  92%|█████████▏| 56/61 [16:56<01:30, 18.08s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0172      |
|    gen/rollout/ep_rew_wrapped_mean | 208         |
|    gen/time/fps                    | 1935        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 933888      |
|    gen/train/approx_kl             | 0.025529094 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.774      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.43        |
|    gen/train/n_updates             | 560         |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 10.6        |
----------------------------------------------------

round:  93%|█████████▎| 57/61 [17:14<01:12, 18.12s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0229      |
|    gen/rollout/ep_rew_wrapped_mean | 228         |
|    gen/time/fps                    | 1933        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 950272      |
|    gen/train/approx_kl             | 0.019494925 |
|    gen/train/clip_fraction         | 0.171       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.806      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.31        |
|    gen/train/n_updates             | 570         |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:  95%|█████████▌| 58/61 [17:32<00:54, 18.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0192      |
|    gen/rollout/ep_rew_wrapped_mean | 212         |
|    gen/time/fps                    | 1914        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 966656      |
|    gen/train/approx_kl             | 0.019978866 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.906      |
|    gen/train/explained_variance    | 0.942       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 12.7        |
|    gen/train/n_updates             | 580         |
|    gen/train/policy_gradient_loss  | -0.0125     |
|    gen/train/value_loss            | 18.4        |
----------------------------------------------------

round:  97%|█████████▋| 59/61 [17:50<00:36, 18.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0225      |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1867        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 983040      |
|    gen/train/approx_kl             | 0.022635762 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.868      |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.6         |
|    gen/train/n_updates             | 590         |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 13.2        |
----------------------------------------------------

round:  98%|█████████▊| 60/61 [18:08<00:18, 18.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0237      |
|    gen/rollout/ep_rew_wrapped_mean | 193         |
|    gen/time/fps                    | 1897        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 999424      |
|    gen/train/approx_kl             | 0.018712664 |
|    gen/train/clip_fraction         | 0.176       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.903      |
|    gen/train/explained_variance    | 0.96        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.36        |
|    gen/train/n_updates             | 600         |
|    gen/train/policy_gradient_loss  | -0.0123     |
|    gen/train/value_loss            | 8.93        |
----------------------------------------------------

round: 100%|██████████| 61/61 [18:26<00:00, 18.14s/it]
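
The round count in the progress bar can be cross-checked against the log. The sketch below is only arithmetic on values printed above: consecutive `total_timesteps` entries differ by a fixed amount per round, and dividing a step budget by that amount gives the number of rounds. The budget of `1_000_000` is an assumption taken from the `gail_trainer.train(1_000_000)` call in this notebook, and the floor division is an inference from the logged numbers rather than a statement about the library's internals.

```python
# Two consecutive total_timesteps values copied from the log tables.
steps_per_round = 884_736 - 868_352

# Assumed step budget (the value passed to gail_trainer.train in this notebook).
budget = 1_000_000

# Whole rounds that fit into the budget, and the steps they actually consume.
n_rounds = budget // steps_per_round
steps_used = n_rounds * steps_per_round

print(steps_per_round, n_rounds, steps_used)  # 16384 61 999424
```

The result matches both the `61/61` progress bar and the final logged `total_timesteps` of 999424, so the last 576 steps of the budget are not used.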


In [261]:
def second_policy(observation):
  ######## PUT YOUR CODE HERE ########
  action, _states = learner.predict(observation)
  ######## PUT YOUR CODE HERE ########
  return action

In [262]:
env = gym.make("MiniGrid-UnlockPickup-v0")
env = FlatObsWrapper(env)

In [269]:
compute_score(task=env, policy=second_policy)

Best score: 0.531
Average score: (0.053, 0.159)


array([0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.53125])
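
The mean score hides how often the task is actually solved. As a minimal sketch, assuming the standard MiniGrid reward (0 on failure or timeout, a positive value that shrinks with episode length on success), the per-episode array can be reduced to a success rate. `summarize_scores` is a hypothetical helper, not part of the starter code.

```python
import numpy as np

def summarize_scores(scores, success_threshold=0.0):
    """Summarise per-episode returns (hypothetical helper).

    An episode counts as solved if its return exceeds success_threshold,
    which assumes failures yield exactly zero reward.
    """
    scores = np.asarray(scores, dtype=float)
    solved = scores > success_threshold
    return {
        "success_rate": float(solved.mean()),
        "mean_score": float(scores.mean()),
        # Average return over solved episodes only; 0.0 if none were solved.
        "mean_score_when_solved": float(scores[solved].mean()) if solved.any() else 0.0,
    }

# Scores copied from the compute_score output above.
scores = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0.53125]
summary = summarize_scores(scores)
print(summary)
```

For the run above, only one of the ten evaluation seeds is solved, which is why the average (0.053) sits so far below the best score (0.531).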

In [270]:
gail_trainer.train(1_000_000)

round:   0%|          | 0/61 [00:00<?, ?it/s]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00663     |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1908        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1015808     |
|    gen/train/approx_kl             | 0.020309025 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.889      |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.58        |
|    gen/train/n_updates             | 610         |
|    gen/train/policy_gradient_loss  | -0.00932    |
|    gen/train/value_loss            | 12.4        |
----------------------------------------------------

round:   2%|▏         | 1/61 [00:18<18:20, 18.34s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0107      |
|    gen/rollout/ep_rew_wrapped_mean | 251         |
|    gen/time/fps                    | 1863        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1032192     |
|    gen/train/approx_kl             | 0.021098066 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.945      |
|    gen/train/explained_variance    | 0.962       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.12        |
|    gen/train/n_updates             | 620         |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 11.4        |
----------------------------------------------------

round:   3%|▎         | 2/61 [00:37<18:18, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0169      |
|    gen/rollout/ep_rew_wrapped_mean | 232         |
|    gen/time/fps                    | 1776        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1048576     |
|    gen/train/approx_kl             | 0.023753747 |
|    gen/train/clip_fraction         | 0.208       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.974      |
|    gen/train/explained_variance    | 0.983       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.84        |
|    gen/train/n_updates             | 630         |
|    gen/train/policy_gradient_loss  | -0.0118     |
|    gen/train/value_loss            | 7.63        |
----------------------------------------------------

round:   5%|▍         | 3/61 [00:58<19:00, 19.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0174      |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1607        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1064960     |
|    gen/train/approx_kl             | 0.022067554 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1          |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.36        |
|    gen/train/n_updates             | 640         |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 10.3        |
----------------------------------------------------

round:   7%|▋         | 4/61 [01:18<19:00, 20.00s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0147      |
|    gen/rollout/ep_rew_wrapped_mean | 144         |
|    gen/time/fps                    | 1860        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1081344     |
|    gen/train/approx_kl             | 0.024693191 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.991      |
|    gen/train/explained_variance    | 0.944       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.79        |
|    gen/train/n_updates             | 650         |
|    gen/train/policy_gradient_loss  | -0.0151     |
|    gen/train/value_loss            | 7.21        |
----------------------------------------------------

round:   8%|▊         | 5/61 [01:37<18:18, 19.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00225     |
|    gen/rollout/ep_rew_wrapped_mean | 155         |
|    gen/time/fps                    | 1866        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1097728     |
|    gen/train/approx_kl             | 0.025340218 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.831      |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.91        |
|    gen/train/n_updates             | 660         |
|    gen/train/policy_gradient_loss  | -0.0188     |
|    gen/train/value_loss            | 5.5         |
----------------------------------------------------

round:  10%|▉         | 6/61 [01:56<17:54, 19.54s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0133      |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1687        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1114112     |
|    gen/train/approx_kl             | 0.026581112 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.838      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.47        |
|    gen/train/n_updates             | 670         |
|    gen/train/policy_gradient_loss  | -0.0183     |
|    gen/train/value_loss            | 8.56        |
----------------------------------------------------

round:  11%|█▏        | 7/61 [02:17<17:45, 19.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0111      |
|    gen/rollout/ep_rew_wrapped_mean | 222         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1130496     |
|    gen/train/approx_kl             | 0.021169318 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.779      |
|    gen/train/explained_variance    | 0.907       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.5         |
|    gen/train/n_updates             | 680         |
|    gen/train/policy_gradient_loss  | -0.0171     |
|    gen/train/value_loss            | 15.3        |
----------------------------------------------------

round:  13%|█▎        | 8/61 [02:36<17:23, 19.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00334     |
|    gen/rollout/ep_rew_wrapped_mean | 223         |
|    gen/time/fps                    | 1832        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1146880     |
|    gen/train/approx_kl             | 0.026406609 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.707      |
|    gen/train/explained_variance    | 0.917       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.5         |
|    gen/train/n_updates             | 690         |
|    gen/train/policy_gradient_loss  | -0.0148     |
|    gen/train/value_loss            | 9.06        |
----------------------------------------------------

round:  15%|█▍        | 9/61 [02:55<16:48, 19.39s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 192         |
|    gen/time/fps                    | 1875        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1163264     |
|    gen/train/approx_kl             | 0.028710179 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.658      |
|    gen/train/explained_variance    | 0.884       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.63        |
|    gen/train/n_updates             | 700         |
|    gen/train/policy_gradient_loss  | -0.025      |
|    gen/train/value_loss            | 5.6         |
----------------------------------------------------

round:  16%|█▋        | 10/61 [03:13<16:12, 19.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00434     |
|    gen/rollout/ep_rew_wrapped_mean | 172         |
|    gen/time/fps                    | 1848        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1179648     |
|    gen/train/approx_kl             | 0.040286683 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.703      |
|    gen/train/explained_variance    | 0.926       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.84        |
|    gen/train/n_updates             | 710         |
|    gen/train/policy_gradient_loss  | -0.0294     |
|    gen/train/value_loss            | 5.51        |
----------------------------------------------------

round:  18%|█▊        | 11/61 [03:32<15:55, 19.10s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00306     |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1196032     |
|    gen/train/approx_kl             | 0.026217448 |
|    gen/train/clip_fraction         | 0.194       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.746      |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.27        |
|    gen/train/n_updates             | 720         |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 11.1        |
----------------------------------------------------

round:  20%|█▉        | 12/61 [03:52<15:40, 19.19s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0198      |
|    gen/rollout/ep_rew_wrapped_mean | 212         |
|    gen/time/fps                    | 1781        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1212416     |
|    gen/train/approx_kl             | 0.023176638 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.727      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.98        |
|    gen/train/n_updates             | 730         |
|    gen/train/policy_gradient_loss  | -0.016      |
|    gen/train/value_loss            | 5.41        |
----------------------------------------------------

round:  21%|██▏       | 13/61 [04:12<15:39, 19.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.019       |
|    gen/rollout/ep_rew_wrapped_mean | 212         |
|    gen/time/fps                    | 1769        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1228800     |
|    gen/train/approx_kl             | 0.022313472 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.768      |
|    gen/train/explained_variance    | 0.885       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.39        |
|    gen/train/n_updates             | 740         |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 15.2        |
----------------------------------------------------

round:  23%|██▎       | 14/61 [04:31<15:13, 19.45s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00228     |
|    gen/rollout/ep_rew_wrapped_mean | 200         |
|    gen/time/fps                    | 1814        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1245184     |
|    gen/train/approx_kl             | 0.021611702 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.765      |
|    gen/train/explained_variance    | 0.922       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.8         |
|    gen/train/n_updates             | 750         |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 8.36        |
----------------------------------------------------

round:  25%|██▍       | 15/61 [04:50<14:46, 19.26s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00475     |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1758        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1261568     |
|    gen/train/approx_kl             | 0.020617347 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.748      |
|    gen/train/explained_variance    | 0.95        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.64        |
|    gen/train/n_updates             | 760         |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 4.25        |
----------------------------------------------------

round:  26%|██▌       | 16/61 [05:10<14:35, 19.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00622     |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1277952     |
|    gen/train/approx_kl             | 0.026022477 |
|    gen/train/clip_fraction         | 0.161       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.772      |
|    gen/train/explained_variance    | 0.948       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.69        |
|    gen/train/n_updates             | 770         |
|    gen/train/policy_gradient_loss  | -0.0121     |
|    gen/train/value_loss            | 6.62        |
----------------------------------------------------

round:  28%|██▊       | 17/61 [05:29<14:11, 19.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00716     |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1765        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1294336     |
|    gen/train/approx_kl             | 0.030357387 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.795      |
|    gen/train/explained_variance    | 0.923       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.59        |
|    gen/train/n_updates             | 780         |
|    gen/train/policy_gradient_loss  | -0.0175     |
|    gen/train/value_loss            | 5.19        |
----------------------------------------------------

round:  30%|██▉       | 18/61 [05:49<13:55, 19.42s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0119      |
|    gen/rollout/ep_rew_wrapped_mean | 167         |
|    gen/time/fps                    | 1869        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1310720     |
|    gen/train/approx_kl             | 0.046296388 |
|    gen/train/clip_fraction         | 0.243       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.838      |
|    gen/train/explained_variance    | 0.945       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.9         |
|    gen/train/n_updates             | 790         |
|    gen/train/policy_gradient_loss  | -0.0254     |
|    gen/train/value_loss            | 5.86        |
----------------------------------------------------

round:  31%|███       | 19/61 [06:08<13:31, 19.32s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 286        |
|    gen/rollout/ep_rew_mean         | 0.00991    |
|    gen/rollout/ep_rew_wrapped_mean | 174        |
|    gen/time/fps                    | 1763       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1327104    |
|    gen/train/approx_kl             | 0.03327711 |
|    gen/train/clip_fraction         | 0.21       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.72      |
|    gen/train/explained_variance    | 0.947      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 0.674      |
|    gen/train/n_updates             | 800        |
|    gen/train/policy_gradient_loss  | -0.0276    |
|    gen/train/value_loss            | 6.53       |
---------------------------------------------------

round:  33%|███▎      | 20/61 [06:28<13:18, 19.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0267      |
|    gen/rollout/ep_rew_wrapped_mean | 172         |
|    gen/time/fps                    | 1720        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1343488     |
|    gen/train/approx_kl             | 0.051801246 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.685      |
|    gen/train/explained_variance    | 0.941       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.927       |
|    gen/train/n_updates             | 810         |
|    gen/train/policy_gradient_loss  | -0.0195     |
|    gen/train/value_loss            | 6.35        |
----------------------------------------------------

round:  34%|███▍      | 21/61 [06:48<13:03, 19.58s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 276        |
|    gen/rollout/ep_rew_mean         | 0.0475     |
|    gen/rollout/ep_rew_wrapped_mean | 168        |
|    gen/time/fps                    | 1853       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 1359872    |
|    gen/train/approx_kl             | 0.04968017 |
|    gen/train/clip_fraction         | 0.201      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.752     |
|    gen/train/explained_variance    | 0.925      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 9.55       |
|    gen/train/n_updates             | 820        |
|    gen/train/policy_gradient_loss  | -0.0249    |
|    gen/train/value_loss            | 8.48       |
---------------------------------------------------

round:  36%|███▌      | 22/61 [07:07<12:37, 19.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0232      |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1824        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1376256     |
|    gen/train/approx_kl             | 0.024290849 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.751      |
|    gen/train/explained_variance    | 0.901       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 29.9        |
|    gen/train/n_updates             | 830         |
|    gen/train/policy_gradient_loss  | -0.0115     |
|    gen/train/value_loss            | 10          |
----------------------------------------------------

round:  38%|███▊      | 23/61 [07:26<12:12, 19.29s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00925     |
|    gen/rollout/ep_rew_wrapped_mean | 206         |
|    gen/time/fps                    | 1877        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1392640     |
|    gen/train/approx_kl             | 0.027136585 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.836      |
|    gen/train/explained_variance    | 0.864       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.24        |
|    gen/train/n_updates             | 840         |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 6.05        |
----------------------------------------------------

round:  39%|███▉      | 24/61 [07:44<11:49, 19.17s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 286        |
|    gen/rollout/ep_rew_mean         | 0.00791    |
|    gen/rollout/ep_rew_wrapped_mean | 210        |
|    gen/time/fps                    | 1902       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 1409024    |
|    gen/train/approx_kl             | 0.02111633 |
|    gen/train/clip_fraction         | 0.198      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.807     |
|    gen/train/explained_variance    | 0.845      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 1.29       |
|    gen/train/n_updates             | 850        |
|    gen/train/policy_gradient_loss  | -0.0142    |
|    gen/train/value_loss            | 5.96       |
---------------------------------------------------

round:  41%|████      | 25/61 [08:03<11:24, 19.00s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0198      |
|    gen/rollout/ep_rew_wrapped_mean | 199         |
|    gen/time/fps                    | 1899        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1425408     |
|    gen/train/approx_kl             | 0.023662688 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.789      |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.986       |
|    gen/train/n_updates             | 860         |
|    gen/train/policy_gradient_loss  | -0.016      |
|    gen/train/value_loss            | 3.52        |
----------------------------------------------------

round:  43%|████▎     | 26/61 [08:22<11:02, 18.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0148      |
|    gen/rollout/ep_rew_wrapped_mean | 187         |
|    gen/time/fps                    | 1910        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1441792     |
|    gen/train/approx_kl             | 0.023427842 |
|    gen/train/clip_fraction         | 0.194       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.809      |
|    gen/train/explained_variance    | 0.877       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.686       |
|    gen/train/n_updates             | 870         |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 6.72        |
----------------------------------------------------

round:  44%|████▍     | 27/61 [08:40<10:38, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.0085      |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1929        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1458176     |
|    gen/train/approx_kl             | 0.022092607 |
|    gen/train/clip_fraction         | 0.189       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.812      |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.53        |
|    gen/train/n_updates             | 880         |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 2.42        |
----------------------------------------------------

round:  46%|████▌     | 28/61 [08:58<10:13, 18.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.024       |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1894        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1474560     |
|    gen/train/approx_kl             | 0.020875696 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.726      |
|    gen/train/explained_variance    | 0.885       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.64        |
|    gen/train/n_updates             | 890         |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 5.18        |
----------------------------------------------------

round:  48%|████▊     | 29/61 [09:18<10:08, 19.01s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 285        |
|    gen/rollout/ep_rew_mean         | 0.0118     |
|    gen/rollout/ep_rew_wrapped_mean | 172        |
|    gen/time/fps                    | 1833       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 1490944    |
|    gen/train/approx_kl             | 0.03238321 |
|    gen/train/clip_fraction         | 0.186      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.687     |
|    gen/train/explained_variance    | 0.777      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 10.1       |
|    gen/train/n_updates             | 900        |
|    gen/train/policy_gradient_loss  | -0.0144    |
|    gen/train/value_loss            | 8.25       |
---------------------------------------------------

round:  49%|████▉     | 30/61 [09:37<09:46, 18.92s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 287        |
|    gen/rollout/ep_rew_mean         | 0.00656    |
|    gen/rollout/ep_rew_wrapped_mean | 177        |
|    gen/time/fps                    | 1834       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 1507328    |
|    gen/train/approx_kl             | 0.03493133 |
|    gen/train/clip_fraction         | 0.198      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.749     |
|    gen/train/explained_variance    | 0.873      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 1.8        |
|    gen/train/n_updates             | 910        |
|    gen/train/policy_gradient_loss  | -0.0183    |
|    gen/train/value_loss            | 3.22       |
---------------------------------------------------

round:  51%|█████     | 31/61 [09:56<09:27, 18.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00966     |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1829        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1523712     |
|    gen/train/approx_kl             | 0.025678748 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.729      |
|    gen/train/explained_variance    | 0.833       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.03        |
|    gen/train/n_updates             | 920         |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 5.86        |
----------------------------------------------------

round:  52%|█████▏    | 32/61 [10:15<09:08, 18.91s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 172         |
|    gen/time/fps                    | 1840        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1540096     |
|    gen/train/approx_kl             | 0.024019863 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.718      |
|    gen/train/explained_variance    | 0.875       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.22        |
|    gen/train/n_updates             | 930         |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 5.18        |
----------------------------------------------------

round:  54%|█████▍    | 33/61 [10:34<08:50, 18.94s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 286        |
|    gen/rollout/ep_rew_mean         | 0.0107     |
|    gen/rollout/ep_rew_wrapped_mean | 159        |
|    gen/time/fps                    | 1708       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1556480    |
|    gen/train/approx_kl             | 0.03156822 |
|    gen/train/clip_fraction         | 0.198      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.754     |
|    gen/train/explained_variance    | 0.937      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 1.38       |
|    gen/train/n_updates             | 940        |
|    gen/train/policy_gradient_loss  | -0.02      |
|    gen/train/value_loss            | 3.13       |
---------------------------------------------------

round:  56%|█████▌    | 34/61 [10:54<08:44, 19.42s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 285       |
|    gen/rollout/ep_rew_mean         | 0.0159    |
|    gen/rollout/ep_rew_wrapped_mean | 165       |
|    gen/time/fps                    | 1858      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 1572864   |
|    gen/train/approx_kl             | 0.0227618 |
|    gen/train/clip_fraction         | 0.176     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.715    |
|    gen/train/explained_variance    | 0.855     |
|    gen/train/learning_rate         | 0.00025   |
|    gen/train/loss                  | 1.18      |
|    gen/train/n_updates             | 950       |
|    gen/train/policy_gradient_loss  | -0.0143   |
|    gen/train/value_loss            | 7.46      |
--------------------------------------------------

round:  57%|█████▋    | 35/61 [11:13<08:20, 19.26s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0158      |
|    gen/rollout/ep_rew_wrapped_mean | 170         |
|    gen/time/fps                    | 1720        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1589248     |
|    gen/train/approx_kl             | 0.029090626 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.727      |
|    gen/train/explained_variance    | 0.826       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 10.5        |
|    gen/train/n_updates             | 960         |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 8.11        |
----------------------------------------------------

round:  59%|█████▉    | 36/61 [11:33<08:03, 19.32s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.0117      |
|    gen/rollout/ep_rew_wrapped_mean | 175         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1605632     |
|    gen/train/approx_kl             | 0.025975045 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.708      |
|    gen/train/explained_variance    | 0.811       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.86        |
|    gen/train/n_updates             | 970         |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 9.05        |
----------------------------------------------------

round:  61%|██████    | 37/61 [11:53<07:48, 19.52s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 288         |
|    gen/rollout/ep_rew_mean         | 0.00206     |
|    gen/rollout/ep_rew_wrapped_mean | 185         |
|    gen/time/fps                    | 1760        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1622016     |
|    gen/train/approx_kl             | 0.028113246 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.686      |
|    gen/train/explained_variance    | 0.851       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.77        |
|    gen/train/n_updates             | 980         |
|    gen/train/policy_gradient_loss  | -0.0158     |
|    gen/train/value_loss            | 9.52        |
----------------------------------------------------

round:  62%|██████▏   | 38/61 [12:12<07:27, 19.44s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 283        |
|    gen/rollout/ep_rew_mean         | 0.0199     |
|    gen/rollout/ep_rew_wrapped_mean | 189        |
|    gen/time/fps                    | 1724       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1638400    |
|    gen/train/approx_kl             | 0.02503546 |
|    gen/train/clip_fraction         | 0.169      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.62      |
|    gen/train/explained_variance    | 0.907      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 0.967      |
|    gen/train/n_updates             | 990        |
|    gen/train/policy_gradient_loss  | -0.0151    |
|    gen/train/value_loss            | 5.39       |
---------------------------------------------------

round:  64%|██████▍   | 39/61 [12:32<07:10, 19.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 280         |
|    gen/rollout/ep_rew_mean         | 0.0327      |
|    gen/rollout/ep_rew_wrapped_mean | 198         |
|    gen/time/fps                    | 1666        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1654784     |
|    gen/train/approx_kl             | 0.019643323 |
|    gen/train/clip_fraction         | 0.162       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.637      |
|    gen/train/explained_variance    | 0.865       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.6         |
|    gen/train/n_updates             | 1000        |
|    gen/train/policy_gradient_loss  | -0.0142     |
|    gen/train/value_loss            | 10.6        |
----------------------------------------------------

round:  66%|██████▌   | 40/61 [12:53<06:58, 19.92s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 282        |
|    gen/rollout/ep_rew_mean         | 0.0223     |
|    gen/rollout/ep_rew_wrapped_mean | 202        |
|    gen/time/fps                    | 1787       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1671168    |
|    gen/train/approx_kl             | 0.02907113 |
|    gen/train/clip_fraction         | 0.197      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.694     |
|    gen/train/explained_variance    | 0.878      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 1.3        |
|    gen/train/n_updates             | 1010       |
|    gen/train/policy_gradient_loss  | -0.0186    |
|    gen/train/value_loss            | 12.3       |
---------------------------------------------------

round:  67%|██████▋   | 41/61 [13:12<06:37, 19.86s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0133      |
|    gen/rollout/ep_rew_wrapped_mean | 184         |
|    gen/time/fps                    | 1738        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1687552     |
|    gen/train/approx_kl             | 0.025374018 |
|    gen/train/clip_fraction         | 0.203       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.67       |
|    gen/train/explained_variance    | 0.872       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.5         |
|    gen/train/n_updates             | 1020        |
|    gen/train/policy_gradient_loss  | -0.0183     |
|    gen/train/value_loss            | 7.57        |
----------------------------------------------------

round:  69%|██████▉   | 42/61 [13:32<06:17, 19.84s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 287        |
|    gen/rollout/ep_rew_mean         | 0.00556    |
|    gen/rollout/ep_rew_wrapped_mean | 173        |
|    gen/time/fps                    | 1785       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1703936    |
|    gen/train/approx_kl             | 0.03162779 |
|    gen/train/clip_fraction         | 0.205      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.675     |
|    gen/train/explained_variance    | 0.867      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.35       |
|    gen/train/n_updates             | 1030       |
|    gen/train/policy_gradient_loss  | -0.0176    |
|    gen/train/value_loss            | 6.1        |
---------------------------------------------------

round:  70%|███████   | 43/61 [13:53<05:59, 19.98s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0226      |
|    gen/rollout/ep_rew_wrapped_mean | 168         |
|    gen/time/fps                    | 1714        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1720320     |
|    gen/train/approx_kl             | 0.022411723 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.617      |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.81        |
|    gen/train/n_updates             | 1040        |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 3.83        |
----------------------------------------------------

round:  72%|███████▏  | 44/61 [14:13<05:41, 20.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0408      |
|    gen/rollout/ep_rew_wrapped_mean | 172         |
|    gen/time/fps                    | 1783        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1736704     |
|    gen/train/approx_kl             | 0.021590946 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.609      |
|    gen/train/explained_variance    | 0.812       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 12.3        |
|    gen/train/n_updates             | 1050        |
|    gen/train/policy_gradient_loss  | -0.0128     |
|    gen/train/value_loss            | 11.9        |
----------------------------------------------------

round:  74%|███████▍  | 45/61 [14:33<05:21, 20.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0408      |
|    gen/rollout/ep_rew_wrapped_mean | 187         |
|    gen/time/fps                    | 1625        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1753088     |
|    gen/train/approx_kl             | 0.028784385 |
|    gen/train/clip_fraction         | 0.2         |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.612      |
|    gen/train/explained_variance    | 0.742       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.97        |
|    gen/train/n_updates             | 1060        |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:  75%|███████▌  | 46/61 [14:53<05:01, 20.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0196      |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1826        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1769472     |
|    gen/train/approx_kl             | 0.020928215 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.611      |
|    gen/train/explained_variance    | 0.747       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 23.1        |
|    gen/train/n_updates             | 1070        |
|    gen/train/policy_gradient_loss  | -0.011      |
|    gen/train/value_loss            | 9.79        |
----------------------------------------------------

round:  77%|███████▋  | 47/61 [15:12<04:36, 19.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 287         |
|    gen/rollout/ep_rew_mean         | 0.00591     |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1819        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1785856     |
|    gen/train/approx_kl             | 0.023066798 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.595      |
|    gen/train/explained_variance    | 0.855       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.69        |
|    gen/train/n_updates             | 1080        |
|    gen/train/policy_gradient_loss  | -0.0122     |
|    gen/train/value_loss            | 8.47        |
----------------------------------------------------

round:  79%|███████▊  | 48/61 [15:31<04:14, 19.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 286         |
|    gen/rollout/ep_rew_mean         | 0.00791     |
|    gen/rollout/ep_rew_wrapped_mean | 191         |
|    gen/time/fps                    | 1747        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1802240     |
|    gen/train/approx_kl             | 0.025357805 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.64       |
|    gen/train/explained_variance    | 0.849       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.56        |
|    gen/train/n_updates             | 1090        |
|    gen/train/policy_gradient_loss  | -0.019      |
|    gen/train/value_loss            | 5.61        |
----------------------------------------------------

round:  80%|████████  | 49/61 [15:51<03:57, 19.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0111      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1709        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1818624     |
|    gen/train/approx_kl             | 0.021786347 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.682      |
|    gen/train/explained_variance    | 0.878       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.13        |
|    gen/train/n_updates             | 1100        |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 6.23        |
----------------------------------------------------

round:  82%|████████▏ | 50/61 [16:11<03:38, 19.85s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0328      |
|    gen/rollout/ep_rew_wrapped_mean | 178         |
|    gen/time/fps                    | 1851        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1835008     |
|    gen/train/approx_kl             | 0.026863497 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.654      |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.52        |
|    gen/train/n_updates             | 1110        |
|    gen/train/policy_gradient_loss  | -0.0193     |
|    gen/train/value_loss            | 6.37        |
----------------------------------------------------

round:  84%|████████▎ | 51/61 [16:30<03:15, 19.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0418      |
|    gen/rollout/ep_rew_wrapped_mean | 173         |
|    gen/time/fps                    | 1771        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1851392     |
|    gen/train/approx_kl             | 0.025523223 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.662      |
|    gen/train/explained_variance    | 0.852       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.4         |
|    gen/train/n_updates             | 1120        |
|    gen/train/policy_gradient_loss  | -0.0148     |
|    gen/train/value_loss            | 8.66        |
----------------------------------------------------

round:  85%|████████▌ | 52/61 [16:50<02:55, 19.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0124      |
|    gen/rollout/ep_rew_wrapped_mean | 167         |
|    gen/time/fps                    | 1849        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1867776     |
|    gen/train/approx_kl             | 0.023053579 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.663      |
|    gen/train/explained_variance    | 0.831       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 0.866       |
|    gen/train/n_updates             | 1130        |
|    gen/train/policy_gradient_loss  | -0.0129     |
|    gen/train/value_loss            | 7.65        |
----------------------------------------------------

round:  87%|████████▋ | 53/61 [17:09<02:36, 19.54s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0264      |
|    gen/rollout/ep_rew_wrapped_mean | 178         |
|    gen/time/fps                    | 1825        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1884160     |
|    gen/train/approx_kl             | 0.039447743 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.682      |
|    gen/train/explained_variance    | 0.732       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.06        |
|    gen/train/n_updates             | 1140        |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 7.71        |
----------------------------------------------------

round:  89%|████████▊ | 54/61 [17:29<02:16, 19.46s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 277        |
|    gen/rollout/ep_rew_mean         | 0.0412     |
|    gen/rollout/ep_rew_wrapped_mean | 195        |
|    gen/time/fps                    | 1804       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1900544    |
|    gen/train/approx_kl             | 0.02474212 |
|    gen/train/clip_fraction         | 0.185      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.721     |
|    gen/train/explained_variance    | 0.777      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.55       |
|    gen/train/n_updates             | 1150       |
|    gen/train/policy_gradient_loss  | -0.0135    |
|    gen/train/value_loss            | 15.2       |
---------------------------------------------------

round:  90%|█████████ | 55/61 [17:48<01:57, 19.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.043       |
|    gen/rollout/ep_rew_wrapped_mean | 193         |
|    gen/time/fps                    | 1857        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1916928     |
|    gen/train/approx_kl             | 0.026973126 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.7        |
|    gen/train/explained_variance    | 0.795       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.44        |
|    gen/train/n_updates             | 1160        |
|    gen/train/policy_gradient_loss  | -0.0118     |
|    gen/train/value_loss            | 17.9        |
----------------------------------------------------

round:  92%|█████████▏| 56/61 [18:07<01:37, 19.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 276         |
|    gen/rollout/ep_rew_mean         | 0.0518      |
|    gen/rollout/ep_rew_wrapped_mean | 165         |
|    gen/time/fps                    | 1789        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1933312     |
|    gen/train/approx_kl             | 0.035318032 |
|    gen/train/clip_fraction         | 0.239       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.672      |
|    gen/train/explained_variance    | 0.762       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.89        |
|    gen/train/n_updates             | 1170        |
|    gen/train/policy_gradient_loss  | -0.0221     |
|    gen/train/value_loss            | 12.7        |
----------------------------------------------------

round:  93%|█████████▎| 57/61 [18:27<01:17, 19.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0423      |
|    gen/rollout/ep_rew_wrapped_mean | 176         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1949696     |
|    gen/train/approx_kl             | 0.026143989 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.641      |
|    gen/train/explained_variance    | 0.747       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.77        |
|    gen/train/n_updates             | 1180        |
|    gen/train/policy_gradient_loss  | -0.0122     |
|    gen/train/value_loss            | 18.8        |
----------------------------------------------------

round:  95%|█████████▌| 58/61 [18:46<00:58, 19.35s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 280        |
|    gen/rollout/ep_rew_mean         | 0.0323     |
|    gen/rollout/ep_rew_wrapped_mean | 198        |
|    gen/time/fps                    | 1758       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1966080    |
|    gen/train/approx_kl             | 0.02141019 |
|    gen/train/clip_fraction         | 0.184      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.649     |
|    gen/train/explained_variance    | 0.728      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.08       |
|    gen/train/n_updates             | 1190       |
|    gen/train/policy_gradient_loss  | -0.0145    |
|    gen/train/value_loss            | 18         |
---------------------------------------------------

round:  97%|█████████▋| 59/61 [19:05<00:38, 19.36s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 280       |
|    gen/rollout/ep_rew_mean         | 0.0329    |
|    gen/rollout/ep_rew_wrapped_mean | 210       |
|    gen/time/fps                    | 1762      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 9         |
|    gen/time/total_timesteps        | 1982464   |
|    gen/train/approx_kl             | 0.0212855 |
|    gen/train/clip_fraction         | 0.16      |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.608    |
|    gen/train/explained_variance    | 0.852     |
|    gen/train/learning_rate         | 0.00025   |
|    gen/train/loss                  | 2.99      |
|    gen/train/n_updates             | 1200      |
|    gen/train/policy_gradient_loss  | -0.0127   |
|    gen/train/value_loss            | 19.6      |
--------------------------------------------------

round:  98%|█████████▊| 60/61 [19:25<00:19, 19.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0718      |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1811        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1998848     |
|    gen/train/approx_kl             | 0.039381087 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.663      |
|    gen/train/explained_variance    | 0.847       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.98        |
|    gen/train/n_updates             | 1210        |
|    gen/train/policy_gradient_loss  | -0.0151     |
|    gen/train/value_loss            | 10.7        |
----------------------------------------------------

round: 100%|██████████| 61/61 [19:44<00:00, 19.42s/it]
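
The progress bar above finishes at 61 rounds, and consecutive log tables advance `gen/time/total_timesteps` by a fixed amount (for example 1982464 → 1998848). The sketch below works through that arithmetic. It assumes the trainer runs as many whole rounds as fit in the requested timestep budget (floor division); that scheduling rule is an assumption, not something the logs confirm, and `rounds_for_budget` is a hypothetical helper rather than a library function.

```python
# Two consecutive total_timesteps values copied from the log tables above.
timesteps_before = 1_982_464
timesteps_after = 1_998_848

# Timesteps collected by the generator in a single round.
steps_per_round = timesteps_after - timesteps_before


def rounds_for_budget(total_timesteps: int, per_round: int) -> int:
    """Hypothetical helper: whole rounds that fit in a timestep budget.

    Assumes floor division, so any remainder smaller than one round is dropped.
    """
    return total_timesteps // per_round


print(steps_per_round)                                  # 16384
print(rounds_for_budget(2_000_000, steps_per_round))    # 122
print(61 * steps_per_round)                             # 999424 steps in 61 rounds
```

Under this assumption, 61 rounds cover 999424 timesteps, and a budget of 2_000_000 yields 122 rounds, which is consistent with the `0/122` progress bar printed by `gail_trainer.train(2_000_000)`.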


In [273]:
compute_score(task=env, policy=second_policy)

Best score: 0.638
Average score: (0.175, 0.268)


array([0.      , 0.      , 0.      , 0.      , 0.534375, 0.575   ,
       0.6375  , 0.      , 0.      , 0.      ])
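
The summary printed by `compute_score` can be checked directly against the returned array. The sketch below recomputes the mean and standard deviation with the standard library (`statistics.pstdev` is the population standard deviation, the same convention as NumPy's default `std`). It also converts each non-zero score back into an episode length, assuming Minigrid's success reward of `1 - 0.9 * (steps / max_steps)` and `max_steps = 288`. Both are assumptions about the environment configuration, and `MAX_STEPS` is a name introduced here for illustration only.

```python
import statistics

# Per-episode scores copied from the compute_score output above.
scores = [0.0, 0.0, 0.0, 0.0, 0.534375, 0.575, 0.6375, 0.0, 0.0, 0.0]

mean_score = round(statistics.fmean(scores), 3)
std_score = round(statistics.pstdev(scores), 3)  # population std (ddof = 0)
solved = sum(1 for s in scores if s > 0)

print(mean_score, std_score)             # 0.175 0.268
print(f"{solved}/{len(scores)} solved")  # 3/10 solved

# Assumption: reward on success is 1 - 0.9 * (steps / MAX_STEPS), MAX_STEPS = 288.
MAX_STEPS = 288
steps_taken = [round((1 - s) / 0.9 * MAX_STEPS) for s in scores if s > 0]
print(steps_taken)                       # [149, 136, 116]
```

Only 3 of the 10 evaluation seeds are solved, so the 0.175 average mostly reflects the seven zero-reward episodes rather than slow solutions. Under the stated assumptions, the solved episodes took roughly 116 to 149 steps.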

In [274]:
gail_trainer.train(2_000_000)

round:   0%|          | 0/122 [00:00<?, ?it/s]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 271         |
|    gen/rollout/ep_rew_mean         | 0.0715      |
|    gen/rollout/ep_rew_wrapped_mean | 180         |
|    gen/time/fps                    | 1806        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2015232     |
|    gen/train/approx_kl             | 0.026362428 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.681      |
|    gen/train/explained_variance    | 0.809       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.99        |
|    gen/train/n_updates             | 1220        |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 24          |
----------------------------------------------------

round:   1%|          | 1/122 [00:20<41:46, 20.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 274         |
|    gen/rollout/ep_rew_mean         | 0.06        |
|    gen/rollout/ep_rew_wrapped_mean | 188         |
|    gen/time/fps                    | 1690        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2031616     |
|    gen/train/approx_kl             | 0.032290228 |
|    gen/train/clip_fraction         | 0.207       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.737      |
|    gen/train/explained_variance    | 0.866       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.48        |
|    gen/train/n_updates             | 1230        |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 20.1        |
----------------------------------------------------

round:   2%|▏         | 2/122 [00:41<41:33, 20.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 267         |
|    gen/rollout/ep_rew_mean         | 0.084       |
|    gen/rollout/ep_rew_wrapped_mean | 218         |
|    gen/time/fps                    | 1718        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2048000     |
|    gen/train/approx_kl             | 0.021606535 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.701      |
|    gen/train/explained_variance    | 0.801       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.61        |
|    gen/train/n_updates             | 1240        |
|    gen/train/policy_gradient_loss  | -0.0115     |
|    gen/train/value_loss            | 20.9        |
----------------------------------------------------

round:   2%|▏         | 3/122 [01:01<40:49, 20.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.0588      |
|    gen/rollout/ep_rew_wrapped_mean | 199         |
|    gen/time/fps                    | 1794        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2064384     |
|    gen/train/approx_kl             | 0.023133406 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.757      |
|    gen/train/explained_variance    | 0.87        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.58        |
|    gen/train/n_updates             | 1250        |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 22.8        |
----------------------------------------------------

round:   3%|▎         | 4/122 [01:21<39:37, 20.15s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 281        |
|    gen/rollout/ep_rew_mean         | 0.0281     |
|    gen/rollout/ep_rew_wrapped_mean | 154        |
|    gen/time/fps                    | 1782       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2080768    |
|    gen/train/approx_kl             | 0.03988814 |
|    gen/train/clip_fraction         | 0.251      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.767     |
|    gen/train/explained_variance    | 0.914      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.77       |
|    gen/train/n_updates             | 1260       |
|    gen/train/policy_gradient_loss  | -0.02      |
|    gen/train/value_loss            | 9.63       |
---------------------------------------------------

round:   4%|▍         | 5/122 [01:41<39:27, 20.23s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0459      |
|    gen/rollout/ep_rew_wrapped_mean | 147         |
|    gen/time/fps                    | 1735        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2097152     |
|    gen/train/approx_kl             | 0.026050124 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.709      |
|    gen/train/explained_variance    | 0.919       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.96        |
|    gen/train/n_updates             | 1270        |
|    gen/train/policy_gradient_loss  | -0.0187     |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:   5%|▍         | 6/122 [02:02<39:31, 20.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0516      |
|    gen/rollout/ep_rew_wrapped_mean | 150         |
|    gen/time/fps                    | 1782        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2113536     |
|    gen/train/approx_kl             | 0.027452815 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.711      |
|    gen/train/explained_variance    | 0.903       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 15.2        |
|    gen/train/n_updates             | 1280        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 12.6        |
----------------------------------------------------

round:   6%|▌         | 7/122 [02:22<38:58, 20.33s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0263      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2129920     |
|    gen/train/approx_kl             | 0.025607388 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.713      |
|    gen/train/explained_variance    | 0.87        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.16        |
|    gen/train/n_updates             | 1290        |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 15.2        |
----------------------------------------------------

round:   7%|▋         | 8/122 [02:42<38:05, 20.05s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0249      |
|    gen/rollout/ep_rew_wrapped_mean | 187         |
|    gen/time/fps                    | 1836        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2146304     |
|    gen/train/approx_kl             | 0.026835658 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.655      |
|    gen/train/explained_variance    | 0.927       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.61        |
|    gen/train/n_updates             | 1300        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 6.89        |
----------------------------------------------------

round:   7%|▋         | 9/122 [03:01<37:20, 19.82s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0459      |
|    gen/rollout/ep_rew_wrapped_mean | 188         |
|    gen/time/fps                    | 1820        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2162688     |
|    gen/train/approx_kl             | 0.033318453 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.67       |
|    gen/train/explained_variance    | 0.922       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 38.3        |
|    gen/train/n_updates             | 1310        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 15          |
----------------------------------------------------

round:   8%|▊         | 10/122 [03:20<36:46, 19.70s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0516      |
|    gen/rollout/ep_rew_wrapped_mean | 187         |
|    gen/time/fps                    | 1785        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2179072     |
|    gen/train/approx_kl             | 0.025084898 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.654      |
|    gen/train/explained_variance    | 0.889       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 20.7        |
|    gen/train/n_updates             | 1320        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 17.4        |
----------------------------------------------------

round:   9%|▉         | 11/122 [03:40<36:29, 19.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0314      |
|    gen/rollout/ep_rew_wrapped_mean | 203         |
|    gen/time/fps                    | 1749        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2195456     |
|    gen/train/approx_kl             | 0.023923261 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.623      |
|    gen/train/explained_variance    | 0.919       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.78        |
|    gen/train/n_updates             | 1330        |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 15          |
----------------------------------------------------

round:  10%|▉         | 12/122 [04:00<36:27, 19.88s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0124      |
|    gen/rollout/ep_rew_wrapped_mean | 229         |
|    gen/time/fps                    | 1758        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2211840     |
|    gen/train/approx_kl             | 0.024779882 |
|    gen/train/clip_fraction         | 0.177       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.633      |
|    gen/train/explained_variance    | 0.932       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.67        |
|    gen/train/n_updates             | 1340        |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 16.4        |
----------------------------------------------------

round:  11%|█         | 13/122 [04:20<36:05, 19.87s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0246      |
|    gen/rollout/ep_rew_wrapped_mean | 255         |
|    gen/time/fps                    | 1748        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2228224     |
|    gen/train/approx_kl             | 0.025221653 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.673      |
|    gen/train/explained_variance    | 0.95        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.22        |
|    gen/train/n_updates             | 1350        |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 13.9        |
----------------------------------------------------

round:  11%|█▏        | 14/122 [04:40<35:57, 19.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0514      |
|    gen/rollout/ep_rew_wrapped_mean | 228         |
|    gen/time/fps                    | 1693        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2244608     |
|    gen/train/approx_kl             | 0.031382985 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.679      |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 11.8        |
|    gen/train/n_updates             | 1360        |
|    gen/train/policy_gradient_loss  | -0.0198     |
|    gen/train/value_loss            | 10.9        |
----------------------------------------------------

round:  12%|█▏        | 15/122 [05:01<35:49, 20.09s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 273        |
|    gen/rollout/ep_rew_mean         | 0.0587     |
|    gen/rollout/ep_rew_wrapped_mean | 201        |
|    gen/time/fps                    | 1805       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2260992    |
|    gen/train/approx_kl             | 0.02244907 |
|    gen/train/clip_fraction         | 0.18       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.651     |
|    gen/train/explained_variance    | 0.962      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 6.86       |
|    gen/train/n_updates             | 1370       |
|    gen/train/policy_gradient_loss  | -0.0165    |
|    gen/train/value_loss            | 24         |
---------------------------------------------------

round:  13%|█▎        | 16/122 [05:21<35:21, 20.01s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 272         |
|    gen/rollout/ep_rew_mean         | 0.0619      |
|    gen/rollout/ep_rew_wrapped_mean | 210         |
|    gen/time/fps                    | 1742        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2277376     |
|    gen/train/approx_kl             | 0.020563696 |
|    gen/train/clip_fraction         | 0.187       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.724      |
|    gen/train/explained_variance    | 0.954       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.23        |
|    gen/train/n_updates             | 1380        |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 25.7        |
----------------------------------------------------

round:  14%|█▍        | 17/122 [05:41<35:16, 20.16s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 280         |
|    gen/rollout/ep_rew_mean         | 0.0312      |
|    gen/rollout/ep_rew_wrapped_mean | 221         |
|    gen/time/fps                    | 1705        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2293760     |
|    gen/train/approx_kl             | 0.023644987 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.738      |
|    gen/train/explained_variance    | 0.961       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.43        |
|    gen/train/n_updates             | 1390        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 19.5        |
----------------------------------------------------

round:  15%|█▍        | 18/122 [06:01<34:59, 20.18s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 276         |
|    gen/rollout/ep_rew_mean         | 0.0464      |
|    gen/rollout/ep_rew_wrapped_mean | 216         |
|    gen/time/fps                    | 1832        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2310144     |
|    gen/train/approx_kl             | 0.021172615 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.674      |
|    gen/train/explained_variance    | 0.976       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.25        |
|    gen/train/n_updates             | 1400        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 11          |
----------------------------------------------------

round:  16%|█▌        | 19/122 [06:20<34:02, 19.83s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 272         |
|    gen/rollout/ep_rew_mean         | 0.0604      |
|    gen/rollout/ep_rew_wrapped_mean | 207         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2326528     |
|    gen/train/approx_kl             | 0.021796776 |
|    gen/train/clip_fraction         | 0.168       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.646      |
|    gen/train/explained_variance    | 0.983       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.38        |
|    gen/train/n_updates             | 1410        |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 14.4        |
----------------------------------------------------

round:  16%|█▋        | 20/122 [06:40<33:26, 19.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0411      |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1809        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2342912     |
|    gen/train/approx_kl             | 0.026723891 |
|    gen/train/clip_fraction         | 0.17        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.617      |
|    gen/train/explained_variance    | 0.974       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.21        |
|    gen/train/n_updates             | 1420        |
|    gen/train/policy_gradient_loss  | -0.0166     |
|    gen/train/value_loss            | 14          |
----------------------------------------------------

round:  17%|█▋        | 21/122 [07:00<33:27, 19.88s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0232      |
|    gen/rollout/ep_rew_wrapped_mean | 180         |
|    gen/time/fps                    | 1649        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2359296     |
|    gen/train/approx_kl             | 0.018946208 |
|    gen/train/clip_fraction         | 0.144       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.621      |
|    gen/train/explained_variance    | 0.957       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.25        |
|    gen/train/n_updates             | 1430        |
|    gen/train/policy_gradient_loss  | -0.00911    |
|    gen/train/value_loss            | 14.9        |
----------------------------------------------------

round:  18%|█▊        | 22/122 [07:21<33:33, 20.14s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0202      |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1773        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2375680     |
|    gen/train/approx_kl             | 0.022764312 |
|    gen/train/clip_fraction         | 0.181       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.649      |
|    gen/train/explained_variance    | 0.953       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.49        |
|    gen/train/n_updates             | 1440        |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 15.9        |
----------------------------------------------------

round:  19%|█▉        | 23/122 [07:41<33:02, 20.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 284         |
|    gen/rollout/ep_rew_mean         | 0.0158      |
|    gen/rollout/ep_rew_wrapped_mean | 185         |
|    gen/time/fps                    | 1759        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2392064     |
|    gen/train/approx_kl             | 0.027780741 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.624      |
|    gen/train/explained_variance    | 0.944       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.46        |
|    gen/train/n_updates             | 1450        |
|    gen/train/policy_gradient_loss  | -0.0219     |
|    gen/train/value_loss            | 12.4        |
----------------------------------------------------

round:  20%|█▉        | 24/122 [08:00<32:33, 19.94s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 285         |
|    gen/rollout/ep_rew_mean         | 0.0118      |
|    gen/rollout/ep_rew_wrapped_mean | 195         |
|    gen/time/fps                    | 1786        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2408448     |
|    gen/train/approx_kl             | 0.026172318 |
|    gen/train/clip_fraction         | 0.19        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.662      |
|    gen/train/explained_variance    | 0.968       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.8         |
|    gen/train/n_updates             | 1460        |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 8.63        |
----------------------------------------------------

round:  20%|██        | 25/122 [08:20<32:06, 19.86s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0193      |
|    gen/rollout/ep_rew_wrapped_mean | 190         |
|    gen/time/fps                    | 1777        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2424832     |
|    gen/train/approx_kl             | 0.027471013 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.649      |
|    gen/train/explained_variance    | 0.954       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.82        |
|    gen/train/n_updates             | 1470        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 8.46        |
----------------------------------------------------

round:  21%|██▏       | 26/122 [08:40<31:39, 19.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.025       |
|    gen/rollout/ep_rew_wrapped_mean | 194         |
|    gen/time/fps                    | 1718        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2441216     |
|    gen/train/approx_kl             | 0.028760036 |
|    gen/train/clip_fraction         | 0.186       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.66       |
|    gen/train/explained_variance    | 0.958       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 37.8        |
|    gen/train/n_updates             | 1480        |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 19.6        |
----------------------------------------------------

round:  22%|██▏       | 27/122 [09:00<31:34, 19.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0258      |
|    gen/rollout/ep_rew_wrapped_mean | 206         |
|    gen/time/fps                    | 1769        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2457600     |
|    gen/train/approx_kl             | 0.022477549 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.677      |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.57        |
|    gen/train/n_updates             | 1490        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 17.6        |
----------------------------------------------------

round:  23%|██▎       | 28/122 [09:20<31:08, 19.88s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0448      |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1790        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2473984     |
|    gen/train/approx_kl             | 0.021661665 |
|    gen/train/clip_fraction         | 0.181       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.684      |
|    gen/train/explained_variance    | 0.981       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 13.7        |
|    gen/train/n_updates             | 1500        |
|    gen/train/policy_gradient_loss  | -0.0129     |
|    gen/train/value_loss            | 9.31        |
----------------------------------------------------

round:  24%|██▍       | 29/122 [09:39<30:39, 19.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 272         |
|    gen/rollout/ep_rew_mean         | 0.0634      |
|    gen/rollout/ep_rew_wrapped_mean | 189         |
|    gen/time/fps                    | 1803        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2490368     |
|    gen/train/approx_kl             | 0.021045182 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.699      |
|    gen/train/explained_variance    | 0.975       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.25        |
|    gen/train/n_updates             | 1510        |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 20.1        |
----------------------------------------------------

round:  25%|██▍       | 30/122 [09:59<30:15, 19.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0702      |
|    gen/rollout/ep_rew_wrapped_mean | 176         |
|    gen/time/fps                    | 1779        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2506752     |
|    gen/train/approx_kl             | 0.027086142 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.648      |
|    gen/train/explained_variance    | 0.973       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.69        |
|    gen/train/n_updates             | 1520        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 11.6        |
----------------------------------------------------

round:  25%|██▌       | 31/122 [10:18<29:50, 19.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0437      |
|    gen/rollout/ep_rew_wrapped_mean | 158         |
|    gen/time/fps                    | 1762        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2523136     |
|    gen/train/approx_kl             | 0.019907016 |
|    gen/train/clip_fraction         | 0.171       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.704      |
|    gen/train/explained_variance    | 0.92        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.56        |
|    gen/train/n_updates             | 1530        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 18.3        |
----------------------------------------------------

round:  26%|██▌       | 32/122 [10:39<29:43, 19.82s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 277        |
|    gen/rollout/ep_rew_mean         | 0.042      |
|    gen/rollout/ep_rew_wrapped_mean | 165        |
|    gen/time/fps                    | 1635       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 10         |
|    gen/time/total_timesteps        | 2539520    |
|    gen/train/approx_kl             | 0.02182556 |
|    gen/train/clip_fraction         | 0.162      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.643     |
|    gen/train/explained_variance    | 0.938      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 9.5        |
|    gen/train/n_updates             | 1540       |
|    gen/train/policy_gradient_loss  | -0.0129    |
|    gen/train/value_loss            | 16.1       |
---------------------------------------------------

round:  27%|██▋       | 33/122 [10:59<29:53, 20.15s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0366      |
|    gen/rollout/ep_rew_wrapped_mean | 169         |
|    gen/time/fps                    | 1692        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2555904     |
|    gen/train/approx_kl             | 0.026485149 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.637      |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.41        |
|    gen/train/n_updates             | 1550        |
|    gen/train/policy_gradient_loss  | -0.016      |
|    gen/train/value_loss            | 13.2        |
----------------------------------------------------

round:  28%|██▊       | 34/122 [11:20<29:36, 20.19s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0332      |
|    gen/rollout/ep_rew_wrapped_mean | 171         |
|    gen/time/fps                    | 1713        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2572288     |
|    gen/train/approx_kl             | 0.023733769 |
|    gen/train/clip_fraction         | 0.167       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.606      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.67        |
|    gen/train/n_updates             | 1560        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 12.7        |
----------------------------------------------------

round:  29%|██▊       | 35/122 [11:40<29:09, 20.11s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 282        |
|    gen/rollout/ep_rew_mean         | 0.0241     |
|    gen/rollout/ep_rew_wrapped_mean | 193        |
|    gen/time/fps                    | 1854       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2588672    |
|    gen/train/approx_kl             | 0.02538412 |
|    gen/train/clip_fraction         | 0.176      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.567     |
|    gen/train/explained_variance    | 0.918      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.88       |
|    gen/train/n_updates             | 1570       |
|    gen/train/policy_gradient_loss  | -0.0178    |
|    gen/train/value_loss            | 16.8       |
---------------------------------------------------

round:  30%|██▉       | 36/122 [11:58<28:06, 19.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 280         |
|    gen/rollout/ep_rew_mean         | 0.0332      |
|    gen/rollout/ep_rew_wrapped_mean | 205         |
|    gen/time/fps                    | 1854        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2605056     |
|    gen/train/approx_kl             | 0.024004899 |
|    gen/train/clip_fraction         | 0.167       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.619      |
|    gen/train/explained_variance    | 0.869       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 6.01        |
|    gen/train/n_updates             | 1580        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 14.1        |
----------------------------------------------------

round:  30%|███       | 37/122 [12:17<27:19, 19.29s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0476      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1834        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2621440     |
|    gen/train/approx_kl             | 0.021257939 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.657      |
|    gen/train/explained_variance    | 0.877       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.62        |
|    gen/train/n_updates             | 1590        |
|    gen/train/policy_gradient_loss  | -0.0123     |
|    gen/train/value_loss            | 17.7        |
----------------------------------------------------

round:  31%|███       | 38/122 [12:36<26:50, 19.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 268         |
|    gen/rollout/ep_rew_mean         | 0.0795      |
|    gen/rollout/ep_rew_wrapped_mean | 140         |
|    gen/time/fps                    | 1846        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2637824     |
|    gen/train/approx_kl             | 0.023395576 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.683      |
|    gen/train/explained_variance    | 0.913       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.77        |
|    gen/train/n_updates             | 1600        |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 13.8        |
----------------------------------------------------

round:  32%|███▏      | 39/122 [12:54<26:18, 19.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 263         |
|    gen/rollout/ep_rew_mean         | 0.0973      |
|    gen/rollout/ep_rew_wrapped_mean | 130         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2654208     |
|    gen/train/approx_kl             | 0.020796396 |
|    gen/train/clip_fraction         | 0.159       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.664      |
|    gen/train/explained_variance    | 0.784       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.61        |
|    gen/train/n_updates             | 1610        |
|    gen/train/policy_gradient_loss  | -0.0167     |
|    gen/train/value_loss            | 26.3        |
----------------------------------------------------

round:  33%|███▎      | 40/122 [13:13<25:53, 18.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 259         |
|    gen/rollout/ep_rew_mean         | 0.113       |
|    gen/rollout/ep_rew_wrapped_mean | 157         |
|    gen/time/fps                    | 1815        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2670592     |
|    gen/train/approx_kl             | 0.025763731 |
|    gen/train/clip_fraction         | 0.162       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.629      |
|    gen/train/explained_variance    | 0.797       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.64        |
|    gen/train/n_updates             | 1620        |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 21.4        |
----------------------------------------------------

round:  34%|███▎      | 41/122 [13:32<25:31, 18.91s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 263         |
|    gen/rollout/ep_rew_mean         | 0.0972      |
|    gen/rollout/ep_rew_wrapped_mean | 176         |
|    gen/time/fps                    | 1868        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2686976     |
|    gen/train/approx_kl             | 0.021040678 |
|    gen/train/clip_fraction         | 0.171       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.67       |
|    gen/train/explained_variance    | 0.784       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 26.1        |
|    gen/train/n_updates             | 1630        |
|    gen/train/policy_gradient_loss  | -0.0149     |
|    gen/train/value_loss            | 30.5        |
----------------------------------------------------

round:  34%|███▍      | 42/122 [13:50<25:04, 18.81s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0759      |
|    gen/rollout/ep_rew_wrapped_mean | 165         |
|    gen/time/fps                    | 1867        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2703360     |
|    gen/train/approx_kl             | 0.035080306 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.695      |
|    gen/train/explained_variance    | 0.89        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.72        |
|    gen/train/n_updates             | 1640        |
|    gen/train/policy_gradient_loss  | -0.0173     |
|    gen/train/value_loss            | 18.3        |
----------------------------------------------------

round:  35%|███▌      | 43/122 [14:10<24:58, 18.96s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0761      |
|    gen/rollout/ep_rew_wrapped_mean | 168         |
|    gen/time/fps                    | 1784        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2719744     |
|    gen/train/approx_kl             | 0.022524705 |
|    gen/train/clip_fraction         | 0.176       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.712      |
|    gen/train/explained_variance    | 0.867       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.74        |
|    gen/train/n_updates             | 1650        |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 17.5        |
----------------------------------------------------

round:  36%|███▌      | 44/122 [14:29<24:49, 19.09s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 274         |
|    gen/rollout/ep_rew_mean         | 0.0585      |
|    gen/rollout/ep_rew_wrapped_mean | 169         |
|    gen/time/fps                    | 1780        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2736128     |
|    gen/train/approx_kl             | 0.023007473 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.719      |
|    gen/train/explained_variance    | 0.812       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.09        |
|    gen/train/n_updates             | 1660        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 18.9        |
----------------------------------------------------

round:  37%|███▋      | 45/122 [14:49<24:49, 19.35s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0686      |
|    gen/rollout/ep_rew_wrapped_mean | 172         |
|    gen/time/fps                    | 1772        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2752512     |
|    gen/train/approx_kl             | 0.019074023 |
|    gen/train/clip_fraction         | 0.154       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.649      |
|    gen/train/explained_variance    | 0.808       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.19        |
|    gen/train/n_updates             | 1670        |
|    gen/train/policy_gradient_loss  | -0.0128     |
|    gen/train/value_loss            | 18.2        |
----------------------------------------------------

round:  38%|███▊      | 46/122 [15:08<24:21, 19.23s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0337      |
|    gen/rollout/ep_rew_wrapped_mean | 174         |
|    gen/time/fps                    | 1849        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2768896     |
|    gen/train/approx_kl             | 0.023050018 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.597      |
|    gen/train/explained_variance    | 0.76        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.14        |
|    gen/train/n_updates             | 1680        |
|    gen/train/policy_gradient_loss  | -0.018      |
|    gen/train/value_loss            | 11.3        |
----------------------------------------------------

round:  39%|███▊      | 47/122 [15:27<23:46, 19.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0722      |
|    gen/rollout/ep_rew_wrapped_mean | 171         |
|    gen/time/fps                    | 1848        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2785280     |
|    gen/train/approx_kl             | 0.023339584 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.655      |
|    gen/train/explained_variance    | 0.746       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.33        |
|    gen/train/n_updates             | 1690        |
|    gen/train/policy_gradient_loss  | -0.018      |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:  39%|███▉      | 48/122 [15:45<23:22, 18.95s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 269        |
|    gen/rollout/ep_rew_mean         | 0.0745     |
|    gen/rollout/ep_rew_wrapped_mean | 154        |
|    gen/time/fps                    | 1843       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2801664    |
|    gen/train/approx_kl             | 0.02088346 |
|    gen/train/clip_fraction         | 0.157      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.633     |
|    gen/train/explained_variance    | 0.596      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.22       |
|    gen/train/n_updates             | 1700       |
|    gen/train/policy_gradient_loss  | -0.0138    |
|    gen/train/value_loss            | 16.3       |
---------------------------------------------------

round:  40%|████      | 49/122 [16:04<22:57, 18.87s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0527      |
|    gen/rollout/ep_rew_wrapped_mean | 156         |
|    gen/time/fps                    | 1753        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2818048     |
|    gen/train/approx_kl             | 0.019271044 |
|    gen/train/clip_fraction         | 0.15        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.631      |
|    gen/train/explained_variance    | 0.59        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.82        |
|    gen/train/n_updates             | 1710        |
|    gen/train/policy_gradient_loss  | -0.0126     |
|    gen/train/value_loss            | 15.6        |
----------------------------------------------------

round:  41%|████      | 50/122 [16:23<22:49, 19.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0395      |
|    gen/rollout/ep_rew_wrapped_mean | 163         |
|    gen/time/fps                    | 1766        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2834432     |
|    gen/train/approx_kl             | 0.030032862 |
|    gen/train/clip_fraction         | 0.198       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.651      |
|    gen/train/explained_variance    | 0.63        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.43        |
|    gen/train/n_updates             | 1720        |
|    gen/train/policy_gradient_loss  | -0.0176     |
|    gen/train/value_loss            | 9.75        |
----------------------------------------------------

round:  42%|████▏     | 51/122 [16:44<23:01, 19.46s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 272        |
|    gen/rollout/ep_rew_mean         | 0.0588     |
|    gen/rollout/ep_rew_wrapped_mean | 167        |
|    gen/time/fps                    | 1681       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2850816    |
|    gen/train/approx_kl             | 0.02542551 |
|    gen/train/clip_fraction         | 0.182      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.621     |
|    gen/train/explained_variance    | 0.624      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 5.02       |
|    gen/train/n_updates             | 1730       |
|    gen/train/policy_gradient_loss  | -0.0154    |
|    gen/train/value_loss            | 14.3       |
---------------------------------------------------

round:  43%|████▎     | 52/122 [17:04<22:56, 19.67s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 279        |
|    gen/rollout/ep_rew_mean         | 0.0331     |
|    gen/rollout/ep_rew_wrapped_mean | 167        |
|    gen/time/fps                    | 1802       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2867200    |
|    gen/train/approx_kl             | 0.02750542 |
|    gen/train/clip_fraction         | 0.186      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.653     |
|    gen/train/explained_variance    | 0.761      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 11.4       |
|    gen/train/n_updates             | 1740       |
|    gen/train/policy_gradient_loss  | -0.0168    |
|    gen/train/value_loss            | 13.8       |
---------------------------------------------------

round:  43%|████▎     | 53/122 [17:23<22:22, 19.45s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 283        |
|    gen/rollout/ep_rew_mean         | 0.0221     |
|    gen/rollout/ep_rew_wrapped_mean | 170        |
|    gen/time/fps                    | 1848       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2883584    |
|    gen/train/approx_kl             | 0.02438039 |
|    gen/train/clip_fraction         | 0.194      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.671     |
|    gen/train/explained_variance    | 0.73       |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 21.5       |
|    gen/train/n_updates             | 1750       |
|    gen/train/policy_gradient_loss  | -0.0136    |
|    gen/train/value_loss            | 14.7       |
---------------------------------------------------

round:  44%|████▍     | 54/122 [17:42<21:49, 19.26s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 282         |
|    gen/rollout/ep_rew_mean         | 0.0235      |
|    gen/rollout/ep_rew_wrapped_mean | 168         |
|    gen/time/fps                    | 1817        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2899968     |
|    gen/train/approx_kl             | 0.027164275 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.683      |
|    gen/train/explained_variance    | 0.813       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.05        |
|    gen/train/n_updates             | 1760        |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 11.9        |
----------------------------------------------------

round:  45%|████▌     | 55/122 [18:01<21:28, 19.23s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.0578      |
|    gen/rollout/ep_rew_wrapped_mean | 174         |
|    gen/time/fps                    | 1755        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2916352     |
|    gen/train/approx_kl             | 0.026124222 |
|    gen/train/clip_fraction         | 0.19        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.696      |
|    gen/train/explained_variance    | 0.866       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 16.3        |
|    gen/train/n_updates             | 1770        |
|    gen/train/policy_gradient_loss  | -0.0157     |
|    gen/train/value_loss            | 11.8        |
----------------------------------------------------

round:  46%|████▌     | 56/122 [18:21<21:21, 19.42s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 277         |
|    gen/rollout/ep_rew_mean         | 0.0449      |
|    gen/rollout/ep_rew_wrapped_mean | 179         |
|    gen/time/fps                    | 1703        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2932736     |
|    gen/train/approx_kl             | 0.018649254 |
|    gen/train/clip_fraction         | 0.153       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.635      |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.19        |
|    gen/train/n_updates             | 1780        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 15.4        |
----------------------------------------------------

round:  47%|████▋     | 57/122 [18:42<21:30, 19.86s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0356      |
|    gen/rollout/ep_rew_wrapped_mean | 170         |
|    gen/time/fps                    | 1820        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2949120     |
|    gen/train/approx_kl             | 0.022052443 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.674      |
|    gen/train/explained_variance    | 0.915       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.78        |
|    gen/train/n_updates             | 1790        |
|    gen/train/policy_gradient_loss  | -0.0142     |
|    gen/train/value_loss            | 14.9        |
----------------------------------------------------

round:  48%|████▊     | 58/122 [19:01<20:51, 19.56s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 283         |
|    gen/rollout/ep_rew_mean         | 0.0248      |
|    gen/rollout/ep_rew_wrapped_mean | 173         |
|    gen/time/fps                    | 1763        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2965504     |
|    gen/train/approx_kl             | 0.025076883 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.666      |
|    gen/train/explained_variance    | 0.93        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.9         |
|    gen/train/n_updates             | 1800        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 20.3        |
----------------------------------------------------

round:  48%|████▊     | 59/122 [19:20<20:28, 19.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 271         |
|    gen/rollout/ep_rew_mean         | 0.0714      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1713        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2981888     |
|    gen/train/approx_kl             | 0.028555714 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.687      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.54        |
|    gen/train/n_updates             | 1810        |
|    gen/train/policy_gradient_loss  | -0.019      |
|    gen/train/value_loss            | 15.8        |
----------------------------------------------------

round:  49%|████▉     | 60/122 [19:40<20:11, 19.54s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 273        |
|    gen/rollout/ep_rew_mean         | 0.0628     |
|    gen/rollout/ep_rew_wrapped_mean | 155        |
|    gen/time/fps                    | 1747       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2998272    |
|    gen/train/approx_kl             | 0.02617328 |
|    gen/train/clip_fraction         | 0.19       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.681     |
|    gen/train/explained_variance    | 0.928      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 3          |
|    gen/train/n_updates             | 1820       |
|    gen/train/policy_gradient_loss  | -0.0181    |
|    gen/train/value_loss            | 18.6       |
---------------------------------------------------

round:  50%|█████     | 61/122 [20:00<20:17, 19.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 271         |
|    gen/rollout/ep_rew_mean         | 0.068       |
|    gen/rollout/ep_rew_wrapped_mean | 150         |
|    gen/time/fps                    | 1790        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3014656     |
|    gen/train/approx_kl             | 0.022849247 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.718      |
|    gen/train/explained_variance    | 0.852       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 11.2        |
|    gen/train/n_updates             | 1830        |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 19.9        |
----------------------------------------------------

round:  51%|█████     | 62/122 [20:21<20:03, 20.06s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 265        |
|    gen/rollout/ep_rew_mean         | 0.088      |
|    gen/rollout/ep_rew_wrapped_mean | 154        |
|    gen/time/fps                    | 1784       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3031040    |
|    gen/train/approx_kl             | 0.01891952 |
|    gen/train/clip_fraction         | 0.169      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.676     |
|    gen/train/explained_variance    | 0.871      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.65       |
|    gen/train/n_updates             | 1840       |
|    gen/train/policy_gradient_loss  | -0.0106    |
|    gen/train/value_loss            | 20.1       |
---------------------------------------------------

round:  52%|█████▏    | 63/122 [20:41<19:41, 20.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 266         |
|    gen/rollout/ep_rew_mean         | 0.0874      |
|    gen/rollout/ep_rew_wrapped_mean | 147         |
|    gen/time/fps                    | 1700        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3047424     |
|    gen/train/approx_kl             | 0.021936255 |
|    gen/train/clip_fraction         | 0.181       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.662      |
|    gen/train/explained_variance    | 0.831       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.52        |
|    gen/train/n_updates             | 1850        |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 19.9        |
----------------------------------------------------

round:  52%|█████▏    | 64/122 [21:01<19:32, 20.21s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 266         |
|    gen/rollout/ep_rew_mean         | 0.0819      |
|    gen/rollout/ep_rew_wrapped_mean | 135         |
|    gen/time/fps                    | 1779        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3063808     |
|    gen/train/approx_kl             | 0.026042184 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.656      |
|    gen/train/explained_variance    | 0.837       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.99        |
|    gen/train/n_updates             | 1860        |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 14.7        |
----------------------------------------------------

round:  53%|█████▎    | 65/122 [21:22<19:12, 20.23s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 263         |
|    gen/rollout/ep_rew_mean         | 0.093       |
|    gen/rollout/ep_rew_wrapped_mean | 126         |
|    gen/time/fps                    | 1760        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3080192     |
|    gen/train/approx_kl             | 0.020749817 |
|    gen/train/clip_fraction         | 0.17        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.641      |
|    gen/train/explained_variance    | 0.726       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.35        |
|    gen/train/n_updates             | 1870        |
|    gen/train/policy_gradient_loss  | -0.016      |
|    gen/train/value_loss            | 16.2        |
----------------------------------------------------

round:  54%|█████▍    | 66/122 [21:42<18:49, 20.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 258         |
|    gen/rollout/ep_rew_mean         | 0.115       |
|    gen/rollout/ep_rew_wrapped_mean | 126         |
|    gen/time/fps                    | 1743        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3096576     |
|    gen/train/approx_kl             | 0.031485572 |
|    gen/train/clip_fraction         | 0.207       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.64       |
|    gen/train/explained_variance    | 0.79        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.45        |
|    gen/train/n_updates             | 1880        |
|    gen/train/policy_gradient_loss  | -0.0192     |
|    gen/train/value_loss            | 15.8        |
----------------------------------------------------

round:  55%|█████▍    | 67/122 [22:02<18:34, 20.26s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 266         |
|    gen/rollout/ep_rew_mean         | 0.0857      |
|    gen/rollout/ep_rew_wrapped_mean | 139         |
|    gen/time/fps                    | 1787        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3112960     |
|    gen/train/approx_kl             | 0.021808505 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.669      |
|    gen/train/explained_variance    | 0.713       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.51        |
|    gen/train/n_updates             | 1890        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 19.3        |
----------------------------------------------------

round:  56%|█████▌    | 68/122 [22:21<17:57, 19.96s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 266        |
|    gen/rollout/ep_rew_mean         | 0.0866     |
|    gen/rollout/ep_rew_wrapped_mean | 171        |
|    gen/time/fps                    | 1805       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3129344    |
|    gen/train/approx_kl             | 0.02830233 |
|    gen/train/clip_fraction         | 0.172      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.601     |
|    gen/train/explained_variance    | 0.817      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 6.05       |
|    gen/train/n_updates             | 1900       |
|    gen/train/policy_gradient_loss  | -0.015     |
|    gen/train/value_loss            | 16.7       |
---------------------------------------------------

round:  57%|█████▋    | 69/122 [22:41<17:36, 19.93s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0747      |
|    gen/rollout/ep_rew_wrapped_mean | 182         |
|    gen/time/fps                    | 1774        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3145728     |
|    gen/train/approx_kl             | 0.027144546 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.557      |
|    gen/train/explained_variance    | 0.833       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 18.3        |
|    gen/train/n_updates             | 1910        |
|    gen/train/policy_gradient_loss  | -0.0183     |
|    gen/train/value_loss            | 14.1        |
----------------------------------------------------

round:  57%|█████▋    | 70/122 [23:01<17:06, 19.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 272         |
|    gen/rollout/ep_rew_mean         | 0.0624      |
|    gen/rollout/ep_rew_wrapped_mean | 177         |
|    gen/time/fps                    | 1708        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3162112     |
|    gen/train/approx_kl             | 0.021517528 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.593      |
|    gen/train/explained_variance    | 0.862       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.53        |
|    gen/train/n_updates             | 1920        |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 14.8        |
----------------------------------------------------

round:  58%|█████▊    | 71/122 [23:21<16:56, 19.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 266         |
|    gen/rollout/ep_rew_mean         | 0.0891      |
|    gen/rollout/ep_rew_wrapped_mean | 181         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3178496     |
|    gen/train/approx_kl             | 0.022057181 |
|    gen/train/clip_fraction         | 0.159       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.575      |
|    gen/train/explained_variance    | 0.845       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.17        |
|    gen/train/n_updates             | 1930        |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 18.8        |
----------------------------------------------------

round:  59%|█████▉    | 72/122 [23:41<16:35, 19.91s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 258        |
|    gen/rollout/ep_rew_mean         | 0.115      |
|    gen/rollout/ep_rew_wrapped_mean | 179        |
|    gen/time/fps                    | 1782       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3194880    |
|    gen/train/approx_kl             | 0.01985796 |
|    gen/train/clip_fraction         | 0.158      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.632     |
|    gen/train/explained_variance    | 0.825      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 9.76       |
|    gen/train/n_updates             | 1940       |
|    gen/train/policy_gradient_loss  | -0.0146    |
|    gen/train/value_loss            | 25.9       |
---------------------------------------------------

round:  60%|█████▉    | 73/122 [24:00<16:04, 19.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.056       |
|    gen/rollout/ep_rew_wrapped_mean | 175         |
|    gen/time/fps                    | 1776        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3211264     |
|    gen/train/approx_kl             | 0.025741475 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.627      |
|    gen/train/explained_variance    | 0.795       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.75        |
|    gen/train/n_updates             | 1950        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 19.4        |
----------------------------------------------------

round:  61%|██████    | 74/122 [24:19<15:41, 19.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0712      |
|    gen/rollout/ep_rew_wrapped_mean | 196         |
|    gen/time/fps                    | 1706        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3227648     |
|    gen/train/approx_kl             | 0.021017687 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.638      |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.89        |
|    gen/train/n_updates             | 1960        |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 16.5        |
----------------------------------------------------

round:  61%|██████▏   | 75/122 [24:39<15:19, 19.56s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.0574      |
|    gen/rollout/ep_rew_wrapped_mean | 179         |
|    gen/time/fps                    | 1829        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3244032     |
|    gen/train/approx_kl             | 0.019420264 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.603      |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.39        |
|    gen/train/n_updates             | 1970        |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 20.3        |
----------------------------------------------------

round:  62%|██████▏   | 76/122 [24:58<14:48, 19.32s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0538      |
|    gen/rollout/ep_rew_wrapped_mean | 156         |
|    gen/time/fps                    | 1818        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3260416     |
|    gen/train/approx_kl             | 0.018282529 |
|    gen/train/clip_fraction         | 0.152       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.577      |
|    gen/train/explained_variance    | 0.913       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 41.4        |
|    gen/train/n_updates             | 1980        |
|    gen/train/policy_gradient_loss  | -0.013      |
|    gen/train/value_loss            | 18.6        |
----------------------------------------------------

round:  63%|██████▎   | 77/122 [25:17<14:29, 19.32s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0377      |
|    gen/rollout/ep_rew_wrapped_mean | 146         |
|    gen/time/fps                    | 1722        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3276800     |
|    gen/train/approx_kl             | 0.019373724 |
|    gen/train/clip_fraction         | 0.144       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.567      |
|    gen/train/explained_variance    | 0.836       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.33        |
|    gen/train/n_updates             | 1990        |
|    gen/train/policy_gradient_loss  | -0.0112     |
|    gen/train/value_loss            | 18.2        |
----------------------------------------------------

round:  64%|██████▍   | 78/122 [25:37<14:20, 19.56s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 278        |
|    gen/rollout/ep_rew_mean         | 0.0379     |
|    gen/rollout/ep_rew_wrapped_mean | 155        |
|    gen/time/fps                    | 1676       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3293184    |
|    gen/train/approx_kl             | 0.43946457 |
|    gen/train/clip_fraction         | 0.172      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.568     |
|    gen/train/explained_variance    | 0.836      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 3.75       |
|    gen/train/n_updates             | 2000       |
|    gen/train/policy_gradient_loss  | 0.0155     |
|    gen/train/value_loss            | 14.4       |
---------------------------------------------------

round:  65%|██████▍   | 79/122 [25:58<14:23, 20.09s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 280        |
|    gen/rollout/ep_rew_mean         | 0.0306     |
|    gen/rollout/ep_rew_wrapped_mean | 180        |
|    gen/time/fps                    | 1705       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3309568    |
|    gen/train/approx_kl             | 0.04160344 |
|    gen/train/clip_fraction         | 0.177      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.575     |
|    gen/train/explained_variance    | 0.893      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 9.86       |
|    gen/train/n_updates             | 2010       |
|    gen/train/policy_gradient_loss  | -0.0178    |
|    gen/train/value_loss            | 13         |
---------------------------------------------------

round:  66%|██████▌   | 80/122 [26:19<14:08, 20.20s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0376      |
|    gen/rollout/ep_rew_wrapped_mean | 201         |
|    gen/time/fps                    | 1724        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3325952     |
|    gen/train/approx_kl             | 0.023565486 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.553      |
|    gen/train/explained_variance    | 0.886       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.51        |
|    gen/train/n_updates             | 2020        |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 16.6        |
----------------------------------------------------

round:  66%|██████▋   | 81/122 [26:39<13:51, 20.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0712      |
|    gen/rollout/ep_rew_wrapped_mean | 192         |
|    gen/time/fps                    | 1747        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3342336     |
|    gen/train/approx_kl             | 0.052593745 |
|    gen/train/clip_fraction         | 0.257       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.621      |
|    gen/train/explained_variance    | 0.932       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.4         |
|    gen/train/n_updates             | 2030        |
|    gen/train/policy_gradient_loss  | -0.0361     |
|    gen/train/value_loss            | 15.1        |
----------------------------------------------------

round:  67%|██████▋   | 82/122 [26:59<13:29, 20.25s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 262         |
|    gen/rollout/ep_rew_mean         | 0.0981      |
|    gen/rollout/ep_rew_wrapped_mean | 148         |
|    gen/time/fps                    | 1729        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3358720     |
|    gen/train/approx_kl             | 0.020022135 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.652      |
|    gen/train/explained_variance    | 0.9         |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.98        |
|    gen/train/n_updates             | 2040        |
|    gen/train/policy_gradient_loss  | -0.0167     |
|    gen/train/value_loss            | 20.5        |
----------------------------------------------------

round:  68%|██████▊   | 83/122 [27:20<13:11, 20.30s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 272         |
|    gen/rollout/ep_rew_mean         | 0.0634      |
|    gen/rollout/ep_rew_wrapped_mean | 144         |
|    gen/time/fps                    | 1708        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3375104     |
|    gen/train/approx_kl             | 0.022888046 |
|    gen/train/clip_fraction         | 0.166       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.643      |
|    gen/train/explained_variance    | 0.847       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.72        |
|    gen/train/n_updates             | 2050        |
|    gen/train/policy_gradient_loss  | -0.0115     |
|    gen/train/value_loss            | 25.2        |
----------------------------------------------------

round:  69%|██████▉   | 84/122 [27:41<12:56, 20.42s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0718      |
|    gen/rollout/ep_rew_wrapped_mean | 151         |
|    gen/time/fps                    | 1694        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3391488     |
|    gen/train/approx_kl             | 0.023665074 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.626      |
|    gen/train/explained_variance    | 0.853       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.85        |
|    gen/train/n_updates             | 2060        |
|    gen/train/policy_gradient_loss  | -0.0142     |
|    gen/train/value_loss            | 14.6        |
----------------------------------------------------

round:  70%|██████▉   | 85/122 [28:02<12:44, 20.65s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 268         |
|    gen/rollout/ep_rew_mean         | 0.0784      |
|    gen/rollout/ep_rew_wrapped_mean | 166         |
|    gen/time/fps                    | 1678        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3407872     |
|    gen/train/approx_kl             | 0.022827474 |
|    gen/train/clip_fraction         | 0.168       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.625      |
|    gen/train/explained_variance    | 0.867       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.46        |
|    gen/train/n_updates             | 2070        |
|    gen/train/policy_gradient_loss  | -0.0148     |
|    gen/train/value_loss            | 22.5        |
----------------------------------------------------

round:  70%|███████   | 86/122 [28:22<12:22, 20.63s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0746      |
|    gen/rollout/ep_rew_wrapped_mean | 166         |
|    gen/time/fps                    | 1707        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3424256     |
|    gen/train/approx_kl             | 0.022837402 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.631      |
|    gen/train/explained_variance    | 0.843       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.89        |
|    gen/train/n_updates             | 2080        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 19.3        |
----------------------------------------------------

round:  71%|███████▏  | 87/122 [28:43<12:00, 20.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 270         |
|    gen/rollout/ep_rew_mean         | 0.0731      |
|    gen/rollout/ep_rew_wrapped_mean | 139         |
|    gen/time/fps                    | 1728        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3440640     |
|    gen/train/approx_kl             | 0.023198558 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.646      |
|    gen/train/explained_variance    | 0.827       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.03        |
|    gen/train/n_updates             | 2090        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 17.2        |
----------------------------------------------------

round:  72%|███████▏  | 88/122 [29:03<11:34, 20.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0764      |
|    gen/rollout/ep_rew_wrapped_mean | 123         |
|    gen/time/fps                    | 1721        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3457024     |
|    gen/train/approx_kl             | 0.022536147 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.6        |
|    gen/train/explained_variance    | 0.863       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.38        |
|    gen/train/n_updates             | 2100        |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 17.4        |
----------------------------------------------------

round:  73%|███████▎  | 89/122 [29:23<11:09, 20.30s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 268         |
|    gen/rollout/ep_rew_mean         | 0.0764      |
|    gen/rollout/ep_rew_wrapped_mean | 131         |
|    gen/time/fps                    | 1697        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3473408     |
|    gen/train/approx_kl             | 0.027002484 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.569      |
|    gen/train/explained_variance    | 0.866       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.29        |
|    gen/train/n_updates             | 2110        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 14.2        |
----------------------------------------------------

round:  74%|███████▍  | 90/122 [29:43<10:51, 20.35s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 271         |
|    gen/rollout/ep_rew_mean         | 0.063       |
|    gen/rollout/ep_rew_wrapped_mean | 148         |
|    gen/time/fps                    | 1724        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3489792     |
|    gen/train/approx_kl             | 0.034898773 |
|    gen/train/clip_fraction         | 0.187       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.622      |
|    gen/train/explained_variance    | 0.793       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.66        |
|    gen/train/n_updates             | 2120        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 20.9        |
----------------------------------------------------

round:  75%|███████▍  | 91/122 [30:04<10:34, 20.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 274         |
|    gen/rollout/ep_rew_mean         | 0.0571      |
|    gen/rollout/ep_rew_wrapped_mean | 164         |
|    gen/time/fps                    | 1631        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 3506176     |
|    gen/train/approx_kl             | 0.024466226 |
|    gen/train/clip_fraction         | 0.177       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.621      |
|    gen/train/explained_variance    | 0.883       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.77        |
|    gen/train/n_updates             | 2130        |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 17.8        |
----------------------------------------------------

round:  75%|███████▌  | 92/122 [30:25<10:17, 20.57s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 275        |
|    gen/rollout/ep_rew_mean         | 0.0488     |
|    gen/rollout/ep_rew_wrapped_mean | 167        |
|    gen/time/fps                    | 1729       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3522560    |
|    gen/train/approx_kl             | 0.02359862 |
|    gen/train/clip_fraction         | 0.169      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.57      |
|    gen/train/explained_variance    | 0.833      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 15.6       |
|    gen/train/n_updates             | 2140       |
|    gen/train/policy_gradient_loss  | -0.0154    |
|    gen/train/value_loss            | 20.8       |
---------------------------------------------------

round:  76%|███████▌  | 93/122 [30:45<09:54, 20.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 279         |
|    gen/rollout/ep_rew_mean         | 0.0356      |
|    gen/rollout/ep_rew_wrapped_mean | 151         |
|    gen/time/fps                    | 1716        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3538944     |
|    gen/train/approx_kl             | 0.021336135 |
|    gen/train/clip_fraction         | 0.166       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.558      |
|    gen/train/explained_variance    | 0.878       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.04        |
|    gen/train/n_updates             | 2150        |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 14.5        |
----------------------------------------------------

round:  77%|███████▋  | 94/122 [31:06<09:34, 20.53s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 280        |
|    gen/rollout/ep_rew_mean         | 0.0341     |
|    gen/rollout/ep_rew_wrapped_mean | 140        |
|    gen/time/fps                    | 1661       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3555328    |
|    gen/train/approx_kl             | 0.02961072 |
|    gen/train/clip_fraction         | 0.209      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.599     |
|    gen/train/explained_variance    | 0.837      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.8        |
|    gen/train/n_updates             | 2160       |
|    gen/train/policy_gradient_loss  | -0.0218    |
|    gen/train/value_loss            | 13.1       |
---------------------------------------------------

round:  78%|███████▊  | 95/122 [31:27<09:17, 20.65s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 268         |
|    gen/rollout/ep_rew_mean         | 0.0799      |
|    gen/rollout/ep_rew_wrapped_mean | 134         |
|    gen/time/fps                    | 1701        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3571712     |
|    gen/train/approx_kl             | 0.021248246 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.598      |
|    gen/train/explained_variance    | 0.729       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.34        |
|    gen/train/n_updates             | 2170        |
|    gen/train/policy_gradient_loss  | -0.018      |
|    gen/train/value_loss            | 17.7        |
----------------------------------------------------

round:  79%|███████▊  | 96/122 [31:47<08:57, 20.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 258         |
|    gen/rollout/ep_rew_mean         | 0.116       |
|    gen/rollout/ep_rew_wrapped_mean | 130         |
|    gen/time/fps                    | 1703        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3588096     |
|    gen/train/approx_kl             | 0.024205849 |
|    gen/train/clip_fraction         | 0.187       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.585      |
|    gen/train/explained_variance    | 0.764       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.56        |
|    gen/train/n_updates             | 2180        |
|    gen/train/policy_gradient_loss  | -0.0134     |
|    gen/train/value_loss            | 17.4        |
----------------------------------------------------

round:  80%|███████▉  | 97/122 [32:08<08:36, 20.65s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0756      |
|    gen/rollout/ep_rew_wrapped_mean | 138         |
|    gen/time/fps                    | 1699        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3604480     |
|    gen/train/approx_kl             | 0.015831068 |
|    gen/train/clip_fraction         | 0.139       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.525      |
|    gen/train/explained_variance    | 0.711       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.77        |
|    gen/train/n_updates             | 2190        |
|    gen/train/policy_gradient_loss  | -0.0113     |
|    gen/train/value_loss            | 26          |
----------------------------------------------------

round:  80%|████████  | 98/122 [32:29<08:14, 20.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.0607      |
|    gen/rollout/ep_rew_wrapped_mean | 151         |
|    gen/time/fps                    | 1732        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3620864     |
|    gen/train/approx_kl             | 0.025533246 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.56       |
|    gen/train/explained_variance    | 0.716       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.21        |
|    gen/train/n_updates             | 2200        |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 18.4        |
----------------------------------------------------

round:  81%|████████  | 99/122 [32:49<07:53, 20.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 264         |
|    gen/rollout/ep_rew_mean         | 0.0929      |
|    gen/rollout/ep_rew_wrapped_mean | 148         |
|    gen/time/fps                    | 1689        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3637248     |
|    gen/train/approx_kl             | 0.020269766 |
|    gen/train/clip_fraction         | 0.153       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.568      |
|    gen/train/explained_variance    | 0.718       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.17        |
|    gen/train/n_updates             | 2210        |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 17.2        |
----------------------------------------------------

round:  82%|████████▏ | 100/122 [33:10<07:33, 20.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 266         |
|    gen/rollout/ep_rew_mean         | 0.0834      |
|    gen/rollout/ep_rew_wrapped_mean | 141         |
|    gen/time/fps                    | 1719        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3653632     |
|    gen/train/approx_kl             | 0.025161346 |
|    gen/train/clip_fraction         | 0.176       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.579      |
|    gen/train/explained_variance    | 0.641       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.26        |
|    gen/train/n_updates             | 2220        |
|    gen/train/policy_gradient_loss  | -0.0137     |
|    gen/train/value_loss            | 18.8        |
----------------------------------------------------

round:  83%|████████▎ | 101/122 [33:31<07:14, 20.71s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0506      |
|    gen/rollout/ep_rew_wrapped_mean | 146         |
|    gen/time/fps                    | 1710        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3670016     |
|    gen/train/approx_kl             | 0.029896252 |
|    gen/train/clip_fraction         | 0.194       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.575      |
|    gen/train/explained_variance    | 0.709       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 7.84        |
|    gen/train/n_updates             | 2230        |
|    gen/train/policy_gradient_loss  | -0.0167     |
|    gen/train/value_loss            | 14.7        |
----------------------------------------------------

round:  84%|████████▎ | 102/122 [33:51<06:54, 20.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 278         |
|    gen/rollout/ep_rew_mean         | 0.0389      |
|    gen/rollout/ep_rew_wrapped_mean | 164         |
|    gen/time/fps                    | 1678        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3686400     |
|    gen/train/approx_kl             | 0.027276512 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.595      |
|    gen/train/explained_variance    | 0.727       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.63        |
|    gen/train/n_updates             | 2240        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 19.4        |
----------------------------------------------------

round:  84%|████████▍ | 103/122 [34:13<06:37, 20.90s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0312      |
|    gen/rollout/ep_rew_wrapped_mean | 173         |
|    gen/time/fps                    | 1696        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3702784     |
|    gen/train/approx_kl             | 0.024230048 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.598      |
|    gen/train/explained_variance    | 0.795       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.14        |
|    gen/train/n_updates             | 2250        |
|    gen/train/policy_gradient_loss  | -0.0142     |
|    gen/train/value_loss            | 12.2        |
----------------------------------------------------

round:  85%|████████▌ | 104/122 [34:34<06:16, 20.89s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 275         |
|    gen/rollout/ep_rew_mean         | 0.0516      |
|    gen/rollout/ep_rew_wrapped_mean | 157         |
|    gen/time/fps                    | 1684        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3719168     |
|    gen/train/approx_kl             | 0.022116682 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.584      |
|    gen/train/explained_variance    | 0.699       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 5.39        |
|    gen/train/n_updates             | 2260        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 13.4        |
----------------------------------------------------

round:  86%|████████▌ | 105/122 [34:55<05:55, 20.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 265         |
|    gen/rollout/ep_rew_mean         | 0.0899      |
|    gen/rollout/ep_rew_wrapped_mean | 147         |
|    gen/time/fps                    | 1699        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3735552     |
|    gen/train/approx_kl             | 0.026718322 |
|    gen/train/clip_fraction         | 0.177       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.576      |
|    gen/train/explained_variance    | 0.588       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.22        |
|    gen/train/n_updates             | 2270        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 19.1        |
----------------------------------------------------

round:  87%|████████▋ | 106/122 [35:16<05:35, 20.96s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 258         |
|    gen/rollout/ep_rew_mean         | 0.116       |
|    gen/rollout/ep_rew_wrapped_mean | 143         |
|    gen/time/fps                    | 1711        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3751936     |
|    gen/train/approx_kl             | 0.024296694 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.565      |
|    gen/train/explained_variance    | 0.635       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 9.01        |
|    gen/train/n_updates             | 2280        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 17.4        |
----------------------------------------------------

round:  88%|████████▊ | 107/122 [35:37<05:14, 20.94s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 274         |
|    gen/rollout/ep_rew_mean         | 0.0549      |
|    gen/rollout/ep_rew_wrapped_mean | 122         |
|    gen/time/fps                    | 1707        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3768320     |
|    gen/train/approx_kl             | 0.022850586 |
|    gen/train/clip_fraction         | 0.165       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.541      |
|    gen/train/explained_variance    | 0.649       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.73        |
|    gen/train/n_updates             | 2290        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 19.8        |
----------------------------------------------------

round:  89%|████████▊ | 108/122 [35:58<04:53, 20.96s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 281         |
|    gen/rollout/ep_rew_mean         | 0.0303      |
|    gen/rollout/ep_rew_wrapped_mean | 129         |
|    gen/time/fps                    | 1721        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3784704     |
|    gen/train/approx_kl             | 0.022811532 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.598      |
|    gen/train/explained_variance    | 0.732       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 10.1        |
|    gen/train/n_updates             | 2300        |
|    gen/train/policy_gradient_loss  | -0.0125     |
|    gen/train/value_loss            | 9.93        |
----------------------------------------------------

round:  89%|████████▉ | 109/122 [36:19<04:32, 20.96s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 268        |
|    gen/rollout/ep_rew_mean         | 0.079      |
|    gen/rollout/ep_rew_wrapped_mean | 134        |
|    gen/time/fps                    | 1652       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3801088    |
|    gen/train/approx_kl             | 0.03685295 |
|    gen/train/clip_fraction         | 0.197      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.558     |
|    gen/train/explained_variance    | 0.751      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 2.61       |
|    gen/train/n_updates             | 2310       |
|    gen/train/policy_gradient_loss  | -0.0175    |
|    gen/train/value_loss            | 10         |
---------------------------------------------------

round:  90%|█████████ | 110/122 [36:40<04:13, 21.09s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 262         |
|    gen/rollout/ep_rew_mean         | 0.1         |
|    gen/rollout/ep_rew_wrapped_mean | 131         |
|    gen/time/fps                    | 1648        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3817472     |
|    gen/train/approx_kl             | 0.022705149 |
|    gen/train/clip_fraction         | 0.154       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.549      |
|    gen/train/explained_variance    | 0.72        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.83        |
|    gen/train/n_updates             | 2320        |
|    gen/train/policy_gradient_loss  | -0.0138     |
|    gen/train/value_loss            | 16.9        |
----------------------------------------------------

round:  91%|█████████ | 111/122 [37:01<03:51, 21.04s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 262         |
|    gen/rollout/ep_rew_mean         | 0.104       |
|    gen/rollout/ep_rew_wrapped_mean | 144         |
|    gen/time/fps                    | 1721        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3833856     |
|    gen/train/approx_kl             | 0.025990482 |
|    gen/train/clip_fraction         | 0.194       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.572      |
|    gen/train/explained_variance    | 0.671       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 4.05        |
|    gen/train/n_updates             | 2330        |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 18          |
----------------------------------------------------

round:  92%|█████████▏| 112/122 [37:21<03:28, 20.83s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 262         |
|    gen/rollout/ep_rew_mean         | 0.104       |
|    gen/rollout/ep_rew_wrapped_mean | 139         |
|    gen/time/fps                    | 1717        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3850240     |
|    gen/train/approx_kl             | 0.022741739 |
|    gen/train/clip_fraction         | 0.17        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.559      |
|    gen/train/explained_variance    | 0.719       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 8.38        |
|    gen/train/n_updates             | 2340        |
|    gen/train/policy_gradient_loss  | -0.0171     |
|    gen/train/value_loss            | 16          |
----------------------------------------------------

round:  93%|█████████▎| 113/122 [37:42<03:06, 20.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 265         |
|    gen/rollout/ep_rew_mean         | 0.0928      |
|    gen/rollout/ep_rew_wrapped_mean | 140         |
|    gen/time/fps                    | 1745        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3866624     |
|    gen/train/approx_kl             | 0.025895037 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.589      |
|    gen/train/explained_variance    | 0.687       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.47        |
|    gen/train/n_updates             | 2350        |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 19.2        |
----------------------------------------------------

round:  93%|█████████▎| 114/122 [38:03<02:46, 20.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 258         |
|    gen/rollout/ep_rew_mean         | 0.117       |
|    gen/rollout/ep_rew_wrapped_mean | 152         |
|    gen/time/fps                    | 1650        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3883008     |
|    gen/train/approx_kl             | 0.028376233 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.575      |
|    gen/train/explained_variance    | 0.769       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 1.99        |
|    gen/train/n_updates             | 2360        |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 17.7        |
----------------------------------------------------

round:  94%|█████████▍| 115/122 [38:23<02:25, 20.79s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 268         |
|    gen/rollout/ep_rew_mean         | 0.0772      |
|    gen/rollout/ep_rew_wrapped_mean | 152         |
|    gen/time/fps                    | 1688        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3899392     |
|    gen/train/approx_kl             | 0.019857485 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.597      |
|    gen/train/explained_variance    | 0.766       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.53        |
|    gen/train/n_updates             | 2370        |
|    gen/train/policy_gradient_loss  | -0.0121     |
|    gen/train/value_loss            | 26.2        |
----------------------------------------------------

round:  95%|█████████▌| 116/122 [38:44<02:04, 20.73s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 270        |
|    gen/rollout/ep_rew_mean         | 0.0692     |
|    gen/rollout/ep_rew_wrapped_mean | 170        |
|    gen/time/fps                    | 1736       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3915776    |
|    gen/train/approx_kl             | 0.02228106 |
|    gen/train/clip_fraction         | 0.17       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.547     |
|    gen/train/explained_variance    | 0.88       |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 16.4       |
|    gen/train/n_updates             | 2380       |
|    gen/train/policy_gradient_loss  | -0.0116    |
|    gen/train/value_loss            | 16.9       |
---------------------------------------------------

round:  96%|█████████▌| 117/122 [39:04<01:43, 20.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 273         |
|    gen/rollout/ep_rew_mean         | 0.0603      |
|    gen/rollout/ep_rew_wrapped_mean | 145         |
|    gen/time/fps                    | 1746        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3932160     |
|    gen/train/approx_kl             | 0.034221064 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.559      |
|    gen/train/explained_variance    | 0.796       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 2.27        |
|    gen/train/n_updates             | 2390        |
|    gen/train/policy_gradient_loss  | -0.0233     |
|    gen/train/value_loss            | 13.3        |
----------------------------------------------------

round:  97%|█████████▋| 118/122 [39:25<01:22, 20.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 265         |
|    gen/rollout/ep_rew_mean         | 0.0916      |
|    gen/rollout/ep_rew_wrapped_mean | 142         |
|    gen/time/fps                    | 1723        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3948544     |
|    gen/train/approx_kl             | 0.025396343 |
|    gen/train/clip_fraction         | 0.168       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.572      |
|    gen/train/explained_variance    | 0.706       |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 3.53        |
|    gen/train/n_updates             | 2400        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 14.3        |
----------------------------------------------------

round:  98%|█████████▊| 119/122 [39:45<01:01, 20.39s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 262         |
|    gen/rollout/ep_rew_mean         | 0.1         |
|    gen/rollout/ep_rew_wrapped_mean | 158         |
|    gen/time/fps                    | 1682        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3964928     |
|    gen/train/approx_kl             | 0.026057918 |
|    gen/train/clip_fraction         | 0.176       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.604      |
|    gen/train/explained_variance    | 0.75        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 13          |
|    gen/train/n_updates             | 2410        |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 27.8        |
----------------------------------------------------

round:  98%|█████████▊| 120/122 [40:06<00:41, 20.51s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 271        |
|    gen/rollout/ep_rew_mean         | 0.0703     |
|    gen/rollout/ep_rew_wrapped_mean | 182        |
|    gen/time/fps                    | 1767       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3981312    |
|    gen/train/approx_kl             | 0.02731313 |
|    gen/train/clip_fraction         | 0.189      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.568     |
|    gen/train/explained_variance    | 0.825      |
|    gen/train/learning_rate         | 0.00025    |
|    gen/train/loss                  | 4.99       |
|    gen/train/n_updates             | 2420       |
|    gen/train/policy_gradient_loss  | -0.0172    |
|    gen/train/value_loss            | 18.5       |
---------------------------------------------------

round:  99%|█████████▉| 121/122 [40:25<00:20, 20.18s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 269         |
|    gen/rollout/ep_rew_mean         | 0.0759      |
|    gen/rollout/ep_rew_wrapped_mean | 183         |
|    gen/time/fps                    | 1778        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3997696     |
|    gen/train/approx_kl             | 0.026008941 |
|    gen/train/clip_fraction         | 0.2         |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.597      |
|    gen/train/explained_variance    | 0.83        |
|    gen/train/learning_rate         | 0.00025     |
|    gen/train/loss                  | 6.11        |
|    gen/train/n_updates             | 2430        |
|    gen/train/policy_gradient_loss  | -0.0188     |
|    gen/train/value_loss            | 21.7        |
----------------------------------------------------

round: 100%|██████████| 122/122 [40:45<00:00, 20.05s/it]


In [533]:
compute_score(task=env, policy=second_policy)

Best score: 0.875
Average score: (0.192, 0.309)


array([0.      , 0.      , 0.      , 0.875   , 0.      , 0.465625,
       0.584375, 0.      , 0.      , 0.      ])

### Discussion

I used GAIL to solve this problem. This required me to play the game and record my runs so that the agent could learn from me, the expert. The agent reached an average reward of about 0.2. It performs well when it solves the task, but it often gets stuck and receives 0 reward (7 of the 10 evaluation episodes above), which is what drags the average down. I believe this is caused by the limited number of expert demonstrations I gave the agent. With more time, I could record more runs, and I expect the agent would do much better.
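
As a rough check of the claim that stuck episodes dominate the average, the evaluation scores printed above can be split into solved and unsolved episodes. This is a minimal sketch that uses only those ten numbers; treating any nonzero score as "solved" is an assumption made for illustration.

```python
# Scores copied from the compute_score output above.
scores = [0.0, 0.0, 0.0, 0.875, 0.0, 0.465625, 0.584375, 0.0, 0.0, 0.0]

# Assumption: an episode counts as solved if it earned any reward at all.
solved = [s for s in scores if s > 0]

success_rate = len(solved) / len(scores)
mean_all = sum(scores) / len(scores)
mean_solved = sum(solved) / len(solved)

print(f"success rate: {success_rate:.0%}")
print(f"mean over all episodes: {mean_all:.4f}")
print(f"mean over solved episodes: {mean_solved:.4f}")
```

Under that assumption, the per-episode score on solved runs is considerably higher than the overall average, so the average is limited mainly by how often the agent fails rather than by how slowly it succeeds.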

## Problem 3
Solve the [Minigrid Blocked, Unlock and Pickup](https://minigrid.farama.org/environments/minigrid/BlockedUnlockPickupEnv/) task.

This problem is optional for COMP 442 students.
This problem is mandatory for COMP 552 students.

This problem is worth 5 points for COMP 552 students.

![](https://minigrid.farama.org/_images/BlockedUnlockPickupEnv.gif)
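
For context on the scores in this problem, the Minigrid documentation describes the success reward as `1 - 0.9 * (step_count / max_steps)`, with 0 for failure. The sketch below evaluates that formula for a few episode lengths. The value `MAX_STEPS = 576` is an assumption taken from the `ep_len_mean` of 576 that the training logs for this task report when episodes time out, and the helper name is hypothetical.

```python
def minigrid_success_reward(step_count: int, max_steps: int) -> float:
    """Reward Minigrid documents for a successful episode (failures get 0)."""
    return 1 - 0.9 * (step_count / max_steps)


# Assumed step limit, inferred from ep_len_mean = 576 in the training logs.
MAX_STEPS = 576

for steps in (50, 288, 576):
    reward = minigrid_success_reward(steps, MAX_STEPS)
    print(f"{steps:3d} steps -> reward {reward:.3f}")
```

Because a timed-out episode earns nothing, an `ep_rew_mean` of 0 alongside `ep_len_mean` of 576 suggests the agent is not completing the task at all in those rollouts.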

In [581]:
! cd Minigrid-master/minigrid/ ; python3 manual_control.py --env MiniGrid-BlockedUnlockPickup-v0

100717.86s - pydevd: Sending message related to process being replaced timed-out after 5 seconds
down
down
Traceback (most recent call last):
  File "/Users/kmm12/Documents/documents-main/school/rice/fall_2023/comp_552/HW5/Minigrid-master/minigrid/manual_control.py", line 181, in <module>
    manual_control.start()
  File "/Users/kmm12/Documents/documents-main/school/rice/fall_2023/comp_552/HW5/Minigrid-master/minigrid/manual_control.py", line 39, in start
    for event in pygame.event.get():
                 ^^^^^^^^^^^^^^^^^^
pygame.error: video system not initialized


In [582]:
import pickle

# Load the recorded expert demonstrations; the context manager closes the file.
with open('traj_list.pkl', 'rb') as pkl_file:
    traj_list = pickle.load(pkl_file)

In [583]:
import numpy as np
from imitation.policies.serialize import load_policy
from imitation.util.util import make_vec_env
from minigrid.wrappers import FlatObsWrapper, RGBImgPartialObsWrapper

SEED = 42


third_task = make_vec_env(
    "MiniGrid-BlockedUnlockPickup-v0",
    rng=np.random.default_rng(SEED),
    n_envs=8,
    post_wrappers=[
        lambda env, _: FlatObsWrapper(env)
    ],  # flatten the image and mission into one vector for MlpPolicy
)

In [584]:
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

learner3 = PPO(
    env=third_task,
    policy=MlpPolicy,
    batch_size=64,
    ent_coef=0.0,
    learning_rate=0.0004,
    gamma=0.99,
    n_epochs=10,
    clip_range=0.2,
    gae_lambda=0.95,
    seed=SEED,
)

reward_net3 = BasicRewardNet(
    observation_space=third_task.observation_space,
    action_space=third_task.action_space,
    normalize_input_layer=RunningNorm,
)

gail_trainer3 = GAIL(
    demonstrations=traj_list,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=512,
    n_disc_updates_per_round=8,
    venv=third_task,
    gen_algo=learner3,
    reward_net=reward_net3,
    allow_variable_horizon=True
)


# train the GAIL learner for 1M environment steps
gail_trainer3.train(1_000_000)




Running with `allow_variable_horizon` set to True. Some algorithms are biased towards shorter or longer episodes, which may significantly confound results. Additionally, even unbiased algorithms can exploit the information leak from the termination condition, producing spuriously high performance. See https://imitation.readthedocs.io/en/latest/getting-started/variable-horizon.html for more information.


round:   0%|          | 0/61 [00:00<?, ?it/s]
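
The `0/61` in the progress bar is consistent with each round consuming one full PPO rollout. A minimal sketch of that arithmetic follows, assuming PPO's `n_steps` was left at the Stable-Baselines3 default of 2048 (it is not set in the cell above) and that the trainer rounds the number of rounds down.

```python
n_envs = 8                   # from make_vec_env above
n_steps = 2048               # assumed: Stable-Baselines3 PPO default
total_timesteps = 1_000_000  # argument passed to gail_trainer3.train

# One round collects n_steps transitions from each parallel environment.
timesteps_per_round = n_envs * n_steps

# Assumed: partial rounds are dropped (integer division).
n_rounds = total_timesteps // timesteps_per_round

print(timesteps_per_round, n_rounds)
```

The result matches the logs, where `total_timesteps` grows by 16384 per round, so the run actually trains for slightly fewer than one million steps.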

------------------------------------------
| raw/                        |          |
|    gen/rollout/ep_len_mean  | 576      |
|    gen/rollout/ep_rew_mean  | 0        |
|    gen/time/fps             | 1819     |
|    gen/time/iterations      | 1        |
|    gen/time/time_elapsed    | 9        |
|    gen/time/total_timesteps | 16384    |
------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.696    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
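
The first-round discriminator statistics above look like those of an untrained binary classifier. The sketch below reproduces the two headline numbers under stated assumptions: that `disc_entropy` is a Bernoulli entropy in nats, and that `disc_acc` is the average of the expert and generator accuracies weighted by the expert proportion. Both interpretations are assumptions, not taken from the library source.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli prediction with probability p."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


# An undecided discriminator predicts roughly 0.5 for every sample.
entropy_at_half = binary_entropy(0.5)

# Values read from the log: every sample is labelled "generated".
acc_expert, acc_gen, expert_fraction = 0.0, 1.0, 0.5
overall_acc = expert_fraction * acc_expert + (1 - expert_fraction) * acc_gen

print(f"entropy at p=0.5: {entropy_at_half:.3f}")
print(f"overall accuracy: {overall_acc:.2f}")
```

The entropy at p = 0.5 is ln 2, close to the logged 0.691, and the combined accuracy of 0.5 matches `disc_acc`, which suggests the discriminator had not yet learned to separate expert from generated samples.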

round:   2%|▏         | 1/61 [00:18<18:10, 18.18s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00282     |
|    gen/rollout/ep_rew_wrapped_mean | 358         |
|    gen/time/fps                    | 1899        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 32768       |
|    gen/train/approx_kl             | 0.013002751 |
|    gen/train/clip_fraction         | 0.148       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.94       |
|    gen/train/explained_variance    | -0.0644     |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.685       |
|    gen/train/n_updates             | 10          |
|    gen/train/policy_gradient_loss  | -0.00793    |
|    gen/train/value_loss            | 2.16        |
----------------------------------------------------

round:   3%|▎         | 2/61 [00:35<17:33, 17.85s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0.00197     |
|    gen/rollout/ep_rew_wrapped_mean | 372         |
|    gen/time/fps                    | 1849        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 49152       |
|    gen/train/approx_kl             | 0.015437589 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.92       |
|    gen/train/explained_variance    | -0.0177     |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.65        |
|    gen/train/n_updates             | 20          |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 3.96        |
----------------------------------------------------

round:   5%|▍         | 3/61 [00:53<17:20, 17.94s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0.00158     |
|    gen/rollout/ep_rew_wrapped_mean | 380         |
|    gen/time/fps                    | 1834        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 65536       |
|    gen/train/approx_kl             | 0.014836534 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.9        |
|    gen/train/explained_variance    | 0.406       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.483       |
|    gen/train/n_updates             | 30          |
|    gen/train/policy_gradient_loss  | -0.0171     |
|    gen/train/value_loss            | 2.85        |
----------------------------------------------------

round:   7%|▋         | 4/61 [01:12<17:08, 18.04s/it]

-----------------------------------------------------
| raw/                               |              |
|    gen/rollout/ep_len_mean         | 576          |
|    gen/rollout/ep_rew_mean         | 0            |
|    gen/rollout/ep_rew_wrapped_mean | 405          |
|    gen/time/fps                    | 1829         |
|    gen/time/iterations             | 1            |
|    gen/time/time_elapsed           | 8            |
|    gen/time/total_timesteps        | 81920        |
|    gen/train/approx_kl             | 0.0146363955 |
|    gen/train/clip_fraction         | 0.179        |
|    gen/train/clip_range            | 0.2          |
|    gen/train/entropy_loss          | -1.86        |
|    gen/train/explained_variance    | 0.543        |
|    gen/train/learning_rate         | 0.0004       |
|    gen/train/loss                  | 1.27         |
|    gen/train/n_updates             | 40           |
|    gen/train/policy_gradient_loss  | -0.0166      |
|    gen/train/value_loss   

round:   8%|▊         | 5/61 [01:30<16:53, 18.10s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 576       |
|    gen/rollout/ep_rew_mean         | 0         |
|    gen/rollout/ep_rew_wrapped_mean | 442       |
|    gen/time/fps                    | 1831      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 98304     |
|    gen/train/approx_kl             | 0.0146423 |
|    gen/train/clip_fraction         | 0.134     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -1.84     |
|    gen/train/explained_variance    | 0.709     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 1.93      |
|    gen/train/n_updates             | 50        |
|    gen/train/policy_gradient_loss  | -0.0118   |
|    gen/train/value_loss            | 6.33      |
--------------------------------------------------

round:  10%|▉         | 6/61 [01:48<16:38, 18.16s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 490         |
|    gen/time/fps                    | 1823        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 114688      |
|    gen/train/approx_kl             | 0.015863523 |
|    gen/train/clip_fraction         | 0.153       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.85       |
|    gen/train/explained_variance    | 0.82        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.32        |
|    gen/train/n_updates             | 60          |
|    gen/train/policy_gradient_loss  | -0.0147     |
|    gen/train/value_loss            | 10.8        |
----------------------------------------------------

round:  11%|█▏        | 7/61 [02:06<16:20, 18.16s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 488         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 131072      |
|    gen/train/approx_kl             | 0.019552741 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.84       |
|    gen/train/explained_variance    | 0.911       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.51        |
|    gen/train/n_updates             | 70          |
|    gen/train/policy_gradient_loss  | -0.0212     |
|    gen/train/value_loss            | 15.8        |
----------------------------------------------------

round:  13%|█▎        | 8/61 [02:24<16:04, 18.19s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 564         |
|    gen/time/fps                    | 1766        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 147456      |
|    gen/train/approx_kl             | 0.017265933 |
|    gen/train/clip_fraction         | 0.198       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.81       |
|    gen/train/explained_variance    | 0.839       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 16.6        |
|    gen/train/n_updates             | 80          |
|    gen/train/policy_gradient_loss  | -0.0206     |
|    gen/train/value_loss            | 60.3        |
----------------------------------------------------

round:  15%|█▍        | 9/61 [02:43<15:55, 18.37s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 607         |
|    gen/time/fps                    | 1840        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 163840      |
|    gen/train/approx_kl             | 0.017227912 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.77       |
|    gen/train/explained_variance    | 0.931       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.23        |
|    gen/train/n_updates             | 90          |
|    gen/train/policy_gradient_loss  | -0.0199     |
|    gen/train/value_loss            | 42.2        |
----------------------------------------------------

round:  16%|█▋        | 10/61 [03:01<15:33, 18.30s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 589         |
|    gen/time/fps                    | 1716        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 180224      |
|    gen/train/approx_kl             | 0.017870247 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.76       |
|    gen/train/explained_variance    | 0.907       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 22.4        |
|    gen/train/n_updates             | 100         |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 28.7        |
----------------------------------------------------

round:  18%|█▊        | 11/61 [03:21<15:30, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 535         |
|    gen/time/fps                    | 1882        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 196608      |
|    gen/train/approx_kl             | 0.022335809 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.74       |
|    gen/train/explained_variance    | 0.964       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.9         |
|    gen/train/n_updates             | 110         |
|    gen/train/policy_gradient_loss  | -0.0195     |
|    gen/train/value_loss            | 16.9        |
----------------------------------------------------

round:  20%|█▉        | 12/61 [03:39<15:03, 18.44s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 458        |
|    gen/time/fps                    | 1853       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 212992     |
|    gen/train/approx_kl             | 0.01647282 |
|    gen/train/clip_fraction         | 0.189      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.74      |
|    gen/train/explained_variance    | 0.97       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 3.63       |
|    gen/train/n_updates             | 120        |
|    gen/train/policy_gradient_loss  | -0.0152    |
|    gen/train/value_loss            | 14.4       |
---------------------------------------------------

round:  21%|██▏       | 13/61 [03:57<14:38, 18.30s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 410         |
|    gen/time/fps                    | 1821        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 229376      |
|    gen/train/approx_kl             | 0.017060976 |
|    gen/train/clip_fraction         | 0.176       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.68       |
|    gen/train/explained_variance    | 0.935       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 31.7        |
|    gen/train/n_updates             | 130         |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 26.7        |
----------------------------------------------------

round:  23%|██▎       | 14/61 [04:16<14:28, 18.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 396         |
|    gen/time/fps                    | 1927        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 245760      |
|    gen/train/approx_kl             | 0.025763227 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.66       |
|    gen/train/explained_variance    | 0.927       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.74        |
|    gen/train/n_updates             | 140         |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 14.4        |
----------------------------------------------------

round:  25%|██▍       | 15/61 [04:33<14:00, 18.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 395         |
|    gen/time/fps                    | 1907        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 262144      |
|    gen/train/approx_kl             | 0.017859362 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.58       |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.75        |
|    gen/train/n_updates             | 150         |
|    gen/train/policy_gradient_loss  | -0.0128     |
|    gen/train/value_loss            | 10.5        |
----------------------------------------------------

round:  26%|██▌       | 16/61 [04:51<13:35, 18.13s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 418         |
|    gen/time/fps                    | 1909        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 278528      |
|    gen/train/approx_kl             | 0.020240854 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.56       |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 22.4        |
|    gen/train/n_updates             | 160         |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 20.6        |
----------------------------------------------------

round:  28%|██▊       | 17/61 [05:09<13:15, 18.07s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 445        |
|    gen/time/fps                    | 1874       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 294912     |
|    gen/train/approx_kl             | 0.03054373 |
|    gen/train/clip_fraction         | 0.229      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.52      |
|    gen/train/explained_variance    | 0.959      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 4.59       |
|    gen/train/n_updates             | 170        |
|    gen/train/policy_gradient_loss  | -0.0168    |
|    gen/train/value_loss            | 21.8       |
---------------------------------------------------

round:  30%|██▉       | 18/61 [05:27<12:54, 18.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 539         |
|    gen/time/fps                    | 1737        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 311296      |
|    gen/train/approx_kl             | 0.017095678 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.53       |
|    gen/train/explained_variance    | 0.921       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 78.4        |
|    gen/train/n_updates             | 180         |
|    gen/train/policy_gradient_loss  | -0.0129     |
|    gen/train/value_loss            | 52.3        |
----------------------------------------------------

round:  31%|███       | 19/61 [05:46<12:47, 18.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 589         |
|    gen/time/fps                    | 1757        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 327680      |
|    gen/train/approx_kl             | 0.023061153 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.6        |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.7        |
|    gen/train/n_updates             | 190         |
|    gen/train/policy_gradient_loss  | -0.0137     |
|    gen/train/value_loss            | 42.2        |
----------------------------------------------------

round:  33%|███▎      | 20/61 [06:06<12:52, 18.83s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 618         |
|    gen/time/fps                    | 1690        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 344064      |
|    gen/train/approx_kl             | 0.023195911 |
|    gen/train/clip_fraction         | 0.231       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.54       |
|    gen/train/explained_variance    | 0.986       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.08        |
|    gen/train/n_updates             | 200         |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 14.2        |
----------------------------------------------------

round:  34%|███▍      | 21/61 [06:25<12:36, 18.91s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 554         |
|    gen/time/fps                    | 1860        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 360448      |
|    gen/train/approx_kl             | 0.019278442 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.53       |
|    gen/train/explained_variance    | 0.972       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.36        |
|    gen/train/n_updates             | 210         |
|    gen/train/policy_gradient_loss  | -0.0106     |
|    gen/train/value_loss            | 9.93        |
----------------------------------------------------

round:  36%|███▌      | 22/61 [06:44<12:16, 18.89s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 470         |
|    gen/time/fps                    | 1526        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 376832      |
|    gen/train/approx_kl             | 0.026240926 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.44       |
|    gen/train/explained_variance    | 0.958       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.6         |
|    gen/train/n_updates             | 220         |
|    gen/train/policy_gradient_loss  | -0.0115     |
|    gen/train/value_loss            | 16.3        |
----------------------------------------------------

round:  38%|███▊      | 23/61 [07:05<12:22, 19.55s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 403         |
|    gen/time/fps                    | 1807        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 393216      |
|    gen/train/approx_kl             | 0.019114524 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.57       |
|    gen/train/explained_variance    | 0.946       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.63        |
|    gen/train/n_updates             | 230         |
|    gen/train/policy_gradient_loss  | -0.0117     |
|    gen/train/value_loss            | 11.7        |
----------------------------------------------------

round:  39%|███▉      | 24/61 [07:24<11:59, 19.43s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 354        |
|    gen/time/fps                    | 1670       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 409600     |
|    gen/train/approx_kl             | 0.02299919 |
|    gen/train/clip_fraction         | 0.218      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.52      |
|    gen/train/explained_variance    | 0.943      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.924      |
|    gen/train/n_updates             | 240        |
|    gen/train/policy_gradient_loss  | -0.0126    |
|    gen/train/value_loss            | 3.72       |
---------------------------------------------------

round:  41%|████      | 25/61 [07:44<11:43, 19.54s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 339         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 425984      |
|    gen/train/approx_kl             | 0.018954339 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.51       |
|    gen/train/explained_variance    | 0.946       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.47        |
|    gen/train/n_updates             | 250         |
|    gen/train/policy_gradient_loss  | -0.00991    |
|    gen/train/value_loss            | 6.33        |
----------------------------------------------------

round:  43%|████▎     | 26/61 [08:03<11:20, 19.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 337         |
|    gen/time/fps                    | 1733        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 442368      |
|    gen/train/approx_kl             | 0.021976184 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.48       |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.1        |
|    gen/train/n_updates             | 260         |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 17.5        |
----------------------------------------------------

round:  44%|████▍     | 27/61 [08:23<11:07, 19.63s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 354         |
|    gen/time/fps                    | 1795        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 458752      |
|    gen/train/approx_kl             | 0.019462086 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.5        |
|    gen/train/explained_variance    | 0.868       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.02        |
|    gen/train/n_updates             | 270         |
|    gen/train/policy_gradient_loss  | -0.0111     |
|    gen/train/value_loss            | 32.9        |
----------------------------------------------------

round:  46%|████▌     | 28/61 [08:43<10:50, 19.70s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 382         |
|    gen/time/fps                    | 1644        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 475136      |
|    gen/train/approx_kl             | 0.026241038 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.39       |
|    gen/train/explained_variance    | 0.855       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.35        |
|    gen/train/n_updates             | 280         |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 35.3        |
----------------------------------------------------

round:  48%|████▊     | 29/61 [09:04<10:44, 20.13s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 409         |
|    gen/time/fps                    | 1772        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 491520      |
|    gen/train/approx_kl             | 0.030211063 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.37       |
|    gen/train/explained_variance    | 0.914       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 17.8        |
|    gen/train/n_updates             | 290         |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 27.7        |
----------------------------------------------------

round:  49%|████▉     | 30/61 [09:24<10:23, 20.10s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 475         |
|    gen/time/fps                    | 1817        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 507904      |
|    gen/train/approx_kl             | 0.016815221 |
|    gen/train/clip_fraction         | 0.146       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.32       |
|    gen/train/explained_variance    | 0.891       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 78.4        |
|    gen/train/n_updates             | 300         |
|    gen/train/policy_gradient_loss  | -0.00883    |
|    gen/train/value_loss            | 86.3        |
----------------------------------------------------

round:  51%|█████     | 31/61 [09:43<09:47, 19.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 498         |
|    gen/time/fps                    | 1824        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 524288      |
|    gen/train/approx_kl             | 0.020036526 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.2        |
|    gen/train/explained_variance    | 0.913       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 19.7        |
|    gen/train/n_updates             | 310         |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 55.1        |
----------------------------------------------------

round:  52%|█████▏    | 32/61 [10:02<09:24, 19.47s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 503        |
|    gen/time/fps                    | 1845       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 540672     |
|    gen/train/approx_kl             | 0.02037631 |
|    gen/train/clip_fraction         | 0.182      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.28      |
|    gen/train/explained_variance    | 0.937      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 7.63       |
|    gen/train/n_updates             | 320        |
|    gen/train/policy_gradient_loss  | -0.0126    |
|    gen/train/value_loss            | 54.5       |
---------------------------------------------------

round:  54%|█████▍    | 33/61 [10:21<09:02, 19.37s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 498         |
|    gen/time/fps                    | 1960        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 557056      |
|    gen/train/approx_kl             | 0.022388924 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.17       |
|    gen/train/explained_variance    | 0.948       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 24.1        |
|    gen/train/n_updates             | 330         |
|    gen/train/policy_gradient_loss  | -0.00834    |
|    gen/train/value_loss            | 90.6        |
----------------------------------------------------

round:  56%|█████▌    | 34/61 [10:39<08:31, 18.94s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 499        |
|    gen/time/fps                    | 1817       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 573440     |
|    gen/train/approx_kl             | 0.01756134 |
|    gen/train/clip_fraction         | 0.148      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.25      |
|    gen/train/explained_variance    | 0.946      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.85       |
|    gen/train/n_updates             | 340        |
|    gen/train/policy_gradient_loss  | -0.0104    |
|    gen/train/value_loss            | 82.4       |
---------------------------------------------------

round:  57%|█████▋    | 35/61 [10:58<08:12, 18.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00241     |
|    gen/rollout/ep_rew_wrapped_mean | 488         |
|    gen/time/fps                    | 1929        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 589824      |
|    gen/train/approx_kl             | 0.020556463 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.32       |
|    gen/train/explained_variance    | 0.925       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.1         |
|    gen/train/n_updates             | 350         |
|    gen/train/policy_gradient_loss  | -0.0125     |
|    gen/train/value_loss            | 91.7        |
----------------------------------------------------

round:  59%|█████▉    | 36/61 [11:16<07:49, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00241     |
|    gen/rollout/ep_rew_wrapped_mean | 485         |
|    gen/time/fps                    | 1880        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 606208      |
|    gen/train/approx_kl             | 0.018621579 |
|    gen/train/clip_fraction         | 0.179       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.18       |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.47        |
|    gen/train/n_updates             | 360         |
|    gen/train/policy_gradient_loss  | -0.0116     |
|    gen/train/value_loss            | 99.3        |
----------------------------------------------------

round:  61%|██████    | 37/61 [11:35<07:28, 18.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00241     |
|    gen/rollout/ep_rew_wrapped_mean | 436         |
|    gen/time/fps                    | 1849        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 622592      |
|    gen/train/approx_kl             | 0.020162918 |
|    gen/train/clip_fraction         | 0.218       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.39       |
|    gen/train/explained_variance    | 0.856       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.08        |
|    gen/train/n_updates             | 370         |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 48.3        |
----------------------------------------------------

round:  62%|██████▏   | 38/61 [11:55<07:19, 19.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 432         |
|    gen/time/fps                    | 1773        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 638976      |
|    gen/train/approx_kl             | 0.020651087 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.35       |
|    gen/train/explained_variance    | 0.933       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.17        |
|    gen/train/n_updates             | 380         |
|    gen/train/policy_gradient_loss  | -0.012      |
|    gen/train/value_loss            | 112         |
----------------------------------------------------

round:  64%|██████▍   | 39/61 [12:15<07:04, 19.31s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 448         |
|    gen/time/fps                    | 1773        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 655360      |
|    gen/train/approx_kl             | 0.020005172 |
|    gen/train/clip_fraction         | 0.198       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.16       |
|    gen/train/explained_variance    | 0.93        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.45        |
|    gen/train/n_updates             | 390         |
|    gen/train/policy_gradient_loss  | -0.0125     |
|    gen/train/value_loss            | 106         |
----------------------------------------------------

round:  66%|██████▌   | 40/61 [12:34<06:45, 19.33s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 446        |
|    gen/time/fps                    | 1831       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 671744     |
|    gen/train/approx_kl             | 0.01709501 |
|    gen/train/clip_fraction         | 0.204      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.18      |
|    gen/train/explained_variance    | 0.803      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.61       |
|    gen/train/n_updates             | 400        |
|    gen/train/policy_gradient_loss  | -0.019     |
|    gen/train/value_loss            | 20.3       |
---------------------------------------------------

round:  67%|██████▋   | 41/61 [12:53<06:24, 19.20s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 452         |
|    gen/time/fps                    | 1805        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 688128      |
|    gen/train/approx_kl             | 0.019901406 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.07       |
|    gen/train/explained_variance    | 0.897       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.35        |
|    gen/train/n_updates             | 410         |
|    gen/train/policy_gradient_loss  | -0.0215     |
|    gen/train/value_loss            | 12.9        |
----------------------------------------------------

round:  69%|██████▉   | 42/61 [13:13<06:07, 19.36s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 427        |
|    gen/time/fps                    | 1799       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 704512     |
|    gen/train/approx_kl             | 0.01966545 |
|    gen/train/clip_fraction         | 0.188      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.01      |
|    gen/train/explained_variance    | 0.733      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.91       |
|    gen/train/n_updates             | 420        |
|    gen/train/policy_gradient_loss  | -0.0128    |
|    gen/train/value_loss            | 26         |
---------------------------------------------------

round:  70%|███████   | 43/61 [13:32<05:51, 19.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 415         |
|    gen/time/fps                    | 1687        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 720896      |
|    gen/train/approx_kl             | 0.021596234 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.864      |
|    gen/train/explained_variance    | 0.973       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.76        |
|    gen/train/n_updates             | 430         |
|    gen/train/policy_gradient_loss  | -0.0126     |
|    gen/train/value_loss            | 21.1        |
----------------------------------------------------

round:  72%|███████▏  | 44/61 [13:53<05:37, 19.87s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 433        |
|    gen/time/fps                    | 1709       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 737280     |
|    gen/train/approx_kl             | 0.02769242 |
|    gen/train/clip_fraction         | 0.229      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.06      |
|    gen/train/explained_variance    | 0.942      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.38       |
|    gen/train/n_updates             | 440        |
|    gen/train/policy_gradient_loss  | -0.0191    |
|    gen/train/value_loss            | 3.13       |
---------------------------------------------------

round:  74%|███████▍  | 45/61 [14:13<05:18, 19.90s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 414         |
|    gen/time/fps                    | 1792        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 753664      |
|    gen/train/approx_kl             | 0.020966025 |
|    gen/train/clip_fraction         | 0.211       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.12       |
|    gen/train/explained_variance    | 0.914       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.29        |
|    gen/train/n_updates             | 450         |
|    gen/train/policy_gradient_loss  | -0.0163     |
|    gen/train/value_loss            | 10.8        |
----------------------------------------------------

round:  75%|███████▌  | 46/61 [14:34<05:03, 20.20s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 413         |
|    gen/time/fps                    | 1647        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 770048      |
|    gen/train/approx_kl             | 0.024349628 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.01       |
|    gen/train/explained_variance    | 0.915       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.56        |
|    gen/train/n_updates             | 460         |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 32.8        |
----------------------------------------------------

round:  77%|███████▋  | 47/61 [14:55<04:46, 20.46s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 400        |
|    gen/time/fps                    | 1735       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 786432     |
|    gen/train/approx_kl             | 0.03800682 |
|    gen/train/clip_fraction         | 0.276      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.16      |
|    gen/train/explained_variance    | 0.91       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.15       |
|    gen/train/n_updates             | 470        |
|    gen/train/policy_gradient_loss  | -0.0226    |
|    gen/train/value_loss            | 3.68       |
---------------------------------------------------

round:  79%|███████▊  | 48/61 [15:15<04:25, 20.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 403         |
|    gen/time/fps                    | 1632        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 802816      |
|    gen/train/approx_kl             | 0.022579532 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.21       |
|    gen/train/explained_variance    | 0.923       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.2         |
|    gen/train/n_updates             | 480         |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 19.4        |
----------------------------------------------------

round:  80%|████████  | 49/61 [15:37<04:10, 20.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 414         |
|    gen/time/fps                    | 1674        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 819200      |
|    gen/train/approx_kl             | 0.026772777 |
|    gen/train/clip_fraction         | 0.228       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.21       |
|    gen/train/explained_variance    | 0.934       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.38        |
|    gen/train/n_updates             | 490         |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 45.2        |
----------------------------------------------------

round:  82%|████████▏ | 50/61 [15:58<03:48, 20.75s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 438         |
|    gen/time/fps                    | 1650        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 835584      |
|    gen/train/approx_kl             | 0.021121167 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.16       |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.02        |
|    gen/train/n_updates             | 500         |
|    gen/train/policy_gradient_loss  | -0.00846    |
|    gen/train/value_loss            | 58.4        |
----------------------------------------------------

round:  84%|████████▎ | 51/61 [16:18<03:25, 20.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 420         |
|    gen/time/fps                    | 1683        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 851968      |
|    gen/train/approx_kl             | 0.024854962 |
|    gen/train/clip_fraction         | 0.245       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.21       |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.81        |
|    gen/train/n_updates             | 510         |
|    gen/train/policy_gradient_loss  | -0.0129     |
|    gen/train/value_loss            | 6.08        |
----------------------------------------------------

round:  85%|████████▌ | 52/61 [16:38<03:03, 20.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 408         |
|    gen/time/fps                    | 1704        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 868352      |
|    gen/train/approx_kl             | 0.027294014 |
|    gen/train/clip_fraction         | 0.281       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.22       |
|    gen/train/explained_variance    | 0.943       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.08        |
|    gen/train/n_updates             | 520         |
|    gen/train/policy_gradient_loss  | -0.0207     |
|    gen/train/value_loss            | 3.51        |
----------------------------------------------------

round:  87%|████████▋ | 53/61 [16:58<02:41, 20.23s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 378         |
|    gen/time/fps                    | 1739        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 884736      |
|    gen/train/approx_kl             | 0.024176907 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.14       |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.821       |
|    gen/train/n_updates             | 530         |
|    gen/train/policy_gradient_loss  | -0.0086     |
|    gen/train/value_loss            | 26.3        |
----------------------------------------------------

round:  89%|████████▊ | 54/61 [17:18<02:21, 20.14s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 360         |
|    gen/time/fps                    | 1705        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 901120      |
|    gen/train/approx_kl             | 0.022645915 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.21       |
|    gen/train/explained_variance    | 0.858       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.968       |
|    gen/train/n_updates             | 540         |
|    gen/train/policy_gradient_loss  | -0.0102     |
|    gen/train/value_loss            | 57          |
----------------------------------------------------

round:  90%|█████████ | 55/61 [17:38<02:00, 20.14s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 427         |
|    gen/time/fps                    | 1746        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 917504      |
|    gen/train/approx_kl             | 0.021528998 |
|    gen/train/clip_fraction         | 0.162       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.21       |
|    gen/train/explained_variance    | 0.901       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 12.4        |
|    gen/train/n_updates             | 550         |
|    gen/train/policy_gradient_loss  | -0.00911    |
|    gen/train/value_loss            | 165         |
----------------------------------------------------

round:  92%|█████████▏| 56/61 [17:58<01:40, 20.16s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0.00117    |
|    gen/rollout/ep_rew_wrapped_mean | 498        |
|    gen/time/fps                    | 1734       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 933888     |
|    gen/train/approx_kl             | 0.02628513 |
|    gen/train/clip_fraction         | 0.187      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.14      |
|    gen/train/explained_variance    | 0.909      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 30.5       |
|    gen/train/n_updates             | 560        |
|    gen/train/policy_gradient_loss  | -0.0108    |
|    gen/train/value_loss            | 252        |
---------------------------------------------------

round:  93%|█████████▎| 57/61 [18:19<01:21, 20.30s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0.00117     |
|    gen/rollout/ep_rew_wrapped_mean | 581         |
|    gen/time/fps                    | 1775        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 950272      |
|    gen/train/approx_kl             | 0.020338587 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.11       |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 283         |
|    gen/train/n_updates             | 570         |
|    gen/train/policy_gradient_loss  | -0.00907    |
|    gen/train/value_loss            | 158         |
----------------------------------------------------

round:  95%|█████████▌| 58/61 [18:37<00:59, 19.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0.00117     |
|    gen/rollout/ep_rew_wrapped_mean | 587         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 966656      |
|    gen/train/approx_kl             | 0.022100486 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.09       |
|    gen/train/explained_variance    | 0.933       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 26.7        |
|    gen/train/n_updates             | 580         |
|    gen/train/policy_gradient_loss  | -0.0254     |
|    gen/train/value_loss            | 126         |
----------------------------------------------------

round:  97%|█████████▋| 59/61 [18:57<00:39, 19.80s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 541        |
|    gen/time/fps                    | 1757       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 983040     |
|    gen/train/approx_kl             | 0.01538196 |
|    gen/train/clip_fraction         | 0.145      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.05      |
|    gen/train/explained_variance    | 0.868      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 81.5       |
|    gen/train/n_updates             | 590        |
|    gen/train/policy_gradient_loss  | -0.0108    |
|    gen/train/value_loss            | 184        |
---------------------------------------------------

round:  98%|█████████▊| 60/61 [19:17<00:19, 19.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 447         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 999424      |
|    gen/train/approx_kl             | 0.020091439 |
|    gen/train/clip_fraction         | 0.18        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.07       |
|    gen/train/explained_variance    | 0.887       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 60.8        |
|    gen/train/n_updates             | 600         |
|    gen/train/policy_gradient_loss  | -0.0118     |
|    gen/train/value_loss            | 162         |
----------------------------------------------------

round: 100%|██████████| 61/61 [19:36<00:00, 19.29s/it]
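
The `round` counter in the progress bar can be cross-checked against the logged `total_timesteps`. Consecutive log tables differ by 16,384 steps (for example 786432 - 770048), and the last table reports 999424 = 61 × 16384. The sketch below reproduces that arithmetic. It assumes the trainer floor-divides the requested step budget by the per-round step count, which is an inference from these logs rather than a confirmed detail of the library; `rounds_for` is a hypothetical helper.

```python
# Steps collected per round, inferred from two consecutive log tables.
steps_per_round = 786_432 - 770_048


def rounds_for(total_timesteps):
  """Whole rounds that fit into a step budget (assumed floor division)."""
  return total_timesteps // steps_per_round


rounds_1m = rounds_for(1_000_000)
rounds_5m = rounds_for(5_000_000)

print(f"steps per round: {steps_per_round}")
print(f"1,000,000 steps -> {rounds_1m} rounds ({rounds_1m * steps_per_round} steps actually run)")
print(f"5,000,000 steps -> {rounds_5m} rounds")
```

Under this assumption the step budget is rounded down to a multiple of 16,384, so slightly fewer steps are run than requested.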


In [585]:
def third_policy(observation):
  ######## PUT YOUR CODE HERE ########
  action, _states = learner3.predict(observation)
  ######## PUT YOUR CODE HERE ########
  return action
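
In Stable-Baselines3-style learners, `predict` samples from the action distribution unless `deterministic=True` is passed. Sampling can make scores vary between runs even though the evaluation seeds are fixed. The sketch below wraps a learner in a policy function with that flag exposed. It assumes `learner3` follows the SB3 `predict(observation, deterministic=...)` signature and that the action is a single discrete value. `make_policy` and `StubLearner` are hypothetical names used only so the example runs on its own.

```python
import numpy as np


def make_policy(learner, deterministic=True):
  """Return a function mapping an observation to one discrete action."""
  def policy(observation):
    action, _state = learner.predict(observation, deterministic=deterministic)
    # Assumes a single discrete action, returned as a 0-d array or scalar.
    return int(action)
  return policy


class StubLearner:
  """Hypothetical stand-in for a trained learner; always picks action 2."""
  def predict(self, observation, deterministic=False):
    return np.array(2), None


stub_policy = make_policy(StubLearner())
chosen_action = stub_policy(np.zeros(4))
print(chosen_action)
```

With a real learner, `make_policy(learner3)` could be passed to `compute_score` in place of `third_policy`. Whether deterministic actions score higher here is untested.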

In [586]:
env = gym.make("MiniGrid-BlockedUnlockPickup-v0")
env = FlatObsWrapper(env)

In [594]:
compute_score(task=env, policy=third_policy)

Best score: 0.2
Average score: (0.02, 0.06)


array([0. , 0.2, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])

### Nice :)
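
Minigrid's default reward is `1 - 0.9 * (step_count / max_steps)` on success and 0 otherwise, so a nonzero entry in the score array marks a solved episode. The sketch below turns the array printed above into a success rate. `summarize_scores` is a hypothetical helper, not part of the starter code, and the values are copied as printed (rounded to two decimals).

```python
import numpy as np


def summarize_scores(scores):
  """Summarize per-episode scores, counting any positive score as solved."""
  scores = np.asarray(scores, dtype=float)
  solved = scores > 0
  return {
    "num_solved": int(solved.sum()),
    "success_rate": float(solved.mean()),
    "mean_score": float(scores.mean()),
  }


# Per-episode scores as printed by compute_score above.
printed_scores = np.array([0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
summary = summarize_scores(printed_scores)
print(summary)
```

Reporting the success rate next to the mean separates how often the task is solved from how quickly.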

In [595]:
gail_trainer3.train(5_000_000)

round:   0%|          | 0/305 [00:00<?, ?it/s]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 382         |
|    gen/time/fps                    | 1890        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1015808     |
|    gen/train/approx_kl             | 0.021985818 |
|    gen/train/clip_fraction         | 0.241       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.11       |
|    gen/train/explained_variance    | 0.854       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.89        |
|    gen/train/n_updates             | 610         |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 18.1        |
----------------------------------------------------

round:   0%|          | 1/305 [00:18<1:33:26, 18.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0102      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1032192     |
|    gen/train/approx_kl             | 0.020275325 |
|    gen/train/clip_fraction         | 0.203       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.14       |
|    gen/train/explained_variance    | 0.871       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.25        |
|    gen/train/n_updates             | 620         |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 30.1        |
----------------------------------------------------

round:   1%|          | 2/305 [00:38<1:36:41, 19.15s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0102      |
|    gen/rollout/ep_rew_wrapped_mean | 352         |
|    gen/time/fps                    | 1803        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1048576     |
|    gen/train/approx_kl             | 0.023161177 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.06       |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.64        |
|    gen/train/n_updates             | 630         |
|    gen/train/policy_gradient_loss  | -0.00851    |
|    gen/train/value_loss            | 57.5        |
----------------------------------------------------

round:   1%|          | 3/305 [00:57<1:36:28, 19.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0102      |
|    gen/rollout/ep_rew_wrapped_mean | 386         |
|    gen/time/fps                    | 1935        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1064960     |
|    gen/train/approx_kl             | 0.018382307 |
|    gen/train/clip_fraction         | 0.178       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.1        |
|    gen/train/explained_variance    | 0.741       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.94        |
|    gen/train/n_updates             | 640         |
|    gen/train/policy_gradient_loss  | -0.0083     |
|    gen/train/value_loss            | 56.9        |
----------------------------------------------------

round:   1%|▏         | 4/305 [01:15<1:35:00, 18.94s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0          |
|    gen/rollout/ep_rew_wrapped_mean | 435        |
|    gen/time/fps                    | 1788       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1081344    |
|    gen/train/approx_kl             | 0.02126476 |
|    gen/train/clip_fraction         | 0.191      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.12      |
|    gen/train/explained_variance    | 0.87       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.03       |
|    gen/train/n_updates             | 650        |
|    gen/train/policy_gradient_loss  | -0.0065    |
|    gen/train/value_loss            | 59.2       |
---------------------------------------------------

round:   2%|▏         | 5/305 [01:35<1:35:04, 19.01s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 466         |
|    gen/time/fps                    | 1815        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1097728     |
|    gen/train/approx_kl             | 0.024546273 |
|    gen/train/clip_fraction         | 0.217       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.14       |
|    gen/train/explained_variance    | 0.851       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.9        |
|    gen/train/n_updates             | 660         |
|    gen/train/policy_gradient_loss  | -0.00876    |
|    gen/train/value_loss            | 92.5        |
----------------------------------------------------

round:   2%|▏         | 6/305 [01:53<1:34:28, 18.96s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 475         |
|    gen/time/fps                    | 1814        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1114112     |
|    gen/train/approx_kl             | 0.023317713 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.11       |
|    gen/train/explained_variance    | 0.872       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.72        |
|    gen/train/n_updates             | 670         |
|    gen/train/policy_gradient_loss  | -0.0127     |
|    gen/train/value_loss            | 65          |
----------------------------------------------------

round:   2%|▏         | 7/305 [02:12<1:33:33, 18.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00684     |
|    gen/rollout/ep_rew_wrapped_mean | 541         |
|    gen/time/fps                    | 1909        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1130496     |
|    gen/train/approx_kl             | 0.016099911 |
|    gen/train/clip_fraction         | 0.137       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.05       |
|    gen/train/explained_variance    | 0.842       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 13.5        |
|    gen/train/n_updates             | 680         |
|    gen/train/policy_gradient_loss  | -0.0097     |
|    gen/train/value_loss            | 159         |
----------------------------------------------------

round:   3%|▎         | 8/305 [02:30<1:31:50, 18.56s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 572       |
|    gen/rollout/ep_rew_mean         | 0.00684   |
|    gen/rollout/ep_rew_wrapped_mean | 518       |
|    gen/time/fps                    | 1861      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 1146880   |
|    gen/train/approx_kl             | 0.0187876 |
|    gen/train/clip_fraction         | 0.2       |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -1.13     |
|    gen/train/explained_variance    | 0.811     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 5.01      |
|    gen/train/n_updates             | 690       |
|    gen/train/policy_gradient_loss  | -0.0133   |
|    gen/train/value_loss            | 105       |
--------------------------------------------------

round:   3%|▎         | 9/305 [02:50<1:33:50, 19.02s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0123      |
|    gen/rollout/ep_rew_wrapped_mean | 549         |
|    gen/time/fps                    | 1849        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1163264     |
|    gen/train/approx_kl             | 0.019951146 |
|    gen/train/clip_fraction         | 0.198       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.13       |
|    gen/train/explained_variance    | 0.873       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 20.9        |
|    gen/train/n_updates             | 700         |
|    gen/train/policy_gradient_loss  | -0.00729    |
|    gen/train/value_loss            | 96.9        |
----------------------------------------------------

round:   3%|▎         | 10/305 [03:09<1:33:23, 19.00s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00547     |
|    gen/rollout/ep_rew_wrapped_mean | 594         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1179648     |
|    gen/train/approx_kl             | 0.015259322 |
|    gen/train/clip_fraction         | 0.136       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.14       |
|    gen/train/explained_variance    | 0.866       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 207         |
|    gen/train/n_updates             | 710         |
|    gen/train/policy_gradient_loss  | -0.00808    |
|    gen/train/value_loss            | 250         |
----------------------------------------------------

round:   4%|▎         | 11/305 [03:28<1:33:26, 19.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0115      |
|    gen/rollout/ep_rew_wrapped_mean | 579         |
|    gen/time/fps                    | 1786        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1196032     |
|    gen/train/approx_kl             | 0.016954385 |
|    gen/train/clip_fraction         | 0.187       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.16       |
|    gen/train/explained_variance    | 0.93        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 71.6        |
|    gen/train/n_updates             | 720         |
|    gen/train/policy_gradient_loss  | -0.00798    |
|    gen/train/value_loss            | 174         |
----------------------------------------------------

round:   4%|▍         | 12/305 [03:48<1:34:29, 19.35s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0145      |
|    gen/rollout/ep_rew_wrapped_mean | 473         |
|    gen/time/fps                    | 1585        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1212416     |
|    gen/train/approx_kl             | 0.027435943 |
|    gen/train/clip_fraction         | 0.211       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.14       |
|    gen/train/explained_variance    | 0.842       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 67.6        |
|    gen/train/n_updates             | 730         |
|    gen/train/policy_gradient_loss  | -0.0196     |
|    gen/train/value_loss            | 117         |
----------------------------------------------------

round:   4%|▍         | 13/305 [04:10<1:37:42, 20.08s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0185      |
|    gen/rollout/ep_rew_wrapped_mean | 419         |
|    gen/time/fps                    | 1819        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1228800     |
|    gen/train/approx_kl             | 0.023855407 |
|    gen/train/clip_fraction         | 0.177       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.02       |
|    gen/train/explained_variance    | 0.92        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 26.8        |
|    gen/train/n_updates             | 740         |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 122         |
----------------------------------------------------

round:   5%|▍         | 14/305 [04:29<1:36:07, 19.82s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0139      |
|    gen/rollout/ep_rew_wrapped_mean | 343         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1245184     |
|    gen/train/approx_kl             | 0.022413304 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.11       |
|    gen/train/explained_variance    | 0.827       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 19.7        |
|    gen/train/n_updates             | 750         |
|    gen/train/policy_gradient_loss  | -0.0105     |
|    gen/train/value_loss            | 65.1        |
----------------------------------------------------

round:   5%|▍         | 15/305 [04:48<1:34:51, 19.63s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 574         |
|    gen/rollout/ep_rew_mean         | 0.00544     |
|    gen/rollout/ep_rew_wrapped_mean | 345         |
|    gen/time/fps                    | 1778        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1261568     |
|    gen/train/approx_kl             | 0.026561806 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.03       |
|    gen/train/explained_variance    | 0.87        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 12.2        |
|    gen/train/n_updates             | 760         |
|    gen/train/policy_gradient_loss  | -0.0186     |
|    gen/train/value_loss            | 40.8        |
----------------------------------------------------

round:   5%|▌         | 16/305 [05:08<1:34:10, 19.55s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 570        |
|    gen/rollout/ep_rew_mean         | 0.0118     |
|    gen/rollout/ep_rew_wrapped_mean | 333        |
|    gen/time/fps                    | 1704       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1277952    |
|    gen/train/approx_kl             | 0.02329228 |
|    gen/train/clip_fraction         | 0.206      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.08      |
|    gen/train/explained_variance    | 0.719      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.27       |
|    gen/train/n_updates             | 770        |
|    gen/train/policy_gradient_loss  | -0.0163    |
|    gen/train/value_loss            | 43.9       |
---------------------------------------------------

round:   6%|▌         | 17/305 [05:27<1:33:13, 19.42s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0139      |
|    gen/rollout/ep_rew_wrapped_mean | 304         |
|    gen/time/fps                    | 1762        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1294336     |
|    gen/train/approx_kl             | 0.021619547 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.97       |
|    gen/train/explained_variance    | 0.806       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.3         |
|    gen/train/n_updates             | 780         |
|    gen/train/policy_gradient_loss  | -0.0134     |
|    gen/train/value_loss            | 15.5        |
----------------------------------------------------

round:   6%|▌         | 18/305 [05:46<1:33:08, 19.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0238      |
|    gen/rollout/ep_rew_wrapped_mean | 301         |
|    gen/time/fps                    | 1825        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1310720     |
|    gen/train/approx_kl             | 0.021598026 |
|    gen/train/clip_fraction         | 0.175       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.975      |
|    gen/train/explained_variance    | 0.724       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.16        |
|    gen/train/n_updates             | 790         |
|    gen/train/policy_gradient_loss  | -0.00852    |
|    gen/train/value_loss            | 12.7        |
----------------------------------------------------

round:   6%|▌         | 19/305 [06:05<1:31:46, 19.25s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 565        |
|    gen/rollout/ep_rew_mean         | 0.0216     |
|    gen/rollout/ep_rew_wrapped_mean | 307        |
|    gen/time/fps                    | 1771       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1327104    |
|    gen/train/approx_kl             | 0.02355303 |
|    gen/train/clip_fraction         | 0.209      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.995     |
|    gen/train/explained_variance    | 0.75       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.05       |
|    gen/train/n_updates             | 800        |
|    gen/train/policy_gradient_loss  | -0.0148    |
|    gen/train/value_loss            | 11.8       |
---------------------------------------------------

round:   7%|▋         | 20/305 [06:25<1:32:49, 19.54s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0199      |
|    gen/rollout/ep_rew_wrapped_mean | 343         |
|    gen/time/fps                    | 1664        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1343488     |
|    gen/train/approx_kl             | 0.027582327 |
|    gen/train/clip_fraction         | 0.232       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.967      |
|    gen/train/explained_variance    | 0.846       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.43        |
|    gen/train/n_updates             | 810         |
|    gen/train/policy_gradient_loss  | -0.0201     |
|    gen/train/value_loss            | 7.87        |
----------------------------------------------------

round:   7%|▋         | 21/305 [06:46<1:34:02, 19.87s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 571        |
|    gen/rollout/ep_rew_mean         | 0.011      |
|    gen/rollout/ep_rew_wrapped_mean | 393        |
|    gen/time/fps                    | 1732       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1359872    |
|    gen/train/approx_kl             | 0.02700473 |
|    gen/train/clip_fraction         | 0.222      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.914     |
|    gen/train/explained_variance    | 0.835      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 3.52       |
|    gen/train/n_updates             | 820        |
|    gen/train/policy_gradient_loss  | -0.0188    |
|    gen/train/value_loss            | 9.93       |
---------------------------------------------------

round:   7%|▋         | 22/305 [07:06<1:33:29, 19.82s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0111      |
|    gen/rollout/ep_rew_wrapped_mean | 419         |
|    gen/time/fps                    | 1811        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1376256     |
|    gen/train/approx_kl             | 0.027737333 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.951      |
|    gen/train/explained_variance    | 0.789       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.14        |
|    gen/train/n_updates             | 830         |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 10.5        |
----------------------------------------------------

round:   8%|▊         | 23/305 [07:25<1:31:52, 19.55s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.012       |
|    gen/rollout/ep_rew_wrapped_mean | 444         |
|    gen/time/fps                    | 1738        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1392640     |
|    gen/train/approx_kl             | 0.031952158 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.873      |
|    gen/train/explained_variance    | 0.654       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.48        |
|    gen/train/n_updates             | 840         |
|    gen/train/policy_gradient_loss  | -0.0096     |
|    gen/train/value_loss            | 19.1        |
----------------------------------------------------

round:   8%|▊         | 24/305 [07:44<1:31:51, 19.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00848     |
|    gen/rollout/ep_rew_wrapped_mean | 415         |
|    gen/time/fps                    | 1648        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1409024     |
|    gen/train/approx_kl             | 0.041851725 |
|    gen/train/clip_fraction         | 0.227       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.943      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.49        |
|    gen/train/n_updates             | 850         |
|    gen/train/policy_gradient_loss  | -0.0157     |
|    gen/train/value_loss            | 17.1        |
----------------------------------------------------

round:   8%|▊         | 25/305 [08:05<1:32:36, 19.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 574         |
|    gen/rollout/ep_rew_mean         | 0.00419     |
|    gen/rollout/ep_rew_wrapped_mean | 411         |
|    gen/time/fps                    | 1609        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1425408     |
|    gen/train/approx_kl             | 0.026996015 |
|    gen/train/clip_fraction         | 0.223       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.02       |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.11        |
|    gen/train/n_updates             | 860         |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 11.6        |
----------------------------------------------------

round:   9%|▊         | 26/305 [08:27<1:35:34, 20.55s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.0108      |
|    gen/rollout/ep_rew_wrapped_mean | 414         |
|    gen/time/fps                    | 1596        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1441792     |
|    gen/train/approx_kl             | 0.028982997 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.957      |
|    gen/train/explained_variance    | 0.974       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.32        |
|    gen/train/n_updates             | 870         |
|    gen/train/policy_gradient_loss  | -0.0202     |
|    gen/train/value_loss            | 9.87        |
----------------------------------------------------

round:   9%|▉         | 27/305 [08:48<1:36:03, 20.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0148      |
|    gen/rollout/ep_rew_wrapped_mean | 398         |
|    gen/time/fps                    | 1759        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1458176     |
|    gen/train/approx_kl             | 0.023359424 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.04       |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.21        |
|    gen/train/n_updates             | 880         |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 11.9        |
----------------------------------------------------

round:   9%|▉         | 28/305 [09:09<1:35:38, 20.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0161      |
|    gen/rollout/ep_rew_wrapped_mean | 402         |
|    gen/time/fps                    | 1649        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1474560     |
|    gen/train/approx_kl             | 0.030259645 |
|    gen/train/clip_fraction         | 0.244       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.13       |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.5         |
|    gen/train/n_updates             | 890         |
|    gen/train/policy_gradient_loss  | -0.019      |
|    gen/train/value_loss            | 11.4        |
----------------------------------------------------

round:  10%|▉         | 29/305 [09:29<1:34:06, 20.46s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0168      |
|    gen/rollout/ep_rew_wrapped_mean | 390         |
|    gen/time/fps                    | 1802        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1490944     |
|    gen/train/approx_kl             | 0.026437262 |
|    gen/train/clip_fraction         | 0.237       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.12       |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 19.3        |
|    gen/train/n_updates             | 900         |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 9.69        |
----------------------------------------------------

round:  10%|▉         | 30/305 [09:47<1:31:30, 19.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0166      |
|    gen/rollout/ep_rew_wrapped_mean | 382         |
|    gen/time/fps                    | 1770        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1507328     |
|    gen/train/approx_kl             | 0.026918838 |
|    gen/train/clip_fraction         | 0.242       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.12       |
|    gen/train/explained_variance    | 0.919       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 12.5        |
|    gen/train/n_updates             | 910         |
|    gen/train/policy_gradient_loss  | -0.0199     |
|    gen/train/value_loss            | 12.3        |
----------------------------------------------------

round:  10%|█         | 31/305 [10:07<1:31:01, 19.93s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.0166     |
|    gen/rollout/ep_rew_wrapped_mean | 382        |
|    gen/time/fps                    | 1778       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1523712    |
|    gen/train/approx_kl             | 0.03260757 |
|    gen/train/clip_fraction         | 0.255      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.05      |
|    gen/train/explained_variance    | 0.942      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 5.6        |
|    gen/train/n_updates             | 920        |
|    gen/train/policy_gradient_loss  | -0.0197    |
|    gen/train/value_loss            | 9.64       |
---------------------------------------------------

round:  10%|█         | 32/305 [10:27<1:30:05, 19.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00964     |
|    gen/rollout/ep_rew_wrapped_mean | 381         |
|    gen/time/fps                    | 1670        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1540096     |
|    gen/train/approx_kl             | 0.027607301 |
|    gen/train/clip_fraction         | 0.254       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.977      |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.79        |
|    gen/train/n_updates             | 930         |
|    gen/train/policy_gradient_loss  | -0.0298     |
|    gen/train/value_loss            | 3.89        |
----------------------------------------------------

round:  11%|█         | 33/305 [10:47<1:30:59, 20.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0101      |
|    gen/rollout/ep_rew_wrapped_mean | 399         |
|    gen/time/fps                    | 1705        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1556480     |
|    gen/train/approx_kl             | 0.029177375 |
|    gen/train/clip_fraction         | 0.244       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.936      |
|    gen/train/explained_variance    | 0.972       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.16        |
|    gen/train/n_updates             | 940         |
|    gen/train/policy_gradient_loss  | -0.0234     |
|    gen/train/value_loss            | 3.55        |
----------------------------------------------------

round:  11%|█         | 34/305 [11:08<1:30:56, 20.13s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 571        |
|    gen/rollout/ep_rew_mean         | 0.0101     |
|    gen/rollout/ep_rew_wrapped_mean | 389        |
|    gen/time/fps                    | 1723       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1572864    |
|    gen/train/approx_kl             | 0.04383953 |
|    gen/train/clip_fraction         | 0.259      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1         |
|    gen/train/explained_variance    | 0.937      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.34       |
|    gen/train/n_updates             | 950        |
|    gen/train/policy_gradient_loss  | -0.0185    |
|    gen/train/value_loss            | 6.06       |
---------------------------------------------------

round:  11%|█▏        | 35/305 [11:28<1:30:18, 20.07s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0101      |
|    gen/rollout/ep_rew_wrapped_mean | 387         |
|    gen/time/fps                    | 1636        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 1589248     |
|    gen/train/approx_kl             | 0.027260927 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.03       |
|    gen/train/explained_variance    | 0.972       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.2         |
|    gen/train/n_updates             | 960         |
|    gen/train/policy_gradient_loss  | -0.0187     |
|    gen/train/value_loss            | 3.11        |
----------------------------------------------------

round:  12%|█▏        | 36/305 [11:48<1:30:55, 20.28s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.00939     |
|    gen/rollout/ep_rew_wrapped_mean | 391         |
|    gen/time/fps                    | 1801        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1605632     |
|    gen/train/approx_kl             | 0.025635824 |
|    gen/train/clip_fraction         | 0.215       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.04       |
|    gen/train/explained_variance    | 0.82        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.79        |
|    gen/train/n_updates             | 970         |
|    gen/train/policy_gradient_loss  | -0.013      |
|    gen/train/value_loss            | 5.41        |
----------------------------------------------------

round:  12%|█▏        | 37/305 [12:07<1:28:25, 19.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0161      |
|    gen/rollout/ep_rew_wrapped_mean | 396         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1622016     |
|    gen/train/approx_kl             | 0.023180705 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.03       |
|    gen/train/explained_variance    | 0.878       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.21        |
|    gen/train/n_updates             | 980         |
|    gen/train/policy_gradient_loss  | -0.00915    |
|    gen/train/value_loss            | 17.4        |
----------------------------------------------------

round:  12%|█▏        | 38/305 [12:26<1:26:40, 19.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0161      |
|    gen/rollout/ep_rew_wrapped_mean | 429         |
|    gen/time/fps                    | 1821        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1638400     |
|    gen/train/approx_kl             | 0.025537582 |
|    gen/train/clip_fraction         | 0.243       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.03       |
|    gen/train/explained_variance    | 0.966       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.9         |
|    gen/train/n_updates             | 990         |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 7.68        |
----------------------------------------------------

round:  13%|█▎        | 39/305 [12:45<1:25:23, 19.26s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 562         |
|    gen/rollout/ep_rew_mean         | 0.0253      |
|    gen/rollout/ep_rew_wrapped_mean | 426         |
|    gen/time/fps                    | 1867        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1654784     |
|    gen/train/approx_kl             | 0.024423178 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.968      |
|    gen/train/explained_variance    | 0.966       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 14.8        |
|    gen/train/n_updates             | 1000        |
|    gen/train/policy_gradient_loss  | -0.0166     |
|    gen/train/value_loss            | 7.02        |
----------------------------------------------------

round:  13%|█▎        | 40/305 [13:03<1:23:50, 18.98s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0182      |
|    gen/rollout/ep_rew_wrapped_mean | 400         |
|    gen/time/fps                    | 1763        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1671168     |
|    gen/train/approx_kl             | 0.054461338 |
|    gen/train/clip_fraction         | 0.273       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.999      |
|    gen/train/explained_variance    | 0.963       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.78        |
|    gen/train/n_updates             | 1010        |
|    gen/train/policy_gradient_loss  | -0.0405     |
|    gen/train/value_loss            | 10          |
----------------------------------------------------

round:  13%|█▎        | 41/305 [13:22<1:23:57, 19.08s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0182      |
|    gen/rollout/ep_rew_wrapped_mean | 421         |
|    gen/time/fps                    | 1870        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1687552     |
|    gen/train/approx_kl             | 0.026457848 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.964      |
|    gen/train/explained_variance    | 0.934       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.3         |
|    gen/train/n_updates             | 1020        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 14.5        |
----------------------------------------------------

round:  14%|█▍        | 42/305 [13:40<1:22:16, 18.77s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 561        |
|    gen/rollout/ep_rew_mean         | 0.0281     |
|    gen/rollout/ep_rew_wrapped_mean | 421        |
|    gen/time/fps                    | 1794       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1703936    |
|    gen/train/approx_kl             | 0.03121872 |
|    gen/train/clip_fraction         | 0.279      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.04      |
|    gen/train/explained_variance    | 0.966      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.04       |
|    gen/train/n_updates             | 1030       |
|    gen/train/policy_gradient_loss  | -0.0223    |
|    gen/train/value_loss            | 9.11       |
---------------------------------------------------

round:  14%|█▍        | 43/305 [13:59<1:21:48, 18.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0238      |
|    gen/rollout/ep_rew_wrapped_mean | 419         |
|    gen/time/fps                    | 1778        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1720320     |
|    gen/train/approx_kl             | 0.029667884 |
|    gen/train/clip_fraction         | 0.226       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.02       |
|    gen/train/explained_variance    | 0.955       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.38        |
|    gen/train/n_updates             | 1040        |
|    gen/train/policy_gradient_loss  | -0.0175     |
|    gen/train/value_loss            | 14.6        |
----------------------------------------------------

round:  14%|█▍        | 44/305 [14:18<1:21:31, 18.74s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0165      |
|    gen/rollout/ep_rew_wrapped_mean | 453         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1736704     |
|    gen/train/approx_kl             | 0.029274467 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.949      |
|    gen/train/explained_variance    | 0.906       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 15.9        |
|    gen/train/n_updates             | 1050        |
|    gen/train/policy_gradient_loss  | -0.0113     |
|    gen/train/value_loss            | 18.8        |
----------------------------------------------------

round:  15%|█▍        | 45/305 [14:36<1:20:52, 18.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.033       |
|    gen/rollout/ep_rew_wrapped_mean | 465         |
|    gen/time/fps                    | 1820        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1753088     |
|    gen/train/approx_kl             | 0.036699004 |
|    gen/train/clip_fraction         | 0.239       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.906      |
|    gen/train/explained_variance    | 0.823       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.69        |
|    gen/train/n_updates             | 1060        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 21.3        |
----------------------------------------------------

round:  15%|█▌        | 46/305 [14:55<1:20:18, 18.60s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 561        |
|    gen/rollout/ep_rew_mean         | 0.0287     |
|    gen/rollout/ep_rew_wrapped_mean | 463        |
|    gen/time/fps                    | 1818       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 1769472    |
|    gen/train/approx_kl             | 0.02635677 |
|    gen/train/clip_fraction         | 0.22       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -1.02      |
|    gen/train/explained_variance    | 0.878      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 25.8       |
|    gen/train/n_updates             | 1070       |
|    gen/train/policy_gradient_loss  | -0.0135    |
|    gen/train/value_loss            | 34         |
---------------------------------------------------

round:  15%|█▌        | 47/305 [15:13<1:19:40, 18.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.0327      |
|    gen/rollout/ep_rew_wrapped_mean | 468         |
|    gen/time/fps                    | 1807        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1785856     |
|    gen/train/approx_kl             | 0.028633116 |
|    gen/train/clip_fraction         | 0.237       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.02       |
|    gen/train/explained_variance    | 0.894       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 29.6        |
|    gen/train/n_updates             | 1080        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 31.7        |
----------------------------------------------------

round:  16%|█▌        | 48/305 [15:32<1:19:39, 18.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0231      |
|    gen/rollout/ep_rew_wrapped_mean | 484         |
|    gen/time/fps                    | 1842        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1802240     |
|    gen/train/approx_kl             | 0.026485972 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.03       |
|    gen/train/explained_variance    | 0.843       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 31          |
|    gen/train/n_updates             | 1090        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 42.5        |
----------------------------------------------------

round:  16%|█▌        | 49/305 [15:51<1:19:46, 18.70s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0162      |
|    gen/rollout/ep_rew_wrapped_mean | 525         |
|    gen/time/fps                    | 1821        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1818624     |
|    gen/train/approx_kl             | 0.025013126 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1.02       |
|    gen/train/explained_variance    | 0.857       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.24        |
|    gen/train/n_updates             | 1100        |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 40.7        |
----------------------------------------------------

round:  16%|█▋        | 50/305 [16:09<1:19:35, 18.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 558         |
|    gen/time/fps                    | 1836        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1835008     |
|    gen/train/approx_kl             | 0.023101345 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -1          |
|    gen/train/explained_variance    | 0.886       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 10.3        |
|    gen/train/n_updates             | 1110        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 43.8        |
----------------------------------------------------

round:  17%|█▋        | 51/305 [16:28<1:19:12, 18.71s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 575         |
|    gen/time/fps                    | 1891        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1851392     |
|    gen/train/approx_kl             | 0.035255414 |
|    gen/train/clip_fraction         | 0.239       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.96       |
|    gen/train/explained_variance    | 0.906       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 15.6        |
|    gen/train/n_updates             | 1120        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 37.3        |
----------------------------------------------------

round:  17%|█▋        | 52/305 [16:46<1:18:21, 18.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 516         |
|    gen/time/fps                    | 1818        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1867776     |
|    gen/train/approx_kl             | 0.033925384 |
|    gen/train/clip_fraction         | 0.268       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.974      |
|    gen/train/explained_variance    | 0.928       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 29.1        |
|    gen/train/n_updates             | 1130        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 32          |
----------------------------------------------------

round:  17%|█▋        | 53/305 [17:05<1:17:38, 18.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 576         |
|    gen/rollout/ep_rew_mean         | 0           |
|    gen/rollout/ep_rew_wrapped_mean | 544         |
|    gen/time/fps                    | 1740        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1884160     |
|    gen/train/approx_kl             | 0.027906898 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.861      |
|    gen/train/explained_variance    | 0.875       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.62        |
|    gen/train/n_updates             | 1140        |
|    gen/train/policy_gradient_loss  | -0.0116     |
|    gen/train/value_loss            | 48.9        |
----------------------------------------------------

round:  18%|█▊        | 54/305 [17:24<1:18:30, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.0053      |
|    gen/rollout/ep_rew_wrapped_mean | 522         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 1900544     |
|    gen/train/approx_kl             | 0.030048564 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.942      |
|    gen/train/explained_variance    | 0.927       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.2        |
|    gen/train/n_updates             | 1150        |
|    gen/train/policy_gradient_loss  | -0.0158     |
|    gen/train/value_loss            | 17.5        |
----------------------------------------------------

round:  18%|█▊        | 55/305 [17:43<1:18:26, 18.83s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.0053      |
|    gen/rollout/ep_rew_wrapped_mean | 468         |
|    gen/time/fps                    | 1858        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1916928     |
|    gen/train/approx_kl             | 0.028513921 |
|    gen/train/clip_fraction         | 0.218       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.961      |
|    gen/train/explained_variance    | 0.88        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.06        |
|    gen/train/n_updates             | 1160        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 18.9        |
----------------------------------------------------

round:  18%|█▊        | 56/305 [18:01<1:17:34, 18.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.0053      |
|    gen/rollout/ep_rew_wrapped_mean | 387         |
|    gen/time/fps                    | 1820        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1933312     |
|    gen/train/approx_kl             | 0.023733547 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.957      |
|    gen/train/explained_variance    | 0.884       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.54        |
|    gen/train/n_updates             | 1170        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 11.1        |
----------------------------------------------------

round:  19%|█▊        | 57/305 [18:20<1:17:16, 18.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0167      |
|    gen/rollout/ep_rew_wrapped_mean | 323         |
|    gen/time/fps                    | 1851        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1949696     |
|    gen/train/approx_kl             | 0.030299284 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.903      |
|    gen/train/explained_variance    | 0.914       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.38        |
|    gen/train/n_updates             | 1180        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 5.93        |
----------------------------------------------------

round:  19%|█▉        | 58/305 [18:38<1:16:27, 18.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0125      |
|    gen/rollout/ep_rew_wrapped_mean | 319         |
|    gen/time/fps                    | 1874        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1966080     |
|    gen/train/approx_kl             | 0.021607565 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.939      |
|    gen/train/explained_variance    | 0.701       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.07        |
|    gen/train/n_updates             | 1190        |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 11.3        |
----------------------------------------------------

round:  19%|█▉        | 59/305 [18:57<1:15:44, 18.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0193      |
|    gen/rollout/ep_rew_wrapped_mean | 344         |
|    gen/time/fps                    | 1916        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1982464     |
|    gen/train/approx_kl             | 0.027893405 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.913      |
|    gen/train/explained_variance    | 0.872       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.86        |
|    gen/train/n_updates             | 1200        |
|    gen/train/policy_gradient_loss  | -0.0151     |
|    gen/train/value_loss            | 9.44        |
----------------------------------------------------

round:  20%|█▉        | 60/305 [19:14<1:14:34, 18.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0363      |
|    gen/rollout/ep_rew_wrapped_mean | 358         |
|    gen/time/fps                    | 1857        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 1998848     |
|    gen/train/approx_kl             | 0.030928247 |
|    gen/train/clip_fraction         | 0.244       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.858      |
|    gen/train/explained_variance    | 0.925       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.8         |
|    gen/train/n_updates             | 1210        |
|    gen/train/policy_gradient_loss  | -0.019      |
|    gen/train/value_loss            | 5.73        |
----------------------------------------------------

round:  20%|██        | 61/305 [19:33<1:14:39, 18.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0249      |
|    gen/rollout/ep_rew_wrapped_mean | 376         |
|    gen/time/fps                    | 1874        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2015232     |
|    gen/train/approx_kl             | 0.024894789 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.879      |
|    gen/train/explained_variance    | 0.883       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8           |
|    gen/train/n_updates             | 1220        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 14.3        |
----------------------------------------------------

round:  20%|██        | 62/305 [19:51<1:14:25, 18.38s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0265      |
|    gen/rollout/ep_rew_wrapped_mean | 395         |
|    gen/time/fps                    | 1879        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2031616     |
|    gen/train/approx_kl             | 0.026443172 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.84       |
|    gen/train/explained_variance    | 0.901       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.79        |
|    gen/train/n_updates             | 1230        |
|    gen/train/policy_gradient_loss  | -0.0184     |
|    gen/train/value_loss            | 5.43        |
----------------------------------------------------

round:  21%|██        | 63/305 [20:09<1:13:41, 18.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0199      |
|    gen/rollout/ep_rew_wrapped_mean | 413         |
|    gen/time/fps                    | 1879        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2048000     |
|    gen/train/approx_kl             | 0.028343461 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.811      |
|    gen/train/explained_variance    | 0.901       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.45        |
|    gen/train/n_updates             | 1240        |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 8.56        |
----------------------------------------------------

round:  21%|██        | 64/305 [20:27<1:12:55, 18.15s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.018       |
|    gen/rollout/ep_rew_wrapped_mean | 419         |
|    gen/time/fps                    | 1805        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2064384     |
|    gen/train/approx_kl             | 0.026255123 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.905      |
|    gen/train/explained_variance    | 0.889       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.3         |
|    gen/train/n_updates             | 1250        |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 10          |
----------------------------------------------------

round:  21%|██▏       | 65/305 [20:46<1:12:50, 18.21s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.0328      |
|    gen/rollout/ep_rew_wrapped_mean | 434         |
|    gen/time/fps                    | 1813        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2080768     |
|    gen/train/approx_kl             | 0.027620818 |
|    gen/train/clip_fraction         | 0.231       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.886      |
|    gen/train/explained_variance    | 0.904       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.63        |
|    gen/train/n_updates             | 1260        |
|    gen/train/policy_gradient_loss  | -0.0149     |
|    gen/train/value_loss            | 14.6        |
----------------------------------------------------

round:  22%|██▏       | 66/305 [21:04<1:13:08, 18.36s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 563       |
|    gen/rollout/ep_rew_mean         | 0.0266    |
|    gen/rollout/ep_rew_wrapped_mean | 446       |
|    gen/time/fps                    | 1772      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 9         |
|    gen/time/total_timesteps        | 2097152   |
|    gen/train/approx_kl             | 0.0316511 |
|    gen/train/clip_fraction         | 0.245     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.864    |
|    gen/train/explained_variance    | 0.92      |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 11.8      |
|    gen/train/n_updates             | 1270      |
|    gen/train/policy_gradient_loss  | -0.0164   |
|    gen/train/value_loss            | 13.9      |
--------------------------------------------------

round:  22%|██▏       | 67/305 [21:23<1:13:28, 18.52s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 566        |
|    gen/rollout/ep_rew_mean         | 0.0222     |
|    gen/rollout/ep_rew_wrapped_mean | 449        |
|    gen/time/fps                    | 1791       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2113536    |
|    gen/train/approx_kl             | 0.02431225 |
|    gen/train/clip_fraction         | 0.229      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.853     |
|    gen/train/explained_variance    | 0.942      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.53       |
|    gen/train/n_updates             | 1280       |
|    gen/train/policy_gradient_loss  | -0.0132    |
|    gen/train/value_loss            | 4.52       |
---------------------------------------------------

round:  22%|██▏       | 68/305 [21:42<1:13:32, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0213      |
|    gen/rollout/ep_rew_wrapped_mean | 444         |
|    gen/time/fps                    | 1773        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2129920     |
|    gen/train/approx_kl             | 0.023299083 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.92       |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.03        |
|    gen/train/n_updates             | 1290        |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 13          |
----------------------------------------------------

round:  23%|██▎       | 69/305 [22:01<1:13:47, 18.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00905     |
|    gen/rollout/ep_rew_wrapped_mean | 406         |
|    gen/time/fps                    | 1860        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2146304     |
|    gen/train/approx_kl             | 0.030609623 |
|    gen/train/clip_fraction         | 0.252       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.89       |
|    gen/train/explained_variance    | 0.946       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.5         |
|    gen/train/n_updates             | 1300        |
|    gen/train/policy_gradient_loss  | -0.0249     |
|    gen/train/value_loss            | 4.85        |
----------------------------------------------------

round:  23%|██▎       | 70/305 [22:20<1:13:04, 18.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00905     |
|    gen/rollout/ep_rew_wrapped_mean | 389         |
|    gen/time/fps                    | 1907        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2162688     |
|    gen/train/approx_kl             | 0.032529116 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.784      |
|    gen/train/explained_variance    | 0.957       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.05        |
|    gen/train/n_updates             | 1310        |
|    gen/train/policy_gradient_loss  | -0.0161     |
|    gen/train/value_loss            | 6.97        |
----------------------------------------------------

round:  23%|██▎       | 71/305 [22:38<1:11:54, 18.44s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 576        |
|    gen/rollout/ep_rew_mean         | 0.00136    |
|    gen/rollout/ep_rew_wrapped_mean | 408        |
|    gen/time/fps                    | 1816       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2179072    |
|    gen/train/approx_kl             | 0.02789488 |
|    gen/train/clip_fraction         | 0.193      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.814     |
|    gen/train/explained_variance    | 0.651      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 14.2       |
|    gen/train/n_updates             | 1320       |
|    gen/train/policy_gradient_loss  | -0.012     |
|    gen/train/value_loss            | 11.5       |
---------------------------------------------------

round:  24%|██▎       | 72/305 [22:56<1:11:50, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00553     |
|    gen/rollout/ep_rew_wrapped_mean | 426         |
|    gen/time/fps                    | 1783        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2195456     |
|    gen/train/approx_kl             | 0.034358807 |
|    gen/train/clip_fraction         | 0.255       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.859      |
|    gen/train/explained_variance    | 0.892       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.29        |
|    gen/train/n_updates             | 1330        |
|    gen/train/policy_gradient_loss  | -0.0147     |
|    gen/train/value_loss            | 7.51        |
----------------------------------------------------

round:  24%|██▍       | 73/305 [23:15<1:11:59, 18.62s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 570        |
|    gen/rollout/ep_rew_mean         | 0.0119     |
|    gen/rollout/ep_rew_wrapped_mean | 433        |
|    gen/time/fps                    | 1719       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2211840    |
|    gen/train/approx_kl             | 0.03167174 |
|    gen/train/clip_fraction         | 0.249      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.838     |
|    gen/train/explained_variance    | 0.881      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.06       |
|    gen/train/n_updates             | 1340       |
|    gen/train/policy_gradient_loss  | -0.0144    |
|    gen/train/value_loss            | 8.93       |
---------------------------------------------------

round:  24%|██▍       | 74/305 [23:34<1:12:11, 18.75s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0119      |
|    gen/rollout/ep_rew_wrapped_mean | 478         |
|    gen/time/fps                    | 1797        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2228224     |
|    gen/train/approx_kl             | 0.016058967 |
|    gen/train/clip_fraction         | 0.118       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.768      |
|    gen/train/explained_variance    | 0.715       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.19        |
|    gen/train/n_updates             | 1350        |
|    gen/train/policy_gradient_loss  | -0.00823    |
|    gen/train/value_loss            | 31.2        |
----------------------------------------------------

round:  25%|██▍       | 75/305 [23:53<1:12:01, 18.79s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00636     |
|    gen/rollout/ep_rew_wrapped_mean | 478         |
|    gen/time/fps                    | 1806        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2244608     |
|    gen/train/approx_kl             | 0.030652415 |
|    gen/train/clip_fraction         | 0.248       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.811      |
|    gen/train/explained_variance    | 0.895       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.823       |
|    gen/train/n_updates             | 1360        |
|    gen/train/policy_gradient_loss  | -0.00758    |
|    gen/train/value_loss            | 12.1        |
----------------------------------------------------

round:  25%|██▍       | 76/305 [24:11<1:11:16, 18.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00636     |
|    gen/rollout/ep_rew_wrapped_mean | 462         |
|    gen/time/fps                    | 1795        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2260992     |
|    gen/train/approx_kl             | 0.026236923 |
|    gen/train/clip_fraction         | 0.233       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.809      |
|    gen/train/explained_variance    | 0.924       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.17        |
|    gen/train/n_updates             | 1370        |
|    gen/train/policy_gradient_loss  | -0.0328     |
|    gen/train/value_loss            | 7.51        |
----------------------------------------------------

round:  25%|██▌       | 77/305 [24:30<1:10:58, 18.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00575     |
|    gen/rollout/ep_rew_wrapped_mean | 434         |
|    gen/time/fps                    | 1832        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2277376     |
|    gen/train/approx_kl             | 0.036788262 |
|    gen/train/clip_fraction         | 0.243       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.758      |
|    gen/train/explained_variance    | 0.955       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.08        |
|    gen/train/n_updates             | 1380        |
|    gen/train/policy_gradient_loss  | -0.0125     |
|    gen/train/value_loss            | 3.97        |
----------------------------------------------------

round:  26%|██▌       | 78/305 [24:50<1:11:34, 18.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0111      |
|    gen/rollout/ep_rew_wrapped_mean | 361         |
|    gen/time/fps                    | 1830        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2293760     |
|    gen/train/approx_kl             | 0.021093734 |
|    gen/train/clip_fraction         | 0.152       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.683      |
|    gen/train/explained_variance    | 0.925       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.14        |
|    gen/train/n_updates             | 1390        |
|    gen/train/policy_gradient_loss  | -0.00842    |
|    gen/train/value_loss            | 7.15        |
----------------------------------------------------

round:  26%|██▌       | 79/305 [25:08<1:10:57, 18.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0111      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1777        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2310144     |
|    gen/train/approx_kl             | 0.039612856 |
|    gen/train/clip_fraction         | 0.283       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.723      |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.25        |
|    gen/train/n_updates             | 1400        |
|    gen/train/policy_gradient_loss  | -0.0219     |
|    gen/train/value_loss            | 1.86        |
----------------------------------------------------

round:  26%|██▌       | 80/305 [25:27<1:10:41, 18.85s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 570        |
|    gen/rollout/ep_rew_mean         | 0.0116     |
|    gen/rollout/ep_rew_wrapped_mean | 373        |
|    gen/time/fps                    | 1814       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2326528    |
|    gen/train/approx_kl             | 0.02943552 |
|    gen/train/clip_fraction         | 0.195      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.693     |
|    gen/train/explained_variance    | 0.97       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.475      |
|    gen/train/n_updates             | 1410       |
|    gen/train/policy_gradient_loss  | -0.0173    |
|    gen/train/value_loss            | 1.57       |
---------------------------------------------------

round:  27%|██▋       | 81/305 [25:46<1:10:07, 18.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0122      |
|    gen/rollout/ep_rew_wrapped_mean | 392         |
|    gen/time/fps                    | 1786        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2342912     |
|    gen/train/approx_kl             | 0.025027335 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.711      |
|    gen/train/explained_variance    | 0.942       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.12        |
|    gen/train/n_updates             | 1420        |
|    gen/train/policy_gradient_loss  | -0.0128     |
|    gen/train/value_loss            | 3.59        |
----------------------------------------------------

round:  27%|██▋       | 82/305 [26:05<1:09:42, 18.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.0286      |
|    gen/rollout/ep_rew_wrapped_mean | 402         |
|    gen/time/fps                    | 1823        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2359296     |
|    gen/train/approx_kl             | 0.030456554 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.718      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.552       |
|    gen/train/n_updates             | 1430        |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 5.13        |
----------------------------------------------------

round:  27%|██▋       | 83/305 [26:23<1:09:11, 18.70s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0407      |
|    gen/rollout/ep_rew_wrapped_mean | 400         |
|    gen/time/fps                    | 1845        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2375680     |
|    gen/train/approx_kl             | 0.028141472 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.738      |
|    gen/train/explained_variance    | 0.901       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.397       |
|    gen/train/n_updates             | 1440        |
|    gen/train/policy_gradient_loss  | -0.0184     |
|    gen/train/value_loss            | 8.79        |
----------------------------------------------------

round:  28%|██▊       | 84/305 [26:42<1:09:14, 18.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0526      |
|    gen/rollout/ep_rew_wrapped_mean | 375         |
|    gen/time/fps                    | 1643        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2392064     |
|    gen/train/approx_kl             | 0.028615242 |
|    gen/train/clip_fraction         | 0.237       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.689      |
|    gen/train/explained_variance    | 0.942       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.894       |
|    gen/train/n_updates             | 1450        |
|    gen/train/policy_gradient_loss  | -0.0259     |
|    gen/train/value_loss            | 5.58        |
----------------------------------------------------

round:  28%|██▊       | 85/305 [27:02<1:09:46, 19.03s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0507      |
|    gen/rollout/ep_rew_wrapped_mean | 367         |
|    gen/time/fps                    | 1881        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2408448     |
|    gen/train/approx_kl             | 0.025491487 |
|    gen/train/clip_fraction         | 0.181       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.808      |
|    gen/train/explained_variance    | 0.882       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.569       |
|    gen/train/n_updates             | 1460        |
|    gen/train/policy_gradient_loss  | -0.0126     |
|    gen/train/value_loss            | 9.46        |
----------------------------------------------------

round:  28%|██▊       | 86/305 [27:20<1:08:12, 18.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 551         |
|    gen/rollout/ep_rew_mean         | 0.0466      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1779        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2424832     |
|    gen/train/approx_kl             | 0.033500306 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.725      |
|    gen/train/explained_variance    | 0.934       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.62        |
|    gen/train/n_updates             | 1470        |
|    gen/train/policy_gradient_loss  | -0.0137     |
|    gen/train/value_loss            | 8.71        |
----------------------------------------------------

round:  29%|██▊       | 87/305 [27:38<1:07:52, 18.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0508      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1802        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2441216     |
|    gen/train/approx_kl             | 0.031999655 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.716      |
|    gen/train/explained_variance    | 0.952       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.615       |
|    gen/train/n_updates             | 1480        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 2.33        |
----------------------------------------------------

round:  29%|██▉       | 88/305 [27:57<1:07:34, 18.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0261      |
|    gen/rollout/ep_rew_wrapped_mean | 349         |
|    gen/time/fps                    | 1840        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2457600     |
|    gen/train/approx_kl             | 0.029405449 |
|    gen/train/clip_fraction         | 0.218       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.939       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.748       |
|    gen/train/n_updates             | 1490        |
|    gen/train/policy_gradient_loss  | -0.0185     |
|    gen/train/value_loss            | 3.86        |
----------------------------------------------------

round:  29%|██▉       | 89/305 [28:15<1:06:57, 18.60s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 565        |
|    gen/rollout/ep_rew_mean         | 0.0239     |
|    gen/rollout/ep_rew_wrapped_mean | 359        |
|    gen/time/fps                    | 1782       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2473984    |
|    gen/train/approx_kl             | 0.02848987 |
|    gen/train/clip_fraction         | 0.197      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.766     |
|    gen/train/explained_variance    | 0.928      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.72       |
|    gen/train/n_updates             | 1500       |
|    gen/train/policy_gradient_loss  | -0.016     |
|    gen/train/value_loss            | 8.62       |
---------------------------------------------------

round:  30%|██▉       | 90/305 [28:34<1:06:42, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0198      |
|    gen/rollout/ep_rew_wrapped_mean | 368         |
|    gen/time/fps                    | 1734        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2490368     |
|    gen/train/approx_kl             | 0.033403035 |
|    gen/train/clip_fraction         | 0.218       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.692      |
|    gen/train/explained_variance    | 0.953       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.478       |
|    gen/train/n_updates             | 1510        |
|    gen/train/policy_gradient_loss  | -0.012      |
|    gen/train/value_loss            | 4.99        |
----------------------------------------------------

round:  30%|██▉       | 91/305 [28:53<1:07:04, 18.80s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 567        |
|    gen/rollout/ep_rew_mean         | 0.0198     |
|    gen/rollout/ep_rew_wrapped_mean | 367        |
|    gen/time/fps                    | 1840       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2506752    |
|    gen/train/approx_kl             | 0.03476013 |
|    gen/train/clip_fraction         | 0.261      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.677     |
|    gen/train/explained_variance    | 0.959      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.38       |
|    gen/train/n_updates             | 1520       |
|    gen/train/policy_gradient_loss  | -0.0272    |
|    gen/train/value_loss            | 1.24       |
---------------------------------------------------

round:  30%|███       | 92/305 [29:12<1:06:31, 18.74s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 574        |
|    gen/rollout/ep_rew_mean         | 0.00384    |
|    gen/rollout/ep_rew_wrapped_mean | 363        |
|    gen/time/fps                    | 1818       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2523136    |
|    gen/train/approx_kl             | 0.04236526 |
|    gen/train/clip_fraction         | 0.239      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.686     |
|    gen/train/explained_variance    | 0.964      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.578      |
|    gen/train/n_updates             | 1530       |
|    gen/train/policy_gradient_loss  | -0.0227    |
|    gen/train/value_loss            | 2.13       |
---------------------------------------------------

round:  30%|███       | 93/305 [29:30<1:05:47, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00219     |
|    gen/rollout/ep_rew_wrapped_mean | 357         |
|    gen/time/fps                    | 1760        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2539520     |
|    gen/train/approx_kl             | 0.032684267 |
|    gen/train/clip_fraction         | 0.231       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.685      |
|    gen/train/explained_variance    | 0.931       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.1         |
|    gen/train/n_updates             | 1540        |
|    gen/train/policy_gradient_loss  | -0.0199     |
|    gen/train/value_loss            | 1.71        |
----------------------------------------------------

round:  31%|███       | 94/305 [29:49<1:05:24, 18.60s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 575       |
|    gen/rollout/ep_rew_mean         | 0.00338   |
|    gen/rollout/ep_rew_wrapped_mean | 364       |
|    gen/time/fps                    | 1847      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 2555904   |
|    gen/train/approx_kl             | 0.0385526 |
|    gen/train/clip_fraction         | 0.236     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.669    |
|    gen/train/explained_variance    | 0.915     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 1.89      |
|    gen/train/n_updates             | 1550      |
|    gen/train/policy_gradient_loss  | -0.0179   |
|    gen/train/value_loss            | 4.24      |
--------------------------------------------------

round:  31%|███       | 95/305 [30:07<1:04:43, 18.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0138      |
|    gen/rollout/ep_rew_wrapped_mean | 377         |
|    gen/time/fps                    | 1819        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2572288     |
|    gen/train/approx_kl             | 0.035726137 |
|    gen/train/clip_fraction         | 0.251       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.754      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.801       |
|    gen/train/n_updates             | 1560        |
|    gen/train/policy_gradient_loss  | -0.0179     |
|    gen/train/value_loss            | 4.54        |
----------------------------------------------------

round:  31%|███▏      | 96/305 [30:26<1:04:30, 18.52s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0222      |
|    gen/rollout/ep_rew_wrapped_mean | 394         |
|    gen/time/fps                    | 1850        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2588672     |
|    gen/train/approx_kl             | 0.027937144 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.733      |
|    gen/train/explained_variance    | 0.919       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.4         |
|    gen/train/n_updates             | 1570        |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 7.42        |
----------------------------------------------------

round:  32%|███▏      | 97/305 [30:44<1:04:05, 18.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.022       |
|    gen/rollout/ep_rew_wrapped_mean | 407         |
|    gen/time/fps                    | 1774        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2605056     |
|    gen/train/approx_kl             | 0.032302253 |
|    gen/train/clip_fraction         | 0.255       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.812      |
|    gen/train/explained_variance    | 0.872       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.11        |
|    gen/train/n_updates             | 1580        |
|    gen/train/policy_gradient_loss  | -0.0203     |
|    gen/train/value_loss            | 7.26        |
----------------------------------------------------

round:  32%|███▏      | 98/305 [31:03<1:04:23, 18.66s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 562        |
|    gen/rollout/ep_rew_mean         | 0.0277     |
|    gen/rollout/ep_rew_wrapped_mean | 400        |
|    gen/time/fps                    | 1791       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2621440    |
|    gen/train/approx_kl             | 0.03348268 |
|    gen/train/clip_fraction         | 0.258      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.799     |
|    gen/train/explained_variance    | 0.884      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.716      |
|    gen/train/n_updates             | 1590       |
|    gen/train/policy_gradient_loss  | -0.0242    |
|    gen/train/value_loss            | 4.84       |
---------------------------------------------------

round:  32%|███▏      | 99/305 [31:22<1:03:57, 18.63s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 563        |
|    gen/rollout/ep_rew_mean         | 0.026      |
|    gen/rollout/ep_rew_wrapped_mean | 367        |
|    gen/time/fps                    | 1847       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2637824    |
|    gen/train/approx_kl             | 0.03211891 |
|    gen/train/clip_fraction         | 0.232      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.787     |
|    gen/train/explained_variance    | 0.776      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.96       |
|    gen/train/n_updates             | 1600       |
|    gen/train/policy_gradient_loss  | -0.0241    |
|    gen/train/value_loss            | 4.56       |
---------------------------------------------------

round:  33%|███▎      | 100/305 [31:40<1:03:26, 18.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 556         |
|    gen/rollout/ep_rew_mean         | 0.0377      |
|    gen/rollout/ep_rew_wrapped_mean | 327         |
|    gen/time/fps                    | 1863        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2654208     |
|    gen/train/approx_kl             | 0.027569624 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.78       |
|    gen/train/explained_variance    | 0.682       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.84        |
|    gen/train/n_updates             | 1610        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 6.88        |
----------------------------------------------------

round:  33%|███▎      | 101/305 [31:58<1:02:54, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 547         |
|    gen/rollout/ep_rew_mean         | 0.056       |
|    gen/rollout/ep_rew_wrapped_mean | 294         |
|    gen/time/fps                    | 1850        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2670592     |
|    gen/train/approx_kl             | 0.060392633 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.771      |
|    gen/train/explained_variance    | 0.638       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.88        |
|    gen/train/n_updates             | 1620        |
|    gen/train/policy_gradient_loss  | -0.0166     |
|    gen/train/value_loss            | 7.63        |
----------------------------------------------------

round:  33%|███▎      | 102/305 [32:17<1:02:37, 18.51s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 553        |
|    gen/rollout/ep_rew_mean         | 0.046      |
|    gen/rollout/ep_rew_wrapped_mean | 300        |
|    gen/time/fps                    | 1826       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2686976    |
|    gen/train/approx_kl             | 0.03005208 |
|    gen/train/clip_fraction         | 0.196      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.641     |
|    gen/train/explained_variance    | 0.819      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.911      |
|    gen/train/n_updates             | 1630       |
|    gen/train/policy_gradient_loss  | -0.0129    |
|    gen/train/value_loss            | 10.5       |
---------------------------------------------------

round:  34%|███▍      | 103/305 [32:35<1:02:08, 18.46s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 562         |
|    gen/rollout/ep_rew_mean         | 0.0299      |
|    gen/rollout/ep_rew_wrapped_mean | 341         |
|    gen/time/fps                    | 1831        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2703360     |
|    gen/train/approx_kl             | 0.032050066 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.714      |
|    gen/train/explained_variance    | 0.897       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.99        |
|    gen/train/n_updates             | 1640        |
|    gen/train/policy_gradient_loss  | -0.0197     |
|    gen/train/value_loss            | 7.99        |
----------------------------------------------------

round:  34%|███▍      | 104/305 [32:54<1:01:55, 18.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0182      |
|    gen/rollout/ep_rew_wrapped_mean | 397         |
|    gen/time/fps                    | 1795        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2719744     |
|    gen/train/approx_kl             | 0.033633105 |
|    gen/train/clip_fraction         | 0.241       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.614      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.756       |
|    gen/train/n_updates             | 1650        |
|    gen/train/policy_gradient_loss  | -0.0261     |
|    gen/train/value_loss            | 4.81        |
----------------------------------------------------

round:  34%|███▍      | 105/305 [33:13<1:01:53, 18.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0348      |
|    gen/rollout/ep_rew_wrapped_mean | 427         |
|    gen/time/fps                    | 1796        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2736128     |
|    gen/train/approx_kl             | 0.034270227 |
|    gen/train/clip_fraction         | 0.228       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.665      |
|    gen/train/explained_variance    | 0.943       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.863       |
|    gen/train/n_updates             | 1660        |
|    gen/train/policy_gradient_loss  | -0.0226     |
|    gen/train/value_loss            | 3.29        |
----------------------------------------------------

round:  35%|███▍      | 106/305 [33:31<1:01:37, 18.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0338      |
|    gen/rollout/ep_rew_wrapped_mean | 413         |
|    gen/time/fps                    | 1793        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2752512     |
|    gen/train/approx_kl             | 0.029031225 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.713      |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.37        |
|    gen/train/n_updates             | 1670        |
|    gen/train/policy_gradient_loss  | -0.0154     |
|    gen/train/value_loss            | 10.3        |
----------------------------------------------------

round:  35%|███▌      | 107/305 [33:50<1:01:15, 18.56s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 555        |
|    gen/rollout/ep_rew_mean         | 0.04       |
|    gen/rollout/ep_rew_wrapped_mean | 387        |
|    gen/time/fps                    | 1826       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 2768896    |
|    gen/train/approx_kl             | 0.03575242 |
|    gen/train/clip_fraction         | 0.239      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.685     |
|    gen/train/explained_variance    | 0.917      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.12       |
|    gen/train/n_updates             | 1680       |
|    gen/train/policy_gradient_loss  | -0.0218    |
|    gen/train/value_loss            | 3.66       |
---------------------------------------------------

round:  35%|███▌      | 108/305 [34:08<1:00:34, 18.45s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0175      |
|    gen/rollout/ep_rew_wrapped_mean | 350         |
|    gen/time/fps                    | 1804        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2785280     |
|    gen/train/approx_kl             | 0.031769983 |
|    gen/train/clip_fraction         | 0.234       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.638      |
|    gen/train/explained_variance    | 0.897       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.13        |
|    gen/train/n_updates             | 1690        |
|    gen/train/policy_gradient_loss  | -0.0217     |
|    gen/train/value_loss            | 4.43        |
----------------------------------------------------

round:  36%|███▌      | 109/305 [34:26<1:00:27, 18.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0193      |
|    gen/rollout/ep_rew_wrapped_mean | 357         |
|    gen/time/fps                    | 1776        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2801664     |
|    gen/train/approx_kl             | 0.027364196 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.6        |
|    gen/train/explained_variance    | 0.93        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.532       |
|    gen/train/n_updates             | 1700        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 4.36        |
----------------------------------------------------

round:  36%|███▌      | 110/305 [34:46<1:00:43, 18.69s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0202      |
|    gen/rollout/ep_rew_wrapped_mean | 368         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2818048     |
|    gen/train/approx_kl             | 0.027876282 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.618      |
|    gen/train/explained_variance    | 0.946       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.612       |
|    gen/train/n_updates             | 1710        |
|    gen/train/policy_gradient_loss  | -0.0155     |
|    gen/train/value_loss            | 2.12        |
----------------------------------------------------

round:  36%|███▋      | 111/305 [35:04<1:00:16, 18.64s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0151      |
|    gen/rollout/ep_rew_wrapped_mean | 397         |
|    gen/time/fps                    | 1823        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2834432     |
|    gen/train/approx_kl             | 0.027829561 |
|    gen/train/clip_fraction         | 0.171       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.608      |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.533       |
|    gen/train/n_updates             | 1720        |
|    gen/train/policy_gradient_loss  | -0.0117     |
|    gen/train/value_loss            | 4.15        |
----------------------------------------------------

round:  37%|███▋      | 112/305 [35:23<59:43, 18.57s/it]  

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00708     |
|    gen/rollout/ep_rew_wrapped_mean | 410         |
|    gen/time/fps                    | 1779        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2850816     |
|    gen/train/approx_kl             | 0.032532092 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.673      |
|    gen/train/explained_variance    | 0.945       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.418       |
|    gen/train/n_updates             | 1730        |
|    gen/train/policy_gradient_loss  | -0.0157     |
|    gen/train/value_loss            | 1.77        |
----------------------------------------------------

round:  37%|███▋      | 113/305 [35:41<59:29, 18.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00228     |
|    gen/rollout/ep_rew_wrapped_mean | 420         |
|    gen/time/fps                    | 1798        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2867200     |
|    gen/train/approx_kl             | 0.025629662 |
|    gen/train/clip_fraction         | 0.185       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.648      |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.48        |
|    gen/train/n_updates             | 1740        |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 3.1         |
----------------------------------------------------

round:  37%|███▋      | 114/305 [36:00<59:02, 18.55s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 575         |
|    gen/rollout/ep_rew_mean         | 0.00228     |
|    gen/rollout/ep_rew_wrapped_mean | 423         |
|    gen/time/fps                    | 1720        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2883584     |
|    gen/train/approx_kl             | 0.026697055 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.645      |
|    gen/train/explained_variance    | 0.942       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.16        |
|    gen/train/n_updates             | 1750        |
|    gen/train/policy_gradient_loss  | -0.0188     |
|    gen/train/value_loss            | 4.27        |
----------------------------------------------------

round:  38%|███▊      | 115/305 [36:19<59:49, 18.89s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 572        |
|    gen/rollout/ep_rew_mean         | 0.00759    |
|    gen/rollout/ep_rew_wrapped_mean | 411        |
|    gen/time/fps                    | 1775       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2899968    |
|    gen/train/approx_kl             | 0.09044491 |
|    gen/train/clip_fraction         | 0.208      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.512     |
|    gen/train/explained_variance    | 0.959      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.326      |
|    gen/train/n_updates             | 1760       |
|    gen/train/policy_gradient_loss  | -0.0241    |
|    gen/train/value_loss            | 1.28       |
---------------------------------------------------

round:  38%|███▊      | 116/305 [36:38<59:17, 18.82s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0165      |
|    gen/rollout/ep_rew_wrapped_mean | 404         |
|    gen/time/fps                    | 1776        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2916352     |
|    gen/train/approx_kl             | 0.029610982 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.605      |
|    gen/train/explained_variance    | 0.949       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.35        |
|    gen/train/n_updates             | 1770        |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 7.81        |
----------------------------------------------------

round:  38%|███▊      | 117/305 [36:57<58:52, 18.79s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0232      |
|    gen/rollout/ep_rew_wrapped_mean | 395         |
|    gen/time/fps                    | 1788        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2932736     |
|    gen/train/approx_kl             | 0.036818523 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.569      |
|    gen/train/explained_variance    | 0.942       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.645       |
|    gen/train/n_updates             | 1780        |
|    gen/train/policy_gradient_loss  | -0.0227     |
|    gen/train/value_loss            | 5.03        |
----------------------------------------------------

round:  39%|███▊      | 118/305 [37:15<58:20, 18.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.044       |
|    gen/rollout/ep_rew_wrapped_mean | 377         |
|    gen/time/fps                    | 1805        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2949120     |
|    gen/train/approx_kl             | 0.052991703 |
|    gen/train/clip_fraction         | 0.253       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.658      |
|    gen/train/explained_variance    | 0.956       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.05        |
|    gen/train/n_updates             | 1790        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 5.86        |
----------------------------------------------------

round:  39%|███▉      | 119/305 [37:34<58:11, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 557         |
|    gen/rollout/ep_rew_mean         | 0.0393      |
|    gen/rollout/ep_rew_wrapped_mean | 360         |
|    gen/time/fps                    | 1790        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 2965504     |
|    gen/train/approx_kl             | 0.025606975 |
|    gen/train/clip_fraction         | 0.178       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.655      |
|    gen/train/explained_variance    | 0.881       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.12        |
|    gen/train/n_updates             | 1800        |
|    gen/train/policy_gradient_loss  | -0.0138     |
|    gen/train/value_loss            | 10.8        |
----------------------------------------------------

round:  39%|███▉      | 120/305 [37:53<57:50, 18.76s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 555        |
|    gen/rollout/ep_rew_mean         | 0.0425     |
|    gen/rollout/ep_rew_wrapped_mean | 357        |
|    gen/time/fps                    | 1816       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 2981888    |
|    gen/train/approx_kl             | 0.03695049 |
|    gen/train/clip_fraction         | 0.221      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.806     |
|    gen/train/explained_variance    | 0.848      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 3.36       |
|    gen/train/n_updates             | 1810       |
|    gen/train/policy_gradient_loss  | -0.0157    |
|    gen/train/value_loss            | 15         |
---------------------------------------------------

round:  40%|███▉      | 121/305 [38:11<57:17, 18.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0442      |
|    gen/rollout/ep_rew_wrapped_mean | 371         |
|    gen/time/fps                    | 1829        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 2998272     |
|    gen/train/approx_kl             | 0.033847526 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.792      |
|    gen/train/explained_variance    | 0.904       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.59        |
|    gen/train/n_updates             | 1820        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 16.7        |
----------------------------------------------------

round:  40%|████      | 122/305 [38:30<56:41, 18.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0358      |
|    gen/rollout/ep_rew_wrapped_mean | 394         |
|    gen/time/fps                    | 1823        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3014656     |
|    gen/train/approx_kl             | 0.026640637 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.837      |
|    gen/train/explained_variance    | 0.89        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 15          |
|    gen/train/n_updates             | 1830        |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 19.4        |
----------------------------------------------------

round:  40%|████      | 123/305 [38:48<56:12, 18.53s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 560        |
|    gen/rollout/ep_rew_mean         | 0.0318     |
|    gen/rollout/ep_rew_wrapped_mean | 429        |
|    gen/time/fps                    | 1849       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3031040    |
|    gen/train/approx_kl             | 0.02717311 |
|    gen/train/clip_fraction         | 0.222      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.863     |
|    gen/train/explained_variance    | 0.875      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 16.7       |
|    gen/train/n_updates             | 1840       |
|    gen/train/policy_gradient_loss  | -0.0124    |
|    gen/train/value_loss            | 24.2       |
---------------------------------------------------

round:  41%|████      | 124/305 [39:07<55:51, 18.52s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 559        |
|    gen/rollout/ep_rew_mean         | 0.0333     |
|    gen/rollout/ep_rew_wrapped_mean | 445        |
|    gen/time/fps                    | 1797       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3047424    |
|    gen/train/approx_kl             | 0.03546599 |
|    gen/train/clip_fraction         | 0.252      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.888     |
|    gen/train/explained_variance    | 0.837      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 7.19       |
|    gen/train/n_updates             | 1850       |
|    gen/train/policy_gradient_loss  | -0.0178    |
|    gen/train/value_loss            | 23.3       |
---------------------------------------------------

round:  41%|████      | 125/305 [39:26<55:58, 18.66s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 562       |
|    gen/rollout/ep_rew_mean         | 0.0262    |
|    gen/rollout/ep_rew_wrapped_mean | 452       |
|    gen/time/fps                    | 1832      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 3063808   |
|    gen/train/approx_kl             | 0.0265777 |
|    gen/train/clip_fraction         | 0.23      |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.859    |
|    gen/train/explained_variance    | 0.918     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 47.2      |
|    gen/train/n_updates             | 1860      |
|    gen/train/policy_gradient_loss  | -0.0126   |
|    gen/train/value_loss            | 29.5      |
--------------------------------------------------

round:  41%|████▏     | 126/305 [39:44<55:41, 18.67s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 564        |
|    gen/rollout/ep_rew_mean         | 0.0245     |
|    gen/rollout/ep_rew_wrapped_mean | 460        |
|    gen/time/fps                    | 1827       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3080192    |
|    gen/train/approx_kl             | 0.03367967 |
|    gen/train/clip_fraction         | 0.221      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.804     |
|    gen/train/explained_variance    | 0.938      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.44       |
|    gen/train/n_updates             | 1870       |
|    gen/train/policy_gradient_loss  | -0.015     |
|    gen/train/value_loss            | 21.6       |
---------------------------------------------------

round:  42%|████▏     | 127/305 [40:03<55:18, 18.64s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0157      |
|    gen/rollout/ep_rew_wrapped_mean | 429         |
|    gen/time/fps                    | 1857        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3096576     |
|    gen/train/approx_kl             | 0.029696977 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.799      |
|    gen/train/explained_variance    | 0.926       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 14.8        |
|    gen/train/n_updates             | 1880        |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 20.7        |
----------------------------------------------------

round:  42%|████▏     | 128/305 [40:21<54:33, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0136      |
|    gen/rollout/ep_rew_wrapped_mean | 385         |
|    gen/time/fps                    | 1825        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3112960     |
|    gen/train/approx_kl             | 0.024064526 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.773      |
|    gen/train/explained_variance    | 0.761       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.98        |
|    gen/train/n_updates             | 1890        |
|    gen/train/policy_gradient_loss  | -0.0134     |
|    gen/train/value_loss            | 15.7        |
----------------------------------------------------

round:  42%|████▏     | 129/305 [40:39<54:10, 18.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0136      |
|    gen/rollout/ep_rew_wrapped_mean | 345         |
|    gen/time/fps                    | 1779        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3129344     |
|    gen/train/approx_kl             | 0.025473803 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.745      |
|    gen/train/explained_variance    | 0.722       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.69        |
|    gen/train/n_updates             | 1900        |
|    gen/train/policy_gradient_loss  | -0.0102     |
|    gen/train/value_loss            | 10.2        |
----------------------------------------------------

round:  43%|████▎     | 130/305 [40:58<54:08, 18.57s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 567        |
|    gen/rollout/ep_rew_mean         | 0.0195     |
|    gen/rollout/ep_rew_wrapped_mean | 316        |
|    gen/time/fps                    | 1867       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3145728    |
|    gen/train/approx_kl             | 0.02653909 |
|    gen/train/clip_fraction         | 0.225      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.725     |
|    gen/train/explained_variance    | 0.82       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.606      |
|    gen/train/n_updates             | 1910       |
|    gen/train/policy_gradient_loss  | -0.0167    |
|    gen/train/value_loss            | 3.26       |
---------------------------------------------------

round:  43%|████▎     | 131/305 [41:16<53:21, 18.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 557         |
|    gen/rollout/ep_rew_mean         | 0.0373      |
|    gen/rollout/ep_rew_wrapped_mean | 317         |
|    gen/time/fps                    | 1887        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3162112     |
|    gen/train/approx_kl             | 0.036685593 |
|    gen/train/clip_fraction         | 0.25        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.674      |
|    gen/train/explained_variance    | 0.792       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.6         |
|    gen/train/n_updates             | 1920        |
|    gen/train/policy_gradient_loss  | -0.0241     |
|    gen/train/value_loss            | 6.33        |
----------------------------------------------------

round:  43%|████▎     | 132/305 [41:36<54:12, 18.80s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 555        |
|    gen/rollout/ep_rew_mean         | 0.0409     |
|    gen/rollout/ep_rew_wrapped_mean | 322        |
|    gen/time/fps                    | 1778       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3178496    |
|    gen/train/approx_kl             | 0.03387127 |
|    gen/train/clip_fraction         | 0.227      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.67      |
|    gen/train/explained_variance    | 0.771      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 3.96       |
|    gen/train/n_updates             | 1930       |
|    gen/train/policy_gradient_loss  | -0.0121    |
|    gen/train/value_loss            | 9.06       |
---------------------------------------------------

round:  44%|████▎     | 133/305 [41:55<54:30, 19.01s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 557        |
|    gen/rollout/ep_rew_mean         | 0.0379     |
|    gen/rollout/ep_rew_wrapped_mean | 335        |
|    gen/time/fps                    | 1783       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3194880    |
|    gen/train/approx_kl             | 0.03029744 |
|    gen/train/clip_fraction         | 0.192      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.645     |
|    gen/train/explained_variance    | 0.892      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.42       |
|    gen/train/n_updates             | 1940       |
|    gen/train/policy_gradient_loss  | -0.0155    |
|    gen/train/value_loss            | 4.42       |
---------------------------------------------------

round:  44%|████▍     | 134/305 [42:15<54:43, 19.20s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 552         |
|    gen/rollout/ep_rew_mean         | 0.0468      |
|    gen/rollout/ep_rew_wrapped_mean | 345         |
|    gen/time/fps                    | 1734        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3211264     |
|    gen/train/approx_kl             | 0.033373587 |
|    gen/train/clip_fraction         | 0.239       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.711      |
|    gen/train/explained_variance    | 0.862       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.05        |
|    gen/train/n_updates             | 1950        |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 6.15        |
----------------------------------------------------

round:  44%|████▍     | 135/305 [42:34<54:18, 19.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 553         |
|    gen/rollout/ep_rew_mean         | 0.0463      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1774        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3227648     |
|    gen/train/approx_kl             | 0.029655375 |
|    gen/train/clip_fraction         | 0.208       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.729      |
|    gen/train/explained_variance    | 0.702       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.14        |
|    gen/train/n_updates             | 1960        |
|    gen/train/policy_gradient_loss  | -0.0116     |
|    gen/train/value_loss            | 11          |
----------------------------------------------------

round:  45%|████▍     | 136/305 [42:53<53:45, 19.08s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 552        |
|    gen/rollout/ep_rew_mean         | 0.0469     |
|    gen/rollout/ep_rew_wrapped_mean | 352        |
|    gen/time/fps                    | 1727       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3244032    |
|    gen/train/approx_kl             | 0.03696865 |
|    gen/train/clip_fraction         | 0.244      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.683     |
|    gen/train/explained_variance    | 0.801      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.41       |
|    gen/train/n_updates             | 1970       |
|    gen/train/policy_gradient_loss  | -0.0203    |
|    gen/train/value_loss            | 5.75       |
---------------------------------------------------

round:  45%|████▍     | 137/305 [43:13<53:59, 19.28s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0428      |
|    gen/rollout/ep_rew_wrapped_mean | 346         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3260416     |
|    gen/train/approx_kl             | 0.027433023 |
|    gen/train/clip_fraction         | 0.206       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.689      |
|    gen/train/explained_variance    | 0.802       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.63        |
|    gen/train/n_updates             | 1980        |
|    gen/train/policy_gradient_loss  | -0.0147     |
|    gen/train/value_loss            | 4.46        |
----------------------------------------------------

round:  45%|████▌     | 138/305 [43:31<52:52, 19.00s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.0333      |
|    gen/rollout/ep_rew_wrapped_mean | 341         |
|    gen/time/fps                    | 1864        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3276800     |
|    gen/train/approx_kl             | 0.034675565 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.653      |
|    gen/train/explained_variance    | 0.746       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.982       |
|    gen/train/n_updates             | 1990        |
|    gen/train/policy_gradient_loss  | -0.0223     |
|    gen/train/value_loss            | 6.29        |
----------------------------------------------------

round:  46%|████▌     | 139/305 [43:50<52:15, 18.89s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 562        |
|    gen/rollout/ep_rew_mean         | 0.0273     |
|    gen/rollout/ep_rew_wrapped_mean | 342        |
|    gen/time/fps                    | 1822       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3293184    |
|    gen/train/approx_kl             | 0.02911359 |
|    gen/train/clip_fraction         | 0.219      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.675     |
|    gen/train/explained_variance    | 0.619      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.05       |
|    gen/train/n_updates             | 2000       |
|    gen/train/policy_gradient_loss  | -0.0232    |
|    gen/train/value_loss            | 4.53       |
---------------------------------------------------

round:  46%|████▌     | 140/305 [44:08<51:37, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0234      |
|    gen/rollout/ep_rew_wrapped_mean | 343         |
|    gen/time/fps                    | 1864        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3309568     |
|    gen/train/approx_kl             | 0.030983567 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.624      |
|    gen/train/explained_variance    | 0.826       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.05        |
|    gen/train/n_updates             | 2010        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 4.23        |
----------------------------------------------------

round:  46%|████▌     | 141/305 [44:27<51:07, 18.71s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0198      |
|    gen/rollout/ep_rew_wrapped_mean | 339         |
|    gen/time/fps                    | 1890        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3325952     |
|    gen/train/approx_kl             | 0.025978776 |
|    gen/train/clip_fraction         | 0.19        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.659      |
|    gen/train/explained_variance    | 0.819       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.9         |
|    gen/train/n_updates             | 2020        |
|    gen/train/policy_gradient_loss  | -0.0158     |
|    gen/train/value_loss            | 4.21        |
----------------------------------------------------

round:  47%|████▋     | 142/305 [44:45<50:27, 18.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.00916     |
|    gen/rollout/ep_rew_wrapped_mean | 372         |
|    gen/time/fps                    | 1909        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3342336     |
|    gen/train/approx_kl             | 0.030891515 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.599      |
|    gen/train/explained_variance    | 0.859       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.629       |
|    gen/train/n_updates             | 2030        |
|    gen/train/policy_gradient_loss  | -0.0228     |
|    gen/train/value_loss            | 5.36        |
----------------------------------------------------

round:  47%|████▋     | 143/305 [45:03<49:47, 18.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.00952     |
|    gen/rollout/ep_rew_wrapped_mean | 391         |
|    gen/time/fps                    | 1875        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3358720     |
|    gen/train/approx_kl             | 0.042065646 |
|    gen/train/clip_fraction         | 0.217       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.666      |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.56        |
|    gen/train/n_updates             | 2040        |
|    gen/train/policy_gradient_loss  | -0.0229     |
|    gen/train/value_loss            | 4.17        |
----------------------------------------------------

round:  47%|████▋     | 144/305 [45:22<49:27, 18.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00719     |
|    gen/rollout/ep_rew_wrapped_mean | 405         |
|    gen/time/fps                    | 1824        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3375104     |
|    gen/train/approx_kl             | 0.027620992 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.622      |
|    gen/train/explained_variance    | 0.935       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.74        |
|    gen/train/n_updates             | 2050        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 4.32        |
----------------------------------------------------

round:  48%|████▊     | 145/305 [45:40<49:13, 18.46s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 573         |
|    gen/rollout/ep_rew_mean         | 0.00719     |
|    gen/rollout/ep_rew_wrapped_mean | 409         |
|    gen/time/fps                    | 1856        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3391488     |
|    gen/train/approx_kl             | 0.028304635 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.719      |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.78        |
|    gen/train/n_updates             | 2060        |
|    gen/train/policy_gradient_loss  | -0.0187     |
|    gen/train/value_loss            | 5.54        |
----------------------------------------------------

round:  48%|████▊     | 146/305 [45:59<49:05, 18.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00861     |
|    gen/rollout/ep_rew_wrapped_mean | 408         |
|    gen/time/fps                    | 1828        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3407872     |
|    gen/train/approx_kl             | 0.030422756 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.642      |
|    gen/train/explained_variance    | 0.941       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.989       |
|    gen/train/n_updates             | 2070        |
|    gen/train/policy_gradient_loss  | -0.0173     |
|    gen/train/value_loss            | 3.74        |
----------------------------------------------------

round:  48%|████▊     | 147/305 [46:17<48:42, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 574         |
|    gen/rollout/ep_rew_mean         | 0.00502     |
|    gen/rollout/ep_rew_wrapped_mean | 401         |
|    gen/time/fps                    | 1848        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3424256     |
|    gen/train/approx_kl             | 0.025612025 |
|    gen/train/clip_fraction         | 0.184       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.652      |
|    gen/train/explained_variance    | 0.923       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.683       |
|    gen/train/n_updates             | 2080        |
|    gen/train/policy_gradient_loss  | -0.0156     |
|    gen/train/value_loss            | 6.88        |
----------------------------------------------------

round:  49%|████▊     | 148/305 [46:36<48:32, 18.55s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 574         |
|    gen/rollout/ep_rew_mean         | 0.00486     |
|    gen/rollout/ep_rew_wrapped_mean | 406         |
|    gen/time/fps                    | 1888        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3440640     |
|    gen/train/approx_kl             | 0.028402459 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.59       |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.74        |
|    gen/train/n_updates             | 2090        |
|    gen/train/policy_gradient_loss  | -0.0203     |
|    gen/train/value_loss            | 4.23        |
----------------------------------------------------

round:  49%|████▉     | 149/305 [46:54<47:55, 18.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 574         |
|    gen/rollout/ep_rew_mean         | 0.00486     |
|    gen/rollout/ep_rew_wrapped_mean | 412         |
|    gen/time/fps                    | 1868        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3457024     |
|    gen/train/approx_kl             | 0.041495457 |
|    gen/train/clip_fraction         | 0.223       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.642      |
|    gen/train/explained_variance    | 0.875       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.02        |
|    gen/train/n_updates             | 2100        |
|    gen/train/policy_gradient_loss  | -0.0198     |
|    gen/train/value_loss            | 7.26        |
----------------------------------------------------

round:  49%|████▉     | 150/305 [47:12<47:25, 18.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0178      |
|    gen/rollout/ep_rew_wrapped_mean | 403         |
|    gen/time/fps                    | 1831        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3473408     |
|    gen/train/approx_kl             | 0.041177332 |
|    gen/train/clip_fraction         | 0.282       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.602      |
|    gen/train/explained_variance    | 0.921       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.514       |
|    gen/train/n_updates             | 2110        |
|    gen/train/policy_gradient_loss  | -0.0275     |
|    gen/train/value_loss            | 3.8         |
----------------------------------------------------

round:  50%|████▉     | 151/305 [47:31<47:06, 18.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0278      |
|    gen/rollout/ep_rew_wrapped_mean | 403         |
|    gen/time/fps                    | 1806        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3489792     |
|    gen/train/approx_kl             | 0.028915878 |
|    gen/train/clip_fraction         | 0.2         |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.671      |
|    gen/train/explained_variance    | 0.762       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 10.7        |
|    gen/train/n_updates             | 2120        |
|    gen/train/policy_gradient_loss  | -0.0186     |
|    gen/train/value_loss            | 18.2        |
----------------------------------------------------

round:  50%|████▉     | 152/305 [47:49<47:06, 18.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0265      |
|    gen/rollout/ep_rew_wrapped_mean | 417         |
|    gen/time/fps                    | 1790        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3506176     |
|    gen/train/approx_kl             | 0.031540282 |
|    gen/train/clip_fraction         | 0.232       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.681      |
|    gen/train/explained_variance    | 0.765       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.94        |
|    gen/train/n_updates             | 2130        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 14.4        |
----------------------------------------------------

round:  50%|█████     | 153/305 [48:08<47:14, 18.65s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0148      |
|    gen/rollout/ep_rew_wrapped_mean | 429         |
|    gen/time/fps                    | 1875        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3522560     |
|    gen/train/approx_kl             | 0.033375014 |
|    gen/train/clip_fraction         | 0.238       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.783      |
|    gen/train/explained_variance    | 0.746       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.77        |
|    gen/train/n_updates             | 2140        |
|    gen/train/policy_gradient_loss  | -0.0192     |
|    gen/train/value_loss            | 12.3        |
----------------------------------------------------

round:  50%|█████     | 154/305 [48:27<46:36, 18.52s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00655     |
|    gen/rollout/ep_rew_wrapped_mean | 444         |
|    gen/time/fps                    | 1829        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3538944     |
|    gen/train/approx_kl             | 0.026542243 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.773      |
|    gen/train/explained_variance    | 0.666       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.62        |
|    gen/train/n_updates             | 2150        |
|    gen/train/policy_gradient_loss  | -0.0166     |
|    gen/train/value_loss            | 16.8        |
----------------------------------------------------

round:  51%|█████     | 155/305 [48:45<46:16, 18.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 550         |
|    gen/rollout/ep_rew_mean         | 0.0492      |
|    gen/rollout/ep_rew_wrapped_mean | 452         |
|    gen/time/fps                    | 1838        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3555328     |
|    gen/train/approx_kl             | 0.038286306 |
|    gen/train/clip_fraction         | 0.257       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.832      |
|    gen/train/explained_variance    | 0.854       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 12.3        |
|    gen/train/n_updates             | 2160        |
|    gen/train/policy_gradient_loss  | -0.0184     |
|    gen/train/value_loss            | 10.5        |
----------------------------------------------------

round:  51%|█████     | 156/305 [49:04<45:53, 18.48s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 543         |
|    gen/rollout/ep_rew_mean         | 0.0653      |
|    gen/rollout/ep_rew_wrapped_mean | 440         |
|    gen/time/fps                    | 1844        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3571712     |
|    gen/train/approx_kl             | 0.022856116 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.776       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 18.7        |
|    gen/train/n_updates             | 2170        |
|    gen/train/policy_gradient_loss  | -0.0134     |
|    gen/train/value_loss            | 30.2        |
----------------------------------------------------

round:  51%|█████▏    | 157/305 [49:22<45:39, 18.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 543         |
|    gen/rollout/ep_rew_mean         | 0.0672      |
|    gen/rollout/ep_rew_wrapped_mean | 415         |
|    gen/time/fps                    | 1826        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3588096     |
|    gen/train/approx_kl             | 0.027216326 |
|    gen/train/clip_fraction         | 0.231       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.816      |
|    gen/train/explained_variance    | 0.844       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.45        |
|    gen/train/n_updates             | 2180        |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 14.3        |
----------------------------------------------------

round:  52%|█████▏    | 158/305 [49:41<45:28, 18.56s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.055       |
|    gen/rollout/ep_rew_wrapped_mean | 430         |
|    gen/time/fps                    | 1784        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3604480     |
|    gen/train/approx_kl             | 0.027697254 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.815       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.44        |
|    gen/train/n_updates             | 2190        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 11          |
----------------------------------------------------

round:  52%|█████▏    | 159/305 [50:00<45:50, 18.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0366      |
|    gen/rollout/ep_rew_wrapped_mean | 400         |
|    gen/time/fps                    | 1763        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3620864     |
|    gen/train/approx_kl             | 0.035645466 |
|    gen/train/clip_fraction         | 0.306       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.813      |
|    gen/train/explained_variance    | 0.75        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.74        |
|    gen/train/n_updates             | 2200        |
|    gen/train/policy_gradient_loss  | -0.0427     |
|    gen/train/value_loss            | 16.4        |
----------------------------------------------------

round:  52%|█████▏    | 160/305 [50:19<45:30, 18.83s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 562        |
|    gen/rollout/ep_rew_mean         | 0.0295     |
|    gen/rollout/ep_rew_wrapped_mean | 450        |
|    gen/time/fps                    | 1816       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3637248    |
|    gen/train/approx_kl             | 0.03039296 |
|    gen/train/clip_fraction         | 0.221      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.754     |
|    gen/train/explained_variance    | 0.905      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 16.4       |
|    gen/train/n_updates             | 2210       |
|    gen/train/policy_gradient_loss  | -0.0188    |
|    gen/train/value_loss            | 22.6       |
---------------------------------------------------

round:  53%|█████▎    | 161/305 [50:38<44:56, 18.73s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0207      |
|    gen/rollout/ep_rew_wrapped_mean | 435         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3653632     |
|    gen/train/approx_kl             | 0.027660131 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.867      |
|    gen/train/explained_variance    | 0.884       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.08        |
|    gen/train/n_updates             | 2220        |
|    gen/train/policy_gradient_loss  | -0.012      |
|    gen/train/value_loss            | 13.6        |
----------------------------------------------------

round:  53%|█████▎    | 162/305 [50:57<45:09, 18.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0427      |
|    gen/rollout/ep_rew_wrapped_mean | 410         |
|    gen/time/fps                    | 1744        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3670016     |
|    gen/train/approx_kl             | 0.036744297 |
|    gen/train/clip_fraction         | 0.292       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.811      |
|    gen/train/explained_variance    | 0.85        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.68        |
|    gen/train/n_updates             | 2230        |
|    gen/train/policy_gradient_loss  | -0.04       |
|    gen/train/value_loss            | 5.1         |
----------------------------------------------------

round:  53%|█████▎    | 163/305 [51:16<44:48, 18.93s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0553      |
|    gen/rollout/ep_rew_wrapped_mean | 362         |
|    gen/time/fps                    | 1760        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3686400     |
|    gen/train/approx_kl             | 0.030366581 |
|    gen/train/clip_fraction         | 0.254       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.837      |
|    gen/train/explained_variance    | 0.734       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.979       |
|    gen/train/n_updates             | 2240        |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 12.6        |
----------------------------------------------------

round:  54%|█████▍    | 164/305 [51:35<44:34, 18.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 547         |
|    gen/rollout/ep_rew_mean         | 0.0585      |
|    gen/rollout/ep_rew_wrapped_mean | 326         |
|    gen/time/fps                    | 1802        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3702784     |
|    gen/train/approx_kl             | 0.026134785 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.81       |
|    gen/train/explained_variance    | 0.71        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.4         |
|    gen/train/n_updates             | 2250        |
|    gen/train/policy_gradient_loss  | -0.012      |
|    gen/train/value_loss            | 11.9        |
----------------------------------------------------

round:  54%|█████▍    | 165/305 [51:54<44:16, 18.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0454      |
|    gen/rollout/ep_rew_wrapped_mean | 325         |
|    gen/time/fps                    | 1770        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3719168     |
|    gen/train/approx_kl             | 0.027669545 |
|    gen/train/clip_fraction         | 0.254       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.813      |
|    gen/train/explained_variance    | 0.839       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.59        |
|    gen/train/n_updates             | 2260        |
|    gen/train/policy_gradient_loss  | -0.0238     |
|    gen/train/value_loss            | 7.61        |
----------------------------------------------------

round:  54%|█████▍    | 166/305 [52:13<43:49, 18.92s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0252      |
|    gen/rollout/ep_rew_wrapped_mean | 350         |
|    gen/time/fps                    | 1754        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3735552     |
|    gen/train/approx_kl             | 0.029431816 |
|    gen/train/clip_fraction         | 0.233       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.784      |
|    gen/train/explained_variance    | 0.72        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.47        |
|    gen/train/n_updates             | 2270        |
|    gen/train/policy_gradient_loss  | -0.0161     |
|    gen/train/value_loss            | 9.67        |
----------------------------------------------------

round:  55%|█████▍    | 167/305 [52:32<43:51, 19.07s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 568        |
|    gen/rollout/ep_rew_mean         | 0.0204     |
|    gen/rollout/ep_rew_wrapped_mean | 376        |
|    gen/time/fps                    | 1809       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3751936    |
|    gen/train/approx_kl             | 0.03342977 |
|    gen/train/clip_fraction         | 0.233      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.73      |
|    gen/train/explained_variance    | 0.84       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.39       |
|    gen/train/n_updates             | 2280       |
|    gen/train/policy_gradient_loss  | -0.0161    |
|    gen/train/value_loss            | 6.22       |
---------------------------------------------------

round:  55%|█████▌    | 168/305 [52:51<43:18, 18.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0265      |
|    gen/rollout/ep_rew_wrapped_mean | 390         |
|    gen/time/fps                    | 1818        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3768320     |
|    gen/train/approx_kl             | 0.030735742 |
|    gen/train/clip_fraction         | 0.24        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.763      |
|    gen/train/explained_variance    | 0.875       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.04        |
|    gen/train/n_updates             | 2290        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 8.62        |
----------------------------------------------------

round:  55%|█████▌    | 169/305 [53:09<42:40, 18.83s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0384      |
|    gen/rollout/ep_rew_wrapped_mean | 425         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3784704     |
|    gen/train/approx_kl             | 0.028224112 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.72       |
|    gen/train/explained_variance    | 0.618       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 22.2        |
|    gen/train/n_updates             | 2300        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 20.7        |
----------------------------------------------------

round:  56%|█████▌    | 170/305 [53:28<42:16, 18.79s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0374      |
|    gen/rollout/ep_rew_wrapped_mean | 467         |
|    gen/time/fps                    | 1771        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3801088     |
|    gen/train/approx_kl             | 0.028654769 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.709      |
|    gen/train/explained_variance    | 0.847       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.29        |
|    gen/train/n_updates             | 2310        |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 30.8        |
----------------------------------------------------

round:  56%|█████▌    | 171/305 [53:47<42:07, 18.86s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 560        |
|    gen/rollout/ep_rew_mean         | 0.0334     |
|    gen/rollout/ep_rew_wrapped_mean | 483        |
|    gen/time/fps                    | 1843       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3817472    |
|    gen/train/approx_kl             | 0.03263421 |
|    gen/train/clip_fraction         | 0.207      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.763     |
|    gen/train/explained_variance    | 0.809      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 11.1       |
|    gen/train/n_updates             | 2320       |
|    gen/train/policy_gradient_loss  | -0.016     |
|    gen/train/value_loss            | 21.7       |
---------------------------------------------------

round:  56%|█████▋    | 172/305 [54:06<41:42, 18.81s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0225      |
|    gen/rollout/ep_rew_wrapped_mean | 522         |
|    gen/time/fps                    | 1812        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3833856     |
|    gen/train/approx_kl             | 0.030362628 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.779      |
|    gen/train/explained_variance    | 0.819       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.75        |
|    gen/train/n_updates             | 2330        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 18.9        |
----------------------------------------------------

round:  57%|█████▋    | 173/305 [54:25<41:19, 18.78s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 575        |
|    gen/rollout/ep_rew_mean         | 0.00292    |
|    gen/rollout/ep_rew_wrapped_mean | 527        |
|    gen/time/fps                    | 1807       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3850240    |
|    gen/train/approx_kl             | 0.02527996 |
|    gen/train/clip_fraction         | 0.217      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.827     |
|    gen/train/explained_variance    | 0.84       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.39       |
|    gen/train/n_updates             | 2340       |
|    gen/train/policy_gradient_loss  | -0.0123    |
|    gen/train/value_loss            | 21.5       |
---------------------------------------------------

round:  57%|█████▋    | 174/305 [54:43<40:51, 18.72s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0113      |
|    gen/rollout/ep_rew_wrapped_mean | 537         |
|    gen/time/fps                    | 1883        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3866624     |
|    gen/train/approx_kl             | 0.030697532 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.758      |
|    gen/train/explained_variance    | 0.881       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 47.4        |
|    gen/train/n_updates             | 2350        |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 38.7        |
----------------------------------------------------

round:  57%|█████▋    | 175/305 [55:01<40:13, 18.57s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0177      |
|    gen/rollout/ep_rew_wrapped_mean | 498         |
|    gen/time/fps                    | 1786        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3883008     |
|    gen/train/approx_kl             | 0.032339808 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.822      |
|    gen/train/explained_variance    | 0.906       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11          |
|    gen/train/n_updates             | 2360        |
|    gen/train/policy_gradient_loss  | -0.0187     |
|    gen/train/value_loss            | 31.1        |
----------------------------------------------------

round:  58%|█████▊    | 176/305 [55:20<40:03, 18.63s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 552         |
|    gen/rollout/ep_rew_mean         | 0.0508      |
|    gen/rollout/ep_rew_wrapped_mean | 442         |
|    gen/time/fps                    | 1815        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3899392     |
|    gen/train/approx_kl             | 0.024750656 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.796      |
|    gen/train/explained_variance    | 0.889       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 20.5        |
|    gen/train/n_updates             | 2370        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 33.7        |
----------------------------------------------------

round:  58%|█████▊    | 177/305 [55:39<39:37, 18.57s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 553        |
|    gen/rollout/ep_rew_mean         | 0.0477     |
|    gen/rollout/ep_rew_wrapped_mean | 440        |
|    gen/time/fps                    | 1818       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 3915776    |
|    gen/train/approx_kl             | 0.01699498 |
|    gen/train/clip_fraction         | 0.137      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.701     |
|    gen/train/explained_variance    | 0.825      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 9.77       |
|    gen/train/n_updates             | 2380       |
|    gen/train/policy_gradient_loss  | -0.0109    |
|    gen/train/value_loss            | 70.1       |
---------------------------------------------------

round:  58%|█████▊    | 178/305 [55:57<39:19, 18.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 548         |
|    gen/rollout/ep_rew_mean         | 0.0554      |
|    gen/rollout/ep_rew_wrapped_mean | 469         |
|    gen/time/fps                    | 1840        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3932160     |
|    gen/train/approx_kl             | 0.024783324 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.881       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.19        |
|    gen/train/n_updates             | 2390        |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 33.9        |
----------------------------------------------------

round:  59%|█████▊    | 179/305 [56:16<39:04, 18.61s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 552        |
|    gen/rollout/ep_rew_mean         | 0.0458     |
|    gen/rollout/ep_rew_wrapped_mean | 520        |
|    gen/time/fps                    | 1838       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 3948544    |
|    gen/train/approx_kl             | 0.02355277 |
|    gen/train/clip_fraction         | 0.189      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.739     |
|    gen/train/explained_variance    | 0.934      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 7.59       |
|    gen/train/n_updates             | 2400       |
|    gen/train/policy_gradient_loss  | -0.013     |
|    gen/train/value_loss            | 57.8       |
---------------------------------------------------

round:  59%|█████▉    | 180/305 [56:34<38:32, 18.50s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0418      |
|    gen/rollout/ep_rew_wrapped_mean | 521         |
|    gen/time/fps                    | 1837        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3964928     |
|    gen/train/approx_kl             | 0.025486588 |
|    gen/train/clip_fraction         | 0.209       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.816      |
|    gen/train/explained_variance    | 0.946       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 118         |
|    gen/train/n_updates             | 2410        |
|    gen/train/policy_gradient_loss  | -0.0146     |
|    gen/train/value_loss            | 48          |
----------------------------------------------------

round:  59%|█████▉    | 181/305 [56:53<38:28, 18.61s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 547         |
|    gen/rollout/ep_rew_mean         | 0.0554      |
|    gen/rollout/ep_rew_wrapped_mean | 592         |
|    gen/time/fps                    | 1858        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 3981312     |
|    gen/train/approx_kl             | 0.020859309 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.924       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 29.9        |
|    gen/train/n_updates             | 2420        |
|    gen/train/policy_gradient_loss  | -0.0123     |
|    gen/train/value_loss            | 119         |
----------------------------------------------------

round:  60%|█████▉    | 182/305 [57:12<38:06, 18.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 551         |
|    gen/rollout/ep_rew_mean         | 0.0504      |
|    gen/rollout/ep_rew_wrapped_mean | 630         |
|    gen/time/fps                    | 1802        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 3997696     |
|    gen/train/approx_kl             | 0.022449523 |
|    gen/train/clip_fraction         | 0.178       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.815      |
|    gen/train/explained_variance    | 0.908       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 58.4        |
|    gen/train/n_updates             | 2430        |
|    gen/train/policy_gradient_loss  | -0.0149     |
|    gen/train/value_loss            | 99.9        |
----------------------------------------------------

round:  60%|██████    | 183/305 [57:30<37:57, 18.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 553         |
|    gen/rollout/ep_rew_mean         | 0.0464      |
|    gen/rollout/ep_rew_wrapped_mean | 604         |
|    gen/time/fps                    | 1863        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4014080     |
|    gen/train/approx_kl             | 0.024073152 |
|    gen/train/clip_fraction         | 0.223       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.865      |
|    gen/train/explained_variance    | 0.923       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.12        |
|    gen/train/n_updates             | 2440        |
|    gen/train/policy_gradient_loss  | -0.0117     |
|    gen/train/value_loss            | 75.5        |
----------------------------------------------------

round:  60%|██████    | 184/305 [57:49<37:28, 18.58s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 555        |
|    gen/rollout/ep_rew_mean         | 0.0426     |
|    gen/rollout/ep_rew_wrapped_mean | 510        |
|    gen/time/fps                    | 1830       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 4030464    |
|    gen/train/approx_kl             | 0.01489798 |
|    gen/train/clip_fraction         | 0.132      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.753     |
|    gen/train/explained_variance    | 0.909      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 78.9       |
|    gen/train/n_updates             | 2450       |
|    gen/train/policy_gradient_loss  | -0.00829   |
|    gen/train/value_loss            | 50.1       |
---------------------------------------------------

round:  61%|██████    | 185/305 [58:08<37:19, 18.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0508      |
|    gen/rollout/ep_rew_wrapped_mean | 426         |
|    gen/time/fps                    | 1817        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4046848     |
|    gen/train/approx_kl             | 0.018024681 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.722      |
|    gen/train/explained_variance    | 0.909       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.2        |
|    gen/train/n_updates             | 2460        |
|    gen/train/policy_gradient_loss  | -0.00821    |
|    gen/train/value_loss            | 51.1        |
----------------------------------------------------

round:  61%|██████    | 186/305 [58:26<36:59, 18.65s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 543         |
|    gen/rollout/ep_rew_mean         | 0.0645      |
|    gen/rollout/ep_rew_wrapped_mean | 374         |
|    gen/time/fps                    | 1865        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4063232     |
|    gen/train/approx_kl             | 0.019648317 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.651      |
|    gen/train/explained_variance    | 0.91        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.01        |
|    gen/train/n_updates             | 2470        |
|    gen/train/policy_gradient_loss  | -0.0105     |
|    gen/train/value_loss            | 45.5        |
----------------------------------------------------

round:  61%|██████▏   | 187/305 [58:45<36:37, 18.62s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 540         |
|    gen/rollout/ep_rew_mean         | 0.0729      |
|    gen/rollout/ep_rew_wrapped_mean | 365         |
|    gen/time/fps                    | 1902        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4079616     |
|    gen/train/approx_kl             | 0.029883543 |
|    gen/train/clip_fraction         | 0.219       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.679      |
|    gen/train/explained_variance    | 0.945       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.75        |
|    gen/train/n_updates             | 2480        |
|    gen/train/policy_gradient_loss  | -0.0161     |
|    gen/train/value_loss            | 35.5        |
----------------------------------------------------

round:  62%|██████▏   | 188/305 [59:03<36:08, 18.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 541         |
|    gen/rollout/ep_rew_mean         | 0.0706      |
|    gen/rollout/ep_rew_wrapped_mean | 362         |
|    gen/time/fps                    | 1833        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4096000     |
|    gen/train/approx_kl             | 0.025657307 |
|    gen/train/clip_fraction         | 0.163       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.568      |
|    gen/train/explained_variance    | 0.947       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.21        |
|    gen/train/n_updates             | 2490        |
|    gen/train/policy_gradient_loss  | -0.0141     |
|    gen/train/value_loss            | 18.1        |
----------------------------------------------------

round:  62%|██████▏   | 189/305 [59:22<35:50, 18.53s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 547         |
|    gen/rollout/ep_rew_mean         | 0.057       |
|    gen/rollout/ep_rew_wrapped_mean | 351         |
|    gen/time/fps                    | 1823        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4112384     |
|    gen/train/approx_kl             | 0.028163686 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.709      |
|    gen/train/explained_variance    | 0.907       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.45        |
|    gen/train/n_updates             | 2500        |
|    gen/train/policy_gradient_loss  | -0.0171     |
|    gen/train/value_loss            | 18.5        |
----------------------------------------------------

round:  62%|██████▏   | 190/305 [59:40<35:39, 18.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 550         |
|    gen/rollout/ep_rew_mean         | 0.0528      |
|    gen/rollout/ep_rew_wrapped_mean | 356         |
|    gen/time/fps                    | 1835        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4128768     |
|    gen/train/approx_kl             | 0.046320714 |
|    gen/train/clip_fraction         | 0.211       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.666      |
|    gen/train/explained_variance    | 0.897       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.13        |
|    gen/train/n_updates             | 2510        |
|    gen/train/policy_gradient_loss  | -0.0202     |
|    gen/train/value_loss            | 18.8        |
----------------------------------------------------

round:  63%|██████▎   | 191/305 [59:59<35:19, 18.59s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 545         |
|    gen/rollout/ep_rew_mean         | 0.0619      |
|    gen/rollout/ep_rew_wrapped_mean | 366         |
|    gen/time/fps                    | 1800        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4145152     |
|    gen/train/approx_kl             | 0.027387517 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.719      |
|    gen/train/explained_variance    | 0.888       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6           |
|    gen/train/n_updates             | 2520        |
|    gen/train/policy_gradient_loss  | -0.0172     |
|    gen/train/value_loss            | 26.8        |
----------------------------------------------------

round:  63%|██████▎   | 192/305 [1:00:18<35:02, 18.60s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 552         |
|    gen/rollout/ep_rew_mean         | 0.0491      |
|    gen/rollout/ep_rew_wrapped_mean | 377         |
|    gen/time/fps                    | 1830        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4161536     |
|    gen/train/approx_kl             | 0.029162968 |
|    gen/train/clip_fraction         | 0.252       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.802      |
|    gen/train/explained_variance    | 0.865       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.02        |
|    gen/train/n_updates             | 2530        |
|    gen/train/policy_gradient_loss  | -0.0188     |
|    gen/train/value_loss            | 11.9        |
----------------------------------------------------

round:  63%|██████▎   | 193/305 [1:00:36<34:46, 18.63s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0459      |
|    gen/rollout/ep_rew_wrapped_mean | 369         |
|    gen/time/fps                    | 1878        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4177920     |
|    gen/train/approx_kl             | 0.023683984 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.726      |
|    gen/train/explained_variance    | 0.878       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 12.9        |
|    gen/train/n_updates             | 2540        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 13          |
----------------------------------------------------

round:  64%|██████▎   | 194/305 [1:00:54<34:12, 18.49s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0435      |
|    gen/rollout/ep_rew_wrapped_mean | 375         |
|    gen/time/fps                    | 1789        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4194304     |
|    gen/train/approx_kl             | 0.027833203 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.768      |
|    gen/train/explained_variance    | 0.845       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.8        |
|    gen/train/n_updates             | 2550        |
|    gen/train/policy_gradient_loss  | -0.0139     |
|    gen/train/value_loss            | 17.8        |
----------------------------------------------------

round:  64%|██████▍   | 195/305 [1:01:14<34:21, 18.74s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 560         |
|    gen/rollout/ep_rew_mean         | 0.0303      |
|    gen/rollout/ep_rew_wrapped_mean | 365         |
|    gen/time/fps                    | 1839        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4210688     |
|    gen/train/approx_kl             | 0.024190735 |
|    gen/train/clip_fraction         | 0.193       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.739      |
|    gen/train/explained_variance    | 0.848       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.01        |
|    gen/train/n_updates             | 2560        |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 14.7        |
----------------------------------------------------

round:  64%|██████▍   | 196/305 [1:01:32<33:52, 18.64s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 559        |
|    gen/rollout/ep_rew_mean         | 0.0341     |
|    gen/rollout/ep_rew_wrapped_mean | 387        |
|    gen/time/fps                    | 1809       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 4227072    |
|    gen/train/approx_kl             | 0.02350295 |
|    gen/train/clip_fraction         | 0.2        |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.698     |
|    gen/train/explained_variance    | 0.935      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.63       |
|    gen/train/n_updates             | 2570       |
|    gen/train/policy_gradient_loss  | -0.0144    |
|    gen/train/value_loss            | 9.25       |
---------------------------------------------------

round:  65%|██████▍   | 197/305 [1:01:51<33:40, 18.71s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 559        |
|    gen/rollout/ep_rew_mean         | 0.0341     |
|    gen/rollout/ep_rew_wrapped_mean | 362        |
|    gen/time/fps                    | 1905       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 4243456    |
|    gen/train/approx_kl             | 0.03466779 |
|    gen/train/clip_fraction         | 0.243      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.754     |
|    gen/train/explained_variance    | 0.821      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.11       |
|    gen/train/n_updates             | 2580       |
|    gen/train/policy_gradient_loss  | -0.0298    |
|    gen/train/value_loss            | 11.7       |
---------------------------------------------------

round:  65%|██████▍   | 198/305 [1:02:09<32:50, 18.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0401      |
|    gen/rollout/ep_rew_wrapped_mean | 347         |
|    gen/time/fps                    | 1989        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4259840     |
|    gen/train/approx_kl             | 0.056743544 |
|    gen/train/clip_fraction         | 0.265       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.784      |
|    gen/train/explained_variance    | 0.856       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.65        |
|    gen/train/n_updates             | 2590        |
|    gen/train/policy_gradient_loss  | -0.0312     |
|    gen/train/value_loss            | 6.93        |
----------------------------------------------------

round:  65%|██████▌   | 199/305 [1:02:26<31:59, 18.10s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0338      |
|    gen/rollout/ep_rew_wrapped_mean | 332         |
|    gen/time/fps                    | 1990        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4276224     |
|    gen/train/approx_kl             | 0.031441312 |
|    gen/train/clip_fraction         | 0.23        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.777      |
|    gen/train/explained_variance    | 0.87        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.34        |
|    gen/train/n_updates             | 2600        |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 10.2        |
----------------------------------------------------

round:  66%|██████▌   | 200/305 [1:02:43<31:08, 17.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 562         |
|    gen/rollout/ep_rew_mean         | 0.0266      |
|    gen/rollout/ep_rew_wrapped_mean | 361         |
|    gen/time/fps                    | 1992        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4292608     |
|    gen/train/approx_kl             | 0.033708595 |
|    gen/train/clip_fraction         | 0.229       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.72       |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.53        |
|    gen/train/n_updates             | 2610        |
|    gen/train/policy_gradient_loss  | -0.0175     |
|    gen/train/value_loss            | 12.7        |
----------------------------------------------------

round:  66%|██████▌   | 201/305 [1:03:01<30:34, 17.64s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.0134     |
|    gen/rollout/ep_rew_wrapped_mean | 403        |
|    gen/time/fps                    | 1981       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 4308992    |
|    gen/train/approx_kl             | 0.02760127 |
|    gen/train/clip_fraction         | 0.223      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.75      |
|    gen/train/explained_variance    | 0.921      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.72       |
|    gen/train/n_updates             | 2620       |
|    gen/train/policy_gradient_loss  | -0.0134    |
|    gen/train/value_loss            | 5.66       |
------------

round:  66%|██████▌   | 202/305 [1:03:18<30:04, 17.52s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0186      |
|    gen/rollout/ep_rew_wrapped_mean | 439         |
|    gen/time/fps                    | 1981        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4325376     |
|    gen/train/approx_kl             | 0.026271667 |
|    gen/train/clip_fraction         | 0.223       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.771      |
|    gen/train/explained_variance    | 0.931       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.83        |
|    gen/train/n_updates             | 2630        |
|    gen/train/policy_gradient_loss  | -0.0143     |
|    gen/train/value_loss            | 7.93        |
----------------------------------------------------

round:  67%|██████▋   | 203/305 [1:03:35<29:37, 17.43s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 564       |
|    gen/rollout/ep_rew_mean         | 0.0265    |
|    gen/rollout/ep_rew_wrapped_mean | 433       |
|    gen/time/fps                    | 2028      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 8         |
|    gen/time/total_timesteps        | 4341760   |
|    gen/train/approx_kl             | 0.0280587 |
|    gen/train/clip_fraction         | 0.223     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.765    |
|    gen/train/explained_variance    | 0.937     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 38.5      |
|    gen/train/n_updates             | 2640      |
|    gen/train/policy_gradient_loss  | -0.0152   |
|    gen/train/value_loss            | 11.3      |
--------------------------------------------------

round:  67%|██████▋   | 204/305 [1:03:52<29:18, 17.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0242      |
|    gen/rollout/ep_rew_wrapped_mean | 413         |
|    gen/time/fps                    | 1979        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4358144     |
|    gen/train/approx_kl             | 0.029869784 |
|    gen/train/clip_fraction         | 0.232       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.784      |
|    gen/train/explained_variance    | 0.937       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.67        |
|    gen/train/n_updates             | 2650        |
|    gen/train/policy_gradient_loss  | -0.0173     |
|    gen/train/value_loss            | 7.22        |
----------------------------------------------------

round:  67%|██████▋   | 205/305 [1:04:10<29:01, 17.42s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0271      |
|    gen/rollout/ep_rew_wrapped_mean | 402         |
|    gen/time/fps                    | 1963        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4374528     |
|    gen/train/approx_kl             | 0.029836865 |
|    gen/train/clip_fraction         | 0.233       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.771      |
|    gen/train/explained_variance    | 0.967       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.22        |
|    gen/train/n_updates             | 2660        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 4.05        |
----------------------------------------------------

round:  68%|██████▊   | 206/305 [1:04:27<28:47, 17.45s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 563       |
|    gen/rollout/ep_rew_mean         | 0.0289    |
|    gen/rollout/ep_rew_wrapped_mean | 370       |
|    gen/time/fps                    | 1764      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 9         |
|    gen/time/total_timesteps        | 4390912   |
|    gen/train/approx_kl             | 0.0323715 |
|    gen/train/clip_fraction         | 0.232     |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.767    |
|    gen/train/explained_variance    | 0.948     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 1.84      |
|    gen/train/n_updates             | 2670      |
|    gen/train/policy_gradient_loss  | -0.0158   |
|    gen/train/value_loss            | 8         |
--------------------------------------------------

round:  68%|██████▊   | 207/305 [1:04:47<29:44, 18.21s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0277      |
|    gen/rollout/ep_rew_wrapped_mean | 334         |
|    gen/time/fps                    | 1831        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4407296     |
|    gen/train/approx_kl             | 0.036014833 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.745      |
|    gen/train/explained_variance    | 0.929       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.31        |
|    gen/train/n_updates             | 2680        |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 7.76        |
----------------------------------------------------

round:  68%|██████▊   | 208/305 [1:05:06<29:55, 18.51s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0254      |
|    gen/rollout/ep_rew_wrapped_mean | 330         |
|    gen/time/fps                    | 1794        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4423680     |
|    gen/train/approx_kl             | 0.034465726 |
|    gen/train/clip_fraction         | 0.212       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.748      |
|    gen/train/explained_variance    | 0.919       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.36        |
|    gen/train/n_updates             | 2690        |
|    gen/train/policy_gradient_loss  | -0.0127     |
|    gen/train/value_loss            | 11.8        |
----------------------------------------------------

round:  69%|██████▊   | 209/305 [1:05:26<30:02, 18.77s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0164      |
|    gen/rollout/ep_rew_wrapped_mean | 347         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4440064     |
|    gen/train/approx_kl             | 0.026949037 |
|    gen/train/clip_fraction         | 0.203       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.725      |
|    gen/train/explained_variance    | 0.885       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.734       |
|    gen/train/n_updates             | 2700        |
|    gen/train/policy_gradient_loss  | -0.0124     |
|    gen/train/value_loss            | 9.46        |
----------------------------------------------------

round:  69%|██████▉   | 210/305 [1:05:45<29:43, 18.77s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.016      |
|    gen/rollout/ep_rew_wrapped_mean | 395        |
|    gen/time/fps                    | 1842       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 4456448    |
|    gen/train/approx_kl             | 0.03921914 |
|    gen/train/clip_fraction         | 0.21       |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.711     |
|    gen/train/explained_variance    | 0.89       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 47.1       |
|    gen/train/n_updates             | 2710       |
|    gen/train/policy_gradient_loss  | -0.0152    |
|    gen/train/value_loss            | 20.8       |
---------------------------------------------------

round:  69%|██████▉   | 211/305 [1:06:04<29:36, 18.90s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0353      |
|    gen/rollout/ep_rew_wrapped_mean | 452         |
|    gen/time/fps                    | 1832        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4472832     |
|    gen/train/approx_kl             | 0.023716858 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.715      |
|    gen/train/explained_variance    | 0.903       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.72        |
|    gen/train/n_updates             | 2720        |
|    gen/train/policy_gradient_loss  | -0.0115     |
|    gen/train/value_loss            | 42          |
----------------------------------------------------

round:  70%|██████▉   | 212/305 [1:06:23<29:21, 18.94s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0342      |
|    gen/rollout/ep_rew_wrapped_mean | 506         |
|    gen/time/fps                    | 1666        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4489216     |
|    gen/train/approx_kl             | 0.021330139 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.674      |
|    gen/train/explained_variance    | 0.922       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.5         |
|    gen/train/n_updates             | 2730        |
|    gen/train/policy_gradient_loss  | -0.012      |
|    gen/train/value_loss            | 63          |
----------------------------------------------------

round:  70%|██████▉   | 213/305 [1:06:43<29:41, 19.37s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 553         |
|    gen/rollout/ep_rew_mean         | 0.0426      |
|    gen/rollout/ep_rew_wrapped_mean | 530         |
|    gen/time/fps                    | 1855        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 4505600     |
|    gen/train/approx_kl             | 0.024894524 |
|    gen/train/clip_fraction         | 0.215       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.721      |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 42.6        |
|    gen/train/n_updates             | 2740        |
|    gen/train/policy_gradient_loss  | -0.013      |
|    gen/train/value_loss            | 40.4        |
----------------------------------------------------

round:  70%|███████   | 214/305 [1:07:03<29:21, 19.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0233      |
|    gen/rollout/ep_rew_wrapped_mean | 548         |
|    gen/time/fps                    | 1698        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4521984     |
|    gen/train/approx_kl             | 0.019781116 |
|    gen/train/clip_fraction         | 0.16        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.663      |
|    gen/train/explained_variance    | 0.944       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.17        |
|    gen/train/n_updates             | 2750        |
|    gen/train/policy_gradient_loss  | -0.0101     |
|    gen/train/value_loss            | 40.4        |
----------------------------------------------------

round:  70%|███████   | 215/305 [1:07:23<29:20, 19.56s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 562         |
|    gen/rollout/ep_rew_mean         | 0.0267      |
|    gen/rollout/ep_rew_wrapped_mean | 517         |
|    gen/time/fps                    | 1671        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4538368     |
|    gen/train/approx_kl             | 0.026139768 |
|    gen/train/clip_fraction         | 0.207       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.721      |
|    gen/train/explained_variance    | 0.936       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.68        |
|    gen/train/n_updates             | 2760        |
|    gen/train/policy_gradient_loss  | -0.0149     |
|    gen/train/value_loss            | 37.4        |
----------------------------------------------------

round:  71%|███████   | 216/305 [1:07:43<29:18, 19.76s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.0312      |
|    gen/rollout/ep_rew_wrapped_mean | 479         |
|    gen/time/fps                    | 1743        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4554752     |
|    gen/train/approx_kl             | 0.018808331 |
|    gen/train/clip_fraction         | 0.145       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.668      |
|    gen/train/explained_variance    | 0.89        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 34          |
|    gen/train/n_updates             | 2770        |
|    gen/train/policy_gradient_loss  | -0.00936    |
|    gen/train/value_loss            | 46.8        |
----------------------------------------------------

round:  71%|███████   | 217/305 [1:08:02<28:50, 19.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0532      |
|    gen/rollout/ep_rew_wrapped_mean | 439         |
|    gen/time/fps                    | 1729        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4571136     |
|    gen/train/approx_kl             | 0.055450566 |
|    gen/train/clip_fraction         | 0.201       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.67       |
|    gen/train/explained_variance    | 0.898       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 13.2        |
|    gen/train/n_updates             | 2780        |
|    gen/train/policy_gradient_loss  | -0.0194     |
|    gen/train/value_loss            | 28.1        |
----------------------------------------------------

round:  71%|███████▏  | 218/305 [1:08:23<28:55, 19.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0528      |
|    gen/rollout/ep_rew_wrapped_mean | 387         |
|    gen/time/fps                    | 1705        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4587520     |
|    gen/train/approx_kl             | 0.037436243 |
|    gen/train/clip_fraction         | 0.166       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.608      |
|    gen/train/explained_variance    | 0.819       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.81        |
|    gen/train/n_updates             | 2790        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 32.9        |
----------------------------------------------------

round:  72%|███████▏  | 219/305 [1:08:44<28:58, 20.21s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 544         |
|    gen/rollout/ep_rew_mean         | 0.0624      |
|    gen/rollout/ep_rew_wrapped_mean | 393         |
|    gen/time/fps                    | 1744        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4603904     |
|    gen/train/approx_kl             | 0.032555398 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.628      |
|    gen/train/explained_variance    | 0.957       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 5.05        |
|    gen/train/n_updates             | 2800        |
|    gen/train/policy_gradient_loss  | -0.0225     |
|    gen/train/value_loss            | 14.2        |
----------------------------------------------------

round:  72%|███████▏  | 220/305 [1:09:04<28:27, 20.09s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 544         |
|    gen/rollout/ep_rew_mean         | 0.0603      |
|    gen/rollout/ep_rew_wrapped_mean | 433         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4620288     |
|    gen/train/approx_kl             | 0.038602255 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.702      |
|    gen/train/explained_variance    | 0.961       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.9         |
|    gen/train/n_updates             | 2810        |
|    gen/train/policy_gradient_loss  | -0.0201     |
|    gen/train/value_loss            | 26.9        |
----------------------------------------------------

round:  72%|███████▏  | 221/305 [1:09:23<27:51, 19.90s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.046       |
|    gen/rollout/ep_rew_wrapped_mean | 461         |
|    gen/time/fps                    | 1796        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4636672     |
|    gen/train/approx_kl             | 0.033694517 |
|    gen/train/clip_fraction         | 0.222       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.697      |
|    gen/train/explained_variance    | 0.972       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.82        |
|    gen/train/n_updates             | 2820        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 13.7        |
----------------------------------------------------

round:  73%|███████▎  | 222/305 [1:09:42<27:18, 19.74s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 553         |
|    gen/rollout/ep_rew_mean         | 0.0468      |
|    gen/rollout/ep_rew_wrapped_mean | 403         |
|    gen/time/fps                    | 1738        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4653056     |
|    gen/train/approx_kl             | 0.027260622 |
|    gen/train/clip_fraction         | 0.217       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.69       |
|    gen/train/explained_variance    | 0.92        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.16        |
|    gen/train/n_updates             | 2830        |
|    gen/train/policy_gradient_loss  | -0.0291     |
|    gen/train/value_loss            | 9.81        |
----------------------------------------------------

round:  73%|███████▎  | 223/305 [1:10:02<26:48, 19.61s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 551        |
|    gen/rollout/ep_rew_mean         | 0.0493     |
|    gen/rollout/ep_rew_wrapped_mean | 330        |
|    gen/time/fps                    | 1801       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 4669440    |
|    gen/train/approx_kl             | 0.02989172 |
|    gen/train/clip_fraction         | 0.222      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.669     |
|    gen/train/explained_variance    | 0.952      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.98       |
|    gen/train/n_updates             | 2840       |
|    gen/train/policy_gradient_loss  | -0.0272    |
|    gen/train/value_loss            | 14.3       |
---------------------------------------------------

round:  73%|███████▎  | 224/305 [1:10:21<26:11, 19.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 543         |
|    gen/rollout/ep_rew_mean         | 0.0636      |
|    gen/rollout/ep_rew_wrapped_mean | 299         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4685824     |
|    gen/train/approx_kl             | 0.027385581 |
|    gen/train/clip_fraction         | 0.199       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.696      |
|    gen/train/explained_variance    | 0.964       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.09        |
|    gen/train/n_updates             | 2850        |
|    gen/train/policy_gradient_loss  | -0.0162     |
|    gen/train/value_loss            | 11.5        |
----------------------------------------------------

round:  74%|███████▍  | 225/305 [1:10:39<25:33, 19.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 542         |
|    gen/rollout/ep_rew_mean         | 0.0649      |
|    gen/rollout/ep_rew_wrapped_mean | 290         |
|    gen/time/fps                    | 1804        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4702208     |
|    gen/train/approx_kl             | 0.027678106 |
|    gen/train/clip_fraction         | 0.192       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.696      |
|    gen/train/explained_variance    | 0.957       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.71        |
|    gen/train/n_updates             | 2860        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 16.9        |
----------------------------------------------------

round:  74%|███████▍  | 226/305 [1:10:59<25:23, 19.28s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 546         |
|    gen/rollout/ep_rew_mean         | 0.0562      |
|    gen/rollout/ep_rew_wrapped_mean | 318         |
|    gen/time/fps                    | 1739        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4718592     |
|    gen/train/approx_kl             | 0.034504414 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.674      |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.54        |
|    gen/train/n_updates             | 2870        |
|    gen/train/policy_gradient_loss  | -0.0199     |
|    gen/train/value_loss            | 14.4        |
----------------------------------------------------

round:  74%|███████▍  | 227/305 [1:11:18<25:04, 19.29s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 537         |
|    gen/rollout/ep_rew_mean         | 0.0736      |
|    gen/rollout/ep_rew_wrapped_mean | 340         |
|    gen/time/fps                    | 1721        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4734976     |
|    gen/train/approx_kl             | 0.028427681 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.633      |
|    gen/train/explained_variance    | 0.955       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.92        |
|    gen/train/n_updates             | 2880        |
|    gen/train/policy_gradient_loss  | -0.0214     |
|    gen/train/value_loss            | 15          |
----------------------------------------------------

round:  75%|███████▍  | 228/305 [1:11:38<24:55, 19.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 539         |
|    gen/rollout/ep_rew_mean         | 0.0708      |
|    gen/rollout/ep_rew_wrapped_mean | 355         |
|    gen/time/fps                    | 1722        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4751360     |
|    gen/train/approx_kl             | 0.027653119 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.66       |
|    gen/train/explained_variance    | 0.935       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.5        |
|    gen/train/n_updates             | 2890        |
|    gen/train/policy_gradient_loss  | -0.0163     |
|    gen/train/value_loss            | 18.6        |
----------------------------------------------------

round:  75%|███████▌  | 229/305 [1:11:58<24:48, 19.58s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.0548      |
|    gen/rollout/ep_rew_wrapped_mean | 346         |
|    gen/time/fps                    | 1731        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4767744     |
|    gen/train/approx_kl             | 0.028841622 |
|    gen/train/clip_fraction         | 0.215       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.664      |
|    gen/train/explained_variance    | 0.923       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 33.4        |
|    gen/train/n_updates             | 2900        |
|    gen/train/policy_gradient_loss  | -0.0243     |
|    gen/train/value_loss            | 12.4        |
----------------------------------------------------

round:  75%|███████▌  | 230/305 [1:12:17<24:17, 19.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 547         |
|    gen/rollout/ep_rew_mean         | 0.0549      |
|    gen/rollout/ep_rew_wrapped_mean | 386         |
|    gen/time/fps                    | 1778        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4784128     |
|    gen/train/approx_kl             | 0.037241325 |
|    gen/train/clip_fraction         | 0.187       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.627      |
|    gen/train/explained_variance    | 0.944       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.54        |
|    gen/train/n_updates             | 2910        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 27.3        |
----------------------------------------------------

round:  76%|███████▌  | 231/305 [1:12:36<23:51, 19.34s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.025       |
|    gen/rollout/ep_rew_wrapped_mean | 399         |
|    gen/time/fps                    | 1738        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4800512     |
|    gen/train/approx_kl             | 0.032312527 |
|    gen/train/clip_fraction         | 0.208       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.572      |
|    gen/train/explained_variance    | 0.961       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.91        |
|    gen/train/n_updates             | 2920        |
|    gen/train/policy_gradient_loss  | -0.0198     |
|    gen/train/value_loss            | 7.33        |
----------------------------------------------------

round:  76%|███████▌  | 232/305 [1:12:56<23:36, 19.41s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0184      |
|    gen/rollout/ep_rew_wrapped_mean | 402         |
|    gen/time/fps                    | 1794        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4816896     |
|    gen/train/approx_kl             | 0.042737212 |
|    gen/train/clip_fraction         | 0.225       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.552      |
|    gen/train/explained_variance    | 0.978       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.62        |
|    gen/train/n_updates             | 2930        |
|    gen/train/policy_gradient_loss  | -0.0205     |
|    gen/train/value_loss            | 4.85        |
----------------------------------------------------

round:  76%|███████▋  | 233/305 [1:13:14<23:05, 19.24s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 564        |
|    gen/rollout/ep_rew_mean         | 0.0255     |
|    gen/rollout/ep_rew_wrapped_mean | 359        |
|    gen/time/fps                    | 1769       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 4833280    |
|    gen/train/approx_kl             | 0.05491654 |
|    gen/train/clip_fraction         | 0.233      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.665     |
|    gen/train/explained_variance    | 0.947      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.25       |
|    gen/train/n_updates             | 2940       |
|    gen/train/policy_gradient_loss  | -0.0271    |
|    gen/train/value_loss            | 7.69       |
---------------------------------------------------

round:  77%|███████▋  | 234/305 [1:13:34<22:46, 19.25s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 561        |
|    gen/rollout/ep_rew_mean         | 0.031      |
|    gen/rollout/ep_rew_wrapped_mean | 326        |
|    gen/time/fps                    | 1800       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 4849664    |
|    gen/train/approx_kl             | 0.03290634 |
|    gen/train/clip_fraction         | 0.234      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.619     |
|    gen/train/explained_variance    | 0.942      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.37       |
|    gen/train/n_updates             | 2950       |
|    gen/train/policy_gradient_loss  | -0.0232    |
|    gen/train/value_loss            | 8.33       |
---------------------------------------------------

round:  77%|███████▋  | 235/305 [1:13:53<22:24, 19.21s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 564        |
|    gen/rollout/ep_rew_mean         | 0.0253     |
|    gen/rollout/ep_rew_wrapped_mean | 317        |
|    gen/time/fps                    | 1788       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 4866048    |
|    gen/train/approx_kl             | 0.03758613 |
|    gen/train/clip_fraction         | 0.241      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.582     |
|    gen/train/explained_variance    | 0.943      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.67       |
|    gen/train/n_updates             | 2960       |
|    gen/train/policy_gradient_loss  | -0.019     |
|    gen/train/value_loss            | 7.68       |
---------------------------------------------------

round:  77%|███████▋  | 236/305 [1:14:12<22:04, 19.19s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0229      |
|    gen/rollout/ep_rew_wrapped_mean | 344         |
|    gen/time/fps                    | 1817        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4882432     |
|    gen/train/approx_kl             | 0.032904446 |
|    gen/train/clip_fraction         | 0.21        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.568      |
|    gen/train/explained_variance    | 0.97        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.1         |
|    gen/train/n_updates             | 2970        |
|    gen/train/policy_gradient_loss  | -0.0201     |
|    gen/train/value_loss            | 6.43        |
----------------------------------------------------

round:  78%|███████▊  | 237/305 [1:14:32<21:53, 19.31s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.031       |
|    gen/rollout/ep_rew_wrapped_mean | 367         |
|    gen/time/fps                    | 1671        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4898816     |
|    gen/train/approx_kl             | 0.027495326 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.557      |
|    gen/train/explained_variance    | 0.966       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.02        |
|    gen/train/n_updates             | 2980        |
|    gen/train/policy_gradient_loss  | -0.0168     |
|    gen/train/value_loss            | 7.49        |
----------------------------------------------------

round:  78%|███████▊  | 238/305 [1:14:52<21:58, 19.68s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.0279      |
|    gen/rollout/ep_rew_wrapped_mean | 364         |
|    gen/time/fps                    | 1692        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4915200     |
|    gen/train/approx_kl             | 0.026424516 |
|    gen/train/clip_fraction         | 0.183       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.575      |
|    gen/train/explained_variance    | 0.92        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.27        |
|    gen/train/n_updates             | 2990        |
|    gen/train/policy_gradient_loss  | -0.0145     |
|    gen/train/value_loss            | 12.7        |
----------------------------------------------------

round:  78%|███████▊  | 239/305 [1:15:12<21:37, 19.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.022       |
|    gen/rollout/ep_rew_wrapped_mean | 349         |
|    gen/time/fps                    | 1736        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4931584     |
|    gen/train/approx_kl             | 0.034781266 |
|    gen/train/clip_fraction         | 0.182       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.474      |
|    gen/train/explained_variance    | 0.965       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.354       |
|    gen/train/n_updates             | 3000        |
|    gen/train/policy_gradient_loss  | -0.0202     |
|    gen/train/value_loss            | 1.68        |
----------------------------------------------------

round:  79%|███████▊  | 240/305 [1:15:32<21:26, 19.80s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0122      |
|    gen/rollout/ep_rew_wrapped_mean | 323         |
|    gen/time/fps                    | 1655        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4947968     |
|    gen/train/approx_kl             | 0.028774463 |
|    gen/train/clip_fraction         | 0.217       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.589      |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.744       |
|    gen/train/n_updates             | 3010        |
|    gen/train/policy_gradient_loss  | -0.0285     |
|    gen/train/value_loss            | 2.19        |
----------------------------------------------------

round:  79%|███████▉  | 241/305 [1:15:52<21:18, 19.98s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0122      |
|    gen/rollout/ep_rew_wrapped_mean | 320         |
|    gen/time/fps                    | 1801        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4964352     |
|    gen/train/approx_kl             | 0.035914805 |
|    gen/train/clip_fraction         | 0.178       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.576      |
|    gen/train/explained_variance    | 0.862       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.9         |
|    gen/train/n_updates             | 3020        |
|    gen/train/policy_gradient_loss  | -0.0164     |
|    gen/train/value_loss            | 7.31        |
----------------------------------------------------

round:  79%|███████▉  | 242/305 [1:16:11<20:39, 19.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0122      |
|    gen/rollout/ep_rew_wrapped_mean | 333         |
|    gen/time/fps                    | 1700        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4980736     |
|    gen/train/approx_kl             | 0.031455096 |
|    gen/train/clip_fraction         | 0.197       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.531      |
|    gen/train/explained_variance    | 0.694       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.15        |
|    gen/train/n_updates             | 3030        |
|    gen/train/policy_gradient_loss  | -0.0184     |
|    gen/train/value_loss            | 4.06        |
----------------------------------------------------

round:  80%|███████▉  | 243/305 [1:16:33<21:00, 20.33s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0118      |
|    gen/rollout/ep_rew_wrapped_mean | 349         |
|    gen/time/fps                    | 1737        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 4997120     |
|    gen/train/approx_kl             | 0.030422337 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.525      |
|    gen/train/explained_variance    | 0.87        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.16        |
|    gen/train/n_updates             | 3040        |
|    gen/train/policy_gradient_loss  | -0.017      |
|    gen/train/value_loss            | 4.12        |
----------------------------------------------------

round:  80%|████████  | 244/305 [1:16:52<20:20, 20.00s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.0133     |
|    gen/rollout/ep_rew_wrapped_mean | 379        |
|    gen/time/fps                    | 1834       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 5013504    |
|    gen/train/approx_kl             | 0.02996442 |
|    gen/train/clip_fraction         | 0.189      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.594     |
|    gen/train/explained_variance    | 0.758      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.04       |
|    gen/train/n_updates             | 3050       |
|    gen/train/policy_gradient_loss  | -0.0128    |
|    gen/train/value_loss            | 12.3       |
---------------------------------------------------

round:  80%|████████  | 245/305 [1:17:11<19:35, 19.58s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.0133     |
|    gen/rollout/ep_rew_wrapped_mean | 392        |
|    gen/time/fps                    | 1804       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5029888    |
|    gen/train/approx_kl             | 0.02783373 |
|    gen/train/clip_fraction         | 0.188      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.619     |
|    gen/train/explained_variance    | 0.904      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.53       |
|    gen/train/n_updates             | 3060       |
|    gen/train/policy_gradient_loss  | -0.0121    |
|    gen/train/value_loss            | 9.41       |
---------------------------------------------------

round:  81%|████████  | 246/305 [1:17:30<19:06, 19.43s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0133      |
|    gen/rollout/ep_rew_wrapped_mean | 389         |
|    gen/time/fps                    | 1807        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5046272     |
|    gen/train/approx_kl             | 0.028611887 |
|    gen/train/clip_fraction         | 0.229       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.69       |
|    gen/train/explained_variance    | 0.905       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.53        |
|    gen/train/n_updates             | 3070        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 8.4         |
----------------------------------------------------

round:  81%|████████  | 247/305 [1:17:50<19:00, 19.67s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0098      |
|    gen/rollout/ep_rew_wrapped_mean | 353         |
|    gen/time/fps                    | 1757        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5062656     |
|    gen/train/approx_kl             | 0.035310462 |
|    gen/train/clip_fraction         | 0.247       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.768      |
|    gen/train/explained_variance    | 0.808       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 9.28        |
|    gen/train/n_updates             | 3080        |
|    gen/train/policy_gradient_loss  | -0.0351     |
|    gen/train/value_loss            | 5.07        |
----------------------------------------------------

round:  81%|████████▏ | 248/305 [1:18:10<18:43, 19.71s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0193      |
|    gen/rollout/ep_rew_wrapped_mean | 325         |
|    gen/time/fps                    | 1765        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5079040     |
|    gen/train/approx_kl             | 0.028708031 |
|    gen/train/clip_fraction         | 0.204       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.625      |
|    gen/train/explained_variance    | 0.607       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.81        |
|    gen/train/n_updates             | 3090        |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 8           |
----------------------------------------------------

round:  82%|████████▏ | 249/305 [1:18:29<18:19, 19.64s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.0321      |
|    gen/rollout/ep_rew_wrapped_mean | 320         |
|    gen/time/fps                    | 1802        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5095424     |
|    gen/train/approx_kl             | 0.029154133 |
|    gen/train/clip_fraction         | 0.196       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.581      |
|    gen/train/explained_variance    | 0.581       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.01        |
|    gen/train/n_updates             | 3100        |
|    gen/train/policy_gradient_loss  | -0.0169     |
|    gen/train/value_loss            | 8.23        |
----------------------------------------------------

round:  82%|████████▏ | 250/305 [1:18:49<17:50, 19.47s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 560         |
|    gen/rollout/ep_rew_mean         | 0.0293      |
|    gen/rollout/ep_rew_wrapped_mean | 327         |
|    gen/time/fps                    | 1676        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5111808     |
|    gen/train/approx_kl             | 0.038781576 |
|    gen/train/clip_fraction         | 0.207       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.575      |
|    gen/train/explained_variance    | 0.799       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.464       |
|    gen/train/n_updates             | 3110        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 4.75        |
----------------------------------------------------

round:  82%|████████▏ | 251/305 [1:19:09<17:51, 19.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0222      |
|    gen/rollout/ep_rew_wrapped_mean | 343         |
|    gen/time/fps                    | 1685        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5128192     |
|    gen/train/approx_kl             | 0.029251954 |
|    gen/train/clip_fraction         | 0.188       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.551      |
|    gen/train/explained_variance    | 0.888       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.48        |
|    gen/train/n_updates             | 3120        |
|    gen/train/policy_gradient_loss  | -0.0152     |
|    gen/train/value_loss            | 3.1         |
----------------------------------------------------

round:  83%|████████▎ | 252/305 [1:19:30<17:47, 20.14s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 564        |
|    gen/rollout/ep_rew_mean         | 0.0233     |
|    gen/rollout/ep_rew_wrapped_mean | 372        |
|    gen/time/fps                    | 1724       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5144576    |
|    gen/train/approx_kl             | 0.03362482 |
|    gen/train/clip_fraction         | 0.237      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.673     |
|    gen/train/explained_variance    | 0.907      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.45       |
|    gen/train/n_updates             | 3130       |
|    gen/train/policy_gradient_loss  | -0.0249    |
|    gen/train/value_loss            | 5.07       |
---------------------------------------------------

round:  83%|████████▎ | 253/305 [1:19:49<17:14, 19.89s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 559        |
|    gen/rollout/ep_rew_mean         | 0.0336     |
|    gen/rollout/ep_rew_wrapped_mean | 395        |
|    gen/time/fps                    | 1727       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5160960    |
|    gen/train/approx_kl             | 0.03147457 |
|    gen/train/clip_fraction         | 0.214      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.664     |
|    gen/train/explained_variance    | 0.879      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.33       |
|    gen/train/n_updates             | 3140       |
|    gen/train/policy_gradient_loss  | -0.019     |
|    gen/train/value_loss            | 9.63       |
---------------------------------------------------

round:  83%|████████▎ | 254/305 [1:20:09<16:52, 19.84s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 553         |
|    gen/rollout/ep_rew_mean         | 0.0444      |
|    gen/rollout/ep_rew_wrapped_mean | 410         |
|    gen/time/fps                    | 1688        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5177344     |
|    gen/train/approx_kl             | 0.026506845 |
|    gen/train/clip_fraction         | 0.178       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.591      |
|    gen/train/explained_variance    | 0.912       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.23        |
|    gen/train/n_updates             | 3150        |
|    gen/train/policy_gradient_loss  | -0.0159     |
|    gen/train/value_loss            | 11.8        |
----------------------------------------------------

round:  84%|████████▎ | 255/305 [1:20:29<16:38, 19.97s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0344      |
|    gen/rollout/ep_rew_wrapped_mean | 411         |
|    gen/time/fps                    | 1578        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5193728     |
|    gen/train/approx_kl             | 0.029633846 |
|    gen/train/clip_fraction         | 0.218       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.608      |
|    gen/train/explained_variance    | 0.938       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.83        |
|    gen/train/n_updates             | 3160        |
|    gen/train/policy_gradient_loss  | -0.0197     |
|    gen/train/value_loss            | 7.82        |
----------------------------------------------------

round:  84%|████████▍ | 256/305 [1:20:50<16:30, 20.22s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 564        |
|    gen/rollout/ep_rew_mean         | 0.0248     |
|    gen/rollout/ep_rew_wrapped_mean | 382        |
|    gen/time/fps                    | 1774       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5210112    |
|    gen/train/approx_kl             | 0.03541144 |
|    gen/train/clip_fraction         | 0.224      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.713     |
|    gen/train/explained_variance    | 0.898      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 2.9        |
|    gen/train/n_updates             | 3170       |
|    gen/train/policy_gradient_loss  | -0.0148    |
|    gen/train/value_loss            | 6.37       |
---------------------------------------------------

round:  84%|████████▍ | 257/305 [1:21:09<15:53, 19.87s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0184      |
|    gen/rollout/ep_rew_wrapped_mean | 363         |
|    gen/time/fps                    | 1627        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5226496     |
|    gen/train/approx_kl             | 0.028758975 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.651      |
|    gen/train/explained_variance    | 0.884       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.09        |
|    gen/train/n_updates             | 3180        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 12.6        |
----------------------------------------------------

round:  85%|████████▍ | 258/305 [1:21:30<15:45, 20.11s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 563         |
|    gen/rollout/ep_rew_mean         | 0.0257      |
|    gen/rollout/ep_rew_wrapped_mean | 346         |
|    gen/time/fps                    | 1623        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5242880     |
|    gen/train/approx_kl             | 0.044996236 |
|    gen/train/clip_fraction         | 0.233       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.645      |
|    gen/train/explained_variance    | 0.911       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.8         |
|    gen/train/n_updates             | 3190        |
|    gen/train/policy_gradient_loss  | -0.0165     |
|    gen/train/value_loss            | 5.86        |
----------------------------------------------------

round:  85%|████████▍ | 259/305 [1:21:51<15:43, 20.52s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 564         |
|    gen/rollout/ep_rew_mean         | 0.0234      |
|    gen/rollout/ep_rew_wrapped_mean | 338         |
|    gen/time/fps                    | 1769        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5259264     |
|    gen/train/approx_kl             | 0.034339942 |
|    gen/train/clip_fraction         | 0.231       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.627      |
|    gen/train/explained_variance    | 0.968       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.35        |
|    gen/train/n_updates             | 3200        |
|    gen/train/policy_gradient_loss  | -0.0223     |
|    gen/train/value_loss            | 4.87        |
----------------------------------------------------

round:  85%|████████▌ | 260/305 [1:22:11<15:11, 20.26s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 567       |
|    gen/rollout/ep_rew_mean         | 0.0185    |
|    gen/rollout/ep_rew_wrapped_mean | 353       |
|    gen/time/fps                    | 1772      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 9         |
|    gen/time/total_timesteps        | 5275648   |
|    gen/train/approx_kl             | 0.0317052 |
|    gen/train/clip_fraction         | 0.23      |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.708    |
|    gen/train/explained_variance    | 0.881     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 2.89      |
|    gen/train/n_updates             | 3210      |
|    gen/train/policy_gradient_loss  | -0.0183   |
|    gen/train/value_loss            | 8.2       |
--------------------------------------------------

round:  86%|████████▌ | 261/305 [1:22:31<14:52, 20.27s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0176      |
|    gen/rollout/ep_rew_wrapped_mean | 367         |
|    gen/time/fps                    | 1656        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5292032     |
|    gen/train/approx_kl             | 0.041784897 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.617      |
|    gen/train/explained_variance    | 0.924       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.25        |
|    gen/train/n_updates             | 3220        |
|    gen/train/policy_gradient_loss  | -0.0173     |
|    gen/train/value_loss            | 8.46        |
----------------------------------------------------

round:  86%|████████▌ | 262/305 [1:22:52<14:34, 20.34s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0149      |
|    gen/rollout/ep_rew_wrapped_mean | 377         |
|    gen/time/fps                    | 1699        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5308416     |
|    gen/train/approx_kl             | 0.030433139 |
|    gen/train/clip_fraction         | 0.223       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.709      |
|    gen/train/explained_variance    | 0.88        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 10.3        |
|    gen/train/n_updates             | 3230        |
|    gen/train/policy_gradient_loss  | -0.014      |
|    gen/train/value_loss            | 17.9        |
----------------------------------------------------

round:  86%|████████▌ | 263/305 [1:23:12<14:14, 20.33s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 569        |
|    gen/rollout/ep_rew_mean         | 0.0149     |
|    gen/rollout/ep_rew_wrapped_mean | 424        |
|    gen/time/fps                    | 1770       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5324800    |
|    gen/train/approx_kl             | 0.03036286 |
|    gen/train/clip_fraction         | 0.164      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.553     |
|    gen/train/explained_variance    | 0.762      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 6.39       |
|    gen/train/n_updates             | 3240       |
|    gen/train/policy_gradient_loss  | -0.0143    |
|    gen/train/value_loss            | 25.7       |
---------------------------------------------------

round:  87%|████████▋ | 264/305 [1:23:32<13:51, 20.29s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0204      |
|    gen/rollout/ep_rew_wrapped_mean | 448         |
|    gen/time/fps                    | 1754        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5341184     |
|    gen/train/approx_kl             | 0.024592679 |
|    gen/train/clip_fraction         | 0.172       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.665      |
|    gen/train/explained_variance    | 0.726       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.6         |
|    gen/train/n_updates             | 3250        |
|    gen/train/policy_gradient_loss  | -0.0131     |
|    gen/train/value_loss            | 40.6        |
----------------------------------------------------

round:  87%|████████▋ | 265/305 [1:23:53<13:32, 20.31s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.0351      |
|    gen/rollout/ep_rew_wrapped_mean | 469         |
|    gen/time/fps                    | 1749        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5357568     |
|    gen/train/approx_kl             | 0.029539796 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.7        |
|    gen/train/explained_variance    | 0.859       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 26.8        |
|    gen/train/n_updates             | 3260        |
|    gen/train/policy_gradient_loss  | -0.0144     |
|    gen/train/value_loss            | 29.4        |
----------------------------------------------------

round:  87%|████████▋ | 266/305 [1:24:13<13:15, 20.40s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 557         |
|    gen/rollout/ep_rew_mean         | 0.038       |
|    gen/rollout/ep_rew_wrapped_mean | 441         |
|    gen/time/fps                    | 1700        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5373952     |
|    gen/train/approx_kl             | 0.022669993 |
|    gen/train/clip_fraction         | 0.199       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.706      |
|    gen/train/explained_variance    | 0.863       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 13.7        |
|    gen/train/n_updates             | 3270        |
|    gen/train/policy_gradient_loss  | -0.0119     |
|    gen/train/value_loss            | 36.9        |
----------------------------------------------------

round:  88%|████████▊ | 267/305 [1:24:35<13:07, 20.74s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 550        |
|    gen/rollout/ep_rew_mean         | 0.0511     |
|    gen/rollout/ep_rew_wrapped_mean | 407        |
|    gen/time/fps                    | 1566       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 10         |
|    gen/time/total_timesteps        | 5390336    |
|    gen/train/approx_kl             | 0.03638497 |
|    gen/train/clip_fraction         | 0.248      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.686     |
|    gen/train/explained_variance    | 0.877      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 14         |
|    gen/train/n_updates             | 3280       |
|    gen/train/policy_gradient_loss  | -0.0237    |
|    gen/train/value_loss            | 23.2       |
---------------------------------------------------

round:  88%|████████▊ | 268/305 [1:24:57<13:01, 21.13s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0399      |
|    gen/rollout/ep_rew_wrapped_mean | 379         |
|    gen/time/fps                    | 1778        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5406720     |
|    gen/train/approx_kl             | 0.022024859 |
|    gen/train/clip_fraction         | 0.173       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.718      |
|    gen/train/explained_variance    | 0.774       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 32.6        |
|    gen/train/n_updates             | 3290        |
|    gen/train/policy_gradient_loss  | -0.0122     |
|    gen/train/value_loss            | 39.5        |
----------------------------------------------------

round:  88%|████████▊ | 269/305 [1:25:17<12:34, 20.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 556         |
|    gen/rollout/ep_rew_mean         | 0.0385      |
|    gen/rollout/ep_rew_wrapped_mean | 387         |
|    gen/time/fps                    | 1764        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5423104     |
|    gen/train/approx_kl             | 0.025245547 |
|    gen/train/clip_fraction         | 0.189       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.629      |
|    gen/train/explained_variance    | 0.873       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.29        |
|    gen/train/n_updates             | 3300        |
|    gen/train/policy_gradient_loss  | -0.00836    |
|    gen/train/value_loss            | 19.8        |
----------------------------------------------------

round:  89%|████████▊ | 270/305 [1:25:37<12:03, 20.67s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 568        |
|    gen/rollout/ep_rew_mean         | 0.0166     |
|    gen/rollout/ep_rew_wrapped_mean | 363        |
|    gen/time/fps                    | 1830       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 8          |
|    gen/time/total_timesteps        | 5439488    |
|    gen/train/approx_kl             | 0.02707224 |
|    gen/train/clip_fraction         | 0.194      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.673     |
|    gen/train/explained_variance    | 0.755      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 5.29       |
|    gen/train/n_updates             | 3310       |
|    gen/train/policy_gradient_loss  | -0.0154    |
|    gen/train/value_loss            | 17.9       |
---------------------------------------------------

round:  89%|████████▉ | 271/305 [1:25:57<11:33, 20.40s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 571        |
|    gen/rollout/ep_rew_mean         | 0.0109     |
|    gen/rollout/ep_rew_wrapped_mean | 346        |
|    gen/time/fps                    | 1732       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5455872    |
|    gen/train/approx_kl             | 0.04861114 |
|    gen/train/clip_fraction         | 0.215      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.615     |
|    gen/train/explained_variance    | 0.854      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.708      |
|    gen/train/n_updates             | 3320       |
|    gen/train/policy_gradient_loss  | -0.0201    |
|    gen/train/value_loss            | 5.54       |
---------------------------------------------------

round:  89%|████████▉ | 272/305 [1:26:17<11:05, 20.17s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0117      |
|    gen/rollout/ep_rew_wrapped_mean | 326         |
|    gen/time/fps                    | 1736        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5472256     |
|    gen/train/approx_kl             | 0.025910366 |
|    gen/train/clip_fraction         | 0.186       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.588      |
|    gen/train/explained_variance    | 0.843       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.827       |
|    gen/train/n_updates             | 3330        |
|    gen/train/policy_gradient_loss  | -0.0161     |
|    gen/train/value_loss            | 4.96        |
----------------------------------------------------

round:  90%|████████▉ | 273/305 [1:26:37<10:43, 20.12s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.0117      |
|    gen/rollout/ep_rew_wrapped_mean | 332         |
|    gen/time/fps                    | 1782        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5488640     |
|    gen/train/approx_kl             | 0.031667933 |
|    gen/train/clip_fraction         | 0.2         |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.58       |
|    gen/train/explained_variance    | 0.773       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.8        |
|    gen/train/n_updates             | 3340        |
|    gen/train/policy_gradient_loss  | -0.0132     |
|    gen/train/value_loss            | 12.4        |
----------------------------------------------------

round:  90%|████████▉ | 274/305 [1:26:57<10:26, 20.22s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 570         |
|    gen/rollout/ep_rew_mean         | 0.013       |
|    gen/rollout/ep_rew_wrapped_mean | 361         |
|    gen/time/fps                    | 1810        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5505024     |
|    gen/train/approx_kl             | 0.032443196 |
|    gen/train/clip_fraction         | 0.22        |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.605      |
|    gen/train/explained_variance    | 0.95        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 14          |
|    gen/train/n_updates             | 3350        |
|    gen/train/policy_gradient_loss  | -0.0157     |
|    gen/train/value_loss            | 6.2         |
----------------------------------------------------

round:  90%|█████████ | 275/305 [1:27:16<09:56, 19.87s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 572         |
|    gen/rollout/ep_rew_mean         | 0.00966     |
|    gen/rollout/ep_rew_wrapped_mean | 382         |
|    gen/time/fps                    | 1780        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5521408     |
|    gen/train/approx_kl             | 0.031898476 |
|    gen/train/clip_fraction         | 0.166       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.566      |
|    gen/train/explained_variance    | 0.735       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.33        |
|    gen/train/n_updates             | 3360        |
|    gen/train/policy_gradient_loss  | -0.00962    |
|    gen/train/value_loss            | 4.56        |
----------------------------------------------------

round:  90%|█████████ | 276/305 [1:27:36<09:33, 19.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 571         |
|    gen/rollout/ep_rew_mean         | 0.0104      |
|    gen/rollout/ep_rew_wrapped_mean | 396         |
|    gen/time/fps                    | 1697        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5537792     |
|    gen/train/approx_kl             | 0.038118176 |
|    gen/train/clip_fraction         | 0.221       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.566      |
|    gen/train/explained_variance    | 0.847       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.68        |
|    gen/train/n_updates             | 3370        |
|    gen/train/policy_gradient_loss  | -0.0249     |
|    gen/train/value_loss            | 8.23        |
----------------------------------------------------

round:  91%|█████████ | 277/305 [1:27:56<09:19, 19.98s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0224      |
|    gen/rollout/ep_rew_wrapped_mean | 392         |
|    gen/time/fps                    | 1761        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5554176     |
|    gen/train/approx_kl             | 0.030120458 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.562      |
|    gen/train/explained_variance    | 0.896       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.928       |
|    gen/train/n_updates             | 3380        |
|    gen/train/policy_gradient_loss  | -0.0182     |
|    gen/train/value_loss            | 5.52        |
----------------------------------------------------

round:  91%|█████████ | 278/305 [1:28:16<08:53, 19.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 562         |
|    gen/rollout/ep_rew_mean         | 0.0301      |
|    gen/rollout/ep_rew_wrapped_mean | 394         |
|    gen/time/fps                    | 1763        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5570560     |
|    gen/train/approx_kl             | 0.027752776 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.507      |
|    gen/train/explained_variance    | 0.892       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 3.17        |
|    gen/train/n_updates             | 3390        |
|    gen/train/policy_gradient_loss  | -0.015      |
|    gen/train/value_loss            | 12.8        |
----------------------------------------------------

round:  91%|█████████▏| 279/305 [1:28:36<08:36, 19.88s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 561        |
|    gen/rollout/ep_rew_mean         | 0.0317     |
|    gen/rollout/ep_rew_wrapped_mean | 384        |
|    gen/time/fps                    | 1725       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5586944    |
|    gen/train/approx_kl             | 0.04246849 |
|    gen/train/clip_fraction         | 0.274      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.61      |
|    gen/train/explained_variance    | 0.893      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.18       |
|    gen/train/n_updates             | 3400       |
|    gen/train/policy_gradient_loss  | -0.0217    |
|    gen/train/value_loss            | 12.6       |
---------------------------------------------------

round:  92%|█████████▏| 280/305 [1:28:56<08:17, 19.91s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 567         |
|    gen/rollout/ep_rew_mean         | 0.0205      |
|    gen/rollout/ep_rew_wrapped_mean | 388         |
|    gen/time/fps                    | 1837        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 5603328     |
|    gen/train/approx_kl             | 0.034689628 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.605      |
|    gen/train/explained_variance    | 0.94        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.5         |
|    gen/train/n_updates             | 3410        |
|    gen/train/policy_gradient_loss  | -0.0177     |
|    gen/train/value_loss            | 7.17        |
----------------------------------------------------

round:  92%|█████████▏| 281/305 [1:29:14<07:46, 19.44s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0424      |
|    gen/rollout/ep_rew_wrapped_mean | 374         |
|    gen/time/fps                    | 1623        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5619712     |
|    gen/train/approx_kl             | 0.038138784 |
|    gen/train/clip_fraction         | 0.238       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.659      |
|    gen/train/explained_variance    | 0.922       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.36        |
|    gen/train/n_updates             | 3420        |
|    gen/train/policy_gradient_loss  | -0.0218     |
|    gen/train/value_loss            | 6.35        |
----------------------------------------------------

round:  92%|█████████▏| 282/305 [1:29:35<07:37, 19.91s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 553        |
|    gen/rollout/ep_rew_mean         | 0.0446     |
|    gen/rollout/ep_rew_wrapped_mean | 350        |
|    gen/time/fps                    | 1705       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5636096    |
|    gen/train/approx_kl             | 0.03026902 |
|    gen/train/clip_fraction         | 0.199      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.609     |
|    gen/train/explained_variance    | 0.881      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.73       |
|    gen/train/n_updates             | 3430       |
|    gen/train/policy_gradient_loss  | -0.0148    |
|    gen/train/value_loss            | 11.8       |
---------------------------------------------------

round:  93%|█████████▎| 283/305 [1:29:57<07:28, 20.38s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0431      |
|    gen/rollout/ep_rew_wrapped_mean | 340         |
|    gen/time/fps                    | 1529        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5652480     |
|    gen/train/approx_kl             | 0.027726285 |
|    gen/train/clip_fraction         | 0.189       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.608      |
|    gen/train/explained_variance    | 0.772       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.66        |
|    gen/train/n_updates             | 3440        |
|    gen/train/policy_gradient_loss  | -0.0153     |
|    gen/train/value_loss            | 10          |
----------------------------------------------------

round:  93%|█████████▎| 284/305 [1:30:19<07:22, 21.06s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 561       |
|    gen/rollout/ep_rew_mean         | 0.0278    |
|    gen/rollout/ep_rew_wrapped_mean | 354       |
|    gen/time/fps                    | 1587      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 10        |
|    gen/time/total_timesteps        | 5668864   |
|    gen/train/approx_kl             | 0.0368132 |
|    gen/train/clip_fraction         | 0.2       |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.61     |
|    gen/train/explained_variance    | 0.652     |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 4.69      |
|    gen/train/n_updates             | 3450      |
|    gen/train/policy_gradient_loss  | -0.0182   |
|    gen/train/value_loss            | 11.9      |
--------------------------------------------------

round:  93%|█████████▎| 285/305 [1:30:41<07:07, 21.35s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 554         |
|    gen/rollout/ep_rew_mean         | 0.0416      |
|    gen/rollout/ep_rew_wrapped_mean | 377         |
|    gen/time/fps                    | 1520        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5685248     |
|    gen/train/approx_kl             | 0.030657155 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.596      |
|    gen/train/explained_variance    | 0.459       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 7.83        |
|    gen/train/n_updates             | 3460        |
|    gen/train/policy_gradient_loss  | -0.019      |
|    gen/train/value_loss            | 15.8        |
----------------------------------------------------

round:  94%|█████████▍| 286/305 [1:31:05<06:57, 21.95s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 560         |
|    gen/rollout/ep_rew_mean         | 0.0313      |
|    gen/rollout/ep_rew_wrapped_mean | 385         |
|    gen/time/fps                    | 1578        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5701632     |
|    gen/train/approx_kl             | 0.032729454 |
|    gen/train/clip_fraction         | 0.235       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.651      |
|    gen/train/explained_variance    | 0.689       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.1         |
|    gen/train/n_updates             | 3470        |
|    gen/train/policy_gradient_loss  | -0.0202     |
|    gen/train/value_loss            | 18.6        |
----------------------------------------------------

round:  94%|█████████▍| 287/305 [1:31:27<06:35, 21.99s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 565         |
|    gen/rollout/ep_rew_mean         | 0.0227      |
|    gen/rollout/ep_rew_wrapped_mean | 390         |
|    gen/time/fps                    | 1576        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5718016     |
|    gen/train/approx_kl             | 0.039087422 |
|    gen/train/clip_fraction         | 0.191       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.504      |
|    gen/train/explained_variance    | 0.626       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 6.96        |
|    gen/train/n_updates             | 3480        |
|    gen/train/policy_gradient_loss  | -0.0209     |
|    gen/train/value_loss            | 8.29        |
----------------------------------------------------

round:  94%|█████████▍| 288/305 [1:31:48<06:10, 21.78s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 569         |
|    gen/rollout/ep_rew_mean         | 0.0163      |
|    gen/rollout/ep_rew_wrapped_mean | 358         |
|    gen/time/fps                    | 1623        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 10          |
|    gen/time/total_timesteps        | 5734400     |
|    gen/train/approx_kl             | 0.045054503 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.596      |
|    gen/train/explained_variance    | 0.724       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.97        |
|    gen/train/n_updates             | 3490        |
|    gen/train/policy_gradient_loss  | -0.0189     |
|    gen/train/value_loss            | 9.3         |
----------------------------------------------------

round:  95%|█████████▍| 289/305 [1:32:10<05:49, 21.86s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 568         |
|    gen/rollout/ep_rew_mean         | 0.0175      |
|    gen/rollout/ep_rew_wrapped_mean | 307         |
|    gen/time/fps                    | 1762        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5750784     |
|    gen/train/approx_kl             | 0.038337924 |
|    gen/train/clip_fraction         | 0.215       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.678      |
|    gen/train/explained_variance    | 0.649       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.19        |
|    gen/train/n_updates             | 3500        |
|    gen/train/policy_gradient_loss  | -0.0172     |
|    gen/train/value_loss            | 6.83        |
----------------------------------------------------

round:  95%|█████████▌| 290/305 [1:32:29<05:14, 20.99s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 566         |
|    gen/rollout/ep_rew_mean         | 0.0203      |
|    gen/rollout/ep_rew_wrapped_mean | 277         |
|    gen/time/fps                    | 1738        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5767168     |
|    gen/train/approx_kl             | 0.032420013 |
|    gen/train/clip_fraction         | 0.224       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.605      |
|    gen/train/explained_variance    | 0.713       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 10.5        |
|    gen/train/n_updates             | 3510        |
|    gen/train/policy_gradient_loss  | -0.0292     |
|    gen/train/value_loss            | 8.31        |
----------------------------------------------------

round:  95%|█████████▌| 291/305 [1:32:48<04:45, 20.39s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 561        |
|    gen/rollout/ep_rew_mean         | 0.0304     |
|    gen/rollout/ep_rew_wrapped_mean | 271        |
|    gen/time/fps                    | 1710       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5783552    |
|    gen/train/approx_kl             | 0.03284747 |
|    gen/train/clip_fraction         | 0.223      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.578     |
|    gen/train/explained_variance    | 0.644      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 1.65       |
|    gen/train/n_updates             | 3520       |
|    gen/train/policy_gradient_loss  | -0.0289    |
|    gen/train/value_loss            | 5.49       |
---------------------------------------------------

round:  96%|█████████▌| 292/305 [1:33:07<04:21, 20.12s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 558         |
|    gen/rollout/ep_rew_mean         | 0.0367      |
|    gen/rollout/ep_rew_wrapped_mean | 295         |
|    gen/time/fps                    | 1781        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5799936     |
|    gen/train/approx_kl             | 0.056512944 |
|    gen/train/clip_fraction         | 0.288       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.618      |
|    gen/train/explained_variance    | 0.363       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 15.6        |
|    gen/train/n_updates             | 3530        |
|    gen/train/policy_gradient_loss  | -0.0363     |
|    gen/train/value_loss            | 7.79        |
----------------------------------------------------

round:  96%|█████████▌| 293/305 [1:33:27<03:59, 19.94s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 555         |
|    gen/rollout/ep_rew_mean         | 0.0439      |
|    gen/rollout/ep_rew_wrapped_mean | 325         |
|    gen/time/fps                    | 1751        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5816320     |
|    gen/train/approx_kl             | 0.029967356 |
|    gen/train/clip_fraction         | 0.2         |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.629      |
|    gen/train/explained_variance    | 0.28        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 2.75        |
|    gen/train/n_updates             | 3540        |
|    gen/train/policy_gradient_loss  | -0.0174     |
|    gen/train/value_loss            | 17.5        |
----------------------------------------------------

round:  96%|█████████▋| 294/305 [1:33:47<03:38, 19.83s/it]

--------------------------------------------------
| raw/                               |           |
|    gen/rollout/ep_len_mean         | 555       |
|    gen/rollout/ep_rew_mean         | 0.0441    |
|    gen/rollout/ep_rew_wrapped_mean | 353       |
|    gen/time/fps                    | 1802      |
|    gen/time/iterations             | 1         |
|    gen/time/time_elapsed           | 9         |
|    gen/time/total_timesteps        | 5832704   |
|    gen/train/approx_kl             | 0.0371073 |
|    gen/train/clip_fraction         | 0.25      |
|    gen/train/clip_range            | 0.2       |
|    gen/train/entropy_loss          | -0.642    |
|    gen/train/explained_variance    | 0.74      |
|    gen/train/learning_rate         | 0.0004    |
|    gen/train/loss                  | 3.87      |
|    gen/train/n_updates             | 3550      |
|    gen/train/policy_gradient_loss  | -0.0182   |
|    gen/train/value_loss            | 16.7      |
--------------------------------------------------

round:  97%|█████████▋| 295/305 [1:34:06<03:16, 19.66s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 545         |
|    gen/rollout/ep_rew_mean         | 0.0604      |
|    gen/rollout/ep_rew_wrapped_mean | 388         |
|    gen/time/fps                    | 1713        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5849088     |
|    gen/train/approx_kl             | 0.032311358 |
|    gen/train/clip_fraction         | 0.213       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.649      |
|    gen/train/explained_variance    | 0.648       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 11.8        |
|    gen/train/n_updates             | 3560        |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 21.6        |
----------------------------------------------------

round:  97%|█████████▋| 296/305 [1:34:26<02:58, 19.88s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 542         |
|    gen/rollout/ep_rew_mean         | 0.066       |
|    gen/rollout/ep_rew_wrapped_mean | 391         |
|    gen/time/fps                    | 1746        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5865472     |
|    gen/train/approx_kl             | 0.027839016 |
|    gen/train/clip_fraction         | 0.214       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.641      |
|    gen/train/explained_variance    | 0.511       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 25          |
|    gen/train/n_updates             | 3570        |
|    gen/train/policy_gradient_loss  | -0.0133     |
|    gen/train/value_loss            | 26.8        |
----------------------------------------------------

round:  97%|█████████▋| 297/305 [1:34:46<02:38, 19.82s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 545         |
|    gen/rollout/ep_rew_mean         | 0.06        |
|    gen/rollout/ep_rew_wrapped_mean | 428         |
|    gen/time/fps                    | 1747        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5881856     |
|    gen/train/approx_kl             | 0.028227491 |
|    gen/train/clip_fraction         | 0.169       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.594      |
|    gen/train/explained_variance    | 0.582       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 18.6        |
|    gen/train/n_updates             | 3580        |
|    gen/train/policy_gradient_loss  | -0.0106     |
|    gen/train/value_loss            | 42.9        |
----------------------------------------------------

round:  98%|█████████▊| 298/305 [1:35:06<02:19, 19.87s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 554        |
|    gen/rollout/ep_rew_mean         | 0.0421     |
|    gen/rollout/ep_rew_wrapped_mean | 461        |
|    gen/time/fps                    | 1623       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 10         |
|    gen/time/total_timesteps        | 5898240    |
|    gen/train/approx_kl             | 0.03077632 |
|    gen/train/clip_fraction         | 0.173      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.544     |
|    gen/train/explained_variance    | 0.822      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 37.9       |
|    gen/train/n_updates             | 3590       |
|    gen/train/policy_gradient_loss  | -0.00878   |
|    gen/train/value_loss            | 28.4       |
---------------------------------------------------

round:  98%|█████████▊| 299/305 [1:35:26<02:00, 20.05s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 549         |
|    gen/rollout/ep_rew_mean         | 0.051       |
|    gen/rollout/ep_rew_wrapped_mean | 452         |
|    gen/time/fps                    | 1846        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 8           |
|    gen/time/total_timesteps        | 5914624     |
|    gen/train/approx_kl             | 0.029669574 |
|    gen/train/clip_fraction         | 0.207       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.569      |
|    gen/train/explained_variance    | 0.816       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 34.2        |
|    gen/train/n_updates             | 3600        |
|    gen/train/policy_gradient_loss  | -0.013      |
|    gen/train/value_loss            | 20.7        |
----------------------------------------------------

round:  98%|█████████▊| 300/305 [1:35:46<01:39, 19.86s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 556         |
|    gen/rollout/ep_rew_mean         | 0.0384      |
|    gen/rollout/ep_rew_wrapped_mean | 424         |
|    gen/time/fps                    | 1703        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5931008     |
|    gen/train/approx_kl             | 0.035434265 |
|    gen/train/clip_fraction         | 0.202       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.627      |
|    gen/train/explained_variance    | 0.831       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 4.66        |
|    gen/train/n_updates             | 3610        |
|    gen/train/policy_gradient_loss  | -0.0187     |
|    gen/train/value_loss            | 19.9        |
----------------------------------------------------

round:  99%|█████████▊| 301/305 [1:36:07<01:20, 20.20s/it]

---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 555        |
|    gen/rollout/ep_rew_mean         | 0.0412     |
|    gen/rollout/ep_rew_wrapped_mean | 385        |
|    gen/time/fps                    | 1784       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 9          |
|    gen/time/total_timesteps        | 5947392    |
|    gen/train/approx_kl             | 0.03967598 |
|    gen/train/clip_fraction         | 0.211      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.56      |
|    gen/train/explained_variance    | 0.75       |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 7.3        |
|    gen/train/n_updates             | 3620       |
|    gen/train/policy_gradient_loss  | -0.0181    |
|    gen/train/value_loss            | 13.8       |
---------------------------------------------------

round:  99%|█████████▉| 302/305 [1:36:27<01:00, 20.18s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.0309      |
|    gen/rollout/ep_rew_wrapped_mean | 388         |
|    gen/time/fps                    | 1683        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5963776     |
|    gen/train/approx_kl             | 0.033877693 |
|    gen/train/clip_fraction         | 0.174       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.624      |
|    gen/train/explained_variance    | 0.824       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 8.09        |
|    gen/train/n_updates             | 3630        |
|    gen/train/policy_gradient_loss  | -0.0122     |
|    gen/train/value_loss            | 25.8        |
----------------------------------------------------

round:  99%|█████████▉| 303/305 [1:36:48<00:40, 20.36s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 561         |
|    gen/rollout/ep_rew_mean         | 0.0315      |
|    gen/rollout/ep_rew_wrapped_mean | 406         |
|    gen/time/fps                    | 1786        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5980160     |
|    gen/train/approx_kl             | 0.038816538 |
|    gen/train/clip_fraction         | 0.216       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.57       |
|    gen/train/explained_variance    | 0.822       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 1.2         |
|    gen/train/n_updates             | 3640        |
|    gen/train/policy_gradient_loss  | -0.0136     |
|    gen/train/value_loss            | 16.8        |
----------------------------------------------------

round: 100%|█████████▉| 304/305 [1:37:07<00:20, 20.09s/it]

----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 559         |
|    gen/rollout/ep_rew_mean         | 0.035       |
|    gen/rollout/ep_rew_wrapped_mean | 487         |
|    gen/time/fps                    | 1799        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 9           |
|    gen/time/total_timesteps        | 5996544     |
|    gen/train/approx_kl             | 0.026088633 |
|    gen/train/clip_fraction         | 0.158       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.522      |
|    gen/train/explained_variance    | 0.867       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 26.8        |
|    gen/train/n_updates             | 3650        |
|    gen/train/policy_gradient_loss  | -0.0118     |
|    gen/train/value_loss            | 56.8        |
----------------------------------------------------

round: 100%|██████████| 305/305 [1:37:27<00:00, 19.17s/it]


In [702]:
compute_score(task=env, policy=third_policy)

Best score: 0.733
Average score: (0.136, 0.273)


array([0.7328125, 0.       , 0.       , 0.       , 0.       , 0.       ,
       0.628125 , 0.       , 0.       , 0.       ])
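As a rough sanity check on these numbers: Minigrid's default success reward is `1 - 0.9 * (step_count / max_steps)`, and an episode that fails or times out gets 0. The sketch below inverts that formula to estimate how many steps each successful episode took. The helper name `steps_from_reward` and the value `MAX_STEPS = 576` are assumptions; 576 is consistent with the episode lengths in the log above, but `env.unwrapped.max_steps` is the place to confirm it for the actual task.

```python
def steps_from_reward(reward, max_steps, penalty=0.9):
    """Invert reward = 1 - penalty * (steps / max_steps); None for failures."""
    if reward <= 0:
        return None  # episode did not reach the goal
    return round((1.0 - reward) / penalty * max_steps)

# Per-episode scores copied from the output above.
scores = [0.7328125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.628125, 0.0, 0.0, 0.0]
MAX_STEPS = 576  # assumption, not stated in the notebook

steps = [steps_from_reward(s, MAX_STEPS) for s in scores]
success_rate = sum(s > 0 for s in scores) / len(scores)
print(steps)
print(f"Success rate: {success_rate:.0%}")
```

Only two of the ten evaluation seeds are solved, which is what pulls the average down to 0.136 even though the best episode scores 0.733.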

### Discussion
My thoughts are largely the same as for the previous problem. However, I now realize I probably made a mistake by using the FlatObsWrapper, which encodes the mission string as part of the observation. In our use case, the mission text does not matter because there is only ever one box of interest. Similarly, the color information does not matter. Thus, I believe I could get better performance with the ImgObsWrapper.
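To make the size argument concrete, the sketch below estimates how much of a flattened observation is taken up by the mission text. It assumes a 7x7x3 agent view and a mission encoding of 96 character slots with 28 one-hot codes each; those numbers and the helper name `flat_obs_breakdown` are assumptions and may differ in the installed Minigrid version.

```python
def flat_obs_breakdown(view_size=7, channels=3, max_str_len=96, num_char_codes=28):
    """Return (image_dims, mission_dims, mission_fraction) for a flat observation."""
    image_dims = view_size * view_size * channels   # flattened agent view
    mission_dims = max_str_len * num_char_codes     # one-hot encoded mission string
    total = image_dims + mission_dims
    return image_dims, mission_dims, mission_dims / total

image_dims, mission_dims, fraction = flat_obs_breakdown()
print(f"image: {image_dims}, mission: {mission_dims}, mission share: {fraction:.1%}")
```

If these defaults hold, the mission encoding accounts for the large majority of the input dimensions, so an image-only observation would give the policy a much smaller input to learn from.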

## Problem 4
This is a bonus problem and optional for all.

In this problem, the goal is to learn a single unified agent that solves all three tasks shown above.

In [None]:
######## PUT YOUR CODE HERE ########
# Train an agent to solve the task

######## PUT YOUR CODE HERE ########

In [None]:
def unified_policy(observation):
  ######## PUT YOUR CODE HERE ########
  action = first_task.action_space.sample()
  ######## PUT YOUR CODE HERE ########
  return action

In [None]:
performance_on_first_task = compute_score(task=first_task, policy=unified_policy)

In [None]:
performance_on_second_task = compute_score(task=second_task, policy=unified_policy)

In [None]:
performance_on_third_task = compute_score(task=third_task, policy=unified_policy)

In [None]:
total_performance = np.concatenate((performance_on_first_task, performance_on_second_task, performance_on_third_task), axis=None)

In [None]:
print(f"Average score: {round(total_performance.mean(),3)}")
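A per-task breakdown can be useful next to the pooled average, since a single number can hide a task that the agent fails completely. The sketch below uses made-up placeholder scores (not results from this notebook) and a hypothetical helper `summarize`. Because every task contributes the same number of episodes, the pooled mean equals the mean of the per-task means.

```python
from statistics import fmean

def summarize(scores_by_task):
    """Return per-task means and the pooled mean over all episodes."""
    per_task = {name: round(fmean(s), 3) for name, s in scores_by_task.items()}
    pooled = [x for s in scores_by_task.values() for x in s]
    return per_task, round(fmean(pooled), 3)

# Placeholder scores for illustration only.
example = {
    "first_task": [1.0, 0.5],
    "second_task": [0.5, 0.0],
    "third_task": [0.0, 0.0],
}
per_task, pooled_mean = summarize(example)
print(per_task)
print(f"Average score: {pooled_mean}")
```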