# Practical Activity II - Training and Validating RL Models

**Student:** Recigio Poffo

**Course:** Reinforcement Learning - Class I

**Date:** 03/07/2021



In this assignment we will use `Gym`, `Stable-Baselines3`, and `RL Baselines Zoo` to train and validate reinforcement learning models. Your task is to:

1. Select a scenario of your choice from the `Gym` library, provided that it is also covered by the models available in the `RL Baselines Zoo`;
2. Select three algorithms from the `Stable-Baselines3` library to solve this problem. Look up in the library's documentation which algorithms are best suited to the chosen environment and justify your choice.
3. Train each of the three models (you may adjust the model parameters if you find it necessary) and save the models to disk.
4. With the trained models saved, load them and evaluate each one for 10 episodes. Report the mean results and generate the cumulative reward curve provided by `TensorBoard`.
5. Compare the results of your trained models with those obtained by the existing model(s) in the `RL Baselines Zoo` for the chosen scenario.
6. Generate a video of the best model you trained and of the model chosen from the `RL Baselines Zoo`. Check each library's documentation on creating videos and displaying them in notebooks.



* **Due date:** 16/07/2021
* **Submission:** AVA.
* **Document type:** Notebook (`.ipynb`).



### Testing the base environment

In [None]:
# import the gym library
import gym
import time


scenario = 'BeamRiderNoFrameskip-v4'
env = gym.make(scenario)

observation = env.reset()
# number of timesteps to run
for _ in range(100):
    # draw the environment
    env.render()
    # take a random action
    action = env.action_space.sample()
    # step the environment with that action and observe the result
    observation, reward, done, info = env.step(action)
    # add a short delay so the rendering is easier to follow
    time.sleep(0.03)
    # check whether the episode has ended
    if done:
        # restart the simulation
        observation = env.reset()
# close the rendering window
env.close()
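The loop above discards the `reward` returned by `env.step`. As a minimal sketch of how per-episode returns could be accumulated with the same `reset`/`step` pattern, the example below uses `ToyEnv`, a hypothetical stand-in that only mimics the 4-tuple `step` interface used in this notebook. It is not part of `Gym`, and its reward rule is made up for illustration.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (4-tuple step interface)."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = float(action)  # made-up rule: the reward equals the action
        done = self.t >= self.episode_length
        return self.t, reward, done, {}

    def sample_action(self):
        return self.rng.choice([0, 1])


def run_random_agent(env, n_steps):
    """Run a random policy and return the returns of all completed episodes."""
    episode_returns = []
    current_return = 0.0
    env.reset()
    for _ in range(n_steps):
        _, reward, done, _ = env.step(env.sample_action())
        current_return += reward
        if done:
            # store the finished episode's return and start a new episode
            episode_returns.append(current_return)
            current_return = 0.0
            env.reset()
    return episode_returns


returns = run_random_agent(ToyEnv(episode_length=5), n_steps=20)
print(len(returns), returns)
```

With episodes of 5 steps and 20 steps in total, four episodes complete. The same bookkeeping could be added to the `BeamRiderNoFrameskip-v4` loop by summing `reward` between resets.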

In [20]:
from IPython.display import Image

### Choosing the models

#### The chosen environment is BeamRiderNoFrameskip-v4

#### The chosen models are A2C, PPO, and DQN. They were chosen because, according to the documentation at https://stable-baselines3.readthedocs.io/en/master/, they work in reasonably different ways.


#### A2C: https://openai.com/blog/baselines-acktr-a2c/
#### PPO: https://openai.com/blog/openai-baselines-ppo/
#### DQN: https://www.nature.com/articles/nature14236

#### A2C is based on A3C: it is a synchronous variant that uses multiple workers (the natural-gradient method described in the same OpenAI post is ACKTR, not A2C). PPO is a policy gradient algorithm that tries to address the main difficulties of that family: training can be slow, and an ideal tuning of the update step size is hard to find. Finally, DQN builds on Q-Learning, more specifically on the "Neural Fitted Q Iteration" variant.
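To make the difference between two of these families more concrete, here is a minimal sketch in plain Python of the per-sample clipped surrogate term used by PPO and of the one-step target used by Q-learning. The function names are illustrative and are not part of `Stable-Baselines3`. The default `clip_range=0.2` matches the value shown in the PPO training log in this notebook. The `gamma` value is an assumption.

```python
def ppo_clipped_term(ratio, advantage, clip_range=0.2):
    """Clipped surrogate term for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); the term is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q_values)


# Positive advantage: the gain from a large policy change is capped.
print(ppo_clipped_term(ratio=1.5, advantage=2.0))
# Negative advantage: the more pessimistic (unclipped) value is kept.
print(ppo_clipped_term(ratio=1.5, advantage=-2.0))
# Value-based target built from the best next-state action value.
print(q_learning_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9))
```

PPO and A2C optimize a policy directly from fresh rollouts, while DQN regresses action values toward targets like the one above using stored transitions, which is one reason the three can behave differently on the same environment.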

In [None]:
#tensorboard --logdir /tmp/stable-baselines/

### Initial Considerations

In [None]:
#For the model to write evaluation data to TensorBoard, the only way I found was to create two environments,
#one for training and one for evaluation. I also had to set the evaluation frequency and the number of episodes.
#I used only 1 evaluation episode because I believe the simulator was getting "stuck", as I explain further below, and
#it took too long to resume training. Training was initially run for 2e6 timesteps, the same number used for the comparison with the Zoo.
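As a rough check of how many evaluations these settings produce, the sketch below lists the timesteps at which a periodic evaluation would fire for `total_timesteps=int(2e6)` and `eval_freq=200000`, the values used in the training cells below. It assumes a single training environment, so that one evaluation check corresponds to one timestep. The helper name is illustrative.

```python
def evaluation_timesteps(total_timesteps, eval_freq):
    """Timesteps at which a periodic evaluation fires (one check per timestep)."""
    return list(range(eval_freq, total_timesteps + 1, eval_freq))


schedule = evaluation_timesteps(total_timesteps=int(2e6), eval_freq=200000)
n_eval_episodes = 1  # value used in this notebook

print(len(schedule), schedule[0], schedule[-1])
print("total evaluation episodes:", len(schedule) * n_eval_episodes)
```

Under that assumption the run is interrupted for evaluation ten times, with one episode each, so the evaluation curve in TensorBoard has ten points per model.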

### A2C

In [1]:
import gym
import time
from stable_baselines3 import A2C

# create the training and evaluation environments
env_a2c = gym.make('BeamRiderNoFrameskip-v4')
eval_env_a2c = gym.make('BeamRiderNoFrameskip-v4')

# instantiate the learning algorithm
# (note: 'MlpPolicy' flattens the image observations; 'CnnPolicy' is generally the recommended choice for image inputs)
model = A2C('MlpPolicy', env_a2c, verbose=1, create_eval_env=True,
            tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon_a2c")

# train the algorithm for 2e6 timesteps, evaluating 1 episode every 200000 steps
model.learn(total_timesteps=int(2e6), eval_env=eval_env_a2c, eval_freq=200000, n_eval_episodes=1)


#### Visualizing the first result

In [None]:
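import time

for _ in range(2):
    obs = env_a2c.reset()
    for i in range(1000):
        # pick the action from the trained policy without sampling
        action, _states = model.predict(obs, deterministic=True)
        obs, rewards, done, info = env_a2c.step(action)
        time.sleep(0.003)
        env_a2c.render()
        # start a new episode if this one ended before 1000 steps
        if done:
            obs = env_a2c.reset()

env_a2c.close()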



In [6]:
env_a2c.close()

#### Saving, loading, and evaluating the model

In [2]:
# Save the agent
model.save("modelos/a2c/recigio_beamrider2M")

In [3]:
# Delete the agent/model
del model  

In [2]:
# Load a previously trained model
model = A2C.load("modelos/a2c/recigio_beamrider2M", env=env_a2c)

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


In [None]:
# import the evaluation helper and evaluate for 10 episodes
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, render=True)

In [13]:
print(mean_reward)
print(std_reward)

660.0
0.0
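A standard deviation of exactly `0.0` means that all 10 evaluation episodes produced the same return. One possible explanation, which this notebook does not verify, is that a deterministic policy on an emulator with no randomness replays the same episode every time. The sketch below shows how the two reported numbers relate to a list of episode returns, using the population standard deviation. The `varied` list is hypothetical and only there for contrast.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of the episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


# Ten identical episodes, consistent with the result reported above.
identical = [660.0] * 10
print(summarize_returns(identical))

# Hypothetical returns, for contrast only: any spread gives a non-zero std.
varied = [600.0, 660.0, 720.0]
print(summarize_returns(varied))
```

If the zero spread does come from determinism, the 10 episodes carry the information of a single one, which is worth keeping in mind when comparing this mean against other models.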


### PPO

In [5]:
import gym
import time
from stable_baselines3 import PPO

# create the training and evaluation environments
env_ppo = gym.make('BeamRiderNoFrameskip-v4')
eval_env_ppo = gym.make('BeamRiderNoFrameskip-v4')

# instantiate the learning algorithm
model2 = PPO('MlpPolicy', env_ppo, verbose=1, create_eval_env=True,
             tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon_ppo")

# train the algorithm for 2e6 timesteps, evaluating 1 episode every 200000 steps
model2.learn(total_timesteps=int(2e6), eval_env=eval_env_ppo, eval_freq=200000, n_eval_episodes=1)

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to /tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon_ppo\PPO_1
-----------------------------
| time/              |      |
|    fps             | 397  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 191          |
|    iterations           | 2            |
|    time_elapsed         | 21           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0107437195 |
|    clip_fraction        | 0.0756       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.1

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 5.55e+03    |
|    ep_rew_mean          | 381         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 11          |
|    time_elapsed         | 167         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.010906251 |
|    clip_fraction        | 0.109       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.1        |
|    explained_variance   | -0.0306     |
|    learning_rate        | 0.0003      |
|    loss                 | 29.4        |
|    n_updates            | 100         |
|    policy_gradient_loss | -0.018      |
|    value_loss           | 46          |
-----------------------------------------
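The counters in the table above are related to each other. The sketch below parses that table (iteration 11) into a dictionary and checks two relationships, assuming the PPO defaults of 2048 steps per rollout and 10 optimization epochs per rollout. Those two numbers are assumptions about the configuration, although they agree with the logged values. The parser is a small illustrative helper, not a `Stable-Baselines3` function.

```python
LOG_TABLE = """\
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 5.55e+03    |
|    ep_rew_mean          | 381         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 11          |
|    time_elapsed         | 167         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    n_updates            | 100         |
-----------------------------------------
"""


def parse_log_table(text):
    """Parse '| key | value |' rows into a dict, skipping rules and section rows."""
    values = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[1]:
            continue  # horizontal rule or section header such as 'rollout/'
        try:
            values[cells[0]] = float(cells[1])
        except ValueError:
            pass  # ignore non-numeric cells
    return values


N_STEPS = 2048  # assumed rollout length per iteration
N_EPOCHS = 10   # assumed optimization epochs per rollout

stats = parse_log_table(LOG_TABLE)
print(stats["total_timesteps"] == stats["iterations"] * N_STEPS)
print(stats["n_updates"] == (stats["iterations"] - 1) * N_EPOCHS)

# Rough projection: 2e6 timesteps at about 128 steps per second, in hours.
print(2_000_000 / 128 / 3600)
```

At the roughly 128 fps seen in the later tables, the full 2e6-timestep run takes a little over four hours, not counting the pauses for evaluation.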
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.03e+

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.14e+03    |
|    ep_rew_mean          | 455         |
| time/                   |             |
|    fps                  | 127         |
|    iterations           | 21          |
|    time_elapsed         | 338         |
|    total_timesteps      | 43008       |
| train/                  |             |
|    approx_kl            | 0.014205655 |
|    clip_fraction        | 0.151       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.06       |
|    explained_variance   | -0.0521     |
|    learning_rate        | 0.0003      |
|    loss                 | 20          |
|    n_updates            | 200         |
|    policy_gradient_loss | -0.0163     |
|    value_loss           | 22.4        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.23e+

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.27e+03     |
|    ep_rew_mean          | 427          |
| time/                   |              |
|    fps                  | 126          |
|    iterations           | 31           |
|    time_elapsed         | 499          |
|    total_timesteps      | 63488        |
| train/                  |              |
|    approx_kl            | 0.0108986115 |
|    clip_fraction        | 0.086        |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.95        |
|    explained_variance   | -0.0229      |
|    learning_rate        | 0.0003       |
|    loss                 | 18.3         |
|    n_updates            | 300          |
|    policy_gradient_loss | -0.00847     |
|    value_loss           | 36.1         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.59e+03    |
|    ep_rew_mean          | 445         |
| time/                   |             |
|    fps                  | 126         |
|    iterations           | 41          |
|    time_elapsed         | 663         |
|    total_timesteps      | 83968       |
| train/                  |             |
|    approx_kl            | 0.008661712 |
|    clip_fraction        | 0.0611      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.78       |
|    explained_variance   | -0.0284     |
|    learning_rate        | 0.0003      |
|    loss                 | 10          |
|    n_updates            | 400         |
|    policy_gradient_loss | -0.00969    |
|    value_loss           | 15.7        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 6.49e+03

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.01e+03    |
|    ep_rew_mean          | 461         |
| time/                   |             |
|    fps                  | 127         |
|    iterations           | 51          |
|    time_elapsed         | 820         |
|    total_timesteps      | 104448      |
| train/                  |             |
|    approx_kl            | 0.008543698 |
|    clip_fraction        | 0.0715      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.64       |
|    explained_variance   | -0.0525     |
|    learning_rate        | 0.0003      |
|    loss                 | 20.4        |
|    n_updates            | 500         |
|    policy_gradient_loss | -0.00907    |
|    value_loss           | 30.1        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.99e+

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.27e+03    |
|    ep_rew_mean          | 476         |
| time/                   |             |
|    fps                  | 127         |
|    iterations           | 61          |
|    time_elapsed         | 976         |
|    total_timesteps      | 124928      |
| train/                  |             |
|    approx_kl            | 0.018960942 |
|    clip_fraction        | 0.158       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.19       |
|    explained_variance   | -0.632      |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0378     |
|    n_updates            | 600         |
|    policy_gradient_loss | -0.0102     |
|    value_loss           | 0.0295      |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.27e+

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.52e+03    |
|    ep_rew_mean          | 488         |
| time/                   |             |
|    fps                  | 128         |
|    iterations           | 71          |
|    time_elapsed         | 1131        |
|    total_timesteps      | 145408      |
| train/                  |             |
|    approx_kl            | 0.011069181 |
|    clip_fraction        | 0.0805      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.1        |
|    explained_variance   | 0.00441     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.747       |
|    n_updates            | 700         |
|    policy_gradient_loss | -0.00661    |
|    value_loss           | 22.9        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 7.52

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.79e+03    |
|    ep_rew_mean          | 497         |
| time/                   |             |
|    fps                  | 128         |
|    iterations           | 81          |
|    time_elapsed         | 1286        |
|    total_timesteps      | 165888      |
| train/                  |             |
|    approx_kl            | 0.004459809 |
|    clip_fraction        | 0.0732      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.875      |
|    explained_variance   | 0.0141      |
|    learning_rate        | 0.0003      |
|    loss                 | 24.5        |
|    n_updates            | 800         |
|    policy_gradient_loss | -0.00697    |
|    value_loss           | 15.4        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.79e+

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 7.89e+03     |
|    ep_rew_mean          | 493          |
| time/                   |              |
|    fps                  | 129          |
|    iterations           | 91           |
|    time_elapsed         | 1443         |
|    total_timesteps      | 186368       |
| train/                  |              |
|    approx_kl            | 0.0076142056 |
|    clip_fraction        | 0.0893       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.8         |
|    explained_variance   | -0.000162    |
|    learning_rate        | 0.0003       |
|    loss                 | 2.35         |
|    n_updates            | 900          |
|    policy_gradient_loss | -0.00838     |
|    value_loss           | 18.1         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.83e+03    |
|    ep_rew_mean          | 493         |
| time/                   |             |
|    fps                  | 127         |
|    iterations           | 100         |
|    time_elapsed         | 1610        |
|    total_timesteps      | 204800      |
| train/                  |             |
|    approx_kl            | 0.012354061 |
|    clip_fraction        | 0.154       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.837      |
|    explained_variance   | -0.051      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0305      |
|    n_updates            | 990         |
|    policy_gradient_loss | -0.00898    |
|    value_loss           | 0.00975     |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.2e

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.35e+03     |
|    ep_rew_mean          | 507          |
| time/                   |              |
|    fps                  | 127          |
|    iterations           | 110          |
|    time_elapsed         | 1767         |
|    total_timesteps      | 225280       |
| train/                  |              |
|    approx_kl            | 0.0027320292 |
|    clip_fraction        | 0.0193       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.759       |
|    explained_variance   | 0.00903      |
|    learning_rate        | 0.0003       |
|    loss                 | 17.5         |
|    n_updates            | 1090         |
|    policy_gradient_loss | -0.00551     |
|    value_loss           | 43.8         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.36e+03     |
|    ep_rew_mean          | 511          |
| time/                   |              |
|    fps                  | 127          |
|    iterations           | 120          |
|    time_elapsed         | 1925         |
|    total_timesteps      | 245760       |
| train/                  |              |
|    approx_kl            | 0.0040304763 |
|    clip_fraction        | 0.0504       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.609       |
|    explained_variance   | 0.0295       |
|    learning_rate        | 0.0003       |
|    loss                 | 1.11         |
|    n_updates            | 1190         |
|    policy_gradient_loss | -0.00717     |
|    value_loss           | 28.3         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 8.36e+03   |
|    ep_rew_mean          | 509        |
| time/                   |            |
|    fps                  | 127        |
|    iterations           | 130        |
|    time_elapsed         | 2081       |
|    total_timesteps      | 266240     |
| train/                  |            |
|    approx_kl            | 0.00426084 |
|    clip_fraction        | 0.0493     |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.617     |
|    explained_variance   | 0.002      |
|    learning_rate        | 0.0003     |
|    loss                 | 6.28       |
|    n_updates            | 1290       |
|    policy_gradient_loss | -0.00395   |
|    value_loss           | 15.6       |
----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.36e+03     |
|    ep_re

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.62e+03    |
|    ep_rew_mean          | 517         |
| time/                   |             |
|    fps                  | 128         |
|    iterations           | 140         |
|    time_elapsed         | 2237        |
|    total_timesteps      | 286720      |
| train/                  |             |
|    approx_kl            | 0.024368979 |
|    clip_fraction        | 0.0897      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.599      |
|    explained_variance   | -0.167      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0294      |
|    n_updates            | 1390        |
|    policy_gradient_loss | -0.00841    |
|    value_loss           | 0.0643      |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.62e+

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.34e+03     |
|    ep_rew_mean          | 496          |
| time/                   |              |
|    fps                  | 128          |
|    iterations           | 150          |
|    time_elapsed         | 2394         |
|    total_timesteps      | 307200       |
| train/                  |              |
|    approx_kl            | 0.0043944833 |
|    clip_fraction        | 0.0315       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.638       |
|    explained_variance   | 0.00868      |
|    learning_rate        | 0.0003       |
|    loss                 | 27.7         |
|    n_updates            | 1490         |
|    policy_gradient_loss | -0.00241     |
|    value_loss           | 30.4         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.5e+03     |
|    ep_rew_mean          | 502         |
| time/                   |             |
|    fps                  | 128         |
|    iterations           | 160         |
|    time_elapsed         | 2551        |
|    total_timesteps      | 327680      |
| train/                  |             |
|    approx_kl            | 0.004569036 |
|    clip_fraction        | 0.0705      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.602      |
|    explained_variance   | 0.00771     |
|    learning_rate        | 0.0003      |
|    loss                 | 2.57        |
|    n_updates            | 1590        |
|    policy_gradient_loss | -0.00508    |
|    value_loss           | 16.2        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.5e

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.66e+03     |
|    ep_rew_mean          | 509          |
| time/                   |              |
|    fps                  | 128          |
|    iterations           | 170          |
|    time_elapsed         | 2707         |
|    total_timesteps      | 348160       |
| train/                  |              |
|    approx_kl            | 0.0035258143 |
|    clip_fraction        | 0.031        |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.437       |
|    explained_variance   | 0.00349      |
|    learning_rate        | 0.0003       |
|    loss                 | 31           |
|    n_updates            | 1690         |
|    policy_gradient_loss | -0.00276     |
|    value_loss           | 36.9         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.85e+03    |
|    ep_rew_mean          | 517         |
| time/                   |             |
|    fps                  | 128         |
|    iterations           | 180         |
|    time_elapsed         | 2864        |
|    total_timesteps      | 368640      |
| train/                  |             |
|    approx_kl            | 0.015452104 |
|    clip_fraction        | 0.129       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.411      |
|    explained_variance   | -4.23       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0595      |
|    n_updates            | 1790        |
|    policy_gradient_loss | -0.025      |
|    value_loss           | 0.0657      |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.85e+

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.99e+03     |
|    ep_rew_mean          | 524          |
| time/                   |              |
|    fps                  | 128          |
|    iterations           | 190          |
|    time_elapsed         | 3020         |
|    total_timesteps      | 389120       |
| train/                  |              |
|    approx_kl            | 0.0033507938 |
|    clip_fraction        | 0.0548       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.355       |
|    explained_variance   | 0.00447      |
|    learning_rate        | 0.0003       |
|    loss                 | 6.33         |
|    n_updates            | 1890         |
|    policy_gradient_loss | -0.00703     |
|    value_loss           | 15.9         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.05e+03     |
|    ep_rew_mean          | 527          |
| time/                   |              |
|    fps                  | 128          |
|    iterations           | 199          |
|    time_elapsed         | 3182         |
|    total_timesteps      | 407552       |
| train/                  |              |
|    approx_kl            | 0.0023205704 |
|    clip_fraction        | 0.0354       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.395       |
|    explained_variance   | 0.013        |
|    learning_rate        | 0.0003       |
|    loss                 | 6.72         |
|    n_updates            | 1980         |
|    policy_gradient_loss | -0.00373     |
|    value_loss           | 29.6         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.12e+03     |
|    ep_rew_mean          | 529          |
| time/                   |              |
|    fps                  | 128          |
|    iterations           | 209          |
|    time_elapsed         | 3329         |
|    total_timesteps      | 428032       |
| train/                  |              |
|    approx_kl            | 0.0074239075 |
|    clip_fraction        | 0.0385       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.365       |
|    explained_variance   | 0.000427     |
|    learning_rate        | 0.0003       |
|    loss                 | 5.11         |
|    n_updates            | 2080         |
|    policy_gradient_loss | -0.00345     |
|    value_loss           | 30.1         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.3e+03      |
|    ep_rew_mean          | 532          |
| time/                   |              |
|    fps                  | 129          |
|    iterations           | 219          |
|    time_elapsed         | 3475         |
|    total_timesteps      | 448512       |
| train/                  |              |
|    approx_kl            | 0.0055661555 |
|    clip_fraction        | 0.0498       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.309       |
|    explained_variance   | -0.051       |
|    learning_rate        | 0.0003       |
|    loss                 | 2.93         |
|    n_updates            | 2180         |
|    policy_gradient_loss | -0.0034      |
|    value_loss           | 7.92         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_m

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.53e+03    |
|    ep_rew_mean          | 533         |
| time/                   |             |
|    fps                  | 129         |
|    iterations           | 229         |
|    time_elapsed         | 3621        |
|    total_timesteps      | 468992      |
| train/                  |             |
|    approx_kl            | 0.004654942 |
|    clip_fraction        | 0.0578      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.508      |
|    explained_variance   | -0.126      |
|    learning_rate        | 0.0003      |
|    loss                 | -0.000763   |
|    n_updates            | 2280        |
|    policy_gradient_loss | -0.0015     |
|    value_loss           | 0.0264      |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.53e+

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.46e+03    |
|    ep_rew_mean          | 528         |
| time/                   |             |
|    fps                  | 129         |
|    iterations           | 239         |
|    time_elapsed         | 3768        |
|    total_timesteps      | 489472      |
| train/                  |             |
|    approx_kl            | 0.004341785 |
|    clip_fraction        | 0.043       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.393      |
|    explained_variance   | 0.0132      |
|    learning_rate        | 0.0003      |
|    loss                 | 22.7        |
|    n_updates            | 2380        |
|    policy_gradient_loss | -0.00373    |
|    value_loss           | 22.8        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.42

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 9.37e+03   |
|    ep_rew_mean          | 526        |
| time/                   |            |
|    fps                  | 130        |
|    iterations           | 249        |
|    time_elapsed         | 3914       |
|    total_timesteps      | 509952     |
| train/                  |            |
|    approx_kl            | 0.01234855 |
|    clip_fraction        | 0.132      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.512     |
|    explained_variance   | -0.712     |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0122    |
|    n_updates            | 2480       |
|    policy_gradient_loss | -0.0175    |
|    value_loss           | 0.0249     |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.37e+03    |
|    ep_rew_m

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.5e+03     |
|    ep_rew_mean          | 527         |
| time/                   |             |
|    fps                  | 130         |
|    iterations           | 259         |
|    time_elapsed         | 4060        |
|    total_timesteps      | 530432      |
| train/                  |             |
|    approx_kl            | 0.004878914 |
|    clip_fraction        | 0.0578      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.47       |
|    explained_variance   | 0.000275    |
|    learning_rate        | 0.0003      |
|    loss                 | 2.07        |
|    n_updates            | 2580        |
|    policy_gradient_loss | -0.004      |
|    value_loss           | 15.4        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.62e+03     |
|    ep_rew_mean          | 534          |
| time/                   |              |
|    fps                  | 130          |
|    iterations           | 269          |
|    time_elapsed         | 4206         |
|    total_timesteps      | 550912       |
| train/                  |              |
|    approx_kl            | 0.0035057673 |
|    clip_fraction        | 0.0521       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.313       |
|    explained_variance   | 0.00508      |
|    learning_rate        | 0.0003       |
|    loss                 | 19.8         |
|    n_updates            | 2680         |
|    policy_gradient_loss | -0.00388     |
|    value_loss           | 26           |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.65e+03     |
|    ep_rew_mean          | 536          |
| time/                   |              |
|    fps                  | 131          |
|    iterations           | 279          |
|    time_elapsed         | 4351         |
|    total_timesteps      | 571392       |
| train/                  |              |
|    approx_kl            | 0.0051052365 |
|    clip_fraction        | 0.0681       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.263       |
|    explained_variance   | -0.0219      |
|    learning_rate        | 0.0003       |
|    loss                 | 0.237        |
|    n_updates            | 2780         |
|    policy_gradient_loss | -0.00443     |
|    value_loss           | 7.99         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.72e+03     |
|    ep_rew_mean          | 540          |
| time/                   |              |
|    fps                  | 131          |
|    iterations           | 289          |
|    time_elapsed         | 4497         |
|    total_timesteps      | 591872       |
| train/                  |              |
|    approx_kl            | 0.0031449348 |
|    clip_fraction        | 0.0379       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.404       |
|    explained_variance   | 0.00604      |
|    learning_rate        | 0.0003       |
|    loss                 | 4.8          |
|    n_updates            | 2880         |
|    policy_gradient_loss | -0.00505     |
|    value_loss           | 30.1         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.7e+03      |
|    ep_rew_mean          | 538          |
| time/                   |              |
|    fps                  | 131          |
|    iterations           | 298          |
|    time_elapsed         | 4653         |
|    total_timesteps      | 610304       |
| train/                  |              |
|    approx_kl            | 0.0045033987 |
|    clip_fraction        | 0.0433       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.415       |
|    explained_variance   | 0.0054       |
|    learning_rate        | 0.0003       |
|    loss                 | 39.5         |
|    n_updates            | 2970         |
|    policy_gradient_loss | -0.0054      |
|    value_loss           | 36           |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.76e+03    |
|    ep_rew_mean          | 542         |
| time/                   |             |
|    fps                  | 131         |
|    iterations           | 308         |
|    time_elapsed         | 4799        |
|    total_timesteps      | 630784      |
| train/                  |             |
|    approx_kl            | 0.013395247 |
|    clip_fraction        | 0.105       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.439      |
|    explained_variance   | -5.11       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0212     |
|    n_updates            | 3070        |
|    policy_gradient_loss | -0.019      |
|    value_loss           | 0.0806      |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.82e+03     |
|    ep_rew_mean          | 546          |
| time/                   |              |
|    fps                  | 131          |
|    iterations           | 318          |
|    time_elapsed         | 4945         |
|    total_timesteps      | 651264       |
| train/                  |              |
|    approx_kl            | 0.0038857535 |
|    clip_fraction        | 0.0317       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.338       |
|    explained_variance   | 0.00793      |
|    learning_rate        | 0.0003       |
|    loss                 | 17.1         |
|    n_updates            | 3170         |
|    policy_gradient_loss | -0.00567     |
|    value_loss           | 30.4         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.85e+03    |
|    ep_rew_mean          | 547         |
| time/                   |             |
|    fps                  | 131         |
|    iterations           | 328         |
|    time_elapsed         | 5091        |
|    total_timesteps      | 671744      |
| train/                  |             |
|    approx_kl            | 0.002024447 |
|    clip_fraction        | 0.0157      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.364      |
|    explained_variance   | 0.00476     |
|    learning_rate        | 0.0003      |
|    loss                 | 3.88        |
|    n_updates            | 3270        |
|    policy_gradient_loss | -0.00233    |
|    value_loss           | 22.6        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.91e+03    |
|    ep_rew_mean          | 551         |
| time/                   |             |
|    fps                  | 132         |
|    iterations           | 338         |
|    time_elapsed         | 5236        |
|    total_timesteps      | 692224      |
| train/                  |             |
|    approx_kl            | 0.003144648 |
|    clip_fraction        | 0.0466      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.321      |
|    explained_variance   | 0.00707     |
|    learning_rate        | 0.0003      |
|    loss                 | 26          |
|    n_updates            | 3370        |
|    policy_gradient_loss | -0.00682    |
|    value_loss           | 29.7        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.96e+03    |
|    ep_rew_mean          | 554         |
| time/                   |             |
|    fps                  | 132         |
|    iterations           | 348         |
|    time_elapsed         | 5382        |
|    total_timesteps      | 712704      |
| train/                  |             |
|    approx_kl            | 0.060702983 |
|    clip_fraction        | 0.178       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.442      |
|    explained_variance   | -4.95       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0341     |
|    n_updates            | 3470        |
|    policy_gradient_loss | -0.038      |
|    value_loss           | 0.141       |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+04      |
|    ep_rew_mean          | 557        |
| time/                   |            |
|    fps                  | 132        |
|    iterations           | 358        |
|    time_elapsed         | 5529       |
|    total_timesteps      | 733184     |
| train/                  |            |
|    approx_kl            | 0.03460091 |
|    clip_fraction        | 0.114      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.376     |
|    explained_variance   | -5.18      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.00468   |
|    n_updates            | 3570       |
|    policy_gradient_loss | -0.0201    |
|    value_loss           | 0.135      |
----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1e+04        |
|    ep_rew_mean          | 557          |
| time/                   |              |
|    fps                  | 132          |
|    iterations           | 368          |
|    time_elapsed         | 5675         |
|    total_timesteps      | 753664       |
| train/                  |              |
|    approx_kl            | 0.0050832797 |
|    clip_fraction        | 0.0519       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.397       |
|    explained_variance   | 0.0183       |
|    learning_rate        | 0.0003       |
|    loss                 | 21.9         |
|    n_updates            | 3670         |
|    policy_gradient_loss | -0.0035      |
|    value_loss           | 43.2         |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.01e+04   |
|    ep_rew_mean          | 561        |
| time/                   |            |
|    fps                  | 132        |
|    iterations           | 378        |
|    time_elapsed         | 5821       |
|    total_timesteps      | 774144     |
| train/                  |            |
|    approx_kl            | 0.05011772 |
|    clip_fraction        | 0.173      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.438     |
|    explained_variance   | -6.23      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0676    |
|    n_updates            | 3770       |
|    policy_gradient_loss | -0.0231    |
|    value_loss           | 0.069      |
----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.02e+04     |
|    ep_rew_mean          | 564          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 388          |
|    time_elapsed         | 5967         |
|    total_timesteps      | 794624       |
| train/                  |              |
|    approx_kl            | 0.0045064883 |
|    clip_fraction        | 0.0616       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.41        |
|    explained_variance   | 0.00587      |
|    learning_rate        | 0.0003       |
|    loss                 | 0.245        |
|    n_updates            | 3870         |
|    policy_gradient_loss | -0.00692     |
|    value_loss           | 15.7         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.02e+04     |
|    ep_rew_mean          | 565          |
| time/                   |              |
|    fps                  | 132          |
|    iterations           | 397          |
|    time_elapsed         | 6123         |
|    total_timesteps      | 813056       |
| train/                  |              |
|    approx_kl            | 0.0037936314 |
|    clip_fraction        | 0.0405       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.396       |
|    explained_variance   | 0.0186       |
|    learning_rate        | 0.0003       |
|    loss                 | 20.7         |
|    n_updates            | 3960         |
|    policy_gradient_loss | -0.00443     |
|    value_loss           | 36.2         |
------------------------------------------

---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1.02e+04  |
|    ep_rew_mean          | 568       |
| time/                   |           |
|    fps                  | 132       |
|    iterations           | 407       |
|    time_elapsed         | 6270      |
|    total_timesteps      | 833536    |
| train/                  |           |
|    approx_kl            | 0.0283567 |
|    clip_fraction        | 0.0876    |
|    clip_range           | 0.2       |
|    entropy_loss         | -0.47     |
|    explained_variance   | -4.59     |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0423   |
|    n_updates            | 4060      |
|    policy_gradient_loss | -0.0115   |
|    value_loss           | 0.0899    |
---------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.03e+04    |
|    ep_rew_mean          | 570         |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 417         |
|    time_elapsed         | 6416        |
|    total_timesteps      | 854016      |
| train/                  |             |
|    approx_kl            | 0.006292998 |
|    clip_fraction        | 0.034       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.382      |
|    explained_variance   | 0.00234     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.382       |
|    n_updates            | 4160        |
|    policy_gradient_loss | -0.00464    |
|    value_loss           | 8.14        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.03e+04    |
|    ep_rew_mean          | 569         |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 427         |
|    time_elapsed         | 6562        |
|    total_timesteps      | 874496      |
| train/                  |             |
|    approx_kl            | 0.013476977 |
|    clip_fraction        | 0.0872      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.454      |
|    explained_variance   | -2.05       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.00979     |
|    n_updates            | 4260        |
|    policy_gradient_loss | -0.00919    |
|    value_loss           | 0.155       |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.03e+04     |
|    ep_rew_mean          | 570          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 437          |
|    time_elapsed         | 6709         |
|    total_timesteps      | 894976       |
| train/                  |              |
|    approx_kl            | 0.0027187667 |
|    clip_fraction        | 0.0252       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.402       |
|    explained_variance   | 0.0135       |
|    learning_rate        | 0.0003       |
|    loss                 | 14.5         |
|    n_updates            | 4360         |
|    policy_gradient_loss | -0.00253     |
|    value_loss           | 30.1         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.03e+04     |
|    ep_rew_mean          | 572          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 447          |
|    time_elapsed         | 6855         |
|    total_timesteps      | 915456       |
| train/                  |              |
|    approx_kl            | 0.0041276608 |
|    clip_fraction        | 0.052        |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.452       |
|    explained_variance   | 0.00226      |
|    learning_rate        | 0.0003       |
|    loss                 | 1.03         |
|    n_updates            | 4460         |
|    policy_gradient_loss | -0.00561     |
|    value_loss           | 15.5         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.03e+04     |
|    ep_rew_mean          | 574          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 457          |
|    time_elapsed         | 7001         |
|    total_timesteps      | 935936       |
| train/                  |              |
|    approx_kl            | 0.0023687321 |
|    clip_fraction        | 0.0303       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.438       |
|    explained_variance   | 0.00907      |
|    learning_rate        | 0.0003       |
|    loss                 | 19.1         |
|    n_updates            | 4560         |
|    policy_gradient_loss | -0.00356     |
|    value_loss           | 16           |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.04e+04     |
|    ep_rew_mean          | 576          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 467          |
|    time_elapsed         | 7146         |
|    total_timesteps      | 956416       |
| train/                  |              |
|    approx_kl            | 0.0036290006 |
|    clip_fraction        | 0.057        |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.392       |
|    explained_variance   | 0.0078       |
|    learning_rate        | 0.0003       |
|    loss                 | 5.96         |
|    n_updates            | 4660         |
|    policy_gradient_loss | -0.00567     |
|    value_loss           | 15.5         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.04e+04    |
|    ep_rew_mean          | 577         |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 477         |
|    time_elapsed         | 7293        |
|    total_timesteps      | 976896      |
| train/                  |             |
|    approx_kl            | 0.003994997 |
|    clip_fraction        | 0.0418      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.448      |
|    explained_variance   | 0.0201      |
|    learning_rate        | 0.0003      |
|    loss                 | 13          |
|    n_updates            | 4760        |
|    policy_gradient_loss | -0.00315    |
|    value_loss           | 43.4        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.04e+04    |
|    ep_rew_mean          | 578         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 487         |
|    time_elapsed         | 7439        |
|    total_timesteps      | 997376      |
| train/                  |             |
|    approx_kl            | 0.005166124 |
|    clip_fraction        | 0.0741      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.408      |
|    explained_variance   | 0.00138     |
|    learning_rate        | 0.0003      |
|    loss                 | 2.75        |
|    n_updates            | 4860        |
|    policy_gradient_loss | -0.0056     |
|    value_loss           | 15.5        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.04e+04    |
|    ep_rew_mean          | 575         |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 496         |
|    time_elapsed         | 7596        |
|    total_timesteps      | 1015808     |
| train/                  |             |
|    approx_kl            | 0.012888439 |
|    clip_fraction        | 0.0885      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.627      |
|    explained_variance   | 0.0117      |
|    learning_rate        | 0.0003      |
|    loss                 | 3.32        |
|    n_updates            | 4950        |
|    policy_gradient_loss | -0.00572    |
|    value_loss           | 15.5        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.03e+04    |
|    ep_rew_mean          | 573         |
| time/                   |             |
|    fps                  | 133         |
|    iterations           | 506         |
|    time_elapsed         | 7743        |
|    total_timesteps      | 1036288     |
| train/                  |             |
|    approx_kl            | 0.022549178 |
|    clip_fraction        | 0.119       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.718      |
|    explained_variance   | -1.2        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.146       |
|    n_updates            | 5050        |
|    policy_gradient_loss | -0.0108     |
|    value_loss           | 0.131       |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.04e+04     |
|    ep_rew_mean          | 574          |
| time/                   |              |
|    fps                  | 133          |
|    iterations           | 516          |
|    time_elapsed         | 7889         |
|    total_timesteps      | 1056768      |
| train/                  |              |
|    approx_kl            | 0.0042110262 |
|    clip_fraction        | 0.0354       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.616       |
|    explained_variance   | 0.00602      |
|    learning_rate        | 0.0003       |
|    loss                 | 17           |
|    n_updates            | 5150         |
|    policy_gradient_loss | -0.00417     |
|    value_loss           | 36.6         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.05e+04     |
|    ep_rew_mean          | 574          |
| time/                   |              |
|    fps                  | 134          |
|    iterations           | 526          |
|    time_elapsed         | 8034         |
|    total_timesteps      | 1077248      |
| train/                  |              |
|    approx_kl            | 0.0038261518 |
|    clip_fraction        | 0.0333       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.61        |
|    explained_variance   | 0.153        |
|    learning_rate        | 0.0003       |
|    loss                 | 3.48         |
|    n_updates            | 5250         |
|    policy_gradient_loss | -0.00476     |
|    value_loss           | 31.8         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.06e+04    |
|    ep_rew_mean          | 578         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 536         |
|    time_elapsed         | 8180        |
|    total_timesteps      | 1097728     |
| train/                  |             |
|    approx_kl            | 0.019676175 |
|    clip_fraction        | 0.0773      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.643      |
|    explained_variance   | -0.86       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0483     |
|    n_updates            | 5350        |
|    policy_gradient_loss | -0.00613    |
|    value_loss           | 0.00756     |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.07e+04    |
|    ep_rew_mean          | 579         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 546         |
|    time_elapsed         | 8326        |
|    total_timesteps      | 1118208     |
| train/                  |             |
|    approx_kl            | 0.005397902 |
|    clip_fraction        | 0.0646      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.657      |
|    explained_variance   | -0.0312     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0241      |
|    n_updates            | 5450        |
|    policy_gradient_loss | -0.00452    |
|    value_loss           | 8.05        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.07e+04     |
|    ep_rew_mean          | 584          |
| time/                   |              |
|    fps                  | 134          |
|    iterations           | 556          |
|    time_elapsed         | 8472         |
|    total_timesteps      | 1138688      |
| train/                  |              |
|    approx_kl            | 0.0054244334 |
|    clip_fraction        | 0.0549       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.576       |
|    explained_variance   | 0.00156      |
|    learning_rate        | 0.0003       |
|    loss                 | 10.3         |
|    n_updates            | 5550         |
|    policy_gradient_loss | -0.00507     |
|    value_loss           | 15.6         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 585         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 566         |
|    time_elapsed         | 8617        |
|    total_timesteps      | 1159168     |
| train/                  |             |
|    approx_kl            | 0.005083957 |
|    clip_fraction        | 0.0523      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.478      |
|    explained_variance   | 0.0047      |
|    learning_rate        | 0.0003      |
|    loss                 | 13.2        |
|    n_updates            | 5650        |
|    policy_gradient_loss | -0.00416    |
|    value_loss           | 23.5        |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.08e+04   |
|    ep_rew_mean          | 587        |
| time/                   |            |
|    fps                  | 134        |
|    iterations           | 576        |
|    time_elapsed         | 8763       |
|    total_timesteps      | 1179648    |
| train/                  |            |
|    approx_kl            | 0.00603284 |
|    clip_fraction        | 0.045      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.466     |
|    explained_variance   | -0.0112    |
|    learning_rate        | 0.0003     |
|    loss                 | 14.4       |
|    n_updates            | 5750       |
|    policy_gradient_loss | -0.00342   |
|    value_loss           | 15.3       |
----------------------------------------

Eval num_timesteps=1200000, episode_reward=660.00 +/- 0.00
Episode length: 11795.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.18e+04    |
|    mean_reward          | 660         |
| time/                   |             |
|    total timesteps      | 1200000     |
| train/                  |             |
|    approx_kl            | 0.004476059 |
|    clip_fraction        | 0.0407      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.488      |
|    explained_variance   | 0.00744     |
|    learning_rate        | 0.0003      |
|    loss                 | 3.05        |
|    n_updates            | 5850        |
|    policy_gradient_loss | -0.00335    |
|    value_loss           | 8.12        |
-----------------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.09e+04 |
|    ep_rew_mean     | 587      |
| time/        

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 582         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 595         |
|    time_elapsed         | 9065        |
|    total_timesteps      | 1218560     |
| train/                  |             |
|    approx_kl            | 0.014919357 |
|    clip_fraction        | 0.122       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.521      |
|    explained_variance   | -5.87       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.057      |
|    n_updates            | 5940        |
|    policy_gradient_loss | -0.0174     |
|    value_loss           | 0.06        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 579         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 605         |
|    time_elapsed         | 9211        |
|    total_timesteps      | 1239040     |
| train/                  |             |
|    approx_kl            | 0.008207217 |
|    clip_fraction        | 0.0567      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.541      |
|    explained_variance   | 0.00499     |
|    learning_rate        | 0.0003      |
|    loss                 | 3.47        |
|    n_updates            | 6040        |
|    policy_gradient_loss | -0.00412    |
|    value_loss           | 15.6        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 578         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 615         |
|    time_elapsed         | 9357        |
|    total_timesteps      | 1259520     |
| train/                  |             |
|    approx_kl            | 0.009535714 |
|    clip_fraction        | 0.0802      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.479      |
|    explained_variance   | 0.00621     |
|    learning_rate        | 0.0003      |
|    loss                 | 2.12        |
|    n_updates            | 6140        |
|    policy_gradient_loss | -0.00297    |
|    value_loss           | 8.01        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.09e+04    |
|    ep_rew_mean          | 584         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 625         |
|    time_elapsed         | 9503        |
|    total_timesteps      | 1280000     |
| train/                  |             |
|    approx_kl            | 0.009343265 |
|    clip_fraction        | 0.0475      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.377      |
|    explained_variance   | 0.0324      |
|    learning_rate        | 0.0003      |
|    loss                 | 71.3        |
|    n_updates            | 6240        |
|    policy_gradient_loss | -0.00288    |
|    value_loss           | 36.2        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 583         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 635         |
|    time_elapsed         | 9649        |
|    total_timesteps      | 1300480     |
| train/                  |             |
|    approx_kl            | 0.029763844 |
|    clip_fraction        | 0.158       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.57       |
|    explained_variance   | -7.66       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.02       |
|    n_updates            | 6340        |
|    policy_gradient_loss | -0.0183     |
|    value_loss           | 0.0592      |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.09e+04     |
|    ep_rew_mean          | 584          |
| time/                   |              |
|    fps                  | 134          |
|    iterations           | 645          |
|    time_elapsed         | 9795         |
|    total_timesteps      | 1320960      |
| train/                  |              |
|    approx_kl            | 0.0066128387 |
|    clip_fraction        | 0.0846       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.706       |
|    explained_variance   | -5.95e-05    |
|    learning_rate        | 0.0003       |
|    loss                 | 8.93         |
|    n_updates            | 6440         |
|    policy_gradient_loss | -0.00738     |
|    value_loss           | 23.1         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.09e+04    |
|    ep_rew_mean          | 586         |
| time/                   |             |
|    fps                  | 134         |
|    iterations           | 655         |
|    time_elapsed         | 9941        |
|    total_timesteps      | 1341440     |
| train/                  |             |
|    approx_kl            | 0.009379855 |
|    clip_fraction        | 0.0985      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.666      |
|    explained_variance   | -1.8        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0149      |
|    n_updates            | 6540        |
|    policy_gradient_loss | -0.00511    |
|    value_loss           | 0.0607      |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.09e+04    |
|    ep_rew_mean          | 584         |
| time/                   |             |
|    fps                  | 135         |
|    iterations           | 665         |
|    time_elapsed         | 10087       |
|    total_timesteps      | 1361920     |
| train/                  |             |
|    approx_kl            | 0.006414871 |
|    clip_fraction        | 0.056       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.609      |
|    explained_variance   | -0.00955    |
|    learning_rate        | 0.0003      |
|    loss                 | 29.9        |
|    n_updates            | 6640        |
|    policy_gradient_loss | -0.00637    |
|    value_loss           | 23          |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 585         |
| time/                   |             |
|    fps                  | 135         |
|    iterations           | 675         |
|    time_elapsed         | 10233       |
|    total_timesteps      | 1382400     |
| train/                  |             |
|    approx_kl            | 0.005325585 |
|    clip_fraction        | 0.0566      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.55       |
|    explained_variance   | 0.000215    |
|    learning_rate        | 0.0003      |
|    loss                 | 5.59        |
|    n_updates            | 6740        |
|    policy_gradient_loss | -0.00271    |
|    value_loss           | 8.03        |
-----------------------------------------

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1.1e+04  |
|    ep_rew_mean     | 592      |
| time/              |          |
|    fps             | 118      |
|    iterations      | 684      |
|    time_elapsed    | 11791    |
|    total_timesteps | 1400832  |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.1e+04     |
|    ep_rew_mean          | 592         |
| time/                   |             |
|    fps                  | 118         |
|    iterations           | 685         |
|    time_elapsed         | 11806       |
|    total_timesteps      | 1402880     |
| train/                  |             |
|    approx_kl            | 0.061276168 |
|    clip_fraction        | 0.17        |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.509      |
|    explained_variance   | -2.11       |
|    learning_rate        | 0.

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.1e+04      |
|    ep_rew_mean          | 595          |
| time/                   |              |
|    fps                  | 119          |
|    iterations           | 694          |
|    time_elapsed         | 11937        |
|    total_timesteps      | 1421312      |
| train/                  |              |
|    approx_kl            | 0.0048327167 |
|    clip_fraction        | 0.0413       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.264       |
|    explained_variance   | 0.00599      |
|    learning_rate        | 0.0003       |
|    loss                 | 24.2         |
|    n_updates            | 6930         |
|    policy_gradient_loss | -0.00139     |
|    value_loss           | 29.8         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.1e+04     |
|    ep_rew_mean          | 594         |
| time/                   |             |
|    fps                  | 119         |
|    iterations           | 704         |
|    time_elapsed         | 12082       |
|    total_timesteps      | 1441792     |
| train/                  |             |
|    approx_kl            | 0.006858806 |
|    clip_fraction        | 0.0794      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.318      |
|    explained_variance   | 0.00136     |
|    learning_rate        | 0.0003      |
|    loss                 | 20.4        |
|    n_updates            | 7030        |
|    policy_gradient_loss | -0.000988   |
|    value_loss           | 38.3        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.09e+04     |
|    ep_rew_mean          | 590          |
| time/                   |              |
|    fps                  | 119          |
|    iterations           | 714          |
|    time_elapsed         | 12228        |
|    total_timesteps      | 1462272      |
| train/                  |              |
|    approx_kl            | 0.0074001187 |
|    clip_fraction        | 0.0377       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.194       |
|    explained_variance   | -0.0369      |
|    learning_rate        | 0.0003       |
|    loss                 | 11.8         |
|    n_updates            | 7130         |
|    policy_gradient_loss | -0.00301     |
|    value_loss           | 8.17         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.08e+04    |
|    ep_rew_mean          | 588         |
| time/                   |             |
|    fps                  | 119         |
|    iterations           | 724         |
|    time_elapsed         | 12374       |
|    total_timesteps      | 1482752     |
| train/                  |             |
|    approx_kl            | 0.006478217 |
|    clip_fraction        | 0.0313      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.371      |
|    explained_variance   | 0.0255      |
|    learning_rate        | 0.0003      |
|    loss                 | 43.8        |
|    n_updates            | 7230        |
|    policy_gradient_loss | -0.00307    |
|    value_loss           | 54.9        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.08e+04     |
|    ep_rew_mean          | 588          |
| time/                   |              |
|    fps                  | 120          |
|    iterations           | 734          |
|    time_elapsed         | 12519        |
|    total_timesteps      | 1503232      |
| train/                  |              |
|    approx_kl            | 0.0087810755 |
|    clip_fraction        | 0.0515       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.316       |
|    explained_variance   | -0.0105      |
|    learning_rate        | 0.0003       |
|    loss                 | 13.9         |
|    n_updates            | 7330         |
|    policy_gradient_loss | -0.00311     |
|    value_loss           | 15.7         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.06e+04     |
|    ep_rew_mean          | 583          |
| time/                   |              |
|    fps                  | 120          |
|    iterations           | 744          |
|    time_elapsed         | 12665        |
|    total_timesteps      | 1523712      |
| train/                  |              |
|    approx_kl            | 0.0057907654 |
|    clip_fraction        | 0.0467       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.246       |
|    explained_variance   | -2.54        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.00298     |
|    n_updates            | 7430         |
|    policy_gradient_loss | -0.00769     |
|    value_loss           | 0.0375       |
------------------------------------------

--------------------------------------------
| rollout/                |                |
|    ep_len_mean          | 1.06e+04       |
|    ep_rew_mean          | 588            |
| time/                   |                |
|    fps                  | 120            |
|    iterations           | 754            |
|    time_elapsed         | 12811          |
|    total_timesteps      | 1544192        |
| train/                  |                |
|    approx_kl            | -5.9604645e-08 |
|    clip_fraction        | 0              |
|    clip_range           | 0.2            |
|    entropy_loss         | -0.00571       |
|    explained_variance   | -1.19e-07      |
|    learning_rate        | 0.0003         |
|    loss                 | 5.49e-05       |
|    n_updates            | 7530           |
|    policy_gradient_loss | 0.00233        |
|    value_loss           | 0.00545        |
--------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.07e+04     |
|    ep_rew_mean          | 585          |
| time/                   |              |
|    fps                  | 120          |
|    iterations           | 764          |
|    time_elapsed         | 12956        |
|    total_timesteps      | 1564672      |
| train/                  |              |
|    approx_kl            | 0.0033752958 |
|    clip_fraction        | 0.0401       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.181       |
|    explained_variance   | -0.00285     |
|    learning_rate        | 0.0003       |
|    loss                 | 0.0461       |
|    n_updates            | 7630         |
|    policy_gradient_loss | -0.00202     |
|    value_loss           | 7.88         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.07e+04     |
|    ep_rew_mean          | 577          |
| time/                   |              |
|    fps                  | 120          |
|    iterations           | 774          |
|    time_elapsed         | 13102        |
|    total_timesteps      | 1585152      |
| train/                  |              |
|    approx_kl            | 0.0027660758 |
|    clip_fraction        | 0.0293       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.289       |
|    explained_variance   | 3.36e-05     |
|    learning_rate        | 0.0003       |
|    loss                 | 8.99         |
|    n_updates            | 7730         |
|    policy_gradient_loss | -0.00415     |
|    value_loss           | 15.5         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.07e+04     |
|    ep_rew_mean          | 577          |
| time/                   |              |
|    fps                  | 109          |
|    iterations           | 783          |
|    time_elapsed         | 14662        |
|    total_timesteps      | 1603584      |
| train/                  |              |
|    approx_kl            | 0.0017760238 |
|    clip_fraction        | 0.0202       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.393       |
|    explained_variance   | 0.000485     |
|    learning_rate        | 0.0003       |
|    loss                 | 0.743        |
|    n_updates            | 7820         |
|    policy_gradient_loss | -0.000659    |
|    value_loss           | 7.88         |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1.06e+04   |
|    ep_rew_mean          | 574        |
| time/                   |            |
|    fps                  | 109        |
|    iterations           | 793        |
|    time_elapsed         | 14808      |
|    total_timesteps      | 1624064    |
| train/                  |            |
|    approx_kl            | 0.01170277 |
|    clip_fraction        | 0.0606     |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.308     |
|    explained_variance   | 0.00128    |
|    learning_rate        | 0.0003     |
|    loss                 | 5.76       |
|    n_updates            | 7920       |
|    policy_gradient_loss | -0.00315   |
|    value_loss           | 7.9        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1.05e+04    |
|    ep_rew_mean          | 565         |
| time/                   |             |
|    fps                  | 109         |
|    iterations           | 803         |
|    time_elapsed         | 14955       |
|    total_timesteps      | 1644544     |
| train/                  |             |
|    approx_kl            | 0.011023393 |
|    clip_fraction        | 0.0676      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.473      |
|    explained_variance   | -0.00651    |
|    learning_rate        | 0.0003      |
|    loss                 | 15.7        |
|    n_updates            | 8020        |
|    policy_gradient_loss | -0.00279    |
|    value_loss           | 22.7        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.04e+04     |
|    ep_rew_mean          | 562          |
| time/                   |              |
|    fps                  | 110          |
|    iterations           | 813          |
|    time_elapsed         | 15103        |
|    total_timesteps      | 1665024      |
| train/                  |              |
|    approx_kl            | 0.0012205604 |
|    clip_fraction        | 0.0104       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.272       |
|    explained_variance   | -0.000365    |
|    learning_rate        | 0.0003       |
|    loss                 | 27.7         |
|    n_updates            | 8120         |
|    policy_gradient_loss | -0.00108     |
|    value_loss           | 48.7         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.01e+04     |
|    ep_rew_mean          | 554          |
| time/                   |              |
|    fps                  | 110          |
|    iterations           | 823          |
|    time_elapsed         | 15249        |
|    total_timesteps      | 1685504      |
| train/                  |              |
|    approx_kl            | 0.0021580965 |
|    clip_fraction        | 0.0232       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.201       |
|    explained_variance   | 0.000782     |
|    learning_rate        | 0.0003       |
|    loss                 | 33.1         |
|    n_updates            | 8220         |
|    policy_gradient_loss | -0.00122     |
|    value_loss           | 45.1         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.74e+03     |
|    ep_rew_mean          | 540          |
| time/                   |              |
|    fps                  | 110          |
|    iterations           | 833          |
|    time_elapsed         | 15396        |
|    total_timesteps      | 1705984      |
| train/                  |              |
|    approx_kl            | 0.0061214743 |
|    clip_fraction        | 0.0257       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.279       |
|    explained_variance   | -0.00506     |
|    learning_rate        | 0.0003       |
|    loss                 | 7.5          |
|    n_updates            | 8320         |
|    policy_gradient_loss | 6.78e-05     |
|    value_loss           | 24.6         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.54e+03     |
|    ep_rew_mean          | 535          |
| time/                   |              |
|    fps                  | 111          |
|    iterations           | 843          |
|    time_elapsed         | 15542        |
|    total_timesteps      | 1726464      |
| train/                  |              |
|    approx_kl            | 0.0066750944 |
|    clip_fraction        | 0.0407       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.2         |
|    explained_variance   | -0.014       |
|    learning_rate        | 0.0003       |
|    loss                 | 27.7         |
|    n_updates            | 8420         |
|    policy_gradient_loss | -0.00279     |
|    value_loss           | 30.3         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 9.29e+03     |
|    ep_rew_mean          | 525          |
| time/                   |              |
|    fps                  | 111          |
|    iterations           | 853          |
|    time_elapsed         | 15688        |
|    total_timesteps      | 1746944      |
| train/                  |              |
|    approx_kl            | 0.0051137363 |
|    clip_fraction        | 0.0206       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.258       |
|    explained_variance   | 0.00877      |
|    learning_rate        | 0.0003       |
|    loss                 | 5.91         |
|    n_updates            | 8520         |
|    policy_gradient_loss | -0.00128     |
|    value_loss           | 23           |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 9.01e+03    |
|    ep_rew_mean          | 512         |
| time/                   |             |
|    fps                  | 111         |
|    iterations           | 863         |
|    time_elapsed         | 15834       |
|    total_timesteps      | 1767424     |
| train/                  |             |
|    approx_kl            | 0.003736007 |
|    clip_fraction        | 0.0987      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.458      |
|    explained_variance   | 0.00715     |
|    learning_rate        | 0.0003      |
|    loss                 | 1.04        |
|    n_updates            | 8620        |
|    policy_gradient_loss | -0.00365    |
|    value_loss           | 8.08        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.78e+03     |
|    ep_rew_mean          | 508          |
| time/                   |              |
|    fps                  | 111          |
|    iterations           | 873          |
|    time_elapsed         | 15980        |
|    total_timesteps      | 1787904      |
| train/                  |              |
|    approx_kl            | 0.0070164627 |
|    clip_fraction        | 0.0661       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.372       |
|    explained_variance   | 0.00571      |
|    learning_rate        | 0.0003       |
|    loss                 | 17.4         |
|    n_updates            | 8720         |
|    policy_gradient_loss | -0.00195     |
|    value_loss           | 30.8         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.58e+03    |
|    ep_rew_mean          | 499         |
| time/                   |             |
|    fps                  | 102         |
|    iterations           | 882         |
|    time_elapsed         | 17539       |
|    total_timesteps      | 1806336     |
| train/                  |             |
|    approx_kl            | 0.005165317 |
|    clip_fraction        | 0.0476      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.338      |
|    explained_variance   | -0.013      |
|    learning_rate        | 0.0003      |
|    loss                 | 34.4        |
|    n_updates            | 8810        |
|    policy_gradient_loss | -0.00247    |
|    value_loss           | 29.9        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.25e+03    |
|    ep_rew_mean          | 484         |
| time/                   |             |
|    fps                  | 103         |
|    iterations           | 892         |
|    time_elapsed         | 17685       |
|    total_timesteps      | 1826816     |
| train/                  |             |
|    approx_kl            | 0.007989772 |
|    clip_fraction        | 0.0829      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.462      |
|    explained_variance   | 0.00693     |
|    learning_rate        | 0.0003      |
|    loss                 | 32.4        |
|    n_updates            | 8910        |
|    policy_gradient_loss | -0.00318    |
|    value_loss           | 44.2        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 8.23e+03    |
|    ep_rew_mean          | 486         |
| time/                   |             |
|    fps                  | 103         |
|    iterations           | 902         |
|    time_elapsed         | 17830       |
|    total_timesteps      | 1847296     |
| train/                  |             |
|    approx_kl            | 0.012173069 |
|    clip_fraction        | 0.0655      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.346      |
|    explained_variance   | -0.00783    |
|    learning_rate        | 0.0003      |
|    loss                 | 0.256       |
|    n_updates            | 9010        |
|    policy_gradient_loss | -0.000555   |
|    value_loss           | 8.15        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 8.07e+03     |
|    ep_rew_mean          | 482          |
| time/                   |              |
|    fps                  | 103          |
|    iterations           | 912          |
|    time_elapsed         | 17977        |
|    total_timesteps      | 1867776      |
| train/                  |              |
|    approx_kl            | 0.0047139865 |
|    clip_fraction        | 0.0421       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.247       |
|    explained_variance   | -0.00133     |
|    learning_rate        | 0.0003       |
|    loss                 | 7.46         |
|    n_updates            | 9110         |
|    policy_gradient_loss | -0.003       |
|    value_loss           | 25.6         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 7.87e+03     |
|    ep_rew_mean          | 479          |
| time/                   |              |
|    fps                  | 104          |
|    iterations           | 922          |
|    time_elapsed         | 18123        |
|    total_timesteps      | 1888256      |
| train/                  |              |
|    approx_kl            | 0.0063312645 |
|    clip_fraction        | 0.0452       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.329       |
|    explained_variance   | 0.0107       |
|    learning_rate        | 0.0003       |
|    loss                 | 25.6         |
|    n_updates            | 9210         |
|    policy_gradient_loss | -0.00267     |
|    value_loss           | 36           |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 7.73e+03   |
|    ep_rew_mean          | 475        |
| time/                   |            |
|    fps                  | 104        |
|    iterations           | 932        |
|    time_elapsed         | 18270      |
|    total_timesteps      | 1908736    |
| train/                  |            |
|    approx_kl            | 0.04747235 |
|    clip_fraction        | 0.116      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.327     |
|    explained_variance   | -3.46      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0109    |
|    n_updates            | 9310       |
|    policy_gradient_loss | -0.00859   |
|    value_loss           | 0.116      |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.49e+03    |
|    ep_rew_mean          | 465         |
| time/                   |             |
|    fps                  | 104         |
|    iterations           | 942         |
|    time_elapsed         | 18416       |
|    total_timesteps      | 1929216     |
| train/                  |             |
|    approx_kl            | 0.004127317 |
|    clip_fraction        | 0.0617      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.316      |
|    explained_variance   | -0.00274    |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0464      |
|    n_updates            | 9410        |
|    policy_gradient_loss | -0.00254    |
|    value_loss           | 7.9         |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 7.28e+03     |
|    ep_rew_mean          | 461          |
| time/                   |              |
|    fps                  | 105          |
|    iterations           | 952          |
|    time_elapsed         | 18567        |
|    total_timesteps      | 1949696      |
| train/                  |              |
|    approx_kl            | 0.0025619068 |
|    clip_fraction        | 0.039        |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.218       |
|    explained_variance   | -0.0012      |
|    learning_rate        | 0.0003       |
|    loss                 | 6.91         |
|    n_updates            | 9510         |
|    policy_gradient_loss | -0.00114     |
|    value_loss           | 24.6         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.17e+03    |
|    ep_rew_mean          | 457         |
| time/                   |             |
|    fps                  | 105         |
|    iterations           | 962         |
|    time_elapsed         | 18714       |
|    total_timesteps      | 1970176     |
| train/                  |             |
|    approx_kl            | 0.004552854 |
|    clip_fraction        | 0.0321      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.265      |
|    explained_variance   | 0.00515     |
|    learning_rate        | 0.0003      |
|    loss                 | 0.552       |
|    n_updates            | 9610        |
|    policy_gradient_loss | -0.00109    |
|    value_loss           | 8.04        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.9e+03     |
|    ep_rew_mean          | 449         |
| time/                   |             |
|    fps                  | 105         |
|    iterations           | 972         |
|    time_elapsed         | 18862       |
|    total_timesteps      | 1990656     |
| train/                  |             |
|    approx_kl            | 0.002558792 |
|    clip_fraction        | 0.0336      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.23       |
|    explained_variance   | 0.0132      |
|    learning_rate        | 0.0003      |
|    loss                 | 15.1        |
|    n_updates            | 9710        |
|    policy_gradient_loss | -0.00222    |
|    value_loss           | 43.4        |
-----------------------------------------

<stable_baselines3.ppo.ppo.PPO at 0x1fecc57fdf0>
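
The counters in the PPO log above are related by simple arithmetic. The sketch below checks that relationship, assuming the SB3 defaults for PPO (`n_steps=2048`, `n_epochs=10`) and a single environment, which is consistent with the logged values. The helper names are hypothetical.

```python
def total_timesteps(iterations, n_steps, n_envs=1):
    """Timesteps collected after a given number of rollout iterations."""
    return iterations * n_steps * n_envs


def ppo_n_updates(iterations, n_epochs):
    """Value of n_updates shown in the table logged at a given iteration.

    Assumption: the table is written before the update of the current
    iteration runs, so only the previous iterations are counted.
    """
    return (iterations - 1) * n_epochs


# Values read from the first log table above (iterations = 873)
print(total_timesteps(873, 2048))  # 1787904
print(ppo_n_updates(873, 10))      # 8720
```

The same check on the last table (iterations = 972) gives 1990656 timesteps and 9710 updates, matching the log.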

#### Visualizing the first result

In [16]:
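import time

# Run two short rollouts with the trained PPO agent and render them
for _ in range(2):
    obs = env_ppo.reset()
    for i in range(1000):
        # Pick the greedy action from the trained policy
        action, _states = model2.predict(obs, deterministic=True)
        obs, rewards, done, info = env_ppo.step(action)
        time.sleep(0.003)
        env_ppo.render()
        # Stop stepping once the episode is over
        if done:
            break

env_ppo.close()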

In [None]:
env_ppo.close()

#### Saving, loading, and evaluating the model

In [6]:
# Save the agent
model2.save("modelos/ppo/recigio_beamrider2M")



In [7]:
# Delete the agent/model
del model2

In [9]:
# Load a previously trained model
model2 = PPO.load("modelos/ppo/recigio_beamrider2M", env=env_ppo)

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


In [None]:
# Import the evaluation helper and evaluate the agent for 10 episodes
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model2, model2.get_env(), n_eval_episodes=10, render=True)



In [22]:
print(mean_reward)
print(std_reward)

0.0
0.0


### DQN

In [1]:
import gym
import time
from stable_baselines3 import DQN

# Load the environment
env_dqn = gym.make('BeamRiderNoFrameskip-v4')
eval_env_dqn = gym.make('BeamRiderNoFrameskip-v4')

# For DQN to run with the memory available on my machine, I had to set batch_size explicitly and use a smaller buffer_size.
# Instantiate the learning algorithm
model3 = DQN('MlpPolicy', env_dqn, verbose=1, batch_size=128, buffer_size=10000, create_eval_env=True, tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon_dqn")

# Train the algorithm for 2e6 timesteps
model3.learn(total_timesteps=int(2e6), eval_env=eval_env_dqn, eval_freq=200000, n_eval_episodes=1)
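
The memory constraint mentioned in the comment above can be estimated with simple arithmetic. This is a rough sketch, assuming raw Atari frames of shape 210x160x3 stored as `uint8` and a replay buffer that keeps both the observation and the next observation for every transition. The function name is hypothetical, and actions, rewards, and done flags are ignored as negligible.

```python
def replay_buffer_bytes(buffer_size, obs_shape=(210, 160, 3), bytes_per_value=1):
    """Approximate bytes needed to store observations in a replay buffer.

    Assumption: each transition stores two full frames
    (observation and next observation).
    """
    obs_bytes = bytes_per_value
    for dim in obs_shape:
        obs_bytes *= dim
    return 2 * buffer_size * obs_bytes


for size in (10_000, 1_000_000):
    print(size, replay_buffer_bytes(size) / 1e9, "GB")
```

Under these assumptions, a buffer of 10,000 transitions needs about 2 GB, while a buffer of 1,000,000 transitions would need about 200 GB, which explains why the smaller `buffer_size` was necessary.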

#### Visualizing the first result

In [None]:
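import time

# Run two short rollouts with the trained DQN agent and render them
for _ in range(2):
    obs = env_dqn.reset()
    for i in range(1000):
        # Pick the greedy action from the trained policy
        action, _states = model3.predict(obs, deterministic=True)
        obs, rewards, done, info = env_dqn.step(action)
        time.sleep(0.003)
        env_dqn.render()
        # Stop stepping once the episode is over
        if done:
            break

# Close the environment that was actually rendered
env_dqn.close()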

#### Saving, loading, and evaluating the model

In [4]:
# Save the agent
model3.save("modelos/dqn/recigio_beamrider2M")



In [5]:
# Delete the agent/model
del model3

In [2]:
# Load a previously trained model
model3 = DQN.load("modelos/dqn/recigio_beamrider2M", env=env_dqn)

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


In [3]:
# Import the evaluation helper and evaluate the agent for 10 episodes
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model3, model3.get_env(), n_eval_episodes=10, render=True)



In [4]:
print(mean_reward)
print(std_reward)

44.0
0.0
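
`evaluate_policy` reports the mean and standard deviation of the per-episode returns. The sketch below reproduces that summary in plain Python, assuming a population standard deviation. The episode returns are hypothetical values chosen to match the printed result: a standard deviation of 0.0 over 10 episodes means every episode returned the same score.

```python
import math


def summarize(episode_returns):
    """Return (mean, population standard deviation) of episode returns."""
    n = len(episode_returns)
    mean = sum(episode_returns) / n
    variance = sum((r - mean) ** 2 for r in episode_returns) / n
    return mean, math.sqrt(variance)


# Hypothetical returns: ten identical episodes
identical = [44.0] * 10
print(summarize(identical))  # (44.0, 0.0)

# Hypothetical returns that differ between episodes give a non-zero spread
varied = [40.0, 48.0]
print(summarize(varied))  # (44.0, 4.0)
```

Identical returns across all episodes may indicate that a deterministic policy is replaying the same trajectory, in which case the 10 episodes carry little more information than a single one.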


#### Problem analysis

In [10]:
# I observed a curve during training: performance improves until about 800k timesteps, or slightly more.
# After that, I believe some kind of overfitting or gradient saturation occurs, because A2C and PPO
# stop improving and plateau.
# Because of this, I will retrain with the budget limited to 700k steps.
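
One way to make the choice of a training budget less manual is to read the peak directly from the logged reward curve. This is a minimal sketch with a hypothetical helper. The sample points are the `total_timesteps` and `ep_rew_mean` values from the tail of the PPO log shown earlier, which only covers the declining part of the run.

```python
def best_point(curve):
    """Return the (timesteps, mean_reward) pair with the highest mean reward."""
    return max(curve, key=lambda point: point[1])


# (total_timesteps, ep_rew_mean) pairs copied from the PPO log tables
ppo_tail = [
    (1787904, 508), (1806336, 499), (1826816, 484), (1847296, 486),
    (1867776, 482), (1888256, 479), (1908736, 475), (1929216, 465),
    (1949696, 461), (1970176, 457), (1990656, 449),
]

steps, reward = best_point(ppo_tail)
# How far the mean reward fell between the best point and the end of the run
drop = reward - ppo_tail[-1][1]
print(steps, reward, drop)  # 1787904 508 59
```

Applied to the full curve exported from TensorBoard, the same helper would give the timestep at which the run peaked, which can then be used as the budget for retraining.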

#### TensorBoard plots

![mean_reward](graficos/2M_1.png)

![mean_ep_length](graficos/2M_2.png)

### New training runs
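
The retraining cell that follows uses `eval_freq=175000` with a budget of 700,000 steps, so the periodic evaluation runs only a handful of times. Here is a small sketch of that schedule, assuming a single training environment so that one callback step equals one timestep. The helper name is hypothetical.

```python
def eval_timesteps(total_timesteps, eval_freq):
    """Timesteps at which a periodic evaluation is triggered."""
    return list(range(eval_freq, total_timesteps + 1, eval_freq))


# Schedule for the 700k-step retraining
print(eval_timesteps(700_000, 175_000))  # [175000, 350000, 525000, 700000]

# Schedule size for the earlier 2M-step runs with eval_freq=200000
print(len(eval_timesteps(2_000_000, 200_000)))  # 10
```

Because each evaluation uses `n_eval_episodes=1`, every point on the evaluation curve comes from a single episode and can be noisy.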

In [6]:
import gym
import time
from stable_baselines3 import A2C,PPO,DQN
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

# Load the environment
env = gym.make('BeamRiderNoFrameskip-v4')
eval_env = gym.make('BeamRiderNoFrameskip-v4')

# Instantiate the learning algorithm
model = A2C(
    'MlpPolicy', env, verbose=1, create_eval_env=True,
    ent_coef=0.01, vf_coef=0.25,
    policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)),
    tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_a2c",
)

# Train the algorithm for 700k timesteps
model.learn(total_timesteps=700000, eval_env=eval_env, eval_freq=175000, n_eval_episodes=1)

model.save("modelos/a2c/recigio_beamrider_700k")
del model 



Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to /tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_a2c\A2C_1
-------------------------------------
| time/                 |           |
|    fps                | 116       |
|    iterations         | 100       |
|    time_elapsed       | 4         |
|    total_timesteps    | 500       |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -4.97e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 99        |
|    policy_loss        | 0.0732    |
|    value_loss         | 0.0445    |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 143      |
|    iterations         | 20

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.09e+03 |
|    ep_rew_mean        | 528      |
| time/                 |          |
|    fps                | 177      |
|    iterations         | 1600     |
|    time_elapsed       | 45       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -74.5    |
|    learning_rate      | 0.0007   |
|    n_updates          | 1599     |
|    policy_loss        | -0.0153  |
|    value_loss         | 5.67e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.09e+03 |
|    ep_rew_mean        | 528      |
| time/                 |          |
|    fps                | 177      |
|    iterations         | 1700     |
|    time_elapsed       | 47       |
|    total_timesteps    | 8500     |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.46e+03 |
|    ep_rew_mean        | 374      |
| time/                 |          |
|    fps                | 181      |
|    iterations         | 2900     |
|    time_elapsed       | 79       |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -21.9    |
|    learning_rate      | 0.0007   |
|    n_updates          | 2899     |
|    policy_loss        | 0.00793  |
|    value_loss         | 1.7e-05  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.46e+03 |
|    ep_rew_mean        | 374      |
| time/                 |          |
|    fps                | 181      |
|    iterations         | 3000     |
|    time_elapsed       | 82       |
|    total_timesteps    | 15000    |
| train/                |          |
|

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.22e+03  |
|    ep_rew_mean        | 330       |
| time/                 |           |
|    fps                | 181       |
|    iterations         | 4200      |
|    time_elapsed       | 115       |
|    total_timesteps    | 21000     |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -1.01e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 4199      |
|    policy_loss        | -0.00253  |
|    value_loss         | 3.53e-06  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.22e+03  |
|    ep_rew_mean        | 330       |
| time/                 |           |
|    fps                | 181       |
|    iterations         | 4300      |
|    time_elapsed       | 118       |
|    total_timesteps    | 21500     |
| train/    

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.22e+03  |
|    ep_rew_mean        | 330       |
| time/                 |           |
|    fps                | 183       |
|    iterations         | 5500      |
|    time_elapsed       | 149       |
|    total_timesteps    | 27500     |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -2.58e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 5499      |
|    policy_loss        | 0.00122   |
|    value_loss         | 1.87e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.57e+03 |
|    ep_rew_mean        | 343      |
| time/                 |          |
|    fps                | 183      |
|    iterations         | 5600     |
|    time_elapsed       | 152      |
|    total_timesteps    | 28000    |
| train/             

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.51e+03 |
|    ep_rew_mean        | 367      |
| time/                 |          |
|    fps                | 183      |
|    iterations         | 6800     |
|    time_elapsed       | 184      |
|    total_timesteps    | 34000    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0.693    |
|    learning_rate      | 0.0007   |
|    n_updates          | 6799     |
|    policy_loss        | 0.00369  |
|    value_loss         | 6.34e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.51e+03 |
|    ep_rew_mean        | 367      |
| time/                 |          |
|    fps                | 183      |
|    iterations         | 6900     |
|    time_elapsed       | 187      |
|    total_timesteps    | 34500    |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.51e+03 |
|    ep_rew_mean        | 367      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 8100     |
|    time_elapsed       | 219      |
|    total_timesteps    | 40500    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -0.924   |
|    learning_rate      | 0.0007   |
|    n_updates          | 8099     |
|    policy_loss        | 0.0136   |
|    value_loss         | 4.34e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.83e+03 |
|    ep_rew_mean        | 415      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 8200     |
|    time_elapsed       | 222      |
|    total_timesteps    | 41000    |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.83e+03 |
|    ep_rew_mean        | 415      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 9400     |
|    time_elapsed       | 255      |
|    total_timesteps    | 47000    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -410     |
|    learning_rate      | 0.0007   |
|    n_updates          | 9399     |
|    policy_loss        | -0.00174 |
|    value_loss         | 4.06e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.83e+03 |
|    ep_rew_mean        | 415      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 9500     |
|    time_elapsed       | 257      |
|    total_timesteps    | 47500    |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.06e+03 |
|    ep_rew_mean        | 440      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 10700    |
|    time_elapsed       | 289      |
|    total_timesteps    | 53500    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -131     |
|    learning_rate      | 0.0007   |
|    n_updates          | 10699    |
|    policy_loss        | 0.000424 |
|    value_loss         | 4.16e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.06e+03 |
|    ep_rew_mean        | 440      |
| time/                 |          |
|    fps                | 184      |
|    iterations         | 10800    |
|    time_elapsed       | 292      |
|    total_timesteps    | 54000    |
| train/                |          |
|

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 6.03e+03  |
|    ep_rew_mean        | 431       |
| time/                 |           |
|    fps                | 185       |
|    iterations         | 12000     |
|    time_elapsed       | 323       |
|    total_timesteps    | 60000     |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -8.32e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 11999     |
|    policy_loss        | 0.000748  |
|    value_loss         | 2.72e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.03e+03 |
|    ep_rew_mean        | 431      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 12100    |
|    time_elapsed       | 325      |
|    total_timesteps    | 60500    |
| train/             

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 6.01e+03  |
|    ep_rew_mean        | 428       |
| time/                 |           |
|    fps                | 186       |
|    iterations         | 13300     |
|    time_elapsed       | 356       |
|    total_timesteps    | 66500     |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -20.1     |
|    learning_rate      | 0.0007    |
|    n_updates          | 13299     |
|    policy_loss        | -0.000528 |
|    value_loss         | 2.3e-07   |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 6.01e+03 |
|    ep_rew_mean        | 428      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 13400    |
|    time_elapsed       | 358      |
|    total_timesteps    | 67000    |
| train/             

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.88e+03  |
|    ep_rew_mean        | 407       |
| time/                 |           |
|    fps                | 186       |
|    iterations         | 14600     |
|    time_elapsed       | 390       |
|    total_timesteps    | 73000     |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -1.05e+05 |
|    learning_rate      | 0.0007    |
|    n_updates          | 14599     |
|    policy_loss        | 0.00951   |
|    value_loss         | 2.24e-05  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.88e+03  |
|    ep_rew_mean        | 407       |
| time/                 |           |
|    fps                | 186       |
|    iterations         | 14700     |
|    time_elapsed       | 393       |
|    total_timesteps    | 73500     |
| train/    

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.87e+03 |
|    ep_rew_mean        | 400      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 15900    |
|    time_elapsed       | 428      |
|    total_timesteps    | 79500    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -98      |
|    learning_rate      | 0.0007   |
|    n_updates          | 15899    |
|    policy_loss        | 0.00525  |
|    value_loss         | 8.8e-06  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.87e+03 |
|    ep_rew_mean        | 400      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 16000    |
|    time_elapsed       | 431      |
|    total_timesteps    | 80000    |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.9e+03  |
|    ep_rew_mean        | 393      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 17200    |
|    time_elapsed       | 464      |
|    total_timesteps    | 86000    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -3.31    |
|    learning_rate      | 0.0007   |
|    n_updates          | 17199    |
|    policy_loss        | -0.0231  |
|    value_loss         | 0.000119 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.75e+03 |
|    ep_rew_mean        | 385      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 17300    |
|    time_elapsed       | 467      |
|    total_timesteps    | 86500    |
| train/                |          |
|

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.72e+03  |
|    ep_rew_mean        | 394       |
| time/                 |           |
|    fps                | 185       |
|    iterations         | 18500     |
|    time_elapsed       | 497       |
|    total_timesteps    | 92500     |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -4.09e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 18499     |
|    policy_loss        | 0.00198   |
|    value_loss         | 2.17e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.72e+03 |
|    ep_rew_mean        | 394      |
| time/                 |          |
|    fps                | 185      |
|    iterations         | 18600    |
|    time_elapsed       | 500      |
|    total_timesteps    | 93000    |
| train/             

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.65e+03 |
|    ep_rew_mean        | 381      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 19800    |
|    time_elapsed       | 531      |
|    total_timesteps    | 99000    |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -160     |
|    learning_rate      | 0.0007   |
|    n_updates          | 19799    |
|    policy_loss        | 0.00481  |
|    value_loss         | 7e-06    |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.65e+03 |
|    ep_rew_mean        | 381      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 19900    |
|    time_elapsed       | 533      |
|    total_timesteps    | 99500    |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.6e+03  |
|    ep_rew_mean        | 377      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 21100    |
|    time_elapsed       | 564      |
|    total_timesteps    | 105500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -44.6    |
|    learning_rate      | 0.0007   |
|    n_updates          | 21099    |
|    policy_loss        | 0.00328  |
|    value_loss         | 3.33e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.6e+03  |
|    ep_rew_mean        | 377      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 21200    |
|    time_elapsed       | 567      |
|    total_timesteps    | 106000   |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.54e+03 |
|    ep_rew_mean        | 372      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 22400    |
|    time_elapsed       | 598      |
|    total_timesteps    | 112000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.134   |
|    learning_rate      | 0.0007   |
|    n_updates          | 22399    |
|    policy_loss        | -0.00148 |
|    value_loss         | 5.55e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.54e+03 |
|    ep_rew_mean        | 372      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 22500    |
|    time_elapsed       | 600      |
|    total_timesteps    | 112500   |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.6e+03  |
|    ep_rew_mean        | 384      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 23700    |
|    time_elapsed       | 631      |
|    total_timesteps    | 118500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -1.97    |
|    learning_rate      | 0.0007   |
|    n_updates          | 23699    |
|    policy_loss        | -0.00706 |
|    value_loss         | 1.62e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.6e+03  |
|    ep_rew_mean        | 384      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 23800    |
|    time_elapsed       | 634      |
|    total_timesteps    | 119000   |
| train/                |          |
|

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.58e+03 |
|    ep_rew_mean        | 388      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 25000    |
|    time_elapsed       | 664      |
|    total_timesteps    | 125000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -13.9    |
|    learning_rate      | 0.0007   |
|    n_updates          | 24999    |
|    policy_loss        | -0.00181 |
|    value_loss         | 1.63e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.58e+03  |
|    ep_rew_mean        | 388       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 25100     |
|    time_elapsed       | 667       |
|    total_timesteps    | 125500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.56e+03 |
|    ep_rew_mean        | 383      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 26300    |
|    time_elapsed       | 698      |
|    total_timesteps    | 131500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -31.9    |
|    learning_rate      | 0.0007   |
|    n_updates          | 26299    |
|    policy_loss        | -0.00256 |
|    value_loss         | 1.68e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.49e+03  |
|    ep_rew_mean        | 378       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 26400     |
|    time_elapsed       | 700       |
|    total_timesteps    | 132000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.49e+03 |
|    ep_rew_mean        | 378      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 27600    |
|    time_elapsed       | 733      |
|    total_timesteps    | 138000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0.408    |
|    learning_rate      | 0.0007   |
|    n_updates          | 27599    |
|    policy_loss        | -0.00202 |
|    value_loss         | 2.03e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.49e+03 |
|    ep_rew_mean        | 378      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 27700    |
|    time_elapsed       | 736      |
|    total_timesteps    | 138500   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.51e+03  |
|    ep_rew_mean        | 374       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 28900     |
|    time_elapsed       | 767       |
|    total_timesteps    | 144500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -19.6     |
|    learning_rate      | 0.0007    |
|    n_updates          | 28899     |
|    policy_loss        | -0.000315 |
|    value_loss         | 1.09e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.51e+03 |
|    ep_rew_mean        | 374      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 29000    |
|    time_elapsed       | 769      |
|    total_timesteps    | 145000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.45e+03  |
|    ep_rew_mean        | 368       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 30200     |
|    time_elapsed       | 800       |
|    total_timesteps    | 151000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -96.1     |
|    learning_rate      | 0.0007    |
|    n_updates          | 30199     |
|    policy_loss        | -0.000729 |
|    value_loss         | 1.46e-07  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.4e+03   |
|    ep_rew_mean        | 363       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 30300     |
|    time_elapsed       | 803       |
|    total_timesteps    | 151500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.4e+03  |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 31500    |
|    time_elapsed       | 834      |
|    total_timesteps    | 157500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -0.877   |
|    learning_rate      | 0.0007   |
|    n_updates          | 31499    |
|    policy_loss        | 0.00257  |
|    value_loss         | 2.02e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.4e+03  |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 31600    |
|    time_elapsed       | 836      |
|    total_timesteps    | 158000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.39e+03  |
|    ep_rew_mean        | 364       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 32900     |
|    time_elapsed       | 870       |
|    total_timesteps    | 164500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -4.07e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 32899     |
|    policy_loss        | -0.00071  |
|    value_loss         | 9.24e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.39e+03 |
|    ep_rew_mean        | 364      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 33000    |
|    time_elapsed       | 872      |
|    total_timesteps    | 165000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 362      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 34200    |
|    time_elapsed       | 905      |
|    total_timesteps    | 171000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -647     |
|    learning_rate      | 0.0007   |
|    n_updates          | 34199    |
|    policy_loss        | 0.000809 |
|    value_loss         | 1.73e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 362      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 34300    |
|    time_elapsed       | 907      |
|    total_timesteps    | 171500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 35500    |
|    time_elapsed       | 951      |
|    total_timesteps    | 177500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -8.54    |
|    learning_rate      | 0.0007   |
|    n_updates          | 35499    |
|    policy_loss        | 0.00183  |
|    value_loss         | 2.14e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 35600    |
|    time_elapsed       | 954      |
|    total_timesteps    | 178000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 359      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 36800    |
|    time_elapsed       | 985      |
|    total_timesteps    | 184000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -48.8    |
|    learning_rate      | 0.0007   |
|    n_updates          | 36799    |
|    policy_loss        | 0.00239  |
|    value_loss         | 1.23e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.33e+03  |
|    ep_rew_mean        | 359       |
| time/                 |           |
|    fps                | 186       |
|    iterations         | 36900     |
|    time_elapsed       | 988       |
|    total_timesteps    | 184500    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 356       |
| time/                 |           |
|    fps                | 186       |
|    iterations         | 38100     |
|    time_elapsed       | 1019      |
|    total_timesteps    | 190500    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0007    |
|    n_updates          | 38099     |
|    policy_loss        | -0.000322 |
|    value_loss         | 2.84e-08  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 356      |
| time/                 |          |
|    fps                | 186      |
|    iterations         | 38200    |
|    time_elapsed       | 1022     |
|    total_timesteps    | 191000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 364      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 39400    |
|    time_elapsed       | 1053     |
|    total_timesteps    | 197000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -29.2    |
|    learning_rate      | 0.0007   |
|    n_updates          | 39399    |
|    policy_loss        | -0.00973 |
|    value_loss         | 3.37e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.35e+03  |
|    ep_rew_mean        | 364       |
| time/                 |           |
|    fps                | 187       |
|    iterations         | 39500     |
|    time_elapsed       | 1055      |
|    total_timesteps    | 197500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.38e+03 |
|    ep_rew_mean        | 369      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 40700    |
|    time_elapsed       | 1086     |
|    total_timesteps    | 203500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -399     |
|    learning_rate      | 0.0007   |
|    n_updates          | 40699    |
|    policy_loss        | -0.00117 |
|    value_loss         | 2.32e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.38e+03 |
|    ep_rew_mean        | 369      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 40800    |
|    time_elapsed       | 1088     |
|    total_timesteps    | 204000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.39e+03 |
|    ep_rew_mean        | 369      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 42000    |
|    time_elapsed       | 1119     |
|    total_timesteps    | 210000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -12.6    |
|    learning_rate      | 0.0007   |
|    n_updates          | 41999    |
|    policy_loss        | 0.00239  |
|    value_loss         | 1.28e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.39e+03 |
|    ep_rew_mean        | 370      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 42100    |
|    time_elapsed       | 1122     |
|    total_timesteps    | 210500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.37e+03 |
|    ep_rew_mean        | 365      |
| time/                 |          |
|    fps                | 187      |
|    iterations         | 43400    |
|    time_elapsed       | 1155     |
|    total_timesteps    | 217000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | 0.172    |
|    learning_rate      | 0.0007   |
|    n_updates          | 43399    |
|    policy_loss        | -0.0066  |
|    value_loss         | 1.31e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.37e+03  |
|    ep_rew_mean        | 365       |
| time/                 |           |
|    fps                | 187       |
|    iterations         | 43500     |
|    time_elapsed       | 1157      |
|    total_timesteps    | 217500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.34e+03 |
|    ep_rew_mean        | 364      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 44700    |
|    time_elapsed       | 1188     |
|    total_timesteps    | 223500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -1.68    |
|    learning_rate      | 0.0007   |
|    n_updates          | 44699    |
|    policy_loss        | -0.00497 |
|    value_loss         | 8.52e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.33e+03  |
|    ep_rew_mean        | 364       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 44800     |
|    time_elapsed       | 1191      |
|    total_timesteps    | 224000    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.34e+03  |
|    ep_rew_mean        | 364       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 46000     |
|    time_elapsed       | 1222      |
|    total_timesteps    | 230000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -120      |
|    learning_rate      | 0.0007    |
|    n_updates          | 45999     |
|    policy_loss        | -0.000246 |
|    value_loss         | 9.47e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.34e+03 |
|    ep_rew_mean        | 364      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 46100    |
|    time_elapsed       | 1224     |
|    total_timesteps    | 230500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 47300    |
|    time_elapsed       | 1255     |
|    total_timesteps    | 236500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -139     |
|    learning_rate      | 0.0007   |
|    n_updates          | 47299    |
|    policy_loss        | 0.00712  |
|    value_loss         | 1.33e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 361      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 47400    |
|    time_elapsed       | 1258     |
|    total_timesteps    | 237000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 368      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 48600    |
|    time_elapsed       | 1288     |
|    total_timesteps    | 243000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -24.3    |
|    learning_rate      | 0.0007   |
|    n_updates          | 48599    |
|    policy_loss        | 0.00352  |
|    value_loss         | 2.9e-06  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 368      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 48700    |
|    time_elapsed       | 1291     |
|    total_timesteps    | 243500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 365      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 49900    |
|    time_elapsed       | 1322     |
|    total_timesteps    | 249500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -248     |
|    learning_rate      | 0.0007   |
|    n_updates          | 49899    |
|    policy_loss        | 0.00585  |
|    value_loss         | 8.85e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 365      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 50000    |
|    time_elapsed       | 1324     |
|    total_timesteps    | 250000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.3e+03  |
|    ep_rew_mean        | 358      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 51300    |
|    time_elapsed       | 1358     |
|    total_timesteps    | 256500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -293     |
|    learning_rate      | 0.0007   |
|    n_updates          | 51299    |
|    policy_loss        | -0.00955 |
|    value_loss         | 2.04e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.3e+03   |
|    ep_rew_mean        | 358       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 51400     |
|    time_elapsed       | 1360      |
|    total_timesteps    | 257000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 357      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 52600    |
|    time_elapsed       | 1391     |
|    total_timesteps    | 263000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -227     |
|    learning_rate      | 0.0007   |
|    n_updates          | 52599    |
|    policy_loss        | 0.00237  |
|    value_loss         | 2.55e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 357      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 52700    |
|    time_elapsed       | 1393     |
|    total_timesteps    | 263500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 358      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 54000    |
|    time_elapsed       | 1427     |
|    total_timesteps    | 270000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -28.5    |
|    learning_rate      | 0.0007   |
|    n_updates          | 53999    |
|    policy_loss        | 0.00178  |
|    value_loss         | 2.56e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 358      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 54100    |
|    time_elapsed       | 1429     |
|    total_timesteps    | 270500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 360      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 55300    |
|    time_elapsed       | 1460     |
|    total_timesteps    | 276500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -5.31    |
|    learning_rate      | 0.0007   |
|    n_updates          | 55299    |
|    policy_loss        | -0.00447 |
|    value_loss         | 5.51e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 360      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 55400    |
|    time_elapsed       | 1462     |
|    total_timesteps    | 277000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 359      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 56600    |
|    time_elapsed       | 1493     |
|    total_timesteps    | 283000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -2.58    |
|    learning_rate      | 0.0007   |
|    n_updates          | 56599    |
|    policy_loss        | -0.00597 |
|    value_loss         | 1.03e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 359      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 56700    |
|    time_elapsed       | 1496     |
|    total_timesteps    | 283500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.32e+03 |
|    ep_rew_mean        | 363      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 57900    |
|    time_elapsed       | 1527     |
|    total_timesteps    | 289500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -503     |
|    learning_rate      | 0.0007   |
|    n_updates          | 57899    |
|    policy_loss        | 0.000274 |
|    value_loss         | 4.15e-07 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.32e+03  |
|    ep_rew_mean        | 363       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 58000     |
|    time_elapsed       | 1529      |
|    total_timesteps    | 290000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.32e+03 |
|    ep_rew_mean        | 363      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 59200    |
|    time_elapsed       | 1560     |
|    total_timesteps    | 296000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -485     |
|    learning_rate      | 0.0007   |
|    n_updates          | 59199    |
|    policy_loss        | -0.003   |
|    value_loss         | 3.16e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.32e+03 |
|    ep_rew_mean        | 363      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 59300    |
|    time_elapsed       | 1562     |
|    total_timesteps    | 296500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 362      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 60500    |
|    time_elapsed       | 1594     |
|    total_timesteps    | 302500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -37.2    |
|    learning_rate      | 0.0007   |
|    n_updates          | 60499    |
|    policy_loss        | 0.00154  |
|    value_loss         | 4.79e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 362      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 60600    |
|    time_elapsed       | 1596     |
|    total_timesteps    | 303000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.3e+03   |
|    ep_rew_mean        | 357       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 61800     |
|    time_elapsed       | 1627      |
|    total_timesteps    | 309000    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -6.4      |
|    learning_rate      | 0.0007    |
|    n_updates          | 61799     |
|    policy_loss        | -0.000238 |
|    value_loss         | 8.94e-07  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.3e+03   |
|    ep_rew_mean        | 357       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 61900     |
|    time_elapsed       | 1630      |
|    total_timesteps    | 309500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 355      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 63100    |
|    time_elapsed       | 1662     |
|    total_timesteps    | 315500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | 0.632    |
|    learning_rate      | 0.0007   |
|    n_updates          | 63099    |
|    policy_loss        | -0.0106  |
|    value_loss         | 2.45e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 355       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 63200     |
|    time_elapsed       | 1665      |
|    total_timesteps    | 316000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 64400    |
|    time_elapsed       | 1696     |
|    total_timesteps    | 322000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.0818  |
|    learning_rate      | 0.0007   |
|    n_updates          | 64399    |
|    policy_loss        | -0.00871 |
|    value_loss         | 1.83e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 64500    |
|    time_elapsed       | 1699     |
|    total_timesteps    | 322500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 65700    |
|    time_elapsed       | 1731     |
|    total_timesteps    | 328500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -793     |
|    learning_rate      | 0.0007   |
|    n_updates          | 65699    |
|    policy_loss        | -0.00237 |
|    value_loss         | 1.44e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.26e+03  |
|    ep_rew_mean        | 354       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 65800     |
|    time_elapsed       | 1734      |
|    total_timesteps    | 329000    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.24e+03  |
|    ep_rew_mean        | 353       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 67000     |
|    time_elapsed       | 1768      |
|    total_timesteps    | 335000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -2.78e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 66999     |
|    policy_loss        | 0.000308  |
|    value_loss         | 2.52e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 67100    |
|    time_elapsed       | 1771     |
|    total_timesteps    | 335500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 68300    |
|    time_elapsed       | 1803     |
|    total_timesteps    | 341500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -4.7     |
|    learning_rate      | 0.0007   |
|    n_updates          | 68299    |
|    policy_loss        | -0.0144  |
|    value_loss         | 5.87e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 68400    |
|    time_elapsed       | 1805     |
|    total_timesteps    | 342000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 69600    |
|    time_elapsed       | 1836     |
|    total_timesteps    | 348000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -101     |
|    learning_rate      | 0.0007   |
|    n_updates          | 69599    |
|    policy_loss        | 0.00372  |
|    value_loss         | 4.1e-06  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 69700    |
|    time_elapsed       | 1839     |
|    total_timesteps    | 348500   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 355       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 70900     |
|    time_elapsed       | 1878      |
|    total_timesteps    | 354500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -1.37e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 70899     |
|    policy_loss        | 0.00119   |
|    value_loss         | 2.06e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 355      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 71000    |
|    time_elapsed       | 1880     |
|    total_timesteps    | 355000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 355       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 72200     |
|    time_elapsed       | 1911      |
|    total_timesteps    | 361000    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -57.2     |
|    learning_rate      | 0.0007    |
|    n_updates          | 72199     |
|    policy_loss        | -0.000148 |
|    value_loss         | 2.01e-06  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 355       |
| time/                 |           |
|    fps                | 188       |
|    iterations         | 72300     |
|    time_elapsed       | 1913      |
|    total_timesteps    | 361500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 73500    |
|    time_elapsed       | 1944     |
|    total_timesteps    | 367500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -211     |
|    learning_rate      | 0.0007   |
|    n_updates          | 73499    |
|    policy_loss        | -0.00169 |
|    value_loss         | 1.73e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 73600    |
|    time_elapsed       | 1947     |
|    total_timesteps    | 368000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.25e+03  |
|    ep_rew_mean        | 353       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 74800     |
|    time_elapsed       | 1978      |
|    total_timesteps    | 374000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -1.24     |
|    learning_rate      | 0.0007    |
|    n_updates          | 74799     |
|    policy_loss        | -0.000733 |
|    value_loss         | 5.24e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 74900    |
|    time_elapsed       | 1980     |
|    total_timesteps    | 374500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 76100    |
|    time_elapsed       | 2011     |
|    total_timesteps    | 380500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.379   |
|    learning_rate      | 0.0007   |
|    n_updates          | 76099    |
|    policy_loss        | -0.00586 |
|    value_loss         | 7.73e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 76200    |
|    time_elapsed       | 2014     |
|    total_timesteps    | 381000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 77400    |
|    time_elapsed       | 2044     |
|    total_timesteps    | 387000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0        |
|    learning_rate      | 0.0007   |
|    n_updates          | 77399    |
|    policy_loss        | 0.000263 |
|    value_loss         | 1.69e-08 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 77500    |
|    time_elapsed       | 2047     |
|    total_timesteps    | 387500   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.24e+03  |
|    ep_rew_mean        | 353       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 78700     |
|    time_elapsed       | 2078      |
|    total_timesteps    | 393500    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | 0.0544    |
|    learning_rate      | 0.0007    |
|    n_updates          | 78699     |
|    policy_loss        | -0.000296 |
|    value_loss         | 7.07e-07  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.25e+03  |
|    ep_rew_mean        | 352       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 78800     |
|    time_elapsed       | 2080      |
|    total_timesteps    | 394000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 80000    |
|    time_elapsed       | 2111     |
|    total_timesteps    | 400000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -71      |
|    learning_rate      | 0.0007   |
|    n_updates          | 79999    |
|    policy_loss        | -0.00641 |
|    value_loss         | 9.97e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 80100    |
|    time_elapsed       | 2114     |
|    total_timesteps    | 400500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 81300    |
|    time_elapsed       | 2144     |
|    total_timesteps    | 406500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.917   |
|    learning_rate      | 0.0007   |
|    n_updates          | 81299    |
|    policy_loss        | -0.0118  |
|    value_loss         | 3.71e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.22e+03  |
|    ep_rew_mean        | 350       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 81400     |
|    time_elapsed       | 2147      |
|    total_timesteps    | 407000    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.24e+03  |
|    ep_rew_mean        | 353       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 82600     |
|    time_elapsed       | 2178      |
|    total_timesteps    | 413000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -1.86e+05 |
|    learning_rate      | 0.0007    |
|    n_updates          | 82599     |
|    policy_loss        | -0.00282  |
|    value_loss         | 2.39e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 82700    |
|    time_elapsed       | 2180     |
|    total_timesteps    | 413500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 83900    |
|    time_elapsed       | 2211     |
|    total_timesteps    | 419500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -178     |
|    learning_rate      | 0.0007   |
|    n_updates          | 83899    |
|    policy_loss        | 0.00249  |
|    value_loss         | 2.65e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.26e+03  |
|    ep_rew_mean        | 351       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 84000     |
|    time_elapsed       | 2213      |
|    total_timesteps    | 420000    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.27e+03  |
|    ep_rew_mean        | 350       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 85200     |
|    time_elapsed       | 2244      |
|    total_timesteps    | 426000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -9.63e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 85199     |
|    policy_loss        | 0.000268  |
|    value_loss         | 4.41e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 85300    |
|    time_elapsed       | 2247     |
|    total_timesteps    | 426500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 86500    |
|    time_elapsed       | 2277     |
|    total_timesteps    | 432500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -0.235   |
|    learning_rate      | 0.0007   |
|    n_updates          | 86499    |
|    policy_loss        | -0.00837 |
|    value_loss         | 1.59e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 86600    |
|    time_elapsed       | 2280     |
|    total_timesteps    | 433000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.28e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 87800    |
|    time_elapsed       | 2311     |
|    total_timesteps    | 439000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0.728    |
|    learning_rate      | 0.0007   |
|    n_updates          | 87799    |
|    policy_loss        | -0.00897 |
|    value_loss         | 1.66e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.28e+03  |
|    ep_rew_mean        | 349       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 87900     |
|    time_elapsed       | 2313      |
|    total_timesteps    | 439500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 89100    |
|    time_elapsed       | 2344     |
|    total_timesteps    | 445500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | 0.287    |
|    learning_rate      | 0.0007   |
|    n_updates          | 89099    |
|    policy_loss        | -0.00517 |
|    value_loss         | 6.71e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 89200    |
|    time_elapsed       | 2347     |
|    total_timesteps    | 446000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 90400    |
|    time_elapsed       | 2378     |
|    total_timesteps    | 452000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -50.7    |
|    learning_rate      | 0.0007   |
|    n_updates          | 90399    |
|    policy_loss        | 0.00181  |
|    value_loss         | 8.89e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 90500    |
|    time_elapsed       | 2381     |
|    total_timesteps    | 452500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 344      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 91700    |
|    time_elapsed       | 2411     |
|    total_timesteps    | 458500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -2.41    |
|    learning_rate      | 0.0007   |
|    n_updates          | 91699    |
|    policy_loss        | 0.001    |
|    value_loss         | 4.63e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 344      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 91800    |
|    time_elapsed       | 2414     |
|    total_timesteps    | 459000   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.28e+03  |
|    ep_rew_mean        | 347       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 93000     |
|    time_elapsed       | 2445      |
|    total_timesteps    | 465000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -2.16e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 92999     |
|    policy_loss        | -0.0058   |
|    value_loss         | 8.49e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.28e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 93100    |
|    time_elapsed       | 2447     |
|    total_timesteps    | 465500   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.28e+03  |
|    ep_rew_mean        | 348       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 94300     |
|    time_elapsed       | 2478      |
|    total_timesteps    | 471500    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -0.627    |
|    learning_rate      | 0.0007    |
|    n_updates          | 94299     |
|    policy_loss        | -0.000148 |
|    value_loss         | 6.49e-09  |
-------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.28e+03  |
|    ep_rew_mean        | 348       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 94400     |
|    time_elapsed       | 2481      |
|    total_timesteps    | 472000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 95600    |
|    time_elapsed       | 2512     |
|    total_timesteps    | 478000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -48.8    |
|    learning_rate      | 0.0007   |
|    n_updates          | 95599    |
|    policy_loss        | -0.0107  |
|    value_loss         | 3.14e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.27e+03  |
|    ep_rew_mean        | 349       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 95700     |
|    time_elapsed       | 2514      |
|    total_timesteps    | 478500    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.27e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 96900    |
|    time_elapsed       | 2545     |
|    total_timesteps    | 484500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -6.96    |
|    learning_rate      | 0.0007   |
|    n_updates          | 96899    |
|    policy_loss        | 0.00322  |
|    value_loss         | 3.62e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.27e+03  |
|    ep_rew_mean        | 349       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 97000     |
|    time_elapsed       | 2548      |
|    total_timesteps    | 485000    |
| train/                |           |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 98200    |
|    time_elapsed       | 2579     |
|    total_timesteps    | 491000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -37.4    |
|    learning_rate      | 0.0007   |
|    n_updates          | 98199    |
|    policy_loss        | 0.00261  |
|    value_loss         | 1.72e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 350      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 98300    |
|    time_elapsed       | 2581     |
|    total_timesteps    | 491500   |
| train/                |          |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.3e+03   |
|    ep_rew_mean        | 349       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 99500     |
|    time_elapsed       | 2612      |
|    total_timesteps    | 497500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -63.8     |
|    learning_rate      | 0.0007    |
|    n_updates          | 99499     |
|    policy_loss        | -0.000258 |
|    value_loss         | 3.95e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.3e+03  |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 99600    |
|    time_elapsed       | 2615     |
|    total_timesteps    | 498000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 352      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 100800   |
|    time_elapsed       | 2645     |
|    total_timesteps    | 504000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -214     |
|    learning_rate      | 0.0007   |
|    n_updates          | 100799   |
|    policy_loss        | -0.00143 |
|    value_loss         | 1.22e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 100900   |
|    time_elapsed       | 2648     |
|    total_timesteps    | 504500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.3e+03  |
|    ep_rew_mean        | 352      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 102100   |
|    time_elapsed       | 2679     |
|    total_timesteps    | 510500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -6.86    |
|    learning_rate      | 0.0007   |
|    n_updates          | 102099   |
|    policy_loss        | 0.00103  |
|    value_loss         | 3.64e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.3e+03  |
|    ep_rew_mean        | 352      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 102200   |
|    time_elapsed       | 2682     |
|    total_timesteps    | 511000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 103400   |
|    time_elapsed       | 2712     |
|    total_timesteps    | 517000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0.194    |
|    learning_rate      | 0.0007   |
|    n_updates          | 103399   |
|    policy_loss        | -0.00769 |
|    value_loss         | 1.56e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 103500   |
|    time_elapsed       | 2715     |
|    total_timesteps    | 517500   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 104700   |
|    time_elapsed       | 2746     |
|    total_timesteps    | 523500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0        |
|    learning_rate      | 0.0007   |
|    n_updates          | 104699   |
|    policy_loss        | -0.00301 |
|    value_loss         | 2.3e-06  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 104800   |
|    time_elapsed       | 2748     |
|    total_timesteps    | 524000   |
| train/                |          |

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 106000   |
|    time_elapsed       | 2791     |
|    total_timesteps    | 530000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -3.43    |
|    learning_rate      | 0.0007   |
|    n_updates          | 105999   |
|    policy_loss        | -0.013   |
|    value_loss         | 3.67e-05 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.31e+03  |
|    ep_rew_mean        | 353       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 106100    |
|    time_elapsed       | 2794      |
|    total_timesteps    | 530500    |
| train/                |           |

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.31e+03  |
|    ep_rew_mean        | 355       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 107300    |
|    time_elapsed       | 2825      |
|    total_timesteps    | 536500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -85.1     |
|    learning_rate      | 0.0007    |
|    n_updates          | 107299    |
|    policy_loss        | -0.000979 |
|    value_loss         | 2.26e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.31e+03 |
|    ep_rew_mean        | 355      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 107400   |
|    time_elapsed       | 2827     |
|    total_timesteps    | 537000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.32e+03 |
|    ep_rew_mean        | 356      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 108600   |
|    time_elapsed       | 2858     |
|    total_timesteps    | 543000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -116     |
|    learning_rate      | 0.0007   |
|    n_updates          | 108599   |
|    policy_loss        | 0.00166  |
|    value_loss         | 7.61e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.32e+03 |
|    ep_rew_mean        | 356      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 108700   |
|    time_elapsed       | 2861     |
|    total_timesteps    | 543500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.35e+03 |
|    ep_rew_mean        | 360      |
| time/                 |          |
|    fps                | 189      |
|    iterations         | 109900   |
|    time_elapsed       | 2892     |
|    total_timesteps    | 549500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -232     |
|    learning_rate      | 0.0007   |
|    n_updates          | 109899   |
|    policy_loss        | -0.00133 |
|    value_loss         | 9.89e-07 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.34e+03  |
|    ep_rew_mean        | 359       |
| time/                 |           |
|    fps                | 189       |
|    iterations         | 110000    |
|    time_elapsed       | 2894      |
|    total_timesteps    | 550000    |
-------------------------------------

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.33e+03  |
|    ep_rew_mean        | 357       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 111200    |
|    time_elapsed       | 2925      |
|    total_timesteps    | 556000    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -4.62e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 111199    |
|    policy_loss        | -0.00186  |
|    value_loss         | 9.37e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.33e+03 |
|    ep_rew_mean        | 357      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 111300   |
|    time_elapsed       | 2928     |
|    total_timesteps    | 556500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 112500   |
|    time_elapsed       | 2959     |
|    total_timesteps    | 562500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -138     |
|    learning_rate      | 0.0007   |
|    n_updates          | 112499   |
|    policy_loss        | -0.00218 |
|    value_loss         | 1.85e-06 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.29e+03  |
|    ep_rew_mean        | 354       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 112600    |
|    time_elapsed       | 2962      |
|    total_timesteps    | 563000    |
-------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.29e+03 |
|    ep_rew_mean        | 354      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 113800   |
|    time_elapsed       | 2992     |
|    total_timesteps    | 569000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -186     |
|    learning_rate      | 0.0007   |
|    n_updates          | 113799   |
|    policy_loss        | 0.00186  |
|    value_loss         | 1.65e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.28e+03 |
|    ep_rew_mean        | 351      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 113900   |
|    time_elapsed       | 2995     |
|    total_timesteps    | 569500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 115100   |
|    time_elapsed       | 3026     |
|    total_timesteps    | 575500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -84.6    |
|    learning_rate      | 0.0007   |
|    n_updates          | 115099   |
|    policy_loss        | 0.00187  |
|    value_loss         | 1.11e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.25e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 115200   |
|    time_elapsed       | 3029     |
|    total_timesteps    | 576000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 116400   |
|    time_elapsed       | 3060     |
|    total_timesteps    | 582000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -0.879   |
|    learning_rate      | 0.0007   |
|    n_updates          | 116399   |
|    policy_loss        | -0.00114 |
|    value_loss         | 4.96e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 116500   |
|    time_elapsed       | 3062     |
|    total_timesteps    | 582500   |
------------------------------------

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.21e+03  |
|    ep_rew_mean        | 347       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 117800    |
|    time_elapsed       | 3096      |
|    total_timesteps    | 589000    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -5.81     |
|    learning_rate      | 0.0007    |
|    n_updates          | 117799    |
|    policy_loss        | -0.000879 |
|    value_loss         | 9.09e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 117900   |
|    time_elapsed       | 3098     |
|    total_timesteps    | 589500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 119100   |
|    time_elapsed       | 3129     |
|    total_timesteps    | 595500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | 0.377    |
|    learning_rate      | 0.0007   |
|    n_updates          | 119099   |
|    policy_loss        | -0.0126  |
|    value_loss         | 4.04e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 119200   |
|    time_elapsed       | 3132     |
|    total_timesteps    | 596000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.18e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 120500   |
|    time_elapsed       | 3165     |
|    total_timesteps    | 602500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -10.6    |
|    learning_rate      | 0.0007   |
|    n_updates          | 120499   |
|    policy_loss        | 0.0044   |
|    value_loss         | 6.18e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.18e+03 |
|    ep_rew_mean        | 347      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 120600   |
|    time_elapsed       | 3168     |
|    total_timesteps    | 603000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.2e+03  |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 121800   |
|    time_elapsed       | 3198     |
|    total_timesteps    | 609000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -80      |
|    learning_rate      | 0.0007   |
|    n_updates          | 121799   |
|    policy_loss        | 0.000118 |
|    value_loss         | 8.94e-08 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.2e+03  |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 121900   |
|    time_elapsed       | 3201     |
|    total_timesteps    | 609500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.19e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 123100   |
|    time_elapsed       | 3232     |
|    total_timesteps    | 615500   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.671   |
|    learning_rate      | 0.0007   |
|    n_updates          | 123099   |
|    policy_loss        | -0.00885 |
|    value_loss         | 1.84e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.19e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 123200   |
|    time_elapsed       | 3234     |
|    total_timesteps    | 616000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 352      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 124400   |
|    time_elapsed       | 3265     |
|    total_timesteps    | 622000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -4.18    |
|    learning_rate      | 0.0007   |
|    n_updates          | 124399   |
|    policy_loss        | -0.0061  |
|    value_loss         | 8.86e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 352      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 124500   |
|    time_elapsed       | 3268     |
|    total_timesteps    | 622500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 125800   |
|    time_elapsed       | 3302     |
|    total_timesteps    | 629000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -4.72    |
|    learning_rate      | 0.0007   |
|    n_updates          | 125799   |
|    policy_loss        | 0.00313  |
|    value_loss         | 2.13e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 125900   |
|    time_elapsed       | 3304     |
|    total_timesteps    | 629500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 353      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 127100   |
|    time_elapsed       | 3335     |
|    total_timesteps    | 635500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0        |
|    learning_rate      | 0.0007   |
|    n_updates          | 127099   |
|    policy_loss        | -0.00193 |
|    value_loss         | 7.71e-07 |
------------------------------------
-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.25e+03  |
|    ep_rew_mean        | 352       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 127200    |
|    time_elapsed       | 3338      |
|    total_timesteps    | 636000    |
-------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 128400   |
|    time_elapsed       | 3368     |
|    total_timesteps    | 642000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | -0.0992  |
|    learning_rate      | 0.0007   |
|    n_updates          | 128399   |
|    policy_loss        | 0.00164  |
|    value_loss         | 1.45e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.22e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 128500   |
|    time_elapsed       | 3371     |
|    total_timesteps    | 642500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 129700   |
|    time_elapsed       | 3402     |
|    total_timesteps    | 648500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -0.0118  |
|    learning_rate      | 0.0007   |
|    n_updates          | 129699   |
|    policy_loss        | 0.00158  |
|    value_loss         | 1.38e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 129800   |
|    time_elapsed       | 3404     |
|    total_timesteps    | 649000   |
------------------------------------

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.24e+03  |
|    ep_rew_mean        | 349       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 131000    |
|    time_elapsed       | 3435      |
|    total_timesteps    | 655000    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -1.36e+03 |
|    learning_rate      | 0.0007    |
|    n_updates          | 130999    |
|    policy_loss        | -0.00385  |
|    value_loss         | 6.17e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 131100   |
|    time_elapsed       | 3438     |
|    total_timesteps    | 655500   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 345      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 132300   |
|    time_elapsed       | 3469     |
|    total_timesteps    | 661500   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -57.2    |
|    learning_rate      | 0.0007   |
|    n_updates          | 132299   |
|    policy_loss        | -0.00795 |
|    value_loss         | 1.62e-05 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 345      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 132400   |
|    time_elapsed       | 3471     |
|    total_timesteps    | 662000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 133600   |
|    time_elapsed       | 3502     |
|    total_timesteps    | 668000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | 0.206    |
|    learning_rate      | 0.0007   |
|    n_updates          | 133599   |
|    policy_loss        | -0.0114  |
|    value_loss         | 3.3e-05  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.21e+03 |
|    ep_rew_mean        | 346      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 133700   |
|    time_elapsed       | 3505     |
|    total_timesteps    | 668500   |
------------------------------------

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.23e+03  |
|    ep_rew_mean        | 348       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 134900    |
|    time_elapsed       | 3535      |
|    total_timesteps    | 674500    |
| train/                |           |
|    entropy_loss       | -2.2      |
|    explained_variance | -5.23e+04 |
|    learning_rate      | 0.0007    |
|    n_updates          | 134899    |
|    policy_loss        | -0.00135  |
|    value_loss         | 2.18e-06  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.23e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 135000   |
|    time_elapsed       | 3538     |
|    total_timesteps    | 675000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.23e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 136200   |
|    time_elapsed       | 3569     |
|    total_timesteps    | 681000   |
| train/                |          |
|    entropy_loss       | -2.2     |
|    explained_variance | -196     |
|    learning_rate      | 0.0007   |
|    n_updates          | 136199   |
|    policy_loss        | -0.00062 |
|    value_loss         | 3.05e-07 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.23e+03 |
|    ep_rew_mean        | 349      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 136300   |
|    time_elapsed       | 3571     |
|    total_timesteps    | 681500   |
------------------------------------

-------------------------------------
| rollout/              |           |
|    ep_len_mean        | 5.24e+03  |
|    ep_rew_mean        | 348       |
| time/                 |           |
|    fps                | 190       |
|    iterations         | 137500    |
|    time_elapsed       | 3602      |
|    total_timesteps    | 687500    |
| train/                |           |
|    entropy_loss       | -2.19     |
|    explained_variance | -5.9e+03  |
|    learning_rate      | 0.0007    |
|    n_updates          | 137499    |
|    policy_loss        | -0.000444 |
|    value_loss         | 9.11e-07  |
-------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.24e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 137600   |
|    time_elapsed       | 3605     |
|    total_timesteps    | 688000   |
------------------------------------

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 138800   |
|    time_elapsed       | 3636     |
|    total_timesteps    | 694000   |
| train/                |          |
|    entropy_loss       | -2.19    |
|    explained_variance | 0.359    |
|    learning_rate      | 0.0007   |
|    n_updates          | 138799   |
|    policy_loss        | -0.00422 |
|    value_loss         | 5.47e-06 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 5.26e+03 |
|    ep_rew_mean        | 348      |
| time/                 |          |
|    fps                | 190      |
|    iterations         | 138900   |
|    time_elapsed       | 3638     |
|    total_timesteps    | 694500   |
------------------------------------

New best mean reward!
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 5.26e+03 |
|    ep_rew_mean     | 348      |
| time/              |          |
|    fps             | 189      |
|    iterations      | 140000   |
|    time_elapsed    | 3698     |
|    total_timesteps | 700000   |
---------------------------------
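The tables printed during training follow a simple `| key | value |` layout, so the final metrics can be pulled out of captured output when comparing models. The following is a minimal sketch, assuming the log text has already been captured into a string; `parse_log_table` is a hypothetical helper written for illustration and is not part of Stable-Baselines3.

```python
def parse_log_table(text):
    """Parse one '| key | value |' table into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        # Skip borders, blank lines, and section headers such as 'rollout/'.
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            # Ignore rows whose value is not numeric.
            continue
    return metrics


# Sample copied from the last table of the run above.
sample = """
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 5.26e+03 |
|    ep_rew_mean     | 348      |
| time/              |          |
|    fps             | 189      |
|    iterations      | 140000   |
|    time_elapsed    | 3698     |
|    total_timesteps | 700000   |
---------------------------------
"""
final = parse_log_table(sample)
print(final["ep_rew_mean"], final["total_timesteps"])
```

Rows that were cut off in the middle of a value would still be parsed as written, so a captured log should be checked for truncation before its numbers are trusted.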


TypeError: __init__() got an unexpected keyword argument 'cliprange'
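The `TypeError` above is what Python raises when a constructor receives a keyword it does not define. The next cell passes `clip_range` instead of `cliprange`, and its training log reports a `clip_range` entry. As a minimal sketch, keyword names can be checked against a constructor's signature before a long run starts. `StandInPPO` and `unknown_kwargs` are hypothetical names used only for illustration, so the snippet runs without Stable-Baselines3; the default values in the stand-in are placeholders, not the library's.

```python
import inspect


class StandInPPO:
    """Hypothetical stand-in that mimics a few keyword names from the next cell."""

    def __init__(self, policy, env, clip_range=0.2, ent_coef=0.0, vf_coef=0.5):
        self.clip_range = clip_range


def unknown_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ does not accept."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts any name, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


print(unknown_kwargs(StandInPPO, {"cliprange": 0.1, "ent_coef": 0.01}))
print(unknown_kwargs(StandInPPO, {"clip_range": 0.1, "ent_coef": 0.01}))
```

The same check should apply to any class whose `__init__` lists its parameters explicitly; it cannot flag anything for constructors that forward `**kwargs`.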

In [7]:

env.reset()
eval_env.reset()

# Instantiate the learning algorithm
model = PPO('MlpPolicy', env, verbose=1, vf_coef=0.5, ent_coef=0.01,
            create_eval_env=True, batch_size=256, clip_range=0.1,
            tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_ppo")

# Train the algorithm, evaluating on eval_env every 175000 timesteps
model.learn(total_timesteps=700000, eval_env=eval_env, eval_freq=175000, n_eval_episodes=1)

# Save the trained model to disk and drop the in-memory reference
model.save("modelos/ppo/recigio_beamrider_700k")
del model

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to /tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_ppo\PPO_1
-----------------------------
| time/              |      |
|    fps             | 403  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 219         |
|    iterations           | 2           |
|    time_elapsed         | 18          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.002402385 |
|    clip_fraction        | 0.116       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.2        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 7.03e+03    |
|    ep_rew_mean          | 411         |
| time/                   |             |
|    fps                  | 165         |
|    iterations           | 11          |
|    time_elapsed         | 135         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.002131322 |
|    clip_fraction        | 0.121       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.18       |
|    explained_variance   | -0.0422     |
|    learning_rate        | 0.0003      |
|    loss                 | 11.4        |
|    n_updates            | 100         |
|    policy_gradient_loss | -0.00927    |
|    value_loss           | 26.8        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.42e+03     |
|    ep_rew_mean          | 387          |
| time/                   |              |
|    fps                  | 161          |
|    iterations           | 21           |
|    time_elapsed         | 266          |
|    total_timesteps      | 43008        |
| train/                  |              |
|    approx_kl            | 0.0061480897 |
|    clip_fraction        | 0.227        |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.16        |
|    explained_variance   | -5.37        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.0396       |
|    n_updates            | 200          |
|    policy_gradient_loss | -0.0114      |
|    value_loss           | 0.355        |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.85e+03     |
|    ep_rew_mean          | 435          |
| time/                   |              |
|    fps                  | 160          |
|    iterations           | 31           |
|    time_elapsed         | 396          |
|    total_timesteps      | 63488        |
| train/                  |              |
|    approx_kl            | 0.0025642086 |
|    clip_fraction        | 0.075        |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.15        |
|    explained_variance   | -0.111       |
|    learning_rate        | 0.0003       |
|    loss                 | 20.7         |
|    n_updates            | 300          |
|    policy_gradient_loss | -0.0107      |
|    value_loss           | 35           |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 6.34e+03   |
|    ep_rew_mean          | 423        |
| time/                   |            |
|    fps                  | 159        |
|    iterations           | 41         |
|    time_elapsed         | 526        |
|    total_timesteps      | 83968      |
| train/                  |            |
|    approx_kl            | 0.00283902 |
|    clip_fraction        | 0.122      |
|    clip_range           | 0.1        |
|    entropy_loss         | -2.13      |
|    explained_variance   | 0.024      |
|    learning_rate        | 0.0003     |
|    loss                 | 13.4       |
|    n_updates            | 400        |
|    policy_gradient_loss | -0.0118    |
|    value_loss           | 31.4       |
----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.34e+03     |
|    ep_rew_mean          | 407          |
| time/                   |              |
|    fps                  | 158          |
|    iterations           | 51           |
|    time_elapsed         | 657          |
|    total_timesteps      | 104448       |
| train/                  |              |
|    approx_kl            | 0.0041959477 |
|    clip_fraction        | 0.221        |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.08        |
|    explained_variance   | 0.00651      |
|    learning_rate        | 0.0003       |
|    loss                 | 19.1         |
|    n_updates            | 500          |
|    policy_gradient_loss | -0.0113      |
|    value_loss           | 29.2         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.36e+03     |
|    ep_rew_mean          | 403          |
| time/                   |              |
|    fps                  | 158          |
|    iterations           | 61           |
|    time_elapsed         | 788          |
|    total_timesteps      | 124928       |
| train/                  |              |
|    approx_kl            | 0.0042413166 |
|    clip_fraction        | 0.193        |
|    clip_range           | 0.1          |
|    entropy_loss         | -2.05        |
|    explained_variance   | -0.00532     |
|    learning_rate        | 0.0003       |
|    loss                 | 30.4         |
|    n_updates            | 600          |
|    policy_gradient_loss | -0.0135      |
|    value_loss           | 37.3         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.3e+03     |
|    ep_rew_mean          | 407         |
| time/                   |             |
|    fps                  | 158         |
|    iterations           | 71          |
|    time_elapsed         | 919         |
|    total_timesteps      | 145408      |
| train/                  |             |
|    approx_kl            | 0.003932124 |
|    clip_fraction        | 0.122       |
|    clip_range           | 0.1         |
|    entropy_loss         | -2.05       |
|    explained_variance   | -0.0292     |
|    learning_rate        | 0.0003      |
|    loss                 | 13.2        |
|    n_updates            | 700         |
|    policy_gradient_loss | -0.0109     |
|    value_loss           | 30.2        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.3e+03      |
|    ep_rew_mean          | 420          |
| time/                   |              |
|    fps                  | 157          |
|    iterations           | 81           |
|    time_elapsed         | 1050         |
|    total_timesteps      | 165888       |
| train/                  |              |
|    approx_kl            | 0.0070505133 |
|    clip_fraction        | 0.235        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.99        |
|    explained_variance   | -0.0206      |
|    learning_rate        | 0.0003       |
|    loss                 | 0.431        |
|    n_updates            | 800          |
|    policy_gradient_loss | -0.00644     |
|    value_loss           | 8.32         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.14e+03    |
|    ep_rew_mean          | 420         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 90          |
|    time_elapsed         | 1182        |
|    total_timesteps      | 184320      |
| train/                  |             |
|    approx_kl            | 0.004627493 |
|    clip_fraction        | 0.207       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.94       |
|    explained_variance   | 0.00594     |
|    learning_rate        | 0.0003      |
|    loss                 | 10.8        |
|    n_updates            | 890         |
|    policy_gradient_loss | -0.0101     |
|    value_loss           | 27.4        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.19e+03     |
|    ep_rew_mean          | 428          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 100          |
|    time_elapsed         | 1313         |
|    total_timesteps      | 204800       |
| train/                  |              |
|    approx_kl            | 0.0061687436 |
|    clip_fraction        | 0.214        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.93        |
|    explained_variance   | -0.00206     |
|    learning_rate        | 0.0003       |
|    loss                 | 19.8         |
|    n_updates            | 990          |
|    policy_gradient_loss | -0.0109      |
|    value_loss           | 30.4         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.2e+03     |
|    ep_rew_mean          | 435         |
| time/                   |             |
|    fps                  | 156         |
|    iterations           | 110         |
|    time_elapsed         | 1443        |
|    total_timesteps      | 225280      |
| train/                  |             |
|    approx_kl            | 0.005265076 |
|    clip_fraction        | 0.194       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.94       |
|    explained_variance   | -0.0175     |
|    learning_rate        | 0.0003      |
|    loss                 | 5.26        |
|    n_updates            | 1090        |
|    policy_gradient_loss | -0.0077     |
|    value_loss           | 8.3         |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.28e+03     |
|    ep_rew_mean          | 437          |
| time/                   |              |
|    fps                  | 156          |
|    iterations           | 120          |
|    time_elapsed         | 1573         |
|    total_timesteps      | 245760       |
| train/                  |              |
|    approx_kl            | 0.0060389987 |
|    clip_fraction        | 0.253        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.94        |
|    explained_variance   | -0.00123     |
|    learning_rate        | 0.0003       |
|    loss                 | 8.43         |
|    n_updates            | 1190         |
|    policy_gradient_loss | -0.00723     |
|    value_loss           | 25.6         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.26e+03    |
|    ep_rew_mean          | 445         |
| time/                   |             |
|    fps                  | 156         |
|    iterations           | 130         |
|    time_elapsed         | 1703        |
|    total_timesteps      | 266240      |
| train/                  |             |
|    approx_kl            | 0.005496812 |
|    clip_fraction        | 0.164       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.93       |
|    explained_variance   | 0.00377     |
|    learning_rate        | 0.0003      |
|    loss                 | 35.4        |
|    n_updates            | 1290        |
|    policy_gradient_loss | -0.00611    |
|    value_loss           | 53.6        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.21e+03    |
|    ep_rew_mean          | 438         |
| time/                   |             |
|    fps                  | 156         |
|    iterations           | 140         |
|    time_elapsed         | 1833        |
|    total_timesteps      | 286720      |
| train/                  |             |
|    approx_kl            | 0.005093068 |
|    clip_fraction        | 0.224       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.00378     |
|    learning_rate        | 0.0003      |
|    loss                 | 4           |
|    n_updates            | 1390        |
|    policy_gradient_loss | -0.00797    |
|    value_loss           | 15.5        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.28e+03    |
|    ep_rew_mean          | 444         |
| time/                   |             |
|    fps                  | 156         |
|    iterations           | 150         |
|    time_elapsed         | 1964        |
|    total_timesteps      | 307200      |
| train/                  |             |
|    approx_kl            | 0.004257547 |
|    clip_fraction        | 0.117       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.86       |
|    explained_variance   | -0.0794     |
|    learning_rate        | 0.0003      |
|    loss                 | 13.4        |
|    n_updates            | 1490        |
|    policy_gradient_loss | -0.00624    |
|    value_loss           | 22.9        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.17e+03    |
|    ep_rew_mean          | 440         |
| time/                   |             |
|    fps                  | 156         |
|    iterations           | 160         |
|    time_elapsed         | 2094        |
|    total_timesteps      | 327680      |
| train/                  |             |
|    approx_kl            | 0.005738699 |
|    clip_fraction        | 0.2         |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.79       |
|    explained_variance   | 0.00358     |
|    learning_rate        | 0.0003      |
|    loss                 | 23.6        |
|    n_updates            | 1590        |
|    policy_gradient_loss | -0.0023     |
|    value_loss           | 37.1        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.37e+03     |
|    ep_rew_mean          | 444          |
| time/                   |              |
|    fps                  | 156          |
|    iterations           | 170          |
|    time_elapsed         | 2225         |
|    total_timesteps      | 348160       |
| train/                  |              |
|    approx_kl            | 0.0051222215 |
|    clip_fraction        | 0.165        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.76        |
|    explained_variance   | 0.000783     |
|    learning_rate        | 0.0003       |
|    loss                 | 21.2         |
|    n_updates            | 1690         |
|    policy_gradient_loss | -0.00554     |
|    value_loss           | 30.6         |
------------------------------------------
Eval num_timesteps=350000, episode_reward=616.00 +/- 0.00
Episode length: 10413.00 +/- 0.00
-----

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.29e+03    |
|    ep_rew_mean          | 442         |
| time/                   |             |
|    fps                  | 154         |
|    iterations           | 179         |
|    time_elapsed         | 2365        |
|    total_timesteps      | 366592      |
| train/                  |             |
|    approx_kl            | 0.003102325 |
|    clip_fraction        | 0.117       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.71       |
|    explained_variance   | 0.00564     |
|    learning_rate        | 0.0003      |
|    loss                 | 22.1        |
|    n_updates            | 1780        |
|    policy_gradient_loss | -0.00709    |
|    value_loss           | 36.7        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.21e+03    |
|    ep_rew_mean          | 442         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 189         |
|    time_elapsed         | 2495        |
|    total_timesteps      | 387072      |
| train/                  |             |
|    approx_kl            | 0.004653491 |
|    clip_fraction        | 0.179       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.71       |
|    explained_variance   | 0.00522     |
|    learning_rate        | 0.0003      |
|    loss                 | 3.15        |
|    n_updates            | 1880        |
|    policy_gradient_loss | -0.00654    |
|    value_loss           | 15.4        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.22e+03    |
|    ep_rew_mean          | 439         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 199         |
|    time_elapsed         | 2625        |
|    total_timesteps      | 407552      |
| train/                  |             |
|    approx_kl            | 0.004633789 |
|    clip_fraction        | 0.139       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.73       |
|    explained_variance   | 0.0015      |
|    learning_rate        | 0.0003      |
|    loss                 | 33.9        |
|    n_updates            | 1980        |
|    policy_gradient_loss | -0.00657    |
|    value_loss           | 46.5        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.29e+03     |
|    ep_rew_mean          | 447          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 209          |
|    time_elapsed         | 2756         |
|    total_timesteps      | 428032       |
| train/                  |              |
|    approx_kl            | 0.0031812196 |
|    clip_fraction        | 0.0921       |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.64        |
|    explained_variance   | 0.00307      |
|    learning_rate        | 0.0003       |
|    loss                 | 27.8         |
|    n_updates            | 2080         |
|    policy_gradient_loss | -0.00531     |
|    value_loss           | 64           |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.27e+03     |
|    ep_rew_mean          | 446          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 219          |
|    time_elapsed         | 2887         |
|    total_timesteps      | 448512       |
| train/                  |              |
|    approx_kl            | 0.0062294807 |
|    clip_fraction        | 0.167        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.67        |
|    explained_variance   | 0.00497      |
|    learning_rate        | 0.0003       |
|    loss                 | 18.3         |
|    n_updates            | 2180         |
|    policy_gradient_loss | -0.00581     |
|    value_loss           | 30.2         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.27e+03     |
|    ep_rew_mean          | 450          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 229          |
|    time_elapsed         | 3017         |
|    total_timesteps      | 468992       |
| train/                  |              |
|    approx_kl            | 0.0046095415 |
|    clip_fraction        | 0.168        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.63        |
|    explained_variance   | -0.00213     |
|    learning_rate        | 0.0003       |
|    loss                 | 7.95         |
|    n_updates            | 2280         |
|    policy_gradient_loss | -0.0043      |
|    value_loss           | 15.4         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.19e+03     |
|    ep_rew_mean          | 445          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 239          |
|    time_elapsed         | 3148         |
|    total_timesteps      | 489472       |
| train/                  |              |
|    approx_kl            | 0.0058690878 |
|    clip_fraction        | 0.163        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.69        |
|    explained_variance   | 0.00305      |
|    learning_rate        | 0.0003       |
|    loss                 | 16.6         |
|    n_updates            | 2380         |
|    policy_gradient_loss | -0.00696     |
|    value_loss           | 37.3         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.21e+03    |
|    ep_rew_mean          | 442         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 249         |
|    time_elapsed         | 3279        |
|    total_timesteps      | 509952      |
| train/                  |             |
|    approx_kl            | 0.007153285 |
|    clip_fraction        | 0.173       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.69       |
|    explained_variance   | -0.147      |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0205     |
|    n_updates            | 2480        |
|    policy_gradient_loss | -0.00298    |
|    value_loss           | 0.148       |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.22e+03     |
|    ep_rew_mean          | 443          |
| time/                   |              |
|    fps                  | 154          |
|    iterations           | 258          |
|    time_elapsed         | 3409         |
|    total_timesteps      | 528384       |
| train/                  |              |
|    approx_kl            | 0.0035657876 |
|    clip_fraction        | 0.192        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.68        |
|    explained_variance   | 0.00295      |
|    learning_rate        | 0.0003       |
|    loss                 | 14.7         |
|    n_updates            | 2570         |
|    policy_gradient_loss | -0.00421     |
|    value_loss           | 37.2         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.2e+03      |
|    ep_rew_mean          | 444          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 268          |
|    time_elapsed         | 3539         |
|    total_timesteps      | 548864       |
| train/                  |              |
|    approx_kl            | 0.0037116464 |
|    clip_fraction        | 0.109        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.58        |
|    explained_variance   | 0.0041       |
|    learning_rate        | 0.0003       |
|    loss                 | 19.4         |
|    n_updates            | 2670         |
|    policy_gradient_loss | -0.00519     |
|    value_loss           | 44.3         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.21e+03    |
|    ep_rew_mean          | 451         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 278         |
|    time_elapsed         | 3670        |
|    total_timesteps      | 569344      |
| train/                  |             |
|    approx_kl            | 0.004971208 |
|    clip_fraction        | 0.121       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.57       |
|    explained_variance   | 0.00304     |
|    learning_rate        | 0.0003      |
|    loss                 | 20.9        |
|    n_updates            | 2770        |
|    policy_gradient_loss | -0.00632    |
|    value_loss           | 58.6        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.13e+03    |
|    ep_rew_mean          | 451         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 288         |
|    time_elapsed         | 3800        |
|    total_timesteps      | 589824      |
| train/                  |             |
|    approx_kl            | 0.005768998 |
|    clip_fraction        | 0.216       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.53       |
|    explained_variance   | 0.00689     |
|    learning_rate        | 0.0003      |
|    loss                 | 16.2        |
|    n_updates            | 2870        |
|    policy_gradient_loss | -0.00781    |
|    value_loss           | 29.9        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.06e+03     |
|    ep_rew_mean          | 449          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 298          |
|    time_elapsed         | 3931         |
|    total_timesteps      | 610304       |
| train/                  |              |
|    approx_kl            | 0.0060197874 |
|    clip_fraction        | 0.145        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.53        |
|    explained_variance   | -0.0168      |
|    learning_rate        | 0.0003       |
|    loss                 | 8.6          |
|    n_updates            | 2970         |
|    policy_gradient_loss | -0.00557     |
|    value_loss           | 24.2         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.04e+03    |
|    ep_rew_mean          | 453         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 308         |
|    time_elapsed         | 4063        |
|    total_timesteps      | 630784      |
| train/                  |             |
|    approx_kl            | 0.005177203 |
|    clip_fraction        | 0.0929      |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.48       |
|    explained_variance   | -5.53e-05   |
|    learning_rate        | 0.0003      |
|    loss                 | 20.1        |
|    n_updates            | 3070        |
|    policy_gradient_loss | -0.00318    |
|    value_loss           | 59.1        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6e+03       |
|    ep_rew_mean          | 456         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 318         |
|    time_elapsed         | 4193        |
|    total_timesteps      | 651264      |
| train/                  |             |
|    approx_kl            | 0.007301066 |
|    clip_fraction        | 0.158       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.44       |
|    explained_variance   | -0.0023     |
|    learning_rate        | 0.0003      |
|    loss                 | 18.7        |
|    n_updates            | 3170        |
|    policy_gradient_loss | -0.00635    |
|    value_loss           | 25.2        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 6.01e+03    |
|    ep_rew_mean          | 459         |
| time/                   |             |
|    fps                  | 155         |
|    iterations           | 328         |
|    time_elapsed         | 4323        |
|    total_timesteps      | 671744      |
| train/                  |             |
|    approx_kl            | 0.005020452 |
|    clip_fraction        | 0.173       |
|    clip_range           | 0.1         |
|    entropy_loss         | -1.37       |
|    explained_variance   | 0.00549     |
|    learning_rate        | 0.0003      |
|    loss                 | 15.4        |
|    n_updates            | 3270        |
|    policy_gradient_loss | -0.00376    |
|    value_loss           | 36.7        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 6.03e+03     |
|    ep_rew_mean          | 463          |
| time/                   |              |
|    fps                  | 155          |
|    iterations           | 338          |
|    time_elapsed         | 4454         |
|    total_timesteps      | 692224       |
| train/                  |              |
|    approx_kl            | 0.0073981676 |
|    clip_fraction        | 0.205        |
|    clip_range           | 0.1          |
|    entropy_loss         | -1.32        |
|    explained_variance   | -0.000654    |
|    learning_rate        | 0.0003       |
|    loss                 | 4.68         |
|    n_updates            | 3370         |
|    policy_gradient_loss | -0.00306     |
|    value_loss           | 8            |
------------------------------------------
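
The counters in the PPO tables above can be cross-checked against each other. The sketch below assumes a single environment with `n_steps=2048` and `n_epochs=10` (the Stable-Baselines3 PPO defaults); these values are not printed in the log and are inferred here only because they reproduce the logged numbers. The rows in `LOGGED` are copied from the tables above.

```python
# Assumed rollout settings (SB3 PPO defaults, one environment); not shown in the log.
N_STEPS = 2048
N_EPOCHS = 10

# (iterations, total_timesteps, n_updates) copied from the PPO log above.
LOGGED = [
    (51, 104448, 500),
    (61, 124928, 600),
    (90, 184320, 890),
    (338, 692224, 3370),
]


def expected_counters(iterations, n_steps=N_STEPS, n_epochs=N_EPOCHS):
    """Return the (total_timesteps, n_updates) pair expected for a log table."""
    total_timesteps = iterations * n_steps
    # The logged values are consistent with the table being written before the
    # current iteration's gradient update, hence (iterations - 1).
    n_updates = (iterations - 1) * n_epochs
    return total_timesteps, n_updates


for iterations, timesteps, updates in LOGGED:
    assert expected_counters(iterations) == (timesteps, updates)
    print(iterations, timesteps, updates, "consistent")
```

Under these assumptions, the run of 700,000 timesteps corresponds to roughly 342 rollouts of 2048 steps each.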

In [9]:

env.reset()
eval_env.reset()

# instantiate the learning algorithm
model = DQN(
    'MlpPolicy',
    env,
    verbose=1,
    batch_size=128,
    buffer_size=10000,
    gradient_steps=1,
    exploration_fraction=0.1,
    create_eval_env=True,
    tensorboard_log="/tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_dqn",
)

# train the algorithm, evaluating one episode every 175,000 timesteps
model.learn(total_timesteps=700000, eval_env=eval_env, eval_freq=175000, n_eval_episodes=1)

model.save("modelos/qdn/recigio_beamrider_700k")
del model

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Logging to /tmp/stable-baselines/BeamRiderNoFrameskip-v4/custon2_dqn\DQN_1
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.71e+03 |
|    ep_rew_mean      | 363      |
|    exploration rate | 0.69     |
| time/               |          |
|    episodes         | 4        |
|    fps              | 1021     |
|    time_elapsed     | 22       |
|    total timesteps  | 22823    |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.38e+03 |
|    ep_rew_mean      | 396      |
|    exploration rate | 0.416    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 1026     |
|    t

New best mean reward!
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.89e+03 |
|    ep_rew_mean      | 401      |
|    exploration rate | 0.05     |
| time/               |          |
|    episodes         | 60       |
|    fps              | 114      |
|    time_elapsed     | 3095     |
|    total timesteps  | 353182   |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.34     |
|    n_updates        | 75795    |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.84e+03 |
|    ep_rew_mean      | 396      |
|    exploration rate | 0.05     |
| time/               |          |
|    episodes         | 64       |
|    fps              | 113      |
|    time_elapsed     | 3293     |
|    total timesteps  | 373828   |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.000146 

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.48e+03 |
|    ep_rew_mean      | 360      |
|    exploration rate | 0.05     |
| time/               |          |
|    episodes         | 120      |
|    fps              | 107      |
|    time_elapsed     | 6180     |
|    total timesteps  | 663616   |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.000291 |
|    n_updates        | 153403   |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 5.45e+03 |
|    ep_rew_mean      | 353      |
|    exploration rate | 0.05     |
| time/               |          |
|    episodes         | 124      |
|    fps              | 107      |
|    time_elapsed     | 6359     |
|    total timesteps  | 681625   |
| train/              |          |
|    learning_rate    | 0.0001   |
|    loss             | 0.336    |
|    n_updates      
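
The `exploration rate` column in the DQN tables above follows a linear epsilon schedule. The sketch below is a minimal re-implementation of that schedule, not the library code itself. It uses `exploration_fraction=0.1` and `total_timesteps=700000` from the training cell, and assumes the Stable-Baselines3 defaults `exploration_initial_eps=1.0` and `exploration_final_eps=0.05`, which are not set explicitly in the cell. The function name `linear_epsilon` is illustrative.

```python
def linear_epsilon(step, total_timesteps=700_000, exploration_fraction=0.1,
                   initial_eps=1.0, final_eps=0.05):
    """Epsilon after `step` timesteps under a linear decay schedule.

    Epsilon decays from initial_eps to final_eps over the first
    exploration_fraction of training, then stays at final_eps.
    """
    progress = step / total_timesteps
    if progress >= exploration_fraction:
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


# Timesteps taken from the DQN log tables above.
print(round(linear_epsilon(22823), 2))   # about 0.69, as in the first table
print(linear_epsilon(353182))            # 0.05, exploration phase already over
```

With these settings the exploration phase ends after 70,000 timesteps, so about 90% of the run uses the final epsilon of 0.05.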



#### Testing the new results

In [13]:
import gym
import time
from stable_baselines3 import A2C, PPO, DQN
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

# load the environment
env = gym.make('BeamRiderNoFrameskip-v4')

# load the trained A2C model from disk
model = A2C.load("modelos/a2c/recigio_beamrider_700k", env=env)

# evaluate for 10 episodes; returns the mean and standard deviation of the episode rewards
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, render=True)

print(mean_reward)
print(std_reward)


Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
756.0
0.0


In [None]:
# The agent found a spot it considered ideal for sheltering from enemy fire while still hitting a few enemies,
# instead of moving around.

In [1]:
import gym
import time
from stable_baselines3 import A2C,PPO,DQN
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

# load the environment
env = gym.make('BeamRiderNoFrameskip-v4')

model = PPO.load("modelos/ppo/recigio_beamrider_700k", env=env)

# import the evaluation helper and evaluate for 10 episodes
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, render=True)

print(mean_reward)
print(std_reward)


Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.




176.0
0.0


In [None]:
# The agent tried to find a movement pattern that scored more points, without actually watching the enemies.

In [2]:
import gym
import time
from stable_baselines3 import A2C,PPO,DQN
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

# load the environment
env = gym.make('BeamRiderNoFrameskip-v4')

model = DQN.load("modelos/qdn/recigio_beamrider_700k", env=env)

# import the evaluation helper and evaluate for 10 episodes
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, render=True)

print(mean_reward)
print(std_reward)


Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
39.6
60.4899991734171


In [None]:
# Noted as slightly better than PPO, although the mean reward printed above (39.6) is lower than PPO's (176.0).
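
The `0.0` standard deviations printed for A2C and PPO deserve a second look. `evaluate_policy` uses deterministic actions by default, so if every episode also starts from the same state, all ten episodes can replay the same trajectory and return the same score. The sketch below uses only the standard library and hypothetical per-episode rewards (they are not taken from the runs above) to show how the two printed statistics behave; `summarize` is an illustrative helper, not part of Stable-Baselines3.

```python
from statistics import mean, pstdev


def summarize(episode_rewards):
    """Return (mean, population standard deviation) of a list of episode rewards."""
    return mean(episode_rewards), pstdev(episode_rewards)


# Hypothetical case 1: ten identical episodes, as a deterministic policy
# replaying the same trajectory would produce.
identical = [756.0] * 10

# Hypothetical case 2: three scoring episodes and seven with no score.
mixed = [132.0] * 3 + [0.0] * 7

a2c_like = summarize(identical)
dqn_like = summarize(mixed)
print(a2c_like)
print(dqn_like)
```

The second list is only one set of values that happens to be consistent with the DQN statistics printed above (mean 39.6, standard deviation about 60.49). To see the real per-episode values, `evaluate_policy` accepts `return_episode_rewards=True` and then returns the reward and length lists instead of the two aggregates.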

### Results

In [26]:
# The 700k-step results were somewhat better. However, some hyperparameters were also changed
# for these runs in an attempt to reach better results. This particular scenario seems to need
# better tuning. In hindsight, the problem may not be overfitting as such, but parameterization.
# It also looks like many more training steps would be needed.

![mean_reward](graficos/700k_1.png)

![mean_reward](graficos/700k_2.png)

### Training with RL Baselines3 Zoo

In [None]:
# Due to some bug, the execution logs of the training runs were deleted.

In [14]:
%cd rl-baselines3-zoo/

C:\xampp\htdocs\posfurb\reinforcementlearning\rl-baselines3-zoo


#### A2C - Training with 2M steps

In [None]:
!python train.py --algo a2c --env BeamRiderNoFrameskip-v4 --tensorboard-log /tmp/stable-baselines/ -n 2000000 --save-freq 200000 --eval-freq 200000 --eval-episodes 10 --n-eval-envs 10

#### PPO - Training with 2M steps

In [None]:
# due to a bug, the output was written half in each cell
!python train.py --algo ppo --env BeamRiderNoFrameskip-v4 --tensorboard-log /tmp/stable-baselines/ -n 2000000 --save-freq 200000 --eval-freq 200000 --eval-episodes 10 --n-eval-envs 10

#### DQN - Training with 2M steps

In [None]:
!python train.py --algo dqn --env BeamRiderNoFrameskip-v4 --tensorboard-log /tmp/stable-baselines/ -n 2000000 --save-freq 200000 --eval-freq 200000 --eval-episodes 10 --n-eval-envs 10
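
The three training commands differ only in the `--algo` value. As a minimal sketch, the argument list can be built once and reused; `zoo_train_command` and its parameter names are hypothetical helpers introduced here, while the flags themselves are copied from the cells above.

```python
import shlex

ENV_ID = "BeamRiderNoFrameskip-v4"


def zoo_train_command(algo, timesteps=2_000_000, checkpoint_every=200_000):
    """Build the train.py argument list used above for a single algorithm."""
    return [
        "python", "train.py",
        "--algo", algo,
        "--env", ENV_ID,
        "--tensorboard-log", "/tmp/stable-baselines/",
        "-n", str(timesteps),
        "--save-freq", str(checkpoint_every),
        "--eval-freq", str(checkpoint_every),
        "--eval-episodes", "10",
        "--n-eval-envs", "10",
    ]


# One command per algorithm compared in this notebook.
commands = {algo: zoo_train_command(algo) for algo in ("a2c", "ppo", "dqn")}
for algo, argv in commands.items():
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` from inside the `rl-baselines3-zoo/` directory, which keeps the three runs strictly comparable.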

#### Viewing the results - Using the enjoy script pointed at the training output folder

In [None]:
!python enjoy.py --algo a2c --env BeamRiderNoFrameskip-v4 --folder logs/ -n 5000

In [None]:
!python enjoy.py --algo ppo --env BeamRiderNoFrameskip-v4 --folder logs/ -n 5000

In [None]:
!python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 --folder logs/ -n 5000

In [None]:
# The results were good but not great: the agent achieves some success. Its movements look somewhat random;
# given the complexity of the game, I believe it needed more training.
# The ready-made agent from the library performs much better.

![mean_reward](graficos/Zoo_11.png)

![mean_reward](graficos/Zoo_22.png)

# Results

## Comparison table

In [28]:
import pandas as pd

# The zoo rows have no std_reward; NaN keeps that column numeric
NA = float('nan')

# one row per run: algorithm, training timesteps, mean reward, reward std
data = [
        ['A2C', '2M',0,0], 
        ['A2C', '700k',756,0], 
        ['A2C ZOO', '700k',578,NA], 
        ['PPO', '2M',0,0], 
        ['PPO', '700k',176,0], 
        ['PPO ZOO', '700k',649,NA],
        ['DQN', '2M',44,0], 
        ['DQN', '700k',39,60], 
        ['DQN ZOO', '700k',3432,NA]
       ]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Algoritmo', 'TimeSteps','mean_reward','std_reward'])
  
# Show the rows sorted by mean reward, best first
df.sort_values(by=['mean_reward'], ascending=False)

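| | Algoritmo | TimeSteps | mean_reward | std_reward |
|---|---|---|---|---|
| 8 | DQN ZOO | 700k | 3432 | NaN |
| 1 | A2C | 700k | 756 | 0.0 |
| 5 | PPO ZOO | 700k | 649 | NaN |
| 2 | A2C ZOO | 700k | 578 | NaN |
| 4 | PPO | 700k | 176 | 0.0 |
| 6 | DQN | 2M | 44 | 0.0 |
| 7 | DQN | 700k | 39 | 60.0 |
| 0 | A2C | 2M | 0 | 0.0 |
| 3 | PPO | 2M | 0 | 0.0 |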
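
To make the gap between the locally trained agents and the zoo agents concrete, the short sketch below divides each zoo mean reward by the best locally trained mean reward for the same algorithm. The numbers are copied from the table; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Best locally trained mean reward per algorithm, copied from the table.
own_best = {"A2C": 756, "PPO": 176, "DQN": 44}
# Mean reward of the corresponding zoo agent, copied from the table.
zoo = {"A2C": 578, "PPO": 649, "DQN": 3432}


def zoo_to_own_ratio(algo):
    """How many times larger the zoo mean reward is than the local one."""
    return zoo[algo] / own_best[algo]


ratios = {algo: round(zoo_to_own_ratio(algo), 2) for algo in own_best}
for algo, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{algo}: zoo / own = {ratio}")
```

A ratio below 1 means the locally trained agent scored higher, which in this table only happens for A2C; the DQN gap is by far the largest.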


In [30]:
# Comparatively, the zoo results seem much better tuned for the same number of timesteps.
# Perhaps some parameter or configuration of mine was wrong, but I do not think so.
# The agents I trained tried to find an optimal spot in the scene and stay still, whereas
# the zoo agents try to move while watching the enemies. I believe my agents are getting stuck
# in some local optimum, or are being trained for too few steps. Since I had to run several experiments,
# I did not have time to train the models for more than 2M timesteps. Each one takes 3 to 4 hours on average.
# Also, DQN, which is based on Q-learning, looks like the most promising algorithm.

## Best results: A2C and DQN ZOO - Videos

In [6]:
import gym
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv
from stable_baselines3 import A2C,PPO,DQN

# load the environment and the trained model
env_model = gym.make('BeamRiderNoFrameskip-v4')
model = A2C.load("modelos/a2c/recigio_beamrider_700k", env=env_model)

env_id = 'BeamRiderNoFrameskip-v4'
video_folder = 'videos/'
video_length = 3500

env = DummyVecEnv([lambda: gym.make(env_id)])

# Record the video starting at the first step
# (the "random-agent" prefix is only a file name; the trained A2C model picks the actions)
env = VecVideoRecorder(env, video_folder,
                       record_video_trigger=lambda x: x == 0, video_length=video_length,
                       name_prefix="random-agent-{}".format(env_id))

# reset through the recorder and keep the first observation
obs = env.reset()
for _ in range(video_length + 1):
  action, _states = model.predict(obs, deterministic=True)
  obs, rewards, done, info = env.step(action)
# Save the video
env.close()

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Saving video to C:\xampp\htdocs\posfurb\reinforcementlearning\videos\random-agent-BeamRiderNoFrameskip-v4-step-0-to-step-3500.mp4


In [1]:
%cd rl-baselines3-zoo/

C:\xampp\htdocs\posfurb\reinforcementlearning\rl-baselines3-zoo


In [2]:
!python -m utils.record_video --algo dqn --env BeamRiderNoFrameskip-v4 --folder logs/ -n 3500

Loading latest experiment, id=1
Stacking 4 frames
Wrapping the env in a VecTransposeImage.
Saving video to C:\xampp\htdocs\posfurb\reinforcementlearning\rl-baselines3-zoo\logs\dqn\BeamRiderNoFrameskip-v4_1\videos\final-model-dqn-BeamRiderNoFrameskip-v4-step-0-to-step-3500.mp4


Exception ignored in: <function VecVideoRecorder.__del__ at 0x00000234054FC5E0>
Traceback (most recent call last):
  File "C:\xampp\htdocs\posfurb\aprendizadodemaquina\python\.env\lib\site-packages\stable_baselines3\common\vec_env\vec_video_recorder.py", line 113, in __del__
  File "C:\xampp\htdocs\posfurb\aprendizadodemaquina\python\.env\lib\site-packages\stable_baselines3\common\vec_env\vec_video_recorder.py", line 109, in close
AttributeError: 'NoneType' object has no attribute 'close'


In [3]:
from IPython.display import Video
%cd ..

C:\xampp\htdocs\posfurb\reinforcementlearning


### DQN ZOO Video

In [5]:
Video("videos/final-model-dqn-BeamRiderNoFrameskip-v4-step-0-to-step-3500.mp4")

### A2C Video

In [7]:
Video("videos/random-agent-BeamRiderNoFrameskip-v4-step-0-to-step-3500.mp4")