# Algorithm Library - Lab 03

In recent years, many RL libraries have been developed. They are designed to provide all the tools needed to implement and test Reinforcement Learning agents.

Still, they differ considerably, which is why it is important to choose a library that is fast, reliable, and relevant to your RL task. From a technical point of view, there are a few things to keep in mind when evaluating an RL library.

- **Support for existing machine learning libraries:** Since RL typically relies on gradient-based algorithms to learn and fine-tune policy functions, you will want it to support your favorite library (TensorFlow, Keras, PyTorch, etc.).
- **Scalability:** RL is computationally intensive, so the option to run in a distributed fashion becomes important when tackling complex environments.
- **Composability:** RL algorithms typically involve simulations and many other components. You will want a library that lets you reuse RL algorithm components and that is compatible with multiple deep learning frameworks.

[Here](https://docs.google.com/spreadsheets/d/1ZWhViAwCpRqupA5E_xFHSaBaaBZ1wAjO6PvmmEEpXGI/edit#gid=0) you can find a list of some existing libraries.

<img src="https://i1.wp.com/neptune.ai/wp-content/uploads/RL-tools.png?resize=1024%2C372&ssl=1" width=500>


## Ray RLlib

[Ray](https://docs.ray.io/en/latest/) is a distributed execution platform that provides easy-to-use building blocks for parallelism and scalability, allowing Python programs to scale anywhere, from a notebook to a large cluster. Built on top of Ray, [RLlib](https://docs.ray.io/en/latest/rllib.html) provides a unified API that can be leveraged across a wide range of applications.

<br>

<img src="https://miro.medium.com/max/1838/1*_bomm09XtiZfQ52Kfz9Ciw.png" width=600>


RLlib is designed to support multiple deep learning frameworks (TensorFlow and PyTorch) and is accessed through a simple Python API. It currently ships with a [range of RL algorithms](https://docs.ray.io/en/latest/rllib-algorithms.html#available-algorithms-overview).

In particular, RLlib enables rapid development because it makes it easy to build scalable RL algorithms by reusing and composing existing implementations. RLlib also lets developers use neural networks built with different deep learning frameworks, and it integrates easily with third-party simulators.


## Setup

You will need to make a copy of this notebook in your Google Drive before editing it. You can do this with **File → Save a copy in Drive**.

In [1]:
import os
#from google.colab import drive
#drive.mount("/content/gdrive")

In [2]:
# Your work will be stored in a folder called `minicurso_rl` by default
# so that the Colab instance timeout does not delete your edits

DRIVE_PATH = "../minicurso_rl/lab03"
#DRIVE_PYTHON_PATH = DRIVE_PATH.replace("\\", "")
if not os.path.exists(DRIVE_PATH):
  %mkdir -p $DRIVE_PATH

In [3]:
# Competition environment
#!pip install --upgrade ceia-soccer-twos > /dev/null 2>&1
# the ray version compatible with the provided agent implementation is 1.4.0
#!pip install 'aioredis==1.3.1' > /dev/null 2>&1 
#!pip install 'aiohttp==3.7.4' > /dev/null 2>&1 
#!pip install 'ray==1.4.0' > /dev/null 2>&1 
#!pip install 'ray[rllib]==1.4.0' > /dev/null 2>&1 
#!pip install 'ray[tune]==1.4.0' > /dev/null 2>&1 
#!pip install torch > /dev/null 2>&1 
#!pip install lz4 > /dev/null 2>&1 

# Dependencies needed to record the videos
#!apt-get install -y xvfb x11-utils > /dev/null 2>&1 
#!pip install pyvirtualdisplay==0.2.* > /dev/null 2>&1 

In [4]:
#! wget http://www.atarimania.com/roms/Roms.rar
#! mkdir ../content/ROM/
#! unrar e ../content/Roms.rar ../content/ROM/ -y
#! python -m atari_py.import_roms ../content/ROM/ > /dev/null 2>&1

In [5]:
# Start a virtual display instance
from pyvirtualdisplay import Display
display = Display(visible=False, size=(1400, 900))
_ = display.start()

In [6]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

## Environment

OpenAI Gym provides a `VideoRecorder` wrapper that can record a video of the environment in MP4 format. Below, we interact with the [CartPole](https://gym.openai.com/envs/CartPole-v0/) environment by taking random actions and record the result.

In [88]:
import gym
from gym.wrappers.monitoring.video_recorder import VideoRecorder

environment_id = "CartPole-v0"

In [8]:
import gym
from gym.wrappers.monitoring.video_recorder import VideoRecorder

env = gym.make(environment_id)
before_training = os.path.join(
    DRIVE_PATH, "{}_before_training.mp4".format(environment_id)
)

In [9]:
print(before_training)

video = VideoRecorder(env, before_training)
env.reset()
for i in range(200):
  env.render()
  video.capture_frame()
  observation, reward, done, info = env.step(env.action_space.sample())
  # Start a new episode once the pole falls instead of stepping a finished one
  if done:
    env.reset()

video.close()
env.close()

../minicurso_rl/lab03/CartPole-v0_before_training.mp4




The code above saved the video file to your Drive. To display it in the notebook, you need a helper function.

In [10]:
from base64 import b64encode
def render_mp4(videopath: str) -> str:
  # Read the MP4 file and embed it as a base64 data URI in an HTML <video> tag
  with open(videopath, 'rb') as f:
    mp4 = f.read()
  base64_encoded_mp4 = b64encode(mp4).decode()
  return f'<video width=400 controls><source src="data:video/mp4;' \
         f'base64,{base64_encoded_mp4}" type="video/mp4"></video>'

The code below renders the result. You should get a video similar to the one below.

In [11]:
from IPython.display import HTML
html = render_mp4(before_training)
HTML(html)

## Training a Reinforcement Learning agent

First, let's start Ray in the background. Running `ray.shutdown()` followed by `ray.init()` should get things going.

In [12]:
import ray

ray.shutdown()
ray.init(ignore_reinit_error=True, include_dashboard=False)

{'node_ip_address': '192.168.130.2',
 'raylet_ip_address': '192.168.130.2',
 'redis_address': '192.168.130.2:6379',
 'object_store_address': '/tmp/ray/session_2021-11-12_23-54-36_935172_22643/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-11-12_23-54-36_935172_22643/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2021-11-12_23-54-36_935172_22643',
 'metrics_export_port': 62556,
 'node_id': '325cbe677fea6c1547686fe403f55f09f5b14875188dc3525b6b8915'}

### Basic Python API

At a high level, RLlib provides a `Trainer` class that holds a policy for interacting with the environment. Through the Trainer interface, the policy can be trained, evaluated, or used to compute an action.

For each algorithm, we want to configure the parameters (learning rate, network size, batch size, etc.) to suit our application. Ray provides two levels of parameters for this. First, there are the parameters common to all algorithms; you can find the list of available parameters at this [link](https://docs.ray.io/en/latest/rllib-training.html#common-parameters).

Then, each [algorithm available in Ray](https://docs.ray.io/en/latest/rllib-algorithms.html#available-algorithms-overview) has its own specific parameters. The image below shows the specific parameters of the [Policy Gradient](https://docs.ray.io/en/latest/rllib-algorithms.html#policy-gradients) algorithm.


<img src='https://drive.google.com/uc?id=1yKJDJViHE_F9JH7NTQMYtQL3KLBJoJyk' width="500" >
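For intuition about what the PG trainer optimizes: vanilla policy gradient (REINFORCE) weights the log-probability of each action by the discounted return that follows it. The sketch below is an illustrative NumPy reimplementation, not RLlib's code; the helper names are hypothetical, and the discount factor is passed as `gamma`, which is also the name of RLlib's common discount parameter.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss whose gradient is the REINFORCE policy-gradient estimate."""
    returns = discounted_returns(rewards, gamma)
    # Minimizing -sum(log pi(a_t|s_t) * G_t) raises the probability of
    # actions that were followed by high returns.
    return -np.sum(np.asarray(log_probs) * returns)

# CartPole gives +1 per step; a 3-step episode with gamma = 0.5:
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5).tolist())  # [1.75, 1.5, 1.0]
```

In practice an autodiff framework (PyTorch here) computes the gradient of this loss with respect to the policy network's weights.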


In [None]:
import ray
import ray.rllib.agents.pg as pg
from ray.tune.logger import pretty_print

config = pg.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1
config["lr"] = 0.0004
config["framework"] = "torch"

trainer = pg.PGTrainer(config=config, env=environment_id)
# Each train() call is one training iteration: it collects a batch of timesteps,
# which may cover several episodes or only part of one
num_iterations = 1000

for i in range(num_iterations):
    # Run one training iteration of the policy with Policy Gradient (PG)
    result = trainer.train()
    print(pretty_print(result))

    # Save a checkpoint every 100 iterations
    if i % 100 == 0:
        checkpoint = trainer.save()
        print("checkpoint saved at", checkpoint)

last_checkpoint = trainer.save()
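One caveat about `pg.DEFAULT_CONFIG.copy()`: `dict.copy()` is shallow, so nested dictionaries such as `"model"` are still shared with the defaults. Top-level assignments like `config["lr"] = 0.0004` are safe, but mutating a nested key would silently change `DEFAULT_CONFIG` for every trainer created afterwards. The sketch below demonstrates this with a plain dictionary standing in for the defaults (the keys and values are illustrative, not necessarily RLlib's actual defaults); `copy.deepcopy` avoids the problem.

```python
import copy

# Hypothetical stand-in for an algorithm's DEFAULT_CONFIG (values are illustrative).
DEFAULTS = {"lr": 0.0004, "model": {"fcnet_hiddens": [256, 256]}}

shallow = DEFAULTS.copy()
shallow["lr"] = 0.001                        # top-level key: defaults unaffected
shallow["model"]["fcnet_hiddens"] = [32]     # nested key: shared with DEFAULTS!
print(DEFAULTS["lr"], DEFAULTS["model"]["fcnet_hiddens"])  # 0.0004 [32]

# Restore the illustrative defaults and repeat with a deep copy.
DEFAULTS["model"]["fcnet_hiddens"] = [256, 256]
deep = copy.deepcopy(DEFAULTS)
deep["model"]["fcnet_hiddens"] = [32]
print(DEFAULTS["model"]["fcnet_hiddens"])    # [256, 256]
```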

In [14]:
print("Last checkpoint saved at", last_checkpoint)

Last checkpoint saved at /home/eduardo/ray_results/PG_CartPole-v0_2021-11-12_23-54-41x6aj24su/checkpoint_001000/checkpoint-1000


Now let's create another video, but this time using the action recommended by the trained model instead of acting randomly.

In [15]:
trainer = pg.PGTrainer(config=config, env=environment_id)
trainer.restore(last_checkpoint)

after_training = os.path.join(
    DRIVE_PATH, "{}after_training_basic_api.mp4".format(environment_id)
)
after_video = VideoRecorder(env, after_training)
observation = env.reset()
done = False
while not done:
  env.render()
  after_video.capture_frame()
  action = trainer.compute_action(observation)
  observation, reward, done, info = env.step(action)
after_video.close()
env.close()
html = render_mp4(after_training)
HTML(html)

2021-11-12 23:57:15,018	INFO trainable.py:378 -- Restored on 192.168.130.2 from checkpoint: /home/eduardo/ray_results/PG_CartPole-v0_2021-11-12_23-54-41x6aj24su/checkpoint_001000/checkpoint-1000
2021-11-12 23:57:15,018	INFO trainable.py:385 -- Current state after restoring: {'_iteration': 1000, '_timesteps_total': None, '_time_total': 140.12676167488098, '_episodes_total': 1207}
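The evaluation loops in this notebook (random actions before training, `trainer.compute_action` after) differ only in how the action is chosen. A minimal sketch, assuming the old Gym API used here (`reset()` returns an observation and `step()` returns `(obs, reward, done, info)`), factors that out into a hypothetical `run_episode` helper that accepts any policy callable. `CountdownEnv` is a made-up toy environment that only exists to make the example runnable without Gym.

```python
from typing import Any, Callable

def run_episode(env, policy: Callable[[Any], Any], max_steps: int = 10_000) -> float:
    """Roll out one episode with `policy` and return the undiscounted total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

class CountdownEnv:
    """Hypothetical toy env: the episode ends after `length` steps, +1 reward per step."""

    def __init__(self, length: int = 5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}

print(run_episode(CountdownEnv(length=5), policy=lambda obs: 0))  # 5.0
```

With the CartPole environment above, the same helper would be called as `run_episode(env, lambda obs: env.action_space.sample())` for the random baseline and `run_episode(env, trainer.compute_action)` for the trained agent.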


### Using custom environments or models

The Python API provides the flexibility needed to apply RLlib to new problems. You will need this API if you want to use custom environments or models with RLlib. Below we look at an example of a custom environment and a custom model.

<br>


For more information, see [Advanced Python APIs](https://docs.ray.io/en/latest/rllib-training.html#advanced-python-apis).

In [16]:
import gym
from gym.spaces import Discrete, Box
import numpy as np
import os
import random

import torch
import torch.nn as nn

import ray
from ray import tune
from ray.rllib.agents import pg
from ray.rllib.env.env_context import EnvContext
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.tune.logger import pretty_print

In [17]:
class SimpleCorridor(gym.Env):
    """Exemplo de um ambiente personalizado em que você tem que andar por um 
    corredor. Você pode configurar o comprimento do corredor através da 
    configuração do ambiente."""

    def __init__(self, config: EnvContext):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(
            0.0, self.end_pos, shape=(1, ), dtype=np.float32)
        # Set the seed. It is only used for the final reward.
        self.seed(config.worker_index * config.num_workers)

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        assert action in [0, 1], action
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        # Produce a random reward when we reach the goal.
        return [self.cur_pos], \
            random.random() * 2 if done else -0.1, done, {}

    def seed(self, seed=None):
        random.seed(seed)
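To get a feel for the reward scale of `SimpleCorridor`, consider the best possible policy, which always moves right. With `corridor_length = 5` it reaches the goal in 5 steps: the first 4 steps each give -0.1, and the last gives `random.random() * 2`, which is uniform on [0, 2) with mean 1, so the expected optimal return is -0.4 + 1.0 = 0.6. The standalone sketch below re-implements only the corridor's dynamics (without Gym or RLlib) to check this by simulation; `corridor_return` is a hypothetical helper, not part of the notebook.

```python
import random

def corridor_return(policy, corridor_length=5, rng=None, max_steps=1000):
    """Simulate SimpleCorridor's dynamics; return (total_reward, steps)."""
    if rng is None:
        rng = random.Random()
    pos, total, steps = 0, 0.0, 0
    while steps < max_steps:
        action = policy(pos)
        if action == 0 and pos > 0:
            pos -= 1
        elif action == 1:
            pos += 1
        steps += 1
        done = pos >= corridor_length
        # Same reward rule as SimpleCorridor.step: random bonus at the goal, -0.1 otherwise.
        total += rng.random() * 2 if done else -0.1
        if done:
            break
    return total, steps

rng = random.Random(0)
always_right = lambda pos: 1
returns = [corridor_return(always_right, rng=rng)[0] for _ in range(10_000)]
print(sum(returns) / len(returns))  # close to 0.6
```

This puts the `episode_reward_mean: 0.1` stop criterion used below in context: it is well under the roughly 0.6 that an optimal policy averages.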

In [18]:
class TorchCustomModel(TorchModelV2, nn.Module):
    """Exemplo de um modelo personalizado PyTorch que apenas delega para uma 
    fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        self.torch_sub_model = TorchFC(obs_space, action_space, num_outputs,
                                       model_config, name)

    def forward(self, input_dict, state, seq_lens):
        input_dict["obs"] = input_dict["obs"].float()
        fc_out, _ = self.torch_sub_model(input_dict, state, seq_lens)
        return fc_out, []

    def value_function(self):
        return torch.reshape(self.torch_sub_model.value_function(), [-1])

In [19]:
# The environment creator function can also be registered explicitly with:
# from ray.tune.registry import register_env
# register_env("corridor", lambda config: SimpleCorridor(config))

# Register the custom model
ModelCatalog.register_custom_model(
    "my_model", TorchCustomModel
)

config = {
    "env": SimpleCorridor,  # ou "corridor" se registrado
    "env_config": {
        "corridor_length": 5,
    },
    "model": {
        "custom_model": "my_model",
        "vf_share_layers": True,
    },
    "num_workers": 1,  
    "framework": "torch",
}

stop = {
    "training_iteration": 50,
    "timesteps_total": 100000,
    "episode_reward_mean": 0.1,
}
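The `stop` dictionary is read as "stop as soon as any of these metrics reaches its threshold", which is how Tune interprets a dict passed as `stop`; the manual loop in the next cell reproduces that check by hand, with the `training_iteration` limit coming from the `range`. A small sketch of the same logic as a hypothetical `should_stop` helper over a result dict:

```python
def should_stop(result: dict, stop: dict) -> bool:
    """Return True once any metric in `result` reaches its threshold in `stop`."""
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )

stop_criteria = {"training_iteration": 50, "timesteps_total": 100000,
                 "episode_reward_mean": 0.1}
print(should_stop({"training_iteration": 3, "timesteps_total": 12000,
                   "episode_reward_mean": -0.4}, stop_criteria))  # False
print(should_stop({"training_iteration": 3, "timesteps_total": 12000,
                   "episode_reward_mean": 0.25}, stop_criteria))  # True
```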

In [None]:
pg_config = pg.DEFAULT_CONFIG.copy()
pg_config.update(config)
pg_config["lr"] = 1e-3

trainer = pg.PGTrainer(config=pg_config, env=SimpleCorridor)
# run the manual training loop and print the results after each iteration
for _ in range(stop["training_iteration"]):
    result = trainer.train()
    print(pretty_print(result))
    
    # stop training if the desired number of timesteps has been reached
    # or if the desired reward has been reached
    if result["timesteps_total"] >= stop["timesteps_total"] or \
            result["episode_reward_mean"] >= stop["episode_reward_mean"]:
        break

### Ray Tune

All RLlib Trainers are compatible with the [Ray Tune](https://docs.ray.io/en/master/tune/index.html) API, which makes them easy to use in Tune experiments. For example, the following code runs the same CartPole training with the PG algorithm.

In [None]:
import ray
config = {
    "env": environment_id,
    "framework": "torch",
}
stop = {"episode_reward_mean": 150, "timesteps_total": 100000}

# Run the training
analysis = ray.tune.run(
    "PG",
    config=config,
    stop=stop,
    checkpoint_freq=10,
    checkpoint_at_end=True,
    local_dir=os.path.join(DRIVE_PATH, "results")
)

Although the analysis object returned by `ray.tune.run` above does not hold a Trainer instance, it contains all the information needed to rebuild one from a saved checkpoint.

`ray.tune.run` returns an [ExperimentAnalysis](https://docs.ray.io/en/latest/tune/api_docs/analysis.html?highlight=ExperimentAnalysis#experimentanalysis-tune-experimentanalysis) object, from which you can retrieve the best checkpoint of the training run.

In [22]:
from ray.rllib.agents.pg import PGTrainer

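# Restore a Trainer: pick the trial with the best mean episode reward, then
# that trial's checkpoint with the highest training iteration (its latest one)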
trial = analysis.get_best_logdir("episode_reward_mean", "max")
checkpoint = analysis.get_best_checkpoint(
  trial,
  "training_iteration",
  "max",
)
trainer = PGTrainer(config=config)
trainer.restore(checkpoint)

2021-11-12 23:57:37,588	INFO trainable.py:378 -- Restored on 192.168.130.2 from checkpoint: /home/eduardo/ceia/curso-rl-ceia-2021/labs/minicurso_rl/lab03/results/PG/PG_CartPole-v0_6ac44_00000_0_2021-11-12_23-57-19/checkpoint_000099/checkpoint-99
2021-11-12 23:57:37,589	INFO trainable.py:385 -- Current state after restoring: {'_iteration': 99, '_timesteps_total': None, '_time_total': 13.992749214172363, '_episodes_total': 219}
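Conceptually, the two calls above first pick the trial with the best `episode_reward_mean` and then take that trial's checkpoint with the highest `training_iteration`, i.e. its most recent one. A plain-Python sketch of that selection over hypothetical trial records follows; the data layout and paths are made up for illustration, and using each trial's last reported metric is an assumption about Tune's default behavior.

```python
# Hypothetical trial records: last reported metric plus checkpoints keyed by iteration.
trials = {
    "trial_a": {"episode_reward_mean": 152.0,
                "checkpoints": {10: "trial_a/checkpoint-10", 20: "trial_a/checkpoint-20"}},
    "trial_b": {"episode_reward_mean": 98.5,
                "checkpoints": {10: "trial_b/checkpoint-10"}},
}

def best_trial(trials: dict, metric: str, mode: str = "max") -> str:
    """Name of the trial with the best (last reported) value of `metric`."""
    pick = max if mode == "max" else min
    return pick(trials, key=lambda name: trials[name][metric])

def latest_checkpoint(trial: dict) -> str:
    """Checkpoint path saved at the highest training iteration."""
    return trial["checkpoints"][max(trial["checkpoints"])]

name = best_trial(trials, "episode_reward_mean", "max")
print(name, latest_checkpoint(trials[name]))  # trial_a trial_a/checkpoint-20
```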


Now let's create another video, this time using the action recommended by the model trained with the Tune API.

In [23]:
after_training = os.path.join(
    DRIVE_PATH, "{}after_training_tune.mp4".format(environment_id)
)
after_video = VideoRecorder(env, after_training)
observation = env.reset()
done = False
while not done:
  env.render()
  after_video.capture_frame()
  action = trainer.compute_action(observation)
  observation, reward, done, info = env.step(action)
after_video.close()
env.close()
# You should get a video similar to the one below. 
html = render_mp4(after_training)
HTML(html)

Tune automatically writes [TensorBoard](https://www.tensorflow.org/tensorboard) files during `tune.run()`. To visualize the learning progress in TensorBoard, run the cell below:

In [24]:
%tensorboard --logdir ../minicurso_rl/lab03/results/PG

## Hyperparameter Tuning with Ray Tune

[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) is a library for running experiments and tuning hyperparameters. Let's now try to find hyperparameters that solve the [CartPole](https://gym.openai.com/envs/CartPole-v1/) environment in the fewest timesteps. Be prepared: this may take a while to run.
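Before running it, it helps to count what the next cell launches. In Tune, each `grid_search` is expanded exhaustively and `num_samples` repeats the whole grid, drawing fresh values for random distributions such as `tune.uniform` each time, so 2 hidden-layer options × 2 activations × `num_samples=5` gives 20 trials. Each PG trial reserves one CPU for the trainer plus one per rollout worker (`num_workers=2`), so with `ray.init(num_cpus=3)` only one trial fits at a time (and `num_gpus: 1` also requires a GPU per trial), which is why the trials in the output run one after another. The sketch below mimics the expansion with the standard library; it is an illustration, not Tune's implementation.

```python
import itertools
import random

# Search space copied from the next cell; the expansion logic is only a sketch.
fcnet_hiddens_grid = [[32], [64]]
activation_grid = ["linear", "relu"]
num_samples = 5
rng = random.Random(42)

trial_configs = []
for _ in range(num_samples):
    # Every repetition covers the full grid...
    for hiddens, activation in itertools.product(fcnet_hiddens_grid, activation_grid):
        trial_configs.append({
            "fcnet_hiddens": hiddens,
            "fcnet_activation": activation,
            # ...while random distributions are sampled anew for each trial.
            "lr": rng.uniform(1e-7, 1e-2),
        })

cpus_per_trial = 1 + 2                # trainer process + num_workers rollout workers
max_concurrent = 3 // cpus_per_trial  # with ray.init(num_cpus=3)
print(len(trial_configs), max_concurrent)  # 20 1
```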

In [None]:
parameter_search_config = {
    "env": environment_id,
    "framework": "torch",
    "num_gpus": 1,  # porcentagem da gpu disponível para treino
    "num_workers": 2,

    # Hyperparameter tuning
    "model": {
      "fcnet_hiddens": ray.tune.grid_search([[32], [64]]),
      "fcnet_activation": ray.tune.grid_search(["linear", "relu"]),
    },
    "lr": ray.tune.uniform(1e-7, 1e-2)
}

# To explicitly stop or restart Ray, use the shutdown API.
ray.shutdown()

ray.init(
  num_cpus=3,
  include_dashboard=False,
  ignore_reinit_error=True,
  log_to_driver=False,
)

parameter_search_analysis = ray.tune.run(
  "PG",
  config=parameter_search_config,
  stop=stop,
  num_samples=5,
  metric="timesteps_total",
  mode="min",
)

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens |
|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00000 | PENDING |  | 0.00241435 | linear | [32] |
| PG_CartPole-v0_2b2c7_00001 | PENDING |  | 0.00553195 | relu | [32] |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens |
|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00000 | RUNNING |  | 0.00241435 | linear | [32] |
| PG_CartPole-v0_2b2c7_00001 | PENDING |  | 0.00553195 | relu | [32] |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |


Result for PG_CartPole-v0_2b2c7_00000:
  agent_timesteps_total: 400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-44-43
  done: false
  episode_len_mean: 23.666666666666668
  episode_media: {}
  episode_reward_max: 41.0
  episode_reward_mean: 23.666666666666668
  episode_reward_min: 12.0
  episodes_this_iter: 12
  episodes_total: 12
  experiment_id: 40605ae4361e4247bd74c7999a2ca42b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 11.35116195678711
    num_agent_steps_sampled: 400
    num_steps_sampled: 400
    num_steps_trained: 400
  iterations_since_restore: 1
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 29.5
    ram_util_percent: 75.5
  pid: 29315
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03043039521174644
    mean_env_render_ms: 0.0
    mean_env_wait_ms

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00000 | RUNNING | 192.168.130.2:29315 | 0.00241435 | linear | [32] | 31.0 | 3.97008 | 12400.0 | 51.75 | 183.0 | 13.0 | 51.75 |
| PG_CartPole-v0_2b2c7_00001 | PENDING |  | 0.00553195 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00000:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-44-48
  done: false
  episode_len_mean: 59.59
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 59.59
  episode_reward_min: 13.0
  episodes_this_iter: 6
  episodes_total: 358
  experiment_id: 40605ae4361e4247bd74c7999a2ca42b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 18.762834548950195
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29315
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029449517173246117
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.042415734771822707
    mean_inference_ms: 0.47698346724906443

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00000 | RUNNING | 192.168.130.2:29315 | 0.00241435 | linear | [32] | 65.0 | 8.72825 | 26000.0 | 103.55 | 200.0 | 13.0 | 103.55 |
| PG_CartPole-v0_2b2c7_00001 | PENDING |  | 0.00553195 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00000:
  agent_timesteps_total: 28400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-44-53
  done: false
  episode_len_mean: 116.89
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 116.89
  episode_reward_min: 15.0
  episodes_this_iter: 2
  episodes_total: 479
  experiment_id: 40605ae4361e4247bd74c7999a2ca42b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 26.742889404296875
    num_agent_steps_sampled: 28400
    num_steps_sampled: 28400
    num_steps_trained: 28400
  iterations_since_restore: 71
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29315
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030620073875388076
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04422593129217676
    mean_inference_ms: 0.4954558773267401

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00001 | RUNNING | 192.168.130.2:29444 | 0.00553195 | relu | [32] | 1.0 | 0.135902 | 400.0 | 19.4 | 29.0 | 10.0 | 19.4 |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00010 | PENDING |  | 0.00584183 | linear | [64] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00001:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-08
  done: false
  episode_len_mean: 66.79
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 66.79
  episode_reward_min: 19.0
  episodes_this_iter: 5
  episodes_total: 342
  experiment_id: f4b7998b22f647a1ae170d7128633f61
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 15.919065475463867
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29444
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02948951504492923
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04237239382360593
    mean_inference_ms: 0.49192662071173343
 

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00001 | RUNNING | 192.168.130.2:29444 | 0.00553195 | relu | [32] | 38.0 | 4.92316 | 15200.0 | 66.79 | 200.0 | 19.0 | 66.79 |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00010 | PENDING |  | 0.00584183 | linear | [64] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00001:
  agent_timesteps_total: 28800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-13
  done: false
  episode_len_mean: 131.45
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 131.45
  episode_reward_min: 43.0
  episodes_this_iter: 2
  episodes_total: 449
  experiment_id: f4b7998b22f647a1ae170d7128633f61
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 23.126312255859375
    num_agent_steps_sampled: 28800
    num_steps_sampled: 28800
    num_steps_trained: 28800
  iterations_since_restore: 72
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29444
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029949963994666352
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.043293389893173446
    mean_inference_ms: 0.500167342545036

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00001 | RUNNING | 192.168.130.2:29444 | 0.00553195 | relu | [32] | 72.0 | 9.74516 | 28800.0 | 131.45 | 200.0 | 43.0 | 131.45 |
| PG_CartPole-v0_2b2c7_00002 | PENDING |  | 0.00477552 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00010 | PENDING |  | 0.00584183 | linear | [64] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00001:
  agent_timesteps_total: 35200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-16
  done: true
  episode_len_mean: 150.38
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.38
  episode_reward_min: 55.0
  episodes_this_iter: 2
  episodes_total: 488
  experiment_id: f4b7998b22f647a1ae170d7128633f61
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 21.495386123657227
    num_agent_steps_sampled: 35200
    num_steps_sampled: 35200
    num_steps_trained: 35200
  iterations_since_restore: 88
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29444
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030455576772324314
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04418247351433701
    mean_inference_ms: 0.5082265137392936


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00002 | RUNNING | 192.168.130.2:29577 | 0.00477552 | linear | [64] | 1.0 | 0.132777 | 400.0 | 19.7368 | 46.0 | 11.0 | 19.7368 |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00010 | PENDING |  | 0.00584183 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00011 | PENDING |  | 0.00244953 | relu | [64] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00002:
  agent_timesteps_total: 13600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-27
  done: false
  episode_len_mean: 96.6
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 96.6
  episode_reward_min: 28.0
  episodes_this_iter: 2
  episodes_total: 200
  experiment_id: 6e975f439cb14cf98341b0d37f08bb5d
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 23.457441329956055
    num_agent_steps_sampled: 13600
    num_steps_sampled: 13600
    num_steps_trained: 13600
  iterations_since_restore: 34
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29577
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03211157477870274
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.047477958384008395
    mean_inference_ms: 0.5221184637653064
   

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00002 | RUNNING | 192.168.130.2:29577 | 0.00477552 | linear | [64] | 34.0 | 4.86402 | 13600.0 | 96.6 | 200.0 | 28.0 | 96.6 |
| PG_CartPole-v0_2b2c7_00003 | PENDING |  | 0.00565651 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00004 | PENDING |  | 0.00111003 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00005 | PENDING |  | 0.00122893 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00006 | PENDING |  | 0.00924176 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00007 | PENDING |  | 0.000904234 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00008 | PENDING |  | 0.00576621 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00009 | PENDING |  | 0.00472464 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00010 | PENDING |  | 0.00584183 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00011 | PENDING |  | 0.00244953 | relu | [64] |  |  |  |  |  |  |  |


Result for PG_CartPole-v0_2b2c7_00002:
  agent_timesteps_total: 22400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-31
  done: true
  episode_len_mean: 150.37
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.37
  episode_reward_min: 44.0
  episodes_this_iter: 2
  episodes_total: 251
  experiment_id: 6e975f439cb14cf98341b0d37f08bb5d
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 26.02692413330078
    num_agent_steps_sampled: 22400
    num_steps_sampled: 22400
    num_steps_trained: 22400
  iterations_since_restore: 56
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 41.8
    ram_util_percent: 75.8
  pid: 29577
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.0323689685182987
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.047845193427

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00003,RUNNING,192.168.130.2:29670,0.00565651,relu,[64],1.0,0.138824,400.0,20.5,55.0,11.0,20.5
PG_CartPole-v0_2b2c7_00004,PENDING,,0.00111003,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00003:
  agent_timesteps_total: 13600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-43
  done: false
  episode_len_mean: 84.25
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 84.25
  episode_reward_min: 20.0
  episodes_this_iter: 2
  episodes_total: 248
  experiment_id: 8bfd960378664977983f94f52f12c837
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 21.440555572509766
    num_agent_steps_sampled: 13600
    num_steps_sampled: 13600
    num_steps_trained: 13600
  iterations_since_restore: 34
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29670
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03161156826076133
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04698328089335966
    mean_inference_ms: 0.5380688926805209

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00003,RUNNING,192.168.130.2:29670,0.00565651,relu,[64],34.0,4.98318,13600.0,84.25,200.0,20.0,84.25
PG_CartPole-v0_2b2c7_00004,PENDING,,0.00111003,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00003:
  agent_timesteps_total: 26400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-48
  done: false
  episode_len_mean: 149.04
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 149.04
  episode_reward_min: 31.0
  episodes_this_iter: 2
  episodes_total: 332
  experiment_id: 8bfd960378664977983f94f52f12c837
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 25.567340850830078
    num_agent_steps_sampled: 26400
    num_steps_sampled: 26400
    num_steps_trained: 26400
  iterations_since_restore: 66
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29670
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03230024606310092
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.047999863260310145
    mean_inference_ms: 0.5479299084371989

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00003,RUNNING,192.168.130.2:29670,0.00565651,relu,[64],66.0,9.69261,26400.0,149.04,200.0,31.0,149.04
PG_CartPole-v0_2b2c7_00004,PENDING,,0.00111003,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00003:
  agent_timesteps_total: 27200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-45-49
  done: true
  episode_len_mean: 151.89
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.89
  episode_reward_min: 31.0
  episodes_this_iter: 2
  episodes_total: 336
  experiment_id: 8bfd960378664977983f94f52f12c837
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 25.1894474029541
    num_agent_steps_sampled: 27200
    num_steps_sampled: 27200
    num_steps_trained: 27200
  iterations_since_restore: 68
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 39.5
    ram_util_percent: 75.7
  pid: 29670
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03231205152808045
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.048018298796

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],1.0,0.190826,400.0,23.3125,42.0,12.0,23.3125
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00004:
  agent_timesteps_total: 14400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-01
  done: false
  episode_len_mean: 33.07
  episode_media: {}
  episode_reward_max: 115.0
  episode_reward_mean: 33.07
  episode_reward_min: 11.0
  episodes_this_iter: 7
  episodes_total: 524
  experiment_id: 05bc2ed9e6a44deabcf48ac1e555372c
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 16.358373641967773
    num_agent_steps_sampled: 14400
    num_steps_sampled: 14400
    num_steps_trained: 14400
  iterations_since_restore: 36
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29764
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.0313440252361936
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.045792034805535425
    mean_inference_ms: 0.5009352648173445

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],36.0,4.91482,14400.0,33.07,115.0,11.0,33.07
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],68.0,9.56618,27200.0,45.21,188.0,13.0,45.21
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00004:
  agent_timesteps_total: 27600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-06
  done: false
  episode_len_mean: 46.27
  episode_media: {}
  episode_reward_max: 188.0
  episode_reward_mean: 46.27
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 838
  experiment_id: 05bc2ed9e6a44deabcf48ac1e555372c
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 14.499448776245117
    num_agent_steps_sampled: 27600
    num_steps_sampled: 27600
    num_steps_trained: 27600
  iterations_since_restore: 69
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29764
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03231993148866529
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04730731954211745
    mean_inference_ms: 0.5169603247627438

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],101.0,14.2812,40400.0,55.22,156.0,17.0,55.22
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00004:
  agent_timesteps_total: 40800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-11
  done: false
  episode_len_mean: 55.69
  episode_media: {}
  episode_reward_max: 156.0
  episode_reward_mean: 55.69
  episode_reward_min: 17.0
  episodes_this_iter: 7
  episodes_total: 1087
  experiment_id: 05bc2ed9e6a44deabcf48ac1e555372c
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 14.622653007507324
    num_agent_steps_sampled: 40800
    num_steps_sampled: 40800
    num_steps_trained: 40800
  iterations_since_restore: 102
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29764
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03258559390538607
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04774389940869886
    mean_inference_ms: 0.5205167924758497


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],136.0,18.9655,54400.0,82.25,200.0,13.0,82.25
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00004:
  agent_timesteps_total: 55200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-16
  done: false
  episode_len_mean: 84.65
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 84.65
  episode_reward_min: 13.0
  episodes_this_iter: 4
  episodes_total: 1275
  experiment_id: 05bc2ed9e6a44deabcf48ac1e555372c
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.842885971069336
    num_agent_steps_sampled: 55200
    num_steps_sampled: 55200
    num_steps_trained: 55200
  iterations_since_restore: 138
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29764
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03236665881480755
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.047420938955626595
    mean_inference_ms: 0.5174015087220722

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00004,RUNNING,192.168.130.2:29764,0.00111003,linear,[32],174.0,23.8334,69600.0,117.77,200.0,20.0,117.77
PG_CartPole-v0_2b2c7_00005,PENDING,,0.00122893,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00004:
  agent_timesteps_total: 70000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-21
  done: false
  episode_len_mean: 117.96
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 117.96
  episode_reward_min: 20.0
  episodes_this_iter: 3
  episodes_total: 1401
  experiment_id: 05bc2ed9e6a44deabcf48ac1e555372c
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 23.54143524169922
    num_agent_steps_sampled: 70000
    num_steps_sampled: 70000
    num_steps_trained: 70000
  iterations_since_restore: 175
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29764
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03195826874817931
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04672595256302868
    mean_inference_ms: 0.5115003626978853

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],1.0,0.155414,400.0,19.4,56.0,10.0,19.4
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 12800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-37
  done: false
  episode_len_mean: 28.83
  episode_media: {}
  episode_reward_max: 99.0
  episode_reward_mean: 28.83
  episode_reward_min: 8.0
  episodes_this_iter: 16
  episodes_total: 506
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 9.224214553833008
    num_agent_steps_sampled: 12800
    num_steps_sampled: 12800
    num_steps_trained: 12800
  iterations_since_restore: 32
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03442536253920643
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.051043840775386685
    mean_inference_ms: 0.5680164928411218

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],32.0,4.95213,12800.0,28.83,99.0,8.0,28.83
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 26800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-42
  done: false
  episode_len_mean: 44.46
  episode_media: {}
  episode_reward_max: 119.0
  episode_reward_mean: 44.46
  episode_reward_min: 9.0
  episodes_this_iter: 8
  episodes_total: 871
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 16.660831451416016
    num_agent_steps_sampled: 26800
    num_steps_sampled: 26800
    num_steps_trained: 26800
  iterations_since_restore: 67
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03263417392930128
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.0478280062405294
    mean_inference_ms: 0.5412038647604834

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],67.0,9.72824,26800.0,44.46,119.0,9.0,44.46
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 39600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-47
  done: false
  episode_len_mean: 61.22
  episode_media: {}
  episode_reward_max: 155.0
  episode_reward_mean: 61.22
  episode_reward_min: 14.0
  episodes_this_iter: 6
  episodes_total: 1108
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 14.7152099609375
    num_agent_steps_sampled: 39600
    num_steps_sampled: 39600
    num_steps_trained: 39600
  iterations_since_restore: 99
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 41.0
    ram_util_percent: 75.6
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03289663818549315
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.048154639086

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],99.0,14.5089,39600.0,61.22,155.0,14.0,61.22
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 53600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-53
  done: false
  episode_len_mean: 89.19
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 89.19
  episode_reward_min: 15.0
  episodes_this_iter: 3
  episodes_total: 1278
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.4312686920166
    num_agent_steps_sampled: 53600
    num_steps_sampled: 53600
    num_steps_trained: 53600
  iterations_since_restore: 134
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.032442964244533294
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04742156161012414
    mean_inference_ms: 0.538726936716162

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],134.0,19.297,53600.0,89.19,200.0,15.0,89.19
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 66800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-46-58
  done: false
  episode_len_mean: 110.33
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 110.33
  episode_reward_min: 18.0
  episodes_this_iter: 2
  episodes_total: 1401
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.212039947509766
    num_agent_steps_sampled: 66800
    num_steps_sampled: 66800
    num_steps_trained: 66800
  iterations_since_restore: 167
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.032585098132334435
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.047521201659218824
    mean_inference_ms: 0.5405752182081

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],167.0,24.0537,66800.0,110.33,200.0,18.0,110.33
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 81200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-03
  done: false
  episode_len_mean: 134.98
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 134.98
  episode_reward_min: 25.0
  episodes_this_iter: 3
  episodes_total: 1509
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.988006591796875
    num_agent_steps_sampled: 81200
    num_steps_sampled: 81200
    num_steps_trained: 81200
  iterations_since_restore: 203
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03212345074117597
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04682890382176689
    mean_inference_ms: 0.534231412247423

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00005,RUNNING,192.168.130.2:29924,0.00122893,relu,[32],203.0,28.754,81200.0,134.98,200.0,25.0,134.98
PG_CartPole-v0_2b2c7_00006,PENDING,,0.00924176,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00005:
  agent_timesteps_total: 94000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-07
  done: true
  episode_len_mean: 150.33
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.33
  episode_reward_min: 29.0
  episodes_this_iter: 2
  episodes_total: 1594
  experiment_id: 6b552e9fedc24d6dbcf83baaf6969011
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 26.069538116455078
    num_agent_steps_sampled: 94000
    num_steps_sampled: 94000
    num_steps_trained: 94000
  iterations_since_restore: 235
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 34.5
    ram_util_percent: 75.5
  pid: 29924
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03180318219594789
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04617370

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00006,RUNNING,192.168.130.2:30089,0.00924176,linear,[64],1.0,0.130104,400.0,20.3684,32.0,9.0,20.3684
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00006:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-19
  done: false
  episode_len_mean: 128.31
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 128.31
  episode_reward_min: 21.0
  episodes_this_iter: 3
  episodes_total: 166
  experiment_id: 0f1aeac73755453e934b94716fbb795d
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 16.997631072998047
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30089
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029735861600384408
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04315106390791524
    mean_inference_ms: 0.4867503337915463

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00006,RUNNING,192.168.130.2:30089,0.00924176,linear,[64],38.0,4.91822,15200.0,128.31,200.0,21.0,128.31
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
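
Each status snapshot above is a set of comma-separated rows, one per trial. A row lists the sampled hyperparameters (`lr`, `model/fcnet_activation`, `model/fcnet_hiddens`) and then the latest metrics. Pending trials leave the metric columns empty. The sketch below loads such a snapshot with the standard library only. It assumes the snapshot was copied verbatim into a string, and `parse_status_table` is a hypothetical helper, not part of Ray:

```python
import csv
import io

# First rows of the status snapshot printed above, copied verbatim from the log.
STATUS_TABLE = """\
Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00006,RUNNING,192.168.130.2:30089,0.00924176,linear,[64],38.0,4.91822,15200.0,128.31,200.0,21.0,128.31
PG_CartPole-v0_2b2c7_00007,PENDING,,0.000904234,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
"""


def parse_status_table(text):
    """Parse one comma-separated status snapshot into a list of dicts (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["lr"] = float(row["lr"])
        # Pending trials have no metrics yet, so their reward column is empty.
        row["reward"] = float(row["reward"]) if row["reward"] else None
    return rows


trials = parse_status_table(STATUS_TABLE)
running = [t["Trial name"] for t in trials if t["status"] == "RUNNING"]
print(running)  # ['PG_CartPole-v0_2b2c7_00006']
```
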


Result for PG_CartPole-v0_2b2c7_00006:
  agent_timesteps_total: 19600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-20
  done: true
  episode_len_mean: 151.22
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.22
  episode_reward_min: 41.0
  episodes_this_iter: 3
  episodes_total: 197
  experiment_id: 0f1aeac73755453e934b94716fbb795d
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 17.874845504760742
    num_agent_steps_sampled: 19600
    num_steps_sampled: 19600
    num_steps_trained: 19600
  iterations_since_restore: 49
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30089
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029861323458469168
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04337021603520918
    mean_inference_ms: 0.4889502219017302


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],1.0,0.163441,400.0,22.9412,66.0,9.0,22.9412
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 14400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-32
  done: false
  episode_len_mean: 37.33
  episode_media: {}
  episode_reward_max: 151.0
  episode_reward_mean: 37.33
  episode_reward_min: 12.0
  episodes_this_iter: 12
  episodes_total: 483
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 10.682307243347168
    num_agent_steps_sampled: 14400
    num_steps_sampled: 14400
    num_steps_trained: 14400
  iterations_since_restore: 36
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03118260058186797
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.0458032889335411
    mean_inference_ms: 0.5177922207579492
  

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],36.0,4.95665,14400.0,37.33,151.0,12.0,37.33
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],69.0,9.64649,27600.0,49.11,168.0,11.0,49.11
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 28000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-37
  done: false
  episode_len_mean: 48.51
  episode_media: {}
  episode_reward_max: 168.0
  episode_reward_mean: 48.51
  episode_reward_min: 11.0
  episodes_this_iter: 10
  episodes_total: 792
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 11.47698974609375
    num_agent_steps_sampled: 28000
    num_steps_sampled: 28000
    num_steps_trained: 28000
  iterations_since_restore: 70
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03133033980442347
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04624714920006076
    mean_inference_ms: 0.5200530954883

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],103.0,14.349,41200.0,66.5,200.0,19.0,66.5
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 42000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-42
  done: false
  episode_len_mean: 66.45
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 66.45
  episode_reward_min: 19.0
  episodes_this_iter: 7
  episodes_total: 1020
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 16.012176513671875
    num_agent_steps_sampled: 42000
    num_steps_sampled: 42000
    num_steps_trained: 42000
  iterations_since_restore: 105
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03150231543231444
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04641396014090041
    mean_inference_ms: 0.5222081990266532


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],140.0,19.0232,56000.0,84.08,200.0,21.0,84.08
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 56800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-47
  done: false
  episode_len_mean: 85.59
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 85.59
  episode_reward_min: 21.0
  episodes_this_iter: 4
  episodes_total: 1205
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 24.674955368041992
    num_agent_steps_sampled: 56800
    num_steps_sampled: 56800
    num_steps_trained: 56800
  iterations_since_restore: 142
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03080106005267217
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04534275816266852
    mean_inference_ms: 0.5113132407004983


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],177.0,23.764,70800.0,109.89,200.0,26.0,109.89
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 71600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-52
  done: false
  episode_len_mean: 111.83
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 111.83
  episode_reward_min: 26.0
  episodes_this_iter: 3
  episodes_total: 1343
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.118925094604492
    num_agent_steps_sampled: 71600
    num_steps_sampled: 71600
    num_steps_trained: 71600
  iterations_since_restore: 179
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030449902473936467
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04481889370788391
    mean_inference_ms: 0.50530690595029

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00007,RUNNING,192.168.130.2:30182,0.000904234,relu,[64],214.0,28.4985,85600.0,149.58,200.0,15.0,149.58
PG_CartPole-v0_2b2c7_00008,PENDING,,0.00576621,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 86400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-47-58
  done: false
  episode_len_mean: 149.19
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 149.19
  episode_reward_min: 15.0
  episodes_this_iter: 4
  episodes_total: 1441
  experiment_id: 20be7bfc0bda4dd5ae9636fd71942fe7
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 23.281095504760742
    num_agent_steps_sampled: 86400
    num_steps_sampled: 86400
    num_steps_trained: 86400
  iterations_since_restore: 216
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30182
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030237721820819886
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04447103709562814
    mean_inference_ms: 0.50207491889099
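
The "Result for ..." blocks are printed as indented `key: value` lines. If that text is saved, a small regex-based sketch can recover the learning curve of trial 00007, plotting `agent_timesteps_total` against `episode_reward_mean`. The values below are copied from the blocks above, and `reward_curve` is a hypothetical helper, not a Ray utility. Ray Tune typically also writes these metrics to files in each trial's log directory (for example `progress.csv`). Reading them from those files is usually more robust than scraping console output.

```python
import re

# "Result for ..." blocks of trial 00007 from the log above, reduced to the
# two fields used here.
LOG = """\
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 14400
  episode_reward_mean: 37.33
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 28000
  episode_reward_mean: 48.51
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 42000
  episode_reward_mean: 66.45
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 56800
  episode_reward_mean: 85.59
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 71600
  episode_reward_mean: 111.83
Result for PG_CartPole-v0_2b2c7_00007:
  agent_timesteps_total: 86400
  episode_reward_mean: 149.19
"""


def reward_curve(log_text):
    """Map each trial name to its list of (timesteps, episode_reward_mean) pairs."""
    curves = {}
    # After the split, each chunk starts with "<trial name>:" followed by indented metrics.
    for block in re.split(r"^Result for ", log_text, flags=re.MULTILINE)[1:]:
        trial = block.split(":", 1)[0]
        ts = re.search(r"^ +agent_timesteps_total: (\d+)", block, re.MULTILINE)
        reward = re.search(r"^ +episode_reward_mean: ([\d.]+)", block, re.MULTILINE)
        curves.setdefault(trial, []).append((int(ts.group(1)), float(reward.group(1))))
    return curves


curves = reward_curve(LOG)
for ts, reward in curves["PG_CartPole-v0_2b2c7_00007"]:
    print(f"{ts:>6} steps: mean reward {reward:.2f}")
```
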

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00008,RUNNING,192.168.130.2:30289,0.00576621,linear,[32],1.0,0.138902,400.0,25.8,44.0,10.0,25.8
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00008:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-10
  done: false
  episode_len_mean: 90.11
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 90.11
  episode_reward_min: 22.0
  episodes_this_iter: 4
  episodes_total: 257
  experiment_id: fcdd5011c51644ee960511155712f4b3
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 17.70591163635254
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30289
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029795891568525166
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.041850545992229315
    mean_inference_ms: 0.4824595157132228
 

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00008,RUNNING,192.168.130.2:30289,0.00576621,linear,[32],38.0,4.84861,15200.0,90.11,200.0,22.0,90.11
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00008:
  agent_timesteps_total: 30400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-15
  done: false
  episode_len_mean: 149.9
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 149.9
  episode_reward_min: 43.0
  episodes_this_iter: 2
  episodes_total: 357
  experiment_id: fcdd5011c51644ee960511155712f4b3
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.77598762512207
    num_agent_steps_sampled: 30400
    num_steps_sampled: 30400
    num_steps_trained: 30400
  iterations_since_restore: 76
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30289
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02966841787068093
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04158655702929761
    mean_inference_ms: 0.47933609439430475
  

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00008,RUNNING,192.168.130.2:30289,0.00576621,linear,[32],76.0,9.61966,30400.0,149.9,200.0,43.0,149.9
PG_CartPole-v0_2b2c7_00009,PENDING,,0.00472464,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,


Result for PG_CartPole-v0_2b2c7_00008:
  agent_timesteps_total: 30800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-15
  done: true
  episode_len_mean: 150.96
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.96
  episode_reward_min: 43.0
  episodes_this_iter: 2
  episodes_total: 359
  experiment_id: fcdd5011c51644ee960511155712f4b3
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 21.469539642333984
    num_agent_steps_sampled: 30800
    num_steps_sampled: 30800
    num_steps_trained: 30800
  iterations_since_restore: 77
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30289
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029666386436993365
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04158316276394909
    mean_inference_ms: 0.47930317512524395

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00009,RUNNING,192.168.130.2:30429,0.00472464,relu,[32],1.0,0.149594,400.0,21.4118,47.0,10.0,21.4118
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00009:
  agent_timesteps_total: 14800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-26
  done: false
  episode_len_mean: 67.79
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 67.79
  episode_reward_min: 15.0
  episodes_this_iter: 4
  episodes_total: 341
  experiment_id: c3288b0aa0704322871a96f1a9a13e5b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.11512565612793
    num_agent_steps_sampled: 14800
    num_steps_sampled: 14800
    num_steps_trained: 14800
  iterations_since_restore: 37
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30429
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02934005320478363
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.043686381536775835
    mean_inference_ms: 0.503177484420635
   

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00009,RUNNING,192.168.130.2:30429,0.00472464,relu,[32],37.0,4.95904,14800.0,67.79,200.0,15.0,67.79
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00009:
  agent_timesteps_total: 29600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-31
  done: false
  episode_len_mean: 132.5
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 132.5
  episode_reward_min: 33.0
  episodes_this_iter: 2
  episodes_total: 458
  experiment_id: c3288b0aa0704322871a96f1a9a13e5b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 20.93899154663086
    num_agent_steps_sampled: 29600
    num_steps_sampled: 29600
    num_steps_trained: 29600
  iterations_since_restore: 74
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30429
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029359262212023226
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04378738043962045
    mean_inference_ms: 0.5035607444108783
  

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00009,RUNNING,192.168.130.2:30429,0.00472464,relu,[32],74.0,9.75678,29600.0,132.5,200.0,33.0,132.5
PG_CartPole-v0_2b2c7_00010,PENDING,,0.00584183,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00009:
  agent_timesteps_total: 33600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-33
  done: true
  episode_len_mean: 150.62
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.62
  episode_reward_min: 44.0
  episodes_this_iter: 4
  episodes_total: 485
  experiment_id: c3288b0aa0704322871a96f1a9a13e5b
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 18.044252395629883
    num_agent_steps_sampled: 33600
    num_steps_sampled: 33600
    num_steps_trained: 33600
  iterations_since_restore: 84
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30429
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029258688521027684
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.0436128140798108
    mean_inference_ms: 0.501877919099757
  

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00010,RUNNING,192.168.130.2:30526,0.00584183,linear,[64],1.0,0.131192,400.0,26.0,61.0,14.0,26.0
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00019,PENDING,,0.000209516,relu,[64],,,,,,,
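
Across the tables, `model/fcnet_activation` and `model/fcnet_hiddens` repeat in a fixed cycle of four: linear/[32], relu/[32], linear/[64], relu/[64]. Meanwhile, `lr` takes a different value for every trial. This pattern fits a 2x2 grid over activation and hidden-layer size, repeated for several samples, with `lr` drawn at random. In Ray Tune, such a search space is typically written with `tune.grid_search` plus a sampling function such as `tune.uniform`. The actual search space is not shown in this excerpt, so this is an inference. The sketch below only checks the inferred cycle against the logged trial indices, and `grid_config` is a hypothetical helper:

```python
import itertools

# Inferred (not confirmed) grid: hidden sizes in the outer loop, activation in
# the inner loop, so the activation alternates fastest as in the trial names.
HIDDENS = [[32], [64]]
ACTIVATIONS = ["linear", "relu"]
GRID = [(act, hid) for hid, act in itertools.product(HIDDENS, ACTIVATIONS)]


def grid_config(trial_index):
    """Return (activation, hiddens) for the n-th trial of the repeated 2x2 grid."""
    return GRID[trial_index % len(GRID)]


# (trial index, activation, hiddens) copied from the status tables in the log.
observed = [
    (6, "linear", [64]), (7, "relu", [64]), (8, "linear", [32]),
    (9, "relu", [32]), (10, "linear", [64]), (11, "relu", [64]),
    (16, "linear", [32]), (17, "relu", [32]), (18, "linear", [64]),
    (19, "relu", [64]),
]
for idx, act, hid in observed:
    assert grid_config(idx) == (act, hid), idx
print("inferred grid matches", len(observed), "logged trials")
```
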


Result for PG_CartPole-v0_2b2c7_00010:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-44
  done: false
  episode_len_mean: 122.69
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 122.69
  episode_reward_min: 18.0
  episodes_this_iter: 2
  episodes_total: 190
  experiment_id: ba0c5e3a30974ad3b77518656aee25c4
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 27.519609451293945
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30526
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.028815543258017395
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.042099692555242235
    mean_inference_ms: 0.474344664237801

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00010,RUNNING,192.168.130.2:30526,0.00584183,linear,[64],38.0,4.90508,15200.0,122.69,200.0,18.0,122.69
PG_CartPole-v0_2b2c7_00011,PENDING,,0.00244953,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00019,PENDING,,0.000209516,relu,[64],,,,,,,


Result for PG_CartPole-v0_2b2c7_00010:
  agent_timesteps_total: 20000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-46
  done: true
  episode_len_mean: 151.57
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.57
  episode_reward_min: 24.0
  episodes_this_iter: 2
  episodes_total: 216
  experiment_id: ba0c5e3a30974ad3b77518656aee25c4
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 23.014806747436523
    num_agent_steps_sampled: 20000
    num_steps_sampled: 20000
    num_steps_trained: 20000
  iterations_since_restore: 50
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30526
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02899359006408548
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04234929089312733
    mean_inference_ms: 0.47733343925335164
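
Every result block marked `done: true` reports an `episode_reward_mean` just above 150, and every block marked `done: false` reports a value below it. This points to a stopping condition of 150 on `episode_reward_mean`. When the stop criterion is given as a dict, Ray Tune stops a trial once the metric is greater than or equal to the threshold. The threshold itself is an assumption inferred from the log. The sketch below reimplements that rule in plain Python, with the hypothetical name `should_stop`, and checks it against the results printed so far:

```python
# Assumed stop criterion, inferred from the log rather than taken from the notebook's config.
STOP = {"episode_reward_mean": 150}


def should_stop(result, stop=STOP):
    """Dict-style stopping: stop as soon as any metric reaches its threshold."""
    return any(result.get(key, float("-inf")) >= threshold for key, threshold in stop.items())


# (trial, episode_reward_mean, done) copied from the "Result for ..." blocks above.
logged = [
    ("00006", 128.31, False), ("00006", 151.22, True),
    ("00007", 149.19, False),
    ("00008", 149.9, False), ("00008", 150.96, True),
    ("00009", 132.5, False), ("00009", 150.62, True),
    ("00010", 122.69, False), ("00010", 151.57, True),
]

for trial, reward, done in logged:
    assert should_stop({"episode_reward_mean": reward}) == done, trial
print("inferred stop rule matches", len(logged), "logged results")
```
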


Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00011,RUNNING,192.168.130.2:30618,0.00244953,relu,[64],1.0,0.133998,400.0,21.2778,47.0,11.0,21.2778
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00019,PENDING,,0.000209516,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00000,TERMINATED,,0.00241435,linear,[32],93.0,12.6749,37200.0,151.51,200.0,27.0,151.51


Result for PG_CartPole-v0_2b2c7_00011:
  agent_timesteps_total: 14800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-48-57
  done: false
  episode_len_mean: 55.05
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 55.05
  episode_reward_min: 13.0
  episodes_this_iter: 7
  episodes_total: 382
  experiment_id: 94f0d186d4944527b6a73542d8335aec
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 15.419468879699707
    num_agent_steps_sampled: 14800
    num_steps_sampled: 14800
    num_steps_trained: 14800
  iterations_since_restore: 37
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30618
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02933482685927735
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04219266798125025
    mean_inference_ms: 0.488147773106463
   

Trial name,status,loc,lr,model/fcnet_activation,model/fcnet_hiddens,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PG_CartPole-v0_2b2c7_00011,RUNNING,192.168.130.2:30618,0.00244953,relu,[64],38.0,4.96633,15200.0,55.4,200.0,13.0,55.4
PG_CartPole-v0_2b2c7_00012,PENDING,,0.00629699,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00013,PENDING,,0.00840001,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00014,PENDING,,0.000126603,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00015,PENDING,,0.00116946,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00016,PENDING,,0.00520193,linear,[32],,,,,,,
PG_CartPole-v0_2b2c7_00017,PENDING,,0.00799952,relu,[32],,,,,,,
PG_CartPole-v0_2b2c7_00018,PENDING,,0.00850983,linear,[64],,,,,,,
PG_CartPole-v0_2b2c7_00019,PENDING,,0.000209516,relu,[64],,,,,,,
PG_CartPole-v0_2b2c7_00000,TERMINATED,,0.00241435,linear,[32],93.0,12.6749,37200.0,151.51,200.0,27.0,151.51


Result for PG_CartPole-v0_2b2c7_00011:
  agent_timesteps_total: 29200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-02
  done: false
  episode_len_mean: 118.89
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 118.89
  episode_reward_min: 23.0
  episodes_this_iter: 2
  episodes_total: 521
  experiment_id: 94f0d186d4944527b6a73542d8335aec
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 26.83793830871582
    num_agent_steps_sampled: 29200
    num_steps_sampled: 29200
    num_steps_trained: 29200
  iterations_since_restore: 73
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30618
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029375420879274264
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04215820131976823
    mean_inference_ms: 0.48921261345566913

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00011 | RUNNING | 192.168.130.2:30618 | 0.00244953 | relu | [64] | 74.0 | 9.70145 | 29600.0 | 120.42 | 200.0 | 23.0 | 120.42 |
| PG_CartPole-v0_2b2c7_00012 | PENDING |  | 0.00629699 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00013 | PENDING |  | 0.00840001 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |


Result for PG_CartPole-v0_2b2c7_00011:
  agent_timesteps_total: 36000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-05
  done: true
  episode_len_mean: 151.51
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.51
  episode_reward_min: 23.0
  episodes_this_iter: 2
  episodes_total: 558
  experiment_id: 94f0d186d4944527b6a73542d8335aec
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 25.68998908996582
    num_agent_steps_sampled: 36000
    num_steps_sampled: 36000
    num_steps_trained: 36000
  iterations_since_restore: 90
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 32.2
    ram_util_percent: 75.7
  pid: 30618
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02950513812333322
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04229031144

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00012 | RUNNING | 192.168.130.2:30749 | 0.00629699 | linear | [32] | 1.0 | 0.150746 | 400.0 | 18.1053 | 51.0 | 9.0 | 18.1053 |
| PG_CartPole-v0_2b2c7_00013 | PENDING |  | 0.00840001 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |


Result for PG_CartPole-v0_2b2c7_00012:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-17
  done: false
  episode_len_mean: 77.26
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 77.26
  episode_reward_min: 15.0
  episodes_this_iter: 2
  episodes_total: 314
  experiment_id: 0e0d1da703834a76ba863807e62388d0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.378263473510742
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30749
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029728455505227548
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04296744957343997
    mean_inference_ms: 0.4866956944429315

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00012 | RUNNING | 192.168.130.2:30749 | 0.00629699 | linear | [32] | 38.0 | 4.95432 | 15200.0 | 77.26 | 200.0 | 15.0 | 77.26 |
| PG_CartPole-v0_2b2c7_00013 | PENDING |  | 0.00840001 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |


Result for PG_CartPole-v0_2b2c7_00012:
  agent_timesteps_total: 27200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-21
  done: true
  episode_len_mean: 151.4
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.4
  episode_reward_min: 21.0
  episodes_this_iter: 3
  episodes_total: 388
  experiment_id: 0e0d1da703834a76ba863807e62388d0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 20.92131805419922
    num_agent_steps_sampled: 27200
    num_steps_sampled: 27200
    num_steps_trained: 27200
  iterations_since_restore: 68
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30749
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.02992684652581645
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04326747885785386
    mean_inference_ms: 0.48992606751378376

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00013 | RUNNING | 192.168.130.2:30842 | 0.00840001 | relu | [32] | 1.0 | 0.140781 | 400.0 | 28.8462 | 64.0 | 12.0 | 28.8462 |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |


Result for PG_CartPole-v0_2b2c7_00013:
  agent_timesteps_total: 14400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-33
  done: false
  episode_len_mean: 65.09
  episode_media: {}
  episode_reward_max: 147.0
  episode_reward_mean: 65.09
  episode_reward_min: 23.0
  episodes_this_iter: 4
  episodes_total: 312
  experiment_id: 7eb74436b4fa4d089a38a578546d9fce
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 15.191848754882812
    num_agent_steps_sampled: 14400
    num_steps_sampled: 14400
    num_steps_trained: 14400
  iterations_since_restore: 36
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30842
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030554072464550424
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.044141046929501854
    mean_inference_ms: 0.5123628284159042


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00013 | RUNNING | 192.168.130.2:30842 | 0.00840001 | relu | [32] | 37.0 | 4.97635 | 14800.0 | 65.1 | 147.0 | 23.0 | 65.1 |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |


Result for PG_CartPole-v0_2b2c7_00013:
  agent_timesteps_total: 28400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-38
  done: false
  episode_len_mean: 116.58
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 116.58
  episode_reward_min: 32.0
  episodes_this_iter: 2
  episodes_total: 441
  experiment_id: 7eb74436b4fa4d089a38a578546d9fce
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 21.33912467956543
    num_agent_steps_sampled: 28400
    num_steps_sampled: 28400
    num_steps_trained: 28400
  iterations_since_restore: 71
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30842
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030002546640546277
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04328529407898849
    mean_inference_ms: 0.5042216687175327


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00013 | RUNNING | 192.168.130.2:30842 | 0.00840001 | relu | [32] | 72.0 | 9.71877 | 28800.0 | 119.13 | 200.0 | 32.0 | 119.13 |
| PG_CartPole-v0_2b2c7_00014 | PENDING |  | 0.000126603 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |


Result for PG_CartPole-v0_2b2c7_00013:
  agent_timesteps_total: 34400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-40
  done: true
  episode_len_mean: 150.16
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.16
  episode_reward_min: 36.0
  episodes_this_iter: 2
  episodes_total: 472
  experiment_id: 7eb74436b4fa4d089a38a578546d9fce
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.753934860229492
    num_agent_steps_sampled: 34400
    num_steps_sampled: 34400
    num_steps_trained: 34400
  iterations_since_restore: 86
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30842
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03015812982864284
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04355978463429236
    mean_inference_ms: 0.5066981526377616

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 1.0 | 0.137002 | 400.0 | 23.25 | 58.0 | 12.0 | 23.25 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 14800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-52
  done: false
  episode_len_mean: 25.57
  episode_media: {}
  episode_reward_max: 72.0
  episode_reward_mean: 25.57
  episode_reward_min: 9.0
  episodes_this_iter: 14
  episodes_total: 592
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 9.952630043029785
    num_agent_steps_sampled: 14800
    num_steps_sampled: 14800
    num_steps_trained: 14800
  iterations_since_restore: 37
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030086650735277082
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04402269929660676
    mean_inference_ms: 0.48631728710464317

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 37.0 | 4.90456 | 14800.0 | 25.57 | 72.0 | 9.0 | 25.57 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 28400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-49-57
  done: false
  episode_len_mean: 27.92
  episode_media: {}
  episode_reward_max: 82.0
  episode_reward_mean: 27.92
  episode_reward_min: 10.0
  episodes_this_iter: 12
  episodes_total: 1085
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 11.613635063171387
    num_agent_steps_sampled: 28400
    num_steps_sampled: 28400
    num_steps_trained: 28400
  iterations_since_restore: 71
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03102038622628392
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.045482139174890045
    mean_inference_ms: 0.5006224433510154


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 71.0 | 9.65171 | 28400.0 | 27.92 | 82.0 | 10.0 | 27.92 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 41200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-02
  done: false
  episode_len_mean: 31.12
  episode_media: {}
  episode_reward_max: 87.0
  episode_reward_mean: 31.12
  episode_reward_min: 10.0
  episodes_this_iter: 13
  episodes_total: 1513
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 10.427549362182617
    num_agent_steps_sampled: 41200
    num_steps_sampled: 41200
    num_steps_trained: 41200
  iterations_since_restore: 103
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 43.9
    ram_util_percent: 76.2
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.031865436471826436
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04690302

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 103.0 | 14.4463 | 41200.0 | 31.12 | 87.0 | 10.0 | 31.12 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 55200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-07
  done: false
  episode_len_mean: 35.89
  episode_media: {}
  episode_reward_max: 90.0
  episode_reward_mean: 35.89
  episode_reward_min: 10.0
  episodes_this_iter: 9
  episodes_total: 1920
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 14.715375900268555
    num_agent_steps_sampled: 55200
    num_steps_sampled: 55200
    num_steps_trained: 55200
  iterations_since_restore: 138
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03185599494601009
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04695476291087045
    mean_inference_ms: 0.5137816901390656

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 138.0 | 19.2449 | 55200.0 | 35.89 | 90.0 | 10.0 | 35.89 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 69200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-12
  done: false
  episode_len_mean: 40.88
  episode_media: {}
  episode_reward_max: 137.0
  episode_reward_mean: 40.88
  episode_reward_min: 12.0
  episodes_this_iter: 11
  episodes_total: 2285
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 11.280622482299805
    num_agent_steps_sampled: 69200
    num_steps_sampled: 69200
    num_steps_trained: 69200
  iterations_since_restore: 173
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.031573948350219413
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04655768407421824
    mean_inference_ms: 0.509518652722875

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 173.0 | 23.9721 | 69200.0 | 40.88 | 137.0 | 12.0 | 40.88 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 84000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-17
  done: false
  episode_len_mean: 44.72
  episode_media: {}
  episode_reward_max: 144.0
  episode_reward_mean: 44.72
  episode_reward_min: 10.0
  episodes_this_iter: 9
  episodes_total: 2627
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 13.501670837402344
    num_agent_steps_sampled: 84000
    num_steps_sampled: 84000
    num_steps_trained: 84000
  iterations_since_restore: 210
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03133166861820335
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04596408276612926
    mean_inference_ms: 0.5058228707940026


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 210.0 | 28.7224 | 84000.0 | 44.72 | 144.0 | 10.0 | 44.72 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 98800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-22
  done: false
  episode_len_mean: 48.78
  episode_media: {}
  episode_reward_max: 116.0
  episode_reward_mean: 48.78
  episode_reward_min: 14.0
  episodes_this_iter: 11
  episodes_total: 2954
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 10.44020938873291
    num_agent_steps_sampled: 98800
    num_steps_sampled: 98800
    num_steps_trained: 98800
  iterations_since_restore: 247
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.031024767209045217
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.045421629668489165
    mean_inference_ms: 0.501288949378300

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00014 | RUNNING | 192.168.130.2:30940 | 0.000126603 | linear | [64] | 247.0 | 33.4022 | 98800.0 | 48.78 | 116.0 | 14.0 | 48.78 |
| PG_CartPole-v0_2b2c7_00015 | PENDING |  | 0.00116946 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |


Result for PG_CartPole-v0_2b2c7_00014:
  agent_timesteps_total: 100000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-23
  done: true
  episode_len_mean: 50.25
  episode_media: {}
  episode_reward_max: 188.0
  episode_reward_mean: 50.25
  episode_reward_min: 14.0
  episodes_this_iter: 9
  episodes_total: 2976
  experiment_id: 63bde690ccfa4e018a233b86116304b0
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 12.591657638549805
    num_agent_steps_sampled: 100000
    num_steps_sampled: 100000
    num_steps_trained: 100000
  iterations_since_restore: 250
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 30940
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.031012553093265706
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.045397357595717364
    mean_inference_ms: 0.501128735415

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00015 | RUNNING | 192.168.130.2:31095 | 0.00116946 | relu | [64] | 1.0 | 0.148817 | 400.0 | 24.6 | 60.0 | 11.0 | 24.6 |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |


Result for PG_CartPole-v0_2b2c7_00015:
  agent_timesteps_total: 15200
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-34
  done: false
  episode_len_mean: 41.96
  episode_media: {}
  episode_reward_max: 191.0
  episode_reward_mean: 41.96
  episode_reward_min: 13.0
  episodes_this_iter: 11
  episodes_total: 477
  experiment_id: a15033e08d8548fb8ee210035814dd00
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 10.92720890045166
    num_agent_steps_sampled: 15200
    num_steps_sampled: 15200
    num_steps_trained: 15200
  iterations_since_restore: 38
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31095
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029489432469051727
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04220573379971094
    mean_inference_ms: 0.48665090526872457


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00015 | RUNNING | 192.168.130.2:31095 | 0.00116946 | relu | [64] | 38.0 | 4.92417 | 15200.0 | 41.96 | 191.0 | 13.0 | 41.96 |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |


Result for PG_CartPole-v0_2b2c7_00015:
  agent_timesteps_total: 29600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-39
  done: false
  episode_len_mean: 70.29
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 70.29
  episode_reward_min: 16.0
  episodes_this_iter: 5
  episodes_total: 727
  experiment_id: a15033e08d8548fb8ee210035814dd00
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 18.72626304626465
    num_agent_steps_sampled: 29600
    num_steps_sampled: 29600
    num_steps_trained: 29600
  iterations_since_restore: 74
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31095
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029708280730689127
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.042445591760517626
    mean_inference_ms: 0.4904019611885375
 

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00015 | RUNNING | 192.168.130.2:31095 | 0.00116946 | relu | [64] | 74.0 | 9.62231 | 29600.0 | 70.29 | 200.0 | 16.0 | 70.29 |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |


Result for PG_CartPole-v0_2b2c7_00015:
  agent_timesteps_total: 43600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-44
  done: false
  episode_len_mean: 116.08
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 116.08
  episode_reward_min: 22.0
  episodes_this_iter: 3
  episodes_total: 851
  experiment_id: a15033e08d8548fb8ee210035814dd00
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 24.80535125732422
    num_agent_steps_sampled: 43600
    num_steps_sampled: 43600
    num_steps_trained: 43600
  iterations_since_restore: 109
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31095
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.029974519415374847
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04278024319404021
    mean_inference_ms: 0.4941623571992746

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00015 | RUNNING | 192.168.130.2:31095 | 0.00116946 | relu | [64] | 109.0 | 14.3913 | 43600.0 | 116.08 | 200.0 | 22.0 | 116.08 |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |


Result for PG_CartPole-v0_2b2c7_00015:
  agent_timesteps_total: 56800
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-49
  done: false
  episode_len_mean: 145.52
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 145.52
  episode_reward_min: 24.0
  episodes_this_iter: 4
  episodes_total: 942
  experiment_id: a15033e08d8548fb8ee210035814dd00
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 18.16251564025879
    num_agent_steps_sampled: 56800
    num_steps_sampled: 56800
    num_steps_trained: 56800
  iterations_since_restore: 142
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 42.0
    ram_util_percent: 75.9
  pid: 31095
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.030601495262692388
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04378038

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00015 | RUNNING | 192.168.130.2:31095 | 0.00116946 | relu | [64] | 142.0 | 19.1325 | 56800.0 | 145.52 | 200.0 | 24.0 | 145.52 |
| PG_CartPole-v0_2b2c7_00016 | PENDING |  | 0.00520193 | linear | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |


Result for PG_CartPole-v0_2b2c7_00015:
  agent_timesteps_total: 66400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-50-53
  done: true
  episode_len_mean: 150.34
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.34
  episode_reward_min: 24.0
  episodes_this_iter: 2
  episodes_total: 1004
  experiment_id: a15033e08d8548fb8ee210035814dd00
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.983802795410156
    num_agent_steps_sampled: 66400
    num_steps_sampled: 66400
    num_steps_trained: 66400
  iterations_since_restore: 166
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31095
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.031135395522577575
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.0446357635137148
    mean_inference_ms: 0.5117927245524806

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00016 | RUNNING | 192.168.130.2:31219 | 0.00520193 | linear | [32] | 1.0 | 0.134228 | 400.0 | 21.9412 | 50.0 | 11.0 | 21.9412 |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |
| PG_CartPole-v0_2b2c7_00005 | TERMINATED |  | 0.00122893 | relu | [32] | 235.0 | 32.9418 | 94000.0 | 150.33 | 200.0 | 29.0 | 150.33 |


Result for PG_CartPole-v0_2b2c7_00016:
  agent_timesteps_total: 14000
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-51-05
  done: false
  episode_len_mean: 73.21
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 73.21
  episode_reward_min: 24.0
  episodes_this_iter: 3
  episodes_total: 274
  experiment_id: 9fe901f5e96d48ac948358a200193a42
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 20.150306701660156
    num_agent_steps_sampled: 14000
    num_steps_sampled: 14000
    num_steps_trained: 14000
  iterations_since_restore: 35
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31219
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.032785683564879264
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.048724315710717836
    mean_inference_ms: 0.529109522842293
 

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00016 | RUNNING | 192.168.130.2:31219 | 0.00520193 | linear | [32] | 35.0 | 4.93064 | 14000.0 | 73.21 | 200.0 | 24.0 | 73.21 |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |
| PG_CartPole-v0_2b2c7_00005 | TERMINATED |  | 0.00122893 | relu | [32] | 235.0 | 32.9418 | 94000.0 | 150.33 | 200.0 | 29.0 | 150.33 |


Result for PG_CartPole-v0_2b2c7_00016:
  agent_timesteps_total: 28400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-51-10
  done: false
  episode_len_mean: 138.69
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 138.69
  episode_reward_min: 34.0
  episodes_this_iter: 2
  episodes_total: 377
  experiment_id: 9fe901f5e96d48ac948358a200193a42
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 26.603275299072266
    num_agent_steps_sampled: 28400
    num_steps_sampled: 28400
    num_steps_trained: 28400
  iterations_since_restore: 71
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31219
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03150541852725763
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04625255472155328
    mean_inference_ms: 0.508904590082753
 

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00016 | RUNNING | 192.168.130.2:31219 | 0.00520193 | linear | [32] | 71.0 | 9.68468 | 28400.0 | 138.69 | 200.0 | 34.0 | 138.69 |
| PG_CartPole-v0_2b2c7_00017 | PENDING |  | 0.00799952 | relu | [32] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |
| PG_CartPole-v0_2b2c7_00005 | TERMINATED |  | 0.00122893 | relu | [32] | 235.0 | 32.9418 | 94000.0 | 150.33 | 200.0 | 29.0 | 150.33 |


Result for PG_CartPole-v0_2b2c7_00016:
  agent_timesteps_total: 32400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-51-12
  done: true
  episode_len_mean: 151.25
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 151.25
  episode_reward_min: 34.0
  episodes_this_iter: 2
  episodes_total: 399
  experiment_id: 9fe901f5e96d48ac948358a200193a42
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 22.069923400878906
    num_agent_steps_sampled: 32400
    num_steps_sampled: 32400
    num_steps_trained: 32400
  iterations_since_restore: 81
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31219
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03150099491374513
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.046035884499978585
    mean_inference_ms: 0.5071704364137378


| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00017 | RUNNING | 192.168.130.2:31375 | 0.00799952 | relu | [32] | 1.0 | 0.148624 | 400.0 | 23.3125 | 64.0 | 9.0 | 23.3125 |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |
| PG_CartPole-v0_2b2c7_00005 | TERMINATED |  | 0.00122893 | relu | [32] | 235.0 | 32.9418 | 94000.0 | 150.33 | 200.0 | 29.0 | 150.33 |
| PG_CartPole-v0_2b2c7_00006 | TERMINATED |  | 0.00924176 | linear | [64] | 49.0 | 6.283 | 19600.0 | 151.22 | 200.0 | 41.0 | 151.22 |


Result for PG_CartPole-v0_2b2c7_00017:
  agent_timesteps_total: 14400
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-51-23
  done: false
  episode_len_mean: 73.91
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 73.91
  episode_reward_min: 18.0
  episodes_this_iter: 3
  episodes_total: 284
  experiment_id: 40334f42cc63473a960bb0ad8ba121f5
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 19.081628799438477
    num_agent_steps_sampled: 14400
    num_steps_sampled: 14400
    num_steps_trained: 14400
  iterations_since_restore: 36
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf: {}
  pid: 31375
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.03144545006250001
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.045233969583375035
    mean_inference_ms: 0.517446284752933
  

| Trial name | status | loc | lr | model/fcnet_activation | model/fcnet_hiddens | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PG_CartPole-v0_2b2c7_00017 | RUNNING | 192.168.130.2:31375 | 0.00799952 | relu | [32] | 36.0 | 4.89211 | 14400.0 | 73.91 | 200.0 | 18.0 | 73.91 |
| PG_CartPole-v0_2b2c7_00018 | PENDING |  | 0.00850983 | linear | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00019 | PENDING |  | 0.000209516 | relu | [64] |  |  |  |  |  |  |  |
| PG_CartPole-v0_2b2c7_00000 | TERMINATED |  | 0.00241435 | linear | [32] | 93.0 | 12.6749 | 37200.0 | 151.51 | 200.0 | 27.0 | 151.51 |
| PG_CartPole-v0_2b2c7_00001 | TERMINATED |  | 0.00553195 | relu | [32] | 88.0 | 12.1834 | 35200.0 | 150.38 | 200.0 | 55.0 | 150.38 |
| PG_CartPole-v0_2b2c7_00002 | TERMINATED |  | 0.00477552 | linear | [64] | 56.0 | 7.91449 | 22400.0 | 150.37 | 200.0 | 44.0 | 150.37 |
| PG_CartPole-v0_2b2c7_00003 | TERMINATED |  | 0.00565651 | relu | [64] | 68.0 | 9.99143 | 27200.0 | 151.89 | 200.0 | 31.0 | 151.89 |
| PG_CartPole-v0_2b2c7_00004 | TERMINATED |  | 0.00111003 | linear | [32] | 202.0 | 27.7631 | 80800.0 | 152.08 | 200.0 | 22.0 | 152.08 |
| PG_CartPole-v0_2b2c7_00005 | TERMINATED |  | 0.00122893 | relu | [32] | 235.0 | 32.9418 | 94000.0 | 150.33 | 200.0 | 29.0 | 150.33 |
| PG_CartPole-v0_2b2c7_00006 | TERMINATED |  | 0.00924176 | linear | [64] | 49.0 | 6.283 | 19600.0 | 151.22 | 200.0 | 41.0 | 151.22 |


Result for PG_CartPole-v0_2b2c7_00017:
  agent_timesteps_total: 27600
  custom_metrics:
    default_policy: {}
  date: 2021-11-13_03-51-28
  done: true
  episode_len_mean: 150.56
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 150.56
  episode_reward_min: 53.0
  episodes_this_iter: 3
  episodes_total: 365
  experiment_id: 40334f42cc63473a960bb0ad8ba121f5
  hostname: eduardo-G7-7588
  info:
    learner:
      default_policy:
        allreduce_latency: 0.0
        policy_loss: 18.816303253173828
    num_agent_steps_sampled: 27600
    num_steps_sampled: 27600
    num_steps_trained: 27600
  iterations_since_restore: 69
  node_ip: 192.168.130.2
  num_healthy_workers: 2
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 30.6
    ram_util_percent: 75.9
  pid: 31375
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.0309108381472501
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.04413659650

In [None]:
print(
  "Melhores hiperparâmetros encontrados:",
  parameter_search_analysis.best_config,
)

Specifying `num_samples = 5` means the search is repeated five times. Each repetition covers two values for the hidden layer size and two values for the activation function, and, as the progress tables above show, every trial also gets its own randomly sampled learning rate. Therefore there are 5 * 2 * 2 = 20 trials, shown with their status in the cell output as the computation runs.
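
The trial count can be reproduced with a small pure-Python sketch of how a grid and a random sample combine. This is only an illustration, not Ray Tune's implementation: `expand_trials` is a hypothetical helper, and the learning-rate range `[1e-4, 1e-2]` is an assumption, since the original search space is not shown in this section. The learning rate is drawn once per trial, matching the distinct `lr` values in the progress tables.

```python
import itertools
import random


def expand_trials(num_samples, grid, samplers, seed=0):
    """Hypothetical helper: repeat the full grid num_samples times,
    drawing every random parameter independently for each trial."""
    rng = random.Random(seed)
    keys = list(grid)
    trials = []
    for _ in range(num_samples):
        # Cartesian product of all grid_search values
        for values in itertools.product(*(grid[k] for k in keys)):
            config = dict(zip(keys, values))
            # random parameters are sampled separately for every trial
            for name, sample in samplers.items():
                config[name] = sample(rng)
            trials.append(config)
    return trials


trials = expand_trials(
    num_samples=5,
    grid={
        "fcnet_hiddens": [[32], [64]],
        "fcnet_activation": ["linear", "relu"],
    },
    # assumed range; the original search space is not shown here
    samplers={"lr": lambda rng: rng.uniform(1e-4, 1e-2)},
)
print(len(trials))  # 20 = 5 * 2 * 2
```
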

Note that Ray reports the current best configuration as it progresses. It includes every default value that was set, which makes it a good place to find other parameters that can be tuned.
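
Because `best_config` is a nested dictionary with every default filled in, it can be easier to browse as flat keys, using the same `model/fcnet_activation` style as the progress tables. The sketch below uses a hypothetical helper, `flatten_config`, applied to a small hand-written fragment rather than the real RLlib defaults; in the notebook you would pass `parameter_search_analysis.best_config` instead.

```python
def flatten_config(config, prefix=""):
    """Hypothetical helper: turn a nested config dict into {"a/b": value}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict) and value:
            # recurse into non-empty sub-dicts such as "model"
            flat.update(flatten_config(value, name))
        else:
            flat[name] = value
    return flat


# Small illustrative fragment, not the full RLlib default config
example = {
    "lr": 0.00116946,
    "model": {"fcnet_activation": "relu", "fcnet_hiddens": [64]},
    "num_workers": 2,
}
for name, value in sorted(flatten_config(example).items()):
    print(name, value)
```
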


## Exercise

Now that you know the basic Ray Tune and RLlib APIs, **use the `BreakoutNoFrameskip-v4` environment and train agents with the A3C, PPO and SAC algorithms**. Remember to also use TensorBoard to track and compare the learning curves of your runs.

Descriptions of the algorithms and their hyperparameters can be found [here](https://docs.ray.io/en/latest/rllib-algorithms.html#available-algorithms-overview).

In [39]:
#reimport libs
import ray
import os

environment_id = 'BreakoutNoFrameskip-v4'

# To explicitly stop or restart Ray, use the shutdown API.
ray.shutdown()

ray.init(
  include_dashboard=False,
  ignore_reinit_error=True,
  log_to_driver=False,
)

{'node_ip_address': '192.168.130.2',
 'raylet_ip_address': '192.168.130.2',
 'redis_address': '192.168.130.2:6379',
 'object_store_address': '/tmp/ray/session_2021-11-13_00-45-44_824627_22643/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-11-13_00-45-44_824627_22643/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2021-11-13_00-45-44_824627_22643',
 'metrics_export_port': 43323,
 'node_id': 'd8b5c6d4e72a8ccbfaff032747917213ae1b88dbc814c70d613328dd'}

In [74]:
import gym


def query_environment(name):
    """Print the main properties of a Gym environment."""
    env = gym.make(name)
    spec = gym.spec(name)

    print(f"Action Space: {env.action_space}")
    print(f"Observation Space: {env.observation_space}")
    print(f"Max Episode Steps: {spec.max_episode_steps}")
    print(f"Nondeterministic: {spec.nondeterministic}")
    print(f"Reward Range: {env.reward_range}")
    print(f"Reward Threshold: {spec.reward_threshold}")
    env.close()

In [34]:
query_environment(environment_id)

Action Space: Discrete(4)
Observation Space: Box([[[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 ...

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]], [[[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 ...

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255
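
The `Box` repr above prints the full `low`/`high` arrays, which is hard to read. Below is a minimal sketch of a more compact summary. `summarize_space` is a hypothetical helper that relies only on the `shape`, `dtype`, `low` and `high` attributes (present on Gym `Box` spaces, where `low`/`high` are NumPy arrays) and falls back to `repr` for other spaces such as `Discrete(4)`. In the notebook you would call it as `summarize_space(env.observation_space)`.

```python
def summarize_space(space):
    """Hypothetical helper: one-line summary of a Gym-like space."""
    low = getattr(space, "low", None)
    high = getattr(space, "high", None)
    if low is None or high is None:
        # Discrete and other spaces already have a short repr
        return repr(space)
    # only the extreme bounds are shown instead of the full arrays
    return (
        f"Box(low={low.min()}, high={high.max()}, "
        f"shape={space.shape}, dtype={space.dtype})"
    )
```
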

In [None]:
# NUM_ENVS_PER_WORKER is otherwise only defined later, in the bonus section;
# it is set here so this cell also runs on its own (value assumed)
NUM_ENVS_PER_WORKER = 2

a3c_analysis = ray.tune.run(
    "A3C",
    config={
        # RL setup
        "env": environment_id,
        # system settings
        "num_gpus": 1,
        "num_workers": 2,
        "num_envs_per_worker": NUM_ENVS_PER_WORKER,
        "log_level": "INFO",
        "framework": "torch",
    },
    stop={
        # you can also limit training by time, according to Colab's time limit
        "time_total_s": 1200,  # 20 min
    },
    checkpoint_freq=50,
    checkpoint_at_end=True,
    local_dir=os.path.join("../minicurso_rl/lab03/results", "breakout")
)
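
The A3C, PPO and SAC cells repeat the same environment and system settings. One way to keep them in sync is a small helper that merges shared settings with per-algorithm overrides. `make_config` and `SYSTEM_SETTINGS` are hypothetical names, and the values simply mirror the cells in this section; the result is a plain dictionary that could be passed as `config=` to `ray.tune.run`.

```python
import copy

# Settings shared by the Breakout runs in this section
SYSTEM_SETTINGS = {
    "num_gpus": 1,
    "num_workers": 2,
    "log_level": "INFO",
    "framework": "torch",
}


def make_config(env_id, **overrides):
    """Hypothetical helper: shared settings + env + algorithm-specific keys."""
    config = copy.deepcopy(SYSTEM_SETTINGS)  # never mutate the shared dict
    config["env"] = env_id
    config.update(overrides)  # overrides win over the shared defaults
    return config


ppo_config = make_config("BreakoutNoFrameskip-v4", num_envs_per_worker=5, clip_param=0.1)
sac_config = make_config("BreakoutNoFrameskip-v4", num_workers=0, buffer_size=int(1e5))
print(sac_config["num_workers"], sac_config["framework"])  # 0 torch
```
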

In [None]:
trial = a3c_analysis.get_best_trial("episode_reward_mean", "max")
checkpoint = a3c_analysis.get_best_checkpoint(
  trial,
  "episode_reward_mean",
  "max",
)
print('A3C Results (20min):')
print('episode_reward_max', trial.last_result['episode_reward_max'])
print('episode_reward_min', trial.last_result['episode_reward_min'])
print('episode_reward_mean', trial.last_result['episode_reward_mean'])
print('episode_len_mean', trial.last_result['episode_len_mean'])
print('checkpoint', checkpoint)

In [None]:
ppo_analysis = ray.tune.run(
    "PPO",
    config={
        # RL setup
        "env": environment_id,
        # system settings
        "num_gpus": 1,
        "num_workers": 2,
        "num_envs_per_worker": 5,
        "log_level": "INFO",
        "framework": "torch",
        
        # Hyperparameters taken from:
        # https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/ppo/atari-ppo.yaml
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_rewards": True,
        "clip_param": 0.1,
        "vf_clip_param": 10.0,
        "entropy_coeff": 0.01,
        "train_batch_size": 5000,
        "rollout_fragment_length": 100,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "batch_mode": "truncate_episodes",
        "observation_filter": "NoFilter",
        "model": { "vf_share_layers": True }
    },
    stop={
        # you can also limit training by time, according to Colab's time limit
        "time_total_s": 1200, # 20min
    },
    checkpoint_freq=100,
    checkpoint_at_end=True,
    local_dir=os.path.join("../minicurso_rl/lab03", "results/breakout")
)
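
The PPO batch settings above fit together as follows, assuming RLlib 1.x semantics where each rollout worker collects `rollout_fragment_length * num_envs_per_worker` steps per sampling round (check the documentation of the installed version). The arithmetic is written as a short sketch with a hypothetical helper name.

```python
def ppo_batch_arithmetic(train_batch_size, rollout_fragment_length,
                         num_envs_per_worker, num_workers,
                         sgd_minibatch_size, num_sgd_iter):
    """Hypothetical helper: how the PPO batch settings relate."""
    steps_per_worker = rollout_fragment_length * num_envs_per_worker
    steps_per_round = steps_per_worker * num_workers
    # ceiling division: sampling rounds needed to fill one train batch
    sampling_rounds = -(-train_batch_size // steps_per_round)
    minibatches_per_epoch = train_batch_size // sgd_minibatch_size
    sgd_steps = minibatches_per_epoch * num_sgd_iter
    return steps_per_round, sampling_rounds, sgd_steps


print(ppo_batch_arithmetic(5000, 100, 5, 2, 500, 10))  # (1000, 5, 100)
```

With the values from the cell, each training iteration would gather 5000 steps in five rounds of 1000 and then run 100 minibatch SGD updates.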

In [None]:
trial = ppo_analysis.get_best_trial("episode_reward_mean", "max")
checkpoint = ppo_analysis.get_best_checkpoint(
  trial,
  "episode_reward_mean",
  "max",
)
print('PPO Results (20min):')
print('episode_reward_max', trial.last_result['episode_reward_max'])
print('episode_reward_min', trial.last_result['episode_reward_min'])
print('episode_reward_mean', trial.last_result['episode_reward_mean'])
print('episode_len_mean', trial.last_result['episode_len_mean'])
print('checkpoint', checkpoint)

In [None]:
sac_analysis = ray.tune.run(
    "SAC",
    config={
        # RL setup
        "env": environment_id,
        # system settings
        "num_gpus": 1,
        "num_workers": 0,
        "log_level": "INFO",
        "framework": "torch",
        
        # limit the replay buffer size (the default exhausted the available memory)
        "buffer_size": int(1e5),

        # The paper uses 20k timesteps before training starts and
        # Ray's tuned example uses 100k, but because of memory limits
        # and the small buffer we use 5k
        "learning_starts": 5000,

        # Hyperparameters adapted from (tuned for other environments,
        # not optimized for BreakoutNoFrameskip-v4):
        # https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/atari-sac.yaml
        "gamma": 0.99,
        "Q_model": {
                "fcnet_activation": "relu",
                "fcnet_hiddens": [512, 512]
            },
        "policy_model" :{
                "fcnet_activation": "relu",
                "fcnet_hiddens": [512, 512]
            },
        # Do hard syncs.
        # Soft-syncs seem to work less reliably for discrete action spaces.
        "tau": 1.0,
        "target_network_update_freq": 8000,
        "target_entropy": "auto",
        "clip_rewards": 1.0,
        "n_step": 1,
        "rollout_fragment_length": 1,
        "prioritized_replay": True,
        "train_batch_size": 64,
        "timesteps_per_iteration": 4,
        "optimization": {
                "actor_learning_rate": 0.0003,
                "critic_learning_rate": 0.0003,
                "entropy_learning_rate": 0.0003,
            },
        "metrics_smoothing_episodes": 5
    },
    stop={
        # you can also limit training by time, according to Colab's time limit
        "time_total_s": 1200, # 20min
    },
    checkpoint_freq=50,
    checkpoint_at_end=True,
    local_dir=os.path.join("../minicurso_rl/lab03", "results/breakout")
)
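
The comment about the replay buffer exhausting memory can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes each transition stores both the observation and the next observation as 84×84×4 `uint8` stacked frames (a common Atari preprocessing; how RLlib actually stores them depends on the version and config) and ignores actions, rewards and overhead. The helper name is hypothetical.

```python
def replay_buffer_gb(capacity, obs_shape=(84, 84, 4), bytes_per_value=1,
                     copies_per_transition=2):
    """Hypothetical estimate of replay buffer size in gigabytes (1e9 bytes)."""
    values_per_obs = 1
    for dim in obs_shape:
        values_per_obs *= dim
    # obs and next_obs are both kept for every transition
    bytes_per_transition = values_per_obs * bytes_per_value * copies_per_transition
    return capacity * bytes_per_transition / 1e9


print(round(replay_buffer_gb(int(1e5)), 2))  # 5.64
print(round(replay_buffer_gb(int(1e6)), 2))  # 56.45
```

Under these assumptions a buffer ten times larger needs tens of gigabytes, and with `bytes_per_value=4` (float32 observations) each figure grows fourfold.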

In [79]:
trial = sac_analysis.get_best_trial("episode_reward_mean", "max")
checkpoint = sac_analysis.get_best_checkpoint(
  trial,
  "episode_reward_mean",
  "max",
)
print('SAC Results (20min):')
print('episode_reward_max', trial.last_result['episode_reward_max'])
print('episode_reward_min', trial.last_result['episode_reward_min'])
print('episode_reward_mean', trial.last_result['episode_reward_mean'])
print('episode_len_mean', trial.last_result['episode_len_mean'])
print('checkpoint', checkpoint)

SAC Results:
episode_reward_max 9.0
episode_reward_min 0.0
episode_reward_mean 3.6
episode_len_mean 974.0
checkpoint /home/eduardo/ceia/curso-rl-ceia-2021/labs/minicurso_rl/lab03/results/breakout/SAC/SAC_BreakoutNoFrameskip-v4_dd6e1_00000_0_2021-11-13_02-45-10/checkpoint_000750/checkpoint-750
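
To compare the three runs side by side instead of reading separate printouts, the metrics of each `trial.last_result` can be collected into one table. `format_comparison` is a hypothetical helper; below it is fed only the SAC numbers printed above, and in the notebook you would add entries such as `"A3C": a3c_trial.last_result` (variable names assumed).

```python
METRICS = ("episode_reward_max", "episode_reward_min",
           "episode_reward_mean", "episode_len_mean")


def format_comparison(results, metrics=METRICS):
    """Hypothetical helper: one row per algorithm, one column per metric."""
    header = ["algorithm", *metrics]
    rows = [[name, *(f"{result[m]:.2f}" for m in metrics)]
            for name, result in results.items()]
    table = [header, *rows]
    # pad every column to its widest cell
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in table]
    return "\n".join(lines)


# SAC values from the output above
sac_result = {"episode_reward_max": 9.0, "episode_reward_min": 0.0,
              "episode_reward_mean": 3.6, "episode_len_mean": 974.0}
print(format_comparison({"SAC": sac_result}))
```
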


In [87]:
%tensorboard --logdir ../minicurso_rl/lab03/results/breakout

Reusing TensorBoard on port 6006 (pid 10664), started 2:41:32 ago. (Use '!kill 10664' to kill it.)

# Bonus

As a bonus task, experiment with the algorithms you have learned in the `soccer_twos` environment, which will be used in this course's final competition*. To make things easier, use the `team_vs_policy` variation, as in the previous lab.

<img src="https://raw.githubusercontent.com/bryanoliveira/soccer-twos-env/master/images/screenshot.png" height="400">

> Environment visualization

This environment is a 2v2 car soccer game, that is, the goal is to score against the opponent as quickly as possible. In the `team_vs_policy` variation, your agent controls one player of the blue team and plays against a random team. More information about the environment can be found [in the repository](https://github.com/bryanoliveira/soccer-twos-env) and [in the Unity ml-agents documentation](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#soccer-twos).


**Your task is to train an agent with the Ray interface presented above, experimenting with different algorithms and hyperparameters.**


<br>

*The variation used in the competition will be `multiagent_player`, but agents trained for `team_vs_policy` can be easily adapted. In the "Exporting your trained agent" section, the "MyRaySoccerAgent" agent does exactly that.

Use the environment instantiated below to run the training algorithm. By the end of training, your agent's reward per episode should approach +2.

In [None]:
import soccer_twos

# Close the environment in case it was opened previously
try:
    env.close()
except Exception:
    pass

env = soccer_twos.make(
    variation=soccer_twos.EnvType.team_vs_policy,
    flatten_branched=True,  # converts the action_space from MultiDiscrete to Discrete
    single_player=True,  # controls one of the players while the others stand still
    opponent_policy=lambda *_: 0,  # makes the opponents stand still
)

# Get the state and action sizes
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

print("State size: {}, action size: {}".format(state_size, action_size))
env.close()
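
`flatten_branched=True` turns the branched (`MultiDiscrete`) action space into a single `Discrete` one by enumerating every combination of branch values. The sketch below re-implements that idea with `itertools.product`; it is an illustration, not the `gym_unity` code. The three branches of size 3 are an assumption about SoccerTwos (they would give 27 discrete actions); the `action_size` printed by the cell above should confirm or refute it.

```python
import itertools


def build_action_lookup(branch_sizes):
    """Hypothetical helper: map each flat index to one multi-branch action."""
    combos = itertools.product(*(range(n) for n in branch_sizes))
    return {index: list(combo) for index, combo in enumerate(combos)}


lookup = build_action_lookup([3, 3, 3])  # assumed SoccerTwos branch sizes
print(len(lookup))  # 27
print(lookup[0], lookup[26])  # [0, 0, 0] [2, 2, 2]
```
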

In [None]:
from ray import tune

def create_rllib_env(env_config: dict = {}):
    # support multiple instances of the environment on the same machine
    if hasattr(env_config, "worker_index"):
        env_config["worker_id"] = (
            env_config.worker_index * env_config.get("num_envs_per_worker", 1)
            + env_config.vector_index
        )
    return soccer_twos.make(**env_config)

# register the environment with Ray
tune.registry.register_env("Soccer", create_rllib_env)
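
The `worker_id` formula in `create_rllib_env` gives every environment copy its own ID, which is presumably passed on to the underlying Unity environment so that each copy uses a different communication port. The sketch below enumerates the IDs for the configuration used in this section (`num_workers=1`, `NUM_ENVS_PER_WORKER = 2`), assuming RLlib numbers the local worker 0 and remote workers from 1, with `vector_index` running from 0 to `num_envs_per_worker - 1`. `worker_ids` is a hypothetical helper.

```python
def worker_ids(num_workers, num_envs_per_worker):
    """Hypothetical helper: worker_id for every (worker_index, vector_index)."""
    ids = {}
    # worker_index 0 is the local (driver) worker, 1..num_workers are remote
    for worker_index in range(num_workers + 1):
        for vector_index in range(num_envs_per_worker):
            ids[(worker_index, vector_index)] = (
                worker_index * num_envs_per_worker + vector_index
            )
    return ids


ids = worker_ids(num_workers=1, num_envs_per_worker=2)
print(ids)  # {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
```

Because each worker's IDs start at `worker_index * num_envs_per_worker`, the ranges of different workers never overlap.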

Use the configuration below as a starting point for your tests.

The most important part is the `env_config` key, which configures the environment to be compatible with the agent provided for exporting your agent. At this point in the course, you should already be able to test the other variations of the environment and use the Ray APIs to train an agent close to (or better than) the [ceia_baseline_agent](https://drive.google.com/file/d/1WEjr48D7QG9uVy1tf4GJAZTpimHtINzE/view). Examples of how to use the other variations can be found [here](https://github.com/dlb-rl/rl-tournament-starter/). When using those variations, you must also use other agent definitions to handle the different observation and action spaces (these are also included in the examples).

In [None]:
NUM_ENVS_PER_WORKER = 2

analysis = tune.run(
    "PPO",
    config={
        # system settings
        "num_gpus": 1,
        "num_workers": 1,
        "num_envs_per_worker": NUM_ENVS_PER_WORKER,
        "log_level": "INFO",
        "framework": "torch",
        # RL setup
        "env": "Soccer",
        "env_config": {
            "num_envs_per_worker": NUM_ENVS_PER_WORKER,
            "variation": soccer_twos.EnvType.team_vs_policy,
            "single_player": True,
            "flatten_branched": True,
            "opponent_policy": lambda *_: 0,
        },
    },
    stop={
        # 10,000,000 (10M) steps may be needed to learn a useful policy
        "timesteps_total": 10000000,
        # you can also limit training by time, according to Colab's time limit
        "time_total_s": 14400,  # 4h
    },
    checkpoint_freq=100,
    checkpoint_at_end=True,
    local_dir=os.path.join(DRIVE_PATH, "results")
)
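
The `stop` dictionary above combines two criteria, and a trial stops as soon as either one is reached (for these keys, when the reported value is greater than or equal to the limit). A minimal sketch of that rule follows; `should_stop` is a hypothetical helper, not Tune's code, and unlike Tune it simply ignores keys missing from the result.

```python
def should_stop(result, stop):
    """Hypothetical helper: stop when any criterion reaches its limit."""
    return any(
        key in result and result[key] >= limit
        for key, limit in stop.items()
    )


stop = {"timesteps_total": 10_000_000, "time_total_s": 14400}
print(should_stop({"timesteps_total": 2_000_000, "time_total_s": 14500}, stop))  # True
print(should_stop({"timesteps_total": 2_000_000, "time_total_s": 3600}, stop))  # False
```
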

## Exporting your trained agent

As in Lab 02, you can export your trained agent to run it as a competitor in the competition environment or simply to watch it play. To do so, we define an agent class that implements the interface and converts observations/actions to the competition format. Below, we choose which experiment/checkpoint to export and store the implementation in a variable so it can be saved to a file later.

In [None]:
ALGORITHM = "PPO"
TRIAL = analysis.get_best_logdir("episode_reward_mean", "max")
CHECKPOINT = analysis.get_best_checkpoint(
  TRIAL,
  "training_iteration",
  "max",
)
TRIAL, CHECKPOINT

In [None]:
agent_file = f"""
import pickle
import os

import gym
from gym_unity.envs import ActionFlattener
import ray
from ray import tune
from ray.tune.registry import get_trainable_cls

from soccer_twos import AgentInterface, DummyEnv


ALGORITHM = "{ALGORITHM}"
CHECKPOINT_PATH = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 
    "{CHECKPOINT.split("lab03/")[1]}"
)


class MyRaySoccerAgent(AgentInterface):
    def __init__(self, env: gym.Env):
        super().__init__()
        ray.init(ignore_reinit_error=True)

        self.flattener = ActionFlattener(env.action_space.nvec)

        # Load configuration from checkpoint file.
        config_path = ""
        if CHECKPOINT_PATH:
            config_dir = os.path.dirname(CHECKPOINT_PATH)
            config_path = os.path.join(config_dir, "params.pkl")
            # Try parent directory.
            if not os.path.exists(config_path):
                config_path = os.path.join(config_dir, "../params.pkl")

        # Load the config from pickled.
        if os.path.exists(config_path):
            with open(config_path, "rb") as f:
                config = pickle.load(f)
        else:
            # If no config in given checkpoint -> Error.
            raise ValueError(
                "Could not find params.pkl in either the checkpoint dir or "
                "its parent directory!"
            )

        # no need for parallelism on evaluation
        config["num_workers"] = 0
        config["num_gpus"] = 0

        # create a dummy env since it's required but we only care about the policy
        obs_space = env.observation_space
        act_space = self.flattener.action_space
        tune.registry.register_env(
            "DummyEnv",
            lambda *_: DummyEnv(obs_space, act_space),
        )
        config["env"] = "DummyEnv"

        # create the Trainer from config
        cls = get_trainable_cls(ALGORITHM)
        agent = cls(env=config["env"], config=config)
        # load state from checkpoint
        agent.restore(CHECKPOINT_PATH)
        # get default policy for evaluation
        self.policy = agent.get_policy()

    def act(self, observation):
        actions = {{}}
        for player_id in observation:
            # compute_single_action returns a tuple of
            # (action, RNN state outputs, extra info dict);
            # we only need the action, so the other elements are discarded
            actions[player_id] = self.flattener.lookup_action(
                self.policy.compute_single_action(observation[player_id])[0]
            )
        return actions
"""
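
The `CHECKPOINT.split("lab03/")[1]` expression turns the absolute checkpoint path on Google Drive into a path relative to the exported folder, so that `CHECKPOINT_PATH` still resolves after the `.zip` is extracted somewhere else. It relies on the literal substring `lab03/` appearing in the path. A slightly more robust alternative is `os.path.relpath`, sketched below with its POSIX version `posixpath.relpath` so the example behaves the same on any OS. It assumes `DRIVE_PATH` is the `lab03` folder that contains `results/`; all paths shown are hypothetical.

```python
import posixpath

# Hypothetical values; in the notebook these come from earlier cells
DRIVE_PATH = "/content/drive/MyDrive/minicurso_rl/lab03"
CHECKPOINT = DRIVE_PATH + "/results/PPO/PPO_trial_1/checkpoint_300/checkpoint-300"

# Approach used in the notebook: keep everything after "lab03/"
relative_split = CHECKPOINT.split("lab03/")[1]

# Alternative: compute the path relative to DRIVE_PATH explicitly
relative_relpath = posixpath.relpath(CHECKPOINT, DRIVE_PATH)

print(relative_split)    # results/PPO/PPO_trial_1/checkpoint_300/checkpoint-300
print(relative_relpath)  # the same relative path

# Inside agent.py, the relative path is joined with the folder holding the file
agent_dir = "/home/user/agents/my_ray_soccer_agent"  # hypothetical extract location
print(posixpath.join(agent_dir, relative_relpath))
```

If you switch to `relpath`, use it both in `CHECKPOINT_PATH` and in the `shutil.copytree` destination, so the copied trial and the checkpoint path keep pointing to the same place.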

In [None]:
import os
import shutil

agent_name = "my_ray_soccer_agent"
agent_path = os.path.join(DRIVE_PATH, agent_name, agent_name)
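# remove any previous export (ignore_errors avoids a FileNotFoundError on the first run)
shutil.rmtree(agent_path, ignore_errors=True)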
os.makedirs(agent_path)

# save the agent class
with open(os.path.join(agent_path, "agent.py"), "w") as f:
    f.write(agent_file)

# save an __init__.py so the folder becomes a Python package
with open(os.path.join(agent_path, "__init__.py"), "w") as f:
    f.write("from .agent import MyRaySoccerAgent")

# copy the whole trial, including the experiment's configuration files (e.g. params.pkl)
shutil.copytree(TRIAL, os.path.join(agent_path, TRIAL.split("lab03/")[1]))

# pack everything into a .zip file
shutil.make_archive(os.path.join(DRIVE_PATH, agent_name), "zip", os.path.join(DRIVE_PATH, agent_name))
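
Before downloading, it can be worth checking that the archive has the layout needed to import the agent as a package: a top-level `my_ray_soccer_agent/` folder containing `__init__.py` and `agent.py`. The sketch below uses only the standard library (`zipfile`, `tempfile`). `check_agent_zip` is a hypothetical helper, and the demo builds a throwaway archive with the same `shutil.make_archive` call pattern instead of touching the real one.

```python
import os
import shutil
import tempfile
import zipfile
from typing import List


def check_agent_zip(zip_path: str, agent_name: str) -> List[str]:
    """Return the required package files missing from the archive."""
    required = [f"{agent_name}/__init__.py", f"{agent_name}/agent.py"]
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [name for name in required if name not in names]


# Demo with a throwaway folder that mimics the layout created above
agent_name = "my_ray_soccer_agent"
with tempfile.TemporaryDirectory() as tmp:
    agent_path = os.path.join(tmp, agent_name, agent_name)
    os.makedirs(agent_path)
    for filename in ("__init__.py", "agent.py"):
        with open(os.path.join(agent_path, filename), "w") as f:
            f.write("")
    # same pattern as the notebook: base name and root dir share the name
    zip_path = shutil.make_archive(
        os.path.join(tmp, agent_name), "zip", os.path.join(tmp, agent_name)
    )
    missing = check_agent_zip(zip_path, agent_name)

print(missing)  # [] -> both files are present
```

In the notebook itself, `check_agent_zip(os.path.join(DRIVE_PATH, agent_name) + ".zip", agent_name)` should return an empty list.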

After packaging all the files your agent needs to run, a file `minicurso_rl/lab03/my_ray_soccer_agent.zip` will be created in the Colab files and in the corresponding folder on Google Drive. Download it and extract it to a folder on your computer.

Assuming your Python environment is already set up (e.g. the packages in [requirements.txt](https://github.com/dlb-rl/rl-tournament-starter/blob/main/requirements.txt) are installed), run `python -m soccer_twos.watch -m my_ray_soccer_agent` from the folder where you extracted the file to watch your agent play against itself.

You can also watch two different agents play against each other with the following command: `python -m soccer_twos.watch -m1 my_ray_soccer_agent -m2 ceia_baseline_agent`. You can download the *ceia_baseline_agent* agent [here](https://drive.google.com/file/d/1WEjr48D7QG9uVy1tf4GJAZTpimHtINzE/view).