
[rllib] Compute actions with AlphaZero algorithm #13177

Closed
DoxakisCh opened this issue Jan 4, 2021 · 7 comments
Labels
bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical), rllib (RLlib related issues), windows

Comments

@DoxakisCh

Hi,

After training an AlphaZero trainer in an environment, I tried to load it and evaluate it, but when I use the compute_action command to compute an action from the current observation, I get the following error:

Traceback (most recent call last):
  File "C:/#######################/rllib/AlphaZero_Trainer.py", line 99, in <module>
    action = alphazero_trainer.compute_action(observation=obs)
  File "C:#######################\ray\rllib\agents\trainer.py", line 830, in compute_action
    timestep=self.global_vars["timestep"])
  File "C:#######################\ray\rllib\policy\policy.py", line 194, in compute_single_action
    timestep=timestep)
  File "C:#######################\ray\rllib\contrib\alpha_zero\core\alpha_zero_policy.py", line 50, in compute_actions
    for i, episode in enumerate(episodes):
TypeError: 'NoneType' object is not iterable

I used the same command with PPO, IMPALA, and A2C trainers and it worked fine. Am I missing anything?

Thanks in advance!
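
For reference, the restore-and-evaluate flow described above looks roughly like the sketch below. This is a minimal illustration; the CartPole contrib env, the config, and the checkpoint path are placeholders I am assuming, not the reporter's actual script:

import ray
from ray.rllib.contrib.alpha_zero.core.alpha_zero_trainer import AlphaZeroTrainer
from ray.rllib.contrib.alpha_zero.environments.cartpole import CartPole
from ray.rllib.contrib.alpha_zero.models.custom_torch_models import DenseModel
from ray.rllib.models.catalog import ModelCatalog
from ray.tune.registry import register_env

ray.init()
ModelCatalog.register_custom_model("dense_model", DenseModel)
register_env("CartPoleEnv", lambda _: CartPole())

# Rebuild the trainer with the same config used for training, then restore it.
config = {"num_workers": 0, "model": {"custom_model": "dense_model"}}
alphazero_trainer = AlphaZeroTrainer(config=config, env="CartPoleEnv")
alphazero_trainer.restore("/path/to/checkpoint")  # placeholder checkpoint path

env = CartPole()
obs = env.reset()
# This is the call that fails: Trainer.compute_action() passes episodes=None
# down to AlphaZeroPolicy.compute_actions(), which then tries to iterate over it.
action = alphazero_trainer.compute_action(observation=obs)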

DoxakisCh added the bug and triage labels on Jan 4, 2021
@richardliaw
Contributor

cc @sven1977

richardliaw added the P2 and windows labels and removed the triage label on Jan 14, 2021
@lairning

Same issue

Code Snippet

import argparse

import ray
from ray.tune.registry import register_env
from ray.rllib.contrib.alpha_zero.models.custom_torch_models import DenseModel
from ray.rllib.contrib.alpha_zero.environments.cartpole import CartPole
from ray.rllib.contrib.alpha_zero.core.alpha_zero_trainer import AlphaZeroTrainer
from ray.rllib.models.catalog import ModelCatalog

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--training-iteration", default=2, type=int)
    args = parser.parse_args()
    ray.init()

    ModelCatalog.register_custom_model("dense_model", DenseModel)
    register_env("CartPoleEnv", lambda _: CartPole())

    config = {
        "num_workers"       : 0,
        "rollout_fragment_length": 50,
        "train_batch_size"  : 500,
        "sgd_minibatch_size": 64,
        "lr"                : 1e-4,
        "num_sgd_iter"      : 1,
        "mcts_config"       : {
            "puct_coefficient"   : 1.5,
            "num_simulations"    : 100,
            "temperature"        : 1.0,
            "dirichlet_epsilon"  : 0.20,
            "dirichlet_noise"    : 0.03,
            "argmax_tree_policy" : False,
            "add_dirichlet_noise": True,
        },
        "ranked_rewards"    : {
            "enable": True,
        },
        "model"             : {
            "custom_model": "dense_model",
        },
    }

    agent = AlphaZeroTrainer(config=config, env="CartPoleEnv")

    for _ in range(args.training_iteration):
        agent.train()

    env = CartPole()
    episode_reward = 0
    done = False
    obs = env.reset
    while not done:
        print(obs)
        action = agent.compute_action(obs)
        obs, episode_reward, done, info = env.step(action)

    print(episode_reward)

    ray.shutdown()
  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Error Message

Traceback (most recent call last):
  File "alpha0_err1.py", line 58, in <module>
    action = agent.compute_action(obs)
  File "/home/md/miniconda3/envs/simpy/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 819, in compute_action
    policy_id].transform(observation)
  File "/home/md/miniconda3/envs/simpy/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 240, in transform
    self.write(observation, array, 0)
  File "/home/md/miniconda3/envs/simpy/lib/python3.7/site-packages/ray/rllib/models/preprocessors.py", line 247, in write
    observation = OrderedDict(sorted(observation.items()))
AttributeError: 'function' object has no attribute 'items'

@KiborgBBK

Good day.
I would like to know if there is an answer on this topic yet. I'm also struggling with this (action = agent.compute_action(obs)) and then get the same kind of error: ValueError: ('Observation ({}) outside given space ({})!').
Thanks in advance for the answer.

@andras-kth

@lairning
Have you tried changing obs = env.reset to obs = env.reset()?
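
For clarity, the tail of the snippet above would then read roughly as follows (accumulating the per-step reward is my addition, shown only for illustration):

env = CartPole()
episode_reward = 0
done = False
obs = env.reset()  # note the parentheses: call reset(); `env.reset` alone is the bound method
while not done:
    action = agent.compute_action(obs)
    obs, reward, done, info = env.step(action)
    episode_reward += reward  # sum the per-step rewards instead of overwriting the total

print(episode_reward)

That removes the AttributeError from the preprocessor ("'function' object has no attribute 'items'"); whether compute_action then works for AlphaZero is a separate question, see the later comment about the missing episodes/on_episode_start handling.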

@lairning

@mehes-kth
Thank you for spotting the bug!

@andras-kth

andras-kth commented Sep 19, 2021


I think I found the problem. The AlphaZeroTrainer has a callback that overrides on_episode_start, which is likely not called from the code path that fails. BTW, I've been trying to use rllib.rollout.rollout, and that fails in exactly the same way, since it doesn't call new_episode (which in turn should invoke the callback) either... That's on 1.6.0. I see that on the master branch rollout has been updated to use .evaluate, but upon closer inspection that's largely a simple rename with no change to the underlying code, so it will fail the same way too.
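
Schematically, the failure reduces to the pattern below (a toy reduction, not the actual RLlib source): the single-observation code path never builds an episode, so there is nothing for the policy to iterate over and, as far as I can tell, nowhere for the env state that on_episode_start normally stashes on episode.user_data.

# Toy reduction of the call chain: Trainer.compute_action() forwards
# episodes=None, and the AlphaZero policy iterates over it.
def compute_actions(obs_batch, episodes=None):
    for i, episode in enumerate(episodes):  # episodes is None here -> TypeError
        pass

compute_actions([0])  # TypeError: 'NoneType' object is not iterable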

What does work (mostly) is to use the much more "correct" and flexible evaluation option in tune.run. Doing so does take a bit of fiddling with the configuration to effectively disable training (and currently that only works with the "deprecated" simple_optimizer), but here's what I cooked up, in case anyone finds it useful:

from ray import rllib, tune
from ray.tune.utils.trainable import TrainableUtil

agent_type = 'contrib/AlphaZero'
checkpoint_dir = ...
config = ...

# evaluation ONLY: avoid MultiGPU optimizer, set all relevant sizes to 0
config.update(
    simple_optimizer=True,
    num_workers=0,
    train_batch_size=0,
    rollout_fragment_length=0,
    timesteps_per_iteration=0,
    evaluation_interval=1,
    # evaluation_num_workers=...,
    # evaluation_config=dict(explore=False),
    # evaluation_num_episodes=...,
)

agent = rllib.agents.registry.get_trainer_class(agent_type)(config=config)
# may need adjustment depending on checkpoint frequency
checkpoint_path = TrainableUtil.get_checkpoints_paths(checkpoint_dir).chkpt_path[0]
agent.restore(checkpoint_path)

results = tune.run(
    agent,
    config=config,
    ...
)

I think tune.run should provide a simpler way, say a train=False boolean, to disable training.
That should also work for the MultiGPU version, which currently breaks with train_batch_size=0.
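
Until then, one way to keep the run from looping indefinitely (an assumption on my part, using tune.run's standard stop criteria rather than anything AlphaZero-specific) is:

results = tune.run(
    agent,
    config=config,
    # one (effectively empty) "training" iteration with the zeroed-out sizes
    # above, after which the evaluation phase runs and the trial stops
    stop={"training_iteration": 1},
)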

@andras-kth

See #14477 (specifically, #14477 (comment)) for another approach...

richardliaw added the rllib label on Oct 5, 2021
avnishn closed this as completed on Apr 8, 2023