
[rllib] _get_torch_exploration_action doesn't support tuple action dist #10228

Closed
1 of 2 tasks
ThomasLecat opened this issue Aug 20, 2020 · 2 comments · Fixed by #10443
Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical)

Comments

ThomasLecat (Contributor) commented Aug 20, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.15.4
  • Ray installed from (source or binary): binary (via pip)
  • Ray version: 0.8.6 (nothing relevant seems to have changed on master)
  • Python version: 3.7

What is the problem?

When using tuple action distributions (as advised in #6372) with exploration disabled, the line:

logp = torch.zeros((action.size()[0], ), dtype=torch.float32)

from _get_torch_exploration_action raises the following exception:

AttributeError: 'tuple' object has no attribute 'size'
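The failure can be reproduced in isolation: a sample drawn from a Tuple action space arrives as a plain Python tuple, not a torch.Tensor, so it has no .size() method (the values below are illustrative):

```python
# A sample from Tuple([Discrete(2), Discrete(4)]) is a plain Python
# tuple of per-component actions, not a torch.Tensor.
action = (1, 3)  # illustrative sample; batch dimension omitted

try:
    action.size()  # what the torch.zeros((action.size()[0],), ...) call ends up doing
except AttributeError as exc:
    print(exc)  # 'tuple' object has no attribute 'size'
```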

A simple fix that supports any type of distribution would be:

logp = torch.zeros_like(action_dist.sampled_action_logp())

I can submit a PR if it helps.

Reproduction (REQUIRED)

Exact command to reproduce: python rllib_cartpole.py, with the script below saved as rllib_cartpole.py:

import gym.envs.classic_control
from gym.spaces import Tuple, Discrete

import ray
from ray import tune


class CustomCartpole(gym.envs.classic_control.CartPoleEnv):
    """Add a dimension to the cartpole action space that is ignored."""

    def __init__(self, env_config):
        super().__init__()
        # if override_actions is false this is just the Cartpole environment
        self.override_actions = env_config['override_actions']
        if self.override_actions:
            # 2 is the environment's normal action space
            # 4 is just a dummy number to give it an extra dimension
            self.original_action_space = self.action_space
            self.action_space = Tuple([Discrete(2), Discrete(4)])
            self.tuple_action_space = self.action_space

    def step(self, action):
        # call the cartpole environment with the original action
        if self.override_actions:
            self.action_space = self.original_action_space
            return super().step(action[0])
        else:
            return super().step(action)


def main():
    ray.init()
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 50},
        config={
            "env": CustomCartpole,
            "env_config": {'override_actions': True},
            "num_gpus": 0,
            "num_workers": 1,
            "eager": False,
            "evaluation_interval": 1,
            "evaluation_config": {
                "explore": False,
            },
            "framework": "torch",
        },
    )


if __name__ == '__main__':
    main()
  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@ThomasLecat ThomasLecat added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Aug 20, 2020
@ericl ericl added rllib P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Aug 20, 2020
ericl (Contributor) commented Aug 20, 2020

The proposed fix makes sense to me. We could alternatively try to get the batch dimension of the tuple, but I don't see an existing helper method for that, so your proposal is probably simpler.

And yeah, a PR would be great!
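For reference, the alternative mentioned above (reading the batch dimension off the tuple itself) could look roughly like the sketch below. first_leaf is a hypothetical helper, not an existing RLlib utility, and plain Python lists stand in for torch tensors:

```python
def first_leaf(action):
    """Descend into nested tuples until a non-tuple leaf is reached."""
    while isinstance(action, tuple):
        action = action[0]
    return action

# A batch of 3 samples from Tuple([Discrete(2), Discrete(4)]):
# each component holds a length-3 batch (lists standing in for tensors).
tuple_action = ([1, 0, 1], [3, 2, 0])
print(len(first_leaf(tuple_action)))  # 3 -- the batch dimension
```

The zeros_like proposal avoids this traversal entirely, which is why it is the simpler route.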

ThomasLecat (Contributor, Author) commented
Thanks for your answer! I just got back from holidays and have opened PR #10443.
