
[Question] Action Space showing floats when dtype=int #3107

Closed
windowshopr opened this issue Oct 1, 2022 · 6 comments


windowshopr commented Oct 1, 2022

Question

This may not be a bug, and I didn't want to submit it as one in case it really isn't, but when I set an action space as a Box with dtype=int or dtype=np.int32, the actions show up as floats the first time the environment is run. Any idea why?

I've provided a minimal reproducible example below:

```python
from random import randint
from numpy import inf, float32, array, int32, int64
import gym
from stable_baselines3 import A2C, DQN, PPO

"""Class of environment"""
class Custom_Environment(gym.Env):

    metadata = {'render.modes': ['human', 'text']}

    """Initialize the environment"""
    def __init__(self):
        super(Custom_Environment, self).__init__()
        # Spaces
        self.action_space = gym.spaces.Box(low=0, high=1000, shape=(37,), dtype=int)
        self.observation_space = gym.spaces.Box(low=0, high=1000, shape=(37,), dtype=int)

    """Reset the Environment"""
    def reset(self):
        self.done = False
        self.current_state = self.observation_space.sample()
        return self.current_state

    """Step Through the Environment"""
    def step(self, action):
        # Inspect the action; note the values are not integers
        for i in range(len(action)):
            print(action[i])
        # Throw an error here to stop the code
        stop()
        step_reward = 0
        return self.current_state, step_reward, self.done, {}

env = Custom_Environment()

model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
```

Output:

```
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
0.0
0.0
0.0
0.6412628293037415
1.6904927492141724
1.3439135551452637
0.0
1.1811418533325195
0.0
1.066895842552185
0.0
0.0
0.0
0.6355175375938416
0.0
0.0
0.1874774545431137
0.4175698757171631
0.0
0.8219074010848999
0.0
0.0
0.0
0.0
0.0
0.0
0.10093195736408234
0.7056602835655212
0.0
0.2542327344417572
0.17636002600193024
0.0
1.3428149223327637
0.25007113814353943
0.0
0.0
0.6961379051208496
```

The code stops at my stop() call, which intentionally doesn't exist, but you can see the floats printed in the output above. Any ideas?

windowshopr (Author) commented

Someone here has asked the same question, but there's no answer

balisujohn (Contributor) commented Oct 2, 2022

This looks like a bug to me, I'll take a look.

Edit: this is a bug in stable-baselines3. The action is produced by an SB3 policy, and that policy is ignoring the fact that the action space is a Box with an integer dtype. To look into this, we would need to know the versions of gym and stable-baselines3 you're using.
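For what it's worth, a quick sanity check (a sketch, assuming gym is installed; no SB3 involved) confirms the space itself samples integers, which points at the policy output rather than at gym:

```python
import numpy as np
import gym

# A Box declared with an integer dtype samples integer values directly,
# so the floats must be introduced by the SB3 policy, not by gym itself.
space = gym.spaces.Box(low=0, high=1000, shape=(37,), dtype=int)
sample = space.sample()
assert np.issubdtype(sample.dtype, np.integer)
print(sample.dtype)
```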

windowshopr (Author) commented

```
gym==0.21.0
stable_baselines3==1.6.1
```

It's worth noting that I am running this in a Google Colab notebook...

windowshopr (Author) commented

I also just tried upgrading gym in the notebook to 0.26.1, same issue.

balisujohn (Contributor) commented Oct 2, 2022

So a Box action space with integer values is equivalent, in some sense, to a MultiDiscrete action space, which stable-baselines3 does currently support. From looking at their code, I don't think stable-baselines3 supports integer-valued Box action spaces, so if you really need integer-valued Box support and a MultiDiscrete space won't work, I'd recommend making a feature request on stable-baselines3.

To be clear, I think you will most likely be able to get the desired behavior with a MultiDiscrete action space.
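As a stopgap, one could also coerce the float actions to integers inside `step()` until integer Box support exists. This is only a sketch with NumPy, and the `to_int_action` helper is hypothetical, not part of gym or SB3:

```python
import numpy as np

def to_int_action(action, low=0, high=1000):
    # Hypothetical helper: round the policy's float action to the nearest
    # integer and clip it into the Box bounds before using it in step().
    return np.clip(np.rint(action), low, high).astype(np.int64)

print(to_int_action(np.array([0.6412628, 1.6904927, 999.7])))
```

Note this changes the effective action distribution slightly (many floats map to the same integer), so a MultiDiscrete space is still the cleaner fix.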

windowshopr (Author) commented

This did fix the issue. Changing the action space to:

```python
self.action_space = gym.spaces.MultiDiscrete([1000 for _ in range(37)], dtype=int)
```

...worked in turning the actions into ints, so I can go ahead with that, but maybe the Box int thing could be figured out in the future. Thanks!
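One detail worth double-checking (a sketch, assuming gym is installed): `MultiDiscrete([1000, ...])` samples integers in {0, ..., 999}, since each entry of `nvec` is an exclusive upper bound, whereas `Box(low=0, high=1000)` has an inclusive `high`. Use `[1001 for _ in range(37)]` if 1000 itself must remain a valid action:

```python
import numpy as np
import gym

# MultiDiscrete's nvec entries are exclusive upper bounds: each component
# is sampled from {0, ..., 999}, one less than Box(low=0, high=1000).
space = gym.spaces.MultiDiscrete([1000 for _ in range(37)])
sample = space.sample()
assert np.issubdtype(sample.dtype, np.integer)
print(sample.max() <= 999)
```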
