Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Add a flag to MultiBinary to allow one-hot encoding #508

Closed
1 task done
fracapuano opened this issue May 17, 2023 · 14 comments
Closed
1 task done

[Proposal] Add a flag to MultiBinary to allow one-hot encoding #508

fracapuano opened this issue May 17, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@fracapuano
Copy link

Proposal

Currently, sampling from spaces.MultiBinary produces an array in which more than one element is 1. When dealing with flatten arrays (that is, arrays of shape (n,)) this might not be problematic.

However, for matrices and tensors it is often the case that one uses MultiBinary to produce a one-hot encoding like representation of either observations or actions. I am proposing we implement this by adding a flag onehot:bool=False to MultiBinary __init__(...) so that when sampling one has that only one bit per row is set to 1.

Motivation

No space allows a representation of one-hot encoded like observations/actions.

Pitch

Bring one-hot encoding to gymnasium.spaces.

Alternatives

I have considered implementing my own custom space but thought this might be interesting for the community too.

Additional context

n, m = 6, 5
observation_space = spaces.MultiBinary([n,m])
action_space = spaces.MultiBinary([n,m])

print(observation_space.sample())
# [
# [1 0 1 1 1]
# [0 1 0 1 0]
# [1 1 0 1 0]
# [1 1 1 0 0]
# [1 0 0 0 0]
# [0 1 0 1 0]
# ]

print(action_space.sample())
# [
# [0 1 0 1 1]
# [1 1 1 0 0]
# [0 0 1 0 0]
# [1 0 1 1 0]
# [0 1 1 1 0]
# [1 1 1 0 1]
# ]

Which clearly is non-desirable when dealing with one-hot encodings as, at an action level, this sample represents choosing 3 options for the choice represented in the first row, for instance.

Checklist

  • I have checked that there is no similar issue in the repo
@fracapuano fracapuano added the enhancement New feature or request label May 17, 2023
@fracapuano fracapuano changed the title [Proposal] Add a flag to MultiBinary so that one bit per row is 1 only [Proposal] Add a flag to MultiBinary to allow one-hot encoding May 17, 2023
@pseudo-rnd-thoughts
Copy link
Member

Hey, I think this is a good idea.
Would you be interested in adding the argument to MultiBinary?

@fracapuano
Copy link
Author

Hey @pseudo-rnd-thoughts , absolutely. How do we proceed from here?

@RedTachyon
Copy link
Member

I'm not sure about this tbh. If we introduce this, we'd also have to adjust the contains method.

The code in both sample and contains would probably have a structure like

def sample(self):
    if self.onehot:
        ...
    else:
        ...

at which point the functionality should probably just be split into two classes.

So then we can consider whether we want to create a new OneHot space. But unless I'm missing something, this conveys the same exact information as a Discrete (or MultiDiscrete if we want a different shape), just represented differently, so it feels redundant.

@fracapuano
Copy link
Author

I am not sure I completely agree that this would convey the same information of MultiDiscrete...
MultiDiscrete uses integer encoding to represent different classes whereas this directly point to one-hot encoding, which just feel way more natural when dealing with lots of problems.

I agree you can always go from one to the other, it is just a matter of how common one practice is with respect to others.

The contains method can be adjusted with ease. As a matter of fact, I have just implemented my own custom space starting from MultiBinary and modifying it. I could post is here maybe, you'll see is nothing too complex.
I agree with the self.onehot switch. I was thinking I could implement the sample and contains methods with:

def sample(self): 
    def sample_onehot(self): 
         # code for sampling with onehot
         return sample
     def sample_simple(self): 
         # old code
         return sample
     if self.onehot: 
         return sample_onehot()
     else: 
         return sample_simple()

The same structure would be used for contains. Maybe I could post here my OneHotSpace once I am done testing it out.

@RedTachyon
Copy link
Member

I am not sure I completely agree that this would convey the same information of MultiDiscrete

I agree you can always go from one to the other, it is just a matter of how common one practice is with respect to others.

These two statements are in contradiction.

To be clear with what I mean, you have a direct correspondence between Discrete(n) and Onehot(n). 3 is the exact same as [0, 0, 0, 1], it's just differently presented. If we add a OneHot space, we do not enable using any new spaces because anything you'd like to express as a OneHot, you can already express as a Discrete.


I was thinking I could implement the sample and contains methods with:

To clarify my other point about the if X: ... else: ... structure, the point was that it's generally just bad practice. Here's one SO thread, you can find a bunch more sources explaining it. The code with two functions defined inside of sample is even more hacky, and we're trying to stay pretty strict on good code quality unless there are really good reasons to compromise on it.


Overall my argument against adding this goes roughly as follows:

  1. If we were to implement this as part of MultiBinary, we'd introduce a code smell via the boolean argument
  2. We don't want to introduce bad new code
  3. If we were to add this feature in some way, a far preferable approach would be just splitting it in two separate classes, i.e. MultiBinary stays as it is, and OneHot is created
  4. If OneHot exists, it exactly duplicates Discrete or MultiDiscrete, so adding it is redundant.

I might be open to a different approach for adding this functionality, but I don't know how important it even is for Gymnasium to handle. Converting between integers and one-hot arrays is fairly trivial. If you want to use one-hot in your code, you can call space.contains(onehot.argmax()) and it's done. Going the other way could be something like np.eye(space.n)[space.sample()]

It miiiiiiiight make sense to have back-and-forth conversion as a method on Discrete, so you'd have e.g.

space = Discrete(5)
space.to_onehot(space.sample())  # [0, 1, 0, 0, 0]
space.from_onehot([0, 1, 0, 0, 0])  # 2

but I'm not sure how useful it really is since DL/RL libraries usually have their own utilities to do that (and it's a one-liner anyways, unless we want to optimize for high dimensionality)

@fracapuano
Copy link
Author

Thanks for the very precise answer. I clearly am a novice to these topics, hence my appreciation for such a detailed answer.

I agree on most of the things you said. I am still convinced that, for good design purposes in custom environments, having a OneHot space already available would make things easier but this only is a personal opinion. By the way, I already implemented such space (mainly by modifying here-and-there MultiBinary), maybe you can have a look if you thing that might interesting.

Ultimately, I am convinced that having a OneHot already available would make custom-env development easier, but I am more than happy to accept your feedback on this

@fracapuano
Copy link
Author

fracapuano commented May 17, 2023


@fracapuano
Copy link
Author

Very similar to MultiBinary ;)

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented May 17, 2023

This conversion reminds me of #102 which pointed out that flatten(Discrete) -> Box means that samples of Box are not one-hotted therefore, unflatten(Box.sample) are not contained in the original Discrete space.
Therefore this might introduce more problems if people expect that flatten_space(Discrete) -> OneHot

To me, this is part of the legacy / technical debt that Gym / Gymnasium has and is difficult to change / fix due to its size and aim of backward compatibility.

@fracapuano
Copy link
Author

This conversion reminds me of #102 which pointed out that flatten(Discrete) -> Box means that samples of Box are not one-hotted therefore, unflatten(Box.sample) are not contained in the original Discrete space.
Therefore this might introduce more problems if people expect that flatten_space(Discrete) -> OneHot

I am not sure I completely understand this... Can you please point out which conversion are you referring to?

So why not adding a OneHot space for all those cases in which it just feel more natural than using Discrete/MultiDiscrete for custom environment design? I already have mine implemented and tested. Let me know if you would be interested in testing it out.

@fracapuano
Copy link
Author

Hey there, any update?

@pseudo-rnd-thoughts
Copy link
Member

Hey,

Read #102 for the whole discussion, in short, Gymnasium allows users to flatten a space which for Discrete is transformed to a Box. The problem is that samples generated by the Box cannot be reverted back to the Discrete values as they are not one-hotted. See the example code in #102. The problem is that we can't solve this and just have to put a warning that flattened samples, i.e., Box.sample(), are not guaranteed to exist within the original space, i.e., Discrete.

One of the proposals in #102 was to introduce a OneHot as replacement to Box in this case but we found there are critical reasons we can't do this.
In addition to the code smell issues pointed out by @RedTachyon, I suspect that introducing the OneHot will cause confusion with the issue related above and given that Discrete is functionally identical to OneHot for users if they want this feature then I don't think we will add a OneHot space. Finally, adding a space into Gymnasium is not as simple as one thinks given we have many spaces.utils and vector.utils functions that need to be added / updated.

Thank you for opening the issue, I am going to close the issue, we can continue the discussion until there is a particularly strong argument that we haven't considered yet then we can reopen

@zig1000
Copy link

zig1000 commented Nov 5, 2023

Hi, I have a fairly simple motivating example and ran across this and #102.

I'm trying to write a new gym env (hopefully written The Right Way from the start), and I'd like to run stable-baselines3 on it.
The env has an action space of type Dict(Discrete, ...).

stable-baseline3 does not support Dict action spaces, so as far as I can tell the Right way to handle this is to wrap the env in a FlattenAction wrapper. However, doing so results in errors as stable-baselines I assume calls sample() on its end, and generates invalid one-hots of e.g. all-zeros.

The advice in #102 to sample before flattening does not seem to apply here as the sampling isn't on my end.

The only fix I can think of would be to do some truly horrible one-hotifying in the FlattenAction wrapper before the invalid values can be passed to unflatten. np.argmax also doesn't seem like it would suffice alone as it front-biases the results in the case of ties. Similarly if I changed my space to use MultiBinary my env-internal code would be a mess of calls to an unbiased-argmax.

But overall the fundamental problem seems to me to be that there does not exist a space that is both 1. flat 2. sample() behaves like a one-hot. Discrete violates 1, MultiBinary violates 2. Without such a space, there doesn't appear to be a 'clean' env setup for both random sampling and actually feeding values to/from a NN.

To me, this would make the addition of the Discrete-lookalike OneHot space worth it. It would also save users from googling for half an hour trying to figure out whether they should go for MultiBinary/Box or Discrete when they know they want a one hot (as I did). Due to this I think in some cases it may actually reduce confusion in spite of the similarity to Discrete.

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Nov 6, 2023

Hi, thanks for your comment.

In short, yes, I agree with your pain. However, I still believe this is not generally possible due to the current implementation of flatten with Discrete which cannot be solved without possibly breaking a large amount of users' code.
I don't know if SB3 supports custom spaces but I suggest making your own custom space for OneHot. An alternative solution is a hack where you create a new class that inherits from Box with a custom sample function that is unflattenable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants