[Question] What is the recommended way to return multiple images as an observation? #156

Closed
INFCode opened this issue Nov 23, 2022 · 2 comments
Labels
question Further information is requested

Comments

@INFCode

INFCode commented Nov 23, 2022

Question

I am writing my first custom environment and have run into some trouble. The environment needs to return 2 images as its observation. Since they logically represent different things, it does not make sense to concatenate them side by side.

My current solution is to use a Box space that has shape (96,96,3,2) and returns stacked images. However, the env_checker gives a warning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector).

I have come up with several possible solutions, but I am not sure which one is recommended:

  1. Set disable_env_checker to True when registering the environment with gym.register. I do not think this is really a solution. The other checks performed by the env_checker are still helpful and should not be disabled because of this one warning.
  2. Stack the 2 images along the channel dimension, i.e. change the output shape to (96,96,6). Since the current env_checker does not check the number of channels, this should pass the check. However, it seems weird to me to have an 'image' with 6 channels, where the first 3 channels are independent of the last 3.
  3. Use one of the composite spaces such as Dict or Tuple. This is a nice choice for writing an environment, but I would first have to wrap the images into a Dict (or similar) in the environment and then transform them back into a tensor of shape (96,96,3,2) inside my model, which seems somewhat redundant.
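
For option 3, the conversion back to a single array can be quite short. The following is a minimal sketch, assuming the observation arrives as a plain dictionary of NumPy arrays. The key names `first_image` and `second_image` and the helper functions are hypothetical and do not come from the issue, and NumPy stands in for whatever tensor library the model uses.

```python
import numpy as np

IMAGE_SHAPE = (96, 96, 3)  # height, width, channels, as in the question

# Hypothetical observation: the key names are placeholders for illustration.
observation = {
    "first_image": np.zeros(IMAGE_SHAPE, dtype=np.uint8),
    "second_image": np.full(IMAGE_SHAPE, 255, dtype=np.uint8),
}


def to_stacked(obs):
    """Stack both images along a new trailing axis, giving (96, 96, 3, 2)."""
    return np.stack([obs["first_image"], obs["second_image"]], axis=-1)


def to_channels(obs):
    """Concatenate along the channel axis, giving (96, 96, 6) as in option 2."""
    return np.concatenate([obs["first_image"], obs["second_image"]], axis=-1)


stacked = to_stacked(observation)
channels = to_channels(observation)
print(stacked.shape)   # (96, 96, 3, 2)
print(channels.shape)  # (96, 96, 6)
```

Either layout is one call away from the dictionary form, so the environment can keep the two images separate and leave the choice of layout to the model.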

Which of these solutions is recommended, or is there a better way to solve this?

@INFCode INFCode added the question Further information is requested label Nov 23, 2022
@pseudo-rnd-thoughts
Member

Thanks for the question. I think you are generally correct in what you are saying.
Personally, I would use the Dict or Tuple space, particularly if you are using two different neural networks for the images. Otherwise, I would use (2, 96, 96, 3) as your Box space, where the leading dimension of 2 can be treated like a batch dimension.
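
The (2, 96, 96, 3) layout suggested above can be sketched as follows, assuming NumPy arrays. The variable names are illustrative only. With the image index on the leading axis, each entry stays a conventional (96, 96, 3) image.

```python
import numpy as np

first_image = np.zeros((96, 96, 3), dtype=np.uint8)
second_image = np.full((96, 96, 3), 255, dtype=np.uint8)

# The leading axis indexes the image, like a batch of size 2.
pair = np.stack([first_image, second_image], axis=0)
print(pair.shape)  # (2, 96, 96, 3)

# An array already in the original (96, 96, 3, 2) layout can be converted
# by moving its last axis to the front.
trailing = np.stack([first_image, second_image], axis=-1)
converted = np.moveaxis(trailing, -1, 0)
print(converted.shape)  # (2, 96, 96, 3)
```

Indexing `pair[0]` or `pair[1]` then recovers each image unchanged.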

@INFCode
Author

INFCode commented Nov 24, 2022

Thank you. I guess I will try to use the Dict space. It seems that the Tuple observation space is not widely supported by RL libraries (e.g. see this and the first Note box of this).
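
A Dict observation space for the two images might be declared as in the sketch below. It assumes the `spaces.Box` and `spaces.Dict` classes from Gymnasium, with a fallback import for older Gym installs. The key names, the helper functions, and the `uint8` pixel range of 0 to 255 are assumptions, since the issue does not state them.

```python
import numpy as np

try:
    from gymnasium import spaces
except ImportError:  # older installs that still use the gym package
    from gym import spaces

IMAGE_SHAPE = (96, 96, 3)  # height, width, channels, as in the question


def make_image_space():
    """One Box per image; the uint8 range 0..255 is an assumption."""
    return spaces.Box(low=0, high=255, shape=IMAGE_SHAPE, dtype=np.uint8)


def make_observation_space():
    """Dict space with one entry per image; the key names are hypothetical."""
    return spaces.Dict(
        {
            "first_image": make_image_space(),
            "second_image": make_image_space(),
        }
    )


observation_space = make_observation_space()

# An observation returned by reset() or step() would then be a dictionary
# with the same keys, each holding one (96, 96, 3) uint8 array.
observation = {
    "first_image": np.zeros(IMAGE_SHAPE, dtype=np.uint8),
    "second_image": np.full(IMAGE_SHAPE, 255, dtype=np.uint8),
}
print(observation_space.contains(observation))
```

Each entry is an ordinary 3-dimensional image Box, so the two images stay logically separate in the environment.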

@INFCode INFCode closed this as completed Nov 24, 2022