[Question] What is the recommended way to return multiple images as an observation? #156

INFCode · 2022-11-23T13:17:04Z

Question

I am writing my first custom environment and have run into some trouble. The environment needs to return 2 images as its observation. Since they logically represent different things, it is not reasonable to concatenate them side by side.

My current solution is to use a Box space that has shape (96,96,3,2) and returns stacked images. However, the env_checker gives a warning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector).

I have come up with several solutions, but I am not sure which is the recommended one:

1. Just set disable_env_checker to True when registering the environment using gym.register. I do not think this is actually a solution. Also, many of the other checks provided by the env_checker are still helpful and should not be disabled because of this problem.
2. Stack the 2 images along the channel dimension, i.e. change the output shape to (96,96,6). Since the current env_checker does not check the number of channels, this should pass the check. But it seems really weird to me to have an 'image' that has 6 channels, where the first 3 channels are independent of the last 3.
3. Use one of the composite spaces such as Dict or Tuple. This is a nice choice for writing an environment, but I have to first wrap the images into a Dict or something else in the environment and then transform them back into a tensor of shape (96,96,3,2) inside my model, which seems kind of redundant.

Are there any suggestions on which solution is recommended or a better choice, or are there any other ways that can solve this?

pseudo-rnd-thoughts · 2022-11-23T14:53:18Z

Thanks for the question. I think you are generally correct in what you are saying.
Personally, I would use the Dict or Tuple space, particularly if you are using two different neural networks for the images. Otherwise, I would use (2, 96, 96, 3) as the shape of your Box space, where the leading axis of size 2 can be treated like a batch dimension.

INFCode · 2022-11-24T00:20:04Z

Thank you. I guess I will try to use the Dict space. It seems that the Tuple observation space is not widely supported by RL libraries (e.g. see this and the first Note box of this).

INFCode added the "question" label (Further information is requested) on Nov 23, 2022

INFCode closed this as completed Nov 24, 2022
