Problem running OpenAI Gym environments with image observations #22
Comments
Hi @singulaire, sorry for this bug. Indeed, we only tested DDPG with flattened observations. This should be fixed by flattening the observations and actions before adding them to the pool. For example, this line can be replaced with:

```python
pool.add_sample(
    self.env.observation_space.flatten(observation),
    self.env.action_space.flatten(action),
    reward * self.scale_reward,
    terminal
)
```

Could you see if this works? If so, I'm happy to accept a pull request or fix it on my end.
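The effect of the `flatten` calls above can be sketched in plain NumPy. The `Box` class here is a hypothetical stand-in for rllab's space class, not its actual implementation; it only illustrates collapsing an image observation into the 1-D vector the replay pool expects:

```python
import numpy as np

# Hypothetical stand-in for a Box observation space (illustration only).
class Box:
    def __init__(self, shape):
        self.shape = shape
        self.flat_dim = int(np.prod(shape))  # total number of scalar entries

    def flatten(self, x):
        # Collapse an array observation to a 1-D vector of length flat_dim.
        return np.asarray(x).reshape(-1)

obs_space = Box((96, 96, 3))            # CarRacing-v0 image shape
observation = np.zeros(obs_space.shape)
flat = obs_space.flatten(observation)
assert flat.shape == (obs_space.flat_dim,)   # (27648,)
```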
I was able to get the example to work by using the fix plus a few other flatten operations and will make a pull request presently. That said, while this allows the code to execute without breaking, flattening means we aren't taking advantage of structure in image data. What do you think? I will gladly look into it if you believe this is a good addition to the project.
Thanks, and glad it worked! I agree that the network-based classes should be more flexible. What further changes are required for the GaussianMLPPolicy? Or did you mean the DeterministicMLPPolicy class? Feel free to also include the necessary changes in the pull request!
I meant the DeterministicMLPPolicy, but the ContinuousMLPQFunction class could also benefit from the same treatment. It may also be desirable to support a mix of CNNs which accept image input and fully connected networks which take flattened inputs, but that would require more complex logic (e.g. a "FLATTEN" flag for each object in charge of flattening). Finally, I ran into some additional problems with non-flattened input which I couldn't solve easily, probably due to insufficient familiarity with Theano. All in all, I included just the flattening logic in #23, so that DDPG works with image observations, although it doesn't take advantage of structure in image data.
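The "FLATTEN flag" idea can be sketched as a small input-preparation step that either keeps the flat vectors (MLP path) or reshapes them back into images (CNN path). The function name and signature below are illustrative, not part of rllab:

```python
import numpy as np

# Hypothetical sketch of the "flatten flag" idea: depending on the network
# type, a batch of flat observations is either passed through unchanged
# (MLP) or reshaped back to its original image dimensions (CNN).
def prepare_input(flat_obs, obs_shape, flatten=True):
    batch = np.asarray(flat_obs)
    if flatten:
        return batch                          # MLP path: (N, flat_dim)
    return batch.reshape((-1,) + obs_shape)   # CNN path: (N, 96, 96, 3)

batch = np.zeros((8, 96 * 96 * 3))            # 8 flattened CarRacing frames
imgs = prepare_input(batch, (96, 96, 3), flatten=False)
assert imgs.shape == (8, 96, 96, 3)
```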
Closing this since #23 is merged. Thanks!
When trying to run experiments with an OpenAI Gym environment that returns image observations (e.g. CarRacing-v0), using the same configuration as the ddpg example and only changing the environment, execution breaks in a number of places in the policy, algorithm, or Q-function networks. The code seems to assume observations are a flattened vector, whereas the CarRacing environment returns image arrays of shape (96, 96, 3).
This does not seem to happen with e.g. the trpo_gym example, which builds its network using the MLP class.
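The shape mismatch described above can be reproduced with a minimal NumPy sketch (the preallocated pool and its dimensions here are illustrative, not rllab's actual replay-pool code):

```python
import numpy as np

# Code written for flat observations stores them in a buffer of 1-D vectors,
# but CarRacing-v0 hands back (96, 96, 3) image arrays.
flat_dim = 96 * 96 * 3
replay_obs = np.zeros((1000, flat_dim))    # pool preallocated for flat vectors

image_obs = np.zeros((96, 96, 3))          # what the env actually returns
try:
    replay_obs[0] = image_obs              # shape mismatch -> ValueError
except ValueError:
    replay_obs[0] = image_obs.reshape(-1)  # flattening makes the insert work
```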