
Problem running OpenAI Gym environments with image observations #22

Closed
singulaire opened this issue Jun 24, 2016 · 5 comments

@singulaire
Contributor

singulaire commented Jun 24, 2016

When trying to run experiments with an OpenAI Gym environment that returns image observations (e.g. CarRacing-v0), using the same configuration as the ddpg example and only changing the environment, there are a number of places where execution will break in either the policy, algorithm, or q function networks. The code seems to assume observations will be a flattened vector, whereas the CarRacing environment returns image arrays of the shape (96, 96, 3).

This does not seem to happen with e.g. the trpo_gym example, which builds its network using the MLP class.

@dementrock
Member

Hi @singulaire, sorry for this bug. Indeed we only tested ddpg with flattened observations. This should be fixed by flattening the observations and actions before adding them to the pool. For example, this line can be replaced with

pool.add_sample(
    self.env.observation_space.flatten(observation),
    self.env.action_space.flatten(action),
    reward * self.scale_reward,
    terminal
)

Could you see if this works? If so, I'm happy to accept a pull request or fix it on my end.
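To make concrete what the suggested fix does (an illustrative numpy sketch, not rllab's actual flatten implementation): a CarRacing-v0 image observation of shape (96, 96, 3) becomes a 1-D vector of length 27648, which is the layout the replay pool and the fully connected networks expect.

```python
import numpy as np

# Stand-in for an image observation as CarRacing-v0 returns it: (96, 96, 3)
observation = np.zeros((96, 96, 3), dtype=np.uint8)

# Flattening collapses the image into a single vector, which is
# effectively what observation_space.flatten does for a Box space
flat = observation.reshape(-1)

print(flat.shape)  # (27648,) == 96 * 96 * 3
```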

@singulaire
Contributor Author

I was able to get the example to work by using the fix plus a few other flatten operations and will make a pull request presently. That said, while this allows the code to execute without breaking, flattening means we aren't taking advantage of structure in image data.
I think a longer-term solution would be to change the various network-based classes so that they accept a custom network as an argument, as was done in bc1b506. The changes needed for GaussianMLPPolicy are fairly small. With that capability, it would be easy to build CNNs with the ConvNetwork class, which should be well suited to image data.

What do you think? I will gladly look into it if you believe this is a good addition to the project.
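The pattern being proposed might look like the following minimal sketch (all class and parameter names here are hypothetical stand-ins, not rllab's actual API): the policy constructor takes an optional pre-built network and only constructs its default MLP when none is supplied.

```python
class MLPNetwork:
    """Hypothetical stand-in for rllab's default fully connected network."""
    def __init__(self, input_dim, hidden_sizes=(32, 32)):
        self.input_dim = input_dim
        self.hidden_sizes = hidden_sizes


class ConvNetworkStub:
    """Hypothetical stand-in for rllab's ConvNetwork class."""
    def __init__(self, input_shape):
        self.input_shape = input_shape


class Policy:
    """Sketch of a policy that accepts a custom network argument."""
    def __init__(self, obs_dim, network=None):
        # Fall back to the default MLP only when no network is given
        self._network = network or MLPNetwork(obs_dim)


# Default behavior: flattened observations feed an MLP
default_policy = Policy(obs_dim=96 * 96 * 3)

# Proposed behavior: pass a CNN that consumes the raw image shape
cnn_policy = Policy(obs_dim=None, network=ConvNetworkStub((96, 96, 3)))
```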

@dementrock
Member

Thanks, and glad it worked! I agree that the network-based classes should be more flexible. What further changes are required for the GaussianMLPPolicy? Or did you mean the DeterministicMLPPolicy class? Feel free to also include the necessary changes in the pull request!

@singulaire
Contributor Author

I meant the DeterministicMLPPolicy, but the ContinuousMLPQFunction class could benefit from the same treatment. It may or may not be desirable to support a mix of CNNs that accept image input and fully connected networks that take flattened inputs, but that would require more complex logic (e.g. a "FLATTEN" flag on each object responsible for flattening). Finally, I ran into some additional problems with non-flattened input that I couldn't solve easily, probably due to insufficient familiarity with Theano.

All in all, I included just the flattening logic in #23, so that DDPG works with image observations, although it doesn't take advantage of structure in image data.
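The "FLATTEN" flag idea mentioned above could be sketched like this (a hypothetical illustration, not part of #23): each consumer declares whether it wants raw image-shaped observations or flat vectors, and a single helper applies the right transformation.

```python
import numpy as np

def prepare_observation(obs, flatten):
    """Hypothetical helper: flatten the observation only when the
    consuming network expects a 1-D vector (e.g. an MLP rather than a CNN)."""
    return obs.reshape(-1) if flatten else obs

obs = np.zeros((96, 96, 3))
mlp_input = prepare_observation(obs, flatten=True)   # for MLP-based classes
cnn_input = prepare_observation(obs, flatten=False)  # for ConvNetwork-based classes
```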

@dementrock
Member

Closing this since #23 is merged. Thanks!
