
Problem running OpenAI Gym environments with image observations #22

Closed
singulaire opened this issue Jun 24, 2016 · 5 comments

@singulaire
Contributor

singulaire commented Jun 24, 2016

When trying to run experiments with an OpenAI Gym environment that returns image observations (e.g. CarRacing-v0), using the same configuration as the ddpg example and only changing the environment, there are a number of places where execution will break in either the policy, algorithm, or q function networks. The code seems to assume observations will be a flattened vector, whereas the CarRacing environment returns image arrays of the shape (96, 96, 3).

This does not seem to happen with e.g. the trpo_gym example, which builds its network using the MLP class.

@dementrock
Member

Hi @singulaire, sorry for this bug. Indeed we only tested ddpg with flattened observations. This should be fixed by flattening the observations and actions before adding them to the pool. For example, this line can be replaced with

pool.add_sample(
    self.env.observation_space.flatten(observation),
    self.env.action_space.flatten(action),
    reward * self.scale_reward,
    terminal
)

Could you see if this works? If so, I'm happy to accept a pull request or fix it on my end.
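To make concrete what the suggested fix does (an illustrative numpy sketch, not rllab's actual flatten implementation): a CarRacing-v0 image observation of shape (96, 96, 3) becomes a 1-D vector of length 27648, which is the layout the replay pool and the fully connected networks expect.

```python
import numpy as np

# Stand-in for an image observation as CarRacing-v0 returns it: (96, 96, 3)
observation = np.zeros((96, 96, 3), dtype=np.uint8)

# Flattening collapses the image into a single vector, which is
# effectively what observation_space.flatten does for a Box space
flat = observation.reshape(-1)

print(flat.shape)  # (27648,) == 96 * 96 * 3
```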

@singulaire
Contributor Author

I was able to get the example to work by using the fix plus a few other flatten operations and will make a pull request presently. That said, while this allows the code to execute without breaking, flattening means we aren't taking advantage of structure in image data.
I think a longer-term solution would be to change the various network-based classes so that they accept a custom network as an argument, as was done in bc1b506. The changes needed for GaussianMLPPolicy are fairly small. With that capability, it would be easy to build CNNs with the ConvNetwork class, which should be well suited to image data.

What do you think? I will gladly look into it if you believe this is a good addition to the project.
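The pattern being proposed might look like the following minimal sketch (all class and parameter names here are hypothetical stand-ins, not rllab's actual API): the policy constructor takes an optional pre-built network and only constructs its default MLP when none is supplied.

```python
class MLPNetwork:
    """Hypothetical stand-in for rllab's default fully connected network."""
    def __init__(self, input_dim, hidden_sizes=(32, 32)):
        self.input_dim = input_dim
        self.hidden_sizes = hidden_sizes


class ConvNetworkStub:
    """Hypothetical stand-in for rllab's ConvNetwork class."""
    def __init__(self, input_shape):
        self.input_shape = input_shape


class Policy:
    """Sketch of a policy that accepts a custom network argument."""
    def __init__(self, obs_dim, network=None):
        # Fall back to the default MLP only when no network is given
        self._network = network or MLPNetwork(obs_dim)


# Default behavior: flattened observations feed an MLP
default_policy = Policy(obs_dim=96 * 96 * 3)

# Proposed behavior: pass a CNN that consumes the raw image shape
cnn_policy = Policy(obs_dim=None, network=ConvNetworkStub((96, 96, 3)))
```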

@dementrock
Member

Thanks, and glad it worked! I agree that the network-based classes should be more flexible. What further changes are required for the GaussianMLPPolicy? Or did you mean the DeterministicMLPPolicy class? Feel free to also include the necessary changes in the pull request!

@singulaire
Contributor Author

I meant the DeterministicMLPPolicy, but the ContinuousMLPQFunction class could benefit from the same treatment. It may or may not be desirable to support a mix of CNNs that accept image input and fully connected networks that take flattened inputs, but that would require more complex logic (e.g. a "FLATTEN" flag on each object responsible for flattening). Finally, I ran into some additional problems with non-flattened input that I couldn't solve easily, probably due to insufficient familiarity with Theano.

All in all, I included just the flattening logic in #23, so that DDPG works with image observations, although it doesn't take advantage of structure in image data.
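The "FLATTEN" flag idea mentioned above could be sketched like this (a hypothetical illustration, not part of #23): each consumer declares whether it wants raw image-shaped observations or flat vectors, and a single helper applies the right transformation.

```python
import numpy as np

def prepare_observation(obs, flatten):
    """Hypothetical helper: flatten the observation only when the
    consuming network expects a 1-D vector (e.g. an MLP rather than a CNN)."""
    return obs.reshape(-1) if flatten else obs

obs = np.zeros((96, 96, 3))
mlp_input = prepare_observation(obs, flatten=True)   # for MLP-based classes
cnn_input = prepare_observation(obs, flatten=False)  # for ConvNetwork-based classes
```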

@dementrock
Member

Closing this since #23 is merged. Thanks!
