
Gym env #12

Open
ShuvenduRoy opened this issue May 30, 2018 · 12 comments
@ShuvenduRoy

Why is 'NoFrameskip' required? And what exactly does 'NoFrameskip' specify?

assert 'NoFrameskip' in env.spec.id

@ShuvenduRoy

I tried 'Pong-v0' and it did not do well; the maximum reward I got was -2. What might have caused the problem?

@garkavem

These wrappers come from the OpenAI baselines project. Frameskip is traditionally used in Atari games to speed up learning: you do not need to see every frame, every 4th will do. Previously, frameskip was built into the environments themselves: every gym Atari environment without "NoFrameskip" in its name already returns only every 4th frame. For some reason, the baselines project decided to take environments without built-in frameskip and apply the MaxAndSkipEnv wrapper instead.
So if you remove the line
assert 'NoFrameskip' in env.spec.id
and feed 'Pong-v0' to the MaxAndSkipEnv wrapper, you will get only every 16th frame. That is probably not enough to play Pong.
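For readers unfamiliar with the wrapper being discussed, here is a minimal, self-contained sketch of the MaxAndSkipEnv idea. This is not the baselines code itself (which lives in openai/baselines); it is written against a duck-typed env with `step(action) -> (obs, reward, done, info)`, and all names here are illustrative:

```python
import numpy as np

class MaxAndSkipEnv:
    """Repeat each action for `skip` frames, sum the rewards, and return a
    pixel-wise max over the last two frames (Atari sprites flicker between
    frames, so the max removes that flicker)."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        last_two = []
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            last_two = (last_two + [obs])[-2:]  # keep the last two frames
            total_reward += reward
            if done:
                break
        # Observation returned to the agent: max over the final two raw frames.
        return np.max(np.stack(last_two), axis=0), total_reward, done, info
```

Applying this on top of an environment that already has a built-in skip of 4 is exactly what produces the every-16th-frame behavior described above.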

@ShuvenduRoy

As this environment is not currently available in gym, what should I change to reduce this 16-frame gap to 4?

@garkavem

Just remove
env = MaxAndSkipEnv(env, skip=4)
from "make_atari" in the wrappers file.
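The arithmetic behind this suggestion can be made explicit. The helper below is ours, purely illustrative: the effective gap between frames the agent sees is the product of the environment's built-in skip and the wrapper's skip.

```python
def effective_skip(builtin_skip: int, wrapper_skip: int) -> int:
    """Effective gap between frames the agent observes."""
    return builtin_skip * wrapper_skip

# 'Pong-v0' (built-in skip ~4) plus MaxAndSkipEnv(skip=4) -> every 16th frame:
print(effective_skip(4, 4))  # 16
# Dropping the wrapper restores an effective skip of 4:
print(effective_skip(4, 1))  # 4
# Equivalently, use 'PongNoFrameskip-v4' (no built-in skip) with the wrapper:
print(effective_skip(1, 4))  # 4
```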

@ShuvenduRoy

ShuvenduRoy commented May 31, 2018

OK, that makes sense.

But I am wondering about the shape of the state. According to the original paper:

The details of the architecture are explained in the Methods.
The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers

But when I check the shape of the state, it is (1, 84, 84). Where does this preprocessing deviate from the original paper?
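As a sanity check on the quoted architecture, the spatial sizes through the paper's three convolutional layers (8x8 stride 4, 4x4 stride 2, 3x3 stride 1, no padding, per Mnih et al. 2015) can be verified with simple arithmetic. The helper name is ours:

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

s = conv_out(84, 8, 4)  # 84x84 input -> 20x20 after conv1
s = conv_out(s, 4, 2)   # -> 9x9 after conv2
s = conv_out(s, 3, 1)   # -> 7x7 after conv3, flattened into the dense layer
```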

@garkavem

Call wrap_deepmind with the argument frame_stack=True. This issue was raised in #9.
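Conceptually, frame_stack=True keeps a rolling buffer of the last k processed frames and stacks them into one observation. A minimal, self-contained sketch of that idea (not the baselines code itself, which additionally uses a lazy-frames memory optimization):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k frames and stack them along a new leading axis,
    turning (84, 84) observations into a (k, 84, 84) state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # On reset the buffer is filled with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(first_obs)
        return np.stack(self.frames)

    def append(self, obs):
        # Each step pushes the newest frame and drops the oldest.
        self.frames.append(obs)
        return np.stack(self.frames)

state = FrameStack(k=4).reset(np.zeros((84, 84), dtype=np.uint8))
print(state.shape)  # (4, 84, 84)
```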

@ShuvenduRoy

It looks like something odd is happening, then. I also wonder how this even worked without sequence information. Any idea what is going on here?

Are we forcing the model to learn from the current frame only, which might not work in more complex cases?

@garkavem

Well, Pong is just an extremely simple game. I suspect that a player with perfect reaction time (like an RL agent) can simply always move toward the ball, and that is sufficient to win. For many Atari games, though, stacking frames is necessary.

@ShuvenduRoy

OK!

I am not quite getting the logic from the code, though: where do these 4 frames come from? Since the environment skips 4 frames, what is the situation now? Does the (4, 84, 84) state contain information from 16 frames, or is the 4-frame skip folded into it?

@garkavem

You have information about the current frame and the frames from 4, 8, and 12 steps ago (with a skip of 4, the stack holds 4 consecutive observed frames). The skipped frames in between are not available to the agent.
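This can be spelled out with a tiny helper (our own illustration, not library code): with a frame skip of `skip` and a stack of `k` observed frames, the state at time t contains the raw frames at these timesteps.

```python
def stacked_frame_times(t: int, skip: int = 4, k: int = 4) -> list[int]:
    """Timesteps of the raw frames present in the agent's stacked state."""
    return [t - i * skip for i in range(k)]

print(stacked_frame_times(100))  # [100, 96, 92, 88]
```

So the oldest frame in the default configuration is 12 steps old, and the 12 raw frames in between the stacked ones are never seen by the agent.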

@ShuvenduRoy

Oh, I got it. Thanks :-)

@ShuvenduRoy

With all this information and the modifications, I trained the model, but I could not quite reproduce the original results. Here are the code and the result.

Any idea what is causing the problem?
