How to obtain skipped frames from the Atari envs? #275
Comments
There's not currently an interface for this. cc/ @joschu what do you think we should do here? |
You can stack the frames returned by the environment |
@joschu I know, but the environment already applies each action 2-4 times in a row and only returns one observation afterwards. I could implement frame skipping on top of that, but that's not the same thing. Could you reopen please, or am I getting this wrong? |
That's how we've decided to define the Environment, since that's how it worked in previous papers that reported results on Atari. |
Okay, so previous papers skip 2-4 frames and use the last four non-skipped frames as input? |
Correct. (and they are generally a bit ambiguous about it.) |
For example, the 2015 Nature paper by Mnih et al. introduces history frames as:
At that point, they hadn't talked about frame skipping yet, so I assume that the m most recent frames refer to the m last game ticks. They later describe frame skipping as:
I assume that by frames they now refer to inputs to the Q-function, in which case the history stack contains all frames, not just skipped ones. Anyway, if you're sure about that, that would be very helpful. I can't reproduce DQN and A3C results so far, but it might just as well be a problem with my implementation of the algorithms. |
You may be interested in these discussions. https://groups.google.com/d/msg/deep-q-learning/81RbgJ4L9To/G8ZbpsIXCwAJ https://groups.google.com/forum/#!msg/arcade-learning-environment/N0syOKmRAH4/Gwsjx2KZAgAJ |
Thanks a lot, that made it clear. They stack the last non-skipped frames. Mnih et al. 2015 also take the max of two consecutive frames, at the emulator level. This is not possible with Gym, I guess? |
I didn't know about the max operation--good to know. It could be done, but I'm inclined not to, as we're not necessarily trying to precisely replicate the experimental setup from that paper. |
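A minimal sketch of the stacking discussed above, assuming the environment has already applied its own frame skip and you only see the observations it returns. `FrameStack` is a hypothetical helper written for illustration, not part of Gym, and the string "frames" stand in for image arrays.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations returned by an environment."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Fill the history with the first observation so the stack
        # already holds k entries at the first step of an episode.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def push(self, obs):
        # Append the newest observation; the oldest one drops out.
        self.frames.append(obs)
        return tuple(self.frames)


# Placeholder observations; in practice these would be the arrays
# returned by env.reset() and env.step().
stack = FrameStack(k=4)
state = stack.reset("f0")
for t in range(1, 4):
    state = stack.push("f%d" % t)
```

Each element of `state` is one returned (non-skipped) observation, oldest first, so the network input spans k agent steps rather than k emulator ticks.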
Let me summarize this, because quite a few people ask about frame skipping. DeepMind's original DQN paper
so an input to the neural network consists of four frames;
ALE provides a mechanism for frame skipping (combined with adjustable random action repeat) and color averaging over skipped frames. Gym's Atari environment also uses this and has built-in stochastic frame skipping common to all games, so the frames returned from the environment are not consecutive. The reason behind Gym's stochastic frame skipping is, as mentioned above, to make the environment stochastic. (I guess without this, the game would be completely deterministic?) I think if you want to reproduce the behavior of the original DQN paper, the easiest way is to disable frame skip and color averaging in |
@mthrok Would it be realistic to have a constructor argument to the |
To clarify the reason behind the max operator: old Atari consoles relied on the slow decay of the phosphors (which light up the screen) and used 2 successive frames to fully display an object. So a single frame may not contain the full information, while a max over 2 successive frames at any point in time will contain everything the NN needs to interpret the screen. This max operator works on 2 SUCCESSIVE frames. If we skip a few frames in between, the max operator no longer makes sense and the model may not converge. As suggested by mthrok, we can use the noframeskip option, take the max over successive screens, and write some custom code to skip frames in order to replicate the DeepMind paper. |
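A rough sketch of that custom code, assuming an environment that advances exactly one emulator frame per `step` and returns the older `(obs, reward, done, info)` tuple. `CountingEnv`, `pixel_max` and `MaxAndSkip` are hypothetical names made up for this example. Frames are flat lists of pixel values to keep it dependency-free; with NumPy arrays, `np.maximum` would do the same job.

```python
class CountingEnv:
    """Hypothetical one-frame-per-step stand-in for the emulator."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self._frame()

    def step(self, action):
        self.t += 1
        return self._frame(), 1.0, self.t >= 10, {}

    def _frame(self):
        # Two "pixels": one object is drawn only on even frames,
        # the other only on odd frames (a flicker-like pattern).
        return [255 if self.t % 2 == 0 else 0,
                255 if self.t % 2 == 1 else 0]


def pixel_max(a, b):
    # Element-wise maximum of two frames.
    return [max(x, y) for x, y in zip(a, b)]


class MaxAndSkip:
    """Repeat an action `skip` times and return the max of the last two frames."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip
        self.last = None

    def reset(self):
        self.last = self.env.reset()
        return self.last

    def step(self, action):
        total, done, info = 0.0, False, {}
        prev = cur = self.last
        for _ in range(self.skip):
            prev = cur  # remember the frame just before the newest one
            cur, reward, done, info = self.env.step(action)
            total += reward
            if done:
                break
        self.last = cur
        # The max is taken over two SUCCESSIVE emulator frames.
        return pixel_max(prev, cur), total, done, info


env = CountingEnv()
wrapped = MaxAndSkip(env, skip=4)
wrapped.reset()
obs, reward, done, info = wrapped.step(0)
```

Here the two objects never appear in the same raw frame, but the returned observation contains both because it combines emulator frames 3 and 4. Rewards from the skipped frames are summed so that none are lost.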
Many RL papers stack the last couple of frames into a single observation to address partial observability. The Atari envs repeat actions for 2, 3, or 4 frames, which is also common. However, they only return the new frame afterwards. How can I access the skipped frames from the Atari envs in order to reproduce algorithms such as DQN?