
How to obtain skipped frames from the Atari envs? #275

Closed
danijar opened this issue Aug 6, 2016 · 14 comments

Comments


danijar commented Aug 6, 2016

Many RL papers stack the last couple of frames into a single observation to address partial observability. The Atari envs repeat actions for 2, 3, or 4 frames, which is also common. However, they only return the new frame afterwards. How can I access the skipped frames from the Atari envs in order to reproduce algorithms such as DQN?

danijar changed the title from "How to obtain frames during action repeat from the Atari envs?" to "How to obtain skipped frames from the Atari envs?" on Aug 6, 2016

gdb commented Aug 6, 2016

There's not currently an interface for this.

cc/ @joschu what do you think we should do here?


joschu commented Aug 6, 2016

You can stack the frames returned by the environment.
That's the same thing done in the other papers that use the Atari domain, such as DeepMind's paper.
I.e., they stack frames (t, t-4, t-8, t-12), not (t, t-1, t-2, t-3).
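For example, a minimal sketch of that kind of stacking on top of whatever observations the env returns, using the classic Gym API (the env id and the policy placeholder are just examples):

```python
# Stack the last four observations returned by the env. With Gym's built-in
# 2-4 frame skip, these roughly correspond to frames (t, t-4, t-8, t-12).
from collections import deque

import gym
import numpy as np

env = gym.make("Pong-v0")            # example env id
history = deque(maxlen=4)

obs = env.reset()
for _ in range(4):                   # bootstrap the history with the first frame
    history.append(obs)

done = False
while not done:
    stacked = np.stack(history, axis=-1)    # e.g. shape (210, 160, 3, 4)
    action = env.action_space.sample()      # replace with policy(stacked)
    obs, reward, done, info = env.step(action)
    history.append(obs)
```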

joschu closed this as completed Aug 6, 2016

danijar commented Aug 8, 2016

@joschu I know, but the environment already applies each action 2-4 times in a row and only returns one observation afterwards. I could implement frame skipping on top of that, but that's not the same thing. Could you reopen please, or am I getting this wrong?


joschu commented Aug 8, 2016

That's how we've decided to define the Environment, since that's how it worked in previous papers that reported results on Atari.
If you want to use the skipped frames, you can make a new Environment for your own use.


danijar commented Aug 9, 2016

Okay, so previous papers skip 2-4 frames and use the last four non-skipped frames as input?


joschu commented Aug 9, 2016

Correct. (and they are generally a bit ambiguous about it.)
None of the previous papers use the stochastic frame skipping, though.
We did that because otherwise the games are fully deterministic.


danijar commented Aug 9, 2016

For example, the 2015 Nature paper by Mnih et al. introduces history frames as:

The function φ from algorithm 1 described below applies this preprocessing to the m most recent frames and stacks them to produce the input to the Q-function, in which m=4, although the algorithm is robust to different values of m (for example, 3 or 5).

At that point, they hadn't talked about frame skipping yet, so I assume that the m most recent frames refer to the last m game ticks. They later describe frame skipping as:

Following previous approaches to playing Atari 2600 games, we also use a simple frame-skipping technique. More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames.

I assume that by frames they refer now to inputs to the Q-function, in which case, the history stack contains all frames, not just skipped ones.

Anyway, if you're sure about that, that would be very helpful. I haven't been able to reproduce DQN and A3C results so far, but it might just as well be a problem with my implementation of the algorithms.


danijar commented Aug 10, 2016

Thanks a lot, that made it clear. They stack the last non-skipped frames. Mnih et al. 2015 also take the max of two consecutive frames, at the emulator level. This is not possible with Gym, I guess?


joschu commented Aug 10, 2016

I didn't know about the max operation--good to know. It could be done, but I'm inclined not to, as we're not necessarily trying to precisely replicate the experimental setup from that paper.


mthrok commented Aug 10, 2016

Let me summarize this, because quite a few people ask questions about frame skipping.

DeepMind's original DQN paper

  • used frame skipping (for fast playing/learning) and
  • applied pixel-wise max to consecutive frames (to handle flickering).

so an input to the neural network consists of four frames:

[max(T-1, T), max(T+3, T+4), max(T+7, T+8), max(T+11, T+12)]

ALE provides a mechanism for frame skipping (combined with adjustable random action repeat) and color averaging over skipped frames. This is also used in simple_dqn's ALEEnvironment.

Gym's Atari environment has built-in stochastic frame skipping common to all games, so the frames returned from the environment are not consecutive.

The reason behind Gym's stochastic frame skipping is, as mentioned above, to make the environment stochastic (I guess without this, the games would be completely deterministic?).
Cf. in the original DQN and simple_dqn, the same randomness is achieved by having the agent perform a random number of dummy actions at the beginning of each episode.

I think if you want to reproduce the behavior of the original DQN paper, the easiest approach is to disable frame skipping and color averaging in ALEEnvironment and then construct the mechanism on the agent side, roughly along the lines of the sketch below.
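A rough sketch of that agent-side mechanism, assuming an env that advances a single emulator frame per step and the classic four-value step API (the function name is made up for illustration):

```python
import numpy as np


def skip_and_max(env, action, skip=4):
    """Repeat `action` for up to `skip` frames, sum the rewards, and return
    the pixel-wise max of the last two raw frames to handle flickering."""
    frames, total_reward, done, info = [], 0.0, False, {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        frames.append(obs)
        total_reward += reward
        if done:
            break
    # Pixel-wise max over the two most recent raw frames.
    max_frame = np.max(np.stack(frames[-2:]), axis=0)
    return max_frame, total_reward, done, info
```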


danijar commented Aug 10, 2016

@mthrok Would it be realistic to have a constructor argument to the AtariEnv that specifies the choices for the stochastic frame skipping? The default value could be (2, 3, 4), which is what the pre-registered Atari envs use. However, it would make it easy to register new envs without frame skipping (0,). I'd actually be happy to do this change as a small PR; it would help me get started with contributing to Gym.
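For what it's worth, a hypothetical sketch of registering such a variant, assuming AtariEnv accepts game and frameskip keyword arguments and that frameskip=1 means no skipping (the env id is made up):

```python
from gym.envs.registration import register

# Hypothetical registration of an Atari env without built-in frame skipping.
register(
    id='PongSingleFrame-v0',                    # made-up id for illustration
    entry_point='gym.envs.atari:AtariEnv',
    kwargs={'game': 'pong', 'frameskip': 1},    # assumed: 1 = no frame skipping
)
```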


mthrok commented Aug 10, 2016

@danijar
Looking at the discussion in #282, I thought the same thing (if a kwarg is introduced to environment creation, we can customize the env for DQN recreation), and I think the approach is consistent. But since I am outside of the organization, all I can do is 👍.


allohvk commented Oct 29, 2022

That's how we've decided to define the Environment, since that's how it worked in previous papers that reported results on Atari. If you want to use the skipped frames, you can make a new Environment for your own use.

To clarify the reason behind the max operator: old Atari consoles exploited the slow decay of the phosphors that light up the screen, using two successive frames to fully display an object. So a single frame may not contain the full information, and a max operator over two SUCCESSIVE frames at any point in time captures the complete information the network needs to interpret the screen. If we skip a few frames first, the max operator no longer makes sense and the model may not converge. As suggested by mthrok, we can use the no-frameskip option, take the max over successive screens, and write some custom code to skip frames in order to replicate the DeepMind paper.
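As a small illustration of the point above, assuming a Gym version that ships NoFrameskip env ids and the classic four-value step API:

```python
import gym
import numpy as np

env = gym.make("PongNoFrameskip-v4")     # one emulator frame per step
prev = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
flicker_free = np.maximum(prev, obs)     # pixel-wise max of two successive screens
```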
