
How to obtain skipped frames from the Atari envs? #275

Closed
danijar opened this issue Aug 6, 2016 · 14 comments

Comments


danijar commented Aug 6, 2016

Many RL papers stack the last couple of frames into a single observation to address partial observability. The Atari envs repeat actions for 2, 3, or 4 frames, which is also common. However, they only return the new frame afterwards. How can I access the skipped frames from the Atari envs in order to reproduce algorithms such as DQN?

danijar changed the title from "How to obtain frames during action repeat from the Atari envs?" to "How to obtain skipped frames from the Atari envs?" on Aug 6, 2016

gdb commented Aug 6, 2016

There's not currently an interface for this.

cc/ @joschu what do you think we should do here?


joschu commented Aug 6, 2016

You can stack the frames returned by the environment.
That's the same thing done in the other papers that use the Atari domain, such as DeepMind's paper.
I.e., they stack frames (t, t-4, t-8, t-12), not (t, t-1, t-2, t-3).
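For example, a minimal sketch of that kind of stacking on top of whatever observations the env returns, using the classic Gym API (the env id and the policy placeholder are just examples):

```python
# Stack the last four observations returned by the env. With Gym's built-in
# 2-4 frame skip, these roughly correspond to frames (t, t-4, t-8, t-12).
from collections import deque

import gym
import numpy as np

env = gym.make("Pong-v0")            # example env id
history = deque(maxlen=4)

obs = env.reset()
for _ in range(4):                   # bootstrap the history with the first frame
    history.append(obs)

done = False
while not done:
    stacked = np.stack(history, axis=-1)    # e.g. shape (210, 160, 3, 4)
    action = env.action_space.sample()      # replace with policy(stacked)
    obs, reward, done, info = env.step(action)
    history.append(obs)
```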

joschu closed this as completed Aug 6, 2016

danijar commented Aug 8, 2016

@joschu I know, but the environment already applies each action 2-4 times in a row and only returns one observation afterwards. I could implement frame skipping on top of that, but that's not the same thing. Could you reopen please, or am I getting this wrong?


joschu commented Aug 8, 2016

That's how we've decided to define the Environment, since that's how it worked in previous papers that reported results on Atari.
If you want to use the skipped frames, you can make a new Environment for your own use.


danijar commented Aug 9, 2016

Okay, so previous papers skip 2-4 frames and use the last four non-skipped frames as input?


joschu commented Aug 9, 2016

Correct. (and they are generally a bit ambiguous about it.)
None of the previous papers use the stochastic frame skipping, though.
We did that because otherwise the games are fully deterministic.


danijar commented Aug 9, 2016

For example, the 2015 Nature paper by Mnih et al. introduces history frames as:

The function φ from algorithm 1 described below applies this preprocessing to the m most recent frames and stacks them to produce the input to the Q-function, in which m=4, although the algorithm is robust to different values of m (for example, 3 or 5).

At that point, they hadn't talked about frame skipping yet, so I assume that the m most recent frames refer to the last m game ticks. They later describe frame skipping as:

Following previous approaches to playing Atari 2600 games, we also use a simple frame-skipping technique. More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames.

I assume that by frames they refer now to inputs to the Q-function, in which case, the history stack contains all frames, not just skipped ones.

Anyway, if you're sure about that, that would be very helpful. I haven't been able to reproduce DQN and A3C results so far, but it might just as well be a problem with my implementation of the algorithms.


danijar commented Aug 10, 2016

Thanks a lot, that made it clear. They stack the last non-skipped frames. Mnih et al. 2015 also take the max of two consecutive frames, at the emulator level. This is not possible with Gym, I guess?


joschu commented Aug 10, 2016

I didn't know about the max operation--good to know. It could be done, but I'm inclined not to, as we're not necessarily trying to precisely replicate the experimental setup from that paper.


mthrok commented Aug 10, 2016

Let me summarize this, because quite a few people ask questions about frame skipping.

DeepMind's original DQN paper

  • used frame skipping (for fast playing/learning) and
  • applied pixel-wise max to consecutive frames (to handle flickering).

so an input to the neural network consists of four frames:

[max(T-1, T), max(T+3, T+4), max(T+7, T+8), max(T+11, T+12)]

ALE provides a mechanism for frame skipping (combined with adjustable random action repeat) and color averaging over skipped frames. This is also used in simple_dqn's ALEEnvironment.

Gym's Atari environment has built-in stochastic frame skipping common to all games, so the frames returned from the environment are not consecutive.

The reason behind Gym's stochastic frame skipping is, as mentioned above, to make the environment stochastic (I guess without this, the games would be completely deterministic?).
Cf. in the original DQN and simple_dqn, the same randomness is achieved by having the agent perform a random number of dummy actions at the beginning of each episode.

I think if you want to reproduce the behavior of the original DQN paper, the easiest approach is to disable frame skipping and color averaging in ALEEnvironment and then construct the mechanism on the agent side, roughly along the lines of the sketch below.
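A rough sketch of that agent-side mechanism, assuming an env that advances a single emulator frame per step and the classic four-value step API (the function name is made up for illustration):

```python
import numpy as np


def skip_and_max(env, action, skip=4):
    """Repeat `action` for up to `skip` frames, sum the rewards, and return
    the pixel-wise max of the last two raw frames to handle flickering."""
    frames, total_reward, done, info = [], 0.0, False, {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        frames.append(obs)
        total_reward += reward
        if done:
            break
    # Pixel-wise max over the two most recent raw frames.
    max_frame = np.max(np.stack(frames[-2:]), axis=0)
    return max_frame, total_reward, done, info
```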


danijar commented Aug 10, 2016

@mthrok Would it be realistic to have a constructor argument to the AtariEnv that specifies the choices for the stochastic frame skipping? The default value could be (2, 3, 4), which is what the pre-registered Atari envs use. However, it would make it easy to register new envs without frame skipping (0,). I'd actually be happy to do this change as a small PR; it would help me get started with contributing to Gym.
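For what it's worth, a hypothetical sketch of registering such a variant, assuming AtariEnv accepts game and frameskip keyword arguments and that frameskip=1 means no skipping (the env id is made up):

```python
from gym.envs.registration import register

# Hypothetical registration of an Atari env without built-in frame skipping.
register(
    id='PongSingleFrame-v0',                    # made-up id for illustration
    entry_point='gym.envs.atari:AtariEnv',
    kwargs={'game': 'pong', 'frameskip': 1},    # assumed: 1 = no frame skipping
)
```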


mthrok commented Aug 10, 2016

@danijar
Looking at the discussion in #282, I thought the same thing (if a kwarg is introduced to environment creation, we can customize the env for DQN recreation), and I think the approach is consistent. But since I am outside of the organization, all I can do is 👍.


allohvk commented Oct 29, 2022

That's how we've decided to define the Environment, since that's how it worked in previous papers that reported results on Atari. If you want to use the skipped frames, you can make a new Environment for your own use.

To clarify the reason behind the max operator: old Atari consoles exploited the slow decay of the phosphors that light up the screen, using two successive frames to fully display an object. So a single frame may not contain the full information, and a max operator over two SUCCESSIVE frames at any point in time captures the complete information the network needs to interpret the screen. If we skip a few frames first, the max operator no longer makes sense and the model may not converge. As suggested by mthrok, we can use the no-frameskip option, take the max over successive screens, and write some custom code to skip frames in order to replicate the DeepMind paper.
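As a small illustration of the point above, assuming a Gym version that ships NoFrameskip env ids and the classic four-value step API:

```python
import gym
import numpy as np

env = gym.make("PongNoFrameskip-v4")     # one emulator frame per step
prev = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
flicker_free = np.maximum(prev, obs)     # pixel-wise max of two successive screens
```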
