
Reference for DeepMind using FireResetEnv wrapper's functionality #240

Open
DennisSoemers opened this issue Dec 30, 2017 · 14 comments


In atari_wrappers.py (specifically, the wrap_deepmind function at the very bottom), the docstring says that the function configures the environment for "DeepMind-style Atari".

Can anyone provide a reference to a DeepMind paper confirming that they do in fact use the functionality implemented by the FireResetEnv wrapper? I was able to find the functionality of all the other wrappers applied in that function (and also the ones in the make_atari function above it) described in various DeepMind papers (such as the Mnih et al. (2015) DQN Nature paper), but I was unable to find anything resembling the behaviour of FireResetEnv.
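For context, the wrapper in question simply presses FIRE (and then action 2) once on every reset. Paraphrasing the current code in atari_wrappers.py (a sketch, using the old gym step API that returns four values):

```python
import gym

class FireResetEnv(gym.Wrapper):
    """Press FIRE on reset, for games that sit waiting for it before starting."""

    def __init__(self, env):
        gym.Wrapper.__init__(self, env)
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
        assert len(env.unwrapped.get_action_meanings()) >= 3

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(1)  # FIRE
        if done:
            self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(2)  # second action, also used to start some games
        if done:
            self.env.reset(**kwargs)
        return obs
```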

It may just be a minor detail, but I do think it's important to be precise with this kind of stuff for the sake of reproducibility.


muupan commented Mar 18, 2018

I'm also curious whether the DeepMind papers use the same trick. I was unable to find it in their code (dqn, alewrap and xitari).

@DennisSoemers (Author)

@muupan I've continued looking into this since writing the issue. I have not gotten 100% confirmation anywhere, but I did email John Schulman (who authored the initial commit in this repository containing the comment claiming that this is DeepMind-style, although he did not originally implement that wrapper), and he has no idea where it came from.

Additionally, it appears that the intended functionality (automatically playing the actions required to start a game on reset) is already implemented in the Arcade Learning Environment itself, in the following line of code: https://github.com/mgbellemare/Arcade-Learning-Environment/blob/master/src/environment/stella_environment.cpp#L88

So it appears that anyone who uses the Arcade Learning Environment (which is basically everyone: DeepMind, but also everyone who runs Atari games through OpenAI Gym, etc.) already gets this functionality out of the box, even without FireResetEnv. I suspect FireResetEnv may be completely redundant. I've never personally tested how it affects performance anywhere; that would still be interesting to do, just to be sure. I did briefly test AirRaid and Asterix (which, as far as I can tell, are two games that supposedly require pressing FIRE to start), and they appeared to play just fine both with and without FireResetEnv.


muupan commented Mar 22, 2018

@DennisSoemers Thank you for the information. FireResetEnv does make a difference for Breakout, at least: Breakout has no getStartingActions entry in the ALE, but it needs FIRE to launch the ball. Without FireResetEnv, if the algorithm fails to learn to press FIRE, it gets stuck.
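A quick way to see this (a sketch, assuming the old gym step API, not anything from DeepMind's code): step Breakout with NOOP only and note that no reward is ever received and no life is ever lost, because the game just waits for FIRE.

```python
import gym

env = gym.make('BreakoutNoFrameskip-v4')
env.reset()
total_reward = 0.0
start_lives = env.unwrapped.ale.lives()
for _ in range(1000):
    _, reward, done, _ = env.step(0)  # 0 == NOOP in Breakout's action set
    total_reward += reward
    if done:
        break
print(total_reward, start_lives, env.unwrapped.ale.lives())
# Expected: total_reward stays 0.0 and no lives are lost; the ball is never launched.
```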

@DennisSoemers (Author)

Ah, I see, thanks for letting me know about that one. I guess the only way to tell for sure is to contact DeepMind and ask them if they're using anything like this. Right now I'm inclined to bet on "no" though.


muupan commented Mar 22, 2018

I sent an email to the author of the DQN paper. I hope he will answer it.

@DennisSoemers (Author)

@muupan Just wondering if you ever got a reply?


muupan commented Apr 15, 2018

Unfortunately not.


Kaixhin commented May 8, 2018

After doing some experiments I suspect that they don't use this wrapper, and like @muupan I was unable to find it in their code. It seems that for the DQN-based agents at least, DeepMind evaluates using an ɛ-greedy policy, where ɛ = 0.05. So during evaluation in the early stages of training, the ball does end up getting released quickly in Breakout (whereas when I used ɛ = 0.001 evaluation basically got stuck). I'm training a Rainbow agent now, will try to remember to report back with results once it is done.
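For reference, the evaluation policy I mean is just standard ɛ-greedy action selection (a minimal sketch below; `q_values` stands for whatever the agent's network outputs and is my own placeholder). With ɛ = 0.05 a random action is taken 5% of the time, which in Breakout eventually presses FIRE even if the greedy action never does.

```python
import random

def epsilon_greedy(q_values, epsilon=0.05):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    n_actions = len(q_values)
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_values[a])
```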

By the way, if you're interested in reproducibility, DeepMind's code shows that they use bilinear interpolation for downsampling, as opposed to the wrapper here which uses pixel area relation.
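Concretely (a sketch with OpenCV, leaving out the grayscale conversion both versions also do): the WarpFrame wrapper here resizes with cv2.INTER_AREA, while bilinear interpolation corresponds to cv2.INTER_LINEAR.

```python
import cv2

def warp_area(frame):
    # What baselines' WarpFrame does: pixel area relation.
    return cv2.resize(frame, (84, 84), interpolation=cv2.INTER_AREA)

def warp_bilinear(frame):
    # Closer to DeepMind's published preprocessing: bilinear interpolation.
    return cv2.resize(frame, (84, 84), interpolation=cv2.INTER_LINEAR)
```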


Kaixhin commented May 22, 2018

Got confirmation from Charles Beattie at DeepMind that they do not use anything like the FireResetEnv wrapper, so I have no idea where that came from.


muupan commented Oct 6, 2018

@Kaixhin

> It seems that for the DQN-based agents at least, DeepMind evaluates using an ɛ-greedy policy, where ɛ = 0.05

Which specific papers do you mean by the DQN-based agents? As far as I know, at least the PER paper seems to use 0.01 for up-to-30-noop evaluation (from Table 5 of http://arxiv.org/abs/1511.05952), while the QR-DQN paper seems to use 0.001 for up-to-30-noop evaluation (from "Best agent performance" subsection of https://arxiv.org/abs/1710.10044). Correct me if I'm wrong.


Kaixhin commented Oct 6, 2018

Sorry, yes, it may differ from paper to paper. I got ɛ = 0.05 from the (Nature) DQN paper, but more recently they seem to have been using ɛ = 0.001.

Unfortunately they are still changing settings: the new PopArt + IMPALA paper takes away termination on loss of life. I hope they'll settle on the setup in the Revisiting ALE paper, but as long as DeepMind is concerned with improving upon their own results, that seems unlikely.


muupan commented Oct 6, 2018

Thank you very much for clarifying it. I hope so, too.

AdamGleave added a commit to HumanCompatibleAI/baselines that referenced this issue Mar 24, 2019
@steffenvan

Hi @Kaixhin,
since this is still open, I was wondering what exactly you mean by "the PopArt + IMPALA paper takes away the loss of life wrapper"? I'm asking because I'm currently trying to reproduce those exact results with the existing wrappers.

Best,
Steffen


Kaixhin commented Apr 30, 2019

@steffenvan I assume that if you don't use the EpisodicLifeEnv wrapper you'll get an equivalent setup to the one they used in that paper.
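In terms of this repo's helpers, wrap_deepmind exposes an episode_life flag, so something like the sketch below should give that setup (the other flags are up to you; Breakout is just an example):

```python
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

env = make_atari('BreakoutNoFrameskip-v4')
# episode_life=False skips the EpisodicLifeEnv wrapper; the rest of the
# DeepMind-style preprocessing stays in place.
env = wrap_deepmind(env, episode_life=False, clip_rewards=True, frame_stack=True)
```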
