DQN: Optimizer #10
RMSprop with the default DeepMind parameters is complete garbage. After 5,000,000 frames it raised the average score per episode to only -15 on Pong. For reference, Adam can converge to nearly perfect games (average score of +16) in the same amount of time. Long story short, either RMSprop in Keras differs from what DeepMind used, or Adam is just plain better. No further exploration will be done with RMSprop. EDIT: fixed some spelling, grammar, etc.
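One possible reading of "different from what they used": the commonly cited DQN setup is a centered RMSprop that subtracts the squared running mean of the gradient and adds a fairly large constant inside the square root, whereas the plain form keeps only the running mean of squared gradients and adds a tiny epsilon outside the root. That is an assumption on my part, not something verified in this thread. Below is a minimal pure-Python sketch of both update rules on a scalar gradient. The function names and the `rho`/`eps` defaults are illustrative placeholders, not the values used in these experiments and not the Keras implementation.

```python
import math


def rmsprop_plain(grads, lr, rho=0.95, eps=1e-7):
    """Per-step updates of plain RMSprop for a sequence of scalar gradients."""
    sq = 0.0  # running mean of squared gradients
    updates = []
    for g in grads:
        sq = rho * sq + (1.0 - rho) * g * g
        # epsilon is added outside the square root
        updates.append(-lr * g / (math.sqrt(sq) + eps))
    return updates


def rmsprop_centered(grads, lr, rho=0.95, eps=0.01):
    """Per-step updates of a centered RMSprop variant (assumed form, see lead-in)."""
    mean = 0.0  # running mean of gradients
    sq = 0.0    # running mean of squared gradients
    updates = []
    for g in grads:
        mean = rho * mean + (1.0 - rho) * g
        sq = rho * sq + (1.0 - rho) * g * g
        # variance-like estimate, with the constant added inside the square root
        updates.append(-lr * g / math.sqrt(sq - mean * mean + eps))
    return updates


if __name__ == "__main__":
    lr = 2.5e-4  # hypothetical learning rate, for illustration only
    for g in (1.0, 1e-3):
        plain = rmsprop_plain([g], lr)[0]
        centered = rmsprop_centered([g], lr)[0]
        print(f"gradient={g}: plain={plain:.3e} centered={centered:.3e}")
```

With these placeholder settings the two rules take similar steps for gradients of order 1, but the centered form with the larger constant damps very small gradients heavily. If the Keras run used the plain form with a tiny epsilon, the same learning rate would not be comparable between the two setups.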
Nadam and Adam produce similar results. Nadam seems to take slightly longer to run. Adam will be used from here on out. Notebooks are searching for a solid learning rate to lock in for the remaining experiments. EDIT: Nadam just achieved a high average score of 18.1. Rethinking this with more notebooks.
High learning rates (e.g. 1e-4, 1e-3, 2e-3) seem to cause exploding gradients in the early stages. Something more stable, like 2e-5, might be the best learning rate.
Further experiments confirm that Adam running at 1e-4 produces unstable results. 2e-5 will be in place from here on out.
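For intuition on why the learning rate matters so much with Adam: in the textbook update, the size of each parameter step is roughly bounded by the learning rate itself, so 1e-4 moves the weights about five times further per update than 2e-5. The sketch below shows this on a scalar with a made-up constant gradient. It is a toy illustration of step size only, not a reproduction of the DQN instability, and `adam_updates`, `total_displacement`, and the beta/epsilon defaults are hypothetical names and values chosen for the example.

```python
import math


def adam_updates(grads, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """Per-step updates of textbook Adam for a sequence of scalar gradients."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


def total_displacement(grads, lr):
    """Sum of all updates, i.e. how far the parameter has moved."""
    return sum(adam_updates(grads, lr))


if __name__ == "__main__":
    grads = [1.0] * 100  # hypothetical constant gradient
    for lr in (2e-5, 1e-4, 1e-3):
        print(f"lr={lr}: moved {abs(total_displacement(grads, lr)):.3e}")
```

Because the normalized step is close to 1 whenever the gradient sign is consistent, the distance travelled scales almost linearly with the learning rate, which is one plausible reason the early updates at 1e-4 and above are much more aggressive than at 2e-5.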