DQN: Optimizer #10

Closed
3 tasks done
Kautenja opened this issue Apr 5, 2018 · 4 comments

Comments

@Kautenja
Owner

Kautenja commented Apr 5, 2018

  • Adam
  • Nadam
  • RMSprop
@Kautenja
Owner Author

Kautenja commented Apr 8, 2018

RMSprop with the default DeepMind parameters performs very poorly. After 5,000,000 frames it had raised the average score per episode on Pong to only -15. For reference, Adam can converge to nearly perfect games (average score of +16) in the same amount of time. Long story short, either RMSprop in Keras differs from the version DeepMind used, or Adam is just plain better. No further exploration will be done with RMSprop.

EDIT: fix some spelling, grammar, etc.
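
One way the "RMSprop in Keras is different" hypothesis could play out is sketched below. It assumes two things not stated in this thread: that the "default DeepMind parameters" are the ones from the DQN paper (learning rate 0.00025, decay 0.95, minimum squared gradient 0.01), and that DeepMind's optimizer was a centered RMSprop variant that puts the 0.01 constant inside the square root, whereas a Keras-style update is uncentered and adds a small epsilon (assumed 1e-7 here) outside it. The function names are hypothetical and the code works on a single scalar parameter, purely for illustration.

```python
import math

# Assumed values from the DQN paper; not confirmed anywhere in this thread.
LR = 0.00025
DECAY = 0.95
MIN_SQ_GRAD = 0.01


def keras_style_rmsprop_step(grad, state, lr=LR, rho=DECAY, eps=1e-7):
    """Uncentered RMSprop: divide by sqrt(mean(g^2)) + eps."""
    state["sq"] = rho * state["sq"] + (1.0 - rho) * grad * grad
    return -lr * grad / (math.sqrt(state["sq"]) + eps)


def centered_rmsprop_step(grad, state, lr=LR, rho=DECAY, eps=MIN_SQ_GRAD):
    """Centered variant: divide by sqrt(mean(g^2) - mean(g)^2 + eps)."""
    state["avg"] = rho * state["avg"] + (1.0 - rho) * grad
    state["sq"] = rho * state["sq"] + (1.0 - rho) * grad * grad
    return -lr * grad / math.sqrt(state["sq"] - state["avg"] ** 2 + eps)


def first_steps(grad):
    """Return the first update of each variant for the same gradient."""
    uncentered = keras_style_rmsprop_step(grad, {"sq": 0.0})
    centered = centered_rmsprop_step(grad, {"avg": 0.0, "sq": 0.0})
    return uncentered, centered


if __name__ == "__main__":
    for g in (1.0, 1e-3):
        u, c = first_steps(g)
        print(f"grad={g}: uncentered step={u:.3e}, centered step={c:.3e}")
```

Under these assumptions the two updates diverge most for small gradients: the large constant inside the square root damps the centered step, while the uncentered step stays on the order of the learning rate. If that is the cause, the same hyperparameters would not transfer directly between the two implementations.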

@Kautenja
Owner Author

Kautenja commented Apr 9, 2018

Nadam and Adam produce similar results. Nadam seems to take a small amount of extra time. Adam will be used from here on out. Notebooks are searching for a solid learning rate to lock in for the remaining experiments.

EDIT: Nadam just achieved a high average score of 18.1. Rethinking this with more notebooks.

@Kautenja
Owner Author

Kautenja commented Apr 9, 2018

High learning rates (e.g. 1e-4, 1e-3, 2e-3) seem to cause an explosion of gradients in the early stages. Something more stable like 2e-5 might be the best learning rate.
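
As a rough illustration of why the learning rate matters so directly here, the sketch below implements the standard Adam update for a single scalar parameter in plain Python. The helper names (`adam_step`, `total_movement`) and the gradient values are hypothetical, and the betas and epsilon are the commonly cited Adam defaults, assumed rather than taken from these experiments. It shows only that the per-step parameter change scales linearly with the learning rate; it does not reproduce the instability seen in the notebooks.

```python
import math


def adam_step(grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one bias-corrected Adam update and return the parameter delta."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


def total_movement(lr, grads):
    """Sum of absolute parameter changes over a gradient sequence."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    return sum(abs(adam_step(g, state, lr)) for g in grads)


if __name__ == "__main__":
    grads = [5.0, -3.0, 4.0, 0.5, -6.0]  # made-up gradients for illustration
    for lr in (1e-3, 1e-4, 2e-5):
        print(f"lr={lr}: total movement={total_movement(lr, grads):.3e}")
```

Because the first bias-corrected step has magnitude close to the learning rate regardless of gradient scale, going from 2e-5 to 1e-4 moves the weights about five times farther per update, which is one plausible reading of why the early stages are less stable at the higher rates.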

@Kautenja
Owner Author

Further experiments confirm that Adam running at 1e-4 produces unstable results. 2e-5 will be in place from here on out.
