
Results and args used for space invaders? #1

Closed
gtoubassi opened this issue May 1, 2016 · 4 comments

@gtoubassi

First off, thanks for sharing this awesome repo. I am attempting to reproduce the DeepMind results with TF and am slowly getting better results as I buff out the many subtle issues. Your repo has really helped me figure things out!

I wanted to know what results you get for Space Invaders (in a public post I saw you mentioned 1500), and exactly what args you are using. I haven't run deep_rl_ale for the full 200 epochs, but at about epoch 125 (with all default args) I was seeing average scores of about 1000-1100. Maybe I just need to let it run, but I wanted to make sure.

Thanks much!

@Jabberwockyll (Owner)

> First off, thanks for sharing this awesome repo. I am attempting to reproduce the DeepMind results with TF and am slowly getting better results as I buff out the many subtle issues. Your repo has really helped me figure things out!

You're welcome! I'm excited that it's actually being used by others!

> I wanted to know what results you get for Space Invaders (in a public post I saw you mentioned 1500), and exactly what args you are using. I haven't run deep_rl_ale for the full 200 epochs, but at about epoch 125 (with all default args) I was seeing average scores of about 1000-1100. Maybe I just need to let it run, but I wanted to make sure.

I got 1514 using --double_dqn and --gradient_clip=10 (the double-DQN target is sketched below). All other args were the defaults. DeepMind reports 1975 in the Nature paper and 3154 with double DQN. Those are significantly higher than my results, but I'm unaware of any differences between my implementation and theirs that might cause this. However, it looks like training hadn't yet converged and was still improving when the experiment ended:

[Figure: space_invaders_scores — average score per epoch over the training run]

It seems like I was getting scores in the 1000-1300 range around epoch 125.
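For anyone unfamiliar with the --double_dqn flag: Double DQN (van Hasselt et al.) uses the online network to choose the greedy next action and the target network to evaluate it, which curbs Q-value overestimation. Here is a minimal NumPy sketch of that target computation; the function and argument names are illustrative, not this repo's actual code:

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, terminals, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    next_q_online / next_q_target are (batch, n_actions) Q-value arrays
    from the online and target networks, evaluated at the next states.
    """
    # The online network selects the greedy next action...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...and the target network supplies the value estimate for it.
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions get no bootstrapped value.
    return rewards + gamma * next_values * (1.0 - terminals)
```

Plain DQN would instead take np.max(next_q_target, axis=1), letting the same network both select and evaluate the action.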

I don't know if it would have kept improving, but it seems to be progressing much more slowly than theirs. I have since changed the initialization of the moving averages in my RMSProp implementation, which might make training progress a little faster initially, but I don't think it would make a major difference.
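On the RMSProp point: the moving average in question is the running mean of squared gradients that the update divides by. A toy sketch of why its initialization matters (the constants here are illustrative, not the repo's actual hyperparameters):

```python
import numpy as np

def rmsprop_step(param, grad, g2, lr=0.00025, decay=0.95, eps=1e-8):
    """One RMSProp update; g2 is the moving average of squared gradients."""
    g2 = decay * g2 + (1.0 - decay) * grad ** 2
    param = param - lr * grad / np.sqrt(g2 + eps)
    return param, g2

# Starting g2 at 0 leaves the denominator tiny at first, so the earliest
# updates come out several times larger than the steady-state step size.
# Seeding g2 with grad**2 (or another nonzero value) removes that initial
# transient but changes little once the average has warmed up.
w, g2 = 1.0, 0.0
for step in range(3):
    w, g2 = rmsprop_step(w, 0.1, g2)
```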

Let me know what results you get with what settings! It takes a while to run these experiments, so the more data the better. You could try running it for more than 200 epochs if you have the time and see if it still improves. Let me know if you have any hypotheses about the performance differences as well. I plan on putting all of my results in the wiki when I have finished testing on a couple more games.

@gtoubassi (Author)

That's very helpful. I have a few other random questions if you are willing to contact me over email (my GitHub username on gee male). I would also be interested in a more systematic hunt to reproduce the DeepMind results with TF if you are game. Perhaps we could mount an offensive with help from others in the deep-q-learning Google group.

@nishithbsk

This question is off-topic for this discussion, but how did you use TensorBoard to obtain the graph of score_per_game vs. epochs? I tried tensorboard --logdir <path/to/records/dir_containing_tf_events> but was not able to see any graphs. Maybe this broke when I upgraded my TensorFlow to 0.8?

@Jabberwockyll (Owner)

Try tensorboard --logdir=${PWD}. You can launch tensorboard from any parent directory too. I'm using 0.8 and it works fine.
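For completeness: TensorBoard only shows scalar plots if the training run wrote summary events somewhere under the --logdir you point it at. I'm not certain exactly how this repo logs its scores, but the general recipe with the TF 0.8-era API looks like this (the tag and directory names are made up for illustration):

```python
import tensorflow as tf

# TF 0.8-era summary API; later releases renamed these to
# tf.summary.scalar and tf.summary.FileWriter.
score = tf.placeholder(tf.float32, [])
summary_op = tf.scalar_summary('score_per_game', score)
writer = tf.train.SummaryWriter('records/space_invaders')

with tf.Session() as sess:
    # Dummy per-epoch averages, just to demonstrate the logging calls.
    for epoch, avg in enumerate([310.0, 540.0, 820.0]):
        writer.add_summary(sess.run(summary_op, feed_dict={score: avg}), epoch)
    writer.flush()
```

If an events file ends up under records/, then tensorboard --logdir=records (or --logdir=${PWD} from the directory above it) should show the score_per_game curve.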
