
Performance of release v1.0 on Space Invaders #26

Closed
marintoro opened this issue Aug 6, 2018 · 13 comments

@marintoro

I just launched the release v1.0 (commit 952fcb4) on Space Invaders for the whole weekend (around 25M steps), using the exact same code with the exact same random seed.
I got much lower performance than what you are showing.
Here are the plots of rewards and Q-values:
[plots: q_values_v1.0, reward_v1.0]

Could you explain exactly how you got your results for this release? Did you run multiple experiments with different random seeds and average them, or just take the best one?
Or maybe it's a PyTorch, atari_py, or other library issue? Could you give all your library versions?

@Kaixhin
Owner

Kaixhin commented Aug 6, 2018

Thanks for checking! I've dug through my original results for 1.0 and I have the models/plots for Beam Rider, Enduro, Ms. Pac-Man and Seaquest. Performance for Seaquest has also dropped for 1.1; I don't have 1.1 results for any other games. I got rid of my suboptimal 1.0 Frostbite results after I got the expected results on 1.1. But for some reason I've lost my Space Invaders results, so I'm now confused as to where I managed to get those results.

Currently I'm running with pytorch 0.4.1, atari_py 0.1.1 and opencv-python 3.4.2.16, but I've not been tracking library versions. I always just run the same seed and one experiment.

Let's leave this issue open until the results can be replicated. I'm currently seeing if removing gradient clipping from 1.1 will get Space Invaders to work, but that'll take a week to check, if I can even hang on to my current GPU. If you still have the trained model from this experiment, a quick sanity check would be to load it but use epsilon=0.001 for testing.
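For reference, a minimal sketch of that sanity check (illustrative names only; `agent`/`env` stand in for the repo's own classes, and `env.step` is assumed to return `(state, reward, done)`):

```python
import torch

EPSILON = 0.001  # small random-action probability during testing

def evaluate(agent, env, episodes=10):
    # Run a few test episodes with epsilon-greedy action selection
    rewards = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            if torch.rand(1).item() < EPSILON:
                action = env.sample_action()  # random action (hypothetical helper)
            else:
                action = agent.act(state)     # greedy action from the trained net
            state, reward, done = env.step(action)
            total += reward
        rewards.append(total)
    return sum(rewards) / len(rewards)  # average test reward
```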

@marintoro
Author

So I just did the sanity check you suggested with epsilon=0.001 for testing, and it doesn't really change anything (I didn't get the exact same result, but almost).
My library versions are:
atari_py: 0.1.1
torch: 0.4.0
opencv-python: 3.4.0.12
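(For anyone reproducing this, a quick way to print those versions, reading the installed pip package metadata:)

```python
import pkg_resources

# Print the installed version of each relevant package
for pkg in ('torch', 'atari-py', 'opencv-python'):
    print(pkg, pkg_resources.get_distribution(pkg).version)
```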

I don't expect those minor differences to be a problem, but I may be wrong...

Actually I have a pretty good CPU/GPU available right now and I could run some other tests (but I really would like to get a working version on Space Invaders! This is my sanity check for my own multi-agent version ^^)

@Kaixhin
Owner

Kaixhin commented Aug 7, 2018

I would have been using torch 0.4.0 at the time, and I don't expect opencv-python to change a lot on a minor version (I may have been on the same version as you anyway).

If you have compute to test, then taking master and running Space Invaders with the priority weights not included in the new priorities would be the next thing to check.
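A hedged sketch of that setting, with illustrative names rather than the repo's own: the importance-sampling weights scale the loss for the gradient step, but the new priorities written back to replay memory come from the unweighted per-sample loss.

```python
def learn_step(loss_per_sample, is_weights, idxs, mem, optimiser):
    # Importance-sampling weights correct the bias of prioritised
    # sampling in the gradient step...
    optimiser.zero_grad()
    (is_weights * loss_per_sample).mean().backward()
    optimiser.step()
    # ...but the new priorities exclude them: raw per-sample loss only
    mem.update_priorities(idxs, loss_per_sample.detach().abs())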

epsilon=0.001 was needed for Pong to report the right scores, and using log softmax in training prevents numerical problems (I had these in Q*bert), so I'm pretty sure those are needed.
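To illustrate the numerical issue: computing `log(softmax(x))` in two steps can underflow to `-inf`, while `log_softmax` uses the log-sum-exp trick and stays finite.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([100.0, 0.0, -100.0])

# softmax gives exp(-200) == 0 in float32, so the naive log blows up
naive = torch.log(F.softmax(logits, dim=0))  # tensor([0., -100., -inf])
stable = F.log_softmax(logits, dim=0)        # tensor([0., -100., -200.])
print(naive, stable)
```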

@marintoro
Author

I'm not sure I fully understood what you mean there.
I should run the current master on Space Invaders, but removing which part? (Which commits in particular are involved?)

@Kaixhin
Owner

Kaixhin commented Aug 7, 2018

Reverting d6538df and running Space Invaders would be a good test.
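For anyone following along, a hedged sketch of what the reverted setting amounts to in a generic PyTorch update (the actual commit touched only a few lines; names here are illustrative):

```python
import torch.nn as nn

def update(net, loss, optimiser, clip_grad=False, max_norm=10.0):
    # One optimisation step; clip_grad=False reproduces the reverted setting
    optimiser.zero_grad()
    loss.backward()
    if clip_grad:
        # The suspect step: rescale gradients whose global norm
        # exceeds max_norm before applying the update
        nn.utils.clip_grad_norm_(net.parameters(), max_norm)
    optimiser.step()
```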

@marintoro
Author

marintoro commented Aug 7, 2018

Hum, I manually reverted this commit because I had some conflicts (it was only 5 lines changed).
I just launched the experiment on Space Invaders.

I don't really know how I can share the new branch I just made for this sanity check?

Edited: I just had to upgrade my plotly version from 2.5.1 to 3.1.0 to make it work.

@Kaixhin
Owner

Kaixhin commented Aug 7, 2018

Fork this repo, make a branch with the changes and just point to it in this issue.

@marintoro
Author

marintoro commented Aug 7, 2018

Here is the change I made to revert the gradient clipping.
The experiment is currently ongoing.

https://github.com/marintoro/Rainbow/commits/master

@marintoro
Author

I am currently at 14M steps and it looks really similar to everything else... (yes, 14M is not enough; it's still running). But at 14M on your graph we are already supposed to get rewards around 10000, and here it's still barely 3000...
Here are the current rewards and Q-values:
[plots: q_values_revert_clip_grad_14m, reward_revert_clip_grad_14m]

@Kaixhin
Owner

Kaixhin commented Aug 9, 2018

I would say wait until about 18M just in case, but it doesn't look promising. Of all the hyperparameters I might have changed, I might have used --noisy-std=0.5, but I can't think of anything else. I'm looking at what has changed between 1.0 and master that might be the culprit, but I'm really not sure. There's the change to noise sampling at a8d01b8, but those should be equivalent. There's disabling cuDNN, but if adding that back worked, that would be troubling, because it should just be a slightly nondeterministic version of what's happening now.
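For completeness, a sketch of the determinism-related settings mentioned above (seeding plus disabling cuDNN); the exact flags used in the repo may differ:

```python
import random
import numpy as np
import torch

def set_deterministic(seed):
    # Seed every RNG that can affect training
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Disabling cuDNN trades speed for reproducible convolutions;
    # with it enabled, runs can differ slightly between executions
    torch.backends.cudnn.enabled = False
```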

@marintoro
Author

marintoro commented Aug 10, 2018

GREAT NEWS! It looks way more promising at 20M steps!!! It's the first time I've got a score above 3k (and it even sometimes hits more than 20k!!!)
[plot: reward_revert_clip_grad_22m]

By the way, I really don't think the current version 1.0 works on Space Invaders (cf. the first post of this issue), or at least it doesn't on my computer with my library versions (but since the revert_grad_clipping version works on my computer with the exact same library versions, I think it's just v1.0 that is not working as expected...).

Now I think you should try the current master to see whether it works on Space Invaders, to check if there is really a problem with the grad clipping?
I must admit that now that I have a version working on Space Invaders, I will go back to my multi-agent version (and maybe everything will work just fine now :p).

Edited: It's pretty funny, it really started to play way better 1M steps after my first screenshot ^^

Re-edited: I stopped the training and updated the reward plot (just 2M steps more...); it still looks good (hitting 50k one time!)

@Kaixhin
Owner

Kaixhin commented Aug 10, 2018

Really glad that you waited; I would say that's good enough. Below I have a run from master without gradient clipping, and what I believe is a run from 1.1 stagnating at 3k, which together indicate that what you have now is the correct setting, but I'll need to check that properly.
[plots: master run without gradient clipping; 1.1 run stagnating at 3k]

@Kaixhin
Owner

Kaixhin commented Sep 15, 2018

Closing because an agent trained from master gets crazy high scores.
