This repository has been archived by the owner on Jan 16, 2023. It is now read-only.

Missing learning curve data for Defender Surround #78

Closed
vwxyzjn opened this issue Jul 29, 2022 · 7 comments

Comments

@vwxyzjn

vwxyzjn commented Jul 29, 2022

Hi, thanks for making the learning curves data available! I am using it for my study but noticed the csv file in https://github.com/google-research/seed_rl/blob/master/docs/r2d2_atari_training_curves.md lacks the data for Defender and Surround. Would you mind looking into it?

E.g., the Surround learning curve should come after StarGunner
[screenshot: the learning-curve listing in the doc, where Surround is missing after StarGunner]

Thank you!

@brieyla1

The frames are similar to IMPALA, using V-trace. In my opinion, your best bet is to run it yourself with GPUs; the TPU setup is very hard and is still being discussed.

@vwxyzjn
Author

vwxyzjn commented Sep 26, 2022

Hi @brieyla1, thanks for the suggestion. I was hoping to use the TPU learning curves for a paper, and it would probably be an inconsistent setup if I replicated the experiments on GPUs or used a different algorithm like IMPALA :/

@brieyla1

I have followed the project since its early days but, unfortunately, never got a TPU to run on it.

The TPU path doesn't really work as expected; it was developed against nightly versions that, from what I know, no longer work.

@lespeholt from Google may know a bit more.

I consider this project deprecated as of now.

@lespeholt
Collaborator

@vwxyzjn There shouldn't be a big difference between GPU and TPU, or between IMPALA and batched A2C for that matter, if you stick to a small-scale setup.

We always wanted a nice public TPU version. However, because the full TPU release on Cloud didn't become available in time, the effort died out a bit. And yeah, other priorities took over :-)

@vwxyzjn
Author

vwxyzjn commented Sep 27, 2022

@lespeholt Thank you for the response! On a related note, any chance you have the learning curves raw data for IMPALA? We would like to know the level of human median scores achieved within the first hour.

In the IMPALA paper, it was mentioned that the "shallow IMPALA (Atari) experiment completes training over 200 million frames in less than one hour", but we are wondering 1) how much time the deep IMPALA experiment took and 2) exactly how much time the shallow IMPALA experiment took. The closest paper that has this info is R2D2, but its x-axis has a much larger scale, as shown below:

[screenshot: R2D2 learning curves, whose x-axis spans a much larger wall-clock range]

We are working on distributed RL in a simplified setup that only does rollouts asynchronously, and we would love to compare against prior state-of-the-art work such as yours. Because we are limited by computational resources, we can only run experiments for up to an hour per game per seed, which makes it hard to compare against past work without the raw learning curves.

I would be happy to provide more info / chat more if you are interested. Your work is awesome, and I would love to connect and learn from you :)

@lespeholt
Collaborator

Unfortunately I don't have access to the IMPALA curves.

  1. ". Using 8 learner GPUs instead of
    1 further speeds up training of the deep model by a factor of
    7 to 210K frames/sec, up from 30K frames/sec."
    so it's roughly 200/30 = 6.67x slower with one GPU, and slightly faster with 8 GPUs (so not quite linear scale due to communication between GPUs). SEED and Cloud TPUs scales better.

  2. See table 1. If we assume batch size 32, it is 200000000/200000/3600*60 = 17 minutes. With SEED you can actually get under 1 minute if necessary.
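
In code, that arithmetic is roughly the following; it uses only the frame count and throughputs quoted above, so treat it as a back-of-the-envelope sketch, not a measurement:

```python
# Rough wall-clock estimates from the numbers quoted above:
# 200M Atari frames, ~30K frames/sec (1 learner GPU), ~210K frames/sec (8 GPUs),
# and ~200K frames/sec assumed from Table 1 with batch size 32.

def training_minutes(total_frames, frames_per_sec):
    """Wall-clock training time in minutes at a fixed throughput."""
    return total_frames / frames_per_sec / 60

print(training_minutes(200_000_000, 200_000))  # ~16.7 min -> the "17 minutes" above
print(training_minutes(200_000_000, 30_000))   # ~111 min with a single learner GPU
print(training_minutes(200_000_000, 210_000))  # ~15.9 min with 8 learner GPUs
```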

Best, Lasse

@vwxyzjn
Author

vwxyzjn commented Sep 29, 2022

Thank you, Lasse! I might need to bother you with some additional questions...

Both Table 1 and Figure 6 show the SPS numbers for DMLab-30. Would those numbers be the same for Atari environments? Does DMLab run at the same speed as Atari?

See table 1. If we assume batch size 32, it is 200000000/200000/3600*60 = 17 minutes. With SEED you can actually get under 1 minute if necessary.

Does this mean IMPALA (shallow) obtains a 93.2% median human normalized score (HNS) after 17 minutes of training? Would this imply SEED RL can obtain 93.2% median HNS in under 1 minute of training?

When plotting SEED RL's R2D2 learning curves, I find that reaching 100% median HNS (when omitting Defender and Surround) took about an hour.
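
Concretely, the computation is roughly the sketch below; the file names, column layout, and random/human reference table are placeholders for however the published curves are actually stored, not the real schema:

```python
import pandas as pd

# Placeholder file/column names; the published R2D2 curves and the Atari
# random/human reference scores would need to be loaded in whatever format they ship in.
curves = pd.read_csv("r2d2_atari_training_curves.csv")   # columns: game, wall_time_hours, score
refs = pd.read_csv("atari_random_human_scores.csv")      # columns: game, random, human

# Drop the games whose curves are missing from the published data.
curves = curves[~curves["game"].isin(["Defender", "Surround"])]

# Human-normalized score per game: (score - random) / (human - random).
curves = curves.merge(refs, on="game")
curves["hns"] = (curves["score"] - curves["random"]) / (curves["human"] - curves["random"])

# Median HNS across games as a function of wall-clock time.
median_hns = curves.groupby("wall_time_hours")["hns"].median()
first_crossing = median_hns[median_hns >= 1.0].index.min()
print(f"median HNS reaches 100% at ~{first_crossing} hours")
```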

[screenshot: median HNS vs. wall-clock time computed from the R2D2 learning curves, crossing 100% at roughly one hour]

Here is a direct analysis of Figure 6 in the SEED RL paper.
[screenshot: annotated version of Figure 6 from the SEED RL paper]

It seems that sample efficiency can drop as throughput increases. Setting throughput aside, I am interested in wall-time performance. That is, which published algorithm reaches 100% median HNS in the least amount of training wall-clock time?
