Missing learning curve data for Defender and Surround #78
The frames are similar to IMPALA, using V-trace; in my opinion your best bet is to run it yourself with GPUs. The TPU setup is very hard and still being discussed.
Hi @brieyla1, thanks for the suggestion. I was hoping to use the TPU learning curves for a paper, and it would probably be an inconsistent setup if I replicated the experiments with GPUs or used a different algorithm like IMPALA :/
I have followed the project since its early days but never got a TPU to run on it, unfortunately. The TPU setup doesn't really work as expected; it was developed against nightly versions that no longer work, as far as I know. @lespeholt from Google may know a bit more. I consider this project deprecated as of now.
@vwxyzjn There shouldn't be a big difference between GPU and TPU, or between IMPALA and batched A2C for that matter, if you stick to a small-scale setup. We always wanted a nice public TPU version; however, because the full TPU release on Cloud didn't become available in time, the effort died out a bit. And yeah, other priorities took over :-)
@lespeholt Thank you for the response! On a related note, any chance you have the raw learning curve data for IMPALA? We would like to know the median human-normalized score achieved within the first hour of training. The IMPALA paper mentions that the "shallow IMPALA (Atari) experiment completes training over 200 million frames in less than one hour", but we are wondering 1) how long the deep IMPALA experiment took and 2) exactly how long the shallow IMPALA experiment took. The closest paper that has this info is R2D2, but its x-axis has a much larger scale.

We are working on distributed RL in a simplified setup that only does rollouts asynchronously, and we would love to compare against prior state-of-the-art work such as yours. Because we are limited by computational resources, we can only run experiments for up to an hour per game per seed, which makes it difficult to compare against past work without the raw learning curves. I would be happy to provide more info / chat more if you are interested. Your work is awesome, and I would love to connect and learn from you :)
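For concreteness, this is roughly the computation I mean by the median score within the first hour. It is only a sketch: the column names, the CSV layout, and the per-game random/human reference scores below are placeholders, not the format of any released data.

```python
import numpy as np
import pandas as pd

# Illustrative per-game reference scores; real values come from the
# published random/human baseline tables used for Atari normalization.
REFERENCE = {
    "Breakout": {"random": 1.7, "human": 30.5},  # placeholder entry
    # ... one entry per game
}

def median_hns_at(curves: pd.DataFrame, wall_clock_seconds: float) -> float:
    """Median human-normalized score across games at a wall-clock cutoff.

    `curves` is assumed to have columns: game, wall_clock_seconds, score,
    with one row per evaluation point.
    """
    hns_per_game = []
    for game, df in curves.groupby("game"):
        if game not in REFERENCE:
            continue
        df = df[df["wall_clock_seconds"] <= wall_clock_seconds]
        if df.empty:
            continue
        # Take the latest evaluation before the cutoff.
        score = df.sort_values("wall_clock_seconds")["score"].iloc[-1]
        ref = REFERENCE[game]
        hns = (score - ref["random"]) / (ref["human"] - ref["random"])
        hns_per_game.append(hns)
    return float(np.median(hns_per_game))
```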
Unfortunately I don't have access to the IMPALA curves.
Best, Lasse
Thank you, Lasse! I might need to bother you with some additional questions... Both Table 1 and Figure 6 report SPS numbers for DMLab-30. Would those numbers be the same for Atari environments? Does DMLab run at the same speed as Atari?
Does this mean IMPALA (shallow) obtains a 93.2% median human-normalized score (HNS) after 17 minutes of training? Would this imply SEED RL can obtain 93.2% median HNS under 1 minute of training? When plotting SEED RL's R2D2 learning curves, I find that obtaining 100% median HNS (when omitting …

Here is a direct analysis of Figure 6 in the SEED RL paper. It seems that sample efficiency could get lower when increasing throughput. Setting throughput aside, I am interested in the wall-time performance. That is, which published algorithm reaches 100% median HNS in the fewest wall-clock training hours?
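To make the wall-time comparison concrete, this is the kind of conversion I am doing. It is a rough sketch only: the throughput figure is derived from the quoted "200 million frames in less than one hour", and the learning-curve arrays are hypothetical.

```python
import numpy as np

def score_at_wall_time(frames: np.ndarray,
                       scores: np.ndarray,
                       frames_per_second: float,
                       wall_clock_seconds: float) -> float:
    """Linearly interpolate a learning curve (indexed by environment frames)
    at a wall-clock cutoff, assuming constant training throughput."""
    frames_seen = frames_per_second * wall_clock_seconds
    return float(np.interp(frames_seen, frames, scores))

# "200 million frames in less than one hour" implies a throughput of at
# least 200e6 / 3600 ≈ 55.5k environment frames per second.
fps = 200e6 / 3600

# Hypothetical usage with a per-game curve (frames vs. episode return):
# score_at_wall_time(curve_frames, curve_scores, fps, 17 * 60)
```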
Hi, thanks for making the learning curves data available! I am using it for my study but noticed the csv file in https://github.com/google-research/seed_rl/blob/master/docs/r2d2_atari_training_curves.md lacks the data for `Defender` and `Surround`. Would you mind looking into it? E.g., the `Surround` learning curve should come after `StarGunner`.

Thank you!
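A minimal sketch of how such a check might look; the csv file name and the `game` column are assumptions about how one might load the curves, not the actual layout of the published data.

```python
import pandas as pd

# Hypothetical file name and column; adjust to however the curves are exported.
curves = pd.read_csv("r2d2_atari_training_curves.csv")

ATARI_57 = {
    "Alien", "Amidar", "Assault",
    # ... the remaining Atari-57 game names, including:
    "Defender", "StarGunner", "Surround", "Zaxxon",
}

missing = sorted(ATARI_57 - set(curves["game"].unique()))
print(missing)  # with the current data this should report Defender and Surround
```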