theoretical question #5
Comments
Also, I ran the DAAC model on Atari games, and it didn't do as well as PPO. Could this just be a hyperparameter problem, or something a bit deeper?
DAAC is designed to fix a problem that arises in procedurally generated environments, where you train on a relatively small number of instances (compared to the test distribution) and want to generalize to new ones (in particular, when learning from high-dimensional inputs like images, and when the different instances have different episode lengths, although this is not strictly necessary). This isn't a problem when you train and test on the same environment, as is the case in Atari, which has no notion of generalization or of training on different task instances (each Atari game represents a single environment). It also wouldn't be as much of a problem if you trained on all the environments you want to test on, even if they are procedurally generated. The problem we describe in the paper, which leads to overfitting when training on a subset of the environments and generalizing to new ones from the same distribution, remains irrespective of how large the network is.

These two components are orthogonal, so I doubt that doubling the PPO network would get the same performance, although it might do better than the original PPO. But then your baseline changes, so I would expect doubling the DAAC networks to help as well. In any case, the main idea of our work is to decouple the training of the policy and value networks (which particularly helps generalization, but also sample efficiency), rather than to simply increase capacity.
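The decoupling point above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's actual DAAC implementation: the policy and value function keep entirely separate parameter sets, so value-regression gradients never touch the policy's representation. The linear "networks", shapes, and learning rates here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, batch = 8, 4, 32

theta = rng.normal(scale=0.01, size=(obs_dim, n_actions))  # policy parameters
phi = rng.normal(scale=0.01, size=(obs_dim, 1))            # value parameters

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

obs = rng.normal(size=(batch, obs_dim))
actions = rng.integers(n_actions, size=batch)
returns = rng.normal(size=(batch, 1))

# Value step: plain regression toward the returns; only phi changes.
values = obs @ phi
phi -= 1e-2 * obs.T @ (values - returns) / batch

# Policy step: advantage-weighted log-prob gradient; only theta changes,
# so value-fitting noise cannot distort the policy features.
advantages = returns - (obs @ phi)  # value estimate used as a fixed baseline
probs = softmax(obs @ theta)
onehot = np.eye(n_actions)[actions]
theta -= 1e-2 * obs.T @ ((probs - onehot) * advantages) / batch
```

With shared parameters (a single trunk feeding both heads, as in standard PPO), both losses would update the same weights, which is exactly the interference the decoupling avoids.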
I see, thank you for the explanation. However, I was under the impression that it would at least match PPO's performance on a game like Breakout. What could be a possible explanation for this? Does having a bigger CNN architecture actually hurt the learning process?
Did you do a hyperparameter (HP) search? The optimal HPs for Atari might differ from those for Procgen, and it's typical to tune HPs for each benchmark. I don't think the bigger CNN is the main problem. However, it could be that gradients from the value provide more signal than those from the advantage (which are even noisier, since they depend on the action and not only the state). Thus, it is possible that PPO does better on such singleton environments, although I would expect only a small difference from PPO if properly tuned.
No, I have not; I thought the default values were a good starting point for Breakout. Also, about frame stacking: grayscale observations are [num_process, framestack, 84, 84]. How would I handle RGB frame stacking, i.e. [num_process, framestack, 3, 84, 84]? My guess is to stack the frames along the channel axis so the shape becomes [num_process, framestack*3, 84, 84].
Yes that sounds right. |
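For concreteness, the reshape discussed above can be done in one call. The sizes below (16 parallel envs, a stack of 4 RGB frames) are illustrative assumptions, not values from the repository:

```python
import numpy as np

# Hypothetical sizes: 16 parallel envs, 4 stacked 84x84 RGB frames.
num_process, framestack = 16, 4
rgb_stack = np.zeros((num_process, framestack, 3, 84, 84), dtype=np.uint8)

# Merge the frame and color axes into a single channel axis so the
# tensor matches the [num_process, channels, 84, 84] layout a CNN expects.
flat = rgb_stack.reshape(num_process, framestack * 3, 84, 84)
print(flat.shape)  # (16, 12, 84, 84)
```

Because the frame and color axes are adjacent, a plain reshape preserves the per-frame channel grouping; no transpose is needed.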
Do you think the improvement observed here has more to do with the fact that DAAC is essentially two IMPALA networks compared to one, i.e. that bigger networks simply tend to do better?
thank you