Cartpole colab shows DQN outperforming C51? #148
Comments
the hyperparameters we are using for cartpole (and acrobot) were not tuned
very extensively. we played with them to get something that's reasonably
stable, as the intent was to provide a simple example that can train
quickly, rather than aiming for SOTA.
On Tue, Aug 4, 2020 at 12:16 AM Rylan Schaeffer wrote:
I would've expected C51 to outperform DQN (at least initially, if not
asymptotically) but when I looked at the provided colab notebook, C51 seems
to be beaten by DQN most of the time:
[image: plot from the colab comparing DQN and C51 returns] <https://user-images.githubusercontent.com/8942987/89252334-45d83d00-d5ce-11ea-9547-8edb3a9d9c35.png>
I ran the notebook myself to get my own results, which largely agreed:
[image: re-run of the same comparison] <https://user-images.githubusercontent.com/8942987/89252294-2ccf8c00-d5ce-11ea-952f-72e473099524.png>
I suppose there are two questions:
1. Why is DQN so unstable?
2. Why does DQN outperform C51?
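For context on the second question: the one algorithmic difference between the two agents is that C51 learns a categorical distribution over returns on a fixed support instead of a scalar Q-value, with the extra step of projecting the Bellman target back onto that support (Bellemare et al., 2017). Below is a minimal numpy sketch of that projection step; the support bounds and atom count follow the C51 paper's Atari defaults, and the function/variable names are illustrative, not Dopamine's actual code.

```python
import numpy as np

def project_distribution(reward, probs, gamma, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the Bellman target r + gamma * z onto the fixed support (C51-style)."""
    z = np.linspace(v_min, v_max, n_atoms)       # support atoms z_0 .. z_{N-1}
    dz = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(reward + gamma * z, v_min, v_max)
    b = (tz - v_min) / dz                        # fractional atom index of each target
    l, u = np.floor(b).astype(int), np.ceil(b).astype(int)
    lower_w = probs * (u - b)                    # mass sent to the lower neighboring atom
    upper_w = probs * (b - l)                    # mass sent to the upper neighboring atom
    exact = (l == u)                             # target landed exactly on an atom
    lower_w[exact] = probs[exact]                # keep that mass instead of dropping it
    m = np.zeros(n_atoms)
    np.add.at(m, l, lower_w)
    np.add.at(m, u, upper_w)
    return m

probs = np.full(51, 1.0 / 51)                    # uniform next-state distribution
m = project_distribution(reward=0.0, probs=probs, gamma=0.0)
# with zero reward and zero gamma, all mass collapses onto the atom at z = 0
```

The `exact` branch matters: without it, targets that land exactly on an atom would contribute zero mass to both neighbors and the distribution would stop summing to one.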
Ok, thank you for clarifying! In that case, can I ask for your insight into what fraction of hyperparameter settings C51 clearly outperforms DQN under, and vice versa? Relatedly, when reading a paper like your paper with Lyle and Bellemare, how reliable are the results in Section 5.2? If distributional RL only outperforms classical RL under a very small subset of hyperparameters, how can a reader discern whether the result is genuine or whether the paper simply didn't test a sufficient number/range of hyperparameters?
A statement like "We used the same hyperparameters for all algorithms, except for step sizes, where we chose the step size that gave the best performance for each algorithm." now seems a bit more concerning to me.
for this paper clare did run hyperparameter sweeps for dqn and c51, but
these were run on custom code (dopamine had not yet been launched and it
was atari-only at the time), so she was not using the configs that have
been released with dopamine.
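The kind of sweep described above can be sketched as a grid over step sizes with averaging over seeds; `evaluate` below is a hypothetical stand-in for an actual training run (the real sweeps used custom pre-Dopamine code), and all numbers are toy surrogates so the sketch runs end-to-end.

```python
import itertools
import statistics

def evaluate(agent, learning_rate, seed):
    """Hypothetical stand-in for a full training run; returns a mean episode return.
    A real sweep would launch training for this (agent, lr, seed) triple."""
    base = {"dqn": 150.0, "c51": 140.0}[agent]
    return base + 1000 * learning_rate - 5 * (seed % 3)

agents = ["dqn", "c51"]
learning_rates = [1e-4, 5e-4, 1e-3]
seeds = range(5)

# average over seeds for every (agent, learning-rate) cell of the grid
results = {
    (agent, lr): statistics.mean(evaluate(agent, lr, s) for s in seeds)
    for agent, lr in itertools.product(agents, learning_rates)
}

# pick the best step size per agent, mirroring the per-algorithm
# step-size selection quoted from the paper
best_lr = {a: max(learning_rates, key=lambda lr: results[(a, lr)]) for a in agents}
```

Comparing agents at their per-agent best step size (rather than one shared value) is exactly the protocol the quoted statement describes.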
our intent with the hyperparameter choices for dopamine was to try, as much
as possible, to provide an apples-to-apples comparison of the different
algorithms, not to provide SOTA configs for each. however, we do provide
the configs that match the hyperparameter choices used in the original
papers that introduced those algorithms.
the idea was to provide reasonable baselines for each of these algorithms
so that researchers could use that as a starting point to develop new
algorithms/ideas (SOTA or not).
I think my main question now is how a reader can discern whether a published result is a genuine effect or whether the paper simply didn't test a sufficient number/range of hyperparameters.
ah, _that_ general question :)
it's an important question, and i don't think there's a single right
answer, but things like the reproducibility challenge and the
reproducibility checklist do help in making sure that the presented results
are not just cherry-picked, but actually show the merits (and shortcomings)
of a new algorithm in a way that carries over to other settings.
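One concrete thing a reader can do along these lines is ask for uncertainty over independent seeds rather than a single curve. As a hedged sketch (the seed scores below are made up for illustration, and the helper name is mine), a percentile-bootstrap confidence interval on the difference of mean returns between two agents:

```python
import numpy as np

def bootstrap_mean_diff_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(scores_a) - mean(scores_b),
    where each score is one independent training run (one seed)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    # resample seeds with replacement and record the difference of means
    diffs = [
        rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
        for _ in range(n_boot)
    ]
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# made-up per-seed returns for two agents on one environment
lo, hi = bootstrap_mean_diff_ci([200, 210, 190, 205, 195], [100, 95, 105, 110, 90])
```

An interval that excludes zero, and does so across a range of hyperparameters, is harder to explain away as one lucky configuration.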
Hi @psc-g, sorry to bother you again, but I'm hoping you can help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense). Can you point me in the right direction?
I was hoping this C51 + cartpole tutorial would be such an example. I'm just looking for an example I can play with where distributional agents learn faster and asymptote to a higher return per episode.
@psc-g sorry to bother you again, but can you help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense)?
Single architecture, single environment, whatever it takes |
hi rylan,
i don't know off the top of my head, and i haven't run these types of
experiments, so i don't have a good suggestion for a simple environment
that exhibits these characteristics.
one thing you could try is other gym environments (such as lunar lander).
i'm not sure a priori whether distributional will outperform expectational,
though.
another option is running some atari games for fewer frames (instead of
the regular 200 million).
i guess it depends on what your intended use case is...
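If runs like that come back mixed across a handful of games, a per-game win count over matched seeds is one simple way to summarize the comparison. A minimal sketch, with made-up placeholder returns rather than real runs:

```python
# per-game returns over three matched seeds (made-up numbers for illustration)
dqn = {"Pong": [18.0, 17.5, 18.2], "Breakout": [120.0, 90.0, 110.0], "Qbert": [900.0, 950.0, 870.0]}
c51 = {"Pong": [19.0, 18.5, 18.8], "Breakout": [100.0, 95.0, 105.0], "Qbert": [1100.0, 1050.0, 990.0]}

def mean(xs):
    return sum(xs) / len(xs)

# count the games where c51's mean return beats dqn's
wins = sum(mean(c51[g]) > mean(dqn[g]) for g in dqn)
print(f"c51 wins {wins}/{len(dqn)} games")
```

With only a few seeds per game this is a coarse signal, which is part of why "mixed results" on six Atari games is hard to turn into a clean conclusion either way.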
hi rylan, any luck with this?
None. I ran a few environments (Asterix, Breakout, Pong, Qbert, Seaquest, SpaceInvaders) and found mixed results. I had to abandon the project :( |