Cartpole colab shows DQN outperforming C51? #148
Comments
the hyperparameters we are using for cartpole (and acrobot) were not tuned
very extensively. we played with them to get something that's reasonably
stable, as the intent was to provide a simple example that can train
quickly, rather than aiming for SOTA.
On Tue, Aug 4, 2020 at 12:16 AM Rylan Schaeffer wrote:
I would've expected C51 to outperform DQN (at least initially, if not
asymptotically) but when I looked at the provided colab notebook, C51 seems
to be beaten by DQN most of the time:
[image: plot from the colab comparing DQN and C51 returns] <https://user-images.githubusercontent.com/8942987/89252334-45d83d00-d5ce-11ea-9547-8edb3a9d9c35.png>
I ran the notebook myself to get my own results, which largely agreed:
[image: re-run of the same comparison] <https://user-images.githubusercontent.com/8942987/89252294-2ccf8c00-d5ce-11ea-952f-72e473099524.png>
I suppose there are two questions:
1. Why is DQN so unstable?
2. Why does DQN outperform C51?
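For context on the second question: the one algorithmic difference between the two agents is that C51 learns a categorical distribution over returns on a fixed support instead of a scalar Q-value, with the extra step of projecting the Bellman target back onto that support (Bellemare et al., 2017). Below is a minimal numpy sketch of that projection step; the support bounds and atom count follow the C51 paper's Atari defaults, and the function/variable names are illustrative, not Dopamine's actual code.

```python
import numpy as np

def project_distribution(reward, probs, gamma, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the Bellman target r + gamma * z onto the fixed support (C51-style)."""
    z = np.linspace(v_min, v_max, n_atoms)       # support atoms z_0 .. z_{N-1}
    dz = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(reward + gamma * z, v_min, v_max)
    b = (tz - v_min) / dz                        # fractional atom index of each target
    l, u = np.floor(b).astype(int), np.ceil(b).astype(int)
    lower_w = probs * (u - b)                    # mass sent to the lower neighboring atom
    upper_w = probs * (b - l)                    # mass sent to the upper neighboring atom
    exact = (l == u)                             # target landed exactly on an atom
    lower_w[exact] = probs[exact]                # keep that mass instead of dropping it
    m = np.zeros(n_atoms)
    np.add.at(m, l, lower_w)
    np.add.at(m, u, upper_w)
    return m

probs = np.full(51, 1.0 / 51)                    # uniform next-state distribution
m = project_distribution(reward=0.0, probs=probs, gamma=0.0)
# with zero reward and zero gamma, all mass collapses onto the atom at z = 0
```

The `exact` branch matters: without it, targets that land exactly on an atom would contribute zero mass to both neighbors and the distribution would stop summing to one.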
Ok, thank you for clarifying! In that case, can I ask for your insight into what fraction of hyperparameter settings C51 clearly outperforms DQN under, and vice versa? Relatedly, when reading a paper like your paper with Lyle and Bellemare, how reliable are the results in Section 5.2? If distributional RL only outperforms classical RL under a very small subset of hyperparameters, how can a reader discern whether the result is genuine or whether the paper simply didn't test a sufficient number/range of hyperparameters?
A statement like "We used the same hyperparameters for all algorithms, except for step sizes, where we chose the step size that gave the best performance for each algorithm." now seems a bit more concerning to me.
for this paper clare did run hyperparameter sweeps for dqn and c51, but
these were run on custom code (dopamine had not yet been launched and it
was atari-only at the time), so she was not using the configs that have
been released with dopamine.
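The kind of sweep described above can be sketched as a grid over step sizes with averaging over seeds; `evaluate` below is a hypothetical stand-in for an actual training run (the real sweeps used custom pre-Dopamine code), and all numbers are toy surrogates so the sketch runs end-to-end.

```python
import itertools
import statistics

def evaluate(agent, learning_rate, seed):
    """Hypothetical stand-in for a full training run; returns a mean episode return.
    A real sweep would launch training for this (agent, lr, seed) triple."""
    base = {"dqn": 150.0, "c51": 140.0}[agent]
    return base + 1000 * learning_rate - 5 * (seed % 3)

agents = ["dqn", "c51"]
learning_rates = [1e-4, 5e-4, 1e-3]
seeds = range(5)

# average over seeds for every (agent, learning-rate) cell of the grid
results = {
    (agent, lr): statistics.mean(evaluate(agent, lr, s) for s in seeds)
    for agent, lr in itertools.product(agents, learning_rates)
}

# pick the best step size per agent, mirroring the per-algorithm
# step-size selection quoted from the paper
best_lr = {a: max(learning_rates, key=lambda lr: results[(a, lr)]) for a in agents}
```

Comparing agents at their per-agent best step size (rather than one shared value) is exactly the protocol the quoted statement describes.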
our intent with the hyperparameter choices for dopamine was to try, as much
as possible, to provide an apples-to-apples comparison of the different
algorithms, not to provide SOTA configs for each. however, we do provide
the configs that match the hyperparameter choices used in the original
papers that introduced those algorithms.
the idea was to provide reasonable baselines for each of these algorithms
so that researchers could use that as a starting point to develop new
algorithms/ideas (SOTA or not).
I think my main question now is how a reader can discern whether a published result is a genuine effect or whether the paper simply didn't test a sufficient number/range of hyperparameters.
ah, _that_ general question :)
it's an important question, and i don't think there's a single right
answer, but things like the reproducibility challenge and the
reproducibility checklist do help in making sure that the presented results
are not just cherry-picked, but actually show the merits (and shortcomings)
of a new algorithm in a way that carries over to other settings.
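One concrete thing a reader can do along these lines is ask for uncertainty over independent seeds rather than a single curve. As a hedged sketch (the seed scores below are made up for illustration, and the helper name is mine), a percentile-bootstrap confidence interval on the difference of mean returns between two agents:

```python
import numpy as np

def bootstrap_mean_diff_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(scores_a) - mean(scores_b),
    where each score is one independent training run (one seed)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    # resample seeds with replacement and record the difference of means
    diffs = [
        rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
        for _ in range(n_boot)
    ]
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# made-up per-seed returns for two agents on one environment
lo, hi = bootstrap_mean_diff_ci([200, 210, 190, 205, 195], [100, 95, 105, 110, 90])
```

An interval that excludes zero, and does so across a range of hyperparameters, is harder to explain away as one lucky configuration.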
Hi @psc-g, sorry to bother you again, but I'm hoping you can help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense). Can you point me in the right direction?
I was hoping this C51 + cartpole tutorial would be such an example. I'm just looking for an example I can play with where distributional agents learn faster and asymptote to a higher return per episode.
@psc-g sorry to bother you again, but can you help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense)?
Single architecture, single environment, whatever it takes |
hi rylan,
i don't know off the top of my head, and i haven't run these types of
experiments, so i don't have a good suggestion for a simple environment
that exhibits these characteristics.
one thing you could try is other gym environments (such as lunar lander).
i'm not sure a priori whether distributional will outperform expectational,
though.
another option is running some atari games for fewer frames (instead of
the regular 200 million).
i guess it depends on what your intended use case is...
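If runs like that come back mixed across a handful of games, a per-game win count over matched seeds is one simple way to summarize the comparison. A minimal sketch, with made-up placeholder returns rather than real runs:

```python
# per-game returns over three matched seeds (made-up numbers for illustration)
dqn = {"Pong": [18.0, 17.5, 18.2], "Breakout": [120.0, 90.0, 110.0], "Qbert": [900.0, 950.0, 870.0]}
c51 = {"Pong": [19.0, 18.5, 18.8], "Breakout": [100.0, 95.0, 105.0], "Qbert": [1100.0, 1050.0, 990.0]}

def mean(xs):
    return sum(xs) / len(xs)

# count the games where c51's mean return beats dqn's
wins = sum(mean(c51[g]) > mean(dqn[g]) for g in dqn)
print(f"c51 wins {wins}/{len(dqn)} games")
```

With only a few seeds per game this is a coarse signal, which is part of why "mixed results" on six Atari games is hard to turn into a clean conclusion either way.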
hi rylan, any luck with this?
None. I ran a few environments (Asterix, Breakout, Pong, Qbert, Seaquest, SpaceInvaders) and found mixed results. I had to abandon the project :( |