
Cartpole colab shows DQN outperforming C51? #148

Closed
RylanSchaeffer opened this issue Aug 4, 2020 · 14 comments

Comments

@RylanSchaeffer commented Aug 4, 2020

I would've expected C51 to outperform DQN (at least initially, if not asymptotically), but when I looked at the provided colab notebook, C51 seems to be beaten by DQN quite frequently:

[screenshot: training-return plot from the provided colab notebook]

I ran the notebook myself to get my own results, which largely agreed:

[screenshot: training-return plot from re-running the notebook]

I suppose there are two questions:

  1. Why is DQN so unstable?

  2. Why does DQN outperform C51?
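(For reference, a side-by-side run like the one above can be set up with Dopamine's experiment runner. This is only a minimal sketch, not the notebook's exact code; the gin config paths are assumptions and may differ across Dopamine versions.)

```python
# Minimal sketch: train DQN and C51 on CartPole with Dopamine's runner and
# compare the logged returns. Config paths below are assumptions.
import gin
from dopamine.discrete_domains import run_experiment

AGENT_CONFIGS = {
    'dqn': 'dopamine/agents/dqn/configs/dqn_cartpole.gin',      # assumed path
    'c51': 'dopamine/agents/rainbow/configs/c51_cartpole.gin',  # assumed path
}

for agent_name, gin_file in AGENT_CONFIGS.items():
    gin.clear_config()  # reset bindings before loading the next agent's config
    run_experiment.load_gin_configs([gin_file], [])
    runner = run_experiment.create_runner(f'/tmp/colab_dope_run/{agent_name}')
    runner.run_experiment()  # per-iteration returns are logged under the base dir
```

The returns logged under each base directory can then be read and plotted, e.g. with the helpers in `dopamine.colab.utils`.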

@psc-g (Collaborator) commented Aug 4, 2020 via email

@RylanSchaeffer (Author)

Ok, thank you for clarifying! In that case, can I ask for your insight into the fraction of hyperparameter settings for which C51 clearly outperforms DQN, and vice versa?

Relatedly, when reading a paper like yours with Lyle and Bellemare, how reliable are the results in Section 5.2? If distributional RL only outperforms classical RL under a very small subset of hyperparameters, how can a reader tell whether the result is a genuine effect or whether the experiments simply didn't test a sufficient number/range of hyperparameters?

@RylanSchaeffer (Author)

A statement like "We used the same hyperparameters for all algorithms, except for step sizes, where we chose the step size that gave the best performance for each algorithm." now seems a bit more concerning to me.
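(For concreteness, "chose the step size that gave the best performance for each algorithm" amounts to a per-algorithm sweep like the sketch below. The gin binding target is an assumption that depends on which optimizer the agent's config uses, and `score_fn` is a caller-supplied stand-in for however performance is read from the logs; this is not the paper's actual procedure.)

```python
# Sketch: pick the best step size for one agent by sweeping gin-bound learning rates.
import gin
from dopamine.discrete_domains import run_experiment

def best_step_size(gin_file, score_fn, learning_rates=(1e-4, 3e-4, 1e-3)):
    """Return the learning rate whose finished run scores highest under score_fn.

    score_fn(base_dir) is caller-supplied, e.g. it might read the final
    evaluation return from the logs written under base_dir.
    """
    scores = {}
    for lr in learning_rates:
        base_dir = f'/tmp/sweep/lr_{lr}'
        gin.clear_config()  # drop bindings from the previous run
        bindings = [f'tf.train.AdamOptimizer.learning_rate = {lr}']  # assumed binding
        run_experiment.load_gin_configs([gin_file], bindings)
        runner = run_experiment.create_runner(base_dir)
        runner.run_experiment()
        scores[lr] = score_fn(base_dir)
    return max(scores, key=scores.get)
```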

@psc-g (Collaborator) commented Aug 4, 2020 via email

@psc-g (Collaborator) commented Aug 4, 2020 via email

@RylanSchaeffer (Author) commented Aug 4, 2020

I think my main question now is how a reader can discern whether a published result is a genuine effect or whether the experiments simply didn't test a sufficient number/range of hyperparameters.

@psc-g (Collaborator) commented Aug 4, 2020 via email

@RylanSchaeffer (Author)

Hi @psc-g, sorry to bother you again, but I'm hoping you can help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense). Can you point me in the right direction?

@RylanSchaeffer (Author) commented Aug 7, 2020

I was hoping this C51 + cartpole tutorial would be such an example. I'm just looking for an example I can play with where distributional agents learn faster and asymptote to a higher return per episode.

@RylanSchaeffer (Author)

@psc-g sorry to bother you again, but can you help me find a very simple example to play with in which distributional RL is clearly better (in an apples-to-apples comparison sense)?

@RylanSchaeffer (Author)

Single architecture, single environment, whatever it takes.

@psc-g (Collaborator) commented Sep 2, 2020 via email

@psc-g (Collaborator) commented Apr 9, 2021

hi rylan, any luck with this?

@RylanSchaeffer (Author) commented May 9, 2021

None. I ran a few environments (Asterix, Breakout, Pong, Qbert, Seaquest, SpaceInvaders) and found mixed results. I had to abandon the project :(
