[BUG] DDPG worse performance than stable baselines #1181
Hi @smorad, I made a working example out of your script, which I'm hosting on a separate branch. You can see my edits here: This is the file: I tested it for 10K frames and it seems to be working fine. The major issue is that in your script you're not building target parameters. I see two problems for torchrl here:
Other small issues:
Let me know if you have questions or remarks! Happy to address anything that I missed.
Fantastic, thanks for the quick response!
The missing target network (oops) and
DDPG should default to having a target net, and we should not be able to create SoftUpdate without one, so this one's defo on me :)
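For reference, here is a minimal sketch of how target parameters are typically wired in torchrl, using delayed (target) networks in `DDPGLoss` together with a `SoftUpdate` updater. `actor` and `value_net` are stand-ins for the policy and critic modules from the original script, and exact argument names can vary between torchrl versions:

```python
from torchrl.objectives import DDPGLoss, SoftUpdate

# `actor` and `value_net` are assumed to be the TensorDictModule-wrapped
# policy and critic from the user's script (placeholders, not the originals).
loss_module = DDPGLoss(
    actor_network=actor,
    value_network=value_net,
    delay_actor=True,   # build target parameters for the actor
    delay_value=True,   # build target parameters for the critic
)

# SoftUpdate performs the Polyak averaging of the target parameters.
target_updater = SoftUpdate(loss_module, eps=0.995)  # eps is roughly 1 - tau

# Inside the training loop, after each optimizer step:
# loss_vals = loss_module(batch)
# loss = loss_vals["loss_actor"] + loss_vals["loss_value"]
# loss.backward(); optimizer.step(); optimizer.zero_grad()
# target_updater.step()  # keep the target parameters in sync
```

Without target parameters (or without calling `target_updater.step()`), the value targets chase a moving network and the critic loss tends to blow up, which is consistent with the behavior reported below.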
Describe the bug
The Stable-Baselines3 (SB3) version of DDPG seems to significantly outperform the torchrl DDPG implementation. I implemented the torchrl version following @matteobettini's VMAS MADDPG script very closely, so it is possible that I made mistakes.
To Reproduce
SB3 version
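The original SB3 script is not reproduced here; a minimal stand-in looks something like the following (the environment and hyperparameters are placeholders, not necessarily those used in this report):

```python
from stable_baselines3 import DDPG

# Passing an env id lets SB3 build the environment internally.
# "Pendulum-v1" and the learning rate are assumptions for illustration only.
model = DDPG("MlpPolicy", "Pendulum-v1", learning_rate=1e-3, verbose=1)
model.learn(total_timesteps=10_000)
```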
Here are the results from the last few SB3 epochs
Here is my torchrl implementation:
And here are the torchrl results
Expected behavior
I'd expect the SB3 and torchrl implementations to be roughly equivalent in terms of reward and value loss. It appears that the torchrl version isn't really learning and the critic loss is much worse.
System info
Describe the characteristics of your environment:
From source (master branch, 22 May)
Checklist