[Feature] Discrete SAC #882
Conversation
LGTM
Before landing:
- can we move the loss to sac.py? I'd rather have them all in the same place if that makes sense?
- Can we add the loss to the doc?
- Is this supposed to work with gSDE? gSDE is not tailored for discrete action spaces AFAICT
Do you mean the discrete and continuous SAC losses in one objective class, or just both losses in the same file? I'll add it to the doc and also take off the gSDE :)
I was thinking of having them in the same file. If having them in the same class does not create a monster class, I'm happy to consider it. Will it work with v1 and v2?
For now, I just added it to the same file. I also took the gSDE off the loss and updated the description of the actor_network. How can I update the docs?
LGTM
LGTM -- let's try to merge this :)
# Conflicts:
#	docs/source/reference/objectives.rst
torchrl/objectives/sac.py (outdated diff):

```python
if target_entropy == "auto":
    target_entropy = -float(
        np.log(1.0 / action_spec["action"].shape[0]) * target_entropy_weight
    )
```
Careful here: the `[0]` can be the batch size. Maybe use the last dimension? Or, since it's discrete, we could check whether it's a one-hot or a categorical encoding and retrieve the number of options directly from the spec?
I think it's the only place where we use the action_spec.
Maybe we could just pass the number of possible actions rather than the spec?
Fixed that and adapted the example script to recent TorchRL changes.
However, some tests now fail; I'm on it and hope to resolve them quickly!
Fixed the example script issues and updated the objective tests as well, since they were raising several errors.
Hopefully ready to merge now! :)
Description
Adding a discrete SAC example
Motivation and Context
The current SAC implementation only supports continuous action spaces. This PR adds the option to run a discrete SAC example based on the paper.
Convergence proof tested on CartPole-v1 (wandb)
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!