[Feature] Discrete SAC #882

BY571 · 2023-01-30T15:52:11Z

Description

Adding a discrete SAC example

Motivation and Context

The current SAC implementation only supports continuous action spaces. This PR adds the option to run a discrete SAC example based on the paper.

Convergence tested on CartPole-v1 (results logged to wandb)

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

vmoens

LGTM
Before landing:

Can we move the loss to sac.py? I'd rather have them all in the same place, if that makes sense.
Can we add the loss to the doc?
Is this supposed to work with gSDE? gSDE is not tailored for discrete action spaces AFAICT

torchrl/objectives/discrete_sac.py

BY571 · 2023-02-07T13:01:44Z

> LGTM
> Before landing:
>
> can we move the loss to sac.py? I'd rather have them all in the same place if that makes sense?
> Can we add the loss to the doc?
> Is this supposed to work with gSDE? gSDE is not tailored for discrete action spaces AFAICT

Do you mean having the discrete and continuous SAC losses in one objective class, or just having both losses in the same file?
I'd prefer to have them in the same class. What do you think? I'll have a look at it in the coming days.

I'll add it to the doc and also remove the gSDE option :)

vmoens · 2023-02-07T15:08:55Z

> I'd prefer to have them in the same class

How much control flow would that entail?
Does it save a lot of code?

I was thinking of having them in the same file. If having them in the same class does not create a monster class I'm happy to consider it.

Will it work with v1 and v2?

BY571 · 2023-02-09T16:19:57Z

> I was thinking of having them in the same file. If having them in the same class does not create a monster class I'm happy to consider it.
>
> Will it work with v1 and v2?

For now, I just added it to the sac.py file in objectives. Since it only works with v2, merging both into one class might get messy and, as you said, would probably create a monster class. Let me know what you think.

I also took off the gSDE from the loss and updated the description of the actor_network to be a TensorDictModule.

How can I update the docs?

vmoens

LGTM

vmoens

LGTM -- let's try to merge this :)

vmoens · 2023-03-17T10:49:28Z

torchrl/objectives/sac.py

+
+        if target_entropy == "auto":
+            target_entropy = -float(
+                np.log(1.0 / action_spec["action"].shape[0]) * target_entropy_weight


Careful here: the [0] can be the batch size.
Maybe use the last dimension? Or, since it's discrete, we can check whether it's a one-hot or a discrete encoding and directly retrieve the number of options from the spec?

I think it's the only place where we use action_spec
Maybe we could just pass the number of possible actions rather than passing the spec?
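The concern above can be illustrated with a small sketch. It assumes a one-hot action encoding whose last dimension holds the number of actions, and it uses a hypothetical helper, `discrete_target_entropy` (not part of TorchRL), that takes the number of actions directly, as suggested. The shapes and the weight value 0.98 are illustrative assumptions.

```python
import math


def discrete_target_entropy(num_actions: int, target_entropy_weight: float) -> float:
    """Hypothetical helper: scaled maximum entropy of a discrete policy.

    A uniform policy over `num_actions` actions has entropy log(num_actions),
    so -log(1 / num_actions) * weight equals weight * log(num_actions).
    """
    if num_actions < 1:
        raise ValueError("num_actions must be a positive integer")
    return -float(math.log(1.0 / num_actions) * target_entropy_weight)


# Illustrative shapes (assumptions): a one-hot action spec for 4 actions,
# once unbatched and once with a leading batch dimension of 32.
unbatched_shape = (4,)
batched_shape = (32, 4)

# shape[0] is only correct for the unbatched spec; shape[-1] works for both.
wrong_num_actions = batched_shape[0]   # 32: the batch size, not the action count
right_num_actions = batched_shape[-1]  # 4

target_wrong = discrete_target_entropy(wrong_num_actions, 0.98)
target_right = discrete_target_entropy(right_num_actions, 0.98)
print(f"from shape[0]:  {target_wrong:.4f}")
print(f"from shape[-1]: {target_right:.4f}")
```

Passing the number of actions explicitly sidesteps the question of which dimension of the spec holds the action count. For an integer (non-one-hot) encoding, the count would have to come from the spec's metadata rather than from its shape.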

BY571 · 2023-03-17

Fixed that and adapted the example script to recent TorchRL changes.
Some tests now fail, but I'm on it and hope to resolve them quickly!

BY571 · 2023-03-20

Fixed the example script issues and updated the objective tests as well, since they were raising several errors.
Hopefully ready to merge now! :)

BY571 added 3 commits January 25, 2023 18:32

add discrete sac example

0044d38

fix objectives

8d18fa7

add discrete sac tests

67f9fc0

facebook-github-bot added the CLA Signed label Jan 30, 2023

BY571 and others added 2 commits January 31, 2023 10:40

Merge branch 'main' into discrete_sac

dc94676

Merge branch 'pytorch:main' into discrete_sac

df2cbce

BY571 marked this pull request as ready for review January 31, 2023 10:18

Merge branch 'pytorch:main' into discrete_sac

116c6cc

vmoens added the enhancement and new algo labels Feb 6, 2023

Merge branch 'main' into discrete_sac

98af201

vmoens reviewed Feb 6, 2023

torchrl/objectives/discrete_sac.py (2 review comments, outdated and resolved)

BY571 added 2 commits February 9, 2023 17:02

move discrete sac loss to sac loss

75c35bb

Merge branch 'discrete_sac' of https://github.com/BY571/rl into discrete_sac

b31eeec

add DiscreteSACLoss and TD3Loss to docs

b5e33f7

vmoens approved these changes Feb 9, 2023

vmoens and others added 5 commits February 9, 2023 21:21

Merge branch 'main' into discrete_sac

1a47230

Merge branch 'pytorch:main' into discrete_sac

cf4e13b

Merge branch 'pytorch:main' into discrete_sac

738c41d

Merge branch 'pytorch:main' into discrete_sac

4caeb52

Merge branch 'main' into discrete_sac

bb4d5a1

vmoens approved these changes Mar 17, 2023

Merge remote-tracking branch 'BY571/discrete_sac' into discrete_sac

030f1ae

# Conflicts: # docs/source/reference/objectives.rst

vmoens reviewed Mar 17, 2023

BY571 and others added 3 commits March 20, 2023 13:32

fix target entropy calc and num_action input, update objectives tst and example script

fcf6f86

Merge branch 'pytorch:main' into discrete_sac

3dc85c7

Merge branch 'main' into discrete_sac

fcaf375

vmoens merged commit 8e03f6b into pytorch:main Mar 24, 2023
