[RLlib] SAC RLModule on new API stack. #42568

simonsays1980 · 2024-01-22T14:51:53Z

Why are these changes needed?

To move SAC to the new API stack, it needs to be implemented as an RLModule. This PR adds the files needed to define and configure the SACRLModule; the implementation is in PyTorch.

Related issue number

Closes #37778

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

rllib/models/torch/torch_distributions.py

rllib/algorithms/sac/sac_rl_module.py

sven1977 · 2024-01-23T13:21:09Z

rllib/algorithms/sac/sac_rl_module.py

+
+@ExperimentalAPI
+class SACRLModule(RLModule, RLModuleWithTargetNetworksInterface):
+    def setup(self):


Add override decorator.

Question: Should we call super here or not?

We should try to be super consistent and clear about the two decorators:

@OverrideToImplementCustomLogic
and
@OverrideToImplementCustomLogic_CallToSuperRecommended

Can you check whether these already exist in the base class's RLModule.setup() and, if not, add them as applicable?

Both are used in the superclass. I remember running into an error when calling super().setup(), so I have to recheck this when testing.
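To make the difference between the two decorators concrete, here is a minimal, self-contained sketch. The decorators below are hypothetical stand-ins that only tag methods (RLlib's real `OverrideToImplementCustomLogic` decorators may behave differently), and `BaseModule`/`SACLikeModule` are illustrative names, not RLlib classes.

```python
from typing import Callable, Dict, List


def override_to_implement_custom_logic(fn: Callable) -> Callable:
    """Hypothetical stand-in: subclasses may replace this method entirely."""
    fn.__override_hint__ = "custom_logic"
    return fn


def override_to_implement_custom_logic_call_super_recommended(fn: Callable) -> Callable:
    """Hypothetical stand-in: overrides of this method should call super()."""
    fn.__override_hint__ = "call_super_recommended"
    return fn


class BaseModule:
    @override_to_implement_custom_logic_call_super_recommended
    def setup(self) -> None:
        # Shared initialization that every subclass relies on.
        self.setup_calls: List[str] = ["base"]

    @override_to_implement_custom_logic
    def forward_inference(self, batch: Dict[str, list]) -> Dict[str, list]:
        # No shared logic: subclasses provide the whole implementation.
        raise NotImplementedError


class SACLikeModule(BaseModule):
    def setup(self) -> None:
        # The base marks setup() as "call to super recommended", so run the
        # shared initialization first, then add the SAC-specific parts.
        super().setup()
        self.setup_calls.append("sac")
        self.alpha = 1.0  # placeholder value, for illustration only

    def forward_inference(self, batch: Dict[str, list]) -> Dict[str, list]:
        # Placeholder "policy": one zero action per observation.
        return {"actions": [0.0 for _ in batch["obs"]]}


module = SACLikeModule()
module.setup()
print(module.setup_calls)  # ['base', 'sac']
```

With this split, the decorator on the base method alone tells a subclass author whether skipping super() is allowed or likely a bug.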

rllib/algorithms/sac/sac_rl_module.py

sven1977 · 2024-01-23T13:27:02Z

rllib/algorithms/sac/sac_rl_module.py

+        self.action_dist_cls = catalog.get_action_dist_cls(framework=self.framework)
+
+        # Define the temperature.
+        self.alpha = self.config.model_config_dict["initial_alpha"]


Dumb question: How do we get this config.initial_alpha into the model config dict?
I had the same problem for DreamerV3 (and other algos) and I think we need to come up with a non-hacky solution for this problem.

Constraint here:

We don't want to have to pass the entire AlgorithmConfig into RLModule constructors as we envision RLModules to be used completely outside of RLlib in production.

Here, we actually do not need it in the model_config_dict, as it is now only used in the learner. In general, though, we have to solve this more consistently.
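Since `initial_alpha` is only needed by the learner, one option is to keep the temperature entirely learner-side, so the RLModule never sees it. The sketch below is a minimal, framework-free illustration of SAC's automatic entropy tuning, assuming a log-parameterized alpha and the common target-entropy heuristic of minus the action dimension. `HypotheticalSACConfig` and `TemperatureState` are made-up names, not the SACLearner API.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HypotheticalSACConfig:
    # Hypothetical algorithm-level settings; names are illustrative only.
    initial_alpha: float = 1.0
    alpha_lr: float = 3e-4
    action_dim: int = 2


class TemperatureState:
    """Learner-side entropy temperature, kept outside the RLModule."""

    def __init__(self, config: HypotheticalSACConfig) -> None:
        # Optimize log(alpha) so that alpha = exp(log_alpha) stays positive.
        self.log_alpha = math.log(config.initial_alpha)
        self.lr = config.alpha_lr
        # Common SAC heuristic: target entropy = -|A| for continuous actions.
        self.target_entropy = -float(config.action_dim)

    @property
    def alpha(self) -> float:
        return math.exp(self.log_alpha)

    def update(self, log_probs: Sequence[float]) -> float:
        # Loss: -log_alpha * mean(log_pi + target_entropy), with log_pi treated
        # as a constant. Its gradient w.r.t. log_alpha is
        # -mean(log_pi + target_entropy); take one plain gradient-descent step.
        grad = -sum(lp + self.target_entropy for lp in log_probs) / len(log_probs)
        self.log_alpha -= self.lr * grad
        return self.alpha
```

When the policy's entropy drops below the target (high `log_pi`), the step raises alpha; when entropy is above the target, it lowers alpha.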

It might play a role here, as the old stack uses one model config for the policy and another for the Q-model. If we want something like this, we should implement such a solution at the same time.
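A policy/Q split like the old stack's could be reproduced without handing the module the whole AlgorithmConfig. This is a minimal sketch, assuming per-network overrides are nested under hypothetical `"policy"` and `"q"` keys of the model config dict; `split_model_configs` and `DEFAULT_NET_CONFIG` are illustrative, not RLlib APIs.

```python
from copy import deepcopy
from typing import Any, Dict, Tuple

# Hypothetical defaults shared by both networks; keys are illustrative.
DEFAULT_NET_CONFIG: Dict[str, Any] = {
    "fcnet_hiddens": [256, 256],
    "fcnet_activation": "relu",
}


def split_model_configs(
    model_config_dict: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Derive separate policy and Q-network configs from one model config dict.

    Keys other than the (hypothetical) "policy" and "q" sub-dicts are shared;
    each sub-dict overrides the shared values for its own network only.
    """
    shared = {k: v for k, v in model_config_dict.items() if k not in ("policy", "q")}
    # deepcopy so the two configs never share mutable values (e.g. lists).
    pi_config = deepcopy(
        {**DEFAULT_NET_CONFIG, **shared, **model_config_dict.get("policy", {})}
    )
    q_config = deepcopy(
        {**DEFAULT_NET_CONFIG, **shared, **model_config_dict.get("q", {})}
    )
    return pi_config, q_config
```

Only this one dict crosses into the module, which keeps the constraint above (no full AlgorithmConfig in RLModule constructors) intact.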

rllib/algorithms/sac/sac_rl_module.py

rllib/algorithms/sac/torch/sac_torch_rl_module.py

rllib/algorithms/sac/sac_rl_module.py

rllib/algorithms/sac/torch/sac_torch_rl_module.py

sven1977 · 2024-01-23T13:56:50Z

rllib/algorithms/sac/sac_catalog.py

+            "post_fcnet_activation"
+        ]
+
+        # We don't have the exact (framework specific) action dist class yet and thus


Yeah, same in PPO. I feel like we should change the Catalog API here:
Provide the framework already in the c'tor (imo, there is no good reason NOT to do this), then users can already construct the pi-config in the c'tor, which is much cleaner.
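The proposed Catalog change can be sketched as follows: if the framework is known in the constructor, the action-distribution class (and hence the policy head's output size) can be resolved there too, so no config has to wait for build(). Everything here is hypothetical: `HypotheticalCatalog`, `HypotheticalPiConfig`, and the string placeholder for the distribution class are not RLlib's Catalog API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class HypotheticalPiConfig:
    # Illustrative stand-in for a policy-head config.
    input_dim: int
    output_dim: int
    hidden_layer_dims: Tuple[int, ...]


class HypotheticalCatalog:
    """Sketch of a catalog that receives the framework up front."""

    # Made-up mapping from framework to action-distribution class (as a name).
    _ACTION_DIST_BY_FRAMEWORK = {"torch": "SquashedGaussianTorchDistribution"}

    def __init__(
        self,
        obs_dim: int,
        action_dim: int,
        model_config_dict: Dict[str, Any],
        framework: str = "torch",
    ) -> None:
        if framework not in self._ACTION_DIST_BY_FRAMEWORK:
            raise ValueError(f"Unsupported framework: {framework}")
        self.framework = framework
        # Knowing the framework here lets us resolve the action-dist class now ...
        self.action_dist_cls = self._ACTION_DIST_BY_FRAMEWORK[framework]
        # ... so the pi config, whose output size depends on the dist inputs,
        # is built in the constructor alongside every other config.
        # A diagonal Gaussian needs a mean and a log-std per action dimension.
        self.pi_head_config = HypotheticalPiConfig(
            input_dim=obs_dim,
            output_dim=2 * action_dim,
            hidden_layer_dims=tuple(model_config_dict.get("post_fcnet_hiddens", [256])),
        )
```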

Would this also work with dynamic action spaces?

But do we already support dynamic action spaces?

The thing is that right now, we construct one module config in the c'tor and the other one during build(), which is pretty bad imo. We should be consistent in where and when we do what. :)

No, we do not support them right now - at least not for the new stack I guess.

Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

…-rl-module Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

…es in 'SACLearner'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

rllib/algorithms/sac/sac_rl_module.py

Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

rllib/algorithms/sac/torch/sac_torch_rl_module.py

Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

rllib/algorithms/sac/torch/sac_torch_rl_module.py

Signed-off-by: Sven Mika <sven@anyscale.io>

sven1977

LGTM now. Thanks for the fixes @simonsays1980 !

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

…-rl-module Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

…project#42568) Signed-off-by: khluu <khluu000@gmail.com>

simonsays1980 added 2 commits January 22, 2024 15:40

Added 'SACCatalog', 'SACRLModule', and 'SACTorchRLModule' to branch.

08a4720

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

Added 'SquashedGaussianTorchDistribution' for 'SAC'.

cede70f

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

simonsays1980 mentioned this pull request Jan 22, 2024

[RLlib] SAC on new API stack (w/ EnvRunner and ConnectorV2): SACLearner and SACTorchLearner classes. #42570

Merged

8 tasks

sven1977 changed the title ~~SAC-RLModule~~ [RLlib] SAC RLModule on new API stack. Jan 23, 2024

sven1977 marked this pull request as ready for review January 23, 2024 13:14

sven1977 requested review from sven1977, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla and kouroshHakha as code owners January 23, 2024 13:14