
Custom model for Soft-Actor-Critic in [rllib] #13218

Closed
sahikagenc opened this issue Jan 5, 2021 · 4 comments
Labels: enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical)

Comments

@sahikagenc commented:

Is it possible to provide a custom model to SAC from a configuration file, as is already possible via the `model` parameter? For example:

    # Model options for the Q-network(s).
    "Q_model": {
        "model": "SoftQ_V2_model_sac",
        # "fcnet_activation": "relu",
        # "fcnet_hiddens": [256, 256],
    },
    # Model options for the policy function.
    "policy_model": {
        "model": "Policy_V2_model_sac",
        # "fcnet_activation": "relu",
        # "fcnet_hiddens": [256, 256],
    },
@sahikagenc sahikagenc added the enhancement Request for new feature and/or capability label Jan 5, 2021
@sven1977 sven1977 added P2 Important issue, but not time-critical rllib labels Jan 13, 2021
@sven1977 sven1977 self-assigned this Jan 13, 2021
@sven1977 (Contributor) commented:

Great question, @sahikagenc. Let me try to make this work with the existing model-building APIs. ...

@sven1977 (Contributor) commented:

Btw, did you try simply sub-classing SACTFModel or SACTorchModel and implementing your own get_q_value, get_policy_output, etc. logic?

@sven1977 (Contributor) commented:

There is also a bug in SAC that prevents it from learning the "state-preprocessor" (e.g. when you use a CNN in front of the policy- and Q-nets). The problem is in SAC's compute_and_clip_gradients (tf) and optimizer_fn (torch): the optimizers are told to optimize only the policy- and Q-nets, never the pre-network.
Fixing this now ...
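The bug described above can be illustrated with a small sketch. This is a simplification (real RLlib code collects tf/torch variables from the model and passes them to optimizers); the variable names below are made up for illustration. The point is that a shared preprocessor's variables must be included in some optimizer's variable list, or they never receive gradient updates.

```python
# Illustration of the described bug: if the optimizers are built only from the
# policy- and Q-net variables, a shared state-preprocessor's (e.g. CNN's)
# variables are never updated. Variable names here are illustrative only.

policy_vars = ["policy/w1", "policy/w2"]
q_vars = ["q/w1", "q/w2"]
preprocessor_vars = ["cnn/conv1", "cnn/conv2"]  # shared CNN in front

# Buggy behavior: the CNN variables are missing from the trainable set.
buggy_trainable = policy_vars + q_vars

# Fixed behavior: the pre-network variables are optimized as well.
fixed_trainable = preprocessor_vars + policy_vars + q_vars

assert "cnn/conv1" not in buggy_trainable
assert "cnn/conv1" in fixed_trainable
```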

@sven1977 (Contributor) commented:

This is now fixed (via this PR: #13522). You can provide custom models to SAC via the following options:

1. Custom Q-model:

       config:
           Q_model:
               custom_model: [your registered custom Q-model class]

2. Custom policy-model:

       config:
           policy_model:
               custom_model: [your registered custom policy-model class]

3. Custom SAC model (as a whole):
   - sub-class SACTFModel or SACTorchModel
   - override the new build_policy_model and build_q_model methods there to return whatever custom model(s) you want.
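Option 3 follows the template-method pattern: the base model calls overridable hook methods to build its sub-networks. The sketch below is a self-contained mock of that pattern; the `build_policy_model`/`build_q_model` hook names mirror the ones the PR adds to SACTF|TorchModel, but the classes and string return values here are simplified stand-ins (real code would return nn.Module / tf.keras.Model instances).

```python
# Mock of the "sub-class and override the build hooks" pattern (option 3).
# In real RLlib code the base class would be SACTorchModel or SACTFModel and
# the hooks would return actual network objects; strings are used here only
# to keep the sketch self-contained.

class BaseSACModel:
    """Stand-in for SACTF|TorchModel: builds sub-nets via hook methods."""

    def __init__(self):
        self.policy_net = self.build_policy_model()
        self.q_net = self.build_q_model()

    def build_policy_model(self):
        return "default-policy-net"

    def build_q_model(self):
        return "default-q-net"

class MyCustomSACModel(BaseSACModel):
    """Override the hooks to plug in custom networks."""

    def build_policy_model(self):
        return "my-policy-net"  # e.g. a custom CNN-backed policy

    def build_q_model(self):
        return "my-q-net"       # e.g. a custom Q-network

model = MyCustomSACModel()
assert model.policy_net == "my-policy-net"
assert model.q_net == "my-q-net"
```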

Closing this issue now.
