[RLlib] Fix env_check for parametric actions (with action mask)#34790
@avnishn Could you review this PR, please? The failing checks don't seem to be related to the proposed changes.
I think that is very much the case. We have to think about whether we want to make an opinionated design decision on how to support action masking in RLlib. My initial thought is that we'd rather not make an opinionated design decision here. I'll talk with @sven1977, who probably has more opinions on this, and get back to you.
This is actually a really cool feature of Gymnasium, which I didn't know about :)
sven1977 left a comment:
Awesome PR. Thanks for filing this, @inpefess. I would like to suggest one enhancement so that the "action_mask" key is no longer hard-coded.
Can we add an additional config setting to the rllib/algorithms/algorithm_config.py::AlgorithmConfig::environment() method so that users can customize the actual name of this action mask key in the observation_space?
Something along the lines of:
config.environment("my_env", env_config=..., action_mask_key="action_mask")
Set the default value self.action_mask_key = "action_mask" in the AlgorithmConfig constructor.
Then use that value instead of the hard-coded one in the pre-check.
You might have to change the signature of check_env to pass along the AlgorithmConfig (from within the RolloutWorker) so that it has access to that configuration.
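A minimal plain-Python sketch of the configuration pattern suggested above. SketchAlgorithmConfig and find_action_mask are hypothetical stand-ins, not RLlib's AlgorithmConfig or check_env: they only show a constructor default for action_mask_key, an environment() method that can override it, and a pre-check that reads the key from the config instead of hard-coding it. The sketch uses None to mean "argument not passed", which may differ from how RLlib marks unset arguments.

```python
from typing import Any, Dict, Optional


class SketchAlgorithmConfig:
    """Hypothetical stand-in for RLlib's AlgorithmConfig (illustration only)."""

    def __init__(self) -> None:
        self.env: Optional[str] = None
        self.env_config: Dict[str, Any] = {}
        # Default key under which a Dict observation carries the action mask.
        self.action_mask_key: str = "action_mask"

    def environment(
        self,
        env: Optional[str] = None,
        *,
        env_config: Optional[Dict[str, Any]] = None,
        action_mask_key: Optional[str] = None,
    ) -> "SketchAlgorithmConfig":
        # Only override the settings that were actually passed.
        if env is not None:
            self.env = env
        if env_config is not None:
            self.env_config = env_config
        if action_mask_key is not None:
            self.action_mask_key = action_mask_key
        # Return self so calls can be chained.
        return self


def find_action_mask(observation: Any, config: SketchAlgorithmConfig) -> Optional[Any]:
    """Hypothetical pre-check helper: fetch the mask via the configured key.

    Returns None if the observation is not a dict or has no entry under that key.
    """
    if not isinstance(observation, dict):
        return None
    return observation.get(config.action_mask_key)


config = SketchAlgorithmConfig().environment(
    "my_env", env_config={}, action_mask_key="valid_avail_actions_mask"
)
print(find_action_mask({"valid_avail_actions_mask": [1, 0, 1]}, config))  # [1, 0, 1]
```

Handing the config object to the check, as suggested for check_env, is what lets a custom key reach the pre-check without any change to the environment itself.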
Awesome, thanks for adding this so quickly! I think it's ready to merge now. Just waiting for the tests to finish...
Signed-off-by: Boris Shminke <boris@shminke.ml>
…ay-project#34790) Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
@inpefess @sven1977 @avnishn I was browsing through this PR from earlier this year and wanted to verify whether my assumption about the changes in this PR is correct: the env_check for parametric actions (with action mask) will only work if the underlying environment is a …
Why are these changes needed?
Ray RLlib has a great example of using a parametric actions environment, but currently it only works with self._skip_env_checking = True. Gymnasium action spaces have a mask argument to their sample method. We apply this feature to fix the env_check behaviour in the case of parametric actions environments.
Related issue number
Closes #23925
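The mask-based sampling described above can be sketched in plain Python so that it runs without Gymnasium or NumPy installed. sample_discrete imitates the convention of the mask argument to Gymnasium's Discrete.sample (1 marks a valid action, 0 an invalid one); sample_for_env_check is a hypothetical env-check step that looks the mask up in a Dict observation under a configurable key. Neither function is Gymnasium's or RLlib's actual code.

```python
import random
from collections.abc import Mapping, Sequence
from typing import Any, Optional


def sample_discrete(n: int, mask: Optional[Sequence[int]] = None,
                    rng: Optional[random.Random] = None) -> int:
    """Sample an action from range(n), honouring an optional 0/1 mask.

    1 marks a valid action and 0 an invalid one. Plain-Python sketch only.
    """
    if rng is None:
        rng = random.Random()
    if mask is None:
        # No mask: every action in range(n) is allowed.
        return rng.randrange(n)
    if len(mask) != n:
        raise ValueError(f"mask has length {len(mask)}, expected {n}")
    valid = [i for i, m in enumerate(mask) if m == 1]
    if not valid:
        # Design choice of this sketch: refuse to sample when nothing is allowed.
        raise ValueError("mask marks every action as invalid")
    return rng.choice(valid)


def sample_for_env_check(obs: Any, n: int, mask_key: str = "action_mask",
                         rng: Optional[random.Random] = None) -> int:
    """Hypothetical env-check step: use the mask from a Dict observation if present."""
    mask = obs.get(mask_key) if isinstance(obs, Mapping) else None
    return sample_discrete(n, mask, rng)


obs = {"action_mask": [0, 1, 0, 1], "cart": [0.0, 0.0, 0.0, 0.0]}
print(sample_for_env_check(obs, 4, rng=random.Random(0)))  # always 1 or 3
```

Without the mask, a check that samples uniformly could hand the environment an invalid action, which is why the example previously needed env checking disabled. Raising on an all-zero mask is a choice made for this sketch; a real check may handle that case differently.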
Checks
- … git commit -s) in this PR.
- … scripts/format.sh to lint the changes in this PR.
- … method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
Discussion
- … rllib/utils/tests/test_env_check.py because it seems to belong there, but seven tests from this module fail in the master branch … info, for example). If it's a standard or recommended way to do that in Ray RLlib, then it should also be mentioned in the documentation (currently it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way: AlphaZero, for example, does the same, as do a couple of other examples.
- … ParametricActionsCartPole using the action_mask key, there is a ParametricActionsCartPoleNoEmbeddings with a key that has the same meaning but is called valid_avail_actions_mask. I've changed it to action_mask.