[RLlib] Nested action space PR (minimally invasive; torch only + test). #8101
Conversation
* Fix cyclic dependency: Headers in ray/util should not depend on those in ray/common
* Move random generations to ray/common/test_util.h
* Add license header

Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
* adding directory and node_provider entry for azure autoscaler
* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating
* adding todos and switching to auth file for service principal authentication
* adding role / scope to service principal
* resolving issues with app credentials
* adding retry for setting service principal role
* typo and adding retry to nic creation
* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing
* linting
* updating cleanup and fixing bugs
* minor fixes
* first working version :)
* added tag support
* added msi identity intermediate
* enable MSI through user managed identity
* updated schema
* extend yaml schema, remove service principal code, add re-use of managed user identity
* fix rg_id
* fix logging
* replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs)
* run linting
* updating yaml configs and formatting
* updating yaml configs and formatting
* typo in example config
* pulling default config from example-full
* resetting min, init worker prop
* adding docs for azure autoscaler and fixing status
* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment
* fix for default subscription in azure node provider
* vm dev image build
* minor change
* keeping example-full.yaml in autoscaler/azure, updating azure example config
* linting azure config
* extending retries on azure config
* lint
* support for internal ips, fix to azure docs, and new azure gpu example config
* linting
* Update python/ray/autoscaler/azure/node_provider.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>
* revert_this
* remove_schema
* updating configs and removing ssh keygen, tweak azure node provider terminate
* minor tweaks

Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
There are some file size and memory issues with the bazel disk cache; we will disable the disk cache and use the remote cache exclusively for now.
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
…h_set::erase (ray-project#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* add accessible trial_info
* trial name and info
* doc
* fix gp
* Update doc/source/tune-package-ref.rst
* Apply suggestions from code review
* fix
* trial
* fixtest
* testfix
…ray-project#7630)
* Fix GetDependencies
* lint
* enable
* Turn on eager eviction
* Shorten tests and drain ReferenceCounter
* Don't force kill actor handles that have gone out of scope, lint
* Fix locks
* Cleanup Plasma Async Callback (ray-project#7452)
* [rllib][tune] fix some nans (ray-project#7611)
* Change /tmp to platform-specific temporary directory (ray-project#7529)
* [Serve] UI Improvements (ray-project#7569)
* bugfix about test_dynres.py (ray-project#7615) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* Java call Python actor method use actor.call (ray-project#7614)
* bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (ray-project#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (ray-project#7462)
* [Java] Fix the issue that the cached value in `RayObject` is serialized (ray-project#7613)
* Add failure tests to test_reference_counting (ray-project#7400)
* Fix typo in asyncio documentation (ray-project#7602)
* Fix segfault
* debug
* Force kill actor
* Fix test
…roject#7668)
* Only drain ref counter for non-actor tasks
* Don't force kill actors that have gone out of scope
* Windows compatibility bug fixes
* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets
* Clean up some TODOs
* Fix duplicate compilations
* RedisAsioClient boost::asio::error::connection_reset

Co-authored-by: Mehrdad <noreply@github.com>
Test PASSed.
Test PASSed.
Test PASSed.
                 action_space_struct=None):

        if not isinstance(inputs, torch.Tensor):
            inputs = torch.Tensor(inputs)
How does this work if the inputs are of different shapes? Does torch still combine them into a tensor?
inputs is currently (and I guess it should stay this way) always a single ("super-flat") tensor. Hence we need input_lens (to split this tensor by).
Got it. I think the documentation should be updated here then (it currently claims inputs is a "Tensor list").
Done.
It would work in both cases now: 1) simple list of child-distributions + nested action space 2) nested struct of child distributions + equally nested action space.
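To illustrate the mechanism described above, here is a minimal, hypothetical sketch (not code from this PR) of how a single "super-flat" input tensor can be split by input_lens into per-child-distribution inputs; the batch size and lengths are made up for the example:

```python
import torch

# Hypothetical sizes: batch of 2, three child distributions whose flattened
# input widths are 4, 2 and 3, so the "super-flat" inputs tensor has 9 columns.
inputs = torch.randn(2, 9)
input_lens = [4, 2, 3]

# torch.split cuts along the last dimension, one chunk per child distribution.
child_inputs = torch.split(inputs, input_lens, dim=1)
assert [c.shape[1] for c in child_inputs] == input_lens
```

Each chunk can then be handed to the corresponding child distribution.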
from gym.spaces import Tuple, Dict


def flatten_space(space):
Don't need this any more.
Yeah, we do, unfortunately. gym.Spaces are not flattened by tree b/c a gym.Dict is not an actual dict, gym.Tuple is not an actual tuple.
Got it. I guess this is currently done implicitly by the dict/tuple flattening preprocessors for observations.
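For reference, a rough sketch of what such a helper could look like (the PR's actual implementation may differ in details): it recurses through gym's Dict/Tuple containers, which tree cannot traverse, and collects the primitive leaf spaces.

```python
from gym.spaces import Dict, Tuple


def flatten_space(space):
    """Return the list of primitive (non-container) sub-spaces of `space`."""
    def _helper(s, out):
        if isinstance(s, Tuple):
            for child in s.spaces:
                _helper(child, out)
        elif isinstance(s, Dict):
            for child in s.spaces.values():
                _helper(child, out)
        else:
            out.append(s)

    ret = []
    _helper(space, ret)
    return ret
```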
    Args:
        space (gym.Space): The Space to get the python struct for.

    Returns:
Can you add an example in the documentation here?
Sure ...
done.
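For readers of this thread, the kind of docstring example being asked for could look roughly like the following; the exact output format (and that the helper returns plain dicts/tuples of leaf spaces) is an assumption here, not a quote from the PR:

```python
from gym.spaces import Box, Dict, Discrete, Tuple

space = Dict({
    "a": Discrete(3),
    "b": Tuple([Box(-1.0, 1.0, shape=(2,)), Discrete(2)]),
})

# The returned python struct keeps the nesting but uses plain dicts/tuples
# (so utilities like tree.map_structure can traverse it), e.g.:
# {"a": Discrete(3), "b": (Box(-1.0, 1.0, (2,)), Discrete(2))}
```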
rllib/models/tf/tf_action_dist.py
Outdated
@@ -324,6 +325,51 @@ def _unsquash(self, values):
        return unsquashed


class Beta(TFActionDistribution):
Hm what do we use the beta distribution for?
As an alternative for SquashedGaussian in PPO or SAC. For double-bounded action spaces.
ok, seems like it should be added in a separate PR
Done.
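For background on the thread above (the class itself was split out into a separate PR), a hedged sketch of what a Beta action distribution for double-bounded action spaces could look like on the torch side; the class name, methods, and squashing details are assumptions, not the code that was moved out:

```python
import torch
from torch.distributions import Beta as TorchBeta


class BetaActionDist:
    """Sketch: Beta distribution for actions bounded in [low, high].

    Model outputs are mapped through softplus (+1.0) to obtain the two
    concentration parameters (> 1 keeps the density unimodal); samples in
    [0, 1] are then rescaled into the action bounds.
    """

    def __init__(self, inputs, low=0.0, high=1.0):
        alpha, beta = torch.chunk(inputs, 2, dim=-1)
        self.low, self.high = low, high
        self.dist = TorchBeta(
            concentration1=torch.nn.functional.softplus(alpha) + 1.0,
            concentration0=torch.nn.functional.softplus(beta) + 1.0)

    def sample(self):
        return self.low + self.dist.sample() * (self.high - self.low)

    def logp(self, actions):
        unsquashed = (actions - self.low) / (self.high - self.low)
        return self.dist.log_prob(unsquashed).sum(-1)
```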
class TorchMultiActionDistribution(TorchDistributionWrapper):
    """Action distribution that operates for list of (flattened) actions.
This doesn't need to be flat any more right? So one of the child distributions could be in turn a MultiActionDist.
Correct. I'll fix the comment.
Done.
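As a concrete, hypothetical illustration of the nesting now allowed (the space and distribution names below are made up for the example, not taken from the PR): the child-distribution struct can mirror an arbitrarily nested action space, so one entry can itself expand into another multi-action distribution instead of everything being pre-flattened.

```python
from gym.spaces import Box, Dict, Discrete, Tuple

# Hypothetical nested action space:
action_space = Dict({
    "move": Box(-1.0, 1.0, shape=(2,)),
    "attack": Tuple([Discrete(5), Box(0.0, 1.0, shape=(1,))]),
})

# A matching, equally nested struct of child distribution classes could then
# look like this; the "attack" entry would itself be handled as a nested
# multi-action distribution:
# {"move": TorchDiagGaussian, "attack": (TorchCategorical, TorchDiagGaussian)}
```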
                 child_distributions,
                 input_lens,
                 action_space,
                 action_space_struct=None):
If the struct can always be computed from the action space, why pass both?
Removed.
        supported type (including nested Tuples and Dicts).

    Returns:
        List[gym.Space]: The flattened list of primitive Spaces. This list
This should be a super flat space gym.Box() right?
Nope, this corresponds to tree.flatten, which returns a list. The problem is that tree.flatten does not work on gym.Dict/Tuple, so I had to add this helper here. What you mean is "super-flatten", where everything really gets crunched into a single tensor. I should move our function for that purpose (currently called _flatten_action in episode.py) into space_utils as well and rename it "flatten_to_single_tensor" for clarity.
Ok, I guess I'm a bit confused why we need this intermediate "list of gym spaces", but maybe the next PR will make it clear.
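To spell out the distinction being discussed: flatten_space yields a list of leaf Spaces (nothing is concatenated yet), while "super-flattening" crunches one sampled action into a single 1-D vector. A small sketch follows; flatten_to_single_tensor is the rename proposed above, and the body is an assumption about how such a helper could work, not the PR's code:

```python
import numpy as np
import tree  # dm-tree


def flatten_to_single_tensor(action_struct):
    # tree.flatten works here because a *sampled* action consists of plain
    # python dicts/tuples (unlike gym.spaces.Dict/Tuple themselves).
    return np.concatenate([
        np.asarray(a, dtype=np.float32).reshape(-1)
        for a in tree.flatten(action_struct)
    ])


action = {"a": 2, "b": (np.array([0.5, -0.5]), 1)}
print(flatten_to_single_tensor(action))  # roughly [2., 0.5, -0.5, 1.]
```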
…ed_action_spaces_1
Test PASSed.
Test PASSed.
Test PASSed.
Test FAILed.
        child_distributions (any[torch.Tensor]): Any struct
            that contains the child distribution classes to use to
            instantiate the child distributions from `inputs`. This could
            be an already flattened list or a struct according to
Why allow it to be either a flat list or struct and not always one?
Some comments on the torch interface, but otherwise lgtm.
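One possible answer to the "flat list or struct" question, sketched under the assumption that dm-tree is used for the struct handling (this is an illustration, not the merged code): tree.flatten leaves an already flat list effectively unchanged, so the constructor can normalize both forms to the same flat list.

```python
import tree  # dm-tree


def normalize_child_distributions(child_distributions):
    # Nested struct -> flat list of leaf entries; an already flat list comes
    # back unchanged (as a new list), so both input forms are accepted.
    return tree.flatten(child_distributions)
```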
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
RLlib currently does not support arbitrarily nested action spaces. Only depth=1 Tuples are supported (no Dicts either). For PyTorch, not even Tuples are allowed.
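As an illustration of what "arbitrarily nested" means here (an example made up for this description, not taken from the PR's tests), a space like the following should eventually work out of the box:

```python
from gym.spaces import Box, Dict, Discrete, Tuple

# Depth > 1 with Dicts and Tuples mixed; not supported before this work
# (and, for torch, not even depth=1 Tuples were).
nested_action_space = Dict({
    "discrete": Discrete(4),
    "continuous": Box(-1.0, 1.0, shape=(3,)),
    "nested": Tuple([Discrete(2), Dict({"x": Box(0.0, 1.0, shape=(1,))})]),
})
```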
This PR introduces the first step toward fixing this restriction:
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.