[RLlib] Nested action space PR (minimally invasive; torch only + test). #8101

sven1977 · 2020-04-20T12:49:48Z

RLlib currently does not support arbitrarily nested action spaces. Only depth-1 Tuples are supported (no Dicts either), and for PyTorch not even Tuples are allowed.
This PR is a first step toward lifting this restriction:

- Adds a TorchMultiActionDistribution class.
- Adds framework-agnostic test cases for TorchMultiActionDistribution.

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested (please justify below)

* Fix cyclic dependency Headers in ray/util should not depend on those in ray/common * Move random generations to ray/common/test_util.h * Add license header Co-authored-by: Mehrdad <noreply@github.com> Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>

* adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * minor fixes * first working version :) * added tag support * added msi identity intermediate * enable MSI through user managed identity * updated schema * extend yaml schema remove service principal code add re-use of managed user identity * fix rg_id * fix logging * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) * run linting * updating yaml configs and formatting * updating yaml configs and formatting * typo in example config * pulling default config from example-full * resetting min, init worker prop * adding docs for azure autoscaler and fixing status * add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment * fix for default subscription in 
azure node provider * vm dev image build * minor change * keeping example-full.yaml in autoscaler/azure, updating azure example config * linting azure config * extending retries on azure config * lint * support for internal ips, fix to azure docs, and new azure gpu example config * linting * Update python/ray/autoscaler/azure/node_provider.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * revert_this * remove_schema * updating configs and removing ssh keygen, tweak azure node provider terminate * minor tweaks Co-authored-by: Markus Cozowicz <marcozo@microsoft.com> Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* enable * Turn on eager eviction * Shorten tests and drain ReferenceCounter * Don't force kill actor handles that have gone out of scope, lint * Fix locks * Cleanup Plasma Async Callback (ray-project#7452) * [rllib][tune] fix some nans (ray-project#7611) * Change /tmp to platform-specific temporary directory (ray-project#7529) * [Serve] UI Improvements (ray-project#7569) * bugfix about test_dynres.py (ray-project#7615) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com> * Java call Python actor method use actor.call (ray-project#7614) * bug fix about useage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (ray-project#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com> * [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (ray-project#7462) * [Java] Fix the issue that the cached value in `RayObject` is serialized (ray-project#7613) * Add failure tests to test_reference_counting (ray-project#7400) * Fix typo in asyncio documentation (ray-project#7602) * Fix segfault * debug * Force kill actor * Fix test

* Windows compatibility bug fixes * Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets * Clean up some TODOs * Fix duplicate compilations * RedisAsioClient boost::asio::error::connection_reset Co-authored-by: Mehrdad <noreply@github.com>

AmplabJenkins · 2020-04-20T15:49:16Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24961/

AmplabJenkins · 2020-04-20T17:10:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24958/

AmplabJenkins · 2020-04-20T18:56:26Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24970/

ericl · 2020-04-20T19:39:08Z

rllib/models/torch/torch_action_dist.py

+                 action_space_struct=None):
+
+        if not isinstance(inputs, torch.Tensor):
+            inputs = torch.Tensor(inputs)


How does this work if the inputs are of different shapes? Does torch still combine them into a tensor?

inputs is currently (and I guess it should stay this way) always a single ("super-flat") tensor.
Hence we need input_lens, the sizes by which to split this tensor into the children's inputs.

Got it. I think the documentation should be updated here then (it currently claims inputs is a "Tensor list").

It would work in both cases now: 1) a simple list of child distributions plus a nested action space, or 2) a nested struct of child distributions plus an equally nested action space.
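For illustration, a minimal sketch of the split discussed above, assuming NumPy and a batch-first input of shape [batch, sum(input_lens)]. split_flat_inputs is a hypothetical helper, not RLlib's API:

```python
import numpy as np


def split_flat_inputs(inputs, input_lens):
    """Split a super-flat [batch, sum(input_lens)] array into per-child chunks.

    Hypothetical helper for illustration; not the RLlib implementation.
    """
    inputs = np.asarray(inputs)
    if inputs.shape[-1] != sum(input_lens):
        raise ValueError("Last dim of inputs must equal sum(input_lens).")
    # Split points are the running totals of the child lengths (minus the last).
    split_points = np.cumsum(input_lens)[:-1]
    return np.split(inputs, split_points, axis=-1)


# Example: a Discrete(3) child (3 logits) and a 2-D Gaussian child
# (2 means + 2 log-stds = 4 values) share one flat vector of length 7.
batch = np.arange(14, dtype=np.float32).reshape(2, 7)
logits, gauss_params = split_flat_inputs(batch, [3, 4])
print(logits.shape, gauss_params.shape)  # (2, 3) (2, 4)
```

In PyTorch the same split can be done with torch.split(inputs, input_lens, dim=1), which accepts the list of section sizes directly.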

ericl · 2020-04-20T19:40:33Z

rllib/utils/space_utils.py

+from gym.spaces import Tuple, Dict
+
+
+def flatten_space(space):


Don't need this any more.

Yeah, we do, unfortunately. gym.Spaces are not flattened by tree because a gym.Dict is not an actual dict and a gym.Tuple is not an actual tuple.

Got it. I guess this is currently done implicitly by the dict/tuple flattening preprocessors for observations.
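A minimal sketch of what such a helper has to do, assuming only that composite spaces keep their children in a .spaces attribute (a tuple for gym.spaces.Tuple, a mapping for gym.spaces.Dict). TupleLike and DictLike are hypothetical stand-ins so the snippet runs without gym; visiting Dict keys in sorted order is an assumption chosen to match dm-tree's key ordering, not necessarily what RLlib does:

```python
from collections import OrderedDict


def flatten_space(space):
    """Return a flat list of the primitive (leaf) sub-spaces of `space`.

    Sketch: recurses into anything whose `.spaces` attribute is a mapping
    (Dict-like) or a tuple/list (Tuple-like); everything else is a leaf.
    """
    children = getattr(space, "spaces", None)
    if isinstance(children, dict):
        # Sorted keys, mirroring how dm-tree orders dict entries.
        return [leaf for key in sorted(children)
                for leaf in flatten_space(children[key])]
    if isinstance(children, (tuple, list)):
        return [leaf for child in children for leaf in flatten_space(child)]
    return [space]  # A primitive space such as Box or Discrete.


class TupleLike:
    """Hypothetical stand-in for gym.spaces.Tuple."""
    def __init__(self, spaces):
        self.spaces = tuple(spaces)


class DictLike:
    """Hypothetical stand-in for gym.spaces.Dict."""
    def __init__(self, spaces):
        self.spaces = OrderedDict(spaces)


nested = DictLike({"move": TupleLike(["Discrete(3)", "Box(2,)"]),
                   "fire": "Discrete(2)"})
print(flatten_space(nested))  # ['Discrete(2)', 'Discrete(3)', 'Box(2,)']
```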

ericl · 2020-04-20T19:41:12Z

rllib/utils/space_utils.py

+    Args:
+        space (gym.Space): The Space to get the python struct for.
+
+    Returns:


Can you add an example in the documentation here?
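As an example of the kind of input/output such a docstring could document: a sketch that turns a nested Tuple/Dict space into a plain Python struct of leaf spaces, so that generic nest utilities (e.g. dm-tree) can map over it. space_to_struct is a hypothetical name, and TupleLike/DictLike are stand-ins for the gym classes so the snippet runs without gym:

```python
from collections import OrderedDict


def space_to_struct(space):
    """Turn Tuple-/Dict-like spaces into real tuples/dicts of leaf spaces.

    Hypothetical helper for illustration; composite spaces are detected by
    their `.spaces` attribute, everything else is returned unchanged.
    """
    children = getattr(space, "spaces", None)
    if isinstance(children, dict):
        return {k: space_to_struct(v) for k, v in children.items()}
    if isinstance(children, (tuple, list)):
        return tuple(space_to_struct(c) for c in children)
    return space


class TupleLike:
    """Hypothetical stand-in for gym.spaces.Tuple."""
    def __init__(self, spaces):
        self.spaces = tuple(spaces)


class DictLike:
    """Hypothetical stand-in for gym.spaces.Dict."""
    def __init__(self, spaces):
        self.spaces = OrderedDict(spaces)


action_space = DictLike({"a": "Box(2,)",
                         "b": TupleLike(["Discrete(2)", "Discrete(3)"])})
print(space_to_struct(action_space))
# {'a': 'Box(2,)', 'b': ('Discrete(2)', 'Discrete(3)')}
```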

ericl · 2020-04-20T19:42:07Z

rllib/models/tf/tf_action_dist.py

@@ -324,6 +325,51 @@ def _unsquash(self, values):
        return unsquashed


+class Beta(TFActionDistribution):


Hm, what do we use the Beta distribution for?

As an alternative to SquashedGaussian in PPO or SAC, for double-bounded action spaces.

ok, seems like it should be added in a separate PR
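For context on why a Beta helps with double-bounded spaces: its support is [0, 1], so a linear rescale covers [low, high] exactly, without the tanh squashing a squashed Gaussian uses. A stdlib-only sketch under that assumption; ScaledBeta is hypothetical and is not the Beta class from this diff:

```python
import math
import random


class ScaledBeta:
    """Beta(alpha, beta) rescaled from [0, 1] to [low, high].

    Illustrative sketch for one double-bounded action dimension; in a real
    model, alpha and beta (> 0) would come from the network outputs.
    """

    def __init__(self, alpha, beta, low, high):
        if alpha <= 0 or beta <= 0 or not low < high:
            raise ValueError("Need alpha, beta > 0 and low < high.")
        self.alpha, self.beta = alpha, beta
        self.low, self.high = low, high

    def sample(self):
        # Draw in [0, 1], then map linearly into [low, high].
        x = random.betavariate(self.alpha, self.beta)
        return self.low + x * (self.high - self.low)

    def logp(self, action):
        # Map the action back into the open interval (0, 1).
        x = (action - self.low) / (self.high - self.low)
        log_norm = (math.lgamma(self.alpha) + math.lgamma(self.beta)
                    - math.lgamma(self.alpha + self.beta))
        log_density = ((self.alpha - 1) * math.log(x)
                       + (self.beta - 1) * math.log1p(-x) - log_norm)
        # Change of variables: divide the density by the interval width.
        return log_density - math.log(self.high - self.low)


dist = ScaledBeta(2.0, 2.0, low=-1.0, high=1.0)
a = dist.sample()  # always within [-1.0, 1.0]
print(a, dist.logp(a))
```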

ericl · 2020-04-20T19:43:58Z

rllib/models/torch/torch_action_dist.py

+
+
+class TorchMultiActionDistribution(TorchDistributionWrapper):
+    """Action distribution that operates for list of (flattened) actions.


This doesn't need to be flat any more, right? So one of the child distributions could in turn be a MultiActionDist.

Correct. I'll fix the comment.
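A stdlib-only sketch of what "no longer flat" means: children form a nested dict/tuple struct, a child may itself be a multi-action distribution, and sample/logp simply recurse. This assumes independent children (so the joint log-prob is a sum) and leaves out the input_lens splitting; Categorical and MultiActionDist here are toy classes, not TorchMultiActionDistribution's API:

```python
import math
import random


class Categorical:
    """Toy categorical child distribution over {0, ..., n-1}."""

    def __init__(self, probs):
        self.probs = list(probs)

    def sample(self):
        return random.choices(range(len(self.probs)), weights=self.probs)[0]

    def logp(self, action):
        return math.log(self.probs[action])


def _sample(struct):
    if isinstance(struct, dict):
        return {k: _sample(v) for k, v in struct.items()}
    if isinstance(struct, tuple):
        return tuple(_sample(v) for v in struct)
    return struct.sample()  # Leaf: any distribution, incl. a MultiActionDist.


def _logp(struct, action):
    if isinstance(struct, dict):
        return sum(_logp(struct[k], action[k]) for k in struct)
    if isinstance(struct, tuple):
        return sum(_logp(d, a) for d, a in zip(struct, action))
    return struct.logp(action)


class MultiActionDist:
    """Combines a nested struct of independent child distributions."""

    def __init__(self, children):
        self.children = children

    def sample(self):
        return _sample(self.children)

    def logp(self, action):
        return _logp(self.children, action)


inner = MultiActionDist((Categorical([0.5, 0.5]), Categorical([0.25, 0.75])))
outer = MultiActionDist({"turn": Categorical([0.1, 0.9]), "arm": inner})
print(outer.sample())  # e.g. {'turn': 1, 'arm': (0, 1)}
print(outer.logp({"turn": 1, "arm": (0, 1)}))  # log(0.9 * 0.5 * 0.75)
```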

ericl · 2020-04-20T19:44:14Z

rllib/models/torch/torch_action_dist.py

+                 child_distributions,
+                 input_lens,
+                 action_space,
+                 action_space_struct=None):


If the struct can always be computed from the action space, why pass both?

ericl · 2020-04-20T21:15:28Z

rllib/models/tf/tf_action_dist.py

@@ -324,6 +325,51 @@ def _unsquash(self, values):
        return unsquashed


+class Beta(TFActionDistribution):


ok, seems like it should be added in a separate PR

ericl · 2020-04-20T21:16:13Z

rllib/models/torch/torch_action_dist.py

+                 action_space_struct=None):
+
+        if not isinstance(inputs, torch.Tensor):
+            inputs = torch.Tensor(inputs)


Got it. I think the documentation should be updated here then (it currently claims inputs is a "Tensor list").

ericl · 2020-04-20T21:18:04Z

rllib/utils/space_utils.py

+from gym.spaces import Tuple, Dict
+
+
+def flatten_space(space):


Got it. I guess this is currently done implicitly by the dict/tuple flattening preprocessors for observations.

ericl · 2020-04-20T21:18:21Z

rllib/utils/space_utils.py

+            supported type (including nested Tuples and Dicts).
+
+    Returns:
+        List[gym.Space]: The flattened list of primitive Spaces. This list


This should be a super-flat space, i.e. a gym.Box(), right?

Nope, this corresponds to tree.flatten, which returns a list. The problem is that tree.flatten does not work on gym.Dict/Tuple, so I had to add this helper here. What you mean is "super-flatten", where everything really gets crunched into a single tensor. I should move our function for that purpose (currently called _flatten_action in episode.py) into space_utils as well and rename it "flatten_to_single_tensor" for clarity.

Ok, I guess I'm a bit confused why we need this intermediate "list of gym spaces", but maybe the next PR will make it clear.
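To make the distinction in this thread concrete: a tree.flatten-style flatten returns a list of leaves, while a "super-flatten" concatenates all leaves into one vector. A sketch assuming NumPy; flatten_struct and flatten_to_single_vector are hypothetical stand-ins (the latter for the proposed flatten_to_single_tensor), and casting discrete leaves to float rather than one-hot encoding them is a simplifying assumption:

```python
import numpy as np


def flatten_struct(struct):
    """tree.flatten-style: nested dict/tuple/list -> flat list of leaves."""
    if isinstance(struct, dict):
        # Sorted keys, as dm-tree does for dicts.
        return [x for k in sorted(struct) for x in flatten_struct(struct[k])]
    if isinstance(struct, (tuple, list)):
        return [x for v in struct for x in flatten_struct(v)]
    return [struct]


def flatten_to_single_vector(struct):
    """Super-flatten: concatenate all leaves into one 1-D float32 array."""
    return np.concatenate([np.asarray(x, dtype=np.float32).reshape(-1)
                           for x in flatten_struct(struct)])


action = {"move": (1, np.array([0.5, -0.5])), "fire": 0}
print(len(flatten_struct(action)))  # 3 leaves: fire, move[0], move[1]
print(flatten_to_single_vector(action).shape)  # (4,)
```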

AmplabJenkins · 2020-04-21T12:09:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25012/

AmplabJenkins · 2020-04-22T11:19:04Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25040/

AmplabJenkins · 2020-04-22T12:52:55Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25041/

AmplabJenkins · 2020-04-22T14:16:06Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25043/

ericl · 2020-04-22T21:53:43Z

rllib/models/torch/torch_action_dist.py

+            child_distributions (any[torch.Tensor]): Any struct
+                that contains the child distribution classes to use to
+                instantiate the child distributions from `inputs`. This could
+                be an already flattened list or a struct according to


Why allow it to be either a flat list or struct and not always one?

ericl

Some comments on the torch interface, but otherwise lgtm.

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

mehrdadn and others added 30 commits March 14, 2020 12:44

[Java] Allow passing internal config from raylet to Java worker (ray-…

6e24036

…project#7532)

[GCS]Add job id when operating gcs table (ray-project#7592)

4bf6c14

Blocking ray.get/wait inside async context will warn instead of error (…

84b2649

…ray-project#7262)

Disable Travis Disk Cache (ray-project#7612)

ab605cb

There are some file sizes and memory issue with bazel disk cache we will disable the cache and use remote cache exclusively for now

Cleanup Plasma Async Callback (ray-project#7452)

1a3f69a

[rllib][tune] fix some nans (ray-project#7611)

9a42b5f

Change /tmp to platform-specific temporary directory (ray-project#7529)

d7796ca

[Serve] UI Improvements (ray-project#7569)

411a380

bugfix about test_dynres.py (ray-project#7615)

abe6f58

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

Java call Python actor method use actor.call (ray-project#7614)

a370a54

bug fix about useage of absl::flat_hash_map::erase and absl::flat_has…

dfbc7e9

…h_set::erase (ray-project#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

[Java] Make both RayActor and RayPyActor inheriting from `BaseAct…

1f8b693

…or` (ray-project#7462)

[Java] Fix the issue that the cached value in RayObject is serializ…

c8e6f5b

…ed (ray-project#7613)

Add failure tests to test_reference_counting (ray-project#7400)

0102b9e

Fix typo in asyncio documentation (ray-project#7602)

089e64c

First pass at ray memory command for memory debugging (ray-project#…

8e157f8

…7589)

[tune] add accessible trial_info (ray-project#7378)

263e4da

* add accessible trial_info * trial name and info * doc * fix gp * Update doc/source/tune-package-ref.rst * Apply suggestions from code review * fix * trial * fixtest * testfix

[core] Fix leak for subscribing to object dependencies in NodeManager (…

239e612

…ray-project#7630) * Fix GetDependencies * lint

[tune] Fix an example for _Brackets of async hyperband scheduler (ray…

41529bf

…-project#7538)

Fix test_raylet_pending_tasks test case failed (ray-project#7636)

e5a0c3f

[GCS]Tie lifecycle of gcs service and redis together (ray-project#7601)

91ebbf5

Set RayCluster as service owner (ray-project#7621)

a20cb91

[operator] Use headless service for head node (ray-project#7622)

b2d00d3

Remove duplicate jsonschema from setup.py (ray-project#7665)

63a79a5

Remove object store memory cap (ray-project#7654)

342f279

[core] Only drain references for non-actor workers on shutdown (ray-p…

17a178c

…roject#7668) * Only drain ref counter for non-actor tasks * Don't force kill actors that have gone out of scope

LINT.

2c2c243

Fix APEX test (make larger).

97e7f52

sven1977 added the tests-ok label Apr 20, 2020 (label description: "The tagger certifies test failures are unrelated and assumes personal liability.")

ericl reviewed Apr 20, 2020

janblumenkamp mentioned this pull request Apr 20, 2020

[rllib] Custom model for multi-agent environment: access to all states #7341

Closed

ericl reviewed Apr 20, 2020

sven1977 added 2 commits April 21, 2020 12:23

Fixes and LINT.

22fca49

Merge branch 'master' of https://github.com/ray-project/ray into nest…

61bc14a

…ed_action_spaces_1

sven1977 added 3 commits April 22, 2020 12:15

Increase learning tests timeouts (new test cases have been added).

02fbe61

Merge branch 'master' of https://github.com/ray-project/ray into nest…

e322c82

…ed_action_spaces_1

Fixes and LINT.

72003c8

Make SAC test larger.

3fc286c

ericl reviewed Apr 22, 2020

rllib/models/torch/torch_action_dist.py (outdated)

ericl reviewed Apr 22, 2020

ericl approved these changes Apr 22, 2020

Update rllib/models/torch/torch_action_dist.py

e63a262

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

sven1977 merged commit e9ee5c4 into ray-project:master Apr 23, 2020

sven1977 deleted the nested_action_spaces_1 branch August 21, 2020 07:39
