Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Nested action space PR (minimally invasive; torch only + test). #8101

Merged
merged 4,147 commits into from
Apr 23, 2020

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Apr 20, 2020

RLlib currently does not support arbitrarily nested action spaces. Only depth=1 Tuples are supported (no Dicts either). For PyTorch, not even Tuples are allowed.
This PR introduces the first step toward fixing this restriction:

  • Adds TorchMultiActionDistribution class.
  • Adds framework-agnostic test cases for TorchMultiActionDistribution.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

mehrdadn and others added 30 commits March 14, 2020 12:44
* Fix cyclic dependency

Headers in ray/util should not depend on those in ray/common

* Move random generations to ray/common/test_util.h

* Add license header

Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
* adding directory and node_provider entry for azure autoscaler

* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating

* adding todos and switching to auth file for service principal authentication

* adding role / scope to service principal

* resolving issues with app credentials

* adding retry for setting service principal role

* typo and adding retry to nic creation

* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing

* linting

* updating cleanup and fixing bugs

* adding directory and node_provider entry for azure autoscaler

* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating

* adding todos and switching to auth file for service principal authentication

* adding role / scope to service principal

* resolving issues with app credentials

* adding retry for setting service principal role

* typo and adding retry to nic creation

* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing

* linting

* updating cleanup and fixing bugs

* minor fixes

* first working version :)

* added tag support

* added msi identity intermediate

* enable MSI through user managed identity

* updated schema

* extend yaml schema
remove service principal code
add re-use of managed user identity

* fix rg_id

* fix logging

* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)

* run linting

* updating yaml configs and formatting

* updating yaml configs and formatting

* typo in example config

* pulling default config from example-full

* resetting min, init worker prop

* adding docs for azure autoscaler and fixing status

* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment

* fix for default subscription in azure node provider

* vm dev image build

* minor change

* keeping example-full.yaml in autoscaler/azure, updating azure example config

* linting azure config

* extending retries on azure config

* lint

* support for internal ips, fix to azure docs, and new azure gpu example config

* linting

* Update python/ray/autoscaler/azure/node_provider.py

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* revert_this

* remove_schema

* updating configs and removing ssh keygen, tweak azure node provider terminate

* minor tweaks

Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
There are some file sizes and memory issue with bazel disk cache
we will disable the cache and use remote cache exclusively for now
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
…h_set::erase (ray-project#7633)

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* add accessible trial_info

* trial name and info

* doc

* fix
gp

* Update doc/source/tune-package-ref.rst

* Apply suggestions from code review

* fix

* trial

* fixtest

* testfix
* enable

* Turn on eager eviction

* Shorten tests and drain ReferenceCounter

* Don't force kill actor handles that have gone out of scope, lint

* Fix locks

* Cleanup Plasma Async Callback (ray-project#7452)

* [rllib][tune] fix some nans (ray-project#7611)

* Change /tmp to platform-specific temporary directory (ray-project#7529)

* [Serve] UI Improvements (ray-project#7569)

* bugfix about test_dynres.py (ray-project#7615)

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

* Java call Python actor method use actor.call (ray-project#7614)

* bug fix about useage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (ray-project#7633)

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

* [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (ray-project#7462)

* [Java] Fix the issue that the cached value in `RayObject` is serialized (ray-project#7613)

* Add failure tests to test_reference_counting (ray-project#7400)

* Fix typo in asyncio documentation (ray-project#7602)

* Fix segfault

* debug

* Force kill actor

* Fix test
…roject#7668)

* Only drain ref counter for non-actor tasks

* Don't force kill actors that have gone out of scope
* Windows compatibility bug fixes

* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets

* Clean up some TODOs

* Fix duplicate compilations

* RedisAsioClient boost::asio::error::connection_reset

Co-authored-by: Mehrdad <noreply@github.com>
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24961/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24958/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24970/
Test PASSed.

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Apr 20, 2020
action_space_struct=None):

if not isinstance(inputs, torch.Tensor):
inputs = torch.Tensor(inputs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work if the inputs are of different shapes? Does torch still combine them into a tensor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputs is currently (and I guess it should stay this way) always a single ("super-flat") tensor.
Hence we need input_lens (to split this tensor by).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think the documentation should be updated here then (it currently claims inputs is a "Tensor list").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work in both cases now: 1) simple list of child-distributions + nested action space 2) nested struct of child distributions + equally nested action space.

from gym.spaces import Tuple, Dict


def flatten_space(space):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we do, unfortunately. gym.Spaces are not flattened by tree b/c a gym.Dict is not an actual dict, gym.Tuple is not an actual tuple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I guess this is currently done implicitly by the dict/tuple flattening preprocessors for observations.

Args:
space (gym.Space): The Space to get the python struct for.

Returns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example in the documentation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -324,6 +325,51 @@ def _unsquash(self, values):
return unsquashed


class Beta(TFActionDistribution):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm what do we use the beta distribution for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative for SquashedGaussian in PPO or SAC. For double-bounded action spaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, seems like it should be added in a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



class TorchMultiActionDistribution(TorchDistributionWrapper):
"""Action distribution that operates for list of (flattened) actions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be flat any more right? So one of the child distributions could be in turn a MultiActionDist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I'll fix the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

child_distributions,
input_lens,
action_space,
action_space_struct=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the struct can always be computed from the action space, why pass both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -324,6 +325,51 @@ def _unsquash(self, values):
return unsquashed


class Beta(TFActionDistribution):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, seems like it should be added in a separate PR

action_space_struct=None):

if not isinstance(inputs, torch.Tensor):
inputs = torch.Tensor(inputs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think the documentation should be updated here then (it currently claims inputs is a "Tensor list").

from gym.spaces import Tuple, Dict


def flatten_space(space):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I guess this is currently done implicitly by the dict/tuple flattening preprocessors for observations.

supported type (including nested Tuples and Dicts).

Returns:
List[gym.Space]: The flattened list of primitive Spaces. This list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a super flat space gym.Box() right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this corresponds to tree.flatten, which returns a list. The problem is that tree.flatten does not work on gym.Dict/Tuple, so I had to add this helper here. What you mean is "super-flatten", where everything really gets crunched into a single tensor. I should move our function for that purpose (currently called _flatten_action in episode.py) into space_utils as well and rename it "flatten_to_single_tensor" for clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess I'm a bit confused why we need this intermediate "list of gym spaces", but maybe the next PR will make it clear.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25012/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25040/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25041/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25043/
Test FAILed.

child_distributions (any[torch.Tensor]): Any struct
that contains the child distribution classes to use to
instantiate the child distributions from `inputs`. This could
be an already flattened list or a struct according to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allow it to be either a flat list or struct and not always one?

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the torch interface, but otherwise lgtm.

Co-Authored-By: Eric Liang <ekhliang@gmail.com>
@sven1977 sven1977 merged commit e9ee5c4 into ray-project:master Apr 23, 2020
@sven1977 sven1977 deleted the nested_action_spaces_1 branch August 21, 2020 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.