
[RLlib] Policies get/set_state fixes and enhancements. #16354

Merged: 8 commits, Jun 15, 2021

sven1977 (Contributor) commented on Jun 10, 2021

Policies currently do not return their exploration state from Policy.get_state(). This PR adds the exploration state to the return value of Policy.get_state(). Exploration.get_info() has been renamed to get_state() (backward compatible). A new Exploration.set_state() method has been added; Policy.set_state() uses it to restore the exploration state.
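The get/set_state round trip described above can be sketched with a small stand-alone example. This is not RLlib code: `ToyExploration`, `ToyPolicy`, the `exploration_state` dictionary key, and the epsilon-decay logic are all hypothetical names chosen for illustration. Only the method names get_state(), set_state(), and get_info() come from the PR description.

```python
import copy
import warnings


class ToyExploration:
    """Hypothetical exploration component with a decaying epsilon."""

    def __init__(self, initial_epsilon=1.0, decay=0.9):
        self.epsilon = initial_epsilon
        self.decay = decay
        self.timestep = 0

    def step(self):
        # Advance the schedule by one timestep.
        self.timestep += 1
        self.epsilon *= self.decay

    def get_state(self):
        # Everything needed to resume exploration where it left off.
        return {"epsilon": self.epsilon, "timestep": self.timestep}

    def set_state(self, state):
        self.epsilon = state["epsilon"]
        self.timestep = state["timestep"]

    def get_info(self):
        # Old name kept as an alias so existing callers keep working
        # (assumed here to be what "backward compatible" means).
        warnings.warn("get_info() is deprecated; use get_state()",
                      DeprecationWarning)
        return self.get_state()


class ToyPolicy:
    """Hypothetical policy whose state includes its exploration state."""

    def __init__(self):
        self.weights = {"w": [0.0, 0.0]}
        self.exploration = ToyExploration()

    def get_state(self):
        return {
            "weights": copy.deepcopy(self.weights),
            # Illustrative key name, not necessarily the one RLlib uses.
            "exploration_state": self.exploration.get_state(),
        }

    def set_state(self, state):
        self.weights = copy.deepcopy(state["weights"])
        # Tolerate snapshots taken before exploration state was saved.
        if "exploration_state" in state:
            self.exploration.set_state(state["exploration_state"])


# Take a snapshot of a policy after a few exploration steps and
# restore it into a fresh policy object.
trained = ToyPolicy()
for _ in range(3):
    trained.exploration.step()
snapshot = trained.get_state()

restored = ToyPolicy()
restored.set_state(snapshot)
```

Without the `exploration_state` entry, `restored` would start again at the initial epsilon and timestep 0, which is the gap in Policy.get_state() that this PR describes.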

This is in preparation for:

  • making policies addable to/deletable from a worker's policy_map in-flight
  • self-play with 100s of policy snapshots
  • league-based training

This may also fix:
#16065

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sven1977 added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Jun 14, 2021
sven1977 merged commit d0014cd into ray-project:master on Jun 15, 2021