
[RLlib] Allow policies to be added/deleted on the fly. #16359

Merged · 24 commits · Jun 18, 2021

Conversation

sven1977 (Contributor) commented on Jun 10, 2021:

We currently do not support turn-based games in RLlib (e.g. user1 action -> env obs for user2 -> user2 action -> env obs for user1 -> user1 action, etc.). This PR adds that functionality to RLlib, along with:

  • An option to add/remove policies on the fly, as well as to change the policy_mapping_fn and the list of policies to train.
  • A (rudimentary) DeepMind open-spiel adapter for playing board/card-style multi-agent games that can be learned via self-play.
  • A connect-4 example script that performs simple self-play via custom callbacks (no league-based training yet; the main policy simply plays against the most recent previous version of itself).
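As an illustrative sketch (plain Python, not a runnable RLlib script), this is the general shape of a self-play policy mapping function like the one the connect-4 example describes. The policy ids "main" and "main_v0" and the agent ids "player_0"/"player_1" are hypothetical names, not taken from the PR:

```python
# Hypothetical sketch of a self-play policy mapping: one agent is routed to
# the learning policy ("main"), the other to a frozen earlier snapshot.
def make_self_play_mapping_fn(opponent_policy_id="main_v0"):
    def policy_mapping_fn(agent_id, episode=None, **kwargs):
        # In a two-player turn-based game, agent ids might be
        # "player_0" and "player_1".
        return "main" if agent_id == "player_0" else opponent_policy_id
    return policy_mapping_fn

mapping_fn = make_self_play_mapping_fn()
print(mapping_fn("player_0"))  # main
print(mapping_fn("player_1"))  # main_v0
```

Swapping `opponent_policy_id` over time (e.g. to a newer snapshot) is what a league-style or snapshot-based self-play callback would do.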

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 marked this pull request as ready for review June 14, 2021 08:07
@sven1977 sven1977 changed the title [WIP] [RLlib] Allow policies to be added/deleted on the fly. [RLlib] Allow policies to be added/deleted on the fly. Jun 14, 2021
    batch, {
        pid: lw.get_policy(pid)
        for pid in self.policies
    }, lw, self.num_sgd_iter,
    self.sgd_minibatch_size, [])
Contributor:

Did you try this for PPO? I don't think this line of code returns the right policies as intended.

@@ -457,9 +457,9 @@ def test_multi_agent_complex_spaces(self):
PGTFPolicy, DICT_SPACE, act_space,
{"model": {"custom_model": "dict_spy"}}),
},
-                "policy_mapping_fn": lambda a: {
+                "policy_mapping_fn": lambda agent_id, **k: {
Contributor:

Explicitly write out `**kwargs`.

Contributor Author:

Removed the `*` and renamed it to `**kwargs` in all tests/examples.
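A small sketch of why accepting `**kwargs` matters here: a user-supplied mapping function written against the old single-argument call breaks as soon as the caller starts passing extra keyword arguments (such as `episode`), while a `**kwargs` signature stays forward compatible. The function names below are illustrative, not from the PR:

```python
# Old-style mapping fn: only knows about agent_id.
def old_mapping_fn(agent_id):
    return "policy_" + agent_id

# New-style mapping fn: tolerates episode=..., worker=..., or any
# argument added by the caller later.
def new_mapping_fn(agent_id, **kwargs):
    return "policy_" + agent_id

try:
    old_mapping_fn("a1", episode=None)
except TypeError:
    print("old signature rejects the new call convention")

print(new_mapping_fn("a1", episode=None))  # policy_a1
```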

@PublicAPI
def add_policy(
self,
*,
Contributor:

Good idea to remove the `*` so users don't have to manually specify argument names.

@PublicAPI
def remove_policy(
self,
*,
Contributor:

Same as above.

@@ -313,7 +313,7 @@ def get_policy_configs_for_game(
action_spaces["Striker"], {}),
}

-    def policy_mapping_fn(agent_id):
+    def policy_mapping_fn(*, agent_id, episode, **kwargs):
Contributor:

Same here.

Contributor Author:

Yeah, good point, I'll remove it here.
I do like these `*`s for enforcing named arguments; it makes things much more explicit. But I agree that here it would be too much.
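For reference, this is what the bare `*` being debated does: it makes every parameter after it keyword-only, so positional calls are rejected. The `add_policy` below is a simplified stand-in, not the actual RLlib method signature:

```python
# Stand-in signature illustrating a bare "*": everything after it
# must be passed by name.
def add_policy(*, policy_id, policy_cls=None):
    return policy_id

print(add_policy(policy_id="main_v1"))  # main_v1

try:
    add_policy("main_v1")  # positional call is rejected
except TypeError:
    print("positional arguments not allowed")
```

The trade-off is exactly the one in the thread: call sites become more explicit, but every caller is forced to type out the argument names.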

if policies is not None:
deprecation_warning(old="policies")
if policy_mapping_fn is not None:
deprecation_warning(old="policy_mapping_fn")
Contributor:

This will need documentation changes, since the Sampler is a core part of RLlib.

Contributor Author:

Added comments explaining that we can use the `worker` arg, which is passed in as well, in place of all four deprecated args, e.g. policies = worker.policy_map or policy_mapping_fn = worker.policy_mapping_fn.
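A hedged sketch of the deprecation pattern being discussed (the `Worker` class, `sample` function, and `deprecation_warning` helper below are simplified stand-ins, not RLlib's actual implementations): deprecated arguments still work but emit a warning, and the worker's own attributes serve as the replacement source of truth.

```python
import warnings

def deprecation_warning(old, new):
    # Warn once per call site that `old` should be replaced by `new`.
    warnings.warn(f"`{old}` is deprecated; use `{new}` instead.",
                  DeprecationWarning, stacklevel=2)

class Worker:
    """Minimal stand-in holding the attributes the sampler can fall back to."""
    def __init__(self):
        self.policy_map = {"default_policy": object()}
        self.policy_mapping_fn = lambda agent_id, **kwargs: "default_policy"

def sample(worker, policies=None, policy_mapping_fn=None):
    # Deprecated args are honored but trigger a warning; otherwise we
    # fall back to the worker's attributes.
    if policies is not None:
        deprecation_warning(old="policies", new="worker.policy_map")
    if policy_mapping_fn is not None:
        deprecation_warning(old="policy_mapping_fn",
                            new="worker.policy_mapping_fn")
    policies = policies if policies is not None else worker.policy_map
    policy_mapping_fn = policy_mapping_fn or worker.policy_mapping_fn
    return policy_mapping_fn("agent_0")

w = Worker()
print(sample(w))  # default_policy
```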

@sven1977 sven1977 merged commit e78ec37 into ray-project:master Jun 18, 2021
@rusu24edward

@sven1977 I know this has already been merged, but I'm curious to know what you think about my reasoning here

@mgerstgrasser (Contributor)

@sven1977 I also have a quick follow-up question: it seems the agents in infos are still required to be a subset of those in observations. Is that intentional (does something break otherwise), or just an oversight? I have a couple of situations where I'd want to send infos for agents that aren't stepping, e.g. so a callback could grab data from there for logging.
