
[RLlib] Allow policies to be added/deleted on the fly. #16359

Merged · 24 commits · Jun 18, 2021

Conversation

sven1977 (Contributor) commented on Jun 10, 2021:

We currently do not support turn-based games in RLlib (e.g. user1 action -> env obs for user2 -> user2 action -> env obs for user1 -> user1 action, etc.). This PR adds that functionality to RLlib, along with:

  • An option to add/remove policies on the fly, as well as to change the policy_mapping_fn and the list of policies to train.
  • A (rudimentary) DeepMind open-spiel adapter for playing board/card-style multi-agent games that can be learned via self-play.
  • A connect-4 example script that performs simple self-play via custom callbacks (no league-based training yet; the main policy simply plays against the most recent previous version of itself).
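As an illustrative sketch (plain Python, not a runnable RLlib script), this is the general shape of a self-play policy mapping function like the one the connect-4 example describes. The policy ids "main" and "main_v0" and the agent ids "player_0"/"player_1" are hypothetical names, not taken from the PR:

```python
# Hypothetical sketch of a self-play policy mapping: one agent is routed to
# the learning policy ("main"), the other to a frozen earlier snapshot.
def make_self_play_mapping_fn(opponent_policy_id="main_v0"):
    def policy_mapping_fn(agent_id, episode=None, **kwargs):
        # In a two-player turn-based game, agent ids might be
        # "player_0" and "player_1".
        return "main" if agent_id == "player_0" else opponent_policy_id
    return policy_mapping_fn

mapping_fn = make_self_play_mapping_fn()
print(mapping_fn("player_0"))  # main
print(mapping_fn("player_1"))  # main_v0
```

Swapping `opponent_policy_id` over time (e.g. to a newer snapshot) is what a league-style or snapshot-based self-play callback would do.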

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 marked this pull request as ready for review June 14, 2021 08:07
@sven1977 sven1977 changed the title [WIP] [RLlib] Allow policies to be added/deleted on the fly. [RLlib] Allow policies to be added/deleted on the fly. Jun 14, 2021
    batch, {
        pid: lw.get_policy(pid)
        for pid in self.policies
    }, lw, self.num_sgd_iter,
    self.sgd_minibatch_size, [])
Contributor:

Did you try this for PPO? I don't think this line of code returns the right policies as intended.

@@ -457,9 +457,9 @@ def test_multi_agent_complex_spaces(self):
PGTFPolicy, DICT_SPACE, act_space,
{"model": {"custom_model": "dict_spy"}}),
},
-                "policy_mapping_fn": lambda a: {
+                "policy_mapping_fn": lambda agent_id, **k: {
Contributor:

Explicitly write out `**kwargs`.

Contributor Author:

Removed the `*` and renamed it to `**kwargs` in all tests/examples.
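A small sketch of why accepting `**kwargs` matters here: a user-supplied mapping function written against the old single-argument call breaks as soon as the caller starts passing extra keyword arguments (such as `episode`), while a `**kwargs` signature stays forward compatible. The function names below are illustrative, not from the PR:

```python
# Old-style mapping fn: only knows about agent_id.
def old_mapping_fn(agent_id):
    return "policy_" + agent_id

# New-style mapping fn: tolerates episode=..., worker=..., or any
# argument added by the caller later.
def new_mapping_fn(agent_id, **kwargs):
    return "policy_" + agent_id

try:
    old_mapping_fn("a1", episode=None)
except TypeError:
    print("old signature rejects the new call convention")

print(new_mapping_fn("a1", episode=None))  # policy_a1
```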

@PublicAPI
def add_policy(
self,
*,
Contributor:

Good idea to remove the `*` so users don't have to manually specify argument names.

@PublicAPI
def remove_policy(
self,
*,
Contributor:

Same as above.

@@ -313,7 +313,7 @@ def get_policy_configs_for_game(
action_spaces["Striker"], {}),
}

-    def policy_mapping_fn(agent_id):
+    def policy_mapping_fn(*, agent_id, episode, **kwargs):
Contributor:

Same here.

Contributor Author:

Yeah, good point, I'll remove it here.
I do like these `*`s for enforcing named arguments; it makes things much more explicit. But I agree that here it would be too much.
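For reference, this is what the bare `*` being debated does: it makes every parameter after it keyword-only, so positional calls are rejected. The `add_policy` below is a simplified stand-in, not the actual RLlib method signature:

```python
# Stand-in signature illustrating a bare "*": everything after it
# must be passed by name.
def add_policy(*, policy_id, policy_cls=None):
    return policy_id

print(add_policy(policy_id="main_v1"))  # main_v1

try:
    add_policy("main_v1")  # positional call is rejected
except TypeError:
    print("positional arguments not allowed")
```

The trade-off is exactly the one in the thread: call sites become more explicit, but every caller is forced to type out the argument names.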

if policies is not None:
deprecation_warning(old="policies")
if policy_mapping_fn is not None:
deprecation_warning(old="policy_mapping_fn")
Contributor:

This will need documentation changes, since the Sampler is a core part of RLlib.

Contributor Author:

Added comments explaining that we can use the `worker` arg, which is passed in as well, in place of all four deprecated args, e.g. policies = worker.policy_map or policy_mapping_fn = worker.policy_mapping_fn.
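A hedged sketch of the deprecation pattern being discussed (the `Worker` class, `sample` function, and `deprecation_warning` helper below are simplified stand-ins, not RLlib's actual implementations): deprecated arguments still work but emit a warning, and the worker's own attributes serve as the replacement source of truth.

```python
import warnings

def deprecation_warning(old, new):
    # Warn once per call site that `old` should be replaced by `new`.
    warnings.warn(f"`{old}` is deprecated; use `{new}` instead.",
                  DeprecationWarning, stacklevel=2)

class Worker:
    """Minimal stand-in holding the attributes the sampler can fall back to."""
    def __init__(self):
        self.policy_map = {"default_policy": object()}
        self.policy_mapping_fn = lambda agent_id, **kwargs: "default_policy"

def sample(worker, policies=None, policy_mapping_fn=None):
    # Deprecated args are honored but trigger a warning; otherwise we
    # fall back to the worker's attributes.
    if policies is not None:
        deprecation_warning(old="policies", new="worker.policy_map")
    if policy_mapping_fn is not None:
        deprecation_warning(old="policy_mapping_fn",
                            new="worker.policy_mapping_fn")
    policies = policies if policies is not None else worker.policy_map
    policy_mapping_fn = policy_mapping_fn or worker.policy_mapping_fn
    return policy_mapping_fn("agent_0")

w = Worker()
print(sample(w))  # default_policy
```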

@sven1977 sven1977 merged commit e78ec37 into ray-project:master Jun 18, 2021
@rusu24edward

@sven1977 I know this has already been merged, but I'm curious to know what you think about my reasoning here

@mgerstgrasser (Contributor)

@sven1977 I also have a quick follow-up question: it seems the agents in infos are still required to be a subset of those in observations. Is that intentional (does something break otherwise), or just an oversight? I have a couple of situations where I'd want to send infos for agents that aren't stepping, e.g. so a callback could grab data from there for logging.
