[rllib] observation function api for multi-agent #8236

ericl · 2020-04-30T02:07:42Z

Why are these changes needed?

This adds an observation function that sits between multi-agent envs and policies. It can handle communication or data sharing between agents' local observations, or merge global observations into local observations.

Before: MultiAgentEnv -> policies
After: MultiAgentEnv -> obs_func -> policies

In the future, the obs func could also be made differentiable to enable shared computation in multi-agent settings.

Lightly documented; will add more documentation as we iterate on the API.
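
As a rough illustration of the data flow described above, here is a minimal sketch of a function-style observer that merges a global observation into each agent's local observation. It mirrors the docstring example in `observation_function.py`; the name `merge_global_state` and the convention that the env reports global state under a `"global_state"` key are assumptions for illustration, and the extra arguments RLlib passes are simply swallowed by `**kw`.

```python
from typing import Any, Dict


def merge_global_state(agent_obs: Dict[str, Any], **kw) -> Dict[str, Any]:
    """Pair every agent's local obs with the shared global state.

    Assumes (for illustration only) that the env reports the global
    state under the "global_state" key alongside the per-agent entries.
    """
    global_state = agent_obs["global_state"]
    return {
        agent_id: [obs, global_state]
        for agent_id, obs in agent_obs.items()
        if agent_id != "global_state"  # drop the pseudo-agent entry
    }


if __name__ == "__main__":
    print(merge_global_state({"a": 1, "b": 2, "global_state": 101}))
```

Policies would then see the merged per-agent entries rather than the raw env output.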

AmplabJenkins · 2020-04-30T02:08:47Z

Can one of the admins verify this patch?

AmplabJenkins · 2020-04-30T03:22:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25363/

AmplabJenkins · 2020-04-30T03:37:28Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25364/

richardliaw · 2020-04-30T03:57:18Z

rllib/evaluation/observation_function.py

+            >>> # Observer that merges global state into individual obs. It is
+            ... # rewriting the discrete obs into a tuple with global state.
+            >>> example_obs_fn1({"a": 1, "b": 2, "global_state": 101}, ...)
+            {"a": [1, 101], "b": [2, 101]}
+
+            >>> # Observer for e.g., custom centralized critic model. It is
+            ... # rewriting the discrete obs into a dict with more data.
+            >>> example_obs_fn2({"a": 1, "b": 2}, ...)
+            {"a": {"self": 1, "other": 2}, "b": {"self": 2, "other": 1}}


meta (tip): .. code-block:: python is easier to type and renders equally well.
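
For reference, a sketch of what the second docstring example might look like with the `.. code-block:: python` directive instead of doctest prompts. The function body is an assumption (it hard-codes two agents named "a" and "b") and is only there so the snippet runs on its own.

```python
def example_obs_fn2(agent_obs, **kw):
    """Rewrite each discrete obs into a dict with more data.

    .. code-block:: python

        example_obs_fn2({"a": 1, "b": 2})
        # {"a": {"self": 1, "other": 2}, "b": {"self": 2, "other": 1}}
    """
    # Illustrative body: assumes exactly two agents keyed "a" and "b".
    a, b = agent_obs["a"], agent_obs["b"]
    return {
        "a": {"self": a, "other": b},
        "b": {"self": b, "other": a},
    }
```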

richardliaw · 2020-04-30T03:58:00Z

rllib/evaluation/observation_function.py

+        TODO(ekl): enable batch processing.
+
+        Args:
+            agent_obs (dict): Dictionary of default observations from the


tip - if you put Dict[AgentID, TensorType], readthedocs automatically hyperlinks it.
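
A sketch of how the `Args` entry could read with that type spelled out. `AgentID` and `TensorType` are defined here as local stand-ins so the snippet is self-contained; where they would actually be imported from is not shown in this thread, and the function name `observe` is hypothetical.

```python
from typing import Any, Dict

# Stand-in aliases for illustration only; not imported from RLlib.
AgentID = Any
TensorType = Any


def observe(agent_obs: Dict[AgentID, TensorType],
            **kw) -> Dict[AgentID, TensorType]:
    """Return the per-agent observations unchanged.

    Args:
        agent_obs (Dict[AgentID, TensorType]): Per-agent observations,
            keyed by agent ID.

    Returns:
        Dict[AgentID, TensorType]: New observations, keyed by agent ID.
    """
    # Identity observer: copy the dict so callers cannot mutate the input.
    return dict(agent_obs)
```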

richardliaw · 2020-04-30T03:59:52Z

rllib/examples/centralized_critic_2.py

+def central_critic_observer(agent_obs, **kw):
+    """Rewrites the agent obs to include opponent data for training."""

-    to_update = info["post_batch"][SampleBatch.CUR_OBS]
-    my_id = info["agent_id"]
-    other_id = 1 if my_id == 0 else 0
-    action_encoder = ModelCatalog.get_preprocessor_for_space(Discrete(2))
-
-    # set the opponent actions into the observation
-    _, opponent_batch = info["all_pre_batches"][other_id]
-    opponent_actions = np.array([
-        action_encoder.transform(a)
-        for a in opponent_batch[SampleBatch.ACTIONS]
-    ])
-    to_update[:, -2:] = opponent_actions
+    new_obs = {
+        0: {
+            "own_obs": agent_obs[0],
+            "opponent_obs": agent_obs[1],
+            "opponent_action": 0,  # filled in by FillInActions
+        },
+        1: {
+            "own_obs": agent_obs[1],
+            "opponent_obs": agent_obs[0],
+            "opponent_action": 0,  # filled in by FillInActions
+        },
+    }
+    return new_obs



would this look different if you subclass the ObservationFunction?

Probably just move it into `__call__`.
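
A sketch of the class-based variant, for comparison. The `ObservationFunction` base class below is a local stand-in whose `__call__(agent_obs, **kw)` signature is an assumption, not the definition added in this PR; the body is the same dict rewrite as `central_critic_observer`.

```python
class ObservationFunction:
    """Stand-in for the interface (assumed signature, illustration only)."""

    def __call__(self, agent_obs, **kw):
        # Default behavior: pass observations through unchanged.
        return agent_obs


class CentralCriticObserver(ObservationFunction):
    """Rewrites the agent obs to include opponent data for training."""

    def __call__(self, agent_obs, **kw):
        return {
            0: {
                "own_obs": agent_obs[0],
                "opponent_obs": agent_obs[1],
                "opponent_action": 0,  # filled in by FillInActions
            },
            1: {
                "own_obs": agent_obs[1],
                "opponent_obs": agent_obs[0],
                "opponent_action": 0,  # filled in by FillInActions
            },
        }
```

An instance is then callable in the same way as the plain function, e.g. `CentralCriticObserver()({0: 3, 1: 7})`.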

richardliaw · 2020-04-30T04:00:42Z

rllib/examples/centralized_critic_2.py

@@ -1,9 +1,9 @@
-"""An example of implementing a centralized critic by modifying the env.
+"""An example of implementing a centralized critic with ObservationFunction.


user may ask, what is "ObservationFunction" given you don't use it here?

It's the interface definition. You can extend it or not; I don't know how to document a function signature otherwise, though.

AmplabJenkins · 2020-04-30T04:36:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25365/

ericl added 2 commits April 29, 2020 19:05

add obs func api

a2edbe9

update doc

995fa71

ericl added 8 commits April 29, 2020 19:09

update

8e50cc1

experimental

208fad2

stray change

62133fe

update

0ceec82

use func only

94fc04e

typo

82239ab

update

fa7c2a6

update examples

1accf60

richardliaw reviewed Apr 30, 2020

ericl added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") May 1, 2020

ericl assigned sven1977 and richardliaw May 1, 2020

ericl changed the title ~~[rllib] [RFC] observation function api for multi-agent~~ [rllib] observation function api for multi-agent May 4, 2020

richardliaw approved these changes May 5, 2020

ericl merged commit f48da50 into ray-project:master May 5, 2020

janblumenkamp mentioned this pull request May 28, 2020

[rllib] Custom model for multi-agent environment: access to all states #7341

Closed