[RLlib] Rename some items in episode APIs. #44669
Conversation
LGTM. The set of cached "begin" values appears not to be used.
self._hanging_extra_model_outputs_end = defaultdict(dict)
self._hanging_rewards_end = defaultdict(float)

# In case of a `cut()` or `slice()`, we also need to store the hanging actions,
Dumb question: Why do we need these extra buffers? Can't we just copy over the `_hanging_actions_end` to the successor? When we concatenate them again, `self` has these hanging actions, and the first observation in the successor episode for the agent will be used to store the now-valid transition.
Not a dumb question at all!
An example of where it is crucial to have both these caches:
obs = [
a0: 0, a1: 0 # ts=0
a0: 1 # ts=1
a0: 2 # ts=2
]
a1's action at ts=0 is now hanging, b/c there is no follow-up observation after its first one. Thus, when we `cut()` (or `slice()`), we put this hanging action in the "before" buffer:
successor = MAEps(
a0: 2, <- first observation
self._hanging_actions_before[a1] = [a1's hanging action]
)
Then we continue adding timesteps to the successor chunk (the one returned by `cut()`). Let's assume that in the successor, eventually, a new obs comes in for a1.
successor = MAEps(
a0: 2, <- first observation
a0: 3,
...
a0: 5, a1: 5
self._hanging_actions_before[a1] = [a1's hanging action]
)
At this point, we can't do anything with the hanging action (before), as it belongs to an observation (the original one of a1 at ts=0 in the previous chunk) that is NOT part of the successor anymore. It's long gone, so to speak. We thus simply leave this hanging action (before) as-is and ONLY use it to check against a preceding chunk when we do `concat_episodes()`.
Furthermore, assume that we continue the episode and there is another sequence of missing observations for a1, in which case we now also need to store the hanging action at the end, for the time when the next obs for a1 DOES come in. However, this hanging action at the end should not be confused with the hanging action at the beginning; they are from different timesteps and have nothing to do with each other.
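The two caches described above can be sketched with a minimal, hypothetical stand-in class. The attribute names mirror the PR's `_hanging_actions_end` / `_hanging_actions_before`, but the class and its `cut()` logic are illustrative only, not RLlib's implementation:

```python
from collections import defaultdict


class EpisodeChunk:
    """Hypothetical, minimal stand-in for MultiAgentEpisode's action caches."""

    def __init__(self):
        # Actions taken after an agent's most recent observation in THIS
        # chunk; they become valid transitions once the agent's next obs
        # arrives.
        self._hanging_actions_end = defaultdict(list)
        # Actions inherited from the predecessor chunk on a cut(); ONLY used
        # to validate against that predecessor during concatenation.
        self._hanging_actions_before = defaultdict(list)

    def cut(self):
        """Returns a successor chunk; end-hanging actions move to 'before'."""
        successor = EpisodeChunk()
        for agent_id, actions in self._hanging_actions_end.items():
            successor._hanging_actions_before[agent_id] = list(actions)
        return successor


ep = EpisodeChunk()
ep._hanging_actions_end["a1"].append("act_ts0")  # a1 acted, no follow-up obs.
succ = ep.cut()
# The inherited action sits in the 'before' cache of the successor ...
assert succ._hanging_actions_before["a1"] == ["act_ts0"]
# ... leaving the 'end' cache free for NEW hanging actions later on:
succ._hanging_actions_end["a1"].append("act_ts5")
```

Keeping the two caches separate is exactly what lets the "before" action survive untouched for the later concatenation check while new end-hanging actions accumulate.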
        # agents have at least one observation to look back to.
-        for agent_id, agent_actions in self._agent_buffered_actions.items():
+        for agent_id, agent_actions in self._hanging_actions_end.items():
            assert self.env_t_to_agent_t[agent_id].get(-1) == self.SKIP_ENV_TS_TAG
Is this always true? What if an agent stepped in the last timestep but did not receive an observation? Do we then denote this as no-step (`SKIP_ENV_TS_TAG`) and count the step at the timestep the missing observation comes in?
Correct.
As long as the agent does NOT receive a new obs, all steps get marked with `SKIP_ENV_TS_TAG`.
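A hedged sketch of this bookkeeping (the skip-tag value and the plain-list mapping are assumptions for illustration; RLlib's actual `env_t_to_agent_t` structure may differ):

```python
SKIP_ENV_TS_TAG = "S"  # hypothetical stand-in value for RLlib's skip marker

# Env-timestep -> agent-timestep mapping for a1: it observes at env ts=0
# (agent ts 0), then acts but receives no new observation at env ts=1 and 2.
env_t_to_agent_t = {"a1": [0, SKIP_ENV_TS_TAG, SKIP_ENV_TS_TAG]}

# While a1's action is still hanging, the LAST entry is the skip tag --
# the invariant that the assertion in the diff relies on:
assert env_t_to_agent_t["a1"][-1] == SKIP_ENV_TS_TAG

# Once a new obs for a1 comes in (say at env ts=3), the agent step is
# counted at that timestep:
env_t_to_agent_t["a1"].append(1)
assert env_t_to_agent_t["a1"][-1] == 1
```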
@@ -1547,7 +1557,7 @@ def get_state(self) -> Dict[str, Any]:
        Returns: A dictionary containing picklable data for a
            `MultiAgentEpisode`.
        """
-        # TODO (simon): Add the buffers.
+        # TODO (simon): Add the agent caches.
Shall we add the caches to the state?
Not yet. Do we need the states somewhere already?
When storing replay buffers?
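As a rough sketch of what adding the caches to the state could look like (hypothetical class and key names; not the actual RLlib `get_state()` implementation):

```python
import pickle
from collections import defaultdict


class EpisodeState:
    """Hypothetical holder for the hanging-reward caches."""

    def __init__(self):
        self._hanging_rewards_end = defaultdict(float)
        self._hanging_rewards_begin = defaultdict(float)

    def get_state(self):
        # Convert defaultdicts to plain dicts so the state stays picklable.
        return {
            "hanging_rewards_end": dict(self._hanging_rewards_end),
            "hanging_rewards_begin": dict(self._hanging_rewards_begin),
        }

    @classmethod
    def from_state(cls, state):
        ep = cls()
        ep._hanging_rewards_end.update(state["hanging_rewards_end"])
        ep._hanging_rewards_begin.update(state["hanging_rewards_begin"])
        return ep


ep = EpisodeState()
ep._hanging_rewards_end["a1"] = 0.5
# Round-trip through pickle, as a replay buffer might do:
restored = EpisodeState.from_state(pickle.loads(pickle.dumps(ep.get_state())))
assert restored._hanging_rewards_end["a1"] == 0.5
```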
    ) -> float:
        """Returns all-agent return.

        Args:
-            consider_buffer: Whether we should also consider
-                buffered rewards when calculating the overall return. Agents might
+            consider_hanging_rewards: Whether we should also consider
`include_hanging_rewards`?
Fixed in next PR.
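For illustration, a standalone sketch of the parameter's semantics (hypothetical free function; the real method lives on `MultiAgentEpisode` and reads its internal caches):

```python
def get_return(agent_returns, hanging_rewards_end, include_hanging_rewards=False):
    """Sums per-agent returns, optionally adding rewards that are still
    hanging (i.e., received after an agent's last observation)."""
    total = sum(agent_returns.values())
    if include_hanging_rewards:
        total += sum(hanging_rewards_end.values())
    return total


returns = {"a0": 3.0, "a1": 1.0}
hanging = {"a1": 0.5}  # a1 received a reward but no follow-up obs yet.
assert get_return(returns, hanging) == 4.0
assert get_return(returns, hanging, include_hanging_rewards=True) == 4.5
```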
Rename some items in episode APIs.

- `MultiAgentEpisode._agent_buffered_[actions/rewards/extra_model_outputs]` -> `_hanging_[actions/rewards/extra_model_outputs]_end`
- Added `_hanging_[actions/rewards/extra_model_outputs]_begin` to allow for more accurate cuts and slices of `MultiAgentEpisode`. Right now, when slicing through a sequence of, e.g., agent1 NOT receiving observations but already having acted/received rewards, these hanging actions/rewards should be included in the successor chunk (but at the beginning). Only this way can proper concatenation be guaranteed; otherwise, individual timesteps might get lost entirely.

Note that these properties are not actually used yet in this PR. A follow-up PR will fix the behavior of `MultiAgentEpisode`'s `cut()`, `slice()`, and getters.

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I have added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.