[RLlib] MultiAgentEpisode
for Multi-Agent Reinforcement Learning with the new EnvRunner
API.
#40263
Conversation
…ization, timestep mapping and class data. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ule states will only be stored in the 'SingleAgentEpisode's. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…et_return' and '__len__'. Moved episode files into 'rllib/env'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
MultiAgentEpisode
for Multi-Agent Reinforcement Learning with EnvRunner
…one. Furthermore moved 'SingleAgentEpisode' and 'MultiAgentEpisode' towards 'rllib/env'. I also added unit testing for 'SingleAgentEpisode'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Note that in a multi-agent environment this does not necessarily
correspond to single agents having terminated or being truncated.
Let's explain this here a little (what we discussed on slack):
For MultiAgentEpisode: self.is_terminated should be True, if all agents are terminated
and self.is_truncated should be True if all agents are truncated.
If only one or more agents (but not all!) are terminated or truncated, the MultiAgentEpisode.is_terminated/is_truncated should both be False.
The information about single agents' terminated/truncated states can always be
retrieved from the SingleAgentEpisodes inside the MultiAgent one.
If all agents are done, but in a mixed fashion (some agents terminated, some truncated), the behavior is currently undefined and could potentially be
a problem (if a user really implemented a multi-agent env that behaves this way).
My guess is that we should probably then set both is_terminated AND
is_truncated in the MultiAgentEpisode to True. Question here:
does this have practical relevance?
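The semantics discussed above can be summarized in a small sketch. This is a hypothetical helper (the function name and signature are illustrative, not RLlib's actual implementation): the episode-level flag is True only if the corresponding per-agent flag is True for all agents, so mixed or partial cases leave both flags False.

```python
def multi_agent_flags(terminateds, truncateds):
    """Derive episode-level flags from per-agent flag dicts.

    Illustrative sketch of the semantics discussed in this thread:
    - is_terminated: True only if ALL agents are terminated.
    - is_truncated: True only if ALL agents are truncated.
    Mixed cases (some terminated, some truncated, some alive)
    yield (False, False) under this simple rule.
    """
    is_terminated = all(terminateds.values())
    is_truncated = all(truncateds.values())
    return is_terminated, is_truncated
```

Under this rule, the "mixed but all done" case raised above would indeed come out as (False, False), which is why setting both flags to True might be the safer convention.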
I think it does, for postprocessing: if the episode is truncated, GAE
for example bootstraps with the value estimate of the last observation,
while if it is terminated it uses a value of 0.0.
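The distinction can be made concrete with a short sketch. The function below is a hypothetical illustration (not RLlib's API) of how postprocessing would pick the final value used for return/advantage targets:

```python
def final_value(terminated, truncated, value_of_last_obs):
    """Illustrative sketch: final value used when computing
    return/advantage targets (e.g. in GAE) at episode end.

    - terminated: the MDP truly ended, so there are no future
      rewards and the final value is 0.0.
    - truncated: the episode was merely cut off (e.g. time limit),
      so we bootstrap with the value estimate of the last observation.
    """
    if terminated:
        return 0.0
    if truncated:
        return value_of_last_obs
    raise ValueError("Episode still running; no final value needed.")
```

This is why conflating the two flags would bias value targets: a truncated episode treated as terminated would wrongly zero out all expected future reward.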
id_: Optional[str] = None,
agent_episode_ids: Optional[Dict[str, str]] = None,
*,
observations: Optional[List[MultiAgentDict]] = None,
I think what you said on slack is correct:
Simon:
But what about the case where the environment is not terminated and some agents are terminated earlier on, e.g. at timestep 10 agent 1 terminated (because it fell down a cliff) while the other agents are still alive at timestep 100? Do we then need, in the __init__ of the MultiAgentEpisode, is_terminated and is_truncated as a list of MultiAgentDicts like [{...}, ..., {"agent_1": True, "agent_2": False, ...}, ...] such that we know which agent is still alive and where agent 1 died?
Sven:
Great point! Maybe we should NOT allow users to construct a MA Episode via observations and actions and ... But rather by providing a bunch of agentIDs mapping to SingleAgentEpisodes?
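Sven's suggestion can be sketched roughly as follows. The class and attribute names here are illustrative stubs, not the actual RLlib classes: instead of accepting flat observation/action lists, the multi-agent episode is constructed from a mapping of agent IDs to already-built per-agent episodes, so each agent's own terminated/truncated history stays unambiguous.

```python
class SingleAgentEpisodeStub:
    """Minimal stand-in for a per-agent episode."""
    def __init__(self, observations):
        self.observations = list(observations)
        self.is_terminated = False


class MultiAgentEpisodeStub:
    """Illustrative: build a multi-agent episode from a dict mapping
    agent IDs to per-agent episodes, rather than from raw
    observation/action lists."""
    def __init__(self, agent_episodes):
        self.agent_episodes = dict(agent_episodes)

    @property
    def is_terminated(self):
        # Episode-level flag derived from the per-agent episodes,
        # which remain the source of truth for each agent's status.
        return all(e.is_terminated for e in self.agent_episodes.values())
```

The design point is that the per-agent SingleAgentEpisodes carry the per-timestep detail, so the constructor never has to disambiguate which agent a given entry in a flat list belongs to.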
Just to keep this in mind: thus far, initializing a new episode (single- or multi-agent) with already existing data is an edge case that, to the best of my knowledge, we have never used in RLlib. So it's not a super urgent decision we need to make; just something to keep in the back of our heads.
See my comment to this on slack. There might be a hurdle with MA episodes.
LGTM now. Thanks for this PR, @simonsays1980 ! Let's wait for tests to pass, then we can merge this.
The MultiAgentEpisode should be used as a container to store all episode information of MultiAgentEnvs. Specifically, it has to track which agent stepped at which environment step.
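The "which agent stepped at which environment step" bookkeeping can be sketched as follows. This is an illustrative structure (class and attribute names are placeholders, not RLlib's implementation): for each agent we record the global env-step indices at which that agent received an observation.

```python
from collections import defaultdict


class AgentStepIndex:
    """Illustrative sketch: map each agent to the global environment
    steps at which it was stepped, as needed when agents act at
    different timesteps in a multi-agent env."""

    def __init__(self):
        self.env_steps_per_agent = defaultdict(list)
        self.t = 0  # global environment step counter

    def add_env_step(self, obs):
        """obs: dict mapping agent IDs to observations, containing
        only the agents that actually stepped at this env step."""
        for agent_id in obs:
            self.env_steps_per_agent[agent_id].append(self.t)
        self.t += 1
```

With such a mapping, an agent's local timestep k can be translated back to the global env step at which it occurred, which is exactly the lookup a MultiAgentEpisode container needs to provide.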
Why are these changes needed?
With the change from RolloutWorker to EnvRunner, we switch from EpisodeV2 to SingleAgentEpisode and MultiAgentEpisode.
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.