
[RLlib] MultiAgentEpisode for Multi-Agent Reinforcement Learning with the new EnvRunner API. #40263

Merged
merged 24 commits into ray-project:master on Oct 30, 2023

Conversation

@simonsays1980 (Collaborator) commented Oct 11, 2023:

The MultiAgentEpisode should be used as a container to store all episode information of MultiAgentEnvs.

Specifically, it has to track which agent stepped at which environment step.
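
As an illustration of what such tracking could look like, here is a minimal, hypothetical sketch; the variable names are illustrative only and not the actual attributes introduced in this PR:

import collections

# For each agent ID, record the global environment timesteps at which that
# agent actually stepped (i.e., received an observation / produced an action).
agent_t_mapping = collections.defaultdict(list)

# Example rollout: agent_1 steps at env timesteps 0, 1, 2; agent_2 only at 0 and 2.
multi_agent_observations = [
    {"agent_1": 0.1, "agent_2": 0.5},
    {"agent_1": 0.2},
    {"agent_1": 0.3, "agent_2": 0.6},
]
for env_t, obs in enumerate(multi_agent_observations):
    for agent_id in obs:
        agent_t_mapping[agent_id].append(env_t)

print(dict(agent_t_mapping))  # {'agent_1': [0, 1, 2], 'agent_2': [0, 2]}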

Why are these changes needed?

With the change from RolloutWorker to EnvRunner, we switch from EpisodeV2 to SingleAgentEpisode and MultiAgentEpisode.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ization, timestep mapping and class data.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ule states will only be stored in the 'SingleAgentEpisode's.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…et_return' and '__len__'. Moved episode files into 'rllib/env'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@simonsays1980 changed the title from "Initialized MultiAgentEpisode." to "Implementing MultiAgentEpisode for Multi-Agent Reinforcement Learning with EnvRunner" on Oct 18, 2023
…one. Furthermore moved 'SingleAgentEpisode' and 'MultiAgentEpisode' towards 'rllib/env'. I also added unit testing for 'SingleAgentEpisode'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 changed the title from "Implementing MultiAgentEpisode for Multi-Agent Reinforcement Learning with EnvRunner" to "[RLlib] MultiAgentEpisode for Multi-Agent Reinforcement Learning with the new EnvRunner API." on Oct 24, 2023
@sven1977 sven1977 marked this pull request as ready for review October 24, 2023 13:10
@sven1977 sven1977 self-assigned this Oct 24, 2023
@sven1977 added the rllib, rllib-multi-agent, rllib-env, and rllib-newstack labels on Oct 24, 2023
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>

Note that in a multi-agent environment this does not necessarily
correspond to single agents having terminated or being truncated.

@sven1977 (Contributor) commented Oct 27, 2023:

Let's explain this here a little (what we discussed on Slack):

  • For MultiAgentEpisode: self.is_terminated should be True if all agents are terminated, and self.is_truncated should be True if all agents are truncated.
  • If one or more agents (but not all!) are terminated or truncated, MultiAgentEpisode.is_terminated and is_truncated should both be False.
  • The information about single agents' terminated/truncated states can always be retrieved from the SingleAgentEpisodes inside the multi-agent one.
  • If all agents are either terminated or truncated, but in a mixed fashion (some agents terminated, some truncated): this is currently undefined and could potentially be a problem (if a user really implemented a multi-agent env that behaves this way). My guess is that we should probably then set both is_terminated AND is_truncated in the MultiAgentEpisode to True. Question here: does this have practical relevance?
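
A minimal sketch of the flag logic described above, assuming per-agent terminated/truncated dicts as inputs; the helper name is hypothetical and not part of the actual MultiAgentEpisode API:

from typing import Dict

def derive_ma_episode_flags(
    terminateds: Dict[str, bool], truncateds: Dict[str, bool]
) -> Dict[str, bool]:
    # is_terminated: all agents terminated; is_truncated: all agents truncated.
    all_terminated = all(terminateds.values())
    all_truncated = all(truncateds.values())
    # Mixed case: every agent is done, but some terminated and some truncated.
    # Per the (still open) discussion above, tentatively set both flags to True.
    every_agent_done = all(
        terminateds[aid] or truncateds[aid] for aid in terminateds
    )
    mixed = every_agent_done and not all_terminated and not all_truncated
    return {
        "is_terminated": all_terminated or mixed,
        "is_truncated": all_truncated or mixed,
    }

# If only some agents are done, both episode-level flags stay False:
# derive_ma_episode_flags({"a": True, "b": False}, {"a": False, "b": False})
# -> {'is_terminated': False, 'is_truncated': False}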

@simonsays1980 (Collaborator, Author) replied:

I think it has, for postprocessing: if truncated, GAE for example uses bootstrapping (from the value estimate of the last observation $o_T$), while if terminated the value of $o_T$ is set to 0.0.
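
A minimal sketch of this distinction in a generic GAE computation (standard GAE formulas; the names and signature are illustrative, not RLlib's actual postprocessing code):

import numpy as np

def compute_gae(rewards, values, last_obs_value, terminated, gamma=0.99, lam=0.95):
    # Terminated episode: the value beyond the last step is exactly 0.0.
    # Truncated episode: bootstrap with the value estimate of the last observation.
    bootstrap_value = 0.0 if terminated else last_obs_value
    rewards = np.asarray(rewards, dtype=np.float32)
    values_ext = np.append(np.asarray(values, dtype=np.float32), bootstrap_value)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    advantages = np.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        gae = deltas[t] + gamma * lam * gae
        advantages[t] = gae
    return advantages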

id_: Optional[str] = None,
agent_episode_ids: Optional[Dict[str, str]] = None,
*,
observations: Optional[List[MultiAgentDict]] = None,
@sven1977 (Contributor) commented Oct 27, 2023:

I think what you said on slack is correct:

Simon:

But what about the case where the environment is not terminated and some agents terminate earlier on? E.g., at timestep 10, agent 1 terminates (because it fell down a cliff), while the other agents are still alive at timestep 100. Do we then need, in the __init__ of the MultiAgentEpisode, is_terminated and is_truncated as lists of MultiAgentDicts like [{...}, ..., {"agent_1": True, "agent_2": False, ...}, ...], such that we know which agent is still alive and where agent 1 died?

Sven:

Great point! Maybe we should NOT allow users to construct a MA Episode via observations and actions and ..., but rather by providing a bunch of agent IDs mapping to SingleAgentEpisodes?
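
A rough sketch of what such an alternative construction could look like; the class and parameter names below are a simplified stand-in for illustration, not the actual MultiAgentEpisode implementation:

import uuid
from typing import Dict, Optional

class SingleAgentEpisode:  # stand-in for RLlib's SingleAgentEpisode
    ...

class MultiAgentEpisodeSketch:
    def __init__(
        self,
        id_: Optional[str] = None,
        *,
        agent_episodes: Optional[Dict[str, SingleAgentEpisode]] = None,
    ):
        self.id_ = id_ or uuid.uuid4().hex
        # Each agent's observations, actions, and terminated/truncated state
        # live in its own SingleAgentEpisode; the multi-agent episode only
        # keeps the mapping from agent ID to that episode.
        self.agent_episodes = agent_episodes or {}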

@sven1977 (Contributor):

Just to keep this in mind: thus far, initializing a new episode (single- or multi-agent) with already existing data is an edge case that we have, to the best of my knowledge, never used in RLlib. So it's not a super urgent decision; just something to keep in the back of our heads.

@simonsays1980 (Collaborator, Author):

See my comment to this on slack. There might be a hurdle with MA episodes.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 (Contributor) left a comment:

LGTM now. Thanks for this PR, @simonsays1980! Let's wait for tests to pass, then we can merge this.

@sven1977 sven1977 merged commit 1ee167d into ray-project:master Oct 30, 2023
9 of 25 checks passed