[RLlib] MultiAgentEpisode
for Multi-Agent Reinforcement Learning with the new EnvRunner
API.
#40263
Conversation
…ization, timestep mapping and class data. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ule states will only be stored in the 'SingleAgentEpisode's. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…et_return' and '__len__'. Moved episode files into 'rllib/env'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
MultiAgentEpisode
for Multi-Agent Reinforcement Learning with EnvRunner
…one. Furthermore moved 'SingleAgentEpisode' and 'MultiAgentEpisode' towards 'rllib/env'. I also added unit testing for 'SingleAgentEpisode'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Note that in a multi-agent environment this does not necessarily
correspond to single agents having terminated or being truncated.
Let's explain this here a little (what we discussed on slack):
For MultiAgentEpisode: self.is_terminated should be True, if all agents are terminated
and self.is_truncated should be True if all agents are truncated.
If only one or more agents (but not all!) are terminated or truncated, the MultiAgentEpisode.is_terminated/is_truncated should both be False.
The information about single agents' terminated/truncated states can always be
retrieved from the SingleAgentEpisodes inside the MultiAgent one.
If all agents are done, but in a mixed fashion (some agents terminated, some truncated), the behavior is currently undefined and could potentially be
a problem (if a user really implemented a multi-agent env that behaves this way).
My guess is that we should probably then set both is_terminated AND
is_truncated in the MultiAgentEpisode to True. Question here:
does this have practical relevance?
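The semantics discussed above can be summarized in a small sketch. This is a hypothetical helper (the function name and signature are illustrative, not RLlib's actual implementation): the episode-level flag is True only if the corresponding per-agent flag is True for all agents, so mixed or partial cases leave both flags False.

```python
def multi_agent_flags(terminateds, truncateds):
    """Derive episode-level flags from per-agent flag dicts.

    Illustrative sketch of the semantics discussed in this thread:
    - is_terminated: True only if ALL agents are terminated.
    - is_truncated: True only if ALL agents are truncated.
    Mixed cases (some terminated, some truncated, some alive)
    yield (False, False) under this simple rule.
    """
    is_terminated = all(terminateds.values())
    is_truncated = all(truncateds.values())
    return is_terminated, is_truncated
```

Under this rule, the "mixed but all done" case raised above would indeed come out as (False, False), which is why setting both flags to True might be the safer convention.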
I think it does, for postprocessing: if the episode is truncated, GAE
for example bootstraps with the value estimate of the last observation,
while if it is terminated it uses a value of 0.0.
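The distinction can be made concrete with a short sketch. The function below is a hypothetical illustration (not RLlib's API) of how postprocessing would pick the final value used for return/advantage targets:

```python
def final_value(terminated, truncated, value_of_last_obs):
    """Illustrative sketch: final value used when computing
    return/advantage targets (e.g. in GAE) at episode end.

    - terminated: the MDP truly ended, so there are no future
      rewards and the final value is 0.0.
    - truncated: the episode was merely cut off (e.g. time limit),
      so we bootstrap with the value estimate of the last observation.
    """
    if terminated:
        return 0.0
    if truncated:
        return value_of_last_obs
    raise ValueError("Episode still running; no final value needed.")
```

This is why conflating the two flags would bias value targets: a truncated episode treated as terminated would wrongly zero out all expected future reward.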
id_: Optional[str] = None,
agent_episode_ids: Optional[Dict[str, str]] = None,
*,
observations: Optional[List[MultiAgentDict]] = None,
I think what you said on slack is correct:
Simon:
But what about the case where the environment is not terminated and some agents are terminated earlier on, e.g. at timestep 10 agent 1 terminated (because it fell down a cliff) while the other agents are still alive at timestep 100? Do we then need, in the __init__ of the MultiAgentEpisode, is_terminated and is_truncated as a list of MultiAgentDicts like [{...}, ..., {"agent_1": True, "agent_2": False, ...}, ...] such that we know which agent is still alive and where agent 1 died?
Sven:
Great point! Maybe we should NOT allow users to construct a MA Episode via observations and actions and ... But rather by providing a bunch of agentIDs mapping to SingleAgentEpisodes?
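Sven's suggestion can be sketched roughly as follows. The class and attribute names here are illustrative stubs, not the actual RLlib classes: instead of accepting flat observation/action lists, the multi-agent episode is constructed from a mapping of agent IDs to already-built per-agent episodes, so each agent's own terminated/truncated history stays unambiguous.

```python
class SingleAgentEpisodeStub:
    """Minimal stand-in for a per-agent episode."""
    def __init__(self, observations):
        self.observations = list(observations)
        self.is_terminated = False


class MultiAgentEpisodeStub:
    """Illustrative: build a multi-agent episode from a dict mapping
    agent IDs to per-agent episodes, rather than from raw
    observation/action lists."""
    def __init__(self, agent_episodes):
        self.agent_episodes = dict(agent_episodes)

    @property
    def is_terminated(self):
        # Episode-level flag derived from the per-agent episodes,
        # which remain the source of truth for each agent's status.
        return all(e.is_terminated for e in self.agent_episodes.values())
```

The design point is that the per-agent SingleAgentEpisodes carry the per-timestep detail, so the constructor never has to disambiguate which agent a given entry in a flat list belongs to.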
Just to keep this in mind: thus far, initializing a new episode (single- or multi-agent) with already existing data is an edge case that, to the best of my knowledge, we have never used in RLlib. So it's not a super urgent decision we need to make; just something to keep in the back of our heads.
See my comment to this on slack. There might be a hurdle with MA episodes.
LGTM now. Thanks for this PR, @simonsays1980 ! Let's wait for tests to pass, then we can merge this.
The MultiAgentEpisode should be used as a container to store all episode information of MultiAgentEnvs. Specifically, it has to track which agent stepped at which environment step.
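The "which agent stepped at which environment step" bookkeeping can be sketched as follows. This is an illustrative structure (class and attribute names are placeholders, not RLlib's implementation): for each agent we record the global env-step indices at which that agent received an observation.

```python
from collections import defaultdict


class AgentStepIndex:
    """Illustrative sketch: map each agent to the global environment
    steps at which it was stepped, as needed when agents act at
    different timesteps in a multi-agent env."""

    def __init__(self):
        self.env_steps_per_agent = defaultdict(list)
        self.t = 0  # global environment step counter

    def add_env_step(self, obs):
        """obs: dict mapping agent IDs to observations, containing
        only the agents that actually stepped at this env step."""
        for agent_id in obs:
            self.env_steps_per_agent[agent_id].append(self.t)
        self.t += 1
```

With such a mapping, an agent's local timestep k can be translated back to the global env step at which it occurred, which is exactly the lookup a MultiAgentEpisode container needs to provide.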
Why are these changes needed?
With the change from RolloutWorker to EnvRunner, we switch from EpisodeV2 to SingleAgentEpisode and MultiAgentEpisode.
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.