[RLlib] MultiAgentEpisode: Add concat() API. #44622
Conversation
LGTM. Only thing missing: tests for `env_t_to_agent_t`. We should add these.
rllib/env/multi_agent_episode.py (Outdated)
```
In order for this to work, both chunks (`self` and `other`) must fit
together. This is checked by the IDs (must be identical), the time step counters
(`self.t` must be the same as `episode_chunk.t_started`), as well as the
```
Should this be `self.env_t` and `self._env_t_started`?
Done.
rllib/env/multi_agent_episode.py
Outdated
In order for this to work, both chunks (`self` and `other`) must fit | ||
together. This is checked by the IDs (must be identical), the time step counters | ||
(`self.t` must be the same as `episode_chunk.t_started`), as well as the | ||
observations/infos at the concatenation boundaries (`self.observations[-1]` |
This is only true for SAEs, correct? In MAEs we keep the observations in the SAEs, don't we?
Fixed.
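To make the docstring's "fit" conditions concrete, here is a minimal sketch. The dict stand-ins and key names are assumptions for illustration only, not RLlib's actual attributes:

```python
# Toy sketch of the "fit" check the docstring describes (dict stand-ins
# and key names here are assumptions, not RLlib's actual attributes).

def chunks_fit(first, second):
    """Return True if `second` can be concatenated onto `first`."""
    return (
        # Both chunks must belong to the same episode.
        first["id_"] == second["id_"]
        # `first` must end exactly where `second` starts.
        and first["env_t"] == second["env_t_started"]
        # Observations must overlap by one timestep at the boundary.
        and first["observations"][-1] == second["observations"][0]
    )

a = {"id_": "ep1", "env_t": 3, "env_t_started": 0,
     "observations": ["o0", "o1", "o2", "o3"]}
b = {"id_": "ep1", "env_t": 5, "env_t_started": 3,
     "observations": ["o3", "o4", "o5"]}
print(chunks_fit(a, b))  # True
```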
```python
# Then store all agent data from the new episode chunk in self.
self.agent_episodes[agent_id] = other.agent_episodes[agent_id]
# Do not forget the env to agent timestep mapping.
self.env_t_to_agent_t[agent_id] = other.env_t_to_agent_t[agent_id]
```
Are the `env_t_to_agent_t` already updated with the new `env_t_started` of the `other`, or do we have to add the starting point of the `other` to `env_t_to_agent_t` here?
They get concatenated as well. Note that - for now - the `env_t_to_agent_t` mappings always start at 0, no matter what `env_t_started` is. They basically operate on a "simpler" time axis.

We might want to change this in the future (and use the true global env steps instead). But I'm not sure yet whether this would complicate things too much. After all, it's just an int offset that we are talking about here.
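That "int offset" idea can be sketched like this. The list-based representation (entry `i` is the agent's own step index at env step `i`, or `None` where the agent did not step) is an assumed illustration, not RLlib's actual implementation, and the one-env-step boundary overlap is ignored for simplicity:

```python
# Toy sketch: concatenating two per-agent env-step -> agent-step
# mappings that each start on their own local time axis at 0.

def concat_mapping(first, second):
    """Append `second` to `first`, shifting `second`'s agent-step
    indices by the number of agent steps already recorded in `first`."""
    offset = max((v for v in first if v is not None), default=-1) + 1
    return first + [None if v is None else v + offset for v in second]

first = [0, None, 1]   # agent stepped at env steps 0 and 2
second = [0, 1, None]  # second chunk's local axis also starts at 0
combined = concat_mapping(first, second)
print(combined)  # [0, None, 1, 2, 3, None]
```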
rllib/env/multi_agent_episode.py (Outdated)
```python
# If `self` has hanging agent values -> Add these to `other`'s agent
# SingleAgentEpisode (as a new timestep) and only then concatenate.
# Otherwise, the concatenation would fail b/c of missing data.
if agent_id in self._agent_buffered_actions:
```
Ah, great point! This was missing in the first implementation, but it is really important.
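The idea of flushing a "hanging" buffered action before concatenating can be sketched as follows. The dict-based structures and the helper name are assumptions for illustration, not RLlib's actual API:

```python
# Toy sketch: an agent took an action at the end of the first chunk,
# but its next observation only arrived in the second chunk. Before
# concatenating the per-agent episodes, complete that timestep so the
# two single-agent chunks overlap by one observation again.

def flush_buffered_action(sa_episode, buffered_actions, agent_id, next_obs):
    """Move a buffered action (if any) into the agent's episode,
    closing the step with the second chunk's first observation."""
    if agent_id in buffered_actions:
        sa_episode["actions"].append(buffered_actions.pop(agent_id))
        sa_episode["observations"].append(next_obs)
    return sa_episode

episode = {"observations": ["o0"], "actions": []}
buffered = {"a1": "act0"}
flush_buffered_action(episode, buffered, "a1", "o1")
print(episode)  # {'observations': ['o0', 'o1'], 'actions': ['act0']}
```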
```python
if agent_id in self._agent_buffered_actions:
    assert agent_id in self._agent_buffered_extra_model_outputs
    sa_episode.add_env_step(
        observation=other.agent_episodes[agent_id].get_observations(0),
```
Is `indices=0` the first "new" one from the `other` episode chunk for agents that were missing the next observation? The last observation in `self` is then in `other.observations`' lookback buffer, correct?
Nope, observations (and infos) always overlap by one ts (regardless of any lookback, so even if lookback=0)!

So for a multi-agent episode, if you do e.g. `self.cut()`, the returned successor's first observation (multi-agent dict) is identical to `self`'s last observation (multi-agent dict). Same principle as for single-agent episodes.
However, here, we have to concatenate single-agent episodes (that sit inside a multi-agent one). For those single-agent episodes, in case agents are not always stepping with each env(!) step, they might "miss" this overlap. In this case, just for the purpose of concatenating the individual single-agent episodes, we have to "fix" this and add the overlap timestep to the first SA episode (so its observations overlap with the second SA episode's by 1 ts).
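The one-timestep observation overlap discussed above can be illustrated with a minimal sketch. List-based observations and the function name are assumptions, not RLlib's actual code:

```python
# Toy sketch: two chunks only "fit" if the last observation of the
# first equals the first observation of the second; the duplicated
# boundary observation is dropped when concatenating.

def concat_observations(first_obs, second_obs):
    if first_obs[-1] != second_obs[0]:
        raise ValueError("chunks do not fit: boundary observations differ")
    return first_obs + second_obs[1:]

combined = concat_observations(["o0", "o1", "o2"], ["o2", "o3"])
print(combined)  # ['o0', 'o1', 'o2', 'o3']
```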
```python
# Validate.
self.validate()
```
Ah, I have waited so long for this. Can't wait to try out the replay buffer now :)
Yeah, hang in there. Almost there :)
```python
)
check(episode_1.agent_buffers["agent_4"]["actions"].queue[0], buffered_action)
check(episode_1._agent_buffered_rewards, {"a1": 6.0})
check((a0.is_done, a1.is_done), (False, False))
```
We should also test for the correctness of `env_t_to_agent_t` here.
Yes, let me add tests for this as well.
Then hopefully we can merge and take this over into the MAERB PR.