-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Implementation of MultiAgentEpisodeReplayBuffer
for the new API stack.
#44450
[RLlib] Implementation of MultiAgentEpisodeReplayBuffer
for the new API stack.
#44450
Conversation
…i_agent_episode_add_module_for_api
…i_agent_episode_add_slice_api
…d 'add' method. Fixed a minor bug in the 'MultiAgentEpisode' b/c the 'agent_module_ids' mapping was not assigned in the 'MultiAgentEpisode.__init__'. Furthermore, added some tests for the new buffer. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… either 'independent' or 'synchronized'. There appears to be an inconsistency between the action getters of the SingleAgentEpisode' and the 'MultiAgentEpisode' - the latter retrieves the timestep one earlier than observations. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
MultiAgentEpisodeReplayBuffer
for the new API stack.
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
rllib/utils/replay_buffers/multi_agent_episode_replay_buffer.py
Outdated
Show resolved
Hide resolved
|
||
# Stores indices of module (single-agent) timesteps. | ||
self._module_indices: Dict[str, List[Tuple[int]]] = defaultdict(list) | ||
# Stores for each module the episode indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same: improve typehint to DefaultDict[ModuleID, Dict[EpisodeID, int]]
? Is int
correct here or should this be a 4-tuple as well?
Let's also above declare what "index" means throughout this entire class:
Something like:
"In this class, we use a global indexing system, where an individual index is a 4-tuple, consisting of 1) ...2)... 3)... 4) ..."
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…i_agent_episode_add_slice_api
…api' into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… and 'SingleAgentEpisode'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ays1980/ray into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…a buffer. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ts for getters and counters. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ixed. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
rllib/BUILD
Outdated
@@ -1524,6 +1524,13 @@ py_test( | |||
srcs = ["utils/replay_buffers/tests/test_episode_replay_buffer.py"] | |||
) | |||
|
|||
oy_test( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> py_test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the fixes.
I understand we'll uncomment those tests for the new buffer once I have fixed the getter API of MultiAgentEpisode.
Signed-off-by: Sven Mika <sven@anyscale.io>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ays1980/ray into multi-agent-episode-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ts that depend on it. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… the test for synchronized sampling'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Why are these changes needed?
With the new API stack and the
EnvRunner API
came the necessity to define new replay buffers that can work completely on episode objects and stores the latter. This PR proposes the multi-agent version of theseEpisodeReplayBuffer
s.Sampling works in
independent
andsynchronized
replay mode and can use differentn-step
for each item.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.