
[RLlib] MultiAgentEpisode: Add concat() API. #44622

Merged
merged 27 commits into ray-project:master on Apr 12, 2024

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Apr 10, 2024

MultiAgentEpisode: Add concat() API.

  • This API already exists for SingleAgentEpisode and should work analogously for MultiAgentEpisode.
  • It is mostly used by episode-based replay buffers, e.g. in DQN and SAC, to add incoming episode chunks (from an EnvRunner) to an already buffer-stored previous chunk of the same episode (see the sketch below).
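
As a rough illustration of that second point, here is a minimal sketch of how an episode-based buffer could use the new concat API when a later chunk of an already-stored MultiAgentEpisode arrives from an EnvRunner. The buffer dict, the `add_chunk` helper, and the `concat_episode`/`id_` names follow the SingleAgentEpisode convention and are assumptions here, not the PR's actual replay-buffer code.

```python
# Hypothetical sketch, not the PR's replay-buffer code: store incoming
# MultiAgentEpisode chunks by episode id and glue continuations onto the
# already-stored chunk. `concat_episode` and `id_` follow the
# SingleAgentEpisode naming and are assumed here for MultiAgentEpisode.
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

episodes_by_id: dict = {}  # episode id -> stored (possibly partial) episode


def add_chunk(chunk: MultiAgentEpisode) -> None:
    stored = episodes_by_id.get(chunk.id_)
    if stored is None:
        # First chunk of this episode: store it as-is.
        episodes_by_id[chunk.id_] = chunk
    else:
        # Continuation of a previously stored episode: concatenate the new
        # chunk onto the stored one (IDs and boundary timesteps must match).
        stored.concat_episode(chunk)
```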

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

…i_agent_episode_add_concat

# Conflicts:
#	rllib/env/multi_agent_episode.py
#	rllib/env/tests/test_multi_agent_episode.py
@sven1977 sven1977 marked this pull request as ready for review April 10, 2024 09:39
@sven1977 sven1977 closed this Apr 10, 2024
@sven1977 sven1977 deleted the multi_agent_episode_add_concat branch April 10, 2024 11:54
@sven1977 sven1977 restored the multi_agent_episode_add_concat branch April 10, 2024 11:56
@sven1977 sven1977 reopened this Apr 10, 2024
Collaborator

@simonsays1980 simonsays1980 left a comment

LGTM. Only thing missing: Tests for env_t_to_agent_t. We should add these.


In order for this to work, both chunks (`self` and `other`) must fit
together. This is checked by the IDs (must be identical), the time step counters
(`self.t` must be the same as `episode_chunk.t_started`), as well as the
Collaborator

Should this be self.env_t and self._env_t_started?

Contributor Author

done

In order for this to work, both chunks (`self` and `other`) must fit
together. This is checked by the IDs (must be identical), the time step counters
(`self.t` must be the same as `episode_chunk.t_started`), as well as the
observations/infos at the concatenation boundaries (`self.observations[-1]`
Collaborator

This is only true for SAEs (SingleAgentEpisodes), correct? In MAEs (MultiAgentEpisodes) we keep the observations in the SAEs, don't we?

Contributor Author

fixed

# Then store all agent data from the new episode chunk in self.
self.agent_episodes[agent_id] = other.agent_episodes[agent_id]
# Do not forget the env to agent timestep mapping.
self.env_t_to_agent_t[agent_id] = other.env_t_to_agent_t[agent_id]
Collaborator

Is env_t_to_agent_t already updated with the new env_t_started of the other, or do we have to add the other's starting point to env_t_to_agent_t here?

Contributor Author

They get concatenated as well. Note that - for now - env_t_to_agent_t always starts at 0, no matter what env_t_started is. It basically operates on a "simpler" time axis.
We might want to change this in the future (and use the true global env steps instead), but I'm not sure yet whether that might complicate things too much. After all, it's just an int offset that we are talking about here.
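
To make that concrete, here is a toy illustration of the 0-based local time axis. It is heavily simplified and assumed: plain Python lists and a plain "S" skip marker instead of RLlib's internal lookback buffer types.

```python
# Toy illustration only; the real env_t_to_agent_t uses RLlib-internal buffer
# types. This just shows the idea of the 0-based local env-time axis.
SKIP = "S"  # marker: the agent did not step at this env timestep

# Agent "a1" in the first chunk: acted at local env steps 0, 1, and 3.
env_t_to_agent_t_chunk_1 = [0, 1, SKIP, 2]

# The continuation chunk keeps counting the agent's own steps; its local
# env-time axis simply attaches to the end of the first chunk's axis.
env_t_to_agent_t_chunk_2 = [3, SKIP, 4]

# Concatenation extends the mapping (no env_t_started offset involved).
combined = env_t_to_agent_t_chunk_1 + env_t_to_agent_t_chunk_2
print(combined)  # -> [0, 1, 'S', 2, 3, 'S', 4]
```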

# If `self` has hanging agent values -> Add these to `other`'s agent
# SingleAgentEpisode (as a new timestep) and only then concatenate.
# Otherwise, the concatenation would fail b/c of missing data.
if agent_id in self._agent_buffered_actions:
Collaborator

Ah, great point! This was missing in the first implementation, but it's really important.

if agent_id in self._agent_buffered_actions:
assert agent_id in self._agent_buffered_extra_model_outputs
sa_episode.add_env_step(
observation=other.agent_episodes[agent_id].get_observations(0),
Collaborator

Is indices=0 the first "new" one from the other episode chunk for agents that were missing the next observation? The last observation in self is then in the other.observations' lookback buffer, correct?

Contributor Author

Nope, observations (and infos) always overlap by one ts (regardless of any lookback, so even if lookback=0)!

So for a multi-agent episode, if you do e.g. self.cut(), the returned successor's first observation (multi-agent dict) is identical to self's last observation (multi-agent dict). Same principle as for single-agent episodes.
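
A toy sketch of that overlap (plain lists of dummy multi-agent observation dicts, not actual RLlib episode objects):

```python
# Five dummy multi-agent observation dicts standing in for real observations.
obs = [{"a0": i, "a1": i} for i in range(5)]

# Cutting an episode after three env steps: the successor chunk starts with
# the predecessor's last observation, i.e. the chunks overlap by exactly 1 ts.
chunk_1_obs = [obs[0], obs[1], obs[2]]
chunk_2_obs = [obs[2], obs[3], obs[4]]

# This boundary condition is what a later concat of the two chunks checks.
assert chunk_1_obs[-1] == chunk_2_obs[0]
```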

Contributor Author

However, here we have to concatenate single-agent episodes (that sit inside a multi-agent one). For those single-agent episodes, in case agents are not always stepping with each env(!) step, they might "miss" this overlap. In this case, just for the purpose of concatenating the individual single-agent episodes, we have to "fix" this and add the overlap timestep to the first SA episode (so its observations overlap with the second SA episode's by 1 ts).
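
A toy sketch of that fix (again plain lists instead of the actual SingleAgentEpisode objects; the buffered-action handling shown in the diff above is omitted):

```python
# Agent "a1" stepped twice in the first chunk but did not step at the env
# step where the episode was cut, so its first chunk misses the overlap.
chunk_1_a1_obs = [10, 11]
chunk_2_a1_obs = [12, 13, 14]  # the agent's obs in the continuation chunk

# Without the fix, the boundary check would fail:
assert chunk_1_a1_obs[-1] != chunk_2_a1_obs[0]

# Fix: add the continuation's first observation (together with the buffered
# action, not shown here) as one artificial timestep to the first chunk ...
chunk_1_a1_obs.append(chunk_2_a1_obs[0])

# ... so the 1-ts overlap is restored and the two chunks can be concatenated.
assert chunk_1_a1_obs[-1] == chunk_2_a1_obs[0]
```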


# Validate.
self.validate()

Collaborator

AH, I have waited so long for this. Can't wait to try out the replay buffer now :)

Contributor Author

Yeah, hang in there. Almost there :)

)
check(episode_1.agent_buffers["agent_4"]["actions"].queue[0], buffered_action)
check(episode_1._agent_buffered_rewards, {"a1": 6.0})
check((a0.is_done, a1.is_done), (False, False))
Collaborator

We should also test for the correctness of env_t_to_agent_t here.

Contributor Author

yes, let me add tests for this as well.

Collaborator

Then hopefully we can merge and take this over into the MAERB PR.

Signed-off-by: sven1977 <svenmika1977@gmail.com>

…i_agent_episode_add_concat

# Conflicts:
#	rllib/env/multi_agent_episode.py
#	rllib/env/tests/test_multi_agent_episode.py
@sven1977 sven1977 merged commit a37bb30 into ray-project:master Apr 12, 2024
5 checks passed
@sven1977 sven1977 deleted the multi_agent_episode_add_concat branch April 12, 2024 11:58
harborn pushed a commit to harborn/ray that referenced this pull request Apr 18, 2024
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024