Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Refine MultiAgentEpisode and add test cases. #40799

Merged
merged 52 commits into from
Nov 28, 2023

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Oct 30, 2023

Why are these changes needed?

The MultiAgentEpisode needs buffers to record action, reward, state, and extra_model_output in case an agent has acted at a timestep, but not received a next observation, yet. The appropriate logic for adding timesteps should be implemented here together with extensive testing to ensure in this complex area precise execution.

Related issue number

Clsoes #40746

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ization, timestep mapping and class data.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ule states will only be stored in the 'SingleAgentEpisode's.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…et_return' and '__len__'. Moved episode files into 'rllib/env'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…one. Furthermore moved 'SingleAgentEpisode' and 'MultiAgentEpisode' towards 'rllib/env'. I also added unit testing for 'SingleAgentEpisode'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…de into branch episode.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…AgentEpisode'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…r-mae

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… 'add_timestep()'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… and agents that terminate before stepping first time.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…le of functionalitites in 'MultiAgentEPisode'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ntense testing led to further changes in the 'MultiAgentEpisode', specifically as we do need for the successor results from 'get_observations()', 'get_infos', etc. in type 'List[MultiAgentDict]' and not 'MultiAgentDict'. Furthermore, 'terminateds' and 'truncateds' hat to be provided with a getter. Needs more testing.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…tEpisode's. Also added corresponding tests. Rnamed 'global_rewards' to 'partial_rewards'. This is an intermediate commit to have a safe state to return to.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…encies in the use of the ' partial_rewards'. Has to be fixed before stepping forward.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ere removed from 'SingleAgentEpisode by @sven1977. Added test for getters. Needed to change '_getattr_by_index' as using buffered actions is non-trivial. Had to add a 'global_actions_t' timestep_mapping for the actions as they could be buffered and the original timestep would get lost. Testing is not finished.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… of using buffered actions. Thereby refactored '_getattr_by_index()' extensively to be used more generically. Furthermore, wrote corresponding tests. Fixed a bug in the 'create_successor()' method. All tests run.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ards, but with receiving either 'MultiAgentDict' or 'List[MutliAgentDict]'. Made minor changes to '_IndexMapping' and '_getattr_by_index()'.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ded buffered rewards to the 'get_rewards' method.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…wards' is complete now and needs to be tested more.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e for some days now.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…le testing.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…t_episode()' to use the corresponding 'SingleAgentEpisode''s methods, to contain as initial observation in the successor always the last observation of an agent. Adjusted tests accordingly and added multiple new ones.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…method to 'MultiAgentEpisode' for immutable copying of buffers between episodes. Refined the test file.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…fied these functions to account for empty episodes and agents that are done.'

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 changed the title Testing multi agent episode [RLlib] Refine MultiAgentEpisode and add test cases. Nov 27, 2023
@sven1977 sven1977 marked this pull request as ready for review November 27, 2023 14:12
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for now. Let's merge this to make some progress here.
We will have to:

  • Rename the MAE APIs to: add_env_reset/add_env_step
  • Enhance the getter APIs to allow for more options, such as one_hot_discrete, fill, etc.. analogous to the upcoming SingleAgentEpisode enhancements.
  • Add the infinite lookback buffer functionality to MAE.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 merged commit 2ded325 into ray-project:master Nov 28, 2023
14 of 15 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants