[RLlib] ConnectorV2: Enhance performance of add_n_batch_items() for already batched items. #43669
Conversation
…ector_v2_enhance_performance_of_add_n_batch_items
…batch_items Signed-off-by: Sven Mika <svenmika1977@gmail.com>
…_of_add_n_batch_items' into connector_v2_enhance_performance_of_add_n_batch_items
Signed-off-by: sven1977 <svenmika1977@gmail.com>
LGTM. Some remarks regarding performance and the _has_batched_items attribute.
-     sa_episode.get_observations(indices=ts)
-     for ts in range(len(sa_episode))
- ],
+ items_to_add=sa_episode.get_observations(slice(0, len(sa_episode))),
Yup, that looks cleaner and faster.
Yeah, thanks for mentioning this in the other PR! This helped me pinpoint the performance decrease this time.
data = tree.map_structure(
    # Expand on axis 0 (the to-be-time-dim) if item has not been batched
    # yet, otherwise axis=1 (the time-dim).
    lambda s: np.expand_dims(
expand_dims has an impact on the memory management of NumPy arrays. Would a reshape be possible here instead, or do we need to fill the new axis with new values? For example, do we turn a (32, 4) array into a (32, 1, 4) one or into a (32, max_seq_len, 4) one?
Ah, great point! I didn't know that. Maybe we can just replace it with a reshape. No, we don't expect nor want this axis to be >1. It's the "simple" time-axis=1 add for action-computing forward passes.
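For what it's worth, on a contiguous input both np.expand_dims and reshape return views on the same buffer, so neither copies data for the length-1 axis case discussed above. A quick sketch of the (32, 4) → (32, 1, 4) case:

```python
import numpy as np

x = np.zeros((32, 4), dtype=np.float32)

a = np.expand_dims(x, axis=1)  # -> shape (32, 1, 4)
b = x.reshape(32, 1, 4)        # same result via reshape

assert a.shape == b.shape == (32, 1, 4)
# On a contiguous input, both are views (no copy): the inserted axis has
# length 1, so no new values need to be filled in.
assert a.base is not None and b.base is not None
```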
rllib/utils/spaces/space_utils.py (outdated diff)
# Use __new__ to create a new instance of our subclass.
obj = np.asarray(input_array).view(cls)
# Set the _has_batch_dim property.
obj._has_batch_dim = True
Why do we need this attribute? As far as I can see, we only ever check whether an object is an instance of BatchedNdArray, never whether _has_batch_dim is set.
I think you are right. This being a separate class should be sufficient.
Another option would be to signal via the list in which we store these items that it is a list-to-be-concatenated (rather than a list-to-be-stacked). 🤔
That would remove the messiness of the BatchedNdArray approach, in which we normally ought to check whether all items in the list really are of that class (which we don't do right now!).
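For context, a minimal runnable sketch of the BatchedNdArray marker-subclass pattern shown in the diff, and of the isinstance-only check being discussed (details beyond the diff are assumed, not taken from the actual RLlib code):

```python
import numpy as np

class BatchedNdArray(np.ndarray):
    """Marker subclass: the array already carries a batch dimension."""

    def __new__(cls, input_array):
        # view() reinterprets the same buffer as our subclass (no copy).
        obj = np.asarray(input_array).view(cls)
        # The extra flag questioned above; in practice the type check
        # alone is what gets used.
        obj._has_batch_dim = True
        return obj

arr = BatchedNdArray(np.zeros((5, 4)))

# Only the type is ever checked, never _has_batch_dim:
assert isinstance(arr, BatchedNdArray)
assert not isinstance(np.zeros((5, 4)), BatchedNdArray)
```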
Removed.
Signed-off-by: sven1977 <svenmika1977@gmail.com>
ConnectorV2: Enhance performance of add_n_batch_items() for already batched items.

Why are these changes needed?

Previously, in ConnectorV2.add_n_items_to_batch, the method would split the (already batched) item into a list, add the list items individually to the batch, then re-batch. This is very expensive, especially for complex spaces.

As an example of how much speedup this change delivers, run the nested_action_spaces.py example script with and without this fix (use the --enable-new-api-stack --num-agents=0 options). The difference in running time is about 100s vs. 80s.
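The cost difference described above can be sketched in plain NumPy (illustrative only, not the actual ConnectorV2 code): splitting an already batched array into per-item rows and re-stacking touches every row individually, whereas keeping batched chunks whole and concatenating once is a single vectorized call.

```python
import numpy as np

batched = np.ones((64, 8), dtype=np.float32)  # an already batched item

# Old path (sketch): split into a list of rows, then re-batch.
items = [batched[i] for i in range(len(batched))]
rebatched = np.stack(items)  # 64 individual gathers + one stack

# New path (sketch): keep batched chunks whole, concatenate once.
merged = np.concatenate([batched, batched], axis=0)

assert rebatched.shape == (64, 8)
assert merged.shape == (128, 8)
```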
Related issue number

Checks

- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I was adding/changing a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.