[data] Removing unnecessary data copy in convert_udf_returns_to_numpy #39188

raulchen · 2023-08-31T23:39:04Z

Why are these changes needed?

Removing an unnecessary data copy in convert_udf_returns_to_numpy improves the performance of image_loader_microbenchmark.py by ~10%.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Hao Chen <chenh1024@gmail.com>

c21

Thanks!

c21 · 2023-08-31T23:47:16Z

python/ray/air/util/tensor_extensions/arrow.py

@@ -488,7 +488,10 @@ def _concat_same_type(
                [e for a in to_concat for e in a]
            )
        else:
-            storage = pa.concat_arrays([c.storage for c in to_concat])
+            if len(to_concat) == 1:
+                storage = to_concat[0].storage


Can we add a comment explaining why this is needed, for future readers?

It turns out that the PyArrow Block copy code path expects this function to copy data. I reverted this change in this PR, as it doesn't impact the final benchmark perf.
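The point above, that a caller can depend on a concatenation helper returning a copy, can be illustrated with a small stand-in. This is only a sketch using the standard-library `array` module rather than PyArrow; `concat_copy` and `concat_fast_path` are hypothetical names, not functions from Ray or Arrow, and mutation is used here purely to make the aliasing visible.

```python
from array import array


def concat_copy(chunks):
    """Hypothetical helper: always builds a new buffer, even for one chunk."""
    out = array("d")
    for chunk in chunks:
        out.extend(chunk)
    return out


def concat_fast_path(chunks):
    """Hypothetical helper: returns the lone chunk itself, skipping the copy."""
    if len(chunks) == 1:
        return chunks[0]
    return concat_copy(chunks)


source = array("d", [1.0, 2.0, 3.0])
copied = concat_copy([source])        # independent buffer
aliased = concat_fast_path([source])  # same object as `source`

# Change the original after "concatenating" to expose the difference.
source[0] = 99.0

copied_first = copied[0]    # still the old value: the copy is independent
aliased_first = aliased[0]  # follows the change: no copy was made
```

A fast path like this saves work, but any code path that treats the result as a fresh copy would silently get the original storage back, which is presumably why the change was reverted here.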


…ray-project#39188) --------- Signed-off-by: Hao Chen <chenh1024@gmail.com> Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>


raulchen added 2 commits August 31, 2023 16:37

optimize _concat_same_type

75b7581

Signed-off-by: Hao Chen <chenh1024@gmail.com>

optimize convert_udf_returns_to_numpy

12dc192

Signed-off-by: Hao Chen <chenh1024@gmail.com>

raulchen requested review from ericl, scv119, c21, amogkam, scottjlee and bveeramani as code owners August 31, 2023 23:39

comment

0cd3f5e

Signed-off-by: Hao Chen <chenh1024@gmail.com>

c21 approved these changes Aug 31, 2023

Revert "optimize _concat_same_type"

9cedc12

This reverts commit 75b7581. Signed-off-by: Hao Chen <chenh1024@gmail.com>

raulchen changed the title ~~[data] Removing unnecessary data copy~~ [data] Removing unnecessary data copy in convert_udf_returns_to_numpy Sep 1, 2023

raulchen merged commit a38eb53 into ray-project:master Sep 5, 2023
21 of 48 checks passed

raulchen deleted the reduce-data-copy branch September 5, 2023 18:29

harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023

[data] Removing unnecessary data copy in convert_udf_returns_to_numpy (…

1ad7174

…ray-project#39188) --------- Signed-off-by: Hao Chen <chenh1024@gmail.com>

vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

[data] Removing unnecessary data copy in convert_udf_returns_to_numpy (…

cee08d2

…ray-project#39188) --------- Signed-off-by: Hao Chen <chenh1024@gmail.com> Signed-off-by: Victor <vctr.y.m@example.com>
