[data] Optimization to reduce ArrowBlock building time for blocks of size 1 #38833 #38988

stephanie-wang · 2023-08-28T15:46:02Z

Many Data operations depend on converting NumPy batches to Arrow blocks. Converting a single NumPy array to pyarrow is normally zero-copy, but a block with multiple rows needs a copy to combine the column of NumPy arrays into one contiguous ndarray. This PR skips that copy for single-row blocks by using np.expand_dims to reshape the array instead.
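
As a rough illustration of the idea (not the code in this PR), the sketch below uses only NumPy. The helper name `column_to_ndarray` is hypothetical, and the use of np.stack for the multi-row path is an assumption; the description only says that a copy is needed there. It shows that np.expand_dims returns a view sharing memory with the original array, while combining several row arrays allocates a new buffer.

```python
import numpy as np


def column_to_ndarray(rows):
    """Hypothetical helper: turn a list of per-row arrays into one ndarray
    with a leading row axis. Not Ray's actual implementation."""
    if len(rows) == 1:
        # Single row: add a leading axis of length 1. For an ndarray input
        # this returns a view, so no data is copied.
        return np.expand_dims(rows[0], axis=0)
    # Multiple rows: the separate arrays must be copied into one
    # contiguous buffer (np.stack is an assumed stand-in here).
    return np.stack(rows, axis=0)


single = [np.arange(12, dtype=np.float32).reshape(3, 4)]
multi = [np.zeros((3, 4), dtype=np.float32) for _ in range(2)]

one = column_to_ndarray(single)
many = column_to_ndarray(multi)

# The single-row result shares memory with its input; the stacked one does not.
single_shares = np.shares_memory(one, single[0])
multi_shares = any(np.shares_memory(many, r) for r in multi)
print(one.shape, single_shares)
print(many.shape, multi_shares)
```

Because the single-row path returns a view, writes to the result are visible in the original array, which is worth keeping in mind if the caller later mutates either one.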

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

…size 1 ray-project#38833

Many Data ops depend on converting numpy batches to Arrow blocks. A single np array -> pyarrow is normally zero-copy, but blocks with multiple rows will need a copy to make the column of np arrays into one contiguous ndarray. This PR avoids this step for blocks of a single row by using np.expand_dims to reshape the array instead of copying it.

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>

GeneDer · 2023-08-28T17:10:54Z

Will merge once premerge passes.

stephanie-wang requested review from ericl, scv119, c21, amogkam, scottjlee, bveeramani and raulchen as code owners August 28, 2023 15:46

stephanie-wang added the release-blocker (P0 issue that blocks the release) and v2.7.0-pick labels Aug 28, 2023

stephanie-wang assigned GeneDer and zhe-thoughts Aug 28, 2023

GeneDer approved these changes Aug 28, 2023


zhe-thoughts approved these changes Aug 28, 2023


GeneDer merged commit c269af8 into ray-project:releases/2.7.0 Aug 28, 2023
45 of 53 checks passed
