[Data] Modify ImageDatasource to use Image.BILINEAR as the default image resampling filter. #43484

Merged
merged 3 commits into from Mar 5, 2024

ronyw7
@ronyw7 ronyw7 commented Feb 28, 2024

Why are these changes needed?

Ray Data's current ImageDatasource uses PIL's resize function as the image processing backend, which by default uses the Image.BICUBIC resampling filter. In practice, we found this to be around 20% slower than the Image.BILINEAR filter, which is the default option in torchvision.

Here are some benchmark results:

    Example Output:
    Time taken with default (BICUBIC) implementation: 45.74734696099995
    Time taken with BILINEAR mode: 40.40636501500012
    .
    ----------------------------------------------------------------------
    Ran 1 test in 86.155s
    OK
    - Default (BICUBIC)
    ray_data,1137.3931758979813
    ray_data,1146.6874856299994
    ray_data,1147.080762189496
    ray_data,1153.9882179831754
    ray_data,1140.6351740671025

    - Proposed (BILINEAR)
    ray_data,1399.06477460088
    ray_data,1381.023590432492
    ray_data,1394.170545465624
    ray_data,1396.9916050718184
    ray_data,1389.6461855214884
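The benchmark script behind these numbers is not included in this description. As a rough way to compare the two filters locally, a minimal sketch (not the benchmark used above) could time PIL's resize directly. The helper name `time_resize`, the 512x512 random input, the 224x224 target, and the repeat count are all assumptions chosen for illustration.

```python
import time

import numpy as np
from PIL import Image


def time_resize(resample, size=(224, 224), repeats=20):
    """Return the seconds spent resizing one synthetic image `repeats` times.

    Hypothetical helper for illustration; not part of Ray Data.
    """
    # Synthetic 512x512 RGB image with a fixed seed (assumed input size).
    rng = np.random.default_rng(0)
    pixels = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
    image = Image.fromarray(pixels)

    start = time.perf_counter()
    for _ in range(repeats):
        image.resize(size, resample=resample)
    return time.perf_counter() - start


# Time both filters on the same synthetic input.
timings = {
    "BICUBIC": time_resize(Image.BICUBIC),
    "BILINEAR": time_resize(Image.BILINEAR),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

This isolates only the resize call, so the absolute numbers will not match an end-to-end Ray Data pipeline, and the relative gap depends on image size, hardware, and the Pillow build.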

A user who still wants the BICUBIC filter can get it by applying a resize UDF (resize_fn) after the images have been read. For instance, the UDF can use cv2's INTER_CUBIC or PIL's original resize. The PIL route requires Image.fromarray, because read_images outputs NumPy arrays, and this conversion lowers throughput.
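A minimal sketch of such a UDF, using the PIL route, might look like the following. It assumes each row is a dict holding the image as a NumPy array under the "image" key; `TARGET_SIZE` and its value are hypothetical and chosen for illustration.

```python
import numpy as np
from PIL import Image

# Hypothetical target size as (height, width); adjust as needed.
TARGET_SIZE = (224, 224)


def resize_fn(row: dict) -> dict:
    """Resize one row's image with PIL's BICUBIC filter."""
    # Convert the NumPy array to a PIL image (this conversion costs throughput).
    image = Image.fromarray(row["image"])
    height, width = TARGET_SIZE
    # PIL's resize expects (width, height).
    resized = image.resize((width, height), resample=Image.BICUBIC)
    # Convert back so downstream steps still see a NumPy array.
    row["image"] = np.asarray(resized)
    return row
```

Assuming a dataset `ds` created with `ray.data.read_images` without the `size` argument, the UDF would be applied row by row with `ds.map(resize_fn)`.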

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: ronyw7 <yifengwang@berkeley.edu>
@c21 c21 left a comment

Thank you @ronyw7!

@c21 c21 merged commit 13ad9fd into ray-project:master Mar 5, 2024
8 of 9 checks passed
3 participants