
[proto] Performance improvements for equalize op #6757

Merged: 3 commits into pytorch:main on Oct 13, 2022

Conversation

@vfdev-5 (Collaborator) commented Oct 12, 2022

Description:

  • Vectorized version of equalize_image_tensor. Great job done by @lezcano, thanks a lot for your help! 🎉
  • Speed-up on CPU and CUDA:
[---------------- equalize_image_tensor transforms measurements ---------------]
                                         |    main   |    new  
1 threads: ----------------------------------------------------
      <class 'torch.Tensor'> Image data  |  978.917  |  335.936

Times are in microseconds (us).
Timestamp: 20221012-160208
Torch version: 1.14.0.dev20221010+cu116
Torchvision version: 0.15.0a0
Num threads: 1

Time benchmark: RandomEqualize (1.0,) None
V2: RandomEqualize(p=1.0) torchvision.prototype.transforms._color
Stable: RandomEqualize(p=1.0) torchvision.transforms.transforms

[- RandomEqualize transforms measurements -]
                         |  stable  |    v2 
1 threads: ---------------------------------
      Tensor Image data  |  2.891   |  2.266

Times are in milliseconds (ms).

cc @datumbox @NicolasHug

@datumbox (Contributor) left a comment

LGTM, thank you @vfdev-5 and @lezcano for making this fast! It's interesting that we can achieve this kind of performance optimization on the frontend without calling the native histogram methods. There might be some speed improvements to be achieved in the future in Core.

I see that all the tests pass, so this looks good to go.

@datumbox added the module: transforms, Perf (for performance improvements), and prototype labels on Oct 13, 2022
@datumbox datumbox merged commit b16dec1 into pytorch:main Oct 13, 2022
@vfdev-5 vfdev-5 deleted the proto-perf-improve-equalize branch October 13, 2022 11:42
@vfdev-5 (Collaborator, Author) commented Oct 13, 2022

@datumbox after speaking with Mario, we can improve the code a bit more; I'll send an update in a follow-up PR.

@lezcano (Contributor) commented Oct 13, 2022

> It's interesting that we can achieve this kind of performance optimization on the frontend without calling the native histogram methods

For reference, note that the speed-ups do not come from the histogram method itself. I only implemented the histogram that way because it supports batches. The main speed-up comes from using batched operations throughout, rather than having a for loop that runs all of these kernels once per channel.
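
To illustrate the point, here is a minimal sketch (not the actual torchvision kernel; the helpers histograms_looped and histograms_batched are made up for this example): the looped version launches one bincount kernel per channel, while the batched version fills all per-channel histograms with a single index_add_ call.

```python
import torch

def histograms_looped(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # One bincount kernel launch per channel of a (C, H, W) image.
    return torch.stack(
        [torch.bincount(c.flatten().long(), minlength=bins) for c in img]
    )

def histograms_batched(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # Flatten to (C, H*W) and scatter-add ones into a flat (C * bins) table
    # in a single index_add_ call, folding the channel offset into the index.
    flat = img.flatten(1).long()
    hist = torch.zeros(img.shape[0] * bins, dtype=torch.long, device=img.device)
    offsets = torch.arange(img.shape[0], device=img.device).unsqueeze(1) * bins
    hist.index_add_(0, (flat + offsets).flatten(), torch.ones_like(flat).flatten())
    return hist.view(img.shape[0], bins)

img = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
assert torch.equal(histograms_looped(img), histograms_batched(img))
```

The same pattern extends to a batch of images by folding the batch dimension in with the channels, which is what lets the whole operation run as a handful of large kernels instead of many small per-channel ones.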

@lezcano (Contributor) commented Oct 13, 2022

Probably generalising torch.bincount would be overkill, but it could be interesting to add a note on bincount such as "if you want to do this for a batch of indices, consider using index_add_" in the .. seealso:: section of the docs.
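
As a quick sanity check of that note (a hedged example, not proposed doc text): for a single 1-D tensor of indices, index_add_ with a tensor of ones reproduces torch.bincount, and unlike bincount it extends to batched indices as sketched above.

```python
import torch

idx = torch.randint(0, 10, (1000,))  # 1-D indices in [0, 10)
via_bincount = torch.bincount(idx, minlength=10)
via_index_add = torch.zeros(10, dtype=torch.long).index_add_(0, idx, torch.ones_like(idx))
assert torch.equal(via_bincount, via_index_add)
```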

facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
Summary:
* [proto] Performance improvements for equalize op

* Added tests

Reviewed By: NicolasHug

Differential Revision: D40427459

fbshipit-source-id: 8cfba7a345b87fb56b7edcf58b7f9c7d526be813