
[proto] Performance improvements for equalize op #6757

Merged: 3 commits into pytorch:main on Oct 13, 2022

Conversation

@vfdev-5 (Collaborator) commented Oct 12, 2022

Description:

  • Vectorized version of equalize_image_tensor. Great job done by @lezcano, thanks a lot for your help! 🎉
  • Speed-up on CPU and CUDA:
[---------------- equalize_image_tensor transforms measurements ---------------]
                                         |    main   |    new  
1 threads: ----------------------------------------------------
      <class 'torch.Tensor'> Image data  |  978.917  |  335.936

Times are in microseconds (us).
Timestamp: 20221012-160208
Torch version: 1.14.0.dev20221010+cu116
Torchvision version: 0.15.0a0
Num threads: 1

Time benchmark: RandomEqualize (1.0,) None
V2: RandomEqualize(p=1.0) torchvision.prototype.transforms._color
Stable: RandomEqualize(p=1.0) torchvision.transforms.transforms

[- RandomEqualize transforms measurements -]
                         |  stable  |    v2 
1 threads: ---------------------------------
      Tensor Image data  |  2.891   |  2.266

Times are in milliseconds (ms).

cc @datumbox @NicolasHug

@datumbox (Contributor) left a comment

LGTM, thank you @vfdev-5 and @lezcano for making this fast! It's interesting that we can achieve this kind of performance optimization on the frontend without calling the native histogram methods. There might be some speed improvements to be achieved in the future in Core.

I see that all the tests pass, so this looks good to go.

@datumbox added the module: transforms, Perf (for performance improvements), and prototype labels on Oct 13, 2022
@datumbox datumbox merged commit b16dec1 into pytorch:main Oct 13, 2022
@vfdev-5 vfdev-5 deleted the proto-perf-improve-equalize branch October 13, 2022 11:42
@vfdev-5 (Collaborator, Author) commented Oct 13, 2022

@datumbox after speaking with Mario, we can improve the code a bit more; I'll send an update in a follow-up PR.

@lezcano (Contributor) commented Oct 13, 2022

> It's interesting that we can achieve this kind of performance optimization on the frontend without calling the native histogram methods

For reference, note that the speed-ups do not come from the histogram method itself. I only implemented the histogram that way because it supports batches. The main speed-up comes from using batched operations throughout, rather than having a for loop that runs all of these kernels once per channel.
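
To illustrate the point, here is a minimal sketch (not the actual torchvision kernel; the helpers histograms_looped and histograms_batched are made up for this example): the looped version launches one bincount kernel per channel, while the batched version fills all per-channel histograms with a single index_add_ call.

```python
import torch

def histograms_looped(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # One bincount kernel launch per channel of a (C, H, W) image.
    return torch.stack(
        [torch.bincount(c.flatten().long(), minlength=bins) for c in img]
    )

def histograms_batched(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # Flatten to (C, H*W) and scatter-add ones into a flat (C * bins) table
    # in a single index_add_ call, folding the channel offset into the index.
    flat = img.flatten(1).long()
    hist = torch.zeros(img.shape[0] * bins, dtype=torch.long, device=img.device)
    offsets = torch.arange(img.shape[0], device=img.device).unsqueeze(1) * bins
    hist.index_add_(0, (flat + offsets).flatten(), torch.ones_like(flat).flatten())
    return hist.view(img.shape[0], bins)

img = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
assert torch.equal(histograms_looped(img), histograms_batched(img))
```

The same pattern extends to a batch of images by folding the batch dimension in with the channels, which is what lets the whole operation run as a handful of large kernels instead of many small per-channel ones.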

@lezcano (Contributor) commented Oct 13, 2022

Probably generalising torch.bincount would be overkill, but it could be interesting to add a note on bincount such as "if you want to do this for a batch of indices, consider using index_add_" in the .. seealso:: section of the docs.
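
As a quick sanity check of that note (a hedged example, not proposed doc text): for a single 1-D tensor of indices, index_add_ with a tensor of ones reproduces torch.bincount, and unlike bincount it extends to batched indices as sketched above.

```python
import torch

idx = torch.randint(0, 10, (1000,))  # 1-D indices in [0, 10)
via_bincount = torch.bincount(idx, minlength=10)
via_index_add = torch.zeros(10, dtype=torch.long).index_add_(0, idx, torch.ones_like(idx))
assert torch.equal(via_bincount, via_index_add)
```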

facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
Summary:
* [proto] Performance improvements for equalize op

* Added tests

Reviewed By: NicolasHug

Differential Revision: D40427459

fbshipit-source-id: 8cfba7a345b87fb56b7edcf58b7f9c7d526be813