[prototype] Speed up adjust_sharpness_image_tensor #6930

Merged
merged 2 commits into from
Nov 8, 2022

Contributor

@datumbox datumbox commented Nov 8, 2022

Related to #6818

Performance optimization for the adjust sharpness kernel:

[----------- adjust_sharpness_image_tensor cpu torch.float32 -----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 274                 |    230  
      (3, 400, 400)      |                   4                 |      4  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 300                 |    260  
      (3, 400, 400)      |                   5                 |      5  

Times are in milliseconds (ms).

[----------- adjust_sharpness_image_tensor cuda torch.float32 ----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 440                 |    382  
      (3, 400, 400)      |                 150                 |    100  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 400                 |    400  
      (3, 400, 400)      |                 150                 |    100  

Times are in microseconds (us).

[------------ adjust_sharpness_image_tensor cpu torch.uint8 ------------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 280                 |    240  
      (3, 400, 400)      |                   5                 |      4  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 300                 |    260  
      (3, 400, 400)      |                   7                 |      6  

Times are in milliseconds (ms).

[------------ adjust_sharpness_image_tensor cuda torch.uint8 -----------]
                         |  adjust_sharpness_image_tensor old  |  fn2 new
1 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 520                 |    459  
      (3, 400, 400)      |                 190                 |    110  
6 threads: --------------------------------------------------------------
      (16, 3, 400, 400)  |                 500                 |    460  
      (3, 400, 400)      |                 180                 |    110  

Times are in microseconds (us).
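The tables above have the layout printed by `torch.utils.benchmark.Compare`. A minimal sketch of how a comparison like this could be set up is shown below. The two blend functions are hypothetical stand-ins, not the actual torchvision kernels, and the shapes, thread counts, and run counts are reduced so the script finishes quickly.

```python
import torch
from torch.utils import benchmark


def blend_old(x, y, ratio):
    # Hypothetical stand-in for the old path: two multiplies and one add.
    return ratio * x + (1.0 - ratio) * y


def blend_new(x, y, ratio):
    # Hypothetical stand-in for the new path: one subtract and one fused scaled add.
    # x is cloned so repeated timing runs do not mutate the input.
    return x.clone().add_(y - x, alpha=1.0 - ratio)


results = []
for shape in [(16, 3, 64, 64), (3, 64, 64)]:
    x = torch.rand(shape)
    y = torch.rand(shape)
    for num_threads in (1, 2):
        for name, fn in [("old", blend_old), ("new", blend_new)]:
            timer = benchmark.Timer(
                stmt="fn(x, y, 0.5)",
                globals={"fn": fn, "x": x, "y": y},
                num_threads=num_threads,
                label="blend cpu torch.float32",
                sub_label=str(shape),
                description=name,
            )
            # A small fixed number of runs; blocked_autorange() would give
            # more stable numbers at the cost of a longer run.
            results.append(timer.timeit(20))

# Prints one table per label, grouped by thread count.
benchmark.Compare(results).print()
```

Absolute timings from a sketch like this depend on hardware and will not match the numbers reported in this PR.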

cc @vfdev-5 @bjuncek @pmeier

Collaborator

@pmeier pmeier left a comment

One question, otherwise LGTM if CI is green.

@@ -119,7 +122,29 @@ def adjust_sharpness_image_tensor(image: torch.Tensor, sharpness_factor: float)
     else:
         needs_unsquash = False

-    output = _blend(image, _FT._blurred_degenerate_image(image), sharpness_factor)
+    kernel_dtype = image.dtype if fp else torch.float32
Collaborator

To ease review, here is the old implementation:

def _blurred_degenerate_image(img: Tensor) -> Tensor:

Comment on lines +140 to +142
# We speed up blending by minimizing flops and doing in-place. The 2 blend options are mathematically equivalent:
# x+(1-r)*(y-x) = x + (1-r)*y - (1-r)*x = x*r + y*(1-r)
view.add_(blurred_degenerate.sub_(view), alpha=(1.0 - sharpness_factor))
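The equivalence stated in the code comment can be checked numerically. The sketch below is illustrative only: `blend_reference` and `blend_inplace` are hypothetical helper names, with `x` playing the role of `view`, `y` the role of `blurred_degenerate`, and `ratio` the role of `sharpness_factor`.

```python
import torch


def blend_reference(x, y, ratio):
    # Out-of-place form: x*r + y*(1-r). Allocates temporaries for each product.
    return ratio * x + (1.0 - ratio) * y


def blend_inplace(x, y, ratio):
    # In-place form: x + (1-r)*(y-x).
    # y.sub_(x) overwrites y with (y - x); add_ with alpha scales that
    # difference by (1 - r) and accumulates it into x. Both inputs are mutated.
    return x.add_(y.sub_(x), alpha=1.0 - ratio)


torch.manual_seed(0)
x = torch.rand(3, 8, 8)
y = torch.rand(3, 8, 8)

expected = blend_reference(x, y, 0.3)
# Clone the inputs because the in-place version destroys them.
result = blend_inplace(x.clone(), y.clone(), 0.3)

print(torch.allclose(result, expected, atol=1e-6))
```

The in-place form needs one subtraction and one fused scaled addition, whereas the reference form needs two multiplications and one addition plus extra allocations, which is where the saving would be expected to come from.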
Collaborator

Can we push a change like this to _blend, or is this a special case here?

Contributor Author

It's a special case :( We can do this only because we are allowed to subtract image1 from image2 in place. In all other cases where _blend() is used, we rely on broadcasting, so that's not possible.
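To illustrate the broadcasting constraint, here is a small sketch that assumes the second operand is a smaller tensor that gets broadcast against the image, such as a single-channel grayscale version. The tensors and names are hypothetical and not taken from torchvision.

```python
import torch

image = torch.rand(3, 4, 4)             # hypothetical RGB image
gray = image.mean(dim=0, keepdim=True)  # shape (1, 4, 4), broadcasts against image

# Out-of-place blending works: the result takes the broadcast shape (3, 4, 4).
out = 0.5 * image + 0.5 * gray

# In-place subtraction into the smaller tensor fails, because a (1, 4, 4)
# buffer cannot hold the (3, 4, 4) broadcast result.
try:
    gray.sub_(image)
    inplace_ok = True
except RuntimeError:
    inplace_ok = False

print(out.shape, inplace_ok)
```

In the sharpness kernel both operands have the same shape and the blurred tensor is a temporary, so overwriting it is safe.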

@datumbox datumbox changed the title from "Speed up adjust_sharpness_image_tensor" to "[prototype] Speed up adjust_sharpness_image_tensor" Nov 8, 2022
@datumbox datumbox merged commit 7a7ab7e into pytorch:main Nov 8, 2022
@datumbox datumbox deleted the perf/adjust_brightness branch November 8, 2022 15:41
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* Speed up `adjust_sharpness_image_tensor`

* Add a comment

Reviewed By: NicolasHug

Differential Revision: D41265190

fbshipit-source-id: 4ebbd1d7a4d763a77f4af84b2da710f7a981a843