Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proto] Speed up adjust color ops #6784

Merged
merged 18 commits into from
Oct 21, 2022
Merged

Conversation

vfdev-5
Copy link
Collaborator

@vfdev-5 vfdev-5 commented Oct 18, 2022

Description:

Results:

[------------------ Adjust_brightness cpu torch.uint8 ------------------]
                     |  adjust_brightness stable  |  adjust_brightness v2
1 threads: --------------------------------------------------------------
      (3, 400, 400)  |            1080            |          483         

Times are in microseconds (us).

[----------------- Adjust_contrast cpu torch.uint8 -----------------]
                     |  adjust_contrast stable  |  adjust_contrast v2
1 threads: ----------------------------------------------------------
      (3, 400, 400)  |           1150           |         926        

Times are in microseconds (us).

[------------------ Adjust_saturation cpu torch.uint8 ------------------]
                     |  adjust_saturation stable  |  adjust_saturation v2
1 threads: --------------------------------------------------------------
      (3, 400, 400)  |            1200            |          935         

Times are in microseconds (us).

[------------------ Adjust_sharpness cpu torch.uint8 -----------------]
                     |  adjust_sharpness stable  |  adjust_sharpness v2
1 threads: ------------------------------------------------------------
      (3, 400, 400)  |            4.06           |          3.64       

Times are in milliseconds (ms).

[----------------- Adjust_brightness cpu torch.float32 -----------------]
                     |  adjust_brightness stable  |  adjust_brightness v2
1 threads: --------------------------------------------------------------
      (3, 400, 400)  |            665             |          206         
6 threads: --------------------------------------------------------------
      (3, 400, 400)  |            167             |           57         

Times are in microseconds (us).

[---------------- Adjust_contrast cpu torch.float32 ----------------]
                     |  adjust_contrast stable  |  adjust_contrast v2
1 threads: ----------------------------------------------------------
      (3, 400, 400)  |           691            |         455        
6 threads: ----------------------------------------------------------
      (3, 400, 400)  |           248            |         162        

Times are in microseconds (us).

[----------------- Adjust_saturation cpu torch.float32 -----------------]
                     |  adjust_saturation stable  |  adjust_saturation v2
1 threads: --------------------------------------------------------------
      (3, 400, 400)  |            728             |          466         

Times are in microseconds (us).

[----------------- Adjust_sharpness cpu torch.float32 ----------------]
                     |  adjust_sharpness stable  |  adjust_sharpness v2
1 threads: ------------------------------------------------------------
      (3, 400, 400)  |            3100           |          2860       

Times are in microseconds (us).

More results in logs, code

Logs with #6765 code + simplified adjust brightness : https://github.com/vfdev-5/tvapiv2_benchmarks/blob/ba35a27f14fd77123ea96f0c54ac9e2fb22f0f15/output/20221018-115341-output-adjust-color-ops_datumbox.log

@vfdev-5 vfdev-5 requested a review from datumbox October 18, 2022 10:30
@vfdev-5 vfdev-5 added module: transforms Perf For performance improvements prototype labels Oct 18, 2022
@vfdev-5 vfdev-5 requested a review from pmeier October 19, 2022 12:34
@vfdev-5 vfdev-5 requested a review from pmeier October 19, 2022 21:04
@vfdev-5 vfdev-5 requested a review from pmeier October 20, 2022 09:31
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more perf improvement. Otherwise LGTM if CI is green. Thanks Victor!

Comment on lines +82 to +83
grayscale_image = _rgb_to_gray(image) if c == 3 else image
mean = torch.mean(grayscale_image.to(dtype), dim=(-3, -2, -1), keepdim=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
grayscale_image = _rgb_to_gray(image) if c == 3 else image
mean = torch.mean(grayscale_image.to(dtype), dim=(-3, -2, -1), keepdim=True)
grayscale_image = _rgb_to_gray(image.to(dtype)) if c == 3 else image.to(dtype)
mean = torch.mean(grayscale_image, dim=(-3, -2, -1), keepdim=True)

This saves one conversion in _rgb_to_gray in case the input is uint8: _rgb_to_gray would convert the output of its floating point computation back to uint8 just for the result being converted back to floating point in before the torch.mean call.

Copy link
Collaborator Author

@vfdev-5 vfdev-5 Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmeier actually, doing so the output of mean is not the same for uint8 image.
As we originally applied uint8 cast in the end of _rgb_to_gray we get rid of all floating point values. This is not the case if image is casted to float before _rgb_to_gray.
Here is an example of difference for mean:

tensor([[[125.9521]]])   # original implementation
# vs
tensor([[[126.4607]]])  # cast to float before `_rgb_to_gray`.

So, finally consistency tests report for example:

Mismatched elements: 256 / 2772 (9.2%)                                                                                                                                      
Greatest absolute difference: 3 at index (1, 2, 4, 21) (up to 1e-05 allowed)   
Greatest relative difference: 0.25 at index (2, 0, 6, 21) (up to 1e-05 allowed)

and this is a real failure, IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the behavior changes, but IMO repeatedly converting to uint8 in the computation and thus eliminating intermediate values sounds more like a missed opportunity in the original kernel than a bug now. Thus, I would consider this more like a "bug fix" rather than a BC breaking change. On the other hand, that is not a strong opinion. Not going to block over this.

@vfdev-5 vfdev-5 marked this pull request as draft October 20, 2022 10:24
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks great!

ratio = float(ratio)
fp = image1.is_floating_point()
bound = 1.0 if fp else 255.0
output = image1.mul(ratio).add_(image2, alpha=(1.0 - ratio)).clamp_(0, bound)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Supersedes the work at #6765

@vfdev-5
Copy link
Collaborator Author

vfdev-5 commented Oct 20, 2022

This implementation is not consistent with previous behaviour. Valid time measurements are reported in the description.

With the latest commit we reduce runtime for adjust_contrast:

@vfdev-5 vfdev-5 marked this pull request as ready for review October 20, 2022 13:29
@vfdev-5 vfdev-5 marked this pull request as draft October 20, 2022 14:29
@vfdev-5 vfdev-5 marked this pull request as ready for review October 20, 2022 20:02
@vfdev-5 vfdev-5 merged commit 9f024a6 into pytorch:main Oct 21, 2022
@vfdev-5 vfdev-5 deleted the proto-improve-blend-op branch October 21, 2022 10:04
facebook-github-bot pushed a commit that referenced this pull request Oct 21, 2022
Summary:
* WIP

* _blend optim v1

* _blend and color ops optims: v2

* updated a/r tol and configs to make tests pass

* Loose a/r tolerance in AA tests

* Use custom rgb_to_grayscale

* Renamed img -> image

* nit code update

* PR review

* adjust_contrast convert to float32 earlier

* Revert "adjust_contrast convert to float32 earlier"

This reverts commit a82cf8c.

Reviewed By: YosuaMichael

Differential Revision: D40588170

fbshipit-source-id: b87fbbecb7490c222d990ef5c3e620d9ffe457ab

fp = image.is_floating_point()
bound = 1.0 if fp else 255.0
output = image.mul(brightness_factor).clamp_(0, bound)
Copy link

@MiChatz MiChatz Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue:
This clamp_ implementation is enforcing the histogram to take values between 0 and 1 in my case is not working the way I will expect since my images are normalized to values around 0 (ex: -0.2 to 0.8) so the multiplication with the factor of 1 will not return an identical image.

Suggestion:
output = image.mul(brightness_factor).clamp_(image.min(), image.max())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MiChatz thanks for the feedback. There is (yet unwritten) assumption for color transformations on float images that image range is between [0, 1].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants