
Standardized clamp kernels to Numpy-like implementation #43288

Closed
wants to merge 3 commits

Conversation

vsimkus
Contributor

@vsimkus vsimkus commented Aug 19, 2020

BC-breaking note

For ease of exposition let a_min be the value of the "min" argument to clamp, and a_max be the value of the "max" argument to clamp.

This PR changes the behavior of torch.clamp to always compute min(max(a, a_min), a_max). torch.clamp currently computes this in its vectorized CPU specializations:

return _mm256_min_pd(max, _mm256_max_pd(min, a));

but in other places it clamps differently:

c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] : a[i]);

return (v < lower) ? lower : (v > upper ? upper : v);

These implementations are the same when a_min < a_max, but divergent when a_min > a_max. This divergence is easily triggered:

t = torch.arange(200).to(torch.float)
torch.clamp(t, 4, 2)[0]
: tensor(2.)

torch.clamp(t.cuda(), 4, 2)[0]
: tensor(4., device='cuda:0')

torch.clamp(torch.tensor(0), 4, 2)
: tensor(4)

This PR makes the behavior consistent with NumPy's clip. C++'s std::clamp has undefined behavior when a_min > a_max, though for the first example above Clang's std::clamp would return a_min (4), matching the old CUDA and scalar CPU behavior (even though such a program is, per the standard, in error). Python has no standard clamp implementation.
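
To make the divergence concrete, here is a minimal Python sketch (not part of the PR; the helper names are invented for illustration) contrasting the two orderings against numpy.clip:

import numpy as np

def clamp_min_last(x, a_min, a_max):
    # min(max(x, a_min), a_max): the ordering this PR standardizes on
    return min(max(x, a_min), a_max)

def clamp_branchy(x, a_min, a_max):
    # the branching form used by the scalar CPU and CUDA kernels before this PR
    return a_min if x < a_min else (a_max if x > a_max else x)

print(clamp_min_last(0.0, 4, 2))  # 2
print(clamp_branchy(0.0, 4, 2))   # 4
print(np.clip(0.0, 4, 2))         # 2.0, agrees with the min-last ordering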

PR Summary

Fixes the discrepancy between the AVX, CUDA, and base vector implementations of clamp, so that all implementations are consistent and use the min(max_vec, max(min_vec, x)) formula, making clamp equivalent to numpy.clip in all implementations.

The same fix as in #32587, but isolated to the kernel change only so that the internal team can benchmark it.

@dr-ci

dr-ci bot commented Aug 19, 2020

💊 CI failures summary and remediations

As of commit 8308899 (more details on the Dr. CI page):


Commit 8308899 was recently pushed. Waiting for builds...


This comment was automatically generated by Dr. CI.

@vsimkus vsimkus force-pushed the clamp-kernel-standardization branch from 911a843 to 6add085 Compare August 19, 2020 20:53
@vsimkus vsimkus marked this pull request as ready for review August 21, 2020 19:59
@vsimkus
Contributor Author

vsimkus commented Aug 21, 2020

@mruberry Here's the change that standardizes clamp behaviour for all kernels to be NumPy-like, as discussed in the other PR. Let me know if this is enough for you to benchmark, and whether you need anything else. Thanks!

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 24, 2020
@mruberry
Collaborator

I ran this through our perf experiments and it passed. So that's good news because we can safely make this change, but bad news in that we don't know what was causing the performance degradation on the larger PR. There's a small chance it was benchmark flakiness, but I would like to propose two options for you, @vsimkus:

  • We could update this PR with a test for its behavior and land, then continue to refactor the larger PR into pieces we can validate incrementally with perf experiments
  • We could rebase the larger PR and run some clamp-specific benchmarks on it to verify it really does cause a performance regression, hoping the initial experiment failures were due to flakiness

I kind of prefer the first option, myself, since the original PR is now in need of a rebase. What if we separated the changes into this PR, the non-quantization changes, and the quantization changes?

Looking forward to hearing your thoughts.

@vsimkus
Contributor Author

vsimkus commented Aug 26, 2020

@mruberry Thanks! I agree with you - it will be easier to land smaller PRs. I'll be away now for a couple of weeks, but once I return I'll add some tests to this PR and update the docstrings where necessary. Then, we can work on the other bits.

@mruberry
Collaborator


Great! Looking forward to it.

@vsimkus vsimkus force-pushed the clamp-kernel-standardization branch 5 times, most recently from fba4051 to e6c2ee4 Compare September 6, 2020 11:02
@vsimkus
Contributor Author

vsimkus commented Sep 6, 2020

@mruberry As discussed, I've refactored the clamp tests to be more concise and to use NumPy as the reference implementation. I've also updated the docstring so that it shows the correct clamping output formula, and removed an unnecessary comment about the argument types.

I think it should be good for a review and benchmarking now.
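
For reference, a hypothetical sketch of what a NumPy-reference clamp test might look like (the actual tests in test/test_torch.py are likely structured differently):

import numpy as np
import torch

def check_clamp_matches_numpy(t, a_min, a_max):
    # numpy.clip serves as the reference implementation for torch.clamp
    expected = np.clip(t.cpu().numpy(), a_min, a_max)
    actual = torch.clamp(t, a_min, a_max).cpu().numpy()
    assert np.array_equal(actual, expected)

# Exercise the previously divergent a_min > a_max case on CPU and, if available, CUDA.
t = torch.arange(200, dtype=torch.float)
check_clamp_matches_numpy(t, 4, 2)
if torch.cuda.is_available():
    check_clamp_matches_numpy(t.cuda(), 4, 2)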

@mruberry mruberry self-requested a review September 9, 2020 08:43
@mruberry mruberry added the module: numpy Related to numpy support, and also numpy compatibility of our operators label Sep 9, 2020
test/test_torch.py: 5 resolved review threads
@mruberry
Copy link
Collaborator

mruberry commented Sep 9, 2020

This is looking pretty good, @vsimkus. Thanks for taking the time to follow up. I'm looking forward to getting these changes in.

I have a question about removing the vectorized implementations and made some suggestions for simplifying the tests.

@vsimkus
Contributor Author

vsimkus commented Sep 26, 2020

@mruberry I've now updated the PR with the suggested changes, and answered your question above on removing the complex vectorized clamp implementations. Let me know what you think.

test/test_torch.py: 1 resolved review thread
Collaborator

@mruberry mruberry left a comment


Awesome! Thanks @vsimkus. Sorry this review took a while; we just cut our 1.7 branch and that process was very time-consuming.

Thanks for pointing out we're only removing the complex vectorized clamp impls.

One last issue before this gets imported, though: the docs. Take a look and let me know your thoughts.

@mruberry mruberry added the module: bc-breaking Related to a BC-breaking change label Oct 1, 2020
torch/_torch_docs.py: 1 resolved review thread
@vsimkus vsimkus force-pushed the clamp-kernel-standardization branch from a7f53d0 to 0e34ee8 Compare October 1, 2020 12:53
@vsimkus
Contributor Author

vsimkus commented Oct 1, 2020

@mruberry Thanks again for carefully reviewing the change :) I've updated the documentation and added the small change in test_torch (device=device). Also rebased on master, so the branch has the most recent changes and can be benchmarked again before merging.

@codecov

codecov bot commented Oct 1, 2020

Codecov Report

Merging #43288 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master   #43288   +/-   ##
=======================================
  Coverage   68.19%   68.19%           
=======================================
  Files         410      410           
  Lines       53232    53232           
=======================================
  Hits        36302    36302           
  Misses      16930    16930           
Impacted Files Coverage Δ
torch/_torch_docs.py 100.00% <ø> (ø)
...ch/testing/_internal/common_methods_invocations.py 91.45% <ø> (ø)
torch/testing/_internal/common_utils.py 78.25% <0.00%> (-0.09%) ⬇️
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2fbe597...8308899.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

mruberry commented Oct 5, 2020

Hey @vsimkus, sorry to bother you. Would you mind rebasing this? I'm seeing some odd failures internally and I'm hoping they're in the base revision.

@vsimkus vsimkus force-pushed the clamp-kernel-standardization branch from 0e34ee8 to 8308899 Compare October 6, 2020 08:08
@vsimkus
Contributor Author

vsimkus commented Oct 6, 2020

@mruberry Done. I hope the failures go away :)

Contributor

@facebook-github-bot facebook-github-bot left a comment


@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

mruberry commented Oct 6, 2020

The rebase appears to have solved the issues. Thanks @vsimkus. Looking forward to the next PR in this series. Do you have a good idea for the next step or would you like to discuss some ideas?

@facebook-github-bot
Contributor

@mruberry merged this pull request in e154b36.

Labels
Merged · module: bc-breaking (Related to a BC-breaking change) · module: numpy (Related to numpy support, and also numpy compatibility of our operators) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

5 participants