New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing interpolate on uint8 unsqueezed 3D CL tensor #100258
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100258
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit 39d59f6: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@pytorchbot rebase |
@pytorchbot successfully started a rebase job. Check the current status here |
Successfully rebased |
4f1a315
to
2460501
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vfdev-5 , made a minor suggestion to save a copy (I think it works?). LMK what you think
@NicolasHug thanks for the review. I pushed new commit with your suggestions. I also run benchmarks to ensure no regression and new code results are compatible (+/- noise in measurements):
|
Description: - Fixed memory format issue: When input is channels last 4d tensor that was produced as following ``` t = torch.ones(1, 3, 32, 32).contiguous(memory_format=torch.channels_last) t = t[0] t = t[None, ...] ``` upsampling will produce output with channels first memory format but our avx code does not take that into account. - Also renamed needs_unpacking by skip_unpacking
642ff29
to
39d59f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the fix @vfdev-5 , LGTM.
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
For completeness, on top of the benchmarks above (#100258 (comment)) showing no regression for "regular" 4D tensors, @vfdev-5 also benchmarked the case of channels_last unsqueezed tensors which are affected by the issue that the output would be allocated as contiguous instead of channels_last (which is what this PR is addressing). Results are here and show that we now have to pay a ~2X slow-down for these tensors. It's worth stressing that the 2X slow-down is caused by the unexpected result that Hopefully, #100373 will help fixing this original problem at its root, which will allow us to revert this PR and gain back the 2X perf that we just lost. |
Description:
When input is channels last 4d tensor that was produced as following
due to a surprising behaviour of
suggest_memory_format()
, the output tensor will be allocated as contiguous instead ofchannels_last
. Up to now the uint8 AVX bilinear interpolation code was assuming that the output format was the same the input format (but this is unfortunately not the case) and this leads to the output values to be completely wrong.This PR is a band-aid fix to address this issue in the short term, by converting the output from channels_last to contiguous when needed, to match with the (incorrect) format suggested by
suggest_memory_format()
.Here is a repro code to show that nightly is broken for this particular case:
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @NicolasHug