Reduced number of graphs for compiled resize #8108
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8108. Note: links to docs will display an error until the docs builds have been completed.
As of commit 3688b50 with merge base 893b4ab: 2 new failures, 2 unrelated failures (likely due to flakiness present on trunk).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 60023be to ec515eb.
@@ -225,11 +235,13 @@ def resize_image(
     elif image.device.type == "cpu":
         # uint8 dtype support for bilinear and bicubic is limited to cpu and
         # according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
-        if (interpolation == InterpolationMode.BILINEAR and "AVX2" in torch.backends.cpu.get_cpu_capability()) or (
+        # For torch.compile we use uint8 input and let decomposition work
+        if (interpolation == InterpolationMode.BILINEAR and _can_add_uint8()) or (
Can we include this entire if's conditions into the _can_add_uint8() helper? I.e. something like:
def _can_add_uint8():
    if interpolation == InterpolationMode.BILINEAR:
        if torch._dynamo.is_compiling():
            return True
        else:
            return "AVX2" in torch.backends.cpu.get_cpu_capability()
    else:
        return interpolation == InterpolationMode.BICUBIC
This would make the name _can_add_uint8() more logical IMO.
Also, we seem to have some duplicated comments now (those in line 236, for example).
Thanks a lot for the fix @vfdev-5. The fix LGTM, caveated by the comments in #8092 (comment) about whether or not we should include the tests.
Force-pushed from 900ed56 to 99c0962.
Thanks @vfdev-5, minor comments but LGTM anyway.
BTW, this isn't really critical, but is there a way to make get_cpu_capability() compatible with dynamo?
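For illustration, a minimal sketch of one way the capability check could be avoided while dynamo is tracing; the helper name here is hypothetical and not the one used in the PR, and whether to return True or False under compilation is exactly what the thread below debates:

import torch

def _uint8_bilinear_is_fast_on_this_cpu() -> bool:
    # torch._dynamo.is_compiling() is True while dynamo traces, so we skip
    # get_cpu_capability(), which dynamo cannot trace through, and return a
    # fixed answer under compilation (True here, for the sake of the sketch).
    if torch._dynamo.is_compiling():
        return True
    return "AVX2" in torch.backends.cpu.get_cpu_capability()

print(_uint8_bilinear_is_fast_on_this_cpu())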
# inside resize_image due to torchscript.
# uint8 dtype support for bilinear and bicubic is limited to cpu and
# according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
# For torch.compile we use uint8 input and let decomposition work
I think we should be clear that the reason we always use uint8 for dynamo is simply that it doesn't support get_cpu_capability(), so with the suggested comment below, this comment is probably unnecessary.
# For torch.compile we use uint8 input and let decomposition work
A decomposition can also fail to support the uint8 dtype, so the fact that we return True instead of False means we believe the decomposition can work with uint8 dtype.
Even if dynamo "supported" get_cpu_capability(), the heuristic of performing u8->f32->interpolate->u8 on non-AVX systems could be wrong for the compiled version.
OK, that makes sense. I added a suggestion above to clarify that the benchmarks were only relevant for eager.
We can merge now and iterate a bit later, but do you think our conditions could be a bit simplified? I think we should be able to do something like

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?
    # then conditions as before

And IDK if that's true, but perhaps torch.compile works for bilinear and bicubic on GPU as well, in which case we can probably write that condition much earlier?
if torch._dynamo.is_compiling():
    return True  # both bilinear and bicubic are OK, right?

Well, right now it may be safer to return False due to pytorch/pytorch#104182 not yet being merged.
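For context, a hedged sketch of what the combined helper suggested above might look like, folding in the eager-mode conditions quoted in the diff. The conservative return False under compilation reflects the concern about pytorch/pytorch#104182; the helper name and exact condition order are assumptions, not the code merged in this PR:

import torch
from torchvision.transforms import InterpolationMode

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    # Assumption: stay on the u8->f32->interpolate->u8 path under torch.compile
    # until the upstream decomposition fix (pytorch/pytorch#104182) is merged.
    if torch._dynamo.is_compiling():
        return False
    # Eager mode: native uint8 bilinear only pays off on AVX2-capable CPUs,
    # per the benchmarks referenced in the code comments above.
    if interpolation == InterpolationMode.BILINEAR:
        return "AVX2" in torch.backends.cpu.get_cpu_capability()
    # uint8 bicubic is handled natively on CPU.
    return interpolation == InterpolationMode.BICUBIC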
Minor suggestion but still LGTM, thanks @vfdev-5
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Hey @vfdev-5! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Summary:
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Reviewed By: vmoens
Differential Revision: D52539004
fbshipit-source-id: 8fe4de87cd225118895b24c3b317c646bcb66356
Related to #8056
Depends on https://github.com/pytorch/vision/pull/8092/files
Description:
Now, for a channels-last (CL) input of shape (1, 3, 500, 400) resized to (256, 256), the compiled code can look like the following:
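(The generated graph dump itself is elided above.) As a rough, hedged sketch of how the graph count for this case could be inspected, assuming the v2 functional resize API and the ExplainOutput fields of recent PyTorch releases:

import torch
from torchvision.transforms.v2 import functional as F

def run(img):
    # Resize the channels-last uint8 input to (256, 256), as in the description.
    return F.resize(img, [256, 256], antialias=True)

img = torch.randint(0, 256, (1, 3, 500, 400), dtype=torch.uint8)
img = img.to(memory_format=torch.channels_last)

# torch._dynamo.explain reports how many graphs and graph breaks are produced;
# the attribute names below come from recent PyTorch versions and may differ.
explanation = torch._dynamo.explain(run)(img)
print(explanation.graph_count, explanation.graph_break_count)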