
Reduced number of graphs for compiled resize #8108

Merged: 8 commits into pytorch:main on Nov 22, 2023

Conversation

@vfdev-5 (Collaborator) commented Nov 10, 2023

Related to #8056
Depends on https://github.com/pytorch/vision/pull/8092/files

Description:

  • Reduced number of graphs for compiled resize
  • Before: 3 -> Now: 2
    For a channels-last (CL) input of shape (1, 3, 500, 400) resized to (256, 256), the traced graphs now look like this (a reproduction sketch follows the logs below):
[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 0 =====
 <eval_with_key>.4 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self):
        return ()
        

[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 4 =====
 <eval_with_key>.37 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: u8[1, 3, 500, 400]):
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:231, code: image = image.reshape(-1, num_channels, old_height, old_width)
        view: u8[1, 3, 500, 400] = torch.ops.aten.view.default(arg0_1, [1, 3, 500, 400]);  arg0_1 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:244, code: image = image.as_strided((1, num_channels, old_height, old_width), new_strides)
        as_strided: u8[1, 3, 500, 400] = torch.ops.aten.as_strided.default(view, [1, 3, 500, 400], [600000, 1, 1200, 3]);  view = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:250, code: image = interpolate(
        _upsample_bilinear2d_aa: u8[1, 3, 256, 256] = torch.ops.aten._upsample_bilinear2d_aa.default(as_strided, [256, 256], False);  as_strided = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:266, code: return image.reshape(shape[:-3] + (num_channels, new_height, new_width))
        view_1: u8[1, 3, 256, 256] = torch.ops.aten.view.default(_upsample_bilinear2d_aa, [1, 3, 256, 256]);  _upsample_bilinear2d_aa = None
        return (view_1,)

  • Reduced number of graphs for compiled resized_crop. It now gives the following:
[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 0 =====
 <eval_with_key>.4 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self):
        return ()
        

[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 4 =====
 <eval_with_key>.37 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: u8[1, 3, 500, 400]):
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:1350, code: return image[..., top:bottom, left:right]
        slice_1: u8[1, 3, 300, 400] = torch.ops.aten.slice.Tensor(arg0_1, 2, 1, 301);  arg0_1 = None
        slice_2: u8[1, 3, 300, 300] = torch.ops.aten.slice.Tensor(slice_1, 3, 2, 302);  slice_1 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:238, code: image = image.reshape(-1, num_channels, old_height, old_width)
        view: u8[1, 3, 300, 300] = torch.ops.aten.view.default(slice_2, [1, 3, 300, 300]);  slice_2 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:257, code: image = interpolate(
        _upsample_bilinear2d_aa: u8[1, 3, 256, 256] = torch.ops.aten._upsample_bilinear2d_aa.default(view, [256, 256], False);  view = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:273, code: return image.reshape(shape[:-3] + (num_channels, new_height, new_width))
        view_1: u8[1, 3, 256, 256] = torch.ops.aten.view.default(_upsample_bilinear2d_aa, [1, 3, 256, 256]);  _upsample_bilinear2d_aa = None
        return (view_1,)
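
A minimal reproduction sketch for counting the traced graphs (not part of the PR: the resize call, input shape, and expected count are taken from the description above; everything else, including the use of torch._dynamo.explain with its torch >= 2.1 API, is an assumption, and its counting convention may differ slightly from the aot_autograd logs):

import torch
import torchvision.transforms.v2.functional as F

def fn(x):
    return F.resize(x, [256, 256], antialias=True)

# channels-last uint8 input, matching the (1, 3, 500, 400) example above;
# its strides are [600000, 1, 1200, 3], the values seen in the as_strided call
x = torch.randint(0, 256, (1, 3, 500, 400), dtype=torch.uint8)
x = x.to(memory_format=torch.channels_last)

explanation = torch._dynamo.explain(fn)(x)
print(explanation.graph_count)  # per the description: 2 after this PR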

pytorch-bot bot commented Nov 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8108

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 3688b50 with merge base 893b4ab:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -225,11 +235,13 @@ def resize_image(
     elif image.device.type == "cpu":
         # uint8 dtype support for bilinear and bicubic is limited to cpu and
         # according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
-        if (interpolation == InterpolationMode.BILINEAR and "AVX2" in torch.backends.cpu.get_cpu_capability()) or (
+        # For torch.compile we use uint8 input and let decomposition work
+        if (interpolation == InterpolationMode.BILINEAR and _can_add_uint8()) or (
Member
Can we include this entire if's conditions into the _can_add_uint8() helper? I.e. something like:

def _can_add_uint8():
    if interpolation == InterpolationMode.BILINEAR:
        if torch._dynamo.is_compiling():
            return True
        else:
            return "AVX2" in torch.backends.cpu.get_cpu_capability()
    else:
        return interpolation == InterpolationMode.BICUBIC

This would make the name _can_add_uint8() more logical IMO.

Also, we seem to have some duplicated comments now (those in line 236 for example).

@NicolasHug (Member) left a comment

Thanks a lot for the fix @vfdev-5. The fix LGTM, caveated with the comments in #8092 (comment) about whether or not we should include the tests.

@NicolasHug (Member) left a comment

Thanks @vfdev-5, minor comments but LGTM anyway.

BTW, this isn't really critical, but is there a way to make get_cpu_capability() compatible with dynamo?

# inside resize_image due to torchscript.
# uint8 dtype support for bilinear and bicubic is limited to cpu and
# according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
# For torch.compile we use uint8 input and let decomposition work
Member

I think we should be clear that the reason we always use uint8 for dynamo is simply that dynamo doesn't support get_cpu_capability(); with the suggested comment below, this comment is probably unnecessary.

Suggested change (remove this line):
- # For torch.compile we use uint8 input and let decomposition work

vfdev-5 (Collaborator, Author)

A decomposition can also lack uint8 support; we return True instead of False because we believe the decomposition can work with uint8 dtype.
Even if dynamo "supported" get_cpu_capability(), the heuristic of performing u8->f32->interpolate->u8 on non-AVX systems can be wrong for the compiled version.

Member

OK, that makes sense. I added a suggestion above to clarify that the benchmarks were only relevant for eager.

We can merge now and iterate a bit later, but do you think our conditions could be a bit simplified? I think we should be able to do something like:

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?
    # then conditions as before

And IDK if that's true but perhaps torch.compile works for bilinear and bicubic on GPU as well, in which case we can probably write that condition much earlier?

vfdev-5 (Collaborator, Author) commented Nov 14, 2023

    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?

Well, right now it may be safer to return False, since pytorch/pytorch#104182 is not yet merged.
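
For reference, a hedged sketch of the consolidated helper with that caution applied — an assumption pieced together from this thread, not the code that was merged:

import torch
from torchvision.transforms import InterpolationMode

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        # under torch.compile, only bilinear takes the native uint8 path;
        # bicubic stays off it while pytorch/pytorch#104182 is unmerged
        return interpolation == InterpolationMode.BILINEAR
    # eager-mode conditions as before: per the benchmarks, native uint8
    # bilinear only pays off on AVX2 CPUs; bicubic is assumed here to take
    # the native path unconditionally
    if interpolation == InterpolationMode.BILINEAR:
        return "AVX2" in torch.backends.cpu.get_cpu_capability()
    return interpolation == InterpolationMode.BICUBIC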

@NicolasHug (Member) left a comment

Minor suggestion but still LGTM, thanks @vfdev-5


vfdev-5 and others added 2 commits November 14, 2023 15:59
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@vfdev-5 merged commit ab4c102 into pytorch:main on Nov 22, 2023
60 of 64 checks passed
@vfdev-5 deleted the compile-resize branch on November 22, 2023 at 10:14

Hey @vfdev-5!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Jan 15, 2024
Summary: Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Reviewed By: vmoens

Differential Revision: D52539004

fbshipit-source-id: 8fe4de87cd225118895b24c3b317c646bcb66356