
Reduced number of graphs for compiled resize #8108

Merged: 8 commits into pytorch:main on Nov 22, 2023

Conversation

@vfdev-5 (Collaborator) commented Nov 10, 2023

Related to #8056
Depends on https://github.com/pytorch/vision/pull/8092/files

Description:

  • Reduced number of graphs for compiled resize
  • Before: 3 -> Now: 2
    For a channels-last (CL) input of shape (1, 3, 500, 400) resized to (256, 256), the traced graphs now look like this (a reproduction sketch follows the logs below):
[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 0 =====
 <eval_with_key>.4 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self):
        return ()
        

[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 4 =====
 <eval_with_key>.37 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: u8[1, 3, 500, 400]):
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:231, code: image = image.reshape(-1, num_channels, old_height, old_width)
        view: u8[1, 3, 500, 400] = torch.ops.aten.view.default(arg0_1, [1, 3, 500, 400]);  arg0_1 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:244, code: image = image.as_strided((1, num_channels, old_height, old_width), new_strides)
        as_strided: u8[1, 3, 500, 400] = torch.ops.aten.as_strided.default(view, [1, 3, 500, 400], [600000, 1, 1200, 3]);  view = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:250, code: image = interpolate(
        _upsample_bilinear2d_aa: u8[1, 3, 256, 256] = torch.ops.aten._upsample_bilinear2d_aa.default(as_strided, [256, 256], False);  as_strided = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:266, code: return image.reshape(shape[:-3] + (num_channels, new_height, new_width))
        view_1: u8[1, 3, 256, 256] = torch.ops.aten.view.default(_upsample_bilinear2d_aa, [1, 3, 256, 256]);  _upsample_bilinear2d_aa = None
        return (view_1,)

  • Reduced number of graphs for compiled resized_crop. It now gives the following:
[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 0 =====
 <eval_with_key>.4 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self):
        return ()
        

[aot_autograd.py:2047 INFO] TRACED GRAPH
 ===== Forward graph 4 =====
 <eval_with_key>.37 from /usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py:509 in wrapped class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: u8[1, 3, 500, 400]):
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:1350, code: return image[..., top:bottom, left:right]
        slice_1: u8[1, 3, 300, 400] = torch.ops.aten.slice.Tensor(arg0_1, 2, 1, 301);  arg0_1 = None
        slice_2: u8[1, 3, 300, 300] = torch.ops.aten.slice.Tensor(slice_1, 3, 2, 302);  slice_1 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:238, code: image = image.reshape(-1, num_channels, old_height, old_width)
        view: u8[1, 3, 300, 300] = torch.ops.aten.view.default(slice_2, [1, 3, 300, 300]);  slice_2 = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:257, code: image = interpolate(
        _upsample_bilinear2d_aa: u8[1, 3, 256, 256] = torch.ops.aten._upsample_bilinear2d_aa.default(view, [256, 256], False);  view = None
        
        # File: /vision/torchvision/transforms/v2/functional/_geometry.py:273, code: return image.reshape(shape[:-3] + (num_channels, new_height, new_width))
        view_1: u8[1, 3, 256, 256] = torch.ops.aten.view.default(_upsample_bilinear2d_aa, [1, 3, 256, 256]);  _upsample_bilinear2d_aa = None
        return (view_1,)
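
A minimal reproduction sketch for counting the traced graphs (not part of the PR: the resize call, input shape, and expected count are taken from the description above; everything else, including the use of torch._dynamo.explain with its torch >= 2.1 API, is an assumption, and its counting convention may differ slightly from the aot_autograd logs):

import torch
import torchvision.transforms.v2.functional as F

def fn(x):
    return F.resize(x, [256, 256], antialias=True)

# channels-last uint8 input, matching the (1, 3, 500, 400) example above;
# its strides are [600000, 1, 1200, 3], the values seen in the as_strided call
x = torch.randint(0, 256, (1, 3, 500, 400), dtype=torch.uint8)
x = x.to(memory_format=torch.channels_last)

explanation = torch._dynamo.explain(fn)(x)
print(explanation.graph_count)  # per the description: 2 after this PR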

pytorch-bot bot commented Nov 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8108

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 3688b50 with merge base 893b4ab:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -225,11 +235,13 @@ def resize_image(
     elif image.device.type == "cpu":
         # uint8 dtype support for bilinear and bicubic is limited to cpu and
         # according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
-        if (interpolation == InterpolationMode.BILINEAR and "AVX2" in torch.backends.cpu.get_cpu_capability()) or (
+        # For torch.compile we use uint8 input and let decomposition work
+        if (interpolation == InterpolationMode.BILINEAR and _can_add_uint8()) or (
Member
Can we include this entire if's conditions into the _can_add_uint8() helper? I.e. something like:

def _can_add_uint8():
    if interpolation == InterpolationMode.BILINEAR:
        if torch._dynamo.is_compiling():
            return True
        else:
            return "AVX2" in torch.backends.cpu.get_cpu_capability()
    else:
        return interpolation == InterpolationMode.BICUBIC

This would make the name _can_add_uint8() more logical IMO.

Also, we seem to have some duplicated comments now (those in line 236 for example).

@NicolasHug (Member) left a comment

Thanks a lot for the fix @vfdev-5. The fix LGTM, caveated with the comments in #8092 (comment) about whether or not we should include the tests.

@NicolasHug (Member) left a comment

Thanks @vfdev-5, minor comments but LGTM anyway.

BTW, this isn't really critical, but is there a way to make get_cpu_capability() compatible with dynamo?

# inside resize_image due to torchscript.
# uint8 dtype support for bilinear and bicubic is limited to cpu and
# according to our benchmarks, non-AVX CPUs should still prefer u8->f32->interpolate->u8 path for bilinear
# For torch.compile we use uint8 input and let decomposition work
Member

I think we should be clear that the reason we always use uint8 for dynamo is simply that dynamo doesn't support get_cpu_capability(); with the suggested comment below, this comment is probably unnecessary.

Suggested change (remove this line):
- # For torch.compile we use uint8 input and let decomposition work

vfdev-5 (Collaborator, Author)

A decomposition can also lack uint8 support; we return True instead of False because we believe the decomposition can work with uint8 dtype.
Even if dynamo "supported" get_cpu_capability(), the heuristic of performing u8->f32->interpolate->u8 on non-AVX systems can be wrong for the compiled version.

Member

OK, that makes sense. I added a suggestion above to clarify that the benchmarks were only relevant for eager.

We can merge now and iterate a bit later, but do you think our conditions could be a bit simplified? I think we should be able to do something like:

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?
    # then conditions as before

And IDK if that's true but perhaps torch.compile works for bilinear and bicubic on GPU as well, in which case we can probably write that condition much earlier?

vfdev-5 (Collaborator, Author) commented Nov 14, 2023

    if torch._dynamo.is_compiling():
        return True  # both bilinear and bicubic are OK, right?

Well, right now it may be safer to return False, since pytorch/pytorch#104182 is not yet merged.
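
For reference, a hedged sketch of the consolidated helper with that caution applied — an assumption pieced together from this thread, not the code that was merged:

import torch
from torchvision.transforms import InterpolationMode

def _do_native_uint8_resize_on_cpu(interpolation: InterpolationMode) -> bool:
    if torch._dynamo.is_compiling():
        # under torch.compile, only bilinear takes the native uint8 path;
        # bicubic stays off it while pytorch/pytorch#104182 is unmerged
        return interpolation == InterpolationMode.BILINEAR
    # eager-mode conditions as before: per the benchmarks, native uint8
    # bilinear only pays off on AVX2 CPUs; bicubic is assumed here to take
    # the native path unconditionally
    if interpolation == InterpolationMode.BILINEAR:
        return "AVX2" in torch.backends.cpu.get_cpu_capability()
    return interpolation == InterpolationMode.BICUBIC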

@NicolasHug (Member) left a comment

Minor suggestion but still LGTM, thanks @vfdev-5


vfdev-5 and others added 2 commits November 14, 2023 15:59
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@vfdev-5 merged commit ab4c102 into pytorch:main on Nov 22, 2023
60 of 64 checks passed
@vfdev-5 deleted the compile-resize branch on November 22, 2023 at 10:14

Hey @vfdev-5!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Jan 15, 2024
Summary: Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Reviewed By: vmoens

Differential Revision: D52539004

fbshipit-source-id: 8fe4de87cd225118895b24c3b317c646bcb66356