More NT subclass op support for SAM #111253
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111253
Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 659d5d7 with merge base 3eb5cae:

NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
        output = func(*t_args, **t_kwargs)
        return NestedTensor(output, **extract_kwargs(args[0]))
    with torch._C.DisableTorchFunctionSubclass():
        return func(*args, **kwargs)
```
What's the plan here with torch compile?
(I suspect tests would fail)
Yeah, good question, I need to look into this. For the purposes of getting SAM running without torch.compile this is fine, but it's not great for the immediate next step.
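To make the fallback above concrete, here is a minimal, self-contained sketch of the unwrap → run → rewrap pattern at the `__torch_function__` level, using a toy wrapper subclass. `ToyNested`, `_values`, and the single handled op are illustrative stand-ins, not the real NestedTensor implementation:

```python
import torch

class ToyNested(torch.Tensor):
    @staticmethod
    def __new__(cls, values):
        # Wrapper subclass: metadata lives on the wrapper, data in _values.
        return torch.Tensor._make_wrapper_subclass(
            cls, values.shape, dtype=values.dtype, device=values.device
        )

    def __init__(self, values):
        self._values = values

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Handled op: unwrap to the dense buffer, run the op, rewrap.
        if func is torch.nn.functional.relu:
            out = func(args[0]._values, *args[1:], **kwargs)
            return ToyNested(out)
        # Everything else falls back to default torch.Tensor behavior.
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

nt = ToyNested(torch.randn(3, 4))
out = torch.nn.functional.relu(nt)   # routed through __torch_function__
print(type(out).__name__, out._values.shape)
```

The real subclass handles many more ops and carries jagged metadata alongside the buffer; the sketch only shows the dispatch shape of the fallback, which is also the part whose interaction with torch.compile is being asked about above.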
```cpp
// assume NTs are always batched
if (input.is_nested()) {
  return std::make_tuple(input, true);
}
```
Context: `conv_transpose2d` is composite implicit, so I can't override this behavior. It's easiest to add this hack here to avoid messing with shapes for the NT case.
Just to give an alternative: you could override `conv_transpose2d` for the `AutogradNestedTensor` key using `py_impl`. But maybe this is more of a pain, since it would require you to re-implement more of `conv_transpose2d` in Python?
Ah, thanks for the suggestion! Indeed this would be more work, as I'd have to implement more of `conv_transpose2d`, alongside the other 1d/2d/3d transposed / non-transposed variants.
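For reference, a rough sketch of what the `py_impl` alternative discussed above might look like. The registration spelling, the `AutogradNestedTensor` key usage, and the per-component fallback body are assumptions based on the reviewer's suggestion (autograd handling is elided entirely), which is roughly why it would mean re-implementing more of the op in Python:

```python
import torch
from torch._C import DispatchKey

# Assumed registration spelling: attach a Python kernel for the
# AutogradNestedTensor dispatch key to the conv_transpose2d.input overload.
@torch.ops.aten.conv_transpose2d.input.py_impl(DispatchKey.AutogradNestedTensor)
def conv_transpose2d_nt(input, weight, bias=None, stride=1, padding=0,
                        output_padding=0, groups=1, dilation=1):
    # Naive fallback: run the dense kernel per component and re-nest.
    # Real support would also need to handle autograd and the other
    # 1d/2d/3d transposed / non-transposed variants.
    outs = [
        torch.conv_transpose2d(
            t.unsqueeze(0), weight, bias, stride, padding,
            output_padding, groups, dilation,
        ).squeeze(0)
        for t in input.unbind()
    ]
    return torch.nested.as_nested_tensor(outs)
```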
```python
if bias is not None:
    new_values += bias
return NestedTensor(new_values, **extract_kwargs(inp))
return NestedTensor(func(inp._values, **new_kwargs), **extract_kwargs(inp))
```
@soulitzer FYI this change fixes linear to expect `weight` in the form of `(out_channels, in_channels)`. Backward and tests have to change to accommodate this as well.
whoops, thanks!
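For reference, a small sketch of the weight convention described above: `weight` is `(out_channels, in_channels)`, matching `torch.nn.functional.linear`, and is applied to the flattened jagged values buffer. The helper name and shapes here are illustrative, not the PR's actual implementation:

```python
import torch

def linear_on_jagged_values(values, weight, bias=None):
    # values: (total_tokens, in_channels) flattened jagged buffer
    # weight: (out_channels, in_channels), as in torch.nn.functional.linear
    out = values @ weight.t()
    if bias is not None:
        out = out + bias
    return out

values = torch.randn(10, 8)   # e.g. several variable-length sequences flattened together
weight = torch.randn(16, 8)   # (out_channels, in_channels)
bias = torch.randn(16)
assert linear_on_jagged_values(values, weight, bias).shape == (10, 16)
```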
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command. Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot merge -f "ignore spurious failure"

The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot drci (please ignore this, I'm testing Dr.CI)
This is the final part of #110054. The broken trunk classification has been done on Dr.CI, so we can just check for that in trymerge for consistency when ghstack is used.

* [x] #110054
* [x] #110133
* [x] This PR to clean up the broken trunk logic.

One important change is that `get_classifications` doesn't need to query the jobs from Rockset for the head and merge base SHA anymore, saving a query there. The function looks a lot simpler now.

### Testing

#111253 had 1 broken trunk failure as detected by Dr.CI from the base commit https://hud.pytorch.org/pytorch/pytorch/commit/3eb5cae3af1207ac58f77c5ac78669e276824cb9 (valid), while trymerge didn't detect that because the ghstack base commit https://hud.pytorch.org/pytorch/pytorch/commit/be8e51717411e09d8e4343c055848d434964dfb5 didn't have the same failure (miss).

Pull Request resolved: #111520
Approved by: https://github.com/clee2000
Stack from ghstack (oldest at bottom):
With this PR, we have full op support for SAM without needing to unwrap subclass into jagged buffer -> run ops -> rewrap manually. Specifically, this was previously happening in the MaskDecoder.
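Illustrative sketch of the manual workflow this removes. It uses the strided `torch.nested` constructors just to have something runnable; the PR itself targets the jagged-layout Python subclass, and the op/helper choice is only an example:

```python
import torch

nt = torch.nested.nested_tensor([torch.randn(3, 8), torch.randn(5, 8)])

# Before: unwrap into dense components, run the op, rewrap manually
# (this is the kind of boilerplate that previously lived in the MaskDecoder).
outs = [torch.nn.functional.gelu(t) for t in nt.unbind()]
manual = torch.nested.as_nested_tensor(outs)

# After: with op support on the subclass, call the op on it directly.
direct = torch.nn.functional.gelu(nt)
```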
cc @cpuhrsch @bhosmer @drisspg @soulitzer @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10