Short-term fix to preserve NJT metadata cache in torch.compile #122836

jbschlosser · 2024-03-27T21:48:02Z

Stack from ghstack (oldest at bottom):

-> Short-term fix to preserve NJT metadata cache in torch.compile #122836

Idea: close over min / max sequence length in the main NJT view func (`_nested_view_from_jagged`) so that view replay during fake-ification propagates these correctly in torch.compile.
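To make the "close over" idea concrete, here is a toy, pure-Python sketch. It does not use the real NJT classes: `ToyJaggedView`, `toy_view_from_jagged`, and the `view_func` field are hypothetical names invented for illustration, and replaying the view on a placeholder buffer only stands in for what fake-ification does.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyJaggedView:
    """Toy stand-in for an NJT view; not the real NestedTensor class."""
    values: List[float]
    offsets: List[int]
    metadata_cache: Dict[str, int] = field(default_factory=dict)
    # Recipe for rebuilding this view on top of a different base buffer.
    view_func: Optional[Callable[[List[float]], "ToyJaggedView"]] = None


def toy_view_from_jagged(values, offsets, min_seqlen=None, max_seqlen=None):
    """Build a view and record a replay function that closes over the metadata."""
    cache = {}
    if min_seqlen is not None:
        cache["min_seqlen"] = min_seqlen
    if max_seqlen is not None:
        cache["max_seqlen"] = max_seqlen

    def replay(new_values):
        # offsets, min_seqlen and max_seqlen are captured by the closure, so
        # replaying the view on a new base carries the cached metadata along.
        return toy_view_from_jagged(new_values, offsets, min_seqlen, max_seqlen)

    return ToyJaggedView(list(values), list(offsets), cache, replay)


nt = toy_view_from_jagged([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 5], min_seqlen=2, max_seqlen=3)
# Stand-in for fake-ification: replay the view on a placeholder base buffer.
fake_nt = nt.view_func([0.0] * 5)
print(fake_nt.metadata_cache)
```

If the replay function did not capture the min / max values, the rebuilt view would start with an empty cache, which is the kind of metadata loss this PR is meant to avoid.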

For dynamic shapes support for min / max sequence length, this PR uses a hack that stores the values in `(val, 0)` shaped tensors.
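A minimal sketch of that `(val, 0)` trick, assuming the value is read back from the tensor's leading dimension. The helper names `store_val_in_tensor` and `load_val_from_tensor` are hypothetical and may not match what the PR actually uses. The presumable benefit is that tensor sizes can be treated symbolically under dynamic shapes, whereas a plain Python int would be specialized.

```python
import torch


def store_val_in_tensor(val: int) -> torch.Tensor:
    # Hypothetical helper: encode `val` as the leading dimension of a tensor
    # whose second dimension is 0, so it holds no elements.
    return torch.zeros(val, 0)


def load_val_from_tensor(t: torch.Tensor) -> int:
    # Hypothetical helper: recover the value from the tensor's shape.
    return t.shape[0]


max_seqlen_tensor = store_val_in_tensor(7)
print(tuple(max_seqlen_tensor.shape), max_seqlen_tensor.numel())
print(load_val_from_tensor(max_seqlen_tensor))
```

Because the second dimension is zero, the tensor carries the value purely in its shape.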

**NB: This PR changes SDPA to operate on real views instead of using `buffer_from_jagged()` / `ViewNestedFromBuffer`, which may impact the internal FIRST model. That is, it undoes the partial revert from #123215 alongside a fix to the problem that required the partial revert. We need to verify that there are no regressions there before landing.**

Differential Revision: D55448636

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-03-27T21:48:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122836

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Failure with setup-ssh on Amazon Linux 2023 runners

❌ 3 New Failures, 8 Unrelated Failures

As of commit 6ea26c9 with merge base ea47d54:

NEW FAILURES - The following jobs have failed:

inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)
'Test'
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16
pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge) (gh)
Process completed with exit code 1.

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu) (gh) (similar failure)
test_cuda.py::TestCuda::test_out_of_memory_retry

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / cuda12.1-py3.12-gcc9-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
'Test'

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

davidberard98 · 2024-03-27T22:17:11Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

tools/autograd/derivatives.yaml

torch/nested/_internal/ops.py

jbschlosser · 2024-03-29T21:24:15Z

aten/src/ATen/native/native_functions.yaml

@@ -6200,6 +6200,16 @@
  device_check: NoCheck
  dispatch: {}

+- func: _nested_get_min_seqlen(Tensor self) -> int


I think these need to be symbolic to work with dynamic shapes. Still exploring this (TBD).

This needs more thought / design work to fully cover dynamic shapes. @YuqingJ mentioned to me that a static max length is being used for now, so maybe we can get a patch to work for FIRST without this, but it's iffy.

clee2000 · 2024-04-02T21:56:55Z

@clee2000 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jbschlosser · 2024-04-03T15:14:07Z

@jbschlosser has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

[ghstack-poisoned]

torch/nested/_internal/ops.py

soulitzer · 2024-05-02T16:27:34Z

test/test_nestedtensor.py

+            return torch.ones_like(nt) * nt.max_seqlen()
+
+        for dynamic in [False, True]:
+            self.assertFalse(_recompiles_for_inputs(f, (nt,), (nt2,), dynamic=dynamic))


JFYI, there's `with unittest.mock.patch("torch._dynamo.config.error_on_recompile", True):`, though `_recompiles_for_inputs` seems generally useful if you also want to check cases where a recompile is expected (recompiles=True).

I lifted _recompiles_for_inputs() from some other test file, which isn't great :p
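For readers unfamiliar with the pattern, here is a rough sketch of how such a recompile check can be built with a counting backend. The body of `_recompiles_for_inputs` is not shown in this thread, so the helper below (`recompiles_for_inputs`, `counting_backend`) is an assumption about its shape rather than the code from the test file.

```python
import torch


def recompiles_for_inputs(fn, inputs1, inputs2, dynamic=True):
    """Return True if calling fn on inputs2 after inputs1 compiles a second graph."""
    compile_count = 0

    def counting_backend(gm, example_inputs):
        # Dynamo invokes the backend once for every graph it compiles.
        nonlocal compile_count
        compile_count += 1
        return gm.forward  # run the captured graph without further lowering

    compiled_fn = torch.compile(
        fn, fullgraph=True, backend=counting_backend, dynamic=dynamic
    )
    compiled_fn(*inputs1)
    compiled_fn(*inputs2)
    return compile_count > 1


def double(x):
    return x * 2


def triple(x):
    return x * 3


# Same shape twice: the first compiled graph is reused.
same_shape = recompiles_for_inputs(double, (torch.randn(3),), (torch.randn(3),), dynamic=False)
# Different shape with dynamic=False: the shape guard fails and a second graph is compiled.
new_shape = recompiles_for_inputs(triple, (torch.randn(3),), (torch.randn(4),), dynamic=False)
print(same_shape, new_shape)
```

Unlike `error_on_recompile`, which only helps when asserting that no recompile happens, a boolean-returning helper like this can be used with both `assertFalse` and `assertTrue`.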

test/test_nestedtensor.py

torch/nested/_internal/nested_tensor.py

davidberard98 · 2024-05-02T16:41:43Z

test/test_nestedtensor.py

@@ -4339,6 +4353,115 @@ def forward(self, query, value, offsets):
        self.assertTrue(torch.allclose(attn_output_eager, attn_output))
        self.assertTrue(torch.allclose(value_grad, value.grad))

+    @dtypes(torch.float32)


Do you know if this will work with a max_seqlen manually assigned in the graph? If so, can you add a test, or add one as a follow-up?

good Q! this works with manual assignment as done in FIRST (i.e. via convert_jagged_to_nested_tensor() -> ViewNestedFromBuffer with the metadata cache info). Also see test_dummy_mha_with_nt which tries to match as closely as possible to what the FIRST model does.

We should arguably support and test manual max seq len assignment for the new API. I'll address this or add a TODO

Update: I think I'll defer handling manual assignment outside of the FIRST use case until we have public @property getters / setters for min / max sequence length that work with PT2 (this is in progress).

[ghstack-poisoned]

jbschlosser · 2024-06-19T00:29:20Z

Just FYI, we have a small tool to detect dependencies here: https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml. I ran the job with this PR and got the following list of conflicts: https://github.com/pytorch/test-infra/actions/runs/9573995289/attempts/1#summary-26396464107. This is an easy case because they are all in your stack, but the tool might be useful for finding conflicts from elsewhere (if that happens).

wow that's super useful! thanks for the link :)

jbschlosser · 2024-06-20T18:02:37Z

Confirmed that the memory leak issues are pre-existing and not introduced by this PR. They need more investigation, but I'm not even convinced these are NJT-related. For example, I saw a memory leak from running this minimal, non-NJT example with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1:

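import torch
import torch.fx

# Minimal non-NJT repro; run with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1.
def test_memory_leak(self, device):
    value = torch.randn(6, 4, requires_grad=True, device=device)
    m = torch.nn.Linear(4, 6, device=device)
    # Trace the module with FX first, then compile the traced GraphModule.
    symbolic_traced: torch.fx.GraphModule = torch.fx.symbolic_trace(m)
    m = torch.compile(symbolic_traced)
    m(value)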

I'm going to go ahead and reland this with the ROCm / DEBUG=1 assert fixes added.

ghstack-source-id: 853d357d36ed3edd4a831b1eccad625d53ac73ff Pull Request resolved: #122836

jbschlosser · 2024-06-20T18:04:14Z

@pytorchbot merge

pytorchmergebot · 2024-06-20T18:05:56Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-06-20T18:42:14Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

jbschlosser · 2024-06-20T19:06:53Z

@pytorchbot merge -i

pytorchmergebot · 2024-06-20T19:08:31Z

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge)

pytorchmergebot · 2024-06-20T19:13:54Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5, linux.g5.4xlarge.nvidia.gpu)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

jbschlosser · 2024-06-20T19:24:32Z

@pytorchbot merge -i

pytorchmergebot · 2024-06-20T19:28:52Z

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge), pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5, linux.g5.4xlarge.nvidia.gpu), trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu)

pytorchmergebot · 2024-06-20T22:44:14Z

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team

Raised by workflow job

jbschlosser · 2024-06-20T23:13:53Z

@pytorchbot merge -i

pytorchmergebot · 2024-06-20T23:15:26Z

Merge started

Short-term fix to preserve NJT metadata cache in torch.compile

fcb0820

[ghstack-poisoned]

jbschlosser requested review from albanD and soulitzer as code owners March 27, 2024 21:48

jbschlosser mentioned this pull request Mar 27, 2024

(WIP) to_padded_tensor() triton kernel for NJT #121947

Draft

jbschlosser requested a review from YuqingJ March 27, 2024 21:48

albanD removed their request for review March 27, 2024 22:00

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

ab7e444

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

soulitzer reviewed Mar 28, 2024

tools/autograd/derivatives.yaml (outdated, resolved)

YuqingJ reviewed Mar 28, 2024

torch/nested/_internal/ops.py (outdated, resolved)

jbschlosser added 2 commits March 29, 2024 10:51

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

b9606a7

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

ef02e6a

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

jbschlosser commented Mar 29, 2024

jbschlosser added 3 commits April 2, 2024 16:22

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

1bd5588

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

abf28a8

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

957d277

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

Update on "Short-term fix to preserve NJT metadata cache in torch.com…

78efc28

…pile" Differential Revision: [D55448636](https://our.internmc.facebook.com/intern/diff/D55448636) [ghstack-poisoned]

pytorch-bot bot added the ciflow/inductor label Apr 3, 2024

Update

ece048c

[ghstack-poisoned]

jbschlosser added the topic: not user facing topic category label May 1, 2024

Update

b6d1abe

[ghstack-poisoned]

soulitzer reviewed May 2, 2024

torch/nested/_internal/ops.py (resolved)

soulitzer reviewed May 2, 2024

test/test_nestedtensor.py (outdated, resolved)

soulitzer reviewed May 2, 2024

torch/nested/_internal/nested_tensor.py (resolved)

davidberard98 reviewed May 2, 2024

Update

6d75373

[ghstack-poisoned]

pytorchmergebot reopened this Jun 19, 2024

jbschlosser added a commit that referenced this pull request Jun 20, 2024

Short-term fix to preserve NJT metadata cache in torch.compile

d12d3d0

ghstack-source-id: 853d357d36ed3edd4a831b1eccad625d53ac73ff Pull Request resolved: #122836

pytorchmergebot added the merging label Jun 20, 2024

pytorchmergebot removed the merging label Jun 20, 2024

pytorchmergebot added the merging label Jun 20, 2024

pytorchmergebot removed the merging label Jun 20, 2024

pytorchmergebot added the merging label Jun 20, 2024

pytorchmergebot removed the merging label Jun 20, 2024

pytorchmergebot added the merging label Jun 20, 2024

pytorchmergebot closed this in 31d5753 Jun 20, 2024

pytorchmergebot removed the merging label Jun 20, 2024

jbschlosser mentioned this pull request Jun 24, 2024

Fix DEBUG=1 asserts with NJT ops #129014

Closed

williamwen42 mentioned this pull request Jul 1, 2024

[dynamo] memory leak when compiling symbolic_trace result #129901

Open

jbschlosser mentioned this pull request Jul 12, 2024

[NestedTensor] torch.compile silent specialization on _max_seqlen #124205

Closed

github-actions bot deleted the gh/jbschlosser/131/head branch July 21, 2024 02:01
