[aot cache][ca] remove restriction on caching ca's aot inference graph #148491

xmfan · 2025-03-04T21:44:35Z

Stack from ghstack (oldest at bottom):

This removes the restriction on caching compiled autograd (CA)'s AOT inference graph, but we still can't cache that graph yet: the CA functional ops aren't serializable.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

[ghstack-poisoned]

pytorch-bot · 2025-03-04T21:44:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148491

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 4175ba9 with merge base 097b0d3:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

trunk / libtorch-linux-focal-cuda12.4-py3.10-gcc9-debug / build (gh) (#148495)
undefined reference to `std::__throw_bad_array_new_length()'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

bdhirsh · 2025-03-05T18:01:36Z

test/dynamo/test_aot_autograd_cache.py

-        )  # from compiled autograd
+        with self.assertRaisesRegex(
+            torch._dynamo.exc.BackendCompilerFailed,
+            "BypassAOTAutogradCache: Unsupported call_function target torch._dynamo.compiled_autograd.ops.validate_outputs",


basic q: I would have imagined that the AOTAutograd cache misses and is forced to re-run AOTAutograd on the CA graph. But in this test it sounds like compile hard-errors during compilation. Is that right, and if so, why is that the case?

xmfan · Mar 5, 2025

Yep, we rerun AOTAutograd on the CA dynamo graph, and that doesn't fail. Then we try to cache it, which fails because of some dynamically registered ops found in the CA dynamo graph. In this unit test, we set a config to make that a hard error.
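The behavior described in this reply can be sketched in plain Python. This is only an illustration of the control flow (compile succeeds, the save-to-cache step rejects an unsupported op, and a strict flag turns that rejection into a hard error); it is not the AOTAutograd cache implementation. Every name below (`BypassCache`, `CompileFailed`, `SERIALIZABLE_TARGETS`, `compile_with_cache`, the `strict` flag) is a hypothetical stand-in and not a PyTorch API.

```python
class BypassCache(Exception):
    """Hypothetical signal: the graph compiled fine but cannot be cached."""


class CompileFailed(Exception):
    """Hypothetical hard error surfaced to the caller in strict mode."""


# Hypothetical allow-list of op targets the serializer knows how to save.
SERIALIZABLE_TARGETS = {"aten.add", "aten.mul"}


def compile_graph(targets):
    # Stand-in for the compiler; in this sketch compilation always succeeds.
    return tuple(targets)


def check_cacheable(targets):
    # Reject any op target that the (hypothetical) serializer cannot handle.
    for target in targets:
        if target not in SERIALIZABLE_TARGETS:
            raise BypassCache(f"Unsupported call_function target {target}")


def compile_with_cache(targets, cache, strict=False):
    key = tuple(targets)
    if key in cache:
        return cache[key], "hit"

    # Cache miss: recompile. This step does not fail.
    compiled = compile_graph(targets)

    try:
        check_cacheable(targets)
    except BypassCache as exc:
        if strict:
            # Tests can opt in to treating a bypass as a hard error.
            raise CompileFailed(f"BypassCache: {exc}") from exc
        # Default: skip caching and still return the compiled result.
        return compiled, "bypass"

    cache[key] = compiled
    return compiled, "miss"
```

With `strict=False`, a graph containing an unsupported target still compiles and is simply not stored. With `strict=True`, the same graph raises, which mirrors why the unit test above expects an exception rather than a silent cache miss.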

xmfan · 2025-03-06T21:33:06Z

@pytorchbot merge -i

pytorchmergebot · 2025-03-06T21:34:49Z

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-03-07T00:09:12Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-jammy-py3.9-gcc11 / test (default, 1, 5, linux.2xlarge)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

xmfan · 2025-03-07T00:58:39Z

@pytorchbot rebase

pytorchmergebot · 2025-03-07T01:00:04Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]

ghstack-source-id: 6c94c0b
Pull Request resolved: #148491

pytorchmergebot · 2025-03-07T01:00:20Z

Successfully rebased gh/xmfan/191/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/148491)

xmfan · 2025-03-07T17:59:47Z

@pytorchbot merge

pytorchmergebot · 2025-03-07T18:02:11Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-03-07T18:08:09Z

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team

Raised by workflow job

[ghstack-poisoned]

ghstack-source-id: 96cd7cd
Pull Request resolved: #148491

xmfan · 2025-03-08T00:52:53Z

@pytorchbot merge

pytorchmergebot · 2025-03-08T00:54:34Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Update

2d62211

[ghstack-poisoned]

xmfan mentioned this pull request Mar 4, 2025

[reland][ca] side-effect free inital trace: compiled_args #148376

Closed

xmfan mentioned this pull request Mar 4, 2025

[ca] remove compiled_autograd_tracing #148381

Closed

pytorch-bot bot added ciflow/inductor module: dynamo labels Mar 4, 2025

xmfan added the topic: not user facing label Mar 4, 2025

Update

94aeed0

[ghstack-poisoned]

xmfan mentioned this pull request Mar 5, 2025

[ca][aot] mark activations as maybe dynamic #148516

Closed

Update

dcafa00

[ghstack-poisoned]

xmfan marked this pull request as ready for review March 5, 2025 01:21

xmfan requested review from albanD, bdhirsh and soulitzer as code owners March 5, 2025 01:21

xmfan requested a review from jamesjwu March 5, 2025 01:21

jamesjwu approved these changes Mar 5, 2025

bdhirsh reviewed Mar 5, 2025

pytorch-bot bot added the ciflow/trunk label Mar 6, 2025

albanD removed their request for review March 6, 2025 21:34

pytorchmergebot added the merging label Mar 6, 2025

pytorchmergebot removed the merging label Mar 7, 2025

Update

6ba7cdb

[ghstack-poisoned]

pytorchmergebot pushed a commit that referenced this pull request Mar 7, 2025

[aot cache][ca] remove restriction on caching ca's aot inference graph

bcb4597

ghstack-source-id: 6c94c0b
Pull Request resolved: #148491

pytorchmergebot added the merging label Mar 7, 2025

pytorchmergebot removed the merging label Mar 7, 2025

Update

4175ba9

[ghstack-poisoned]

xmfan added a commit that referenced this pull request Mar 7, 2025

[aot cache][ca] remove restriction on caching ca's aot inference graph

1a74b7b

ghstack-source-id: 96cd7cd
Pull Request resolved: #148491

pytorchmergebot added the merging label Mar 8, 2025

pytorchmergebot added the Merged label Mar 8, 2025

pytorchmergebot closed this in 666508e Mar 8, 2025

pytorchmergebot removed the merging label Mar 8, 2025

github-actions bot deleted the gh/xmfan/191/head branch April 12, 2025 02:11
