[test fix] try to re-use cached symnodes across dynamo and AOTAutograd #120090
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120090

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 2 Unrelated Failures

As of commit 6340936 with merge base 24968ff:

NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: eca9e3bd708490d69e7567d178b40617f16ccb95 Pull Request resolved: #120090
ghstack-source-id: a1ea3edb77fc88720741578461f388340f9ff41c Pull Request resolved: #120090
ghstack-source-id: 376553890eea0758e46dc7d8f2d5be184a1a6517 Pull Request resolved: #120090
Meh, I am not convinced by the approach in this PR. What's not clear to me is why it matters that we have two different SymNodes for the same symbol. Yes, make_fx has a symnode_tracker, but having two SymNodes for the same symbol ordinarily isn't an error condition. The usual situation where this could occur is if you have

This is sort of related to the stuff @eellison was working on. Eventually it is actually undesirable to have an s0 proxy that depends on the tensor x, because it will keep x live longer than it needs to be. So there's a separate FX pass that tries to break these dependencies, based on the underlying sympy expression. I know I asked Elias to undo his pass interleaved directly with make_fx because it ended up being complicated and difficult to handle, but maybe a version of it where we eliminate symnode_tracker entirely might still be worth it. We have to do this carefully, though, because we need to avoid accidentally DCE'ing a spurious dependence on tangents.
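As a toy sketch of the liveness point above (the graph encoding and the op names `size_dim0`/`expand` are made up for illustration; this is not the real FX pass): if the only consumer of `x` is a size read producing `s0`, rewriting `s0` into a direct symbolic input leaves `x` with no users at all, which is exactly why such a pass must be careful not to DCE a dependence (e.g. on tangents) that is actually load-bearing.

```python
# Hypothetical toy graph: a node is (output_name, op, input_names).
graph = [
    ("x",  "placeholder", []),
    ("y",  "placeholder", []),
    ("s0", "size_dim0",   ["x"]),   # the ONLY use of x: reading its size
    ("z",  "expand",      ["y", "s0"]),
]

def users(graph, name):
    """Names of all nodes that read `name` as an input."""
    return [out for out, _, inputs in graph if name in inputs]

print(users(graph, "x"))   # ['s0'] -> x is kept live just for the size read

# The pass: the underlying sympy expression tells us s0 is just
# size(x, 0), so pass it in as its own graph input instead of
# deriving it from x inside the graph.
rewritten = [
    ("x",  "placeholder", []),
    ("y",  "placeholder", []),
    ("s0", "placeholder", []),      # s0 handed in directly
    ("z",  "expand",      ["y", "s0"]),
]

print(users(rewritten, "x"))  # [] -> x is now dead and could be pruned
```

In the rewritten graph, pruning `x` is safe here, but if an input's only remaining use was an intentionally-kept dependence (the tangents case), the same pruning would be a correctness bug.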
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as stale.
More to generate discussion (maybe we want to land this? But the state of caching here feels pretty fragile). Partial fix to the issue here: https://fb.workplace.com/groups/1075192433118967/permalink/1381371379167736/

It looks like we used to try to cache symint creation - but that was changed in this PR, to cache raw sympy symbols instead: #115396 (the reason described in the PR is that it helps avoid creating spurious symbols).

That has an extra effect though: only caching the raw `sympy.Symbol`s means that we will generate new `SymNode` wrappers around these symbols each time we fakeify a tensor and generate symints for its sizes. This means that when we fakeify in Dynamo and then again in AOTAutograd:

(1) the symints that we create for the sizes of the fake tensors in these two cases **will** share the same underlying `sympy.Symbol` (thanks to the symbol cache)

(2) however, they will **not** share the same `SymNode`.

This is a problem because `make_fx()` has a `symnode_tracker` that is effectively a giant map from `[torch.SymNode] -> [Proxy]`. We can end up with symints from both AOTAutograd and Dynamo showing up in the function that we want to run `make_fx()` on, and `make_fx()` will see two versions of `s1` with different SymNodes, which it would then be obligated to create (different) proxies for.

Stack from ghstack (oldest at bottom):
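The identity mismatch described in the PR body can be modeled with a minimal self-contained sketch (the `Symbol` and `SymNodeLike` classes and the `fakeify_size` helper are hypothetical stand-ins, not PyTorch or sympy APIs): the symbol cache deduplicates the underlying symbol, but each fakeification builds a fresh wrapper, so an identity-keyed tracker sees two distinct keys for the "same" `s1`.

```python
class Symbol:
    """Stand-in for a raw sympy.Symbol."""
    def __init__(self, name):
        self.name = name

class SymNodeLike:
    """Stand-in for torch.SymNode: a wrapper around an underlying symbol."""
    def __init__(self, symbol):
        self.symbol = symbol

symbol_cache = {}

def get_symbol(name):
    # Cached: the same name always yields the same Symbol object
    # (the behavior #115396 introduced).
    if name not in symbol_cache:
        symbol_cache[name] = Symbol(name)
    return symbol_cache[name]

def fakeify_size(name):
    # NOT cached: every fakeification builds a fresh wrapper.
    return SymNodeLike(get_symbol(name))

dynamo_node = fakeify_size("s1")   # first fakeification (Dynamo)
aot_node = fakeify_size("s1")      # second fakeification (AOTAutograd)

assert dynamo_node.symbol is aot_node.symbol   # shared underlying symbol
assert dynamo_node is not aot_node             # distinct wrapper objects

# A tracker keyed by wrapper identity (like symnode_tracker) ends up
# with two entries, and hence two proxies, for one underlying symbol.
symnode_tracker = {}
for node in (dynamo_node, aot_node):
    symnode_tracker[id(node)] = f"proxy_for_{node.symbol.name}"

print(len(symnode_tracker))  # 2
```

This is why the PR's approach of reusing the cached SymNode across Dynamo and AOTAutograd would collapse the two keys into one, at the cost of making the caching story more fragile.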