
[test fix] try to re-use cached symnodes across dynamo and AOTAutograd #120090

Closed

wants to merge 3 commits

Conversation

@bdhirsh (Contributor) commented Feb 16, 2024

This is mostly to generate discussion (maybe we want to land this? But the state of caching here feels pretty fragile). Partial fix to the issue here: https://fb.workplace.com/groups/1075192433118967/permalink/1381371379167736/

It looks like we used to cache SymInt creation, but that was changed in #115396 to cache raw sympy symbols instead (the reason described in that PR is that it helps avoid creating spurious symbols).

That has an extra effect, though: because only the raw `sympy.Symbol`s are cached, we generate a new `SymNode` wrapper around each symbol every time we fakeify a tensor and create symints for its sizes.

This means that when we fakeify in Dynamo and then again in AOTAutograd:

(1) the symints that we create for the sizes of the fake tensors in the two cases **will** share the same underlying `sympy.Symbol` (thanks to the symbol cache)

(2) However, they will **not** share the same `SymNode`.

This is a problem because `make_fx()` has a `symnode_tracker`, which is effectively a giant map from `SymNode` -> `Proxy`. Symints from both AOTAutograd and Dynamo can show up in the function that we want to run `make_fx()` on, so `make_fx()` will see two versions of `s1` with different `SymNode`s, and it would then be obligated to create (different) proxies for each.
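
A minimal, self-contained sketch of the failure mode (note: `SymNodeLike`, `make_symbol`, and the dict-based tracker are illustrative stand-ins, not PyTorch's actual `SymNode` / `symnode_tracker` implementations):

```python
import sympy

class SymNodeLike:
    """Illustrative stand-in for torch's SymNode: a thin wrapper around a
    raw sympy symbol. Like the real SymNode, two wrappers around the same
    symbol are distinct objects."""
    def __init__(self, expr):
        self.expr = expr

# Post-#115396 behavior: only the raw sympy.Symbol is cached, so
# fakeifying twice returns the same symbol object.
_symbol_cache = {}

def make_symbol(name):
    if name not in _symbol_cache:
        _symbol_cache[name] = sympy.Symbol(name, integer=True)
    return _symbol_cache[name]

# Dynamo fakeifies a tensor and wraps s1 in a SymNode...
dynamo_node = SymNodeLike(make_symbol("s1"))
# ...then AOTAutograd fakeifies again: same symbol, *new* wrapper.
aot_node = SymNodeLike(make_symbol("s1"))

assert dynamo_node.expr is aot_node.expr  # (1) shared sympy.Symbol
assert dynamo_node is not aot_node        # (2) distinct SymNodes

# A tracker keyed on the wrapper object, like make_fx's symnode_tracker,
# now holds two separate proxies for the "same" s1.
symnode_tracker = {id(dynamo_node): "proxy_from_dynamo",
                   id(aot_node): "proxy_from_aot"}
assert len(symnode_tracker) == 2
```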

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Feb 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120090

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 2 Unrelated Failures

As of commit 6340936 with merge base 24968ff:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Feb 16, 2024
bdhirsh added a commit that referenced this pull request Feb 16, 2024
ghstack-source-id: eca9e3bd708490d69e7567d178b40617f16ccb95
Pull Request resolved: #120090
bdhirsh added a commit that referenced this pull request Feb 16, 2024
ghstack-source-id: a1ea3edb77fc88720741578461f388340f9ff41c
Pull Request resolved: #120090
@albanD albanD removed their request for review February 16, 2024 20:12
bdhirsh added a commit that referenced this pull request Feb 17, 2024
ghstack-source-id: 376553890eea0758e46dc7d8f2d5be184a1a6517
Pull Request resolved: #120090
@ezyang (Contributor) commented Feb 20, 2024

Meh, I am not convinced by the approach in this PR.

What's not clear to me is why it matters that we have two different SymNodes for the same symbol. Yes, make_fx has a symnode_tracker, but it ordinarily isn't an error condition to have two SymNodes for the same symbol. The usual situation where this can occur is when you have s0 = x.size(0) and s0_2 = y.size(0), where x.size(0) and y.size(0) evaluate to the same sympy expression, but only after some non-trivial simplification. We will end up with two proxies in this case, but that is OK, because one proxy refers to the x.size(0) projection while the other refers to the y.size(0) projection.
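
A sympy-only illustration of that situation (the expression for `y.size(0)` is invented purely to show a non-trivial simplification):

```python
import sympy

s0 = sympy.Symbol("s0", integer=True, positive=True)

# x.size(0) is traced directly as s0...
expr_from_x = s0
# ...while y.size(0) is recorded as an expression that only reduces
# to s0 after simplification.
expr_from_y = (s0**2 + s0) / (s0 + 1)

# The two sizes are equal as values...
assert sympy.simplify(expr_from_x - expr_from_y) == 0
# ...but they arrive wrapped in different SymNodes, so make_fx records
# two proxies: one tracing back to x.size(0), one to y.size(0).
```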

This is sort of related to the stuff @eellison was working on. It is actually, eventually, undesirable to have an s0 proxy that depends on the tensor x, because it will keep x live longer than it needs to be. So there's a separate FX pass that tries to break these dependencies, based on the underlying sympy expression. I know I asked Elias to undo his version of the pass that was interleaved directly with make_fx, because it ended up being complicated and difficult to handle, but maybe a version of it where we eliminate symnode_tracker entirely might still be worth it. We have to do this carefully, though, because we need to avoid accidentally DCE'ing a spurious dependence on tangents.
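
For concreteness, a rough sketch of the kind of dependency-breaking FX pass being described; the function name, the `sym_size` matching, and the tangent check are all simplified assumptions, not the actual pass:

```python
import torch

def dedupe_sym_size_nodes(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Map each underlying sympy expression to the first graph node that
    # produced it; later size queries with the same expression get rewired
    # to that first node, dropping their dependence on extra tensors.
    seen = {}
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is torch.ops.aten.sym_size.int:
            expr = node.meta["val"].node.expr  # the underlying sympy expr
            if expr in seen:
                # Don't break dependences on tangent inputs: those must
                # survive DCE (crude name-based check for illustration).
                if "tangent" in str(node.args[0]):
                    continue
                node.replace_all_uses_with(seen[expr])
                gm.graph.erase_node(node)
            else:
                seen[expr] = node
    gm.graph.lint()
    gm.recompile()
    return gm
```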


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Apr 20, 2024
@github-actions github-actions bot closed this May 20, 2024