Avoid AOTAutogradCache.load in stack trace on cache miss path #158149
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158149.
Note: links to docs will display an error until the docs builds have completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit f03cbce with merge base 4b9a6f7.
NEW FAILURE: the following job has failed.
UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
    remote,
)
if compiled_fn is not None:
    break
```
There's gotta be a better way to write this than a while True that never loops more than once
If Python had do...while I would have used that :P I think I am eventually going to refactor the cache hit test to be a single line conditional so I don't have to do the loop here.
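To illustrate the pattern under discussion, here is a minimal sketch (not the actual PR code; `try_cache` and `dispatch_and_compile` are hypothetical stubs) of a `while True` loop that emulates `do...while` and never actually repeats:

```python
# Illustrative sketch of the "do...while via while True" pattern: the body
# always runs once, and every path ends in `break`, so the loop body never
# executes a second time. All names here are stand-ins, not PR code.

_CACHE = {}

def try_cache(key):
    # Stand-in for the cache-hit test.
    return _CACHE.get(key)

def dispatch_and_compile(key):
    # Stand-in for the cache-miss compile path.
    fn = lambda x: x * 2
    _CACHE[key] = fn
    return fn

def load_or_compile(key):
    while True:
        compiled_fn = try_cache(key)       # cache-hit test
        if compiled_fn is not None:
            break                          # hit: exit the "loop" immediately
        compiled_fn = dispatch_and_compile(key)  # miss: compile
        break                              # body never runs twice
    return compiled_fn
```

Refactoring the cache-hit test into a single-line conditional, as suggested above, would let the loop disappear entirely.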
Starting merge as part of PR stack under #158150
Starting merge as part of PR stack under #158176
```python
    break

compiled_fn = dispatch_and_compile()
break
```
I had a hard time seeing what the meaningful difference in this PR is. It seems like there is no change to the nesting of function calls or to pipelining, just a different coding style. Maybe it will become clear in the next PR.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable). Details for Dev Infra team: raised by workflow job.
Starting merge as part of PR stack under #158213
Starting merge as part of PR stack under #158251
Starting merge as part of PR stack under #158319
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #158150
Approved by: https://github.com/jamesjwu, https://github.com/wconstab
ghstack dependencies: #158149
Two main things of note:

- Review this diff without whitespace changes.
- To ensure that context managers correctly propagate to later pipeline stages, I am using the ExitStack trick: there is an ExitStack which is in scope for the entire pipeline, and inside the individual pipeline stages we push context managers onto this stack when we want them to survive into the next pipeline stage. This is not obviously the best final form of the code, but create_aot_dispatcher_function is called from multiple locations so I can't just inline the context managers into the call site.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #158173
Approved by: https://github.com/jamesjwu, https://github.com/wconstab
ghstack dependencies: #158149, #158150
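The ExitStack trick described above can be sketched as follows. This is a toy illustration (the stage names and `tracing_enabled` manager are hypothetical, not the PR's code): an `ExitStack` scoped to the whole pipeline lets an early stage push context managers that remain active through later stages, without nesting `with` blocks or function calls.

```python
# Sketch of the ExitStack trick: a pipeline-wide ExitStack that individual
# stages push context managers onto, so the managers survive into later
# stages and all exit together when the pipeline finishes.
import contextlib

events = []

@contextlib.contextmanager
def tracing_enabled():
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")

def stage1_setup(stack):
    # Push onto the pipeline-wide stack; the manager does NOT exit
    # when this stage returns.
    stack.enter_context(tracing_enabled())
    events.append("stage1")

def stage2_compile():
    events.append("stage2")  # still inside tracing_enabled here

def run_pipeline():
    with contextlib.ExitStack() as stack:
        stage1_setup(stack)
        stage2_compile()
    # every pushed manager exits here, after all stages have run

run_pipeline()
```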
The starting point for this refactor is that I need access to the fully general joint graph representation in an export-like interface, but I then subsequently need a way to feed this joint graph into the rest of the compilation pipeline so I can get an actual callable that I can run once I've finished modifying it. Previously, people had added export capabilities to AOTAutograd via an export flag that toggled what exactly the functions return, triggering aot_dispatch to go to a different "export" implementation, but I've found this difficult to understand and it has led to a bit of duplicate code on the export path.

So the idea here is to reorganize the structure of the function calls in AOTAutograd. It is helpful to first describe how things used to work:

* Start with aot_autograd.py top level functions like aot_function, _aot_export_function and aot_module_simplified. These call:
* create_aot_dispatcher_function. This does a bunch of stuff (forward metadata collection) and adds many context managers. This calls:
* One of aot_dispatch_base, aot_dispatch_export or aot_dispatch_autograd, which:
  * Call aot_dispatch_autograd_graph or aot_dispatch_base_graph to actually do the graph capture
  * Do some base/export/autograd specific post-processing on the graph

Notice that this pattern of nested function invocations means there is no way to easily get the graph capture result in the autograd case; furthermore, the export path is "bolted on", forcing the entire chain of functions to have a different return result than normal, with no way to *resume* the rest of the post-processing to actually get a callable.

Here is the new structure:

* Start with aot_autograd.py top level functions like aot_function, _aot_export_function and aot_module_simplified. These now orchestrate this top level flow:
  * Start a context manager (stack); this stateful context block takes care of all of the nested context managers which originally necessitated the nested call structure
  * Call create_aot_state to do initial setup and set up all the context managers on stack. These context managers do NOT exit upon return of this call.
  * Call aot_stage1_graph_capture to do the graph capture
  * Call aot_stage2_compile or aot_stage2_export depending on what postprocessing you want

With this new structure, it's now possible (although not done in this PR) to return the graph after aot_stage1_graph_capture and do something with it before running aot_stage2_compile to finish the job.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #158213
Approved by: https://github.com/jamesjwu
ghstack dependencies: #158149, #158150, #158173, #158176
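The new top-level flow can be sketched roughly as below. The stage names mirror the commit message (create_aot_state, aot_stage1_graph_capture, aot_stage2_compile), but the function bodies are stand-ins, not PyTorch's implementation:

```python
# Hedged sketch of the new staged orchestration: setup, graph capture, and
# compile are separate calls at the top level, so a caller could stop after
# stage 1, inspect or modify the graph, and then resume. Bodies are toy
# stand-ins; only the call structure reflects the refactor.
import contextlib

def create_aot_state(stack):
    # Initial setup; context managers are entered on `stack` so they stay
    # live for the remaining stages instead of exiting when this returns.
    return {"config": "default"}

def aot_stage1_graph_capture(state):
    return ["node_a", "node_b"]  # stand-in for a captured joint graph

def aot_stage2_compile(state, graph):
    return lambda: f"compiled {len(graph)} nodes"

def aot_function():
    with contextlib.ExitStack() as stack:
        state = create_aot_state(stack)
        graph = aot_stage1_graph_capture(state)
        # A caller could stop here and work with `graph` before resuming.
        return aot_stage2_compile(state, graph)
```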
Stack from ghstack (oldest at bottom):
The general context for the upcoming stack of commits is that I am attempting
to "pipeline" AOTAutograd. Instead of having function f call function g,
which is the next "stage" of compilation, f should return its outputs,
which are then piped to g for the next stage. This will make it easier to
implement early exit / resume of the pipeline without forcing a callback
structure, which is good for export-style use cases. It also reduces the
size of our stack traces, which makes tools like Perfetto happy.
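As a toy illustration of the pipelining idea (hypothetical stage names, not PyTorch code): instead of stage f calling stage g directly and deepening the stack trace, each stage returns its outputs and a driver pipes them to the next stage, which makes early exit and resume trivial.

```python
# Pipelined stages: f returns rather than calling g, so a driver controls
# sequencing. Stopping after f and resuming with g later needs no callbacks,
# and g's frames never sit on top of f's in a stack trace.

def stage_f(x):
    return x + 1          # returns its output instead of calling stage_g

def stage_g(x):
    return x * 10

def run_pipeline(x, stop_after_f=False):
    out = stage_f(x)
    if stop_after_f:
        return out        # early exit: caller can inspect, then resume
    return stage_g(out)   # driver, not stage_f, invokes the next stage
```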
Signed-off-by: Edward Z. Yang ezyang@meta.com