Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frame/recompile counter to all log messages in tracing context #107530

Closed
wants to merge 5 commits into from

Conversation

ezyang
Copy link
Contributor

@ezyang ezyang commented Aug 20, 2023

Stack from ghstack (oldest at bottom):

All log messages that occur while running Dynamo compilation now have [X/Y] added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N. Here is what the logs look like as you transition from one frame to another:

image

To accurately get this prefix added to all messages, I had to expand the scope of the tracing context manager. Its scope now coincides with log_compilation_event. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph.

This subsumes the previous X.Y logging that was solely for dynamic shapes.

Unfortunately I had to reindent some stuff. Review the diff with whitespace off.

Signed-off-by: Edward Z. Yang ezyang@meta.com

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Aug 20, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 20, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107530

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit 4aad15e with merge base bdecdfd (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Aug 20, 2023
Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: c77094c257e6b6c6b4fbb84a99a21a66c75b6195
Pull Request resolved: #107530
… context"

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 aakhundov

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Aug 20, 2023
Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 86186fe8626c2b132230e15b40ba807aa5008730
Pull Request resolved: #107530
@ezyang ezyang added the topic: not user facing topic category label Aug 20, 2023
… context"


All log messages that occur while running Dynamo compilation now have `[X/Y]` added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N.  Here is what the logs look like as you transition from one frame to another:

<img width="1372" alt="image" src="https://github.com/pytorch/pytorch/assets/13564/4897e368-1e50-4807-b342-54e911bcf087">

To accurately get this prefix added to all messages, I had to expand the scope of the `tracing` context manager. Its scope now coincides with `log_compilation_event`. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph.

This subsumes the previous X.Y logging that was solely for dynamic shapes.

Unfortunately I had to reindent some stuff. Review the diff with whitespace off.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 aakhundov

[ghstack-poisoned]
… context"


All log messages that occur while running Dynamo compilation now have `[X/Y]` added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N.  Here is what the logs look like as you transition from one frame to another:

<img width="1372" alt="image" src="https://github.com/pytorch/pytorch/assets/13564/4897e368-1e50-4807-b342-54e911bcf087">

To accurately get this prefix added to all messages, I had to expand the scope of the `tracing` context manager. Its scope now coincides with `log_compilation_event`. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph.

This subsumes the previous X.Y logging that was solely for dynamic shapes.

Unfortunately I had to reindent some stuff. Review the diff with whitespace off.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305

[ghstack-poisoned]
… context"


All log messages that occur while running Dynamo compilation now have `[X/Y]` added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N.  Here is what the logs look like as you transition from one frame to another:

<img width="1372" alt="image" src="https://github.com/pytorch/pytorch/assets/13564/4897e368-1e50-4807-b342-54e911bcf087">

To accurately get this prefix added to all messages, I had to expand the scope of the `tracing` context manager. Its scope now coincides with `log_compilation_event`. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph.

This subsumes the previous X.Y logging that was solely for dynamic shapes.

Unfortunately I had to reindent some stuff. Review the diff with whitespace off.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305

[ghstack-poisoned]
@ezyang ezyang added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 20, 2023
pytorchmergebot pushed a commit that referenced this pull request Aug 21, 2023
Instead of (poorly) reconstructing the guard list from the guards on OutputGraph, we log them at the horses mouth: when we actually codegen the guard. This only requires very modest refactoring: as we translate guards into code parts, we also have to pass the source guard along so we can use it to give stack information.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: #107532
Approved by: https://github.com/anijain2305
ghstack dependencies: #107505, #107516, #107530
pytorchmergebot pushed a commit that referenced this pull request Aug 21, 2023
The new guard messages look like:

```
check_tensor(L['y'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3], stride=[1])  # _dynamo/variables/builder.py:1237 in wrap_fx_proxy_cls
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: #107562
Approved by: https://github.com/anijain2305, https://github.com/jansel
ghstack dependencies: #107505, #107516, #107530, #107532
pytorchmergebot pushed a commit that referenced this pull request Aug 22, 2023
CapturedTraceback is fast but one downside is that it has strong references to code objects, which via `co_extra` can cause un-collectable cycles. This means that it is important to clear out CapturedTraceback when you are done with it; e.g., if you tracebacks during compilation, you need to explicitly clear them out at the end of compilation to actually make sure they promptly deallocate.

Instead of caching `summary` on the CapturedTraceback, we simply allow for tracebacks to have `tb = None`. Tracebacks get dropped if you pickle the traceback, or if you explicitly call cleanup().

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #107471
Approved by: https://github.com/voznesenskym
ghstack dependencies: #107505, #107516, #107530, #107532, #107562
pytorchmergebot pushed a commit that referenced this pull request Aug 22, 2023
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2316/head branch August 24, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants