-
Notifications
You must be signed in to change notification settings - Fork 25.6k
remote_cache: Add a waitcounter for gets and sets #141307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141307
Note: Links to docs will display an error until the docs builds have been completed. ❌ 3 New FailuresAs of commit 420ac2d with merge base da94ab0 ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
torch/_inductor/remote_cache.py
Outdated
raise | ||
self._log_sample(sample) | ||
return result | ||
with _WaitCounter("pytorch.remote_cache.get"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a convo on another diff: are you instead planning to add this Waitcounter functionality to dynamo_timed? If so, can you sync w/ me on that?
And also: logging here will conflate different use cases, which may or may not be what we want. If we want to distinguish inductor remote cache from autotune remote cache, then we'd want to log "lower down"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sam and I talked directly, recording conclusions here for others to see :).
We currently have two parallel collection systems - dynamo_timed (which writes to a bunch of stuff). and WaitCounter (which is purely for counters).
Here we are purely trying to measure "remote network activity in cache", whether it is in inductor, or other things doing remote access. This is being done in WaitCounters due to where people want to consume it.
Not blocking this PR, but there is interest in writing to waitcounters from dynamo_timed, and unless it breaks stuff (i.e. multiple process dynamo timed has issues with events), we should probably have a span + a waitcounter for most spots within the compiler.
I am going to cut a PR for the base infrastructure (hopefully today, if not after thanksgiving). Then we'll work on consolidating the logging within compile.
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
@pytorchbot merge -f "msvcc failure, unrelated to this code AFAICT". |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This adds a basic waitcounter to help show if we're spending a lot of time doing gets and sets to remote caches Pull Request resolved: pytorch#141307 Approved by: https://github.com/masnesral
Stack from ghstack (oldest at bottom):
This adds a basic waitcounter to help show if we're spending a lot of
time doing gets and sets to remote caches
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov