[pytorch][logging] add empty wait counter implementation #128466
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128466
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (4 Unrelated Failures)
As of commit 6edbbe2 with merge base 8b6391e:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jamesperng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
bool ProcessGroupNCCL::WorkNCCL::checkTimeout(
    std::optional<std::chrono::milliseconds> timeout) {
  SCOPED_WAIT_COUNTER_US("pytorch.logging.wait_counter.ProcessGroupNCCL::WorkNCCL::checkTimeout");
Would it be the convention that all counters have the same prefix "pytorch.logging.wait_counter"? If so, maybe it's worth prepending that inside the counter impl, and then users can just write the suffix "ProcessGroup..."
I think that for these wait counters, we would have the same prefix `pytorch.wait_counter` (I took out the `logging`).
The downside of having the prefix put in implicitly is that it might make the counters less searchable in the code?
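To make the trade-off in this thread concrete, here is a minimal sketch of a scoped wait timer with both naming styles. It is not the implementation behind `SCOPED_WAIT_COUNTER_US`; `ScopedWaitTimer`, `counterRegistry`, `withWaitCounterPrefix`, and the in-memory map backend are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical in-memory sink; the real counter backend is not shown in this PR thread.
struct CounterStats {
  std::int64_t calls;
  std::int64_t total_us;
};

inline std::map<std::string, CounterStats>& counterRegistry() {
  static std::map<std::string, CounterStats> registry;
  return registry;
}

inline std::mutex& counterMutex() {
  static std::mutex mutex;
  return mutex;
}

// RAII guard: records one call and the elapsed microseconds when the scope exits.
class ScopedWaitTimer {
 public:
  explicit ScopedWaitTimer(std::string key)
      : key_(std::move(key)), start_(std::chrono::steady_clock::now()) {}

  ~ScopedWaitTimer() {
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start_);
    std::lock_guard<std::mutex> lock(counterMutex());
    CounterStats& stats = counterRegistry()[key_];  // value-initialized to {0, 0}
    stats.calls += 1;
    stats.total_us += elapsed.count();
  }

  ScopedWaitTimer(const ScopedWaitTimer&) = delete;
  ScopedWaitTimer& operator=(const ScopedWaitTimer&) = delete;

 private:
  std::string key_;
  std::chrono::steady_clock::time_point start_;
};

// Style A: the call site spells out the full key, so grepping the code for
// the metric name finds this line directly.
inline void checkTimeoutExplicit() {
  ScopedWaitTimer timer(
      "pytorch.wait_counter.ProcessGroupNCCL::WorkNCCL::checkTimeout");
}

// Style B: the prefix is prepended by a helper, so call sites are shorter
// but the full metric name never appears verbatim in the source.
inline std::string withWaitCounterPrefix(const std::string& suffix) {
  return "pytorch.wait_counter." + suffix;
}

inline void checkTimeoutImplicit() {
  ScopedWaitTimer timer(
      withWaitCounterPrefix("ProcessGroupNCCL::WorkNCCL::checkTimeout"));
}
```

Both functions record under the same key. The only difference is whether a search for the full metric name lands on the call site, which is the searchability concern raised above.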
cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k Differential Revision: [D58441466](https://our.internmc.facebook.com/intern/diff/D58441466) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D58441466
Pull Request resolved: #128466 cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @imported-using-ghimport Differential Revision: [D58441466](https://our.internmc.facebook.com/intern/diff/D58441466/) ghstack-source-id: 230091361
/easycla
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
…#128605)

* created fb internal implementation in `caffe2/torch/csrc/monitor/fb/instrumentation.cpp`
* uses `facebook::data_preproc::WaitCounterUs` under the hood by having `WaitCounterImpl` trivially subclass it.
* this makes `WaitCounterHandle` a glorified pointer to `facebook::data_preproc::WaitCounterUs`, which is statically defined in the `STATIC_WAIT_COUNTER` macro, making these pointers Meyers singletons.
* `facebook::data_preproc::WaitCounterUs` uses 3 singletons:
  1. `std::unique_ptr<DynamicCounter::State>` map (leaky singleton)
  2. `std::weak_ptr<WaitCounterUs::State>` map (leaky singleton)
  3. publisherSingleton (normal singleton since it manages resources (threads))
* `facebook::data_preproc::WaitCounterUs` actually owns shared pointers to the state and its destructor will remove it from the `std::weak_ptr<WaitCounterUs::State>` map when the reference count for the state hits 0.
* linked `caffe2/torch/csrc/monitor/fb/instrumentation.cpp` and added `//data_preproc/common:counters` (dpp dependency) to `caffe2/fb/fbcode/target_definitions.bzl`
* wrapped OSS null implementation in `#ifndef FBCODE_CAFFE2` so that internally we use the fb internal implementation.

as a follow-up I might move the counter implementation out of the data_preproc/counters library to a more common ai infra library?

Differential Revision: [D58458751](https://our.internmc.facebook.com/intern/diff/D58458751/)
Pull Request resolved: #128605
Approved by: https://github.com/c-p-i-o
ghstack dependencies: #128466
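The ownership scheme described in that commit message (a leaky `weak_ptr` registry, shared state that unregisters itself when the last owner goes away, and a per-call-site static handle) can be sketched roughly as follows. This is an illustration under stated assumptions, not the `facebook::data_preproc::WaitCounterUs` code, which is internal and not shown here. `CounterState`, `StateRegistry`, `WaitCounterHandleSketch`, and `exampleCounter` are hypothetical names, and the sketch uses a custom `shared_ptr` deleter where the commit message describes a destructor doing the cleanup.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared state for one counter key.
struct CounterState {
  explicit CounterState(std::string k) : key(std::move(k)), starts(0) {}
  std::string key;
  std::atomic<long> starts;
};

// Leaky singleton: allocated once and never destroyed, so handles with
// static storage duration can still reach it during program shutdown.
class StateRegistry {
 public:
  static StateRegistry& instance() {
    static StateRegistry* leaked = new StateRegistry();
    return *leaked;
  }

  // Returns the live state for `key`, creating it if needed.
  std::shared_ptr<CounterState> acquire(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = states_.find(key);
    if (it != states_.end()) {
      if (std::shared_ptr<CounterState> existing = it->second.lock()) {
        return existing;
      }
    }
    // Assumption: cleanup is done in a custom deleter, which runs when the
    // reference count hits 0 and erases the (now expired) map entry.
    std::shared_ptr<CounterState> state(
        new CounterState(key), [this](CounterState* s) {
          {
            std::lock_guard<std::mutex> inner(mutex_);
            auto entry = states_.find(s->key);
            // Skip the erase if another thread already re-registered the key.
            if (entry != states_.end() && entry->second.expired()) {
              states_.erase(entry);
            }
          }
          delete s;
        });
    states_[key] = state;
    return state;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return states_.size();
  }

 private:
  StateRegistry() = default;
  std::mutex mutex_;
  std::map<std::string, std::weak_ptr<CounterState>> states_;
};

// Thin handle that shares ownership of the state for its key.
class WaitCounterHandleSketch {
 public:
  explicit WaitCounterHandleSketch(const std::string& key)
      : state_(StateRegistry::instance().acquire(key)) {}
  void start() { ++state_->starts; }
  long starts() const { return state_->starts.load(); }

 private:
  std::shared_ptr<CounterState> state_;
};

// Meyers singleton: a function-local static, initialized once on first use.
inline WaitCounterHandleSketch& exampleCounter() {
  static WaitCounterHandleSketch handle("pytorch.wait_counter.example");
  return handle;
}
```

The registry is deliberately leaked because the function-local static handle is destroyed at program exit, and its deleter still needs a valid map and mutex at that point.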
Stack from ghstack (oldest at bottom):
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k
Differential Revision: D58441466