[Profiler] Memory profiler part 3: Schema parsing and mutable arguments #86854

robieta · 2022-10-13T00:21:53Z

Stack from ghstack (oldest at bottom):

The appropriate annotation for a block of memory is a function of time: an input can be mutated in-place to become an activation, a clever kernel might steal the memory of a detached input (such as a mask) to use as output memory, etc.

We could pessimistically assume that every op mutates all of its inputs; however, inspecting the schema lets us narrow that assumption significantly with minimal effort. Checking schemas also lets us distinguish dispatcher ops (which have load-bearing semantics) from user annotations with reasonably high precision.

Differential Revision: D40220390

pytorch-bot · 2022-10-13T00:21:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86854

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit b79f376:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/) ghstack-source-id: 170238215 Pull Request resolved: #86854

Pull Request resolved: #86854 ghstack-source-id: 170243810 Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/)

robieta · 2022-11-15T17:39:55Z

@pytorchbot merge -g

pytorchmergebot · 2022-11-15T17:41:40Z

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…ts (pytorch#86854) Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/) Pull Request resolved: pytorch#86854 Approved by: https://github.com/chaekit

robieta mentioned this pull request Oct 13, 2022

[Profiler] Memory profiler part 4: Select top level torch ops #86880

Closed

robieta mentioned this pull request Oct 13, 2022

[Profiler] Regularize AccumulateGrad name #86909

Closed

robieta mentioned this pull request Oct 15, 2022

[Profiler] Memory profiler part 5: Data flow graph #87006

Closed

robieta mentioned this pull request Oct 17, 2022

[Profiler] Handle ABA for TensorImpl* when assigning IDs #87133

Closed

Taylor Robie added 2 commits October 17, 2022 14:49

robieta mentioned this pull request Oct 19, 2022

[Profiler] Hold weak reference to prevent TensorImpl address reuse during profiling. #87244

Closed

robieta added the release notes: profiler release notes category label Oct 19, 2022

Taylor Robie added 2 commits October 21, 2022 11:42

This was referenced Oct 23, 2022

[Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. #87566

Closed

[Profiler] Memory profiler part 7: Mark inputs #87567

Closed

[Profiler] Memory profiler part 8: Mark parameters. #87568

Closed

This was referenced Oct 25, 2022

[Profiler][Trivial] Add hashing struct for pairs and tuples. #87668

Closed

[Profiler][Trivial] Move ID assignment code to data_flow.cpp #87670

Closed

robieta added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 26, 2022

robieta mentioned this pull request Oct 26, 2022

[Profiler] Restructure inputs and capture TensorLists. #87825

Closed

Taylor Robie added 5 commits October 26, 2022 16:57

robieta mentioned this pull request Nov 8, 2022

[Profiler] E2E expecttests for category assignment #88653

Closed

robieta requested review from slgong-fb, chaekit and aaronenyeshi November 8, 2022 06:13

robieta mentioned this pull request Nov 11, 2022

[Profiler] Account for caching when assigning IDs #88917

Closed

chaekit approved these changes Nov 15, 2022

robieta added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Nov 15, 2022

pytorchmergebot added the Merged label Nov 15, 2022

pytorchmergebot closed this in 8023c9d Nov 15, 2022

facebook-github-bot deleted the gh/robieta/136/head branch June 8, 2023 18:33
