-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[SR] Add a copy variant for fused_split_and_squeeze #75660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)! [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 37a2421 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)! ghstack-source-id: 153648923 Pull Request resolved: #75660
The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)! [ghstack-poisoned]
Pull Request resolved: #75660 The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. ghstack-source-id: 153768800 Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged) |
Hey @mikeiovine. |
Summary: Pull Request resolved: #75660 The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides. Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in `split_and_squeeze` and force all of `VarStack`'s inputs to be contiguous so we can take advantage of the fast path. ghstack-source-id: 153768800 Test Plan: Existing unit tests cover the new op. Reviewed By: navahgar, hlu1 Differential Revision: D35513777 fbshipit-source-id: fb081eb41689ab4e8ef5102fb4906cffd8a95b1a
Stack from ghstack (oldest at bottom):
The outputs of
split_and_squeeze
are passed toVarStack
in models we care about.VarStack
has a fast path for when all of its inputs have the same strides.Hitting the slow path adds a ton of extra overhead - so much that it's worth it to copy in
split_and_squeeze
and force all ofVarStack
's inputs to be contiguous so we can take advantage of the fast path.Differential Revision: D35513777
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!