-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[PyTorch] Devirtualize is_contiguous #54896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 785e03c (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! ghstack-source-id: 125154946 Pull Request resolved: #54896
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why not just make is_continuous_customized()
a virtual method, turn IsContiguousPolicy
into a default/custom bool and s/is_contiguous/is_contiguous_customized/ the subclasses? Would be less churn (e.g. for the FB case mentioned in phab) but maybe slower for the custom cases?
bool BatchedTensorImpl::is_contiguous(at::MemoryFormat memory_format) const { | ||
TORCH_CHECK(memory_format == MemoryFormat::Contiguous, | ||
"NYI: querying is_contiguous inside of vmap for memory_format ", | ||
"other than torch.contiguous_format"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular case I think we lose some signal by replacing this error message with the more generic one.
I haven't closely looked at the diff yet, but it looks overcomplicated to compensate for not wanting fix clients, whereas I suspect we should just fix clients. |
How would you recommend fixing them? There's a KP with Metal/Vulkan that would let us save one policy mode, but I don't know how to get rid of the errors for Batched/Opaque/Sparse. |
Good idea. It somehow feels a little clunkier to have to override a virtual method and set a flag to make sure that that method is actually called; do you think preserving the BatchedTensorImpl error text is worth it? |
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
I'm not sure there's anything special about the BatchedTensorImpl error text, you could pull that into the current enum too, if you didn't want the double hop. That would mean that the enum betrays its origins as a cross section of subclass specific behaviors, but I mean, that's what it actually is 😁 so I'm not sure putting a "policy" veneer on it is the right direction to go in anyway (maybe this is what @ezyang was reacting to too) |
Here is why I think client fix is feasible. Ideally, the implementation is this:
No matter the subclass, it is possible to make it observationally equivalent to whatever you had before simply by setting the three Sparse is easy, because contiguity doesn't make sense for it as a concept. For batched, @zou3519 has thought about this before, at #47365 and #47621, we think there's a right setting for the booleans and we just were lazy and didn't think hard enough about how to set it up. For opaque, that's on the client for opaque to populate these correctly. |
There is no guarantee that the three contiguous_ fields will stay set. An API that calls |
What about the cases that currently throw? Also IIRC in previous discussions you weren't a fan of tensors with no concept of contiguity returning false for is_contiguous, but maybe this takes precedence (or I'm misremembering). |
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
Pull Request resolved: #54896 This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) ghstack-source-id: 125293142 Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
All I'm saying is that one might very reasonable impose the obligation on the subclass that they are responsible for maintaining whatever invariants the parent class expects when they change sizes. If you have no concept of strides, you probably shouldn't call those methods anyway; if you do have strides, calling those functions is probably going to make it easy for you to preserve invariants.
Well, I'm OK with having a flag to raise an error, similar to what we do today with storage access. If someone came to me and said, "Edward, look at all this performance we're leaving on the floor because of this flag test", I'd be willing to be convinced that we need a way to do this access in a branchless way (overriding the general UX preference of erroring when you do something that doesn't make sense). One thing to note, though, is that this PR has been updated from a "policy" thing to a "virtual fallback" thing. I guess I'm OK with the virtual fallback; there are certainly cases where it makes sense. I just think it kind of encourages bad behavior on backends where they can do all sorts of random (wrong) behavior and then we have to clean it up afterwards... case in point here. |
Agree, virtualizing it at all (first class or fallback) leaves us open to nonsense semantics. I think a For Vulkan/Metal and Batched we could jump to the goal state or leave the loophole for now and hope we catch anybody new perps in review. For the loophole we could make For the goal state we could
|
Yeah sorry for the mixed signals, it's my bad for stretching the "approve with suggestions" idiom past the breaking point, I'll change status to changes requested for clarity. My preference would be to not land this as-is but modify it to use a virtual is_contiguous_custom() instead of the "policy" switch. A ternary AFAICT @ezyang is also saying he's ok with a virtual fallback but not with the policy switch, but I could be wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per thread
AFAICT, here's a breakdown of TensorImpl types and the behavior they need:
IIUC, we want to support "does not have contiguity, throw" going forward, and we begrudgingly support the virtual fallback. I will send an update soon, but if this description is wrong, let's talk about it at this level rather than code comments. |
By the way, is_contiguous itself still needs to be TENSORIMPL_MAYBE_VIRTUAL to support backward compatibility, right? |
This description matches my understanding, yeah. Re TENSORIMPL_MAYBE_VIRTUAL, it would sure be nice to avoid, both for general-case perf and crazy semantics loophole reasons. But I don't know how what the contract is for TensorImpl subclassing - is the specific current set of virtual TensorImpl methods considered public API? cc @ezyang @gchanan |
…rtualize is_contiguous" This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
Pull Request resolved: #54896 This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) ghstack-source-id: 125540154 Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. I still don't know the definitive answer about TENSORIMPL_MAYBE_VIRTUAL
but in the absence of signal, this is obv the right way to land it.
Oh hey, sorry for the late-breaking observation, but: s̶i̶n̶c̶e̶ ̶ Feel free to disregard if your sense of the perf (and safety I guess, but mostly perf) ROI doesn't motivate either of these, just want to make sure they've been floated. |
It needs to be copied in the various metadata copy methods, so it can't be const (and I think I've forgotten to do that copying, so update coming) |
…tualize is_contiguous" This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
…e is_contiguous" This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
Pull Request resolved: #54896 This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) ghstack-source-id: 125623747 Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)! [ghstack-poisoned]
Pull Request resolved: #54896 This should help performance. (For example, it improves total time spent in a C++ benchmark that just adds 2 tensors in place by about 10%.) ghstack-source-id: 125659451 Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
This pull request has been merged in 62aa924. |
This pull request has been reverted by e61f5b5. |
Relevant snippet for build failures:
|
Stack from ghstack:
This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
Differential Revision: D27404164
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!