Conversation

@swolchok (Contributor) commented Mar 29, 2021

Stack from ghstack:

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
@facebook-github-bot (Contributor) commented Mar 29, 2021

💊 CI failures summary and remediations

As of commit 785e03c (more details on the Dr. CI page):


  • 5/5 failures possibly* introduced in this PR
    • 1/5 non-scanned failure(s)

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 22:59:53 sccache: error: couldn't connect to server
Apr 02 22:59:53 +++ eval 'extract_trap_cmd '
Apr 02 22:59:53 ++++ extract_trap_cmd
Apr 02 22:59:53 ++++ printf '%s\n' ''
Apr 02 22:59:53 +++ printf '%s\n' cleanup
Apr 02 22:59:53 ++ trap -- '
Apr 02 22:59:53 cleanup' EXIT
Apr 02 22:59:53 ++ [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *pytorch-win-* ]]
Apr 02 22:59:53 ++ which sccache
Apr 02 22:59:53 ++ sccache --stop-server
Apr 02 22:59:53 Stopping sccache server...
Apr 02 22:59:53 sccache: error: couldn't connect to server
Apr 02 22:59:53 sccache: caused by: Connection refused (os error 111)
Apr 02 22:59:53 ++ true
Apr 02 22:59:53 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 22:59:53 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 22:59:53 ++ true
Apr 02 22:59:53 ++ [[ pytorch-linux-xenial-py3.6-gcc5.4-build == *rocm* ]]
Apr 02 22:59:53 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 22:59:53 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 22:59:53 ++ RUST_LOG=sccache::server=error
Apr 02 22:59:53 ++ sccache --start-server

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build (2/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 23:02:39 sccache: error: couldn't connect to server
Apr 02 23:02:39 +++ eval 'extract_trap_cmd '
Apr 02 23:02:39 ++++ extract_trap_cmd
Apr 02 23:02:39 ++++ printf '%s\n' ''
Apr 02 23:02:39 +++ printf '%s\n' cleanup
Apr 02 23:02:39 ++ trap -- '
Apr 02 23:02:39 cleanup' EXIT
Apr 02 23:02:39 ++ [[ pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build != *pytorch-win-* ]]
Apr 02 23:02:39 ++ which sccache
Apr 02 23:02:39 ++ sccache --stop-server
Apr 02 23:02:39 Stopping sccache server...
Apr 02 23:02:39 sccache: error: couldn't connect to server
Apr 02 23:02:39 sccache: caused by: Connection refused (os error 111)
Apr 02 23:02:39 ++ true
Apr 02 23:02:39 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 23:02:39 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 23:02:39 ++ true
Apr 02 23:02:39 ++ [[ pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build == *rocm* ]]
Apr 02 23:02:39 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 23:02:39 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 23:02:39 ++ RUST_LOG=sccache::server=error
Apr 02 23:02:39 ++ sccache --start-server

See CircleCI build docker-pytorch-linux-bionic-rocm3.9-py3.6 (3/4)

Step: "Check if image should be built" (full log | diagnosis details | 🔁 rerun)

ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch
+ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm3.9-py3.6:47793f0cc1b99136bdee7981477c6d0d374f2a5e
unsupported manifest format: &{{{2 application/vnd.docker.distribution.manifest.v2+json} {application/vnd.docker.container.image.v1+json 16788 sha256:f952902cbeccf9f36537e306cbf31a4f2ef135857472a7ea7f0df7804b9e5745 [] map[] <nil>} [… lengthy layer-by-layer manifest dump elided …]
++ git merge-base HEAD 8a13d17bd27336dc6a334b898473768dd0463072
+ git rev-parse 8a13d17bd27336dc6a334b898473768dd0463072:.circleci/docker
47793f0cc1b99136bdee7981477c6d0d374f2a5e
+++ git merge-base HEAD 8a13d17bd27336dc6a334b898473768dd0463072
++ git rev-parse 8a13d17bd27336dc6a334b898473768dd0463072:.circleci/docker
+ PREVIOUS_DOCKER_TAG=47793f0cc1b99136bdee7981477c6d0d374f2a5e
+ [[ 47793f0cc1b99136bdee7981477c6d0d374f2a5e = \4\7\7\9\3\f\0\c\c\1\b\9\9\1\3\6\b\d\e\e\7\9\8\1\4\7\7\c\6\d\0\d\3\7\4\f\2\a\5\e ]]
+ echo 'ERROR: Something has gone wrong and the previous image isn'\''t available for the merge-base of your branch'
ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch
+ echo '       contact the PyTorch team to restore the original images'
       contact the PyTorch team to restore the original images
+ exit 1


Exited with code exit status 1

See CircleCI build pytorch_libtorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (4/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 23:04:05 sccache: error: couldn't connect to server
Apr 02 23:04:05 +++ eval 'extract_trap_cmd '
Apr 02 23:04:05 ++++ extract_trap_cmd
Apr 02 23:04:05 ++++ printf '%s\n' ''
Apr 02 23:04:05 +++ printf '%s\n' cleanup
Apr 02 23:04:05 ++ trap -- '
Apr 02 23:04:05 cleanup' EXIT
Apr 02 23:04:05 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
Apr 02 23:04:05 ++ which sccache
Apr 02 23:04:05 ++ sccache --stop-server
Apr 02 23:04:05 Stopping sccache server...
Apr 02 23:04:05 sccache: error: couldn't connect to server
Apr 02 23:04:05 sccache: caused by: Connection refused (os error 111)
Apr 02 23:04:05 ++ true
Apr 02 23:04:05 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 23:04:05 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 23:04:05 ++ true
Apr 02 23:04:05 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
Apr 02 23:04:05 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 23:04:05 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 23:04:05 ++ RUST_LOG=sccache::server=error
Apr 02 23:04:05 ++ sccache --start-server

This comment was automatically generated by Dr. CI.

swolchok added a commit that referenced this pull request Mar 29, 2021
This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

ghstack-source-id: 125154946
Pull Request resolved: #54896
@swolchok swolchok requested review from bhosmer and ezyang March 29, 2021 18:20
@bhosmer left a comment

Curious, why not just make is_contiguous_customized() a virtual method, turn IsContiguousPolicy into a default/custom bool, and s/is_contiguous/is_contiguous_customized/ in the subclasses? It would be less churn (e.g. for the FB case mentioned in phab) but maybe slower for the custom cases?
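A minimal sketch of the shape being floated here, with illustrative names (this is not the PR's actual code): is_contiguous() stays non-virtual and branch-cheap, and only subclasses that flip the flag pay for a virtual call.

```cpp
// Illustrative only: non-virtual fast path plus an opt-in virtual fallback.
class TensorImplSketch {
 public:
  bool is_contiguous() const {
    if (!is_contiguous_customized_) {
      return is_contiguous_;            // common case: no virtual dispatch
    }
    return is_contiguous_customized();  // opt-in slow path
  }

 protected:
  virtual ~TensorImplSketch() = default;

  // A subclass must both override this *and* set the flag below, or the
  // override is never reached -- the "clunkiness" swolchok mentions later.
  virtual bool is_contiguous_customized() const {
    return is_contiguous_;
  }

  bool is_contiguous_ = true;
  bool is_contiguous_customized_ = false;
};
```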

```cpp
bool BatchedTensorImpl::is_contiguous(at::MemoryFormat memory_format) const {
  TORCH_CHECK(memory_format == MemoryFormat::Contiguous,
      "NYI: querying is_contiguous inside of vmap for memory_format ",
      "other than torch.contiguous_format");
```

In this particular case I think we lose some signal by replacing this error message with the more generic one.

@ezyang (Contributor) commented Mar 30, 2021

I haven't closely looked at the diff yet, but it looks overcomplicated to compensate for not wanting to fix clients, whereas I suspect we should just fix clients.

@swolchok (Author)

> I suspect we should just fix clients.

How would you recommend fixing them? There's a KP with Metal/Vulkan that would let us save one policy mode, but I don't know how to get rid of the errors for Batched/Opaque/Sparse.

@swolchok (Author)

> why not just make is_contiguous_customized() a virtual method,

Good idea. It somehow feels a little clunkier to have to override a virtual method and set a flag to make sure that that method is actually called; do you think preserving the BatchedTensorImpl error text is worth it?

@bhosmer commented Mar 30, 2021

> why not just make is_contiguous_customized() a virtual method,
>
> Good idea. It somehow feels a little clunkier to have to override a virtual method and set a flag to make sure that that method is actually called; do you think preserving the BatchedTensorImpl error text is worth it?

I'm not sure there's anything special about the BatchedTensorImpl error text; you could pull that into the current enum too, if you didn't want the double hop. That would mean the enum betrays its origins as a cross-section of subclass-specific behaviors, but I mean, that's what it actually is 😁 so I'm not sure putting a "policy" veneer on it is the right direction to go in anyway (maybe this is what @ezyang was reacting to too)

@ezyang (Contributor) commented Mar 30, 2021

Here is why I think the client fix is feasible.

Ideally, the implementation is this:

  bool is_contiguous(at::MemoryFormat memory_format=at::MemoryFormat::Contiguous) const {
    if (memory_format == at::MemoryFormat::ChannelsLast) {
      return is_channels_last_contiguous_;
    } else if (memory_format == at::MemoryFormat::ChannelsLast3d) {
      return is_channels_last_3d_contiguous_;
    }
    return is_contiguous_;
  }

No matter the subclass, it is possible to make it observationally equivalent to whatever you had before simply by setting the three contiguous_ fields appropriately.

Sparse is easy, because contiguity doesn't make sense for it as a concept. For batched, @zou3519 has thought about this before (see #47365 and #47621); we think there's a right setting for the booleans and we just were lazy and didn't think hard enough about how to set it up. For opaque, it's on the client to populate these correctly.
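A hedged sketch of what "fixing the clients" could mean in practice: with the non-virtual accessor above, a subclass reproduces its old answers purely by setting the cached fields. The field names follow TensorImpl; the toy subclass and the values it picks are made up for illustration.

```cpp
// Toy stand-in for TensorImpl's cached contiguity state.
struct ToyTensorImpl {
  bool is_contiguous_ = true;
  bool is_channels_last_contiguous_ = false;
  bool is_channels_last_3d_contiguous_ = false;
};

// A hypothetical opaque-style backend that always reported "contiguous,
// never channels-last" just bakes that in once at construction; no
// override of is_contiguous() is needed.
struct ToyOpaqueImpl : ToyTensorImpl {
  ToyOpaqueImpl() {
    is_contiguous_ = true;
    is_channels_last_contiguous_ = false;
    is_channels_last_3d_contiguous_ = false;
  }
};
```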

@swolchok (Author)

> it is possible to make it observationally equivalent to whatever you had before simply by setting the three contiguous_ fields appropriately.

There is no guarantee that the three contiguous_ fields will stay set. An API that calls TensorImpl::refresh_contiguous() (like, say, TensorImpl::set_sizes_and_strides or TensorImpl::empty_tensor_restride, both of which are non-virtual) could get called at any time and mess things up.
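A toy model of that hazard (the two method names mirror the real TensorImpl API; everything else is a stand-in): because the mutator is non-virtual and unconditionally refreshes the cache, values a subclass planted in the booleans do not survive.

```cpp
#include <utility>
#include <vector>

struct ToyBase {
  bool is_contiguous_ = true;

  // Non-virtual, like TensorImpl::set_sizes_and_strides.
  void set_sizes_and_strides(std::vector<long> sizes, std::vector<long> strides) {
    sizes_ = std::move(sizes);
    strides_ = std::move(strides);
    refresh_contiguous();  // like TensorImpl::refresh_contiguous
  }

  // Recomputes the cache purely from sizes/strides; anything a subclass
  // planted in is_contiguous_ is silently overwritten here.
  void refresh_contiguous() { is_contiguous_ = computed_from_strides(); }

  bool computed_from_strides() const { return true; }  // stand-in computation

  std::vector<long> sizes_, strides_;
};
```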

@bhosmer commented Mar 30, 2021

> observationally equivalent

What about the cases that currently throw? Also IIRC in previous discussions you weren't a fan of tensors with no concept of contiguity returning false for is_contiguous, but maybe this takes precedence (or I'm misremembering).

swolchok added a commit that referenced this pull request Mar 30, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125293142

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@ezyang (Contributor) commented Mar 30, 2021

> There is no guarantee that the three contiguous_ fields will stay set. An API that calls TensorImpl::refresh_contiguous() (like, say, TensorImpl::set_sizes_and_strides or TensorImpl::empty_tensor_restride, both of which are non-virtual) could get called at any time and mess things up.

All I'm saying is that one might very reasonably impose the obligation on the subclass that they are responsible for maintaining whatever invariants the parent class expects when they change sizes. If you have no concept of strides, you probably shouldn't call those methods anyway; if you do have strides, calling those functions is probably going to make it easy for you to preserve invariants.

> What about the cases that currently throw? Also IIRC in previous discussions you weren't a fan of tensors with no concept of contiguity returning false for is_contiguous, but maybe this takes precedence (or I'm misremembering).

Well, I'm OK with having a flag to raise an error, similar to what we do today with storage access. If someone came to me and said, "Edward, look at all this performance we're leaving on the floor because of this flag test", I'd be willing to be convinced that we need a way to do this access in a branchless way (overriding the general UX preference of erroring when you do something that doesn't make sense).

One thing to note, though, is that this PR has been updated from a "policy" thing to a "virtual fallback" thing. I guess I'm OK with the virtual fallback; there are certainly cases where it makes sense. I just think it kind of encourages bad behavior on backends where they can do all sorts of random (wrong) behavior and then we have to clean it up afterwards... case in point here.

@bhosmer commented Mar 31, 2021

Agree, virtualizing it at all (first class or fallback) leaves us open to nonsense semantics. I think a has_contiguity_ gate that throws "this tensor type does not have contiguity" when false gives us a legit data model for all tensors, and takes care of Opaque and Sparse right away.

For Vulkan/Metal and Batched we could jump to the goal state, or leave the loophole for now and hope we catch any new perps in review. For the loophole, we could make has_contiguity_ a ternary enum with the third value diverting to the virtual fallback; that would leave perf pretty much pay-as-you-go, I think.

For the goal state we could

  • just remove the Vulkan and Metal overrides and leave it at that, i.e. move them to the default behavior. AFAICT the proper solution for these would be to error when you try to set strides to something noncontiguous, but I don't think these do that currently. So having is_contiguous() tell the truth seems strictly better, though ... BC breaking?
  • move Batched to the default behavior, maybe with a preliminary PR to slipstream in correct bool setting if that setting is obvious

@swolchok (Author) commented Apr 1, 2021

This PR is accepted, but given the discussion I'm unsure if I'm supposed to land it. @bhosmer / @ezyang can you clarify?

@bhosmer commented Apr 1, 2021

> This PR is accepted, but given the discussion I'm unsure if I'm supposed to land it. @bhosmer / @ezyang can you clarify?

Yeah, sorry for the mixed signals; it's my bad for stretching the "approve with suggestions" idiom past the breaking point. I'll change status to "changes requested" for clarity.

My preference would be to not land this as-is but modify it to use a virtual is_contiguous_custom() instead of the "policy" switch. A ternary has_contiguity_ flag seems to me like a decent way to confine the perf hit to the cases we want to get rid of anyway, but I'm sure there's other ways that would be fine. (per "loophole" above - true = default behavior, false = throw, custom = is_contiguous_custom() with a // TODO remove)

AFAICT @ezyang is also saying he's ok with a virtual fallback but not with the policy switch, but I could be wrong.
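A reworked version of the earlier sketch along the ternary lines bhosmer describes. The HasContiguityPolicy name matches the enum that eventually shows up in the patch (see the build error quoted at the end of this thread); the enumerator names and the body are still illustrative, not the landed code.

```cpp
#include <cstdint>
#include <stdexcept>

class TernarySketch {
 public:
  enum class HasContiguityPolicy : uint8_t {
    Default,                 // answer from the cached bool; no virtual call
    ContiguityNotSupported,  // e.g. Sparse/Opaque: throw on query
    CustomBehavior,          // divert to the virtual fallback (slated for removal)
  };

  bool is_contiguous() const {
    if (has_contiguity_ == HasContiguityPolicy::Default) {
      return is_contiguous_;  // fast path: branch-cheap, non-virtual
    }
    if (has_contiguity_ == HasContiguityPolicy::ContiguityNotSupported) {
      throw std::runtime_error("this tensor type does not have contiguity");
    }
    return is_contiguous_custom();  // only opted-in subclasses pay for this
  }

 protected:
  virtual ~TernarySketch() = default;
  virtual bool is_contiguous_custom() const { return is_contiguous_; }

  HasContiguityPolicy has_contiguity_ = HasContiguityPolicy::Default;
  bool is_contiguous_ = true;
};
```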

@bhosmer left a comment

Per thread

@swolchok (Author) commented Apr 1, 2021

AFAICT, here's a breakdown of TensorImpl types and the behavior they need:

  • TensorImpl -- default, obviously
  • Sparse -- does not have contiguity, throw
  • Opaque -- does not have contiguity, throw
  • Batched -- has custom contiguity behavior, virtual fallback
  • DelayedTensorImpl -- custom behavior, virtual fallback (reports contiguous iff memory_format == MemoryFormat::Contiguous)
  • Vulkan/Metal -- same as Delayed after @taox gets around to fixing the current behavior, virtual fallback

IIUC, we want to support "does not have contiguity, throw" going forward, and we begrudgingly support the virtual fallback.

I will send an update soon, but if this description is wrong, let's talk about it at this level rather than code comments.

@swolchok (Author) commented Apr 1, 2021

By the way, is_contiguous itself still needs to be TENSORIMPL_MAYBE_VIRTUAL to support backward compatibility, right?

@bhosmer commented Apr 1, 2021

> AFAICT, here's a breakdown of TensorImpl types and the behavior they need:
> ...
> IIUC, we want to support "does not have contiguity, throw" going forward, and we begrudgingly support the virtual fallback.
>
> I will send an update soon, but if this description is wrong, let's talk about it at this level rather than code comments.

This description matches my understanding, yeah.

Re TENSORIMPL_MAYBE_VIRTUAL, it would sure be nice to avoid, both for general-case perf and crazy-semantics-loophole reasons. But I don't know what the contract is for TensorImpl subclassing: is the specific current set of virtual TensorImpl methods considered public API? cc @ezyang @gchanan

…rtualize is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 1, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125540154

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@swolchok swolchok requested a review from bhosmer April 1, 2021 21:02
@bhosmer left a comment

This looks great to me. I still don't know the definitive answer about TENSORIMPL_MAYBE_VIRTUAL but in the absence of signal, this is obv the right way to land it.

@bhosmer commented Apr 2, 2021

Oh hey, sorry for the late-breaking observation, but: ~~since `has_contiguity_` is per-class rather than per-instance, is it worth templatizing? Or alternatively,~~ [edit: nvm] would it be worth making it const and setting it at construction time only rather than having a setter?

Feel free to disregard if your sense of the perf (and safety I guess, but mostly perf) ROI doesn't motivate either of these, just want to make sure they've been floated.

@swolchok (Author) commented Apr 2, 2021

> would it be worth making it const and setting it at construction time only rather than having a setter?

It needs to be copied in the various metadata copy methods, so it can't be const (and I think I've forgotten to do that copying, so update coming)
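A hedged sketch of the copying in question (TensorImpl's real helper is copy_tensor_metadata; the struct and fields below are toy stand-ins): whatever shallow-copy/detach path the class has must carry the policy field along, or copies silently revert to the default policy.

```cpp
#include <cstdint>

struct MetadataSketch {
  uint8_t has_contiguity_ = 0;  // stand-in for the policy field
  bool is_contiguous_ = true;

  // Modeled loosely after a copy_tensor_metadata-style helper.
  static void copy_metadata(const MetadataSketch& src, MetadataSketch& dest) {
    dest.is_contiguous_ = src.is_contiguous_;
    dest.has_contiguity_ = src.has_contiguity_;  // the easy line to forget
  }
};
```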

…tualize is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
…e is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 2, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125623747

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
swolchok added a commit that referenced this pull request Apr 2, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125659451

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@facebook-github-bot (Contributor)
This pull request has been merged in 62aa924.

@facebook-github-bot (Contributor)
This pull request has been reverted by e61f5b5.

@mruberry (Collaborator) commented Apr 5, 2021

Relevant snippet for build failures:

Apr 02 23:06:09 In file included from /var/lib/jenkins/workspace/c10/core/TensorImpl.cpp:1:0:
Apr 02 23:06:09 /var/lib/jenkins/workspace/c10/core/TensorImpl.h:1851:41: error: 'c10::TensorImpl::has_contiguity_' is too small to hold all values of 'enum class c10::TensorImpl::HasContiguityPolicy' [-Werror]
Apr 02 23:06:09    HasContiguityPolicy has_contiguity_ : 2;
Apr 02 23:06:09                                          ^
Apr 02 23:06:09 cc1plus: all warnings being treated as errors
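A hedged reading of the warning, plus one common way out (not necessarily the fix used in the reland): for a scoped enum, gcc sizes the required bit-field width from the enum's underlying type, which defaults to int, so a 2-bit field "cannot hold all values" even though only three enumerators exist. Storing the value in a full-width field of a narrow fixed type sidesteps it:

```cpp
#include <cstdint>

// Same three states as before; the fixed uint8_t base caps the value range.
enum class HasContiguityPolicy : uint8_t {
  Default,
  ContiguityNotSupported,
  CustomBehavior,
};

struct FieldSketch {
  // A full byte instead of a 2-bit bit-field: no "too small to hold all
  // values" warning on gcc 5.4, at the cost of six unused bits.
  uint8_t has_contiguity_ =
      static_cast<uint8_t>(HasContiguityPolicy::Default);
};
```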
