Conversation

@swolchok (Contributor) commented Mar 29, 2021

Stack from ghstack:

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
@facebook-github-bot (Contributor) commented Mar 29, 2021

💊 CI failures summary and remediations

As of commit 785e03c (more details on the Dr. CI page):


  • 5/5 failures possibly* introduced in this PR
    • 1/5 non-scanned failure(s)

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 22:59:53 sccache: error: couldn't connect to server
Apr 02 22:59:53 +++ eval 'extract_trap_cmd '
Apr 02 22:59:53 ++++ extract_trap_cmd
Apr 02 22:59:53 ++++ printf '%s\n' ''
Apr 02 22:59:53 +++ printf '%s\n' cleanup
Apr 02 22:59:53 ++ trap -- '
Apr 02 22:59:53 cleanup' EXIT
Apr 02 22:59:53 ++ [[ pytorch-linux-xenial-py3.6-gcc5.4-build != *pytorch-win-* ]]
Apr 02 22:59:53 ++ which sccache
Apr 02 22:59:53 ++ sccache --stop-server
Apr 02 22:59:53 Stopping sccache server...
Apr 02 22:59:53 sccache: error: couldn't connect to server
Apr 02 22:59:53 sccache: caused by: Connection refused (os error 111)
Apr 02 22:59:53 ++ true
Apr 02 22:59:53 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 22:59:53 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 22:59:53 ++ true
Apr 02 22:59:53 ++ [[ pytorch-linux-xenial-py3.6-gcc5.4-build == *rocm* ]]
Apr 02 22:59:53 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 22:59:53 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 22:59:53 ++ RUST_LOG=sccache::server=error
Apr 02 22:59:53 ++ sccache --start-server

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build (2/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 23:02:39 sccache: error: couldn't connect to server
Apr 02 23:02:39 +++ eval 'extract_trap_cmd '
Apr 02 23:02:39 ++++ extract_trap_cmd
Apr 02 23:02:39 ++++ printf '%s\n' ''
Apr 02 23:02:39 +++ printf '%s\n' cleanup
Apr 02 23:02:39 ++ trap -- '
Apr 02 23:02:39 cleanup' EXIT
Apr 02 23:02:39 ++ [[ pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build != *pytorch-win-* ]]
Apr 02 23:02:39 ++ which sccache
Apr 02 23:02:39 ++ sccache --stop-server
Apr 02 23:02:39 Stopping sccache server...
Apr 02 23:02:39 sccache: error: couldn't connect to server
Apr 02 23:02:39 sccache: caused by: Connection refused (os error 111)
Apr 02 23:02:39 ++ true
Apr 02 23:02:39 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 23:02:39 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 23:02:39 ++ true
Apr 02 23:02:39 ++ [[ pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build == *rocm* ]]
Apr 02 23:02:39 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 23:02:39 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 23:02:39 ++ RUST_LOG=sccache::server=error
Apr 02 23:02:39 ++ sccache --start-server

See CircleCI build docker-pytorch-linux-bionic-rocm3.9-py3.6 (3/4)

Step: "Check if image should be built" (full log | diagnosis details | 🔁 rerun)

ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch
+ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm3.9-py3.6:47793f0cc1b99136bdee7981477c6d0d374f2a5e
unsupported manifest format: &{{{2 application/vnd.docker.distribution.manifest.v2+json} {application/vnd.docker.container.image.v1+json 16788 sha256:f952902cbeccf9f36537e306cbf31a4f2ef135857472a7ea7f0df7804b9e5745 [] map[] <nil>} [… lengthy layer-by-layer manifest dump elided …]
++ git merge-base HEAD 8a13d17bd27336dc6a334b898473768dd0463072
+ git rev-parse 8a13d17bd27336dc6a334b898473768dd0463072:.circleci/docker
47793f0cc1b99136bdee7981477c6d0d374f2a5e
+++ git merge-base HEAD 8a13d17bd27336dc6a334b898473768dd0463072
++ git rev-parse 8a13d17bd27336dc6a334b898473768dd0463072:.circleci/docker
+ PREVIOUS_DOCKER_TAG=47793f0cc1b99136bdee7981477c6d0d374f2a5e
+ [[ 47793f0cc1b99136bdee7981477c6d0d374f2a5e = \4\7\7\9\3\f\0\c\c\1\b\9\9\1\3\6\b\d\e\e\7\9\8\1\4\7\7\c\6\d\0\d\3\7\4\f\2\a\5\e ]]
+ echo 'ERROR: Something has gone wrong and the previous image isn'\''t available for the merge-base of your branch'
ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch
+ echo '       contact the PyTorch team to restore the original images'
       contact the PyTorch team to restore the original images
+ exit 1


Exited with code exit status 1

See CircleCI build pytorch_libtorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (4/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Apr 02 23:04:05 sccache: error: couldn't connect to server
Apr 02 23:04:05 +++ eval 'extract_trap_cmd '
Apr 02 23:04:05 ++++ extract_trap_cmd
Apr 02 23:04:05 ++++ printf '%s\n' ''
Apr 02 23:04:05 +++ printf '%s\n' cleanup
Apr 02 23:04:05 ++ trap -- '
Apr 02 23:04:05 cleanup' EXIT
Apr 02 23:04:05 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
Apr 02 23:04:05 ++ which sccache
Apr 02 23:04:05 ++ sccache --stop-server
Apr 02 23:04:05 Stopping sccache server...
Apr 02 23:04:05 sccache: error: couldn't connect to server
Apr 02 23:04:05 sccache: caused by: Connection refused (os error 111)
Apr 02 23:04:05 ++ true
Apr 02 23:04:05 ++ rm /var/lib/jenkins/sccache_error.log
Apr 02 23:04:05 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Apr 02 23:04:05 ++ true
Apr 02 23:04:05 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
Apr 02 23:04:05 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Apr 02 23:04:05 ++ SCCACHE_IDLE_TIMEOUT=1200
Apr 02 23:04:05 ++ RUST_LOG=sccache::server=error
Apr 02 23:04:05 ++ sccache --start-server

This comment was automatically generated by Dr. CI.

swolchok added a commit that referenced this pull request Mar 29, 2021
This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

ghstack-source-id: 125154946
Pull Request resolved: #54896
@swolchok swolchok requested review from bhosmer and ezyang March 29, 2021 18:20
@bhosmer left a comment

Curious, why not just make is_contiguous_customized() a virtual method, turn IsContiguousPolicy into a default/custom bool, and s/is_contiguous/is_contiguous_customized/ in the subclasses? It would be less churn (e.g. for the FB case mentioned in phab) but maybe slower for the custom cases?
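A minimal sketch of the shape being floated here, with illustrative names (this is not the PR's actual code): is_contiguous() stays non-virtual and branch-cheap, and only subclasses that flip the flag pay for a virtual call.

```cpp
// Illustrative only: non-virtual fast path plus an opt-in virtual fallback.
class TensorImplSketch {
 public:
  bool is_contiguous() const {
    if (!is_contiguous_customized_) {
      return is_contiguous_;            // common case: no virtual dispatch
    }
    return is_contiguous_customized();  // opt-in slow path
  }

 protected:
  virtual ~TensorImplSketch() = default;

  // A subclass must both override this *and* set the flag below, or the
  // override is never reached -- the "clunkiness" swolchok mentions later.
  virtual bool is_contiguous_customized() const {
    return is_contiguous_;
  }

  bool is_contiguous_ = true;
  bool is_contiguous_customized_ = false;
};
```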

```cpp
bool BatchedTensorImpl::is_contiguous(at::MemoryFormat memory_format) const {
  TORCH_CHECK(memory_format == MemoryFormat::Contiguous,
      "NYI: querying is_contiguous inside of vmap for memory_format ",
      "other than torch.contiguous_format");
```

In this particular case I think we lose some signal by replacing this error message with the more generic one.

@ezyang (Contributor) commented Mar 30, 2021

I haven't closely looked at the diff yet, but it looks overcomplicated to compensate for not wanting to fix clients, whereas I suspect we should just fix clients.

@swolchok (Author)

> I suspect we should just fix clients.

How would you recommend fixing them? There's a KP with Metal/Vulkan that would let us save one policy mode, but I don't know how to get rid of the errors for Batched/Opaque/Sparse.

@swolchok (Author)

> why not just make is_contiguous_customized() a virtual method,

Good idea. It somehow feels a little clunkier to have to override a virtual method and set a flag to make sure that that method is actually called; do you think preserving the BatchedTensorImpl error text is worth it?

@bhosmer commented Mar 30, 2021

> why not just make is_contiguous_customized() a virtual method,
>
> Good idea. It somehow feels a little clunkier to have to override a virtual method and set a flag to make sure that that method is actually called; do you think preserving the BatchedTensorImpl error text is worth it?

I'm not sure there's anything special about the BatchedTensorImpl error text; you could pull that into the current enum too, if you didn't want the double hop. That would mean the enum betrays its origins as a cross-section of subclass-specific behaviors, but I mean, that's what it actually is 😁 so I'm not sure putting a "policy" veneer on it is the right direction to go in anyway (maybe this is what @ezyang was reacting to too)

@ezyang (Contributor) commented Mar 30, 2021

Here is why I think the client fix is feasible.

Ideally, the implementation is this:

  bool is_contiguous(at::MemoryFormat memory_format=at::MemoryFormat::Contiguous) const {
    if (memory_format == at::MemoryFormat::ChannelsLast) {
      return is_channels_last_contiguous_;
    } else if (memory_format == at::MemoryFormat::ChannelsLast3d) {
      return is_channels_last_3d_contiguous_;
    }
    return is_contiguous_;
  }

No matter the subclass, it is possible to make it observationally equivalent to whatever you had before simply by setting the three contiguous_ fields appropriately.

Sparse is easy, because contiguity doesn't make sense for it as a concept. For batched, @zou3519 has thought about this before (see #47365 and #47621); we think there's a right setting for the booleans and we just were lazy and didn't think hard enough about how to set it up. For opaque, it's on the client to populate these correctly.
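A hedged sketch of what "fixing the clients" could mean in practice: with the non-virtual accessor above, a subclass reproduces its old answers purely by setting the cached fields. The field names follow TensorImpl; the toy subclass and the values it picks are made up for illustration.

```cpp
// Toy stand-in for TensorImpl's cached contiguity state.
struct ToyTensorImpl {
  bool is_contiguous_ = true;
  bool is_channels_last_contiguous_ = false;
  bool is_channels_last_3d_contiguous_ = false;
};

// A hypothetical opaque-style backend that always reported "contiguous,
// never channels-last" just bakes that in once at construction; no
// override of is_contiguous() is needed.
struct ToyOpaqueImpl : ToyTensorImpl {
  ToyOpaqueImpl() {
    is_contiguous_ = true;
    is_channels_last_contiguous_ = false;
    is_channels_last_3d_contiguous_ = false;
  }
};
```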

@swolchok (Author)

> it is possible to make it observationally equivalent to whatever you had before simply by setting the three contiguous_ fields appropriately.

There is no guarantee that the three contiguous_ fields will stay set. An API that calls TensorImpl::refresh_contiguous() (like, say, TensorImpl::set_sizes_and_strides or TensorImpl::empty_tensor_restride, both of which are non-virtual) could get called at any time and mess things up.
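A toy model of that hazard (the two method names mirror the real TensorImpl API; everything else is a stand-in): because the mutator is non-virtual and unconditionally refreshes the cache, values a subclass planted in the booleans do not survive.

```cpp
#include <utility>
#include <vector>

struct ToyBase {
  bool is_contiguous_ = true;

  // Non-virtual, like TensorImpl::set_sizes_and_strides.
  void set_sizes_and_strides(std::vector<long> sizes, std::vector<long> strides) {
    sizes_ = std::move(sizes);
    strides_ = std::move(strides);
    refresh_contiguous();  // like TensorImpl::refresh_contiguous
  }

  // Recomputes the cache purely from sizes/strides; anything a subclass
  // planted in is_contiguous_ is silently overwritten here.
  void refresh_contiguous() { is_contiguous_ = computed_from_strides(); }

  bool computed_from_strides() const { return true; }  // stand-in computation

  std::vector<long> sizes_, strides_;
};
```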

@bhosmer commented Mar 30, 2021

> observationally equivalent

What about the cases that currently throw? Also IIRC in previous discussions you weren't a fan of tensors with no concept of contiguity returning false for is_contiguous, but maybe this takes precedence (or I'm misremembering).

swolchok added a commit that referenced this pull request Mar 30, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125293142

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@ezyang (Contributor) commented Mar 30, 2021

> There is no guarantee that the three contiguous_ fields will stay set. An API that calls TensorImpl::refresh_contiguous() (like, say, TensorImpl::set_sizes_and_strides or TensorImpl::empty_tensor_restride, both of which are non-virtual) could get called at any time and mess things up.

All I'm saying is that one might very reasonably impose the obligation on the subclass that they are responsible for maintaining whatever invariants the parent class expects when they change sizes. If you have no concept of strides, you probably shouldn't call those methods anyway; if you do have strides, calling those functions is probably going to make it easy for you to preserve invariants.

> What about the cases that currently throw? Also IIRC in previous discussions you weren't a fan of tensors with no concept of contiguity returning false for is_contiguous, but maybe this takes precedence (or I'm misremembering).

Well, I'm OK with having a flag to raise an error, similar to what we do today with storage access. If someone came to me and said, "Edward, look at all this performance we're leaving on the floor because of this flag test", I'd be willing to be convinced that we need a way to do this access in a branchless way (overriding the general UX preference of erroring when you do something that doesn't make sense).

One thing to note, though, is that this PR has been updated from a "policy" thing to a "virtual fallback" thing. I guess I'm OK with the virtual fallback; there are certainly cases where it makes sense. I just think it kind of encourages bad behavior on backends where they can do all sorts of random (wrong) behavior and then we have to clean it up afterwards... case in point here.

@bhosmer commented Mar 31, 2021

Agree, virtualizing it at all (first class or fallback) leaves us open to nonsense semantics. I think a has_contiguity_ gate that throws "this tensor type does not have contiguity" when false gives us a legit data model for all tensors, and takes care of Opaque and Sparse right away.

For Vulkan/Metal and Batched we could jump to the goal state, or leave the loophole for now and hope we catch any new perps in review. For the loophole, we could make has_contiguity_ a ternary enum with the third value diverting to the virtual fallback; that would leave perf pretty much pay-as-you-go, I think.

For the goal state we could

  • just remove the Vulkan and Metal overrides and leave it at that, i.e. move them to the default behavior. AFAICT the proper solution for these would be to error when you try to set strides to something noncontiguous, but I don't think these do that currently. So having is_contiguous() tell the truth seems strictly better, though ... BC breaking?
  • move Batched to the default behavior, maybe with a preliminary PR to slipstream in correct bool setting if that setting is obvious

@swolchok (Author) commented Apr 1, 2021

This PR is accepted, but given the discussion I'm unsure if I'm supposed to land it. @bhosmer / @ezyang can you clarify?

@bhosmer commented Apr 1, 2021

> This PR is accepted, but given the discussion I'm unsure if I'm supposed to land it. @bhosmer / @ezyang can you clarify?

Yeah, sorry for the mixed signals; it's my bad for stretching the "approve with suggestions" idiom past the breaking point. I'll change status to "changes requested" for clarity.

My preference would be to not land this as-is but modify it to use a virtual is_contiguous_custom() instead of the "policy" switch. A ternary has_contiguity_ flag seems to me like a decent way to confine the perf hit to the cases we want to get rid of anyway, but I'm sure there's other ways that would be fine. (per "loophole" above - true = default behavior, false = throw, custom = is_contiguous_custom() with a // TODO remove)

AFAICT @ezyang is also saying he's ok with a virtual fallback but not with the policy switch, but I could be wrong.
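A reworked version of the earlier sketch along the ternary lines bhosmer describes. The HasContiguityPolicy name matches the enum that eventually shows up in the patch (see the build error quoted at the end of this thread); the enumerator names and the body are still illustrative, not the landed code.

```cpp
#include <cstdint>
#include <stdexcept>

class TernarySketch {
 public:
  enum class HasContiguityPolicy : uint8_t {
    Default,                 // answer from the cached bool; no virtual call
    ContiguityNotSupported,  // e.g. Sparse/Opaque: throw on query
    CustomBehavior,          // divert to the virtual fallback (slated for removal)
  };

  bool is_contiguous() const {
    if (has_contiguity_ == HasContiguityPolicy::Default) {
      return is_contiguous_;  // fast path: branch-cheap, non-virtual
    }
    if (has_contiguity_ == HasContiguityPolicy::ContiguityNotSupported) {
      throw std::runtime_error("this tensor type does not have contiguity");
    }
    return is_contiguous_custom();  // only opted-in subclasses pay for this
  }

 protected:
  virtual ~TernarySketch() = default;
  virtual bool is_contiguous_custom() const { return is_contiguous_; }

  HasContiguityPolicy has_contiguity_ = HasContiguityPolicy::Default;
  bool is_contiguous_ = true;
};
```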

@bhosmer left a comment

Per thread

@swolchok (Author) commented Apr 1, 2021

AFAICT, here's a breakdown of TensorImpl types and the behavior they need:

  • TensorImpl -- default, obviously
  • Sparse -- does not have contiguity, throw
  • Opaque -- does not have contiguity, throw
  • Batched -- has custom contiguity behavior, virtual fallback
  • DelayedTensorImpl -- custom behavior, virtual fallback (reports contiguous iff memory_format == MemoryFormat::Contiguous)
  • Vulkan/Metal -- same as Delayed after @taox gets around to fixing the current behavior, virtual fallback

IIUC, we want to support "does not have contiguity, throw" going forward, and we begrudgingly support the virtual fallback.

I will send an update soon, but if this description is wrong, let's talk about it at this level rather than code comments.

@swolchok (Author) commented Apr 1, 2021

By the way, is_contiguous itself still needs to be TENSORIMPL_MAYBE_VIRTUAL to support backward compatibility, right?

@bhosmer commented Apr 1, 2021

> AFAICT, here's a breakdown of TensorImpl types and the behavior they need:
> ...
> IIUC, we want to support "does not have contiguity, throw" going forward, and we begrudgingly support the virtual fallback.
>
> I will send an update soon, but if this description is wrong, let's talk about it at this level rather than code comments.

This description matches my understanding, yeah.

Re TENSORIMPL_MAYBE_VIRTUAL, it would sure be nice to avoid, both for general-case perf and crazy-semantics-loophole reasons. But I don't know what the contract is for TensorImpl subclassing: is the specific current set of virtual TensorImpl methods considered public API? cc @ezyang @gchanan

…rtualize is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 1, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125540154

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@swolchok swolchok requested a review from bhosmer April 1, 2021 21:02
@bhosmer left a comment

This looks great to me. I still don't know the definitive answer about TENSORIMPL_MAYBE_VIRTUAL but in the absence of signal, this is obv the right way to land it.

@bhosmer commented Apr 2, 2021

Oh hey, sorry for the late-breaking observation, but: ~~since `has_contiguity_` is per-class rather than per-instance, is it worth templatizing? Or alternatively,~~ [edit: nvm] would it be worth making it const and setting it at construction time only rather than having a setter?

Feel free to disregard if your sense of the perf (and safety I guess, but mostly perf) ROI doesn't motivate either of these, just want to make sure they've been floated.

@swolchok (Author) commented Apr 2, 2021

> would it be worth making it const and setting it at construction time only rather than having a setter?

It needs to be copied in the various metadata copy methods, so it can't be const (and I think I've forgotten to do that copying, so update coming)
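A hedged sketch of the copying in question (TensorImpl's real helper is copy_tensor_metadata; the struct and fields below are toy stand-ins): whatever shallow-copy/detach path the class has must carry the policy field along, or copies silently revert to the default policy.

```cpp
#include <cstdint>

struct MetadataSketch {
  uint8_t has_contiguity_ = 0;  // stand-in for the policy field
  bool is_contiguous_ = true;

  // Modeled loosely after a copy_tensor_metadata-style helper.
  static void copy_metadata(const MetadataSketch& src, MetadataSketch& dest) {
    dest.is_contiguous_ = src.is_contiguous_;
    dest.has_contiguity_ = src.has_contiguity_;  // the easy line to forget
  }
};
```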

…tualize is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
…e is_contiguous"

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 2, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125623747

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
swolchok added a commit that referenced this pull request Apr 2, 2021
Pull Request resolved: #54896

This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125659451

Differential Revision: [D27404164](https://our.internmc.facebook.com/intern/diff/D27404164/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27404164/)!
@facebook-github-bot (Contributor)
This pull request has been merged in 62aa924.

@facebook-github-bot (Contributor)
This pull request has been reverted by e61f5b5.

@mruberry (Collaborator) commented Apr 5, 2021

Relevant snippet for build failures:

Apr 02 23:06:09 In file included from /var/lib/jenkins/workspace/c10/core/TensorImpl.cpp:1:0:
Apr 02 23:06:09 /var/lib/jenkins/workspace/c10/core/TensorImpl.h:1851:41: error: 'c10::TensorImpl::has_contiguity_' is too small to hold all values of 'enum class c10::TensorImpl::HasContiguityPolicy' [-Werror]
Apr 02 23:06:09    HasContiguityPolicy has_contiguity_ : 2;
Apr 02 23:06:09                                          ^
Apr 02 23:06:09 cc1plus: all warnings being treated as errors
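A hedged reading of the warning, plus one common way out (not necessarily the fix used in the reland): for a scoped enum, gcc sizes the required bit-field width from the enum's underlying type, which defaults to int, so a 2-bit field "cannot hold all values" even though only three enumerators exist. Storing the value in a full-width field of a narrow fixed type sidesteps it:

```cpp
#include <cstdint>

// Same three states as before; the fixed uint8_t base caps the value range.
enum class HasContiguityPolicy : uint8_t {
  Default,
  ContiguityNotSupported,
  CustomBehavior,
};

struct FieldSketch {
  // A full byte instead of a 2-bit bit-field: no "too small to hold all
  // values" warning on gcc 5.4, at the cost of six unused bits.
  uint8_t has_contiguity_ =
      static_cast<uint8_t>(HasContiguityPolicy::Default);
};
```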
