Remove backward op for slow 3d transposed convolution #69933
Conversation
[ghstack-poisoned]
CI Flow Status
⚛️ CI Flow
You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
For more information, please take a look at the CI Flow Wiki.
ghstack-source-id: b1c70cce1ec4ebaca12c1111d84ac6d28d3099c3 Pull Request resolved: #69933
🔗 Helpful links
💊 CI failures summary and remediations
As of commit f8dbcdd (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
linux-xenial-py3.6-gcc5.4 / build (1/1), Step: "Unknown"
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D33131343](https://our.internmc.facebook.com/intern/diff/D33131343) [ghstack-poisoned]
ghstack-source-id: e2e0d57dc8e8db80bf8ba19e94c569aa310dfc04 Pull Request resolved: #69933
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
python_module: nn
dispatch:
  CPU: slow_conv_transpose3d_backward_out_cpu
  CUDA: slow_conv_transpose3d_backward_out_cuda
We can assume that backends aren't overriding this op, right? Since other backends (e.g. XLA) override convolution_overrideable, and not any of the individual ops.
That's correct AFAIK. convolution_overrideable was created to be overridden for XLA. I'm sure it's still possible that someone somewhere is overriding the op, but I haven't seen anything internal or external.
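For readers outside this thread: out-of-tree backends hook the generic convolution entry point once instead of overriding each per-algorithm op. Below is a minimal sketch of what such a registration looks like, assuming the standard `TORCH_LIBRARY_IMPL` mechanism; the kernel name and body are hypothetical, not code from this PR or from torch_xla:

```cpp
#include <torch/library.h>
#include <ATen/ATen.h>

// Hypothetical backend kernel matching the convolution_overrideable schema.
// A real backend (e.g. torch_xla) would lower the convolution here.
at::Tensor my_convolution_overrideable(
    const at::Tensor& input,
    const at::Tensor& weight,
    const c10::optional<at::Tensor>& bias,
    at::IntArrayRef stride,
    at::IntArrayRef padding,
    at::IntArrayRef dilation,
    bool transposed,
    at::IntArrayRef output_padding,
    int64_t groups) {
  TORCH_CHECK(false, "placeholder: lowering not implemented in this sketch");
}

// One registration covers every convolution variant routed through
// at::convolution, so individual ops like the slow 3d transposed
// convolution backward never need a backend-specific override.
TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("convolution_overrideable", my_convolution_overrideable);
}
```

This is why removing the per-op backward entry is safe for such backends: they never dispatch to it in the first place.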
namespace native {
namespace {

static inline void slow_conv_transpose3d_shape_check(
should this go in a cpp file instead of a header?
// number of input & output planes and kernel size is indirectly defined by
// the grad_weight tensor
slow_conv_transpose3d_shape_check(
It looks like you factored out the shape check into that other header file, but I also see a bunch of other shape checking here. What's the split for (or is some of it duplicated)?
Hey, good question! Sorry, I updated the descriptions of some PRs to explain the split, but missed this one. Essentially, I want to register a CPU dispatch, but the `REGISTER_DISPATCH` macro for CPU kernels requires that the code be placed into the `native/cpu` dir, where it is recompiled once per arch type. So I moved all the backward logic underneath `native/cpu` so I can call `REGISTER_DISPATCH`.

I've been talking with Richard about this, and he is rightfully concerned that the new multiple-arch compilation unnecessarily regresses build time and expands the binary size. I've been throwing around an idea of defining a new macro `REGISTER_ALL_CPU_DISPATCH` that registers the same kernel across all arch types, to avoid both the recompilation and the need to split the logic as done here. Do you have any thoughts on this idea?
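For readers following along, here is a minimal sketch of the DispatchStub pattern this comment refers to, using the standard ATen macros but with hypothetical names (`my_backward_fn`, `my_backward_stub`, `my_backward_kernel`, and the file paths in the comments); it is an illustration of the mechanism, not the code from this PR:

```cpp
// In aten/src/ATen/native/SomeOp.h: declare the stub.
#include <ATen/ATen.h>
#include <ATen/native/DispatchStub.h>

namespace at { namespace native {
using my_backward_fn = void (*)(const Tensor& grad_output, const Tensor& input);
DECLARE_DISPATCH(my_backward_fn, my_backward_stub);
}} // namespace at::native

// In aten/src/ATen/native/SomeOp.cpp: define the stub once.
// Call sites invoke it with a device type, e.g.:
//   my_backward_stub(input.device().type(), grad_output, input);
namespace at { namespace native {
DEFINE_DISPATCH(my_backward_stub);
}} // namespace at::native

// In aten/src/ATen/native/cpu/SomeOpKernel.cpp: register the kernel.
// Files under native/cpu are compiled once per CPU capability (scalar,
// AVX2, AVX512, ...), which is why REGISTER_DISPATCH must be called here.
namespace at { namespace native {
namespace {
void my_backward_kernel(const Tensor& grad_output, const Tensor& input) {
  // actual backward computation would go here
}
} // anonymous namespace
REGISTER_DISPATCH(my_backward_stub, &my_backward_kernel);
}} // namespace at::native
```

The proposed `REGISTER_ALL_CPU_DISPATCH` would bind one kernel to every arch slot of the stub from a regular `native/` translation unit, avoiding both the per-arch recompilation and the file split described below.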
**Note:** `REGISTER_DISPATCH` for the CPU kernel is only accessible from the `native/cpu` directory. So this PR splits `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` into:

* (new file) `aten/src/ATen/native/NaiveConvolutionTranspose3d.h` (contains functions shared between forward and backward)
* `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` (contains forward logic)
* (new file) `aten/src/ATen/native/cpu/NaiveConvolutionTranspose3d.cpp` (contains backward functions + `REGISTER_DISPATCH` call)

Once the forward op is removed as well, the first two can go away.

Differential Revision: [D33131343](https://our.internmc.facebook.com/intern/diff/D33131343)

[ghstack-poisoned]
ghstack-source-id: fd79b06f54a4b1c6ef9c2c2de9c14ddf327a45ea Pull Request resolved: #69933
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM!
Summary: Pull Request resolved: #69933

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33131343

Pulled By: jbschlosser

fbshipit-source-id: 4300c66f0f4811c949f82c62d17c7b5200cd15a3
Stack from ghstack:
This PR drops the backward op for slow 3d transposed convolution. It replaces the op with a dispatch stub, and registers a single composite CPU kernel for all CPU arch types.
Differential Revision: D33131343