
Remove backward op for slow 3d transposed convolution #69933

Closed
wants to merge 3 commits into from

Conversation

jbschlosser
Contributor

@jbschlosser jbschlosser commented Dec 14, 2021

Stack from ghstack:

This PR drops the backward op for slow 3d transposed convolution. It replaces the op with a dispatch stub, and registers a single composite CPU kernel for all CPU arch types.

Differential Revision: D33131343

@pytorch-probot

pytorch-probot bot commented Dec 14, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/f8dbcdd161b397208db2783d91236534b53119d3/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

jbschlosser added a commit that referenced this pull request Dec 14, 2021
ghstack-source-id: b1c70cce1ec4ebaca12c1111d84ac6d28d3099c3
Pull Request resolved: #69933
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 15, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit f8dbcdd (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / build (1/1)

Step: "Unknown"

2021-12-28T22:16:47.6372607Z fi
2021-12-28T22:16:47.6373043Z # Covers the case where a previous tag doesn't exist for the tree
2021-12-28T22:16:47.6373711Z # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
2021-12-28T22:16:47.6374359Z if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
2021-12-28T22:16:47.6375051Z   echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
2021-12-28T22:16:47.6375608Z   exit 1
2021-12-28T22:16:47.6375865Z fi
2021-12-28T22:16:47.6376299Z PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
2021-12-28T22:16:47.6376957Z # If no image exists but the hash is the same as the previous hash then we should error out here
2021-12-28T22:16:47.6377526Z if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
2021-12-28T22:16:47.6378177Z   echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
2021-12-28T22:16:47.6378879Z   echo "       contact the PyTorch team to restore the original images"
2021-12-28T22:16:47.6379298Z   exit 1
2021-12-28T22:16:47.6379566Z fi
2021-12-28T22:16:47.6380078Z echo ::set-output name=rebuild::yes
2021-12-28T22:16:47.6390376Z shell: /usr/bin/bash -e {0}
2021-12-28T22:16:47.6390687Z env:
2021-12-28T22:16:47.6391146Z   BUILD_ENVIRONMENT: linux-xenial-py3.6-gcc5.4
2021-12-28T22:16:47.6392076Z   DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
2021-12-28T22:16:47.6393074Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2021-12-28T22:16:47.6393954Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@jbschlosser
Contributor Author

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jbschlosser added a commit that referenced this pull request Dec 16, 2021
ghstack-source-id: e2e0d57dc8e8db80bf8ba19e94c569aa310dfc04
Pull Request resolved: #69933
@jbschlosser
Contributor Author

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

python_module: nn
dispatch:
  CPU: slow_conv_transpose3d_backward_out_cpu
  CUDA: slow_conv_transpose3d_backward_out_cuda
Contributor


we can assume that backends aren't overriding this op, right? Since other backends (e.g. XLA) override convolution_overrideable, and not any of the individual ops.

Contributor Author


That's correct AFAIK. convolution_overrideable was created to be overridden for XLA. I'm sure it's still possible that someone somewhere is overriding the op, but I haven't seen anything internal or external.

namespace native {
namespace {

static inline void slow_conv_transpose3d_shape_check(
Contributor


should this go in a cpp file instead of a header?


// number of input & output planes and kernel size is indirectly defined by
// the grad_weight tensor
slow_conv_transpose3d_shape_check(
Contributor


It looks like you factored out the shape check into that other header file, but I also see a bunch of other shape checking here. What's the split for (or is some of it duplicated)?

Contributor Author


Hey, good question! Sorry, I updated the descriptions of some PRs to explain the split, but missed this one. Essentially, I want to register a CPU dispatch, but the REGISTER_DISPATCH macro for CPU kernels requires that the code be placed into the native/cpu dir, where it is recompiled once per arch type. So I moved all the backward logic underneath native/cpu so I can call REGISTER_DISPATCH.

I've been talking with Richard about this, and he is rightfully concerned that the new multiple-arch compilation unnecessarily regresses build time and expands the binary size. I've been throwing around an idea of defining a new macro REGISTER_ALL_CPU_DISPATCH that registers the same kernel across all arch types to avoid both the recompilation and the need to split the logic as done here. Do you have any thoughts on this idea?

**Note:** `REGISTER_DISPATCH` for the CPU kernel is only accessible from the `native/cpu` directory. So this PR splits `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` into:
* (new file) `aten/src/ATen/native/NaiveConvolutionTranspose3d.h` (contains functions shared between forward and backward)
* `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` (contains forward logic)
* (new file) `aten/src/ATen/native/cpu/NaiveConvolutionTranspose3d.cpp` (contains backward functions + `REGISTER_DISPATCH` call)

Once the forward op is removed as well, the first two can go away.

Differential Revision: [D33131343](https://our.internmc.facebook.com/intern/diff/D33131343)

[ghstack-poisoned]
jbschlosser added a commit that referenced this pull request Dec 28, 2021
ghstack-source-id: fd79b06f54a4b1c6ef9c2c2de9c14ddf327a45ea
Pull Request resolved: #69933
@jbschlosser
Contributor Author

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@albanD albanD removed their request for review December 29, 2021 11:03
Contributor

@bdhirsh bdhirsh left a comment


LGTM!

wconstab pushed a commit that referenced this pull request Jan 5, 2022
Summary: Pull Request resolved: #69933

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33131343

Pulled By: jbschlosser

fbshipit-source-id: 4300c66f0f4811c949f82c62d17c7b5200cd15a3
@facebook-github-bot facebook-github-bot deleted the gh/jbschlosser/14/head branch January 8, 2022 15:16