Remove backward op for slow 3d transposed convolution #69933
Conversation
[ghstack-poisoned]
CI Flow Status
⚛️ CI Flow
You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
For more information, please take a look at the CI Flow Wiki.
ghstack-source-id: b1c70cce1ec4ebaca12c1111d84ac6d28d3099c3 Pull Request resolved: #69933
🔗 Helpful links
💊 CI failures summary and remediations
As of commit f8dbcdd (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
linux-xenial-py3.6-gcc5.4 / build (1/1), Step: "Unknown"
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D33131343](https://our.internmc.facebook.com/intern/diff/D33131343) [ghstack-poisoned]
ghstack-source-id: e2e0d57dc8e8db80bf8ba19e94c569aa310dfc04 Pull Request resolved: #69933
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
python_module: nn
dispatch:
  CPU: slow_conv_transpose3d_backward_out_cpu
  CUDA: slow_conv_transpose3d_backward_out_cuda
We can assume that backends aren't overriding this op, right? Since other backends (e.g. XLA) override convolution_overrideable, and not any of the individual ops.
That's correct AFAIK. convolution_overrideable was created to be overridden for XLA. I'm sure it's still possible that someone somewhere is overriding the op, but I haven't seen anything internal or external.
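For readers outside this thread: out-of-tree backends hook the generic convolution entry point once instead of overriding each per-algorithm op. Below is a minimal sketch of what such a registration looks like, assuming the standard `TORCH_LIBRARY_IMPL` mechanism; the kernel name and body are hypothetical, not code from this PR or from torch_xla:

```cpp
#include <torch/library.h>
#include <ATen/ATen.h>

// Hypothetical backend kernel matching the convolution_overrideable schema.
// A real backend (e.g. torch_xla) would lower the convolution here.
at::Tensor my_convolution_overrideable(
    const at::Tensor& input,
    const at::Tensor& weight,
    const c10::optional<at::Tensor>& bias,
    at::IntArrayRef stride,
    at::IntArrayRef padding,
    at::IntArrayRef dilation,
    bool transposed,
    at::IntArrayRef output_padding,
    int64_t groups) {
  TORCH_CHECK(false, "placeholder: lowering not implemented in this sketch");
}

// One registration covers every convolution variant routed through
// at::convolution, so individual ops like the slow 3d transposed
// convolution backward never need a backend-specific override.
TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("convolution_overrideable", my_convolution_overrideable);
}
```

This is why removing the per-op backward entry is safe for such backends: they never dispatch to it in the first place.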
namespace native {
namespace {

static inline void slow_conv_transpose3d_shape_check(
should this go in a cpp file instead of a header?
// number of input & output planes and kernel size is indirectly defined by
// the grad_weight tensor
slow_conv_transpose3d_shape_check(
It looks like you factored out the shape check into that other header file, but I also see a bunch of other shape checking here. What's the split for (or is some of it duplicated)?
Hey, good question! Sorry, I updated the descriptions of some PRs to explain the split, but missed this one. Essentially, I want to register a CPU dispatch, but the `REGISTER_DISPATCH` macro for CPU kernels requires that the code be placed into the `native/cpu` dir, where it is recompiled once per arch type. So I moved all the backward logic underneath `native/cpu` so I can call `REGISTER_DISPATCH`.

I've been talking with Richard about this, and he is rightfully concerned that the new multiple-arch compilation unnecessarily regresses build time and expands the binary size. I've been throwing around an idea of defining a new macro `REGISTER_ALL_CPU_DISPATCH` that registers the same kernel across all arch types, to avoid both the recompilation and the need to split the logic as done here. Do you have any thoughts on this idea?
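For readers following along, here is a minimal sketch of the DispatchStub pattern this comment refers to, using the standard ATen macros but with hypothetical names (`my_backward_fn`, `my_backward_stub`, `my_backward_kernel`, and the file paths in the comments); it is an illustration of the mechanism, not the code from this PR:

```cpp
// In aten/src/ATen/native/SomeOp.h: declare the stub.
#include <ATen/ATen.h>
#include <ATen/native/DispatchStub.h>

namespace at { namespace native {
using my_backward_fn = void (*)(const Tensor& grad_output, const Tensor& input);
DECLARE_DISPATCH(my_backward_fn, my_backward_stub);
}} // namespace at::native

// In aten/src/ATen/native/SomeOp.cpp: define the stub once.
// Call sites invoke it with a device type, e.g.:
//   my_backward_stub(input.device().type(), grad_output, input);
namespace at { namespace native {
DEFINE_DISPATCH(my_backward_stub);
}} // namespace at::native

// In aten/src/ATen/native/cpu/SomeOpKernel.cpp: register the kernel.
// Files under native/cpu are compiled once per CPU capability (scalar,
// AVX2, AVX512, ...), which is why REGISTER_DISPATCH must be called here.
namespace at { namespace native {
namespace {
void my_backward_kernel(const Tensor& grad_output, const Tensor& input) {
  // actual backward computation would go here
}
} // anonymous namespace
REGISTER_DISPATCH(my_backward_stub, &my_backward_kernel);
}} // namespace at::native
```

The proposed `REGISTER_ALL_CPU_DISPATCH` would bind one kernel to every arch slot of the stub from a regular `native/` translation unit, avoiding both the per-arch recompilation and the file split described below.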
**Note:** `REGISTER_DISPATCH` for the CPU kernel is only accessible from the `native/cpu` directory. So this PR splits `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` into:

* (new file) `aten/src/ATen/native/NaiveConvolutionTranspose3d.h` (contains functions shared between forward and backward)
* `aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp` (contains forward logic)
* (new file) `aten/src/ATen/native/cpu/NaiveConvolutionTranspose3d.cpp` (contains backward functions + `REGISTER_DISPATCH` call)

Once the forward op is removed as well, the first two can go away.

Differential Revision: [D33131343](https://our.internmc.facebook.com/intern/diff/D33131343)

[ghstack-poisoned]
ghstack-source-id: fd79b06f54a4b1c6ef9c2c2de9c14ddf327a45ea Pull Request resolved: #69933
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM!
Summary: Pull Request resolved: #69933

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33131343

Pulled By: jbschlosser

fbshipit-source-id: 4300c66f0f4811c949f82c62d17c7b5200cd15a3
Stack from ghstack:
This PR drops the backward op for slow 3d transposed convolution. It replaces the op with a dispatch stub, and registers a single composite CPU kernel for all CPU arch types.
Differential Revision: D33131343