Remove fgrad_input from slow_conv2d #64280
Conversation
ooh so delightful cc @jbschlosser

Actually, @jbschlosser, I'm going to leave reviewing and landing this one up to you, since it is a logical step towards convolution rationalization.

Very timely PR, @peterbell10 :) love it!
Very nice work! Left a couple of comments below.
```diff
@@ -9490,31 +9490,31 @@
     CPU: slow_conv_transpose3d_backward_cpu
     CUDA: slow_conv_transpose3d_backward_cuda

-- func: thnn_conv2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, *, Tensor(a!) out) -> Tensor(a!)
+- func: _slow_conv2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, *, Tensor(a!) out) -> Tensor(a!)
```
The name change is sorely needed and I welcome it, but I predict some additional internal breakage. Based on how much there is, it might make sense to limit the scope of this PR to only `fgrad_input` removal and bump the name change to a future PR, TBD.
I've restored the name of the top-level function, but I think changing the names of `_forward` and `_backward` is okay since they aren't really public and the signature has to change either way.
```cpp
if (grad_bias.defined()) {
  Tensor grad_bias_mut = grad_bias;
  at::sum_out(grad_bias_mut, grad_output, IntArrayRef{0, 2, 3});
}
```
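For readers following along, here is a minimal standalone sketch (my illustration, not part of the PR) checking numerically that the conv2d bias gradient is exactly `grad_output` summed over dims `{0, 2, 3}`, which is what the `at::sum_out` call above computes:

```python
import torch
import torch.nn.functional as F

x = torch.rand(2, 3, 8, 8)
w = torch.rand(4, 3, 3, 3)
b = torch.rand(4, requires_grad=True)

out = F.conv2d(x, w, b)
grad_output = torch.rand_like(out)
out.backward(grad_output)

# The bias gradient is the reduction of grad_output over the batch (0)
# and spatial (2, 3) dimensions, matching at::sum_out(..., {0, 2, 3}).
assert torch.allclose(b.grad, grad_output.sum(dim=(0, 2, 3)))
```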
This is a much nicer way to compute `grad_bias` and seems correct, AFAICT.
A couple of questions regarding this:
- Are there any tests verifying the `grad_bias` calculation? I briefly checked but nothing stood out.
- Do you by chance have any comparative benchmarking results for the change?
> Are there any tests verifying the `grad_bias` calculation? I briefly checked but nothing stood out.
The nn tests are a bit confusing, but I think the gradcheck is done here, in `_check_gradients` (`pytorch/torch/testing/_internal/common_nn.py`, line 5807 at 59fcbd1):

```python
def _check_gradients(self, test_case, module, input_tuple):
```
That test includes inputs and parameters, so `bias` should be covered.
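As a self-contained illustration (my sketch, not the test suite's code), the same property can also be checked directly with `torch.autograd.gradcheck`, which requires double precision:

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 2, 5, 5, dtype=torch.double)
w = torch.rand(3, 2, 3, 3, dtype=torch.double)
b = torch.rand(3, dtype=torch.double, requires_grad=True)

# Only the bias requires grad, so this exercises just the grad_bias path.
torch.autograd.gradcheck(lambda bias: F.conv2d(x, w, bias), (b,))
```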
> Do you by chance have any comparative benchmarking results for the change?
I've done some simple benchmarking which showed positive results. LMK if you want more.
For CUDA, I've run a convolution in a loop under nvprof with these parameters:
```python
input = torch.rand(1, 50, 260, 260, device=device, requires_grad=False)
weight = torch.rand(10, 50, 3, 3, device=device, requires_grad=False)
bias = torch.rand(10, device=device, requires_grad=True)
```
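A harness along these lines could reproduce the setup (a reconstruction under stated assumptions: the exact loop isn't in the PR, and disabling cuDNN is assumed here to force the slow conv path); run the whole script under `nvprof`:

```python
import torch
import torch.nn.functional as F

torch.backends.cudnn.enabled = False  # assumption: fall back to the slow conv path
device = "cuda"

input = torch.rand(1, 50, 260, 260, device=device, requires_grad=False)
weight = torch.rand(10, 50, 3, 3, device=device, requires_grad=False)
bias = torch.rand(10, device=device, requires_grad=True)

for _ in range(100):
    out = F.conv2d(input, weight, bias)
    out.backward(torch.ones_like(out))
    bias.grad = None  # avoid accumulating gradients across iterations
torch.cuda.synchronize()  # ensure all kernels are captured by the profiler
```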
The result is that it calls one kernel instead of two (the cuBLAS gemm here becomes `dot_kernel` + `reduce_1Block_kernel`). The kernel execution time is also noticeably faster:
|          | Master (us) | This PR (us) |
|----------|-------------|--------------|
| Forward  | 23.8        | 14.6         |
| Backward | 22.4        | 18.1         |
On CPU (without nvprof), the bias overhead is dwarfed by the main convolution, and the same shapes showed no measurable difference. So I also tried a smaller input size with H=120, C_in=2, which showed some minor improvement in the forward case (presumably from vectorization).
|                  | Master (us) | This PR (us) |
|------------------|-------------|--------------|
| Forward (Big)    | 15,100      | 15,100       |
| Backward (Big)   | 13,100      | 13,100       |
| Forward (Small)  | 526         | 499          |
| Backward (Small) | 1,050       | 1,050        |
```cpp
TORCH_CHECK(!bias.defined() || bias.is_contiguous(),
            "bias tensor has to be contiguous");
```
Just to verify my understanding: `bias` no longer needs to be contiguous because `grad_bias` is now computed using `at::sum_out` instead of an ad-hoc kernel that required contiguous input?
That's the general idea. This replaces cuBLAS library calls, which don't support completely arbitrary strides, with `TensorIterator` kernels that do.
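A quick illustration of the relaxed constraint (my example, not from the PR; whether the slow path is actually hit depends on dispatch for the given build and device):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)
w = torch.rand(4, 3, 3, 3)
bias = torch.rand(8)[::2]  # strided view: shape (4,), stride (2,), not contiguous
assert not bias.is_contiguous()

# With the contiguity TORCH_CHECK removed, a strided bias is accepted.
out = F.conv2d(x, w, bias)
```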
LGTM! Thanks for the update :)
This looks awesome, and I'm glad we don't need BLAS to add the bias or compute the bias grad; it's a leftover from Caffe1 times.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Hey @peterbell10, do you mind doing a rebase for this? Auto-rebase is failing internally.
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

FYI @peterbell10, there's still quite a bit of internal breakage happening due to the renaming. I'm looking into how much of it is unavoidable due to the signature change.

@jbschlosser, any update on this?
Hey @peterbell10, sorry for the lack of updates. I got the internal situation figured out and it's merged now. |
Stack from ghstack:
Differential Revision: D30830887