
Remove fgrad_input from slow_conv2d #64280

Closed · wants to merge 7 commits

Conversation


@peterbell10 (Collaborator) commented Aug 31, 2021

Stack from ghstack:

Differential Revision: D30830887

@facebook-github-bot facebook-github-bot added cla signed oncall: jit Add this issue/PR to JIT oncall triage queue labels Aug 31, 2021

facebook-github-bot commented Aug 31, 2021

💊 CI failures summary and remediations

As of commit d6d2760 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (distributed, 1, 1, linux.2xlarge) (1/1)

Step: "Unknown"

2021-09-09T19:31:46.2879782Z frame #15: <unknown function> + 0x486ea (0x7f2f5dbf56ea in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
2021-09-09T19:31:46.2880943Z frame #16: <unknown function> + 0xc9039 (0x7f2f5db01039 in /opt/conda/lib/libstdc++.so.6)
2021-09-09T19:31:46.2882325Z frame #17: <unknown function> + 0x76db (0x7f2f816936db in /lib/x86_64-linux-gnu/libpthread.so.0)
2021-09-09T19:31:46.2883524Z frame #18: clone + 0x3f (0x7f2f813bc71f in /lib/x86_64-linux-gnu/libc.so.6)
2021-09-09T19:31:46.2884036Z 
2021-09-09T19:31:46.7772062Z ok (3.824s)
2021-09-09T19:32:02.0299820Z   test_rpc_builtin_timeout (__main__.FaultyFaultyAgentRpcTest) ... ok (15.253s)
2021-09-09T19:32:11.3692105Z   test_rpc_script_timeout (__main__.FaultyFaultyAgentRpcTest) ... ok (9.339s)
2021-09-09T19:32:15.1945286Z   test_rref_to_here_timeout (__main__.FaultyFaultyAgentRpcTest) ... ok (3.825s)
2021-09-09T19:32:23.0270257Z   test_udf_remote_message_delay_timeout (__main__.FaultyFaultyAgentRpcTest) ... ok (7.832s)
2021-09-09T19:32:27.3233211Z   test_udf_remote_message_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTest) ... [E request_callback_no_python.cpp:559] Received error while processing request type 261: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":385, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
2021-09-09T19:32:27.3235861Z Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:385 (most recent call first):
2021-09-09T19:32:27.3238761Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x59 (0x7fb19341bf59 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
2021-09-09T19:32:27.3240701Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xa3 (0x7fb1933f2b34 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
2021-09-09T19:32:27.3242612Z frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x61 (0x7fb193419341 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
2021-09-09T19:32:27.3244377Z frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x628 (0x7fb19c964fb8 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
2021-09-09T19:32:27.3246661Z frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 0x8c (0x7fb19c94b80c in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
2021-09-09T19:32:27.3249157Z frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0xf5 (0x7fb1ad25bbb5 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
2021-09-09T19:32:27.3251732Z frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x1f0 (0x7fb19c9523a0 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
2021-09-09T19:32:27.3254124Z frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x60 (0x7fb1ad25b480 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
2021-09-09T19:32:27.3255733Z frame #8: <unknown function> + 0x92be350 (0x7fb19c947350 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)


peterbell10 added a commit that referenced this pull request Aug 31, 2021
ghstack-source-id: 845dbc55b43c0b6474922b1fe90387dd21e8dd1e
Pull Request resolved: #64280
peterbell10 added a commit that referenced this pull request Aug 31, 2021
ghstack-source-id: 168dc0a7bb648fe0b68bfa4afe3608470d0f1ba0
Pull Request resolved: #64280

ezyang commented Sep 1, 2021

ooh so delightful

cc @jbschlosser


ezyang commented Sep 1, 2021

Actually, @jbschlosser, I'm going to leave reviewing and landing this one up to you, since it is a logical step towards convolution rationalization.

@jbschlosser

Very timely PR, @peterbell10 :) love it!


@jbschlosser left a comment


Very nice work! Left a couple comments below

@@ -9490,31 +9490,31 @@
CPU: slow_conv_transpose3d_backward_cpu
CUDA: slow_conv_transpose3d_backward_cuda

- func: thnn_conv2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, *, Tensor(a!) out) -> Tensor(a!)
+ func: _slow_conv2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, *, Tensor(a!) out) -> Tensor(a!)
jbschlosser (reviewer):


The name change is sorely needed and I welcome it, but I predict some additional internal breakage. Based on how much there is, it might make sense to limit the scope of this PR to only fgrad_input removal and bump the name change to a future PR, TBD.

peterbell10 (author):


I've restored the name of the top-level function but I think changing the names of _forward and _backward is okay since they aren't really public and the signature has to change either way.

Comment on lines 344 to 348
if (grad_bias.defined()) {
  Tensor grad_bias_mut = grad_bias;
  at::sum_out(grad_bias_mut, grad_output, IntArrayRef{0, 2, 3});
}

jbschlosser (reviewer):


This is a much nicer way to compute grad_bias and seems correct AFAICT.

Couple questions regarding this:

  • Are there any tests verifying the grad_bias calculation? I briefly checked but nothing stood out
  • Do you by chance have any comparative benchmarking results for the change?

peterbell10 (author):


> Are there any tests verifying the grad_bias calculation? I briefly checked but nothing stood out

The nn tests are a bit confusing, but I think the gradcheck is done here:

def _check_gradients(self, test_case, module, input_tuple):

That test includes inputs and parameters, so bias should be covered.
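For illustration only (a hypothetical standalone check, not the test from the suite), grad_bias coverage can also be exercised directly by passing the bias as a gradcheck input:

```python
import torch
import torch.nn.functional as F

# gradcheck needs double precision; every input passed with
# requires_grad=True has its analytical gradient compared against
# finite differences, so the bias is checked explicitly here.
x = torch.randn(1, 2, 6, 6, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, 3, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(3, dtype=torch.double, requires_grad=True)

# gradcheck raises on failure and returns True otherwise.
ok = torch.autograd.gradcheck(lambda x, w, b: F.conv2d(x, w, b), (x, w, b))
print(ok)
```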

> Do you by chance have any comparative benchmarking results for the change?

I've done some simple benchmarking which showed positive results. LMK if you want more.

For CUDA, I've run a convolution in a loop under nvprof with these parameters:

input = torch.rand(1, 50, 260, 260, device=device, requires_grad=False)
weight = torch.rand(10, 50, 3, 3, device=device, requires_grad=False)
bias = torch.rand(10, device=device, requires_grad=True)

The result is that it calls one kernel instead of two (the cuBLAS gemm, which shows up as dot_kernel + reduce_1Block_kernel, is replaced by a single reduction kernel). The kernel execution time is also noticeably faster:

|          | Master (µs) | This PR (µs) |
|----------|-------------|--------------|
| Forward  | 23.8        | 14.6         |
| Backward | 22.4        | 18.1         |

On CPU (without nvprof), the results are dwarfed by the main convolution, and the same shapes showed no measurable difference. So I also tried a smaller input size with H = 120 and C_in = 2, which showed a minor improvement in the forward case (presumably from vectorization).

|                  | Master (µs) | This PR (µs) |
|------------------|-------------|--------------|
| Forward (Big)    | 15,100      | 15,100       |
| Backward (Big)   | 13,100      | 13,100       |
| Forward (Small)  | 526         | 499          |
| Backward (Small) | 1,050       | 1,050        |
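For reference, the shapes above drop straight into torch.utils.benchmark for a rough reproduction (a sketch only; absolute timings and kernel counts depend on machine and build):

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

device = "cpu"  # set to "cuda" to mirror the nvprof runs
x = torch.rand(1, 50, 260, 260, device=device)
weight = torch.rand(10, 50, 3, 3, device=device)
bias = torch.rand(10, device=device, requires_grad=True)

# Build one graph up front so the backward timer can reuse it.
out = F.conv2d(x, weight, bias)
grad = torch.randn_like(out)

fwd = benchmark.Timer(
    stmt="F.conv2d(x, weight, bias)",
    setup="import torch.nn.functional as F",
    globals={"x": x, "weight": weight, "bias": bias},
)
bwd = benchmark.Timer(
    stmt="torch.autograd.grad(out, bias, grad, retain_graph=True)",
    globals={"torch": torch, "out": out, "bias": bias, "grad": grad},
)

m_fwd = fwd.timeit(20)
m_bwd = bwd.timeit(20)
print(m_fwd)
print(m_bwd)
```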

Comment on lines -128 to -130
TORCH_CHECK(!bias.defined() || bias.is_contiguous(),
"bias tensor has to be contiguous");

jbschlosser (reviewer):


Just to verify my understanding: bias no longer needs to be contiguous because grad_bias is now computed using at::sum_out instead of an ad-hoc kernel that required contiguous input?

peterbell10 (author):


That's the general idea. This replaces cuBLAS library calls, which don't support completely arbitrary strides, with TensorIterator kernels that do.
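To make both points concrete, here is a small sketch (shapes are illustrative, not taken from the PR) showing that a non-contiguous bias is accepted and that grad_bias reduces to a plain sum of grad_output over the batch and spatial dimensions:

```python
import torch
import torch.nn.functional as F

# A non-contiguous bias: a stride-2 view into a larger tensor.
storage = torch.randn(20, dtype=torch.double)
bias = storage[::2].requires_grad_()
assert not bias.is_contiguous()

x = torch.randn(2, 5, 9, 9, dtype=torch.double)
w = torch.randn(10, 5, 3, 3, dtype=torch.double)
out = F.conv2d(x, w, bias)  # no "bias tensor has to be contiguous" error

grad_output = torch.randn_like(out)
out.backward(grad_output)

# grad_bias is grad_output reduced over (N, H_out, W_out), one value
# per output channel -- the same reduction at::sum_out performs.
grad_bias = grad_output.sum(dim=(0, 2, 3))
print(torch.allclose(bias.grad, grad_bias))
```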

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Sep 1, 2021
ghstack-source-id: 168dc0a7bb648fe0b68bfa4afe3608470d0f1ba0
Pull Request resolved: pytorch#64280
peterbell10 added a commit that referenced this pull request Sep 2, 2021
ghstack-source-id: 0fa3a83cbdea02370e4ebd0e818efa552e568765
Pull Request resolved: #64280
peterbell10 added a commit that referenced this pull request Sep 2, 2021
ghstack-source-id: 345ed21f8d0a1e00370f1e3f99e01344468dc875
Pull Request resolved: #64280
@ezyang ezyang removed their request for review September 2, 2021 16:22

@jbschlosser left a comment


LGTM! Thanks for the update :)

@ngimel

ngimel commented Sep 8, 2021

This looks awesome, and I'm glad we don't need BLAS to add the bias/compute the bias grad; it's a leftover from Caffe1 times.

@jbschlosser

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser

Hey @peterbell10, do you mind doing a rebase for this? Auto-rebase is failing internally.

@pytorch-probot pytorch-probot bot assigned pytorchbot and unassigned pytorchbot Sep 9, 2021
peterbell10 added a commit that referenced this pull request Sep 9, 2021
ghstack-source-id: 5c0cdd094500c03630dc4ec813d6b1ffff081223
Pull Request resolved: #64280
@pytorch-probot

pytorch-probot bot commented Sep 9, 2021

CI Flow Status


Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/d6d2760cb74c04cc5507ed661bbb80e3f7445770/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

**Triggered workflows** (bold = enabled label)

| Workflow | Labels | Status |
|---|---|---|
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-bionic-py3.8-gcc9-coverage | ciflow/all, ciflow/coverage, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, **ciflow/default**, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, **ciflow/default**, ciflow/win | ✅ triggered |
| win-vs2019-cuda10.2-py3 | ciflow/all, ciflow/cuda, **ciflow/default**, ciflow/win | ✅ triggered |

**Skipped workflows**

| Workflow | Labels | Status |
|---|---|---|
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| paralleltbb-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@jbschlosser

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser

FYI @peterbell10 there's still quite a bit of internal breakage happening due to the renaming. I'm looking into how much of it is unavoidable due to the signature change.

@peterbell10

@jbschlosser any update on this?

@jbschlosser

jbschlosser commented Sep 24, 2021

Hey @peterbell10, sorry for the lack of updates. I got the internal situation figured out and it's merged now.

@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/133/head branch September 28, 2021 14:20
@seemethere seemethere added this to the 1.10.1 milestone Dec 8, 2021
seemethere pushed a commit that referenced this pull request Dec 8, 2021
Summary: Pull Request resolved: #64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
(cherry picked from commit 68e5935)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
seemethere pushed a commit that referenced this pull request Dec 9, 2021
seemethere added a commit that referenced this pull request Dec 10, 2021
Co-authored-by: Peter Bell <peterbell10@live.co.uk>
Labels: cla signed, oncall: jit (Add this issue/PR to JIT oncall triage queue), open source

7 participants