
updated launch bounds for trilinear 3d #59999

Closed

Conversation

Fuzzkatt (Collaborator) commented Jun 15, 2021

Updates the launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. This improves forward-pass runtime by a 3-4x factor; the backward pass has the same runtime (probably a different bottleneck).

Timing data (NVIDIA Titan V GPU): [TrilinearTimingData image]
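For context, a minimal sketch of what a CUDA `__launch_bounds__` annotation looks like is shown below. The constants and kernel body are illustrative placeholders only, not the values or code changed in this PR:

```cuda
#include <cuda_runtime.h>

// Assumed values for illustration; the actual bounds chosen in this PR live in
// the upsample_trilinear3d CUDA kernels in ATen.
constexpr int kBlockSize = 256;     // max threads per block the kernel is launched with
constexpr int kMinBlocksPerSm = 4;  // minimum resident blocks per SM requested

// __launch_bounds__ tells the compiler the launch configuration, letting it cap
// per-thread register usage so registers are less likely to spill to local memory.
__global__ void __launch_bounds__(kBlockSize, kMinBlocksPerSm)
upsample_trilinear3d_sketch(float* out, const float* in, int n) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    out[idx] = in[idx];  // placeholder body; the real kernel performs 3D interpolation
  }
}
```

Tightening the bounds trades occupancy headroom for a hard cap on registers per thread, which is what removes the spills reported for the forward kernel.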

facebook-github-bot (Contributor)

Hi @Fuzzkatt!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot (Contributor) commented Jun 15, 2021

💊 CI failures summary and remediations

As of commit 59d1658 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 15 04:26:00 [E request_callback_no_python.c...quest type 267: Unexpected end of pickler archive.
Jun 15 04:26:00 frame #9: torch::distributed::rpc::RRefUserDelete::fromMessage(torch::distributed::rpc::Message const&) + 40 (0x11f002a98 in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #10: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 191 (0x11f00cadf in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #11: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 78 (0x11efd817e in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #12: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 60 (0x11efd809c in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #13: std::__1::__function::__func<torch::distributed::rpc::TensorPipeAgent::respond(std::__1::shared_ptr<tensorpipe::Pipe>&)::$_7::operator()(tensorpipe::Error const&, c10::intrusive_ptr<torch::distributed::rpc::Message, c10::detail::intrusive_target_default_null_type<torch::distributed::rpc::Message> >, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >)::'lambda'(), std::__1::allocator<torch::distributed::rpc::TensorPipeAgent::respond(std::__1::shared_ptr<tensorpipe::Pipe>&)::$_7::operator()(tensorpipe::Error const&, c10::intrusive_ptr<torch::distributed::rpc::Message, c10::detail::intrusive_target_default_null_type<torch::distributed::rpc::Message> >, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >)::'lambda'()>, void ()>::operator()() + 323 (0x11af23cc3 in libtorch_python.dylib)
Jun 15 04:26:00 frame #14: c10::ThreadPool::main_loop(unsigned long) + 569 (0x118171729 in libc10.dylib)
Jun 15 04:26:00 frame #15: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x118171dd3 in libc10.dylib)
Jun 15 04:26:00 frame #16: _pthread_start + 148 (0x7fff6a8fc109 in libsystem_pthread.dylib)
Jun 15 04:26:00 frame #17: thread_start + 15 (0x7fff6a8f7b8b in libsystem_pthread.dylib)
Jun 15 04:26:00 
Jun 15 04:26:00 [E request_callback_no_python.cpp:552] Received error while processing request type 267: Unexpected end of pickler archive.
Jun 15 04:26:00 Exception raised from readSlowWithBuffer at ../torch/csrc/jit/serialization/unpickler.cpp:756 (most recent call first):
Jun 15 04:26:00 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x11817fa12 in libc10.dylib)
Jun 15 04:26:00 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 205 (0x11817e30d in libc10.dylib)
Jun 15 04:26:00 frame #2: torch::jit::Unpickler::readSlowWithBuffer(char*, unsigned long) + 285 (0x11ecb7c3d in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #3: torch::jit::Unpickler::run() + 127 (0x11ecae30f in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #4: torch::jit::Unpickler::parse_ivalue() + 29 (0x11ecae13d in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #5: torch::jit::unpickle(std::__1::function<unsigned long (char*, unsigned long)>, std::__1::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 289 (0x11ec841a1 in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #6: torch::jit::unpickle(char const*, unsigned long, std::__1::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 206 (0x11ec8430e in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #7: torch::distributed::rpc::(anonymous namespace)::toIValues(torch::distributed::rpc::Message const&, torch::distributed::rpc::MessageType) + 220 (0x11f00167c in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #8: torch::distributed::rpc::ForkMessageBase::fromMessage(torch::distributed::rpc::Message const&, torch::distributed::rpc::MessageType) + 25 (0x11f001d29 in libtorch_cpu.dylib)

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_definitions.py
Auto-merging .circleci/cimodel/data/pytorch_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (3/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Jun 15 01:19:59 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jun 15 01:19:59 ++++ extract_trap_cmd
Jun 15 01:19:59 ++++ printf '%s\n' ''
Jun 15 01:19:59 +++ printf '%s\n' cleanup
Jun 15 01:19:59 ++ trap -- '
Jun 15 01:19:59 cleanup' EXIT
Jun 15 01:19:59 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
Jun 15 01:19:59 ++ which sccache
Jun 15 01:19:59 ++ sccache --stop-server
Jun 15 01:19:59 ++ true
Jun 15 01:19:59 ++ rm /var/lib/jenkins/sccache_error.log
Jun 15 01:19:59 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jun 15 01:19:59 ++ true
Jun 15 01:19:59 ++ [[ -n '' ]]
Jun 15 01:19:59 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
Jun 15 01:19:59 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Jun 15 01:19:59 ++ SCCACHE_IDLE_TIMEOUT=1200
Jun 15 01:19:59 ++ RUST_LOG=sccache::server=error
Jun 15 01:19:59 ++ sccache --start-server
Jun 15 01:19:59 sccache: Starting the server...
Jun 15 01:19:59 ++ sccache --zero-stats
Jun 15 01:19:59 Compile requests                      0

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (4/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_definitions.py
Auto-merging .circleci/cimodel/data/pytorch_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


1 job timed out:

  • pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@mcarilli mcarilli self-requested a review June 15, 2021 01:08
@mcarilli mcarilli added the module: cuda Related to torch.cuda, and CUDA support in general label Jun 15, 2021
@mcarilli mcarilli requested a review from ngimel June 15, 2021 05:17
@ezyang ezyang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 15, 2021
facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

ngimel (Collaborator) commented Jun 17, 2021

So looks like backward keeps spilling even after the fix?

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mcarilli (Collaborator)

So looks like backward keeps spilling even after the fix?

We're not sure if backward is still spilling or if it's bound by something else (i.e., atomicAdds).
@Fuzzkatt can you confirm nvcc doesn't report register spilling for the backward kernel after this change?
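For reference, ptxas reports per-kernel register usage and spill-byte counts when nvcc is invoked with verbose ptxas output; a minimal check might look like the sketch below (the source file name is a placeholder):

```sh
# -Xptxas -v asks ptxas to print register counts and spill-store/spill-load byte
# counts for each kernel; zero spill bytes for the backward kernel would confirm the fix.
nvcc -O3 -Xptxas -v -c upsample_trilinear3d_kernel.cu
```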

facebook-github-bot (Contributor)

@ngimel merged this pull request in bcf8752.

Labels
cla signed · Merged · module: cuda (Related to torch.cuda, and CUDA support in general) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
6 participants