
updated launch bounds for trilinear 3d #59999

Closed

Conversation

Fuzzkatt (Collaborator) commented Jun 15, 2021

Updates the launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. This improves forward-pass runtime by a 3-4x factor; the backward pass has the same runtime (probably a different bottleneck).

Timing data (NVIDIA Titan V GPU): [TrilinearTimingData image]
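For context, a minimal sketch of what a CUDA `__launch_bounds__` annotation looks like is shown below. The constants and kernel body are illustrative placeholders only, not the values or code changed in this PR:

```cuda
#include <cuda_runtime.h>

// Assumed values for illustration; the actual bounds chosen in this PR live in
// the upsample_trilinear3d CUDA kernels in ATen.
constexpr int kBlockSize = 256;     // max threads per block the kernel is launched with
constexpr int kMinBlocksPerSm = 4;  // minimum resident blocks per SM requested

// __launch_bounds__ tells the compiler the launch configuration, letting it cap
// per-thread register usage so registers are less likely to spill to local memory.
__global__ void __launch_bounds__(kBlockSize, kMinBlocksPerSm)
upsample_trilinear3d_sketch(float* out, const float* in, int n) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    out[idx] = in[idx];  // placeholder body; the real kernel performs 3D interpolation
  }
}
```

Tightening the bounds trades occupancy headroom for a hard cap on registers per thread, which is what removes the spills reported for the forward kernel.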

facebook-github-bot (Contributor)

Hi @Fuzzkatt!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot (Contributor) commented Jun 15, 2021

💊 CI failures summary and remediations

As of commit 59d1658 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 15 04:26:00 [E request_callback_no_python.c...quest type 267: Unexpected end of pickler archive.
Jun 15 04:26:00 frame #9: torch::distributed::rpc::RRefUserDelete::fromMessage(torch::distributed::rpc::Message const&) + 40 (0x11f002a98 in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #10: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 191 (0x11f00cadf in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #11: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 78 (0x11efd817e in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #12: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 60 (0x11efd809c in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #13: std::__1::__function::__func<torch::distributed::rpc::TensorPipeAgent::respond(std::__1::shared_ptr<tensorpipe::Pipe>&)::$_7::operator()(tensorpipe::Error const&, c10::intrusive_ptr<torch::distributed::rpc::Message, c10::detail::intrusive_target_default_null_type<torch::distributed::rpc::Message> >, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >)::'lambda'(), std::__1::allocator<torch::distributed::rpc::TensorPipeAgent::respond(std::__1::shared_ptr<tensorpipe::Pipe>&)::$_7::operator()(tensorpipe::Error const&, c10::intrusive_ptr<torch::distributed::rpc::Message, c10::detail::intrusive_target_default_null_type<torch::distributed::rpc::Message> >, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >)::'lambda'()>, void ()>::operator()() + 323 (0x11af23cc3 in libtorch_python.dylib)
Jun 15 04:26:00 frame #14: c10::ThreadPool::main_loop(unsigned long) + 569 (0x118171729 in libc10.dylib)
Jun 15 04:26:00 frame #15: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x118171dd3 in libc10.dylib)
Jun 15 04:26:00 frame #16: _pthread_start + 148 (0x7fff6a8fc109 in libsystem_pthread.dylib)
Jun 15 04:26:00 frame #17: thread_start + 15 (0x7fff6a8f7b8b in libsystem_pthread.dylib)
Jun 15 04:26:00 
Jun 15 04:26:00 [E request_callback_no_python.cpp:552] Received error while processing request type 267: Unexpected end of pickler archive.
Jun 15 04:26:00 Exception raised from readSlowWithBuffer at ../torch/csrc/jit/serialization/unpickler.cpp:756 (most recent call first):
Jun 15 04:26:00 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x11817fa12 in libc10.dylib)
Jun 15 04:26:00 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 205 (0x11817e30d in libc10.dylib)
Jun 15 04:26:00 frame #2: torch::jit::Unpickler::readSlowWithBuffer(char*, unsigned long) + 285 (0x11ecb7c3d in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #3: torch::jit::Unpickler::run() + 127 (0x11ecae30f in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #4: torch::jit::Unpickler::parse_ivalue() + 29 (0x11ecae13d in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #5: torch::jit::unpickle(std::__1::function<unsigned long (char*, unsigned long)>, std::__1::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 289 (0x11ec841a1 in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #6: torch::jit::unpickle(char const*, unsigned long, std::__1::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 206 (0x11ec8430e in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #7: torch::distributed::rpc::(anonymous namespace)::toIValues(torch::distributed::rpc::Message const&, torch::distributed::rpc::MessageType) + 220 (0x11f00167c in libtorch_cpu.dylib)
Jun 15 04:26:00 frame #8: torch::distributed::rpc::ForkMessageBase::fromMessage(torch::distributed::rpc::Message const&, torch::distributed::rpc::MessageType) + 25 (0x11f001d29 in libtorch_cpu.dylib)

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_definitions.py
Auto-merging .circleci/cimodel/data/pytorch_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (3/4)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Jun 15 01:19:59 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jun 15 01:19:59 ++++ extract_trap_cmd
Jun 15 01:19:59 ++++ printf '%s\n' ''
Jun 15 01:19:59 +++ printf '%s\n' cleanup
Jun 15 01:19:59 ++ trap -- '
Jun 15 01:19:59 cleanup' EXIT
Jun 15 01:19:59 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
Jun 15 01:19:59 ++ which sccache
Jun 15 01:19:59 ++ sccache --stop-server
Jun 15 01:19:59 ++ true
Jun 15 01:19:59 ++ rm /var/lib/jenkins/sccache_error.log
Jun 15 01:19:59 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jun 15 01:19:59 ++ true
Jun 15 01:19:59 ++ [[ -n '' ]]
Jun 15 01:19:59 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
Jun 15 01:19:59 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Jun 15 01:19:59 ++ SCCACHE_IDLE_TIMEOUT=1200
Jun 15 01:19:59 ++ RUST_LOG=sccache::server=error
Jun 15 01:19:59 ++ sccache --start-server
Jun 15 01:19:59 sccache: Starting the server...
Jun 15 01:19:59 ++ sccache --zero-stats
Jun 15 01:19:59 Compile requests                      0

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (4/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_definitions.py
Auto-merging .circleci/cimodel/data/pytorch_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


1 job timed out:

  • pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@mcarilli mcarilli self-requested a review June 15, 2021 01:08
@mcarilli mcarilli added the module: cuda Related to torch.cuda, and CUDA support in general label Jun 15, 2021
@mcarilli mcarilli requested a review from ngimel June 15, 2021 05:17
@ezyang ezyang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 15, 2021
facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

ngimel (Collaborator) commented Jun 17, 2021

So looks like backward keeps spilling even after the fix?

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mcarilli (Collaborator)

So looks like backward keeps spilling even after the fix?

We're not sure if backward is still spilling or if it's bound by something else (i.e., atomicAdds).
@Fuzzkatt can you confirm nvcc doesn't report register spilling for the backward kernel after this change?
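For reference, ptxas reports per-kernel register usage and spill-byte counts when nvcc is invoked with verbose ptxas output; a minimal check might look like the sketch below (the source file name is a placeholder):

```sh
# -Xptxas -v asks ptxas to print register counts and spill-store/spill-load byte
# counts for each kernel; zero spill bytes for the backward kernel would confirm the fix.
nvcc -O3 -Xptxas -v -c upsample_trilinear3d_kernel.cu
```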

facebook-github-bot (Contributor)

@ngimel merged this pull request in bcf8752.

Labels
cla signed · Merged · module: cuda (Related to torch.cuda, and CUDA support in general) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
6 participants