Reduce overhead when Future invokes callbacks inline #57638

Closed
wants to merge 9 commits

Conversation

@lw (Contributor) commented May 5, 2021

Stack from ghstack:

In RPC there are a few instances of "fastpaths" which do `if (fut.isCompleted()) { do_sth(); } else { fut.addCallback(do_sth); }`. I intend to get rid of them, for reasons I'll clarify later but which in a nutshell have to do with CUDA correctness and readability. Note that dropping the fastpath introduces no change in behavior (because `addCallback` invokes the callback inline anyways), thus the only perf concern comes from the fact that the fastpath avoids constructing and passing around a `std::function`. I don't think this is a significant performance hit. Regardless, this PR preemptively addresses this concern, by tweaking `addCallback` (and similar methods) so they can handle raw lambdas, and so that they do _not_ wrap them into `std::function`s if they are invoked inline. In other words, if the compiler were to inline this new version of `addCallback` it would obtain the _exact_ same code as that explicit fastpath.

Differential Revision: [D28222808](https://our.internmc.facebook.com/intern/diff/D28222808/)
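
For context, here is a minimal sketch of the mechanism being described — a hypothetical `SimpleFuture`, not the actual `c10::ivalue::Future` code, and it assumes C++17 for `std::is_invocable_v`. The templated `addCallback` invokes a raw callable directly when the future is already completed, and only type-erases it into a `std::function` when it has to be stored for later:

```cpp
#include <functional>
#include <mutex>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified future; the real c10::ivalue::Future is
// more involved (value and error handling are omitted here).
class SimpleFuture {
 public:
  // Accepts any callable, not just std::function.
  template <typename T>
  void addCallback(T callback) {
    static_assert(
        std::is_invocable_v<T>,
        "The callback must be callable with no arguments (sketch only)");
    std::unique_lock<std::mutex> lock(mutex_);
    if (completed_) {
      lock.unlock();
      // Inline path: the raw callable is invoked directly, so no
      // std::function is ever constructed.
      callback();
      return;
    }
    // Deferred path: only here is the callable type-erased so that it can be
    // stored until markCompleted() runs it.
    callbacks_.emplace_back(std::move(callback));
  }

  void markCompleted() {
    std::unique_lock<std::mutex> lock(mutex_);
    completed_ = true;
    auto cbs = std::move(callbacks_);
    lock.unlock();
    for (auto& cb : cbs) {
      cb();
    }
  }

 private:
  std::mutex mutex_;
  bool completed_ = false;
  std::vector<std::function<void()>> callbacks_;
};
```

With this shape, calling `fut.addCallback([&] { do_sth(); })` on an already-completed future boils down to a direct call of the lambda once the compiler inlines `addCallback`, which is the equivalence with the explicit fastpath claimed above.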
@facebook-github-bot (Contributor) commented May 5, 2021

💊 CI failures summary and remediations

As of commit 4bab8db (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/1)

Step: "Build"

May 21 12:18:12 torch_xla/csrc/aten_xla_type.cp... match any declaration in 'torch_xla::AtenXlaType'
May 21 12:18:12 torch_xla/csrc/aten_xla_type.cpp:1238:25: error: out-of-line definition of 'div' does not match any declaration in 'torch_xla::AtenXlaType'
May 21 12:18:12 at::Tensor AtenXlaType::div(const at::Tensor& self, const at::Tensor& other,
May 21 12:18:12                         ^~~
May 21 12:18:12 /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.h:35:74: note: type of 3rd parameter of member declaration does not match definition ('optional<c10::string_view>' vs 'optional<std::string>')
May 21 12:18:12 static at::Tensor div(const at::Tensor & self, const at::Tensor & other, c10::optional<c10::string_view> rounding_mode);
May 21 12:18:12                                                                          ^
May 21 12:18:12 torch_xla/csrc/aten_xla_type.cpp:1257:26: error: out-of-line definition of 'div_' does not match any declaration in 'torch_xla::AtenXlaType'
May 21 12:18:12 at::Tensor& AtenXlaType::div_(at::Tensor& self, const at::Tensor& other,
May 21 12:18:12                          ^~~~
May 21 12:18:12 /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.h:84:71: note: type of 3rd parameter of member declaration does not match definition ('optional<c10::string_view>' vs 'optional<std::string>')
May 21 12:18:12 static at::Tensor & div_(at::Tensor & self, const at::Tensor & other, c10::optional<c10::string_view> rounding_mode);
May 21 12:18:12                                                                       ^
May 21 12:18:15 2 errors generated.

XLA failure

Job pytorch_xla_linux_bionic_py3_6_clang9_build is failing. Please create an issue with title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.


This comment was automatically generated by Dr. CI.

lw added a commit that referenced this pull request May 5, 2021
ghstack-source-id: 128188696
Pull Request resolved: #57638
Differential Revision: [D28222808](https://our.internmc.facebook.com/intern/diff/D28222808/)
lw added a commit that referenced this pull request May 6, 2021
Pull Request resolved: #57638
ghstack-source-id: 128297741
Differential Revision: [D28222808](https://our.internmc.facebook.com/intern/diff/D28222808/)
lw added 2 commits May 17, 2021 03:53
@mrshenli (Contributor) left a comment

(can be done in followup PRs): shall we add some test to verify that the new code throws correct error when the signature is not expected?

@lw (Contributor, Author) commented May 18, 2021

shall we add some test to verify that the new code throws correct error when the signature is not expected?

It's a static assert, hence an incorrect signature will cause a compile error. I don't know how we can test that?
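
For illustration, a minimal sketch of such a compile-time check — the names, the exact expected callback signature, and the message are assumptions, not the code from this PR:

```cpp
#include <type_traits>
#include <utility>

struct Future {};  // stand-in type for this sketch

template <typename T>
void addCallback(T&& callback) {
  // An incorrect signature is rejected at compile time: passing e.g. a
  // [](int) {} lambda fails to build with the message below, rather than
  // throwing at runtime where a unit test could catch it.
  static_assert(
      std::is_invocable_v<T, Future&>,
      "addCallback expects a callable taking a Future& (sketch only)");
  (void)std::forward<T>(callback);  // storing/invoking omitted in the sketch
}
```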

lw added 3 commits May 18, 2021 02:35
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 21, 2021
Pull Request resolved: pytorch#57638
ghstack-source-id: 129567067
Differential Revision: [D28222808](https://our.internmc.facebook.com/intern/diff/D28222808/)
@facebook-github-bot (Contributor)

This pull request has been merged in 1d7cf4b.

@facebook-github-bot deleted the gh/lw/158/head branch May 25, 2021 14:16