-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[libc++] Optimize std::exception_ptr
#162773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[libc++] Optimize std::exception_ptr
#162773
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
0c0fde7
to
3815568
Compare
@llvm/pr-subscribers-libcxx Author: Adrian Vogelsgesang (vogelsgesang) ChangesThis commit optimizes the performance for To do so, we use 3 high-level approaches:
Those optimizations were implemented for the libc++abi, libsupc++ and libstdc++ ABIs. Fixes #44892 PerformanceWith this change, the compiler can now completely constant-fold https://godbolt.org/z/NaNKe5. Also in cases where the compiler cannot statically prove that exception_ptr is empty, it can at least generate a fast-path check if the exception_ptr is empty, without calling into the library. ABI compatibilityWe use a new visibility macro Moving implementations to the headerTo move the implementation to the header, we must know the selected LIBCXX_CXX_ABI also in the headers. For that purpose, the LIBCXX_CXX_ABI configuration is now exposed via While the Microsoft ABI and the "unimplemented" APIs do not benefit from this optimizations, I also moved their implementation to the headers for more uniformity. Mid-term, we probably have to expose all implementations in the header, anyway, because P3068R6 mandates all methods of Unifying libc++abi, libsupc++ and
|
This should be ready for a first round of reviews now :) Pre-commit tests finally look good. The errors in buildkite seem to be infrastructure issues. Same for macos, which has been queued for 2+ hours now. |
This commit optimizes the performance for `std::exception_ptr` for an empty exception objects. To do so, we use 3 high-level approaches: 1. Moving the implementation from the libc++ library into the libc++ headers, thereby allowing the compiler to inline the function bodies. 2. Adding fast paths to the (now inlineable) bodies, checking for the empty case. 3. Adding move constuctor, assignment and `swap` Those optimizations were implemented for the libc++abi, libsupc++ and libstdc++ ABIs. Fixes #XXX Performance ----------- With this change, the compiler can now completely constant-fold https://godbolt.org/z/NaNKe5. Also in cases where the compiler cannot statically prove that exception_ptr is empty, it can at least generate a fast-path check if the exception_ptr is empty, without calling into the library. ABI compatibility ----------------- We use a new visibility macro `_LIBCPP_EXPORTED_FROM_ABI_INLINEABLE` to ensure that the functions which are now declared in the header are still exported by the library. See the description in `VisibilityMacros.rst` for details. This approach was originally proposed by Nikolas Klauser in https://reviews.llvm.org/D122536 Moving implementations to the header ------------------------------------ To move the implementation to the header, we must know the selected LIBCXX_CXX_ABI also in the headers. For that purpose, the LIBCXX_CXX_ABI configuration is now exposed via `__config_site`. While the Microsoft ABI and the "unimplemented" APIs do not benefit from this optimizations, I also moved their implementation to the headers for more uniformity. Mid-term, we probably have to expose all implementations in the header, anyway, because P3068R6 mandates all methods of `exception_ptr` to be constexpr. Unifying libc++abi, libsupc++ and `none` ----------------------------------------------- Both libc++abi and libsupc++ are reference-counted. The primary difference is the function called for ref-counting: * libc++api uses `__cxa_{in,de}crement_exception_refcount` * libsupc++ uses `__exception_ptr::_M_{addref,release}` This commit factors out the common reference-counting logic into a shared header. Our libsupc++ implementation so far did not take advantage of `_M_addref`/`_M_release`. For the performance benefits of this PR it was necessary to start using them. The same reference-counted approach is also reused for the `none`/`unimplemented` implementation. This has the side effect, that users can now use empty `exception_ptr`s even in the `none` ABI. The abort is only triggered for non-empty `exception_ptr`s. Given this simplifies the code, I think change is acceptable. Unifying nested_exception ------------------------- The implementation for `nested_exception` was effectively identical across all ABIs. For most ABIs, the source code was literally identical. There were only two deviations: * For libsupc++ or libstdc++, we did not use define the `~nested_exception` destructor. * The abort in `nested_exception::rethrow_nested` was unnecessary as the same abort is also present in the `rethrow_exception` implementation. As such, we were able to simply remove that special casing. The implementation is now unified directly in the `__nested_exception.h` header. Standard conformance -------------------- The available constructors, operators and methods of `exception_ptr` is not specified by the standard. As such, adding the move constructor and the assignment operator are standard conformant. libstdc++ made the same decision and also provides those headers.
3815568
to
484cfaa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is a very aggressive approach, and I'm not convinced it's actually necessary. Can we split this up into a few patches so we can determine better what parts are actually important and which ones we can drop?
I would suggest the following:
- Introduce move ctor/assign and possibly swap - this seems like a rather obvious improvement and should be in no way controversial (the swap may or may not actually be necessary, we'll have to check).
- inline trivial functions into the headers.
- consider whether we can add any attributes to allow improved code gen
- do the inlining of the rest of the functions you want, assuming you still think it's worth the cost.
|
||
namespace std { | ||
|
||
_LIBCPP_HIDE_FROM_ABI _LIBCPP_ALWAYS_INLINE inline void exception_ptr::__increment_refcount(void* __ptr) _NOEXCEPT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't spray always_inline everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only added it to exception_ptr::__{increment,decrement}_refcount
since this was a net-new indirection which I introduced also into already pre-existing functions (such as exception_ptr::~exception_ptr
) and I considered it to be less risky if I mark this indirection with _LIBCPP_ALWAYS_INLINE
to make sure the compiler actually removes that indirection again.
That being said, I guess I was overly careful here. I will remove it
exception_ptr& operator=(nullptr_t) _NOEXCEPT; | ||
~exception_ptr() _NOEXCEPT; | ||
explicit operator bool() const _NOEXCEPT; | ||
_LIBCPP_EXPORTED_FROM_LIB_INLINEABLE exception_ptr() _NOEXCEPT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this new macro is necessary. IIUC these functions are supposed to be inlined, so there isn't much point in providing an external version (except for ABI of course, but we don't need to use the dylib functions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full disclosure: This isn't actually my idea. It was yours (modulo potential misunderstandings on my side). It was originally proposed back in April 2022 in https://reviews.llvm.org/D122536. Happy to adjust it however you prefer
IIUC these functions are supposed to be inlined, so there isn't much point in providing an external version
correct
except for ABI of course
How should we ensure that the functions are still emitted as part of the dylib? Currently I rely on _LIBCPP_EXPORTED_FROM_LIB_INLINEABLE
to be marked inline
in the headers but as non-inline in the dylib. By marking it non-inline in the dylib, I ensure that the compiler actually emits code for it. If I would mark them inline
also in the dylib, the compiler wouldn't actually emit any code, since the functions are never called / instantiated by the dylib
except for ABI of course but we don't need to use the dylib functions
Right, I think we can let the compiler choose freely to either use the inlined or the dylib-provided version. This means we might not need [[gnu::inline]]
/ __attribute__((__gnu_inline__))
.
I used it primarily because you proposed its usage back in April 2022 in https://reviews.llvm.org/D122536. It provides a slight benefit that the linker has less work to do (it doesn't have to discard many duplicated definitions of the same inline function). Not sure if we consider this to be worth it, though 🤷
Thanks for your review! Splitting the PR
Happy to split it up 🙂 I would suggest a slightly different splitting across the commits, though:
Are we aligned at least for steps (1)-(4)? If so I will go ahead and open the corresponding PRs for those steps right away
Scratch that - I thought about this incorrectly. I will just mark the move ctor/assign/swap as inline from the get-go, and I thereby won't run into that problem Attributes for improving codegen
I am not quite sure what that step would actually entail. I am not currently aware of any specific attributes which would improve codegen. Do you have any specific attributes in mind? Inlining of
Inlining the destructor, move assignment etc. will be necessary for performance. I inlined
WDYT about those points? Does it make sense to inline |
This commit optimizes the performance for
std::exception_ptr
for an empty exception objects.To do so, we use 3 high-level approaches:
swap
Those optimizations were implemented for the libc++abi, libsupc++ and libstdc++ ABIs.
Fixes #44892
Performance
With this change, the compiler can now completely constant-fold https://godbolt.org/z/dvedE375r. Also in cases where the compiler cannot statically prove that exception_ptr is empty, it can at least generate a fast-path check if the exception_ptr is empty, without calling into the library.
ABI compatibility
We use a new visibility macro
_LIBCPP_EXPORTED_FROM_ABI_INLINEABLE
to ensure that the functions which are now declared in the header are still exported by the library. See the description inVisibilityMacros.rst
for details. This approach was originally proposed by Nikolas Klauser in https://reviews.llvm.org/D122536Moving implementations to the header
To move the implementation to the header, we must know the selected LIBCXX_CXX_ABI also in the headers. For that purpose, the LIBCXX_CXX_ABI configuration is now exposed via
__config_site
.While the Microsoft ABI and the "unimplemented" APIs do not benefit from this optimizations, I also moved their implementation to the headers for more uniformity. Mid-term, we probably have to expose all implementations in the header, anyway, because P3068R6 mandates all methods of
exception_ptr
to be constexpr.Unifying libc++abi, libsupc++ and
none
Both libc++abi and libsupc++ are reference-counted. The primary difference is the function called for ref-counting:
__cxa_{in,de}crement_exception_refcount
__exception_ptr::_M_{addref,release}
This commit factors out the common reference-counting logic into a shared header.
Our libsupc++ implementation so far did not take advantage of
_M_addref
/_M_release
. For the performance benefits of this PR it was necessary to start using them.The same reference-counted approach is also reused for the
none
/unimplemented
implementation. This has the side effect, that users can now use emptyexception_ptr
s even in thenone
ABI. The abort is only triggered for non-emptyexception_ptr
s. Given this simplifies the code, I think change is acceptable.Unifying nested_exception
The implementation for
nested_exception
was effectively identical across all ABIs. For most ABIs, the source code was literally identical. There were only two deviations:~nested_exception
destructor.nested_exception::rethrow_nested
was unnecessary as the same abort is also present in therethrow_exception
implementation. As such, we were able to simply remove that special casing.The implementation is now unified directly in
src/exception.cpp
Standard conformance
The available constructors, operators and methods of
exception_ptr
are not specified by the standard. As such, adding the move constructor and the assignment operator are standard conformant. libstdc++ made the same decision and also provides move constructor, assignment and swap.