-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Speed up classic locale (take 2) #73533
Conversation
@llvm/pr-subscribers-libcxx Author: Louis Dionne (ldionne) ChangesTechnically, it is possible for the ios initializers to run before the initializer of classic_locale_imp_, which would cause This patch fixes that by making the variable trivially constinit, since we're not doing anything in its constructor anyway. Full diff: https://github.com/llvm/llvm-project/pull/73533.diff 2 Files Affected:
diff --git a/libcxx/include/__utility/no_destroy.h b/libcxx/include/__utility/no_destroy.h
index c8581cf9606b571..fc936d2d287316a 100644
--- a/libcxx/include/__utility/no_destroy.h
+++ b/libcxx/include/__utility/no_destroy.h
@@ -28,22 +28,23 @@ struct __uninitialized_tag {};
// initialization using __emplace.
template <class _Tp>
struct __no_destroy {
- _LIBCPP_HIDE_FROM_ABI explicit __no_destroy(__uninitialized_tag) {}
- _LIBCPP_HIDE_FROM_ABI ~__no_destroy() {
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI explicit __no_destroy(__uninitialized_tag) {}
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI ~__no_destroy() {
// nothing
}
template <class... _Args>
- _LIBCPP_HIDE_FROM_ABI explicit __no_destroy(_Args&&... __args) : __obj_(std::forward<_Args>(__args)...) {}
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI explicit __no_destroy(_Args&&... __args)
+ : __obj_(std::forward<_Args>(__args)...) {}
template <class... _Args>
- _LIBCPP_HIDE_FROM_ABI _Tp& __emplace(_Args&&... __args) {
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI _Tp& __emplace(_Args&&... __args) {
new (&__obj_) _Tp(std::forward<_Args>(__args)...);
return __obj_;
}
- _LIBCPP_HIDE_FROM_ABI _Tp& __get() { return __obj_; }
- _LIBCPP_HIDE_FROM_ABI _Tp const& __get() const { return __obj_; }
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI _Tp& __get() { return __obj_; }
+ _LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI _Tp const& __get() const { return __obj_; }
private:
union {
diff --git a/libcxx/src/locale.cpp b/libcxx/src/locale.cpp
index 2cb75f81b36da33..d6ba490a00a1109 100644
--- a/libcxx/src/locale.cpp
+++ b/libcxx/src/locale.cpp
@@ -546,7 +546,7 @@ locale::__imp::use_facet(long id) const
// expensive in parallel applications. The classic locale is used by default
// in all streams. Note: if a new global locale is installed, then we lose
// the benefit of no reference counting.
-__no_destroy<locale::__imp> locale::__imp::classic_locale_imp_(__uninitialized_tag{}); // initialized below in classic()
+constinit __no_destroy<locale::__imp> locale::__imp::classic_locale_imp_(__uninitialized_tag{}); // initialized below in classic()
const locale& locale::classic() {
static const __no_destroy<locale> classic_locale(__private_tag{}, [] {
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM!
Locale objects use atomic reference counting, which may be very expensive in parallel applications. The classic locale is used by default by all streams and can be very contended. But it's never destroyed, so the reference counting is also completely pointless on the classic locale. Currently ~70% of time in the parallel stringstream benchmarks is spent in locale ctor/dtor. And the execution radically slows down with more threads. Avoid reference counting on the classic locale. With this change parallel benchmarks start to scale with threads. This is a re-application of f8afc53 (aka PR llvm#72112) which was reverted in 4e0c48b because it broke the sanitizer builds due to an initialization order fiasco. This issue has now been fixed by ensuring that the locale is constinit'ed. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Re-purposing this PR to re-apply #72112 which was reverted. |
1e35a46
to
700678c
Compare
[libc++] Speed up classic locale (#73533)
Locale objects use atomic reference counting, which may be very
expensive in parallel applications. The classic locale is used by
default by all streams and can be very contended. But it's never
destroyed, so the reference counting is also completely pointless on the
classic locale. Currently ~70% of time in the parallel stringstream
benchmarks is spent in locale ctor/dtor. And the execution radically
slows down with more threads.
Avoid reference counting on the classic locale. With this change
parallel benchmarks start to scale with threads.
This is a re-application of f8afc53 (aka PR #72112) which was
reverted in 4e0c48b because it broke the sanitizer builds due
to an initialization order fiasco. This issue has now been fixed by
ensuring that the locale is constinit'ed.