Speed up compilation of common uses of std::visit() by ~8x #164196
@llvm/pr-subscribers-libcxx Author: None (higher-performance) Changes
This PR hard-codes common cases to speed up compilation by roughly 8x for them.
Full diff: https://github.com/llvm/llvm-project/pull/164196.diff
1 Files Affected:
diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..ef5bca4c2fda0 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1578,11 +1578,48 @@ _LIBCPP_HIDE_FROM_ABI constexpr void __throw_if_valueless(_Vs&&... __vs) {
}
}
-template < class _Visitor, class... _Vs, typename>
-_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor, _Vs&&... __vs) {
- using __variant_detail::__visitation::__variant;
- std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
- return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
+template <class _Visitor, class... _Vs, typename>
+_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor,
+ _Vs&&... __vs) {
+#define _XDispatchIndex(_I) \
+ case _I: \
+ if constexpr (__variant_size::value > _I) { \
+ return __visitor( \
+ __variant::__get_alt<_I>(std::forward<_Vs>(__vs)...).__value); \
+ } \
+ [[__fallthrough__]]
+#define _XDispatchMax 7 // Speed up compilation for the common cases
+ if constexpr (sizeof...(_Vs) == 1) {
+ if constexpr (variant_size<__remove_cvref_t<_Vs>...>::value <=
+ _XDispatchMax) {
+ using __variant_detail::__access::__variant;
+ using __variant_size = variant_size<__remove_cvref_t<_Vs>...>;
+ const size_t __indexes[] = {__vs.index()...};
+ switch (__indexes[0]) {
+ _XDispatchIndex(_XDispatchMax - 7);
+ _XDispatchIndex(_XDispatchMax - 6);
+ _XDispatchIndex(_XDispatchMax - 5);
+ _XDispatchIndex(_XDispatchMax - 4);
+ _XDispatchIndex(_XDispatchMax - 3);
+ _XDispatchIndex(_XDispatchMax - 2);
+ _XDispatchIndex(_XDispatchMax - 1);
+ _XDispatchIndex(_XDispatchMax - 0);
+ default:
+ __throw_bad_variant_access();
+ }
+ } else {
+ static_assert(
+ variant_size<__remove_cvref_t<_Vs>...>::value > _XDispatchMax,
+ "forgot to add dispatch case");
+ }
+ } else {
+ using __variant_detail::__visitation::__variant;
+ std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
+ return __variant::__visit_value(std::forward<_Visitor>(__visitor),
+ std::forward<_Vs>(__vs)...);
+ }
+#undef _XDispatchMax
+#undef _XDispatchIndex
}
# if _LIBCPP_STD_VER >= 20
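The idea in the diff above can be illustrated outside libc++ with only the public `std::variant` API. The following is a minimal sketch, not the PR's implementation: `switch_visit`, `SWITCH_VISIT_CASE`, and the limit of 4 alternatives are hypothetical names and values chosen for illustration, and `std::get` (which re-checks the index) stands in for the internal unchecked `__get_alt`.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical helper, not a libc++ API: visits a single variant by switching
// on index() instead of building a table of function pointers.
// Cases beyond the variant's size are discarded by `if constexpr`.
#define SWITCH_VISIT_CASE(I)                                   \
  case I:                                                      \
    if constexpr (n > I) {                                     \
      return std::forward<Visitor>(vis)(                       \
          std::get<I>(std::forward<Variant>(var)));            \
    } else {                                                   \
      break;                                                   \
    }

template <class Visitor, class Variant>
constexpr decltype(auto) switch_visit(Visitor&& vis, Variant&& var) {
  constexpr std::size_t n = std::variant_size_v<std::decay_t<Variant>>;
  // Assumed limit for this sketch; larger variants would need more cases
  // or a fallback to std::visit.
  static_assert(n <= 4, "add more SWITCH_VISIT_CASE lines");
  switch (var.index()) {
    SWITCH_VISIT_CASE(0)
    SWITCH_VISIT_CASE(1)
    SWITCH_VISIT_CASE(2)
    SWITCH_VISIT_CASE(3)
    default:
      break;
  }
  // Reached only when the variant is valueless (index() == variant_npos).
  throw std::bad_variant_access{};
}

#undef SWITCH_VISIT_CASE
```

In this sketch every live case returns, so the code after the switch is reached only for a valueless variant. A call such as `switch_visit([](auto x) { return x * 2.0; }, v)` behaves like `std::visit` for a `std::variant<int, double>`.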
✅ With the latest revision this PR passed the C/C++ code formatter.
6cfb3c0 to cc3f6cd
using __variant_detail::__visitation::__variant;
std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
# define _XDispatchIndex(_I) \
It looks like we can use the same technique for the visit<R> overload added in C++20.
Yeah, I deliberately avoided doing more work to make it easier to review and get feedback first. It adds edge cases to testing so I'm not sure how folks feel about it.
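For reference, the suggestion about the `visit<R>` overload could look roughly like the sketch below. This is an assumption about how the technique might extend, not code from the PR: `switch_visit_r`, `invoke_as`, and the 4-alternative limit are hypothetical, and the sketch uses the public `std::get` rather than libc++ internals. The extra edge case is `R = void`, where the visitor's result must be discarded.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical helper: calls the visitor and converts the result to R,
// discarding it when R is void.
template <class R, class Visitor, class Arg>
constexpr R invoke_as(Visitor&& vis, Arg&& arg) {
  if constexpr (std::is_void_v<R>) {
    std::forward<Visitor>(vis)(std::forward<Arg>(arg));
  } else {
    return std::forward<Visitor>(vis)(std::forward<Arg>(arg));
  }
}

#define SWITCH_VISIT_R_CASE(I)                                 \
  case I:                                                      \
    if constexpr (n > I) {                                     \
      return invoke_as<R>(std::forward<Visitor>(vis),          \
                          std::get<I>(std::forward<Variant>(var))); \
    } else {                                                   \
      break;                                                   \
    }

// Hypothetical single-variant analogue of visit<R>, dispatching via a switch.
template <class R, class Visitor, class Variant>
constexpr R switch_visit_r(Visitor&& vis, Variant&& var) {
  constexpr std::size_t n = std::variant_size_v<std::decay_t<Variant>>;
  static_assert(n <= 4, "add more SWITCH_VISIT_R_CASE lines");
  switch (var.index()) {
    SWITCH_VISIT_R_CASE(0)
    SWITCH_VISIT_R_CASE(1)
    SWITCH_VISIT_R_CASE(2)
    SWITCH_VISIT_R_CASE(3)
    default:
      break;
  }
  // Reached only when the variant is valueless.
  throw std::bad_variant_access{};
}

#undef SWITCH_VISIT_R_CASE
```

Because the return type is fixed to `R`, the alternatives may return different types as long as each converts to `R`, which is one of the edge cases tests would need to cover.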
Looks like I have some bugs; let me fix them, sorry.
59fa7b6 to 350f45c
350f45c to 7c3c8e6
std::visit on my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:
This PR hard-codes common cases to speed up compilation by roughly 8x for them.
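One way such a per-invocation cost could be observed is sketched below. This is an assumption for illustration, not the author's benchmark: the macro `UNIQUE_VISIT` and the functions `visit_0` through `visit_7` are hypothetical. Each expansion creates a distinct closure type, so each call is a separate `std::visit` instantiation; timing the compiler on this file (for example with the shell's `time` around `clang++ -std=c++17 -c`) and varying the number of expansions gives a rough per-invocation figure.

```cpp
#include <variant>

// Each expansion defines a new lambda (a unique closure type), which forces
// a fresh instantiation of std::visit for that call site.
#define UNIQUE_VISIT(N)                                             \
  inline int visit_##N(const std::variant<int, double>& v) {        \
    return std::visit(                                              \
        [](const auto& x) { return static_cast<int>(x) + N; }, v);  \
  }

UNIQUE_VISIT(0)
UNIQUE_VISIT(1)
UNIQUE_VISIT(2)
UNIQUE_VISIT(3)
UNIQUE_VISIT(4)
UNIQUE_VISIT(5)
UNIQUE_VISIT(6)
UNIQUE_VISIT(7)

#undef UNIQUE_VISIT
```

Absolute timings depend on the machine, compiler version, and flags, so only the relative change before and after a patch is meaningful.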