higher-performance
Contributor

std::visit on my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:

#include <variant>

int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  int n = 0;
#define X(V) \
  ++n;       \
  std::visit([](int) {}, V)
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
// clang-format on
#else
  (void)v;
#endif
#undef X

  return n;
}

This PR hard-codes the common cases to speed up their compilation by roughly 8x.
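
For readers who want to try the idea outside libc++, here is a minimal sketch of switch-based dispatch for a single variant, written against the public `std::variant` API only. The names `sketch::visit_small` and `SKETCH_CASE`, and the limit of four alternatives, are assumptions made for illustration; this is not the code in the PR and it does not touch libc++ internals.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>
#include <variant>

namespace sketch {

// Hypothetical helper (not libc++ code): visit one variant through a switch
// on index() rather than through std::visit's generic dispatch machinery.
template <class Visitor, class Variant>
decltype(auto) visit_small(Visitor&& vis, Variant&& var) {
  using V = std::remove_cv_t<std::remove_reference_t<Variant>>;
  constexpr std::size_t N = std::variant_size_v<V>;
  static_assert(N <= 4, "this sketch only hard-codes up to four alternatives");

// One case per index. Cases at or beyond N are discarded by `if constexpr`,
// so std::get<I> is never instantiated with an out-of-range index.
#define SKETCH_CASE(I)                                                 \
  case (I):                                                            \
    if constexpr ((I) < N) {                                           \
      return std::invoke(std::forward<Visitor>(vis),                   \
                         std::get<(I)>(std::forward<Variant>(var)));   \
    } else {                                                           \
      break;                                                           \
    }

  switch (var.index()) {
    SKETCH_CASE(0)
    SKETCH_CASE(1)
    SKETCH_CASE(2)
    SKETCH_CASE(3)
  }
#undef SKETCH_CASE

  // Reached when the variant is valueless (index() == variant_npos).
  throw std::bad_variant_access();
}

}  // namespace sketch
```

As with `std::visit`, every alternative must produce the same return type here, because the return type is deduced from the active `return` statements. Whether a sketch like this compiles faster than `std::visit` on a given toolchain would need to be measured, for example with the benchmark above.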

@higher-performance higher-performance requested a review from a team as a code owner October 20, 2025 02:00
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 20, 2025
@llvmbot
Member

llvmbot commented Oct 20, 2025

@llvm/pr-subscribers-libcxx

Author: None (higher-performance)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/164196.diff

1 Files Affected:

  • (modified) libcxx/include/variant (+42-5)
diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..ef5bca4c2fda0 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1578,11 +1578,48 @@ _LIBCPP_HIDE_FROM_ABI constexpr void __throw_if_valueless(_Vs&&... __vs) {
   }
 }
 
-template < class _Visitor, class... _Vs, typename>
-_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor, _Vs&&... __vs) {
-  using __variant_detail::__visitation::__variant;
-  std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
-  return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
+template <class _Visitor, class... _Vs, typename>
+_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor,
+                                                     _Vs&&... __vs) {
+#define _XDispatchIndex(_I)                                              \
+  case _I:                                                               \
+    if constexpr (__variant_size::value > _I) {                          \
+      return __visitor(                                                  \
+          __variant::__get_alt<_I>(std::forward<_Vs>(__vs)...).__value); \
+    }                                                                    \
+    [[__fallthrough__]]
+#define _XDispatchMax 7 // Speed up compilation for the common cases
+  if constexpr (sizeof...(_Vs) == 1) {
+    if constexpr (variant_size<__remove_cvref_t<_Vs>...>::value <=
+                  _XDispatchMax) {
+      using __variant_detail::__access::__variant;
+      using __variant_size = variant_size<__remove_cvref_t<_Vs>...>;
+      const size_t __indexes[] = {__vs.index()...};
+      switch (__indexes[0]) {
+        _XDispatchIndex(_XDispatchMax - 7);
+        _XDispatchIndex(_XDispatchMax - 6);
+        _XDispatchIndex(_XDispatchMax - 5);
+        _XDispatchIndex(_XDispatchMax - 4);
+        _XDispatchIndex(_XDispatchMax - 3);
+        _XDispatchIndex(_XDispatchMax - 2);
+        _XDispatchIndex(_XDispatchMax - 1);
+        _XDispatchIndex(_XDispatchMax - 0);
+        default:
+          __throw_bad_variant_access();
+      }
+    } else {
+      static_assert(
+          variant_size<__remove_cvref_t<_Vs>...>::value > _XDispatchMax,
+          "forgot to add dispatch case");
+    }
+  } else {
+    using __variant_detail::__visitation::__variant;
+    std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
+    return __variant::__visit_value(std::forward<_Visitor>(__visitor),
+                                    std::forward<_Vs>(__vs)...);
+  }
+#undef _XDispatchMax
+#undef _XDispatchIndex
 }
 
 #    if _LIBCPP_STD_VER >= 20

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

using __variant_detail::__visitation::__variant;
std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
# define _XDispatchIndex(_I) \
Contributor

@frederick-vs-ja frederick-vs-ja Oct 20, 2025

Looks like we can use the same technique for the visit<R> overload added in C++20.
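
A rough sketch of how the switch-based approach might carry over to an explicit return type, again using only the public `std::variant` API. The names `sketch_r::visit_small_r` and `SKETCH_R_CASE` are hypothetical, and the four-alternative limit is an arbitrary choice for the example; this is not the PR's implementation.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>
#include <variant>

namespace sketch_r {

// Hypothetical helper (not libc++ code): like std::visit<R> for one variant,
// dispatching through a switch on index().
template <class R, class Visitor, class Variant>
R visit_small_r(Visitor&& vis, Variant&& var) {
  using V = std::remove_cv_t<std::remove_reference_t<Variant>>;
  constexpr std::size_t N = std::variant_size_v<V>;
  static_assert(N <= 4, "this sketch only hard-codes up to four alternatives");

// For R = void the visitor's result is discarded; otherwise it is
// implicitly converted to R by the return statement.
#define SKETCH_R_CASE(I)                                               \
  case (I):                                                            \
    if constexpr ((I) < N) {                                           \
      if constexpr (std::is_void_v<R>) {                               \
        std::invoke(std::forward<Visitor>(vis),                        \
                    std::get<(I)>(std::forward<Variant>(var)));        \
        return;                                                        \
      } else {                                                         \
        return std::invoke(std::forward<Visitor>(vis),                 \
                           std::get<(I)>(std::forward<Variant>(var))); \
      }                                                                \
    } else {                                                           \
      break;                                                           \
    }

  switch (var.index()) {
    SKETCH_R_CASE(0)
    SKETCH_R_CASE(1)
    SKETCH_R_CASE(2)
    SKETCH_R_CASE(3)
  }
#undef SKETCH_R_CASE

  // Reached when the variant is valueless (index() == variant_npos).
  throw std::bad_variant_access();
}

}  // namespace sketch_r
```

Because the return type is fixed to `R`, the visitor may return a different type for each alternative, as long as each converts implicitly to `R`.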

Contributor Author

Yeah, I deliberately avoided doing more work to make it easier to review and get feedback first. It adds edge cases to testing so I'm not sure how folks feel about it.

@higher-performance higher-performance marked this pull request as draft October 20, 2025 03:11
@higher-performance
Contributor Author

Looks like I got some bugs, let me fix them, sorry.

@higher-performance higher-performance force-pushed the variant-compile-speedup branch 7 times, most recently from 59fa7b6 to 350f45c Compare October 20, 2025 06:39
@higher-performance higher-performance marked this pull request as ready for review October 20, 2025 14:49