[libc++] Optimize string operator[] for known large inputs #69500
@llvm/pr-subscribers-libcxx
Author: Ilya Tocar (TocarIP)
Changes
If we know that the index is larger than the SSO size, we know that the string cannot be in the SSO case, so we should access the long pointer directly. This removes an extra check from operator[] for indices known at compile time to be larger than the SSO size.
Full diff: https://github.com/llvm/llvm-project/pull/69500.diff
1 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index 91935162f02383a..cf9f0c847eb43af 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1198,11 +1198,17 @@ public:
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_reference operator[](size_type __pos) const _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__pos <= size(), "string index out of bounds");
+    if (__builtin_constant_p(__pos) && !__fits_in_sso(__pos)) {
+      return *(__get_long_pointer() + __pos);
+    }
     return *(data() + __pos);
   }
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 reference operator[](size_type __pos) _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__pos <= size(), "string index out of bounds");
+    if (__builtin_constant_p(__pos) && !__fits_in_sso(__pos)) {
+      return *(__get_long_pointer() + __pos);
+    }
     return *(__get_pointer() + __pos);
   }
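The effect of the added branch can be reproduced outside libc++. The sketch below uses a hypothetical toy class: ToyString, kInlineCap, fits_inline and char_at_20 are made-up names, and the layout is deliberately simpler than libc++'s. It assumes GCC or Clang, since __builtin_constant_p is a compiler builtin. The idea is the same as in the diff: when the index is a compile-time constant that cannot fit in the inline buffer, any valid access implies the string is heap-allocated, so the long/short test can be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy string with a small inline buffer. This is NOT libc++'s
// layout; it only illustrates the constant-index shortcut from the diff.
class ToyString {
  // Inline capacity, including the null terminator (assumed value).
  static constexpr std::size_t kInlineCap = 16;

public:
  explicit ToyString(const char* s)
      : size_(std::strlen(s)), is_long_(!fits_inline(size_)), heap_(nullptr) {
    char* dst = inline_;
    if (is_long_) {
      heap_ = new char[size_ + 1];
      dst = heap_;
    }
    std::memcpy(dst, s, size_ + 1); // copy characters plus terminator
  }
  ~ToyString() { delete[] heap_; }
  ToyString(const ToyString&) = delete;
  ToyString& operator=(const ToyString&) = delete;

  // True if a string of length n (plus terminator) fits in the inline buffer.
  static constexpr bool fits_inline(std::size_t n) { return n < kInlineCap; }

  std::size_t size() const { return size_; }
  const char* data() const { return is_long_ ? heap_ : inline_; }

  // Precondition: pos <= size().
  const char& operator[](std::size_t pos) const {
    // If pos is a known constant too large for the inline buffer, then
    // pos <= size() implies the string is long, so is_long_ need not be read.
    if (__builtin_constant_p(pos) && !fits_inline(pos)) {
      return heap_[pos];
    }
    return data()[pos]; // general path: checks is_long_
  }

private:
  std::size_t size_;
  bool is_long_;
  char* heap_;
  char inline_[kInlineCap];
};

// Constant index 20 >= kInlineCap: a candidate for the shortcut once inlined.
inline char char_at_20(const ToyString& s) { return s[20]; }
```

Whether the shortcut fires depends on inlining and optimization level. In unoptimized builds __builtin_constant_p typically evaluates to 0 for a function parameter, so the general path is taken and the result is the same either way.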
✅ With the latest revision this PR passed the C/C++ code formatter.
Could you state how this optimization was measured?
I looked at the disassembly and verified that the __is_long branch disappeared.
This LGTM. I tried finding a downside and I couldn't. In general I'm not a huge fan of adding complexity like this, but in this case we're basically leveraging information known at compile-time (when it is known) to generate better code, and we do that in one of the most important types in the library. So I feel like anything reasonable that improves performance here is welcome.
In particular, I suspect this might make it easier for the compiler to vectorize code in case we have multiple constant accesses side by side, because we'd remove branches, but TBH I don't know. At least this optimization can't generate worse code, so that's really comforting.
@EricWF I'll wait until like the middle of next week before merging this to give you time to react in case you had any concerns (as suggested by your question).
I would like to see benchmark results for this.
What benchmark do you want me to run?
I would like you to write benchmarks that target the function you modified, ensure that your case is covered there, then run them to show a performance improvement, as well as to verify that no regression occurred.
For the following benchmark
Performance goes from
To
For a ~30% speed-up
And what about for non-constant string indexes? Or what happens in a benchmark where you mix constant and non-constant indexes? I would like this to be a little more thorough.
For mixed indexes and the following code
I see a speed-up from
Right, in this case
I just spoke with @EricWF who said he was fine with this change. I'll merge it now. Thanks everyone for chiming in!