Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] Optimize string operator[] for known large inputs #69500

Merged
merged 1 commit into from
Oct 26, 2023

Conversation

TocarIP
Copy link
Contributor

@TocarIP TocarIP commented Oct 18, 2023

If we know that index is larger than SSO size, we know that we can't be in SSO case, and should access the pointer. This removes extra check from operator[] for inputs known at compile time to be larger than SSO.

@TocarIP TocarIP requested a review from a team as a code owner October 18, 2023 19:13
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 18, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 18, 2023

@llvm/pr-subscribers-libcxx

Author: Ilya Tocar (TocarIP)

Changes

If we know that index is larger than SSO size, we know that we can't be in SSO case, and should access the pointer. This removes extra check from operator[] for inputs known at compile time to be larger than SSO.


Full diff: https://github.com/llvm/llvm-project/pull/69500.diff

1 Files Affected:

  • (modified) libcxx/include/string (+6)
diff --git a/libcxx/include/string b/libcxx/include/string
index 91935162f02383a..cf9f0c847eb43af 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1198,11 +1198,17 @@ public:
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_reference operator[](size_type __pos) const _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__pos <= size(), "string index out of bounds");
+    if (__builtin_constant_p(__pos) && !__fits_in_sso(__pos)) {
+      return *(__get_long_pointer() + __pos);
+    }
     return *(data() + __pos);
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 reference operator[](size_type __pos) _NOEXCEPT {
     _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__pos <= size(), "string index out of bounds");
+    if (__builtin_constant_p(__pos) && !__fits_in_sso(__pos)) {
+      return *(__get_long_pointer() + __pos);
+    }
     return *(__get_pointer() + __pos);
   }
 

libcxx/include/string Show resolved Hide resolved
libcxx/include/string Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Oct 20, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

If we know that index is larger than SSO size, we know that we
can't be in SSO case, and should access the pointer. This removes
extra check from operator[] for inputs known at compile time to be
larger than SSO.
Copy link
Member

@EricWF EricWF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you state how this optimization was measured?

@TocarIP
Copy link
Contributor Author

TocarIP commented Oct 20, 2023

I've looked at disassembly and verified that __is_long branch disappeared.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. I tried finding a downside and I couldn't. In general I'm not a huge fan of adding complexity like this, but in this case we're basically leveraging information known at compile-time (when it is known) to generate better code, and we do that in one of the most important types in the library. So I feel like anything reasonable that improves performance here is welcome.

In particular, I suspect this might make it easier for the compiler to vectorize code in case we have multiple constant accesses side by side because we'd remove branches, but TBH I don't know. At least this optimization can't generate worst code, so that's really comforting.

@EricWF I'll wait until like the middle of next week before merging this to give you time to react in case you had any concerns (as suggested by your question).

@EricWF
Copy link
Member

EricWF commented Oct 23, 2023

I would like to see beenchmark results for this.

@TocarIP
Copy link
Contributor Author

TocarIP commented Oct 23, 2023

What benchmark do you want me to run?

@EricWF
Copy link
Member

EricWF commented Oct 23, 2023

I would like you to write benchmarks that target the function you modified, ensure that you're case is covered there, then run them to show a performance improvement, as well as to verify that no regression occurred.

@TocarIP
Copy link
Contributor Author

TocarIP commented Oct 23, 2023

For the following benchmark

std::string s = "12345678901234567890123456789012345678901234567890";

void BM_index(benchmark::State& state) {
  for (auto _ : state) {
    testing::DoNotOptimize(s[33]*37+s[65]);
  }
}

Performance goes from

name      cpu/op 
BM_index  0.69ns ± 1%

To

name      cpu/op 
BM_index  0.46ns ± 1%

For a ~30% speed-up

@EricWF
Copy link
Member

EricWF commented Oct 23, 2023

And what about for non-constant string indexes? Or what happens in a benchmark where you mix constant and non-constant indexes?

I would like this to be a little more thorough

@TocarIP
Copy link
Contributor Author

TocarIP commented Oct 24, 2023

For mixed index and following code

  void BM_index(benchmark::State& state) {
    for (auto _ : state) {
      for (int i = 0;i<21;i++) {
        testing::DoNotOptimize(s[33]*37+s[i]);
      }   
    }
  }

I see speed up from BM_index 15.5ns ±15% to BM_index 13.6ns ±16%
Non-constant index are obviously unchanged (16.0ns ±11% vs BM_index 16.3ns ±10%).
Overall its hard to write benchmarks when I don't have a good mental model for a possible regression. Do you have a specific concern in mind? Only code path with constant index is affected and removing branches seems like an obviously good thing (modulo code-alignment shenanigans).

@ldionne
Copy link
Member

ldionne commented Oct 25, 2023

And what about for non-constant string indexes?

Right, in this case __builtin_constant_p will evaluate to false, so it is extremely unlikely that the compiler will generate any code for the if branch. Basically this seems to be a free optimization.

@ldionne
Copy link
Member

ldionne commented Oct 26, 2023

I just spoke with @EricWF who said he was fine with this change. I'll merge it now. Thanks everyone for chiming in!

@ldionne ldionne merged commit 178a1fe into llvm:main Oct 26, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants