@hazzlim (Contributor) commented Jan 20, 2026

This PR adds a Neon implementation of find_last.
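For readers unfamiliar with the technique, here is a minimal sketch of how a backward search can be vectorized with Neon for 1-byte elements. It is an illustration only and is not the code from this PR: the function name `find_last_u8`, the 16-byte block size, and the scalar rescan inside a matching block are all assumptions made for the example. On non-AArch64 targets the sketch falls back to the plain scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the PR): returns a pointer to the last
// element in [first, last) equal to value, or last if there is none.
const std::uint8_t* find_last_u8(const std::uint8_t* first,
                                 const std::uint8_t* last,
                                 std::uint8_t value) {
    const std::uint8_t* p = last;
#if defined(__aarch64__) || defined(_M_ARM64)
    // Broadcast the needle into all 16 lanes once.
    const uint8x16_t needle = vdupq_n_u8(value);
    // Walk backwards in 16-byte blocks while a full block remains.
    while (p - first >= 16) {
        p -= 16;
        // Each matching lane becomes 0xFF, each other lane 0x00.
        const uint8x16_t eq = vceqq_u8(vld1q_u8(p), needle);
        // A nonzero horizontal maximum means at least one lane matched.
        if (vmaxvq_u8(eq) != 0) {
            // Rescan the block from its end to locate the last match.
            for (int i = 15; i >= 0; --i) {
                if (p[i] == value) {
                    return p + i;
                }
            }
        }
    }
#endif
    // Scalar tail (and the whole search on non-AArch64 targets).
    while (p != first) {
        --p;
        if (*p == value) {
            return p;
        }
    }
    return last;
}
```

Calling `find_last_u8(buf, buf + n, 7)` returns a pointer to the last `7` in the buffer, or `buf + n` when no element matches. A design like this pays a fixed setup cost per call, which is one plausible reason a vectorized path can lose to a simple loop on very short inputs.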

Performance numbers (values are speedup figures relative to existing code; values greater than 1 indicate that the new code is faster):

| Benchmark | MSVC speedup | Clang speedup |
|---|---|---|
| bm<uint8_t, not_highly_aligned_allocator, Op::FindLast>/8021/3056 | 12.571 | 12.453 |
| bm<uint8_t, not_highly_aligned_allocator, Op::FindLast>/63/62 | 4.229 | 3.882 |
| bm<uint8_t, not_highly_aligned_allocator, Op::FindLast>/31/30 | 2.118 | 2.333 |
| bm<uint8_t, not_highly_aligned_allocator, Op::FindLast>/15/14 | 1.441 | 1.309 |
| bm<uint8_t, not_highly_aligned_allocator, Op::FindLast>/7/6 | 0.842 | 0.636 |
| bm<char, not_highly_aligned_allocator, Op::StringRFind>/8021/3056 | 12.571 | 12.585 |
| bm<char, not_highly_aligned_allocator, Op::StringRFind>/63/62 | 3.115 | 4.045 |
| bm<char, not_highly_aligned_allocator, Op::StringRFind>/31/30 | 1.872 | 2.437 |
| bm<char, not_highly_aligned_allocator, Op::StringRFind>/15/14 | 1.071 | 1.378 |
| bm<char, not_highly_aligned_allocator, Op::StringRFind>/7/6 | 0.647 | 0.667 |
| bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056 | 11.727 | 12.279 |
| bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62 | 3.147 | 3.405 |
| bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30 | 1.891 | 2.083 |
| bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14 | 1.155 | 1.202 |
| bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6 | 0.642 | 0.667 |
| bm<uint16_t, not_highly_aligned_allocator, Op::FindLast>/8021/3056 | 7.371 | 8.036 |
| bm<uint16_t, not_highly_aligned_allocator, Op::FindLast>/63/62 | 3.472 | 3.583 |
| bm<uint16_t, not_highly_aligned_allocator, Op::FindLast>/31/30 | 2.492 | 2.613 |
| bm<uint16_t, not_highly_aligned_allocator, Op::FindLast>/15/14 | 1.438 | 1.502 |
| bm<uint16_t, not_highly_aligned_allocator, Op::FindLast>/7/6 | 0.976 | 0.947 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/8021/3056 | 7.703 | 7.679 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/63/62 | 3.267 | 3.6 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/31/30 | 2.358 | 2.435 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/15/14 | 1.515 | 1.413 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/7/6 | 0.913 | 0.927 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056 | 5.972 | 6.099 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62 | 2.777 | 3.13 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30 | 1.917 | 2.062 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14 | 1.218 | 1.266 |
| bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6 | 0.759 | 0.766 |
| bm<uint32_t, not_highly_aligned_allocator, Op::FindLast>/8021/3056 | 4.027 | 3.921 |
| bm<uint32_t, not_highly_aligned_allocator, Op::FindLast>/63/62 | 2.821 | 2.875 |
| bm<uint32_t, not_highly_aligned_allocator, Op::FindLast>/31/30 | 2.437 | 2.513 |
| bm<uint32_t, not_highly_aligned_allocator, Op::FindLast>/15/14 | 1.623 | 1.867 |
| bm<uint32_t, not_highly_aligned_allocator, Op::FindLast>/7/6 | 0.985 | 1.148 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/8021/3056 | 3.921 | 4 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/63/62 | 2.829 | 2.99 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/31/30 | 2.288 | 2.414 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/15/14 | 1.636 | 1.688 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/7/6 | 1.048 | 1.067 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056 | 3.244 | 3.158 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62 | 2.28 | 2.451 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30 | 1.828 | 1.923 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14 | 1.256 | 1.357 |
| bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6 | 0.779 | 0.857 |
| bm<uint64_t, not_highly_aligned_allocator, Op::FindLast>/8021/3056 | 1.934 | 1.934 |
| bm<uint64_t, not_highly_aligned_allocator, Op::FindLast>/63/62 | 2.188 | 1.88 |
| bm<uint64_t, not_highly_aligned_allocator, Op::FindLast>/31/30 | 1.6 | 1.686 |
| bm<uint64_t, not_highly_aligned_allocator, Op::FindLast>/15/14 | 1.424 | 1.505 |
| bm<uint64_t, not_highly_aligned_allocator, Op::FindLast>/7/6 | 1.003 | 1.158 |

@hazzlim requested a review from a team as a code owner January 20, 2026 20:05
@github-project-automation (bot) moved this to Initial Review in STL Code Reviews Jan 20, 2026
@StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Jan 20, 2026
@StephanTLavavej self-assigned this Jan 20, 2026
@StephanTLavavej removed their assignment Jan 21, 2026
@StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jan 21, 2026
@StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Jan 30, 2026
@StephanTLavavej (Member) commented:

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej merged commit 353b2b9 into microsoft:main Feb 2, 2026
45 checks passed
@github-project-automation (bot) moved this from Merging to Done in STL Code Reviews Feb 2, 2026
@StephanTLavavej (Member) commented:

🦾 🕵️ 🔍
