Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 78 additions & 0 deletions library/core/src/slice/ascii.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,25 @@ impl [u8] {
return false;
}

#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
{
const CHUNK_SIZE: usize = 16;
// The following function has two invariants:
// 1. The slice lengths must be equal, which we checked above.
// 2. The slice lengths must greater than or equal to N, which this
// if-statement is checking.
if self.len() >= CHUNK_SIZE {
return self.eq_ignore_ascii_case_chunks::<CHUNK_SIZE>(other);
}
}

self.eq_ignore_ascii_case_simple(other)
}

/// ASCII case-insensitive equality check without chunk-at-a-time
/// optimization.
#[inline]
const fn eq_ignore_ascii_case_simple(&self, other: &[u8]) -> bool {
// FIXME(const-hack): This implementation can be reverted when
// `core::iter::zip` is allowed in const. The original implementation:
// self.len() == other.len() && iter::zip(self, other).all(|(a, b)| a.eq_ignore_ascii_case(b))
Expand All @@ -78,6 +97,65 @@ impl [u8] {
true
}

/// Optimized version of `eq_ignore_ascii_case` to process chunks at a time.
///
/// Platforms that have SIMD instructions may benefit from this
/// implementation over `eq_ignore_ascii_case_simple`.
///
/// # Invariants
///
/// The caller must guarantee that the slices are equal in length, and the
/// slice lengths are greater than or equal to `N` bytes.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
#[inline]
const fn eq_ignore_ascii_case_chunks<const N: usize>(&self, other: &[u8]) -> bool {
// FIXME(const-hack): The while-loops that follow should be replaced by
// for-loops when available in const.

let (self_chunks, self_rem) = self.as_chunks::<N>();
let (other_chunks, _) = other.as_chunks::<N>();

// Branchless check to encourage auto-vectorization
#[inline(always)]
Copy link
Member

@scottmcm scottmcm Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: you don't need always here. It's a non-generic function which is why it went badly with no attribute, but #[inline] is sufficient.

Suggested change
#[inline(always)]
#[inline]

Copy link
Contributor Author

@okaneco okaneco Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without always, it fails the filecheck test I've written and doesn't seem to inline as shown in the godbolt.

Left is inline(always), right is inline
https://rust.godbolt.org/z/Px6nMPWvK

edit: I realize above I included const generics.
With all generics removed, it still doesn't seem to inline the inner call without always
https://rust.godbolt.org/z/4Tq5hWY6T

const fn eq_ignore_ascii_inner<const L: usize>(lhs: &[u8; L], rhs: &[u8; L]) -> bool {
let mut equal_ascii = true;
let mut j = 0;
while j < L {
equal_ascii &= lhs[j].eq_ignore_ascii_case(&rhs[j]);
j += 1;
}

equal_ascii
}

// Process the chunks, returning early if an inequality is found
let mut i = 0;
while i < self_chunks.len() && i < other_chunks.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it feels wrong that we need to check both lengths here. The answer can never actually differ, right? (Because the caller checks equal-length first anyway.)

The function boundary split feels awkward, since it's losing information -- both in simple and non-simple -- that would be helpful to the optimizer when it comes to removing bounds checks.

What does the actual loop condition look like in llvm or asm here?

Copy link
Contributor Author

@okaneco okaneco Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it feels wrong that we need to check both lengths here. The answer can never actually differ, right? (Because the caller checks equal-length first anyway.)

It does feel wrong, and you are correct about how they can never differ. I included this to eliminate an extra branch and panic in the loop. The comparison gets optimized out so there is no extra length check. I never got around to filing this one as a bug report.

What does the actual loop condition look like in llvm or asm here?

Left is with both checks, right is with only 1
https://rust.godbolt.org/z/5Ks3z9MTM

Difference in the loop condition is this on the right which jumps to a panic

.LBB0_3:
        sub     r8, 1
        jb      .LBB0_15

edit: here's a more minimal version with just the chunk loop https://rust.godbolt.org/z/MGEv3qszY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reported as #147953

if !eq_ignore_ascii_inner(&self_chunks[i], &other_chunks[i]) {
return false;
}
i += 1;
}

// Check the length invariant which is necessary for the tail-handling
// logic to be correct. This should have been upheld by the caller,
// otherwise lengths less than N will compare as true without any
// checking.
debug_assert!(self.len() >= N);

// If there are remaining tails, load the last N bytes in the slices to
// avoid falling back to per-byte checking.
Comment on lines +146 to +147
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only correct because of a check in the caller that's not documented in the comments on this method nor in the code here.

I would suggest at least having a debug_assert!(self.len() >= N) or similar here to help communicate that invariant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I've been thinking about this since my last push.

I think calling this function with a const generic parameter might be a better choice to encode the information, so I'm going to change it to that along with a debug_assert here.

if !self_rem.is_empty() {
if let (Some(a_rem), Some(b_rem)) = (self.last_chunk::<N>(), other.last_chunk::<N>()) {
if !eq_ignore_ascii_inner(a_rem, b_rem) {
return false;
}
}
}

true
}

/// Converts this slice to its ASCII upper case equivalent in-place.
///
/// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z',
Expand Down
1 change: 1 addition & 0 deletions library/coretests/benches/str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ use test::{Bencher, black_box};
mod char_count;
mod corpora;
mod debug;
mod eq_ignore_ascii_case;
mod iter;

#[bench]
Expand Down
45 changes: 45 additions & 0 deletions library/coretests/benches/str/eq_ignore_ascii_case.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
use test::{Bencher, black_box};

use super::corpora::*;

#[bench]
fn bench_str_under_8_bytes_eq(b: &mut Bencher) {
let s = black_box("foo");
let other = black_box("foo");
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}

#[bench]
fn bench_str_of_8_bytes_eq(b: &mut Bencher) {
let s = black_box(en::TINY);
let other = black_box(en::TINY);
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}

#[bench]
fn bench_str_17_bytes_eq(b: &mut Bencher) {
let s = black_box(&en::SMALL[..17]);
let other = black_box(&en::SMALL[..17]);
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}

#[bench]
fn bench_str_31_bytes_eq(b: &mut Bencher) {
let s = black_box(&en::SMALL[..31]);
let other = black_box(&en::SMALL[..31]);
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}

#[bench]
fn bench_medium_str_eq(b: &mut Bencher) {
let s = black_box(en::MEDIUM);
let other = black_box(en::MEDIUM);
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}

#[bench]
fn bench_large_str_eq(b: &mut Bencher) {
let s = black_box(en::LARGE);
let other = black_box(en::LARGE);
b.iter(|| assert!(s.eq_ignore_ascii_case(other)))
}
16 changes: 16 additions & 0 deletions tests/codegen-llvm/lib-optimizations/eq_ignore_ascii_case.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
//@ compile-flags: -Copt-level=3
//@ only-x86_64
#![crate_type = "lib"]

// Ensure that the optimized variant of the function gets auto-vectorized and
// that the inner helper function is inlined.
// CHECK-LABEL: @eq_ignore_ascii_case_autovectorized
#[no_mangle]
pub fn eq_ignore_ascii_case_autovectorized(s: &str, other: &str) -> bool {
// CHECK: load <16 x i8>
// CHECK: load <16 x i8>
// CHECK: bitcast <16 x i1>
// CHECK-NOT: call {{.*}}eq_ignore_ascii_inner
// CHECK-NOT: panic
s.eq_ignore_ascii_case(other)
}
Loading