@Zalathar
Member

Successful merges:

r? @ghost

Create a similar rollup

okaneco and others added 14 commits October 7, 2025 05:52
Refactor the current functionality into a helper function
Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
Add a codegen test
Add benches for `eq_ignore_ascii_case`

The optimized function is initially only enabled for x86_64 which has `sse2` as
part of its baseline, but none of the code is platform specific. Other
platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the
optimized path is gated behind a length check for greater than or equal to 16.
Refactor the eq check into an inner function for reuse in tail checking

Rather than fall back to the simple implementation for tail handling,
load the last 16 bytes to take advantage of vectorization. This doesn't
seem to negatively impact check time even when the remainder count is low.
Add #[inline(always)] to inner function and check not for filecheck test
Add comments for the optimized function invariants to the caller
Add const-hack fixme for using while-loops
Document the invariant for the `_chunks` function
Add a debug assert for the tail handling invariant
Remove copyright notices for files licensed under the standard terms (MIT OR Apache-2.0).
I tried removing it in rust-lang#151203, to replace it with something simpler.
But a couple of fuzzing failures have come up and I don't have a clear
picture on how to fix them. So I'm reverting the main part of rust-lang#151203.

This commit also adds the two fuzzing tests.

Fixes rust-lang#151226, rust-lang#151358.
They currently aren't used because r-a didn't support them, but r-a
support was recently merged in
rust-lang/rust-analyzer#21243.
…youxu

Try to reduce rustdoc GUI tests flakyness

Should help with rust-lang#93784.

I replaced a use of the `puppeteer.wait` function with a loop (like the rest of `browser-ui-test`).
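
As a rough illustration of the "loop instead of a fixed wait" idea, here is a minimal sketch of a polling helper. It is written in Rust rather than the JavaScript that `browser-ui-test` uses, and the name `wait_until` and its parameters are hypothetical, not taken from that project.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Polls `condition` until it returns true or `max_attempts` checks have failed.
/// Hypothetical helper for illustration; not the `browser-ui-test` implementation.
fn wait_until<F: FnMut() -> bool>(mut condition: F, max_attempts: u32, interval: Duration) -> bool {
    for attempt in 0..max_attempts {
        // Re-check the condition on every iteration instead of sleeping once
        // for a fixed time and hoping the page is ready afterwards.
        if condition() {
            return true;
        }
        // Sleep between attempts, but not after the final failed check.
        if attempt + 1 < max_attempts {
            sleep(interval);
        }
    }
    false
}
```

A loop like this returns as soon as the condition holds, so a slow machine gets more time while a fast one does not wait longer than needed.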

r? @jieyouxu
…=scottmcm

slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization

- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test checking for vectorization and no panicking
- Add benches for `eq_ignore_ascii_case`

---

The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
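
The shape of the approach can be sketched roughly as follows. This is a simplified illustration, not the code merged into the standard library: the function names are hypothetical, `chunks_exact` stands in for `as_chunks` so the sketch also builds on older toolchains, and the overlapping tail check follows the commit messages in this rollup ("load the last 16 bytes").

```rust
use std::convert::TryFrom;

/// Chunk width in bytes; 16 matches the length gate described above.
const N: usize = 16;

/// Plain byte-by-byte comparison, used for inputs shorter than `N`.
fn eq_simple(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b.iter()).all(|(x, y)| x.to_ascii_lowercase() == y.to_ascii_lowercase())
}

/// Compares one fixed-size chunk with no early exit, so the loop body is
/// branch-free and easier for the compiler to auto-vectorize.
#[inline(always)]
fn chunk_eq(a: &[u8; N], b: &[u8; N]) -> bool {
    let mut equal = true;
    for i in 0..N {
        equal &= a[i].to_ascii_lowercase() == b[i].to_ascii_lowercase();
    }
    equal
}

/// Hypothetical entry point (the name is an assumption, not the std internal one).
fn eq_ignore_ascii_case_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Short inputs take the simple path; the chunked path needs at least N bytes.
    if a.len() < N {
        return eq_simple(a, b);
    }
    // Full chunks. `chunks_exact` is a stand-in for `as_chunks` in this sketch.
    for (x, y) in a.chunks_exact(N).zip(b.chunks_exact(N)) {
        let x = <&[u8; N]>::try_from(x).unwrap();
        let y = <&[u8; N]>::try_from(y).unwrap();
        if !chunk_eq(x, y) {
            return false;
        }
    }
    // Tail: re-check the last N bytes as one chunk. It overlaps bytes that were
    // already compared, which is harmless, and keeps the tail on the chunked path.
    if a.len() % N != 0 {
        let start = a.len() - N;
        let x = <&[u8; N]>::try_from(&a[start..]).unwrap();
        let y = <&[u8; N]>::try_from(&b[start..]).unwrap();
        return chunk_eq(x, y);
    }
    true
}
```

The `a.len() >= N` gate is what makes the overlapping tail load valid: `a.len() - N` cannot underflow once the short-input case has returned.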

Benchmarks: cases below 16 bytes are unaffected, while all cases above 16 bytes show sizeable improvements.
```
before:
    str::eq_ignore_ascii_case::bench_large_str_eq         4942.30ns/iter +/- 48.20
    str::eq_ignore_ascii_case::bench_medium_str_eq         632.01ns/iter +/- 16.87
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        16.28ns/iter  +/- 0.45
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        35.23ns/iter  +/- 2.28
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq       7.56ns/iter  +/- 0.22
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq    2.64ns/iter  +/- 0.06
after:
    str::eq_ignore_ascii_case::bench_large_str_eq         611.63ns/iter +/- 28.29
    str::eq_ignore_ascii_case::bench_medium_str_eq         77.10ns/iter +/- 19.76
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        3.49ns/iter  +/- 0.39
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        3.50ns/iter  +/- 0.27
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq      7.27ns/iter  +/- 0.09
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq   2.60ns/iter  +/- 0.05
```
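
To put the numbers above in relative terms, the speedup for each case is the "before" time divided by the "after" time. The following small sketch uses only the figures from the listing; the names `RESULTS`, `speedup`, and `print_speedups` are made up for this illustration.

```rust
/// (benchmark name, ns/iter before, ns/iter after), copied from the listing above.
const RESULTS: [(&str, f64, f64); 6] = [
    ("bench_large_str_eq", 4942.30, 611.63),
    ("bench_medium_str_eq", 632.01, 77.10),
    ("bench_str_17_bytes_eq", 16.28, 3.49),
    ("bench_str_31_bytes_eq", 35.23, 3.50),
    ("bench_str_of_8_bytes_eq", 7.56, 7.27),
    ("bench_str_under_8_bytes_eq", 2.64, 2.60),
];

/// Speedup factor: how many times faster the "after" run is than the "before" run.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// Prints one line per benchmark with its speedup factor.
fn print_speedups() {
    for (name, before, after) in RESULTS.iter() {
        println!("{}: {:.2}x", name, speedup(*before, *after));
    }
}
```

By this measure the large-string case is roughly 8x faster and the 31-byte case roughly 10x faster, while the two cases of 8 bytes or fewer stay within a few percent of their previous times.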
Reintroduce `QueryStackFrame` split.

I tried removing it in rust-lang#151203, to replace it with something simpler. But a couple of fuzzing failures have come up and I don't have a clear picture on how to fix them. So I'm reverting the main part of rust-lang#151203.

This commit also adds the two fuzzing tests.

Fixes rust-lang#151226, rust-lang#151358.

r? @oli-obk
…ts-query-key, r=Noratrieb

Use an associated type default for `Key::Cache`.

They currently aren't used because r-a didn't support them, but r-a support was recently merged in
rust-lang/rust-analyzer#21243.

r? @Noratrieb
…jieyouxu

Omit standard copyright notice

Remove copyright notices for files licensed under the standard terms (MIT OR Apache-2.0).
@rust-bors rust-bors bot added the rollup A PR which is a rollup label Jan 27, 2026
@rustbot rustbot added A-meta Area: Issues & PRs about the rust-lang/rust repository itself A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) A-rustc-dev-guide Area: rustc-dev-guide S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Jan 27, 2026
@Zalathar
Member Author

Rollup of everything.

@bors r+ rollup=never p=5

@rust-bors
Contributor

rust-bors bot commented Jan 27, 2026

📌 Commit 6ec16a4 has been approved by Zalathar

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 27, 2026
@rust-bors rust-bors bot added the merged-by-bors This PR was explicitly merged by bors. label Jan 27, 2026
@rust-bors rust-bors bot removed the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 27, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 27, 2026

☀️ Test successful - CI
Approved by: Zalathar
Duration: 3h 34m 7s
Pushing 78df2f9 to main...

@rust-bors rust-bors bot merged commit 78df2f9 into rust-lang:main Jan 27, 2026
12 checks passed
@rustbot rustbot added this to the 1.95.0 milestone Jan 27, 2026
@rust-timer
Collaborator

📌 Perf builds for each rolled up PR:

| PR# | Message | Perf Build Sha |
|---|---|---|
| #147436 | slice/ascii: Optimize eq_ignore_ascii_case with auto-vect… | 9d94a72d6208e3bb3afa41aa05e07801aab64301 (link) |
| #151097 | Use an associated type default for Key::Cache. | eb24d3bb6957e89736a7f89c70e0e6b70743cf50 (link) |
| #151390 | Reintroduce QueryStackFrame split. | b78baf02f91f326ce263fdf6345fa9b177f3b049 (link) |
| #151692 | Try to reduce rustdoc GUI tests flakyness | 2653e77232b7342be8ff13690952ff7db804544b (link) |
| #151702 | Omit standard copyright notice | 0fe9d3079f4c25c1539fad1434f95414b01009b2 (link) |

previous master: ebf13cca58

In the case of a perf regression, run the following command for each PR you suspect might be the cause: @rust-timer build $SHA

@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing ebf13cc (parent) -> 78df2f9 (this PR)

Test differences

Show 38 test diffs

Stage 1

  • str::eq_ignore_ascii_case::bench_large_str_eq: [missing] -> pass (J0)
  • str::eq_ignore_ascii_case::bench_medium_str_eq: [missing] -> pass (J0)
  • str::eq_ignore_ascii_case::bench_str_17_bytes_eq: [missing] -> pass (J0)
  • str::eq_ignore_ascii_case::bench_str_31_bytes_eq: [missing] -> pass (J0)
  • str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq: [missing] -> pass (J0)
  • str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq: [missing] -> pass (J0)
  • [codegen] tests/codegen-llvm/lib-optimizations/eq_ignore_ascii_case.rs: [missing] -> ignore (only executed when the architecture is x86_64) (J1)
  • [codegen] tests/codegen-llvm/lib-optimizations/eq_ignore_ascii_case.rs: [missing] -> pass (J4)
  • [ui] tests/ui/query-system/query-cycle-printing-issue-151226.rs: [missing] -> pass (J4)
  • [ui] tests/ui/query-system/query-cycle-printing-issue-151358.rs: [missing] -> pass (J4)

Stage 2

  • str::eq_ignore_ascii_case::bench_large_str_eq: [missing] -> pass (J2)
  • str::eq_ignore_ascii_case::bench_medium_str_eq: [missing] -> pass (J2)
  • str::eq_ignore_ascii_case::bench_str_17_bytes_eq: [missing] -> pass (J2)
  • str::eq_ignore_ascii_case::bench_str_31_bytes_eq: [missing] -> pass (J2)
  • str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq: [missing] -> pass (J2)
  • str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq: [missing] -> pass (J2)
  • [codegen] tests/codegen-llvm/lib-optimizations/eq_ignore_ascii_case.rs: [missing] -> ignore (only executed when the architecture is x86_64) (J3)
  • [ui] tests/ui/query-system/query-cycle-printing-issue-151226.rs: [missing] -> pass (J5)
  • [ui] tests/ui/query-system/query-cycle-printing-issue-151358.rs: [missing] -> pass (J5)
  • [codegen] tests/codegen-llvm/lib-optimizations/eq_ignore_ascii_case.rs: [missing] -> pass (J6)

Additionally, 18 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 78df2f92de1da3601d967dc8beb9f9cea267e45f --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-apple-various: 4495.8s -> 5727.7s (+27.4%)
  2. pr-check-1: 1675.4s -> 1881.4s (+12.3%)
  3. pr-check-2: 2268.9s -> 2491.9s (+9.8%)
  4. dist-aarch64-msvc: 6683.8s -> 6048.8s (-9.5%)
  5. tidy: 172.9s -> 156.7s (-9.4%)
  6. x86_64-gnu-llvm-21-2: 5386.9s -> 5876.6s (+9.1%)
  7. dist-various-1: 3833.8s -> 4177.2s (+9.0%)
  8. dist-x86_64-musl: 7217.1s -> 7832.0s (+8.5%)
  9. dist-x86_64-apple: 8563.6s -> 7837.6s (-8.5%)
  10. x86_64-gnu-llvm-20: 4252.7s -> 4577.0s (+7.6%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.
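
For reference, the percentages in the list above are relative changes against the earlier duration. A minimal sketch of that arithmetic, with a made-up function name:

```rust
/// Relative change from `before` to `after`, in percent.
/// Positive values mean the job got slower, negative values mean it got faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For example, `percent_change(4495.8, 5727.7)` gives about 27.4, matching the `dist-apple-various` entry, and `percent_change(6683.8, 6048.8)` gives about -9.5, matching `dist-aarch64-msvc`.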

@rust-timer
Collaborator

Finished benchmarking commit (78df2f9): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 4 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.0%] | 3 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |

Max RSS (memory usage)

Results (primary 3.6%, secondary 2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [1.3%, 7.1%] | 4 |
| Regressions ❌ (secondary) | 2.3% | [1.8%, 3.1%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.6% | [1.3%, 7.1%] | 4 |

Cycles

Results (secondary -0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.6%, 4.0%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.4% | [-5.3%, -2.9%] | 4 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.2%] | 5 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 1 |
| Improvements ✅ (primary) | -0.5% | [-0.5%, -0.5%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.0% | [-0.5%, 0.2%] | 6 |

Bootstrap: 472.178s -> 472.873s (0.15%)
Artifact size: 383.72 MiB -> 385.68 MiB (0.51%)

@rustbot rustbot added the perf-regression Performance regression. label Jan 27, 2026
@Zalathar Zalathar deleted the rollup-goIJldt branch January 27, 2026 10:55
@Kobzol
Member

Kobzol commented Jan 27, 2026

The regressions are very tiny and only on two secondary benchmarks, so I don't think we have to investigate further.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Jan 27, 2026