@jieyouxu
Member

@jieyouxu jieyouxu commented Nov 3, 2025

This PR reverts RUST-147622 for several reasons:

  1. The RUST-147622 PR would format the generated core library code using an arbitrary rustfmt picked up from PATH, which will cause hard-to-debug failures when the rustfmt used to format the generated unicode data code and the rustfmt used to format the in-tree library code produce incompatible formatting.
  2. Previously, the unicode-table-generator tests were not run under CI as part of coretests. Since the x86_64-gnu-aux job runs library coretests with miri, the generated tests unfortunately caused an unacceptably large Merge CI time regression from ~2 hours to ~3.5 hours, making it the slowest Merge CI job (and thus the new bottleneck).
  3. This PR also has an unintended effect of causing a diagnostic regression (RUST-148387), though that's mostly an edge case not properly handled by rustc diagnostics.

Given that these are three distinct causes with non-trivial fixes, I'm proposing to revert this PR to return us to baseline. This is not prejudice against relanding the changes with these issues addressed, but to alleviate time pressure to address these non-trivial issues.

FYI @Kmeakin @joboet (PR author/review). Note that these issues are very subtle, so you cannot be reasonably expected to know about them beforehand.

This was discussed in:

@rustbot
Collaborator

rustbot commented Nov 3, 2025

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 3, 2025
@rustbot
Collaborator

rustbot commented Nov 3, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@jieyouxu jieyouxu force-pushed the revert-unicode-generator branch from 1a4b577 to 4aeb297 Compare November 3, 2025 11:53
@joboet
Member

joboet commented Nov 3, 2025

I think it's quite simple to fix these issues:

  1. We can remove the rustfmt invocation from the generator, a normal ./x fmt run after the generator will take care of formatting.
  2. I'd just add a #[cfg(not(miri))] to the tests.
  3. This issue is not exclusive to the PR.

How time-critical is this? I can whip up a PR for the two issues today.

@Noratrieb
Member

Noratrieb commented Nov 3, 2025

wrt 3: I'm concerned that an "NFC" refactor caused a change in behavior. That is more alarming than the diagnostics bug itself.

@Zalathar
Member

Zalathar commented Nov 3, 2025

The fact that this is adding ~30 minutes to every successful merge is IMO a good reason to want to revert the changes as soon as possible, without waiting for a fix-forward.

If it's easy to fix the problems, it's just as easy to fix them in a PR that reapplies the changes.

@joboet
Member

joboet commented Nov 3, 2025

Oh, about the second issue: the tests are marked as #[cfg_attr(miri, ignore)], so the time issue probably stems from something else...

@Kmeakin
Contributor

Kmeakin commented Nov 3, 2025

For point 2, shouldn't #[cfg_attr(miri, ignore)] cause the tests to not be run on miri?

@Zalathar
Member

Zalathar commented Nov 3, 2025

2025-11-01T08:50:02.0254739Z test unicode::grapheme_extend ... ignored
2025-11-01T08:50:02.1058441Z test unicode::lowercase ... ignored
2025-11-01T10:15:58.8090642Z test unicode::n ... ok
2025-11-01T10:15:58.8822592Z test unicode::to_lowercase ... ignored

#[test]
fn n() {
    test_boolean_property(test_data::N, unicode_data::n::lookup);
}

@Kmeakin
Contributor

Kmeakin commented Nov 3, 2025

Forgot to annotate n with #[cfg_attr(miri, ignore)] 🤦‍♂️
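
To illustrate the two attributes discussed above, here is a minimal sketch of how a slow test can be kept out of Miri runs. The names `is_ascii_digit_lookup`, `check_all_scalar_values`, and both test functions are hypothetical stand-ins, not the real generated tests; only `#[cfg_attr(miri, ignore)]` and `#[cfg(not(miri))]` come from the discussion.

```rust
/// Hypothetical stand-in for a generated lookup function
/// (not the real `unicode_data::n::lookup`).
pub fn is_ascii_digit_lookup(c: char) -> bool {
    c.is_ascii_digit()
}

/// Hypothetical exhaustive check: compares `lookup` against `expected`
/// for every Unicode scalar value. Cheap natively, but the kind of loop
/// that becomes very slow under an interpreter such as Miri.
pub fn check_all_scalar_values(lookup: fn(char) -> bool, expected: fn(char) -> bool) -> bool {
    (0..=0x10FFFFu32)
        .filter_map(char::from_u32) // skips the surrogate range
        .all(|c| lookup(c) == expected(c))
}

#[cfg(test)]
mod tests {
    use super::*;

    // Still compiled under Miri, but reported as `ignored` instead of run.
    #[test]
    #[cfg_attr(miri, ignore)]
    fn digits_exhaustive() {
        assert!(check_all_scalar_values(is_ascii_digit_lookup, |c| c.is_ascii_digit()));
    }

    // Alternative: the test is not compiled at all when `cfg(miri)` is set,
    // so it does not appear in the Miri test listing.
    #[test]
    #[cfg(not(miri))]
    fn digits_exhaustive_not_compiled_under_miri() {
        assert!(check_all_scalar_values(is_ascii_digit_lookup, |c| c.is_ascii_digit()));
    }
}
```

Either attribute has to be repeated on every slow test. A single test without it still runs under Miri, which matches the log above where `unicode::n` reports `ok` while its neighbours report `ignored`.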

@Zalathar
Member

Zalathar commented Nov 3, 2025

Let's revert this quickly, to undo the impact on CI times.

Discussion of potential fixes and remaining concerns can happen on the reapply PR.

@bors r+ p=6

@bors
Collaborator

bors commented Nov 3, 2025

📌 Commit 4aeb297 has been approved by Zalathar

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 3, 2025
@Zalathar
Member

Zalathar commented Nov 3, 2025

Failure in rollup would be awkward, and it would be nice to see clean stats from the revert, so:

@bors rollup=never

@joboet
Member

joboet commented Nov 3, 2025

Fixes are up at #148436.

@bors
Collaborator

bors commented Nov 3, 2025

⌛ Testing commit 4aeb297 with merge f5711a5...

@bors
Collaborator

bors commented Nov 3, 2025

☀️ Test successful - checks-actions
Approved by: Zalathar
Pushing f5711a5 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 3, 2025
@bors bors merged commit f5711a5 into rust-lang:master Nov 3, 2025
12 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Nov 3, 2025
@github-actions
Contributor

github-actions bot commented Nov 3, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 35ebdf9 (parent) -> f5711a5 (this PR)

Test differences

Show 31 test diffs

Stage 1

  • unicode::alphabetic: pass -> [missing] (J2)
  • unicode::case_ignorable: pass -> [missing] (J2)
  • unicode::cased: pass -> [missing] (J2)
  • unicode::grapheme_extend: pass -> [missing] (J2)
  • unicode::lowercase: pass -> [missing] (J2)
  • unicode::n: pass -> [missing] (J2)
  • unicode::to_lowercase: pass -> [missing] (J2)
  • unicode::to_uppercase: pass -> [missing] (J2)
  • unicode::uppercase: pass -> [missing] (J2)
  • unicode::white_space: pass -> [missing] (J2)

Stage 2

  • unicode::alphabetic: ignore -> [missing] (J0)
  • unicode::case_ignorable: ignore -> [missing] (J0)
  • unicode::cased: ignore -> [missing] (J0)
  • unicode::grapheme_extend: ignore -> [missing] (J0)
  • unicode::lowercase: ignore -> [missing] (J0)
  • unicode::to_lowercase: ignore -> [missing] (J0)
  • unicode::to_uppercase: ignore -> [missing] (J0)
  • unicode::uppercase: ignore -> [missing] (J0)
  • unicode::white_space: ignore -> [missing] (J0)
  • unicode::alphabetic: pass -> [missing] (J1)
  • unicode::case_ignorable: pass -> [missing] (J1)
  • unicode::cased: pass -> [missing] (J1)
  • unicode::grapheme_extend: pass -> [missing] (J1)
  • unicode::lowercase: pass -> [missing] (J1)
  • unicode::to_lowercase: pass -> [missing] (J1)
  • unicode::to_uppercase: pass -> [missing] (J1)
  • unicode::uppercase: pass -> [missing] (J1)
  • unicode::white_space: pass -> [missing] (J1)
  • unicode::n: pass -> [missing] (J3)

Additionally, 2 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard f5711a55f5d5e2f942057d0f6d648dd2d8b2c37b --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. x86_64-gnu-aux: 10859.2s -> 6448.4s (-40.6%)
  2. dist-apple-various: 3500.9s -> 4241.0s (+21.1%)
  3. dist-aarch64-apple: 7218.1s -> 8434.1s (+16.8%)
  4. x86_64-gnu-llvm-20: 2332.7s -> 2635.4s (+13.0%)
  5. x86_64-gnu: 6502.4s -> 7330.3s (+12.7%)
  6. dist-loongarch64-linux: 5014.4s -> 5619.8s (+12.1%)
  7. dist-riscv64-linux: 4696.0s -> 5194.8s (+10.6%)
  8. dist-x86_64-msvc-alt: 8838.4s -> 9739.1s (+10.2%)
  9. dist-various-1: 3844.3s -> 4190.0s (+9.0%)
  10. dist-x86_64-apple: 7351.9s -> 7967.7s (+8.4%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.
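
The percentages in the job duration list are consistent with a plain relative change, (after - before) / before. The sketch below reproduces one entry under that assumption; `percent_change` and `format_change` are illustrative names, not part of the actual CI tooling.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the job got faster.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Formats one entry in the same shape as the job duration list,
/// with an explicit sign on the percentage.
pub fn format_change(job: &str, before: f64, after: f64) -> String {
    format!(
        "{job}: {before:.1}s -> {after:.1}s ({:+.1}%)",
        percent_change(before, after)
    )
}
```

For example, `format_change("x86_64-gnu-aux", 10859.2, 6448.4)` yields `x86_64-gnu-aux: 10859.2s -> 6448.4s (-40.6%)`, a drop from roughly 3 hours to about 1.8 hours.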

@rust-timer
Collaborator

Finished benchmarking commit (f5711a5): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.2% | [0.1%, 0.3%] | 5 |
| Improvements ✅ (primary) | -0.8% | [-1.0%, -0.4%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.8% | [-1.0%, -0.4%] | 3 |

Max RSS (memory usage)

Results (primary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [3.5%, 3.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-1.7%, -1.7%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.9% | [-1.7%, 3.5%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.2%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.1%, 0.2%] | 10 |

Bootstrap: 474.155s -> 474.205s (0.01%)
Artifact size: 390.84 MiB -> 390.90 MiB (0.01%)

@Kobzol
Member

Kobzol commented Nov 3, 2025

Performance-wise it's a wash, and this is a revert anyway.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Nov 3, 2025
@jieyouxu jieyouxu deleted the revert-unicode-generator branch November 3, 2025 23:24
@tgross35
Contributor

tgross35 commented Nov 4, 2025

> Fixes are up at #148436.

That's this PR; I assume you meant #148438.

Labels

merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue.
