- 
                Notifications
    
You must be signed in to change notification settings  - Fork 13.9k
 
reland and fix RUST-147622 #148438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
reland and fix RUST-147622 #148438
Conversation
| 
           
 If you want to modify   | 
    
| 
          
 rustbot has assigned @Mark-Simulacrum. Use   | 
    
| 
           @bors try jobs=x86_64-gnu-aux  | 
    
fix fallout from RUST-147622 try-job: x86_64-gnu-aux
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
| 
           Queued 66d8c54 with parent 35ebdf9, future comparison URL.  | 
    
| 
           ☔ The latest upstream changes (presumably #148436) made this pull request unmergeable. Please resolve the merge conflicts.  | 
    
Instead of `include_str!()`ing `range_search.rs`, just make it a normal module under `core::unicode`. This means the same source code doesn't have to be checked in twice, and it plays nicer with IDEs. Also rename it to `rt` since it includes functions for searching the bitsets and case conversion tables as well as the range represesentation.
Remove `#[rustfmt::skip]` from all the generated modules in `unicode_data.rs`. This means we won't have to worry so much about getting indetation and formatting right when generating code. Exempted for now some tables which would be too big when formatted by `rustfmt`.
This check was made redundant (it will always be true) when we removed all ASCII characters from the tables (rust-lang@a8c6694).
To make the final output code easier to see: * Get rid of the unnecessary line-noise of `.unwrap()`ing calls to `write!()` by moving the `.unwrap()` into a macro. * Join consecutive `write!()` calls using a single multiline format string. * Replace `.push()` and `.push_str(format!())` with `write!()`. * If after doing all of the above, there is only a single `write!()` call in the function, just construct the string directly with `format!()`.
Instead of generating a standalone executable to test `unicode_data`, generate normal tests in `coretests`. This ensures tests are always generated, and will be run as part of the normal testsuite. Also change the generated tests to loop over lookup tables, rather than generating a separate `assert_eq!()` statement for every codepoint. The old approach produced a massive (20,000 lines plus) file which took minutes to compile!
f34f268    to
    ab82fbe      
    Compare
  
    | 
           This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.  | 
    
| fn run(self, builder: &Builder<'_>) { | ||
| let mut cmd = builder.tool_cmd(Tool::UnicodeTableGenerator); | ||
| cmd.arg(builder.src.join("library/core/src/unicode/unicode_data.rs")); | ||
| cmd.arg(builder.src.join("library/coretests/tests/unicode/test_data.rs")); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a short comment noting that these two command-line arguments are output paths for generated code?
That's useful information for bootstrap maintainers who aren't already familiar with unicode-table-generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please note that this actually generates tests under coretests, which gets picked up as library tests.
| 
           One of the difficulties I had with #147622 was that the PR title and description did not describe the true scope and scale of the changes, which made it much harder to track down the build-duration regression. Could you please update the title and description of this PR to more accurately describe what's being changed?  | 
    
| 
           If you could also include this commit I added in a separate PR, it would be appreciated. Or, you could move the verification to a coretest, which I am also okay writing as well. The amount of const-time code added into the generator was a bit much, and it should be kept in libcore instead.  | 
    
| 
           I would also suggest that using the word “refactor” for high-impact cross-subsystem changes is, from a maintenance perspective, downright misleading.  | 
    
| 
           I mean, I would argue that the term "refactor" by itself is vague in almost all cases and a general rule to avoid it in favour of being more descriptive is probably best. As in, if you wanted to create a rustbot rule to tell people to consider rewording the PR title whenever the word "refactor" is in it, I would consider that reasonable. You can describe most code changes as "refactors," and I wouldn't disagree with them necessarily being refactors, but I do think that the term isn't very helpful for understanding what the changes are.  | 
    
| } | ||
| 
               | 
          ||
| #[test] | ||
| #[cfg_attr(miri, ignore)] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These definitely should have a comment explaining why they aren't run on miri, since it really isn't clear looking at the tests alone.
Also, I'm a little confused why this isn't triggering a dead_code lint for test_boolean_property on miri as well…
| const { | ||
| let mut i = 0; | ||
| while i < UPPERCASE_TABLE.len() { | ||
| let (_, val) = UPPERCASE_TABLE[i]; | ||
| if val & (1 << 22) == 0 { | ||
| assert!(char::from_u32(val).is_some()); | ||
| } else { | ||
| let index = val & ((1 << 22) - 1); | ||
| assert!((index as usize) < UPPERCASE_TABLE_MULTI.len()); | ||
| } | ||
| i += 1; | ||
| } | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test function using const {} to generate asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The accepted idiom for having a large test case that is executed during const-eval, like this, is to simply declare a const _: () =, which should not be nested inside a function for no particular reason. Such a nesting mostly serves the purpose of obfuscating what the function actually does at runtime at best, and potentially having funny results when the code flows through any sort of MIR interpreter at worst (const eval or our friend miri), because the const {} should not matter to the function's actual execution but can be visible in differing ways (that I honestly don't fully understand the nuances of) at various points during compilation to MIR.
| const {{ | ||
| let mut i = 0; | ||
| while i < {case_upper}CASE_TABLE.len() {{ | ||
| let (_, val) = {case_upper}CASE_TABLE[i]; | ||
| if val & (1 << 22) == 0 {{ | ||
| assert!(char::from_u32(val).is_some()); | ||
| }} else {{ | ||
| let index = val & ((1 << 22) - 1); | ||
| assert!((index as usize) < {case_upper}CASE_TABLE_MULTI.len()); | ||
| }} | ||
| i += 1; | ||
| }} | ||
| }} | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my beef is with this here.
#147622 was reverted in #148436 due to some problems. This PR relands #147622 and addresses the issues described in the revert by:
rustfmtinvocation from the generator and telling users to run./x fmtinstead.#[cfg_attr(miri, ignore)]to thentest.unicode_data::nmodule tonumberto reduce collisions.Only ab82fbe is a new commit, the others are taken unmodified from #147622.
CC @Kmeakin @jieyouxu