Skip to content

Conversation

Kmeakin
Copy link
Contributor

@Kmeakin Kmeakin commented Oct 13, 2025

Minor refactors to unicode_data that occured to me while trying to reduce the size of the tables. Splitting into a separate PR. NFC

Instead of `include_str!()`ing `range_search.rs`, just make it a normal
module under `core::unicode`. This means the same source code doesn't
have to be checked in twice, and it plays nicer with IDEs.

Also rename it to `rt` since it includes functions for searching the
bitsets as well as the range represesentation.
Run `rustfmt` on the generated tables. This means we won't have to worry
so much about getting indetation and formatting right when generating
code.

Exempted for now some tables which would be too big when formatted by
`rustfmt`.
This check was made redundant (it will always be true) when we removed
all ASCII characters from the tables
(rust-lang@a8c6694).
@rustbot
Copy link
Collaborator

rustbot commented Oct 13, 2025

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 13, 2025
@rustbot
Copy link
Collaborator

rustbot commented Oct 13, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

To make the final output code easier to see:
* Get rid of the unnecessary line-noise of `.unwrap()`ing calls to
  `write!()` by moving the `.unwrap()` into a macro.
* Join consecutive `write!()` calls using a single multiline format
  string.
* Replace `.push()` and `.push_str(format!())` with `write!()`.
* If after doing all of the above, there is only a single `write!()`
  call in the function, just construct the string directly with
  `format!()`.
@Kmeakin Kmeakin force-pushed the km/unicode-data/refactors branch from 2c5244e to 90adbe2 Compare October 13, 2025 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants