Replace printables table with unicode_data.rs tables #155527

Open
Jules-Bertholet wants to merge 1 commit into rust-lang:main from Jules-Bertholet:riir-printable

@Jules-Bertholet
Contributor

This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs.

There are also some drive-by documentation improvements in `library/core/src/char/methods.rs`.

There is one change in behavior: we now consider all characters with the `Default_Ignorable_Code_Point` property to be unprintable. Otherwise, these characters can be hidden or invisible when displayed.
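To illustrate the kind of character this change affects, here is a minimal sketch, not the PR's actual code. The table `DEFAULT_IGNORABLE_SAMPLE` and both helper functions are hypothetical names, and the table is a small hand-picked excerpt of the property rather than the generated data. It includes U+3164 HANGUL FILLER, which is a letter by general category yet renders as nothing.

```rust
/// Hypothetical, hand-picked excerpt of `Default_Ignorable_Code_Point`.
/// The real table would be generated from the Unicode Character Database.
const DEFAULT_IGNORABLE_SAMPLE: &[(u32, u32)] = &[
    (0x00AD, 0x00AD), // SOFT HYPHEN
    (0x034F, 0x034F), // COMBINING GRAPHEME JOINER
    (0x115F, 0x1160), // HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
    (0x200B, 0x200F), // zero-width space/joiners and directional marks
    (0x3164, 0x3164), // HANGUL FILLER
    (0xFE00, 0xFE0F), // variation selectors 1 to 16
];

/// Returns true if `c` falls in one of the sample ranges (inclusive).
fn is_default_ignorable_sample(c: char) -> bool {
    let cp = c as u32;
    DEFAULT_IGNORABLE_SAMPLE
        .iter()
        .any(|&(lo, hi)| lo <= cp && cp <= hi)
}

/// Illustrative escaping rule: show ignorable characters as `\u{...}`
/// so they cannot hide in debug output; pass everything else through.
fn escape_if_ignorable(c: char) -> String {
    if is_default_ignorable_sample(c) {
        format!("\\u{{{:x}}}", c as u32)
    } else {
        c.to_string()
    }
}
```

With this sketch, `escape_if_ignorable('\u{3164}')` yields the visible text `\u{3164}`, while `escape_if_ignorable('a')` yields `a` unchanged. The standard library's real escaping logic is more involved than this.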

I've chosen to give each Unicode property its own table instead of merging them all into one. This is slightly less space-efficient, but should allow us to expose these tables in the future through public methods on `char`.
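The trade-off can be sketched as follows, assuming a simple sorted-range representation. The generated tables in `unicode_data.rs` use their own compact encodings, so this is only an illustration, and every name here (`WHITE_SPACE_SAMPLE`, `IGNORABLE_SAMPLE`, `in_ranges`, and the three predicates) is hypothetical. Keeping one table per property means each predicate can stand alone, and a combined check is just a composition of them.

```rust
use std::cmp::Ordering;

/// Hypothetical excerpt of the `White_Space` property (sorted, inclusive).
const WHITE_SPACE_SAMPLE: &[(u32, u32)] = &[
    (0x0009, 0x000D),
    (0x0020, 0x0020),
    (0x0085, 0x0085),
    (0x00A0, 0x00A0),
    (0x2000, 0x200A),
];

/// Hypothetical excerpt of `Default_Ignorable_Code_Point` (sorted, inclusive).
const IGNORABLE_SAMPLE: &[(u32, u32)] = &[
    (0x00AD, 0x00AD),
    (0x200B, 0x200F),
    (0x3164, 0x3164),
];

/// Binary search over sorted, non-overlapping inclusive ranges.
fn in_ranges(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// One predicate per property: each could be exposed on its own.
fn is_white_space_sample(c: char) -> bool {
    in_ranges(WHITE_SPACE_SAMPLE, c)
}

fn is_ignorable_sample(c: char) -> bool {
    in_ranges(IGNORABLE_SAMPLE, c)
}

/// A combined check composes the per-property predicates.
/// A merged table would answer only this question, not the two above.
fn in_any_sample_property(c: char) -> bool {
    is_white_space_sample(c) || is_ignorable_sample(c)
}
```

A single merged table would store overlapping or adjacent ranges only once, which is where the space saving comes from, but it could no longer tell a caller which property a character has.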

@rustbot label A-Unicode

@rustbot
Collaborator

rustbot commented Apr 19, 2026

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

@rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels on Apr 19, 2026
@rustbot
Collaborator

rustbot commented Apr 19, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @scottmcm, libs
  • @scottmcm, libs expanded to 7 candidates
  • Random selection from Mark-Simulacrum, jhpratt, scottmcm

@rustbot added the A-Unicode (Area: Unicode) label on Apr 19, 2026
