Add APIs for case folding to the standard library #154742
r? @scottmcm
rustbot has assigned @scottmcm.
@rustbot reroll
LGTM.
r? libs-api
I don't mind, but any particular reason for the reassign? I thought it was good to go.
The API needs libs-api approval, I believe. They expressed interest in something like this, but there was never an ACP. (I also need to add a tracking issue after I get that.)
Rollup of 8 pull requests

Successful merges:

- #157240 (Enable Enzyme for aarch64-apple-darwin)
- #157276 (miri subtree update)
- #154742 (Add APIs for case folding to the standard library)
- #157130 (Use a `ArrayVec` in `CastTarget`)
- #157195 (Move feature gating to the new attr parsing infrastructure)
- #157256 (tests: adapt for LLVM codegen change)
- #157265 (Update books)
- #157277 (triagebot.toml: add LawnGnome to libs reviewers)
This pull request was unapproved. This PR was contained in a rollup (#157279), which was unapproved.
This reverts commit 1ec2ee9, which unfortunately prevented `convert_while_ascii` from vectorizing :( "LLVM gave, and LLVM hath taken away"
@rustbot ready
@bors r+ rollup=iffy
…imulacrum

Add APIs for case folding to the standard library

[Libs-api requested these](rust-lang#154287 (comment)), so here they are.

New public API (gated behind `#[feature(casefold)]`):

```rust
impl char {
    pub fn to_casefold(self) -> ToCasefold;
}

impl str {
    pub fn to_casefold(&self) -> String;
    pub fn eq_ignore_case(&self) -> bool;
}

pub struct ToCasefold { ... }
impl Iterator for ToCasefold { type Item = char; ... }
impl DoubleEndedIterator for ToCasefold { ... }
impl FusedIterator for ToCasefold { }
impl ExactSizeIterator for ToCasefold { ... }
impl fmt::Display for ToCasefold { ... }
```

## Notes

- This only adds a negligible amount of static data to `core::unicode`. To accomplish that, we compute the case-folding for most characters as the lowercase of their uppercase; this double mapping adds some complexity to the implementation.
- No normalization (e.g. NFC) is performed, so visually and semantically equivalent strings can compare unequal.
- I have not put any effort into optimizing `eq_ignore_case()`; there may be a more performant implementation.
- `char::eq_ignore_case()` is left to future work; it's a potential footgun, so we may want to think more deeply about how to expose and document that API.

@rustbot label T-libs-api A-unicode
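
For readers who want to see what the "lowercase of their uppercase" double mapping from the notes above looks like in practice, here is a minimal sketch on stable Rust. It is not the PR's implementation: `approx_casefold` and `approx_eq_ignore_case` are hypothetical helper names, and the sketch only approximates case folding, since it ignores the characters that the notes imply need separate handling. It also illustrates the caveat that no normalization is performed.

```rust
/// Hypothetical helper (not part of the PR): approximates case folding on
/// stable Rust by taking the lowercase of the uppercase of each character.
fn approx_casefold(s: &str) -> String {
    s.chars()
        .flat_map(char::to_uppercase) // one char may expand to several
        .flat_map(char::to_lowercase)
        .collect()
}

/// Hypothetical comparison helper built on the approximation above.
fn approx_eq_ignore_case(a: &str, b: &str) -> bool {
    approx_casefold(a) == approx_casefold(b)
}

fn main() {
    // 'ß' uppercases to "SS", which lowercases to "ss".
    assert_eq!(approx_casefold("Straße"), "strasse");
    assert!(approx_eq_ignore_case("STRASSE", "straße"));

    // Final sigma (U+03C2) and sigma (U+03C3) share the uppercase U+03A3,
    // so they fold to the same character.
    assert_eq!(approx_casefold("\u{3c2}"), approx_casefold("\u{3c3}"));

    // No normalization: precomposed 'é' (U+00E9) and 'e' followed by a
    // combining acute accent (U+0301) still compare unequal.
    assert!(!approx_eq_ignore_case("\u{e9}", "e\u{301}"));
}
```

Plain lowercasing would leave `"straße"` and `"strasse"` distinct, which is why going through the uppercase form first gets closer to a case-insensitive comparison. The last assertion is the situation the second note warns about: callers that care about it would have to normalize separately before comparing.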
…uwer

Rollup of 17 pull requests

Successful merges:

- #157251 (`rust-analyzer` subtree update)
- #157533 (Subtree sync for rustc_codegen_cranelift)
- #154742 (Add APIs for case folding to the standard library)
- #155144 (mir_build: Add an extra intermediate step in MIR building for patterns)
- #157016 (add `extern "tail"` calling convention)
- #157264 (diagnostics: Fix ICE building a trait ref in method suggestions)
- #157386 (Parse deprecated note links separately in rustc_resolve)
- #157483 (fix windows-gnu TLS leak)
- #157488 (compiletest: inject `#![windows_subsystem = "windows"]` to debuginfo tests on Windows)
- #157509 (remove solaris implementation for File::lock, it has the wrong semantics)
- #157521 (Rename `SyncView::{as_pin => as_pin_ref}`)
- #156136 (Move tests box)
- #157365 (Revert "LLVM 23: Run AssignGUIDPass in some places")
- #157471 (Debug assert that parsed attributes are in the `BUILTIN_ATTRIBUTE_MAP`)
- #157485 (Rename `errors.rs` file to `diagnostics.rs` (1/N))
- #157494 (Convert `QueryRegionConstraint` into a struct)
- #157526 (std tests: skip a slow test on Miri)

Failed merges:

- #155527 (Replace printables table with `unicode_data.rs` tables)
Rollup of 25 pull requests

Successful merges:

- #157251 (`rust-analyzer` subtree update)
- #157533 (Subtree sync for rustc_codegen_cranelift)
- #154742 (Add APIs for case folding to the standard library)
- #155144 (mir_build: Add an extra intermediate step in MIR building for patterns)
- #156222 (Stabilize `Result::map_or_default` and `Option::map_or_default`)
- #157016 (add `extern "tail"` calling convention)
- #157264 (diagnostics: Fix ICE building a trait ref in method suggestions)
- #157386 (Parse deprecated note links separately in rustc_resolve)
- #157483 (fix windows-gnu TLS leak)
- #157488 (compiletest: inject `#![windows_subsystem = "windows"]` to debuginfo tests on Windows)
- #157509 (remove solaris implementation for File::lock, it has the wrong semantics)
- #157521 (Rename `SyncView::{as_pin => as_pin_ref}`)
- #156136 (Move tests box)
- #156573 (Add unwinder_private_data_size for wasm64 target)
- #156783 (docs: make `Rc::into_raw` clickable in `Rc::increment_strong_count` doc)
- #156840 (Stabilize `PathBuf::into_string`)
- #156936 (Remove FIXME about impl PinCoerceUnsized for UnsafePinned<T>)
- #157365 (Revert "LLVM 23: Run AssignGUIDPass in some places")
- #157380 (clarify compiler_fence (and fence) docs)
- #157471 (Debug assert that parsed attributes are in the `BUILTIN_ATTRIBUTE_MAP`)
- #157485 (Rename `errors.rs` file to `diagnostics.rs` (1/N))
- #157494 (Convert `QueryRegionConstraint` into a struct)
- #157526 (std tests: skip a slow test on Miri)
- #157531 (ci: bump x86_64-gnu base image to 26.04)
- #157556 (Add `BTree::append()` change to 1.96.0 relnotes)

Failed merges:

- #155527 (Replace printables table with `unicode_data.rs` tables)