Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for str_utf16 stabilization #27714

Closed
aturon opened this issue Aug 12, 2015 · 8 comments
Closed

Tracking issue for str_utf16 stabilization #27714

aturon opened this issue Aug 12, 2015 · 8 comments
Labels
B-unstable Feature: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@aturon
Copy link
Member

aturon commented Aug 12, 2015

The str_utf16 feature tracks APIs like str::utf16_units that support utf16 encoding. These APIs are working well today, but it would be good to have a clear picture of the final state of APIs we'd like to have for various string encodings in std, and how that relates to other unicode support crates.

cc @SimonSapin

@aturon aturon added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Feature: Implemented in the nightly compiler and unstable. labels Aug 12, 2015
@SimonSapin
Copy link
Contributor

Regarding naming: utf16_units is not a good name. As far as I can tell, the choice of "code unit" in the Unicode Standard is mostly “what’s another word for "thing" that we haven’t used yet?” I suggest renaming str::utf16_units to str::encode_utf16 (and its return type accordingly, if it stays around at all).

Regarding the API: &str -> Iterator<Item=u16> in the middle of the balance between terseness and generality:

  • If what you want is a Vec<u16>, having the method return an iterator makes you write "redundant" .collect() (or even .collect::<Vec<_>>()).
  • If what you have is an iterator of char, collecting it into a String is wasteful as the only thing the method does with &str is get a str::Chars iterator out of it.

I suggest changing str::encode_utf16 to go all the way to the "convenience" side and make it return a Vec<u16>. (This is symmetrical with String::from_utf16* where neither input or output is an iterator.) char::encode_utf16 (#27784) can be used directly to avoid extra allocations in cases where a different type of input or output is desired.

bors added a commit that referenced this issue Aug 27, 2015
* Rename `Utf16Items` to `Utf16Decoder`. "Items" is meaningless.
* Generalize it to any `u16` iterator, not just `[u16].iter()`
* Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl.
* Replace `Utf16Item::to_char_lossy` with a `Utf16Decoder::lossy` iterator adaptor.

This is a [breaking change], but only for users of the unstable `rustc_unicode` crate.

I’d like this functionality to be stabilized and re-exported in `std` eventually, as the "low-level equivalent" of `String::from_utf16` and `String::from_utf16_lossy` like #27784 is the low-level equivalent of #27714.

CC @aturon, @alexcrichton
@aturon
Copy link
Member Author

aturon commented Nov 3, 2015

While this still isn't ready for stabilization, I'm nominating for libs team discussion about @SimonSapin's proposals.

@aturon aturon added I-nominated I-needs-decision Issue: In need of a decision. and removed I-nominated labels Nov 3, 2015
@retep998
Copy link
Member

retep998 commented Nov 4, 2015

I've seen several people try to use this rather than OsStrExt::encode_wide when trying to work with Windows API. Perhaps some sort of warning about this would help.

@alexcrichton alexcrichton added I-nominated and removed I-needs-decision Issue: In need of a decision. labels Dec 16, 2015
@SimonSapin
Copy link
Contributor

@alexcrichton why remove the needs-decision tag? What do you think of the changes proposed in my previous comment?

@alexcrichton
Copy link
Member

Ah I misinterpreted @aturon's original intention, but the libs team decided yesterday to talk about this during the next triage meeting, and I-nominated is now our best way to do that, so I'm gonna leave as is :)

@aturon aturon removed the I-nominated label Jan 8, 2016
@aturon
Copy link
Member Author

aturon commented Jan 8, 2016

Libs team consensus: we agree with the proposed renaming, but feel that the current signature (returning an iterator) is the most consistent with the design elsewhere in the standard library, and that on balance the flexibility of an iterator outweighs the minor pain of writing .collect() in most uses. We return Vec very very rarely in the standard library today.

@alexcrichton
Copy link
Member

🔔 This issue is now entering its cycle final comment period to be stabilized in 1.8 🔔

The libs team would like to rename the method to encode_utf16, but beyond that we have no changes planned

@alexcrichton alexcrichton added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed I-nominated labels Jan 29, 2016
@alexcrichton
Copy link
Member

The libs team discussed this during triage yesterday and the conclusion was to stabilize with the name encode_utf16.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Feb 29, 2016
This commit is the result of the FCPs ending for the 1.8 release cycle for both
the libs and the lang suteams. The full list of changes are:

Stabilized

* `braced_empty_structs`
* `augmented_assignments`
* `str::encode_utf16` - renamed from `utf16_units`
* `str::EncodeUtf16` - renamed from `Utf16Units`
* `Ref::map`
* `RefMut::map`
* `ptr::drop_in_place`
* `time::Instant`
* `time::SystemTime`
* `{Instant,SystemTime}::now`
* `{Instant,SystemTime}::duration_since` - renamed from `duration_from_earlier`
* `{Instant,SystemTime}::elapsed`
* Various `Add`/`Sub` impls for `Time` and `SystemTime`
* `SystemTimeError`
* `SystemTimeError::duration`
* Various impls for `SystemTimeError`
* `UNIX_EPOCH`
* `ops::{Add,Sub,Mul,Div,Rem,BitAnd,BitOr,BitXor,Shl,Shr}Assign`

Deprecated

* Scoped TLS (the `scoped_thread_local!` macro)
* `Ref::filter_map`
* `RefMut::filter_map`
* `RwLockReadGuard::map`
* `RwLockWriteGuard::map`
* `Condvar::wait_timeout_with`

Closes rust-lang#27714
Closes rust-lang#27715
Closes rust-lang#27746
Closes rust-lang#27748
Closes rust-lang#27908
Closes rust-lang#29866
bors added a commit that referenced this issue Feb 29, 2016
This commit is the result of the FCPs ending for the 1.8 release cycle for both
the libs and the lang suteams. The full list of changes are:

Stabilized

* `braced_empty_structs`
* `augmented_assignments`
* `str::encode_utf16` - renamed from `utf16_units`
* `str::EncodeUtf16` - renamed from `Utf16Units`
* `Ref::map`
* `RefMut::map`
* `ptr::drop_in_place`
* `time::Instant`
* `time::SystemTime`
* `{Instant,SystemTime}::now`
* `{Instant,SystemTime}::duration_since` - renamed from `duration_from_earlier`
* `{Instant,SystemTime}::elapsed`
* Various `Add`/`Sub` impls for `Time` and `SystemTime`
* `SystemTimeError`
* `SystemTimeError::duration`
* Various impls for `SystemTimeError`
* `UNIX_EPOCH`
* `ops::{Add,Sub,Mul,Div,Rem,BitAnd,BitOr,BitXor,Shl,Shr}Assign`

Deprecated

* Scoped TLS (the `scoped_thread_local!` macro)
* `Ref::filter_map`
* `RefMut::filter_map`
* `RwLockReadGuard::map`
* `RwLockWriteGuard::map`
* `Condvar::wait_timeout_with`

Closes #27714
Closes #27715
Closes #27746
Closes #27748
Closes #27908
Closes #29866
Closes #28235
Closes #29720
dlrobertson pushed a commit to dlrobertson/rust that referenced this issue Nov 29, 2018
This commit is the result of the FCPs ending for the 1.8 release cycle for both
the libs and the lang suteams. The full list of changes are:

Stabilized

* `braced_empty_structs`
* `augmented_assignments`
* `str::encode_utf16` - renamed from `utf16_units`
* `str::EncodeUtf16` - renamed from `Utf16Units`
* `Ref::map`
* `RefMut::map`
* `ptr::drop_in_place`
* `time::Instant`
* `time::SystemTime`
* `{Instant,SystemTime}::now`
* `{Instant,SystemTime}::duration_since` - renamed from `duration_from_earlier`
* `{Instant,SystemTime}::elapsed`
* Various `Add`/`Sub` impls for `Time` and `SystemTime`
* `SystemTimeError`
* `SystemTimeError::duration`
* Various impls for `SystemTimeError`
* `UNIX_EPOCH`
* `ops::{Add,Sub,Mul,Div,Rem,BitAnd,BitOr,BitXor,Shl,Shr}Assign`

Deprecated

* Scoped TLS (the `scoped_thread_local!` macro)
* `Ref::filter_map`
* `RefMut::filter_map`
* `RwLockReadGuard::map`
* `RwLockWriteGuard::map`
* `Condvar::wait_timeout_with`

Closes rust-lang#27714
Closes rust-lang#27715
Closes rust-lang#27746
Closes rust-lang#27748
Closes rust-lang#27908
Closes rust-lang#29866
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Feature: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants