Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement std::convert traits for char #35755

Merged
merged 2 commits into from Sep 1, 2016
Merged

Conversation

@SimonSapin
Copy link
Contributor

SimonSapin commented Aug 17, 2016

This is motivated by avoiding the as operator, which sometimes silently truncates, and instead use conversions that are explicitly lossless and infallible.

I’m less certain that From<u8> for char should be implemented: while it matches an existing behavior of as, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.

@rust-highfive
Copy link
Collaborator

rust-highfive commented Aug 17, 2016

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@sfackler sfackler added the T-libs label Aug 17, 2016
///
/// Surrogates are used in the UTF-16 encoding, and therefore are not characters.
SurrogateCodePoint,
}

This comment has been minimized.

Copy link
@sfackler

sfackler Aug 17, 2016

Member

We're never going to need to add any extra cases to this enum, right? Should we stick a __ForExtensibility variant just in case?

@sfackler
Copy link
Member

sfackler commented Aug 17, 2016

I personally feel okay about the u8 -> char impl as it seems like what everyone would expect to happen, but am interested to see what other people think.

cc @rust-lang/libs

@alexcrichton
Copy link
Member

alexcrichton commented Aug 18, 2016

Seems reasonable to me, but I'd prefer to use an opaque struct with optional method accessors rather than an enum for the error type in TryFrom

@SimonSapin SimonSapin force-pushed the SimonSapin:char_convert branch from 249b789 to 82678c5 Aug 18, 2016
@SimonSapin
Copy link
Contributor Author

SimonSapin commented Aug 18, 2016

@sfackler Re extensibility, I don’t expect this to ever be needed. The range of Unicode Scalar Values changed exactly once in the history of Unicode. At first it was “16 bits ought to be enough for everybody” 0x0000...0xFFFF. When that turned out not to be enough and a lot of systems were already using u16 as a code unit, UTF-16 was introduced. The current range, 0x0000...0xD7FF | 0xE000...0x10FFFF, is exactly what can be encoded by UTF-16. (UTF-8’s original design can support up to 0x7FFF_FFFF with 6-bytes sequences. It is now artificially restricted to match.)

In Unicode 9.0, 76% of the million and some code points are unassigned, so they’re not expected to run out. And breaking compatibility with UTF-16 is such a breaking change that I imagine it’s not even considered.

And this concern disappears with…

@alexcrichton Yeah, I also considered an opaque struct. Even without accessor method since I can’t think of a use case for it. (Code like a WTF-8 implementation that wants to deal with surrogate code points will likely do its own code point arithmetic anyway.) And a method cal always be added later. I’ve updated the PR.

@alexcrichton
Copy link
Member

alexcrichton commented Aug 23, 2016

Discussed during @rust-lang/libs triage today, conclusion was to merge. Thanks for the update @SimonSapin!

@bors: r+

@bors
Copy link
Contributor

bors commented Aug 23, 2016

📌 Commit 82678c5 has been approved by alexcrichton

eddyb added a commit to eddyb/rust that referenced this pull request Aug 23, 2016
…hton

Implement std::convert traits for char

This is motivated by avoiding the `as` operator, which sometimes silently truncates, and instead use conversions that are explicitly lossless and infallible.

I’m less certain that `From<u8> for char` should be implemented: while it matches an existing behavior of `as`, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.
@bors
Copy link
Contributor

bors commented Aug 23, 2016

The latest upstream changes (presumably #35656) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -176,6 +172,41 @@ pub unsafe fn from_u32_unchecked(i: u32) -> char {
transmute(i)
}

#[stable(feature = "char_convert", since = "1.12.0")]

This comment has been minimized.

Copy link
@ollie27

ollie27 Aug 27, 2016

Contributor

Should these not be "1.13.0"?

This comment has been minimized.

Copy link
@SimonSapin

SimonSapin Aug 29, 2016

Author Contributor

Not at the time I first opened this PR, but now yes. Fixed.

#[stable(feature = "char_convert", since = "1.12.0")]
impl From<u8> for char {
#[inline]
fn from(i: u8) -> Self {

This comment has been minimized.

Copy link
@ollie27

ollie27 Aug 27, 2016

Contributor

I think it would be a good idea to add documentation to this method as it's not completely obvious how it's implemented. Specifically it should mention the code page.

This comment has been minimized.

Copy link
@SimonSapin

SimonSapin Aug 29, 2016

Author Contributor

I disagree with using the term "code page", but I’ve added a detailed doc-comment.

/// The error type returned when a conversion from u32 to char fails.
#[unstable(feature = "try_from", issue = "33417")]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct CharTryFromError(());

This comment has been minimized.

Copy link
@ollie27

ollie27 Aug 27, 2016

Contributor

This should probably implement std::error::Error.

This comment has been minimized.

Copy link
@SimonSapin

SimonSapin Aug 29, 2016

Author Contributor

Done.

@bors
Copy link
Contributor

bors commented Aug 29, 2016

🔒 Merge conflict

SimonSapin added 2 commits Aug 17, 2016
These fit with other From implementations between integer types.

This helps the coding style of avoiding the 'as' operator that sometimes
silently truncates, and signals that these specific conversions are
lossless and infaillible.
For symmetry with From<char> for u32.
@SimonSapin SimonSapin force-pushed the SimonSapin:char_convert branch from 82678c5 to f040208 Aug 29, 2016
@alexcrichton
Copy link
Member

alexcrichton commented Aug 29, 2016

@bors
Copy link
Contributor

bors commented Sep 1, 2016

Testing commit f040208 with merge b2799a5...

bors added a commit that referenced this pull request Sep 1, 2016
Implement std::convert traits for char

This is motivated by avoiding the `as` operator, which sometimes silently truncates, and instead use conversions that are explicitly lossless and infallible.

I’m less certain that `From<u8> for char` should be implemented: while it matches an existing behavior of `as`, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.
@bors bors merged commit f040208 into rust-lang:master Sep 1, 2016
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
homu Test successful
Details
@bluss bluss added the relnotes label Sep 8, 2016
@bluss
Copy link
Member

bluss commented Sep 8, 2016

Mini-reminder: let's tag more user-visible stuff with relnotes

@SimonSapin SimonSapin deleted the SimonSapin:char_convert branch Sep 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants
You can’t perform that action at this time.