Tracking issue: UTF-8 decoder in libcore #33906
apasel422 added the A-libs label on Jun 24, 2016
alexcrichton added the T-libs and B-unstable labels on Jul 14, 2016
bors added a commit that referenced this issue on Jul 14, 2016
I’ve submitted #35947 to make this emit errors as specified in Unicode, like
I think this is the tracking issue for this functionality? DecodeUtf8, like DecodeUtf16, should have a way to recover the invalid byte sequences it encounters.
std::str::next_code_point being public is bad; it assumes valid UTF-8 input, and libcore needs the agility to use unsafe code in this function if it turns out to be beneficial.
The initial message of this issue is somewhat misleading now that this is the tracking issue for the … It is no longer about …
martinhath referenced this issue on Jan 12, 2017: std::path::Path::Display allocates internally #38879 (Closed)
aturon changed the title from "UTF-8 decoder in libcore" to "Tracking issue: UTF-8 decoder in libcore" on Mar 3, 2017
steveklabnik removed the A-libs label on Mar 24, 2017
Mark-Simulacrum added the C-tracking-issue label on Jul 22, 2017
SimonSapin referenced this issue on Mar 17, 2018: Add an example of lossy decoding to str::Utf8Error docs #49105 (Merged)
I’m inclined not to stabilize this. Now that …

If you want to build a … It’s tempting to add new APIs to libcore for something like the example above, but there’s a lot of possible variation: returning an … https://docs.rs/utf-8/ tries to support all of these use cases (still on top of …).

@strake, what do you think?
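For illustration, here is a minimal sketch of lossy decoding built only on `str::from_utf8` and the `Utf8Error` methods `valid_up_to()` and `error_len()`, similar in spirit to the lossy-decoding example in the `str::Utf8Error` documentation (#49105) but not copied from it. The helper names `decode_lossy_chunks` and `lossy_to_string` are hypothetical. The `std::str` items used are re-exports of `core::str`, so the decoding loop itself needs nothing outside libcore.

```rust
/// Hypothetical helper: lossily decodes `input`, passing each valid `&str`
/// chunk and each U+FFFD replacement to `push`. Uses only `str::from_utf8`
/// and `Utf8Error`, which `std::str` re-exports from `core::str`.
fn decode_lossy_chunks<F: FnMut(&str)>(mut input: &[u8], mut push: F) {
    loop {
        match std::str::from_utf8(input) {
            Ok(valid) => {
                // Everything that is left is valid: emit it and stop.
                push(valid);
                break;
            }
            Err(error) => {
                let (valid, after_valid) = input.split_at(error.valid_up_to());
                // Bytes before `valid_up_to()` are guaranteed to be valid UTF-8.
                push(std::str::from_utf8(valid).unwrap());
                push("\u{FFFD}");
                match error.error_len() {
                    // Skip the invalid sequence and keep decoding after it.
                    Some(len) => input = &after_valid[len..],
                    // `None`: the input ended in the middle of a sequence.
                    None => break,
                }
            }
        }
    }
}

/// Hypothetical convenience wrapper that collects the chunks into a `String`.
fn lossy_to_string(input: &[u8]) -> String {
    let mut out = String::new();
    decode_lossy_chunks(input, |chunk| out.push_str(chunk));
    out
}

fn main() {
    println!("{}", lossy_to_string(b"Hello \xF0\x90\x80World"));
}
```

Because the callback receives borrowed `&str` chunks, a `no_std` caller could forward them to any sink without allocating; the `String` wrapper exists only for convenience.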
kennytm added a commit to kennytm/rust that referenced this issue on Mar 22, 2018
The libs team discussed this and the consensus was to deprecate this feature. The use case motivating it can be handled by using …

@rfcbot fcp close
rfcbot commented on Mar 30, 2018:
Team member @SimonSapin has proposed to close this. The next step is review by the rest of the tagged teams:
No concerns currently listed.
Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
rfcbot added the proposed-final-comment-period label on Mar 30, 2018
rfcbot commented on Apr 3, 2018
rfcbot added the final-comment-period label and removed the proposed-final-comment-period label on Apr 3, 2018
@SimonSapin No objection here: this should be the common use case, and for cases where one truly wants to operate on a single byte at a time from an iterator, the code need not be in libcore.
rfcbot commented on Apr 13, 2018:
The final comment period is now complete.
SimonSapin added a commit to SimonSapin/rust that referenced this issue on Apr 14, 2018
SimonSapin referenced this issue on Apr 14, 2018: Deprecate Read::chars and char::decode_utf8 #49970 (Merged)
SimonSapin added a commit to SimonSapin/rust that referenced this issue on Apr 15, 2018
kennytm added a commit to kennytm/rust that referenced this issue on Apr 24, 2018
Deprecated in #49970.
strake commented on May 27, 2016 (edited by SimonSapin)
Update (@SimonSapin): this is now the tracking issue for these items in both core::char and std::char:

- decode_utf8(), which takes an iterable of u8 and returns DecodeUtf8
- DecodeUtf8, which implements Iterator<Item=Result<char, InvalidSequence>>
- InvalidSequence, which is opaque

Original issue:
In libcore we have a facility to encode a character to UTF-8, i.e. char::EncodeUtf8, but no facility to decode a character from potentially invalid UTF-8 and return 0xFFFD if it reads an invalid sequence. This seems a surprising omission to me as a libcore user, given that libstd has string::String::from_utf8_lossy.

These options came to mind:

- str::next_code_point_lossy or similar, which behaves like str::next_code_point but checks whether its input is valid and returns 0xFFFD if not
- DecodeUtf8, which one can construct from an arbitrary iterator of bytes, and which decodes them
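A minimal sketch of the second option, written against today's std rather than any libcore API: a decoder over an arbitrary `Iterator<Item = u8>` that yields `Result<char, InvalidSequence>` and, in the lossy wrapper, turns each error into U+FFFD. It validates each sequence against the well-formed UTF-8 byte sequence table in chapter 3 of the Unicode Standard, so overlong forms, surrogates, and values above U+10FFFF are rejected. The names `decode_utf8`, `Utf8Decoder`, `InvalidSequence`, and `decode_lossy` are hypothetical stand-ins; this is not the deprecated `char::decode_utf8` implementation.

```rust
use std::iter::Peekable;

/// Hypothetical opaque error: one maximal invalid subpart of the input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InvalidSequence;

/// Hypothetical decoder over any byte iterator (not the deprecated std type).
pub struct Utf8Decoder<I: Iterator<Item = u8>> {
    bytes: Peekable<I>,
}

pub fn decode_utf8<I: IntoIterator<Item = u8>>(bytes: I) -> Utf8Decoder<I::IntoIter> {
    Utf8Decoder { bytes: bytes.into_iter().peekable() }
}

impl<I: Iterator<Item = u8>> Iterator for Utf8Decoder<I> {
    type Item = Result<char, InvalidSequence>;

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.bytes.next()?;
        // Continuation-byte count and allowed range of the second byte,
        // following the Unicode table of well-formed UTF-8 byte sequences.
        let (extra, lo, hi): (u32, u8, u8) = match first {
            0x00..=0x7F => return Some(Ok(first as char)),
            0xC2..=0xDF => (1, 0x80, 0xBF),
            0xE0 => (2, 0xA0, 0xBF),
            0xE1..=0xEC | 0xEE..=0xEF => (2, 0x80, 0xBF),
            0xED => (2, 0x80, 0x9F),
            0xF0 => (3, 0x90, 0xBF),
            0xF1..=0xF3 => (3, 0x80, 0xBF),
            0xF4 => (3, 0x80, 0x8F),
            // 0x80..=0xC1 and 0xF5..=0xFF can never start a sequence.
            _ => return Some(Err(InvalidSequence)),
        };
        // Payload bits of the lead byte: 5, 4 or 3 bits for 2-, 3- or 4-byte forms.
        let mut code = u32::from(first) & (0x7F >> (extra + 1));
        for i in 0..extra {
            let (min, max) = if i == 0 { (lo, hi) } else { (0x80, 0xBF) };
            match self.bytes.peek() {
                Some(&b) if (min..=max).contains(&b) => {
                    self.bytes.next();
                    code = (code << 6) | u32::from(b & 0x3F);
                }
                // Leave the offending byte unconsumed: it may start the next char.
                _ => return Some(Err(InvalidSequence)),
            }
        }
        // The second-byte ranges rule out overlongs, surrogates and > U+10FFFF.
        Some(Ok(char::from_u32(code).expect("range checks guarantee a scalar value")))
    }
}

/// Hypothetical lossy wrapper: each `InvalidSequence` becomes U+FFFD.
pub fn decode_lossy<I: IntoIterator<Item = u8>>(bytes: I) -> String {
    decode_utf8(bytes).map(|r| r.unwrap_or('\u{FFFD}')).collect()
}

fn main() {
    let decoded = decode_lossy(b"a\xF0\x90\x80b\xED\xA0\x80c\xFF".iter().copied());
    println!("{}", decoded);
}
```

Peeking at, rather than consuming, the byte that breaks a sequence is what makes each error cover exactly one maximal invalid subpart; as far as we can tell, that is also the substitution policy String::from_utf8_lossy follows.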