Tracking issue for UTF-16 decoding iterators #27830
Comments
SimonSapin
referenced this issue
Aug 14, 2015
Merged
Refactor low-level UTF-16 decoding and move Borrow(Mut) to libcore. #27808
steveklabnik
added
the
A-libs
label
Aug 17, 2015
alexcrichton
added
T-libs
B-unstable
labels
Mar 8, 2016
aturon
added
the
I-nominated
label
Mar 9, 2016
alexcrichton
added
final-comment-period
and removed
I-nominated
labels
Mar 11, 2016
My personal opinion: Ok, so the APIs that this is talking about are:
I would not personally be a huge fan of stabilizing this cycle, some concerns being:
I think I'd be more comfortable with stabilizing given some utf-8 decoding functions as well, but it's probably worth also looking at the matrix of conversions we have:
I guess in that sense we'd be "complete" with |
I don't see this as a concern, really. The more we work with iterators, the more natural things like "iterator transformers" such as this are. It's a fine ball to get rolling in my opinion.
Agreed.
Perhaps so -- but IMO this should influence organization more than anything. That is, we might want to think about a submodule for constants, if we do anticipate adding more over time.

Most of the other points have the flavor of: why stabilize just this one piece? I agree that I'd really like to have an overall vision here; I feel like every cycle we stabilize a couple of related methods.

That said, your matrix is pretty useful, and does indeed suggest we should land both this and an analogous utf8 decoder.
I would be in favor of postponing until we have a more complete vision here as well (I'm a consumer of UTF-8 functions like this, which I typically just implement in-crate). There may be other routines we want to consider as well; for example, decoding a UTF-8 sequence in reverse can often be useful.
Since
We can remove the constant if lossy decoding is built-in.
Yes, I do think we’re missing something lower-level than we currently have in But since I have some experiments at https://github.com/SimonSapin/rust-utf8. (If you’re interested, the commit history shows a number of different APIs I tried.) It supports "incremental" lossy decoding: input is a number of But this is significantly more API surface than, say, an iterator adaptor. And there’s probably a wide variety of use cases with slightly different constraints (@BurntSushi mentioned decoding in reverse), so I don’t know if it makes sense to try and support all of them in Still, it’d be nice to have a single UTF-8 decoding primitive
@SimonSapin Would |
@BurntSushi Building something that yields |
The libs team discussed this during triage yesterday and the conclusion was to stabilize essentially everything as-is modulo changing the error returned by the iterator. We felt that there's room for decoding an iterator of u8 to char, but we can always add that later.
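For readers wondering what the "u8 to char" direction mentioned above could look like on top of what std already offers, here is a minimal sketch of lossy UTF-8 decoding built on `std::str::from_utf8` and `Utf8Error`. The helper name `decode_utf8_lossy` is hypothetical and is not an API proposed in this thread; it works on a slice rather than an iterator, and it allocates, so it only illustrates the error-recovery logic, not the primitive being asked for.

```rust
use std::char::REPLACEMENT_CHARACTER;
use std::str;

/// Hypothetical helper (not a std API): decode `input` as UTF-8,
/// substituting U+FFFD for each invalid or truncated sequence.
fn decode_utf8_lossy(mut input: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                // The remaining bytes are all well-formed.
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Everything before `valid_up_to` was already validated.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(str::from_utf8(valid).expect("validated prefix"));
                out.push(REPLACEMENT_CHARACTER);
                match err.error_len() {
                    // Skip the invalid bytes and keep decoding.
                    Some(len) => input = &rest[len..],
                    // The input ended in the middle of a sequence.
                    None => return out,
                }
            }
        }
    }
}

fn main() {
    // 0xFF can never appear in well-formed UTF-8.
    let decoded = decode_utf8_lossy(b"a\xFFb");
    assert_eq!(decoded, "a\u{FFFD}b");
    println!("{}", decoded);
}
```

The `None` branch of `error_len` is what an incremental decoder would hook into: instead of emitting a replacement character, it could hold the trailing bytes back until the next chunk arrives.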
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Apr 7, 2016
alexcrichton
referenced this issue
Apr 7, 2016
Merged
std: Stabilize APIs for the 1.9 release #32804
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Apr 7, 2016
alexcrichton
referenced this issue
Apr 7, 2016
Closed
Tracking issue for char encoding methods #27784
Manishearth
added a commit
to Manishearth/rust
that referenced
this issue
Apr 8, 2016
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Apr 8, 2016
bors
added a commit
that referenced
this issue
Apr 9, 2016
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Apr 11, 2016
bors
added a commit
that referenced
this issue
Apr 12, 2016
bors
closed this
in
#32804
Apr 12, 2016
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Apr 12, 2016
Is there any reason that these items are defined in I was investigating #49319 and ended up here.
I’ll respond in #49319 since this thread has been closed for two years :)
SimonSapin commented Aug 14, 2015
#27808 proposes exposing in std::char two iterator adaptors, Utf16Decoder and Utf16LossyDecoder. This functionality was previously only available with an API that requires allocation (String::from_utf16{,_lossy}) or by using the unstable rustc_unicode crate directly.

They are exposed unstable with a new utf16_decoder feature name. I’d like to stabilize them when we’re confident with the naming and API.
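
As an illustration of the allocation-free iterator style described here, the sketch below uses `std::char::decode_utf16`, the function std exposes on stable today, rather than the `Utf16Decoder` and `Utf16LossyDecoder` names proposed in this post. The helper names `decode_strict` and `decode_lossy` are hypothetical and exist only for the example; lossy decoding is expressed by mapping each error to `REPLACEMENT_CHARACTER`.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical helper: fail on the first unpaired surrogate,
/// returning the offending code unit.
fn decode_strict(units: &[u16]) -> Result<String, u16> {
    decode_utf16(units.iter().copied())
        .map(|r| r.map_err(|e| e.unpaired_surrogate()))
        .collect()
}

/// Hypothetical helper: replace each unpaired surrogate with U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    // A surrogate pair for U+1D11E, then "mu", a lone lead surrogate, then "ic".
    let units: [u16; 7] = [0xD834, 0xDD1E, 0x006D, 0x0075, 0xD800, 0x0069, 0x0063];

    assert_eq!(decode_strict(&units), Err(0xD800));
    assert_eq!(decode_lossy(&units), "\u{1D11E}mu\u{FFFD}ic");

    // The iterator itself never allocates; only `collect` does.
    for item in decode_utf16(units.iter().copied()) {
        println!("{:?}", item);
    }
}
```

Because the decoder is just an iterator adaptor, callers that do not need a `String` can consume the `char` values one at a time and skip the allocation entirely.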