Tracking issue for Read::chars #27802

Closed
alexcrichton opened this Issue Aug 13, 2015 · 88 comments

alexcrichton (Member) commented Aug 13, 2015

This is a tracking issue for the deprecated std::io::Read::chars API.

eminence (Contributor) commented Aug 25, 2015

It would be nice if std::io::Chars was an Iterator<Item=char>, just like std::str::Chars. I don't have a proposal for how decoding errors should be handled, though.


gyscos commented Sep 10, 2015

If we want to make std::io::Chars a simple iterator on char, a solution is to have it return None on a UTF-8 error, and set an error flag in the Chars itself (with an associated was_valid_utf8() method or something). An empty sequence is considered valid UTF-8.
It does make error detection slightly less convenient, as writing for c in reader.chars() does not keep the Chars for later verification.
Here is an example use-case where we try to recover after an invalid sub-sequence:

let mut chars = reader.chars();
loop {
    // Iterate by reference so `chars` can still be inspected after the loop.
    for c in chars.by_ref() {
        // ...
    }
    if chars.was_valid_utf8() { break; }
    println!("Encountered invalid byte sequence.");
}

Or it could provide a more informative message, similar to the current CharsError.
Maybe this could apply to the other adapters as well? Or is this a bad pattern?

On the other hand, it is not so difficult to treat the Result item explicitly, or to wrap the current Chars to get the behavior I described (an unwrapper wrapping, interesting notion), so maybe the current situation is acceptable as is.


alexcrichton (Member) commented Nov 4, 2015

Nominating for 1.6 discussion


alexcrichton (Member) commented Nov 5, 2015

🔔 This issue is now entering its cycle-long final comment period for deprecation 🔔

These functions seem like excellent candidates to move out-of-tree into an ioutil crate or something like that. This deprecation won't actually happen until that crate exists, however.


abonander (Contributor) commented Nov 5, 2015

.chars() is a nice operation to have, it'd be a shame to have to pull in a separate crate just to get such a useful iterator. The error type doesn't seem problematic; you'd have to handle an io::Error anyways.

Perhaps a .chars_lossy() iterator that yields the UTF-8 replacement character on a UTF-8 error and stops on the first io::Error.
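A minimal sketch of what such a chars_lossy adapter could look like, wrapping the era's unstable io::Chars iterator and assuming its Result<char, CharsError> item type; the CharsLossy type and the approach are hypothetical, not an actual std API:

#![feature(io)] // io::CharsError is unstable at the time of writing

use std::io::CharsError;

// Invalid UTF-8 becomes U+FFFD; the first I/O error ends the iteration.
struct CharsLossy<I> {
    inner: I,
    done: bool,
}

impl<I> Iterator for CharsLossy<I>
where
    I: Iterator<Item = Result<char, CharsError>>,
{
    type Item = char;

    fn next(&mut self) -> Option<char> {
        if self.done {
            return None;
        }
        match self.inner.next() {
            Some(Ok(c)) => Some(c),
            // Decoding error: substitute the replacement character and keep going.
            Some(Err(CharsError::NotUtf8)) => Some('\u{FFFD}'),
            // Underlying I/O error: stop at the first one.
            Some(Err(CharsError::Other(_))) => {
                self.done = true;
                None
            }
            None => None,
        }
    }
}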


SimonSapin (Contributor) commented Nov 5, 2015

I’m in favor of stabilizing Read::chars eventually, but it’s not ready yet:

It’s unstable because

the semantics of a partial read/write of where errors happen is currently unclear and may change

(The same would apply to chars_lossy.)

This behavior should be per Unicode Standard §5.22 "Best Practice for U+FFFD Substitution" http://www.unicode.org/versions/Unicode8.0.0/ch05.pdf#G40630

Roughly, that means stopping at the first unexpected byte. This is not the behavior currently implemented, which reads as many bytes as indicated by the first byte and then checks them. This is a problem as, with only Read (as opposed to, say, BufRead), you can’t put a byte "back" in the stream after reading it.

Here are some failing tests.

        let mut buf = Cursor::new(&b"\xf0\x9fabc"[..]);
        let mut chars = buf.chars();
        assert!(match chars.next() { Some(Err(CharsError::NotUtf8)) => true, _ => false });
        assert_eq!(chars.next().unwrap().unwrap(), 'a');
        assert_eq!(chars.next().unwrap().unwrap(), 'b');
        assert_eq!(chars.next().unwrap().unwrap(), 'c');
        assert!(chars.next().is_none());

        let mut buf = Cursor::new(&b"\xed\xa0\x80a"[..]);
        let mut chars = buf.chars();
        assert!(match chars.next() { Some(Err(CharsError::NotUtf8)) => true, _ => false });
        assert_eq!(chars.next().unwrap().unwrap(), 'a');
        assert!(chars.next().is_none());

        let mut buf = Cursor::new(&b"\xed\xa0a"[..]);
        let mut chars = buf.chars();
        assert!(match chars.next() { Some(Err(CharsError::NotUtf8)) => true, _ => false });
        assert_eq!(chars.next().unwrap().unwrap(), 'a');
        assert!(chars.next().is_none());

I’ve looked at fixing this, but it basically involves duplicating all of the UTF-8 decoding logic from str::from_utf8, which I’m not really happy with. (That many more tests would need to be added.) I’ll try to think of some way to have a more basic decoder that can be used by both.


sfackler (Member) commented Nov 5, 2015

Moving chars to BufRead does not seem unreasonable.


abonander (Contributor) commented Nov 5, 2015

@sfackler I concur. I was thinking the exact same thing. In fact, it is unfortunate that Read::bytes() is already stabilized because, like chars(), it is almost always preferable to have it on a buffered source. A lot of the Read types really do not tolerate small, frequent reads well (#28073)


alexcrichton (Member) commented Nov 6, 2015

While I agree that chars is useful, questions like those brought up by @SimonSapin make me think that it's too weighty/controversial to be in the standard library. I'm not sure that this is really ever going to be seriously used either due to the performance concerns, and I agree that bytes is a bit of a misnomer in that it's super non-performant much of the time as well.


SimonSapin (Contributor) commented Nov 6, 2015

How does bytes perform when used with BufReader<_>?


abonander (Contributor) commented Nov 6, 2015

@SimonSapin there were some perf numbers reading from a file with bytes in #28073. The difference is pretty significant.


abonander (Contributor) commented Nov 6, 2015

I think chars would work pretty well if implemented on top of a buffer. It doesn't have to consume any bytes it doesn't need to, so no unnecessary loss of data.


softprops commented Nov 6, 2015

I may be doing something hugely inefficient here, but I really needed Read::chars for a crate I was working on that attempted to produce an iterator of rustc-serialize JSON items from a Read stream. I had to fall back on vendoring most of the source around Read::chars as a workaround to get it working on stable a while back. It was almost a requirement, as the rustc-serialize JSON builder requires an iterable of chars. It would be really nice to have this stabilized, even if under some method name that implied lossiness.


alexcrichton (Member) commented Nov 6, 2015

@SimonSapin

I would expect it to perform much better than on a raw File because the number of syscalls issued is far less, but on the other hand it is likely much slower than another equivalent operation such as mmap + iterate the bytes via a slice.


@cybergeek94

Yeah I suspect the performance isn't abysmal, but when compared to str::chars it's likely still much slower (just more stuff that has to be checked during I/O)


@softprops

Yeah that's actually a case where I believe chars is inappropriate because it's just going to be inherently slower than any equivalent operation of "iterate over the characters of this string". For example a json deserializer may want to take an iterator of characters, but the actual iterator does something like BufReader where it reads a huge string, then literally uses str::chars, only reading a new string once it's exhausted (handling the boundary).


gyscos commented Nov 6, 2015

@alexcrichton
Wouldn't a chars method on a BufReader-backed Read serve this exact purpose? It reads a big slice of bytes, hopefully valid UTF-8, which is exactly what a string would be; shouldn't iterating chars over those bytes then cost about the same as using str::chars?


softprops commented Nov 6, 2015

Yeah that's actually a case where I believe chars is inappropriate because it's just going to be inherently slower than any equivalent operation of "iterate over the characters of this string"

In this case it's literally a stream of JSON I'm working with. My use case was interfacing with Docker's streaming JSON endpoints, where JSON objects are pushed through a streamed response. I'm not sure how I'd accomplish that with a string.


SimonSapin (Contributor) commented Nov 6, 2015

@alexcrichton Handling the boundary is tricky. It’s handled in https://github.com/SimonSapin/rust-utf8/blob/master/lib.rs


alexcrichton (Member) commented Nov 6, 2015

@gyscos hm yes, I guess it would! That would definitely mean that chars could only be on BufReader.

@softprops in theory you can transform an iterator of &str slices into an iterator of char values, which is kinda what BufReader would be doing
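A sketch of that transformation, assuming the chunk producer already splits its &str slices on UTF-8 boundaries (which is exactly the hard part a BufReader-based implementation would have to handle); chars_of is a hypothetical helper, not a std API:

fn chars_of<'a>(chunks: impl Iterator<Item = &'a str> + 'a) -> impl Iterator<Item = char> + 'a {
    // Flatten each string chunk into its chars.
    chunks.flat_map(str::chars)
}

All of the interesting work then lives in producing chunks that end on char boundaries.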


gyscos commented Nov 6, 2015

So a solution is to move chars to BufRead, as @sfackler mentioned. I was worried this would force us to needlessly wrap some Readers, but I just noticed Cursor<Vec<u8>> and Stdin already provide BufRead access.
But all of that is about the performance aspect.
Concerning error handling, BufRead::lines returns an iterator over io::Result<String> to properly propagate errors, including invalid UTF-8 (reported as ErrorKind::InvalidData). Couldn't chars do the same, instead of using a custom error type?


alexcrichton (Member) commented Nov 6, 2015

Yeah, I have a feeling that would be sufficient. There's more fine-grained error information we could give (such as the bytes that were read, if any), but iterators like lines are already lossy in that you lose access to the underlying data if there's an error, so it's probably not too bad.

It does raise a question, though: if chars exists only on BufRead for performance reasons, why doesn't bytes exist only on BufRead as well? Just a minor inconsistency.


sfackler (Member) commented Nov 6, 2015

I think the reasoning to move chars to BufRead is for correctness wrt @SimonSapin's comment - to be able to peek at the next byte without consuming it.


abonander (Contributor) commented Nov 7, 2015

The performance issue being alleviated is only a beneficial side effect; bytes doesn't require a buffer for correctness.


alexcrichton (Member) commented Nov 7, 2015

Hm, I don't think this is a correctness problem that can be solved by "just moving to BufRead"; it's always possible that the buffer is 1 byte in size, so it behaves the same as reading one byte at a time. Basically, BufRead just reduces the number of calls to read, but it essentially always has the same problems w.r.t. partial reads, short reads, and consuming data.


gyscos commented Nov 7, 2015

@SimonSapin was saying that BufReader could potentially put things back in the buffer, but it's not actually exposed by the BufRead trait.


sfackler (Member) commented Nov 7, 2015

You don't put things back in the buffer, you just don't call consume on things you don't want to, right?
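A tiny sketch of that pattern, using only the stable BufRead API (peek_byte is an illustrative helper, not a std method):

use std::io::{self, BufRead};

fn peek_byte<R: BufRead>(reader: &mut R) -> io::Result<Option<u8>> {
    let buf = reader.fill_buf()?;  // look at the buffered bytes without consuming them
    Ok(buf.first().copied())       // the caller decides whether to call reader.consume(1)
}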


abonander (Contributor) commented Nov 7, 2015

it's always a possibility that the buffer is 1 byte in size so it behaves the same as reading one byte at a time.

If only BufRead had an API to force an early read into the buffer... but I digress. I closed that RFC because I didn't have the time or the energy to argue for it.

The iterator can always consume the last bytes out of the buffer to force it to read more. If the next read is too short (a partial read returning < 4 bytes? honestly, does that even happen?), it can be considered an error just like if we hit EOF while expecting more bytes.


gyscos commented Nov 10, 2015

Compared to a Read, the BufRead makes it at least possible to be correct: since it should only consume the longest correct subsequence, it could read the bytes one by one and consume them only when they keep the subsequence valid. Only, as @SimonSapin said, this would mean duplicating some code for the checks.
So it is possible to implement the correct behaviour for BufRead (while it is not for Read), but it is inconvenient if the reader has too short a buffer, or only returns too short reads.
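A sketch of that byte-by-byte approach on top of BufRead, assuming str::from_utf8 plus today's Utf8Error::error_len (which returns None when the input merely ends too early). It consumes only the bytes of the char it decodes, reports invalid input as ErrorKind::InvalidData, and for simplicity gives up on sequences that straddle a fill_buf boundary; read_char is an illustrative helper, not a std method:

use std::io::{self, BufRead, ErrorKind};
use std::str;

fn read_char<R: BufRead>(reader: &mut R) -> io::Result<Option<char>> {
    let (c, len) = {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return Ok(None); // clean EOF
        }
        let mut found = None;
        // A char is at most 4 bytes: try successively longer prefixes.
        for len in 1..=buf.len().min(4) {
            match str::from_utf8(&buf[..len]) {
                Ok(s) => {
                    found = Some((s.chars().next().unwrap(), len));
                    break;
                }
                // error_len() == None means "incomplete": try one more byte.
                Err(e) if e.error_len().is_none() => continue,
                // Anything else is definitely invalid UTF-8; note that the bad
                // bytes are left unconsumed in the buffer.
                Err(_) => return Err(io::Error::new(ErrorKind::InvalidData, "invalid UTF-8")),
            }
        }
        match found {
            Some(pair) => pair,
            // The sequence continues past this buffer chunk (the hard case).
            None => return Err(io::Error::new(ErrorKind::UnexpectedEof, "incomplete UTF-8 sequence")),
        }
    };
    reader.consume(len); // consume exactly the bytes of the decoded char
    Ok(Some(c))
}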


sfackler (Member) commented Nov 10, 2015

The too-short read problem is independent of using a BufRead or a Read. The underlying Read could return data 1 byte at a time itself if it was feeling particularly pathological. A BufReader that only has 1 byte remaining in its buffer is only going to return that one byte in the next call.


gyscos commented Nov 10, 2015

I'm not sure I understand the too-short read problem.

If the reader stops returning anything in the middle of a codepoint, it's obviously an error.
If it doesn't, we can read the bytes from the BufRead one by one, and consume them until we find an invalid byte (which we don't consume). This works even with a BufReader using a 1-byte buffer.
This is completely impossible with a simple Read, since we must consume each byte as we read it, preventing us from leaving the first invalid byte untouched.
In this regard, BufRead vs Read does change what is possible to achieve.


sfackler (Member) commented Nov 10, 2015

Yeah, I was commenting on "but it is inconvenient if the reader has too short a buffer, or only returns too short reads." That's just a thing you have to deal with.


gyscos commented Nov 10, 2015

I see.

To recap, concerning Chars, the options are (apart from taking it out of tree):

  1. Not follow the UTF-8 best practices on short reads, and either:
    i. sometimes consume too much from the BufRead in order to force a new read, or
    ii. return an error on a short read.
  2. Follow the UTF-8 best practices, with proper byte-by-byte verification, by either:
    i. duplicating code from str::from_utf8 (actually str::run_utf8_validation_iterator) and using it from here, or
    ii. sharing code with str, by:
      a. exposing code from the str module (making it public?) - this will need some modification in the str module, or
      b. moving the Chars functionality to the str module (it is basically a streamed UTF-8 decoder; it could be more general than just Read?).

Note that not following the UTF-8 best practices means BufRead does not add any correctness, "just" some performance improvements.

I like option 2.ii.a, but maybe it's too much of an impact for this Chars problem?


BurntSushi (Member) commented Dec 1, 2015

Given the number of details here, I think stabilizing a chars method on BufRead/Read should probably require an RFC, especially if we start messing with the str module API. It seems like that makes it (and the other methods in this tracking issue) an ideal candidate to evolve out of tree.


alexcrichton (Member) commented Dec 3, 2015

The libs team discussed this during triage today and the conclusion was to deprecate tee and broadcast while leaving chars unstable. Further action there can hold off on an RFC.


ArtemGr (Contributor) commented Dec 13, 2015

These functions seem like excellent candidates to move out-of-tree into an ioutil crate or something like that. This deprecation won't actually happen until that crate exists, however.

So is there a crate? Sorry if I missed one.


taralx (Contributor) commented Jun 21, 2016

I still have one major objection to #33801 that I've noted in that PR.


jwilm commented Jul 1, 2016

My concern with #33801 relates to partial reads. As it stands, there's no way to know how much data was consumed in the case of a Read source like &[u8]. Some ways to fix that are: returning the number of consumed bytes in Utf8Error::IncompleteUtf8 (not great), returning the number of bytes left over in that variant, or returning the specific leftover bytes in that variant.


jwilm commented Jul 1, 2016

After thinking on InvalidUtf8 and IncompleteUtf8 errors some more, I've come to the conclusion they need to include the problematic bytes. This nicely handles the case of a partial read when using a &[u8] (a consumer can just stick the bytes in the front of the buffer before reading again), and an InvalidUtf8 could be handled however the consumer sees fit.
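A sketch of that carry-over step, assuming the caller tracks how many trailing bytes of the buffer were left undecoded; refill and its parameters are illustrative, not an existing API:

use std::io::{self, Read};

// Assumes the `leftover` undecoded bytes currently sit at the very end of `buf`.
fn refill<R: Read>(reader: &mut R, buf: &mut [u8], leftover: usize) -> io::Result<usize> {
    let len = buf.len();
    // Move the undecoded tail to the front of the buffer...
    buf.copy_within(len - leftover..len, 0);
    // ...then append fresh data after it.
    let n = reader.read(&mut buf[leftover..])?;
    Ok(leftover + n) // bytes now available for decoding
}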


Manishearth (Member) commented Jan 5, 2017

🔔 This issue is now entering its final comment period 🔔

@alexcrichton Looks like this was forgotten? Anything that needs to be done here to stabilize this?


jwilm commented Jan 5, 2017

I've published a crate called utf8parse which has a dramatically different API for parsing a &[u8] as UTF-8. I don't believe it's a complete solution to the problems discussed here, but perhaps showing it here may be helpful. Let me give you a quick summary of the utf8parse API, and then I'll compare it with the issues being discussed with the Utf8Chars API.

There are two types in utf8parse, a Parser and a Receiver. First, the Receiver type looks like this:

/// Handles codepoint and invalid sequence events from the parser.
pub trait Receiver {
    /// Called whenever a codepoint is parsed successfully
    fn codepoint(&mut self, char);

    /// Called when an invalid_sequence is detected
    fn invalid_sequence(&mut self);
}

After creating a Parser, it's only got one useful method:

fn advance<R>(&mut self, receiver: &mut R, byte: u8)
    where R: Receiver

One byte is pushed at a time into the parser, and occasionally codepoints/errors are provided to the Receiver. Driving the parser looks like this:

let bytes = bytes_from_somewhere();
let mut receiver = make_my_receiver();
let mut parser = Parser::new();

for &byte in &bytes {
    parser.advance(&mut receiver, byte);
}

Not as nice as the Utf8Chars iterator, but it has some advantages. First, the code driving the parser can know exactly where an error is encountered. For example, it would be possible to keep track of indices when a valid codepoint is encountered and when an invalid codepoint is identified. In this way, it's possible to know the bad bytes. Additionally, if the end of the buffer is reached without receiving a codepoint at the same time, the incomplete bytes would be known based on the byte index of the last valid codepoint.

In summary, this solves including bytes in the Incomplete/InvalidUtf8 error types at the cost of a non-iterator API. It separates out fetching bytes from some Read type and actually parsing those bytes as UTF-8.

Maybe it makes sense for the standard library to include a low level UTF-8 parser like this that can be tailored for different situations - instead of this use-case specific iterator based solution.


alexcrichton (Member) commented Jan 6, 2017

@Manishearth that was basically 6 months ago at this point, and unfortunately I've forgotten the context of this in the interim.


Manishearth (Member) commented Jan 6, 2017

cc @rust-lang/libs reviving this


derekdreery (Contributor) commented Jan 6, 2017

Does this sound correct?:

The issue could be summarised as: how to handle reading chars over a data stream, specifically how to handle incomplete UTF-8 byte sequences and invalid bytes (w.r.t. UTF-8).

It's hard because when you encounter an incomplete sequence of bytes that may become valid with more data, it could either be an error or simply mean you need to read more bytes (it's ambiguous).

If you know it's incomplete, you may want to call Read again and retry with the incomplete part prepended, but if it's incomplete because something has errored, you want to return the error along with the offending bytes.

It seems the consensus is that for both errors and incomplete bytes, you return a value whose enum variant says whether it is an error or a possibly incomplete sequence, along with the bytes. It's then the responsibility of a higher-level iterator to decide how to handle these cases (as not all use cases will want to handle them the same way).
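Something along these lines, perhaps (the names are illustrative, not an actual std API):

enum CharsReadError {
    /// Bytes that can never begin a valid UTF-8 sequence.
    Invalid(Vec<u8>),
    /// A prefix that might still become valid once more input arrives.
    Incomplete(Vec<u8>),
    /// An error from the underlying reader.
    Io(std::io::Error),
}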


lotabout added a commit to lotabout/skim that referenced this issue Jan 19, 2017

close #49, compile on stable
Previously, skim relied on nightly Rust for `io::chars`.
Now it uses the utf8parse crate instead.
Check rust-lang/rust#27802 (comment)

brookst added a commit to brookst/skim that referenced this issue Jan 19, 2017

Replace unstable chars usage
Use private char iterator as done in kkawakam/rustyline#38 while waiting
for stabilisation of the chars method per rust-lang/rust#27802
This removes the need for `#[feature(io)]` letting skim compile on rust
stable.

SimonSapin referenced this issue Feb 19, 2017: Tracking issue for 1.0.0 tracking issues #39954 (open)
SimonSapin (Contributor) commented Mar 4, 2017

TL;DR: I think it is very hard to come up with an abstraction that: is zero-cost, covers all use cases, and is not terrible to use.

I’m in favor of deprecating and eventually removing this with no in-std replacement.


I think that anything that looks at one u8 or one char at a time is gonna have abysmal performance. Instead we probably want &str slices that reference fragments of some [u8] buffer.

I spent some time thinking of a low-level API that would make no assumptions about how one would want to use it ("pushing" vs "pulling" bytes and string slices, buffer allocation strategy, error handling, etc.) I came up with this:

pub fn decode_utf8(input: &[u8]) -> DecodeResult { /* ... */ }

pub enum DecodeResult<'a> {
    Ok(&'a str),

    /// These three slices cover all of the original input.
    /// `decode` should be called again with the third one as the new input.
    Error(&'a str, InvalidSequence<'a>, &'a [u8]),

    Incomplete(&'a str, IncompleteChar),
}

pub struct InvalidSequence<'a>(pub &'a [u8]);

pub struct IncompleteChar {
    // Fields are private. They include a [u8; 4] buffer.
}

impl IncompleteChar {
    pub fn try_complete<'char, 'input>(&'char mut self, mut input: &'input [u8])
                                       -> TryCompleteResult<'char, 'input> { /* ... */ }
}

pub enum TryCompleteResult<'char, 'input> {
    Ok(&'char str, &'input [u8]),  // str.chars().count() == 1
    Error(InvalidSequence<'char>, &'input [u8]),
    StillIncomplete,
}

It’s complicated. It requires the user to think about a lot of corner cases, especially around IncompleteChar. Explaining how to properly use it takes several paragraphs of docs.

We can hide some of the details with a stateful decoder:

pub struct Decoder { /* Private. Also with a [u8; 4] buffer. */ }

impl Decoder {
    pub fn new() -> Self;

    pub fn decode<'decoder, 'input>(&'decoder mut self, &'input [u8])
                                    -> DecoderResult<'decoder, 'input>;

    /// Signal that there is no more input.
    /// The decoder might contain a partial `char` which becomes an error.
    pub fn end<'decoder>(&'decoder self) -> Result<(), InvalidSequence<'decoder>>;
}

/// Order of fields indicates order in the input
pub struct DecoderResult<'decoder, 'input> {
    /// str in the `Ok` case is either empty or one `char` (up to 4 bytes)
    pub partially_from_previous_input_chunk: Result<&'decoder str, InvalidSequence<'decoder>>,

    /// Up to the first error, if any
    pub decoded: &'input str,

    /// Whether we did find an error
    pub error: Result<(), InvalidSequence<'input>>,

    /// Call `decoder.decode()` again with this, if non-empty
    pub remaining_input_after_error: &'input [u8],
}

/// Never more than 3 bytes.
pub struct InvalidSequence<'a>(pub &'a [u8]);

Even so, it’s very easy to misuse, for example by ignoring part of DecoderResult. Using a tuple instead of a struct makes it more visible when a field is ignored, but then we can’t name the fields to help explain which is which.

Either of these is complicated enough that I don’t think it belongs in libcore or libstd.


SimonSapin (Contributor) commented Mar 4, 2017

By the way, I’ve submitted #40212 to make something like the above (possibly specialized to one use case) easier to implement outside of std based on std::str::from_utf8. Doing so avoids re-implementing most of the decoding logic, and allows taking advantage of optimizations in std like #30740.


taralx (Contributor) commented Mar 5, 2017

I would support deprecating Read::chars in favor of seeing what the community can produce.


SimonSapin (Contributor) commented Mar 6, 2017

Another attempt turned out almost nice:

pub struct Decoder { 
    buffer: [u8; 4],
    /* ... */
}

impl Decoder {
    pub fn new() -> Self { /* ... */ }
    pub fn next_chunk<'a>(&'a mut self, input_chunk: &'a [u8]) -> DecoderIter<'a> { /* ... */ }
    pub fn last_chunk<'a>(&'a mut self, input_chunk: &'a [u8]) -> DecoderIter<'a> { /* ... */ }
}

pub struct DecoderIter<'a> {
    decoder: &'a mut Decoder,
    /* ... */ 
}

impl<'a> Iterator for DecoderIter<'a> {
    type Item = Result<&'a str, &'a [u8]>;
}

Except it doesn’t work. &'a str in the result conflicts with &'a mut Decoder in the iterator. (It is sometimes backed by buffer in the decoder.) The result needs to borrow the iterator, which means the Iterator trait can’t be implemented, and for loops can’t be used:

impl<'a> DecoderIter<'a> {
    pub fn next(&mut self) -> Option<Result<&str, &[u8]>> { /* ... */ }
}
    let mut iter = decoder.next_chunk(input);
    while let Some(result) = iter.next() {
         // ...
    }

This compiles, but something like String::from_utf8_lossy(&[u8]) -> Cow<str> can’t be implemented on top of it because str fragments always borrow the short-lived decoder (and iterator), not only the original input.

We can work around that by adding enough lifetime parameters and one weird enum… but yeah, no.

pub struct Decoder { /* ... */ }

impl Decoder {
    pub fn new() -> Self { /* ... */ }

    pub fn next_chunk<'decoder, 'input>(&'decoder mut self, input_chunk: &'input [u8])
                                        -> DecoderIter<'decoder, 'input> { /* ... */ }

    pub fn last_chunk<'decoder, 'input>(&'decoder mut self, input_chunk: &'input [u8])
                                        -> DecoderIter<'decoder, 'input> { /* ... */ }
}

pub struct DecoderIter<'decoder, 'input> { /* ... */ }

impl<'decoder, 'input> DecoderIter<'decoder, 'input> {
    pub fn next<'buffer>(&'buffer mut self) 
        -> Option<Result<EitherLifetime<'buffer, 'input, str>,
                         EitherLifetime<'buffer, 'input, [u8]>>> { /* ... */ }
}

pub enum EitherLifetime<'buffer, 'input, T: ?Sized + 'static> {
    Buffer(&'buffer T),
    Input(&'input T),
}

impl<'buffer, 'input, T: ?Sized> EitherLifetime<'buffer, 'input, T> {
    pub fn get<'a>(&self) -> &'a T where 'buffer: 'a, 'input: 'a {
        match *self {
            EitherLifetime::Input(x) => x,
            EitherLifetime::Buffer(x) => x,
        }
    }
}
taralx (Contributor) commented Mar 7, 2017

Except it doesn’t work. &'a str in the result conflicts with &'a mut Decoder in the iterator.

Can you elaborate? I don't follow here.


SimonSapin (Contributor) commented Mar 7, 2017

@taralx

  • DecoderIter<'a> contains &'a mut Decoder.
  • Decoder contains a [u8; 4] buffer.
  • The DecoderIter::next method takes &'b mut self. This implies 'a: 'b ('a outlives 'b).
  • In some cases, the next method wants to return a borrow of the buffer with (simplified) std::str::from_utf8_unchecked(&self.decoder.buffer).
  • Since this borrow is going through &'b mut self, its lifetime cannot be longer than 'b. This conflicts with the 'a: 'b requirement. If we could somehow make that borrow (such as with unsafe), then &mut Decoder in DecoderIter would no longer be exclusive since there’s another borrow of part of it (the buffer).
  • The solution is to return &'b str instead of &'a str. But the Iterator trait cannot express that since the return type of next is Option<Self::Item>, and there is no way to include the lifetime of the iterator itself in the associated type Item.

Perhaps it’s clearer with code. This does not compile: https://gist.github.com/anonymous/0587b4484ec9a15f5c5ce6908b3807c1, unless you change Result<&'a str, &'a [u8]> to Result<&str, &[u8]> (borrowing self) and make next an inherent method instead of an Iterator impl.


wilysword commented Sep 5, 2017

I tend to agree that this should be removed from std, though for a slightly different reason: I'd like the interface for "returning a stream of chars from an io::Reader" to work with any encoding, not just UTF-8. If the general purpose of io::Read is to represent a stream of arbitrary bytes, then some other trait should represent a stream of decoded characters (though for completeness, I would probably also have such a trait implement io::Read). This would be similar to how you can chain streams in other languages: Decoder(BufferedReader(FileReader("somefile.txt"))). Unless/until general encoding functionality lives in std, I don't see much benefit in such a half-hearted implementation.


SimonSapin (Contributor) commented Sep 5, 2017

hsivonen/encoding_rs#8 has some discussion of Unicode stream and decoders for not-only-UTF-8 encodings.


SimonSapin (Contributor) commented Mar 30, 2018

The libs team discussed this and consensus was to deprecate the Read::chars method and its Chars iterator.

@rfcbot fcp close

Code that does not care about processing data incrementally can use Read::read_to_string instead. Code that does care presumably also wants to control its buffering strategy and work with &[u8] and &str slices that are as large as possible, rather than one char at a time. It should be based on the str::from_utf8 function as well as the valid_up_to and error_len methods of the Utf8Error type. One tricky aspect is dealing with cases where a single char is represented in UTF-8 by multiple bytes where those bytes happen to be split across separate read calls / buffer chunks. (Utf8Error::error_len returning None indicates that this may be the case.) The utf-8 crate solves this, but in order to be flexible provides an API that probably has too much surface to be included in the standard library.

Of course the above is for data that is always UTF-8. If other character encodings need to be supported, consider using the encoding_rs or encoding crate.
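A minimal sketch of that recipe (not the utf-8 crate's API; for_each_str_chunk, the fixed 8 KiB buffer, and the lossy replacement of invalid bytes are illustrative choices, and it leans on today's slice::copy_within for the carry-over): decode each chunk with str::from_utf8, hand over the prefix reported by valid_up_to(), replace definitely-invalid bytes, and carry a possibly-incomplete tail (error_len() == None) into the next read.

use std::io::{self, Read};
use std::str;

fn for_each_str_chunk<R, F>(mut reader: R, mut f: F) -> io::Result<()>
where
    R: Read,
    F: FnMut(&str),
{
    let mut buf = [0u8; 8192];
    let mut filled = 0; // number of bytes currently held in `buf`
    loop {
        let n = reader.read(&mut buf[filled..])?;
        let eof = n == 0;
        filled += n;
        let mut consumed = 0;
        loop {
            match str::from_utf8(&buf[consumed..filled]) {
                Ok(s) => {
                    if !s.is_empty() {
                        f(s);
                    }
                    consumed = filled;
                    break;
                }
                Err(e) => {
                    // Hand over the prefix that did decode.
                    let valid = e.valid_up_to();
                    if valid > 0 {
                        f(str::from_utf8(&buf[consumed..consumed + valid]).unwrap());
                    }
                    consumed += valid;
                    match e.error_len() {
                        // Definitely invalid bytes: replace and skip them.
                        Some(len) => {
                            f("\u{FFFD}");
                            consumed += len;
                        }
                        // The tail *might* be the start of a char that continues
                        // in the next chunk: keep it and read more.
                        None => break,
                    }
                }
            }
        }
        // Carry the undecoded tail (at most 3 bytes) to the front of the buffer.
        buf.copy_within(consumed..filled, 0);
        filled -= consumed;
        if eof {
            if filled > 0 {
                f("\u{FFFD}"); // an incomplete char at EOF can never be completed
            }
            return Ok(());
        }
    }
}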


rfcbot commented Mar 30, 2018

Team member @SimonSapin has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.


rfcbot commented Apr 4, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔


rfcbot commented Apr 14, 2018

The final comment period is now complete.


kennytm added a commit to kennytm/rust that referenced this issue Apr 24, 2018

SimonSapin (Contributor) commented Apr 24, 2018

Deprecated in #49970
