Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Statically enforce Unicode in std::fmt #526

Merged
merged 1 commit into from Dec 30, 2014

Conversation

Projects
None yet
8 participants
@alexcrichton
Copy link
Member

alexcrichton commented Dec 15, 2014

Statically enforce that the std::fmt module can only create valid UTF-8 data
by removing the arbitrary write method in favor of a write_str method.

Rendered

RFC: Statically enforce Unicode in std::fmt
Statically enforce that the `std::fmt` module can only create valid UTF-8 data
by removing the arbitrary `write` method in favor of a `write_str` method.
@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented Dec 15, 2014

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Dec 15, 2014

👍

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Dec 15, 2014

The fmt::Writer trait can also be located as io::TextWriter instead to emphasize its possible future connection with I/O, although there are not concrete plans today to develop these connections.

It is unclear to what degree a fmt::Writer needs to interact with io::Writer and the various adaptors/buffers. For example one would have to implement their own BufferedWriter for a fmt::Writer.

I think it’s fine for this trait to be in std::fmt. My current plan is to consider it an implementation detail of formatting, experiment on crates.io with separate traits for general-purpose Unicode streams, and if that proves successful propose adding them to std at some point after Rust 1.0. Then, std::fmt::Writer could (maybe, if it’s compatible) become a re-export of std::io::TextWriter (or whatever it’ll end up being named.)

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Dec 15, 2014

I seem to remember from way back when I was working on the base64 and hex modules that push_char was significantly slower than push_byte. Would it be worth adding an unsafe write_u8 as well for things like {, ,, etc? Note that I haven't actually checked that this is still the case :)

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented Dec 15, 2014

I'd probably be more in favor of an unsafe function to go from &'a u8 to &'a str and then calling write_str instead, would you be ok with that?

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Dec 15, 2014

@sfackler Looks like String::push (formerly push_char) doen’t have a fast path for ASCII (single-byte) code points. Would that help?

Also, if the char/byte is literal, how does .push_str("{") (with a &str of length 1) compare?


```rust
pub trait Writer {
fn write_str(&mut self, data: &str) -> Result;

This comment has been minimized.

@Ericson2314

Ericson2314 Dec 15, 2014

Contributor

Doesn't Result need to be instantiated?

This comment has been minimized.

@SimonSapin

This comment has been minimized.

@Ericson2314

Ericson2314 Dec 15, 2014

Contributor

Ah ok, thanks. I somehow missed that one looking for other Results.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Dec 15, 2014

@SimonSapin I'd imagine that'd help, but I ran into this like ~1.5 years ago so my memory's a bit rusty :)

@alexcrichton Yeah, implementations just transmuting from &[u8] to &str seems fine for now.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Dec 15, 2014

@sfackler You can always used str::from_utf8_unchecked, but that seems unnecessary for the example you gave where you can just have a "{" string literal. (As opposed to '{'.)

@Sgeo

This comment has been minimized.

Copy link

Sgeo commented Dec 16, 2014

Could it be confusing for newcomers for the stdlib to have two traits called Writer? Having to check to see which is in use, etc.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Dec 16, 2014

@Sgeo I think we more or less have a convention to take advantage of the namespacing provided by modules and not try to duplicate it within names. Rather than have use std::fmt::Writer; and later use Writer (which is unclear), you can (and maybe should) have use std::fmt; and then use fmt::Writer.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Dec 30, 2014

This RFC, which represents a longstanding request to enforce a strong Unicode convention and appears to have no real downsides, has been accepted. Tracking issue.

@lfairy

This comment has been minimized.

Copy link
Contributor

lfairy commented Jan 7, 2015

I notice that in the "after" benchmark, .to_string() is still about 3x slower for small strings. Can we do anything about the remaining overhead?

@alexcrichton alexcrichton deleted the alexcrichton:fmt-text-writer branch Jan 7, 2015

@Centril Centril added the A-fmt label Nov 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.