RFC: io and os reform: initial skeleton #517

aturon · 2014-12-12T06:35:30Z

This RFC proposes a significant redesign of the std::io and std::os modules
in preparation for API stabilization. The specific problems addressed by the
redesign are given in the Problems section below, and the key ideas of the
design are given in Vision for IO.

Note about RFC structure

This RFC was originally posted as a single monolithic file, which made
it difficult to discuss different parts separately.

It has now been split into a skeleton that covers (1) the problem
statement, (2) the overall vision and organization, and (3) the
std::os module.

Other parts of the RFC are marked with (stub) and will be filed as
follow-up PRs against this RFC.

Rendered

aturon · 2014-12-12T06:36:22Z

cc @carllerche @wycats @alexcrichton

carllerche · 2014-12-12T06:53:03Z

I will have to read this in more detail tomorrow, but I just wanted to mention that it seems that adding fn skip(...) to the Reader trait has not been mentioned. This would handle issue rust-lang/rust#13989.
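For reference, the requested operation can be expressed in today's std without a dedicated method. You copy a `take`-limited view of the reader into `io::sink()`. The name `skip` below is hypothetical. It is a sketch of the idea, not a method of `std::io::Read`.

```rust
use std::io::{self, Read};

/// Hypothetical `skip`: discard up to `n` bytes from `r` and return how many
/// were actually skipped (fewer than `n` only if the stream ends first).
fn skip<R: Read>(r: &mut R, n: u64) -> io::Result<u64> {
    // `by_ref` lets `take` borrow the reader instead of consuming it,
    // so the caller can keep reading after the skipped bytes.
    io::copy(&mut r.by_ref().take(n), &mut io::sink())
}
```

Because the reader is only borrowed, later reads continue right after the skipped region.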

sfackler · 2014-12-12T06:59:36Z

text/0000-io-os-reform.md

+    // these all return partial results on error
+    fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... }
+    fn read_to_string(&mut self) -> NonatomicResult<String, Vec<u8>, Err> { ... }
+    fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err>  { ... }


It's never been totally clear to me what the exact use case for this is. Is this method ever called with min not equal to buf.len()?

Conceptually I've considered this in terms of a buffered reader. For example, if you ask for 10 bytes from a buffered reader, the buffered reader can pass its entire buffer to the underlying reader but request that only 10 bytes actually be read. In that sense I think it's a bit of a performance optimization where you're willing to accept a large number of bytes but only require a few.

I don't think this is implemented or used much in practice, though, so the benefit of the extra argument may be fairly negligible.

Actually, requesting only 10 bytes sounds different from what this function's name describes. For the "only read 10 bytes" case, I'd expect one would pass buffer.slice_to(10) to read (or rather, to some form of read that always reads the full amount requested).

@alexcrichton

In that sense I think it's a bit of a performance optimization

I don't quite understand what kind of performance gain you expect. Is it reducing the number of read()-like calls against BufferedReader, or reducing the number of read() calls BufferedReader makes against the underlying "real" stream, such as File or TcpStream?

The former will save only a negligible number of nanoseconds (if any), because BufferedReader::read() and similar methods are memory operations in user space. The latter is a matter of tuning BufferedReader's internal parameters.

Did I miss anything? My understanding of its behavior is:

let mut b = [0u8, .. 30];
let res = r.read_at_least(10, b.slice_to_mut(20));

res can be any of Ok(10), Ok(15), Ok(20), or Err(PartialResult(5, EndOfFile)). It would be tedious to process the contents of b differently depending on which Ok() value comes back.
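To make those outcomes concrete, here is a sketch of the described semantics against today's `std::io::Read`. It keeps reading until at least `min` bytes are in `buf`, and the total may overshoot `min`. `read_at_least` and the `Chunked` test reader are hypothetical. Unlike the RFC's `NonatomicResult`, the plain `io::Result` used here cannot report how many bytes were read before an error.

```rust
use std::io::{self, Read};

/// Hypothetical `read_at_least`: call `read` until at least `min` bytes have
/// been placed in `buf`, then return the total (which may exceed `min`).
fn read_at_least<R: Read>(r: &mut R, min: usize, buf: &mut [u8]) -> io::Result<usize> {
    assert!(min <= buf.len(), "min must fit in the buffer");
    let mut filled = 0;
    while filled < min {
        match r.read(&mut buf[filled..]) {
            // The stream ended before `min` bytes arrived; the partial count is lost.
            Ok(0) => return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended early")),
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical test reader that hands out at most `chunk` bytes per `read`
/// call, imitating a socket that delivers data in pieces.
struct Chunked<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Chunked<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}
```

With `min = 10` and a 20-byte slice, 5-byte chunks yield 10, 15-byte chunks yield 15, and 20-byte chunks yield 20. A stream with only 5 bytes produces an error, which matches the cases listed above.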

I can't think of any practical usages of read_at_least(). @alexcrichton How would you use it in, say, your tar code?

In the case of a buffered reader, I would consider it a performance optimization in terms of the number of reads of the underlying stream. The buffered reader can pass down a very large buffer but only request that a tiny part gets filled, and if more than that is filled in then it results in, for example, fewer syscalls (in theory).

@alexcrichton The current implementation of BufferedReader simply calls read() with its entire internal buffer. Then it should suffice to have a simpler convenience method like the below one in Reader because it will be "inherited" by BufferedReader:

fn read_exact(&mut self, buf: &mut [u8]) -> NonatomicResult<(), uint, Err> {
    let mut read_so_far = 0;
    while read_so_far < buf.len() {
        match self.read(buf.slice_from_mut(read_so_far)) {
            Ok(n) => read_so_far += n,
            // report how much was read before the error
            Err(e) => return Err(PartialResult(read_so_far, e)),
        }
    }
    Ok(())
}

(cf. PR rust-lang/rust#18059 )
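For comparison, today's std provides `Read::read_exact`. Like the helper above, it keeps calling `read` (retrying on `Interrupted`) until the buffer is full. If the stream ends early, it returns an `UnexpectedEof` error and does not report how many bytes it got. The sketch below uses it to read a fixed-size header. `read_len_prefix` and the 4-byte big-endian framing are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical framing helper: read a 4-byte big-endian length prefix.
fn read_len_prefix<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut len = [0u8; 4];
    // Fails with ErrorKind::UnexpectedEof if fewer than 4 bytes remain.
    r.read_exact(&mut len)?;
    Ok(u32::from_be_bytes(len))
}
```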

sfackler · 2014-12-12T07:05:56Z

text/0000-io-os-reform.md

+        Deadlined { deadline: deadline, inner: inner }
+    }
+
+    pub fn deadline(&self) -> u64 {


s/u64/Duration/
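The point of the suggestion is that a bare `u64` leaves the unit implicit. A `Duration` carries it in the type. Below is a minimal sketch of the wrapper with that change, assuming only the two fields shown in the snippet. The `new` and `inner` methods are hypothetical additions for illustration.

```rust
use std::time::Duration;

/// Hypothetical wrapper modeled on the `Deadlined` snippet, with the
/// deadline stored as a `Duration` instead of a bare `u64`.
pub struct Deadlined<T> {
    deadline: Duration,
    inner: T,
}

impl<T> Deadlined<T> {
    pub fn new(inner: T, deadline: Duration) -> Deadlined<T> {
        Deadlined { deadline, inner }
    }

    /// Callers never have to guess whether this is milliseconds or seconds.
    pub fn deadline(&self) -> Duration {
        self.deadline
    }

    pub fn inner(&self) -> &T {
        &self.inner
    }
}
```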

sfackler · 2014-12-12T07:14:27Z

text/0000-io-os-reform.md

+and `read_char` is removed in favor of the `chars` iterator. These
+iterators will be changed to yield `NonatomicResult` values.
+
+The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay


Are these being renamed? If not, they'll have to live in a different module from the BufferedReader trait, right?

It might also be worth thinking about if we want to keep BufferedStream::with_capacities as is, or have it take a single size that's used for both buffers. I'm not really sure if anyone wants different buffer sizes for readers and writers.

Argh! The renaming to BufferedReader came late and I didn't catch this.

I'm not particularly happy with the trait name. @alexcrichton prefers Buffered, but I worry that we may eventually want something for buffered writers.

Suggestions welcome.

re: with_capacities, I agree; we could simplify for now, and add back this functionality later if needed.
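For comparison, today's `std::io::BufReader` and `BufWriter` each take one capacity through `with_capacity`. The sketch below shows the suggested simplification, using a single size for both halves. `buffered_pair` is a hypothetical helper, not a std API. It also takes separate reader and writer handles rather than one bidirectional stream.

```rust
use std::io::{BufReader, BufWriter, Read, Write};

/// Hypothetical helper: wrap a reader and a writer with buffers of the same
/// capacity, instead of taking separate read and write capacities.
fn buffered_pair<R: Read, W: Write>(cap: usize, r: R, w: W) -> (BufReader<R>, BufWriter<W>) {
    (BufReader::with_capacity(cap, r), BufWriter::with_capacity(cap, w))
}
```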

sfackler · 2014-12-12T07:43:00Z

I'm a bit worried about the timeout changes making some uses of the current infrastructure impossible, or maybe just painful/awkward. For example, rust-postgres provides an iterator over asynchronous notifications sent from the database. A method defined on the iterator is next_block_for, which blocks waiting for a notification for some duration. The implementation sets a read timeout on the socket, reads the first byte, and then unsets the timeout. The assumption is that if we get any data from the server, we can expect a full message to come in fairly quickly after that. The logic required to read half a message and then stop and save it off when we hit the IO timeout is just too complex to bother with.

The current setup works fine, if a bit awkwardly: https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/message.rs#L279-L285
https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/lib.rs#L286-L320

With the new infrastructure, it'll still be possible to take the same strategy, but probably through some kind of gross hackery like reading the first byte with a timeout, and then passing that byte to the main message read function without the timeout.

What would really be ideal is the ability to wait on the socket, for a certain period of time, for data to be ready to read. Is something like that feasible to implement before 1.0 in a cross-platform manner?
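The strategy described above maps onto today's `TcpStream::set_read_timeout`: wait for the first byte under a timeout, then clear the timeout before reading the rest of the message. The sketch below assumes that shape. `wait_for_first_byte` is a hypothetical name, not the rust-postgres code. Per the std docs, a timed-out read typically surfaces as `WouldBlock` on Unix and `TimedOut` on Windows, and a zero `Duration` is rejected as a timeout value.

```rust
use std::io::{self, Read};
use std::net::TcpStream;
use std::time::Duration;

/// Hypothetical helper: wait at most `timeout` (non-zero) for the first byte,
/// then restore fully blocking reads. Returns Ok(None) if the timeout expires.
fn wait_for_first_byte(sock: &mut TcpStream, timeout: Duration) -> io::Result<Option<u8>> {
    sock.set_read_timeout(Some(timeout))?;
    let mut first = [0u8; 1];
    let res = sock.read(&mut first);
    // Clear the timeout before inspecting the result so it never leaks
    // into the reads of the message body.
    sock.set_read_timeout(None)?;
    match res {
        Ok(0) => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "connection closed")),
        Ok(_) => Ok(Some(first[0])),
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => Ok(None),
        Err(e) => Err(e),
    }
}
```

After `Some(byte)` comes back, the caller can read the rest of the message with ordinary blocking reads. The timeout governs only the wait for the first byte.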

SimonSapin · 2014-12-12T10:11:40Z

they involve extending a vector's capacity, and then passing in the resulting uninitialized memory to the read method, which is not marked unsafe! Thus the current design can lead to undefined behavior in safe code.

I don’t understand why this is undefined behavior and unsafe fn with_extra(&mut self, n: uint) -> &mut [T]; on Vec is not.
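The concern in the quoted RFC text is that a safe `Read` implementation is allowed to read from the buffer it is handed. Passing it uninitialized bytes can therefore be undefined behavior with no `unsafe` at the call site. The sketch below shows the safe alternative, at the cost of zero-filling the new tail first. `read_append` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Hypothetical helper: append up to `n` bytes from `r` to `vec` without ever
/// exposing uninitialized memory to `read`.
fn read_append<R: Read>(r: &mut R, vec: &mut Vec<u8>, n: usize) -> io::Result<usize> {
    let start = vec.len();
    // Zero-fill the new tail so `read` only ever sees initialized bytes.
    vec.resize(start + n, 0);
    match r.read(&mut vec[start..]) {
        Ok(got) => {
            // Drop the zeroed bytes that were not actually filled.
            vec.truncate(start + got);
            Ok(got)
        }
        Err(e) => {
            vec.truncate(start);
            Err(e)
        }
    }
}
```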

SimonSapin · 2014-12-12T10:22:42Z

roughly interpreted as UTF-16, but may not actually be valid UTF-16 -- an "encoding" often called UCS-2; see http://justsolve.archiveteam.org/wiki/UCS-2 for a bit more detail.

I like the explanation linked there. Good find. 👍

SimonSapin · 2014-12-12T10:30:15Z

text/0000-io-os-reform.md

+
+    impl OsStr {
+        pub fn from_str(value: &str) -> &OsStr;
+        pub fn as_str(&self) -> Option<&str>;


Should this be -> Result<&str, ()>?

Or as a larger point (perhaps out of scope for this RFC), should Option<T> return types be Result<T, ()> instead when None kind of represents an error, in order to interoperate with try! and other error-handling infrastructure we might add?

I recently changed from_utf8 to return Result<&str, Utf8Error>, so this can probably pick up that error. I suspect this will just continue to return the same value as str::from_utf8.

I do think that in general Option should only be used where None is a normal value, not an error (to use with try! as you pointed out). In the second pass of stabilization we're going to look closely at all this.

A string not being UTF-8 or a file not existing aren't any more of an error than a key not existing in a map. Attempting to open a file or parse text is also a way to discover if what you were looking for was there, just like a map lookup. There are few remaining use cases for Option if it's not meant to be used this way... any missing value can be considered an error, just like a missing file / whatever.
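For reference, today's `OsStr::to_str` returns `Option<&str>`. Where a failed conversion should count as an error, it can be adapted into a `Result` so it composes with `?`, the successor of `try!`. `NotUnicode` and `os_to_str` are hypothetical. The invalid-UTF-8 helper uses the Unix-only `OsStrExt::from_bytes`.

```rust
use std::ffi::OsStr;

/// Hypothetical error type for a failed OsStr -> &str conversion.
#[derive(Debug, PartialEq)]
struct NotUnicode;

/// Turn std's `Option<&str>` into a `Result` so failure propagates with `?`.
fn os_to_str(s: &OsStr) -> Result<&str, NotUnicode> {
    s.to_str().ok_or(NotUnicode)
}

/// Build an OsStr holding bytes that are not valid UTF-8 (Unix only).
#[cfg(unix)]
fn invalid_utf8_os_str() -> &'static OsStr {
    use std::os::unix::ffi::OsStrExt;
    OsStr::from_bytes(b"ab\xff")
}
```

Whether `Option` or `Result` is the better default is exactly the open question above. The adapter just shows that switching between the two is a one-liner.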

netvl · 2014-12-12T10:34:37Z

Currently, Rust's IpAddr type does not support zone indices in IPv6 addresses. They are needed for link-local addresses, which are arguably much more important in IPv6 than in IPv4. Are there plans to address this?
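For reference, today's std carries the zone index as the `scope_id` field of `SocketAddrV6`, not inside `Ipv6Addr` itself. Below is a minimal sketch of building a link-local socket address with an explicit scope. `link_local` is a hypothetical helper, and the scope value (typically an interface index) is up to the caller.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Hypothetical helper: a link-local (fe80::/10) socket address whose zone
/// index travels in `scope_id`.
fn link_local(host_suffix: u16, port: u16, scope_id: u32) -> SocketAddrV6 {
    let ip = Ipv6Addr::new(0xfe80, 0, 0, 0, 0, 0, 0, host_suffix);
    // flowinfo = 0: no flow label
    SocketAddrV6::new(ip, port, 0, scope_id)
}
```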

netvl · 2014-12-12T10:38:21Z

text/0000-io-os-reform.md

+```
+
+In addition, `read_line` is removed in favor of the `lines` iterator,
+and `read_char` is removed in favor of the `chars` iterator (now on


These methods are occasionally very useful when you don't need to read the entire stream but only a few lines or characters. I have several of these in my code base. The supposed replacement

let line = r.lines().next().unwrap()

doesn't look really good.

Also, why is chars() on Reader? Doesn't reading characters require buffering?

Yes, we were thinking that the code you listed would be the replacement for a bare read_line. In general we're trying to move as much functionality to iterators as possible. If this becomes too unergonomic, we could add some form of iterator method that peels off the first element and fails if it's None.

The current implementation for chars() doesn't actually use buffering at all, it just peeks at a byte and then might read some more bytes. We thought that if we're exposing bytes() on Reader which is not speedy unless buffered, then we may as well expose chars() as well.
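As an illustration of the iterator style against today's `std::io::BufRead`, the sketch below pulls a fixed number of lines from a single `lines()` iterator. Today's `BufRead` also still offers `read_line`. `first_two_lines` is a hypothetical helper. A missing line is turned into an error rather than an `unwrap` panic.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: read exactly the first two lines from a buffered
/// reader by pulling from one `lines()` iterator.
fn first_two_lines<R: BufRead>(r: R) -> io::Result<(String, String)> {
    let mut lines = r.lines();
    let missing = || io::Error::new(io::ErrorKind::UnexpectedEof, "fewer than two lines");
    // The first `?` handles a missing line, the second an I/O error.
    let first = lines.next().ok_or_else(missing)??;
    let second = lines.next().ok_or_else(missing)??;
    Ok((first, second))
}
```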

@netvl Note that the .unwrap() part (or something like try!()) is necessary in either case. I don't think

let first = r.lines().next().unwrap();
let second = r.lines().next().unwrap();

or

let mut lines = r.lines();
let first = lines.next().unwrap();
let second = lines.next().unwrap();

looks much worse than

let first = r.read_line().unwrap();
let second = r.read_line().unwrap();

As for chars(), any Unicode character encoded in UTF-8 occupies at most 4 bytes, so read() into a fixed-size array on the stack will suffice.
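Because RFC 3629 limits UTF-8 to at most 4 bytes per character (up to U+10FFFF), a `[u8; 4]` stack buffer is enough. The sketch below reads the lead byte, derives the sequence length from it, reads the remaining bytes, and validates the result. `read_char` is hypothetical and does no buffering of its own.

```rust
use std::io::{self, Read};

/// Hypothetical `read_char`: decode one UTF-8 char through a fixed 4-byte
/// stack buffer. Returns Ok(None) at end of stream.
fn read_char<R: Read>(r: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4];
    if r.read(&mut buf[..1])? == 0 {
        return Ok(None);
    }
    // The lead byte determines how many bytes the sequence has.
    let width = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 lead byte")),
    };
    r.read_exact(&mut buf[1..width])?;
    // from_utf8 rejects overlong or out-of-range sequences the match let through.
    let s = std::str::from_utf8(&buf[..width])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}
```

Each call issues at most two `read`-style calls, so wrapping an unbuffered stream in a buffered reader is still advisable.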

untitaker · 2015-01-23T17:59:43Z

text/0000-io-os-reform.md

+RFC discusses the most significant problems below.
+
+This section only covers specific problems with the current library; see
+[Vision for IO] for a higher-level view.  section.


...for a higher-level view. section.

Typo?

aturon · 2015-01-26T17:08:04Z

The fs sub-RFC has now been posted.

yazaddaruvala · 2015-01-28T05:37:34Z

@aturon given that the eventual goal is to have both blocking and nonblocking implementations of the std::io crate do you think it makes sense to move the new blocking APIs into std::io::blocking rather than just std::io? (replace io with fs or process where it makes sense).

Summary:
std::io::nonblocking could then be created lazily. The naming would let users know that std::io doesn't have a default: there are two options, and it's up to them to pick.

Long Story:
I think we can all agree that blocking io and nonblocking io are just different; neither is really better than the other. They just have different trade-offs and should be used appropriately. Given that:

Option 1: Call the APIs read vs read_async

You're sort of suggesting that async is a special kind of read. Also, since one has 2x the characters, it's slightly discouraging (both to write and to read).
It's not explicit that read is blocking, so it may accidentally be used when it shouldn't be.

Option 2: Call the APIs read_sync vs read_async

Just a bit verbose

Option 3: Call the APIs blocking::read vs nonblocking::read

It would be a little unusual for the same mod to use both types of io. Generally, then (when used exclusively), you would get simple function names in both cases, e.g. read.

I would be happy with either Option 2 or 3, preferring 3. Option 1 just leaves me a little uneasy.

nodakai · 2015-01-28T06:33:13Z

@yazaddaruvala I think it's more practical to have std::io for blocking I/O and std::io::nonblocking for non-blocking I/O. I doubt anyone would complain about the "asymmetry" between them. (Maybe we can lift std::io::blocking to std::io...)

tshepang · 2015-01-28T07:33:32Z

Maybe we can lift std::io::blocking to std::io..

I don't understand this @nodakai.

nodakai · 2015-01-28T07:41:14Z

@tshepang I meant re-export such as std::io::fs::File → std::io::File

yazaddaruvala · 2015-01-29T05:47:25Z

@nodakai I understand the desire to have a "default" io, especially given that it's currently the only io. But is it really the right long-term philosophy?

If the only current difference is import std::io vs import std::io::blocking, is it really worth causing asymmetry, i.e. implicitly endorsing one over the other?

I've only played around with Node.js a little, and some of its ideas are definitely controversial, but I think everyone can agree that the one thing it did really well was educate all of its users about the difference between blocking and nonblocking io. And yes, you could achieve this through documentation, but as with immutable by default, syntax/explicitness (at minimal cost) is the best way to educate people.

reem · 2015-01-29T06:29:18Z

I've actually been doing a lot of experimentation and work on non-blocking IO and I can say that I do think that blocking IO is a better default model for Rust. It fits much better in with the borrow system for resource management, since the usage of all resources is deterministic relative to the structure of the code, whereas this is not true at all for asynchronous actions.

Additionally, until (if?) Rust has true async/await support or even simple generators via yield it is pretty hard to write clean asynchronous code or integrate well with the borrow checker. Clean non-blocking IO interfaces also inevitably end up doing a lot of double-buffering and allocation, which can really hamper performance.

The non-clean, extremely low-level bindings for asynchronous IO already exist in the form of mio, and I really think there is no need to integrate these things into std::io until they have had time to mature and we figure out the best way to model asynchronous behavior in Rust. We can always add a std::nio or std::aio in the future, though I'm still not convinced that there's anything wrong with this stuff just living in the cargo-verse.

tshepang · 2015-01-29T11:16:11Z

std::nio/std::aio are neat names, much better than std::io::nonblocking

aturon · 2015-01-30T20:41:04Z

I agree with what several others have said here: I think std::io is fine, with the potential to either grow in place to encompass async IO or else add std::aio later on.

MarkusJais · 2015-02-03T08:21:38Z

I agree, too. std::aio sounds much better than std::io::nonblocking.

aturon · 2015-02-03T22:59:14Z

An amendment for std::net has been posted.

mzabaluev · 2015-02-05T09:44:27Z

Regarding with_extra, I wonder if there could be a more self-explanatory way to use it:

impl<T> Vec<T> where T: Copy {
    pub unsafe fn fill_more<F, E>(&mut self, len: usize, op: F) -> Result<usize, E>
        where F: FnOnce(&mut [T]) -> Result<usize, E>
    { ... }
}
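The closure-based shape above can also be given a safe variant. Instead of exposing uninitialized capacity, it first fills the extra slots with `T::default()` and truncates whatever `op` reports as unused. This is a hypothetical free function, since an inherent method cannot be added to `Vec` outside std. It trades a fill pass for not needing `unsafe`, which differs from the proposal, where the memory stays uninitialized.

```rust
/// Hypothetical safe variant of `fill_more`: offer `len` default-initialized
/// slots to `op`, keep the first `n` it reports as filled, drop the rest.
fn fill_more<T, F, E>(v: &mut Vec<T>, len: usize, op: F) -> Result<usize, E>
where
    T: Copy + Default,
    F: FnOnce(&mut [T]) -> Result<usize, E>,
{
    let start = v.len();
    v.resize(start + len, T::default());
    match op(&mut v[start..]) {
        Ok(n) => {
            // Never keep more than was offered, even if `op` over-reports.
            let n = n.min(len);
            v.truncate(start + n);
            Ok(n)
        }
        Err(e) => {
            v.truncate(start);
            Err(e)
        }
    }
}
```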

erickt · 2015-02-05T17:38:12Z

@mzabaluev: Or maybe Vec::with_uninitialized(...)?

…nstructors, r=alexcrichton `std::io` does not currently expose the `stdin_raw`, `stdout_raw`, or `stderr_raw` functions. According to the current plans for stdio (see rust-lang/rfcs#517), raw access will likely be provided using the platform-specific `std::os::{unix,windows}` modules. At the moment we don't expose any way to do this. As such, delete all mention of the `*_raw` functions from the `stdin`/`stdout`/`stderr` function documentation. While we're at it, remove a few `pub`s from items that aren't exposed. This is done just to lessen the confusion experienced by anyone who looks at the source in an attempt to find the `*_raw` functions.

RFC: io and os reform

679f487

sfackler reviewed Dec 12, 2014

Typo fix

00ca609

aturon force-pushed the io-os-reform branch from 786d0b4 to 00ca609 on December 12, 2014 07:00

sfackler reviewed Dec 12, 2014

aturon added 2 commits December 11, 2014 23:12

Remove spurious chars method

a58d081

Change u64 to Duration

6a19ab2

sfackler reviewed Dec 12, 2014

SimonSapin reviewed Dec 12, 2014

netvl reviewed Dec 12, 2014

alexcrichton mentioned this pull request Jan 20, 2015

need std::io::listener for impl Listener<TcpStream, TcpAcceptor> for TcpListener rust-lang/rust#12040

Closed

aturon mentioned this pull request Jan 23, 2015

std: Rename io to old_io rust-lang/rust#21543

Merged

untitaker reviewed Jan 23, 2015

aturon mentioned this pull request Jan 26, 2015

Amend RFC 517: Add material on std::fs #739

Merged

alexcrichton mentioned this pull request Feb 3, 2015

Amend RFC 517: Add material on std::net #807

Merged

zemlanin mentioned this pull request Feb 15, 2015

io reform update johannhof/markdown.rs#9

Merged

nagisa mentioned this pull request Feb 16, 2015

std::fs::create_dir* should take permissions rust-lang/rust#22415

Closed

achernya mentioned this pull request Feb 22, 2015

std::process should provide a facility to specify custom file descriptors to stdin/stdout/stderr #893

Closed

lilyball mentioned this pull request Mar 15, 2015

Remove incorrect references to _raw stdio functions rust-lang/rust#23379

Merged

aturon mentioned this pull request Mar 23, 2015

RFC for read_all #980

Merged

aturon mentioned this pull request Jan 27, 2016

Add File::try_clone rust-lang/rust#31069

Merged

sfackler mentioned this pull request May 30, 2017

Add a Read::initializer method rust-lang/rust#42002

Merged

This was referenced Dec 24, 2017

Survey: Crates / libraries / tools you need / want rust-embedded/wg#22

Closed

Proposed refactoring to implement core::io #2262

Open

Centril added A-platform Platform related proposals & ideas A-input-output Proposals relating to std{in, out, err}. labels Nov 23, 2018
