Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: io and os reform: initial skeleton #517

Merged
merged 18 commits into from
Jan 13, 2015
Merged

Conversation

aturon
Copy link
Member

@aturon aturon commented Dec 12, 2014

This RFC proposes a significant redesign of the std::io and std::os modules
in preparation for API stabilization. The specific problems addressed by the
redesign are given in the Problems section below, and the key ideas of the
design are given in Vision for IO.

Note about RFC structure

This RFC was originally posted as a single monolithic file, which made
it difficult to discuss different parts separately.

It has now been split into a skeleton that covers (1) the problem
statement, (2) the overall vision and organization, and (3) the
std::os module.

Other parts of the RFC are marked with (stub) and will be filed as
follow-up PRs against this RFC.

Rendered

@aturon
Copy link
Member Author

aturon commented Dec 12, 2014

cc @carllerche @wycats @alexcrichton

@carllerche
Copy link
Member

I will have to read this in more detail tomorrow, but I just wanted to mention that it seems that adding fn skip(...) to the Reader trait has not been mentioned. This would handle issue rust-lang/rust#13989.

// these all return partial results on error
fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... }
fn read_to_string(&self) -> NonatomicResult<String, Vec<u8>, Err> { ... }
fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err> { ... }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's never been totally clear to me what the exact use case for this is. Is this method ever called with min not equal to buf.len()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually I've considered this in terms of a buffered reader. For example if you ask for 10 bytes from a buffered reader, the buffered reader can pass its entire buffer to the underlying reader, but request that only 10 bytes be actually read. In that sense I think it's a bit of a performance optimization where you're willing to accept a huge amount of bytes but only require a few.

I don't think this is implemented or used much in practice though, so the benefit may be fairly negligible to have the extra argument.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually requesting only 10 bytes sounds different than what this function's name describes. For the "only read 10 bytes" case, I'd expect one would pass a buffer.slice_to(10) to read (well, some form of read that always reads the amount requested).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton

In that sense I think it's a bit of a performance optimization

I don't quite understand what kind of performance gain you expect.

  1. Reducing the number of read()-like calls against BufferedReader, or
  2. reducing the number of read() calls by BufferedReader against the underlying "real" stream such as File or TcpStream?

The former will only save negligible number of nanoseconds (if any) because BufferedReader::read() etc. are memory operations in user space. The latter is a matter of tuning internal parameters of BufferedReader.

Did I miss anything? My understanding of its behavior is:

let mut b = [0u8, .. 30];
let res = r.read_at_least(10, b.slice_to_mut(20));

res can be any of Ok(10), Ok(15), Ok(20), or Err(PartialResult(5, EndOfFile)). It will be tedious to change how to cook the content of b depending on the Ok() value.

I can't think of any practical usages of read_at_least(). @alexcrichton How would you use it in, say, your tar code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of a buffered reader, I would consider it a performance optimization in terms of the number of reads of the underlying stream. The buffered reader can pass down a very large buffer but only request that a tiny part gets filled, and if more than that is filled in then it results in, for example, fewer syscalls (in theory).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton The current implementation of BufferedReader simply calls read() with its entire internal buffer. Then it should suffice to have a simpler convenience method like the below one in Reader because it will be "inherited" by BufferedReader:

fn read_exact(&mut self, buf: &mut [u8]) -> NonatomicResult<(), uint, Err> {
    let mut read_so_far = 0;
    while read_so_far < buf.len() {
        match self.read(buf.slice_from_mut(read_so_far)) {
            Ok(n) => read_so_far += n,
            Err(e) => return NonatomicResult(read_so_far, e)
        }
    }
    Ok(())
}

(cf. PR rust-lang/rust#18059 )

Deadlined { deadline: deadline, inner: inner }
}

pub fn deadline(&self) -> u64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/u64/Duration/

and `read_char` is removed in favor of the `chars` iterator. These
iterators will be changed to yield `NonatomicResult` values.

The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these being renamed? If not, they'll have to live in a different module from the BufferedReader trait, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be worth thinking about if we want to keep BufferedStream::with_capacities as is, or have it take a single size that's used for both buffers. I'm not really sure if anyone wants different buffer sizes for readers and writers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! The renaming to BufferedReader came late and I didn't catch this.

I'm not particularly happy with the trait name. @alexcrichton prefers Buffered, but I worry that we may eventually want something for buffered writers.

Suggestions welcome.

re: with_capacities, I agree; we could simplify for now, and add back this functionality later if needed.

@sfackler
Copy link
Member

I'm a bit worried about the timeout changes making some uses of the current infrastructure impossible, or maybe just painful/awkward. For example, rust-postgres provides an iterator over asynchronous notifications sent from the database. A method defined on the iterator is next_block_for, which blocks waiting for a notification for some duration. The implementation sets a read timeout on the socket, reads the first byte, and then unsets the timeout. The assumption is that if we get any data from the server, we can expect a full message to come in fairly quickly after that. The logic required to read half a message and then stop and save it off when we hit the IO timeout is just too complex to bother with.

The current setup works fine, if a bit awkwardly: https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/message.rs#L279-L285
https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/lib.rs#L286-L320

With the new infrastructure, it'll still be possible to take the same strategy, but probably through some kind of gross hackery like reading the first byte with a timeout, and then passing that byte to the main message read function without the timeout.

What would really be ideal is to have the ability to wait on the socket for data to be ready to read for a certain period of time. Is something like that feasible to implement before 1.0 in a cross platform manner?

@SimonSapin
Copy link
Contributor

they involve extending a vector's capacity, and then passing in the resulting uninitialized memory to the read method, which is not marked unsafe! Thus the current design can lead to undefined behavior in safe code.

I don’t understand why this is undefined behavior and unsafe fn with_extra(&mut self, n: uint) -> &mut [T]; on Vec is not.

@SimonSapin
Copy link
Contributor

roughly interpreted at UTF-16, but may not actually be valid UTF-16 -- an "encoding" often call UCS-2; see http://justsolve.archiveteam.org/wiki/UCS-2 for a bit more detail.

I like the explanation linked there. Good find. 👍


impl OsStr {
pub fn from_str(value: &str) -> &OsStr;
pub fn as_str(&self) -> Option<&str>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be -> Result<&str, ()>?

Or as a larger point (perhaps out of scope for this RFC), should Option<T> return types be Result<T, ()> instead when None kind of represents an error, in order to interoperate with try! and other error-handling infrastructure we might add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently changed from_utf8 to return Result<&str, Utf8Error>, so this can probably pick up that error. I suspect this will probably just continue to return the same value as str::from_utf8.

I do think that in general Option should only be used where None is a normal value, not an error (to use with try! as you pointed out). In the second pass of stabilization we're going to look closely at all this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A string not being UTF-8 or a file not existing aren't any more of an error than a key not existing in a map. Attempting to open a file or parse text is also a way to discover if what you were looking for was there, just like a map lookup. There are few remaining use cases for Option if it's not meant to be used this way... any missing value can be considered an error, just like a missing file / whatever.

@netvl
Copy link

netvl commented Dec 12, 2014

Currently Rust IpAddr structure does not support zone indices in IPv6 addresses. They are needed for link-local addresses which are arguably much more important in IPv6 than in IPv4. Are there plans to do something with it?

```

In addition, `read_line` is removed in favor of the `lines` iterator,
and `read_char` is removed in favor of the `chars` iterator (now on
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are occasionally very useful when you don't need to read the entire stream but only a few lines or characters. I have several of these in my code base. The supposed replacement

let line = r.lines().next().unwrap()

doesn't look really good.

Also, why chars() is on Reader? Doesn't reading characters require buffering?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we were thinking that the code you listed would be the replacement for a bare read_line. In general we're trying to move as much functionality to iterators as possible, and we could possibly add some form of method on an iterator which peels off the first element, failing if it's None if this becomes too unergonomic.

The current implementation for chars() doesn't actually use buffering at all, it just peeks at a byte and then might read some more bytes. We thought that if we're exposing bytes() on Reader which is not speedy unless buffered, then we may as well expose chars() as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@netvl Note that the .unwrap() part (or something like try!()) is necessary in either cases. I don't think

let first = r.lines().next().unwrap();
let second = r.lines().next().unwrap();

or

let mut lines = r.lines();
let first = lines.next().unwrap();
let second = lines.next().unwrap();

looks that worse than

let first = r.read_line().unwrap();
let second = r.read_line().unwrap();

As for chars(), any Unicode character in UTF-8 occupies at most 6 bytes. So read() into a fixed array on stack will suffice.

RFC discusses the most significant problems below.

This section only covers specific problems with the current library; see
[Vision for IO] for a higher-level view. section.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...for a higher-level view. section.

Typo?

@aturon
Copy link
Member Author

aturon commented Jan 26, 2015

The fs sub-RFC has now been posted.

@yazaddaruvala
Copy link

@aturon given that the eventual goal is to have both blocking and nonblocking implementations of the std::io crate do you think it makes sense to move the new blocking APIs into std::io::blocking rather than just std::io? (replace io with fs or process where it makes sense).

Summary:
std::io::nonblocking could then be created lazily. The naming would let users know that std::io doesn't have a default, there are two options and its up to them to pick.

Long Story:
I think we can all agree, blocking io and nonblocking io are just different, neither is really better than the other. They just have different trade offs and should be used appropriately. Given that:

Option 1: Call the APIs read vs read_async

  • You're sort of suggesting that async is a special kind of read. Also since one has 2x the characters, its slightly discouraging (both to write and read).
  • Its not explicit that read is blocking and may accidentally be used when it shouldn't.

Option 2: Call the APIs read_sync vs read_async

  • Just a bit verbose

Option 3: Call the APIs blocking::read vs nonblocking::read

  • It would be a little unusual for the same mod to use both types of io. Generally then (when used exclusively) you would get simple function names in both cases. eg. read

I would be happy with either Option 2 or 3, preferring 3. Option 1 just leaves me a little uneasy..

@nodakai
Copy link

nodakai commented Jan 28, 2015

@yazaddaruvala I think it's more practical to have std::io for blocking I/O and std::io::nonblocking for non-blocking I/O. I doubt anyone would complain about the "asymmetry" between them. (Maybe we can lift std::io::blocking to std::io...)

@tshepang
Copy link
Member

Maybe we can lift std::io::blocking to std::io..

I don't understand this @nodakai.

@nodakai
Copy link

nodakai commented Jan 28, 2015

@tshepang I meant re-export such as std::io::fs::Filestd::io::File

@yazaddaruvala
Copy link

@nodakai I understand the desire to have a "default" io, especially given that its currently the only io. But is it really the right long term philosophy?

If the only current difference is import std::io vs import std::io::blocking is it really worth causing asymmetry? i.e. implicitly endorsing one over the other?

I've only played around with Nodejs a little, and some of its ideas are definitely controversial, but I think everyone can agree the one thing it did really well was educate all of its users about the difference between blocking and nonblocking io. And yes you could achieve this through documentation.. but similar to immutable by default, syntax/explicitness (at minimal cost) is the best way to educate people.

@reem
Copy link

reem commented Jan 29, 2015

I've actually been doing a lot of experimentation and work on non-blocking IO and I can say that I do think that blocking IO is a better default model for Rust. It fits much better in with the borrow system for resource management, since the usage of all resources is deterministic relative to the structure of the code, whereas this is not true at all for asynchronous actions.

Additionally, until (if?) Rust has true async/await support or even simple generators via yield it is pretty hard to write clean asynchronous code or integrate well with the borrow checker. Clean non-blocking IO interfaces also inevitably end up doing a lot of double-buffering and allocation, which can really hamper performance.

The non-clean, extremely low-level bindings for asynchronous IO already exist in the form of mio, and I really think there is no need to integrate these things into std::io until they have a time to mature and we figure out the best way to model asynchronous behavior in Rust. We can always add a std::nio or std::aio in the future, though I'm still not convinced that there's anything wrong with this stuff just living in the cargo-verse.

@tshepang
Copy link
Member

std::nio/std::aio are neat names, much better than std::io::nonblocking

@aturon
Copy link
Member Author

aturon commented Jan 30, 2015

I agree with what several others have said here: I think std::io is fine, with the potential to either grow in place to encompass async IO or else add std::aio later on.

@MarkusJais
Copy link

I agree, too. std::aio sounds much better than std::io::nonblocking.

@aturon
Copy link
Member Author

aturon commented Feb 3, 2015

An amendment for std::net has been posted.

@mzabaluev
Copy link
Contributor

Regarding with_extra, I wonder if there could be a more self-explanatory way to use it:

impl<T> Vec<T> where T: Copy {
    pub unsafe fn fill_more<F, E>(&mut self, len: usize, op: F) -> Result<usize, E>
        where F: FnOnce(&mut [T]) -> Result<usize, E>
    { ... }
}

@erickt
Copy link

erickt commented Feb 5, 2015

@mzabaluev: Or maybe Vec::with_uninitialized(...)?

Manishearth added a commit to Manishearth/rust that referenced this pull request Mar 15, 2015
…nstructors, r=alexcrichton

`std::io` does not currently expose the `stdin_raw`, `stdout_raw`, or
`stderr_raw` functions. According to the current plans for stdio (see
rust-lang/rfcs#517), raw access will likely be provided using the
platform-specific `std::os::{unix,windows}` modules. At the moment we
don't expose any way to do this. As such, delete all mention of the
`*_raw` functions from the `stdin`/`stdout`/`stderr` function
documentation.

While we're at it, remove a few `pub`s from items that aren't exposed.
This is done just to lessen the confusion experienced by anyone who
looks at the source in an attempt to find the `*_raw` functions.
@aturon aturon mentioned this pull request Mar 23, 2015
@Centril Centril added A-platform Platform related proposals & ideas A-input-output Proposals relating to std{in, out, err}. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-input-output Proposals relating to std{in, out, err}. A-platform Platform related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.