Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAmend RFC 517: Revisions to reader/writer, core::io and std::io #576
Conversation
sfackler
reviewed
Jan 13, 2015
| type NonatomicResult<S, T, Err> = Result<S, PartialResult<T, Err>>; | ||
|
|
||
| // Ergonomically throw out the partial result | ||
| impl<T, Err> FromError<PartialResult<T, Err> for Err { ... } |
This comment has been minimized.
This comment has been minimized.
aturon
referenced this pull request
Jan 13, 2015
Merged
RFC: io and os reform: initial skeleton #517
sfackler
reviewed
Jan 13, 2015
| ```rust | ||
| trait Writer { | ||
| type Err; | ||
| fn write(&mut self, buf: &[u8]) -> Result<uint, Err>; |
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
To help avoid tons of subtle breakage when this change is made, can we rename the current write method to write_all at some point before the complete overhaul happens? It'll force everyone to switch their code to the thing that will be semantically the same before this lands.
This comment has been minimized.
This comment has been minimized.
sfackler
reviewed
Jan 13, 2015
| @@ -473,12 +649,229 @@ throughout IO, we can go on to explore the modules in detail. | |||
| ### `core::io` | |||
| [core::io]: #coreio | |||
|
|
|||
| > To be added in a follow-up PR. | |||
| The `io` module is split into a the parts that can live in `libcore` | |||
This comment has been minimized.
This comment has been minimized.
sfackler
reviewed
Jan 13, 2015
|
|
||
| ```rust | ||
| // A reader that yields no bytes | ||
| fn empty() -> Empty; |
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
Having free function "constructors" is relatively uncommon. What's the rationale for switching all of these?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
The purpose here is intended to be symmetric with the top-level constructors in the std::iter module, which currently follow this pattern.
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
Ah, cool.
Empty and Sink are nullary types, so we don't technically need empty() or sink(). Do you just want them there for consistency?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
I would personally feel so, yes. I may also recommend going as far as having Empty { _foo: () } to prevent construction outside the module for now.
This comment has been minimized.
This comment has been minimized.
aturon
Jan 13, 2015
Author
Member
Having free function "constructors" is relatively uncommon. What's the rationale for switching all of these?
One way I think about this is, if we had impl Trait, the types in question here would all be private and you'd just have free fns returning impl Reader etc.
This comment has been minimized.
This comment has been minimized.
tailhook
commented
Jan 13, 2015
|
I believe I should repeat my question about EINTR here. Is there any agreement on how EINTR is handled by The problem is that if EINTR is returned as error by those methods, it means that for example running a However if EINTR is consumed by internal loop of |
sfackler
reviewed
Jan 13, 2015
| `ReaderExt`). These iterators will be changed to yield | ||
| `NonatomicResult` values. | ||
|
|
||
| The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
liigo
Jan 14, 2015
Contributor
+1
2015年1月13日 下午1:03于 "Steven Fackler"
FWIW, I still like
Reader -> Read
Writer -> Write
Buffer -> ReadBuffer—
Reply to this email directly or view it on GitHub.
sfackler
reviewed
Jan 13, 2015
| The various in-memory readers and writers available today will be | ||
| consolidated into just `MemReader` and `MemWriter`: | ||
|
|
||
| `MemReader` (like today's `BufReader`) |
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
@reem has pointed out that this means we'll lose the ability to have a Send Reader adapter for Vec<u8>.
This comment has been minimized.
This comment has been minimized.
wycats
Jan 13, 2015
Contributor
Yep. I made this point to @aturon as well. I think the ability to convert a Vec<u8> into a Box<Reader + Send> is important.
This comment has been minimized.
This comment has been minimized.
sfackler
reviewed
Jan 13, 2015
| The `ChanReader` and `ChanWriter` adapters will be kept exactly as they are today. | ||
|
|
||
| #### `stdin`, `stdout`, `stderr` | ||
| [stdin, stdout, stderr]: #stdin-stdout-stderr |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
Would you think of this as an implementation detail, or would you think that the API would change as a result? There are some cases to consider, for example:
println!("foo"); // I expect this to be atomic
let mut out = stdout();
out.write_line("foo"); // is this atomic?
write!(&mut out, "foo {}", "bar"); // is this atomic?In this sense, I would think that the stdout handle would perhaps want to be locked for a period of time in some instances, and we'd have to consider expansions to the API.
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
I was talking about the set_stdout and set_stderr functions, rather than global vs thread local buffering.
(This is also just a clarification question, I'm not pushing either way to keep or remove them)
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
Ah I see! Those are being moved to different locations to reflect what they're actually doing. Something along the lines of:
std::io::stdio::set_stdout=>std::fmt::set_print_outputstd::io::stdio::set_stderr=>std::fmt::set_panic_output
sfackler
reviewed
Jan 13, 2015
| type Err; // new associated error type | ||
|
|
||
| // unchanged except for error type | ||
| fn read(&mut self, buf: &mut [u8]) -> Result<uint, Err>; |
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
There was some discussion on the previous RFC about how it would be beneficial to update the documentation for read to state that valid implementations may never return Ok(0) except when passed a 0-type buffer. Consumers of Reader can use this to avoid special handling of the "what if the reader returns Ok(0) forever?" case.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 13, 2015
Member
It's not really any different than any other contract-violating trait implementation, like someone implementing Clone, but actually returning an object with totally different state.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
Sorry, I didn't see your original comment. My comment was for the RFC author.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
I'm not suggesting anything right now. I'm just saying that calling read again after EOF is not an error.
This comment has been minimized.
This comment has been minimized.
erickt
Jan 13, 2015
@nagisa: Three things. First, I personally want to err on std::io being lower level and allow us to build higher level abstractions on top of it. Second, end-of-file is not an error when reading from a stream. Third, I'm not partial to us having this hole where we document that you shouldn't return Ok(0) but we can't enforce it. I think it'll be a source of errors.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
Let me quickly document the reason why I have suggested making Ok(0) UB but not replacing IoError::EOF by Ok(0): All IO functions return something like IoResult and you can only replace IoError::EOF by Ok(0) if T == uint.
This comment has been minimized.
This comment has been minimized.
erickt
Jan 16, 2015
@mahkoh: Given how hard we've worked against having UB everywhere else, it'd be a shame to have it brought back with IO. For those methods that return a IoResult<T> where T isn't an integer, it could instead return IoResult<Option<T>> where None means we hit the end of the file. We've got this pattern in our ReaderIterator, and I'm using it pretty successfully in serde, so it wouldn't be that crazy.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 17, 2015
Contributor
UB is of course the wrong word. It's not worse than whatever UB http://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html causes.
mahkoh
reviewed
Jan 13, 2015
| extending a vector's capacity, and then *passing in the resulting | ||
| uninitialized memory* to the `read` method, which is not marked | ||
| `unsafe`! Thus the current design can lead to undefined behavior in | ||
| safe code. |
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
I don't see how. I doubt that memory returned by an allocator will be treated as undef by LLVM. And even if this is the case, it should be easily possible to tell LLVM that the memory is not undef. Apart from this issue, this is equivalent to passing the same slice multiple times to read without zeroing in between.
Again: Please explain how this can lead to UB.
This comment has been minimized.
This comment has been minimized.
Ericson2314
Jan 13, 2015
Contributor
Because anybody can make their own Reader impl, and nothing in the type prevents custom impls from reading the uninitialized bytes.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
Again: Please explain how this can lead to UB in safe code. An example will do.
This comment has been minimized.
This comment has been minimized.
huonw
Jan 13, 2015
Member
let b = buffer[0]; read[b] is UB. We shouldn't assume that the optimiser won't propagate undef through.
Also, leaving uninit buffers around makes heartbleed like bugs easier by making it easier to accidentally read old data. Sure, you're meant to zero secrets, but humans are nothing if not fallable. (The memory could be some old left-over allocation, no connection to the local code.)
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
You're not refuting the first paragraph in my first post, @huonw, so I'm not sure where you get from that b is undef.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
I don't think that the claim that "memory returned by an allocator will [not] be treated as undef by LLVM" is true, and I've tried to show how in a gist: https://gist.github.com/alexcrichton/2cfc64d1efe2afff48d3. This gist uses unsafe code, but you can consider the unsafe code to be that in std::io (which exists today), and the other relevant code is in a read implementation.
Note that the foo function always returns undef. Also note that LLVM does not see through our calls to jemalloc functions for the bar function, but it does see through the calls if the standard malloc/free calls are used.
This comment has been minimized.
This comment has been minimized.
huonw
Jan 13, 2015
Member
Also note that LLVM does not see through our calls to jemalloc functions for the bar function, but it does see through the calls if the standard malloc/free calls are used.
And in a perfect world, LLVM would be upgraded to be able understand that jemalloc/arbitrary-other-allocator is an allocator too, and so see through that.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
So it's already not undef but by accident because LLVM doesn't know jemalloc. Anyway, I made a tiny change to your gist and now it's no longer undef: https://gist.github.com/anonymous/7a1fe6474ce5b986e348 (in foo).
mahkoh
reviewed
Jan 13, 2015
| // these all return partial results on error | ||
| fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... } | ||
| fn read_to_string(&self) -> NonatomicResult<String, Vec<u8>, Err> { ... } | ||
| fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err> { ... } |
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
The implementation of this function is huge and doesn't get inlined. All convenience functions currently use this function because they cannot handle Ok(0). For this reason, calling read_u8 is 100% slower than read.
From the documentation:
This will continue to call read until at least min bytes have been read. If read returns 0 too many times, NoProgress will be returned.
The current implementation picks the completely random number 1000 to mean "too many times" and calls read happily 1000 times in the worst case. What is the design behind this? Aren't you just working around the completely broken design of read? How are you going to improve the performance here? The Posix read function is strictly superior.
mahkoh
reviewed
Jan 13, 2015
|
|
||
| (Note: `read_to_end` is currently not memory safe for the same reason, | ||
| but is considered a very important convenience. Thus, we will continue | ||
| to provide it, but will zero the slice beforehand.) |
This comment has been minimized.
This comment has been minimized.
mahkoh
reviewed
Jan 13, 2015
| For convenience, `write_all` recovers the behavior of today's `write`, | ||
| looping until either the entire buffer is written or an error | ||
| occurs. In the latter case, however, it now also yields the number of | ||
| bytes that had been written prior to the error. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
mahkoh
reviewed
Jan 13, 2015
| fn fill_buf(&mut self) -> Result<&[u8], Self::Err>; | ||
| fn consume(&mut self, amt: uint); | ||
|
|
||
| // This should perhaps yield an iterator |
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
It's like you want these functions to be slow. Doing this with an iterator will be 8? 16? times slower than doing it with a memchr and memcpy.
This comment has been minimized.
This comment has been minimized.
erickt
Jan 13, 2015
I looked into this a couple weeks ago and I'm not too worried about the performance. I prototyped out a BufferedReaderIterator, which adapts any Reader into a Iterator by doing one multibyte read. When I benchmarked it with a MemReader, it performed roughly at the same speed as a slice iterator.
This comment has been minimized.
This comment has been minimized.
mahkoh
Jan 13, 2015
Contributor
An optimized implementation can copy 16 bytes at once and check of the byte in parallel. How do you expect to get this performance with an iterator that looks at one byte at a time? To illustrate this let's compare the performance of memchr and iter().position():
test _memchr ... bench: 59265 ns/iter (+/- 1536)
test position ... bench: 1373400 ns/iter (+/- 9820)
This comment has been minimized.
This comment has been minimized.
aturon
Jan 13, 2015
Author
Member
It's like you want these functions to be slow. Doing this with an iterator will be 8? 16? times slower than doing it with a memchr and memcpy.
You misunderstood the suggestion. The use of an iterator here would be akin to the lines iterator below, i.e. a way to repeatedly call read_until. The iterator itself would yield vectors. This makes it possible to use read_until in a for loop, for example.
mahkoh
reviewed
Jan 13, 2015
| #### Channel adapters | ||
| [Channel adapters]: #channel-adapters | ||
|
|
||
| The `ChanReader` and `ChanWriter` adapters will be kept exactly as they are today. |
This comment has been minimized.
This comment has been minimized.
erickt
reviewed
Jan 13, 2015
|
|
||
| ```rust | ||
| // If flushing fails, you get the unflushed data back | ||
| fn into_inner(self) -> NonatomicResult<W, Vec<u8>, W::Err>; |
This comment has been minimized.
This comment has been minimized.
erickt
Jan 13, 2015
Could this return an associated type for the buffer instead of a Vec? That would let us use a stack slice or fixed size array for a buffer.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 13, 2015
Member
Note that this is only return on an error, which is not necessarily common, so "forcing an allocation" I don't think is really too worrisome here.
This comment has been minimized.
This comment has been minimized.
nodakai
commented
Jan 13, 2015
|
@tailhook I think APIs returning |
wycats
reviewed
Jan 13, 2015
| `Writer` (and various adapters built on top of them) from moving to | ||
| `libcore` -- `IoError` currently requires the `String` type. | ||
|
|
||
| With associated types, there is essentially no downside in making |
This comment has been minimized.
This comment has been minimized.
wycats
Jan 13, 2015
Contributor
Hm. It seems like if we move this into core, it couldn't be bounded by std::error::Error, right? This could be a problem when interoperating with other pieces of infrastructure that are explicitly bound by Error.
This comment has been minimized.
This comment has been minimized.
aturon
Jan 13, 2015
Author
Member
Any additional bounds on the error type you want to apply can be done when you're bounding over a Reader or Writer.
wycats
reviewed
Jan 13, 2015
| having made some progress. The `PartialResult` type then returns the | ||
| progress that was made along with the error, but with a `FromError` | ||
| implementation that makes it trivial to throw out the partial result | ||
| if desired. |
This comment has been minimized.
This comment has been minimized.
wycats
reviewed
Jan 13, 2015
| pub struct IterReader<T> { ... } | ||
| ``` | ||
|
|
||
| As with `std::iter`, these adapters are object unsafe an hence placed |
This comment has been minimized.
This comment has been minimized.
wycats
reviewed
Jan 13, 2015
|
|
||
| Note that the same `ByRef` type is used for both `Reader` and `Writer` | ||
| -- and this RFC proposes to use it for `std::iter` as well. The | ||
| insight is that there is no difference between the *type* used for |
This comment has been minimized.
This comment has been minimized.
ktossell
referenced this pull request
Jan 13, 2015
Closed
Separate Ipv4Addr/Ipv6Addr types, new methods for inspecting addresses #20785
This comment has been minimized.
This comment has been minimized.
I'd first recommend taking a look at this article about EINTR (just did so myself for a refresher!), it's got some great information about EINTR and what it can be used for. The gist of the message is that EINTR cannot be used to reliably catch a "please interrupt" signal to the program in the case of syscalls like Due to our commitment to cross platform compatibility as well as the objects being somewhat high-level primitives, I think we will continue to swallow As @nodakai mentions, however, it may be useful to handle this case (such as if you're manually using For example, |
This comment has been minimized.
This comment has been minimized.
tailhook
commented
Jan 13, 2015
|
@nodakai, the issue with "D" state is both rare enough and impossible to do something about so we should not care. In many other cases Ctrl+C works. Not being able to use is very very annoying. There are precedents of bad SIGINT handling in history, for example SVN versions 1.1-1.4 (IIRC) were not handling the signal fast (could linger for minutes) and that was deemed as a bug. And you know, you usually can press Ctrl+/ or just kill process externally, but that does not execute destructors. And I don't wan't another bag of utilities that do same. @alexcrichton, while I can live with Sure, complex programs like network servers which will use low level level API, unblocking calls, asynchronous loop like "mio", and will use signalfd (or polyfill) anyway, so will not suffer this problem. But then what the high-level cross-platform interface is designed for? Is it for command-line tools that either hang or die without a cleanup? Is this only because they can't do anything better on Windows? Then I guess it's better to turn EINTR to error or even panic on it, in which case if the error is unhandled, at least it will execute all the destructors. I think we may return some |
This comment has been minimized.
This comment has been minimized.
As mentioned in the article that I linked to, using EINTR to handle an interrupt for your program in some form is racy (and generally not correct) if all you're using is the I definitely agree that we'll want a nice way to interrupt small programs performing various operations, but I don't think that EINTR is the right way to do so. Doing this, however, would require deeper modifications to the I/O interfaces to enable something along the lines of:
These are both somewhat involved, which is definitely where some high-level abstractions would help out! These may not exist in the standard library immediately, but I'm sure they'll start popping up in Cargo soon though. To reiterate, handling an interruption via EINTR in
A crucial part of the stabilization of |
This comment has been minimized.
This comment has been minimized.
tailhook
commented
Jan 13, 2015
|
@alexcrichton, I agree with most of what you say, but:
Sounds like "never use abstractions in standard library". So why they are there in the first place? Well, in fact the more I think about it the more I like idea of |
This comment has been minimized.
This comment has been minimized.
Binero
commented
Jan 13, 2015
|
I feel like this probably belongs here; I feel it would be nice to have a "SearchableReader" trait which is like a Reader, but you can change the position inside of it. |
Valloric
reviewed
Jan 13, 2015
|
|
||
| ```rust | ||
| trait Reader { | ||
| type Err; // new associated error type |
This comment has been minimized.
This comment has been minimized.
Valloric
Jan 13, 2015
Is it really worth saving 2 keystrokes here? Please just name it Error, we already have far too many unnecessary abbreviations in the Rust std library and newcomers are rightly complaining about it.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I doubt it. Unlike a |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 1, 2015
Ericson2314
added a commit
to QuiltOS/rust
that referenced
this pull request
Feb 1, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 1, 2015
This comment has been minimized.
This comment has been minimized.
|
Some concerns from @mahkoh on the implementation: rust-lang/rust#21835 (comment) The
Just a point to clarify. rust-lang/rust#21835 (comment) Should rust-lang/rust#21835 (comment) A suggestion that rust-lang/rust#21835 (comment) A suggestion to use @mahkoh if I misrepresented anything, please let me know! I'll try to copy over future worries to this RFC as well. |
This comment has been minimized.
This comment has been minimized.
Tobba
commented
Feb 1, 2015
|
It seems the issues with the associated error types mentioned by @alexcrichton are quite easily solved and don't load to any major ergonomic issues.
See this branch: https://github.com/Tobba/rust/tree/core-io which ports librbml to an (incomplete) implementation of a proposal for the I/O design. |
This comment has been minimized.
This comment has been minimized.
One of the key points I personally found was that porting librustc to use librbml was incredibly painful, I never actually got it to compile! The rbml implementation does not necessarily leverage I would encourage you to write some tests for the various primitives as well because it ended up leading to tests looking like this versus looking like this |
This comment has been minimized.
This comment has been minimized.
I would demonstrate these things myself, but I cannot build the standard library ATM. |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 2, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 2, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 2, 2015
This comment has been minimized.
This comment has been minimized.
|
Some replies to @mahkoh's concerns:
I think we should have
I think @mahkoh is raising a good point here, and
The proposal in the comment, in particular, says that
This connects to broader API guidelines, of course, but in general in Rust APIs we use types to convey semantic content/limitations when possible. |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 2, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 3, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 3, 2015
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Feb 3, 2015
|
re: EINTR handling I think at a minimum you should LOUDLY document any function that can Also, I think it would be more conservative to make read/write retry EINTR On Fri, Jan 30, 2015 at 1:25 PM, Aaron Turon notifications@github.com
|
This comment has been minimized.
This comment has been minimized.
|
At this time it looks like there's broad enough support for this amendment that I'm going to merge it. There are still some concerns about using a concrete There are also some lingering concerns about There have also been a number of proposals for backwards-compatible extensions. Due to this compatibility this RFC is going to go ahead and land ahead of them and future RFCs can be used to add new traits and/or methods where the design can be hashed out. I'd like to again emphasize that none of this functionality will land initially as I'd like to thank everyone again who participated in this thread (and #517!), all the comments have been quite helpful and this is all definitely moving in a great direction. Thank you! |
aturon commentedJan 13, 2015
The IO reform RFC is being split into several semi-independent pieces, posted as PRs like this one.
This RFC amendment adds the sections on
Reader/Writerrevisions, as well ascore::ioandstd::io(which are closely related).Rendered