New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed EOF reinstatement. #903

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@tfogal
Copy link

tfogal commented Feb 24, 2015

Precedence: C, C++, Go, and Java all return EOFs as error codes when
the stream ends. Java includes higher-level APIs that return nulls,
about as close to the current Rust implementation as can be, given the
language differences.

Precedence: Win32 APIs and Windows-based stuff in general tends to
return 'success' and an indicator that it read 0 bytes when such calls
fail. Well, mostly. Asynchronous calls are weird, and if you supply
one of the parameters that is commonly null, then Win32's "ReadFile"
will actually start to give you EOFs.

Personally, I have a strong preference for using EOFs a la "the C way"
everywhere. Rust being sold as a systems programming language makes me
expect this kind of behavior. Hearing that the IO reforms are trying
to simplify and not implement the union of things between platforms
again implies to me that this will work in the "standard" way.

To be honest, I'm not sure I even like this. It seems strange that we have
read_to_end and read_to_string, and furthermore that read statically
takes a &mut [u8]. Why don't these methods take an iterator (or is it a
Cursor in this nomenclature?) and the specifics of whether that goes into a
u8 slice or a Vec or a String or a user-implemented-type-T get handled
by the iterator implementation?

Also, side bug fixed: write was returning a uint, but apparently usize is the
new uint.

Proposed EOF reinstatement.
Precedence: C, C++, Go, and Java all return EOFs as error codes when
the stream ends.  Java includes higher-level APIs that return nulls,
about as close to the current Rust implementation as can be, given the
language differences.

Precedence: Win32 APIs and Windows-based stuff in general tends to
return 'success' and an indicator that it read 0 bytes when such calls
fail.  Well, mostly.  Asynchronous calls are weird, and if you supply
one of the parameters that is commonly null, then Win32's "ReadFile"
will actually start to give you EOFs.

Personally, I have a strong preference for using EOFs a la "the C way"
everywhere.  Rust being sold as a systems programming language makes me
expect this kind of behavior.  Hearing that the IO reforms are trying
to simplify and not implement the union of things between platforms
again implies to me that this will work in the "standard" way.
number of bytes read or an error. If a short read occurs, that call
gives the (short) data and returns the number of bytes written into
that buffer. If the error was non-transient, future calls shall return
the reason for the error (usually `EndOfFile`).

This comment has been minimized.

@jmesmon

jmesmon Feb 24, 2015

So now we'll need to wrap read_to_end in order to actually read to the end? Why not add a provided method called really_read_to_end?

Why doesn't a plain read() work for the use case that you're using this modified read_to_end() on?

This comment has been minimized.

@tfogal

tfogal Feb 24, 2015

Point taken. Honestly, perhaps it would be better just to remove these methods and let them get handled at the user level. I would certainly be more in favor of that than standardizing something that was not vetted.

Anyway, the thought was that sometimes you might intend to read "everything that's there". Depending on how things are set up, those (system call) reads (i.e. Rust's implementation) might get some partial bytes and not actually block. Without this clause, the implementation would need to busy wait (or "busy read" i guess) to deal with non-blocking and non-terminating sources.

I would be much more in favor of dropping these from the trait and only providing them as specific methods on types that make sense (i.e. synchronous files).

This comment has been minimized.

@mahkoh

mahkoh Feb 24, 2015

Contributor

These methods worked perfectly fine in old_io. There is no reason to remove them.

@jmesmon

This comment has been minimized.

Copy link

jmesmon commented Feb 24, 2015

For what it's worth, the returning of 0 on EOF is similar in style to standard posix apis. Not sure we should really be looking at Win32 as an example of what to do in API design ("Well, mostly." doesn't inspire confidence).

The use of slices is for efficiency purposes, if iterators were used most read impls would need to allocate memory internally, perform a syscall, and then write the data into the provided iterator. At least an extra copy, and potentially quite a bit of iterator overhead.

@tfogal

This comment has been minimized.

Copy link

tfogal commented Feb 24, 2015

C's getlineexplicitly returns EOF, which is what spawned the initial confusion and PR motivation.

We probably agree w.r.t. win32; just trying to be thorough.

I expected that the slice implementation would decay into a pointer under the hood and the implementation would thus get to read directly into it. If this isn't conceivable, first I'd cry, then I'd agree with you.

@@ -556,6 +556,15 @@ buffers as input. This has multiple benefits:
results are wanted, one can use `read_to_end` and convert to a
`String` only at the end.

`read_to_end` and `read_to_string` return a `Result` with either the
number of bytes read or an error. If a short read occurs, that call

This comment has been minimized.

@alexcrichton

alexcrichton Feb 24, 2015

Member

In this context, what would you define as a "short read"? If you're asking for the entire contents of the stream then a "short read" implies you already know how many bytes are in the stream in the first place.

Additionally, below you mention a "non-transient" error, but how do we classify errors as transient or not transient?

This comment has been minimized.

@tfogal

tfogal Feb 24, 2015

In this context, what would you define as a "short read"? If you're asking for the entire
contents of the stream then a "short read" implies you already know how many bytes are in
the stream in the first place.

We talked about this above, I conceded the point. But to reiterate, the intent was to allow the implementation not to implement a busy loop if given a non-blocking source.

Additionally, below you mention a "non-transient" error, but how do we classify errors as
transient or not transient?

Not sure it's worth discussing, given the above, but: the intent was to give liberty to an implementation to just return whatever it got from the syscall.


Anyway, we've established that this solution sucks, but I still believe there should be something that prevents these from getting called with non-blocking sources, because it can't have any reasonable semantics for that case.

This comment has been minimized.

@alexcrichton

alexcrichton Feb 25, 2015

Member

We talked about this above, I conceded the point.

Ah ok, do you want to update the RFC with the recent comments?

fn flush(&mut self) -> Result<(), Error>;
fn write_all(&mut self, buf: &[u8]) -> Result<(), Error> { .. }
fn write_fmt(&mut self, fmt: &fmt::Arguments) -> Result<(), Error> { .. }
fn write_all(&mut self, buf: &[u8]) -> Result<usize, Error> { .. }

This comment has been minimized.

@alexcrichton

alexcrichton Feb 24, 2015

Member

This is a crucial part of the design of write_all that it fails if the entire buffer could not be written. Otherwise there are no real semantic wins over the write function and this ends up just largely being a duplicate.

This comment has been minimized.

@tfogal
@@ -1009,6 +1021,13 @@ cross-platform compatibility (in the case of `read_line`) and for ease
in copying data without loss (in particular, distinguishing whether
the last line included a final delimiter).

`EINTR`-based errors are automatically restarted by `read_until`
and `read_line`. Short reads give a partial buffer and return the

This comment has been minimized.

@alexcrichton

alexcrichton Feb 24, 2015

Member

Can you define what "short read" means in this context?

This comment has been minimized.

@tfogal

tfogal Feb 24, 2015

e.g. for read_line, it might not represent a line in the sense that it might not ever see a '\n'.

Or were you hinting that it was clear from discussion but not in the doc (and I should update the commit)?

@aturon aturon self-assigned this Mar 5, 2015

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Mar 20, 2015

Thanks for this RFC!

Note that the method signatures have now all been updated to return usize on success, as suggested here. I think this addresses the most important concern raised by this RFC: there is a now a uniform way to detect and handle EOF throughout new IO. So this RFC has already been partially implemented.

The other proposal of the RFC is to produce a separate notion of EOF error after the first time a zero byte result is returned. This seems problematic for a few reasons:

  • It means code must be prepared to handle two varieties of EOF-signaling results, which undoes much of the motivation for using a 0 return to signal EOF.
  • EOF is not always a permanent state, so the contract wouldn't always make sense.
  • It relies on a somewhat subtle contract, which again was one of the problems we were trying to get away from in old_io.

Note that in particular, with the current IO setup, it works well to use try! to handle "actual" errors, and then look at the byte count to decide whether you're done.

With all of that said, and the signature change already in place, I'm going to close out this RFC. Thanks again for bringing these issues to the community's attention!

@aturon aturon closed this Mar 20, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment