RFC: io and os reform: initial skeleton #517

Merged
merged 18 commits into from Jan 13, 2015

@aturon
Member

aturon commented Dec 12, 2014

This RFC proposes a significant redesign of the std::io and std::os modules
in preparation for API stabilization. The specific problems addressed by the
redesign are given in the Problems section below, and the key ideas of the
design are given in Vision for IO.

Note about RFC structure

This RFC was originally posted as a single monolithic file, which made
it difficult to discuss different parts separately.

It has now been split into a skeleton that covers (1) the problem
statement, (2) the overall vision and organization, and (3) the
std::os module.

Other parts of the RFC are marked with (stub) and will be filed as
follow-up PRs against this RFC.

Rendered

@aturon

Member Author

aturon commented Dec 12, 2014

@carllerche

Member

carllerche commented Dec 12, 2014

I will have to read this in more detail tomorrow, but I just wanted to mention that it seems that adding fn skip(...) to the Reader trait has not been mentioned. This would handle issue rust-lang/rust#13989.

// these all return partial results on error
fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Err> { ... }
fn read_to_string(&self) -> NonatomicResult<String, Vec<u8>, Err> { ... }
fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Err> { ... }

@sfackler

sfackler Dec 12, 2014

Member

It's never been totally clear to me what the exact use case for this is. Is this method ever called with min not equal to buf.len()?

@alexcrichton

alexcrichton Dec 12, 2014

Member

Conceptually I've considered this in terms of a buffered reader. For example if you ask for 10 bytes from a buffered reader, the buffered reader can pass its entire buffer to the underlying reader, but request that only 10 bytes be actually read. In that sense I think it's a bit of a performance optimization where you're willing to accept a huge amount of bytes but only require a few.

I don't think this is implemented or used much in practice though, so the benefit may be fairly negligible to have the extra argument.

@jmesmon

jmesmon Dec 13, 2014

Actually requesting only 10 bytes sounds different than what this function's name describes. For the "only read 10 bytes" case, I'd expect one would pass a buffer.slice_to(10) to read (well, some form of read that always reads the amount requested).

@nodakai

nodakai Dec 16, 2014

@alexcrichton

In that sense I think it's a bit of a performance optimization

I don't quite understand what kind of performance gain you expect.

  1. Reducing the number of read()-like calls against BufferedReader, or
  2. reducing the number of read() calls by BufferedReader against the underlying "real" stream such as File or TcpStream?

The former will only save a negligible number of nanoseconds (if any) because BufferedReader::read() etc. are memory operations in user space. The latter is a matter of tuning internal parameters of BufferedReader.

Did I miss anything? My understanding of its behavior is:

let mut b = [0u8, .. 30];
let res = r.read_at_least(10, b.slice_to_mut(20));

res can be any of Ok(10), Ok(15), Ok(20), or Err(PartialResult(5, EndOfFile)). It will be tedious to handle the contents of b differently depending on the Ok() value.

I can't think of any practical usages of read_at_least(). @alexcrichton How would you use it in, say, your tar code?

@alexcrichton

alexcrichton Dec 16, 2014

Member

In the case of a buffered reader, I would consider it a performance optimization in terms of the number of reads of the underlying stream. The buffered reader can pass down a very large buffer but only request that a tiny part gets filled, and if more than that is filled in then it results in, for example, fewer syscalls (in theory).

@nodakai

nodakai Dec 17, 2014

@alexcrichton The current implementation of BufferedReader simply calls read() with its entire internal buffer. It should then suffice to have a simpler convenience method like the one below in Reader, because it will be "inherited" by BufferedReader:

fn read_exact(&mut self, buf: &mut [u8]) -> NonatomicResult<(), uint, Err> {
    let mut read_so_far = 0;
    while read_so_far < buf.len() {
        match self.read(buf.slice_from_mut(read_so_far)) {
            Ok(n) => read_so_far += n,
            Err(e) => return NonatomicResult(read_so_far, e)
        }
    }
    Ok(())
}

(cf. PR rust-lang/rust#18059 )
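
For readers following along with a current toolchain, the loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not the RFC's API: it uses today's `std::io::Read` rather than the 2014 `Reader` trait, the free function `read_exact_partial` is a hypothetical name, and a plain `(usize, io::Error)` tuple stands in for `NonatomicResult`. It also treats `Ok(0)` as end of input, which is how today's `Read` reports EOF.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: fill `buf` completely, or report how many bytes
/// were read before the failure together with the error itself.
fn read_exact_partial<R: Read>(r: &mut R, buf: &mut [u8]) -> Result<(), (usize, io::Error)> {
    let mut read_so_far = 0;
    while read_so_far < buf.len() {
        match r.read(&mut buf[read_so_far..]) {
            // With today's `Read`, a zero-length read into a non-empty
            // buffer means the input has ended.
            Ok(0) => {
                let eof = io::Error::new(ErrorKind::UnexpectedEof, "input ended early");
                return Err((read_so_far, eof));
            }
            Ok(n) => read_so_far += n,
            // Interrupted reads are conventionally retried.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            // Any other error is returned along with the partial count.
            Err(e) => return Err((read_so_far, e)),
        }
    }
    Ok(())
}
```

The caller gets the partial byte count on failure, so the bytes already placed in `buf` are not lost.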

@aturon aturon force-pushed the aturon:io-os-reform branch from 786d0b4 to 00ca609 Dec 12, 2014

Deadlined { deadline: deadline, inner: inner }
}
pub fn deadline(&self) -> u64 {

@sfackler

sfackler Dec 12, 2014

Member

s/u64/Duration/

fn by_ref<'a>(&'a mut self) -> ByRef<'a, Self> { ... }
// Whenever bytes are written to `self`, write them to `other` as well
fn carbon_copy<W: Writer>(self, other: W) -> CarbonCopy<Self, W> { ... }

@sfackler

sfackler Dec 12, 2014

Member

This seems like a kind of weird name.

@aturon

aturon Dec 12, 2014

Author Member

Yes, I don't expect this name to survive the RFC process.

@michaelsproul

michaelsproul Dec 12, 2014

I think it's a better name than tee, which is very platform specific. Naming these two methods consistently would be nice: forward_on_read and forward_on_write perhaps?

@jmesmon

jmesmon Dec 13, 2014

Is the name tee really platform specific? More generally: does anything but unix have a concept that maps to this? If some people already know what tee means, isn't that better than no one knowing exactly what carbon_copy implies?

@michaelsproul

michaelsproul Dec 14, 2014

The problem as I see it is in search-ability. Imagine you've never heard of tee and you're looking for this functionality. The name conveys literally nothing about the function...

Even carbon_copy is enough to signal that the function has something to do with copying, something which can be made clear by reading the description. Furthermore, forward_on_read and forward_on_write are even better at signalling both the involvement of another thing (the Reader/Writer) and the conditions for the event to occur (on reading/writing).

@nodakai

nodakai Dec 16, 2014

The idea of carbon_copy() fits more with a separate convenience wrapper similar to BufferedReader / Writer:

let mut f = File::create(...);
let mut stdout = io::stdio::stdout();
let mut tee = TeeWriter(f, stdout);
tee.write_all(b"Hello world!\n");

Anyways, the semantics of partial failure need to be discussed in detail.
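
To make the partial-failure question concrete, here is a minimal sketch of such a wrapper written against today's `std::io::Write`. The names `TeeWriter`, `into_inner` and `tee_to_vecs` are hypothetical, and the policy chosen here (the first sink decides how many bytes are accepted, the second must take all of them) is only one of several possible answers.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper that forwards every write to two sinks.
struct TeeWriter<A: Write, B: Write> {
    first: A,
    second: B,
}

impl<A: Write, B: Write> TeeWriter<A, B> {
    fn new(first: A, second: B) -> Self {
        TeeWriter { first, second }
    }

    /// Hand both sinks back to the caller.
    fn into_inner(self) -> (A, B) {
        (self.first, self.second)
    }
}

impl<A: Write, B: Write> Write for TeeWriter<A, B> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Let the first sink decide how much is accepted...
        let n = self.first.write(buf)?;
        // ...then mirror exactly that prefix to the second sink.
        // If this fails, `first` already holds bytes that `second` lacks:
        // that is the partial-failure case that still needs a decision.
        self.second.write_all(&buf[..n])?;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.first.flush()?;
        self.second.flush()
    }
}

/// Usage example: copy `data` into two in-memory sinks at once.
fn tee_to_vecs(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut tee = TeeWriter::new(Vec::<u8>::new(), Vec::<u8>::new());
    tee.write_all(data)?;
    tee.flush()?;
    Ok(tee.into_inner())
}
```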

@nagisa

nagisa Dec 16, 2014

Contributor

That indeed looks much better and applies to the Reader::tee as well. 👍

and `read_char` is removed in favor of the `chars` iterator. These
iterators will be changed to yield `NonatomicResult` values.
The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay

@sfackler

sfackler Dec 12, 2014

Member

Are these being renamed? If not, they'll have to live in a different module from the BufferedReader trait, right?

@sfackler

sfackler Dec 12, 2014

Member

It might also be worth thinking about if we want to keep BufferedStream::with_capacities as is, or have it take a single size that's used for both buffers. I'm not really sure if anyone wants different buffer sizes for readers and writers.

@aturon

aturon Dec 12, 2014

Author Member

Argh! The renaming to BufferedReader came late and I didn't catch this.

I'm not particularly happy with the trait name. @alexcrichton prefers Buffered, but I worry that we may eventually want something for buffered writers.

Suggestions welcome.

re: with_capacities, I agree; we could simplify for now, and add back this functionality later if needed.

that we will hew more closely to the traditional setup:
* `stderr` will be unbuffered and `stderr_raw` will therefore be dropped.
* `stdout` will be line-buffered for TTY, fully buffered otherwise.

@sfackler

sfackler Dec 12, 2014

Member

Will we have a mechanism for flushing this on process exit?

@sfackler

sfackler Dec 12, 2014

Member

I assume stdout is going to be a global singleton buffered writer like stdin is currently?

@alexcrichton

alexcrichton Dec 12, 2014

Member

I believe the stdout object returned will retain the per-object buffering semantics, which means that flushes happen when the object is dropped. In that sense I believe that all the buffers will be flushed by process exit, as all their destructors should have been called.

Do you think it's worth changing to a global stdout buffer though? Conceptually it makes more sense to me for stdin to be globally buffered than for stdout.

@sfackler

sfackler Dec 12, 2014

Member

The one potential concern I have with per-object buffering is that callers that don't realize that that's the behavior will be accidentally performing tons of allocation and deallocation as they repeatedly create transient stdout writers. It does solve the flush on exit problem, though, without requiring weird atexit handlers or anything like that.

@alexcrichton

alexcrichton Dec 12, 2014

Member

Hm good point! If we were to have a globally allocated instance, however, we do have the infrastructure for running things at exit so we could probably schedule it there to drop the global ref count which may end up flushing the buffer.

I do think, however, that we should hammer out the per-object or global semantics here.
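
The per-object behaviour under discussion can be illustrated with a small sketch. It is written against today's `std::io::BufWriter` and uses an in-memory `Vec<u8>` as a stand-in for the real stdout handle; the function name `write_buffered` and the 64-byte capacity are assumptions chosen only for the example.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `lines` through a short-lived buffered wrapper around `sink`.
/// Returns how many bytes were still sitting in the buffer just before
/// the wrapper went out of scope.
fn write_buffered(sink: &mut Vec<u8>, lines: &[&str]) -> io::Result<usize> {
    // Each call allocates a fresh buffer: this is the per-object cost
    // raised in the discussion above.
    let mut out = BufWriter::with_capacity(64, &mut *sink);
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    let pending = out.buffer().len();
    Ok(pending)
    // `out` is dropped here; its destructor flushes the buffer into
    // `sink` and silently ignores any flush error.
}
```

Flush-on-drop means no exit handler is needed, at the price of one buffer allocation per wrapper and of errors during the implicit flush going unreported.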

@thestinger

thestinger Dec 12, 2014

Either way, the ability to change the defaults like C is important. The C design with global buffers leads to a whole family of unsafe _unlocked versions of the input/output functions though. Performance is often important, because many programs use the standard streams for bulk I/O.

@nodakai

nodakai Dec 17, 2014

I wonder if io::stdio::stdout() etc. should support customizing what it returns. For example Python supports assignment to sys.stdout. This colorama library wraps sys.stdout with its own adapter to interpret ANSI color escape sequences for Windows cmd.exe which doesn't understand them.

(This is only a nice-to-have feature.)

* `copy`. Take `AsPath` bound.
* `rename`. Take `AsPath` bound.
* `remove_file` (renamed from `unlink`). Take `AsPath` bound.

@sfackler

sfackler Dec 12, 2014

Member

This isn't delete to make it clear that it's not going to work on a directory?

EDIT: ah, nvm.

@aturon

aturon Dec 12, 2014

Author Member

Actually delete is a better name.

@alexcrichton

alexcrichton Dec 12, 2014

Member

I do like how remove_file and remove_dir are clear equivalents though, so we should be sure to change both if we change one.

**Directories**:
* `make_dir` (renamed from `mkdir`). Take `AsPath` bound.
* `make_dir_all` (renamed from `mkdir_recursive`). Take `AsPath` bound.

@sfackler

sfackler Dec 12, 2014

Member

The _all suffix isn't the most clear about the differences between this and make_dir.

@aturon

aturon Dec 12, 2014

Author Member

Agreed. I started with make_dir_recursive but that seemed pretty long. Other ideas?

@sfackler

sfackler Dec 12, 2014

Member

Java uses mkdir and mkdirs, but I'm not sure make_dirs is really any more illuminating than make_dir_all, though it is shorter.

@alexcrichton

alexcrichton Dec 12, 2014

Member

Some other alternatives looking at other languages and adapting it to what we're doing:

  • Ruby - make_path, remove_path
  • Java/Python - make_dirs, remove_dirs
  • Go - make_dir_all, remove_dir_all

@alexcrichton

alexcrichton Dec 12, 2014

Member

Oh and:

  • Boost - make_directories, remove_all
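
Whatever the names end up being, the behavioural difference between the single-level and the recursive call can be sketched as follows. The example uses the functions found in today's `std::fs` (`create_dir` and `create_dir_all`), which are not the names proposed in this thread; `create_nested` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Tries to create `root/a/b` non-recursively first, then recursively.
/// Returns the error kind reported by the non-recursive attempt.
fn create_nested(root: &Path) -> io::Result<io::ErrorKind> {
    let nested = root.join("a").join("b");
    // `root/a` does not exist yet, so the single-level call is expected
    // to fail because the parent is missing.
    let kind = match fs::create_dir(&nested) {
        Ok(()) => return Err(io::Error::new(io::ErrorKind::Other, "parent already existed")),
        Err(e) => e.kind(),
    };
    // The recursive variant creates the missing parent as well.
    fs::create_dir_all(&nested)?;
    Ok(kind)
}
```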

@carllerche

carllerche Dec 12, 2014

Member

@aturon I don't hate make_dir_recursive, but it seems like it could be shortened to mkdir / mkdir_recursive

@Havvy

Havvy Dec 13, 2014

Contributor

I find 'mkdir' to be too short to be illuminating to somebody who hasn't been told that 'mkdir' is short for 'make directory', which includes many Windows-only programmers.

@thestinger

thestinger Dec 13, 2014

@Havvy: You can make that argument without the strange Windows claim, where mkdir is the name of the command used to create new directories (since DOS).

@arielb1

arielb1 Dec 14, 2014

Contributor

file_attr doesn't sound right to me (there are several things called "file attributes" (e.g. GetFileAttributes), and this function returns none of them). I think stat is a fine name, Windows calls this GetFileInformationByHandle/NtQueryInformationFile, and file_info could be also a decent name (but it feels way too non-specific to me), but on Windows you typically get this from FindFirstFile, which is certainly not a good name.

@thestinger

thestinger Dec 14, 2014

@arielb1: I think you meant to post that up there ^.

@sfackler

Member

sfackler commented Dec 12, 2014

I'm a bit worried about the timeout changes making some uses of the current infrastructure impossible, or maybe just painful/awkward. For example, rust-postgres provides an iterator over asynchronous notifications sent from the database. A method defined on the iterator is next_block_for, which blocks waiting for a notification for some duration. The implementation sets a read timeout on the socket, reads the first byte, and then unsets the timeout. The assumption is that if we get any data from the server, we can expect a full message to come in fairly quickly after that. The logic required to read half a message and then stop and save it off when we hit the IO timeout is just too complex to bother with.

The current setup works fine, if a bit awkwardly: https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/message.rs#L279-L285
https://github.com/sfackler/rust-postgres/blob/39ad5ff651199287e92aa65ec771267c2f54ea8b/src/lib.rs#L286-L320

With the new infrastructure, it'll still be possible to take the same strategy, but probably through some kind of gross hackery like reading the first byte with a timeout, and then passing that byte to the main message read function without the timeout.

What would really be ideal is to have the ability to wait on the socket for data to be ready to read for a certain period of time. Is something like that feasible to implement before 1.0 in a cross platform manner?

@SimonSapin

Contributor

SimonSapin commented Dec 12, 2014

they involve extending a vector's capacity, and then passing in the resulting uninitialized memory to the read method, which is not marked unsafe! Thus the current design can lead to undefined behavior in safe code.

I don’t understand why this is undefined behavior and unsafe fn with_extra(&mut self, n: uint) -> &mut [T]; on Vec is not.

@SimonSapin

Contributor

SimonSapin commented Dec 12, 2014

roughly interpreted as UTF-16, but may not actually be valid UTF-16 -- an "encoding" often called UCS-2; see http://justsolve.archiveteam.org/wiki/UCS-2 for a bit more detail.

I like the explanation linked there. Good find. 👍

impl OsStr {
pub fn from_str(value: &str) -> &OsStr;
pub fn as_str(&self) -> Option<&str>;

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Should this be -> Result<&str, ()>?

Or as a larger point (perhaps out of scope for this RFC), should Option<T> return types be Result<T, ()> instead when None kind of represents an error, in order to interoperate with try! and other error-handling infrastructure we might add?

@alexcrichton

alexcrichton Dec 12, 2014

Member

I recently changed from_utf8 to return Result<&str, Utf8Error>, so this can probably pick up that error. I suspect this will probably just continue to return the same value as str::from_utf8.

I do think that in general Option should only be used where None is a normal value, not an error (to use with try! as you pointed out). In the second pass of stabilization we're going to look closely at all this.
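
As a rough illustration of why a `Result` composes better with error propagation here, consider the sketch below. It uses today's `str::from_utf8` and the `?` operator (the successor to `try!`); the helper `first_word` is hypothetical.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: returns the first whitespace-separated word of
/// `bytes`, propagating the decoding error instead of collapsing it
/// into `None`.
fn first_word(bytes: &[u8]) -> Result<&str, Utf8Error> {
    // With a `Result`, the failure forwards to the caller in one step.
    let text = str::from_utf8(bytes)?;
    // An empty input is a normal value here, not an error.
    Ok(text.split_whitespace().next().unwrap_or(""))
}
```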

@thestinger

thestinger Dec 12, 2014

A string not being UTF-8 or a file not existing aren't any more of an error than a key not existing in a map. Attempting to open a file or parse text is also a way to discover if what you were looking for was there, just like a map lookup. There are few remaining use cases for Option if it's not meant to be used this way... any missing value can be considered an error, just like a missing file / whatever.

@netvl

netvl commented Dec 12, 2014

Currently Rust's IpAddr structure does not support zone indices in IPv6 addresses. They are needed for link-local addresses, which are arguably much more important in IPv6 than in IPv4. Are there plans to do something about this?

```
In addition, `read_line` is removed in favor of the `lines` iterator,
and `read_char` is removed in favor of the `chars` iterator (now on

@netvl

netvl Dec 12, 2014

These methods are occasionally very useful when you don't need to read the entire stream but only a few lines or characters. I have several of these in my code base. The supposed replacement

let line = r.lines().next().unwrap()

doesn't look very good.

Also, why is chars() on Reader? Doesn't reading characters require buffering?

@alexcrichton

alexcrichton Dec 12, 2014

Member

Yes we were thinking that the code you listed would be the replacement for a bare read_line. In general we're trying to move as much functionality to iterators as possible, and we could possibly add some form of method on an iterator which peels off the first element, failing if it's None if this becomes too unergonomic.

The current implementation for chars() doesn't actually use buffering at all, it just peeks at a byte and then might read some more bytes. We thought that if we're exposing bytes() on Reader which is not speedy unless buffered, then we may as well expose chars() as well.
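
For reference, a helper that peels lines off an iterator and turns a missing line into an error could look roughly like this. It is a sketch against today's `std::io::BufRead`, not the 2014 traits, and `first_two_lines` is a hypothetical name.

```rust
use std::io::{self, BufRead, ErrorKind};

/// Hypothetical helper: reads the first two lines from `r` using a
/// single `lines()` iterator, failing if either line is missing.
fn first_two_lines<R: BufRead>(r: R) -> io::Result<(String, String)> {
    let mut lines = r.lines();
    // Turn "iterator exhausted" into an ordinary I/O error so that both
    // failure modes can be propagated the same way.
    let mut next_line = || -> io::Result<String> {
        lines
            .next()
            .unwrap_or_else(|| Err(io::Error::new(ErrorKind::UnexpectedEof, "missing line")))
    };
    let first = next_line()?;
    let second = next_line()?;
    Ok((first, second))
}
```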

@nodakai

nodakai Dec 17, 2014

@netvl Note that the .unwrap() part (or something like try!()) is necessary in either case. I don't think

let first = r.lines().next().unwrap();
let second = r.lines().next().unwrap();

or

let mut lines = r.lines();
let first = lines.next().unwrap();
let second = lines.next().unwrap();

looks much worse than

let first = r.read_line().unwrap();
let second = r.read_line().unwrap();

As for chars(), any Unicode character in UTF-8 occupies at most 4 bytes. So read() into a fixed array on the stack will suffice.
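
A Rust `char` is a Unicode scalar value, and UTF-8 encodes every scalar value in at most four bytes, so a four-byte stack array is always large enough. A minimal sketch, with `encode_on_stack` as a hypothetical helper name:

```rust
/// Encodes `c` into a 4-byte stack buffer and returns the buffer
/// together with the number of bytes actually used.
fn encode_on_stack(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    // `encode_utf8` writes into the caller's buffer; no heap allocation.
    let len = c.encode_utf8(&mut buf).len();
    (buf, len)
}
```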


Such a framework is out of scope for this RFC, however, so the
endian-sensitive functionality will likely be provided elsewhere
(likely out of tree).

@netvl

netvl Dec 12, 2014

I don't like this section at all. Endianness conversions are absolutely necessary when working with cross-platform low-level protocols and serialization formats. Removing this entirely without an immediately available replacement is very bad. A lot of serialization crates will be broken without any ability to fix it (except manual endianness conversions). What, for example, should libraries like this one do?

This functionality should not be removed without providing an alternative first.

@arielb1

arielb1 Dec 12, 2014

Contributor

@netvl

You should just parse binary integers manually.

@netvl

netvl Dec 12, 2014

That's exactly my point. This is really error-prone and should not happen. Instead of relying on the library to do everything right, you yourself need to think about it. Eventually some library will appear, obviously, whether it will be community-driven or "official" one, but it is better not to force it given that we can avoid it.

@sfackler

sfackler Dec 12, 2014

Member

Creating a replacement will involve copying a bunch of the methods defined on Reader and Writer into a new trait, impling them for Reader and Writer and pushing it to crates.io. That does not seem like some kind of nightmare scenario to me.

@netvl

netvl Dec 12, 2014

Yes, but I think this should be defined in this RFC explicitly.

@thestinger

thestinger Dec 12, 2014

There are already in-memory endian conversion functions. It could just define a way to go from [u8, ..n] to the integer with width n. You read into the buffer and then convert. If no conversion is required, it will be a no-op and can likely be optimized out.
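
The read-then-convert approach can be sketched as follows. The example relies on today's `u32::from_be_bytes` and `u32::from_le_bytes` plus `Read::read_exact`, which are assumptions relative to the 2014 library; the helper names `read_u32_be` and `read_u32_le` are hypothetical.

```rust
use std::io::{self, Read};

/// Reads four bytes and interprets them as a big-endian `u32`.
fn read_u32_be<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    // On a big-endian host this conversion is a no-op.
    Ok(u32::from_be_bytes(buf))
}

/// Reads four bytes and interprets them as a little-endian `u32`.
fn read_u32_le<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    // On a little-endian host this conversion is a no-op.
    Ok(u32::from_le_bytes(buf))
}
```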

@netvl

netvl Dec 13, 2014

@thestinger, I thought that the only way to convert endianness was the Reader. If there are in-memory conversion functions, that's ok, I think.

impl OsStrBufExt for os_str::OsStrBuf { ... }
trait OsStrExt {
fn to_wide(&self) -> &[u16];

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Did you mean -> Vec<u16> here? The only way to have both -> &[u16] and pub fn as_str(&self) -> Option<&str>; is to keep both representations around. Is it worth the memory cost? I suppose the WTF-8 representation could then be initialized lazily, skipping conversions entirely for e.g. opening a Path from readdir on Windows. Or maybe that wouldn’t work (could lazy initialization be done at all in methods that take &self and not &mut self?)

@aturon

aturon Dec 13, 2014

Author Member

I meant Vec<u16>; will fix.

trait OsStrBufExt {
fn from_vec(Vec<u16>) -> Self;
fn into_vec(Self) -> Vec<u16>;

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Same comment as for OsStrExt::to_wide, this seems to assume that the internal representation is Vec<u16>. If it’s WTF-8, these methods could have less restrictive ownership constraints: from_slice(&[u16]) -> Self and to_vec(&self) -> Vec<u16>.

Also, the names should maybe be consistent (vec vs. wide).

@aturon

aturon Dec 13, 2014

Author Member

Yep. This was a copy-paste bug.

pub struct ByRef<'a, Sized? T:'a> {
pub inner: &'a mut T
}
```

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Is there a point in even having a ByRef type? Could this work? impl<'a, W: Writer> Writer for &'a mut W { …

@alexcrichton

alexcrichton Dec 12, 2014

Member

It's useful for composition where later methods end up consuming self. For example:

iter.by_ref().map(...);
// vs
(&mut iter).map(...);

I believe we do actually have an implementation of Writer for &mut W: Writer, but the by_ref method just helps with chaining. (iterators have this as well).
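
The same chaining pattern applies to readers. A minimal sketch using today's `Read::by_ref` and `Read::take`, with `split_header` as a hypothetical helper:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads the first `n` bytes as a header and
/// everything after them as the body.
fn split_header<R: Read>(mut r: R, n: u64) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut header = Vec::new();
    // `take` consumes its receiver, so lend it `&mut r` via `by_ref`
    // instead of giving the reader away.
    r.by_ref().take(n).read_to_end(&mut header)?;
    // `r` is still usable here and continues where the header ended.
    let mut body = Vec::new();
    r.read_to_end(&mut body)?;
    Ok((header, body))
}
```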

[Buffering]: #buffering
The current `Buffer` trait will be renamed to `BufferedReader` for
clarity (and to open the door to `BufferedWriter` at some later

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

What do the current BufferedReader and BufferedWriter (concrete wrappers rather than traits) become?

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Never mind, that’s answered below. Traits and types with the same name don’t collide?

@reem

reem Dec 12, 2014

It seems like they must when trait objects are involved.

@nodakai

nodakai Dec 17, 2014

We must rename one of the two BufferedReaders.

trait X { }
struct X;

fn main() {
}

<anon>:2:1: 2:10 error: duplicate definition of type or module `X`
<anon>:2 struct X;
         ^~~~~~~~~
<anon>:1:1: 1:12 note: first definition of type or module `X` here
<anon>:1 trait X { }
         ^~~~~~~~~~~
error: aborting due to previous error

[MemReader and MemWriter]: #memreader-and-memwriter
The various in-memory readers and writers available today will be
consolidated into just `MemReader` and `MemWriter`:

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

impl Writer for Vec<u8> and impl<'a> Reader for &'a [u8] were added relatively recently. Does this imply they’re removed? I’d rather have them stay.

@reem

reem Dec 12, 2014

It looks like those may be orthogonal, since MemReader and MemWriter will implement Seek, unlike Vec<u8> and &[u8].

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

I’m fine with having both.

@alexcrichton

alexcrichton Dec 12, 2014

Member

Ah while not explicitly stated, we definitely wanted to keep those impls!

@thestinger

thestinger Dec 12, 2014

@SimonSapin: The implementations on vectors and slices don't have seeking. These implementations provide more functionality at the expense of being slower and more verbose.
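
The split between plain slices and a seekable wrapper can be sketched with today's standard library, where `&[u8]` implements `Read` and `std::io::Cursor` adds `Seek`. `Cursor` is used here as a stand-in for the `MemReader` proposed in the RFC, which is an assumption; the helper names are hypothetical.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Plain slices can only be read front to back; each read advances
/// the slice itself.
fn read_prefix(mut data: &[u8], n: usize) -> io::Result<Vec<u8>> {
    let mut prefix = vec![0u8; n];
    data.read_exact(&mut prefix)?;
    Ok(prefix)
}

/// Jumping to an arbitrary offset needs a seekable wrapper that tracks
/// a position separately from the data.
fn read_from_offset(data: &[u8], offset: u64) -> io::Result<Vec<u8>> {
    let mut cursor = Cursor::new(data);
    cursor.seek(SeekFrom::Start(offset))?;
    let mut tail = Vec::new();
    cursor.read_to_end(&mut tail)?;
    Ok(tail)
}
```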

**Environment variables**:
* `vars` (renamed from `env`): yields a vector of `(OsStrBuf, OsStrBuf)` pairs.

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Does "yields X" here mean "returns X"? I’m kinda used to it meaning "returns an iterator of X".

@alexcrichton

alexcrichton Dec 12, 2014

Member

Here I believe this means returns because calling vars is basically taking a snapshot of the process's environment variables (other threads can modify them while you iterate over them).

@thestinger

thestinger Dec 12, 2014

That's a very inefficient API.

@erickt

erickt Dec 13, 2014

@thestinger: do you have a better suggestion? The only thing more optimal I can think of that's thread safe would be:

fn with_vars<T>(f: |EnvIterator| -> T) -> T { ... }

Where EnvIterator is an Iterator<(OsStrBuf, OsStrBuf)>. Since we have the environment locked inside this closure, we could get away with not allocating the key-value pairs, but instead wrap the raw *u8s. That could work if someone really needs the performance.

Or we have vars return an EnvIterator that internally holds the environment lock, and frees it on drop. On the plus side, no closures; on the minus side, users would have to be good about wrapping var() calls in a scope, or they could inadvertently block everyone else from accessing the environment. I could see this happening fairly frequently in main, unfortunately.

@thestinger

thestinger Dec 13, 2014

Those are the semantics of all other mutable shared memory types, although they use an RAII lock.

@erickt

erickt Dec 15, 2014

I would be okay with .vars() returning an iterator that held a lock. In the worst case, it's a pretty straightforward deadlock to document and find, and if we ever get eager drops then this wouldn't be an issue anymore.
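
The trade-off between the closure style and the lock-holding style can be sketched with an ordinary mutex. Everything below is hypothetical: `SharedEnv` is a toy stand-in for the process environment, not how the standard library stores it, and the method names are invented for the example.

```rust
use std::sync::{Mutex, MutexGuard};

type Vars = Vec<(String, String)>;

/// Hypothetical stand-in for the process environment, guarded by a lock.
struct SharedEnv {
    vars: Mutex<Vars>,
}

impl SharedEnv {
    fn new(vars: Vars) -> Self {
        SharedEnv { vars: Mutex::new(vars) }
    }

    /// Closure style: the lock is released as soon as `f` returns.
    fn with_vars<T>(&self, f: impl FnOnce(&[(String, String)]) -> T) -> T {
        let guard = self.vars.lock().unwrap();
        f(guard.as_slice())
    }

    /// RAII style: the lock is held for as long as the caller keeps
    /// the returned guard alive.
    fn lock_vars(&self) -> MutexGuard<'_, Vars> {
        self.vars.lock().unwrap()
    }

    /// True while some guard is alive (or the lock is poisoned).
    fn is_locked(&self) -> bool {
        self.vars.try_lock().is_err()
    }
}
```

With the RAII style, a guard bound to a long-lived variable keeps every other accessor waiting, which is the hazard described above for code in `main`.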

* `remove_var` (renamed from `unsetenv`): takes a `IntoOsStrBuf`-bounded value.
* `join_paths`: take an `IntoIterator<T>` where `T: IntoOsStrBuf`, yield a `Result<OsString, JoinPathsError>`.
* `split_paths` take a `IntoOsStrBuf`, yield an `Iterator<Path>`.

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

What are these two? For environment variables like PATH, with an OS-specific separator? At first, join_paths sounds like Path::join. Shouldn’t this module just provide the separator, to be used with general-purpose split and connect methods?

@alexcrichton

alexcrichton Dec 12, 2014

Member

Yeah they're the analog of the methods in os today, and they're precisely dealing with : vs ; as well as some weird quoting rules on windows. I believe that the quoting problem on windows is why we're not just exposing a separator to use with other methods (sadly).

I do, however, prefer the name connect to join slightly!

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Either name is fine, I was just referring to StrVector::connect. I didn’t know about Windows quoting there. I’d say it’s a feature more than a problem, since on Unix : is valid in directory names but then that name apparently can’t be used at all in :-separated path lists!
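
A small sketch of the round trip, using the `join_paths` and `split_paths` functions that exist in today's `std::env` (their signatures differ from the ones proposed in this RFC). `roundtrip` is a hypothetical helper; on Unix, joining fails for an entry that contains the separator.

```rust
use std::env::{self, JoinPathsError};
use std::path::PathBuf;

/// Joins `dirs` with the platform's search-path separator, then splits
/// the result again.
fn roundtrip(dirs: &[&str]) -> Result<Vec<PathBuf>, JoinPathsError> {
    // Fails if an entry cannot be represented in a joined list.
    let joined = env::join_paths(dirs)?;
    Ok(env::split_paths(&joined).collect())
}
```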

consolidated into just `MemReader` and `MemWriter`:
`MemReader` (like today's `BufReader`)
- construct from `&[u8]`

@reem

reem Dec 12, 2014

Today's MemReader provides a 'static Reader - there should be a replacement if it is removed in favor of a wrapper around &[u8].

@alexcrichton

alexcrichton Dec 12, 2014

Member

Just to make sure I'm following, you'd like to make sure that there's a primitive for an owned MemReader?

@reem

@huonw

huonw Dec 18, 2014

Member

What about:

struct MemReader<Data: Deref<[u8]>> {
    // ...
}

and then MemReader<Vec<u8>> and MemReader<&[u8]> etc. work.
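
Spelled out against today's language, where `Deref` has an associated `Target` type, the idea might look like the sketch below. The field layout, the `new` constructor and the `read_all` helper are assumptions made for the example, not part of the proposal.

```rust
use std::io::{self, Read};
use std::ops::Deref;

/// Hypothetical in-memory reader over anything that derefs to bytes.
struct MemReader<D: Deref<Target = [u8]>> {
    data: D,
    pos: usize,
}

impl<D: Deref<Target = [u8]>> MemReader<D> {
    fn new(data: D) -> Self {
        MemReader { data, pos: 0 }
    }
}

impl<D: Deref<Target = [u8]>> Read for MemReader<D> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let bytes: &[u8] = &self.data;
        let remaining = &bytes[self.pos..];
        // Copy as much as fits; returning 0 signals end of input.
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n;
        Ok(n)
    }
}

/// Drains any reader into a `String`, for demonstration.
fn read_all<R: Read>(mut r: R) -> io::Result<String> {
    let mut out = String::new();
    r.read_to_string(&mut out)?;
    Ok(out)
}
```

The same type then serves both the owned case (`Vec<u8>`) and the borrowed case (`&[u8]`).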

The `IoErrorKind` type will become `std::io::ErrorKind`, and
`ShortWrite` will be dropped (it is no longer needed with the new
`Writer` semantics), which should decrease its footprint. The
`OtherIoError` variant will become `Other` now that `enum`s are

@reem

reem Dec 12, 2014

OtherIoError -> io::ErrorKind::Other seems like a regression in ergonomics. This is true of basically all the ErrorKind variants, since ErrorKind is pretty long. Could we have a shorter name?

@tshepang

tshepang Dec 12, 2014

Contributor

@reem probably because this hasn't been implemented yet: rust-lang/rust#18073?

@alexcrichton

alexcrichton Dec 12, 2014

Member

This is following our current conventions for enums and namespacing, and names can always be imported into modules with new names!

@SimonSapin

SimonSapin Dec 12, 2014

Contributor

Namely: use std::io::ErrorKind as IoError;, then IoError::Other
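
In a complete module that renaming might look like the following sketch, written against today's `std::io`; the helper `other_error` is hypothetical.

```rust
use std::io;
use std::io::ErrorKind as IoError;

/// Hypothetical helper that builds a catch-all I/O error through the
/// shorter local alias.
fn other_error(msg: &str) -> io::Error {
    io::Error::new(IoError::Other, msg.to_string())
}
```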

@reem

reem Dec 13, 2014

I meant could ErrorKind be renamed to be shorter, since it seems like the main culprit for increase in length of the name.

RFC discusses the most significant problems below.

This section only covers specific problems with the current library; see
[Vision for IO] for a higher-level view. section.

@untitaker

untitaker Jan 23, 2015

Contributor

...for a higher-level view. section.

Typo?

@aturon

Member Author

aturon commented Jan 26, 2015

The fs sub-RFC has now been posted.

@yazaddaruvala

yazaddaruvala commented Jan 28, 2015

@aturon given that the eventual goal is to have both blocking and nonblocking implementations of the std::io crate, do you think it makes sense to move the new blocking APIs into std::io::blocking rather than just std::io? (Replace io with fs or process where it makes sense.)

Summary:
std::io::nonblocking could then be created lazily. The naming would let users know that std::io doesn't have a default: there are two options and it's up to them to pick.

Long Story:
I think we can all agree, blocking io and nonblocking io are just different, neither is really better than the other. They just have different trade offs and should be used appropriately. Given that:

Option 1: Call the APIs read vs read_async

  • You're sort of suggesting that async is a special kind of read. Also, since one has 2x the characters, it's slightly discouraging (both to write and read).
  • It's not explicit that read is blocking, so it may accidentally be used when it shouldn't.

Option 2: Call the APIs read_sync vs read_async

  • Just a bit verbose

Option 3: Call the APIs blocking::read vs nonblocking::read

  • It would be a little unusual for the same mod to use both types of io. Generally, then (when used exclusively), you would get simple function names in both cases, e.g. read

I would be happy with either Option 2 or 3, preferring 3. Option 1 just leaves me a little uneasy.
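
To make Option 3 concrete, here is a minimal sketch of how the two module names could read at a call site. Everything in it is hypothetical: the `blocking` and `nonblocking` modules, their `read` functions, and the `NotReady` stand-in source are assumptions for illustration, built on today's `std::io::Read` and `ErrorKind::WouldBlock` rather than on anything this RFC specifies.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical module: the name makes it explicit that the call may block.
mod blocking {
    use std::io::{self, Read};

    pub fn read<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
        src.read(buf)
    }
}

/// Hypothetical module: same function name, but "not ready yet" is
/// reported as `Ok(None)` instead of as an error.
mod nonblocking {
    use std::io::{self, ErrorKind, Read};

    pub fn read<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<Option<usize>> {
        match src.read(buf) {
            Ok(n) => Ok(Some(n)),
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// Stand-in source that is never ready, used only to exercise the sketch.
struct NotReady;

impl Read for NotReady {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::WouldBlock, "not ready"))
    }
}
```

A module that only ever imports one of the two would call plain `read(...)` in either case, which is the ergonomic point of the option.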

@nodakai

nodakai commented Jan 28, 2015

@yazaddaruvala I think it's more practical to have std::io for blocking I/O and std::io::nonblocking for non-blocking I/O. I doubt anyone would complain about the "asymmetry" between them. (Maybe we can lift std::io::blocking to std::io...)

@tshepang

Contributor

tshepang commented Jan 28, 2015

Maybe we can lift std::io::blocking to std::io..

I don't understand this @nodakai.

@nodakai

nodakai commented Jan 28, 2015

@tshepang I meant re-export such as std::io::fs::File → std::io::File

@yazaddaruvala

yazaddaruvala commented Jan 29, 2015

@nodakai I understand the desire to have a "default" io, especially given that it's currently the only io. But is it really the right long-term philosophy?

If the only current difference is import std::io vs import std::io::blocking, is it really worth causing asymmetry, i.e. implicitly endorsing one over the other?

I've only played around with Nodejs a little, and some of its ideas are definitely controversial, but I think everyone can agree the one thing it did really well was educate all of its users about the difference between blocking and nonblocking io. And yes, you could achieve this through documentation, but similar to immutable by default, syntax/explicitness (at minimal cost) is the best way to educate people.

@reem

reem commented Jan 29, 2015

I've actually been doing a lot of experimentation and work on non-blocking IO and I can say that I do think that blocking IO is a better default model for Rust. It fits in much better with the borrow system for resource management, since the usage of all resources is deterministic relative to the structure of the code, whereas this is not true at all for asynchronous actions.

Additionally, until (if?) Rust has true async/await support or even simple generators via yield it is pretty hard to write clean asynchronous code or integrate well with the borrow checker. Clean non-blocking IO interfaces also inevitably end up doing a lot of double-buffering and allocation, which can really hamper performance.

The non-clean, extremely low-level bindings for asynchronous IO already exist in the form of mio, and I really think there is no need to integrate these things into std::io until they have had time to mature and we figure out the best way to model asynchronous behavior in Rust. We can always add a std::nio or std::aio in the future, though I'm still not convinced that there's anything wrong with this stuff just living in the cargo-verse.

@tshepang

Contributor

tshepang commented Jan 29, 2015

std::nio/std::aio are neat names, much better than std::io::nonblocking

@aturon

Member Author

aturon commented Jan 30, 2015

I agree with what several others have said here: I think std::io is fine, with the potential to either grow in place to encompass async IO or else add std::aio later on.

@MarkusJais

MarkusJais commented Feb 3, 2015

I agree, too. std::aio sounds much better than std::io::nonblocking.

@aturon

Member Author

aturon commented Feb 3, 2015

An amendment for std::net has been posted.

@mzabaluev

Contributor

mzabaluev commented Feb 5, 2015

Regarding with_extra, I wonder if there could be a more self-explanatory way to use it:

impl<T> Vec<T> where T: Copy {
    pub unsafe fn fill_more<F, E>(&mut self, len: usize, op: F) -> Result<usize, E>
        where F: FnOnce(&mut [T]) -> Result<usize, E>
    { ... }
}
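
For comparison, a safe approximation of that shape might look like the sketch below. It is an assumption-laden illustration, not the proposed API: it is a free function over `Vec<u8>` rather than an `unsafe` method on `Vec<T>`, it zero-fills the extra space instead of exposing uninitialized memory, and it assumes the intended semantics are "grow by `len`, let `op` fill, keep only what `op` reports as written".

```rust
/// Hypothetical safe variant of the suggested `fill_more`.
/// Grows `vec` by `len` zeroed bytes, hands that tail to `op`, and then
/// shrinks `vec` so that only the bytes `op` reported as written remain.
fn fill_more<F, E>(vec: &mut Vec<u8>, len: usize, op: F) -> Result<usize, E>
where
    F: FnOnce(&mut [u8]) -> Result<usize, E>,
{
    let start = vec.len();
    vec.resize(start + len, 0);
    // Clamp the reported count so a misbehaving `op` cannot claim more
    // bytes than it was given.
    let result = op(&mut vec[start..]).map(|n| n.min(len));
    // On error, keep none of the new bytes.
    let written = match &result {
        Ok(n) => *n,
        Err(_) => 0,
    };
    vec.truncate(start + written);
    result
}
```

The zero-fill costs a write pass that the `unsafe` version would avoid, which is presumably the motivation for the original signature.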

@erickt

erickt commented Feb 5, 2015

@mzabaluev: Or maybe Vec::with_uninitialized(...)?

@zemlanin zemlanin referenced this pull request Feb 15, 2015

Merged

io reform update #9

Manishearth added a commit to Manishearth/rust that referenced this pull request Mar 15, 2015

Rollup merge of rust-lang#23379 - kballard:tweak-stdio-docs-no-raw-constructors, r=alexcrichton

`std::io` does not currently expose the `stdin_raw`, `stdout_raw`, or
`stderr_raw` functions. According to the current plans for stdio (see
rust-lang/rfcs#517), raw access will likely be provided using the
platform-specific `std::os::{unix,windows}` modules. At the moment we
don't expose any way to do this. As such, delete all mention of the
`*_raw` functions from the `stdin`/`stdout`/`stderr` function
documentation.

While we're at it, remove a few `pub`s from items that aren't exposed.
This is done just to lessen the confusion experienced by anyone who
looks at the source in an attempt to find the `*_raw` functions.

@aturon aturon referenced this pull request Mar 23, 2015

Merged

RFC for read_all #980

@aturon aturon referenced this pull request Jan 27, 2016

Merged

Add File::try_clone #31069
