Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAmend RFC 517: Add material on std::fs #739
Conversation
This comment has been minimized.
This comment has been minimized.
|
A general comment: some of the functionality currently in These are not permanent changes, nor or they intended to be stabilized immediately. In many cases, the movement just signifies that we do not have a cross-platform implementation yet. Very little in In other words, this RFC stabilizes only the parts that land in |
aturon
referenced this pull request
Jan 26, 2015
Merged
RFC: io and os reform: initial skeleton #517
This comment has been minimized.
This comment has been minimized.
|
Note that with this revision, both The recursive make/remove directory APIs have not yet been renamed, though the proposed names are not ideal. I'd like to suggest |
This comment has been minimized.
This comment has been minimized.
|
By renamed, do you mean in your branch where you are making these changes? |
This comment has been minimized.
This comment has been minimized.
Ah, no sorry. I meant, from the original version of the RFC. |
This comment has been minimized.
This comment has been minimized.
|
There were some worries about the name |
This comment has been minimized.
This comment has been minimized.
|
For reference, here's why |
kjpgit
reviewed
Jan 26, 2015
| #### Files | ||
| [Files]: #files | ||
|
|
||
| The `File` type will largely stay as it is today, except that it will |
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 26, 2015
what happens when close fails, because you are using an async filesystem? (note this is different than caring about the server crashing). and please don't say fsync/flush each time because some people might want to run some rust programs from a script and then run 'sync' manually.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 27, 2015
Member
This RFC is largely concerned with the API of std::fs and how it's affected by the implementation details. I believe that we'll always have a Drop implementation which calls close on the file descriptor, so in some sense this is an implementation detail. Do you have an idea in mind where this would change the API of File, however?
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 27, 2015
The API http://doc.rust-lang.org/std/io/fs/struct.File.html doesn't tell me how I know close succeeded, so I view it as incomplete. I also think what functions panic is part of the public API that shouldn't change.
As I posted http://discuss.rust-lang.org/t/fs-file-should-panic-if-implicit-close-fails-and-no-panic-is-active/1349 , I think there should be an explicit .close() method that people can call, that returns an error, and otherwise drop should panic if implicit close() fails and there is not already a panic in progress. I think that makes the most code correct.
edit: and having an explicit .close() could change what errors are returned by other functions, e.g. read() could then panic or return an error ("file is closed").
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 28, 2015
Member
I think we may want to continue the discussion about explicit close over here.
kjpgit
reviewed
Jan 26, 2015
| [Files]: #files | ||
|
|
||
| The `File` type will largely stay as it is today, except that it will | ||
| use the `AsPath` bound everywhere. |
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 26, 2015
Where is the file descriptor management / safety strategy? FDs are a global heap, just like memory, that can alias each other, have double frees, dangling pointers, etc. What precautions are taken by the IO types to make fd handling safe? It would be shame if safe rust code couldn't corrupt memory, but could corrupt the FD table.
I think at a bare minimum, IO types should abort if(errno == EBADF) on any operation, but that is closing the barn door after the horses are out.
I'm looking at as_raw_fd and wondering what its purpose is. Is anything mutating that going to be marked as unsafe? What about reading it? Obviously that fd could be invalidated by the time you get it, if another thread got a mutex on the file and closed it.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 27, 2015
Member
I consider our story here very analagous to that of plain old memory management pointers like Box<T> You can get a very unsafe *mut T, for example, safely from a Box<T> but mutation is all unsafe (similar to how as_raw_fd is safe but usage is likely unsafe). Similarly, Box<T> doesn't attempt to check if the memory is valid and I don't think that our bindings will special-case EBADF as well.
In general each object will own its file descriptor. The descriptor can be inspected and lent out via a safe API (but in this case platform specific) and relies on modification being unsafe elsewhere. You are, however, guaranteed that the raw component (in this case a file descriptor) is valid so long as the file itself is still alive.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 28, 2015
to be clear, this is what bothers me:
let p = Path::new("/dev/null");
let f = File::open(&p).unwrap().as_raw_fd();
println!("fd: {}", f);
That's a dangling, non-sensical "pointer" even in single threaded, 100%
safe code. Can as_raw_fd just be made unsafe?
On Mon, Jan 26, 2015 at 7:42 PM, Alex Crichton notifications@github.com
wrote:
In text/0517-io-os-reform.md
#739 (comment):+*
remove_dir(renamed fromrmdir). TakeAsPathbound.
+*remove_dir_all(renamed fromrmdir_recursive). Take
AsPathbound.
+*walk_dir. TakeAsPathbound. Yield an iterator overIoResult<DirEntry>.
+Links:
+
+*hard_link(renamed fromlink). TakeAsPathbound.
+*sym_link(renamed fromsymlink). TakeAsPathbound.
+*read_link(renamed formreadlink). TakeAsPathbound.
+
+#### Files
+[Files]: #files
+
+TheFiletype will largely stay as it is today, except that it will
+use theAsPathbound everywhere.I consider our story here very analagous to that of plain old memory
management pointers like Box You can get a very unsafe *mut T, for
example, safely from a Box but mutation is all unsafe (similar to how
as_raw_fd is safe but usage is likely unsafe). Similarly, Box doesn't
attempt to check if the memory is valid and I don't think that our bindings
will special-case EBADF as well.In general each object will own its file descriptor. The descriptor can be
inspected and lent out via a safe API (but in this case platform specific)
and relies on modification being unsafe elsewhere. You are, however,
guaranteed that the raw component (in this case a file descriptor) is valid
so long as the file itself is still alive.—
Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rfcs/pull/739/files#r23581430.
This comment has been minimized.
This comment has been minimized.
sfackler
Jan 28, 2015
Member
In what way is that unsafe? The fd is just an integer, and I can't see how you'd extract any memory unsafety from it. In the worst case, it's closed while there's another copy of it floating around, in which case calls using it start returning errors.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 28, 2015
Member
How do I get a *mut T from a Box using safe code?
Something like:
let foo = Box::new(3);
let foo_ptr = &*foo as *const i32;
drop(foo);
println!("{:?}", foo_ptr);Can as_raw_fd just be made unsafe?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 28, 2015
Member
Er sorry, didn't finish that last comment, I was going to mention that your example is very similar to:
use std::ffi::CString;
let c_str = CString::from_slice(b"foo").as_ptr();
unsafe { c_funtion(c_str); }This is also passing a dangling pointer to an external function, but I don't think it necessarily means we should make as_ptr and unsafe method.
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 28, 2015
On Wed, Jan 28, 2015 at 10:39 AM, Steven Fackler notifications@github.com
wrote:
In what way is that unsafe? The fd is just an integer, and I can't see how
you'd extract any memory unsafety from it. In the worst case, it's closed
while there's another copy of it floating around, in which case calls using
it start returning errors.No, the worst case is it aliases another later opened file and you
truncate /etc/passwd without meaning to. You can also pass it to open
/proc/self/fd/N and get the wrong (aliased) file. Heh.
alexcrichton
reviewed
Jan 26, 2015
| extension methods on this structure. | ||
|
|
||
| * `set_perm` (renamed from `chmod`). Take `AsPath` bound, and a | ||
| `FilePermissions` value. The `FilePermissions` type will be revamped |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 26, 2015
Member
We may want to consider the pluralization of Permissions vs perm in set_perm. Right now this recommends .perm() on FileAttr as well as set_perm (both singular) but the struct FilePermissions is pluralized. Is it worth making sure they're all consistent? I'd be fine either way but might lean slightly towards everything singular over everything plural.
This comment has been minimized.
This comment has been minimized.
Screwtapello
commented
Jan 26, 2015
|
Are you sure you want hard- and sym-links in the cross-platform portion of the filesystem API? They're ubiquitous on POSIX, but (as I understand it) not terribly well supported on Windows, and then there's platforms that don't support them at all, or support links that are neither hard nor soft (like Mac OS/HFS+ "shortcuts"). It seems to me that should definitely be squirreled away in std::platform, along with things like mknod() and mkfifo(). |
alexcrichton
reviewed
Jan 26, 2015
| The `open_mode` function will take an `OpenOptions` struct, which will | ||
| encompass today's `FileMode` and `FileAccess` and support a | ||
| builder-style API. | ||
|
|
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 26, 2015
Member
One method I think we forgot to consider is truncate on files today. Right now the function matches Unix semantics where it basically acts as a set_len function. It will shorten the file or extend it (filling in with 0s) if necessary.
This function, however doesn't exist on Windows to my knowledge (cc @retep998, I may be missing something). Currently we're using SetEndOfFile which operates slightly different than the Unix variant. On Windows you must first seek to a position and then "truncate" the file, which is to say that Windows will simply set the length of the file on disk to the current pointer in the file.
Unfortunately, the current semantics may lead to some loss on windows. Our "emulation" of ftruncate on Windows involves seeking the file to the desired length, calling SetEndOfFile, and then seeking back to where the file used to be. The "seek back" operation, however, may fail, which would mean the operation actually completed successfully but we're forced to still return an error.
I think I would personally recommend matching Windows semantics rather than unix semantics in this case. We would drop the u64 argument to the function, call SetEndOfFile on Windows, and unix would be a combination of lseek (to learn the position) followed by ftruncate.
This comment has been minimized.
This comment has been minimized.
alexcrichton
Jan 26, 2015
Member
Aha! @retep998 has pointed out SetFileInformationByHandle to me which worked like a charm!
In that case I think we're fine :)
This comment has been minimized.
This comment has been minimized.
|
@Screwtapello Windows supports hard and symbolic links just fine. Really the issue is not whether the OS supports them, but whether the filesystem supports them. In which case all we have to do is error if the filesystem doesn't support hard or symbolic links. |
This comment has been minimized.
This comment has been minimized.
|
It may also be worth talking about Should we return |
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 26, 2015
|
my 2cents on EINTR. If the user goes out of their way to set up a signal handler for SIGINT, they probably don't want you to blindly loop on it when it happens. And if they really want to mask that signal temporarily (e.g. during a certain close() or other operation), they can also do that with global sigprocmask() state. (threading issues and unwinding / restore issues around that not withstanding). |
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 26, 2015
|
I'd also note that on bsd and linux, SA_RESTART can be used with sigaction(2), to give more control over syscall restarting. edit: and my gist is, the user can do this if they dont want the rust functions to fail, rather than have rust re-implement the wheel here |
jfager
reviewed
Jan 26, 2015
| **Links**: | ||
|
|
||
| * `hard_link` (renamed from `link`). Take `AsPath` bound. | ||
| * `sym_link` (renamed from `symlink`). Take `AsPath` bound. |
This comment has been minimized.
This comment has been minimized.
jfager
Jan 26, 2015
I never see 'symlink' broken up into 'sym link', it's a full-blown portmanteau. Seems like this should be either 'symlink' or 'soft_link'.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
jmesmon
Jan 27, 2015
What's the value in adding an underscore the sym_link here? Even the wikipedia article on symbolic links notes the term symlink as an alias.
jfager
reviewed
Jan 27, 2015
| The `stat` method will be renamed to `attr`, yield a `FileAttr`, and | ||
| take `&self`. | ||
|
|
||
| The `fsync` method will be renamed to `flush`, and `datasync` will be |
This comment has been minimized.
This comment has been minimized.
jfager
Jan 27, 2015
This seems like a bad renaming - the key feature of fsync over fflush is that the former propagates all the way to disk (yes, w/ caveats) while the latter only hands buffered data over to the os. The implementation is currently in terms of 'fsync', calling this 'flush' is confusing.
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 27, 2015
I agree, that caught me off guard as well. It's making flush a very context-dependent.name.
This comment has been minimized.
This comment has been minimized.
jmesmon
Jan 27, 2015
If we're going full on descriptive names, file_sync and data_sync would be clearer. flush has an overloaded meaning that will cause confusion. And given these are methods on a file, it'd propose sync and data_sync as better non-repetitive choices.
Edit: or even more explicit:
sync_all and sync_data.
This comment has been minimized.
This comment has been minimized.
kjpgit
Jan 28, 2015
Is it even possible to call it file.flush(), seeing how .flush is already a writer trait?
This comment has been minimized.
This comment has been minimized.
As @retep998 mentioned, Windows does have good support for these functions. I believe that the symlink-creating function did not exist in XP which is certainly one of the pain points of exposing the API. We will likely just return an error for XP, however (as @retep998 mentioned).
Could you elaborate a bit on this? Do you mean that the some filesystems that OSX uses do not support this? Or are you thinking that some distributions simply don't have the @kjpgit I find your comments pretty convincing, it sounds like we should punt |
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 27, 2015
|
I take back what I said about EINTR handling, it's pretty hard (forgot about SIGCHLD, SIGWNCH, etc., and SA_RESTART special cases are a mile long). Also, Python appears to be going down the 'retry everything` route: https://www.python.org/dev/peps/pep-0475/ ("PEP 475 - Retry system calls failing with EINTR") That does seem saner and makes things much easier to test and reason about. edit: to be clear, python can get away with that because they have integrated the 'catch SIGINT' into the interpreter and raise an exception on that, because that's one thing people actually want to let change the control flow of their program (although it's still pretty hard / impossible to test that, I can say from experience, when literally every single line of code can throw an exception). for rust that might be harder, as panics aren't recoverable and I'm not sure if you register your own signal handlers by default (would be obviously bad for FFI). However, I think it's generally good practice to force people to 'opt in' to potentially unsafe or buggy or really hard to test behavior, and having to 'opt in' to use system calls that randomly fail due to EINTR (based on some totally random SIGCHLD caused by another library) might be more prudent, even if it irritates someone who wants a blocking read/write to be 'cancelable' WITH A CUSTOM SIGNAL HANDLER THAT KEEPS THE PROGRAM STILL RUNNING (Default ctrl-c behavior terminates process which will still work). |
Valloric
reviewed
Jan 27, 2015
| `perm` accessors. The various `os::platform` modules will offer | ||
| extension methods on this structure. | ||
|
|
||
| * `set_perm` (renamed from `chmod`). Take `AsPath` bound, and a |
This comment has been minimized.
This comment has been minimized.
Valloric
Jan 27, 2015
Please consider calling this set_permissions. perm is not a commonly used abbreviation of permissions, while dir is for directory for instance.
jmesmon
reviewed
Jan 27, 2015
|
|
||
| The `FileType` module will be renamed to `FileKind`, and the | ||
| underlying `enum` will be hidden (to allow for platform differences | ||
| and growth). It will expose at least `is_file` and `is_dir`; the other |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 27, 2015
|
I just looked at my web server code and I have a bare .write() call on a socket (as well as .push), and no retry loop (it was in a try!). I've been writing sockets code for 15 years and I still wouldn't expect that to fail spuriously with EINTR due to some random library potentially having a signal handler. So I think it's time to put 1970s-era signals interrupting 'blocking' system calls in the grave, and for people who want interruption either 1) use non blocking io, which is is far saner (you control where and what thread(s) get notified) 2) use explicit _noretry versions of "blocking" system calls. |
This comment has been minimized.
This comment has been minimized.
Screwtapello
commented
Jan 27, 2015
Modern filesystems can store lots of different kinds of things. Probably nothing is universal, but files and directories are nearly so. Hard-links and symlinks are widespread, followed by a long tail of less-common things like HFS+ "shortcuts", FIFOs, alternate data streams, and so forth. A cross-platform API has to draw the line somewhere, and I was expecting Rust's cross-platform API to draw the line at files and directories. It's true that both POSIX and Windows support creating symlinks, but first-class support would also require If it's really possible to implement solid support for hard-links and symlinks on all supported platforms, then fair enough, let's do it. |
nrc
assigned
aturon
Jan 27, 2015
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 28, 2015
|
What is the thread safety (and expensiveness there of) of File / BufferedWriter? Is it going to be crappy like glibc, where every fwrite() takes a mutex lock, and you have to use fwrite_unlocked() to get decent performance, and pinky swear it's not being used in multiple threads? It would be great if the ownership semantics of rust mean you don't have to internally lock for no reason. But what about stdout/stderr handles? It looks like stdout() creates a private buffer for the existing fd (1,2) and flushes but doesn't close on drop. So... no internal mutex lock should ever be needed, right? Just clarifying. If a close() method gets added to Writer, the stdout()/stderr() handles might have to intercept that and actually not close the underlying fd. Only when main function ends would stdout/stderr get closed? (you could close stdout first and if that fails, panic to stderr). |
This comment has been minimized.
This comment has been minimized.
kjpgit
commented
Jan 28, 2015
|
fyi LineBufferedWriter.into_inner seems like a bad method WRT error handling. it calls flush but no error response? Also, should stdout() return a fully buffered writer if there is no terminal, like c does? |
jminer
reviewed
Jan 29, 2015
| * `remove_dir` (renamed from `rmdir`). Take `AsPath` bound. | ||
| * `remove_dir_all` (renamed from `rmdir_recursive`). Take | ||
| `AsPath` bound. | ||
| * `walk_dir`. Take `AsPath` bound. Yield an iterator over `IoResult<DirEntry>`. |
This comment has been minimized.
This comment has been minimized.
jminer
Jan 29, 2015
What is the difference between read_dir and walk_dir as they both return iterators over DirEntrys?
This comment has been minimized.
This comment has been minimized.
cmr
Jan 29, 2015
Member
walk is recursive, read is not, as they are today in std::old_io. This is standard terminology.
jminer
reviewed
Jan 29, 2015
| as well (possibly via platform-specific extensions). | ||
| * `remove_dir` (renamed from `rmdir`). Take `AsPath` bound. | ||
| * `remove_dir_all` (renamed from `rmdir_recursive`). Take | ||
| `AsPath` bound. |
This comment has been minimized.
This comment has been minimized.
jminer
Jan 29, 2015
I feel like "create" and "delete" are used rather than "make" and "remove." I don’t think the names are bad as they are, but I think they would be better if they matched the terminology people commonly use.
I checked the file manager on Windows and Linux Mint and one on Android, and all of them had a menu item to "delete" a file/directory (OS X just has "Move to Trash" and "Empty Trash"). Even the Linux man page for rmdir() says it "deletes" a directory. Similarly, the mkdir() and open()/creat() man pages say that they can "create" a file/directory. Here’s a random page I found while looking for something slightly different: http://apple.stackexchange.com/q/73876/66147. There are multiple uses of "create" and "delete" (in some form) on the page.
I would rename:
remove_filetodelete_filemake_dirtocreate_dirmake_dir_alltocreate_dir_allremove_dirtodelete_dirremove_dir_alltodelete_dir_all
This comment has been minimized.
This comment has been minimized.
jmesmon
Feb 5, 2015
mkdir is an abbreviation of make directory. The only reason the manual avoids saying "make directory" is that it would be repeating it self.
The proposed renaming (to delete/create) doesn't make sense.
Frankly, I prefer unlink because it's more explicit about what's happening, but I can understand why others are more in favor of remove
This comment has been minimized.
This comment has been minimized.
jminer
Feb 5, 2015
I don't see why create/delete make less sense than make/remove. They mean close to the same thing, except create/delete are the verbs people almost always use. I think it would take (a little) more effort to remember remove_file if people normally think of "deleting" files.
I have seen multiple programs where "remove" is used to remove a file from a project/playlist without deleting it from the file system and "delete" is used to also delete it from the file system (e.g., jEdit, foobar2000, Qt Creator).
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
This comment has been minimized.
This comment has been minimized.
|
This RFC looks like it's fairly close to being accepted, so I've created a PR for its implementation: rust-lang/rust#21936. As with the previous PRs, nothing is set in stone and I'll be sure to update with any final revisions to this RFC! |
This comment has been minimized.
This comment has been minimized.
|
One part I'd like to bring up here as well is that the It's unclear to me to what extent we should be providing this contextual information. |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
This comment has been minimized.
This comment has been minimized.
|
When creating/opening a file, there are actually 3 combinations that need to be represented:
The current implementation proposed merely has a |
This comment has been minimized.
This comment has been minimized.
|
Another possible option along that route would be: /// If the `create` option is specified then this flag indicates whether
/// creation will fail if the file already exists.
fn exclusive(&mut self, exclusive: bool) -> &mut OpenOptions; |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 4, 2015
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this pull request
Feb 5, 2015
This comment has been minimized.
This comment has been minimized.
|
I've updated the RFC to match the implementation; the changes are pretty minor renamings that have already been proposed on the thread above. |
This comment has been minimized.
This comment has been minimized.
|
This discussion on this RFC seems to have slowed down quite a bit and it seems like the overall design is well agreed-upon. There's still a number of points to bikeshed on in a few aspects such as various naming here and there, but this RFC has reached a point where it's ready to merge. There's also a number of extensions to this API which can happen over time which should all be backwards compatible within the framework created here. I'd encourage lingering comments to be placed on the current PR (rust-lang/rust#21936) as well. Thanks again for the comments everyone! |
aturon commentedJan 26, 2015
The IO reform RFC is being split into several semi-independent pieces, posted as PRs like this one.
This RFC amendment adds the file system section.
Rendered