Cross-platform file system abstractions #10
killercup added the tracking issue label Feb 20, 2018
From the etherpad:
Maybe also relevant (also from the etherpad):
matthiasbeyer commented Mar 1, 2018
I want to simply dump this here: I'm the author of imag and we have a rather nice abstraction for FS access... maybe you can learn from that. Our abstraction is rather specific for our use-case (each file has a TOML header and a "content" section), but the basic concept (a "store" which holds the files and they can then be borrowed from that store, so that concurrent access to one file is not possible) could be useful for other CLI apps. More information about the
sicking commented Mar 2, 2018
One thing that we talked about at Mozilla which would be useful in Firefox, as well as possibly useful to expose to the web, would be an API for doing atomic file writes in "small" files.
As I understood it, the best way to do truly atomic writes efficiently is to copy the file, while inserting the desired modifications, into a new file. Once the new file has been written, issue a rename to the original file name. Alternatively, if the file is small enough and multiple modifications are expected, read the file contents into memory and, each time that a modification should be done, write out a new file and rename it to the desired file destination. The nice thing with this approach is that it can be done without complex journaling files, and without needing to call flush(), which can be quite a perf bottleneck.
This was something that we were thinking of using for the numerous small configuration/database files that Firefox keeps. I think something like this would fit well with Rust's focus on performance and safety. I'm not sure that this is particularly CLI specific though. But since other filesystem stuff was discussed I figured I'd mention it.
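The write-then-rename approach described in this comment can be sketched with the standard library alone. This is a minimal illustration, not the API Mozilla had in mind: the function name `write_atomically` and the `.tmp` suffix are assumptions made for the example, and the sketch performs no explicit sync, so whether the new contents survive a crash depends on the filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `target` with `contents` by writing a sibling temporary file
/// and renaming it over the original. (Hypothetical helper name.)
fn write_atomically(target: &Path, contents: &[u8]) -> io::Result<()> {
    // Assumed naming scheme: "<file name>.tmp" in the same directory, so the
    // rename never has to cross a filesystem boundary.
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = target.with_file_name(tmp_name);

    {
        // Write the complete new contents first; the file is closed when
        // `tmp` goes out of scope at the end of this block.
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
    }

    // Readers see either the old file or the new one, never a partial write.
    fs::rename(&tmp_path, target)
}
```

A caller that keeps the file contents in memory, as suggested for small files, would simply call this function again after each modification.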
luser commented Mar 21, 2018
For some reason I thought the APIs in AFAIK it's OK to use verbatim paths in all Windows APIs, so perhaps a crate that provides a small wrapper type would be good enough? Something like (If someone writes such a crate I would also like it to have an easy |
luser commented Mar 21, 2018
FYI this link 404s now, the code seems to have moved to https://github.com/dherman/verbatim
luser commented Mar 21, 2018
...and after actually reading a little, it sounds like that crate is about 90% of what I proposed in my previous comment. :)
See also: I would be interested to hear more about how other ecosystems solve this. In particular, how do they deal with relative paths?
XAMPPRocky commented Mar 21, 2018
@killercup I believe normalisation only happens on HFS+ on macOS; I don't believe APFS does normalisation on the Mac.
On Unicode normalization and HFS: BurntSushi/ripgrep#845
My thought was to just start with a "normalize_path" crate/function that people can use regardless of what path API they are using, kind of like This crate would handle
Then on top of this we could look into an "easy paths" api that is more like python's pathlib combined with easy_strings to help with the prototyper case (easy to reach for needed utilities, less concern for borrow checker at cost of memory or cpu time). This would call "normalize_path" on any untrusted input. I've been providing feedback on
What is your concern with relative paths?
Screwtapello commented Mar 21, 2018
This is not generally possible in POSIX because of symlinks: if That said, Rust kind of has this already in the form of |
I'm mixed on what I'd expect for that symlink scenario. Either way, I assume that with a But for pathlib, apparently, the symlink bug is always there. I'm surprised though, I thought handling of |
Screwtapello commented Mar 21, 2018
Pathlib should actually be OK: in my example, as it resolves
In POSIX, the kernel just takes each path segment and hands it to the filesystem to resolve, and the filesystem physically records a Plan9 behaves the way you expect, but then it doesn't have symlinks so this whole situation just isn't a problem there. |
soc commented Mar 23, 2018
One of the issues that would need to be addressed in a better way is A method that either concatenates a path or completely replaces the existing one based on the input ... when did one ever have that use-case? I'd guess that pretty much every piece of code using The problem is not that the method exists, but that the individual operations that this method combines do not exist as their own methods.
Considering this and the Windows-related issues, I think this makes a good case for having separate types for absolute and relative paths.
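To make the "separate types" idea concrete, here is a minimal sketch. The names `AbsPath` and `RelPath` are hypothetical and do not come from any crate mentioned in this thread; the point is only that an append operation can be typed so that an absolute path cannot be passed as the suffix.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical wrapper: a path known to be absolute.
#[derive(Debug, Clone, PartialEq)]
pub struct AbsPath(PathBuf);

/// Hypothetical wrapper: a path known to have neither a root nor a prefix.
#[derive(Debug, Clone, PartialEq)]
pub struct RelPath(PathBuf);

impl AbsPath {
    pub fn new(p: impl Into<PathBuf>) -> Option<Self> {
        let p = p.into();
        if p.is_absolute() { Some(AbsPath(p)) } else { None }
    }

    /// Appending only accepts a `RelPath`, so handing in an `AbsPath`
    /// is a type error rather than a silent replacement.
    pub fn join(&self, tail: &RelPath) -> AbsPath {
        AbsPath(self.0.join(&tail.0))
    }

    pub fn as_path(&self) -> &Path {
        &self.0
    }
}

impl RelPath {
    pub fn new(p: impl Into<PathBuf>) -> Option<Self> {
        let p = p.into();
        // Reject anything that `Path::join` would treat as a replacement:
        // a leading root (`/x`) or a Windows prefix (`C:x`).
        match p.components().next() {
            Some(Component::Prefix(_)) | Some(Component::RootDir) => None,
            _ => Some(RelPath(p)),
        }
    }
}
```

For comparison, `Path::new("/etc").join("/bin/sh")` in std evaluates to `/bin/sh`, which is the replacing behaviour the comment objects to.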
I've tried to do a quick and dirty summary of this. Please point out where I need to expand it! I know on either users or reddit, I saw complaints about
Anyone know where these were so we can reach out to the people that have concerns? Or add your own thoughts on these topics?
Screwtapello commented Mar 24, 2018
I'd expect that to be the majority use-case for dealing with user-specified paths.
The wrinkle in the above is unwrapping the result of
i30817 commented Mar 24, 2018
Join is useful to make absolute paths from relative paths, which in turn are useful for nearly all cfg files, for both sharing across OSes and moving cfg files. If you can guarantee some validations of the user-made paths in a cfg (is relative, doesn't use any (not just the current) OS forbidden characters except '/' and closed quotes, len(cfgpath.parent().join(relativepath)) isn't larger than MAX_PATH_LENGTH on the current platform) you can even ensure it's somewhat OS portable. Though i suppose it's better that all those preconditions are specified in a higher level cfg abstraction, join could still be useful (in spite of being easy to blow up by passing an absolute path as the suffix). It would have to deal with stuff like I think java's solution here was to make 'path' iteration be at directory granularity? Or path is not iterable, i don't remember.
soc commented Mar 24, 2018
I think it's highly dangerous that
do completely different things. I don't think most people will expect that "joining" one path to an existing path can destroy their existing path. I think a serious path implementation should clearly separate those operations, e. g.
"joining" an absolute path should not even compile.
vitiral commented Mar 24, 2018
I am just seeing this now. First, I'd like to announce the release of 0.4.0 of path_abs which now uses "absolute" instead of "canonicalized" paths. This means that you can have a path with symlinks that may or may not exist, but I'm going to just do a checklist of things from this thread and open issues that aren't covered: File path handling
Edit: I missed some
vitiral commented Mar 24, 2018
Also, ergo_fs is related to this discussion.
vitiral commented Mar 24, 2018
Edit: I moved my comment about weird
soc commented Mar 24, 2018
The full glory of handling Windows paths: https://googleprojectzero.blogspot.de/2016/02/the-definitive-guide-on-win32-to-nt.html
TL;DR: Depending on how low-level you want the API to be, you have to handle 7 different path types.
vitiral commented Mar 24, 2018
soc commented Mar 25, 2018
@i30817 I played around with things, and I just did:
Of course none of this is acceptable when you generically try to support everything various filesystems throw at you. But if you are in control of the files created/written/read, this gets you a lot of assurances and things you just don't have to worry about anymore.
i30817 commented Mar 25, 2018
So a warning then, i can live with that.
soc commented Mar 26, 2018
No, in my case I just return Regarding your previous comment, rewriting is pretty much a non-option. You absolutely don't want innocent-looking code to suddenly clash with differently named, existing files. Also, That's also why I'm ruling out paths whose segments end with dots and spaces: pretty much nobody knows the fun things Windows will do with such paths!
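A check along these lines might look like the sketch below. It is not the code from soc's experiment (which is not shown in this thread); `is_portable_segment` is a hypothetical name, and the rule set is the commonly documented Windows file-name restrictions plus the trailing dot/space rule mentioned above.

```rust
/// Returns true if `segment` should be usable as a single file or directory
/// name on Windows as well as on POSIX systems. (Hypothetical helper.)
fn is_portable_segment(segment: &str) -> bool {
    // Characters Windows does not allow in file names.
    const FORBIDDEN: &[char] = &['<', '>', ':', '"', '/', '\\', '|', '?', '*'];
    // Device names Windows reserves, with or without an extension.
    const RESERVED: &[&str] = &[
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
    ];

    if segment.is_empty() {
        return false;
    }
    // Segments ending in a dot or a space; this also rules out "." and "..".
    if segment.ends_with('.') || segment.ends_with(' ') {
        return false;
    }
    // Windows forbids code points 0 through 31; `is_control` is slightly
    // stricter because it also rejects DEL and the C1 control range.
    if segment.chars().any(|c| c.is_control() || FORBIDDEN.contains(&c)) {
        return false;
    }
    // "nul.txt" is as problematic as "NUL", so compare the part before the
    // first dot, ignoring ASCII case.
    let stem = segment.split('.').next().unwrap_or("");
    !RESERVED.iter().any(|r| r.eq_ignore_ascii_case(stem))
}
```

Returning an error for any segment that fails this test, on every platform, gives the "fine everywhere or an error everywhere" behaviour argued for in this exchange.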
i30817 commented Mar 26, 2018
Well, if you make it an Err dependent on the current OS i can live with that. I'd be opposed to making it so that unix paths in a cfg running on unix had to have restrictions from other OSes. And i'm holding out hope for sanity, and that a new windows filesystem happens without all of the cruft or that windows can eventually run linux filesystems in a VFS (like squashfs files), so i'd like a way to turn that Err off honestly (since i doubt that there is an accessible way to tell which filesystem type the files are in, otherwise i'd prefer that).
edit: speaking of that, has anyone with windows tried to mount a filesystem file with their VFS to see if it has the same limitations in terms of forbidden characters? https://en.wikipedia.org/wiki/Installable_File_System There is also winfuse, which is a FUSE port to windows (and probably dead).
soc commented Mar 26, 2018
No, a path is either fine on all systems or an error on all systems, that's the whole idea about it. Testing on one platform and having CI for the other ones should leave you with a very high degree of certainty that the code is correct.
i30817 commented Mar 26, 2018
This is actually terrible then, because it's very likely that at least some files aren't controlled by the programmer but by the user scanning a filesystem from the program. Forcing the user to rename their files for an artificiality like that, which only serves to fix windows (in practice), is very very troublesome for the user. When the user tries to copy his own data to windows/FAT the filesystem will balk, and the CFG will panic or warn, and that is more than enough to see what's wrong. You shouldn't front-load onto users the cost of something that may never happen. I was more than for a mitigation strategy when i thought the FAT filesystem would simply copy the files but replace the forbidden characters by a default character - i thought it could be managed well enough without user intervention - but if, in addition to forcing manual renames on the user, you have to do it before you even think about moving the data, it is too much. I beseech you not to throw out the baby (consistent separator, always relative paths to the cfg file) with the bathwater (forcing aleatory file renames on users).
soc commented Mar 26, 2018
Yep. Managing code where the developer has control over the path and file names is my use case for the library I'm toying with. Such a library would be exactly what's needed for reading/writing/modifying configuration in a cross-platform compatible manner. It's not meant to deal with all the insanities operating systems have invented. If that's your use case, use Rust's path. Getting things right and reliable has its cost. In some cases it makes sense to pay this cost to get the reliability, in some cases it doesn't.
The point is that these things do happen. Chrome had a security issue due to Windows' general path traversal craziness. If one of the largest IT corps with a highly skilled security department can't deal with paths, what's the chance some random developer can?
vitiral commented Mar 26, 2018
@soc I think this conversation should be moved to a separate issue since I think it is a very small part of the other concerns here.
soc commented Mar 26, 2018
@vitiral Thanks, agree on that.
Screwtapello commented May 28, 2018
I've been thinking about path-handling this weekend, after I posted a question to Reddit and had a discussion with @vitiral about his path_abs crate. I've come up with a model that covers my needs and expectations, but I'm interested to hear whether it would suit other people too.
Motivation
For tools that take a path on the command-line and just use it immediately (think For tools that take a path from a config file or a database, tools run as batch jobs, or tools that generate or manipulate paths, relative paths can make problems difficult to diagnose: You get a report saying "File not found: some/relative/path", you look in the place you thought the tool would look and the file is definitely present, so clearly the tool was looking somewhere else, but where?
To avoid confusion, I want every tool I write to use complete paths, so that when I look at a log-line or error message I can tell exactly what it was looking at.
Design
I want to use "monotonic paths" as my standard in-memory representation of a filesystem path. Some definitions:
A path is a sequence of zero-or-more components (in the
A path's head is:
A path's tail is zero-or-more
A monotonic path is one whose tail contains only
A monotonic absolute path is a monotonic path whose head contains (on POSIX) a
A monotonic relative path is a monotonic path whose head contains no components. A monotonic relative path can be blindly appended to another monotonic path without breaking its monotonicity.
Example monotonic paths:
Example non-monotonic paths:
Implementation
Making a path monotonic involves making the head monotonic, and making the tail monotonic. Making the head monotonic is easy, since you can just pass it to Making the tail monotonic involves removing the To remove
You could go even further and resolve as many symlinks as possible instead of just the ones preceding In Rust, I imagine there would be
Alternatives
Just use whatever path you were given, as-is
As mentioned in the motivation, this can lead to confusion when a user comes up with a path in one context, then gives it to a program that uses it in a different context (a different time, a different host, a different working directory). Turning relative paths into fully-explicit paths makes it clearer what context the program is using.
If you get a relative path, just join it onto the current working directory
Relative paths may contain any number of A bigger problem is that this approach doesn't work on Windows. If your current working directory is
If you get a relative path, just use
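Several of the definitions in this comment did not survive in the text above, so the following is only a guess at how a monotonicity check could look. It assumes that the "head" is an optional Windows prefix and/or root directory and that a monotonic tail consists solely of `std::path::Component::Normal` entries; both assumptions, and the name `is_monotonic`, are mine rather than the comment author's.

```rust
use std::path::{Component, Path};

/// Hypothetical check, under the assumptions stated above: after skipping
/// the head (prefix and/or root), every remaining component must be a
/// plain name, so no `.` or `..` may remain.
fn is_monotonic(path: &Path) -> bool {
    path.components()
        .skip_while(|c| matches!(c, Component::Prefix(_) | Component::RootDir))
        .all(|c| matches!(c, Component::Normal(_)))
}
```

Note that `Path::components` already drops interior `.` entries, so under this sketch `a/./b` counts as monotonic while `./a` and `a/../b` do not. Turning a non-monotonic path into a monotonic one still needs filesystem access to handle `..` across symlinks, as the comment explains.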
soc commented May 28, 2018
I played with some of my own ideas in https://github.com/soc/paths-rs. The focus is different though: I'm largely interested in being able to have paths (e. g. in config files) that can be read/used/written/moved across operating systems with the guarantee that a path that has been constructed is valid across all operating systems.
I discussed this a bit in the gitter, but a lot of times when I write a CLI with a config file, or an environment variable on a unix/linux platform I run into the dreaded I know that the shell would traditionally handle these expansions, but people (myself included) do put these shorthands in config files, variables, or use them fairly often with CLI arguments. It would be really nice to have a wrapper crate that I can trust to handle all paths/expansions when dealing with a CLI program. I agree with the above about cross-platform support as well.
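The specific shorthand named in this comment is missing from the text above. Assuming it refers to a leading `~` for the home directory, which the shell normally expands, a small helper could look like the sketch below. `expand_tilde` is a hypothetical name, and the home directory is passed in by the caller (for example from the `HOME` environment variable) rather than looked up here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: if the first component of `input` is exactly `~`,
/// replace it with `home`; otherwise return the path unchanged.
fn expand_tilde(input: &Path, home: &Path) -> PathBuf {
    // `strip_prefix` compares whole components, so `~bob/notes` is left
    // alone; resolving other users' home directories is out of scope.
    match input.strip_prefix("~") {
        Ok(rest) => home.join(rest),
        Err(_) => input.to_path_buf(),
    }
}
```

A CLI could apply this to values read from config files or environment variables, where no shell has had a chance to perform the expansion.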
soc commented Jun 2, 2018
Yes, that's exactly my goal. It's not implemented though, but the idea is to have special tokens like These tokens are intended to exist explicitly in the serialized format (as strings in a config file) as well as in the memory representation and are only resolved when converting |
AndyGauge referenced this issue Jul 2, 2018: Add an example to read an environment variable #422 (Open)
gdouezangrard commented Jul 12, 2018
Just mentioning the following, hope that could be helpful:
Thanks for your time.
Could you be more specific about what lessons we should learn from Go? I'd rather not assume incorrectly and miss the information you are trying to share.
gdouezangrard commented Jul 13, 2018
Thanks for that overview, that will be really helpful. I do feel like to certain audiences, POSIX behavior would be surprising, so there is a trade off of being familiar to POSIX people or not be surprising to non-POSIX people (like if you iterate on
Even Python recognized using strings is broken and now supports bytes as well :). Granted, we do need a good way of interacting with the native Path string type.
What is the use case for appending an absolute path to an absolute path? I feel like the "replace" is meant to act like CWD handling. I could see having distinct and more semantically meaningful names like
I think this is an area where documentation is needed to clear this up but you can use https://doc.rust-lang.org/std/path/struct.PathBuf.html#impl-Extend%3CP%3E
gdouezangrard commented Jul 24, 2018
In a program to manage OS trees, I have to copy That seems counter-intuitive with what I'm used to do in other languages, where I expect
First, not all languages do that. I know that at least Python and C++ have the same behavior as Rust. I suspect the difference is whether the API is meant to be a convenience over string manipulation or if it treats paths as first class objects.
I'm a little confused at how
Oh, right, that does make things more annoying. Definitely something to keep in mind when we get to the "simplified path API" (I plan to get to it after |
gdouezangrard commented Jul 25, 2018
I fail to see how replacing is a sensible default behavior but obviously that must be good if Python, C++ and Rust do that. From a security point of view, if I accept a user provided path
EDIT: Just looked at the static file serving example from Rocket. They do
I still don't understand then. Could you give me an example?
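For the security concern raised in this comment, one possible shape for an appending join is sketched below. It is not an existing std API; `join_confined` is a hypothetical name. A leading root in the user-supplied path is ignored so that the path is always interpreted beneath `root`, and any `..` or Windows prefix is rejected outright. Symlinks inside `root` are not examined, so this is a lexical guard only.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: append `user` beneath `root` without ever letting
/// it replace or climb out of `root` lexically.
fn join_confined(root: &Path, user: &Path) -> Option<PathBuf> {
    let mut out = root.to_path_buf();
    for component in user.components() {
        match component {
            // Plain names are appended as-is.
            Component::Normal(name) => out.push(name),
            // A leading `/` or `.` carries no information once we are
            // anchored at `root`, so it is skipped.
            Component::RootDir | Component::CurDir => {}
            // `..` and Windows prefixes such as `C:` are refused.
            Component::ParentDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

With this sketch, `/etc/passwd` joined under `/srv/static` yields `/srv/static/etc/passwd`, while `../../etc/passwd` is rejected.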
gdouezangrard commented Jul 25, 2018
This is a summary of the things I would absolutely need when working with paths in Rust, so that I don't have to roll out my own validation and sanitization methods to prevent path traversal, don't have to stray too far from what POSIX defines, and can work with normalized paths without syscalls.
Normalization with Pure Lexical Processing
Like
path::normalize("/////var/lib/../../etc/mozilla/"); // --> /etc/mozilla
Non-destructive
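The purely lexical normalization requested in this comment can be sketched on top of `Path::components`. The name `normalize_lexically` is hypothetical. As discussed earlier in the thread, removing `..` without consulting the filesystem gives a different result from the kernel's when the preceding component is a symlink, so this is only appropriate where that trade-off is acceptable.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: collapse repeated separators, `.` and `..`
/// without touching the filesystem.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` never changes the result.
            Component::CurDir => {}
            Component::ParentDir => {
                let ends_in_name =
                    matches!(out.components().next_back(), Some(Component::Normal(_)));
                if ends_in_name {
                    // `name/..` cancels out.
                    out.pop();
                } else if !out.has_root() {
                    // A relative path may legitimately start with `..`.
                    out.push("..");
                }
                // Otherwise we are directly at the root, and `/..` stays `/`.
            }
            // Prefix, root and plain names are copied through unchanged.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Applied to the example above, `normalize_lexically(Path::new("/////var/lib/../../etc/mozilla/"))` yields `/etc/mozilla`.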
There might be some useful notes for this in a soon-to-close RFC: rust-lang/rfcs#2188
soc commented Sep 12, 2018
@gdouezangrard Thanks for this useful overview, I think I'll follow that approach in my paths crate.
killercup commented Feb 20, 2018 (edited by epage)
(moderated summary by the WG)
.., .., // not being auto-handled
pathlib-like API.
format! a path for a command line argument or mutating a path can cause problems with the non-UTF8 nature of OsStr
@killercup's original post
In the first meeting, we talked a bit about the pain points of dealing with files and paths in a cross-platform manner.
One idea is to create or improve crates that provide higher-level abstractions than std in this area.