Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: path reform #474

Merged
merged 6 commits into from
Dec 19, 2014
Merged

RFC: path reform #474

merged 6 commits into from
Dec 19, 2014

Conversation

aturon
Copy link
Member

@aturon aturon commented Nov 19, 2014

This RFC reforms the design of the std::path module in preparation for API
stabilization. The path API must deal with many competing demands, and the
current design handles many of them, but suffers from some significant problems
given in "Motivation" below. The RFC proposes a redesign modeled loosely on the
current API that addresses these problems while maintaining the advantages of
the current design.

Thanks to @kballard, who helped spark some of the initial ideas over Ike's sandwiches!

Rendered

@aturon
Copy link
Member Author

aturon commented Nov 19, 2014

impl<Sized? P> FromIterator<P> for PathBuf where P: AsPath { .. }
impl<Sized? P> Extend<P> for PathBuf where P: AsPath { .. }

impl<Sized? P> Path where P: AsPath {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/implementation detail: will putting the type parameter here cause inference problems with e.g. Path::new?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw Given #447, this will probably have to move inward, but at the moment this seemed the most concise way to present the API.

@nikomatsakis
Copy link
Contributor

I don't work much with paths, but I figure if the API itself gets a sign-off from @alexcrichton, @wycats, and @kballard, it's got to be fine. And we can always grow it over time if needed.

The general approach of having a borrowed type (Path) and a mutable, owning variant (PathBuf) seems like a great idea and I have a feeling this is going to become "a thing". One thing I was curious about was the decision to make Path unsized. Is this necessary for some reason? (One reason I can imagine might be BorrowFrom, if we were to change that to use associated types.)

In general, I imagine that many of the "view" types we will create will not be unsized, so it'd be good to know where unsizedness is helpful. Of course, one could imagine allowing types to "opt in" to being unsized, perhaps using a negative impl. (And of course the more unsized types we have, of course, the more Sized? annotations are required on traits and so forth.)

Anyway, this is not criticism of this RFC, which looks great, more just thinking aloud about the trend that it signals.

impl<Sized? P> Extend<P> for PathBuf where P: AsPath { .. }

impl<Sized? P> Path where P: AsPath {
pub fn new(path: &str) -> &Path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current API has a new_opt constructor that returns Option<Path>. It seems that such option has been removed in this new API, is it intentional? See rust-lang/rust#14048

@michaelsproul
Copy link

Some of the bare functions in std::io::fs look very C like. Would there would be any interest in extending the PathExtensions API to include the same functionality while providing more Rustic names? We could have .contents instead of readdir, .make_dir instead of mkdir, etc. The C-style names could still be accessible through libc.

pub fn new(path: &str) -> &Path;

pub fn as_str(&self) -> Option<&str>
pub fn as_str_lossy(&self) -> Cow<String, str>; // Cow will replace MaybeOwned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be to_str_lossy? Not sure which of the as_ or to_ naming conventions should “win” for Cow.

SimonSapin added a commit to SimonSapin/rust-wtf8 that referenced this pull request Nov 19, 2014

It is not clear how best to incorporate the
[WTF-8 implementation](https://github.com/SimonSapin/rust-wtf8) (or how much to
incorporate) into `libstd`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-wtf8 currently duplicates some standard library code, changing the char, str, and String types to their respective supersets CodePoint, Wtf8, and Wtf8Buf. To avoid the duplication, libstd could maybe have private functions that are generic over these types, so that optimization on monomorphized functions could still take advantage e.g. of the LLVM range asserts on char.

@Valloric
Copy link

It might be useful to look at pathlib from Python 3 as a source of inspiration; it also has to deal with posix/windows paths etc.


* **Where did `push_many` and friends go?** They're replaced by implementing
`FromIterator` and `Extend`, following a similar pattern with the `Vec`
type. (Some work will be needed to retain full efficiency when doing so.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder that Vec::push_all has not been deprecated because of ergonomics and performance problems and that, at this point, using extend is much, much slower than using push_all.

test extend   ... bench:      2868 ns/iter (+/- 9)
test push_all ... bench:        76 ns/iter (+/- 0)

Have any concrete plans been made to fix this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test extend   ... bench:        43 ns/iter (+/- 4)
test push_all ... bench:        33 ns/iter (+/- 4)
extern crate test;

use test::Bencher as B;

#[bench]
fn push_all(b: &mut B) {
    let mut vec = vec![];
    let values = [1i, .. 10];

    b.iter(|| {
        vec.push_all(&values)
    })
}

#[bench]
fn extend(b: &mut B) {
    let mut vec = vec![];
    let values = [1i, .. 10];

    b.iter(|| {
        vec.extend(values.iter().map(|x| *x))
    })
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memcpy overhead will be significant at N=10.

extern crate test;

#[inline(never)]
fn prepare() -> (Vec<u8>, Vec<u8>) {
    (vec!(), Vec::from_elem(1024, 0))
}

#[bench]
pub fn extend(b: &mut test::Bencher) {
    let (mut dst, src) = prepare();
    b.iter(|| {
        dst.clear();
        dst.extend(src.iter().map(|v| *v));
        test::black_box(&dst);
    });
}

#[bench]
pub fn push_all(b: &mut test::Bencher) {
    let (mut dst, src) = prepare();
    b.iter(|| {
        dst.clear();
        dst.push_all(src.as_slice());
        test::black_box(&dst);
    });
}

@alexcrichton
Copy link
Member

Tracking issue

@l0kod
Copy link

l0kod commented Dec 24, 2014

A bit late, but the Path comparison should take care if the file system is case-sensitive.

This should prevent security bugs like the CVE-2014-9390.

@SimonSapin
Copy link
Contributor

A bit late, but the Path comparison should take care if the file system is case-sensitive.

If so, should it also account for Apple’s very own modified variant of Unicode Normalization Form D, on HSF+?

Regardless of HSF+ details, determining which filesystem a given path is on requires some system calls, which might be more expensive than a simple string comparison.

@l0kod
Copy link

l0kod commented Dec 24, 2014

Regardless of HSF+ details, determining which filesystem a given path is on requires some system calls, which might be more expensive than a simple string comparison.

Yes if it's automatic, but a Path optional flag could tell if the path should be considered case-sensitive or not.
This flag could be automatically set with an exists()-like method call (cf. PathExtensions trait), which should have a negligible impact.

@retep998
Copy link
Member

Regarding case sensitivity, on Windows using normal paths is case insensitive but if you use a \\?\ path there is no normalization and everything is case sensitive.
Perhaps we could have two forms of comparison, one that compares the strings, the other that compares the real paths and does system calls?

@erickt
Copy link

erickt commented Jan 3, 2015

For reference, the c++ standards committee just accepted a filesystem rfc.

@blaenk
Copy link
Contributor

blaenk commented Jan 3, 2015

Yeah, I think it would be very useful and informative to look into the combined efforts of the committee to see the motivations behind the choices they made. It could allow us to recognize something we may have overlooked in this paths reform RFC.

@aturon
Copy link
Member Author

aturon commented Jan 3, 2015

FWIW as I've been working on the implementation I've been moving steadily closer to what Boost did, which is presumably close to this. I will take a look ASAP.

@retep998
Copy link
Member

So I think what should happen is that whenever we call a system function with a path, we should check the length of the path. If it is safely within MAX_PATH then we go ahead and call the function. If the path is too large then we first check whether it is absolute or relative. If the path is relative we return an error. If the path is absolute then we check whether it is a \\?\ path. If it isn't then standardize the path by prepending \\?\ (if it is a UNC path that starts with \\ but not \\?\ then the path should be prefixed by \\?\UNC\), converting / to \, and accounting for .. and .. We can then pass this path to the function.

EDIT: We should also probably convert all absolute paths to \\?\ paths when turning a relative path into an absolute path or normalizing a path. Also when appending a path onto a \\?\ path, we should normalize the appended path.

aturon added a commit to aturon/rust that referenced this pull request Feb 3, 2015
As part of [RFC 474](rust-lang/rfcs#474), this
commit renames `std::path` to `std::old_path`, leaving the existing path
API in place to ease migration to the new one. Updating should be as
simple as adjusting imports, and the prelude still maps to the old path
APIs for now.

[breaking-change]
aturon added a commit to aturon/rust that referenced this pull request Feb 3, 2015
Implements [RFC 474](rust-lang/rfcs#474); see
that RFC for details/motivation for this change.

This initial commit does not include additional normalization or
platform-specific path extensions. These will be done in follow up
commits or PRs.
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 4, 2015
This PR implements [path reform](rust-lang/rfcs#474), and motivation and details for the change can be found there.

For convenience, the old path API is being kept as `old_path` for the time being. Updating after this PR is just a matter of changing imports to `old_path` (which is likely not needed, since the prelude entries still export the old path API).

This initial PR does not include additional normalization or platform-specific path extensions. These will be done in follow up commits or PRs.

[breaking-change]

Closes rust-lang#20034
Closes rust-lang#12056
Closes rust-lang#11594
Closes rust-lang#14028
Closes rust-lang#14049
Closes rust-lang#10035
@Centril Centril added the A-file Proposals relating to file systems. label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-file Proposals relating to file systems.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet