Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path::join should concat paths even if the second path is absolute #16507

Closed
carllerche opened this issue Aug 14, 2014 · 24 comments
Closed

Path::join should concat paths even if the second path is absolute #16507

carllerche opened this issue Aug 14, 2014 · 24 comments

Comments

@carllerche
Copy link
Member

Given:

let a = Path::new("/foo");
let b = Path::new("/bar");

I would expect that a.join(&b) would return /foo/bar, however it returns /bar. Given my experience w/ path joining in Ruby and Go, I would expect that join concats two paths and does some normalization to remove double slashes, etc...

There is also a need for a fn that expands a path relative to a given location, which is what Path::join does today. In Ruby, it is named expand_path.

@aturon
Copy link
Member

aturon commented Aug 14, 2014

cc @kballard

Some other libraries (Haskell, SML basis) only provide something akin to our current join. Not sure about Boost.

I don't see a great Windows story here, either: Path::new("c:\\temp").join(Path::new("c:\\windows")). We'd need to figure out a coherent specification.

Honestly, it seems a bit weird to me to take something that's syntactically an absolute path and want to treat it as a relative path. In my experience using path, the current join function is what I want. It'd be helpful to know a bit more why this case is coming up for you.

@aturon aturon added the A-libs label Aug 14, 2014
@lilyball
Copy link
Contributor

I agree with @aturon. The only sensible operation when joining an absolute path onto some other path is to get the absolute path back. Doing anything else is just weird, and only makes sense if you actually think of paths as strings, where "join" is "append, then normalize". I do not understand why Go's path.Join behaves in this way, although they are actually taking strings as arguments.

FWIW Ruby actually matches our join. At least, if you use Pathname it does: Pathname("a").join(Pathname("/b")) == Pathname("/b"). File.join() matches Go, but it also operates on strings, and AFAIK Pathname is considered the better way to manipulate paths. Note that File.join() is actually documented as "Returns a new string formed by joining the strings using File::SEPARATOR", and so is not actually even attempting to represent paths, merely do a trivial bit of string manipulation.

@retep998
Copy link
Member

retep998 commented Sep 7, 2014

In C++ using std::path when I join the two paths C:\foo and C:\bar I get C:\foo\C:\bar.

@lilyball
Copy link
Contributor

@retep998 C++ doesn't have a std::path.

@retep998
Copy link
Member

@kballard I was using the <filesystem> library which is a TS going through the final committee vote and MSVC has already implemented it.

@lilyball
Copy link
Contributor

@retep998 Sounds like a really crappy path definition then. I guess that's what you get with design-by-committee.

According to MSDN, basic_path::operator/ is defined as concatenating the strings that the paths wrap. That's a really braindead approach. If it's just going to do string operations, why would I ever use std::path instead of just using std::strings directly?

@retep998
Copy link
Member

Because std::path is the only way to work with unicode paths in a cross-platform manner (using std::string is locale-specific), all the filesystem APIs work with std::path instead of std::string, and it has methods to extract various components of the path.
path::operator/ does a bit more than just concatenation according to cppreference.

@retep998
Copy link
Member

The C++ committee has recently been dealing with this problem.
http://isocpp.org/files/papers/n4211.html issue 64

@Boddlnagg
Copy link
Contributor

C# (or rather the .NET standard library) also matches the current behavior:
"If path2 contains an absolute path, this method returns path2." (MSDN)

@l0kod
Copy link
Contributor

l0kod commented Feb 10, 2015

The join() function is often misused and leads to security vulnerabilities.

Moreover, there is now a documentation indirection pointing to PathBuf::push() :(

The Path reform should get ride of this dangerous behavior. The join() could return something like Result<PathBuf, PathError> to exclude an absolute path as argument.
There is also a name inconsistency between the Path::join() and the PathBuf::push().

cc #20034

@aturon
Copy link
Member

aturon commented Feb 10, 2015

The join() function is often misused and leads to security vulnerabilities.

Note that this talks about not just absolute paths, but also relative paths that begin with .. for example. I don't think that simply ruling out joining absolute paths (which certainly has its uses) is enough to solve this problem, and I'm not sure that std should be legislating policy at this level.

Moreover, there is now a documentation indirection pointing to PathBuf::push()

@steveklabnik What's our standing policy here? Do we prefer to have multiple copies of very similar docs?

There is also a name inconsistency between the Path::join() and the PathBuf::push().

This bothered me with the old path API as well, but I wasn't able to come up with a better solution. Do you have any suggestions?

cc @alexcrichton

@l0kod
Copy link
Contributor

l0kod commented Feb 11, 2015

The PR #5023 added an unsafe_join() and an is_restricted() functions who seem to have been lost…

I like splice() to express that the path argument is concatenated and do not replace self.
I think that Path and PathBuf should have the same function name as they have the same meaning.

Of course, this apply to join_many().

Name ideas:

  • splice()
  • push()
  • concat()
  • chain()
  • append()
  • nest()

@retep998
Copy link
Member

Just here to confirm when using a standard Windows function to join paths, absolute paths will replace the existing path wholesale. Drive relative paths such as C:foo count as absolute paths for this purpose.

@alexcrichton
Copy link
Member

There is also a name inconsistency between the Path::join() and the PathBuf::push().

If we're feeling really ambitious we could overload / for both PathBuf and Path (consuming PathBuf in the former case) to produce a PathBuf in both cases.

@aturon
Copy link
Member

aturon commented Feb 16, 2015

I'm going to close this particular issue as WONTFIX: we plan to stay with the current join semantics. (If someone strongly disagrees, at this point I think an RFC would probably be warranted, as the current behavior is from an approved RFC).

As far as naming, we could potentially continue bikeshedding, but not on this issue. I will note that we should not use the same name for Path and PathBuf both because that would be strongly against convention and because, partly due to the deref from PathBuf to Path, it would likely be very confusing.

I would love to have a general convention for pairs of functions where one mutates and the other creates a new copy, but I don't have a good candidate. In any case, let's address that elsewhere.

@aturon aturon closed this as completed Feb 16, 2015
@l0kod
Copy link
Contributor

l0kod commented Mar 25, 2015

This bothered me with the old path API as well, but I wasn't able to come up with a better solution. Do you have any suggestions?

@aturon, what do you think about an unambiguous new Path::append<P: AsPath>(&self, path: P) -> PathBuf to only append a path?

Looking at the current Rust API, push() functions only add/push an element/filename (but do not erase nor merge) into self.
On the other hand, insert() can add or replace an element (e.g. Vec or BTreeSet).

The PathBuf::push<P: AsPath>(&mut self, path: P) would look better if renamed to PathBuf::insert(&mut self, path: PathMode) with something like enum PathMode<P: AsPath> { Absolute(P), Relative(P) } to have explicit behavior/cast (inspired by the collections::Bound).

The Path::join() is already know in other languages so it may be a good choice to keep it like this or it could maybe be renamed to Path::merge().

@tarcieri
Copy link
Contributor

tarcieri commented Apr 19, 2018

Seems I'm several years late to the party here, but I wanted to chime in and say I found this particular behavior rather unexpected, and worry it could potentially lead to security issues if attackers manage to clobber higher level paths by passing an absolute path to join, i.e. directory traversal.

Granted that any code doing join on attacker-controlled paths should probably be doing quite a few checks to ensure they're getting the desired results, but it's certainly not clear from the outset that joining to an absolute path can have this outcome.

You can imagine someone naively checking a path does not contain .., doing a join, and expecting that the result will always be within the provided parent directory.

@theronic
Copy link

Chiming in to say this caught me out and cost me about an hour of struggle, and will cause a whole generation of insecure roll-your-own webservers who naively take in "/etc/passwd" as a URI, to be joined onto "./resources".

Joining paths should descend deeper unless otherwise specified, and absolutely not climb out of the current directory.

@lilyball
Copy link
Contributor

What you describe would also completely break the ability to join e.g. ../foo onto a path. What's more, the behavior you describe is only appropriate for some use cases, and other use cases absolutely want to have "/foo/bar".join("/baz") == "/baz".

If you need security guarantees when taking in user-supplied paths, you need to enforce those guarantees yourself, because different people have different expectations. For example, even if you do want the "never climb out of a given root directory", you may still want to support ../ path components within that.

@tarcieri
Copy link
Contributor

What you describe would also completely break the ability to join e.g. ../foo onto a path.

There was a fun article about this I came across today by the Fuchsia developers: Dot Dot Considered Harmful

If you need security guarantees when taking in user-supplied paths, you need to enforce those guarantees yourself

As it were, I have a crate that attempts to address some of these issues: canonical-path.

The semantics aren't quite where I'd like them to be yet (despite a 1.0 version number), but if people are interested in working with paths in security critical contexts, I plan on continuing to address the shortcomings, and would like to put out a breaking 2.0 version with more restrictive behavior.

@retep998
Copy link
Member

retep998 commented Nov 28, 2018

Security minded path enforcement is actually really complicated due to crazy things on Windows. There have been so many security holes in browsers due to various techniques to escape out of a path. Basically, Rust should not try to make security guarantees regarding path concatenation because it is an extremely difficult task and impossible without hitting the filesystem. Giving people a false sense of security when concatenating paths would only make security worse!

@lilyball
Copy link
Contributor

I don't think canonicalizing paths ahead of time is actually the solution to security issues. Canonicalization involves resolving symlinks as well.

I think the correct solution for the general case of "don't let a user-supplied path escape the sandbox" is to standardize the user-supplied path (where "standardize" means remove all ./ components and resolve all ../ components, such that any ../ components left in a standardized path are guaranteed to be at the front of the path), then check if the result is an absolute path and if it is convert it to a relative one (for POSIX paths this would be make it relative to /, for Windows paths I guess just make it relative to whatever filesystem root it references, e.g. C:\foo\bar would become just foo\bar), strip any leading ../ components, and then join it onto your sandbox. This should produce the desired behavior (where the sandbox is treated as the filesystem root).

Of course, this approach does ignore symlinks, so e.g. if foo is a symlink then it resolves foo/../bar to bar instead of looking through the symlink first. If you don't like this behavior and want to handle symlinks, then you'll have to decide how you want your sandboxing to behave if the symlink points outside of the sandbox. Even here I'm not sure if the canonicalization approach is a good one, because if your sandbox root includes any symlinks, then the result of canonicalization will appear to point outside the sandbox.

As the hacker news comments on that very article point out, there's a reason this sort of canonicalization is normally done in the kernel, specifically so the symlinks can't change in between checking the path and using the path. The same comments also mention the O_BENEATH flag to openat() which is meant to handle this sort of "don't let the user-supplied path escape my sandbox", but that flag doesn't appear to be part of POSIX.

@tarcieri
Copy link
Contributor

tarcieri commented Nov 28, 2018

@kballard indeed the join logic the canonical-path crate is less-than-ideal (which is part of what I was alluding to earlier):

https://github.com/iqlusioninc/crates/blob/master/canonical-path/src/lib.rs#L129

This does not enforce the property that any joined paths are children of the current path, but merely that the resulting path is also canonical. I think it would be worthwhile to enforce the former property.

The algorithm I'd use for that is slightly different from what you were originally suggesting, and works more like what you were alluding to in the second half of your post (the first two steps are what it does today):

  • Begin with a CanonicalPath (which is absolute, free of all relative path components, and contains no symlinks at the time it was resolved)
  • Join the provided path fragment, canonicalizing the result as another CanonicalPath
  • [Missing in current impl] Ensure the resulting CanonicalPath is parented in the original parent path

Additionally, I think it'd probably make sense to ban all relative path components from the argument (i.e. the relative path to be joined).

Indeed the current implementation contains a TOCTTOU-style race condition where directories/symlinks can be renamed/relocated after resolution. At some expense each borrow of the underlying &Path could verify it is still canonical before allowing it to be borrowed, but that only reduces the duration of the TOCTTOU and can't completely eliminate it.

@Kixunil
Copy link
Contributor

Kixunil commented Feb 4, 2020

Some more years later (:D), I randomly found this and I think the reason join() is confusing is because it doesn't actually say what's the intent of the code. Let's look at some reasons why would you want to join paths in the first place:

  • You have a directory and want to generate a name of file in it. The name should not contain any slashes.
  • You have a path supplied to your application and you want to support both absolute and relative paths. Relative paths are relative to a specific directory as opposed to working directory.
  • You have paths coming from possibly untrusted source and you want to create a directory structure safely - basically acting as chroot. In this case absolute paths must be treated as relative to chroot, but any .. attempting to go out of that chroot must point at chroot. Apart from safely extracting archives, something like this is sometimes used in Makefiles (install: cp foo $(DESTDIR)$(PREFIX)/bin/foo)

Current join seems to perform relative paths resolution. A more safe and clear approach would be having multiple methods, one for each case.

// Return result, alternatively it could panic:
fn file_in_dir<I>(&self, components: I) -> Result<PathBuf, ComponentContainsSlashError> 
where I: Copy + IntoIterator, <I as IntoIterator>::Item: AsRef<OsStr>;

// Alternatively, using newtype SlashlessOsStr that can only be created from str and guarantees to not contain slashes
fn file_in_dir<I>(&self, components: I) -> PathBuf 
where I: Copy + IntoIterator, <I as IntoIterator>::Item: AsRef<SlashlessOsStr>;

// join renamed
fn as_base_to(&self, maybe_relative_trusted_path: &Path) -> PathBuf;

// Sloppy chroot assumes there are no symlinks, so /foo/../bar will become /bar
fn as_sloppy_chroot(&self, containing: &Path) -> PathBuf;

// Does exactly what chroot would: resolves symlinks in a way that makes sure they don't point outside
// I have no clue what it should do on windows if you use "c:\" in containing, possibly return error with kind InvalidInput
fn as_chroot(&self, containing: &Path) -> io::Result<PathBuf>;

Open to bikeshedding obviously. It'd be nice to gather other use cases if there are such, make a set of reasonably-named methods with reasonable interfaces, then deprecate join and push in favor of them in order to improve security and robustness.

I may write a crate for this at some point.

matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this issue Feb 11, 2024
…=lnicola

Remove `ffi_returns_twice` references

This feature has just been [removed](rust-lang#120502)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants