New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

canonicalizePath behaves strangely with paths that do not exist #23

Closed
ekmett opened this Issue Feb 22, 2015 · 12 comments

Comments

Projects
None yet
4 participants
@ekmett
Member

ekmett commented Feb 22, 2015

@ekmett

This comment has been minimized.

Show comment
Hide comment
Member

ekmett commented Feb 22, 2015

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Feb 24, 2015

Member

Seeing as canonicalizePath is primarily for dereferencing symbolic links, I'm not sure why there is a need for it to work on non-existent paths at all. I think it should simply result in an error if the path doesn't exist, just like how realpath works on Linux. Plus, there are many reasons it can fail, not simply because it doesn't exist, but also due to permission problems, network issues, symbolic-link loops, etc. There's really no guarantee.

The alternative is to have it fallback to pure path normalization whenever it fails. I'm not a fan of this, because it means errors are silently swallowed.

However, I understand that:

  • Someone might have a use case that I couldn't think of ­– in particular, I'd be interested to hear how others use this function.
  • For backward-compatibility it might be better to stick with the lenient behavior (of course, it can still break existing code …).

So perhaps we could add another function (e.g. realPath) that has the strict behavior while canonicalizePath becomes the lenient option.

Hence, the choice is between:

  • Make canonicalizePath strict and always raise errors if the path doesn't exist.
  • Make canonicalizePath lenient and add a stricter version realPath that raises errors if the path doesn't exist or is inaccessible.

Comments?

Member

Rufflewind commented Feb 24, 2015

Seeing as canonicalizePath is primarily for dereferencing symbolic links, I'm not sure why there is a need for it to work on non-existent paths at all. I think it should simply result in an error if the path doesn't exist, just like how realpath works on Linux. Plus, there are many reasons it can fail, not simply because it doesn't exist, but also due to permission problems, network issues, symbolic-link loops, etc. There's really no guarantee.

The alternative is to have it fallback to pure path normalization whenever it fails. I'm not a fan of this, because it means errors are silently swallowed.

However, I understand that:

  • Someone might have a use case that I couldn't think of ­– in particular, I'd be interested to hear how others use this function.
  • For backward-compatibility it might be better to stick with the lenient behavior (of course, it can still break existing code …).

So perhaps we could add another function (e.g. realPath) that has the strict behavior while canonicalizePath becomes the lenient option.

Hence, the choice is between:

  • Make canonicalizePath strict and always raise errors if the path doesn't exist.
  • Make canonicalizePath lenient and add a stricter version realPath that raises errors if the path doesn't exist or is inaccessible.

Comments?

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Feb 24, 2015

Contributor

I very regularly want to have absolute paths to simplify other parts of the code, such as when a subprocess will be run from a different directory. I will generally use canonicalizePath in that case, even in the absence of a symlink. It would be convenient for that use case if the function worked on non-existent paths.

I think I prefer the idea of keeping the current function with the current behavior, and adding a new function with the new behavior. That would prevent accidentally writing code that compiles with both an old and new version of directory, but has buggy runtime behavior on an old version.

Contributor

snoyberg commented Feb 24, 2015

I very regularly want to have absolute paths to simplify other parts of the code, such as when a subprocess will be run from a different directory. I will generally use canonicalizePath in that case, even in the absence of a symlink. It would be convenient for that use case if the function worked on non-existent paths.

I think I prefer the idea of keeping the current function with the current behavior, and adding a new function with the new behavior. That would prevent accidentally writing code that compiles with both an old and new version of directory, but has buggy runtime behavior on an old version.

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Feb 24, 2015

Member

I very regularly want to have absolute paths to simplify other parts of the code, such as when a subprocess will be run from a different directory.

Ah, that's a good point. But do you care about whether it dereferences symbolic links?

Member

Rufflewind commented Feb 24, 2015

I very regularly want to have absolute paths to simplify other parts of the code, such as when a subprocess will be run from a different directory.

Ah, that's a good point. But do you care about whether it dereferences symbolic links?

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Feb 24, 2015

Contributor

It's actually irrelevant to me in that use case; I just want an absolute path. I could generally just append the relative path to the current working directory, but canonicalizePath feels more correct, so I reach for it first.

Contributor

snoyberg commented Feb 24, 2015

It's actually irrelevant to me in that use case; I just want an absolute path. I could generally just append the relative path to the current working directory, but canonicalizePath feels more correct, so I reach for it first.

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Feb 24, 2015

Member

That's kind of my argument: I think canonicalizePath should be reserved strictly for dereferencing symbolic links, not to absolutize a path. The latter operation can be made to work all the time and completely predictably.

Perhaps we can add another convenience function to directory:

makeAbsolute :: FilePath -> IO FilePath
makeAbsolute path = fmap (</> path) getCurrentDirectory

and update the documentation in System.FilePath.makeRelative to reflect this.

Member

Rufflewind commented Feb 24, 2015

That's kind of my argument: I think canonicalizePath should be reserved strictly for dereferencing symbolic links, not to absolutize a path. The latter operation can be made to work all the time and completely predictably.

Perhaps we can add another convenience function to directory:

makeAbsolute :: FilePath -> IO FilePath
makeAbsolute path = fmap (</> path) getCurrentDirectory

and update the documentation in System.FilePath.makeRelative to reflect this.

@argiopetech

This comment has been minimized.

Show comment
Hide comment
@argiopetech

argiopetech Feb 24, 2015

Member

One small quibble. realpath (which is seemingly implemented inconsistently across a variety of platforms per T4215) is perfectly fine with some paths which don't exist. For example:

$ ls /tmp/bar
ls: cannot access /tmp/bar: No such file or directory
$ realpath /tmp/bar
/tmp/bar
$ realpath /tmp/bar/baz
realpath: ‘/tmp/bar/baz’: No such file or directory
Member

argiopetech commented Feb 24, 2015

One small quibble. realpath (which is seemingly implemented inconsistently across a variety of platforms per T4215) is perfectly fine with some paths which don't exist. For example:

$ ls /tmp/bar
ls: cannot access /tmp/bar: No such file or directory
$ realpath /tmp/bar
/tmp/bar
$ realpath /tmp/bar/baz
realpath: ‘/tmp/bar/baz’: No such file or directory
@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Feb 24, 2015

Member

@argiopetech realpath (shell utility) and realpath (POSIX function), despite the names, can do very different things. The former doesn't even exist on some systems.

Member

Rufflewind commented Feb 24, 2015

@argiopetech realpath (shell utility) and realpath (POSIX function), despite the names, can do very different things. The former doesn't even exist on some systems.

@argiopetech

This comment has been minimized.

Show comment
Hide comment
@argiopetech

argiopetech Feb 24, 2015

Member

I think it should simply result in an error if the path doesn't exist, just like how realpath works on Linux

I inferred the command line tool from this statement. Apologies for the misunderstanding.

I'm in favor of not silently swallowing errors. I do like the idea of a makeAbsolute function which would replace canonicalizePath for absolutizing paths, leaving canonicalizePath to potentially fail due to permission/existence/etc. errors (thereby maintaining backward compatibility).

Potentially this would allow us to close T5014 as WillNotFix with a reference to the new function and leave us to tackle T4215.

Fortunately, T4215 may have been fixed for POSIX systems when T4113 was fixed in 96d3478. It now throws errors as in realpath() on Linux and FreeBSD. It will need to be checked on an OS X machine to be certain (I don't have one). Windows effectively returns an escaped version of whatever you pass it, regardless of existence, so we'll need to decide whether to leave that as is to avoid breaking backward compatibility or to add some checks to maintain cross-platform compatibility.

Member

argiopetech commented Feb 24, 2015

I think it should simply result in an error if the path doesn't exist, just like how realpath works on Linux

I inferred the command line tool from this statement. Apologies for the misunderstanding.

I'm in favor of not silently swallowing errors. I do like the idea of a makeAbsolute function which would replace canonicalizePath for absolutizing paths, leaving canonicalizePath to potentially fail due to permission/existence/etc. errors (thereby maintaining backward compatibility).

Potentially this would allow us to close T5014 as WillNotFix with a reference to the new function and leave us to tackle T4215.

Fortunately, T4215 may have been fixed for POSIX systems when T4113 was fixed in 96d3478. It now throws errors as in realpath() on Linux and FreeBSD. It will need to be checked on an OS X machine to be certain (I don't have one). Windows effectively returns an escaped version of whatever you pass it, regardless of existence, so we'll need to decide whether to leave that as is to avoid breaking backward compatibility or to add some checks to maintain cross-platform compatibility.

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Feb 24, 2015

Member

The latter operation can be made to work all the time and completely predictably.

I retract that somewhat. Turns out it's possible for getCurrentDirectory to fail if the working directory is inaccessible. This is because getCurrentDirectory always returns the physical path, which requires dereferencing symlinks. (But this should be a rare and unusual occurrence.)

Member

Rufflewind commented Feb 24, 2015

The latter operation can be made to work all the time and completely predictably.

I retract that somewhat. Turns out it's possible for getCurrentDirectory to fail if the working directory is inaccessible. This is because getCurrentDirectory always returns the physical path, which requires dereferencing symlinks. (But this should be a rare and unusual occurrence.)

@Rufflewind Rufflewind added this to the 1.2.2.1 milestone Mar 3, 2015

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Mar 6, 2015

Member

Edit: actually, disregard this post. Neil pointed out that this doesn't really work for symlinks.


Besides makeAbsolute, I think another really useful function would be something like canonicalizePath but without symbolic link dereferencing. Call it, say, simplifyPath:

simplifyPath "./foo" == "foo"
simplifyPath "/foo/../bar" == "/bar"

However, such a function would be entirely pure, so it's not really within the scope of directory. It should belong in filepath instead.

Member

Rufflewind commented Mar 6, 2015

Edit: actually, disregard this post. Neil pointed out that this doesn't really work for symlinks.


Besides makeAbsolute, I think another really useful function would be something like canonicalizePath but without symbolic link dereferencing. Call it, say, simplifyPath:

simplifyPath "./foo" == "foo"
simplifyPath "/foo/../bar" == "/bar"

However, such a function would be entirely pure, so it's not really within the scope of directory. It should belong in filepath instead.

@Rufflewind

This comment has been minimized.

Show comment
Hide comment
@Rufflewind

Rufflewind Mar 8, 2015

Member
  • Leave canonicalizePath as is and deprecate it. Rationale: backward compatibility; inconsistent behavior (aside from the aforementioned problems, it also fails to resolve symbolic links on Windows). [Edit: changed to return reasonable results even for nonexistent paths.]
  • Add realPath, which correctly resolves symbolic links and always requires the target to exist. [Edit: skipped as this seems redundant; just use canonicalizePath while verifying existence]
  • Add makeAbsolute, which prepends the current working directory to a relative path (does nothing for absolute paths).
  • Update the docs in makeRelative to refer to makeAbsolute instead of canonicalizePath.
Member

Rufflewind commented Mar 8, 2015

  • Leave canonicalizePath as is and deprecate it. Rationale: backward compatibility; inconsistent behavior (aside from the aforementioned problems, it also fails to resolve symbolic links on Windows). [Edit: changed to return reasonable results even for nonexistent paths.]
  • Add realPath, which correctly resolves symbolic links and always requires the target to exist. [Edit: skipped as this seems redundant; just use canonicalizePath while verifying existence]
  • Add makeAbsolute, which prepends the current working directory to a relative path (does nothing for absolute paths).
  • Update the docs in makeRelative to refer to makeAbsolute instead of canonicalizePath.

@Rufflewind Rufflewind modified the milestones: 1.2.2.2, 1.2.2.1 Apr 11, 2015

@Rufflewind Rufflewind modified the milestones: 1.2.3.0, 1.2.2.2 Apr 24, 2015

Rufflewind added a commit to Rufflewind/directory that referenced this issue May 31, 2015

Implement safer version of canonicalizePath for POSIX
Apply realpath to the largest prefix of the given path that is still
accessible (that can be stat successfully) so that canonicalizePath will
return reasonable results even for non-existent paths.  It should very
rarely throw exceptions now.

Fixes haskell#23.

@Rufflewind Rufflewind closed this in f7b8c8a Jun 5, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment