Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filename.dirname is not handling multiple / on Unix #4549

Closed
vicuna opened this Issue May 7, 2008 · 9 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link
Collaborator

commented May 7, 2008

Original bug ID: 4549
Reporter: till
Status: closed (set by @xavierleroy on 2013-08-31T10:44:09Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.10.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5429
Monitored by: @rixed @ygrek @hcarty

Bug description

Under unix / and // in a path are semantically the same. The case where double / are present in a path is rare but leads to dirname returning eronous results:

Filename.dirname (Filename.dirname "a/b//c") -> "a/b" instead of "a"

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 4, 2008

Comment author: @damiendoligez

We currently treat an empty path component as designating the current directory ".". It's not clear to me which is the best way.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 3, 2009

Comment author: till

It seems that all the linux c functions remove them from the path. The whole unix semantic of dirname can be found here:

http://www.opengroup.org/onlinepubs/9699919799/utilities/dirname.html

It is also different on trailing "/" (dirname "b/a/" is "b"...). This difference is more likely to break code. I, for one, would like to see it be congruent with unix.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 3, 2009

Comment author: till

Here is a full implementation which, I believe, respects the posix semantic (I chose to project // to /, this is left unspecified in the posix guideline). It should behave roughly the same as the version currently in the standard_library but some corner cases are going to be different. This code is up for grabs and bot my employer and me would be happy to give the copyright to the inria if that can help.

let rec skip_end_slashes s n =
if n <= 0 || s.[n-1] <> '/' then
n
else
skip_end_slashes s (n-1)

(*
http://www.opengroup.org/onlinepubs/9699919799/utilities/basename.html
*)
let basename s =
if s = "" then
"."
else
let end_pos = skip_end_slashes s (String.length s) in
if end_pos = 0 then
"/"
else
let start_pos =
let rec loop n =
if n <= 0 || s.[n-1] = '/' then
n
else
loop (n-1)
in
loop end_pos
in
String.sub ~pos:start_pos ~len:(end_pos-start_pos) s

(*
http://www.opengroup.org/onlinepubs/9699919799/utilities/dirname.html
*)
let dirname s =
if s = "" then
"."
else
let end_pos = skip_end_slashes s (String.length s) in
if end_pos = 0 then
"/"
else
let rec loop n =
if n = 0 then
"."
else if s.[n-1] = '/' then
let end_pos = skip_end_slashes s (n-1) in
if end_pos = 0 then
"/"
else
String.sub ~pos:0 ~len:end_pos s
else
loop (n-1)
in
loop end_pos

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 12, 2011

Comment author: @rixed

So, anything new wrt. this minor annoyance ?
If you come up with a patch I'd gladly test it (on Unix only, though).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 18, 2011

Comment author: @jberdine

I guess that it would be more portable to use Filename.dir_sep instead of '/'. Is this guaranteed to be a string of length 1, or would using something like Filename.check_suffix and Filename.chop_suffic instead of skip_end_slashes be needed?

Also note that the filename_concat function in ocamlbuild/my_std.ml involves some (non-portable) special case code for a related issue with Filename.concat. My limited understanding is that analogously changing Filename.concat would allow removing ocamlbuild's filename_concat.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 19, 2011

Comment author: till

unix (i.e. POSIX) has very precisely defined semantics (e.g. two dir_sep in a row in a path are equivalent to just one unless they are at the beginning of the path); I don't know windows well enough to be sure that changing the dirsep is enough.
I'd be happy to review any Unix specific code; I do not feel competent for the Windows part.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 19, 2011

Comment author: @jberdine

The Windows documentation unfortunately does not have the same level of detail:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247.aspx

Appealing to Cygwin, cygpath strips repeated slashes when consuming Windows paths, except at the beginning of the path where the meaning is significant.

Also, the Windows system calls have always interpreted both '' and '/' as a path separator. In the command shell '/' is used for command line options and therefore not reliable as a path separator, but if paths are quoted, then '/' can be used as a separator. Quoting is admittedly a nightmare. I am not familiar enough with the ocaml code base to know if using pathnames only as arguments to system calls, and removing the special casing for dir_sep, is feasible.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2011

Comment author: @damiendoligez

dir_sep is used by concat, and concat is used by users of Filename to build paths that can be passed to shell commands, so I don't think you can get rid of dir_sep.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 6, 2012

Comment author: @damiendoligez

I have committed a POSIX-compliant version of filename and dirname in trunk (rev 11999).
Still needs to be tested on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.