os.path.split does not handle . & .. properly #38593

Closed
csiemens mannequin opened this issue Jun 5, 2003 · 6 comments
Labels
stdlib Python modules in the Lib dir

Comments

@csiemens
Mannequin

csiemens mannequin commented Jun 5, 2003

BPO 749261

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2003-06-13.19:44:54.000>
created_at = <Date 2003-06-05.01:03:01.000>
labels = ['invalid', 'library']
title = 'os.path.split does not handle . & .. properly'
updated_at = <Date 2003-06-13.19:44:54.000>
user = 'https://bugs.python.org/csiemens'

bugs.python.org fields:

activity = <Date 2003-06-13.19:44:54.000>
actor = 'jepler'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2003-06-05.01:03:01.000>
creator = 'csiemens'
dependencies = []
files = []
hgrepos = []
issue_num = 749261
keywords = []
message_count = 6.0
messages = ['16253', '16254', '16255', '16256', '16257', '16258']
nosy_count = 2.0
nosy_names = ['jepler', 'csiemens']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue749261'
versions = []

@csiemens
Mannequin Author

csiemens mannequin commented Jun 5, 2003

The os.path.split() & posixpath.split() functions in my
opinion do not handle '.' & '..' at the end of a path
properly which causes os.path.dirname() &
os.path.basename() to also return the wrong result
because they are directly based on os.path.split().

I'll demonstrate the Unix Python case (the Windows
ntpath.py case is just a close parallel variation).

Example:
>python
Python 2.1.1
>>> import posixpath
>>> posixpath.split('.')
('', '.')
>>> posixpath.split('..')
('', '..')

Yet:
>>> posixpath.split('./')
('.', '')
>>> posixpath.split('../')
('..', '')

Now '.' really represents './', and '..' really represents '../'.
Since the split() function simply uses a string split on '/' to find directories, it goofs up on this one case. The '.' and '..' are like the slash character in the sense that they all only refer to directories.
The '.' & '..' can never be files in Unix or Windows, so I think that the split() function should treat paths like:
.
..
dir/.
dir/..
/dir1/dir2/.
/dir1/dir2/..
as not having a file portion, just as if:
./
../
dir/./
dir/../
/dir1/dir2/./
/dir1/dir2/../
respectively were given instead.

The fix in posixpath.py for this is just to put a little path processing code at the beginning of the split() function that looks for the following cases:
    if p in ['.','..'] or p[-2:] == '/.' or p[-3:] == '/..':
        p = p+'/'
And then go into all the regular split() code.
The fix in ntpath.py is very similar.
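
A minimal sketch of the proposal above, for comparison with the existing behavior. The name `proposed_split` is hypothetical and is not part of the standard library; it simply wraps `posixpath.split()` with the pre-processing step described in this comment.

```python
import posixpath


def proposed_split(p):
    """Hypothetical variant of posixpath.split() following the proposal."""
    # Treat a trailing '.' or '..' component as a directory by appending '/'.
    if p in ('.', '..') or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
    return posixpath.split(p)


# Print the current result next to the proposed result for each listed path.
for path in ['.', '..', 'dir/.', 'dir/..', '/dir1/dir2/.', '/dir1/dir2/..']:
    print(repr(path), posixpath.split(path), proposed_split(path))
```

Paths that do not end in a '.' or '..' component are passed through unchanged, so only the listed cases differ.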

@csiemens csiemens mannequin closed this as completed Jun 5, 2003
@csiemens csiemens mannequin added invalid stdlib Python modules in the Lib dir labels Jun 5, 2003
@jepler
Mannequin

jepler mannequin commented Jun 8, 2003

Logged In: YES
user_id=2772

I don't believe this behavior is a bug. os.path.split's task is to split the last component of a path from the other components, regardless of whether any of the components actually names a directory.

Another property of os.path.split is that eventually this loop will terminate:
    while path != "": path = os.path.split(path)[0]
With your proposed change, this would not be true for paths that initially contain a "." or ".." component (since os.path.split("..") would then return ('..', '')).
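
The termination argument can be checked with a bounded version of the loop. This is a sketch: `proposed_split` and `heads` are hypothetical helpers, and the `limit` parameter exists only so that the non-terminating case stops after a few steps.

```python
import posixpath


def proposed_split(p):
    # Hypothetical variant of split() implementing the reporter's proposal.
    if p in ('.', '..') or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
    return posixpath.split(p)


def heads(path, split, limit=10):
    """Collect successive heads until the path is empty or `limit` is hit."""
    seen = []
    while path != "" and len(seen) < limit:
        path = split(path)[0]
        seen.append(path)
    return seen


print(heads("foo/..", posixpath.split))  # reaches '' after two steps
print(heads("foo/..", proposed_split))   # stays at 'foo/..' until the limit
```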

@csiemens
Mannequin Author

csiemens mannequin commented Jun 12, 2003

Logged In: YES
user_id=794244

Ok, I see your points, but I have 2 points.

Point 1:
Your loop 'while path != "": path = os.path.split(path)[0]' won't stop with an absolute path because it will get down to '/' and go into an infinite spin.
OK, so you can modify it to be:
    while path != "" and path != '/': path = os.path.split(path)[0]
But this too will spin if you start with an absolute path that has 2 or more slashes at the front of the path - like '//dir1/dir2' or '///dir1/dir2'.
OK, you can fix that up too by doing something like:
    old_path = ''
    while path != old_path:
        old_path = path
        path = os.path.split(path)[0]
But that final loop will also work with my new os.path.split proposal - which makes me question your assertion that split should have the 'terminate loop' property.
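
The fixed-point loop from Point 1 can be written as a small runnable function. This is a sketch: `ancestors` is a hypothetical name, and `posixpath` is used directly so the result does not depend on the platform.

```python
import posixpath


def ancestors(path):
    """Repeatedly take the head until split() stops changing the path."""
    result = []
    old_path = None
    while path != old_path:
        old_path = path
        path = posixpath.split(path)[0]
        if path != old_path:
            # Record each new, shorter head exactly once.
            result.append(path)
    return result


for start in ['dir1/dir2', '/dir1/dir2', '//dir1/dir2']:
    print(repr(start), ancestors(start))
```

The loop stops at '' for a relative path, at '/' for an ordinary absolute path, and at '//' when the path starts with two slashes, because `posixpath.split()` leaves a head made only of slashes untouched.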

Point 2:
You may be right about os.path.split's stated task/job. So maybe the change shouldn't be made to os.path.split(); rather, os.path.dirname() & os.path.basename() should be changed to not simply return the 1st and 2nd components of split(), but to try to be as "smart" as possible: dirname's intention is to return the directory portion, and basename's intention is to return the (end) filename portion - if possible. With paths like /abc/xyz you have no idea if xyz is a file or dir, so the default should be 'file'. Currently /abc/xyz/ knows that xyz is a dir and returns /abc/xyz for the dirname and '' for the basename.
My point is that currently basename/dirname are "smart" and not just returning the last component whether it is a file or a directory; otherwise they would return /abc for the dirname and xyz/ for the basename.
So given the current behavior of dirname/basename, they should be smart in ALL "we can tell it's a directory" cases such as:
.
..
dir/.
dir/..
/dir1/dir2/.
/dir1/dir2/..
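
For reference, the current results of dirname() and basename() for the paths discussed in Point 2 can be listed with a short script. This is only an illustration of existing behavior, using `posixpath` so the output is the same on every platform.

```python
import posixpath

paths = ['/abc/xyz', '/abc/xyz/', '.', '..', 'dir/.', '/dir1/dir2/..']

# dirname() and basename() are the two halves of split(), so a trailing
# '.' or '..' is reported as the basename, while a trailing slash gives ''.
for path in paths:
    print(repr(path), repr(posixpath.dirname(path)), repr(posixpath.basename(path)))
```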

So do I have a good Point #1, and more importantly do I have
a good Point #2 - and if I do I could change this bug's title
to be os.path.dirname/basename related.

Curtis Siemens

@jepler
Mannequin

jepler mannequin commented Jun 13, 2003

Logged In: YES
user_id=2772

OK-- so my statement of the "important property" of split
was only correct in the case of a non-absolute path.

The important point is that split shortens the path whenever it contains more than one component. You propose that of the values given by repeated splits of "/foo/.." or "foo/..", you'll never see the one-component return "foo" or "/foo". Why do you believe that in the loop
    while 1:
        p = os.path.split(p)[0]
that p should never have one of those values? To me this seems obviously incorrect.

You didn't respond to my point that os.path.split is about
components, not about whether those components name
directories. For instance, because "/usr/local/bin" names a
directory on my system, shouldn't
os.path.split("/usr/local/bin") -> ('/usr/local/bin', '') if
your test really is about whether the final component names
a directory? To me this seems obviously incorrect.

Let me also address your claim that because of this split
behavior, basename and dirname behave improperly. This is
also wrong. In "/tmp/.." and "/usr/local/bin", the first
names an entry ".." in the directory "/tmp", and the second
names an entry "bin" in the directory "/usr/local", just
like "/bin/sh" names an entry "sh" in the directory "/bin".
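
The component-based view can be illustrated with the three paths named above. In this sketch the split is purely textual: nothing checks whether the path exists or names a directory, which is why the last path (a made-up one, assumed not to exist) is handled the same way.

```python
import posixpath

# Each path is split into (directory part, entry name) by string handling only.
for path in ["/tmp/..", "/usr/local/bin", "/bin/sh", "/no/such/dir/entry"]:
    head, tail = posixpath.split(path)
    print("%r names entry %r in directory %r" % (path, tail, head))
```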

I strongly believe this bug should be marked closed,
resolution: invalid.

@csiemens
Mannequin Author

csiemens mannequin commented Jun 13, 2003

Logged In: YES
user_id=794244

Ok, I like the statement,
"split shortens the path whenever it contains more than one
component"
I can go with that definition of os.path.split()
because that's consistent for all paths, absolute or relative,
and given that definition I'll agree that split is about
components.

Ok, onto dirname/basename which are really the source of my
concern. I looked at the python documentation for basename()
and I think that it points out a problem that has been
tolerated.
It states:
Note that the result of this function is different from the
Unix basename program; where basename for '/foo/bar/'
returns 'bar', the basename() function returns an empty
string ('').
You state that the final component of a path should be returned for basename() regardless of whether it is a file or a directory.
I can get behind that, but then I think that statement supports
the Unix basename function implementation where /foo/bar/
has 'bar' (or 'bar/') returned for basename because /foo/bar
and /foo/bar/ are the same path, and to me 'bar' or 'bar/' is
the same single component since the trailing slash (and only
the trailing slash(es) case) is redundant. Am I way off on
this?
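
A caller who wants the behavior of the Unix basename program for paths with trailing slashes can strip them before calling basename(). The helper below is a hypothetical sketch, not a standard library function, and it only approximates the Unix program.

```python
import posixpath


def unix_like_basename(path):
    """Hypothetical helper: ignore trailing slashes when taking the basename."""
    stripped = path.rstrip('/')
    # A path made only of slashes names the root; report it as '/'.
    if path and not stripped:
        return '/'
    return posixpath.basename(stripped)


print(repr(posixpath.basename('/foo/bar/')))  # current behavior: ''
print(repr(unix_like_basename('/foo/bar/')))  # 'bar'
```

This keeps the documented behavior of basename() intact and moves the choice about trailing slashes to the caller.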

@jepler
Mannequin

jepler mannequin commented Jun 13, 2003

Logged In: YES
user_id=2772

Interestingly, it appears that back in Python version 1.2,
os.path.split may have behaved in the way you described.
From
http://www.via.ecp.fr/python-doc/python-lib/posixpath.html :
split (p) -- function of module posixpath
Split the pathname p in a pair (head, tail), where tail
is the last pathname component and head is everything
leading up to that. If p ends in a slash (except if it is
the root), the trailing slash is removed and the operation
applied to the result; otherwise, join(head, tail) equals p.
The tail part never contains a slash. Some boundary cases:
if p is the root, head equals p and tail is empty; if p is
empty, both head and tail are empty; if p contains no slash,
head is empty and tail equals p.

By version 1.4, the behavior had changed: http://www.python.org/doc/1.4/lib/node75.html
split(p)
Split the pathname p in a pair (head, tail), where tail
is the last pathname component and head is everything
leading up to that. The tail part will never contain a
slash; if p ends in a slash, tail will be empty. If there is
no slash in p, head will be empty. If p is empty, both head
and tail are empty. Trailing slashes are stripped from head
unless it is the root (one or more slashes only). In nearly
all cases, join(head, tail) equals p (the only exception
being when there were multiple slashes separating head from
tail).
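
The boundary cases in the 1.4 description can be checked against a current `posixpath`. This sketch assumes the documented behavior is unchanged in the Python version being run, and prints whether join(head, tail) reproduces each input.

```python
import posixpath

# Inputs covering the documented cases: ordinary path, trailing slash,
# root, empty string, no slash, and multiple slashes between head and tail.
for p in ['a/b', 'a/', '/', '', 'a', 'a//b']:
    head, tail = posixpath.split(p)
    round_trips = posixpath.join(head, tail) == p
    print(repr(p), (head, tail), round_trips)
```

Only the last input fails to round-trip, since the doubled slash is stripped from the head and join() re-inserts a single one.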

This change in the Python CVS was made by Guido himself,
between the 1.2 and 1.3 releases:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/posixpath.py.diff?r1=1.15&r2=1.16

Since the behavior you are now proposing was one that Guido
explicitly got rid of, it seems like an uphill battle to ask
for it back, especially since the current behavior has been
clearly documented for the 1.3, 1.4, 1.5, 1.6, 2.0, 2.1, and
2.2 releases (the last 7 major releases, spanning something
like 8 years---or about the time of the introduction of
keyword arguments, according to 1.3's Misc/NEWS)

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022