pathlib glob case sensitivity issue on Windows #70842

Closed
udoeberhardt mannequin opened this issue Mar 28, 2016 · 5 comments
Labels
OS-windows · stdlib (Python modules in the Lib dir) · type-bug (An unexpected behavior, bug, or error)

Comments

@udoeberhardt
Mannequin

udoeberhardt mannequin commented Mar 28, 2016

BPO 26655
Nosy @pfmoore, @pitrou, @tjguk, @zware, @serhiy-storchaka, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-03-31.10:03:41.724>
created_at = <Date 2016-03-28.13:24:02.878>
labels = ['type-bug', 'library', 'OS-windows']
title = 'pathlib glob case sensitivity issue on Windows'
updated_at = <Date 2016-03-31.10:03:41.724>
user = 'https://bugs.python.org/udoeberhardt'

bugs.python.org fields:

activity = <Date 2016-03-31.10:03:41.724>
actor = 'SilentGhost'
assignee = 'none'
closed = True
closed_date = <Date 2016-03-31.10:03:41.724>
closer = 'SilentGhost'
components = ['Library (Lib)', 'Windows']
creation = <Date 2016-03-28.13:24:02.878>
creator = 'udo.eberhardt'
dependencies = []
files = []
hgrepos = []
issue_num = 26655
keywords = []
message_count = 5.0
messages = ['262570', '262574', '262593', '262664', '262668']
nosy_count = 7.0
nosy_names = ['paul.moore', 'pitrou', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'udo.eberhardt']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue26655'
versions = ['Python 3.5']

@udoeberhardt
Mannequin Author

udoeberhardt mannequin commented Mar 28, 2016

On Windows Path.glob does not always return the file name with correct case.

If the current directory contains a file named MixedCase.txt then the following script:

import pathlib
p = pathlib.Path('.')
print(list(p.glob('*.txt')))
print(list(p.glob('Mixedcase.txt')))

yields:
[WindowsPath('MixedCase.txt')]
[WindowsPath('mixedcase.txt')]

Problem: The result of the second call to glob should be 'MixedCase.txt' as well. I would expect that glob returns a file name exactly as it is spelled in the file system.

@udoeberhardt udoeberhardt mannequin added stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Mar 28, 2016
@serhiy-storchaka
Member

The problem is that there is no way to simply read a file name exactly as it is spelled in the file system. Iterating over all names in the directory and finding the one that matches the specified name, ignoring case, is not as efficient as checking that the specified file name exists.
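
To make the trade-off concrete, here is a minimal sketch of the two approaches. The helper names `exists_any_case` and `find_actual_name` are hypothetical and not part of pathlib, and comparing with `str.lower()` only approximates how Windows compares file names.

```python
import os


def exists_any_case(directory, name):
    # A single path lookup. On a case-insensitive file system this succeeds
    # for any spelling, but it does not reveal the spelling stored on disk.
    return os.path.exists(os.path.join(directory, name))


def find_actual_name(directory, name):
    # Hypothetical helper: list the whole directory and return the entry
    # whose name matches `name` when case is ignored, or None if absent.
    wanted = name.lower()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.lower() == wanted:
                return entry.name
    return None
```

The first function costs one lookup regardless of directory size, while the second has to read directory entries until it finds a match, which is the extra work described above.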

@udoeberhardt
Mannequin Author

udoeberhardt mannequin commented Mar 29, 2016

So this is a trade-off between consistent behavior and efficiency. My point of view is that glob is for enumerating matching files and it should consistently return the real file names. Typically glob will be called with a pattern like '*.txt' and it will have to iterate names anyway, right? In the special case that it is called with a literal name it could do the same to produce consistent results. A user who wants to check (more efficiently) if a literal name exists, can use Path.exists().

The statement in the doc could be:
Note: To find the literal names in the file system, glob always enumerates files and directories. To check more efficiently whether a specific file exists, use exists().
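
As a user-side illustration of the proposed behavior, the sketch below always enumerates the directory, so a literal pattern yields the on-disk spelling as well. `glob_on_disk_names` is a hypothetical helper, it handles only a single-directory pattern (no `**` and no path separators), and lowercasing both sides is an assumed approximation of case-insensitive matching.

```python
import fnmatch
from pathlib import Path


def glob_on_disk_names(directory, pattern):
    # Hypothetical helper: always enumerate the directory, even when the
    # pattern has no wildcards, so results carry the names stored on disk.
    lowered = pattern.lower()
    return sorted(
        entry
        for entry in Path(directory).iterdir()
        if fnmatch.fnmatchcase(entry.name.lower(), lowered)
    )
```

With a file named MixedCase.txt in the directory, both `glob_on_disk_names('.', '*.txt')` and `glob_on_disk_names('.', 'Mixedcase.txt')` return a path spelled `MixedCase.txt`. A caller who only needs to know whether a literal name exists can still use `Path.exists()`.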

@serhiy-storchaka
Member

Currently, glob(r'c:\very\long\case\insensitive\path\*.txt') has to iterate over names in only one directory. To restore the actual case of the path, it would have to iterate over all parent directories: r'c:\very\long\case\insensitive\path', r'c:\very\long\case\insensitive', r'c:\very\long\case', r'c:\very\long', r'c:\very', and 'c:\\'.
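
The cost described here can be sketched as follows: restoring the spelling of every component means one directory listing per component. `restore_case` is a hypothetical helper, the spelling of `base` itself is taken as given, and `str.lower()` is again only an approximation of the file system's comparison rules.

```python
import os
from pathlib import Path, PurePath


def restore_case(base, relative):
    # Hypothetical helper: rebuild `relative` under `base` using the
    # spellings stored on disk. Each component needs its own directory scan.
    current = Path(base)
    for part in PurePath(relative).parts:
        wanted = part.lower()
        with os.scandir(current) as entries:
            matches = [e.name for e in entries if e.name.lower() == wanted]
        if not matches:
            raise FileNotFoundError(str(current / part))
        current = current / matches[0]
    return current
```

For a path with five components this performs five directory listings, compared with the single listing that a `*.txt` pattern in the last directory needs today.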

@udoeberhardt
Mannequin Author

udoeberhardt mannequin commented Mar 30, 2016

In the meantime I have realized this problem as well. There is no easy way to determine the exact spelling of the entire path, so it seems there is no simple solution to my problem. The concept of treating file system paths case-insensitively (as Windows does) seems to be a bad idea.

@SilentGhost SilentGhost mannequin closed this as completed Mar 31, 2016
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022