incorrect os.path.supports_unicode_filenames #38817
Comments
At least on OSX, unicode file names are pretty much fully |
Logged In: YES What happens if you try to create a file using Unicode names? In other words, does writing Unicode to an ASCII file system ever |
Logged In: YES On POSIX platforms in general, detecting Unicode file name On OSX, the situation is somewhat different from POSIX, as The documentation for supports_unicode_filenames says True if arbitrary Unicode strings can be used as file names While the first part is true for OSX, I don't think the |
Logged In: YES Brett: As for "writing Unicode to an ASCII file system": So you can put Latin-1, KOI8-r, EUC-JP, UTF-8, gb2312, etc |
Logged In: YES
(I'm not 100% sure, but I think the OS corrects that)
It is, we had a long discussion about that back when I
Darwin is a posix platform, so I'll have to add a switch to |
Logged In: YES
I'm relatively sure that the OS doesn't. The OS won't If you think setting the flag for darwin is fine in |
Logged In: YES Done in rev. 1.61 of posixpath.py. (Actually, OSX does complain when you feed open() a non-valid |
Logged In: YES Reopening as the fix I checked in caused problems in |
Logged In: YES (forgot to mention: my checkin was backed out) |
Logged In: YES Hmm, two years later and this still hasn't been resolved. Is anyone (Btw. the only code using os.path.supports_unicode_filenames that I'm |
Logged In: YES I don't care about this issue, as I think |
Maybe os.path.supports_unicode_filenames should be deprecated. On Linux both the things work, even if the value of os.path.supports_unicode_filenames is still False:
>>> os.path.supports_unicode_filenames
False
>>> open(u'fòòbàr', 'w')
<open file u'f\xf2\xf2b\xe0r', mode 'w' at 0x9470778>
>>> os.listdir(u'.')
[u'f\xf2\xf2b\xe0r', ...]
>>> open(u'fòòbàr')
<open file u'f\xf2\xf2b\xe0r', mode 'r' at 0x9470778> |
In addition, whether or not true unicode filenames are supported really depends, at least on Linux, on the *filesystem*, not on the OS (for some definition of support). In other words, I think os.path.supports_unicode_filenames is an API design that is broken and should probably be dropped. |
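As a rough illustration of the point that support depends on the filesystem rather than on a module-level flag, the following sketch probes a specific directory directly. The helper name `can_create_unicode_name` and the sample file name are hypothetical, chosen only for this example; it is not part of `os.path` or of any patch in this issue.

```python
import os
import tempfile
import unicodedata


def can_create_unicode_name(directory, name="f\u00f2\u00f2b\u00e0r"):
    """Return True if `name` can be created in `directory` and listed back.

    Hypothetical helper: it tests one directory, not the whole OS.
    """
    path = os.path.join(directory, name)
    try:
        with open(path, "w"):
            pass
    except (UnicodeError, OSError):
        # The name could not be encoded or the filesystem rejected it.
        return False
    try:
        # Some filesystems store names in decomposed form, so compare
        # after normalizing both sides to NFC.
        wanted = unicodedata.normalize("NFC", name)
        return any(
            unicodedata.normalize("NFC", entry) == wanted
            for entry in os.listdir(directory)
        )
    finally:
        os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    print("flag:", os.path.supports_unicode_filenames)
    print("probe:", can_create_unicode_name(tmp))
```

The flag and the probe can disagree, which is the situation the Linux session above shows.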
Additionally it filters out test_pep277 on some platforms. But seemingly, it is not needed anymore with this patch. |
If it is decided to keep supports_unicode_filenames, here is a patch for test_os.py that verifies the value of supports_unicode_filenames against the following line from the documentation: |
With r78594, test_pep277 is active on all platforms having Unicode-friendly filesystem encoding. |
There are at least three messages stating that os.path.supports_unicode_filenames should go, so can someone please provide a definitive statement regarding its future? |
test_pep277.patch removes the usage of os.path.supports_unicode_filenames from test_pep277: the test still passes on Debian Sid (Linux). Can someone test the patch on Mac OS X, FreeBSD and Solaris (and maybe other POSIX/UNIX OSes)? About Windows: supports_unicode_filenames is False if sys.getwindowsversion().platform < 2: win32s (0) or Windows 9x/ME (1). I don't know win32s, but I know that Windows 9x/ME is no longer supported. |
Oops, forget test_pep277.patch: I misunderstood r81149 (new way to detect if the filesystem supports unicode or not). test_pep277 fails with my patch on Linux with LC_CTYPE=C. |
r84701 fixes supports_unicode_filenames's definition in Python 3.2 (and r84702 in Python 3.1): os.listdir(str) now always returns unicode filenames (including non-ascii characters). |
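A minimal sketch of the behaviour described here, assuming a current Python 3: the type of the names returned by `os.listdir()` follows the type of the argument. The file name `example.txt` is arbitrary and used only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str argument yields str names.
    str_names = os.listdir(tmp)

    # A bytes argument yields bytes names.
    bytes_names = os.listdir(os.fsencode(tmp))

print(str_names, bytes_names)
```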
It depends on the locale encoding:
$ LC_CTYPE=C ./python
Python 3.2a2+ (py3k, Sep 11 2010, 01:48:43)
>>> import sys; sys.getfilesystemencoding()
'ascii'
>>> open('\xe9', 'w').close()
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)
With utf-8, surrogates are forbidden. E.g.:
$ ./python
Python 3.2a2+ (py3k, Sep 11 2010, 01:48:43)
>>> import sys; sys.getfilesystemencoding()
'utf-8'
>>> open('\uDC00', 'w').close()
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
On Windows, Python uses the unicode API and so the unicode support doesn't depend on the locale encoding (on the ansi code page). Surrogates are accepted on Windows: '\uDC00' is a valid filename.
I think that supports_unicode_filenames is still useful to check if the filesystem API uses bytes (Linux, FreeBSD, Solaris, ...) or characters (Mac OS X, Windows). Mac OS X is a special case because the C API uses char* (byte string), but the filesystem encoding is fixed to utf-8 and it doesn't accept invalid utf-8 filenames. So I would like to say that supports_unicode_filenames should be True on Mac OS X (which was the initial request). |
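The two failures in this comment can be reproduced without touching the filesystem by encoding the names directly. This is a simplified sketch: the helper `encodable` is hypothetical, and it uses the strict error handler, whereas Python's real filesystem encoding on POSIX also applies an error handler that this example ignores.

```python
import sys


def encodable(name, encoding):
    """Return True if `name` encodes to `encoding` with strict errors.

    Hypothetical helper, a simplification of what open() does internally.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Same cases as the sessions above, with the encodings spelled out.
print(encodable("\xe9", "ascii"))    # non-ASCII character, ASCII encoding
print(encodable("\xe9", "utf-8"))    # same character, UTF-8 encoding
print(encodable("\udc00", "utf-8"))  # lone surrogate, UTF-8 encoding

# The encoding the running interpreter would use for file names.
print(sys.getfilesystemencoding())
```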
Win32s is long gone. It was an emulation layer to support Win32 on |
Sounds reasonable. |
r84784 sets os.path.supports_unicode_filenames to True on Mac OS X (macpath module). About test_supports_unicode_filenames.patch: test_unicode_listdir() is wrong, because os.listdir(str) always returns str (see r84701). The "verify that the new file's name is equal to the name we tried" check of test_unicode_filename() is also wrong: newfile.name is always equal to fname, it doesn't depend on supports_unicode_filenames. Since the test is wrong, I don't want to commit it. test_pep277 is enough to test the creation of files with unicode names. I don't see anything else to do now, so I am closing this issue. Reopen it if I forgot something, or open a new issue. |
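To illustrate why the `newfile.name` check proves nothing, here is a small sketch assuming a filesystem encoding that can represent the sample name (for example utf-8). The file object simply stores the path it was given, so the comparison holds regardless of how the name ends up on disk. The name `caf\u00e9.txt` is an arbitrary example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "caf\u00e9.txt")
    with open(fname, "w") as newfile:
        # .name echoes the argument passed to open(); it is not read
        # back from the filesystem.
        name_matches = newfile.name == fname

print(name_matches)
```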
I backported r84701 and r84784 to Python 2.7 (r84787). |
There seems to be some confusion about the macpath.py module. I'm not sure why it even exists in Python 3. Note it has to do with obsolete Classic MacOS-style paths (colon-separated paths) which are available on Mac OS X only through deprecated Carbon interfaces. I'm not even sure that those style paths do support unicode. More importantly, the underlying Carbon interfaces that macpath.py uses were removed for Python 3. AFAIK, virtually nothing on OS X uses these style paths anymore and, with the removal of all the old Mac Carbon support in Python 3, I don't think there is any Python module that can use these paths other than macpath. I think this module should be marked for deprecation and removed. There is no reason to modify it nor add a NEWS note, even for 2.7. |
(I've opened bpo-9850 to document the brokenness of macpath and suggest its deprecation and removal.) |
Oops. I thought that Mac OS X uses macpath, but in fact it is posixpath. Can you try my new patch posixpath_darwin.patch? I reopen the issue because I patched the wrong module. I suppose that Python 2.7 has the same issue: posixpath should be patched, not macpath. My patch leaves macpath with supports_unicode_filenames=True. If I understood correctly: macpath should be removed (bpo-9850). |
No problems noted with a quick test of posixpath_darwin.patch on 10.6 so looks good. It will get regression tested on more configurations sometime later. |
Ok thanks. Fix committed to 3.2 (r84866) and 2.7 (r84868). I kept my patch on macpath (supports_unicode_filenames=True) because it is still valid (even if it is not used). Or is it wrong that Mac OS 9 speaks unicode? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.