shutil.rmtree fails on non ascii filenames #68860
I'm running Python 2.7 on Windows 7, and shutil.rmtree fails to remove files with non-ASCII filenames:

WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'H:\\ihre_perso\xa6\xeanlichen_Zugangsdaten600.jpg'

(German for "The system cannot find the file specified".) Please let me know if I can help with anything.
Can you confirm whether it also fails if you pass in a unicode string, e.g. shutil.rmtree(u"filename.txt")?
I've confirmed the issue. It does indeed occur only when the path passed to rmtree is bytes. I discovered this while investigating cherrypy/cherrypy#1467. The following script replicates the failure on Windows on both Python 2 and Python 3, but not on other operating systems:

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')
with open('temp/Слава Україні.html', 'w'):
    pass
print(os.listdir(b'temp')[0])
shutil.rmtree(b'temp')

The error on Python 2.7 is this:

????? ???????.html
Traceback (most recent call last):
  File "C:\Users\jaraco\p\cherrypy\issue-1467.py", line 15, in <module>
    shutil.rmtree(b'temp')
  File "C:\Program Files\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Program Files\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'temp\\????? ???????.html'

This issue might be related to bpo-25911, bpo-24230, bpo-18713, bpo-16656, or bpo-9820, and probably others. From browsing those tickets, it isn't obvious to me why Windows should behave differently when a bytestring is passed to listdir. Perhaps I'll dig into them in more depth.
See also bpo-16700. On Windows there are two sets of APIs: Unicode and bytes. Modern filesystems store file names in Unicode (UTF-16), and the system encodes them to bytes for the bytes API. Unfortunately, this encoding is lossy: if a character is not encodable with the current code page, Windows tries to find the closest equivalent (for example, by dropping diacritics), and if it can't find anything appropriate, it silently replaces the character with "?". There is nothing we can do about this on the Python side except use the Unicode API.
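The lossy step can be approximated in Python 3 by encoding a filename with a legacy code page and `errors="replace"`. This is only a sketch: cp1252 is an assumed stand-in for the system's ANSI code page, and Python's `replace` handler substitutes `?` for every unencodable character without attempting Windows' closest-equivalent mapping first. For this Cyrillic name the result is consistent with the `????? ???????.html` printed in the Python 2.7 output earlier in the thread. Since `?` is a reserved character in Windows filenames, this would also plausibly explain why os.remove reported Error 123 (invalid filename syntax) rather than "file not found".

```python
def to_ansi_bytes(name, codepage="cp1252"):
    """Approximate the Windows bytes API: encode a Unicode filename with a
    legacy code page (cp1252 is an assumed example), replacing every
    unencodable character with '?'."""
    return name.encode(codepage, errors="replace")


original = "Слава Україні.html"
lossy = to_ansi_bytes(original)
print(lossy)  # b'????? ???????.html'

# Decoding the bytes back does not recover the original name, so a later
# call that uses this bytes path refers to a name that is not on disk.
print(lossy.decode("cp1252") == original)  # False
```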
Use Unicode on Python 3; it will work on all platforms. Problem solved :-)
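A minimal Python 3 sketch of that advice: the same reproduction as the script above, but with a str path throughout. It works inside a temporary directory (an assumption, to avoid touching the working directory) and assumes the filesystem can store the Cyrillic name, as NTFS and typical UTF-8 Linux filesystems can. The function name `remove_tree_with_text_path` is hypothetical.

```python
import os
import shutil
import tempfile


def remove_tree_with_text_path():
    """Create a directory holding a non-ASCII filename, then delete it with a
    str (Unicode) path. Returns True if the directory is gone afterwards."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, "temp")  # str path, never bytes
    os.mkdir(target)
    with open(os.path.join(target, "Слава Україні.html"), "w"):
        pass
    # A str argument means rmtree handles child names as str throughout,
    # so str and bytes are never mixed.
    shutil.rmtree(target)
    removed = not os.path.exists(target)
    os.rmdir(base)  # clean up the temporary parent directory
    return removed


print(remove_tree_with_text_path())  # True
```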
I agree. I was able to apply a fairly simple fix to setuptools to address the failure (pypa/setuptools@8579495). I suggest closing this ticket as won't fix. |
I'm afraid I need to re-open this issue. Although passing unicode names to rmtree fixes the issue on Windows systems, it causes problems on Linux systems where LC_ALL=C. Consider this script:

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')
with open('temp/Слава Україні.html'.encode('utf-8'), 'w'):
    pass
print(os.listdir(b'temp')[0])
shutil.rmtree('temp')

When it is invoked this way, a UnicodeDecodeError occurs:

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 issue24672.py
Слава Україні.html
Traceback (most recent call last):
  File "issue24672.py", line 15, in <module>
    shutil.rmtree('temp')
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)

This is the same error seen when trying to rmtree an extracted copy of Sphinx (a package containing a file with an offending non-ASCII character in its name):

vagrant@trusty:/vagrant$ wget 'https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz' -O - | tar xz
100%[========================================================>] 4,397,246 2.06MB/s in 2.0s
2016-12-20 19:07:23 (2.06 MB/s) - written to stdout [4397246/4397246]
vagrant@trusty:/vagrant$ LC_ALL=C python2.7 -c "import shutil; shutil.rmtree(u'Sphinx-1.5.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 8: ordinal not in range(128)

Is the solution to call rmtree with unicode on Windows, but with bytes on Python 2 under Linux? What else can be done?
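The mechanism behind both tracebacks: on Python 2, os.listdir() called with a unicode path returns unicode names, except that names it cannot decode with the filesystem encoding (ASCII under LC_ALL=C) are returned as byte strings. posixpath.join then evaluates `path += '/' + b`, which implicitly decodes `'/' + b` as ASCII. The Python 3 sketch below reproduces only that decode step; it illustrates the failure and is not Python 2's actual code path. The offending byte is 0xd0 at position 1, matching the first traceback, because "С" encodes to UTF-8 as d0 a1 and the "/" occupies position 0. The sketch also shows the `surrogateescape` error handler (PEP 383), which Python 3 uses for undecodable filename bytes so they round-trip instead of raising. The helper name `ascii_join_failure` is hypothetical.

```python
def ascii_join_failure(name_bytes):
    """Mimic Python 2's implicit ASCII decode in posixpath.join
    (unicode += '/' + bytes). Return (offending_byte, position) on failure,
    or None if the name is pure ASCII."""
    try:
        (b"/" + name_bytes).decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.object[exc.start], exc.start
    return None


# Name as stored on disk: raw UTF-8 bytes, which is what Python 2's
# os.listdir hands back when the ASCII filesystem encoding can't decode it.
raw_name = "Слава Україні.html".encode("utf-8")
byte, pos = ascii_join_failure(raw_name)
print(hex(byte), pos)  # 0xd0 1

# Python 3 decodes undecodable filename bytes with 'surrogateescape',
# which maps them to lone surrogates and back without loss.
escaped = raw_name.decode("ascii", "surrogateescape")
print(escaped.encode("ascii", "surrogateescape") == raw_name)  # True
```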
Lib/posixpath.py needs a huge amount of work to behave correctly for either bytes or Unicode paths. I don't know why Lib/ntpath.py is okay here, but the code is different, so I suspect it just happens not to need the same conversion. Switching for each platform is probably the only way, unless you find someone willing to go through and make Unicode paths viable on Python 2.7 (this came up earlier today on one of the lists).
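The per-platform switch discussed above might look roughly like the following. This is a hypothetical helper (`rmtree_portable` is not part of shutil), not a reproduction of the change made in pypa/setuptools#706. The helper body uses only calls available on both Python 2.7 and Python 3; the usage lines assume Python 3. One limitation: the top-level path itself must be encodable with the filesystem encoding. On Python 3 alone, os.fsencode() and os.fsdecode() would be the more robust conversions.

```python
import os
import shutil
import sys
import tempfile


def rmtree_portable(path):
    """Hypothetical helper: call shutil.rmtree with a Unicode path on Windows
    and a bytes path elsewhere, as suggested in this thread."""
    encoding = sys.getfilesystemencoding()
    if sys.platform == "win32":
        # Windows: a Unicode path selects the wide API, avoiding the lossy
        # conversion through the ANSI code page.
        if isinstance(path, bytes):
            path = path.decode(encoding)
    elif not isinstance(path, bytes):
        # POSIX: with a bytes path, child names stay bytes, so Python 2 under
        # LC_ALL=C never mixes unicode with undecodable byte strings.
        path = path.encode(encoding)
    shutil.rmtree(path)


# Usage (Python 3): build a tree with a non-ASCII filename and remove it.
base = tempfile.mkdtemp()
tree = os.path.join(base, "Sphinx-demo")  # hypothetical directory name
os.mkdir(tree)
with open(os.path.join(tree, "Слава Україні.html"), "w"):
    pass
rmtree_portable(tree)
print(os.path.exists(tree))  # False
os.rmdir(base)
```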
In pypa/setuptools#706, I've addressed this additional concern.
Note: this issue was migrated from bugs.python.org; its metadata reflects the state of the issue at the time of migration and might not reflect the current state.