shutil.rmtree fails on non ascii filenames #68860

Closed
SteffenKampmann mannequin opened this issue Jul 20, 2015 · 9 comments
Labels
OS-windows stdlib Python modules in the Lib dir

Comments

@SteffenKampmann
Mannequin

SteffenKampmann mannequin commented Jul 20, 2015

BPO 24672
Nosy @pfmoore, @jaraco, @vstinner, @tjguk, @zware, @serhiy-storchaka, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2016-12-21.19:48:45.605>
created_at = <Date 2015-07-20.08:40:07.286>
labels = ['library', 'OS-windows']
title = 'shutil.rmtree fails on non ascii filenames'
updated_at = <Date 2016-12-21.19:48:45.603>
user = 'https://bugs.python.org/SteffenKampmann'

bugs.python.org fields:

activity = <Date 2016-12-21.19:48:45.603>
actor = 'jaraco'
assignee = 'none'
closed = True
closed_date = <Date 2016-12-21.19:48:45.605>
closer = 'jaraco'
components = ['Library (Lib)', 'Windows']
creation = <Date 2015-07-20.08:40:07.286>
creator = 'Steffen Kampmann'
dependencies = []
files = []
hgrepos = []
issue_num = 24672
keywords = []
message_count = 9.0
messages = ['246971', '246973', '271661', '271664', '271666', '271699', '283707', '283710', '283776']
nosy_count = 8.0
nosy_names = ['paul.moore', 'jaraco', 'vstinner', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'Steffen Kampmann']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue24672'
versions = ['Python 2.7']

@SteffenKampmann
Mannequin Author

SteffenKampmann mannequin commented Jul 20, 2015

I run Python 2.7 on Windows 7, and shutil.rmtree fails to remove files with non-ASCII filenames:

  File "C:\\Users\\skampmann\\AppData\\Local\\Continuum\\Anaconda\\lib\\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\\Users\\skampmann\\AppData\\Local\\Continuum\\Anaconda\\lib\\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\\Users\\skampmann\\AppData\\Local\\Continuum\\Anaconda\\lib\\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\\Users\\skampmann\\AppData\\Local\\Continuum\\Anaconda\\lib\\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\\Users\\skampmann\\AppData\\Local\\Continuum\\Anaconda\\lib\\shutil.py", line 250, in rmtree
    os.remove(fullname)

WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'H:\\ihre_perso\xa6\xeanlichen_Zugangsdaten600.jpg'

Please let me know if I can help with anything.

@SteffenKampmann SteffenKampmann mannequin added the stdlib Python modules in the Lib dir label Jul 20, 2015
@tjguk
Member

tjguk commented Jul 20, 2015

Can you confirm whether it also fails if you pass in a unicode string? For example:

shutil.rmtree(u"filename.txt")

@jaraco
Member

jaraco commented Jul 30, 2016

I've confirmed the issue. It does indeed only occur if the string passed to rmtree is bytes. I discovered this during my investigation of cherrypy/cherrypy#1467. The following script will replicate the failure on Windows systems on Python 2 and Python 3, but not on other operating systems:

---
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html', 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree(b'temp')

The error on Python 2.7 is this:

????? ???????.html
Traceback (most recent call last):
  File "C:\Users\jaraco\p\cherrypy\issue-1467.py", line 15, in <module>
    shutil.rmtree(b'temp')
  File "C:\Program Files\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Program Files\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'temp\\????? ???????.html'

This issue might be related to bpo-25911 or bpo-24230 or bpo-18713 or bpo-16656 or bpo-9820 and probably others.

It's not obvious to me browsing through those tickets why Windows should behave differently when a bytestring is passed to listdir. Perhaps I'll delve into those tickets in more depth.

@jaraco jaraco changed the title from "shutil.rmtree failes on non ascii filenames" to "shutil.rmtree fails on non ascii filenames" Jul 30, 2016
@serhiy-storchaka
Member

See also bpo-16700.

On Windows there are two sets of APIs: Unicode and bytes. File names are stored as Unicode (UTF-16) on modern filesystems and are encoded to bytes by the system for the bytes API. Unfortunately, this encoding is lossy. Windows tries to find the closest equivalent if a character cannot be encoded in the current code page (for example, it drops diacritics) and silently replaces the character with "?" if it cannot find anything appropriate. We can't do anything about this from the Python side except use the Unicode API.
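
The lossy step can be illustrated in pure Python. This is only an analogy, not the conversion Windows itself performs: it assumes cp1252 as the active code page and uses Python's `errors='replace'` handler to stand in for the "?" substitution. The result happens to match the `????? ???????.html` name printed in the Python 2.7 reproduction above.

```python
# -*- coding: utf-8 -*-
# Analogy only: Python's 'replace' error handler stands in for the
# substitution Windows applies in its bytes API. cp1252 is an assumed
# code page, chosen because it contains no Cyrillic letters.
name = u'Слава Україні.html'

# Every character missing from the code page becomes '?'.
lossy = name.encode('cp1252', errors='replace')
print(lossy)

# Decoding the bytes again cannot recover the original name, so a later
# os.remove() on the byte name points at a file that does not exist.
roundtrip = lossy.decode('cp1252')
print(roundtrip == name)
```

Once the name has been reduced to question marks there is no way back to the real file name, which is why the bytes API cannot be used to delete such files.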

@vstinner
Member

Use Unicode on Python 3, it will work on all platforms. Problem solved :-)

@jaraco
Member

jaraco commented Jul 30, 2016

I agree. I was able to apply a fairly simple fix to setuptools to address the failure (pypa/setuptools@8579495).

I suggest closing this ticket as won't fix.

@jaraco
Member

jaraco commented Dec 20, 2016

I'm afraid I need to re-open this issue.

Although passing unicode names to rmtree fixes the issue on Windows systems, it causes problems on Linux systems where LC_ALL=C. Consider this script:

#################################
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html'.encode('utf-8'), 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree('temp')
#################################

Invoked thus, a UnicodeDecodeError occurs:

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 issue24672.py 
Слава Україні.html
Traceback (most recent call last):
  File "issue24672.py", line 15, in <module>
    shutil.rmtree('temp')
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)

This is the same error seen when trying to rmtree an extraction of Sphinx (a package containing a file name with an offending non-ASCII character):

vagrant@trusty:/vagrant$ wget 'https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz' -O - | tar xz
--2016-12-20 19:07:21-- https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz
Resolving files.pythonhosted.org (files.pythonhosted.org)... 151.101.33.63
Connecting to files.pythonhosted.org (files.pythonhosted.org)|151.101.33.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4397246 (4.2M) [binary/octet-stream]
Saving to: ‘STDOUT’

100%[========================================================>] 4,397,246 2.06MB/s in 2.0s

2016-12-20 19:07:23 (2.06 MB/s) - written to stdout [4397246/4397246]

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 -c "import shutil; shutil.rmtree(u'Sphinx-1.5.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 8: ordinal not in range(128)
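
Both tracebacks end in `path += '/' + b` inside `posixpath.join`. On Python 2, `os.listdir` called with a unicode path returns any name it cannot decode as a byte string, and adding that byte string to the unicode `path` triggers an implicit ASCII decode. The sketch below performs the same decode explicitly for the first script's file name; it is an illustration of the mechanism, not the code path in `posixpath` itself.

```python
# -*- coding: utf-8 -*-
# The file name as stored on disk: UTF-8 bytes that ASCII cannot decode.
name = u'Слава Україні.html'.encode('utf-8')

# posixpath.join builds '/' + name first, so the separator sits at
# position 0 and the first non-ASCII byte at position 1.
joined = b'/' + name

failure = None
try:
    # Python 2 does this implicitly when a unicode path is concatenated
    # with a byte string; here the decode is spelled out.
    joined.decode('ascii')
except UnicodeDecodeError as exc:
    failure = exc

print(failure)
```

The reported position is 1 because of the leading separator, which matches the first traceback.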

Is the solution to call rmtree with unicode on Windows, but with bytes when on Python 2 and Linux? What else can be done?

@jaraco jaraco reopened this Dec 20, 2016
@zooba
Member

zooba commented Dec 20, 2016

Lib/posixpath.py needs a huge amount of work to behave correctly for either bytes or Unicode paths. I don't know why Lib/ntpath.py is okay here, but the code is different so I suspect it just happens to not need the same conversion.

Switching for each platform is probably the only way, unless you find someone willing to go through and make Unicode paths viable on Python 2.7 (this came up earlier today on one of the lists).
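
A minimal sketch of what switching per platform could look like. The helper names `rmtree_arg` and `remove_tree` are hypothetical, and this is not the change that was made in setuptools. It assumes the top-level path itself can be converted with the filesystem encoding, which holds for an ASCII directory name such as `temp`.

```python
import os
import shutil
import sys

PY2 = sys.version_info[0] == 2


def rmtree_arg(path):
    """Hypothetical helper: pick the path type to hand to shutil.rmtree."""
    encoding = sys.getfilesystemencoding() or 'utf-8'
    if PY2 and os.name != 'nt':
        # Python 2 on POSIX: stay in bytes so os.path.join never mixes a
        # unicode path with undecodable byte names from os.listdir.
        return path if isinstance(path, bytes) else path.encode(encoding)
    # Windows (any version) or Python 3: use text so that the Unicode
    # API is used on Windows and names are not replaced with '?'.
    return path.decode(encoding) if isinstance(path, bytes) else path


def remove_tree(path):
    """Hypothetical wrapper around shutil.rmtree using the chosen type."""
    shutil.rmtree(rmtree_arg(path))
```

Callers would use `remove_tree('temp')` in place of `shutil.rmtree('temp')`. A path whose own name cannot be converted with the filesystem encoding would still fail in `rmtree_arg`, so this only covers the cases shown in this thread.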

@jaraco
Member

jaraco commented Dec 21, 2016

In pypa/setuptools#706, I've addressed this additional concern.

@jaraco jaraco closed this as completed Dec 21, 2016
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022