tempfile.mkdtemp fails with non-ascii paths on Python 2 #67504

4kir4 · 2015-01-25T11:02:16Z

BPO	23315
Nosy	@gpshead, @scoder, @vstinner, @ezio-melotti, @4kir4, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2020-05-31.15:11:08.992>
created_at = <Date 2015-01-25.11:02:15.989>
labels = ['type-bug', 'expert-unicode']
title = 'tempfile.mkdtemp fails with non-ascii paths on Python 2'
updated_at = <Date 2020-05-31.15:11:08.991>
user = 'https://github.com/4kir4'

bugs.python.org fields:

activity = <Date 2020-05-31.15:11:08.991>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2020-05-31.15:11:08.992>
closer = 'serhiy.storchaka'
components = ['Unicode']
creation = <Date 2015-01-25.11:02:15.989>
creator = 'akira'
dependencies = []
files = []
hgrepos = []
issue_num = 23315
keywords = []
message_count = 8.0
messages = ['234662', '234664', '257333', '257334', '257338', '257340', '257342', '370480']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'scoder', 'vstinner', 'ezio.melotti', 'akira', 'serhiy.storchaka', 'risto3']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23315'
versions = ['Python 2.7']

4kir4 · 2015-01-25T11:02:15Z

Python 2.7.9 (default, Jan 25 2015, 13:41:30) 
  [GCC 4.9.2] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os, sys, tempfile
  >>> d = u'\u20ac'.encode(sys.getfilesystemencoding()) # non-ascii
  >>> if not os.path.isdir(d): os.makedirs(d)
  ... 
  >>> os.environ['TEMP'] = d
  >>> tempfile.mkdtemp(prefix=u'')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File ".../python2.7/tempfile.py", line 331, in mkdtemp
      file = _os.path.join(dir, prefix + name + suffix)
    File ".../python2.7/posixpath.py", line 80, in join
      path += '/' + b
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)
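
The traceback is consistent with Python 2's implicit coercion: when a byte string (`str`) is concatenated with a `unicode` object, the bytes are first decoded with the default ASCII codec. Below is a minimal sketch of that step, written for Python 3 so that the decode is explicit. `py2_style_join` is a hypothetical helper for illustration, not part of `tempfile` or `posixpath`, and the file name is made up.

```python
def py2_style_join(dir_bytes, name_text):
    """Mimic Python 2's `str + unicode`: decode the bytes as ASCII, then join."""
    # Python 2 performs this decode implicitly; any non-ASCII byte makes it fail.
    return dir_bytes.decode("ascii") + "/" + name_text


# The euro sign encoded as UTF-8, standing in for the non-ASCII TEMP directory.
euro_dir = "\u20ac".encode("utf-8")

error = None
try:
    py2_style_join(euro_dir, "tmpabc123")
except UnicodeDecodeError as exc:
    # Same error class and codec as in the traceback above.
    error = exc
    print(exc)
```

An all-ASCII directory such as `b"tmp"` passes through the same helper without error, which is why the failure only shows up with non-ASCII paths.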

Related: https://bugs.python.org/issue1681974

vstinner · 2015-01-25T12:05:11Z

Why do you use a unicode prefix? Does it work with a bytes prefix?

You should use Python 3 if you want the best Unicode support.

risto3 · 2016-01-02T07:42:23Z

I see similar problems when running the test suite for lxml 3.5.0 on Python 2.7:

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)

The code snippet is in test_io.py, line 276:

266     def test_etree_parse_io_error(self):
267         # this is a directory name that contains characters beyond latin-1
268         dirnameEN = _str('Directory')
269         dirnameRU = _str('Каталог')
270         filename = _str('nosuchfile.xml')
271         dn = tempfile.mkdtemp(prefix=dirnameEN)
272         try:
273             self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
274         finally:
275             os.rmdir(dn)
276         dn = tempfile.mkdtemp(prefix=dirnameRU)
277         try:
278             self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
279         finally:
280             os.rmdir(dn)

even if I change dirnameRU to a simple French 'Répertoire' I still get errors...

It is not an option to upgrade to 3.0, sorry.

BTW, I tried passing dirnameRU.encode('utf-8') but that just generates
a different error:

ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 278, in test_etree_parse_io_error
    self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
  File "/opt/local/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)

risto3 · 2016-01-02T07:58:22Z

If I also add .encode('utf-8') to filename on line 278, that seems to get past the pathname problem.

I guess it comes down to this: if sys.getfilesystemencoding() is utf-8, which it is in my case (on SunOS), I believe these conversions should be automatic.

risto3 · 2016-01-02T08:59:45Z

Curiously enough, I was able to test with Python 3.5.
The same errors result, and the same workaround seems to get past them.

serhiy-storchaka · 2016-01-02T09:37:48Z

A similar problem in Python 3 was addressed in bpo-24230, but that was a new feature.
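
For reference, here is a short sketch of the Python 3 behaviour, assuming Python 3.5 or later, where `tempfile.mkdtemp` accepts `bytes` for `prefix`, `suffix`, and `dir` and then returns a `bytes` path. The prefix and file name are illustrative only.

```python
import os
import tempfile

# Work inside a scratch directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as base:
    base_bytes = os.fsencode(base)
    prefix = "\u20ac".encode("utf-8")  # non-ASCII prefix given as bytes

    # All-bytes arguments produce a bytes result.
    dn = tempfile.mkdtemp(prefix=prefix, dir=base_bytes)
    result_type = type(dn)

    # Every path component must have the same type: bytes joins with bytes.
    path = os.path.join(dn, b"nosuchfile.xml")

    # Mixing str and bytes is rejected up front rather than coerced.
    try:
        tempfile.mkdtemp(prefix="dir", dir=base_bytes)
        mixed_rejected = False
    except TypeError:
        mixed_rejected = True

    os.rmdir(dn)
```

The explicit `TypeError` is the main difference from Python 2, where mixed types were silently coerced through the ASCII codec.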

As for the lxml tests, I suggest using bytes names compatible with all Windows OEM encodings (consisting of ASCII + b'\xa9\xb0\xb2\xb3\xb4\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc8\xc9\xe6\xf0\xf1\xf3\xf4\xf5\xf6\xf7') and with UTF-8.

risto3 · 2016-01-02T10:28:34Z

This turns out to be related to the locale being set to 'C'.

Switching to a UTF-8 locale seems to resolve the issue.
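
A plausible explanation, sketched under the assumption that Python 2 on POSIX derives its filesystem encoding from the locale (so the 'C' locale yields ASCII): a `unicode` path has to be encoded before it reaches `os.mkdir`, and ASCII cannot represent the name. `encode_path` below is a hypothetical stand-in for that internal step, not a real API.

```python
def encode_path(name, fs_encoding):
    """Encode a text path the way the interpreter would before a system call."""
    return name.encode(fs_encoding)


name = "R\u00e9pertoire"  # the French directory name tried earlier

# With a UTF-8 locale the name encodes cleanly.
utf8_bytes = encode_path(name, "utf-8")

# With an ASCII filesystem encoding the same call fails.
try:
    encode_path(name, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```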

A fellow pkgsrc colleague has already filed an issue with lxml about this for the test suite (https://bugs.launchpad.net/lxml/+bug/1522052)

cheers

serhiy-storchaka · 2020-05-31T15:11:09Z

Python 2.7 is no longer supported.

4kir4 mannequin added the topic-unicode and type-bug labels Jan 25, 2015

serhiy-storchaka closed this as completed May 31, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022
