tempfile.mkdtemp fails with non-ascii paths on Python 2 #67504

Closed
4kir4 mannequin opened this issue Jan 25, 2015 · 8 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@4kir4
Mannequin

4kir4 mannequin commented Jan 25, 2015

BPO 23315
Nosy @gpshead, @scoder, @vstinner, @ezio-melotti, @4kir4, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2020-05-31.15:11:08.992>
created_at = <Date 2015-01-25.11:02:15.989>
labels = ['type-bug', 'expert-unicode']
title = 'tempfile.mkdtemp fails with non-ascii paths on Python 2'
updated_at = <Date 2020-05-31.15:11:08.991>
user = 'https://github.com/4kir4'

bugs.python.org fields:

activity = <Date 2020-05-31.15:11:08.991>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2020-05-31.15:11:08.992>
closer = 'serhiy.storchaka'
components = ['Unicode']
creation = <Date 2015-01-25.11:02:15.989>
creator = 'akira'
dependencies = []
files = []
hgrepos = []
issue_num = 23315
keywords = []
message_count = 8.0
messages = ['234662', '234664', '257333', '257334', '257338', '257340', '257342', '370480']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'scoder', 'vstinner', 'ezio.melotti', 'akira', 'serhiy.storchaka', 'risto3']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23315'
versions = ['Python 2.7']

@4kir4
Mannequin Author

4kir4 mannequin commented Jan 25, 2015

Python 2.7.9 (default, Jan 25 2015, 13:41:30) 
  [GCC 4.9.2] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os, sys, tempfile
  >>> d = u'\u20ac'.encode(sys.getfilesystemencoding()) # non-ascii
  >>> if not os.path.isdir(d): os.makedirs(d)
  ... 
  >>> os.environ['TEMP'] = d
  >>> tempfile.mkdtemp(prefix=u'')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File ".../python2.7/tempfile.py", line 331, in mkdtemp
      file = _os.path.join(dir, prefix + name + suffix)
    File ".../python2.7/posixpath.py", line 80, in join
      path += '/' + b
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

Related: https://bugs.python.org/issue1681974
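
For readers on Python 3, the traceback above can be approximated as follows. This is only a sketch of the mechanism, assuming the failure comes from Python 2 implicitly decoding the byte-string directory as ASCII when it is concatenated with a unicode prefix. The names `EURO_DIR`, `py2_style_join` and `explicit_join` are hypothetical and appear in neither `tempfile` nor `posixpath`.

```python
import os

# Hypothetical bytes path with a non-ASCII character (the euro sign, as in the report).
EURO_DIR = "/tmp/\u20ac".encode("utf-8")


def py2_style_join(dir_bytes, name_text):
    """Mimic Python 2's implicit coercion: the bytes operand is decoded
    as ASCII before it is concatenated with text."""
    return dir_bytes.decode("ascii") + "/" + name_text


def explicit_join(dir_bytes, name_text):
    """Decode with the filesystem encoding first, then join text with text."""
    return os.path.join(os.fsdecode(dir_bytes), name_text)


try:
    py2_style_join(EURO_DIR, "tmpabc")
except UnicodeDecodeError as exc:
    # The ASCII codec rejects the first byte of the encoded euro sign.
    print("implicit ASCII decode failed at position", exc.start)

# Decoding explicitly keeps both operands the same type, so the join succeeds.
joined = explicit_join(EURO_DIR, "tmpabc")
print(type(joined).__name__)
```

The point of the sketch is that the error is raised by the type mixing and not by the directory itself: once both operands are decoded (or both encoded) explicitly, the join goes through.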

@4kir4 4kir4 mannequin added topic-unicode type-bug An unexpected behavior, bug, or error labels Jan 25, 2015
@vstinner
Member

Why do you use a unicode prefix? Does it work with a bytes prefix?

You should use Python 3 if you want the best Unicode support.

@risto3
Mannequin

risto3 mannequin commented Jan 2, 2016

I notice similar problems, as found when running the test suite for lxml 3.5.0 on python2.7

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)

The code snippet is in test_io.py, around line 276:

266     def test_etree_parse_io_error(self):
267         # this is a directory name that contains characters beyond latin-1
268         dirnameEN = _str('Directory')
269         dirnameRU = _str('Каталог')
270         filename = _str('nosuchfile.xml')
271         dn = tempfile.mkdtemp(prefix=dirnameEN)
272         try:
273             self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
274         finally:
275             os.rmdir(dn)
276         dn = tempfile.mkdtemp(prefix=dirnameRU)
277         try:
278             self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
279         finally:
280             os.rmdir(dn)

Even if I change dirnameRU to a simple French 'Répertoire', I still get errors.

It is not an option to upgrade to 3.0, sorry.

BTW, I tried passing dirnameRU.encode('utf-8'), but that just generates a different error:

ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 278, in test_etree_parse_io_error
    self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
  File "/opt/local/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)

@risto3
Mannequin

risto3 mannequin commented Jan 2, 2016

If I also add .encode('utf-8') to filename on line 278, that seems to get past the pathname problem.

I guess it comes down to the fact that if sys.getfilesystemencoding() is utf-8, which in my case it is (on SunOS), I believe these conversions should be automatic.
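
The workaround described above amounts to making every path component a byte string before joining. A minimal Python 3 sketch of that idea follows. `join_as_bytes` is a hypothetical helper, not something taken from lxml or the standard library, and the UTF-8 default is an assumption that matches the encoding used in this comment.

```python
import os


def join_as_bytes(dirname, filename, encoding="utf-8"):
    """Encode any text component so that both parts are bytes, then join them."""
    parts = [
        part.encode(encoding) if isinstance(part, str) else part
        for part in (dirname, filename)
    ]
    return os.path.join(*parts)


# The directory is already bytes (as after .encode('utf-8')); the filename is text.
path = join_as_bytes("Répertoire".encode("utf-8"), "nosuchfile.xml")
print(type(path).__name__)
```

Encoding both sides explicitly avoids any implicit conversion, at the cost of hard-coding an encoding that may not match the filesystem encoding on every system.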

@risto3
Mannequin

risto3 mannequin commented Jan 2, 2016

Curiously enough, I was able to test with Python 3.5.
The same errors result, and the same workaround seems to get past them.

@serhiy-storchaka
Member

A similar problem in Python 3 was addressed in bpo-24230, but that was a new feature.

As for the lxml tests, I suggest using byte names that are compatible with all Windows OEM encodings (consisting of ASCII + b'\xa9\xb0\xb2\xb3\xb4\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc8\xc9\xe6\xf0\xf1\xf3\xf4\xf5\xf6\xf7') and with UTF-8.
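
For reference, here is a small sketch of how bytes arguments behave in `tempfile.mkdtemp` on Python 3.5 and later, which as far as I understand is the behaviour that came out of bpo-24230. The helper names `make_bytes_tempdir` and `mixing_is_rejected` and the `demo-` prefix are hypothetical.

```python
import os
import tempfile


def make_bytes_tempdir(prefix=b"demo-"):
    """Create a temporary directory from bytes arguments and return its bytes path."""
    base = os.fsencode(tempfile.gettempdir())
    return tempfile.mkdtemp(prefix=prefix, dir=base)


def mixing_is_rejected():
    """Return True if mixing a text prefix with a bytes dir raises TypeError."""
    base = os.fsencode(tempfile.gettempdir())
    try:
        created = tempfile.mkdtemp(prefix="demo-", dir=base)
    except TypeError:
        return True
    os.rmdir(created)  # only reached if the mix was unexpectedly accepted
    return False


path = make_bytes_tempdir()
try:
    print(type(path).__name__, os.path.isdir(path))
finally:
    os.rmdir(path)

print("mixed types rejected:", mixing_is_rejected())
```

Unlike Python 2.7, which attempts an implicit ASCII conversion, Python 3 refuses to mix the two types up front, so the caller has to pick bytes or text for all arguments.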

@risto3
Mannequin

risto3 mannequin commented Jan 2, 2016

This turns out to be related to the locale environment set to 'C'.

A UTF-8 locale seems to get over the issue.

A fellow pkgsrc colleague filed an issue with lxml already relating to that fact for the test suite (https://bugs.launchpad.net/lxml/+bug/1522052)

cheers
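
The locale dependence can be illustrated without touching the filesystem. The sketch below assumes that under the 'C' locale Python 2.7 falls back to an ASCII filesystem encoding, which is consistent with the `'ascii' codec can't encode` errors earlier in this thread. `encodable` is a hypothetical helper.

```python
import sys


def encodable(name, encoding):
    """Return True if the text name can be encoded with the given encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for candidate in ("Directory", "Répertoire"):
    # ascii() keeps the output printable even when stdout itself is ASCII-only.
    print(ascii(candidate), encodable(candidate, "ascii"), encodable(candidate, "utf-8"))

print("filesystem encoding here:", sys.getfilesystemencoding())
```

A name that only encodes under UTF-8 will fail wherever the interpreter has settled on ASCII for paths, which is why switching to a UTF-8 locale makes the tests pass.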

@serhiy-storchaka
Member

Python 2.7 is no longer supported.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022