problems with LC_ALL=C #761

Closed
kmike opened this issue Jan 3, 2013 · 12 comments
Labels
C: encoding Related to text encoding and likely, UnicodeErrors type: bug A confirmed bug or unintended behavior

Comments

@kmike

kmike commented Jan 3, 2013

There is a pattern of using open(path, 'r').read() without an explicit encoding in pip.

This pattern causes issues under Python 3.x with an ASCII locale: the file contents are decoded as ASCII in this case, and decoding fails for non-ASCII data.
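A minimal sketch of the failure mode described above. The file name and contents are made up for illustration, and passing encoding="ascii" explicitly is an assumption that stands in for what Python 3 picks implicitly under LC_ALL=C on the versions discussed here; this is not pip's code.

```python
import os
import tempfile


def demo():
    # Write UTF-8 text containing a non-ASCII character ("é" is 0xc3 0xa9).
    path = os.path.join(tempfile.mkdtemp(), "news.txt")
    with open(path, "wb") as fp:
        fp.write("caf\u00e9\n".encode("utf-8"))

    # Stand-in for the ASCII locale: decoding fails on the first non-ASCII byte.
    try:
        with open(path, "r", encoding="ascii") as fp:
            fp.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit encoding makes the read independent of the locale.
    with open(path, "r", encoding="utf-8") as fp:
        text = fp.read()
    return ascii_failed, text
```

The point of the second read is only that the encoding argument, not the environment, decides how the bytes are decoded.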

The first occurrence (in setup.py) is clearly wrong IMHO: the utility function is used for reading pip's own index.txt and news.txt files, which are encoded as UTF-8. It may cause the following exception:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 871: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/var/folders/_5/cbsg50991szfp1r9nwxpx8580000gq/T/pip-61p_z7-build/setup.py", line 31, in <module>
        "\n\n" + read("docs", "news.txt"))
      File "/var/folders/_5/cbsg50991szfp1r9nwxpx8580000gq/T/pip-61p_z7-build/setup.py", line 9, in read
        return codecs.open(os.path.join(os.path.abspath(os.path.dirname(__file__)), *parts), 'r').read()
      File "/Users/kmike/svn/pip/.tox/py32-ascii/lib/python3.2/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 871: ordinal not in range(128)

if the following is added to pip's own tox.ini:

[testenv:py32-ascii]
basepython = python3.2
setenv = LC_ALL=C

The second is trickier, and I didn't debug it. It causes the following exception:

Unpacking /Users/kmike/svn/DAWG/.tox/dist/DAWG-0.5.3.zip
  Running setup.py egg_info for package from file:///Users/kmike/svn/DAWG/.tox/dist/DAWG-0.5.3.zip

Exception:
Traceback (most recent call last):
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/basecommand.py", line 107, in main
    status = self.run(options, args)
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/commands/install.py", line 256, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 1042, in prepare_files
    req_to_install.run_egg_info()
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 241, in run_egg_info
    "%(Name)s==%(Version)s" % self.pkg_info())
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 334, in pkg_info
    data = self.egg_info_data('PKG-INFO')
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/site-packages/pip-1.2.1-py3.2.egg/pip/req.py", line 274, in egg_info_data
    data = fp.read()
  File "/Users/kmike/svn/DAWG/.tox/py32-locale/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2130: ordinal not in range(128)

in the https://github.com/kmike/DAWG test suite (https://github.com/kmike/DAWG/blob/master/tox.ini).
The DAWG package has a non-ASCII README.rst, which is loaded into long_description (a byte string under Python 2.x, a unicode string under Python 3.x).

Under Python 2.x this works fine because req.py doesn't try to decode the data.
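A hedged sketch of reading PKG-INFO-style metadata with an explicit encoding instead of the locale default. The sample payload and the helper name parse_pkg_info are hypothetical, and treating the metadata as UTF-8 is an assumption; this is not the code in req.py.

```python
import email.parser

# Hypothetical PKG-INFO payload; "Ł" encodes to 0xc5 0x81 in UTF-8.
SAMPLE_PKG_INFO = (
    "Metadata-Version: 1.1\n"
    "Name: example\n"
    "Version: 0.1\n"
    "Author: \u0141ukasz\n"
).encode("utf-8")


def parse_pkg_info(raw_bytes):
    # Decode explicitly so the result does not depend on LC_ALL or LANG.
    text = raw_bytes.decode("utf-8")
    return email.parser.Parser().parsestr(text)


msg = parse_pkg_info(SAMPLE_PKG_INFO)
requirement = "%s==%s" % (msg["Name"], msg["Version"])
```

Reading the file in binary mode and decoding in one place keeps the encoding decision visible, rather than leaving it to whatever open() infers from the environment.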

@hltbra
Contributor

hltbra commented Mar 28, 2013

It seems the first point, the issue in setup.py, was solved by @qwcode. But the second point, about req.py and fp.read(), still seems to be buggy.

@kmike
Author

kmike commented Mar 28, 2013

For the record: I gave up on this :) I don't know how to make reliably installable packages with non-ascii metadata under Python 2.x.

As for the first point, I think the committed solution is fragile (because non-ascii chars could accidentally be introduced again) and it is better to explicitly decode news.txt from ascii to prevent such errors in the future.
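One way the "explicitly decode from ascii" idea could look, as a sketch only. The helper names read_ascii, is_ascii_clean and write_bytes are hypothetical, and the sample files are made up; the committed fix in pip is not shown in this thread.

```python
import io
import os
import tempfile


def read_ascii(path):
    # Explicit encoding: the result no longer depends on LC_ALL or LANG.
    with io.open(path, "r", encoding="ascii") as fp:
        return fp.read()


def is_ascii_clean(path):
    # True if the file decodes as ASCII, False if a stray byte slipped in.
    try:
        read_ascii(path)
        return True
    except UnicodeDecodeError:
        return False


def write_bytes(name, data):
    # Test fixture helper: create a throwaway file with the given bytes.
    path = os.path.join(tempfile.mkdtemp(), name)
    with open(path, "wb") as fp:
        fp.write(data)
    return path


clean = write_bytes("news.txt", b"plain ASCII changelog\n")
dirty = write_bytes("news.txt", "na\u00efve changelog\n".encode("utf-8"))
```

With this approach a non-ASCII character fails in every environment, including a developer's UTF-8 shell, instead of only under LC_ALL=C.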

@kmike
Author

kmike commented Mar 28, 2013

Also, I think pip should have a Travis/tox environment with LC_ALL=C to test against this.

@domenkozar
Contributor

Same here:

$ LANG="POSIX" pip install -e .
Obtaining file:///home/ielectric/dev/pyramid_jinja2
  Running setup.py egg_info for package from file:///home/ielectric/dev/pyramid_jinja2

Cleaning up...
Exception:
Traceback (most recent call last):
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/basecommand.py", line 134, in main
    status = self.run(options, args)
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/commands/install.py", line 236, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/req.py", line 1047, in prepare_files
    req_to_install.run_egg_info()
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/req.py", line 262, in run_egg_info
    "%(Name)s==%(Version)s" % self.pkg_info())
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/req.py", line 355, in pkg_info
    data = self.egg_info_data('PKG-INFO')
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/site-packages/pip/req.py", line 295, in egg_info_data
    data = fp.read()
  File "/home/ielectric/dev/pyramid_jinja2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1087: ordinal not in range(128)

@domenkozar
Contributor

cc @pauleveritt

@pauleveritt

Indeed, this was the source of my problem. In one shell I had this in my environment:

LANG=en_US.UTF-8

...and pip worked. In another shell I didn't, and got the UnicodeDecodeError exception.
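A small sketch for comparing two shells like the ones described above: it asks a child interpreter which encoding it would use for text files under a given set of locale variables. The function name preferred_encoding is hypothetical. Note that newer Python versions may report UTF-8 even under the C locale, so the output depends on the interpreter and on which locales are installed.

```python
import os
import subprocess
import sys


def preferred_encoding(env_overrides):
    env = dict(os.environ)
    # Drop inherited locale variables so the overrides alone decide.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        env.pop(name, None)
    env.update(env_overrides)
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
    )
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print(preferred_encoding({"LANG": "en_US.UTF-8"}))
    print(preferred_encoding({"LC_ALL": "C"}))
```

If the two printed values differ, an open() call without an explicit encoding will behave differently in the two environments.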

openstack-gerrit pushed a commit to openstack-archive/solum that referenced this issue Nov 20, 2013
With that, building and uploading wheels to PyPI is only one "python
setup.py bdist_wheel" away.

Also, changes LC_ALL from C to UTF-8 to get around a bug in pip:

pypa/pip#761

Change-Id: I508ab6436c29b23f6b8d56c0ec63eb26f3568819
@rbarrois

Hi there,

This issue still seems to be open; the way I see it, there are three options:

  • Consider that all setup.py files are UTF-8, expect users to use a UTF-8 locale, and don't change anything
  • Consider that all setup.py files are UTF-8, and add explicit codecs.open(__file__, 'r', 'utf-8') instead of open(__file__, 'r') in pip/req.py
  • Prepare for non-UTF-8, non-ascii setup.py, and emulate Python's handling of coding: utf-8 & co markers.

It seems that fixing the manually written micro-scripts around lines 600 and 285, as well as egg_info_data at line 296 of req.py, is enough to install packages with UTF-8 setup.py and metadata with LC_ALL=C.
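For the third option, the standard library already exposes Python's own handling of coding markers through tokenize.open(), which honours a "coding:" cookie or a BOM and otherwise defaults to UTF-8. The sketch below uses it on a made-up setup.py; the helper name read_source is hypothetical, and nothing here is taken from pip's req.py.

```python
import os
import tempfile
import tokenize


def read_source(path):
    # tokenize.open() detects the encoding the same way the interpreter
    # does for source files, independent of LC_ALL or LANG.
    with tokenize.open(path) as fp:
        return fp.read()


# Made-up setup.py in Latin-1 with a matching coding cookie.
path = os.path.join(tempfile.mkdtemp(), "setup.py")
with open(path, "wb") as fp:
    fp.write(
        "# -*- coding: latin-1 -*-\nauthor = 'Jos\u00e9'\n".encode("latin-1")
    )

source = read_source(path)
```

The second option corresponds to replacing read_source with a plain UTF-8 read, which is simpler but would misread a file like the one above.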

I'd like to contribute a patch to fix this issue; which option would you prefer?

@pradyunsg
Member

Is this still a thing?

@auvipy

auvipy commented May 7, 2019

@pradyunsg I haven't faced this recently, though I have seen it before. It might not exist anymore. But in the face of ambiguity, refuse the temptation to guess.

@gutsytechster
Contributor

Can anyone tell me how to reproduce this (if it's still a bug)? The initial comment seems to refer to an older piece of code, and I'm really not able to understand much of it.

Though if it's not present, we can then close it. :)

@uranusjr
Member

I believe this part of the code base has been removed. pip now uses distlib to read legacy metadata (egg-info), which always uses UTF-8.

@pradyunsg
Member

In that case, let's close this! :)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2021