UnicodeDecodeError when setup.cfg contains non-ASCII and LC_ALL=C #1062

Open
benoit-pierre opened this Issue Jun 13, 2017 · 7 comments

benoit-pierre commented Jun 13, 2017

> cat >setup.py <<\EOF
# -*- coding: utf-8 -*-
from setuptools import setup
setup(
    name='test',
    version='1.0',
    description='éàïñ',
)
EOF
> env LC_ALL=C python3 ./setup.py --description
éàïñ
> cat >setup.cfg <<\EOF
[metadata]
name = test
version = 1.0
description = éàïñ
EOF
> cat >setup.py <<\EOF
from setuptools import setup
setup()
EOF
> env LC_ALL=C python3 ./setup.py --description
Traceback (most recent call last):
  File "./setup.py", line 2, in <module>
    setup()
  File "/usr/lib/python3.6/distutils/core.py", line 121, in setup
    dist.parse_config_files()
  File "/usr/lib/python3.6/site-packages/setuptools/dist.py", line 355, in parse_config_files
    _Distribution.parse_config_files(self, filenames=filenames)
  File "/usr/lib/python3.6/distutils/dist.py", line 395, in parse_config_files
    parser.read(filename)
  File "/usr/lib/python3.6/configparser.py", line 697, in read
    self._read(fp, filename)
  File "/usr/lib/python3.6/configparser.py", line 1015, in _read
    for lineno, line in enumerate(fp, start=1):
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)

With PKG-INFO always written as UTF-8, I think it would make sense to load setup.cfg as UTF-8 too.
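
A minimal sketch of that idea, not the setuptools code: Python 3's `configparser` accepts an `encoding` argument to `read()`, so the file can be decoded as UTF-8 whatever the locale says. The helper names `read_description` and `write_sample_cfg` are made up for illustration, and passing `encoding="ascii"` stands in for the default that the C locale selects in the traceback above.

```python
import configparser
import os
import tempfile


def read_description(cfg_path, encoding):
    """Return the [metadata] description, decoding the file with `encoding`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path, encoding=encoding)
    return parser.get("metadata", "description")


def write_sample_cfg(directory):
    """Write a UTF-8 encoded setup.cfg similar to the one in the report."""
    cfg_path = os.path.join(directory, "setup.cfg")
    with open(cfg_path, "wb") as fp:
        fp.write("[metadata]\nname = test\ndescription = éàïñ\n".encode("utf-8"))
    return cfg_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_sample_cfg(tmp)
    # An explicit UTF-8 decode does not depend on LC_ALL / LANG.
    utf8_description = read_description(path, "utf-8")
    # Forcing ASCII fails on the first non-ASCII byte.
    try:
        read_description(path, "ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
```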

I see in the code that there's a setuptools/py36compat.py providing a patched parse_config_files but as seen in the traceback, it's not called. Is that normal?

According to the log it was added to fix #889, but adding a test for that fails:

diff --git i/setuptools/tests/test_config.py w/setuptools/tests/test_config.py
index 8bd2a494..2b118b70 100644
--- i/setuptools/tests/test_config.py
+++ w/setuptools/tests/test_config.py
@@ -288,6 +288,15 @@ class TestMetadata:
         with get_dist(tmpdir) as dist:
             assert set(dist.metadata.classifiers) == expected

+    def test_no_interpolation(self, tmpdir):
+        fake_env(
+            tmpdir,
+            '[metadata]\n'
+            'description = %(message)s\n'
+        )
+        with get_dist(tmpdir) as dist:
+            assert dist.metadata.description == '%(message)s'
+

 class TestOptions:

Anyway, here is my attempt at a fix: master...benoit-pierre:fix_889_and_non-ascii_in_setup.cfg

I can make proper PR(s) for those changes that are OK.
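
For context on what the test above checks: with a default `ConfigParser`, a `%(...)s` token in a value is treated as an interpolation reference, while `interpolation=None` returns it verbatim. The snippet is a standalone illustration using only the Python 3 standard library; it does not use the `fake_env` or `get_dist` test helpers, and `get_description` is a name invented here.

```python
import configparser

CFG_TEXT = "[metadata]\ndescription = %(message)s\n"


def get_description(parser):
    """Load the sample text and fetch the description option."""
    parser.read_string(CFG_TEXT)
    return parser.get("metadata", "description")


# Default BasicInterpolation tries to resolve %(message)s and fails
# because there is no "message" option to substitute.
try:
    get_description(configparser.ConfigParser())
    default_error = None
except configparser.InterpolationError as exc:
    default_error = exc

# With interpolation disabled the value comes back untouched.
literal = get_description(configparser.ConfigParser(interpolation=None))
```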

benoit-pierre added a commit to benoit-pierre/setuptools that referenced this issue Jun 15, 2017

benoit-pierre added a commit to benoit-pierre/setuptools that referenced this issue Jul 18, 2017

benoit-pierre added a commit to benoit-pierre/setuptools that referenced this issue Jul 26, 2017

jaraco commented Jul 27, 2017

I started diving into this. I first found I could ignore most of the commits and focus on 9f81546. But even that single commit attempts to do too much, fixing two issues at the same time.

So I started by focusing on just the first issue: the test for #889 does not pass. I incorporated the test as dd404fd2e (thanks for that). But when I went to find out why the test wasn't passing, all looked well except that, as you mentioned, the compatibility code isn't being called.

It seems that in 6913024 referencing #394, the compatibility code was overridden. And shame on me for not having a unit test to capture the regression.

benoit-pierre commented Jul 27, 2017

See da92ba4 for an alternative implementation that supports this. Unfortunately, it can't be done for Python 2, as returning Unicode strings will break various things, e.g. the sdist command when setup.cfg contains:

[sdist]
formats = zip

This is because the Python 2 distutils code checks for str instances.

benoit-pierre added a commit to benoit-pierre/setuptools that referenced this issue Jul 27, 2017

jaraco commented Jul 27, 2017

Okay, the fix for #889 is now in master. There were two flaws. One was that the compatibility behavior was not being invoked. The other was that the compatibility behavior was not working properly because interpolation=None had to be passed in two places.
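
The "two places" point can be reproduced with the standard library alone. In this sketch (Python 3; the section and option names are just sample data), re-running `__init__()` without arguments silently restores the default interpolation, which is why the reset inside the loop needs the argument as much as the constructor does.

```python
import configparser

CFG_TEXT = "[metadata]\ndescription = %(message)s\n"

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG_TEXT)
first_read = parser.get("metadata", "description")

# Resetting without the argument falls back to BasicInterpolation.
parser.__init__()
parser.read_string(CFG_TEXT)
try:
    parser.get("metadata", "description")
    reset_lost_setting = False
except configparser.InterpolationError:
    reset_lost_setting = True

# Passing the argument again keeps interpolation off after the reset.
parser.__init__(interpolation=None)
parser.read_string(CFG_TEXT)
second_read = parser.get("metadata", "description")
```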

jaraco commented Jul 27, 2017

I should think that Setuptools should be able to support non-ASCII characters in a setup.cfg file even if the target system is LANG=C. I believe LANG=C only affects default encoding behavior. Where Setuptools can specify encoding behavior, the tests should run despite LANG=C.

But one thing I'm noticing is that these patches are affecting the behavior in py36compat Distribution_parse_config_files, where the docstring specifically cautions against editing that code. That code was copied from cpython and should match what's being released. If we want to tweak that behavior, we should probably contribute that behavior upstream first and then provide forward compatibility to Setuptools users by copying that code to the compat module.
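
One way to check the claim that the locale only changes defaults: start a child interpreter with `LC_ALL=C` and have it read a UTF-8 file with an explicit `encoding`. This is a hedged sketch, not a Setuptools test; the helper name `description_under_c_locale` and the sample file are invented for the demo, and the child prints `ascii()` of the value so that its own stdout encoding does not matter.

```python
import os
import subprocess
import sys
import tempfile

CHILD_CODE = """\
import configparser, sys
parser = configparser.ConfigParser(interpolation=None)
parser.read(sys.argv[1], encoding="utf-8")  # explicit, not locale-derived
print(ascii(parser.get("metadata", "description")))
"""


def description_under_c_locale(cfg_path):
    """Read cfg_path in a child Python process started with LC_ALL=C."""
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, cfg_path],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "setup.cfg")
    with open(path, "wb") as fp:
        fp.write("[metadata]\ndescription = éàïñ\n".encode("utf-8"))
    # ASCII-escaped repr of the description as seen by the child.
    escaped = description_under_c_locale(path)
```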

benoit-pierre commented Jul 27, 2017

The interpolation test still fails on Python 2 because Distribution_parse_config_files is not used there. And if it is enabled, interpolation=None is not available on Python 2, so parser.get(..., raw=True) must be used instead.

diff --git i/setuptools/py36compat.py w/setuptools/py36compat.py
--- i/setuptools/py36compat.py
+++ w/setuptools/py36compat.py
@@ -2,6 +2,8 @@ import sys
 from distutils.errors import DistutilsOptionError
 from distutils.util import strtobool
 from distutils.debug import DEBUG
+from setuptools.extern import six
+from setuptools.extern.six.moves.configparser import ConfigParser
 
 
 class Distribution_parse_config_files:
@@ -13,10 +15,9 @@ class Distribution_parse_config_files:
     as implemented in distutils.
     """
     def parse_config_files(self, filenames=None):
-        from configparser import ConfigParser
 
         # Ignore install directory options if we have a venv
-        if sys.prefix != sys.base_prefix:
+        if six.PY3 and sys.prefix != sys.base_prefix:
             ignore_options = [
                 'install-base', 'install-platbase', 'install-lib',
                 'install-platlib', 'install-purelib', 'install-headers',
@@ -33,7 +34,7 @@ class Distribution_parse_config_files:
         if DEBUG:
             self.announce("Distribution.parse_config_files():")
 
-        parser = ConfigParser(interpolation=None)
+        parser = ConfigParser()
         for filename in filenames:
             if DEBUG:
                 self.announce("  reading %s" % filename)
@@ -44,13 +45,13 @@ class Distribution_parse_config_files:
 
                 for opt in options:
                     if opt != '__name__' and opt not in ignore_options:
-                        val = parser.get(section,opt)
+                        val = parser.get(section,opt,raw=True)
                         opt = opt.replace('-', '_')
                         opt_dict[opt] = (filename, val)
 
             # Make the ConfigParser forget everything (so we retain
             # the original filenames that options come from)
-            parser.__init__(interpolation=None)
+            parser.__init__()
 
         # If there was a "global" section in the config file, use it
         # to set Distribution options.
@@ -69,12 +70,6 @@ class Distribution_parse_config_files:
                     raise DistutilsOptionError(msg)
 
 
-if sys.version_info < (3,):
-    # Python 2 behavior is sufficient
-    class Distribution_parse_config_files:
-        pass
-
-
 if False:
     # When updated behavior is available upstream,
     # disable override here.
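
As a quick check of the `raw=True` route in Python 3 terms (the diff itself targets the Python 2 `ConfigParser` through `six`, which this sketch does not exercise): a parser left on default interpolation still returns the literal value when `get()` is called with `raw=True`.

```python
import configparser

parser = configparser.ConfigParser()  # default interpolation left on
parser.read_string("[metadata]\ndescription = %(message)s\n")

# raw=True skips interpolation for this single lookup.
raw_value = parser.get("metadata", "description", raw=True)
```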

jaraco commented Jul 27, 2017

@benoit-pierre I've published a call for help in distutils-sig. Given your passion and involvement, I want to acknowledge that you may be interested in assisting in this effort. If so, please comment in one of the tickets (#889 or the sister distutils ticket).

benoit-pierre added a commit to benoit-pierre/setuptools that referenced this issue Oct 25, 2017

improve encoding handling for `setup.cfg`
Support the same mechanism as for Python sources for declaring
the encoding to be used when reading `setup.cfg` (see PEP 263),
and return the results of reading it as Unicode.

Fix pypa#1062 and pypa#1136.
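
To illustrate the PEP 263 idea only (this is not the code from the referenced commit): the standard library's `tokenize.detect_encoding` already recognises a `coding:` cookie in the first two lines of a byte stream and falls back to UTF-8 when none is found. The helper `read_cfg_with_declared_encoding` is a hypothetical name.

```python
import configparser
import io
import tokenize


def read_cfg_with_declared_encoding(raw_bytes):
    """Parse config bytes, honouring a PEP 263 style coding cookie."""
    # detect_encoding inspects at most the first two lines and
    # returns "utf-8" when no cookie or BOM is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw_bytes).readline)
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(raw_bytes.decode(encoding))
    return encoding, parser


# The cookie line is an ordinary "#" comment as far as configparser goes.
latin1_cfg = (
    "# -*- coding: latin-1 -*-\n"
    "[metadata]\n"
    "description = éàïñ\n"
).encode("latin-1")

encoding, parser = read_cfg_with_declared_encoding(latin1_cfg)
description = parser.get("metadata", "description")
```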

@benoit-pierre benoit-pierre referenced a pull request that will close this issue Oct 25, 2017

Open

improve encoding handling for `setup.cfg` #1180

uchuugaka commented Feb 11, 2018

Hi there, does this issue still persist today on 2.7.x?

openstack-gerrit pushed a commit to openstack/requirements that referenced this issue Jun 22, 2018

Rollback dulwich to 0.19.2
We need to have 0.19.2 until 0.19.4 is
released, the error problem is explained in [1].
Avoid unicode characters (e.g. the digraph ij in my surname) in setup.cfg,
since setuptools doesn't deal well with them. See
pypa/setuptools#1062. (Jelmer Vernooij, #637)

dulwich 0.18.3 breaks in instack-undercloud with:

Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-ph6c_nsn/dulwich/setup.py", line 96, in <module>
        **setup_kwargs
      File "/repos/instack-undercloud/.tox/py36/lib/python3.6/site-packages/setuptools/__init__.py", line 128, in setup
        _install_setup_requires(attrs)
      File "/repos/instack-undercloud/.tox/py36/lib/python3.6/site-packages/setuptools/__init__.py", line 121, in _install_setup_requires
        dist.parse_config_files(ignore_option_errors=True)
      File "/repos/instack-undercloud/.tox/py36/lib/python3.6/site-packages/setuptools/dist.py", line 492, in parse_config_files
        _Distribution.parse_config_files(self, filenames=filenames)
      File "/usr/lib64/python3.6/distutils/dist.py", line 395, in parse_config_files
        parser.read(filename)
      File "/usr/lib64/python3.6/configparser.py", line 697, in read
        self._read(fp, filename)
      File "/usr/lib64/python3.6/configparser.py", line 1015, in _read
        for lineno, line in enumerate(fp, start=1):
      File "/repos/instack-undercloud/.tox/py36/lib64/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 48: ordinal not in range(128)

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-ph6c_nsn/dulwich/

[1]: dulwich/dulwich@1b6509f

Closes-Bug: 1778004
Change-Id: Icabb56b9c6b2d99d78a7e1a77f8fe7a4ea6fa0c8

m-flak pushed a commit to m-flak/gajim that referenced this issue Nov 3, 2018

renefritze added a commit to dune-community/dune-xt-common that referenced this issue Nov 8, 2018

renefritze added a commit to dune-community/dune-xt-common that referenced this issue Nov 20, 2018

mlaferrera added a commit to PUNCH-Cyber/stoq that referenced this issue Jan 8, 2019

Remove UTF8 characters from README
UTF8 characters in README cause issues in some instances
(pypa/setuptools#1062)