setup.cfg should standardize on UTF-8 for encoding #1702

Closed
jaraco opened this issue Feb 23, 2019 · 10 comments

@jaraco
Member

jaraco commented Feb 23, 2019

In jaraco/configparser#34, I learned that although setuptools v40.7.0 presumably added support for non-ASCII characters in setup.cfg, loading such files still fails in some environments.

configparser # easy_install --version
setuptools 40.8.0 from c:\python37\lib\site-packages (Python 3.7)
configparser 3.7.2 # python setup.py egg_info
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    package_dir={'': 'src'},  
  File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 144, in setup
    _install_setup_requires(attrs)  
  File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 137, in _install_setup_requires
    dist.parse_config_files(ignore_option_errors=True)
  File "C:\Python37\lib\site-packages\setuptools\dist.py", line 702, in parse_config_files
    self._parse_config_files(filenames=filenames)
  File "C:\Python37\lib\site-packages\setuptools\dist.py", line 599, in _parse_config_files
    (parser.read_file if six.PY3 else parser.readfp)(reader)
  File "C:\Python37\lib\configparser.py", line 717, in read_file
    self._read(f, source)
  File "C:\Python37\lib\configparser.py", line 1014, in _read
    for lineno, line in enumerate(fp, start=1):
  File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 103: character maps to <undefined>
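
The failure can be reproduced without setuptools. The following is a minimal sketch for Python 3, assuming the offending character is something like "Ł" (the traceback does not show the actual character, so the sample line is hypothetical). Its UTF-8 encoding contains the byte 0x81, which the cp1252 codec leaves undefined.

```python
# Hypothetical sample line: "Ł" (U+0141) encodes to b'\xc5\x81' in UTF-8.
raw = "author = Łukasz\n".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: {}".format(exc)


as_utf8 = try_decode(raw, "utf-8")      # decodes cleanly
as_cp1252 = try_decode(raw, "cp1252")   # fails on byte 0x81
print(as_utf8)
print(as_cp1252)
```

The same bytes decode fine as UTF-8 and fail only when the reader falls back to the Windows locale encoding.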
@jaraco
Member Author

jaraco commented Feb 23, 2019

Hmm. On further investigation, I discovered that encoding detection is employed, meaning that adding # coding: utf-8 to the file corrected the issue.
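
For readers unfamiliar with this kind of detection, here is a rough sketch of how a coding declaration might be picked up from the top of a file. The helper name `detect_declared_encoding` and the regular expression are illustrative assumptions modelled on PEP 263 source declarations, not necessarily what setuptools does internally.

```python
import io
import re

# Pattern modelled on PEP 263 declarations. Whether setuptools uses exactly
# this pattern is an assumption made for illustration.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_declared_encoding(stream, default=None):
    """Return the encoding named in the first two lines of a binary stream."""
    for _ in range(2):
        match = CODING_RE.match(stream.readline())
        if match:
            return match.group(1).decode("ascii")
    return default


sample = io.BytesIO(b"# coding: utf-8\n[metadata]\nname = demo\n")
print(detect_declared_encoding(sample))
```

A file without a declaration yields `default`, which is where the locale-dependent behaviour comes back in.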

@jaraco
Member Author

jaraco commented Feb 23, 2019

In this comment, the user suggests the default encoding should be UTF-8. I agree, and in fact I'd like to work toward making UTF-8 the only supported encoding (dropping support for encoding detection).

@jaraco changed the title from "UnicodeDecodeError on Windows with non-ascii in setup.cfg" to "setup.cfg should standardize on UTF-8 for encoding" on Feb 23, 2019
@jaraco
Member Author

jaraco commented Feb 23, 2019

I see that the encoding detection was added in #1180. @benoit-pierre, do you recall why you chose to have the encoding declared rather than demanding and relying on UTF-8?

@benoit-pierre
Member

Backward compatibility: to not break some existing workflows (using an encoding other than UTF-8 with a corresponding locale).

@jayvdb
Contributor

jayvdb commented Feb 25, 2019

My suggestion was to at least reduce the prevalence of this problem by detecting intentionally non-user locales like "POSIX" or "C" and upgrading to UTF-8 for those.

Requiring people to add a coding declaration to non-coding files in every PyPI package containing a non-ascii property, such as author name, is horribly unnecessary churn.

If a user has a specific locale set, that problem can be solved another day. But again, if setup.cfg can't be decoded with the user's locale, the fallback should be to attempt to decode it with the best guess of the author's locale: UTF-8. This applies especially to any package downloaded from PyPI, which strongly implies that the user's locale is irrelevant, because the file was written not by the user but by an author on the other side of the world.
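
To make the proposal concrete, here is a rough sketch of both ideas for Python 3: upgrading ASCII-only locale encodings to UTF-8, and falling back to UTF-8 when the locale encoding cannot decode the file. The function names are hypothetical, and in real use `locale_encoding` would presumably come from something like `locale.getpreferredencoding(False)`.

```python
import codecs


def effective_encoding(locale_encoding):
    """Upgrade ASCII-only locale encodings (as in "C"/"POSIX") to UTF-8."""
    if codecs.lookup(locale_encoding).name == "ascii":
        return "utf-8"
    return locale_encoding


def decode_config(data, locale_encoding):
    """Decode with the user's encoding first, then fall back to UTF-8."""
    try:
        return data.decode(effective_encoding(locale_encoding))
    except UnicodeDecodeError:
        return data.decode("utf-8")


raw = "author = Łukasz\n".encode("utf-8")  # hypothetical setup.cfg content
print(decode_config(raw, "ascii"))    # upgraded to UTF-8 up front
print(decode_config(raw, "cp1252"))   # cp1252 fails, UTF-8 fallback succeeds
```

One limitation of the fallback: an encoding such as latin-1 accepts every byte sequence, so it never raises and would silently produce mojibake instead of triggering the UTF-8 retry.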

@jayvdb
Contributor

jayvdb commented Feb 26, 2019

Also related: I believe pytest used to read it as ASCII but now defaults to UTF-8. This causes breakages in pytest 3.3.2, at least, but not in current pytest.

@jaraco
Member Author

jaraco commented Mar 9, 2019

I am strongly inclined to assume UTF-8 (which also supports ASCII). I'm also inclined to remove support for the coding declaration unless there's a strong use case for it.
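
As an illustration of why assuming UTF-8 should not break ASCII-only files, here is a small Python 3 sketch. The helper `parse_utf8_config` is hypothetical and only shows the idea of decoding unconditionally as UTF-8 before handing the text to configparser.

```python
import configparser


def parse_utf8_config(data):
    """Parse raw setup.cfg bytes, always decoding them as UTF-8."""
    parser = configparser.ConfigParser()
    parser.read_string(data.decode("utf-8"))
    return parser


# ASCII is a subset of UTF-8, so a pure-ASCII file parses unchanged.
ascii_cfg = parse_utf8_config(b"[metadata]\nname = demo\n")
# A non-ASCII value (hypothetical author name) works with the same code path.
unicode_cfg = parse_utf8_config("[metadata]\nauthor = Łukasz\n".encode("utf-8"))

print(ascii_cfg["metadata"]["name"])
print(unicode_cfg["metadata"]["author"])
```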

@jaraco
Member Author

jaraco commented Mar 9, 2019

@jayvdb Would you be willing to put together a PR?

@stanislavlevin

stanislavlevin commented Mar 28, 2019

Hi @jaraco, I have a related issue.
My setup.cfg contains Unicode symbols and sets an explicit UTF-8 encoding:

# -*- coding: utf-8 -*-
[metadata]
...

When I run tox to test itself under Python 2:

Processing ./.tox/.tmp/package/2/tox-3.8.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/src/tmp/pip-req-build-jUmuTe/setup.py", line 18, in <module>
        package_dir={"": "src"},
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 144, in setup
        _install_setup_requires(attrs)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in _install_setup_requires
        dist.parse_config_files(ignore_option_errors=True)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 702, in parse_config_files
        self._parse_config_files(filenames=filenames)
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 599, in _parse_config_files
        (parser.read_file if six.PY3 else parser.readfp)(reader)
      File "/usr/lib64/python2.7/ConfigParser.py", line 324, in readfp
        self._read(fp, filename)
      File "/usr/lib64/python2.7/ConfigParser.py", line 479, in _read
        line = fp.readline()
      File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib64/python2.7/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/src/tmp/pip-req-build-jUmuTe/

This is because "edit_config" doesn't pass down the original encoding:

(Pdb) bt
  /usr/src/RPM/BUILD/python-module-tox-3.8.0/setup.py(18)<module>()
-> package_dir={"": "src"},
  /usr/lib/python2.7/site-packages/setuptools/__init__.py(145)setup()
-> return distutils.core.setup(**attrs)
  /usr/lib64/python2.7/distutils/core.py(151)setup()
-> dist.run_commands()
  /usr/lib64/python2.7/distutils/dist.py(953)run_commands()
-> self.run_command(cmd)
  /usr/lib64/python2.7/distutils/dist.py(972)run_command()
-> cmd_obj.run()
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(54)run()
-> self.make_distribution()
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(78)make_distribution()
-> orig.sdist.make_distribution(self)
  /usr/lib64/python2.7/distutils/command/sdist.py(456)make_distribution()
-> self.make_release_tree(base_dir, self.filelist.files)
  /usr/lib/python2.7/site-packages/setuptools/command/sdist.py(168)make_release_tree()
-> self.get_finalized_command('egg_info').save_version_info(dest)
  /usr/lib/python2.7/site-packages/setuptools/command/egg_info.py(191)save_version_info()
-> edit_config(filename, dict(egg_info=egg_info))
> /usr/lib/python2.7/site-packages/setuptools/command/setopt.py(74)edit_config()
-> opts.write(f)
(Pdb)

The output is something like:

[metadata]
name = tox

locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

@jaraco
Member Author

jaraco commented Apr 5, 2019

Given that setopt and its edit_config function need to write to the config file, I'm even more strongly inclined now to remove support for specifying an encoding in setup.cfg files and instead insist on UTF-8, especially since commands like bdist_rpm invoke egg_info which in turn rewrites the config file.
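
To illustrate what insisting on UTF-8 would mean for the write path, here is a rough Python 3 sketch of a read-modify-write cycle that uses UTF-8 in both directions. `edit_config_utf8` is a hypothetical stand-in, not the actual `edit_config` from setopt.

```python
import configparser
import io


def edit_config_utf8(path, section, option, value):
    """Set one option in a config file, reading and writing it as UTF-8."""
    parser = configparser.RawConfigParser()
    with io.open(path, encoding="utf-8") as f:
        parser.read_file(f)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, option, value)
    # Writing with an explicit encoding keeps the result independent of
    # the locale of whoever runs the command.
    with io.open(path, "w", encoding="utf-8") as f:
        parser.write(f)
```

Note that configparser does not preserve comments when it writes a file back, so a coding declaration in the original would be lost in the rewrite anyway. That is one more argument for a fixed encoding over a declared one.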
