Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP doesn't read setup.cfg in UTF-8, which causes UnicodeDecodeError #8931

Closed
fireattack opened this issue Sep 28, 2020 · 23 comments · Fixed by #9684
Closed

PIP doesn't read setup.cfg in UTF-8, which causes UnicodeDecodeError #8931

fireattack opened this issue Sep 28, 2020 · 23 comments · Fixed by #9684
Labels
type: enhancement Improvements to functionality

Comments

@fireattack
Copy link

fireattack commented Sep 28, 2020

Environment

  • pip version: 20.2.3
  • Python version: 3.8.5
  • OS: Win7x64

Description

When installing anything within a path that has a setup.cfg, pip will read it but fails at decoding if the file contains non-ASCII characters. The cfg file is proper UTF-8 but PIP doesn't open it as so. It tries to use "GBK" (my locale) and fails.

Related, but not the same issue: #8717 (this is caused by pyvenv.cfg)

Expected behavior

It should read setup.cfg in UTF-8.

How to Reproduce

  1. open a path.
  2. Create a setup.cfg file with
[metadata]
name = test
version = 0.0.1
packages =
    test
description = '計算ツール'
  1. Run pip install requests

Output

D:\test>pip install requests
ERROR: Exception:
Traceback (most recent call last):
  File "c:\program files\python3\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "c:\program files\python3\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 245, in run
    options.use_user_site = decide_user_install(
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 664, in decide_user_install
    if site_packages_writable(root=root_path, isolated=isolated_mode):
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 609, in site_packages_writab
le
    get_lib_location_guesses(root=root, isolated=isolated))
  File "c:\program files\python3\lib\site-packages\pip\_internal\commands\install.py", line 600, in get_lib_location_gue
sses
    scheme = distutils_scheme('', user=user, home=home, root=root,
  File "c:\program files\python3\lib\site-packages\pip\_internal\locations.py", line 109, in distutils_scheme
    d.parse_config_files()
  File "c:\program files\python3\lib\distutils\dist.py", line 406, in parse_config_files
    parser.read(filename)
  File "c:\program files\python3\lib\configparser.py", line 697, in read
    self._read(fp, filename)
  File "c:\program files\python3\lib\configparser.py", line 1017, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 88: illegal multibyte sequence
@fireattack fireattack changed the title PIP doesn't read setup.cfg using UTF-8 PIP doesn't read setup.cfg in UTF-8, which causes UnicodeDecodeError Sep 28, 2020
@uranusjr
Copy link
Member

distutils is a built-in module, you’ll need to file the issue against CPython instead. https://bugs.python.org/

@uranusjr uranusjr added the resolution: wrong project Should be reported elsewhere label Sep 28, 2020
@fireattack
Copy link
Author

Thanks for the quick reply.

Interestingly, setuptools itself doesn't fail with such cfg file. Are they doing something different?
I was also wondering why we read setup.cfg to begin with for a global installation.

@fireattack
Copy link
Author

fireattack commented Sep 28, 2020

OK, it looks to me setuptools work arounds this issue on their own (pypa/setuptools#1180), is there any chance we do the same? I feel like distutils may have to be that way for backward compatibility reasons.

I knew this probably is too much to ask, so feel free to do what you think is the best.

@uranusjr
Copy link
Member

Man, they basically re-implemented the whole thing. I’m not sure pip should be responsible to maintaining this implementation.

Maybe we should just catch the parsing error and carry on pretending the file does not exist instead. pip does not actually need any of the data you mentioned above; it just reads distutils configuration. The usage is honestly quite niche, and I doubt many would even notice the difference.

@uranusjr uranusjr added type: enhancement Improvements to functionality and removed resolution: wrong project Should be reported elsewhere labels Sep 28, 2020
@fireattack
Copy link
Author

fireattack commented Sep 28, 2020

Oh yeah, totally agreed with you. If we don't need anything ciritial from cfg file (or, at least, in case it isn't, like global installation shown here), it's reasonable to just ignore the error.

@jaraco
Copy link
Member

jaraco commented Oct 9, 2020

Today this issue bit me too in jaraco/configparser#58.

@uranusjr
Copy link
Member

uranusjr commented Oct 9, 2020

Thinking about this more, this needs to be reimplemented eventually when pip migrates off distutils anyway, so we may as well do it right now. @jaraco Would it be a good idea if we extract setuptools’s config-parsing implementation into a standalone library that pip can vendor?

@jaraco
Copy link
Member

jaraco commented Oct 9, 2020

In the case of distutils_scheme, there's currently an effort underway to deprecate distutils in which the distutils-schemes are being ported to the sysconfig module.

It's hard for me to say, but I'm a little reluctant to say that we should break out just the "parse config files" behavior into a separate library. On the other hand, that would eliminate one aspect of the dependency on distutils and limit the divergence from setuptools, so perhaps that makes sense.

@pradyunsg
Copy link
Member

Is this still a thing in pip 20.3?

@pradyunsg pradyunsg added the S: awaiting response Waiting for a response/more information label Dec 1, 2020
@no-response no-response bot removed the S: awaiting response Waiting for a response/more information label Dec 1, 2020
@fireattack
Copy link
Author

fireattack commented Dec 1, 2020

Just tested, still broken here.

(v) D:\temp>python -m pip install pip -U
Collecting pip
  Using cached pip-20.3-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.1.1
    Uninstalling pip-20.1.1:
      Successfully uninstalled pip-20.1.1
Successfully installed pip-20.3

(v) D:\temp>pip install requests
Collecting requests
  Using cached requests-2.25.0-py2.py3-none-any.whl (61 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2020.11.8-py2.py3-none-any.whl (155 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.2-py2.py3-none-any.whl (136 kB)
Installing collected packages: urllib3, idna, chardet, certifi, requests
ERROR: Exception:
Traceback (most recent call last):
  File "d:\temp\v\lib\site-packages\pip\_internal\cli\base_command.py", line 210, in _main
    status = self.run(options, args)
  File "d:\temp\v\lib\site-packages\pip\_internal\cli\req_command.py", line 180, in wrapper
    return func(self, options, args)
  File "d:\temp\v\lib\site-packages\pip\_internal\commands\install.py", line 392, in run
    installed = install_given_reqs(
  File "d:\temp\v\lib\site-packages\pip\_internal\req\__init__.py", line 82, in install_given_reqs
    requirement.install(
  File "d:\temp\v\lib\site-packages\pip\_internal\req\req_install.py", line 781, in install
    scheme = get_scheme(
  File "d:\temp\v\lib\site-packages\pip\_internal\locations.py", line 184, in get_scheme
    scheme = distutils_scheme(
  File "d:\temp\v\lib\site-packages\pip\_internal\locations.py", line 108, in distutils_scheme
    d.parse_config_files()
  File "C:\Program Files\Python3\lib\distutils\dist.py", line 406, in parse_config_files
    parser.read(filename)
  File "C:\Program Files\Python3\lib\configparser.py", line 697, in read
    self._read(fp, filename)
  File "C:\Program Files\Python3\lib\configparser.py", line 1017, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 93: illegal multibyte sequence

(v) D:\temp>

(Having the same setup.cfg above in cwd)

@henryiii
Copy link
Contributor

henryiii commented Feb 26, 2021

Since distutils is deprecated and slated for removal in 3.12, maybe this is a good idea now?

@uranusjr
Copy link
Member

Yeah. The code path will eventually be removed altogether and replaced by an equivalent based on sysconfig. The migration has started in #9626, but the road ahead is still long.

@uranusjr
Copy link
Member

Also the new implementation does not currently contain the cfg parsing logic at all, and if no-one complains maybe we can just kill that feature entirely.

@henryiii
Copy link
Contributor

What was it used for?

@uranusjr
Copy link
Member

It can be used to control certain distutils (and by extension setuptools) behaviours, like what directory to store intermediate objects during build, compiler options, etc. Very low-level, implementation-dependant stuffs, and some even in direct conflict with other pip options, so hopefully nobody is using them, but I’ve seen pip users use worse “features”.

@jaraco
Copy link
Member

jaraco commented Feb 28, 2021

Why not a quick and dirty fix to suppress the error when it occurs and while at it, deprecate the code path, so if it is parsed and it contains settings that affect pip, raise warnings that it's going away, thereby detecting users who might be relying on it?

@jaraco
Copy link
Member

jaraco commented Feb 28, 2021

I'm also okay with just removing the behavior and seeing who cries as long as there's a rapid response to bring the behavior back or an escape hatch to re-enable the behavior.

@uranusjr
Copy link
Member

uranusjr commented Mar 1, 2021

The logic is buried deep in distutils and triggered when pip calls distutils to know where to install stuff into, distutils does not offer a way to disable the behaviour. So pip can’t just suppress the error since it needs those values to function, and the only way out of this (from what I can tell) is to remove reliance on those distutils values, which is what #9626 is going for.

@layday
Copy link
Member

layday commented Mar 1, 2021

I was under the impression that it was this line that was raising:

d.parse_config_files()

Would it not be possible to wrap it in a try-except?

@uranusjr
Copy link
Member

uranusjr commented Mar 1, 2021

We can, but the location behaviour would change in subtle ways when that fails 😟

But now that I think of it, maybe that’d still be a good enough stopgap before we purge distutils? Nothing would change if the parsing succeeds (which is most of the time), and the behaviour will “mostly work” when it fails, which might be good enough.

@layday
Copy link
Member

layday commented Mar 1, 2021

What does "mostly work" mean? If I have, say:

[install]
install_scripts = foo

in my setup.cfg and the parsing fails then the scripts would not be where I expect. (Although this is already rather fragile because pip does not respect distutils configuration when uninstalling packages.)

@uranusjr
Copy link
Member

uranusjr commented Mar 1, 2021

That would not fail though. The parsing would only fail if the file contains content not decodable with the platform encoding.

distutils also reads multiple config locations, including system-wide distutils.cfg, and for values like home, prefix, etc. which are honoured on uninstallation IIRC. Those are likely the more “popular” existing usages (although likely still not very wide-spread at this point, at least I hope not).

@layday
Copy link
Member

layday commented Mar 1, 2021

That would not fail though. The parsing would only fail if the file contains content not decodable with the platform encoding.

Yeah, I understand that.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants