Source and editable installs fail in some locales #1747

Closed
EliahKagan opened this issue Nov 28, 2023 · 0 comments · Fixed by #1748

EliahKagan commented Nov 28, 2023

Methods of installing GitPython that run code from setup.py fail in some locales. This does not affect installing from a wheel, but it does affect installing from an sdist, or installing from a local directory, including the editable install procedure recommended for development in the readme. The same problem happens when building GitPython. Building or installing using the old method of running setup.py directly is also affected. The error is of this form, though the codec will not always be gbk:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 473: illegal multibyte sequence

I am unsure whether this ever happens in practice on Unix-like systems, whose locales are usually UTF-8. However, it happens on Windows systems where README.md cannot be decoded using the system's active ANSI code page. This is a rarely changed systemwide setting, so changing the user's preferred languages, display language, or input method is not a workaround. I discovered this on a Simplified Chinese (zh-CN) build of Windows Server 2022 while using it to test some WSL-related test helper logic in #1745. Such a system uses ANSI code page 936. README.md is UTF-8, but it currently happens to be decodable with code page 1252, which Windows builds for Western European languages use as their ANSI code page. I expect code pages other than 936 to fail as well.
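To illustrate the mismatch, here is a minimal sketch that decodes the same UTF-8 bytes with three codecs. The sample text and the ellipsis character are assumptions chosen for illustration; the issue does not state which character in README.md triggers the error. The helper name try_decode is also hypothetical.

```python
# UTF-8 bytes for a line ending in a horizontal ellipsis (U+2026).
# The ellipsis is an assumed example; the issue does not say which
# character in README.md triggers the error.
data = "Read more…\n".encode("utf-8")  # the ellipsis becomes b"\xe2\x80\xa6"


def try_decode(raw, codec):
    """Return (True, text) on success, or (False, error message) on failure."""
    try:
        return True, raw.decode(codec)
    except UnicodeDecodeError as error:
        return False, str(error)


for codec in ("utf-8", "cp1252", "gbk"):
    ok, result = try_decode(data, codec)
    # ascii() keeps the output printable on any console encoding.
    print(f"{codec:>7}: {'ok' if ok else 'FAILED'} {ascii(result)}")
```

With cp1252 the decode succeeds but yields mojibake rather than the original text, which would explain why the problem goes unnoticed on Western European builds of Windows.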

With the PyPI sdist for GitPython 3.1.40 (on Python 3.12 x86-64, though I expect all supported versions to be affected):

(.venv) C:\Users\Administrator\gptest> pip install --no-binary GitPython GitPython
Collecting GitPython
  Using cached GitPython-3.1.40.tar.gz (200 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      Traceback (most recent call last):
        File "C:\Users\Administrator\gptest\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "C:\Users\Administrator\gptest\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\Administrator\gptest\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-8j1uxmuv\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-8j1uxmuv\overlay\Lib\site-packages\setuptools\build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-8j1uxmuv\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 20, in <module>
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 473: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

A workaround is to pass -X utf8 to python. For example, python -X utf8 -m pip install ... can be used for installation.
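As a rough check of whether the workaround is in effect, the following sketch prints the encoding that open() falls back to when no encoding is given, and whether Python's UTF-8 mode is active. It uses only the standard library; the variable names are illustrative. Setting the PYTHONUTF8=1 environment variable is another way to enable the same mode.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)

# 1 when UTF-8 mode is active (python -X utf8 or PYTHONUTF8=1), else 0.
utf8_mode = sys.flags.utf8_mode

print(f"default text encoding: {default_encoding}")
print(f"UTF-8 mode enabled:    {bool(utf8_mode)}")
```

Run it once as python script.py and once as python -X utf8 script.py to compare the two results on an affected system.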

The fix should be straightforward. Importing setup.py confirms that the specific cause is reading README.md:

(.venv) C:\Users\Administrator\repos\GitPython [main ≡]> python -c 'import setup'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Administrator\repos\GitPython\setup.py", line 20, in <module>
    long_description = rm_file.read()
                       ^^^^^^^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 477: illegal multibyte sequence

It can be fixed by passing encoding="utf-8". I have proposed this change in #1748.
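A minimal sketch of the pattern, assuming a stand-in file rather than the real README.md: the temporary directory, file contents, and ellipsis character are hypothetical, while rm_file and long_description are the names that appear in the traceback above.

```python
import tempfile
from pathlib import Path

# Stand-in for README.md: a UTF-8 file containing a non-ASCII character.
# The file contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_bytes("GitPython… a library\n".encode("utf-8"))

    # Explicit encoding: the result does not depend on the system locale.
    with open(readme, encoding="utf-8") as rm_file:
        long_description = rm_file.read()

# ascii() keeps the output printable on any console encoding.
print(ascii(long_description))
```

Without the encoding argument, the same open call would use the locale's encoding and could raise UnicodeDecodeError on a system such as the one described here.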

gitdb and smmap are unaffected: gitdb's setup.py does not open any files to read such data, while smmap's setup.py does open README.md but passes encoding="utf-8".

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Nov 28, 2023
This passes `encoding="utf-8"` to the `open` calls in setup.py, so
the readme, version, and requirements files are always read as
UTF-8, even on systems whose locale is not UTF-8, as is typically
the case on many Windows systems for non-European languages.

The specific problem was caused by the README.md file. The
requirements files are less likely to contain characters not in the
ASCII subset, though they could come to contain them, at least in
comments. The VERSION file is even less likely to ever contain
such characters. Nonetheless, for consistency, because it is a best
practice, and because it appears to be the intent of the existing
code, encoding="utf-8" is added for opening all of them.

This change is tested on a system whose locale uses Windows code
page 936. Editable installation, as well as the other affected ways
of installing (and building) described in gitpython-developers#1747, are now working.
(Installing from a pre-built wheel was never affected.)
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Nov 29, 2023
This passes `encoding="utf-8"` to the `open` calls in setup.py, so
the readme, version, and requirements files are always read as
UTF-8, even on systems whose locale is not UTF-8. This fixes the
bug described in gitpython-developers#1747 where installation other than from a
pre-built wheel would fail on many Windows systems using
non-European languages.

The specific problem occurred with the README.md file. The
requirements files are less likely to contain characters not in the
ASCII subset, though maybe they could come to contain them, perhaps
in comments. The VERSION file is even less likely to ever contain
such characters. Nonetheless, for consistency, because it is a best
practice, and because it appears to be the intent of the existing
code, encoding="utf-8" is added for opening all of them.

This change is tested on a system whose locale uses Windows code
page 936. Editable installation, as well as the other affected ways
of installing (and building) described in gitpython-developers#1747, are now working.