UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 548: illegal multibyte sequence #12771
Comments
Are you able to share the contents of the
@matthewhughes934
@matthewhughes934
It’s probably best to always use A PR would be much welcomed.
I guess the underlying issue was: the file looks to be UTF-8 encoded, but you're working in an environment that uses a Simplified Chinese locale, so GBK is used for decoding. I guess an alternative solution would be to run Python in UTF-8 mode (https://docs.python.org/3/using/windows.html#utf-8-mode).
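To make the mismatch concrete, here is a minimal sketch of the same class of failure. The sample line and the helper `try_decode` are made up for illustration; they are not taken from the reporter's file or from pip. The bytes are written as UTF-8 and then decoded as GBK, which is what a GBK locale would do by default.

```python
from typing import Optional

# Hypothetical sample: a requirements-style line with a non-ASCII comment.
text = "requests==2.31.0  # price in €\n"
raw = text.encode("utf-8")  # the bytes as they would be stored in a UTF-8 file


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid for `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc}")
        return None


gbk_result = try_decode(raw, "gbk")      # fails: the UTF-8 bytes are not valid GBK
utf8_result = try_decode(raw, "utf-8")   # round-trips to the original text
```

UTF-8 mode itself is enabled with `python -X utf8` or by setting the `PYTHONUTF8=1` environment variable, which makes UTF-8 the default text encoding regardless of the locale.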
👍 happy to get a PR up. I'm wondering two things:
Unfortunately, requirements aren't the only things in a requirement file. However, the documentation states that requirement files should be UTF-8 by default, so this seems like a simple bug in Of course, even though this is technically a bug fix, it is still potentially a breaking change, so we need to consider how we handle that. (We could fall back to the system encoding if UTF-8 fails, with a deprecation warning - this won't avoid mojibake, but it will catch outright encoding failures.)
Ah, right, I forgot about paths. Falling back with a deprecation warning sounds like the way to go.
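The "UTF-8 first, system encoding as a deprecated fallback" idea discussed above could look roughly like the sketch below. This is only an illustration of the proposal, not pip's code: the function name `decode_utf8_first`, its `fallback_encoding` parameter, and the warning text are all assumptions.

```python
import locale
import warnings
from typing import Optional


def decode_utf8_first(data: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode as UTF-8; on failure, warn and retry with the system encoding.

    Hypothetical helper: name, signature and message are illustrative only.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Use the caller's choice if given, otherwise ask the locale.
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        warnings.warn(
            f"requirements file is not valid UTF-8; falling back to {encoding}",
            DeprecationWarning,
            stacklevel=2,
        )
        return data.decode(encoding)
```

As noted in the comment above, this only catches outright decoding failures. A file in a legacy encoding whose bytes happen to form valid UTF-8 would still be decoded to the wrong characters without any warning.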
For the case where:

* a requirements file is encoded as UTF-8, and
* some bytes in the file are incompatible with the system locale

In this case, fallback to decoding as UTF-8 as a last resort (rather than crashing on the `UnicodeDecodeError`). This behaviour was added when parsing the request file, rather than in `auto_decode` as it didn't seem to belong in a generic util (though that util looks to only be ever called when parsing requirements files anyway).

Perhaps we should just go straight to UTF-8 without querying the system locale (unless there is a PEP-263 style comment), per the docs[1]:

> Requirements files are utf-8 encoding by default

But to avoid a breaking change just warn if decoding with this locale fails then fallback to UTF-8

[1] https://pip.pypa.io/en/stable/reference/requirements-file-format/#encoding

Fixes: pypa#12771
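The behaviour this commit message describes (try the locale's encoding, then warn and fall back to UTF-8) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: `decode_locale_then_utf8` and its warning text are hypothetical, the locale encoding is passed in explicitly so the example behaves the same on any machine, and PEP 263 style encoding comments are not handled.

```python
import warnings


def decode_locale_then_utf8(data: bytes, locale_encoding: str) -> str:
    """Decode with the locale's encoding, falling back to UTF-8 as a last resort.

    Hypothetical helper for illustration; not pip's `auto_decode`.
    """
    try:
        return data.decode(locale_encoding)
    except UnicodeDecodeError:
        warnings.warn(
            f"could not decode requirements file as {locale_encoding}; "
            "falling back to UTF-8",
            stacklevel=2,
        )
        # A second UnicodeDecodeError here is left to propagate to the caller.
        return data.decode("utf-8")


# A UTF-8 file read on a machine whose locale encoding is GBK.
sample = "pkg==1.0  # €\n"
decoded = decode_locale_then_utf8(sample.encode("utf-8"), "gbk")
```

Trying the locale first keeps existing setups working unchanged, at the cost that a UTF-8 file whose bytes happen to be valid in the locale encoding is still silently mis-decoded.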
Description
When the unit attempted to install dependencies using the `pip install -r requirements.txt` command, the error `UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 548: illegal multibyte sequence` occurred. The error log is as follows:
error trace:
Expected behavior
Properly install dependencies.
pip version
24.0
Python version
3.10.14
OS
Windows 10
How to Reproduce
When the unit attempted to install dependencies using the `pip install -r requirements.txt` command, the error `UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 548: illegal multibyte sequence` occurred. The error log is as follows:
error trace:
Output
No response