utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows #86619

Closed
HaujetZhao mannequin opened this issue Nov 24, 2020 · 3 comments
Labels
3.9 only security fixes OS-windows type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@HaujetZhao
Mannequin

HaujetZhao mannequin commented Nov 24, 2020

BPO 42453
Nosy @pfmoore, @vstinner, @tjguk, @zware, @zooba, @HaujetZhao

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2020-11-24.16:27:44.580>
created_at = <Date 2020-11-24.16:26:27.424>
labels = ['OS-windows', 'type-crash', '3.9']
title = 'utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows'
updated_at = <Date 2020-11-24.16:40:57.641>
user = 'https://github.com/HaujetZhao'

bugs.python.org fields:

activity = <Date 2020-11-24.16:40:57.641>
actor = 'HaujetZhao'
assignee = 'none'
closed = True
closed_date = <Date 2020-11-24.16:27:44.580>
closer = 'vstinner'
components = ['Windows']
creation = <Date 2020-11-24.16:26:27.424>
creator = 'HaujetZhao'
dependencies = []
files = []
hgrepos = []
issue_num = 42453
keywords = []
message_count = 3.0
messages = ['381753', '381754', '381756']
nosy_count = 6.0
nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'zach.ware', 'steve.dower', 'HaujetZhao']
pr_nums = []
priority = 'normal'
resolution = 'third party'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue42453'
versions = ['Python 3.9']

@HaujetZhao
Mannequin Author

HaujetZhao mannequin commented Nov 24, 2020

Installing a package with pip install package_name generates an installed-files.txt file, which records the files that the package contains.

When updating or uninstalling the package, pip reads installed-files.txt and then deletes the old files.

If the installed package contains files whose names include non-ASCII characters, such as 文件, the problem occurs.

In China (I don't know about other places), for historical reasons, the default Windows system encoding is gbk, so installed-files.txt is also written in gbk when a package is installed.
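To check which encoding a given machine uses by default, a minimal sketch like the following may help. It relies only on the standard library; the variable name default_encoding is chosen here for illustration. On a Simplified Chinese Windows install this usually reports cp936, which Python treats as an alias of gbk.

```python
import locale

# Encoding that open() falls back to when no encoding= argument is passed.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```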

When updating or uninstalling, pip decodes installed-files.txt as utf-8. Since the file contains non-ASCII characters that were written as gbk, decoding fails with this error:

  File "d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1424, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 343: invalid start byte in installed-files.txt file at path: d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\Markdown_Toolbox-0.0.8-py3.9.egg-info\installed-files.txt
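
The mismatch can be reproduced without pip. This is a small sketch, assuming a hypothetical file name 文件.txt: the name is encoded as gbk (what a gbk-default system would write) and then decoded as utf-8 (what the reader expects). The exact byte and position in the message will differ from the traceback above.

```python
file_name = "文件.txt"  # hypothetical file name, for illustration only

# What a system with a gbk default encoding would write to disk.
written = file_name.encode("gbk")

try:
    written.decode("utf-8")  # what the reading side attempts
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print(exc)

# Decoding with the encoding that was actually used recovers the name.
recovered = written.decode("gbk")
```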

I hate the default gbk system encoding, but that setting is fixed on Windows.

So my suggestion is to add a try/except at the point of the error: if decoding installed-files.txt as utf-8 fails, fall back to gbk.
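
A rough sketch of that fallback might look like the following. The helper decode_with_fallback and its fallback parameter are hypothetical names, not part of pip or pkg_resources, and this is not how either project handles the file. Note that this is a heuristic: some gbk byte sequences are also valid utf-8, so the first attempt can succeed with the wrong text.

```python
def decode_with_fallback(raw: bytes, fallback: str = "gbk") -> str:
    """Decode raw bytes as utf-8, retrying with a fallback codec on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the file was written with the fallback codec.
        # errors="replace" keeps the retry from raising a second time.
        return raw.decode(fallback, errors="replace")


from_gbk = decode_with_fallback("文件".encode("gbk"))
from_utf8 = decode_with_fallback("文件".encode("utf-8"))
```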

Or, as a more fundamental solution, pip should always write text files as utf-8 instead of using the default system encoding.
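
As an illustration of that second option, here is a minimal sketch of writing a record file with an explicit encoding. The file names and the temporary directory are made up for the example; the point is only that passing encoding="utf-8" to open() makes the result independent of the system default.

```python
import os
import tempfile

names = ["文件.txt", "README.md"]  # hypothetical package contents

with tempfile.TemporaryDirectory() as tmp:
    record = os.path.join(tmp, "installed-files.txt")

    # An explicit encoding overrides the locale-dependent default.
    with open(record, "w", encoding="utf-8") as f:
        f.write("\n".join(names))

    # Reading the raw bytes back as utf-8 now works on any system.
    with open(record, "rb") as f:
        round_tripped = f.read().decode("utf-8").splitlines()
```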

@HaujetZhao HaujetZhao mannequin added 3.9 only security fixes OS-windows type-crash A hard crash of the interpreter, possibly with a core dump labels Nov 24, 2020
@vstinner
Member

Please report the issue to https://github.com/pypa/pip

pip is not part of Python stdlib.

@HaujetZhao
Mannequin Author

HaujetZhao mannequin commented Nov 24, 2020

Got it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022