pip install: UnicodeDecodeError on Windows #1291

sscherfke · 2013-11-04T12:18:33Z

pip install <package> fails on Windows if the project's description (e.g., its long description) is in UTF-8.

(simpy) C:\Users\sscherfke\Code\simpy>pip install .
Unpacking c:\users\sscherfke\code\simpy
  Running setup.py egg_info for package from file:///c%7C%5Cusers%5Csscherfke%5Ccode%5Csimpy

Cleaning up...
Exception:
Traceback (most recent call last):
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\basecommand.py", line 134, in main
    status = self.run(options, args)
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\commands\install.py", line 236, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 1134, in prepare_files
    req_to_install.run_egg_info()
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 264, in run_egg_info
    "%(Name)s==%(Version)s" % self.pkg_info())
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 357, in pkg_info
    data = self.egg_info_data('PKG-INFO')
  File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 297, in egg_info_data
    data = fp.read()
  File "C:\Users\sscherfke\Envs\simpy\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1235: character maps to <undefined>

Storing complete log in C:\Users\sscherfke\pip\pip.log

The problem seems to be that req.egg_info_data() (currently line 317) reads the egg-info created by python setup.py egg_info with the system's default encoding, which is not UTF-8 on Windows (but is on most *nix systems).

With Python 3, it should be no problem if you use utf-8 in your README/CHANGES/AUTHORS.txt (or whatever), so pip should read files as unicode by default:

Changing lines 296 and 297 (in pip 1.4.1; 316 and 317 in the repo) to

fp = open(filename, 'rb')
data = fp.read().decode('utf-8')

fixes the problem for me.
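
The failure can be reproduced without pip. This is a minimal sketch with invented sample text; it assumes the description contains a right double quotation mark (U+201D), which is one character whose UTF-8 encoding ends in the byte 0x9d reported in the traceback:

```python
# UTF-8 bytes, standing in for what ends up in PKG-INFO
# (the sample text is invented for illustration).
raw = "Summary: a \u201csimple\u201d simulation library".encode("utf-8")

try:
    # Roughly what open(filename).read() does on a cp1252 Windows system.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    failing_byte = raw[exc.start]
    print("cp1252 failed on byte", hex(failing_byte))

# Decoding explicitly as UTF-8, as in the proposed change, succeeds.
text = raw.decode("utf-8")
print(ascii(text))
```

The byte 0x9d has no mapping in Python's cp1252 codec, which is why the platform-default read fails while the explicit UTF-8 decode does not.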

The test setup was:

Windows 7 64bit
Python 3.3.1
pip 1.4.1
setuptools 0.9.8

thedrow · 2013-11-04T12:56:51Z

Can you create a pull request with a test case to ensure this bug won't regress?

sscherfke · 2013-11-13T16:49:47Z

@thedrow Can the PR be merged or is there something missing?

pfmoore · 2013-11-13T16:55:46Z

With Python 3, it should be no problem if you use utf-8 in your README/CHANGES/AUTHORS.txt (or whatever), so pip should read files as unicode by default:

What if those files are not in UTF-8? As far as I know, there's no requirement that they have to be.

sscherfke · 2013-11-13T16:58:55Z

The probability that they are is IMHO much higher than the probability that they are in the Windows default encoding, because UTF-8 is the default on Mac/Linux and also in Python 3. You could also wrap it in a try/except block and fall back to the default encoding if UTF-8 fails.

thedrow · 2013-11-13T17:00:58Z

@sscherfke First of all, add tests to the pull request to ensure this bug won't regress.
And I was about to write that you should wrap it with a try..catch that uses the default encoding if it cannot decode it to UTF8 but you are ahead of me :)

sscherfke · 2013-11-13T17:02:10Z

@thedrow Try the default and fall back to utf-8, or the other way around?

thedrow · 2013-11-13T17:07:33Z

It depends on which one is the most common use case. Ask the core developers.

pfmoore · 2013-11-13T18:37:20Z

The point is that the encoding of such files is not defined. So you can't reliably say that any encoding will be correct (some, like latin1, will never give encoding errors, but that doesn't mean they are necessarily right).
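
To illustrate the latin1 point, here is a small sketch with invented sample text. It shows that latin-1 decodes any byte sequence without raising, yet produces mojibake when the data was really UTF-8:

```python
# Every possible byte value decodes under latin-1, so it can never raise.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # one character per byte

# "No error" is not the same as "correct": UTF-8 data read as latin-1
# silently turns into mojibake (the name is made up; \u00fc is u-umlaut).
original = "Stefan M\u00fcller"
wrong = original.encode("utf-8").decode("latin-1")
print(ascii(wrong))
print(wrong == original)

# The damage can only be undone if the wrong guess is known afterwards.
restored = wrong.encode("latin-1").decode("utf-8")
```

This is why latin-1 only makes sense as the very last fallback: it prevents the exception but says nothing about whether the text is right.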

Neither the egg-info nor the metadata specs make any mention of encodings, which is unfortunate, but reflects the fact that they were written when Python tended to assume that only ASCII would be used for anything that needed interoperability. Hence this bug.

The correct fix (Metadata 2.0) is of course to clearly specify encodings. But that's a way off yet, and in the meantime we have to be prepared to accept arbitrary data.

My view is that we should

1. Make sure that we avoid exceptions caused by incompatible encoding assumptions.
2. Avoid mojibake-related errors caused by guessing the wrong encoding as much as we can.
3. Try to avoid damaging the data as a last resort.

Whether we use the platform encoding or UTF-8 doesn't really affect (1). In both cases, there is the possibility of invalid data. To address (1) we need to progressively fall back through a series of encodings, finishing with latin-1 (as that is the commonly used encoding that accepts all 256 byte values, and so will never error).

Using UTF-8 addresses (2), as UTF-8 is probably the most common encoding we will see (due to its prevalence on Unix systems).

For (3) it's really about how the data is used, and I think that's out of scope for this patch.

So if you add exception handling and a fallback - I'd suggest the platform default and then latin-1 in that order if UTF-8 fails - I think that would be a good solution.

Whether UTF-8 or platform default is the best choice for the initial attempt is something I doubt anyone can tell you. I suspect that a relatively small number of projects go outside ASCII anyway. For those that do, if you're on Unix UTF-8 is the platform default so it makes no difference. On Windows, it boils down to which is the most important case - installing stuff developed by other Windows developers, or installing stuff developed by Unix developers. Honestly, that's going to be an almost totally arbitrary decision.

thedrow · 2013-11-13T22:08:17Z

I think @pfmoore has a point. I completely agree.

sscherfke · 2013-11-15T22:16:41Z

    try:
        # Try utf-8
        with open(filename, 'rb') as fp:
            data = fp.read().decode('utf-8')
    except UnicodeDecodeError:
        try:
            # Try the system’s default encoding
            with open(filename, 'r') as fp:
                data = fp.read()
        except UnicodeDecodeError:
            # Our last resort is latin1 which never throws an error
            # (but returns nonsense instead :-))
            with open(filename, 'rb') as fp:
                data = fp.read().decode('latin1')
    return data

This surely doesn’t look very friendly but shouldn’t raise any UnicodeDecodeError (as far as I’ve tested it).

What would be the preferred way for a pip testcase? To use actual files with varying encodings or to mock open() and pass varying bytes instead?
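
Purely as an illustration of the first option (real files with varying encodings), a test could look roughly like the sketch below. The helper names read_text_with_fallback and write_and_read and the sample data are hypothetical, not pip code. The helper tries UTF-8, then the platform default, then latin-1, following the order discussed above.

```python
import locale
import os
import tempfile


def read_text_with_fallback(filename):
    """Hypothetical helper: decode a file as UTF-8, then with the
    platform default encoding, then as latin-1 (accepts every byte)."""
    with open(filename, "rb") as fp:
        raw = fp.read()
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def write_and_read(raw):
    """Write raw bytes to a real temporary file and read them back
    through the helper, removing the file afterwards."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(raw)
        return read_text_with_fallback(path)
    finally:
        os.remove(path)


sample = "Description: caf\u00e9 \u201cquoted\u201d"
# UTF-8 content must come back unchanged.
assert write_and_read(sample.encode("utf-8")) == sample
# A lone 0x9d byte is invalid UTF-8; the helper must still return text.
assert isinstance(write_and_read(b"bad byte: \x9d"), str)
```

Mocking open() would avoid touching the file system, but real files exercise the same code path that failed in the original traceback.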

sscherfke · 2013-11-21T10:18:51Z

Created a new pull request #1331 which fixes the issue and looks a bit nicer than the snippet I posted above. :) Also added a new unit test.

while0pass · 2014-03-19T10:43:10Z

C:\> pip --version
pip 1.5.4 from C:\Python27\lib\site-packages (Python 2.7)

C:\>pip install ipython
Downloading/unpacking ipython
Cleaning up...
Exception:
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "C:\Python27\lib\site-packages\pip\req.py", line 1229, in prepare_files
    req_to_install.run_egg_info()
  File "C:\Python27\lib\site-packages\pip\req.py", line 292, in run_egg_info
    logger.notify('Running setup.py (path:%s) egg_info for package %s' % (self.setup_py, self.name))
  File "C:\Python27\lib\site-packages\pip\req.py", line 265, in setup_py
    import setuptools
  File "C:\Python27\lib\site-packages\setuptools\__init__.py", line 12, in <module>
    from setuptools.extension import Extension
  File "C:\Python27\lib\site-packages\setuptools\extension.py", line 7, in <module>
    from setuptools.dist import _get_unpatched
  File "C:\Python27\lib\site-packages\setuptools\dist.py", line 15, in <module>
    from setuptools.compat import numeric_types, basestring
  File "C:\Python27\lib\site-packages\setuptools\compat.py", line 19, in <module>
    from SimpleHTTPServer import SimpleHTTPRequestHandler
  File "C:\Python27\lib\SimpleHTTPServer.py", line 27, in <module>
    class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
  File "C:\Python27\lib\SimpleHTTPServer.py", line 208, in SimpleHTTPRequestHandler
    mimetypes.init() # try to read system mime.types
  File "C:\Python27\lib\mimetypes.py", line 358, in init
    db.read_windows_registry()
  File "C:\Python27\lib\mimetypes.py", line 258, in read_windows_registry
    for subkeyname in enum_types(hkcr):
  File "C:\Python27\lib\mimetypes.py", line 249, in enum_types
    ctype = ctype.encode(default_encoding) # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 9: ordinal not in range(128)

Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
  File "C:\Python27\lib\site-packages\pip\__init__.py", line 185, in main
    return command.main(cmd_args)
  File "C:\Python27\lib\site-packages\pip\basecommand.py", line 161, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 68: ordinal not in range(128)

C:\>pip --help
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
  File "C:\Python27\lib\site-packages\pip\__init__.py", line 177, in main
    cmd_name, cmd_args = parseopts(initial_args)
  File "C:\Python27\lib\site-packages\pip\__init__.py", line 138, in parseopts
    general_options, args_else = parser.parse_args(args)
  File "C:\Python27\lib\optparse.py", line 1399, in parse_args
    stop = self._process_args(largs, rargs, values)
  File "C:\Python27\lib\optparse.py", line 1439, in _process_args
    self._process_long_opt(rargs, values)
  File "C:\Python27\lib\optparse.py", line 1514, in _process_long_opt
    option.process(opt, value, values, self)
  File "C:\Python27\lib\optparse.py", line 788, in process
    self.action, self.dest, opt, value, values, parser)
  File "C:\Python27\lib\optparse.py", line 810, in take_action
    parser.print_help()
  File "C:\Python27\lib\optparse.py", line 1669, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
  File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1257: ordinal not in range(128)

C:\>pip
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
  File "C:\Python27\lib\site-packages\pip\__init__.py", line 177, in main
    cmd_name, cmd_args = parseopts(initial_args)
  File "C:\Python27\lib\site-packages\pip\__init__.py", line 148, in parseopts
    parser.print_help()
  File "C:\Python27\lib\optparse.py", line 1669, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
  File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1257: ordinal not in range(128)

thedrow · 2014-03-19T11:12:30Z

@while0pass I'm not sure if it's related. Try to uninstall pip and install it again.
If the issue persists just open a new ticket.

thedrow · 2014-03-19T11:13:58Z

Why is this issue still open when #1395 & #1396 were already merged? @dstufft

BlakeWL · 2015-03-06T07:20:40Z

I get the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 33: ordinal not in range(128) on Windows. It is caused by a change to the registration table of Windows. To solve it, open mimetypes.py in C:\python27\Lib, find the line ‘default_encoding = sys.getdefaultencoding()’, and add the code below before it:

if sys.getdefaultencoding() != 'gbk':
    reload(sys)
    sys.setdefaultencoding('gbk')
default_encoding = sys.getdefaultencoding()

After that, it works.

piotr-dobrogost · 2015-03-06T10:04:21Z

@BlakeWL

What is registration table?

BlakeWL · 2015-03-07T04:34:28Z

@piotr-dobrogost
When you run regedit on Windows, the registration table shows up.

piotr-dobrogost · 2015-03-07T20:34:16Z

It's called Windows Registry (http://en.wikipedia.org/wiki/Windows_Registry) not registration table.

tanbro · 2015-05-28T03:39:59Z

Can this patch work?

--- C:/Python27/Lib/site-packages/pip/download.py   Thu May 28 09:40:28 2015
+++ C:/Python27/Lib/site-packages/pip/download.py   Thu May 28 11:34:40 2015
@@ -881,6 +881,13 @@ def _download_http_url(link, session, temp_dir):
         ext = os.path.splitext(resp.url)[1]
         if ext:
             filename += ext
+    try:
+        if isinstance(temp_dir, unicode):
+            temp_dir = temp_dir.encode(sys.getfilesystemencoding())
+        if isinstance(filename, unicode):
+            filename = filename.encode(sys.getfilesystemencoding())
+    except NameError:
+        pass
     file_path = os.path.join(temp_dir, filename)
     with open(file_path, 'wb') as content_file:
         _download_url(resp, link, content_file)

krader1961 · 2015-11-16T19:56:33Z

FYI, there is a Python standard for specifying the character encoding of a Python module: PEP 263, which you can read here: https://www.python.org/dev/peps/pep-0263/.
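
For context, PEP 263 covers the encoding of Python source files (declared in a comment such as # -*- coding: ... -*-), not of metadata files like PKG-INFO. A small sketch, using an invented two-line module, of how the standard library's tokenize.detect_encoding reads such a declaration:

```python
import io
import tokenize

# An invented two-line module that declares its encoding per PEP 263.
source = "# -*- coding: cp1252 -*-\nname = 'caf\u00e9'\n".encode("cp1252")

# detect_encoding() takes a readline callable over the raw bytes and
# returns the declared encoding plus the lines it had to read.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)

# Decoding with the declared encoding recovers the original text.
text = source.decode(encoding)
```

tokenize.open() applies the same detection when opening a source file, so a setup.py with a coding declaration is read with the encoding it declares rather than the platform default.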

allenwyma · 2016-03-22T06:49:46Z

any word on this? just got this today

dstufft · 2017-03-24T16:25:56Z

Closing this, I believe that this has been fixed.

arisobel · 2017-05-11T20:17:07Z

Not for me....
Which library do I have to upgrade in order to get away from this problem?

mixmastamyk · 2018-06-11T22:44:47Z

Still happening. Is this related or different?:

e:\repos\fr>pip install -e .

Obtaining file:///E:/repos/fr
Installing collected packages: fr
  Found existing installation: fr 3.0a0
    Uninstalling fr-3.0a0:
      Successfully uninstalled fr-3.0a0
  Running setup.py develop for fr
    Complete output from command c:\users\TheUser\python36\python.exe -c "import setuptools, tokenize;__file__='E:\\repos\\fr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps:
    running develop
    running egg_info
    writing fr.egg-info\PKG-INFO
    writing dependency_links to fr.egg-info\dependency_links.txt
    writing requirements to fr.egg-info\requires.txt
    writing top-level names to fr.egg-info\top_level.txt
    reading manifest file 'fr.egg-info\SOURCES.txt'
    writing manifest file 'fr.egg-info\SOURCES.txt'
    running build_ext
    Creating c:\users\TheUser\python36\lib\site-packages\fr.egg-link (link to .)
    fr 3.0a0 is already the active version in easy-install.pth
    c:\users\TheUser\python36\lib\site-packages\setuptools\dist.py:397: UserWarning: Normalizing '3.00a0' to '3.0a0'
      normalized_version,
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "E:\repos\fr\setup.py", line 62, in <module>
        'Topic :: Utilities',
      File "c:\users\TheUser\python36\lib\site-packages\setuptools\__init__.py", line 129, in setup
        return distutils.core.setup(**attrs)
      File "c:\users\TheUser\python36\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "c:\users\TheUser\python36\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "c:\users\TheUser\python36\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 36, in run
        self.install_for_development()
      File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 152, in install_for_development
        self.process_distribution(None, self.dist, not self.no_deps)
      File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\easy_install.py", line 726, in process_distribution
        self.install_egg_scripts(dist)
      File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 187, in install_egg_scripts
        script_text = strm.read()
      File "c:\users\TheUser\python36\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1303: character maps to <undefined>


    ----------------------------------------
  Rolling back uninstall of fr
Command "c:\users\TheUser\python36\python.exe -c "import setuptools, tokenize;__file__='E:\\repos\\fr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps" failed with error code 1 in E:\repos\fr\

pradyunsg · 2018-06-12T01:32:44Z

That error seems to be coming from setuptools. Try python setup.py develop. If that errors out, it's a setuptools issue. If it doesn't, could you file a new bug report for it?

lock · 2019-06-02T11:03:40Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

sscherfke mentioned this issue Nov 21, 2013

Avoid UnicodeDecodeErrors when reading egg_info files. Fix issue #1291. #1331

Closed

myint mentioned this issue Mar 25, 2014

Error when install autopep8 under Python 3.3 on Windows hhatto/autopep8#130

Closed

krader1961 mentioned this issue Nov 16, 2015

Installation on Windows with Anaconda xonsh/xonsh#487

Closed

xavfernandez added the "type: bug" label Apr 3, 2016

dstufft closed this as completed Mar 24, 2017

pradyunsg added the "project: setuptools" label Jun 13, 2018

lock bot added the "auto-locked" label Jun 2, 2019

lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019
