Fix issue 326 unicode decode error #374

Merged (5 commits) on Feb 7, 2012

msabramo
Contributor

This is a proposed fix for #326

412daad - Modify console_to_str to handle non-ASCII (UTF-8) input.
b154c82 - Modify Logger.log so that it can output UTF-8 encoded messages.

$ hwprefs os_type
Mac OS X 10.6.8 (10K549)

$ python -V
Python 3.1.1

$ pwd
/Users/marc/dev/git-repos/anyserializer/.tox/py31

$ pip install --upgrade PyYAML | egrep -v '^xxx'
...
    Compiling with an SDK that doesn't seem to exist: /Developer/SDKs/MacOSX10.4u.sdk
    Please check your Xcode installation
    ...
    build/temp.macosx-10.3-fat-3.1/check_libyaml.c: In function ‘main’:
    build/temp.macosx-10.3-fat-3.1/check_libyaml.c:5: error: ‘yaml_parser_t’ undeclared (first use in this function)
    build/temp.macosx-10.3-fat-3.1/check_libyaml.c:5: error: (Each undeclared identifier is reported only once
    build/temp.macosx-10.3-fat-3.1/check_libyaml.c:5: error: for each function it appears in.)
    build/temp.macosx-10.3-fat-3.1/check_libyaml.c:5: error: expected ‘;’ before ‘parser’
    ...

@msabramo
Contributor Author

Here are the beginnings of a test -- I haven't had a chance to make this a proper test yet:

$ python -V
Python 3.2.2

$ cat broken_emits_utf8/setup.py 
# -*- coding: utf-8 -*-

from distutils.core import setup
import sys

class FakeError(Exception):
    pass

if sys.argv[1] == 'install':
    sys.stdout.buffer.write(b'\nThis package prints out UTF-8 stuff like:\n')
    sys.stdout.buffer.write('* return type of ‘main’ is not ‘int’\n'.encode('utf-8'))
    sys.stdout.buffer.write('* Björk Guðmundsdóttir [ˈpjœr̥k ˈkvʏðmʏntsˌtoʊhtɪr]'.encode('utf-8'))
    raise FakeError('this package designed to fail on install')

setup(name='broken',
      version='0.2broken',
      py_modules=['broken'],
      )

$ pip install broken_emits_utf8/ | grep -v 'xxx'
...
  File "/Users/marc/python/virtualenvs/py3.1-phpserialize/lib/python3.2/site-packages/pip-1.0.2-py3.2.egg/pip/__init__.py", line 230, in call_subprocess
    line = console_to_str(stdout.readline())
  File "/Users/marc/python/virtualenvs/py3.1-phpserialize/lib/python3.2/site-packages/pip-1.0.2-py3.2.egg/pip/backwardcompat.py", line 60, in console_to_str
    return s.decode(console_encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 17: ordinal not in range(128)
...
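The `0xe2` at position 17 is the first byte of the UTF-8 encoding of `‘` (U+2018) in the line `* return type of ‘main’ is not ‘int’`, and ASCII cannot decode it. As a minimal sketch of the kind of change 412daad describes (not its actual diff), `console_to_str` can try the console encoding first and fall back to UTF-8. Passing `console_encoding` as a parameter and using `errors="replace"` in the fallback are assumptions made for this illustration.

```python
import locale
from typing import Optional


def console_to_str(data: bytes, console_encoding: Optional[str] = None) -> str:
    """Decode one chunk of subprocess output (illustrative sketch, not pip's code).

    Try the console's encoding first; if the bytes are not valid in it,
    fall back to UTF-8 and substitute U+FFFD for anything still undecodable.
    """
    if console_encoding is None:
        console_encoding = locale.getpreferredencoding(False) or "ascii"
    try:
        return data.decode(console_encoding)
    except UnicodeDecodeError:
        return data.decode("utf-8", errors="replace")
```

The fallback only helps when the console codec actually rejects the bytes (ASCII, or CP-1252 for bytes such as `0x8f`). With a codec that accepts every byte, such as latin-1, no exception is raised and the output comes back mis-decoded instead.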

@msabramo
Contributor Author

I've made good progress on writing a test for this issue, but then I ran into an issue with ScriptTest:

https://bitbucket.org/ianb/scripttest/issue/10/

I have a fix and a pull request for the above - without it, the test will fail.

@msabramo
Contributor Author

Re: the test_install_package_that_emits_unicode test in a790180...

With Python 2, without the ScriptTest fix at
https://bitbucket.org/ianb/scripttest/pull-request/3/:

 $ python -V
 Python 2.7

 $ nosetests -v -s test_unicode.py 
 Install a package with a setup.py that emits UTF-8 output and then fails. ... ERROR

 ======================================================================
 ERROR: Install a package with a setup.py that emits UTF-8 output and then fails.
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/Users/marc/python/virtualenvs/django/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
     self.test(*self.arg)
   File "/Users/marc/dev/git-repos/pip/tests/test_unicode.py", line 27, in test_install_package_that_emits_unicode
     result = run_pip('install', to_install, expect_error=True)
   File "/Users/marc/dev/git-repos/pip/tests/test_pip.py", line 473, in run_pip
     result = env.run('pip', *args, **kw)
   File "/Users/marc/dev/git-repos/pip/tests/test_pip.py", line 361, in run
     return TestPipResult(super(TestPipEnvironment, self).run(cwd=cwd, *args, **kw), verbose=self.verbose)
   File "/Users/marc/python/virtualenvs/django/lib/python2.7/site-packages/scripttest/__init__.py", line 246, in run
     stdout = string(stdout).replace('\r\n', '\n')
   File "/Users/marc/python/virtualenvs/django/lib/python2.7/site-packages/scripttest/backwardscompat.py", line 9, in string
     return string.decode('utf-8')
   File "/Users/marc/python/virtualenvs/django/lib/python2.7/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 337: ordinal not in range(128)

 ----------------------------------------------------------------------
 Ran 1 test in 4.900s

 FAILED (errors=1)
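In the traceback above, ScriptTest's `backwardscompat.string()` calls `.decode('utf-8')` on output that is already a `unicode` object. On Python 2 that call first encodes the value with the default ASCII codec, which is why a *decode* call raises `UnicodeEncodeError` for `u'\u2018'`. A minimal Python 3 sketch of the guard such a fix needs (hypothetical, not the actual patch from the ScriptTest pull request):

```python
from typing import Union


def string(value: Union[bytes, str]) -> str:
    """Return text, decoding only when the value is bytes (hypothetical sketch).

    Calling .decode() on something that is already text is the bug: on
    Python 2, unicode.decode('utf-8') implicitly encodes with ASCII first,
    which fails for characters such as U+2018.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value
```

Used as in the traceback, `string(stdout).replace('\r\n', '\n')` then behaves the same whether `stdout` arrives as bytes or as text.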

With Python 2, with the ScriptTest fix at
https://bitbucket.org/ianb/scripttest/pull-request/3/:

 $ python -V
 Python 2.7

 $ nosetests -v -s test_unicode.py 
 Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

 ----------------------------------------------------------------------
 Ran 1 test in 15.372s

 OK

With Python 3, without fix in 412daad:

 $ python -V
 Python 3.2.2

 $ nosetests -v -s test_unicode.py 
 Install a package with a setup.py that emits UTF-8 output and then fails. ... FAIL

 ======================================================================
 FAIL: Install a package with a setup.py that emits UTF-8 output and then fails.
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/Users/marc/python/virtualenvs/pip-python3/lib/python3.2/site-packages/nose/case.py", line 198, in runTest
     self.test(*self.arg)
   File "/Users/marc/dev/git-repos/pip/tests/test_unicode.py", line 31, in test_install_package_that_emits_unicode
     assert 'UnicodeDecodeError' not in result.stdout
 AssertionError

 ----------------------------------------------------------------------
 Ran 1 test in 37.364s

 FAILED (failures=1)

With Python 3, with fix in 412daad:

 $ python -V
 Python 3.2.2

 $ nosetests -v -s test_unicode.py 
 Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

 ----------------------------------------------------------------------
 Ran 1 test in 60.864s

 OK
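The behaviour this test checks can be reproduced without pip or ScriptTest. The following is a hypothetical standalone sketch that uses only `subprocess`: a child process writes UTF-8 bytes the way `BrokenEmitsUTF8/setup.py` does and then raises `FakeError`, and the parent applies the two checks the test makes on `result.stdout` (`'FakeError: this package designed to fail on install'` present, `'UnicodeDecodeError'` absent).

```python
import subprocess
import sys

# Child script modelled on BrokenEmitsUTF8/setup.py. The non-ASCII characters
# are written as \u escapes, which the child interprets, so the command line
# itself stays ASCII-only.
CHILD = r"""
import sys

class FakeError(Exception):
    pass

sys.stdout.buffer.write("* return type of \u2018main\u2019 is not \u2018int\u2019\n".encode("utf-8"))
sys.stdout.flush()
raise FakeError("this package designed to fail on install")
"""


def run_and_decode() -> str:
    """Run the child, capture stdout+stderr as bytes, and decode them tolerantly."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    assert proc.returncode != 0  # the child is designed to fail
    # Decode as UTF-8 rather than the locale codec; bad bytes become U+FFFD.
    return proc.stdout.decode("utf-8", errors="replace")


def check_output(output: str) -> None:
    """The two conditions the PR's test asserts, applied to the decoded output."""
    assert "FakeError: this package designed to fail on install" in output
    assert "UnicodeDecodeError" not in output
```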

@msabramo
Contributor Author

msabramo commented Nov 5, 2011

Anyone get a chance to try this? Just curious if it works for other folks.

@msabramo
Contributor Author

Anyone get a chance to try this?

@@ -70,8 +70,12 @@ def log(self, level, msg, *args, **kw):
                    if self.explicit_levels:
                        ## FIXME: should this be a name, not a level number?
                        rendered = '%02i %s' % (level, rendered)
                if hasattr(consumer, 'write'):
                    consumer.write(rendered+'\n')
                    if hasattr(consumer, 'buffer'): # Python 3
Contributor

Can we encapsulate this check into a utility method in pip.backwardcompat? I'd like to avoid sprinkling the codebase with more ad-hoc Python 3 checks.

Contributor Author

Yep, take a look at commit efe4495.
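Commit efe4495's message says it moves the `write`/`buffer` check into a utility named `fwrite` in `pip.backwardcompat`, but its body is not shown in this thread. A hypothetical version, assuming the Python 3 branch writes UTF-8 bytes to the stream's underlying `buffer`, could look like this; `demo_ascii_console` is an illustrative helper, not part of pip.

```python
import io


def fwrite(f, s: str) -> None:
    """Hypothetical sketch of a write helper like the one efe4495 describes."""
    if hasattr(f, "buffer"):
        # Python 3 text stream: flush queued text first so ordering is kept,
        # then write UTF-8 bytes straight to the underlying binary stream.
        f.flush()
        f.buffer.write(s.encode("utf-8"))
        f.buffer.flush()
    else:
        # Python 2 file objects (and other plain writers) take the string as-is.
        f.write(s)


def demo_ascii_console(message: str) -> bytes:
    """Simulate an ASCII-only console and return the bytes that reach it."""
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding="ascii")
    console.write("log: ")      # ASCII text goes through the normal text layer
    fwrite(console, message)    # non-ASCII text goes through fwrite
    return raw.getvalue()
```

Flushing `f` before touching `f.buffer` matters: text written earlier through `f.write` may still be sitting in the wrapper's own buffer, and skipping the flush would let the raw bytes overtake it.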

@carljm
Contributor

carljm commented Dec 12, 2011

Thanks for this work, it looks really good! I made a couple of comments; I'll give it some more thorough testing soon. Sorry I didn't get to it sooner.

I've also asked Ian if we can get ScriptTest released, so I can merge this without making pip's tests dependent on an unreleased version of ScriptTest. https://bitbucket.org/ianb/scripttest/issue/11/release

@msabramo
Contributor Author

Commit efe4495 is my attempt to address carljm's comment about encapsulating a Python 2 vs. Python 3 check. Let me know if that works or if it needs tweaking.

marc@hyperion:~/dev/git-repos/pip$ PAGER=cat git log -1 --oneline
efe4495 Encapsulate `write`/`buffer` Python 2 vs. Python 3 check into a utility method (`fwrite`) in pip.backwardcompat.
marc@hyperion:~/dev/git-repos/pip$ workon pip-python2
(pip-python2) marc@hyperion:~/dev/git-repos/pip$ python -V
Python 2.7
(pip-python2) marc@hyperion:~/dev/git-repos/pip$ nosetests -v -s test_unicode.py
Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

----------------------------------------------------------------------
Ran 1 test in 3.920s

OK

(pip-python2) marc@hyperion:~/dev/git-repos/pip$ workon pip-python3
(pip-python3) marc@hyperion:~/dev/git-repos/pip$ python -V
Python 3.2.2
(pip-python3) marc@hyperion:~/dev/git-repos/pip$ nosetests -v -s test_unicode.py
Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

----------------------------------------------------------------------
Ran 1 test in 15.467s

OK

@carljm
Contributor

carljm commented Jan 3, 2012

Update looks good. I think we're waiting on a ScriptTest release here.

@carljm
Contributor

carljm commented Feb 6, 2012

ScriptTest 1.2 is out, putting this on my list for review and merge.

@carljm
Contributor

carljm commented Feb 6, 2012

So with this branch (merged up to develop), tests pass on Python 3.2, but the new test fails on Python 2.7 with a UnicodeEncodeError.

@msabramo
Contributor Author

msabramo commented Feb 7, 2012

Doh. OK, I'll try to take a look at the Python 2.7 test failure soon.

@msabramo
Contributor Author

msabramo commented Feb 7, 2012

Hmmm. It seems to work for me. The develop branch didn't have test_unicode.py and BrokenEmitsUTF8 in it, so I copied them from my branch:

~/dev/git-repos$ cp pip/tests/test_unicode.py pip-upstream/tests/
~/dev/git-repos$ cp -pr pip/tests/packages/BrokenEmitsUTF8 pip-upstream/tests/packages/

Then I ran the test:

~/dev/git-repos$ cd pip-upstream/
~/dev/git-repos/pip-upstream$ nosetests -v -s test_unicode.py
Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

----------------------------------------------------------------------
Ran 1 test in 2.190s

OK

Here's the version of Python that I was using:

~/dev/git-repos/pip-upstream$ python -V
Python 2.7.2
~/dev/git-repos/pip-upstream$ which python
/Users/marca/.pythonbrew/pythons/Python-2.7.2/bin/python

and here's what I happen to have installed:

~/dev/git-repos/pip-upstream$ pip freeze
Fabric==1.3.4
Pygments==1.4
ScriptTest==1.2
distribute==0.6.24
ipython==0.12
mercurial==2.0
nose==1.1.2
pep8==0.6.1
pycrypto==2.5
pyflakes==0.5.0
readline==6.2.1
ssh==1.7.11
virtualenv==1.7
wsgiref==0.1.2
yolk==0.4.3

@carljm
Contributor

carljm commented Feb 7, 2012

Mmm, I miscommunicated. I haven't merged your branch into develop yet. When I said "merged up to develop", what I meant was that (locally) I merged develop into your branch, so it would be fully up to date with changes in develop in the meantime. So of course the develop branch doesn't have the new test file; it doesn't have any of the changes from your branch :-)

What your results indicate is that your added test apparently passes with the current code under Python 2.7, but fails with the code changes in your branch; the reverse of the situation with Python 3. We need to update the code so that your test passes under both Python 2 and 3.

@msabramo
Contributor Author

msabramo commented Feb 7, 2012

Ah, OK. So I need to try rebasing my branch onto develop and then test that. Thanks for the clarification.

@msabramo
Contributor Author

msabramo commented Feb 7, 2012

Hmmm. Maybe I'm still not doing what you did.

(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ git branch
  develop
* fix-issue-326-UnicodeDecodeError

(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ git rebase upstream/develop
First, rewinding head to replay your work on top of it...
Applying: Modify console_to_str to handle non-ASCII (UTF-8) input.
Applying: Modify Logger.log so that it can output UTF-8 encoded messages.
Applying: Revise b154c82 so that it works with Python 2
Applying: Add a test (test_install_package_that_emits_unicode) for https://github.com/pypa/pip/issues/326 and
Applying: Encapsulate `write`/`buffer` Python 2 vs. Python 3 check into a utility method (`fwrite`) in pip.backwardcompat.

(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ PAGER=cat git log -10 --oneline
b4ee573 Encapsulate `write`/`buffer` Python 2 vs. Python 3 check into a utility method (`fwrite`) in pip.backwardcompat.
f0b1d65 Add a test (test_install_package_that_emits_unicode) for https://github.com/pypa/pip/issues/326 and https://github.com/pypa/pip/pull/374
3e5f8d3 Revise b154c82 so that it works with Python 2
27af4de Modify Logger.log so that it can output UTF-8 encoded messages.
49b8afb Modify console_to_str to handle non-ASCII (UTF-8) input.
900d4fd Merge pull request #451 from skorokithakis/develop
d8babe7 --target-dir implies --ignore-installed.
877b5e5 Use a smaller package than mock as a test download.
575eeac Update AUTHORS and changelog.
ec5d336 Fix shutil.move and test.

(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ nosetests -v -s test_unicode.py
Install a package with a setup.py that emits UTF-8 output and then fails. ... ok

----------------------------------------------------------------------
Ran 1 test in 2.172s

OK

(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ python -V
Python 2.7.2
(pip)marca@SCML-MarcA:~/dev/git-repos/pip$ which python
/Users/marca/.pythonbrew/venvs/Python-2.7.2/pip/bin/python

@carljm carljm merged commit efe4495 into pypa:develop Feb 7, 2012
@carljm
Contributor

carljm commented Feb 7, 2012

Ok, how embarrassing. I updated my Python 3.2 virtualenv to ScriptTest 1.2 but forgot to do the same for my Python 2.7 env, so it was still running ScriptTest 1.1.1. Everything passes on all versions, 2.4 - 3.2. Merged - thanks very much for the contribution, and sorry for wasting your time with my stupidity!

@msabramo
Contributor Author

msabramo commented Feb 7, 2012

Ha ha. No worries! I'm glad that this turned out to be useful.

Hope to meet you someday. Maybe at PyCon 2012?

@carljm
Contributor

carljm commented Feb 7, 2012

I'll be there!

@qwcode
Contributor

qwcode commented Jan 23, 2013

hey @msabramo, the test for this is failing on win py3.3
http://jenkins.qwcode.com/job/pip_win_33/8/console

and here it is with more output

Any quick thoughts? Otherwise, I'll look at this more closely later.

======================================================================
FAIL: Install a package with a setup.py that emits UTF-8 output and then fails.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Administrator\marcus\pipVE3\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "c:\Users\Administrator\marcus\pip\tests\test_unicode.py", line 24, in test_install_package_that_emits_unicode
    assert 'FakeError: this package designed to fail on install' in result.stdout, str(result)
AssertionError: Script result: pip install c:\Users\Administrator\marcus\pip\tests\packages\BrokenEmitsUTF8
  return code: 1
-- stdout: --------------------
Unpacking c:\users\administrator\marcus\pip\tests\packages\brokenemitsutf8
  Running setup.py egg_info for package from file:///c%7C%5Cusers%5Cadministrator%5Cmarcus%5Cpip%5Ctests%5Cpackages%5Cbrokenemitsutf8
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "c:\Users\Administrator\marcus\pip\tests\tests_cache\test_ws\.virtualenv\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 458: character maps to <undefined>
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "c:\Users\Administrator\marcus\pip\tests\tests_cache\test_ws\.virtualenv\lib\encodings\cp1252.py", line 23, in decode

    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 458: character maps to <undefined>

----------------------------------------
Command python setup.py egg_info failed with error code 1 in c:\users\administrator\marcus\pip\tests\tests_cache\test_ws\tmp\pip-lw2nqo-build
Storing complete log in c:\Users\Administrator\marcus\pip\tests\tests_cache\test_ws\pip-log.txt

-- created: -------------------
  pip-log.txt  (2599 bytes)
  tmp\pip-lw2nqo-build
          broken.py  (0 bytes)
          pip-egg-info
          setup.py  (1043 bytes)

----------------------------------------------------------------------

@msabramo
Contributor Author

From a quick look, I notice that the Windows system is using CP-1252 as its encoding. The test in question emits UTF-8, so that's probably why there is a UnicodeDecodeError. I wonder if the code in question can be forced to use UTF-8 rather than CP-1252.
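The `File "<string>", line 16` frame followed by `cp1252.py ... decode` in the Windows traceback suggests (an assumption, not confirmed in this thread) that the failing step is reading `setup.py` itself with the locale's default codec. The test package's `setup.py` is UTF-8 with a `# -*- coding: utf-8 -*-` cookie, and `ʏ` (U+028F) encodes as `0xCA 0x8F`; `0x8f` has no CP-1252 mapping. One standard-library way to force the right codec is `tokenize.open()`, which reads the coding cookie (or a UTF-8 BOM) before decoding. `SETUP_PY` below is a shortened, hypothetical stand-in for the real file.

```python
import os
import tempfile
import tokenize

# Hypothetical stand-in for BrokenEmitsUTF8/setup.py: UTF-8 with a coding cookie.
SETUP_PY = (
    "# -*- coding: utf-8 -*-\n"
    "print('Bj\u00f6rk Gu\u00f0mundsd\u00f3ttir [\u02c8kv\u028f\u00f0m\u028fnts]')\n"
)


def read_with_locale_default(path, encoding="cp1252"):
    """Mimic open() on a Windows machine whose preferred encoding is CP-1252."""
    with open(path, encoding=encoding) as f:
        return f.read()


def read_honouring_cookie(path):
    """tokenize.open() detects the PEP 263 coding cookie before decoding."""
    with tokenize.open(path) as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "setup.py")
        with open(path, "wb") as f:
            f.write(SETUP_PY.encode("utf-8"))
        try:
            read_with_locale_default(path)
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True  # byte 0x8f (from U+028F) is undefined in CP-1252
        return cp1252_failed, read_honouring_cookie(path)
```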

@srinu-dipl

I get this error when I run pip-freeze.txt:
collect2: error: ld returned 1 exit status

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1


Cleaning up...
Command /home/administrator/projects/rapidpro/env/bin/python -c "import setuptools, tokenize;__file__='/home/administrator/projects/rapidpro/env/build/gnureadline/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-21Niad-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/administrator/projects/rapidpro/env/include/site/python2.7 failed with error code 1 in /home/administrator/projects/rapidpro/env/build/gnureadline
Traceback (most recent call last):
  File "/home/administrator/projects/rapidpro/env/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/home/administrator/projects/rapidpro/env/local/lib/python2.7/site-packages/pip/__init__.py", line 185, in main
    return command.main(cmd_args)
  File "/home/administrator/projects/rapidpro/env/local/lib/python2.7/site-packages/pip/basecommand.py", line 161, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 66: ordinal not in range(128)

@Ivoz
Contributor

Ivoz commented Apr 9, 2015

@srinu-dipl please report your problem in a new issue. It's also helpful to include the command you entered and the versions of the software you're using (Python, pip, virtualenv, operating system).

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 4, 2019