UnicodeDecodeError #303

Closed
nedbat opened this issue May 5, 2014 · 17 comments
Labels
bug Something isn't working html

Comments

@nedbat
Owner

nedbat commented May 5, 2014

Originally reported by douglasbsf (Bitbucket: douglasbsf, GitHub: douglasbsf)


Traceback (most recent call last):
  File "/usr/local/bin/coverage", line 8, in <module>
    load_entry_point('coverage==3.7.1', 'console_scripts', 'coverage')()
  File "/Library/Python/2.7/site-packages/coverage/cmdline.py", line 721, in main
    status = CoverageScript().command_line(argv)
  File "/Library/Python/2.7/site-packages/coverage/cmdline.py", line 461, in command_line
    **report_args)
  File "/Library/Python/2.7/site-packages/coverage/control.py", line 662, in html_report
    return reporter.report(morfs)
  File "/Library/Python/2.7/site-packages/coverage/html.py", line 113, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/Library/Python/2.7/site-packages/coverage/report.py", line 84, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/Library/Python/2.7/site-packages/coverage/html.py", line 253, in html_file
    html = html.decode(encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3418: ordinal not in range(128)

@nedbat
Owner Author

nedbat commented May 5, 2014

Original comment by douglasbsf (Bitbucket: douglasbsf, GitHub: douglasbsf)


Solution: change line 253 of /coverage/html.py to html = html.decode(encoding, 'ignore')
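
For readers following along, here is a minimal sketch of what the 'ignore' error handler does. The byte string is a made-up sample, not the actual report HTML, and the variable names are illustrative only.

```python
# Hypothetical sample: UTF-8 bytes for "café" inside otherwise-ASCII HTML.
html = b"<span>caf\xc3\xa9</span>"

strict_error = None
try:
    html.decode("ascii")  # strict decoding is the default
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# 'ignore' silently drops every byte the codec cannot decode.
ignored = html.decode("ascii", "ignore")

print(strict_error)
print(ignored)
```

The report is generated, but the undecodable bytes disappear from the output without any marker.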

@nedbat
Owner Author

nedbat commented May 5, 2014

Can you please provide more detail? In particular, a reproducible test case would be appreciated. My guess is that you have a non-ASCII character in a comment in your source file, and that if you add a coding comment to the top, coverage will work fine:

# -*- coding: utf8 -*-
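
As a rough illustration of how such a coding comment is picked up, the sketch below uses tokenize.detect_encoding from the Python 3 standard library. This is an assumption for demonstration purposes, not necessarily how coverage 3.7.1 reads the declaration, and the helper name declared_encoding is hypothetical. Note that Python 3 defaults to UTF-8 when no comment is present, whereas Python 2 assumes ASCII.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the source encoding declared by a coding comment or BOM."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding

# Hypothetical sources, with and without a coding comment.
with_cookie = declared_encoding(b"# -*- coding: latin-1 -*-\nname = 'x'\n")
without_cookie = declared_encoding(b"name = 'x'\n")

print(with_cookie, without_cookie)
```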

@nedbat
Owner Author

nedbat commented May 5, 2014

Original comment by douglasbsf (Bitbucket: douglasbsf, GitHub: douglasbsf)


I tried adding the coding comment, but it didn't work here, so I went back to the solution I proposed above.
This problem occurred on Fedora 18 using Python 2.7.3

@nedbat
Owner Author

nedbat commented May 5, 2014

Original comment by douglasbsf (Bitbucket: douglasbsf, GitHub: douglasbsf)


Another detail: I'm running Django unit tests like this: coverage run --source='.' manage.py test client.tests .
With a simple program it works fine, like this: coverage run myprogram.py
The problem occurs only when I call "coverage html".

@nedbat
Owner Author

nedbat commented May 5, 2014

Is there any chance you could supply a reproducible test case?

@nedbat
Owner Author

nedbat commented May 5, 2014

Original comment by douglasbsf (Bitbucket: douglasbsf, GitHub: douglasbsf)


It's part of a private IPTV project, and it only works as a whole. I would like to share a test case, but I can't.
Thank you for your time; you are so helpful.

@nedbat
Owner Author

nedbat commented May 5, 2014

Issue #295 was marked as a duplicate of this issue.

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Robert Sussland (Bitbucket: rsussland, GitHub: rsussland)


I am having the same issue. There are no non-ASCII characters in any of the Python source files; however, the test suite I am running coverage on downloads files with Unicode file names and processes files with Unicode characters. I cannot send you my code as it is pulling data from an on-site database. The stack trace doesn't show which file triggered the error:


  File "/Users/rsussland/pop/bin/coverage", line 9, in <module>
    load_entry_point('coverage==4.0a0', 'console_scripts', 'coverage')()
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/cmdline.py", line 747, in main
    status = CoverageScript().command_line(argv)
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/cmdline.py", line 467, in command_line
    **report_args)
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/control.py", line 679, in html_report
    return reporter.report(morfs)
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/html.py", line 109, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/report.py", line 81, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/Users/rsussland/pop/lib/python2.7/site-packages/coverage-4.0a0-py2.7-macosx-10.6-intel.egg/coverage/html.py", line 241, in html_file
    html = html.decode(encoding)
  File "/Users/rsussland/pop/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 231299: invalid start byte

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Robert Sussland (Bitbucket: rsussland, GitHub: rsussland)


The workaround -- ignore errors -- works just fine for me. Also, both the original reporter and myself are on Macs.

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Ian Cordasco (Bitbucket: icordasc, GitHub: Unknown)


The problem with ignoring errors when encoding unicode in ascii is that you're going to lose data. I'm not certain what data is being processed here, but @rsussland, could you please put your traceback in a fenced code block, i.e., precede it with three backticks and a newline, and follow it with a newline and three backticks? That will make it much easier to read.

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Robert Sussland (Bitbucket: rsussland, GitHub: rsussland)


Made the change -- I understand that there may be some loss of data, but before I was getting no report at all because the HTML failed to generate.

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Ian Cordasco (Bitbucket: icordasc, GitHub: Unknown)


So I'm looking at this code, and I'm wondering why decode is even called. It seems to be a special case for Python 2.6 and 2.7 but the string API in those versions is different than Python 3. You can do 'foo bar bogus'.encode('ascii') with confidence on Python 2. I'm not sure how well it will work with xmlcharrefreplace, but I suspect that is possibly the only problem. I'm going to investigate a bit.
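
For context on xmlcharrefreplace, a small sketch under the assumption that the HTML is already text (code points). This error handler only applies when encoding text to bytes, so undecodable input bytes would still have to be decoded some other way first. The sample string is made up.

```python
# Hypothetical text containing two non-ASCII code points (é and ’).
text = u"caf\u00e9 \u2019"

# Characters that ASCII cannot represent become numeric character references.
ascii_html = text.encode("ascii", "xmlcharrefreplace")

print(ascii_html)
```

The result is pure ASCII that a browser renders with the original characters, so no data is lost at the encoding step.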

@nedbat
Owner Author

nedbat commented Jun 5, 2014

Original comment by Robert Sussland (Bitbucket: rsussland, GitHub: rsussland)


I'm not familiar enough with this code to determine whether the policy is to work with code points or bytestrings. Unless you import unicode_literals, 'foo bar bogus' is already an ASCII-encoded bytestring, so calling encode on it will first decode it with the ascii codec and then re-encode it with the same codec. If you intend to work with code points, then you do need to decode, but I don't see any of the standard patterns for doing that in the html.py file. For example, you are calling "with open" instead of "with codecs.open", so your other strings are already bytestrings and not code points. In that case there is no need to decode at all, but care must be taken when combining bytestrings of different encodings (they are all ASCII bytestrings).
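
To make the bytestring versus code point distinction concrete, here is a small sketch using io.open, which exists in Python 2.6+ and is the built-in open in Python 3; it plays the same role as the codecs.open pattern mentioned above. The file content is a made-up sample.

```python
import io
import os
import tempfile

# Write a small UTF-8 file to a temporary location (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(u"# caf\u00e9\n".encode("utf-8"))

# Binary mode yields an undecoded bytestring.
with io.open(path, "rb") as f:
    raw = f.read()

# Passing an encoding yields decoded text (code points).
with io.open(path, encoding="utf-8") as f:
    text = f.read()

os.remove(path)

print(len(raw), len(text))
```

The bytestring is one element longer than the text because the single code point é occupies two bytes in UTF-8.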

@nedbat
Owner Author

nedbat commented Jun 9, 2014

@rsussland The line of code in question is dealing with the HTML version of your source files. I find it very hard to believe that you have no non-ascii characters in your source file. Perhaps in a comment? A curly apostrophe? The data you download for your tests doesn't matter, that isn't part of the HTML report.

In Python 2.7, do this:

open("mysource.py").read().decode('ascii')

Does it succeed, or raise an exception? Also, it looks like your file is really large? Can you share any of the code with me?
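
If the decode check above fails, a sketch like the following can help locate the offending bytes. The helper name find_non_ascii and the demo file are hypothetical; for a real project you would call it on each source file in turn.

```python
import os
import tempfile

def find_non_ascii(path):
    """Return (line_number, column, byte_value) for each non-ASCII byte."""
    hits = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            # bytearray yields integers on both Python 2 and Python 3.
            for col, byte in enumerate(bytearray(line)):
                if byte > 0x7F:
                    hits.append((lineno, col, byte))
    return hits

# Demo on a throwaway file standing in for a real source file.
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(b"x = 1\n# caf\xc3\xa9\n")
hits = find_non_ascii(demo_path)
os.remove(demo_path)

for lineno, col, byte in hits:
    print("line %d, column %d: byte 0x%02x" % (lineno, col, byte))
```

Line numbers are 1-based and columns are 0-based byte offsets within the line.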

@nedbat
Owner Author

nedbat commented Jun 9, 2014

Sorry, I see that you say the file isn't in the error message. At the very least, I can add some information there so these problems are easier to diagnose while we decide on an approach.

@nedbat
Owner Author

nedbat commented Jun 9, 2014

OK, I added better error reporting in e5c3104996da (bb). Can you use it to provide some more information about your failure?
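
As a rough idea of what better error reporting could look like, the sketch below wraps the decode call so the failing file is named in the message. This is an assumption for illustration and not the content of commit e5c3104996da; SourceDecodeError, decode_source, and the file name client/models.py are all hypothetical.

```python
class SourceDecodeError(Exception):
    """Hypothetical exception type that names the offending file."""

def decode_source(data, encoding, filename):
    """Decode bytes strictly, reporting the file name on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        raise SourceDecodeError(
            "Couldn't decode %r as %s: %s" % (filename, encoding, exc)
        )

message = None
try:
    decode_source(b"caf\xc3\xa9", "ascii", "client/models.py")
except SourceDecodeError as err:
    message = str(err)

print(message)
```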

@nedbat
Owner Author

nedbat commented Jun 12, 2014

I've fixed this in 529fef3d32ab (bb)

Regardless of how a Python file has undecodable characters in it, we should prevent errors from stopping the HTML report.
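
One way to achieve that goal, sketched here as an assumption rather than as the actual change in 529fef3d32ab, is to decode with the 'replace' error handler, which substitutes U+FFFD for each undecodable byte so the damage stays visible in the report. The function name decode_leniently is hypothetical.

```python
def decode_leniently(data, encoding):
    """Decode bytes, substituting U+FFFD for anything undecodable."""
    return data.decode(encoding, "replace")

# Valid UTF-8 passes through unchanged.
clean = decode_leniently(b"caf\xc3\xa9", "utf-8")

# 0xf6 is not a valid UTF-8 start byte, so it becomes a replacement character.
damaged = decode_leniently(b"caf\xf6", "utf-8")

print(repr(clean), repr(damaged))
```

Unlike 'ignore', this keeps a marker where data was lost.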

@nedbat nedbat closed this as completed Jun 12, 2014
@nedbat nedbat added major bug Something isn't working html labels Jun 23, 2018