New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support --source-encoding option. #256

Merged
merged 4 commits into from Jun 3, 2018

Conversation

Projects
None yet
2 participants
@lisongmin
Contributor

lisongmin commented May 21, 2018

This PR add --source-encoding suggest in #148 .

Note that io.open will load file as unicode under python2, which may change the coding behavior.

@lisongmin lisongmin force-pushed the lisongmin:not-utf-8-encoding-support branch from b11c88a to a0df533 May 21, 2018

@codecov

This comment has been minimized.

codecov bot commented May 21, 2018

Codecov Report

Merging #256 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #256      +/-   ##
==========================================
+ Coverage   88.76%   88.83%   +0.06%     
==========================================
  Files          13       13              
  Lines        1478     1487       +9     
  Branches      267      268       +1     
==========================================
+ Hits         1312     1321       +9     
  Misses        108      108              
  Partials       58       58
Impacted Files Coverage Δ
gcovr/tests/test_gcovr.py 97.67% <100%> (+0.11%) ⬆️
gcovr/tests/test_gcov_parser.py 100% <100%> (ø) ⬆️
gcovr/html_generator.py 96.53% <100%> (+0.01%) ⬆️
gcovr/__main__.py 89.17% <100%> (+0.13%) ⬆️
gcovr/gcov.py 83.46% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17dfaf6...21ebbaa. Read the comment docs.

@latk

This comment has been minimized.

Member

latk commented May 21, 2018

This is fantastic! Thank you very much for looking into this. I'll review properly in the next couple of days.

BTW the Travis test failure was just a small style problem. To check for those before you commit you can run flake8 locally, e.g. python3 -m flake8 gcovr/.

If you'd like to work on other issues, please consider first making a comment on the corresponding issue or mentioning this in the chat room. I really appreciate your work, but some issues first need some discussion on the best design.

@lisongmin

This comment has been minimized.

Contributor

lisongmin commented May 21, 2018

Ok. Test should pass now.

I forgot run it locally. It is greate if there is a .flake8 file under the project root and autopep8 as well. modern editor with plugin may auto check if it is there(like vim, vs code).

Do you mind change the file contents to unicode in python2?

@lisongmin

This comment has been minimized.

Contributor

lisongmin commented May 21, 2018

And it is less tolerant than #156 because encode/decode error handler is replace but not surrogateescape for python2 compat. Any way, we cat get coverage even if encoding error. that is important.

@latk latk added the needs review label May 22, 2018

@latk

I've reviewed this PR more thoroughly. Encoding issues are difficult! I'll try to work on test cases with various encodings, and will not merge until there are some tests that work on all platforms.

@@ -235,7 +236,8 @@ def print_html_report(covdata, options):
if options.output is None:
sys.stdout.write(htmlString + '\n')
else:
OUTPUT = open(options.output, 'w')
OUTPUT = io.open(options.output, 'w', encoding=options.html_encoding,
errors='replace')

This comment has been minimized.

@latk

latk May 29, 2018

Member

for HTML output, setting errors='xmlcharrefreplace' might be more appropriate

options.add_argument(
'--source-encoding',
help="Override the declared source encoding. "
"Defaults to the system default encoding.",

This comment has been minimized.

@latk

latk May 29, 2018

Member

The --source-encoding and --html-encoding options should reference each other. Now that the source encoding can be specified explicitly, the --html-encoding help message is incorrect.
It might also be nice to show the system default encoding (available through locale.getpreferredencoding()).

This comment has been minimized.

@lisongmin

lisongmin May 29, 2018

Contributor

They should independency? the transition in python:

source -- read and convert to --> unicode(or str on python3) -- convert to utf-8 and write --> html(utf-8)

--html-encoding is show in broswer, and utf-8 is suitable for that.

@@ -87,7 +85,8 @@ def is_non_code(code):
#
def process_gcov_data(data_fname, covdata, source_fname, options, currdir=None):
logger = Logger(options.verbose)
INPUT = open(data_fname, "r")
INPUT = io.open(data_fname, "r", encoding=options.source_encoding,
errors='replace')

This comment has been minimized.

@latk

latk May 29, 2018

Member

I've been thinking about whether it is correct to decode the gcov files in one go. The problem is that the gcov tool is encoding transparent. The output files contain ASCII annotations and the original source file contents. So opening these files will only work under an ASCII-compatible encoding.

Perhaps this restriction should be mentioned in the --source-encoding help message. It would also be good to have relevant tests. I'll try to develop a test case in the course of this week.

This comment has been minimized.

@lisongmin

lisongmin May 29, 2018

Contributor

I tested it locally with non-ascii chars and output is correct before make this PR.

  • input with utf-8 and output with gbk
  • input with utf-8 and output with utf-8

This comment has been minimized.

@lisongmin

lisongmin May 29, 2018

Contributor

Which tests are you wanted to add? I can have a try although i'm not familiar to gcovr's test framwork.

if sys.version_info < (3, 0):
code = code.strip().translate(None, '}{')
else:
code = code.strip().translate(noncode_mapper)

This comment has been minimized.

@latk

latk May 29, 2018

Member

I was a bit confused by this change, but apparently unicode.translate() from Python 2.7 behaves like str.translate() in Python 3. It might be good to make that requirement explicit, e.g. by adding a six dependency and:

import six
...

def is_non_code(code):
    assert isinstance(code, six.text_type)
    ...

This comment has been minimized.

@lisongmin

lisongmin May 29, 2018

Contributor

io.open returns unicode on python2 and str on python3

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.

This comment has been minimized.

@lisongmin

lisongmin May 29, 2018

Contributor

As we can ensure io.open return unicode, adding additional module six is not necessary in my opinion.

lisongmin and others added some commits May 29, 2018

add encoding tests
This checks --source-encoding for utf8 and cp1252 (iso-8895-1, latin-1),
and --html-encoding for utf8, cp1252, and iso-8859-15 (latin-9).
Note that utf8 is the default for --html-encoding.

Untested: treatment of encoding errors when reading gcov files.
@latk

latk approved these changes Jun 3, 2018

I have updated the PR with tests for the encoding behaviour, and a small documentation improvement. I'll do the merge now. Sorry that it took so long!

@latk latk merged commit ff4c43d into gcovr:master Jun 3, 2018

4 checks passed

codecov/patch 100% of diff hit (target 88.76%)
Details
codecov/project 88.83% (+0.06%) compared to 17dfaf6
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@latk latk removed the needs review label Jun 3, 2018

@latk latk referenced this pull request Jun 3, 2018

Closed

Fix unicode exceptions #156

@lisongmin lisongmin deleted the lisongmin:not-utf-8-encoding-support branch Jun 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment