Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support --source-encoding option. #256

Merged
merged 4 commits into from Jun 3, 2018

Conversation

lisongmin
Copy link
Contributor

This PR add --source-encoding suggest in #148 .

Note that io.open will load file as unicode under python2, which may change the coding behavior.

@codecov
Copy link

codecov bot commented May 21, 2018

Codecov Report

Merging #256 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #256      +/-   ##
==========================================
+ Coverage   88.76%   88.83%   +0.06%     
==========================================
  Files          13       13              
  Lines        1478     1487       +9     
  Branches      267      268       +1     
==========================================
+ Hits         1312     1321       +9     
  Misses        108      108              
  Partials       58       58
Impacted Files Coverage Δ
gcovr/tests/test_gcovr.py 97.67% <100%> (+0.11%) ⬆️
gcovr/tests/test_gcov_parser.py 100% <100%> (ø) ⬆️
gcovr/html_generator.py 96.53% <100%> (+0.01%) ⬆️
gcovr/__main__.py 89.17% <100%> (+0.13%) ⬆️
gcovr/gcov.py 83.46% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17dfaf6...21ebbaa. Read the comment docs.

@latk
Copy link
Member

latk commented May 21, 2018

This is fantastic! Thank you very much for looking into this. I'll review properly in the next couple of days.

BTW the Travis test failure was just a small style problem. To check for those before you commit you can run flake8 locally, e.g. python3 -m flake8 gcovr/.

If you'd like to work on other issues, please consider first making a comment on the corresponding issue or mentioning this in the chat room. I really appreciate your work, but some issues first need some discussion on the best design.

@lisongmin
Copy link
Contributor Author

Ok. Test should pass now.

I forgot run it locally. It is greate if there is a .flake8 file under the project root and autopep8 as well. modern editor with plugin may auto check if it is there(like vim, vs code).

Do you mind change the file contents to unicode in python2?

@lisongmin
Copy link
Contributor Author

And it is less tolerant than #156 because encode/decode error handler is replace but not surrogateescape for python2 compat. Any way, we cat get coverage even if encoding error. that is important.

Copy link
Member

@latk latk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this PR more thoroughly. Encoding issues are difficult! I'll try to work on test cases with various encodings, and will not merge until there are some tests that work on all platforms.

@@ -235,7 +236,8 @@ def print_html_report(covdata, options):
if options.output is None:
sys.stdout.write(htmlString + '\n')
else:
OUTPUT = open(options.output, 'w')
OUTPUT = io.open(options.output, 'w', encoding=options.html_encoding,
errors='replace')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for HTML output, setting errors='xmlcharrefreplace' might be more appropriate

options.add_argument(
'--source-encoding',
help="Override the declared source encoding. "
"Defaults to the system default encoding.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --source-encoding and --html-encoding options should reference each other. Now that the source encoding can be specified explicitly, the --html-encoding help message is incorrect.
It might also be nice to show the system default encoding (available through locale.getpreferredencoding()).

Copy link
Contributor Author

@lisongmin lisongmin May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should independency? the transition in python:

source -- read and convert to --> unicode(or str on python3) -- convert to utf-8 and write --> html(utf-8)

--html-encoding is show in broswer, and utf-8 is suitable for that.

@@ -87,7 +85,8 @@ def is_non_code(code):
#
def process_gcov_data(data_fname, covdata, source_fname, options, currdir=None):
logger = Logger(options.verbose)
INPUT = open(data_fname, "r")
INPUT = io.open(data_fname, "r", encoding=options.source_encoding,
errors='replace')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about whether it is correct to decode the gcov files in one go. The problem is that the gcov tool is encoding transparent. The output files contain ASCII annotations and the original source file contents. So opening these files will only work under an ASCII-compatible encoding.

Perhaps this restriction should be mentioned in the --source-encoding help message. It would also be good to have relevant tests. I'll try to develop a test case in the course of this week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it locally with non-ascii chars and output is correct before make this PR.

  • input with utf-8 and output with gbk
  • input with utf-8 and output with utf-8

Copy link
Contributor Author

@lisongmin lisongmin May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests are you wanted to add? I can have a try although i'm not familiar to gcovr's test framwork.

if sys.version_info < (3, 0):
code = code.strip().translate(None, '}{')
else:
code = code.strip().translate(noncode_mapper)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused by this change, but apparently unicode.translate() from Python 2.7 behaves like str.translate() in Python 3. It might be good to make that requirement explicit, e.g. by adding a six dependency and:

import six
...

def is_non_code(code):
    assert isinstance(code, six.text_type)
    ...

Copy link
Contributor Author

@lisongmin lisongmin May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.open returns unicode on python2 and str on python3

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we can ensure io.open return unicode, adding additional module six is not necessary in my opinion.

lisongmin and others added 3 commits May 29, 2018 22:07
This checks --source-encoding for utf8 and cp1252 (iso-8895-1, latin-1),
and --html-encoding for utf8, cp1252, and iso-8859-15 (latin-9).
Note that utf8 is the default for --html-encoding.

Untested: treatment of encoding errors when reading gcov files.
Copy link
Member

@latk latk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR with tests for the encoding behaviour, and a small documentation improvement. I'll do the merge now. Sorry that it took so long!

@latk latk merged commit ff4c43d into gcovr:master Jun 3, 2018
@latk latk removed the needs review label Jun 3, 2018
@latk latk mentioned this pull request Jun 3, 2018
@lisongmin lisongmin deleted the not-utf-8-encoding-support branch June 4, 2018 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants