New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor gcov parsing code, support upcoming GCC 8 changes #228

Merged
merged 7 commits into from Mar 6, 2018

Conversation

Projects
None yet
3 participants
@latk
Member

latk commented Feb 25, 2018

This PR refactors the gcovr.gcov module. There are a lot of changes. In general, I tried to extract smaller, more meaningful functions, and reduce repeated code. Most functions should now fit into one or two screens.

  • Finding the source file from a .gcov file is now done by guess_source_file_name(), guess_source_file_name_via_aliases() and guess_source_file_name_heuristics(). The code is now clearer and less deeply nested.

  • The actual .gcov file parsing code was gathered into a GcovParser class. This class carries the parsing state from one line to the next. Previously, all of these members were variables around a huge loop.

    • parse_all_lines() runs the parser.
    • parse_line() parses a single line in a gcov report, and keeps track of relevant state.
    • parse_code_line() parses the normal 4: 7: foo() style gcov lines (execution count : line number : source code).
    • parse_tag_line() handles extra information, such as branch statistics and GCC 8 specialization sections.
    • parse_exclusion_marker assists parse_line() with markers like GCOV_EXCL_LINE etc.
    • check_unclosed_exclusions() and check_unrecognized_lines() emit warnings after parsing has completed
    • update_coverage() writes the data into a CoverageData object.
  • find_potential_working_directories_via_objdir() implements heuristics to suggest a gcov working directory.

  • run_gcov_and_process_files() runs the external gcov tool, collects the output, and processes the generated files. select_gcov_files_from_stdout() assists with finding the output files.

  • apply_filter_include_exclude() extracts the common filter application. It receives a (relative) filename, a list of filters that must match, and a list of exclusion filters that must not match. As filters are now processed in a single place, this will make them easier to improve via #191.

  • A gcovr.utils.Logger class has been created. This helps to have a more uniform message output format, and avoids countless if: options.verbose: ... branches. The available methods are:

    • verbose_msg() basically just calls if verbose: self.msg().
    • msg() writes info to STDOUT.
    • warn() writes a (WARNING) to STDERR.
    • error() writes an (ERROR) to STDERR.

    I suppose this can be improved/refined in the future.

  • The CoverageData class was simplified.

    • As an optimization, the constructor used to steal data structures from the gcov parsing code. Now, it maintains its own data structures that are only modified through the update() method. This drastically simplifies the code at no noticeable loss of performance.
    • smaller functions such as find_consecutive_ranges() were extracted.
    • this could be simplified even further with the collections.Counter class, but that would definitively break Python 2.6 compatibility.

The changes to support the upcoming GCC 8 were made in response to #226. The new gcov report format gets new sections that report coverage individually for each template or macro specialization. This PR does not support them, but merely strips away the section delimiters. The result is that branches are parsed for such lines, but I'm not sure how coverage from multiple sections on the same line is combined.

To prevent future problems like this, the parsing code was made more “robust”. The GcovParser.parse_all_lines() method catches any exceptions during parsing. The exception and the problematic line are stored. After parsing, the check_unrecognized_lines() method can be called to report these lines and exceptions. It will emit a warning on STDERR with all the offending lines, and also points to the Github issues. Previously, this message would only be shown in verbose mode, now it is shown always. Any exceptions are also printed as warnings (but I didn't find out how to include the stack trace). Afterwards, gcovr continues as usual and will not exit with an error. The implications of this design are:

  • If there is a parsing problem, there will always be a message on the console.
  • This will likely go unnoticed if gcovr is used automated, e.g. on a CI server, as gcovr will continue successfully.
  • There is one message per parsed file, which may spam the console for huge projects. There is currently no deduplication.
  • The error message points directly to Github rather than vaguely at “the gcovr developers”, which should make it easier for users to report problems.
  • The exceptions are reported separately from the lines that caused them, which makes it more difficult to act on a bug report.

So there's still a lot that can be improved with regards to gcov parsing and logging, but this is a start.

latk added some commits Feb 23, 2018

gcov: furher refactoring
 - extract filter handling into apply_filter_include_exclude() function
 - split up process_datafile() function
simplify CoverageData
 - remove insignificant optimizations in constructor
 - extract find_consecutive_ranges()
gcov: support GCC 8 gcov format
See #226 for background.

At this opportunity, the parsing was made a bit more robust.

  - Any exceptions during parsing are swallowed.
    Parsing will always succeed, but may be garbage.
  - After parsing, failed lines and any exceptions are summarized.
  - These errors are shown always, not just in verbose mode.
  - The error message links directly to the Github issues.
@codecov

This comment has been minimized.

codecov bot commented Feb 25, 2018

Codecov Report

Merging #228 into master will increase coverage by 3.5%.
The diff coverage is 84.35%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #228     +/-   ##
========================================
+ Coverage      84%   87.5%   +3.5%     
========================================
  Files          11      12      +1     
  Lines        1194    1337    +143     
  Branches      248     240      -8     
========================================
+ Hits         1003    1170    +167     
+ Misses        131     116     -15     
+ Partials       60      51      -9
Impacted Files Coverage Δ
gcovr/utils.py 66.42% <100%> (+4.12%) ⬆️
gcovr/tests/test_gcov_parser.py 100% <100%> (ø)
gcovr/gcov.py 81.96% <78.39%> (+12.91%) ⬆️
gcovr/__main__.py 88.54% <88.88%> (+0.08%) ⬆️
gcovr/coverage.py 98.57% <97.36%> (-1.43%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12e0ec0...6d8c00c. Read the comment docs.

@hugovk

This comment has been minimized.

hugovk commented Feb 25, 2018

Is Python 2.6 still needed? It's been EOL for over 4 years and is no longer receiving security updates.

It's also little used. Here's the pip installs for gcovr from PyPI for last month:

python_version percent download_count
2.7 67.45% 4,356
3.4 16.80% 1,085
3.5 13.16% 850
2.6 1.39% 90
3.6 1.18% 76
3.7 0.02% 1

Source: pypinfo --start-date -55 --end-date -25 --percent --pip --markdown gcovr pyversion

(Python 3.3 is also EOL not listed here, how about dropping that too?)

gnomesysadmins pushed a commit to GNOME/nautilus that referenced this pull request Feb 26, 2018

CI: disable coverage report
gcovr is broken with GCC 8 due to changes in Gcov report format. Revert
when gcovr/gcovr#228 is merged.

gnomesysadmins pushed a commit to GNOME/nautilus that referenced this pull request Feb 26, 2018

CI: disable coverage report
gcovr is broken with GCC 8 due to changes in Gcov report format. Revert
when gcovr/gcovr#228 is merged.
@@ -228,3 +228,43 @@ def build_filter(regex):
return re.compile(re.escape(os.getcwd() + "\\") + regex)
else:
return re.compile(os.path.realpath(regex))
class Logger(object):

This comment has been minimized.

@mayeut

mayeut Feb 27, 2018

Contributor

Why use a custom logger rather than python logging module ?

This comment has been minimized.

@latk

latk Feb 27, 2018

Member

Good point, I didn't know about logging previously. I'm open to switching to a “proper” logging framework in the future. Right now a low-ceremony custom logger is the least invasive change, so I would prefer to keep this part as it is.

args, kwargs: str.format arguments
"""
pattern = pattern + "\n"
sys.stdout.write(pattern.format(*args, **kwargs))

This comment has been minimized.

@mayeut

mayeut Feb 27, 2018

Contributor

if no output file has been specified then the coverage output will be output on stdout as well. Is that a problem ?

This comment has been minimized.

@latk

latk Feb 27, 2018

Member

I just extracted the existing sys.stderr.write() and print() calls into this class. So each message continues to go were it went previously: warnings/errors to stderr, and --verbose messages to stdout.

Arguably that's wrong (for exactly your reason) and should be redesigned. However, this PR already has more than enough changes … (sorry about that!)

This comment has been minimized.

@mayeut

mayeut Feb 28, 2018

Contributor

If there are no regressions, that's fine by me. An issue shall probably be opened regarding this, and, as you said, this is another topic.

for line in lines:
try:
self.parse_line(line, exclude_unreachable_branches)
except Exception as ex:

This comment has been minimized.

@mayeut

mayeut Feb 27, 2018

Contributor

This shall probably be an opt-in behavior. I'd rather have gcovr fail with an error message and reporting we can opt-in to ignore those by adding flag --ignore-parsing-errors or something like that.

This comment has been minimized.

@latk

latk Feb 27, 2018

Member

You're absolutely right. I'll add a commit to this PR. Thank you for spotting this issue!

This comment has been minimized.

@latk

latk Mar 3, 2018

Member

Ok, I've added a --gcov-ignore-parse-errors flag that is default-off.

If the flag is enabled, gcovr will

  • print all offending lines,
  • print all caught exceptions,
  • and continue processing.

If the flag is not enabled (the default), gcovr does not continue but

  • also prints offending lines and caught exceptions,
  • then displays an error message that the flag can be enabled,
  • then rethrows the first exception if one was caught in order to display its traceback
  • or sys.exit(1) if no exceptions were caught.

In all cases, the errors are processed only after a the complete file has been parsed.

don't --gcov-ignore-parse-errors by default
The new `--gcov-ignore-parse-errors` flag controls whether the
`gcovr.gcov.GcovParser.parse_all_lines()` method throws on errors.

If the flag is enabled, gcovr will

 - print all offending lines,
 - print all caught exceptions,
 - and continue processing.

If the flag is not enabled (the default), gcovr does not continue but

 - displays an error message that the flag can be enabled,
 - then rethrows the first exception if one was caught
 - or `sys.exit(1)` if no exceptions were caught.

In all cases, the errors are processed only *after* a the complete file
has been parsed.

To implement this change, the `check_*()` methods of `GcovParser` were
pulled into the `parse_all_lines()` method.

@latk latk merged commit 6d8c00c into gcovr:master Mar 6, 2018

4 checks passed

codecov/patch 84.35% of diff hit (target 84%)
Details
codecov/project 87.5% (+3.5%) compared to 12e0ec0
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@latk latk deleted the latk:simplify-gcov branch Mar 6, 2018

@latk latk removed the needs review label Mar 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment