Differential coverage, date and owner binning - plus a few minor enhancements and refactoring (was "Diffcov initial") #86

Closed
wants to merge 81 commits into from

@henry2cox (Collaborator) commented Jul 31, 2020

This pull request contains an implementation of differential coverage plus date- and owner-binning.
A paper which describes the approach and the development methodology that they enable can be found at https://arxiv.org/abs/2008.07947.
The basic idea is to take advantage of history to identify un-exercised code which has been recently added or changed, as well as unchanged code which is no longer exercised.
The two features - differential and binning - are orthogonal in that you can use either, both, or neither. With neither feature enabled, the result is the same as today's 'vanilla' lcov.
The new features were added to the 'gendiffcov' script - which is a fork of the original 'genhtml'. I did this initially so that I could run both scripts for testing, and then left it this way. 'genhtml' is a subset of 'gendiffcov' now - so it isn't necessary to keep both.
There are some other changes to extract common functionality into a perl module, as well as some 'filtering' to make branch coverage statistics usable in our framework by ignoring branch data for lines which do not seem to have any conditionals.
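The branch-data filtering mentioned above could be approximated by a line-level heuristic. This hypothetical Python sketch is not the filter in this PR: it assumes that a regex over C/C++ source text decides whether a line contains a conditional, and it drops branch records for lines that do not. The names `line_has_conditional` and `filter_branch_data` are illustrative only.

```python
import re

# Hypothetical heuristic: tokens suggesting that a line can actually branch.
# The real filter in this PR may use different rules.
CONDITIONAL_RE = re.compile(r"\b(if|while|for|switch|case)\b|\?|&&|\|\|")


def line_has_conditional(source_line: str) -> bool:
    """Return True if the source line appears to contain a conditional."""
    return CONDITIONAL_RE.search(source_line) is not None


def filter_branch_data(source_lines, branch_data):
    """Keep branch records only for lines that look like they contain a conditional.

    source_lines: list of source text lines (index 0 is line 1).
    branch_data:  dict mapping 1-based line number -> list of branch hit counts.
    """
    kept = {}
    for line_no, branches in branch_data.items():
        in_range = 1 <= line_no <= len(source_lines)
        if in_range and line_has_conditional(source_lines[line_no - 1]):
            kept[line_no] = branches
    return kept


if __name__ == "__main__":
    src = ["int f(int x) {", "  if (x > 0 && x < 10)", "    return 1;", "  return g(x);", "}"]
    # Branch records on lines 2 and 4; line 4 has no visible conditional.
    branches = {2: [1, 0, 1, 1], 4: [3, 0]}
    print(filter_branch_data(src, branches))  # {2: [1, 0, 1, 1]}
```

A purely textual check like this will miss branches hidden in macros and can be fooled by comments or string literals, which is one reason a production filter would likely be more careful.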

Additional comment added 23 Sept 2020:
A good way to understand the differential coverage and date/owner binning features is to run the testcases.
Clone the repo and unpack it, then run:
cd .../lcov/tests/gendiffcov/simple
make
Now open the 'index.html' file in each of the generated subdirectories.
They test the various combinations of differential vs. legacy coverage, with or without binning, as well as some options that control which data is displayed. Note that all these tests use the same tiny example from the above paper, so features related to multiple source directories and multiple source files per directory are not exercised.

@oberpar (Contributor) commented Aug 11, 2020

Thank you for your contribution to LCOV. Differential coverage analysis support has been a long-standing TODO for LCOV and from a first look, your implementation seems very feature-rich.

That said, I do think that this patch set will need a lot of work before it can be considered for inclusion into the LCOV mainline code. I think the effort would be worth it, though: your internal users would gain potential improvements to usability and stability, as well as long-term support via the upstream integration, and other LCOV users could then make use of this new functionality.

I can offer to support you with my review feedback. Below I've listed some areas that need changes.

1. Split up commits

The amount of code contained in your patch set makes it very hard to handle from a reviewer's point of view. Please split it up into smaller chunks, where each one relates to one functional aspect.

Here are some functional aspects that deserve separate commits:

  • author binning
  • date binning
  • differential coverage
  • filtering

Additional commits may be required for staging (e.g. for splitting out the renaming required when introducing package concepts), or introducing prerequisite functionality.

2. Integration

Your current implementation duplicates a lot of genhtml and genpng code, and unnecessarily introduces new executables for functionality that can be integrated into existing tools. I would suggest merging the gendiffcov tool into genhtml, and the gendiffpng tool into genpng.

I would also ask you to minimize changes to the existing logic as much as possible. This will help keep the code readable and maintainable, and reduce the chance of introducing regressions into the existing functionality.

As an example, take the function write_source_line() in gendiffcov/genhtml: gendiffcov generates the same type of output line, with some new and some changed components. Your implementation looks like a major rewrite of this function, which makes it difficult for readers to relate the new lines to anything done in previous commits, and which increases the chance of new bugs.

A better approach would be to add, within the existing function logic, calls to new subroutines that encapsulate the logic for generating the new pieces. The original logic should be kept intact when possible.

3. Coding style

I understand that the coding style used in large portions of LCOV is somewhat special but mixing different styles makes maintaining code an unnecessarily complex task.

Here are some rules that your code should follow:

  • use words_separated_by_underscore instead of camelCase
  • use tab indentation - a tab expands to 8 blanks
  • if the indentation makes your code move too far to the right, this is an indication that the code is too complex and should be split up into multiple functions
  • put opening braces for subroutines on the next line after the declaration
  • function prototypes should declare parameters (sub func($$))
  • add comments to function declarations that describe their purpose, and if they work on complex datastructures (e.g. hashes) then also describe their layout
  • add comments before complex code blocks
  • add newlines for readability after blocks of variable declarations, blocks of related code, etc.
  • remove all non-functional whitespace changes

Also, each commit must be accompanied by a descriptive commit message including a Signed-off-by line. See also the Contribution guidelines.

4. Usability

I find it very difficult to memorize the TLAs (three-letter acronyms) your implementation assigns to each class of coverage state change, and to decipher their meaning. Also, the color coding seems to add very little extra information.

I'd like to suggest labeling the lines, functions, and branches in a different way that can be deciphered more intuitively. Instead of giving a hard-to-memorize TLA to each state change, why not spell it out?

As an example, based on what I think the TLAs mean, a line in a coverage set can be in one of the following states:

  • Does not exist
  • Exists but is not covered
  • Exists and is covered
  • Line is excluded (though I'm not sure how you detect excluded lines)

You could simply assign an easy to memorize symbol to each of these states, e.g.:

  • #: Does not exist
  • 0: Exists but is not covered
  • 1: Exists and is covered
  • X: Line is excluded

Then you can represent the state changes from base to current via a transition such as base -> current. This would give the following TLA mapping:

  • Coverage rate changes

    • UBC: 0->0
    • GBC: 0->1
    • LBC: 1->0
    • CBC: 1->1
  • Insertions

    • UNC: #->0
    • GNC: #->1
  • Deletions

    • DUB: 0->#
    • DCB: 1->#
  • Removed exclusions

    • UIC: X->0
    • GIC: X->1
  • Added exclusions

    • EUB: 0->X
    • ECB: 1->X

A similar, abbreviated version could be used for representing branch state changes such as: +- ++ -- +# #+ +X
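The proposed mapping can be made concrete as a lookup table keyed by the base and current state. The following is a minimal Python sketch, not code from this PR or from LCOV: the state symbols and TLAs follow the proposal above, while `line_state`, `classify` and `TLA_BY_TRANSITION` are hypothetical names.

```python
# States as proposed above:
#   "#" = does not exist, "0" = exists but not covered,
#   "1" = exists and covered, "X" = excluded.
TLA_BY_TRANSITION = {
    ("0", "0"): "UBC", ("0", "1"): "GBC",   # coverage rate changes
    ("1", "0"): "LBC", ("1", "1"): "CBC",
    ("#", "0"): "UNC", ("#", "1"): "GNC",   # insertions
    ("0", "#"): "DUB", ("1", "#"): "DCB",   # deletions
    ("X", "0"): "UIC", ("X", "1"): "GIC",   # removed exclusions
    ("0", "X"): "EUB", ("1", "X"): "ECB",   # added exclusions
}


def line_state(exists: bool, excluded: bool, hit_count: int) -> str:
    """Map raw per-line data (hypothetical inputs) to one of the four states."""
    if not exists:
        return "#"
    if excluded:
        return "X"
    return "1" if hit_count > 0 else "0"


def classify(base: str, current: str):
    """Return the TLA for a base -> current transition, or None when the
    proposal assigns no category (e.g. "#" -> "#" or "X" -> "X")."""
    return TLA_BY_TRANSITION.get((base, current))
```

For example, `classify(line_state(True, False, 5), line_state(True, False, 0))` returns "LBC": a line that was hit in the baseline and is no longer hit. Keeping the mapping as data rather than nested conditionals makes it easy to print as a legend.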

Regarding color coding, I think it would make the most sense to use colors only to indicate that an element needs to be looked at, in order of priority:

  • red: anything that changed to 0 with the exception of 0->0
  • yellow: 0->0
  • blue: anything else
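The proposed color priority reduces to two comparisons on the same base and current state symbols ("#", "0", "1", "X"). A hypothetical Python sketch; the function name `review_color` is illustrative only:

```python
def review_color(base: str, current: str) -> str:
    """Pick a highlight color for a base -> current state transition,
    following the priority proposed above."""
    if current == "0" and base != "0":
        return "red"      # newly uncovered: LBC (1->0), UNC (#->0), UIC (X->0)
    if base == "0" and current == "0":
        return "yellow"   # still uncovered: UBC
    return "blue"         # everything else
```

Because red is checked first, any transition into the uncovered state is flagged with the highest priority, and only 0->0 falls through to yellow.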

5. Miscellaneous

Other observations/feedback:

  • Is there really a need for DateTime/Format/W3CDTF.pm? Requiring new modules always causes fall-out with users
  • Remove debugging outputs and comments
  • --show-details seems to be deactivated in gendiffcov
  • Do not change the line coloring when no base-file is specified
  • Create an lcov directory in $LIB_DIR when installing and put all lcov library installables there
  • Use a plug-in concept for the annotation scripts, e.g. --annotate git => uses script in /usr/lib/lcov/annotate/git
  • Scripts should check any non-standard tooling that is required to function (e.g. p4) and abort with a dedicated error message if those are missing
  • gendiffcov emits warnings when combining -s or --legend with branch coverage
  • Add more tests
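The plug-in and tool-check suggestions could be combined roughly as follows. This Python sketch is hypothetical: the directory comes from the example above, and `REQUIRED_TOOLS`, `PluginError` and `resolve_annotate_plugin` are illustrative names, not LCOV options or APIs.

```python
import os
import shutil

# Directory taken from the review comment's example; not a path that
# LCOV is known to use.
ANNOTATE_PLUGIN_DIR = "/usr/lib/lcov/annotate"

# Hypothetical table of external tools each plug-in needs.
REQUIRED_TOOLS = {"git": ["git"], "p4": ["p4"]}


class PluginError(Exception):
    """Raised with a dedicated message when a plug-in cannot be used."""


def resolve_annotate_plugin(name: str, plugin_dir: str = ANNOTATE_PLUGIN_DIR) -> str:
    """Map '--annotate NAME' to an executable script in plugin_dir, failing
    early if the script or any tool it requires is missing."""
    if not name or name in (".", "..") or os.sep in name:
        raise PluginError(f"invalid annotate plug-in name: {name!r}")
    script = os.path.join(plugin_dir, name)
    if not (os.path.isfile(script) and os.access(script, os.X_OK)):
        raise PluginError(f"annotate plug-in {name!r} not found at {script}")
    missing = [t for t in REQUIRED_TOOLS.get(name, []) if shutil.which(t) is None]
    if missing:
        raise PluginError(
            f"annotate plug-in {name!r} requires missing tool(s): {', '.join(missing)}"
        )
    return script
```

Checking for the tools up front means the user gets one clear error instead of a failure from deep inside the annotation step.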

@henry2cox (Collaborator, Author) commented Aug 11, 2020 via email

@henry2cox (Collaborator, Author)

When fixing an issue that one of my users discovered, I noticed that I had forgotten to add/push one of the key files needed by the 'test/gendiffcov/simple' testcase (to exercise the 'annotate' code, without requiring an actual P4 or git repo).

@henry2cox (Collaborator, Author)

I just submitted some more changes and fixes. In the process, I discovered that I had somehow failed to submit some lines in some files (not sure how that happened). I checked that things are complete (in several different ways) this time.

With respect to the earlier comments:

Is there really a need for DateTime/Format/W3CDTF.pm? Requiring new modules always causes fall-out with users

I use the W3CDTF package to parse timestamps returned from 'git' and 'p4' in the annotate script.
It is possible to remove this use from the gendiffcov (genhtml) source - but then we push onto the user the burden of handling the common case: translating from the revision control date format to age in days.
I think it is more reasonable to put that common functionality in a central location.
It IS possible to load the module selectively, so as not to error out when the module is missing unless it is actually required by the annotate feature (that is: do not emit an error until we need the package and cannot find it).
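For illustration, the conversion from a revision-control timestamp to an age in days, and from there to a date bin, might look like this. It is a Python sketch under stated assumptions: it uses the standard library's `datetime.fromisoformat` instead of DateTime::Format::W3CDTF, so it only accepts full timestamps with a UTC offset (not reduced-precision W3CDTF forms such as "2020-07"), and the bin cutoffs and function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence

# Hypothetical cutoffs (in days); genhtml's real defaults may differ.
DEFAULT_CUTOFFS = (7, 30, 180)


def age_in_days(timestamp: str, now: Optional[datetime] = None) -> int:
    """Whole days between a timestamp such as '2020-07-31T14:05:00-07:00'
    (or one ending in 'Z') and 'now' (default: current UTC time)."""
    if timestamp.endswith("Z"):
        # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
        timestamp = timestamp[:-1] + "+00:00"
    when = datetime.fromisoformat(timestamp)
    if when.tzinfo is None:
        raise ValueError(f"timestamp lacks a UTC offset: {timestamp!r}")
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - when).days


def date_bin(age: int, cutoffs: Sequence[int] = DEFAULT_CUTOFFS) -> str:
    """Label the bin an age falls into; cutoffs must be non-empty and ascending."""
    lower = None
    for upper in cutoffs:
        if age <= upper:
            return f"[0..{upper}] days" if lower is None else f"({lower}..{upper}] days"
        lower = upper
    return f"({lower}..) days"
```

For example, with `now = datetime(2020, 9, 23, tzinfo=timezone.utc)`, `age_in_days("2020-09-01T12:00:00Z", now)` is 21 and `date_bin(21)` is "(7..30] days". Passing `now` explicitly keeps a whole report consistent even if it takes a while to generate.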

--show-details seems to be deactivated in gendiffcov

Fixed. '--show-details' also puts entries into the corresponding TLA column for 'hit' lines or branches (i.e., the GNC, GIC, GBC and CBC columns).

Do not change the line coloring when no base-file is specified

Fixed.

Use a plug-in concept for the annotation scripts, e.g. --annotate git => uses script in /usr/lib/lcov/annotate/git

I do not think I understand this request. Could you elaborate?

Scripts should check any non-standard tooling that is required to function (e.g. p4) and abort with a dedicated error message if those are missing

The 'git' or 'p4' or other revision control system interface is called from within the user's 'annotate' script. I provide a couple of samples - but the expectation is that many users will have to write their own (using these as a model). There is no way for me to know what other dependencies the user script may have.
The sample scripts only work in very simple build environments where the revision control layout and the build layout are identical (no renaming or links); my experience is that almost all non-trivial projects are more complicated than that.

gendiffcov emits warnings when combining -s or --legend with branch coverage

I could not reproduce the issues. The option seems to work in my sandbox.

Add more tests

Is there an interface that we can use to generate .info files for the Perl tests?
That way, we could run the testsuite - and check the differential coverage :-)
(We do exactly this for Java internally - but I haven't done anything for Perl yet.
My intent is to add the Java support scripts to the 'scripts' library directory, though.)

I hope this helps.

Henry

@henry2cox (Collaborator, Author)

Just pushed some additional functionality.
Now generating 'project summary tables' using date or owner bin as the primary key.
Tables are created at both the top-level and directory level.
This feature is quite useful in project status or project review meetings - to see what has been worked on lately, who is working on what, and whether coverage holes remain that need to be addressed before the next release.
(This feature was requested by one of my internal users - and has turned out to be quite popular with the others.)
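A minimal sketch of how such summary tables could be aggregated, assuming each line has already been assigned a directory, a bin key (a date-bin label or an owner name) and a TLA category. The record layout, column order and function names are hypothetical, not the PR's data structures.

```python
from collections import Counter, defaultdict

# Hypothetical column order for the printed table.
COLUMNS = ("UNC", "LBC", "UIC", "UBC", "GNC", "GBC", "GIC", "CBC")


def summary_tables(records):
    """Aggregate (directory, bin_key, tla) records.

    Returns (top_level, per_directory):
      top_level:     bin_key -> Counter mapping TLA -> line count
      per_directory: directory -> bin_key -> Counter mapping TLA -> line count
    """
    top_level = defaultdict(Counter)
    per_directory = defaultdict(lambda: defaultdict(Counter))
    for directory, bin_key, tla in records:
        top_level[bin_key][tla] += 1
        per_directory[directory][bin_key][tla] += 1
    return top_level, per_directory


def print_table(table, columns=COLUMNS):
    """Print one row per bin key and one column per TLA (missing counts print as 0)."""
    print("bin".ljust(16) + "".join(c.rjust(6) for c in columns))
    for bin_key in sorted(table):
        counts = table[bin_key]
        print(str(bin_key).ljust(16) + "".join(str(counts[c]).rjust(6) for c in columns))
```

With owners as bin keys, `print_table(top_level)` gives the project-level table and `print_table(per_directory["src/a"])` the table for one directory; a single pass over the records fills both levels.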

@henry2cox (Collaborator, Author)

...and pushed another set of changes.
This contains some changes required to support Verilog expression coverage (note that the Verilog interface code is not upstreamed yet. We need some explicit permissions from Synopsys, Cadence, and Mentor - as well as approval from our upper management.)
I also added more consistency and error checking, especially to the 'diff' and 'annotate' interfaces - mainly driven by some common mistakes and problems that I have seen as new teams adopt the tool.
Finally, there are a few bug fixes.

henry2cox and others added 7 commits September 14, 2020 14:48
details should continue to show those details
BUGFIX:  --show-details feature now works with differential categories
FUNCTIONALITY:  use original LCOV source covered/not covered color
scheme if not using differential categorization.
REFACTOR:  share common utility code.
TESTABILITY:  missed diffcov test files.
'--simplified-colors' option to make source code view less busy.
FUNCTIONALITY:  more error/consistency checking for annotations and diff
file.
BUGFIX:  handle nested repositories and submodules in git annotation.
meklort and others added 2 commits September 18, 2020 07:36
Signed-off-by: Evan Lojewski <github@meklort.com>
PERFORMANCE:  fix JSON::PP performance bug
TESTABILITY:  integrate gendiffcov tests into lcov framework
PORTABILITY:  'realpath' seems not to exist on old linux releases - use 'readlink -e' - which seems more available and does the same thing.
REFACTOR/MERGE:  merge all changes to 'genhtml', 'genpng' scripts.  Don't keep separate differential coverage versions.
@henry2cox (Collaborator, Author)

As above: I just pushed another enhancement, to support a hierarchical HTML report (following the directory structure of the source code).
This addresses issue #97, which I filed earlier this week.
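A hierarchical report needs coverage totals at every directory level, not only for the directories that directly contain source files. A hypothetical Python sketch of that roll-up, assuming per-file (hit, found) line counts keyed by relative POSIX paths; `rollup_by_directory` is an illustrative name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rollup_by_directory(file_counts):
    """Sum per-file line counts into every ancestor directory.

    file_counts: dict mapping a relative POSIX file path -> (hit, found).
    Returns a dict mapping each directory ("." is the root) -> (hit, found).
    """
    totals = defaultdict(lambda: [0, 0])
    for path, (hit, found) in file_counts.items():
        # .parents of "src/a/x.c" yields "src/a", "src", "."
        for parent in PurePosixPath(path).parents:
            totals[str(parent)][0] += hit
            totals[str(parent)][1] += found
    return {directory: tuple(counts) for directory, counts in totals.items()}
```

Each directory page can then show its own totals plus links to its immediate children, which is what makes the report follow the source tree.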

@henry2cox henry2cox changed the title Diffcov initial Differential coverage, date and owner binning - plus a few minor enhancements and refactoring (was "Diffcov initial") Oct 8, 2020
@henry2cox (Collaborator, Author)

The above commit implements the feature (workaround) described in issue #98 that I filed a day or two ago.

functions which are defined on the same file/line
BUGFIX:  some bug fixes and merge error corrections.
REFACTOR:  hold function coverage alias groups.  (All functions which
are found on the same file/line are assumed to be aliases of each
other.)
BUGFIX: function coverage detail page sort order should use demangled
names.
llvm/clang
FUNCTIONALITY:  support environment variable in lcovrc files
CLEANUP: refactor common code.
to file, option to preserve geninfo intermediate data
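The alias-group rule in the REFACTOR note above (all functions found on the same file/line are treated as aliases of each other) can be sketched as a grouping step. This is a hypothetical Python illustration; in particular, treating a group as hit when any alias has a nonzero count is an assumption, not necessarily what the PR does.

```python
from collections import defaultdict


def alias_groups(functions):
    """Group functions that are reported on the same (file, line).

    functions: iterable of (name, path, line, hit_count) tuples.
    Returns {(path, line): {"aliases": {name: hits}, "hit": bool}}, where
    "hit" is True if any alias in the group was executed (an assumption).
    """
    groups = defaultdict(dict)
    for name, path, line, hits in functions:
        aliases = groups[(path, line)]
        # The same name may appear more than once (e.g. from merged runs).
        aliases[name] = aliases.get(name, 0) + hits
    return {
        key: {"aliases": aliases, "hit": any(h > 0 for h in aliases.values())}
        for key, aliases in groups.items()
    }
```

For example, two template instantiations reported at the same source line end up in one group, so the function table counts that source-level function once rather than once per instantiation.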
@henry2cox henry2cox force-pushed the diffcov_initial branch 3 times, most recently from 6842017 to b1f8fe7, September 28, 2022 21:23
@henry2cox henry2cox force-pushed the diffcov_initial branch 3 times, most recently from ef46161 to 7434d33, October 6, 2022 16:24
henry2cox and others added 2 commits October 10, 2022 14:15
particular, to handle bogus out-of-range coverpoints inserted if
nontrivial macro is on the last line of the source file.
Split 'line' filter into 'brace', 'blank', 'range' into separate
controls.
@GitMensch

Thank you for your work on this. I've just stumbled over the need to "compare" two results, and this seems to solve the problem.

@GitMensch GitMensch mentioned this pull request Oct 18, 2022
man/genhtml.1 (outdated)
can be generated using the command "git diff \-\-relative SHA_base SHA_current", or using the "p4udiff" or "gitdiff" sample scripts (found in the share/lcov/support\-scripts directory shipped as part of this release).
"p4udiff" accepts either a changelist ID or the literal string "sandbox"; "sandbox" indicates that there are modified files which have not been checked in.

These scripts post\-process the 'p4' or 'git' output to (optionally) remove files that are not of interest and to explicitly not files whcih have not changed. It is useful to note unchanged files (denoted by lines of the form

Suggested change
These scripts post\-process the 'p4' or 'git' output to (optionally) remove files that are not of interest and to explicitly not files whcih have not changed. It is useful to note unchanged files (denoted by lines of the form
These scripts post\-process the 'p4' or 'git' output to (optionally) remove files that are not of interest and to explicitly not files which have not changed. It is useful to note unchanged files (denoted by lines of the form

Collaborator Author

thanks...fixed in my local sandbox.
Will push this (with some other fixes) after testing is complete.

@henry2cox henry2cox mentioned this pull request Oct 31, 2022
@al-babych-fivetran

I'm not sure whether this is related to your PR, but genhtml produces invalid HTML:

<td width="5%" class="headerCovTableHead" title="Uncovered New Code""><span class="tlaUNC">UNC</span></td>

^ note the extra " after the title attribute value

@henry2cox (Collaborator, Author) commented Nov 1, 2022 via email

peak memory during parallel execution.

FIXES: missing semicolon, 'executable' flag, merge child data, debug
logging
… better performance (for that step - Amdahl suggests that overall benefit is much less).
@henry2cox (Collaborator, Author)

Closing this PR in favor of the newly created #169, which contains all the same code, but as a single flattened commit.

@henry2cox henry2cox closed this Nov 10, 2022
7 participants