-
-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lcov reports should not use line number zero in BRDA: records #1846
Comments
Can you share that project? |
Sure: https://github.com/MillionConcepts/pdr Using that, a repro recipe should be
|
If I understand correctly what BRDA: records mean (which I am about 90% sure of), the line number is supposed to be the line where the branch actually is, and the block and branch numbers are supposed to be, together, a unique but meaningless identifier for the branch's destination. Based on this, arc records with negative fromno shouldn't generate BRDA: records at all, and you can just make up block and branch numbers for all the different tonos for any given positive fromno (it seems to be fine to use zero in those fields). Also, |
Thanks. Can you point me to specific entries in the .coverage file that have nonsensical entries in the arcs table? BTW, I'm using Python 3.11. Your setup.py says 3.9 should work, but it fails when trying to import inspect.get_annotations. |
I do not know which entries in the arcs table are causing the problem. However, if you run "coverage lcov" you should see several BRDA:0 lines in the output. If there aren't any, try "git switch develop" and then re-running the tests, which may also clear up the inspect.get_annotations / python 3.9 issue.
…On Thu, Sep 5, 2024, at 4:45 PM, Ned Batchelder wrote:
Thanks. Can you point me to specific entries in the .coverage file that have nonsensical entries in the arcs table? BTW, I'm using Python 3.11. Your setup.py says 3.9 should work, but it fails when trying to import inspect.get_annotations.
—
Reply to this email directly, view it on GitHub <#1846 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACPSC7QNSCRP2O7AVK2YILZVC7FTAVCNFSM6AAAAABNVAXYIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZSGU4TSMRYHE>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Sorry, you said:
I'm curious what entries in the arc table seem wrong to you? |
Ah, I misunderstood what you meant by "nonsensical entries in the arcs table". I'm attaching a SQL dump of the coverage database that I'm looking at. I have added a
I don't know if this has anything to do with the It is also not clear to me what negative numbers other than -1 mean. This database was generated from git commit 092675e442438ddc39d4bb9e9d0d5f4c1e317f29. The
|
Another piece of information that might be useful to you: here is a comparison between the output of "Normalizing both files for comparison" means:
Abstractly, every difference shown in this attachment represents a bug in one of the three tools used, but I do not know which one. For purpose of this bug report, however, please note that the BRDA records are consistently totally different. There are occasional differences in DA records as well. |
A couple of times earlier I said "GCDA" when I meant "BRDA". Sorry. No idea where GC came from. |
I'm coming to the unfortunate conclusion that the coverage.py lcov output is simply wrong with respect to branches. It produces BRDA lines that name the destination line, but should name the source line. The lcov format is poorly documented. This seems to be the best we have: https://manpages.debian.org/stretch/lcov/geninfo.1.en.html . It says:
"which" is supposed to be "with" I guess? And line number isn't clarified. The xml2lcov tool reports the source line number, which makes sense. Changing to that interpretation will solve the 0 line number problem. @bradb423 What do you think? |
…rts. This fixes five serious bugs: - The first field of a BRDA: line may not be zero (nedbat#1846) - The first field of a BRDA: line is supposed to be the *source* line of a branch, not the destination line. - The fourth field of a BRDA: line is supposed to be “-” when the branch was *never reached*, not when it was reached but never taken (which is what a branch’s presence in missing_arcs means). As far as I can tell, coverage.py currently doesn’t know of the existence of branches that were never reached. - The decision of whether to emit a DA: line at all is now taken according to what’s in analysis.statements, not what’s in analysis.executed and analysis.missing; this is important because some lines (e.g. the beginnings of docstrings) may appear in analysis.executed but *not* in analysis.statements. - We no longer attempt to call branch-coverage-related Analysis methods when analysis.has_arcs is false. And two minor annoyances: - DA: and BRDA: lines are now emitted strictly in ascending order by (source) line number. - Source file records are now sorted by *relative* pathname, not absolute pathname from the coverage database.
…rts. This fixes five serious bugs: - The first field of a BRDA: line may not be zero (nedbat#1846). - The first field of a BRDA: line is supposed to be the *source* line of each instrumented branch, not the destination line. - The fourth field of a BRDA: line is supposed to be “-” when the branch was *never reached*, not when it was reached but never/always taken (which is what a branch’s presence in missing_arcs means). As far as I can tell, coverage.py currently doesn’t know of the existence of branches that were never reached. - The decision of whether to emit DA: and BRDA: lines at all is now taken strictly according to what’s in analysis.statements. This is important because some lines may appear in analysis.executed and/or analysis.executed_branch_arcs but *not* in analysis.statements. For example, the beginnings of docstrings are like this, as is the phantom line 1 of an empty __init__.py in Python 3.10 and earlier. (I am pleased to note that the special casing of empty __init__.py in the test suite is no longer required after this patch.) - We no longer attempt to call branch-coverage-related Analysis methods when analysis.has_arcs is false. And two minor annoyances: - DA: and BRDA: lines are now emitted strictly in ascending order by (source) line number. - Source file records are now sorted by *relative* pathname, not absolute pathname from the coverage database.
* [lcov] Style improvements to lcovreport.py - Rename LcovReporter.get_lcov to LcovReporter.lcov_file because the old name sounds like it returns something. lcov_file was selected by analogy with XmlReporter.xml_file. - Move the increment of self.total from lcov_file to the loop in LcovReporter.report for better separation of concerns. * [lcov] honor skip_empty; improve consistency with xml2lcov Implement skip_empty mode in LcovReporter. Also, improve consistency with what you get by running ‘coverage xml’ and then ‘xml2lcov’ by not emitting any vacuous TN: lines, not emitting LF:0 LH:0 for empty files, and not emitting BRF:0 BRH:0 for files with no branches. * [lcov] don’t write DA-line checksums by default DA-line checksums bulk up the .lcov file and provide only a weak assurance that the source file being processed for analysis matches the source file that was used to generate coverage data. Current versions of the LCOV suite discourage their use. Add a boolean configuration option, lcov.line_checksums, which controls whether checksums are generated for DA lines. Consistent with the current behavior of the LCOV suite, the default is to not generate any checksums. * [lcov] Re-implement lcov reports using the same algorithm as XML reports. This fixes five serious bugs: - The first field of a BRDA: line may not be zero (#1846). - The first field of a BRDA: line is supposed to be the *source* line of each instrumented branch, not the destination line. - The fourth field of a BRDA: line is supposed to be “-” when the branch was *never reached*, not when it was reached but never/always taken (which is what a branch’s presence in missing_arcs means). As far as I can tell, coverage.py currently doesn’t know of the existence of branches that were never reached. - The decision of whether to emit DA: and BRDA: lines at all is now taken strictly according to what’s in analysis.statements. This is important because some lines may appear in analysis.executed and/or analysis.executed_branch_arcs but *not* in analysis.statements. For example, the beginnings of docstrings are like this, as is the phantom line 1 of an empty __init__.py in Python 3.10 and earlier. (I am pleased to note that the special casing of empty __init__.py in the test suite is no longer required after this patch.) - We no longer attempt to call branch-coverage-related Analysis methods when analysis.has_arcs is false. And two minor annoyances: - DA: and BRDA: lines are now emitted strictly in ascending order by (source) line number. - Source file records are now sorted by *relative* pathname, not absolute pathname from the coverage database. --------- Co-authored-by: Zack Weinberg <zack@owlfolio.org>
Likely, the best lcov coverage data format description is at https://github.com/linux-test-project/lcov/blob/master/man/geninfo.1, in the "TRACEFILE FORMAT" section.
A more up-to-date version would note that the 'branch number' can be .. anything. In particular, it may be a human-readable expression which would help a user understand the expression truth table. The 'block number' is there so that the compiler/toolchain can accurately describe cases when the same source gives rise to multiple different expressions (for example, with different numbers of branches or with different associated expressions). This is not uncommon when C++ template functions use temple arguments in some branch expression, or (less commonly) when a tool does aggressive inlining. As noted in the usage message for lcov's
Yes. Typo. Will fix. |
Thanks for this link. It has been hard to find published information about the format. Do you publish this content in a more readable format someplace? It would help us tool authors provide support for the lcov format.
Yes, coverage.py is fundamentally line oriented for now. |
BTW, I've made a pull request against the lcov.1 file: https://github.com/linux-test-project/lcov/pull/320/files |
Thanks, this is really helpful. I had completely misunderstood block numbers; I thought they were for disambiguating multiple branch expressions on the same line, not for disambiguating template expansions and the like. Also, I had no idea that the "branch number" is really a "branch destination label" and can be a human-readable string. In my local working copy I now have the something = True
if something:
print("Yes, something")
else:
print("No, nothing") coming out like this:
And, in turn, genhtml's hover text for the [ + - ] on line 3 is quite a bit more useful.
Internally, what coverage.py knows about branch expressions is that line N is the source of more than one control flow arc (== 'edge' in most compiler literature), with destination lines X, Y, Z. (There are almost never more than two destinations, but it can happen.) It does not know under what conditions each arc will be taken. So in def fn(a, b): # 1
if a or b: # 2
print("a or b") # 3
if a and b: # 4
print ("a and b") # 5
fn(True, False)
fn(False, False) the internal representation knows that line 2 is the source of control flow arcs leading to lines 3 and 4, both of which were executed at least once. It also knows that line 4 is the source of a control flow arc leading to line 4, which was never executed, and another arc that exits the function, which was executed at least once. But it does not know anything about the subexpressions within lines 2 and 4 and so it cannot report that line 4 was never executed because of A less contrived example, cut down from real code: def crunch(*args):
pass
def fn(data):
if any(("spectrum" in d.lower() for d in data)): # line 5
crunch(data)
fn([])
fn(["image"])
fn(["image", "time-series"]) The internal records tell us that control flow went from line 5 to two different "exit" targets (leaving the generator expression and leaving the entire function), and that it could have also gone to line 6 but never did. In my working tree this will be directly reflected in
This is not ideal; the distinction between exit 4 and exit 5 only makes sense in terms of the bytecode translation of the function (in which the generator expression turns into a lambda). The native report generator, Cobertura-format XML coverage reports impose further loss of information. The relevant bits of <lines>
<line number="1" hits="1"/>
<line number="2" hits="0"/>
<line number="4" hits="1"/>
<line number="5" hits="1" branch="true"
condition-coverage="50% (1/2)" missing-branches="6"/>
<line number="6" hits="0"/>
<line number="8" hits="1"/>
<line number="9" hits="1"/>
<line number="10" hits="1"/>
</lines> because |
At present: no. It is just in the source code (...definitive) and the man page (...hopefully accurate). |
I would characterize it differently...the 'branch number' (or branch expression) is there to tell us how the conditional expression was evaluated. Consider if (a || b) {
bb_1;
else {
bb_2;
} The conditional on the first line has 3 possible outcomes:
In the first two cases, If the conditional is simple, then branch coverage doesn't tell you anything that you could not deduce from line coverage...but real control code is often pretty complicated, and scenario testing is really important. The fly in the ointment here is that most tools don't generate a reduced truth table - for example, gcc/gcov will show 4 outcomes for the above example. |
Describe the bug
coverage lcov
can generateBRDA:
records with line number zero. Current versions of lcov'sgenhtml
reject such lines:To Reproduce
The bad BRDA records are coming from here:
coveragepy/coverage/lcovreport.py
Lines 108 to 112 in 5467e1f
I do not understand how to hit this case. The commentary, and the description of what negative numbers mean in arc records (https://coverage.readthedocs.io/en/7.6.1/api_coveragedata.html#coverage.CoverageData.arc) make it sound like unexecuted conditional return statements could cause the problem. Running
coverage run --branch test.py
on this fileproduces a
.coverage
database whosearc
table does contain negative numbers, but runningcoverage lcov
on that database produceswhich does not use line zero and looks correct to me (well, modulo the fact that a human would say there isn't a branch on line 4? but that's a separate issue).
Looking at the database for the real program that produced the actual bad lcov-format report that prompted this issue, I see lots more cases of negative numbers in the arc table, but the associated positive numbers are too large to be actual line numbers in the associated file, which means I don't actually understand what the records in the arc table mean, so I'm stuck.
Expected behavior
coverage lcov
should generate a .lcov file thatgenhtml
does not reject.The text was updated successfully, but these errors were encountered: