Differences in coverage percentage between Python 3.7 and Python 3.8 #866

Closed
sanjioh opened this issue Nov 4, 2019 · 8 comments
Labels
bug Something isn't working

Comments

@sanjioh

sanjioh commented Nov 4, 2019

Describe the bug
The coverage percentage appears to be sensitive to the Python version under which coverage run and coverage combine + coverage report are executed. Specifically, with the same version of coverage.py, the percentage varies depending on which combination of Python versions those commands are run with.

I have experienced the following patterns:

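| coverage run | coverage combine + coverage report | coverage percentage |
|--------------|------------------------------------|---------------------|
| Python 3.7   | Python 3.7                         | 100%                |
| Python 3.8   | Python 3.8                         | 100%                |
| Python 3.7   | Python 3.8                         | 92%                 |
| Python 3.8   | Python 3.7                         | 100%                |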

To Reproduce

  • Clone this repository: https://github.com/sanjioh/tox-interpreters.
  • Run tox -r -e py38-tox314,coverage-report. This should report 100% coverage.
  • Run tox -r -e py37-tox314,coverage-report. This should report 92% coverage.
  • If basepython for the coverage-report tox env is changed to python3.7, coverage consistently reports 100%, as per the above table.

coverage.py version: 4.5.4

Expected behavior
I would expect 100% coverage when coverage run is executed with Python 3.7 and coverage combine + coverage report are executed with Python 3.8.

Thanks for your support, please let me know if you need any further information.

@sanjioh sanjioh added the bug Something isn't working label Nov 4, 2019
@nedbat
Owner

nedbat commented Nov 5, 2019

This is an unfortunate side effect of changes in details of the trace function between versions of Python. In 3.7 and earlier, a decorated function definition would invoke the trace function only for the decorator line, not for the def line. In 3.8 and later, the trace function is called for both the decorator line and the def line.

So your Python 3.7 run collected the data that the decorator lines were run, but not the def line, because that is how 3.7 behaves. When reporting the coverage on 3.8, coverage.py knows that both the decorator and def lines could have been marked as run, and sees only the decorator line in the collected data, and so marks the def line as not run.
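To see the difference described above on a given interpreter, here is a minimal sketch that records which module-level lines produce "line" trace events while a decorated function is being defined. The names `deco`, `decorated`, and `traced_lines` are hypothetical and exist only for this illustration; this is not how coverage.py itself collects data, only a small use of the same `sys.settrace` hook.

```python
import sys

# Line 4 is the decorator line and line 5 is the def line.
SOURCE = '''\
def deco(func):
    return func

@deco
def decorated():
    return 1
'''


def traced_lines(source):
    """Run `source` and return the module-level lines that fired a 'line' event."""
    code = compile(source, "<demo>", "exec")
    seen = set()

    def tracer(frame, event, arg):
        # Only follow the module-level frame; frames of called functions
        # (such as the decorator itself) are ignored.
        if frame.f_code is not code:
            return None
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return sorted(seen)


if __name__ == "__main__":
    print(sys.version_info[:2], traced_lines(SOURCE))
```

Running the same script under 3.7 and under 3.8 and comparing the printed lists should show whether line 5 (the def line) is reported in addition to line 4 (the decorator line).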

I'm not sure what coverage.py can do to improve this situation. I guess in theory the "what could have been run" logic could use the Python version noted in the data, not the Python version it's running against, but I haven't thought through how feasible that is.

Can you coordinate to use the same version for both measurement and reporting?

@sanjioh
Author

sanjioh commented Nov 5, 2019

Hi,

thanks for the thorough explanation, that makes sense indeed.
I suspected that decorated functions were involved, but I couldn't figure out the underlying reason.
Just a little weirdness I've noticed: what's special about the _split() method that makes coverage behave differently (and correctly)? Could this be a starting point for something?

[Screenshot: Screen Shot 2019-11-05 at 09 48 03]

While it would surely be nice for coverage.py to abstract away the details of the underlying tracing logic, I totally understand that's not trivial to achieve.
For now, I'm going to combine the coverage results across all the Python versions I'm supporting, so the impact of this issue should be minimal.

Thanks again for your help.

@nedbat
Owner

nedbat commented Nov 5, 2019

I noticed that about _split also. The default argument for sep causes the trace function to be called for the def line also, which I hadn't realized before. This gets messy... :(
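One way to look at the default-argument effect is to list which source lines have bytecode attached in the compiled module, since those are the lines the trace function can be called for. The sketch below is only an approximation of that idea using `dis.findlinestarts`; the names `deco`, `plain`, `with_default`, and `lines_with_bytecode` are hypothetical stand-ins, with `with_default` mimicking a function like `_split` that has a default for `sep`.

```python
import dis

# Lines 4 and 8 are decorator lines; lines 5 and 9 are def lines.
# Only the def on line 9 has a default argument.
SOURCE = '''\
def deco(f):
    return f

@deco
def plain():
    pass

@deco
def with_default(sep=","):
    pass
'''


def lines_with_bytecode(source):
    """Return the source lines that start at least one module-level instruction."""
    code = compile(source, "<demo>", "exec")
    # Skip entries without a real source line (None or 0 on newer Pythons).
    return sorted({line for _, line in dis.findlinestarts(code) if line})


if __name__ == "__main__":
    print(lines_with_bytecode(SOURCE))
```

Comparing the output under 3.7 and 3.8 should show whether line 5 appears, while line 9 is expected to appear on both because evaluating the default value is attributed to the def line.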

@nedbat nedbat closed this as completed Nov 5, 2019
@ArturKlauser

Just FYI (I ran into the same issue):
I think this problem is bound to become more prevalent now that Python 3.8 is becoming the 'default' Python version in places such as CI environments. For example, I was running coverage over a test matrix of Python versions and then combining and reporting with the default (i.e. newest stable) Python version, which brings out this decorator problem. Maybe add this to the FAQ?

ArturKlauser added a commit to ArturKlauser/pymol-pdb-plugin that referenced this issue Dec 4, 2019
When running `coverage combine` under python 3.8, function definition lines
after function decorators are marked as not executed. This issue doesn't happen
with earlier versions of python, so using v3.7 instead.
See nedbat/coveragepy#866 for an explanation.
@nedbat
Owner

nedbat commented Dec 4, 2019

@ArturKlauser I'm wondering what kind of thing to put in the FAQ. It could be, "If you measure and report on different versions of Python, you could get confusing results." Or it could be, "If you measure on 3.7 and report on 3.8, a decorated function will mark the 'def' line as not run."

That is, how specific a FAQ entry are you thinking of?

@ArturKlauser

I was thinking more along the lines of the latter, but a bit more general, like "If you measure on < 3.8 and report on >= 3.8, a decorated function will mark the 'def' line as not run." The < 3.8 is from my experience with reporting on 2.7 and 3.7, but I assume it generally holds for < 3.8 given your description of the cause earlier in this bug report. The >= 3.8 is my assumption that the current 3.8.0 behavior is going to be kept going forward.

I think two good candidate locations for this warning would be the FAQ about "Q: Why do the bodies of functions (or classes) show as executed, but the def lines do not?" or close to it, or the end of the "Things that cause trouble" page.

Thanks for considering it.

@nedbat
Owner

nedbat commented Dec 8, 2019

I've updated the FAQ in 8ed8cbe

ronf added a commit to ronf/asyncssh that referenced this issue Apr 23, 2022
When collecting coverage data on Python 3.7 and earlier but generating
the coverage report in 3.8 or later, some decorators show incorrect
coverage information (see nedbat/coveragepy#866
for more details).

This commit changes the github action to use Python 3.7 when generating
the coverage report, hopefully working around this issue for some of the
Windows-only modules which are only run on Python 3.6 and 3.7.
@mthuurne

Did the decorator coverage change again in Python 3.11? When measuring branch coverage on both 3.10 and 3.11, everything is fine if I run coverage combine under Python 3.10, but if I combine under 3.11 the branch coverage will be incomplete, saying the def line doesn't jump to the decorator's line.
