Add coverage via codecov
#3463
Conversation
I kinda think coverage is useless, maybe even counterproductive. The basic problem is that covering lines is like sticking a single fork in a cake: it only requires a single data point. Real coverage would mean covering every possible state, which is extremely hard to do in a non-functional language. Sometimes coverage sort of nudges you to test different inputs, by at least forcing you to supply inputs that go through each case in all your if-else trees, but it's extremely limited. Real coverage, IMO, means testing:
(I think of this like math intervals: your cases to test might be x < 0, x == 0, x == 1, 1 < x < 10, 10 <= x < 20, 20 <= x, math.isnan(x), math.isinf(x).) But that's not what codecov does, because it's not smart enough to work out what your cases are by looking at your code; all it can tell is which lines exist. Here are some thinkpieces that are on my side, which I googled to prop up my argument and make it look more impressive:
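The interval idea in the parenthetical above can be sketched as a table-driven test (purely illustrative code, not from SCT): one case per interval and per boundary, which is exactly the kind of case analysis that line coverage can't demand on its own, since a single input would already "cover" most of these lines.

```python
import math

def classify(x):
    """Toy function with one branch per interval; purely illustrative."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    if x < 10:
        return "small"
    if x < 20:
        return "medium"
    return "large"

# One case per interval, including the boundaries between them.
CASES = [
    (-1.0, "negative"),
    (0.0, "zero"),
    (1.0, "small"),
    (9.999, "small"),
    (10.0, "medium"),
    (19.999, "medium"),
    (20.0, "large"),
    (math.nan, "nan"),
    (math.inf, "inf"),
]

def test_classify():
    for x, expected in CASES:
        assert classify(x) == expected
```

Note that a single call like `classify(5)` touches only one branch, yet after a handful of cases coverage reports 100% while the boundary cases (10.0, 20.0) may still be untested.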
Thinking about it, I'll grant that in a dynamic language like Python, basic coverage is useful for catching the kind of typos that aren't syntax errors, where other languages would use a typechecker instead. Here's 100% coverage in cynical diagram form:
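A minimal illustration of that "typos that aren't syntax errors" point (hypothetical code, not from SCT): the misspelled variable below is perfectly valid Python, so nothing flags it until the branch actually executes.

```python
def resample(image, interpolation="linear"):
    """Hypothetical function: 'interpolaton' below is a typo, but still valid syntax."""
    if interpolation == "nearest":
        return f"resampled with {interpolaton}"  # NameError, but only at runtime
    return f"resampled with {interpolation}"

# The happy path works fine...
print(resample("img.nii"))
# ...so only a test that reaches the "nearest" branch surfaces the bug:
try:
    resample("img.nii", interpolation="nearest")
except NameError:
    print("typo caught only because the line was executed")
```

A compiler or typechecker would catch this at build time; in Python, a test suite that executes every line is the closest substitute.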
I largely agree that line-by-line coverage % isn't a very useful metric in its own right. You're right that coverage doesn't tell the whole story; all it does is let you know that some test covered a line, without telling you anything about the quality of the tests themselves. That said, I think most of those blog posts are addressing the sort of cargo-culty dev-management audience who chase metrics as end-all-be-all targets. In doing so, though, I think the posts maybe throw the baby out with the bathwater: by rehashing the same point that coverage % is a misleading metric to target, they miss other legitimate uses where tools like codecov can still provide informative feedback. Coverage just has to be treated as one tool in the toolbelt, not the only tool. One of the blog posts you linked does highlight this pretty well, though!
For SCT in particular, with our many CLI tests that run without checking results, coverage % is especially useless. Yet line-by-line coverage highlighting can still give me a better understanding of the codebase, because it gets the gears going in my head: "Oh, this branching path isn't covered. Why is that? What do the existing tests look like? Oh, huh. Our tests look representative of typical usage. Is this part of the codebase even useful, then? Why was it structured like this? What was the point of this section if it's never reached? Can it be refactored?" etc. This applies doubly during pull requests, where we have more flexibility to restructure something. I feel like coverage is sort of like the checkbox in our PR template:
You could definitely argue that this checkbox is just as useless as coverage. "Relevant tests"? What does that even mean? Yet it can still act as a trigger that gets people asking questions about what it is they're writing. Still, I'd want to make sure contributors understand that testing is nuanced and context-dependent: a covered line can still be poorly tested, and an uncovered line can be acceptable.
One other thing. I really like the list you've posted here:
When I contributed to … Right now, SCT isn't nearly at this level of rigor; I feel like we often just "do the thing" rather than "do the thing and make sure we did the thing correctly". But my hope is that having coverage will nudge us to start writing our code in a way where coverage becomes more useful (i.e. by thinking about the various cases). Maybe that's wishful thinking, though. Cart before the horse. 😅
omg, i love it
This reminds me of how YouTube hid exact subscriber counts and started rounding to significant digits instead, because the precise numbers were causing too much drama and getting people to focus on the wrong thing. I wish codecov were like that; maybe if it just gave a colour. I'm satisfied with using coverage as a tool whose limitations we understand.
Codecov submits reports via an API, so using a slash as-is will cause issues.
@joshuacwnewton @kousu For me, code coverage is a tool, and as with any tool, "it's not perfect and can be misused" is not a good reason for professionals to avoid it entirely. I'm 100% with you about that religious thinking where numbers are the only way to reason, and I'm not saying we should aim for 100% coverage on this project; that would make no sense. But coverage is a great tool, and not having it is just a handicap for us, since we would need to make decisions without a proper tool to help us analyze and understand our current state. For me, code coverage is needed for multiple reasons:

1. **Validate Python syntax.** Python is not a compiled language, so we can't be sure our code is even well-formed without checking that every line is executed at least once.

2. **Get a clear idea of which parts of the system lack tests.** Having 10,000+ unit tests doesn't mean you've tested all the important parts of your system. Code coverage tells you which code sections are not covered, so you can make sure there is nothing important in that list, or plan some testing tasks. Also, code coverage can report how many times each line has been executed; it's a good way to check that your current state makes sense (important pieces are tested more than nice-to-have side features).

3. **Tests are part of the code; you need to respect them as well.** Every time you change the code, it's interesting to know whether coverage increases, decreases, or stays roughly stable. It's a great and really easy way to politely ask a contributor to add some test cases covering their new changes.

Bonus: the retro-analysis of tests. Sometimes you have a lot of tests, and when I add some code and want to cover it properly (because I don't like manual testing, even during my development phase), I like to just check the coverage system and see which test cases already exercise that section of the code:
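On that "retro-analysis" point: coverage.py can record which test executed each line via dynamic contexts. A sketch of the relevant config (an assumption that the project keeps coverage settings in setup.cfg, as the commands in this PR suggest; contexts need coverage.py >= 5.0):

```ini
# setup.cfg fragment (sketch, not the PR's actual config)
[coverage:run]
dynamic_context = test_function
```

With that in place, `coverage html --show-contexts` annotates each line in the HTML report with the test function(s) that executed it, which is exactly the "which test already covers this section?" lookup described above.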
It sounds like we're all on the same page:
Given that, I was pretty neutral about whether or not it shows up automatically on PRs. But then I was suddenly convinced by @RignonNoel's first point, "Validate Python syntax": I think that alone is a good reason to be able to see, easily, whether each line of code has actually been run once. (On second thought this might be caught by a linter, which I assume we're running already? But on third thought maybe the linter wouldn't catch things like typos in variable names, etc. Either way, adding code coverage sounds like a mildly good idea.) |
Yep! Your suspicion is correct, @mguaypaq. We have a linting GitHub Action, but right now the linter only runs on changed lines (because otherwise it would be far too noisy on new PRs). So I definitely agree that that's a pro of the coverage tool. Overall, I think this is worth trying out! And we can always return to this topic once we've had a chance to work with it for a few weeks/months, too. :) Would either of you (@RignonNoel or @mguaypaq) be able to review the code changes in this PR, then? This is my first time trying to set up codecov, and while I think I've got it set up well, this PR was created half a year ago, so there may be ways to configure codecov that I've missed. (e.g. the
I haven't worked with codecov before, but from looking at the documentation, this looks like a fine configuration to start with. I would be comfortable if you merged this after addressing the comments below.
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-api.xml testing/api
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-cli.xml testing/cli
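Since these two commands produce separate reports (cov-api.xml and cov-cli.xml), each can be uploaded under its own codecov "flag" so the API and CLI suites are tracked separately. A hypothetical GitHub Actions fragment, not taken from this PR (the action version and step placement are assumptions):

```yaml
# Sketch only: upload each coverage report with its own codecov flag.
- uses: codecov/codecov-action@v2   # version is an assumption
  with:
    files: ./cov-api.xml
    flags: api
- uses: codecov/codecov-action@v2
  with:
    files: ./cov-cli.xml
    flags: cli
```

Flag names have character restrictions on codecov's side, so short alphanumeric names like `api`/`cli` are the safe choice.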
In addition to testing/api and testing/cli, there's now testing/batch_processing. Should that also be included in some coverage report?
Re: the context given in #3463 (comment), it might be a bit tricky to get coverage, since batch_processing is invoked via a shell script, and pytest is only used to validate the end results of the pipeline script. In other words, pytest isn't used to invoke the processing itself, so I'm not sure if we can hook into any line-by-line coverage scanning. :(
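If coverage for that pipeline were ever wanted, one possible (untested) route is coverage.py's subprocess support, which lets Python processes launched by a shell script record their own coverage data. A rough sketch, with the config path and script name as assumptions:

```sh
# Sketch only: measure Python subprocesses started by a shell pipeline.
# 1. Point every child process at the coverage config:
export COVERAGE_PROCESS_START="$PWD/setup.cfg"
# 2. Ensure coverage starts in each child interpreter, e.g. via a .pth file
#    or sitecustomize.py that calls coverage.process_startup().
# 3. Run the pipeline as usual, then merge the per-process data files:
./batch_processing.sh          # hypothetical pipeline script name
coverage combine
coverage xml -o cov-batch.xml
```

Step 2 is the fiddly part, which is probably why this wasn't attempted in this PR.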
This was added in #3463 about six months ago, but none of the current developers have been paying attention to the resulting bot-generated comments on all the pull requests, so it's just noise.
Checklist
GitHub
PR contents
Description
This PR adds coverage reporting via codecov. Some quick links:

Linked issues
Fixes #2702.