
Add coverage via codecov #3463

Merged: 11 commits merged into master from jn/2702-fix-code-coverage on Jan 18, 2022

Conversation

@joshuacwnewton (Member) commented Jul 14, 2021


Description

This PR adds coverage reporting via codecov. Some quick links:

Linked issues

Fixes #2702.

@codecov (bot) commented Jul 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@129e537). Click here to learn what that means.
The diff coverage is n/a.

Flag         Coverage Δ
api-tests    24.18% <0.00%> (?)
cli-tests    56.66% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

.ci.sh (outdated review thread, resolved)
@kousu (Contributor) commented Jul 19, 2021

I kinda think coverage is useless, maybe even counterproductive. The basic problem is that covering lines is like sticking a single fork in a cake: it only requires a single data point. Real coverage would mean covering every possible state, which is extremely hard to do in a non-functional language. Sometimes coverage sort of gets you to test different inputs, by at least forcing you to give inputs that go through each case in all your if-else trees, but it's extremely limited.

Real coverage, IMO, means testing:

  • the base case(s)
  • the error cases
  • one or two 'typical' cases
  • the extreme cases

(I think of this like math intervals: your cases to test might be: x < 0, x == 0, x == 1, 1 < x < 10, 10 <= x < 20, 20 <= x, math.isnan(x), math.isinf(x))
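In pytest terms, that list of cases might translate into something like the following sketch (`clamp` is a made-up toy function, not SCT code):

```python
import math

import pytest


def clamp(x):
    """Hypothetical function under test: clamp x to the interval [0, 20]."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("x must be finite")
    if x < 0:
        return 0
    if x >= 20:
        return 20
    return x


@pytest.mark.parametrize("x, expected", [
    (-5, 0),    # x < 0
    (0, 0),     # base case: x == 0
    (1, 1),     # x == 1
    (5, 5),     # typical case: 1 < x < 10
    (15, 15),   # 10 <= x < 20
    (25, 20),   # extreme case: 20 <= x
])
def test_clamp(x, expected):
    assert clamp(x) == expected


@pytest.mark.parametrize("x", [math.nan, math.inf])
def test_clamp_rejects_non_finite(x):
    with pytest.raises(ValueError):
        clamp(x)
```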

but that's not what codecov does, because it's not smart enough to define what your cases are by looking at your code, all it can tell is what lines exist.

Here are some thinkpieces that are on my side, which I googled to prop up my argument and make it look more impressive:

Thinking about it, I'll grant that in a dynamic language like python, basic coverage is useful to catch the kind of typos that aren't syntax errors, where other languages would use a typechecker instead.

Here's 100% coverage in cynical diagram form:

[image: 100% coverage.jpg]

@joshuacwnewton (Member, Author) commented Jul 19, 2021

I largely agree that line-by-line coverage % isn't a very useful metric in its own right. You're right that coverage doesn't tell the whole story; all it does is let you know that you've covered a line with some test, but it doesn't tell you anything about the quality of those tests themselves.

That said, I think that most of those blog posts are trying to address the sort of cargo-culty dev-management audience who chase metrics as end-all-be-all targets. But, in doing so, I think the posts maybe throw the baby out with the bathwater; by rehashing the same point that coverage % is a misleading metric to try to target, they miss other legitimate uses where tools like codecov can still provide informative feedback. Coverage just has to be treated as a tool in the toolbelt, not the only tool.

One of the blog posts you linked does highlight this pretty well, though!

Code coverage is still useful

While it's dangerous to use code coverage for target setting, collecting coverage metrics can still be useful.

Some people use it to find areas where coverage is weak. There may be good reasons that some parts of a code base are sparsely covered by tests, but doing a manual inspection once in a while is a good idea. Perhaps you find that all is good, but you may also discover that a quality effort is overdue.

In some projects, I've had some success watching the code coverage trend. When I review pull requests, I first review the changes by looking at them. If the pull request needs improvement, I work with the contributor to get the pull request to an acceptable quality. Once that is done, I've already made my decision on merging the code, and then I measure code coverage. It doesn't influence my decision to merge, but it tells me about the trend. On some projects, I've reported that trend back to the contributor while closing the pull request. I wouldn't report the exact number, but I'd remark that coverage went up, or down, or remained the same, by 'a little', 'much', etc. The point of that is to make team members aware that testing is important.

Sometimes, coverage goes down. There are many good reasons that could happen. Watching how coverage evolves over time doesn't mean that you have to pounce on developers every time it goes down, but it means that if something looks odd, it may be worth investigating.

For SCT in particular, with our many CLI tests that run without checking results, coverage % is especially useless. Yet, line-by-line coverage highlighting is still able to give me a better understanding of the codebase, because it gets the gears going in my head: "Oh, this branching path isn't covered. Why is that? What do the existing tests look like? Oh, huh. Our tests look representative of typical usage. Is this part of the codebase even useful, then? Why was this structured like this? What was the point of this section if it isn't ever reached? Can it be refactored? etc." This applies doubly so during pull requests, where we have more flexibility to restructure something.
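As a toy illustration of the kind of thing that shows up red in the highlighting (a made-up sketch, not actual SCT code):

```python
def resample(values, factor, method="linear"):
    # Hypothetical helper, not SCT code: keep every `factor`-th value.
    if method == "linear":
        return values[::factor]
    elif method == "nearest":
        # If no test ever passes method="nearest", this branch shows up as
        # uncovered, which is exactly what prompts the questions above: is it
        # still reachable from any CLI entry point, and does it need to exist?
        return [round(v) for v in values[::factor]]
    raise ValueError(f"unknown method: {method}")


def test_resample_default():
    # Only exercises the "linear" branch; the coverage highlighting makes the
    # untested "nearest" and error branches visible at a glance.
    assert resample([1, 2, 3, 4], 2) == [1, 3]
```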

I feel like coverage is sort of like the checklist box in our PR template:

  • I've added relevant tests for my contribution

You could definitely argue that this checkbox is just as useless as coverage. "Relevant tests"? What does that even mean? Yet, it can still act as a trigger to get people to start asking questions about what it is they're writing.

Still, I'd want to make sure that contributors try to understand that testing is nuanced and context-dependent; a covered line can still be poorly tested, and an uncovered line can be acceptable.

@joshuacwnewton (Member, Author) commented Jul 19, 2021

One other thing. I really like the list you've posted here:

  • the base case(s)
  • the error cases
  • one or two 'typical' cases
  • the extreme cases

When I contributed to scikit-learn, they were really good about explicitly defining the various cases in the code itself. For example, take this snippet of code. That's 100+ lines of just input validation for this function. They spend a lot of effort checking assumptions and expectations. Combining coverage highlighting with this sort of rigor gave a really good sense of which cases had tests written for them and which didn't, so I found coverage to be really useful whenever I would submit a PR.
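For illustration, the pattern looks roughly like this (a heavily simplified, made-up sketch, not actual scikit-learn code); each explicit check is a line that coverage highlighting can tie back to a specific test case:

```python
import numpy as np


def fit_transform(X, n_components=2):
    # Hypothetical example of the explicit-validation style, not scikit-learn code.
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"Expected a 2D array, got {X.ndim}D")
    n_samples, n_features = X.shape
    if n_samples == 0:
        raise ValueError("At least one sample is required")
    if not np.all(np.isfinite(X)):
        raise ValueError("Input contains NaN or infinity")
    if not 1 <= n_components <= min(n_samples, n_features):
        raise ValueError(
            f"n_components={n_components} must be between 1 and "
            f"min(n_samples, n_features)={min(n_samples, n_features)}"
        )
    # ... the actual computation would go here; truncated for illustration
    return X[:, :n_components]
```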

Now, SCT isn't nearly at this level of rigor; I feel like we often just "do the thing" rather than "do the thing and make sure we did the thing correctly". But, my hope is that by having coverage, it will nudge us to start writing our code in such a way where coverage would become more useful (i.e. by thinking about various cases). Maybe that's wishful thinking, though. Cart before the horse. 😅

@jcohenadad (Member) commented:

Here's 100% coverage in cynical diagram form:

omg, i love it

@kousu (Contributor) commented Jul 20, 2021

On some projects, I've reported that trend back to the contributor while closing the pull request. I wouldn't report the exact number, but I'd remark that coverage went up, or down, or remained the same, by 'a little', 'much', etc. The point of that is to make team members aware that testing is important.

This reminds me of how YouTube stopped showing exact subscriber counts and started rounding to significant digits instead, because the exact numbers were causing too much drama and getting people to focus on the wrong thing. I wish codecov was like that. Maybe if it just gave a colour.

I'm satisfied with using coverage as a tool that we understand the limitations of.

@joshuacwnewton marked this pull request as ready for review July 26, 2021 16:23
@RignonNoel (Contributor) commented:

@joshuacwnewton @kousu For me, code coverage is a tool, and as with any tool, the fact that it isn't perfect and can be misused is not a good reason for professionals to not use it at all.

I'm 100% with you about that religious way of thinking where numbers are the only thing that matters. And I'm not saying that we should aim for 100% coverage on this project; that would make no sense. But coverage is a great tool, and not having it is just a handicap for us, since we would have to make decisions without a proper tool to help us analyze and understand our current state.


For me, code coverage is needed for multiple reasons:

1/ Validate Python syntax

Python is not a compiled language, so we can't be sure that our code is even correctly written without checking that every line is executed at least once.
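For example (a contrived sketch, not SCT code), this kind of typo is valid syntax and only fails when the line actually runs, so executing every line at least once is what catches it:

```python
def get_centerline(points, angle_correction=True):
    # Hypothetical function, not SCT code. The typo below ('ponts') is valid
    # Python syntax; nothing complains until this branch actually executes.
    if angle_correction:
        return sorted(ponts)   # NameError at runtime, not at import time
    return points


def test_get_centerline_without_correction():
    # This test passes, yet the buggy branch never runs. Line coverage makes
    # that gap visible; a test with angle_correction=True would fail
    # immediately with a NameError.
    assert get_centerline([3, 1, 2], angle_correction=False) == [3, 1, 2]
```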

2/ Having a clear idea of which parts of the system lack tests

Having 10,000+ unit tests doesn't mean you have tested all the important parts of your system. Code coverage lets you know which code sections are not covered, so you can make sure no important part is in that list, or plan some testing tasks.

Also, code coverage can tell you how many times each line has been executed. That's a good way to check whether your current state makes sense (important pieces are tested more heavily than nice-to-have side features).

3/ Tests are part of the code; you need to respect them as well

Every time you make a change to the code, it's interesting to know whether the code coverage increases, decreases, or stays pretty much stable. It's a great and really easy way to politely ask a contributor to add some test cases covering their new changes.

Bonus: The retro-analysis of tests

Sometimes you have a lot of tests, and when I add some code and want to cover it correctly (because I don't like manual testing, even during my development phase), I like to check the coverage system and see which test cases already exercise this section of the code:

  • I can detect which tests may need some refactoring EVEN if they don't fail (e.g. comments that are no longer relevant after the new refactor)
  • I can see whether I can just adapt an existing test or whether I need to create a new one
  • I can see whether the section is perhaps not that well covered already, and decide to put in the effort to test it properly, since I have the section in mind and my tools already set up for it
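For what it's worth, coverage.py's "contexts" feature seems aimed at exactly this workflow (pytest-cov exposes it as `--cov-context=test`). A rough, hypothetical sketch of the underlying API:

```python
import coverage


def segment(image):
    # Hypothetical function under test, not SCT code.
    return [] if image is None else [len(image)]


cov = coverage.Coverage()
cov.start()

# Label the lines executed under each "context" (e.g. a test name), so that
# `coverage html --show-contexts` can answer "which tests already exercise
# this section of the code?"
cov.switch_context("test_segment_typical")
segment([1, 2, 3])

cov.switch_context("test_segment_empty")
segment(None)

cov.stop()
cov.save()
```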

@mguaypaq (Member) commented:

It sounds like we're all on the same page:

  • good tests are important;
  • code coverage can be useful as a tool for thinking;
  • code coverage would be harmful if we used it rigidly.

Given that, I was pretty neutral about whether or not it shows up automatically on PRs. But then I was suddenly convinced by @RignonNoel's first point, "Validate Python syntax": I think that alone is a good reason to be able to see, easily, whether each line of code has actually been run once.

(On second thought this might be caught by a linter, which I assume we're running already? But on third thought maybe the linter wouldn't catch things like typos in variable names, etc. Either way, adding code coverage sounds like a mildly good idea.)

@joshuacwnewton (Member, Author) commented Jan 13, 2022

(On second thought this might be caught by a linter, which I assume we're running already? But on third thought maybe the linter wouldn't catch things like typos in variable names, etc. Either way, adding code coverage sounds like a mildly good idea.)

Yep! Your suspicion is correct, @mguaypaq. We have a linting GitHub Action, but right now the linter only runs on changed lines (because otherwise it would be far too noisy for new PRs). So, I definitely agree that that's a pro of the coverage tool.


Overall, I think this is worth trying out! And we can always return to this topic once we've had a chance to work with it for a few weeks/months, too. :)

Would either of you (@RignonNoel or @mguaypaq) be able to review the code changes in this PR, then? This is my first time trying to set up codecov, and while I think I've got it set up well, this PR was created half a year ago, so there may be ways to configure codecov that I've missed. (e.g. the .codecov.yml docs.)

@mguaypaq (Member) left a review:

I haven't worked with codecov before, but from looking at the documentation, this looks like a fine configuration to start with. I would be comfortable if you merged this after addressing the comments below.

Comment on lines +44 to +45
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-api.xml testing/api
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-cli.xml testing/cli
Reviewer (Member):

In addition to testing/api and testing/cli, there's now testing/batch_processing. Should that also be included in some coverage report?

@joshuacwnewton (Member, Author) replied:

Re: the context given in #3463 (comment), it might be a bit tricky to get coverage, since batch_processing is invoked via a shell script, and pytest is only used to validate the end results of the pipeline script. In other words, pytest isn't used to invoke the processing itself, so I'm not sure if we can hook into any line-by-line coverage scanning. :(
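If we ever do want it, one possible avenue (untested; just a sketch of coverage.py's documented subprocess support, not something set up in this PR) is to point the `COVERAGE_PROCESS_START` environment variable at a coverage config file and make sure every Python process calls `coverage.process_startup()`, e.g. via a `sitecustomize.py` on the environment's path:

```python
# sitecustomize.py (hypothetical; it would need to live on the sys.path of the
# environment that batch_processing.sh uses)
import coverage

# If the COVERAGE_PROCESS_START environment variable points at a coverage
# config file, this starts measurement in every Python process launched in
# this environment, including the CLI scripts invoked by the shell script.
# If the variable isn't set, it does nothing.
coverage.process_startup()
```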

.github/workflows/tests.yml (review thread, resolved)
.github/workflows/tests.yml (outdated review thread, resolved)
@joshuacwnewton merged commit e491614 into master on Jan 18, 2022
@joshuacwnewton deleted the jn/2702-fix-code-coverage branch on January 18, 2022 14:53
RignonNoel pushed a commit to RignonNoel/spinalcordtoolbox that referenced this pull request Jan 21, 2022
@joshuacwnewton added the "CI category: TravisCI, GitHub Actions, etc." label on Jan 26, 2022
@joshuacwnewton added this to the 5.5 milestone on Jan 26, 2022
mguaypaq added a commit that referenced this pull request Jul 11, 2022
This was added in #3463 about six months ago, but none of the current
developers have been paying attention to the resulting bot-generated
comments on all the pull requests, so it's just noise.
mguaypaq added a commit that referenced this pull request Jul 12, 2022
This was added in #3463 about six months ago, but none of the current
developers have been paying attention to the resulting bot-generated
comments on all the pull requests, so it's just noise.
Labels: CI category: TravisCI, GitHub Actions, etc.

Successfully merging this pull request may close these issues: Verify coverage is producing useful information (#2702)

5 participants