
Add coverage via codecov #3463

Merged: 11 commits merged into master from jn/2702-fix-code-coverage on Jan 18, 2022

Conversation

@joshuacwnewton (Member) commented Jul 14, 2021


Description

This PR adds coverage reporting via codecov. Some quick links:

Linked issues

Fixes #2702.

@codecov (bot) commented Jul 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@129e537). Click here to learn what that means.
The diff coverage is n/a.

Flag         Coverage Δ
api-tests    24.18% <0.00%> (?)
cli-tests    56.66% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

.ci.sh (outdated review thread, resolved)
@kousu (Contributor) commented Jul 19, 2021

I kinda think coverage is useless, maybe even counterproductive. The basic problem is that covering lines is like sticking a single fork in a cake: it only requires a single data point. Real coverage would mean covering every possible state, which is extremely hard to do in a non-functional language. Sometimes coverage sort of gets you to test different inputs, by at least forcing you to give inputs that go through each case in all your if-else trees, but it's extremely limited.

Real coverage, IMO, means testing:

  • the base case(s)
  • the error cases
  • one or two 'typical' cases
  • the extreme cases

(I think of this like math intervals: your cases to test might be: x < 0, x == 0, x == 1, 1 < x < 10, 10 <= x < 20, 20 <= x, math.isnan(x), math.isinf(x))
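In pytest terms, that list of cases might translate into something like the following sketch (`clamp` is a made-up toy function, not SCT code):

```python
import math

import pytest


def clamp(x):
    """Hypothetical function under test: clamp x to the interval [0, 20]."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("x must be finite")
    if x < 0:
        return 0
    if x >= 20:
        return 20
    return x


@pytest.mark.parametrize("x, expected", [
    (-5, 0),    # x < 0
    (0, 0),     # base case: x == 0
    (1, 1),     # x == 1
    (5, 5),     # typical case: 1 < x < 10
    (15, 15),   # 10 <= x < 20
    (25, 20),   # extreme case: 20 <= x
])
def test_clamp(x, expected):
    assert clamp(x) == expected


@pytest.mark.parametrize("x", [math.nan, math.inf])
def test_clamp_rejects_non_finite(x):
    with pytest.raises(ValueError):
        clamp(x)
```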

but that's not what codecov does, because it's not smart enough to define what your cases are by looking at your code, all it can tell is what lines exist.

Here are some thinkpieces that are on my side, which I googled to prop up my argument and make it look more impressive:

Thinking about it, I'll grant that in a dynamic language like python, basic coverage is useful to catch the kind of typos that aren't syntax errors, where other languages would use a typechecker instead.

Here's 100% coverage in cynical diagram form:

[image: 100% coverage.jpg]

@joshuacwnewton (Member, Author) commented Jul 19, 2021

I largely agree that line-by-line coverage % isn't a very useful metric in its own right. You're right that coverage doesn't tell the whole story; all it does is let you know that you've covered a line with some test, but it doesn't tell you anything about the quality of those tests themselves.

That said, I think that most of those blog posts are trying to address the sort of cargo-culty dev-management audience who chase metrics as end-all-be-all targets. But, in doing so, I think the posts maybe throw the baby out with the bathwater; by rehashing the same point that coverage % is a misleading metric to try to target, they miss other legitimate uses where tools like codecov can still provide informative feedback. Coverage just has to be treated as a tool in the toolbelt, not the only tool.

One of the blog posts you linked does highlight this pretty well, though!

Code coverage is still useful

While it's dangerous to use code coverage for target setting, collecting coverage metrics can still be useful.

Some people use it to find areas where coverage is weak. There may be good reasons that some parts of a code base are sparsely covered by tests, but doing a manual inspection once in a while is a good idea. Perhaps you find that all is good, but you may also discover that a quality effort is overdue.

In some projects, I've had some success watching the code coverage trend. When I review pull requests, I first review the changes by looking at them. If the pull request needs improvement, I work with the contributor to get the pull request to an acceptable quality. Once that is done, I've already made my decision on merging the code, and then I measure code coverage. It doesn't influence my decision to merge, but it tells me about the trend. On some projects, I've reported that trend back to the contributor while closing the pull request. I wouldn't report the exact number, but I'd remark that coverage went up, or down, or remained the same, by 'a little', 'much', etc. The point of that is to make team members aware that testing is important.

Sometimes, coverage goes down. There are many good reasons that could happen. Watching how coverage evolves over time doesn't mean that you have to pounce on developers every time it goes down, but it means that if something looks odd, it may be worth investigating.

For SCT in particular, with our many CLI tests that run without checking results, coverage % is especially useless. Yet, line-by-line coverage highlighting is still able to give me a better understanding of the codebase, because it gets the gears going in my head: "Oh, this branching path isn't covered. Why is that? What do the existing tests look like? Oh, huh. Our tests look representative of typical usage. Is this part of the codebase even useful, then? Why was this structured like this? What was the point of this section if it isn't ever reached? Can it be refactored? etc." This applies doubly so during pull requests, where we have more flexibility to restructure something.
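As a toy illustration of the kind of thing that shows up red in the highlighting (a made-up sketch, not actual SCT code):

```python
def resample(values, factor, method="linear"):
    # Hypothetical helper, not SCT code: keep every `factor`-th value.
    if method == "linear":
        return values[::factor]
    elif method == "nearest":
        # If no test ever passes method="nearest", this branch shows up as
        # uncovered, which is exactly what prompts the questions above: is it
        # still reachable from any CLI entry point, and does it need to exist?
        return [round(v) for v in values[::factor]]
    raise ValueError(f"unknown method: {method}")


def test_resample_default():
    # Only exercises the "linear" branch; the coverage highlighting makes the
    # untested "nearest" and error branches visible at a glance.
    assert resample([1, 2, 3, 4], 2) == [1, 3]
```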

I feel like coverage is sort of like the checklist box in our PR template:

  • I've added relevant tests for my contribution

You could definitely argue that this checkbox is just as useless as coverage. "Relevant tests"? What does that even mean? Yet, it can still act as a trigger to get people to start asking questions about what it is they're writing.

Still, I'd want to make sure that contributors try to understand that testing is nuanced and context-dependent; a covered line can still be poorly tested, and an uncovered line can be acceptable.

@joshuacwnewton (Member, Author) commented Jul 19, 2021

One other thing. I really like the list you've posted here:

  • the base case(s)
  • the error cases
  • one or two 'typical' cases
  • the extreme cases

When I contributed to scikit-learn, they were really good about explicitly defining the various cases in the code itself. For example, take this snippet of code. That's 100+ lines of just input validation for this function. They spend a lot of effort checking assumptions and expectations. Combining coverage highlighting with this sort of rigor gave a really good sense of which cases had tests written for them and which didn't, so I found coverage to be really useful whenever I would submit a PR.
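For illustration, the pattern looks roughly like this (a heavily simplified, made-up sketch, not actual scikit-learn code); each explicit check is a line that coverage highlighting can tie back to a specific test case:

```python
import numpy as np


def fit_transform(X, n_components=2):
    # Hypothetical example of the explicit-validation style, not scikit-learn code.
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"Expected a 2D array, got {X.ndim}D")
    n_samples, n_features = X.shape
    if n_samples == 0:
        raise ValueError("At least one sample is required")
    if not np.all(np.isfinite(X)):
        raise ValueError("Input contains NaN or infinity")
    if not 1 <= n_components <= min(n_samples, n_features):
        raise ValueError(
            f"n_components={n_components} must be between 1 and "
            f"min(n_samples, n_features)={min(n_samples, n_features)}"
        )
    # ... the actual computation would go here; truncated for illustration
    return X[:, :n_components]
```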

Now, SCT isn't nearly at this level of rigor; I feel like we often just "do the thing" rather than "do the thing and make sure we did the thing correctly". But, my hope is that by having coverage, it will nudge us to start writing our code in such a way where coverage would become more useful (i.e. by thinking about various cases). Maybe that's wishful thinking, though. Cart before the horse. 😅

@jcohenadad (Member) commented:

Here's 100% coverage in cynical diagram form:

omg, i love it

@kousu (Contributor) commented Jul 20, 2021

On some projects, I've reported that trend back to the contributor while closing the pull request. I wouldn't report the exact number, but I'd remark that coverage went up, or down, or remained the same, by 'a little', 'much', etc. The point of that is to make team members aware that testing is important.

This reminds me of how YouTube stopped showing exact subscriber counts and started rounding to significant digits instead, because the exact numbers were causing too much drama and getting people to focus on the wrong thing. I wish codecov was like that. Maybe if it just gave a colour.

I'm satisfied with using coverage as a tool that we understand the limitations of.

@joshuacwnewton marked this pull request as ready for review July 26, 2021 16:23
@RignonNoel (Contributor) commented:

@joshuacwnewton @kousu For me, code coverage is a tool, and as with any tool, the fact that it isn't perfect and can be misused is not a good reason for professionals to not use it at all.

I'm 100% with you about that religious way of thinking where numbers are the only thing that matters. And I'm not saying that we should aim for 100% coverage on this project; that would make no sense. But coverage is a great tool, and not having it is just a handicap for us, since we would have to make decisions without a proper tool to help us analyze and understand our current state.


For me, code coverage is needed for multiple reasons:

1/ Validate Python syntax

Python is not a compiled language, so we can't be sure that our code is even correctly written without checking that every line is executed at least once.
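For example (a contrived sketch, not SCT code), this kind of typo is valid syntax and only fails when the line actually runs, so executing every line at least once is what catches it:

```python
def get_centerline(points, angle_correction=True):
    # Hypothetical function, not SCT code. The typo below ('ponts') is valid
    # Python syntax; nothing complains until this branch actually executes.
    if angle_correction:
        return sorted(ponts)   # NameError at runtime, not at import time
    return points


def test_get_centerline_without_correction():
    # This test passes, yet the buggy branch never runs. Line coverage makes
    # that gap visible; a test with angle_correction=True would fail
    # immediately with a NameError.
    assert get_centerline([3, 1, 2], angle_correction=False) == [3, 1, 2]
```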

2/ Having a clear idea of which parts of the system lack tests

Having 10,000+ unit tests doesn't mean you have tested all the important parts of your system. Code coverage lets you know which code sections are not covered, so you can make sure no important part is in that list, or plan some testing tasks.

Also, code coverage can tell you how many times each line has been executed. That's a good way to check whether your current state makes sense (important pieces are tested more heavily than nice-to-have side features).

3/ Tests are part of the code; you need to respect them as well

Every time you make a change to the code, it's interesting to know whether the code coverage increases, decreases, or stays pretty much stable. It's a great and really easy way to politely ask a contributor to add some test cases covering their new changes.

Bonus: The retro-analysis of tests

Sometimes you have a lot of tests, and when I add some code and want to cover it correctly (because I don't like manual testing, even during my development phase), I like to check the coverage system and see which test cases already exercise this section of the code:

  • I can detect which tests may need some refactoring EVEN if they don't fail (e.g. comments that are no longer relevant after the new refactor)
  • I can see whether I can just adapt an existing test or whether I need to create a new one
  • I can see whether the section is perhaps not that well covered already, and decide to put in the effort to test it properly, since I have the section in mind and my tools already set up for it
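For what it's worth, coverage.py's "contexts" feature seems aimed at exactly this workflow (pytest-cov exposes it as `--cov-context=test`). A rough, hypothetical sketch of the underlying API:

```python
import coverage


def segment(image):
    # Hypothetical function under test, not SCT code.
    return [] if image is None else [len(image)]


cov = coverage.Coverage()
cov.start()

# Label the lines executed under each "context" (e.g. a test name), so that
# `coverage html --show-contexts` can answer "which tests already exercise
# this section of the code?"
cov.switch_context("test_segment_typical")
segment([1, 2, 3])

cov.switch_context("test_segment_empty")
segment(None)

cov.stop()
cov.save()
```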

@mguaypaq (Member) commented:

It sounds like we're all on the same page:

  • good tests are important;
  • code coverage can be useful as a tool for thinking;
  • code coverage would be harmful if we used it rigidly.

Given that, I was pretty neutral about whether or not it shows up automatically on PRs. But then I was suddenly convinced by @RignonNoel's first point, "Validate Python syntax": I think that alone is a good reason to be able to see, easily, whether each line of code has actually been run once.

(On second thought this might be caught by a linter, which I assume we're running already? But on third thought maybe the linter wouldn't catch things like typos in variable names, etc. Either way, adding code coverage sounds like a mildly good idea.)

@joshuacwnewton (Member, Author) commented Jan 13, 2022

(On second thought this might be caught by a linter, which I assume we're running already? But on third thought maybe the linter wouldn't catch things like typos in variable names, etc. Either way, adding code coverage sounds like a mildly good idea.)

Yep! Your suspicion is correct, @mguaypaq. We have a linting GitHub Action, but right now the linter only runs on changed lines (because otherwise it would be far too noisy for new PRs). So, I definitely agree that that's a pro of the coverage tool.


Overall, I think this is worth trying out! And we can always return to this topic once we've had a chance to work with it for a few weeks/months, too. :)

Would either of you (@RignonNoel or @mguaypaq) be able to review the code changes in this PR, then? This is my first time trying to set up codecov, and while I think I've got it set up well, this PR was created half a year ago, so there may be ways to configure codecov that I've missed. (e.g. the .codecov.yml docs.)

@mguaypaq (Member) left a review:

I haven't worked with codecov before, but from looking at the documentation, this looks like a fine configuration to start with. I would be comfortable if you merged this after addressing the comments below.

Comment on lines +44 to +45
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-api.xml testing/api
pytest --cov=spinalcordtoolbox --cov-config setup.cfg --cov-branch --cov-report=xml:cov-cli.xml testing/cli
Reviewer (Member):

In addition to testing/api and testing/cli, there's now testing/batch_processing. Should that also be included in some coverage report?

@joshuacwnewton (Member, Author) replied:

Re: the context given in #3463 (comment), it might be a bit tricky to get coverage, since batch_processing is invoked via a shell script, and pytest is only used to validate the end results of the pipeline script. In other words, pytest isn't used to invoke the processing itself, so I'm not sure if we can hook into any line-by-line coverage scanning. :(
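If we ever do want it, one possible avenue (untested; just a sketch of coverage.py's documented subprocess support, not something set up in this PR) is to point the `COVERAGE_PROCESS_START` environment variable at a coverage config file and make sure every Python process calls `coverage.process_startup()`, e.g. via a `sitecustomize.py` on the environment's path:

```python
# sitecustomize.py (hypothetical; it would need to live on the sys.path of the
# environment that batch_processing.sh uses)
import coverage

# If the COVERAGE_PROCESS_START environment variable points at a coverage
# config file, this starts measurement in every Python process launched in
# this environment, including the CLI scripts invoked by the shell script.
# If the variable isn't set, it does nothing.
coverage.process_startup()
```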

.github/workflows/tests.yml (review thread, resolved)
.github/workflows/tests.yml (outdated review thread, resolved)
@joshuacwnewton merged commit e491614 into master on Jan 18, 2022
@joshuacwnewton deleted the jn/2702-fix-code-coverage branch on January 18, 2022 14:53
RignonNoel pushed a commit to RignonNoel/spinalcordtoolbox that referenced this pull request Jan 21, 2022
@joshuacwnewton added the "CI category: TravisCI, GitHub Actions, etc." label on Jan 26, 2022
@joshuacwnewton added this to the 5.5 milestone on Jan 26, 2022
mguaypaq added a commit that referenced this pull request Jul 11, 2022
This was added in #3463 about six months ago, but none of the current
developers have been paying attention to the resulting bot-generated
comments on all the pull requests, so it's just noise.
mguaypaq added a commit that referenced this pull request Jul 12, 2022
This was added in #3463 about six months ago, but none of the current
developers have been paying attention to the resulting bot-generated
comments on all the pull requests, so it's just noise.
Labels: CI category: TravisCI, GitHub Actions, etc.

Successfully merging this pull request may close these issues: Verify coverage is producing useful information (#2702)

5 participants