
introduce code coverage analysis in CI #10180

Merged · 26 commits into near:master · Nov 20, 2023

Conversation

@Ekleog-NEAR (Collaborator) commented Nov 15, 2023

Note: I’m removing the preparatory "Build neard" job here. The reason is twofold: coverage output turned out to need the whole target folder, and shipping that folder between jobs costs more than just rebuilding. In detail:

  • I didn’t manage to get coverage runs to actually write *.profraw files without the whole target folder present.
  • Uploading the whole target folder as an artifact, compressed down to 3G, takes 6 minutes of compression plus 15 minutes of upload.
  • Just building, on the commit I was on, took 2 minutes; multiplied by the 4 jobs, that gives 8 minutes, so quite a lot less than the compress+upload time.
  • So overall we spend less time redoing the compilation work than uploading and downloading it; and the tests themselves are relatively fast, so running them on the bigger machines (8 cores, enough for the tests not to be the bottleneck) shouldn’t hog those machines for long. Each job now rebuilds and runs its tests under coverage directly, roughly as sketched below.
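
For concreteness, a minimal sketch of what each test job then does (the tool-installation action, flag set, and codecov-action version here are assumptions for illustration, not necessarily what this PR landed):

```yaml
# Sketch only: each job rebuilds with coverage instrumentation instead of
# downloading a prebuilt target folder.
- uses: taiki-e/install-action@v2
  with:
    tool: cargo-llvm-cov,cargo-nextest
# Build and run the tests with coverage instrumentation, emitting a
# codecov-format report.
- run: cargo llvm-cov nextest --codecov --output-path codecov.json
- uses: codecov/codecov-action@v3
  with:
    files: codecov.json
```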

@Ekleog-NEAR added the C-housekeeping (Category: Refactoring, cleanups, code quality), A-security (Area: Security issues) and A-CI (Area: Continuous Integration) labels Nov 15, 2023
@Ekleog-NEAR force-pushed the coverage branch 3 times, most recently from 7c9837e to 63e5b2e, on November 15, 2023 14:03
@Ekleog-NEAR force-pushed the coverage branch 2 times, most recently from 37cfed9 to f90423b, on November 15, 2023 14:26
codecov bot commented Nov 16, 2023

Codecov Report

Attention: 880 lines in your changes are missing coverage. Please review.

Comparison: base (6b4dd20) at 85.96% vs. head (c3d6ab5) at 72.01%.
Report is 4211 commits behind head on master.

| Files | Patch % | Lines |
| --- | --- | --- |
| chain/chain/src/store_validator/validate.rs | 3.73% | 205 Missing and 1 partial ⚠️ |
| chain/chain/src/test_utils/kv_runtime.rs | 82.72% | 149 Missing and 35 partials ⚠️ |
| chain/chain/src/resharding.rs | 79.29% | 40 Missing and 25 partials ⚠️ |
| chain/chain/src/flat_storage_creator.rs | 83.91% | 31 Missing and 24 partials ⚠️ |
| chain/chain-primitives/src/error.rs | 46.46% | 50 Missing and 3 partials ⚠️ |
| chain/chain/src/test_utils.rs | 62.77% | 51 Missing ⚠️ |
| chain/chain/src/store_validator.rs | 59.09% | 34 Missing and 2 partials ⚠️ |
| chain/chain/src/missing_chunks.rs | 86.91% | 17 Missing and 8 partials ⚠️ |
| chain/chain/src/blocks_delay_tracker.rs | 92.89% | 19 Missing and 5 partials ⚠️ |
| chain/chain/src/validate.rs | 87.97% | 13 Missing and 9 partials ⚠️ |

... and 19 more
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           master   #10180       +/-   ##
===========================================
- Coverage   85.96%   72.01%   -13.96%
===========================================
  Files         243      706      +463
  Lines       46461   141278    +94817
  Branches        0   141278   +141278
===========================================
+ Hits        39942   101744    +61802
- Misses       6519    34846    +28327
- Partials        0     4688     +4688
```
| Flag | Coverage Δ |
| --- | --- |
| backward-compatibility | 0.08% <0.00%> (?) |
| db-migration | 0.08% <0.00%> (?) |
| genesis-check | 1.23% <0.00%> (?) |
| integration-tests | 36.31% <57.08%> (?) |
| linux | 71.91% <83.49%> (?) |
| linux-nightly | 71.77% <83.57%> (?) |
| macos | 54.51% <73.58%> (?) |
| pytests | 1.46% <0.00%> (?) |
| sanity-checks | 1.26% <0.00%> (?) |
| unittests | 68.35% <73.58%> (?) |
| upgradability | 0.13% <0.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@nagisa (Collaborator) commented Nov 16, 2023

> I didn’t manage to get coverage output to actually write *.profraw files without the whole target folder

This would be great to figure out the reason for, and to submit an issue about in rust-lang/rust. My intuition is that the binary should be sufficient (all the coverage instrumentation is part of the LLVM IR that makes it into the final binary), but I wouldn’t be surprised if some files, like separate debuginfo-esque files, are required for the coverage.
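
For reference, the standard LLVM flow does suggest the binary alone should suffice at run time (a sketch; the flag is real, the binary path is illustrative):

```sh
# Build test binaries with coverage instrumentation compiled in.
RUSTFLAGS="-C instrument-coverage" cargo test --no-run

# Running the instrumented binary by itself should emit a .profraw file
# wherever LLVM_PROFILE_FILE points, with no target/ folder needed.
LLVM_PROFILE_FILE="coverage-%p.profraw" ./path/to/test-binary
```

Generating the final report is another matter: llvm-cov needs the instrumented binaries/object files again to map the counters back to source, which may be where the target-folder dependency creeps in.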

> Just building, on the commit on which I was, took 2 minutes, multiplied by 4 it gives 8 minutes, so quite a lot less than the compress+upload time

That's the case for now, though. It is conceivable that we might want to add more such super-integration-tests in the future, in which case it again makes sense to merge those jobs together at least to some degree – and having to figure out how to merge them well is an annoying and fairly arbitrary exercise (one that eventually leads to the kind of nasty behemoth we had with the "style checks" CI job in buildkite).

@nagisa (Collaborator) commented Nov 16, 2023

> Codecov Report

This is kinda bogus as it gives a report between 8000 commits ago and now... but I guess it does work.

@Ekleog-NEAR (Collaborator, Author) commented Nov 16, 2023

> This would be great to figure out a reason for and submit an issue about in rust-lang/rust. My intuition is that the binary should be sufficient for this (as all the coverage instrumentation is part of the LLVM IR that makes it to the final binary build), but I wouldn’t be surprised if there are some files like separated debuginfo-esque files that are required for the coverage.

I sent it as a question on taiki-e/cargo-llvm-cov#320, to see whether I’m missing something :) In practice, I can say that cargo-llvm-cov at least seems to search for object files in target/${profile} at some point while generating the report, judging by the error message in this run.

> That's the case for now though. It is conceivable that we might want to add additional such super-integration-tests in the future in which case it again makes sense to merge those jobs together at least to some degree – and having to figure out how to merge them well is an annoying and fairly arbitrary exercise (that eventually leads to a nasty behemoth that we had with the "style checks" CI job we had in buildkite)

Agreed; I’m hoping we can get to a resolution that would allow building a single neard again before this happens :) TBH I really don’t understand why the attempt without the whole target folder didn’t work. I just observed during testing (e.g. this run; interestingly, the genesis changes run did generate profraw files without issues even without the whole target folder) that find would list zero *.profraw files when the whole target folder was absent, but with the whole target folder everything worked just fine.

Overall my hope is that taiki-e will tell me on the issue linked above that I’m wrong and it should only need the binary; then I’ll go back to debugging, trying to figure out why all the jobs except genesis changes failed when running with only the binary.

> This is kinda bogus as it gives a report between 8000 commits ago and now... but I guess it does work.

Right; the reason is actually that codecov was already enabled 8000 commits ago but hasn’t received any report since then, so as soon as this lands it should start giving reports between master and the PR head again :)

.github/workflows/ci.yml (outdated review thread)
```yaml
# See https://github.com/taiki-e/cargo-llvm-cov/issues/292
- run: find target -name '*.profraw' -delete

# Run integration tests
```
Collaborator commented:

Hmph, splitting these nextest run lines goes in opposition to #9415. If it's only serving to provide some sort of categorization within the tool, I would personally much rather keep the single-command option.

@Ekleog-NEAR (Collaborator, Author) commented:

TBH I think #9415 should be solved by using something like just test, not by having everything stuffed into nextest; among other reasons because, if we did that, we could move clippy out and have a cache for it, so it wouldn’t need to be rebuilt from scratch each time :)

Also, this split doesn’t change the fact that cargo nextest run still runs all the tests and works fine. I do think the categorization within the tool is useful, especially considering yesterday’s discussion that debugging integration tests is very bad dev UX, and thus we’d mostly be interested in code coverage for unit tests only.
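
For illustration, the split being discussed amounts to something like this (the package selection is an assumption based on nearcore’s integration-tests crate):

```sh
# Unit tests only: everything in the workspace except the integration-tests crate.
cargo nextest run --workspace --exclude integration-tests

# Integration tests only.
cargo nextest run -p integration-tests

# The single invocation still runs everything, unchanged.
cargo nextest run --workspace
```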

@nagisa (Collaborator) commented Nov 17, 2023:

Well, back in the buildkite days we also had "cargo nextest run", but there were like 5 different ways to run integration tests, and you had to pay really close attention to which specific invocation was failing – one of them failing would usually mean the others were passing.

This was exactly what prompted me to start working on CI in the first place. My honest evaluation is that we got into that state back then precisely because it was normal and natural to "just add another invocation of cargo run/test/nextest/etc with my snowflake combination of flags".

Having a single cargo nextest invocation I believe does a great job pushing against such configurations (and I hope it will eventually push enough to obviate the need for nightly vs. non-nightly configurations as well.) Splitting invocations into parts goes back towards normalizing having more of the snowflake configurations.

Now, to make this actionable at all – I wonder if nextest could be made to export test coverage on a per-test basis. All of the tests are distinct binary invocations, which should allow us to produce coverage files for each individual test (and therefore could then eventually be filtered by e.g. crate, module, etc. after the fact), not just two broad categories of unit vs integration tests.

Unfortunately cargo llvm-cov nextest appears to just run cargo nextest after setting up the environment, but perhaps target runner scripts could be used as a mechanism to isolate and produce coverage information for each individual test (there’s even a section on debug output in the documentation?)
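
A sketch of that target-runner idea (hypothetical wiring, not something cargo-llvm-cov or nextest provides out of the box): register a wrapper script as the Cargo target runner, and have it give every test process its own profile file.

```sh
#!/usr/bin/env bash
# coverage-runner.sh -- hypothetical wrapper, registered in .cargo/config.toml as:
#   [target.x86_64-unknown-linux-gnu]
#   runner = "./coverage-runner.sh"
# Cargo (and thus nextest) invokes the runner as `runner <binary> <args...>`,
# so $1 is the test binary. Giving each process a unique .profraw means the
# coverage could later be sliced per test rather than per broad category.
export LLVM_PROFILE_FILE="profraw/$(basename "$1")-%p.profraw"
exec "$@"
```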

@nagisa (Collaborator) commented Nov 17, 2023:

> we could move clippy out and have a cache for it

I don’t see why we couldn’t have a cache right now. We just need to also cache the target/style directory (does rust-cache not do it…?)

FWIW my belief is that the current setup with target/style is actually better for local development purposes, as developers no longer risk accidentally invalidating their cached proper build when they run cargo clippy -p foo or something of the sort (by virtue of not needing to run cargo clippy manually at all.) Though now that I write this, I realize there’s a bug in how the style tests set TARGET_DIR – it should be different for each test that builds code (in particular, themis should have its own TARGET_DIR so it does not get blocked waiting on the target directory lock…)
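
For context, the target/style setup being described boils down to something like this (the exact clippy flags are illustrative):

```sh
# Clippy builds into its own target directory, so running it cannot
# invalidate the developer's main target/ cache.
CARGO_TARGET_DIR=target/style cargo clippy --all-targets --all-features
```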

@Ekleog-NEAR (Collaborator, Author) commented:

> (does rust-cache not do it…?)

Considering the clippy test is always among the last to complete and is marked as slow (so 120+ seconds), I’d guess it doesn’t.

Counter-proposal: what if we had a Justfile with:

  • A rule for unit tests
  • A rule for clippy that runs it with target_dir set to target/style
  • A rule for integration tests
  • A just test rule that runs all of them

Then developers could run just test, CI could run the rules independently, and we could maybe even override the target folder, depending on whether we can fix target-folder caching some other way. We could also easily add the nightly configuration to the Justfile, so that there’s indeed a single command line to run. WDYT?
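
To make the proposal concrete, something like the following (a sketch only; recipe names and flags are placeholders):

```just
# Hypothetical Justfile matching the four rules above.
nextest-unit:
    cargo nextest run --workspace --exclude integration-tests

clippy:
    CARGO_TARGET_DIR=target/style cargo clippy --all-targets

nextest-integration:
    cargo nextest run -p integration-tests

# Single entry point for local development; CI can run the rules independently.
test: nextest-unit clippy nextest-integration
```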

(I think changes to llvm-cov or cargo nextest might be a great idea, as there are already many profraw files and we’d "just" need to figure out which are the right ones; but that won’t reasonably be doable before November 26, especially as I’m OOO the second half of next week and have a lot of other stuff to do by then.)

(Also, FWIW, I personally don’t really care about having a single command line to run right now: currently it’s much faster for me to upload to CI and wait for the results than to test locally. But that’s part of the devenv issues we’re discussing on zulip right now, and you’re right that having a single command would in theory be good.)

@Ekleog-NEAR (Collaborator, Author) commented:

Actually, relatedly: in order to be able to use the GitHub actions for e.g. lychee (which I’m introducing in #10217), we’ll need to run those checks outside of cargo nextest anyway, unless we want to run them twice in CI. In particular, we could start using the clippy action, which AFAIU has much nicer error reporting than our current "manually read the logs" system.

So I think this confirms the want for a tool like just, so that just test could run lychee in a way similar to what CI runs :)

Collaborator commented:

Hm, the weekend gave me ample time to consider the last message. I agree that having nice reporting within GitHub is really nice.

But I also see a major (I think) flaw in what I believe you’re proposing. To double-check: are you suggesting that we maintain both .github/workflows/ci.yaml and a Justfile, each implementing the same checks independently? E.g. where the Justfile would run cargo clippy, the GHA workflow would run the clippy action? I think doing so is precisely the mistake I would like to avoid. If we’re adding a Justfile, the CI could run exactly the just commands that ought to be run as part of a local development flow (and ideally that command is singular, or is split up into sub-jobs programmatically), but that does not come with the nice error-reporting benefits.

But unless we eschew those benefits, I don’t see how we can ensure, especially in the long term, that the checks run via the Justfile remain synchronized with those run within GHA. And that ultimately means we would still sometimes see instances where a successful just pass does not guarantee the CI passing (or vice versa, for that matter.)

@Ekleog-NEAR (Collaborator, Author) commented:

My current goal is indeed to have the CI run just nextest-unit && coverage && just nextest-integration && coverage, where just nextest for local use would also run the two.

However, for clippy, in order to get the nice reporting, I would indeed have a just clippy for local development that would not be DRY with the clippy-action for GitHub, because by definition clippy-action cannot know to use our Justfile (or literally any other way we might have to run all the tests), any more than just could automatically figure out the command clippy-action runs.

So I’d say that, by definition, we can only have two of the following three:

  • nice error reporting
  • not implementing error reporting ourselves
  • locally-run command is DRY with remotely-run command

I also personally think we should choose the first two; whenever we come across an instance where just and the CI workflow don’t give the same result, we can treat it as a bug and fix it, probably by just copy-pasting again from the updated upstream clippy-action.

That being said, this PR is only about the nextest splitting, so I’d argue the discussion about clippy lints can be postponed to the day we actually consider using clippy-action :) And it seems to me like you don’t disagree with the idea, if restricted to just splitting the nextest step in two?

Collaborator commented:

> That being said, this PR is only about nextest splitting, so I’d argue that the discussion about clippy lints could be postponed for the day when we actually consider using clippy-action :) And it seems to me like you don’t disagree with the idea, if restricted to splitting the nextest step into two only?

I mean… whatever plan we come up with and start following should scale to whatever future ambitions we may have, which is why I think discussing clippy and lychee is also important… In particular, one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just commands to run in CI.

This could be something similar to wasmtime’s determine job, which generates a job matrix for the successive jobs. In our case we’d parse something along the lines of just --summary and generate a job matrix consisting of all the individual subcommands to run across different builders.
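
A sketch of what that could look like for us (the setup-just action and the jq wiring are assumptions; wasmtime’s actual determine job differs in detail):

```yaml
jobs:
  determine:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.gen.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - uses: extractions/setup-just@v2
      - id: gen
        # `just --summary` prints the recipe names on one line; turn them
        # into a JSON array usable as a job matrix.
        run: echo "matrix=$(just --summary | tr ' ' '\n' | jq -R . | jq -cs .)" >> "$GITHUB_OUTPUT"
  run-checks:
    needs: determine
    strategy:
      matrix:
        recipe: ${{ fromJSON(needs.determine.outputs.matrix) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: extractions/setup-just@v2
      - run: just ${{ matrix.recipe }}
```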

One thing I already like about the just setup is that it also allows for maintaining the different flag combinations for different OSes (so we can move --ignore integration-tests on macos down into the Justfile.)

@Ekleog-NEAR (Collaborator, Author) commented:

> In particular one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just commands to run for the CI.

While I’m philosophically of the same opinion, at the same time I’m wondering whether the time to implement and the effort to maintain such a determination are financially worth the savings in compute time 😅 but it’d be really nice to have that!

.github/workflows/ci.yml (outdated review thread, resolved)
.github/workflows/ci.yml (review thread, resolved)
@nagisa (Collaborator) left a review:

ok, let's land this for now. I think a Justfile is a reasonable future solution to the downgrades in developer experience this PR might temporarily introduce if left unattended…

@Ekleog-NEAR added this pull request to the merge queue Nov 20, 2023
@Ekleog-NEAR (Collaborator, Author) commented:

Awesome, thank you! I’ll do my best to submit a PR with the Justfile by tomorrow EOD :)

Merged via the queue into near:master with commit 956add9 Nov 20, 2023
15 checks passed
@Ekleog-NEAR deleted the coverage branch November 20, 2023 12:19
nagisa added a commit to Ekleog-NEAR/nearcore that referenced this pull request Nov 23, 2023