
introduce code coverage analysis in CI #10180

Merged · 26 commits into near:master · Nov 20, 2023

Conversation

@Ekleog-NEAR (Collaborator) commented Nov 15, 2023

Note: I’m removing the preparatory "Build neard" job here. The reason is twofold: coverage output turned out to need the whole target folder, and shipping that folder between jobs costs more than just rebuilding. In detail:

  • I didn’t manage to get coverage runs to actually write *.profraw files without the whole target folder present.
  • Uploading the whole target folder as an artifact, compressed down to 3G, takes 6 minutes of compression plus 15 minutes of upload.
  • Just building, on the commit I was on, took 2 minutes; multiplied by the 4 jobs, that gives 8 minutes, so quite a lot less than the compress+upload time.
  • So overall we spend less time redoing the compilation work than uploading and downloading it; and the tests themselves are relatively fast, so running them on the bigger machines (8 cores, enough for the tests not to be the bottleneck) shouldn’t hog those machines for long. Each job now rebuilds and runs its tests under coverage directly, roughly as sketched below.
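
For concreteness, a minimal sketch of what each test job then does (the tool-installation action, flag set, and codecov-action version here are assumptions for illustration, not necessarily what this PR landed):

```yaml
# Sketch only: each job rebuilds with coverage instrumentation instead of
# downloading a prebuilt target folder.
- uses: taiki-e/install-action@v2
  with:
    tool: cargo-llvm-cov,cargo-nextest
# Build and run the tests with coverage instrumentation, emitting a
# codecov-format report.
- run: cargo llvm-cov nextest --codecov --output-path codecov.json
- uses: codecov/codecov-action@v3
  with:
    files: codecov.json
```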

@Ekleog-NEAR added the C-housekeeping (Category: Refactoring, cleanups, code quality), A-security (Area: Security issues) and A-CI (Area: Continuous Integration) labels Nov 15, 2023
@Ekleog-NEAR force-pushed the coverage branch 3 times, most recently from 7c9837e to 63e5b2e, on November 15, 2023 14:03
@Ekleog-NEAR force-pushed the coverage branch 2 times, most recently from 37cfed9 to f90423b, on November 15, 2023 14:26
codecov bot commented Nov 16, 2023

Codecov Report

Attention: 880 lines in your changes are missing coverage. Please review.

Comparison: base (6b4dd20) at 85.96% vs. head (c3d6ab5) at 72.01%.
Report is 4211 commits behind head on master.

| Files | Patch % | Lines |
| --- | --- | --- |
| chain/chain/src/store_validator/validate.rs | 3.73% | 205 Missing and 1 partial ⚠️ |
| chain/chain/src/test_utils/kv_runtime.rs | 82.72% | 149 Missing and 35 partials ⚠️ |
| chain/chain/src/resharding.rs | 79.29% | 40 Missing and 25 partials ⚠️ |
| chain/chain/src/flat_storage_creator.rs | 83.91% | 31 Missing and 24 partials ⚠️ |
| chain/chain-primitives/src/error.rs | 46.46% | 50 Missing and 3 partials ⚠️ |
| chain/chain/src/test_utils.rs | 62.77% | 51 Missing ⚠️ |
| chain/chain/src/store_validator.rs | 59.09% | 34 Missing and 2 partials ⚠️ |
| chain/chain/src/missing_chunks.rs | 86.91% | 17 Missing and 8 partials ⚠️ |
| chain/chain/src/blocks_delay_tracker.rs | 92.89% | 19 Missing and 5 partials ⚠️ |
| chain/chain/src/validate.rs | 87.97% | 13 Missing and 9 partials ⚠️ |

... and 19 more
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           master   #10180       +/-   ##
===========================================
- Coverage   85.96%   72.01%   -13.96%
===========================================
  Files         243      706      +463
  Lines       46461   141278    +94817
  Branches        0   141278   +141278
===========================================
+ Hits        39942   101744    +61802
- Misses       6519    34846    +28327
- Partials        0     4688     +4688
```
| Flag | Coverage Δ |
| --- | --- |
| backward-compatibility | 0.08% <0.00%> (?) |
| db-migration | 0.08% <0.00%> (?) |
| genesis-check | 1.23% <0.00%> (?) |
| integration-tests | 36.31% <57.08%> (?) |
| linux | 71.91% <83.49%> (?) |
| linux-nightly | 71.77% <83.57%> (?) |
| macos | 54.51% <73.58%> (?) |
| pytests | 1.46% <0.00%> (?) |
| sanity-checks | 1.26% <0.00%> (?) |
| unittests | 68.35% <73.58%> (?) |
| upgradability | 0.13% <0.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.


@nagisa (Collaborator) commented Nov 16, 2023

> I didn’t manage to get coverage output to actually write *.profraw files without the whole target folder

This would be great to figure out the reason for, and to submit an issue about in rust-lang/rust. My intuition is that the binary should be sufficient (all the coverage instrumentation is part of the LLVM IR that makes it into the final binary), but I wouldn’t be surprised if some files, like separate debuginfo-esque files, are required for the coverage.
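
For reference, the standard LLVM flow does suggest the binary alone should suffice at run time (a sketch; the flag is real, the binary path is illustrative):

```sh
# Build test binaries with coverage instrumentation compiled in.
RUSTFLAGS="-C instrument-coverage" cargo test --no-run

# Running the instrumented binary by itself should emit a .profraw file
# wherever LLVM_PROFILE_FILE points, with no target/ folder needed.
LLVM_PROFILE_FILE="coverage-%p.profraw" ./path/to/test-binary
```

Generating the final report is another matter: llvm-cov needs the instrumented binaries/object files again to map the counters back to source, which may be where the target-folder dependency creeps in.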

> Just building, on the commit on which I was, took 2 minutes, multiplied by 4 it gives 8 minutes, so quite a lot less than the compress+upload time

That's the case for now, though. It is conceivable that we might want to add more such super-integration-tests in the future, in which case it again makes sense to merge those jobs together at least to some degree – and having to figure out how to merge them well is an annoying and fairly arbitrary exercise (one that eventually leads to the kind of nasty behemoth we had with the "style checks" CI job in buildkite).

@nagisa (Collaborator) commented Nov 16, 2023

> Codecov Report

This is kinda bogus as it gives a report between 8000 commits ago and now... but I guess it does work.

@Ekleog-NEAR (Collaborator, Author) commented Nov 16, 2023

> This would be great to figure out a reason for and submit an issue about in rust-lang/rust. My intuition is that the binary should be sufficient for this (as all the coverage instrumentation is part of the LLVM IR that makes it to the final binary build), but I wouldn’t be surprised if there are some files like separated debuginfo-esque files that are required for the coverage.

I sent it as a question on taiki-e/cargo-llvm-cov#320, to see whether I’m missing something :) In practice, I can say that cargo-llvm-cov at least seems to search for object files in target/${profile} at some point while generating the report, judging by the error message in this run.

> That's the case for now though. It is conceivable that we might want to add additional such super-integration-tests in the future in which case it again makes sense to merge those jobs together at least to some degree – and having to figure out how to merge them well is an annoying and fairly arbitrary exercise (that eventually leads to a nasty behemoth that we had with the "style checks" CI job we had in buildkite)

Agreed; I’m hoping we can get to a resolution that would allow building a single neard again before this happens :) TBH I really don’t understand why the attempt without the whole target folder didn’t work. I just observed during testing (e.g. this run; interestingly, the genesis changes run did generate profraw files without issues even without the whole target folder) that find would list zero *.profraw files when the whole target folder was absent, but with the whole target folder everything worked just fine.

Overall my hope is that taiki-e will tell me on the issue linked above that I’m wrong and it should only need the binary; then I’ll go back to debugging, trying to figure out why all the jobs except genesis changes failed when running with only the binary.

> This is kinda bogus as it gives a report between 8000 commits ago and now... but I guess it does work.

Right; the reason is actually that codecov was already enabled 8000 commits ago but hasn’t received any report since then, so as soon as this lands it should start giving reports between master and the PR head again :)

.github/workflows/ci.yml (outdated review thread)
```yaml
# See https://github.com/taiki-e/cargo-llvm-cov/issues/292
- run: find target -name '*.profraw' -delete

# Run integration tests
```
Collaborator commented:

Hmph, splitting these nextest run lines goes in opposition to #9415. If it's only serving to provide some sort of categorization within the tool, I would personally much rather keep the single-command option.

@Ekleog-NEAR (Collaborator, Author) commented:

TBH I think #9415 should be solved by using something like just test, not by having everything stuffed into nextest; among other reasons because, if we did that, we could move clippy out and have a cache for it, so it wouldn’t need to be rebuilt from scratch each time :)

Also, this split doesn’t change the fact that cargo nextest run still runs all the tests and works fine. I do think the categorization within the tool is useful, especially considering yesterday’s discussion that debugging integration tests is very bad dev UX, and thus we’d mostly be interested in code coverage for unit tests only.
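
For illustration, the split being discussed amounts to something like this (the package selection is an assumption based on nearcore’s integration-tests crate):

```sh
# Unit tests only: everything in the workspace except the integration-tests crate.
cargo nextest run --workspace --exclude integration-tests

# Integration tests only.
cargo nextest run -p integration-tests

# The single invocation still runs everything, unchanged.
cargo nextest run --workspace
```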

@nagisa (Collaborator) commented Nov 17, 2023:

Well, back in the buildkite days we also had "cargo nextest run", but there were like 5 different ways to run integration tests, and you had to pay really close attention to which specific invocation was failing – one of them failing would usually mean the others were passing.

This was exactly what prompted me to start working on CI in the first place. My honest evaluation is that we got into that state back then precisely because it was normal and natural to "just add another invocation of cargo run/test/nextest/etc with my snowflake combination of flags".

Having a single cargo nextest invocation I believe does a great job pushing against such configurations (and I hope it will eventually push enough to obviate the need for nightly vs. non-nightly configurations as well.) Splitting invocations into parts goes back towards normalizing having more of the snowflake configurations.

Now, to make this actionable at all – I wonder if nextest could be made to export test coverage on a per-test basis. All of the tests are distinct binary invocations, which should allow us to produce coverage files for each individual test (and therefore could then eventually be filtered by e.g. crate, module, etc. after the fact), not just two broad categories of unit vs integration tests.

Unfortunately cargo llvm-cov nextest appears to just run cargo nextest after setting up the environment, but perhaps target runner scripts could be used as a mechanism to isolate and produce coverage information for each individual test (there’s even a section on debug output in the documentation?)
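
A sketch of that target-runner idea (hypothetical wiring, not something cargo-llvm-cov or nextest provides out of the box): register a wrapper script as the Cargo target runner, and have it give every test process its own profile file.

```sh
#!/usr/bin/env bash
# coverage-runner.sh -- hypothetical wrapper, registered in .cargo/config.toml as:
#   [target.x86_64-unknown-linux-gnu]
#   runner = "./coverage-runner.sh"
# Cargo (and thus nextest) invokes the runner as `runner <binary> <args...>`,
# so $1 is the test binary. Giving each process a unique .profraw means the
# coverage could later be sliced per test rather than per broad category.
export LLVM_PROFILE_FILE="profraw/$(basename "$1")-%p.profraw"
exec "$@"
```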

@nagisa (Collaborator) commented Nov 17, 2023:

> we could move clippy out and have a cache for it

I don’t see why we couldn’t have a cache right now. We just need to also cache the target/style directory (does rust-cache not do it…?)

FWIW my belief is that the current setup with target/style is actually better for local development purposes, as developers no longer risk accidentally invalidating their cached proper build when they run cargo clippy -p foo or something of the sort (by virtue of not needing to run cargo clippy manually at all.) Though now that I write this, I realize there’s a bug in how the style tests set TARGET_DIR – it should be different for each test that builds code (in particular, themis should have its own TARGET_DIR so it does not get blocked waiting on the target directory lock…)
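
For context, the target/style setup being described boils down to something like this (the exact clippy flags are illustrative):

```sh
# Clippy builds into its own target directory, so running it cannot
# invalidate the developer's main target/ cache.
CARGO_TARGET_DIR=target/style cargo clippy --all-targets --all-features
```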

@Ekleog-NEAR (Collaborator, Author) commented:

> (does rust-cache not do it…?)

Considering the clippy test is always among the last to complete and is marked as slow (so 120+ seconds), I’d guess it doesn’t.

Counter-proposal: what if we had a Justfile with:

  • A rule for unit tests
  • A rule for clippy that runs it with target_dir set to target/style
  • A rule for integration tests
  • A just test rule that runs all of them

Then developers could run just test, CI could run the rules independently, and we could maybe even override the target folder, depending on whether we can fix target-folder caching some other way. We could also easily add the nightly configuration to the Justfile, so that there’s indeed a single command line to run. WDYT?
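
To make the proposal concrete, something like the following (a sketch only; recipe names and flags are placeholders):

```just
# Hypothetical Justfile matching the four rules above.
nextest-unit:
    cargo nextest run --workspace --exclude integration-tests

clippy:
    CARGO_TARGET_DIR=target/style cargo clippy --all-targets

nextest-integration:
    cargo nextest run -p integration-tests

# Single entry point for local development; CI can run the rules independently.
test: nextest-unit clippy nextest-integration
```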

(I think changes to llvm-cov or cargo nextest might be a great idea, as there are already many profraw files and we’d "just" need to figure out which are the right ones; but that won’t reasonably be doable before November 26, especially as I’m OOO the second half of next week and have a lot of other stuff to do by then.)

(Also, FWIW, I personally don’t really care about having a single command line to run right now: currently it’s much faster for me to upload to CI and wait for the results than to test locally. But that’s part of the devenv issues we’re discussing on zulip right now, and you’re right that having a single command would in theory be good.)

@Ekleog-NEAR (Collaborator, Author) commented:

Actually, relatedly: in order to be able to use the GitHub actions for e.g. lychee (which I’m introducing in #10217), we’ll need to run those checks outside of cargo nextest anyway, unless we want to run them twice in CI. In particular, we could start using the clippy action, which AFAIU has much nicer error reporting than our current "manually read the logs" system.

So I think this confirms the want for a tool like just, so that just test could run lychee in a way similar to what CI runs :)

Collaborator commented:

Hm, the weekend gave me ample time to consider the last message. I agree that having nice reporting within GitHub is really nice.

But I also see a major (I think) flaw in what I believe you’re proposing. To double-check: are you suggesting that we maintain both .github/workflows/ci.yaml and a Justfile, each implementing the same checks independently? E.g. where the Justfile would run cargo clippy, the GHA workflow would run the clippy action? I think doing so is precisely the mistake I would like to avoid. If we’re adding a Justfile, the CI could run exactly the just commands that ought to be run as part of a local development flow (and ideally that command is singular, or is split up into sub-jobs programmatically), but that does not come with the nice error-reporting benefits.

But unless we eschew those benefits, I don’t see how we can ensure, especially in the long term, that the checks run via the Justfile remain synchronized with those run within GHA. And that ultimately means we would still sometimes see instances where a successful just pass does not guarantee the CI passing (or vice versa, for that matter.)

@Ekleog-NEAR (Collaborator, Author) commented:

My current goal is indeed to have the CI run just nextest-unit && coverage && just nextest-integration && coverage, where just nextest for local use would also run the two.

However, for clippy, in order to get the nice reporting, I would indeed have a just clippy for local development that would not be DRY with the clippy-action for GitHub, because by definition clippy-action cannot know to use our Justfile (or literally any other way we might have to run all the tests), any more than just could automatically figure out the command clippy-action runs.

So I’d say that, by definition, we can only have two of the following three:

  • nice error reporting
  • not implementing error reporting ourselves
  • locally-run command is DRY with remotely-run command

I also personally think we should choose the first two; whenever we come across an instance where just and the CI workflow don’t give the same result, we can treat it as a bug and fix it, probably by just copy-pasting again from the updated upstream clippy-action.

That being said, this PR is only about the nextest splitting, so I’d argue the discussion about clippy lints can be postponed to the day we actually consider using clippy-action :) And it seems to me like you don’t disagree with the idea, if restricted to just splitting the nextest step in two?

Collaborator commented:

> That being said, this PR is only about nextest splitting, so I’d argue that the discussion about clippy lints could be postponed for the day when we actually consider using clippy-action :) And it seems to me like you don’t disagree with the idea, if restricted to splitting the nextest step into two only?

I mean… whatever plan we come up with and start following should scale to whatever future ambitions we may have, which is why I think discussing clippy and lychee is also important… In particular, one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just commands to run in CI.

This could be something similar to wasmtime’s determine job, which generates a job matrix for the successive jobs. In our case we’d parse something along the lines of just --summary and generate a job matrix consisting of all the individual subcommands to run across different builders.
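
A sketch of what that could look like for us (the setup-just action and the jq wiring are assumptions; wasmtime’s actual determine job differs in detail):

```yaml
jobs:
  determine:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.gen.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - uses: extractions/setup-just@v2
      - id: gen
        # `just --summary` prints the recipe names on one line; turn them
        # into a JSON array usable as a job matrix.
        run: echo "matrix=$(just --summary | tr ' ' '\n' | jq -R . | jq -cs .)" >> "$GITHUB_OUTPUT"
  run-checks:
    needs: determine
    strategy:
      matrix:
        recipe: ${{ fromJSON(needs.determine.outputs.matrix) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: extractions/setup-just@v2
      - run: just ${{ matrix.recipe }}
```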

One thing I already like about the just setup is that it also allows for maintaining the different flag combinations for different OSes (so we can move --ignore integration-tests on macos down into the Justfile.)

@Ekleog-NEAR (Collaborator, Author) commented:

> In particular one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just commands to run for the CI.

While I’m philosophically of the same opinion, at the same time I’m wondering whether the time to implement and the effort to maintain such a determination are financially worth the savings in compute time 😅 but it’d be really nice to have that!

.github/workflows/ci.yml (outdated review thread, resolved)
.github/workflows/ci.yml (review thread, resolved)
@nagisa (Collaborator) left a review:

ok, let's land this for now. I think a Justfile is a reasonable future solution to the downgrades in developer experience this PR might temporarily introduce if left unattended…

@Ekleog-NEAR added this pull request to the merge queue Nov 20, 2023
@Ekleog-NEAR (Collaborator, Author) commented:

Awesome, thank you! I’ll do my best to submit a PR with the Justfile by tomorrow EOD :)

Merged via the queue into near:master with commit 956add9 Nov 20, 2023
15 checks passed
@Ekleog-NEAR deleted the coverage branch November 20, 2023 12:19
nagisa added a commit to Ekleog-NEAR/nearcore that referenced this pull request Nov 23, 2023