Add backwards-compatible test matrix #4010

bradleystachurski · 2024-01-04T16:49:17Z

Building off of #3982
Progress on #3962

Adds backwards-compatibility testing for fedimintd, gatewayd, fedimint-cli, and gateway-cli, defined as a Cartesian product of the binaries and versions listed in scripts/tests/cross-test-ci-all.sh (just v0.2.1 for now). This approach won't scale well, but it's useful for starting to identify breaking changes in master, which already exist.
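The matrix can be pictured as a nested loop over component versions. This is a minimal, hypothetical sketch, not the contents of scripts/tests/cross-test-ci-all.sh: the versions array, the print_matrix function, and the rule of skipping combinations where every component runs the same version are assumptions inferred from the six rows of the summary table (2^3 - 2 = 6).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: enumerate the backwards-compatibility matrix as the
# Cartesian product of component versions. "current" stands for binaries
# built from the checked-out source; the version list mirrors the PR
# description (only v0.2.1 for now).
set -euo pipefail

versions=("v0.2.1" "current")

# Prints one "fed_version client_version gateway_version" line per combination.
print_matrix() {
  local fed client gw
  for fed in "${versions[@]}"; do
    for client in "${versions[@]}"; do
      for gw in "${versions[@]}"; do
        # Assumption: skip combinations where all components share a version,
        # since they don't exercise cross-version compatibility.
        if [[ "$fed" == "$client" && "$client" == "$gw" ]]; then
          continue
        fi
        printf '%s %s %s\n' "$fed" "$client" "$gw"
      done
    done
  done
}

print_matrix
```

With two versions this yields the same six combinations, in the same order, as the summary table below; adding a version to the array grows the matrix cubically, which is one reason the approach won't scale well.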

In CI, this step takes ~30 minutes, so I've decided to run it on each push instead of only in the merge queue. I think parallelizing the runs and considering running only on merge would be good future improvements.

The backwards-compatibility matrix is logged at the end of the run.

Backwards-compatibility tests summary:
fed_version  client_version  gateway_version  exit_code
v0.2.1       v0.2.1          current          1
v0.2.1       current         v0.2.1           1
v0.2.1       current         current          1
current      v0.2.1          v0.2.1           1
current      v0.2.1          current          1
current      current         v0.2.1           1

If any combination fails, this step will fail CI.
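Running the matrix and producing that summary can be sketched as follows. This is a hypothetical illustration rather than the PR's code: run_combination is a placeholder stub standing in for the real test invocation, and run_matrix reads one "fed client gateway" combination per line on stdin. It records each exit code, prints a table in the same layout as the summary above, and returns non-zero if any combination failed, which is what would make the CI step fail.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run each matrix combination, record its exit code,
# print a summary table at the end, and fail if any combination failed.

# Placeholder for the real test invocation with binaries of the given
# versions; this stub only logs the combination and succeeds.
run_combination() {
  local fed="$1" client="$2" gateway="$3"
  echo "testing fed=$fed client=$client gateway=$gateway" >&2
  return 0
}

# Reads "fed_version client_version gateway_version" lines from stdin.
run_matrix() {
  local -a rows=()
  local failed=0 fed client gateway code row
  while read -r fed client gateway; do
    code=0
    # Keep going after a failure so the summary covers every combination.
    # </dev/null stops the test from consuming the matrix lines on stdin.
    run_combination "$fed" "$client" "$gateway" </dev/null || code=$?
    rows+=("$fed $client $gateway $code")
    if (( code != 0 )); then
      failed=1
    fi
  done

  echo "Backwards-compatibility tests summary:"
  printf '%-12s %-15s %-16s %s\n' fed_version client_version gateway_version exit_code
  for row in "${rows[@]}"; do
    # Intentionally unquoted: split the row into its four fields.
    # shellcheck disable=SC2086
    printf '%-12s %-15s %-16s %s\n' $row
  done
  return "$failed"
}
```

For example, printf '%s\n' 'v0.2.1 current current' 'current v0.2.1 v0.2.1' | run_matrix prints the summary and exits with status 0 only if both runs pass. Running every combination before failing, rather than stopping at the first failure, is what lets the summary show the whole matrix.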

codecov · 2024-01-04T16:57:31Z

Codecov Report

Attention: 213 lines in your changes are missing coverage. Please review.

Comparison is base (a099fff) 58.30% compared to head (e4e0098) 58.10%.

Files	Patch %	Lines
devimint/src/util.rs	0.00%	158 Missing ⚠️
devimint/src/main.rs	0.00%	44 Missing ⚠️
devimint/src/external.rs	0.00%	7 Missing ⚠️
devimint/src/federation.rs	0.00%	2 Missing ⚠️
devimint/src/lib.rs	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4010      +/-   ##
==========================================
- Coverage   58.30%   58.10%   -0.20%     
==========================================
  Files         192      192              
  Lines       42757    42923     +166     
==========================================
+ Hits        24928    24941      +13     
- Misses      17829    17982     +153


douglaz · 2024-01-16T20:52:11Z

Good stuff

elsirion · 2024-01-17T09:16:04Z

I agree it's useful to test old/new gateway-cli with old/new gatewayd. It's already intentionally included in the test matrix for this PR 😄

Sorry 🙈 I went by the table and didn't check the code 😅

bradleystachurski · 2024-01-17T09:44:56Z

No worries!

Also, for visibility to folks following this PR, I'm currently debugging some issues in CI in a separate PR (#4050) to avoid cluttering this one. The GH actions runner started running out of disk space before finishing the backwards-compatibility run.

bradleystachurski · 2024-01-17T19:07:39Z

.github/workflows/ci-nix.yml

@@ -126,6 +126,17 @@ jobs:
        if: (github.event_name != 'pull_request' || matrix.build-in-pr) && matrix.run-tests && (matrix.host != 'macos')
        run: nix build -L .#wasm32-unknown.ci.wasmTest --keep-failed

+      - name: Cleanup disk space
+        if: (github.event_name != 'pull_request' || matrix.build-in-pr) && matrix.run-tests
+        run: sudo rm -rf /usr/local/lib/android


The buildjet runner started running out of space during the backwards-compatibility tests. It's not obvious to me why this only started happening now, after several previously successful CI runs without a rebase. In a separate branch I identified some low-hanging fruit to clean up, along with deleting the devimint-* directory, which was adding ~2GB per matrix iteration.
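A minimal sketch of that per-iteration cleanup, assuming the devimint-* directories are created under a single parent directory. The cleanup_devimint_dirs function and the DEVIMINT_PARENT variable (defaulting to $TMPDIR, else /tmp) are hypothetical; check where devimint actually writes its data before using anything like this.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: delete leftover devimint-* directories between matrix
# iterations. DEVIMINT_PARENT (default: $TMPDIR, else /tmp) is an assumed
# location, not necessarily where devimint really creates them.
cleanup_devimint_dirs() {
  local parent="${DEVIMINT_PARENT:-${TMPDIR:-/tmp}}"
  local restore_nullglob dir
  # Save the caller's nullglob setting; shopt -p exits non-zero when the
  # option is off, hence the || true.
  restore_nullglob="$(shopt -p nullglob)" || true
  # With nullglob, an unmatched pattern expands to nothing instead of
  # the literal string "$parent/devimint-*".
  shopt -s nullglob
  for dir in "$parent"/devimint-*; do
    if [[ -d "$dir" ]]; then
      rm -rf -- "$dir"
    fi
  done
  eval "$restore_nullglob"
}
```

Calling it at the end of each combination's run keeps disk usage roughly flat across the matrix instead of growing with every iteration.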

#4050

https://github.com/fedimint/fedimint/actions/runs/7554418498/job/20567202313?pr=4050#step:11:4336

Filesystem      Size  Used Avail Use% Mounted on
udev             14G     0   14G   0% /dev
tmpfs           2.8G  992K  2.8G   1% /run
/dev/sda1       114G  112G  1.6G  99% /
tmpfs            14G  4.0K   14G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            14G     0   14G   0% /sys/fs/cgroup
/dev/sda15      105M  6.1M   99M   6% /boot/efi
/dev/loop0       64M   64M     0 100% /snap/core20/1822
/dev/loop1       50M   50M     0 100% /snap/snapd/17950
/dev/loop2       92M   92M     0 100% /snap/lxd/24061
tmpfs           2.8G     0  2.8G   0% /run/user/0
/dev/loop3       64M   64M     0 100% /snap/core20/2105

Did not have permissions for all directories

106G  100%  /
 36G   34%  /usr
 22G   21%  /usr/local
 13G   13%  /usr/local/lib
 12G   11%  /usr/local/lib/android
 12G   11%  /usr/local/lib/android/sdk
6.2G    6%  /usr/lib
 31G   30%  /nix
 31G   30%  /nix/store
 20G   19%  /home
 19G   19%  /home/ubuntu
 18G   17%  /home/ubuntu/actions-runner
 17G   17%  /home/ubuntu/actions-runner/_work
 17G   17%  /home/ubuntu/actions-runner/_work/fedimint
 17G   17%  /home/ubuntu/actions-runner/_work/fedimint/fedimint
 17G   17%  /home/ubuntu/actions-runner/_work/fedimint/fedimint/target
 17G   17%  /home/ubuntu/actions-runner/_work/fedimint/fedimint/target/debug
 11G   10%  /home/ubuntu/actions-runner/_work/fedimint/fedimint/target/debug/deps
 13G   12%  /opt
 10G   10%  /opt/hostedtoolcache
6.8G    6%  /opt/hostedtoolcache/CodeQL

@bradleystachurski The problem is that cargo build etc. is very wasteful.

Nix makes it better because crane compresses the ./target directory and keeps it deduplicated between runs.

Now this change introduces a whole new build of ./target, just like a developer would have locally, and that doesn't deduplicate with anything else.

We could avoid it by making these scripts a little smarter and using binaries already built via Nix, instead of running an extra raw cargo build.

This could be made conditional on the CI env var: when it's set, skip all cargo builds and, for the "testing current version" case, use nix build to get the binaries of the current source code.
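The proposal can be sketched as two small helpers that pick where "current" binaries come from. This is a hypothetical illustration, not code from the PR: current_build_cmd and current_bin_dir are made-up names, and the flake attribute .#fedimint-pkgs is a placeholder rather than a confirmed output of the repository's flake. GitHub Actions does set CI=true.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the proposal above: on CI, reuse Nix-built binaries
# for the "current" version instead of running an extra raw cargo build.
# GitHub Actions sets CI=true; the flake attribute .#fedimint-pkgs is a
# placeholder, not a confirmed output of this repository's flake.

# Print the command that would produce "current" binaries.
current_build_cmd() {
  if [[ "${CI:-}" == "true" ]]; then
    echo "nix build .#fedimint-pkgs"
  else
    echo "cargo build"
  fi
}

# Print the directory those binaries would end up in.
current_bin_dir() {
  if [[ "${CI:-}" == "true" ]]; then
    # nix build creates a ./result symlink to the output by default.
    echo "result/bin"
  else
    # cargo writes debug-profile binaries to target/debug by default.
    echo "target/debug"
  fi
}

echo "build with: $(current_build_cmd)"
echo "binaries in: $(current_bin_dir)"
```

This only covers the binaries; as noted in the thread, the tests themselves would still need their own build.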

Eh... but this won't work for running tests ... Oh dear ...

In that case, maybe running the whole thing in a separate GHA workflow that runs after the main one passes is the way to go.

@dpc Is that something that can be addressed in a follow-up PR along with other optimizations, or do you think it's a blocker for merging?

Fixing this might require twisting the solution inside out, so I'm not sure if there's a point landing it...

We might want to think a bit deeper about the whole thing. :/

douglaz

Perhaps there are some optimizations that could be done, but it looks good enough for now.

bradleystachurski · 2024-01-24T17:25:10Z

The latest change moves the backwards-compatibility tests to a separate GitHub Actions workflow that isn't required to pass.

Doesn't add any additional time to the required Build on linux workflow
Gives a clear pass/fail of backwards-compatibility in CI if we wait for CI to finish
Doesn't block merging if backwards-compatibility doesn't finish or fails

bradleystachurski · 2024-01-25T17:01:18Z

Rebasing to include the fix from #4104

dpc · 2024-01-25T19:09:45Z

.github/workflows/ci-backwards-compatibility.yml

+    branches: [ "main", "master", "devel", "releases/v*" ]
+
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:


I would like to avoid keeping it as a PR forever, but I'm worried that even a non-required run doesn't buy us anything, will cost money to run, will still cause PRs to wait for it to finish slowing us down, and will make the ✔️ to turn into ❎ in the UI.

That's why I was thinking about making it available as a just ... command for now, and maybe we could make it a workflow that one can dispatch manually, and just not run it on PRs at all for now.

But I guess we can land first, see how it goes, and tweak later.

re: "will cost money to run"

Yep, I agree.

Some follow-ups will help:

Run tests in parallel (see the earlier comment in #4010)

Consider running in the merge queue instead of on each push

The main drawback is the additional review cycles introduced by finding out about breaking changes at the end of the review process instead of throughout

Run backwards-compatibility tests on self-hosted infra

re: "will still cause PRs to wait for it to finish slowing us down"

That's fair; I think that makes running tests in parallel the highest-priority follow-up.

re: "will make the ✔️ to turn into ❎ in the UI"

This is good! Master has been broken for several weeks so I think it's valuable it shows up in the UI 😃

re: costs

Some back-of-the-envelope math

70 commits to fedimint/fedimint in the last month

https://github.com/fedimint/fedimint/pulse/monthly

Assuming 1 backwards-compatibility run per commit

Assuming cost is accurately reflected in buildjet's pricing, we pay $0.016 / min

Backwards-compatibility workflow takes ~30 minutes

0.016 $/min * 30 min * 70 commits/month => ~$34 / month
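Spelled out step by step under the same assumptions (one run per commit, ~30 minutes per run, $0.016/min), the estimate splits into a per-run cost and a monthly total, which also shows that the cost scales linearly with the number of commits:

$$
\begin{aligned}
\text{cost per run} &= 0.016\ \$/\text{min} \times 30\ \text{min} = \$0.48 \\
\text{cost per month} &= \$0.48/\text{run} \times 70\ \text{runs/month} = \$33.60 \approx \$34
\end{aligned}
$$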

Ballpark, $34 / month seems reasonable.

elsirion · 2024-01-26T09:26:54Z

Runtime cost isn't our biggest concern imo, but number of blocked runners are (and upping our runner count would be $$$). I'm ok with turning it on and seeing how it goes.

bradleystachurski force-pushed the overridable_commands branch 29 times, most recently from 05e0dbc to 5d9df4e, January 5, 2024 23:21

bradleystachurski dismissed dpc’s stale review via 5527bb8 January 17, 2024 19:02

bradleystachurski commented Jan 17, 2024

bradleystachurski requested review from dpc, elsirion and justinmoon January 17, 2024 19:45

douglaz previously approved these changes Jan 19, 2024

bradleystachurski mentioned this pull request Jan 23, 2024

fix: make single guardian devimint cli test backwards-compatible #4104

Merged

bradleystachurski dismissed douglaz’s stale review via de28ba0 January 24, 2024 17:07

bradleystachurski force-pushed the overridable_commands branch from 5527bb8 to de28ba0, January 24, 2024 17:07

douglaz and others added 5 commits January 25, 2024 16:04

feat: overridable binaries and cross test (fbc5562)
fix: add version_or_default (9501ba9)
feat: add backwards-compatibility test matrix (faa156c)
chore: cleanup disk space for CI (b40bc8c)
chore: move backwards-compatibility to separate GHA workflow (e4e0098)

bradleystachurski force-pushed the overridable_commands branch from de28ba0 to e4e0098, January 25, 2024 17:01

dpc reviewed Jan 25, 2024

dpc approved these changes Jan 25, 2024

bradleystachurski mentioned this pull request Jan 25, 2024

fix: make gateway api backwards-compatible #4136

Merged

elsirion approved these changes Jan 26, 2024

elsirion added this pull request to the merge queue Jan 26, 2024

Merged via the queue into fedimint:master with commit 5ed187d Jan 26, 2024
21 of 22 checks passed

bradleystachurski deleted the overridable_commands branch January 26, 2024 14:48

This was referenced Feb 1, 2024

Run backwards-compatibility tests in parallel #4206

Closed

feat: overridable binaries and cross test #3982

Closed
