Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(recoverytest): wait for next session before scanning session outcomes for pegins #5184

Merged
merged 2 commits into from
May 2, 2024

Conversation

elsirion
Copy link
Contributor

@elsirion elsirion commented May 1, 2024

No description provided.

@elsirion elsirion force-pushed the 2024-05-flaky-recoverytest branch from ff2c7e8 to b11a669 Compare May 1, 2024 11:51
Copy link
Member

@bradleystachurski bradleystachurski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling! Looks good as-is, but feel free to address the feedback before merging if you think it simplifies the test.

I'm having trouble reproducing the bug locally, so I'd assume the self-hosted runner with higher loads is needed to repro.

// session outcome in our DB. If there ever is another problem here: wait for
// fedimintd-0 specifically to acknowledge the session switch. In practice this
// should be sufficiently synchronous though.
wait_session_outcome(&client, last_tx_session).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test was previously using a different (incorrect?) approach to awaiting the session outcome by looping until

        if outputs.len() >= fed_utxos_sats.len() {
            break outputs.clone();
        }

I haven'f fully considered how, but this might be a bug since the epochs method includes descriptors from spent outputs in addition to UTXOs so we might break too early in this loop.

Now that we're explicitly waiting for the session outcome I think we can get rid of the loop and just call the recoverytool with epochs once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the loop, good catch!

@elsirion elsirion force-pushed the 2024-05-flaky-recoverytest branch from 8e2cddb to fe2b75b Compare May 1, 2024 15:42
@dpc dpc enabled auto-merge May 1, 2024 15:50
@dpc dpc added this pull request to the merge queue May 1, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 1, 2024
@bradleystachurski
Copy link
Member

For tracking MQ failure #5186

@bradleystachurski bradleystachurski added this pull request to the merge queue May 1, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 1, 2024
@bradleystachurski
Copy link
Member

Oh shoot, the recoverytool failure showed up in the MQ

00:00:39 assertion failed: epochs_descriptors.contains(*utxo_descriptor)

https://github.com/fedimint/fedimint/actions/runs/8911963925/job/24474385524#step:5:925

@dpc
Copy link
Contributor

dpc commented May 1, 2024

 00:00:36 2024-05-01T16:28:29.370540Z  INFO devimint::tests: Awaiting session outcome 2
00:00:36 2024-05-01T16:28:29.370598Z DEBUG fm::devimint: > fedimint-cli --data-dir=/tmp/nix-shell.jQkHEI/recoverytool_tests-BPu6/devimint-4154617-577/clients/recoverytool-test-client-0 dev api await_session_outcome 2
00:00:39 2024-05-01T16:28:32.566268Z  INFO devimint::tests: session found in 3.195712726s
00:00:39 2024-05-01T16:28:32.566290Z  INFO devimint::tests: Recovering using epochs method
00:00:39 2024-05-01T16:28:32.566335Z DEBUG fm::devimint: > recoverytool --readonly --cfg /tmp/nix-shell.jQkHEI/recoverytool_tests-BPu6/devimint-4154617-577/fedimintd-0 epochs --db /tmp/nix-shell.jQkHEI/recoverytool_tests-BPu6/devimint-4154617-577/fedimintd-0/database
00:00:39 thread 'main' panicked at /run/github-runner/runner-03-cc/fedimint/fedimint/devimint/src/tests.rs:2144:9:
00:00:39 assertion failed: epochs_descriptors.contains(*utxo_descriptor)
00:00:39 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@elsirion
Copy link
Contributor Author

elsirion commented May 1, 2024

Uff, now that's weird …

@bradleystachurski
Copy link
Member

bradleystachurski commented May 2, 2024

I'm still having trouble reproducing locally. Let's see if we can figure out more when it flakes again in CI #5192

edit: having trouble reproducing with the self-hosted runner now 😄

@elsirion elsirion marked this pull request as draft May 2, 2024 10:57
@dpc
Copy link
Contributor

dpc commented May 2, 2024

Maybe let's land and follow up if/when we spot it again?

@elsirion elsirion marked this pull request as ready for review May 2, 2024 20:19
@elsirion
Copy link
Contributor Author

elsirion commented May 2, 2024

I'm ok with that, but not very satisfying :/

@elsirion elsirion added this pull request to the merge queue May 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 2, 2024
@dpc
Copy link
Contributor

dpc commented May 2, 2024

 fedimint-test-all-wasm32-unknown-unknown-ci> 00:01:05 Error: Polling waiting-server-status: VerifyingConfigs failed after 109 retries (timeout: 60s)

That's new.

@dpc dpc added this pull request to the merge queue May 2, 2024
Merged via the queue into fedimint:master with commit 445f6f7 May 2, 2024
21 checks passed
@elsirion elsirion deleted the 2024-05-flaky-recoverytest branch May 3, 2024 10:53
@justinmoon
Copy link
Contributor

dev call: we thought this was a fix for #5182, but apparently not ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants