Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State messaging channel for services to report on async work for better tests #266

Closed
wants to merge 24 commits into from

Conversation

sandreae
Copy link
Member

@sandreae sandreae commented Nov 28, 2022

Turns out there were at least two places causing tests to sometimes fail, all quite interesting in the end. Although this PR was mainly about getting the tests working consistently, I actually ended up adding a little feature ;-)

Cross test pollution of mutably shared schema provider

  • static SCHEMA_PROVIDER: Lazy<Mutex<Vec<Schema>>> = Lazy::new(|| Mutex::new(Vec::new()));

    We're using the above to allow the GraphQL schema builder access to the current schemas, in particular in create_type_info. This works, but when running multiple tests concurrently I think SCHEMA_PROVIDER is being over-written to in one process before it is read in another.

    We take a MutexGuard in save_static_schemas() here but it is dropped as it goes out of scope and this leaves an opening where another lock can be taken out before it's read in the same test.

  • What we want to do then is keep the lock until we read again, we can return it, keeping it alive longer, but if we keep it too long we simply block our own read, and we can't pass it on because we don't have access to the async_graphql methods.

  • So then maybe we need a yielding mutex like tokio::sync::Mutex which wakes a future when a lock is released? But we can't cos it's async and create_type_info isn't.

SOLUTION: We can force the problematic tests to run in series: https://docs.rs/serial_test/0.4.0/serial_test/

Waiting for stuff to happen

  • it takes a little while for documents to be materialised into the db
  • it takes a little while more for new schema to be registered in the GraphQL server
  • in both these cases we were adding a small sleep in order to wait for the stuff to happen, but this isn't an ideal solution as it slows down the tests, and "how long is long enough"

SOLUTION: a new state messaging channel for all services ⭐

closes: #247

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@codecov
Copy link

codecov bot commented Nov 30, 2022

Codecov Report

Base: 93.95% // Head: 94.17% // Increases project coverage by +0.21% 🎉

Coverage data is based on head (6f069f5) compared to base (670281b).
Patch coverage: 97.91% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #266      +/-   ##
==========================================
+ Coverage   93.95%   94.17%   +0.21%     
==========================================
  Files          59       59              
  Lines        4928     5027      +99     
==========================================
+ Hits         4630     4734     +104     
+ Misses        298      293       -5     
Impacted Files Coverage Δ
aquadoggo/src/node.rs 85.18% <66.66%> (+0.56%) ⬆️
aquadoggo/src/materializer/service.rs 95.41% <96.22%> (+0.52%) ⬆️
aquadoggo/src/graphql/client/dynamic_query.rs 91.01% <100.00%> (+0.14%) ⬆️
...quadoggo/src/graphql/client/dynamic_types/tests.rs 100.00% <100.00%> (ø)
aquadoggo/src/graphql/client/mutation.rs 100.00% <100.00%> (ø)
aquadoggo/src/graphql/client/static_query.rs 100.00% <100.00%> (ø)
aquadoggo/src/graphql/client/tests.rs 100.00% <100.00%> (ø)
aquadoggo/src/graphql/schema.rs 92.95% <100.00%> (+0.20%) ⬆️
aquadoggo/src/http/service.rs 95.34% <100.00%> (+0.47%) ⬆️
aquadoggo/src/manager.rs 96.00% <100.00%> (+0.16%) ⬆️
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sandreae sandreae changed the title Debug failing tests Solutions for race conditions in tests (and possibly outside of them too) Nov 30, 2022
aquadoggo/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Member

@adzialocha adzialocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Small comments for now, will look at it a little closer soon as I'm looking at it from my phone only: Wondering if we could a) give our two "status channels" more distinctive meaning and b) make it more ergonomic to use our API.

Also I'd like to look again at the underlying problem, just to understand if it's really expected behavior and not a bug (so far I think it's not).

aquadoggo/Cargo.toml Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
aquadoggo/src/bus.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/service.rs Show resolved Hide resolved
aquadoggo/src/materializer/service.rs Outdated Show resolved Hide resolved
@adzialocha adzialocha self-requested a review January 23, 2023 11:25
@adzialocha adzialocha mentioned this pull request Jan 24, 2023
4 tasks
@adzialocha
Copy link
Member

adzialocha commented Jan 24, 2023

@sandreae, after some experimentation I thought it might be a good idea to keep this PR open for now and rather only merge the parts which fix the race conditions. You can find it here: #269 - What do you think? Is this a good idea?

I think it will make us get to what we want faster: Fix the annoying race conditions. The newly introduced feature very helpful and definitely more elegant, but is maybe not "solved" yet as we're both not happy with its API design. It doesn't give us much functionality outside of tests but already affects the API for other non-tests parts of the stack, making them harder to grasp. I'd rather merge the important bit for now and keep this open until we find something pretty!

Leaving your feature changes unmerged still keeps the second class of race conditions unfixed, but we can at least "control" them with bumping the waiting times. Also not beautiful, but it would come with no big changes for now.

We can keep this here open and make a new ticket which deals with introducing that feature? If you're happy with this I open #269 for review and create the ticket.

In any case, thanks so much for digging into all of that! What a ride!

@adzialocha
Copy link
Member

adzialocha commented Jan 24, 2023

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):

thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26

@sandreae
Copy link
Member Author

@sandreae, after some experimentation I thought it might be a good idea to keep this PR open for now and rather only merge the parts which fix the race conditions. You can find it here: #269 - What do you think? Is this a good idea?

I think it will make us get to what we want faster: Fix the annoying race conditions. The newly introduced feature very helpful and definitely more elegant, but is maybe not "solved" yet as we're both not happy with its API design. It doesn't give us much functionality outside of tests but already affects the API for other non-tests parts of the stack, making them harder to grasp. I'd rather merge the important bit for now and keep this open until we find something pretty!

Leaving your feature changes unmerged still keeps the second class of race conditions unfixed, but we can at least "control" them with bumping the waiting times. Also not beautiful, but it would come with no big changes for now.

We can keep this here open and make a new ticket which deals with introducing that feature? If you're happy with this I open #269 for review and create the ticket.

In any case, thanks so much for digging into all of that! What a ride!

Hey! Thanks for looking into this. I think what you're suggesting is a good move, it's a shame we might not use a lot of the code I did here, but maybe it was unavoidable, in any case it's good research which will feed into the next design phase!

@sandreae
Copy link
Member Author

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):

thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26

Hmm... interesting.... let me see if I can reproduce it.

@sandreae
Copy link
Member Author

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):

thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26

Hmm... interesting.... let me see if I can reproduce it.

Tried reproducing this but so far never seen a fail, ran it 20x or so.... 😞

@sandreae
Copy link
Member Author

sandreae commented Jan 25, 2023

Let's move this conversation to the new PR though 👍

@adzialocha adzialocha marked this pull request as draft January 25, 2023 11:48
@adzialocha
Copy link
Member

I've created two new issues to tackle the separate topics:

@adzialocha adzialocha changed the title Solutions for race conditions in tests (and possibly outside of them too) State messaging channel for services to report on async work during async tests Jan 25, 2023
@adzialocha adzialocha changed the title State messaging channel for services to report on async work during async tests State messaging channel for services to report on async work during tests Jan 25, 2023
@adzialocha adzialocha changed the title State messaging channel for services to report on async work during tests State messaging channel for services to report on async work for better tests Jan 25, 2023
@adzialocha
Copy link
Member

... it's a shame we might not use a lot of the code I did here

This is definitely not going to waste, think we got some really good learnings from this which will help us to keep on working on it 👍

@adzialocha adzialocha closed this Jun 20, 2023
@adzialocha
Copy link
Member

I've closed this PR for now (keeping the branch intact if we still need it). One reason is that it is stale, the other is that I think we can tackle this issue from another perspective: Rather than changing our code for better tests we can introduce new test utilities.

My suggestion would be to have a "timeout helper" which checks for a condition every x milliseconds and times out if that condition wasn't reached after y seconds. Like that we can test async conditions to arrive eventually. I've seen that pattern in other projects as well and think that this is the way to go for at least test-related problems.

@adzialocha adzialocha deleted the debug-test-failures branch July 7, 2023 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race conditions with failing tests
2 participants