State messaging channel for services to report on async work for better tests #266

sandreae · 2022-11-28T13:47:59Z

Turns out there were at least two places causing tests to sometimes fail, all quite interesting in the end. Although this PR was mainly about getting the tests working consistently, I actually ended up adding a little feature ;-)

Cross test pollution of mutably shared schema provider

static SCHEMA_PROVIDER: Lazy<Mutex<Vec<Schema>>> = Lazy::new(|| Mutex::new(Vec::new()));

We're using the above to allow the GraphQL schema builder access to the current schemas, in particular in create_type_info. This works, but when running multiple tests concurrently I think SCHEMA_PROVIDER is being over-written to in one process before it is read in another.

We take a MutexGuard in save_static_schemas() here but it is dropped as it goes out of scope and this leaves an opening where another lock can be taken out before it's read in the same test.
What we want to do then is keep the lock until we read again, we can return it, keeping it alive longer, but if we keep it too long we simply block our own read, and we can't pass it on because we don't have access to the async_graphql methods.
So then maybe we need a yielding mutex like tokio::sync::Mutex which wakes a future when a lock is released? But we can't cos it's async and create_type_info isn't.

SOLUTION: We can force the problematic tests to run in series: https://docs.rs/serial_test/0.4.0/serial_test/ ⭐

Waiting for stuff to happen

it takes a little while for documents to be materialised into the db
it takes a little while more for new schema to be registered in the GraphQL server
in both these cases we were adding a small sleep in order to wait for the stuff to happen, but this isn't an ideal solution as it slows down the tests, and "how long is long enough"

SOLUTION: a new state messaging channel for all services ⭐

closes: #247

📋 Checklist

Add tests that cover your changes
Add this PR to the Unreleased section in CHANGELOG.md
Link this PR to any issues it closes
New files contain a SPDX license header

codecov · 2022-11-30T00:50:57Z

Codecov Report

Base: 93.95% // Head: 94.17% // Increases project coverage by +0.21% 🎉

Coverage data is based on head (6f069f5) compared to base (670281b).
Patch coverage: 97.91% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #266      +/-   ##
==========================================
+ Coverage   93.95%   94.17%   +0.21%     
==========================================
  Files          59       59              
  Lines        4928     5027      +99     
==========================================
+ Hits         4630     4734     +104     
+ Misses        298      293       -5

Impacted Files	Coverage Δ
aquadoggo/src/node.rs	`85.18% <66.66%> (+0.56%)`	⬆️
aquadoggo/src/materializer/service.rs	`95.41% <96.22%> (+0.52%)`	⬆️
aquadoggo/src/graphql/client/dynamic_query.rs	`91.01% <100.00%> (+0.14%)`	⬆️
...quadoggo/src/graphql/client/dynamic_types/tests.rs	`100.00% <100.00%> (ø)`
aquadoggo/src/graphql/client/mutation.rs	`100.00% <100.00%> (ø)`
aquadoggo/src/graphql/client/static_query.rs	`100.00% <100.00%> (ø)`
aquadoggo/src/graphql/client/tests.rs	`100.00% <100.00%> (ø)`
aquadoggo/src/graphql/schema.rs	`92.95% <100.00%> (+0.20%)`	⬆️
aquadoggo/src/http/service.rs	`95.34% <100.00%> (+0.47%)`	⬆️
aquadoggo/src/manager.rs	`96.00% <100.00%> (+0.16%)`	⬆️
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

aquadoggo/src/materializer/service.rs

aquadoggo/Cargo.toml

adzialocha

Cool!

Small comments for now, will look at it a little closer soon as I'm looking at it from my phone only: Wondering if we could a) give our two "status channels" more distinctive meaning and b) make it more ergonomic to use our API.

Also I'd like to look again at the underlying problem, just to understand if it's really expected behavior and not a bug (so far I think it's not).

aquadoggo/Cargo.toml

CHANGELOG.md

aquadoggo/src/bus.rs

aquadoggo/src/materializer/service.rs

adzialocha · 2023-01-24T18:49:51Z

@sandreae, after some experimentation I thought it might be a good idea to keep this PR open for now and rather only merge the parts which fix the race conditions. You can find it here: #269 - What do you think? Is this a good idea?

I think it will make us get to what we want faster: Fix the annoying race conditions. The newly introduced feature very helpful and definitely more elegant, but is maybe not "solved" yet as we're both not happy with its API design. It doesn't give us much functionality outside of tests but already affects the API for other non-tests parts of the stack, making them harder to grasp. I'd rather merge the important bit for now and keep this open until we find something pretty!

Leaving your feature changes unmerged still keeps the second class of race conditions unfixed, but we can at least "control" them with bumping the waiting times. Also not beautiful, but it would come with no big changes for now.

We can keep this here open and make a new ticket which deals with introducing that feature? If you're happy with this I open #269 for review and create the ticket.

In any case, thanks so much for digging into all of that! What a ride!

adzialocha · 2023-01-24T18:55:48Z

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):

thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26

sandreae · 2023-01-25T10:47:19Z

@sandreae, after some experimentation I thought it might be a good idea to keep this PR open for now and rather only merge the parts which fix the race conditions. You can find it here: #269 - What do you think? Is this a good idea?

I think it will make us get to what we want faster: Fix the annoying race conditions. The newly introduced feature very helpful and definitely more elegant, but is maybe not "solved" yet as we're both not happy with its API design. It doesn't give us much functionality outside of tests but already affects the API for other non-tests parts of the stack, making them harder to grasp. I'd rather merge the important bit for now and keep this open until we find something pretty!

Leaving your feature changes unmerged still keeps the second class of race conditions unfixed, but we can at least "control" them with bumping the waiting times. Also not beautiful, but it would come with no big changes for now.

We can keep this here open and make a new ticket which deals with introducing that feature? If you're happy with this I open #269 for review and create the ticket.

In any case, thanks so much for digging into all of that! What a ride!

Hey! Thanks for looking into this. I think what you're suggesting is a good move, it's a shame we might not use a lot of the code I did here, but maybe it was unavoidable, in any case it's good research which will feed into the next design phase!

sandreae · 2023-01-25T10:47:55Z

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):

thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26

Hmm... interesting.... let me see if I can reproduce it.

sandreae · 2023-01-25T11:03:41Z

In this and the new branch I sadly had still the following failing test (as I've mentioned on the chat):
thread 'db::stores::entry::tests::paginated_log_entries::case_6' panicked at 'Send to store: Custom("Entry's claimed seq num of 1 does not match expected seq num of 23 for given public key and log")', /home/adz/.cargo/registry/src/github.com-1ecc6299db9ec823/p2panda-rs-0.6.0/src/test_utils/db/test_db.rs:246:26
Hmm... interesting.... let me see if I can reproduce it.

Tried reproducing this but so far never seen a fail, ran it 20x or so.... 😞

sandreae · 2023-01-25T11:05:08Z

Let's move this conversation to the new PR though 👍

adzialocha · 2023-01-25T11:51:36Z

I've created two new issues to tackle the separate topics:

adzialocha · 2023-01-25T11:58:35Z

... it's a shame we might not use a lot of the code I did here

This is definitely not going to waste, think we got some really good learnings from this which will help us to keep on working on it 👍

adzialocha · 2023-06-20T11:41:23Z

I've closed this PR for now (keeping the branch intact if we still need it). One reason is that it is stale, the other is that I think we can tackle this issue from another perspective: Rather than changing our code for better tests we can introduce new test utilities.

My suggestion would be to have a "timeout helper" which checks for a condition every x milliseconds and times out if that condition wasn't reached after y seconds. Like that we can test async conditions to arrive eventually. I've seen that pattern in other projects as well and think that this is the way to go for at least test-related problems.

sandreae added 9 commits November 28, 2022 13:45

Make some of the failing tests fail consitently

e0ccad6

Remove unused imports

1465fc2

Debug print the built GraphQL schema

b612396

More logging

3eee345

Yet more logging

37d0614

Use serial_test to make tests run in series

1afa236

Introduce service status channel

0930655

Remove epic logging

0280889

fmt

d1508d6

Clippy

d925843

sandreae changed the title ~~Debug failing tests~~ Solutions for race conditions in tests (and possibly outside of them too) Nov 30, 2022

adzialocha reviewed Nov 30, 2022

View reviewed changes

aquadoggo/src/materializer/service.rs Outdated Show resolved Hide resolved

sandreae added 11 commits December 2, 2022 09:51

More doc strings

fd7aa90

Move ServiceStatusMessage enum to bus module

9993aab

Clippy

8b34515

Us a z not an s

7c2c475

Simplify status message types

7db2861

Run all tests which use graphql_test_client in series

4dd3103

Clippy

e506c6a

fmt

fe45bf1

Improve message doc string

18c7e20

Also run e2e test in series

10d6b04

Update README

cb79962

sandreae requested review from adzialocha and cafca December 2, 2022 14:56

sandreae commented Dec 2, 2022

View reviewed changes

aquadoggo/Cargo.toml Outdated Show resolved Hide resolved

Move serial_test into dev deps

6b70e12

adzialocha reviewed Dec 2, 2022

View reviewed changes

Review fixes

36ca55a

adzialocha self-requested a review January 23, 2023 11:25

Clean up unused variables

6f069f5

adzialocha mentioned this pull request Jan 24, 2023

Fix test-runner race conditions #269

Merged

4 tasks

adzialocha marked this pull request as draft January 25, 2023 11:48

This was referenced Jan 25, 2023

Irregular failing tests #270

Closed

Not depend on timeouts for async testing flows #271

Open

adzialocha changed the title ~~Solutions for race conditions in tests (and possibly outside of them too)~~ State messaging channel for services to report on async work during async tests Jan 25, 2023

adzialocha changed the title ~~State messaging channel for services to report on async work during async tests~~ State messaging channel for services to report on async work during tests Jan 25, 2023

adzialocha changed the title ~~State messaging channel for services to report on async work during tests~~ State messaging channel for services to report on async work for better tests Jan 25, 2023

adzialocha removed request for cafca and adzialocha January 25, 2023 11:54

adzialocha closed this Jun 20, 2023

adzialocha deleted the debug-test-failures branch July 7, 2023 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

State messaging channel for services to report on async work for better tests #266

State messaging channel for services to report on async work for better tests #266

sandreae commented Nov 28, 2022 •

edited

Loading

codecov bot commented Nov 30, 2022 •

edited

Loading

adzialocha left a comment

adzialocha commented Jan 24, 2023 •

edited

Loading

adzialocha commented Jan 24, 2023 •

edited

Loading

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023 •

edited

Loading

adzialocha commented Jan 25, 2023

adzialocha commented Jan 25, 2023

adzialocha commented Jun 20, 2023

State messaging channel for services to report on async work for better tests #266

State messaging channel for services to report on async work for better tests #266

Conversation

sandreae commented Nov 28, 2022 • edited Loading

Cross test pollution of mutably shared schema provider

Waiting for stuff to happen

📋 Checklist

codecov bot commented Nov 30, 2022 • edited Loading

Codecov Report

adzialocha left a comment

Choose a reason for hiding this comment

adzialocha commented Jan 24, 2023 • edited Loading

adzialocha commented Jan 24, 2023 • edited Loading

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023

sandreae commented Jan 25, 2023 • edited Loading

adzialocha commented Jan 25, 2023

adzialocha commented Jan 25, 2023

adzialocha commented Jun 20, 2023

sandreae commented Nov 28, 2022 •

edited

Loading

codecov bot commented Nov 30, 2022 •

edited

Loading

adzialocha commented Jan 24, 2023 •

edited

Loading

adzialocha commented Jan 24, 2023 •

edited

Loading

sandreae commented Jan 25, 2023 •

edited

Loading