Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TestPartitionReader_WaitReadConsistency flakyness #7391

Merged
merged 3 commits into from
Feb 15, 2024

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Feb 15, 2024

What this PR does

@dimitarvdimitrov reported me that TestPartitionReader_WaitReadConsistency is flaky. We've seen it failing in this CI run because of:

--- FAIL: TestPartitionReader_WaitReadConsistency (0.00s)
    --- FAIL: TestPartitionReader_WaitReadConsistency/should_return_after_all_produced_records_up_have_been_consumed (1.02s)
        reader_test.go:128: produced 2 records
        reader_test.go:137: consumed records: [record-1]
        reader_test.go:148: started waiting for read consistency
        reader_test.go:153: finished waiting for read consistency
        reader_test.go:145: consumed records: [record-2]
        writer_test.go:394: 
            	Error Trace:	/__w/mimir/mimir/pkg/storage/ingest/writer_test.go:394
            	            				/__w/mimir/mimir/pkg/storage/ingest/reader_test.go:132
            	Error:      	Not equal: 
            	            	expected: []int{0, 1}
            	            	actual  : []int{1, 0}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1,4 +1,4 @@
            	            	 ([]int) (len=2) {
            	            	- (int) 0,
            	            	- (int) 1
            	            	+ (int) 1,
            	            	+ (int) 0
            	            	 }
            	Test:       	TestPartitionReader_WaitReadConsistency/should_return_after_all_produced_records_up_have_been_consumed
FAIL

The problem is that there's a race between the 2 async functions, where the function calling consumer.waitRecords() may end after the one calling reader.WaitReadConsistency(ctx) even if reader.WaitReadConsistency(ctx) effectively waited for all records to be consumed.

In this PR I'm reworking the test. Instead of using runAsyncAndAssertCompletionOrder(), I define a custom consume function which adds a delay on the 2nd (and last) record consumption and then increases a counter of processed records. Such counter is increased after the record has been processed by consumption (processing time is simulated with the artificial delay) but before the consume function returns. Then we can just assert that the expected number of records have been consumed once reader.WaitReadConsistency(ctx) returns.

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci marked this pull request as ready for review February 15, 2024 10:06
@pracucci pracucci requested a review from a team as a code owner February 15, 2024 10:06
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this

@pracucci pracucci merged commit fa200a5 into main Feb 15, 2024
28 checks passed
@pracucci pracucci deleted the fix-TestPartitionReader_WaitReadConsistency branch February 15, 2024 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants