Make testRetriesCantBeExecutedForTooManyRequestMessages more reliable #1781

glbrntt · 2024-01-29T17:27:11Z

Motivation:

The 'testRetriesCantBeExecutedForTooManyRequestMessages' test wedges infrequently. After some diagnosis this appears to be due to the intersection of two bugs. The first is that when yielding an element to the sequence, if some consumers exist, of which a subset are waiting for the next element and the remainder are not 'slow', the continuations of the fastest consumers would not be resumed when a new element was added.

The other bug is a timing issue: when waiting for the next element a subscriber may be told to suspend. However, the subscriber must drop and then reacquire the lock to store the continuation. The state wasn't re-checked on storing the continuation which opened up a window where the element may have become present between asking for it and storing the continuation.

Modifications:

Don't release the lock between attempting to consume an element and suspending to wait for it.
Resume waiting consumers more frequently.
Make a few tests less flaky

Result:

Test didn't wedge in 10k iterations

Motivation: The 'testRetriesCantBeExecutedForTooManyRequestMessages' test wedges infrequently. After some diagnosis this appears to be due to the intersection of two bugs. The first is that when yielding an element to the sequence, if some consumers exist, of which a subset are waiting for the next element and the remainder are not 'slow', the continuations of the fastest consumers would not be resumed when a new element was added. The other bug is a timing issue: when waiting for the next element a subscriber may be told to suspend. However, the subscriber must drop and then reacquire the lock to store the continuation. The state wasn't re-checked on storing the continuation which opened up a window where the element may have become present between asking for it and storing the continuation. Modifications: - Don't release the lock between attempting to consume an element and suspending to wait for it. - Resume waiting consumers more frequently. - Make a few tests less flaky Result: Test didn't wedge in 10k iterations

…grpc#1781) Motivation: The 'testRetriesCantBeExecutedForTooManyRequestMessages' test wedges infrequently. After some diagnosis this appears to be due to the intersection of two bugs. The first is that when yielding an element to the sequence, if some consumers exist, of which a subset are waiting for the next element and the remainder are not 'slow', the continuations of the fastest consumers would not be resumed when a new element was added. The other bug is a timing issue: when waiting for the next element a subscriber may be told to suspend. However, the subscriber must drop and then reacquire the lock to store the continuation. The state wasn't re-checked on storing the continuation which opened up a window where the element may have become present between asking for it and storing the continuation. Modifications: - Don't release the lock between attempting to consume an element and suspending to wait for it. - Resume waiting consumers more frequently. - Make a few tests less flaky Result: Test didn't wedge in 10k iterations

glbrntt requested a review from gjcairo January 29, 2024 17:27

glbrntt added the version/v2 Relates to v2 label Jan 29, 2024

Merge branch 'main' into fix-flaky-tests

8d0174b

glbrntt enabled auto-merge (squash) January 29, 2024 17:42

gjcairo approved these changes Jan 30, 2024

View reviewed changes

glbrntt merged commit 2c27ad5 into grpc:main Jan 30, 2024

glbrntt deleted the fix-flaky-tests branch January 30, 2024 10:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make testRetriesCantBeExecutedForTooManyRequestMessages more reliable #1781

Make testRetriesCantBeExecutedForTooManyRequestMessages more reliable #1781

Uh oh!

glbrntt commented Jan 29, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Make testRetriesCantBeExecutedForTooManyRequestMessages more reliable #1781

Make testRetriesCantBeExecutedForTooManyRequestMessages more reliable #1781

Uh oh!

Conversation

glbrntt commented Jan 29, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants