Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ring.DoBatch() result deterministic #430

Merged
merged 3 commits into from
Nov 13, 2023

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Nov 13, 2023

What this PR does:
This PR fixes a bug present in the ring.DoBatch() function. Namely, there might be usecases in which, for the same set of keys, and the same ring instances failing on the same keys, different order of execution of batchTracker.record() on different keys gives different values.

For example, let us assume the following scenario:

  • replication factor is 3
  • number of ring instances is 6 ($I_1$, $I_2$, $I_3$, $I_4$, $I_5$, $I_6$).
  • ring.DoBatch() is called on 2 keys $k_1$ and $k_2$.
  • replication set of $k_1$ is $(I_1, I_2, I_3)$ and of $k_2$ is $(I_4, I_5, I_6)$.
  • calls to the callback function on $k_1$ on $I_1$, $I_2$ and $I_3$ are successful.
  • calls to the callback function on $k_2$ on $I_4$, $I_5$ and $I_6$ fail with a 4xx error.

The expected outcome of this call to ring.DoBatch() is that it would fail with the 4xx error.
With the current implementation, getting results from ingesters for example in this order: $I_1$, $I_4$, $I_5$, $I_2$, $I_3$, $I_6$ will actually give rise to a failure, but in the case of this order: $I_1$, $I_4$, $I_2$, $I_3$, $I_5$, $I_6$ would actually succeed. The reason for that is that successful execution of $k_1$ on $I_1$, $I_2$, $I_3$ is taken into account more than once.

This PR fixes this problem.

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
ring/batch.go Outdated
if it.remaining.Dec() == 0 {
if b.rpcsFailed.Inc() == 1 {
b.err <- it.err.Load()
if it.succeeded.Load() < int32(it.minSuccess) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there a race between calling it.suceeded.Inc() and it.succeeded.Load()? It's possible that between the time one invocation of succeeded.Inc() gets to succeeded.Load() another one has already invoked Inc(). This was we lose one invocation of remaining.Dec() of the first one

you can save the return value of succeeded.Inc in a variable and use that instead of calling Load again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I will fix this error. Thank you @dimitarvdimitrov

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this

@duricanikolic duricanikolic marked this pull request as ready for review November 13, 2023 16:53
@duricanikolic duricanikolic merged commit 40e91ed into main Nov 13, 2023
3 checks passed
@duricanikolic duricanikolic deleted the yuri/successful-trackers branch November 13, 2023 16:53
Copy link
Contributor

@charleskorn charleskorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing I noticed post-merge...

Comment on lines +72 to +74
go func() {
wg.Wait()
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of waiting for the WaitGroup on a background goroutine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. This code was copied from batchTracker.DoBatchWithOptions(), but in the latter the additional go routine also does the cleanup, which is not necessary in this test.
I will now create a PR that removes this go routine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants