Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If ring instances have conflicting tokens, rings can compute different instance ownership. #438

Open
pstibrany opened this issue Nov 22, 2023 · 2 comments

Comments

@pstibrany
Copy link
Member

pstibrany commented Nov 22, 2023

If instances have conflicting tokens in the ring, then different ring incarnations can compute different token ownerships from same underlying RingDesc.

This happens because ringDesc.getTokensInfo() iterate through instances in different order, and may therefore compute different assignments of tokens to instances.

Following test shows the problem:

func TestConflictingTokens(t *testing.T) {
	seed := time.Now().UnixNano()
	t.Log("token generator seed:", seed)

	instances1 := map[string]InstanceDesc{}
	instances2 := map[string]InstanceDesc{}

	const zones = 3
	const instancePerZone = 100
	for z := 0; z < zones; z++ {
		zoneGen := NewRandomTokenGeneratorWithSeed(seed) // We restart token generation for each zone.

		for i := 0; i < instancePerZone; i++ {
			inst, desc, _ := generateRingInstance(zoneGen, i, z, 100*z, nil) // Number of tokens per instance will be different in each zone.
			instances1[inst] = desc
			instances2[inst] = desc
		}
	}

	ringDesc1 := &Desc{Ingesters: instances1}
	ringDesc2 := &Desc{Ingesters: instances2}

	require.Equal(t, ringDesc1.GetTokens(), ringDesc2.GetTokens())
	require.Equal(t, ringDesc1.getTokensByZone(), ringDesc2.getTokensByZone())
	require.Equal(t, ringDesc1.getTokensInfo(), ringDesc2.getTokensInfo())
}

When running it with go test -run 'TestConflictingTokens' -v -count=1000, it fails quite reliably.

I think we should fix the ring code to always produce consistent results even in case when there are conflicting tokens in the ring.

@duricanikolic
Copy link
Contributor

Hi @pstibrany,
The first remark I have here is: does it make sense that instances have conflicting tokens? In the current design I guess it is not the case. Otherwise, once we start searching for a replica set, and we find a token owned by more than one instance, how do we chose which of the instances which will be in the replica set? This already introduces some ambiguity: namely, it could happen that the same timeseries will be replicated to different set of ingesters in different push requests.

The second remark is: generateRingInstance(), used in our tests, actually calls zoneGen.GenerateTokens() containing 2 arguments: requestedTokensCount int, number of tokens to be generated, and allTakenTokens []uint32, a list of all tokens generated so far, and ensures that, if allTakenTokens is passed, new generated tokens will be "fresh". In some tests, this one included, allTakenTokens is set to nil, but in the production code it is not the case.

So, if my first remark is correct, it will imply the correctness of the second remark. Otherwise, we need to re-think about the current design.

@pstibrany
Copy link
Member Author

Hi @pstibrany, The first remark I have here is: does it make sense that instances have conflicting tokens? In the current design I guess it is not the case. Otherwise, once we start searching for a replica set, and we find a token owned by more than one instance, how do we chose which of the instances which will be in the replica set? This already introduces some ambiguity: namely, it could happen that the same timeseries will be replicated to different set of ingesters in different push requests.

In production this should not happen:

  1. when using Consul or Etcd, then clients will never generate conflicting tokens, because they generate unique tokens and then use CAS operation to write new generated tokens back. So conflict can't happen.
  2. when using memberlist KV store, conflicting tokens can be generated by multiple instances adding an entry at the same time, but memberlist client will resolve such conflicts in deterministic way.

If I remember correctly, I only ran into this problem (conflicting tokens) in test code, as you mention in your comment. We could either make sure that no tests generate conflicting tokens, or implement predictable results as I suggest in the issue.

The second remark is: generateRingInstance(), used in our tests, actually calls zoneGen.GenerateTokens() containing 2 arguments: requestedTokensCount int, number of tokens to be generated, and allTakenTokens []uint32, a list of all tokens generated so far, and ensures that, if allTakenTokens is passed, new generated tokens will be "fresh". In some tests, this one included, allTakenTokens is set to nil, but in the production code it is not the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants