-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ring: add method to compute token ranges owned by given instance. #433
Conversation
|
||
// TokenRanges describes token ranges owned by an instance. | ||
// It consists of [start, end] pairs, where both start and end are inclusive. | ||
type TokenRanges []uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more expressive here and I would say something about the length of the slice, and also about the special cases you handle in the implementation.
E.g., this slice contains an even number of elements, elements at even and odd positions represent range starts and ends respectively, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting implementation details would commit to this implementation. Given the early nature of the development, we may still want or need to change it. I prefer to hide implementation details in supporting functions/methods, and only document what's necessary for clients to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we shouldn’t document implementationsl details, but giving an example saying something like “e.g., [5, 10, 20, 30] means that the corresponding instance covers tokens 5-10 and 20-30” wouldn’t damage.
|
||
// GetTokenRangesForInstance returns the token ranges owned by an instance in the ring. | ||
// | ||
// Current implementation only works with multizone setup, where number of zones is equal to replication factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether it might be of interest for this implementation, but there is a struct called ringToken
declared in ownership_priority_queue.go
as:
type ringToken struct {
token uint32
prevToken uint32
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't reuse the type only because it has same two uint32 fields. I was considering to create custom type with similar fields (start/end), but decided to keep implementation as-is for now, as the code is pretty straightforward to follow (imho).
ring/token_range.go
Outdated
return nil, errors.New("no tokens for zone") | ||
} | ||
|
||
ranges := make([]uint32, 0, 2*(len(instance.Tokens)+1)) // 1 range (2 values) per token + one additional if we need to split the rollover range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the rollover range a range having the start token higher than the end token? E.g.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the rollover range a range having the start token higher than the end token? E.g.,
$[2^{32} - 10, 15]$
Yes, but it's stored as two ranges in the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments
"instance-0-1": {10, 50, 100}, | ||
}, | ||
expected: map[string]TokenRanges{ | ||
"instance-0-0": {10, 24, 50, 74}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something that I personally don’t agree with: if token 10 belongs to an instance, I’d expect the corresponding range to include that token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that it's confusing, and wouldn't mind if we fixed that (in some other PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree that it's confusing, but it's a function of this code, which causes items that map directly to a token to actually belong to the range owned by the NEXT token.
We could have used half-open ranges in the implementation ([start, end)
), but that would have made the IncludesKey
logic more complicated, since we'd need to do an additional check if the binarySearch
found the key in the ranges slice.
In my opinion, having the token ranges be closed is also easier to reason about, since you don't need to remember the implementation detail of ring instances not actually "owning" any items that map directly to the tokens they have claimed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, by "fixing" this, I didn't mean to change how token ranges work. I agree that closed ranges make sense.
What I meant by fixing is modifying logic in searchToken
method, so that if ingester owns the token, it actually "owns" it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! The logic makes sense to me. I left nits, but I haven't found any issue.
if !r.cfg.ZoneAwarenessEnabled || rf != numZones { | ||
// if zoneAwareness is disabled we need to treat the whole ring as one big zone, and we would | ||
// need to walk the ring backwards looking for RF-1 tokens from other instances to determine the range. | ||
return nil, errors.New("can't use ring configuration for computing token ranges") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I would rather explain "why" telling that this function only support the zone-aware ring with RF = num zones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather explain "why" telling that this function only support the zone-aware ring with RF = num zones.
True "why" is because we don't need it for our purposes, and it wasn't worth the effort to add support for all cases. I think we should eventually do it, so that Mimir feature using this has full support of various Mimir setups.
I don't think we need to explain this reasoning beyond what's mentioned in the comment:
// Current implementation only works with multizone setup, where number of zones is equal to replication factor.
} | ||
|
||
// walk the ring backwards, alternating looking for ends and starts of ranges | ||
for i := len(subringTokens) - 1; i > 0; i-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[not a blocker] This algorithm complexity is a function of the number of tokens in the ring, so the more instances you have the higher the complexity. The current algorithm as O(N)
complexity, where N is the number of tokens.
I think another way to approach this problem is to lookup the instance's tokens and then for each of them finding the range start. The range start can be find using a binary search, so complexity would be O(T * log N)
, where T is the number of tokens owned by a single instance.
Example:
- 200 ingesters (per zone)
- 512 tokens per ingester
- Current algorithm complexity: 200 * 512 = 102400
- Binary search algorithm complexity: 512 * log(102400) = 512 * 5 = 2560
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I agree there's room for improvement (also see previous comment about not supporting other cases). @duricanikolic also had some ideas (mentioned privately) how to simplify this.
I will keep the current implementation to focus on the main feature we're working on, but will keep this in mind as possible optimization.
} else { | ||
// we have a range end, and are looking for the start of the range | ||
if info.InstanceID != instanceID { | ||
ranges = append(ranges, rangeEnd, token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this hard to read (and the append()
below too). I think append(ranges, token, rangeEnd)
is easier to read. I understand it doesn't make a difference in the logic cause you sort tokens at the end anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment conflicts with the comment below about slices.Reverse()
.
ring/token_range.go
Outdated
|
||
// if this instance claimed the first token, it owns the wrap-around range, which we'll break into two separate ranges | ||
firstToken := subringTokens[0] | ||
firstTokeninfo, ok := r.ringInstanceByToken[firstToken] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit]
firstTokeninfo, ok := r.ringInstanceByToken[firstToken] | |
firstTokenInfo, ok := r.ringInstanceByToken[firstToken] |
} | ||
|
||
// Ensure returned ranges are sorted. | ||
slices.Sort(ranges) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to sort it? Isn't a slices.Reverse()
what we need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that reversing should work, if we preserve order of adding of tokens to the slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a quick try, and tests started to fail, so I'm keeping current code to move on. May look at this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan was to use slices.Reverse()
(which is why the append
s above are slightly awkward), but I ran into (likely) similar issues to Peter, so I opted to just move on at the time.
const numZones = 3 | ||
const numTokens = 512 | ||
const replicationFactor = numZones // This is the only config supported by GetTokenRangesForInstance right now. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] You may want to init and log the rand seed here to be able to eventually reproduce a failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added initTokenGenerator
and passing it to generateRingInstances
now, modified many tests in the progress. (But not all, there's still GenerateTokens
function that generates random tokens without logging the seed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed that in other tests too: #437
ring/token_range_test.go
Outdated
|
||
// find some instance in subring | ||
var instanceID string | ||
for id := range sr.ringDesc.Ingesters { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I would run this test for every instance in the ring. It's an easy change to do, but would increase our confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure enough, it found bugs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out that our ring tests can be flaky, because our tests could produce conflicting tokens for different instances. After fixing token generation in tests, test is now reliable.
3d4d9da
to
82b061d
Compare
Stop iteration early in findInstancesForKey, if possible.
6c87890
to
958aa2a
Compare
What this PR does:
This PR adds
GetTokenRangesForInstance
method to*Ring
. This method returns token ranges owned by given instance. Implementation is simplified to only support case when zones are enabled, and number of zones is equal to replication factor.This PR also adds
numberOfKeysOwnedByInstance
method to the*Ring
, which is more universal way of computing key ownership (it supports zones, instance states, andzoneCount!=RF
case... but is also 14x slower). This is built on existing code for finding instances for given key, and is used to verify implementation of(*Ring).GetTokenRangesForInstance
+(TokenRanges).IncludesKey
methods.Which issue(s) this PR fixes:
We would like to use this code to implement better series limiting in Mimir by considering series ownership.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]