Maintain block order after peer selection or exhaustion during peer streaming #258
opts := newSessionTestAdminOptions().
	SetFetchSeriesBlocksMaxBlockRetries(2)
s, err := newSession(opts)
assert.NoError(t, err)
we probably want require.NoError
Let me update this file and these changes to reflect this.
LGTM
…treaming (#258)
* Maintain block order after peer selection or exhaustion during peer streaming
* Use require.NoError for segments of tests that must succeed to proceed
* Format session errors with m3x/errors/.Errors (#257)
* Maintain block order after peer selection or exhaustion during peer streaming (#258)
* Maintain block order after peer selection or exhaustion during peer streaming
* Use require.NoError for segments of tests that must succeed to proceed
* Peer streaming only fails when unable to read data from any peer during merging reads (#259)
* Peer streaming only fails when unable to read data from any peer during merging reads
* Update comments
* Fix tests
* Disable integration tests for native pool testing
* Remove blocks from peers' block lists if they are not eligible
* Better comments
* Add metrics to report final errors during peer streaming (#260)
* Peer streaming only fails when unable to read data from any peer during merging reads
* Update comments
* Fix tests
* Disable integration tests for native pool testing
* Remove blocks from peers' block lists if they are not eligible
* Better comments
* Add metrics to report final errors during peer streaming
* Update m3x dependency (#261)
* Flush last writes on host queue close
* Close any times
* Longer timeout
cc @robskillington @prateek @ben-lerner
This PR fixes an issue that occurs when removing blocks from peers after we select a block from multiple peer candidates or after the peer list has been exhausted. Previously, when a block was removed, we swapped it with the last block and shrank the block list, thereby destroying the invariant that the blocks are sorted by their start times. This in turn causes issues because we use the block start times to determine which peers can provide the block and rely on the block order to align the peers. When the block ordering is disrupted, a block may be selected from only one peer even though multiple peers are eligible to provide it.
Also added two test cases to catch the error cases. Both cases are failing against HEAD and passing against this PR.
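To illustrate the ordering problem described above, here is a minimal, self-contained sketch and not the code from this PR. The `block` type and the `removeSwap`, `removeOrdered`, `sortedByStart`, and `newBlocks` helpers are hypothetical names made up for this example; the real session code uses its own types. It contrasts swap-with-last removal against an order-preserving removal:

```go
package main

import "fmt"

// block is a hypothetical stand-in for a peer's block metadata;
// only the start time matters for this illustration.
type block struct {
	start int64 // block start time (arbitrary units)
}

// removeSwap deletes blocks[i] by overwriting it with the last element.
// It is O(1) but can break the ascending order of start times.
func removeSwap(blocks []block, i int) []block {
	last := len(blocks) - 1
	blocks[i] = blocks[last]
	return blocks[:last]
}

// removeOrdered deletes blocks[i] by shifting later elements left.
// It is O(n) but keeps the remaining blocks in their original order.
func removeOrdered(blocks []block, i int) []block {
	copy(blocks[i:], blocks[i+1:])
	return blocks[:len(blocks)-1]
}

// sortedByStart reports whether blocks are in ascending start order.
func sortedByStart(blocks []block) bool {
	for i := 1; i < len(blocks); i++ {
		if blocks[i-1].start > blocks[i].start {
			return false
		}
	}
	return true
}

// newBlocks returns a fresh, sorted block list for each experiment.
func newBlocks() []block {
	return []block{{start: 10}, {start: 20}, {start: 30}, {start: 40}}
}

func main() {
	swapped := removeSwap(newBlocks(), 1)
	ordered := removeOrdered(newBlocks(), 1)
	fmt.Println(swapped, sortedByStart(swapped)) // [{10} {40} {30}] false
	fmt.Println(ordered, sortedByStart(ordered)) // [{10} {30} {40}] true
}
```

The order-preserving variant costs a linear shift per removal. That trade-off seems reasonable when other logic depends on the list staying sorted, as the description says it does when aligning peers by block start time.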