[FIXED] Stream catchup would not sync after server crash and restart. #5362

Merged
merged 8 commits into from Apr 27, 2024

Conversation

derekcollison
Member

@derekcollison derekcollison commented Apr 27, 2024

We had a bug that would overwrite the sync subject during parallel stream creation which would cause upper layer stream catchups to fail on server crash and subsequent restarts.

We were also reporting a first sequence mismatch when we hit max retries to force a reset. That was misleading, so this adds a proper error for the max retries limit.

Also, in some extreme kill and restart cases our internal checks were being called before complete state was achieved, so this delays the initial check and adds periodic checks to ensure replica consistency.

Signed-off-by: Derek Collison <derek@nats.io>

We had a bug that would overwrite the sync subject during parallel stream creation which would cause upper layer stream catchups to fail on server restarts.
We also were reporting first sequence mismatch when we hit max retries to force a reset but this was misleading, so added in proper error for max retries limit.

Signed-off-by: Derek Collison <derek@nats.io>
Consumers could still be catching up as well.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
…heck if beyond a minimum threshold.

On an active stream the periodic ack floor checks could trigger under normal conditions, so use a minimum threshold.
Also do not jump delivered in that logic based on stream sequences.
And finally, do not have the leader jump ack floors when pending is empty. This allows consistency checks to be consistent across all replicas.
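One way to read the minimum threshold is sketched below in Go: small differences between an observed and an expected ack floor are treated as ordinary lag on a busy stream, and only a larger gap counts as an inconsistency. The constant, its value, and the function name are assumptions for this sketch and are not taken from the server code.

```go
package main

// minAckFloorDrift is a hypothetical threshold; the real value and the
// exact quantities compared by the server are not shown in this PR text.
const minAckFloorDrift = 100

// ackFloorDriftExceeded reports whether the gap between the expected and
// observed ack floor is beyond the minimum threshold. Gaps at or below the
// threshold are treated as normal activity and do not trigger a correction.
func ackFloorDriftExceeded(expected, observed uint64) bool {
	var drift uint64
	if expected > observed {
		drift = expected - observed
	} else {
		drift = observed - expected
	}
	return drift > minAckFloorDrift
}
```

Computing the absolute difference with an explicit comparison avoids unsigned underflow when the observed value is ahead of the expected one.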

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
@derekcollison derekcollison requested a review from a team as a code owner April 27, 2024 19:30
Signed-off-by: Derek Collison <derek@nats.io>
Member

@wallyqs wallyqs left a comment

LGTM

Signed-off-by: Derek Collison <derek@nats.io>
@derekcollison derekcollison merged commit a72ea23 into main Apr 27, 2024
4 checks passed
@derekcollison derekcollison deleted the stream-catchup-fix branch April 27, 2024 20:44
wallyqs added a commit that referenced this pull request Apr 28, 2024
Includes the following:

* #5351
* #5353
* #5337
* #5356
* #5361
* #5362

Signed-off-by: Neil Twigg <neil@nats.io>