Improve consumer state handling. #2927
Signed-off-by: Derek Collison <derek@nats.io>
…iants when no activity has been present. Signed-off-by: Derek Collison <derek@nats.io>
Change itself makes sense.
I just have one question about the test.
c.waitOnServerHealthz(s)
}

c.waitOnConsumerLeader("$G", "T", "d")
The way I read your email, don't these two lines mask the issue?
The issue happened before the consumer leader was ready, but not after.
So shouldn't one instance of the test skip these lines, with a much shorter test checking that once c.waitOnServerHealthz(s)
has returned, the consumer already has a leader (rather than waiting for one)?
Line 11262 simply waits on the consumer leader so that we know ConsumerInfo will succeed.
From the email thread with Marco, I am not sure that is what he said about when the issue did or did not happen. From my understanding, if the consumer was active at all before publishing, the issue did not present, and that matched what I saw. With no activity (Delivered.Consumer == 0), we were not persisting our state often enough to survive the state being wiped.
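To make the failure mode above concrete, here is a toy model, not the server's actual storage code. The names `store`, `flush`, and `persistIdle` are hypothetical and exist only for illustration; the one detail taken from the comment is the `Delivered.Consumer == 0` condition. The sketch assumes that an activity-only policy skips writing state for a consumer that has never delivered anything, so nothing is on hand to recover from.

```go
package main

import "fmt"

// sequencePair and consumerState are simplified, hypothetical shapes
// chosen so that Delivered.Consumer reads as it does in the comment.
type sequencePair struct{ Consumer, Stream uint64 }
type consumerState struct{ Delivered sequencePair }

// store is an in-memory stand-in for persisted consumer state.
type store struct {
	persisted *consumerState
}

func (s *store) write(st consumerState) {
	cp := st
	s.persisted = &cp
}

// flush persists st and reports whether a write happened.
// When persistIdle is false, a consumer with no activity
// (Delivered.Consumer == 0) is skipped, which models the old behavior.
func flush(s *store, st consumerState, persistIdle bool) bool {
	if st.Delivered.Consumer == 0 && !persistIdle {
		return false
	}
	s.write(st)
	return true
}

func main() {
	idle := consumerState{} // no deliveries yet
	var activityOnly, idleAware store

	flush(&activityOnly, idle, false)
	flush(&idleAware, idle, true)

	fmt.Println("activity-only policy has state:", activityOnly.persisted != nil)
	fmt.Println("idle-aware policy has state:", idleAware.persisted != nil)
}
```

In this model only the idle-aware policy leaves anything behind for an untouched consumer, which is the gap described above.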
Will merge, but we can certainly add additional torture tests for state being wiped or corrupted.
LGTM
js.mu.RUnlock()
break Err
}
// Now do consumers.
for _, o := range stream.getConsumers() {
I wonder whether there was a reason you did not add consumers to the probe in the original PR, but I don't recall one.
In any case, it was an oversight. Good catch by @matthiashanel.
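For readers following this thread, the shape of a health probe that covers consumers as well as streams can be sketched roughly as follows. This is a self-contained illustration under assumptions, not the server's implementation: the `stream` and `consumer` types, the `healthy` flags, and the `probe` function are hypothetical, and only the `getConsumers()` loop mirrors the diff above.

```go
package main

import "fmt"

// consumer and stream are hypothetical, simplified types for illustration.
type consumer struct {
	name    string
	healthy bool
}

type stream struct {
	name      string
	healthy   bool
	consumers []*consumer
}

func (s *stream) getConsumers() []*consumer { return s.consumers }

// probe returns nil only if every stream and every consumer attached
// to it reports healthy; otherwise it returns the first problem found.
func probe(streams []*stream) error {
	for _, s := range streams {
		if !s.healthy {
			return fmt.Errorf("stream %q is not healthy", s.name)
		}
		// Now do consumers.
		for _, o := range s.getConsumers() {
			if !o.healthy {
				return fmt.Errorf("consumer %q on stream %q is not healthy", o.name, s.name)
			}
		}
	}
	return nil
}

func main() {
	streams := []*stream{{
		name:      "T",
		healthy:   true,
		consumers: []*consumer{{name: "d", healthy: false}},
	}}
	// A stream-only probe would pass here; including consumers does not.
	fmt.Println("probe error:", probe(streams))
}
```

The point of the sketch is that a probe which stops at the stream level would report healthy in the example, while the consumer loop surfaces the unhealthy consumer.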
Signed-off-by: Derek Collison <derek@nats.io>
/cc @nats-io/core