Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consumer state handling. #2927

Merged
merged 2 commits into from
Mar 17, 2022
Merged

Improve consumer state handling. #2927

merged 2 commits into from
Mar 17, 2022

Conversation

derekcollison
Copy link
Member

Signed-off-by: Derek Collison derek@nats.io

/cc @nats-io/core

Signed-off-by: Derek Collison <derek@nats.io>
…iants when no activity has been present.

Signed-off-by: Derek Collison <derek@nats.io>
Copy link
Contributor

@matthiashanel matthiashanel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change itself makes sense.
I just have one question about the test.

Comment on lines +11259 to +11262
c.waitOnServerHealthz(s)
}

c.waitOnConsumerLeader("$G", "T", "d")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way I read your email, don't these two lines mask the issue?
The issue happened before the consumer leader was ready, but didn't happen after.
So shouldn't one instance of the test avoid these and a much shorter test test that once c.waitOnServerHealthz(z) returned the consumer has a leader (vs wait for one)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 11262 simply waits on the consumer leader so that we know ConsumerInfo will succeed.
With the email thread with Marco I am not sure that is what he said about when the issue happened or didn't happen. From my understanding if the consumer was active at all before publishing then the issue did not present, and from what I saw that made sense. We were not persisting our state often enough with no activity (Delivered.Consumer == 0) to survive wiping the state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge but we can add on additional torture tests for state being wiped or corrupt for sure.

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

js.mu.RUnlock()
break Err
}
// Now do consumers.
for _, o := range stream.getConsumers() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there was not a reason you did not add consumers to the probe when you did the original PR... but I don't recall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case was an oversight, good catch by @matthiashanel

@derekcollison derekcollison merged commit b99fd81 into main Mar 17, 2022
@derekcollison derekcollison deleted the consumer_wipe branch March 17, 2022 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants