Did not receive all consumer info results for 'USERS > stream_t3' / JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum #4363
This problem occurs not only on k8s, but also on physical machines.
Constraining disk I/O to that level will affect how long operations take. For a consumer report, the system only waits ~4s for responses from the leaders (consumer leaders in this instance).
You can instead do a consumer list of just the names and then fetch individual consumer info for each one with longer timeouts; see the sketch below.
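A minimal sketch of that workaround with the nats.go client; the stream name (stream_t1) and the 30s timeout are placeholders for illustration, not values from this issue's logs:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Raise the JetStream API timeout well above the ~4s the consumer
	// report waits for consumer-leader responses.
	js, err := nc.JetStream(nats.MaxWait(30 * time.Second))
	if err != nil {
		log.Fatal(err)
	}

	// List names only (cheap), then query each consumer individually so
	// one slow consumer leader does not sink the whole report.
	for name := range js.ConsumerNames("stream_t1") {
		info, err := js.ConsumerInfo("stream_t1", name)
		if err != nil {
			log.Printf("consumer %q: %v", name, err)
			continue
		}
		fmt.Printf("%s: %d pending\n", info.Name, info.NumPending)
	}
}
```

The CLI equivalent would be a nats consumer ls followed by nats consumer info per name with a longer --timeout; exact flags depend on your nats CLI version.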
Hi, thank you for your reply!
The stalled indication means that the leader is not receiving heartbeats. It could be that everything is blocked waiting on the slow disk.
I removed the nats-server PID from the cgroup, or alternatively set a 100 MB/s limit (which is effectively no limit); a sketch of this step follows.
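A hedged sketch of that step, assuming the cgroup v1 blkio setup from the reproduction section below (device 253:0 is the one named in the issue; the cgroup path and PID are placeholders):

```go
package main

import (
	"log"
	"os"
)

func main() {
	const cg = "/sys/fs/cgroup/blkio/nats-throttle" // placeholder cgroup path

	// Option 1: clear the write-bandwidth limit (a value of 0 removes it).
	if err := os.WriteFile(cg+"/blkio.throttle.write_bps_device",
		[]byte("253:0 0"), 0o644); err != nil {
		log.Fatal(err)
	}

	// Option 2: move the nats-server PID back to the root blkio cgroup,
	// detaching it from the throttle entirely (12345 is a placeholder PID).
	if err := os.WriteFile("/sys/fs/cgroup/blkio/cgroup.procs",
		[]byte("12345"), 0o644); err != nil {
		log.Fatal(err)
	}
}
```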
Expected result: when the disk is no longer busy, the client can pull messages normally.
Actual result: when the disk is no longer busy, some consumers are unable to recover and clients cannot pull any messages.
My question is why some consumers cannot recover automatically.
We would need to set up this experiment and take a look in more detail.
Hi, any updates on this?
raft.go:1810~1818 (v2.9.21)
In this branch, the resetElect-related method is never invoked, so leader election among the three nodes cannot continue. As a result, even after disk I/O returns to normal, the Raft group cannot work normally; recovery is only possible by restarting nats-server. I guess we need to add a reset of the election timer in this branch:
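As a toy illustration (this is not nats-server source; the real reset lives in raft.go's resetElect-related methods, and the condition here is invented), a runnable model of why a branch that skips the election-timer reset leaves the group wedged until restart:

```go
package main

import (
	"fmt"
	"time"
)

// processEntry models handling an append entry. When reset is false (the
// buggy branch), the follower's election timer is never re-armed.
func processEntry(timer *time.Timer, reset bool) {
	if reset {
		timer.Reset(100 * time.Millisecond)
	}
}

func main() {
	timer := time.NewTimer(100 * time.Millisecond)
	<-timer.C // the timer fires while the node is stalled on slow disk I/O

	processEntry(timer, false) // buggy branch: no reset after the stall
	select {
	case <-timer.C:
		fmt.Println("election timer fired")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("timer never re-armed: no election until nats-server restarts")
	}

	processEntry(timer, true) // suggested fix: re-arm the election timer
	select {
	case <-timer.C:
		fmt.Println("election timer fired: the group can elect a leader again")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("unexpected: timer did not fire")
	}
}
```

In the actual fix, the reset would of course happen inside the raft.go branch identified above rather than in a toy loop.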
Thank you very much, and I look forward to hearing from you.
Will take a look. Thanks for the information and suggestion.
You are correct, thanks, will make the fix.
Thanks to @yuzhou-nj for the catch and fix.
Signed-off-by: Derek Collison <derek@nats.io>
Resolves #4363
That's good news! We look forward to the new version!
Defect
Make sure that these boxes are checked before submitting your issue -- thank you!
- Included nats-server -DV output
- Versions of nats-server and affected client libraries used: nats-server-v2.9.20-linux-amd64.zip
OS/Container environment:
k8s container, CentOS-compatible OS
Steps or code to reproduce the issue:
Environment description:
- 3 nodes, JetStream enabled
- 5+ streams: Retention: Interest, Replicas: 3, Storage: File, ...
- 1+ consumers for each stream: Pull Mode: true, Ack Policy: Explicit, ...
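A minimal sketch recreating that environment with the nats.go client; the stream/consumer names and the subject are placeholders:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Stream matching the description: Interest retention, 3 replicas, file storage.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:      "stream_t1", // placeholder name
		Subjects:  []string{"t1.>"},
		Retention: nats.InterestPolicy,
		Replicas:  3,
		Storage:   nats.FileStorage,
	}); err != nil {
		log.Fatal(err)
	}

	// Pull consumer with explicit acks (no DeliverSubject makes it pull-based).
	if _, err := js.AddConsumer("stream_t1", &nats.ConsumerConfig{
		Durable:   "stream_t1_2", // placeholder name
		AckPolicy: nats.AckExplicitPolicy,
	}); err != nil {
		log.Fatal(err)
	}
}
```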
How to reproduce the issue:
1. Throttle disk I/O for the nats-server process via a cgroup (the example targeted device 253:0); see the sketch after this list.
2. Run nats consumer report somestream from a client; it sometimes times out.
3. Check nats-server.log, and nats-server.log on another node.
4. Run nats consumer report XXX --trace for more detail.
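A hedged sketch of step 1, assuming cgroup v1 blkio; device 253:0 is from the issue, while the 1 MB/s value, cgroup path, and PID are placeholders:

```go
package main

import (
	"log"
	"os"
)

func main() {
	const cg = "/sys/fs/cgroup/blkio/nats-throttle" // placeholder cgroup path
	if err := os.MkdirAll(cg, 0o755); err != nil {
		log.Fatal(err)
	}

	// Throttle writes on device 253:0 to 1 MB/s to simulate a busy disk.
	if err := os.WriteFile(cg+"/blkio.throttle.write_bps_device",
		[]byte("253:0 1048576"), 0o644); err != nil {
		log.Fatal(err)
	}

	// Put the nats-server process under the throttle (12345 is a placeholder PID).
	if err := os.WriteFile(cg+"/cgroup.procs",
		[]byte("12345"), 0o644); err != nil {
		log.Fatal(err)
	}
}
```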
Expected result: once the disk is no longer busy, clients can pull messages normally.
Actual result: some consumers never recover, and clients cannot pull any messages.
nats0 / nats1 / nats2 logs: [excerpts omitted]
Thank you!