Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Did not receive all consumer info results for 'USERS > stream_t3' JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum #4363

Closed
2 tasks done
yuzhou-nj opened this issue Aug 3, 2023 · 12 comments · Fixed by #4428
Labels
defect Suspected defect such as a bug or regression

Comments

@yuzhou-nj
Copy link

Defect

Make sure that these boxes are checked before submitting your issue -- thank you!

Versions of nats-server and affected client libraries used:

nats-server-v2.9.20-linux-amd64.zip

OS/Container environment:

k8s container, CentOS Compatibility

Steps or code to reproduce the issue:

Environment Description:
3 nodes: jetstream: enable
5+ streams: Retention: Interest, Replicas: 3, Storage: File, ...
1+ consumers for each stream: Pull Mode: true, Ack Policy: Explicit, ...

nats stream info stream_t3
Information for Stream stream_t3 created 2023-08-03 18:56:03

             Subjects: stream_t3.>
             Replicas: 3
              Storage: File

Options:

            Retention: Interest
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: 20,000,000
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: 7d0h0m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats_cluster
               Leader: nats1
              Replica: nats0, current, seen 0.62s ago
              Replica: nats2, current, seen 0.62s ago

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 1

How to reproduce the isse:

  1. Use cgroup to limit the nats-server writing speed to 10KB/s to simulate the situation that the disk is busy.
    ex:
mkdir /sys/fs/cgroup/blkio/writelimit 
echo $(pidof nats-server) >> /sys/fs/cgroup/blkio/writelimit/cgroup.procs
echo "253:0 10000 " > /sys/fs/cgroup/blkio/writelimit/blkio.throttle.write_bps_device

253:0 from here:

[root@localhost ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    253:0    0  400G  0 disk
├─vda1 253:1    0    1G  0 part /boot
└─vda2 253:2    0  399G  0 part /
vdb    253:16   0    8G  0 disk
  1. pub and sub some msgs. run nats consumer report somestream , Sometimes timeout
[root@localhost ~]# nats consumer info stream_t3
? Select a Consumer stream_t3_2
nats: error: could not load Consumer stream_t3 > stream_t3_2: context deadline exceeded

nats-server.log

[2661477] 2023/08/03 19:11:44.295584 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
...
[2661477] 2023/08/03 19:11:50.565116 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'

nats-server.log on another node:

[2025315] 2023/08/03 14:29:13.771080 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.
...
[2025315] 2023/08/03 14:29:41.448070 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.

nats consumer report XXX --trace :

[root@localhost ~]# nats consumer report stream_t3 --trace
20:41:24 >>> $JS.API.STREAM.INFO.stream_t3
null

20:41:24 <<< $JS.API.STREAM.INFO.stream_t3
{"type":"io.nats.jetstream.api.v1.stream_info_response","total":0,"offset":0,"limit":0,"config":{"name":"stream_t3","subjects":["stream_t3.\u003e"],"retention":"interest","max_consumers":-1,"max_msgs":20000000,"max_bytes":-1,"max_age":604800000000000,"max_msgs_per_subject":-1,"max_msg_size":-1,"discard":"old","storage":"file","num_replicas":3,"duplicate_window":120000000000,"allow_direct":false,"mirror_direct":false,"sealed":false,"deny_delete":false,"deny_purge":false,"allow_rollup_hdrs":false},"created":"2023-08-03T10:56:03.111435473Z","state":{"messages":0,"bytes":0,"first_seq":0,"first_ts":"0001-01-01T00:00:00Z","last_seq":0,"last_ts":"0001-01-01T00:00:00Z","consumer_count":1},"cluster":{"name":"nats_cluster","leader":"nats1","replicas":[{"name":"nats0","current":true,"active":768639229,"peer":"LWhxtZZD"},{"name":"nats2","current":true,"active":768602295,"peer":"SRLRpmYS"}]}}

20:41:24 >>> $JS.API.STREAM.INFO.stream_t3
null

20:41:24 <<< $JS.API.STREAM.INFO.stream_t3
{"type":"io.nats.jetstream.api.v1.stream_info_response","total":0,"offset":0,"limit":0,"config":{"name":"stream_t3","subjects":["stream_t3.\u003e"],"retention":"interest","max_consumers":-1,"max_msgs":20000000,"max_bytes":-1,"max_age":604800000000000,"max_msgs_per_subject":-1,"max_msg_size":-1,"discard":"old","storage":"file","num_replicas":3,"duplicate_window":120000000000,"allow_direct":false,"mirror_direct":false,"sealed":false,"deny_delete":false,"deny_purge":false,"allow_rollup_hdrs":false},"created":"2023-08-03T10:56:03.111435473Z","state":{"messages":0,"bytes":0,"first_seq":0,"first_ts":"0001-01-01T00:00:00Z","last_seq":0,"last_ts":"0001-01-01T00:00:00Z","consumer_count":1},"cluster":{"name":"nats_cluster","leader":"nats1","replicas":[{"name":"nats0","current":true,"active":774041524,"peer":"LWhxtZZD"},{"name":"nats2","current":true,"active":774004590,"peer":"SRLRpmYS"}]}}

20:41:24 >>> $JS.API.CONSUMER.LIST.stream_t3
{"offset":0}




20:41:28 <<< $JS.API.CONSUMER.LIST.stream_t3
{"type":"io.nats.jetstream.api.v1.consumer_list_response","total":0,"offset":0,"limit":256,"consumers":[],"missing":["stream_t3_2"]}

╭─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                             Consumer report for stream_t3 with **1** consumers                              │
├──────────┬──────┬────────────┬──────────┬─────────────┬─────────────┬─────────────┬───────────┬─────────┤
│ Consumer │ Mode │ Ack Policy │ Ack Wait │ Ack Pending │ Redelivered │ Unprocessed │ Ack Floor │ Cluster │
├──────────┼──────┼────────────┼──────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────┤
╰──────────┴──────┴────────────┴──────────┴─────────────┴─────────────┴─────────────┴───────────┴─────────╯

  1. set disk writing speed limit to 10 MB/s, But the problem still exists.
    ex:
echo "253:0 10240000" > /sys/fs/cgroup/blkio/writelimit/blkio.throttle.write_bps_device

Expected result:

Actual result:

[root@localhost ~]# nats consumer info stream_t3
? Select a Consumer stream_t3_2
nats: error: could not load Consumer stream_t3 > stream_t3_2: context deadline exceeded

nats0 log:

[2661477] 2023/08/03 18:56:03.113500 [DBG] Starting stream monitor for 'USERS > stream_t3' [S-R3F-P0t3poBg]
[2661477] 2023/08/03 18:56:03.568867 [INF] JetStream cluster new stream leader for 'USERS > stream_t3'
[2661477] 2023/08/03 18:56:27.128860 [DBG] JetStream cluster creating raft group:&{Name:C-R3F-P00IVKYA Peers:[SRLRpmYS LWhxtZZD RztkeQup] Storage:File Cluster:nats_cluster Preferred:LWhxtZZD node:<nil>}
[2661477] 2023/08/03 18:56:27.129503 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Started
[2661477] 2023/08/03 18:56:27.129531 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Starting campaign
[2661477] 2023/08/03 18:56:27.129880 [DBG] Starting consumer monitor for 'USERS > stream_t3 > stream_t3_2' [C-R3F-P00IVKYA]
[2661477] 2023/08/03 18:56:27.744702 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to candidate
[2661477] 2023/08/03 18:56:27.744756 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending out voteRequest {term:1 lastTerm:0 lastIndex:0 candidate:LWhxtZZD reply:}
[2661477] 2023/08/03 18:56:27.745092 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:0 peer:RztkeQup granted:true}
[2661477] 2023/08/03 18:56:27.745132 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to leader
[2661477] 2023/08/03 18:56:27.745209 [INF] JetStream cluster new consumer leader for 'USERS > stream_t3 > stream_t3_2'
[2661477] 2023/08/03 18:56:27.745252 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:0 peer:SRLRpmYS granted:true}
[2661477] 2023/08/03 18:58:37.320705 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Installing snapshot of 8 bytes
[2661477] 2023/08/03 19:01:55.183674 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:01:55.183814 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry updating leader to "SRLRpmYS"
[2661477] 2023/08/03 19:01:55.183833 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry detected pindex less than ours: 1:99 vs 1:101
[2661477] 2023/08/03 19:01:55.183884 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Truncating and repairing WAL to Term 1 Index 99
[2661477] 2023/08/03 19:02:03.183529 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Canceling catchup subscription since we are now up to date
[2661477] 2023/08/03 19:02:03.183764 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry did not match 2 100 with 2 99
[2661477] 2023/08/03 19:02:03.183844 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Proposal ignored, not leader
[2661477] 2023/08/03 19:02:03.183899 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Proposal ignored, not leader
[2661477] 2023/08/03 19:02:03.183863 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:2 lastTerm:1 lastIndex:99 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:02:03.183991 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:2 peer:LWhxtZZD granted:false} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:02:03.184010 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:3 lastTerm:1 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:02:03.184021 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:2 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:02:03.184028 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:4 lastTerm:1 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:02:03.184036 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:3 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:02:03.184044 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:5 lastTerm:2 lastIndex:101 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:02:03.184054 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:4 peer:LWhxtZZD granted:true} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:02:03.185692 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Canceling catchup subscription since we are now up to date
[2661477] 2023/08/03 19:02:03.185815 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:02:03.185839 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:03:29.031822 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to candidate
[2661477] 2023/08/03 19:03:29.031911 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending out voteRequest {term:6 lastTerm:5 lastIndex:132 candidate:LWhxtZZD reply:}
[2661477] 2023/08/03 19:03:29.368966 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:6 lastTerm:5 lastIndex:132 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:03:29.369012 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:6 peer:LWhxtZZD granted:false} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:03:35.151255 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:6 peer:SRLRpmYS granted:false}
[2661477] 2023/08/03 19:03:38.080137 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending out voteRequest {term:7 lastTerm:5 lastIndex:132 candidate:LWhxtZZD reply:}
[2661477] 2023/08/03 19:03:38.080677 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:6 peer:SRLRpmYS granted:true}
[2661477] 2023/08/03 19:03:38.080730 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to leader
[2661477] 2023/08/03 19:03:38.080872 [INF] JetStream cluster new consumer leader for 'USERS > stream_t3 > stream_t3_2'
[2661477] 2023/08/03 19:04:39.155298 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:6 peer:RztkeQup granted:false}
[2661477] 2023/08/03 19:04:39.155871 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:7 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:04:39.155921 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:7 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:04:40.416268 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:8 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:04:40.416313 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Stepping down from leader, detected higher term: 8 vs 7
[2661477] 2023/08/03 19:04:40.416342 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:7 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:04:40.416349 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:04:40.728858 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:9 lastTerm:7 lastIndex:149 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:04:40.728908 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:8 peer:LWhxtZZD granted:true} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:04:40.729297 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry updating leader to "SRLRpmYS"
[2661477] 2023/08/03 19:04:40.729564 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:04:41.076240 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Proposal ignored, not leader
[2661477] 2023/08/03 19:04:41.086351 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Proposal ignored, not leader
[2661477] 2023/08/03 19:04:46.376735 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:9 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:04:46.376786 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:9 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:04:51.086795 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Proposal ignored, not leader
[2661477] 2023/08/03 19:04:55.251238 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:10 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:04:55.251282 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:9 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:04:55.499154 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to candidate
[2661477] 2023/08/03 19:04:55.499200 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending out voteRequest {term:11 lastTerm:9 lastIndex:151 candidate:LWhxtZZD reply:}
[2661477] 2023/08/03 19:04:55.499716 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:10 peer:SRLRpmYS granted:true}
[2661477] 2023/08/03 19:04:55.499770 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to leader
[2661477] 2023/08/03 19:05:02.282570 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:11 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:05:02.282633 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:11 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:05:07.188392 [INF] JetStream cluster new consumer leader for 'USERS > stream_t3 > stream_t3_2'
[2661477] 2023/08/03 19:05:10.175371 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:12 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:05:10.175404 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Stepping down from leader, detected higher term: 12 vs 11
[2661477] 2023/08/03 19:05:10.175448 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:11 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:05:10.175458 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:05:10.359017 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:13 lastTerm:11 lastIndex:154 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:05:10.359077 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:12 peer:LWhxtZZD granted:true} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:05:10.359541 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry updating leader to "SRLRpmYS"
[2661477] 2023/08/03 19:05:10.359889 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:05:15.562706 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:13 lastTerm:5 lastIndex:99 candidate:RztkeQup reply:$NRG.R.AaDo9quq}
[2661477] 2023/08/03 19:05:15.562759 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:13 peer:LWhxtZZD granted:false} -> "$NRG.R.AaDo9quq"
[2661477] 2023/08/03 19:05:19.142944 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:13 peer:RztkeQup granted:false}
[2661477] 2023/08/03 19:05:19.142974 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Ignoring old vote response, we have stepped down
[2661477] 2023/08/03 19:05:19.146444 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:9 peer:RztkeQup granted:true}
[2661477] 2023/08/03 19:05:19.146450 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Ignoring old vote response, we have stepped down
[2661477] 2023/08/03 19:05:55.180426 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:05:55.180562 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:07:14.781886 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to candidate
[2661477] 2023/08/03 19:07:14.781938 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending out voteRequest {term:14 lastTerm:13 lastIndex:176 candidate:LWhxtZZD reply:}
[2661477] 2023/08/03 19:07:14.782365 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:13 peer:RztkeQup granted:true}
[2661477] 2023/08/03 19:07:14.782421 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to leader
[2661477] 2023/08/03 19:07:55.185806 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteResponse &{term:14 peer:SRLRpmYS granted:false}
[2661477] 2023/08/03 19:08:03.184125 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received a voteRequest &{term:14 lastTerm:13 lastIndex:176 candidate:SRLRpmYS reply:$NRG.R.tXejSJ6I}
[2661477] 2023/08/03 19:08:03.184164 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:08:03.184485 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Sending a voteResponse &{term:14 peer:LWhxtZZD granted:false} -> "$NRG.R.tXejSJ6I"
[2661477] 2023/08/03 19:08:03.184496 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184515 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184520 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184528 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184533 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184538 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184543 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184665 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184671 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184676 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184681 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Received append entry from another leader, stepping down to "SRLRpmYS"
[2661477] 2023/08/03 19:08:03.184687 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184715 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Ignoring old vote response, we have stepped down
[2661477] 2023/08/03 19:08:03.184823 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184828 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184837 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184851 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184864 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184870 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184876 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184952 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184970 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:03.184975 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Switching to follower
[2661477] 2023/08/03 19:08:04.181402 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] AppendEntry did not match 14 180 with 14 179
[2661477] 2023/08/03 19:08:06.181626 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Catchup may be stalled, will request again
[2661477] 2023/08/03 19:08:09.181520 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Catchup may be stalled, will request again
[2661477] 2023/08/03 19:08:12.181359 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Catchup may be stalled, will request again
[2661477] 2023/08/03 19:08:12.991936 [INF] JetStream cluster new stream leader for 'USERS > stream_t3'
[2661477] 2023/08/03 19:08:12.992042 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:08:12.992061 [INF] JetStream cluster new consumer leader for 'USERS > stream_t3 > stream_t3_2'
[2661477] 2023/08/03 19:08:18.060313 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:08:18.060369 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Update peers from leader to map[LWhxtZZD:0xc00002ab88 RztkeQup:0xc00002abd0 SRLRpmYS:0xc00002abb8]
[2661477] 2023/08/03 19:08:20.658660 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Not switching to candidate, catching up
[2661477] 2023/08/03 19:08:20.658723 [DBG] RAFT [LWhxtZZD - C-R3F-P00IVKYA] Canceling catchup subscription since we are now up to date
[2661477] 2023/08/03 19:11:44.295584 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
[2661477] 2023/08/03 19:11:50.565116 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
[2661477] 2023/08/03 19:12:42.288628 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
[2661477] 2023/08/03 19:13:01.912683 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
[2661477] 2023/08/03 19:17:14.857760 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'
[2661477] 2023/08/03 19:20:42.943194 [WRN] Did not receive all consumer info results for 'USERS > stream_t3'

nats1:

[2025315] 2023/08/03 14:13:54.888952 [DBG] JetStream cluster creating raft group:&{Name:C-R3F-yBaTkf7T Peers:[SRLRpmYS RztkeQup LWhxtZZD] Storage:File Cluster:nats_cluster Preferred: node:<nil>}
[2025315] 2023/08/03 14:13:54.889074 [DBG] Starting consumer monitor for 'USERS > stream_t1 > stream_t1_1' [C-R3F-LcnOKyov]
[2025315] 2023/08/03 14:13:54.889254 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Started
[2025315] 2023/08/03 14:13:54.889512 [DBG] Starting consumer monitor for 'USERS > stream_t1 > stream_t1_2' [C-R3F-yBaTkf7T]
[2025315] 2023/08/03 14:13:59.822237 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc0005a0c78 RztkeQup:0xc0005a0c48 SRLRpmYS:0xc0005a0c60]
[2025315] 2023/08/03 14:13:59.822410 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc0005a0c78 RztkeQup:0xc0005a0c48 SRLRpmYS:0xc0005a0c60]
[2025315] 2023/08/03 14:14:03.034362 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteRequest &{term:4 lastTerm:3 lastIndex:213 candidate:LWhxtZZD reply:$NRG.R.naLLNjR2}
[2025315] 2023/08/03 14:14:03.034393 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending a voteResponse &{term:3 peer:RztkeQup granted:true} -> "$NRG.R.naLLNjR2"
[2025315] 2023/08/03 14:14:03.034737 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] AppendEntry updating leader to "LWhxtZZD"
[2025315] 2023/08/03 14:14:03.035273 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc0005a0c78 RztkeQup:0xc0005a0c48 SRLRpmYS:0xc0005a0c60]
[2025315] 2023/08/03 14:29:03.465223 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteRequest &{term:5 lastTerm:4 lastIndex:527 candidate:SRLRpmYS reply:$NRG.R.ntuQEKs2}
[2025315] 2023/08/03 14:29:03.465294 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending a voteResponse &{term:4 peer:RztkeQup granted:true} -> "$NRG.R.ntuQEKs2"
[2025315] 2023/08/03 14:29:07.997860 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Switching to candidate
[2025315] 2023/08/03 14:29:07.997982 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:6 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:13.771080 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.
[2025315] 2023/08/03 14:29:13.771111 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:7 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:21.331280 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:8 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:28.630468 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:9 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:32.985565 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:10 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:41.447991 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:11 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:41.448070 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.
[2025315] 2023/08/03 14:29:48.805144 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:12 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:29:55.278697 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Sending out voteRequest {term:13 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:}
[2025315] 2023/08/03 14:30:02.173176 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received append entry in candidate state from "LWhxtZZD", converting to follower
[2025315] 2023/08/03 14:30:02.178127 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received append entry in candidate state from "LWhxtZZD", converting to follower
[2025315] 2023/08/03 14:30:02.178590 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Switching to follower
[2025315] 2023/08/03 14:30:02.178601 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Switching to follower
[2025315] 2023/08/03 14:30:02.183026 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:5 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.183031 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.183209 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:6 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.183214 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.183639 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:7 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.183644 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.183867 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:8 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.183872 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.184034 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:9 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.184039 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.184481 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:10 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.184486 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.184493 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:11 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.184497 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.184503 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:12 peer:LWhxtZZD granted:false}
[2025315] 2023/08/03 14:30:02.184508 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.184926 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] AppendEntry updating leader to "SRLRpmYS"
[2025315] 2023/08/03 14:30:02.184963 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] AppendEntry detected pindex less than ours: 4:527 vs 4:529
[2025315] 2023/08/03 14:30:02.185064 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Truncating and repairing WAL to Term 4 Index 527
[2025315] 2023/08/03 14:30:02.188903 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Canceling catchup subscription since we are now up to date
[2025315] 2023/08/03 14:30:02.192344 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] AppendEntry did not match 5 528 with 5 527
[2025315] 2023/08/03 14:30:02.193196 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:5 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.193203 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.193899 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:6 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.193905 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194105 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:7 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194109 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194244 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:8 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194249 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194255 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:9 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194259 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194266 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:10 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194272 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194287 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:11 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194291 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:02.194723 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Received a voteResponse &{term:12 peer:SRLRpmYS granted:false}
[2025315] 2023/08/03 14:30:02.194728 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[2025315] 2023/08/03 14:30:09.867057 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Not switching to candidate, catching up
[2025315] 2023/08/03 14:30:09.867095 [DBG] RAFT [RztkeQup - C-R3F-yBaTkf7T] Canceling catchup subscription since we are now up to date

nats2:

[3295844] 2023/08/03 14:13:54.938550 [DBG] JetStream cluster creating raft group:&{Name:C-R3F-yBaTkf7T Peers:[SRLRpmYS RztkeQup LWhxtZZD] Storage:File Cluster:nats_cluster Preferred: node:<nil>}
[3295844] 2023/08/03 14:13:54.939064 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Started
[3295844] 2023/08/03 14:13:54.939271 [DBG] Starting consumer monitor for 'USERS > stream_t1 > stream_t1_2' [C-R3F-yBaTkf7T]
[3295844] 2023/08/03 14:13:54.939798 [DBG] Starting consumer monitor for 'USERS > stream_t1 > stream_t1_1' [C-R3F-LcnOKyov]
[3295844] 2023/08/03 14:13:59.823838 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc00030ade0 RztkeQup:0xc00030adc8 SRLRpmYS:0xc00030adb0]
[3295844] 2023/08/03 14:13:59.823868 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc00030ade0 RztkeQup:0xc00030adc8 SRLRpmYS:0xc00030adb0]
[3295844] 2023/08/03 14:14:03.034308 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:4 lastTerm:3 lastIndex:213 candidate:LWhxtZZD reply:$NRG.R.naLLNjR2}
[3295844] 2023/08/03 14:14:03.034337 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:3 peer:SRLRpmYS granted:true} -> "$NRG.R.naLLNjR2"
[3295844] 2023/08/03 14:14:03.034754 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] AppendEntry updating leader to "LWhxtZZD"
[3295844] 2023/08/03 14:14:03.035297 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Update peers from leader to map[LWhxtZZD:0xc00030ade0 RztkeQup:0xc00030adc8 SRLRpmYS:0xc00030adb0]
[3295844] 2023/08/03 14:29:03.464882 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to candidate
[3295844] 2023/08/03 14:29:03.464939 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending out voteRequest {term:5 lastTerm:4 lastIndex:527 candidate:SRLRpmYS reply:}
[3295844] 2023/08/03 14:29:03.465436 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteResponse &{term:4 peer:RztkeQup granted:true}
[3295844] 2023/08/03 14:29:03.465479 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to leader
[3295844] 2023/08/03 14:30:02.184565 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:6 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.184933 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 6 vs 5
[3295844] 2023/08/03 14:30:02.184949 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:5 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.184959 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:7 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.184811 [INF] JetStream cluster new consumer leader for 'USERS > stream_t1 > stream_t1_2'
[3295844] 2023/08/03 14:30:02.184966 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 7 vs 6
[3295844] 2023/08/03 14:30:02.185696 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:6 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.185709 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:8 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.186481 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 8 vs 7
[3295844] 2023/08/03 14:30:02.186503 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:7 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.186515 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:9 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.186665 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 9 vs 8
[3295844] 2023/08/03 14:30:02.186698 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:8 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.186712 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:10 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.186719 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 10 vs 9
[3295844] 2023/08/03 14:30:02.186728 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:9 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.186736 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:11 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.186742 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Stepping down from leader, detected higher term: 11 vs 10
[3295844] 2023/08/03 14:30:02.186750 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:10 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.186756 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.186776 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.186784 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:12 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.186820 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Installing snapshot of 8 bytes
[3295844] 2023/08/03 14:30:02.186823 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:11 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.188478 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteRequest &{term:13 lastTerm:4 lastIndex:527 candidate:RztkeQup reply:$NRG.R.uOdHLCNw}
[3295844] 2023/08/03 14:30:02.188494 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Sending a voteResponse &{term:12 peer:SRLRpmYS granted:false} -> "$NRG.R.uOdHLCNw"
[3295844] 2023/08/03 14:30:02.188499 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.188505 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.188510 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.188516 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Switching to follower
[3295844] 2023/08/03 14:30:02.192904 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] AppendEntry updating leader to "LWhxtZZD"
[3295844] 2023/08/03 14:30:02.192944 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] AppendEntry detected pindex less than ours: 4:527 vs 5:529
[3295844] 2023/08/03 14:30:02.192955 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Truncating and repairing WAL to Term 0 Index 0
[3295844] 2023/08/03 14:30:02.192968 [WRN] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Resetting WAL state
[3295844] 2023/08/03 14:30:02.193199 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Canceling catchup subscription since we are now up to date
[3295844] 2023/08/03 14:30:02.192949 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Received a voteResponse &{term:4 peer:LWhxtZZD granted:false}
[3295844] 2023/08/03 14:30:02.193584 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Ignoring old vote response, we have stepped down
[3295844] 2023/08/03 14:30:02.193629 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] AppendEntry did not match 4 528 with 4 0
[3295844] 2023/08/03 14:30:11.175340 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not switching to candidate, catching up
[3295844] 2023/08/03 14:30:11.175396 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Canceling catchup subscription since we are now up to date
[3295844] 2023/08/03 14:31:54.932502 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:31:54.932580 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:33:54.932700 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:33:54.932783 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:35:54.933117 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:35:54.933196 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:37:54.932739 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:37:54.932751 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:43:54.933217 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:43:54.933229 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:47:54.933274 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:47:54.933287 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
[3295844] 2023/08/03 14:59:54.932959 [DBG] RAFT [SRLRpmYS - C-R3F-yBaTkf7T] Not current, no commits
[3295844] 2023/08/03 14:59:54.932971 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current

Thank you!

@yuzhou-nj
Copy link
Author

Not only k8s, but also physical machines have this problem.

@derekcollison
Copy link
Member

Constraining to that level will affect the amount of time operations takes, and the system, for consumer report, only waits for responses from the leaders, consumer leaders in this instance, for ~4s.

@derekcollison
Copy link
Member

You can do a consumer list of just names and individual consumer info for each one with longer timeouts.

@yuzhou-nj
Copy link
Author

You can do a consumer list of just names and individual consumer info for each one with longer timeouts.

Hi, Thank you for you reply!
My question is: When the disk is busy, the JetStream Cluster is unstable. (nats consumer info xxx run failed)
After a few minutes, the disk is no longer busy, and the problem is not solved.

when disk busy:
 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.
 [WRN] JetStream cluster consumer 'USERS > stream_t1 > stream_t1_2' has NO quorum, stalled.

disk not busy:
 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current
 [WRN] JetStream consumer 'USERS > stream_t1 > stream_t1_2' is not current

@derekcollison
Copy link
Member

The stalled command means that the leader is not receiving heartbeats. It could be because everything is blocking waiting on the slow disk.

@yuzhou-nj
Copy link
Author

The stalled command means that the leader is not receiving heartbeats. It could be because everything is blocking waiting on the slow disk.

I removed pid of nats-server from cgroup:
rm -rf /sys/fs/cgroup/blkio/writelimit/cgroup.procs

Or set 100M/s limit (equal not limit)
`echo "253:0 104857600" > /sys/fs/cgroup/blkio/writelimit/blkio.throttle.write_bps_device

nats consumer report stream_t1 or nats consumer info stream_t1 stream_t1_2 still abnormal

@yuzhou-nj
Copy link
Author

The stalled command means that the leader is not receiving heartbeats. It could be because everything is blocking waiting on the slow disk.

Expected result:

When the disk is no longer busy, the client can pull messages normally.

Actual result:

When the disk is no longer busy, some consumers are unable to recover and clients are unable to pull any messages.
We had to restart the nats server to recover.

My question

My question is why some consumers cannot be automatically restored.

@derekcollison
Copy link
Member

We would need to setup this experiment and take a look in more detail.

@bruth bruth added defect Suspected defect such as a bug or regression and removed 🐞 bug labels Aug 18, 2023
@yuzhou-nj
Copy link
Author

Hi, what's new?
Note: The following logs are printed on all the three nodes.

[3924910] 2023/08/22 14:42:45.187616 [DBG] RAFT [LWhxtZZD - C-R3F-RS3Sw5xR] Not switching to candidate, catching up
[3924910] 2023/08/22 14:42:45.187623 [DBG] RAFT [LWhxtZZD - C-R3F-RS3Sw5xR] Canceling catchup subscription since we are now up to date

[3313788] 2023/08/22 14:42:46.138637 [DBG] RAFT [RztkeQup - C-R3F-RS3Sw5xR] Not switching to candidate, catching up
[3313788] 2023/08/22 14:42:46.138643 [DBG] RAFT [RztkeQup - C-R3F-RS3Sw5xR] Canceling catchup subscription since we are now up to date

[448871] 2023/08/22 14:42:47.562441 [DBG] RAFT [SRLRpmYS - C-R3F-RS3Sw5xR] Not switching to candidate, catching up
[448871] 2023/08/22 14:42:47.562449 [DBG] RAFT [SRLRpmYS - C-R3F-RS3Sw5xR] Canceling catchup subscription since we are now up to date

raft.go:1810~1818 (v2.9.21)

			} else if n.isCatchingUp() {
				n.debug("Not switching to candidate, catching up")
				// Check to see if our catchup has stalled.
				n.Lock()
				if n.catchupStalled() {
					n.cancelCatchup()
				}
				n.Unlock()
			} else {

In this branch, the resetElect-related method will not be invoked, so that the Leader election among the three nodes will not continue. Therefore, even if the disk I/O becomes normal, the Raft group cannot work normally. Recovery can only be done by restarting the nats-server.

I guess we need to add:

n. ResetElectionTimeoutWithLock()

Thank you very much and I look forward to hearing from you.

@derekcollison
Copy link
Member

Will take a look. Thanks for the information and suggestion.

@derekcollison
Copy link
Member

You are correct, thanks, will make the fix.

derekcollison added a commit that referenced this issue Aug 25, 2023
Thanks to @yuzhou-nj for the catch and fix.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4363
@yuzhou-nj
Copy link
Author

That's good news! We look forward to the new version!
Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants