You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Are you using Docker or Kubernetes to run your system?
No
Are you running a single node or a cluster?
Cluster with three voters and two read-only nodes.
Description
I was facing an issue with rqlited v8.16.6 where I had a cluster with 3 voters and 2 non-voters and after some days working somehow the cluster ended up in a state where the non-voters could not start anymore and they were always showing "failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found" during startup. Performing a peers recovery managed to get everything back to normal. I don't have access to the database anymore and I still couldn't find a reproducible scenario. All I have are some logs from the attempts to start the read-only node while the voters were working, but I don't have the logs from when the error first started happening. Besides that, I know that the three-voters were being executed in physical machines and the 2 read-only nodes were running in virtual machines in a notebook that could be turned off sometimes.
These are the logs from the startup attempts
�
_ _ _�
| (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _ | | | __/ _ \ The lightweight, distributed�
| | | (_| | | | || __/ relational database.�
|_| \__, |_|_|\__\___|�
| | www.rqlite.io�
|_|�
�
[rqlited] 2024/02/14 19:03:22 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:03:22 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:03:22 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:03:22 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:03:22 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:03:22 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:03:22 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:03:22 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:03:22 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:03:22 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:03:22 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:03:22 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:03:22 HTTP server started�
[store] 2024/02/14 19:03:22 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:03:22 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:03:22 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:03:22 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:03:22 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:03:22 check complete�
[store] 2024/02/14 19:03:22 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:03:22 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�
�
_ _ _�
| (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _ | | | __/ _ \ The lightweight, distributed�
| | | (_| | | | || __/ relational database.�
|_| \__, |_|_|\__\___|�
| | www.rqlite.io�
|_|�
�
[rqlited] 2024/02/14 19:05:27 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:05:27 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:05:27 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:05:27 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:05:27 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:05:27 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:05:27 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:05:27 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:05:27 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:05:27 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:05:27 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:05:27 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:05:27 HTTP server started�
[store] 2024/02/14 19:05:27 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:05:27 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:05:27 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:05:27 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:05:27 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:05:27 check complete�
[store] 2024/02/14 19:05:27 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:05:27 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�
�
_ _ _�
| (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _ | | | __/ _ \ The lightweight, distributed�
| | | (_| | | | || __/ relational database.�
|_| \__, |_|_|\__\___|�
| | www.rqlite.io�
|_|�
�
[rqlited] 2024/02/14 19:07:36 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:07:36 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:07:36 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:07:36 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:07:36 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:07:36 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:07:36 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:07:36 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:07:36 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:07:36 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:07:36 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:07:36 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:07:36 HTTP server started�
[store] 2024/02/14 19:07:36 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:07:36 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:07:36 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:07:36 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:07:36 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:07:36 check complete�
[store] 2024/02/14 19:07:36 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:07:36 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�
The text was updated successfully, but these errors were encountered:
This is an issue in some now-obsolete code, which is preventing the Store from opening and carrying on with normal catch-up with the rest of the cluster. Instead of debugging it, it's time I just removed the code, as it serves no purpose -- and hasn't done so since the 7.x series.
We do lose some logging around start-up times, but I think there are better ways to do this that don't involve Goroutines.
Not necessarily the root fix, but this code is preventing the node from starting up correctly. Just nuke it, and let's see. There may actually be no issue.
What version are you running?
8.16.6
Are you using Docker or Kubernetes to run your system?
No
Are you running a single node or a cluster?
Cluster with three voters and two read-only nodes.
Description
I was facing an issue with rqlited v8.16.6 where I had a cluster with 3 voters and 2 non-voters and after some days working somehow the cluster ended up in a state where the non-voters could not start anymore and they were always showing "failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found" during startup. Performing a peers recovery managed to get everything back to normal. I don't have access to the database anymore and I still couldn't find a reproducible scenario. All I have are some logs from the attempts to start the read-only node while the voters were working, but I don't have the logs from when the error first started happening. Besides that, I know that the three-voters were being executed in physical machines and the 2 read-only nodes were running in virtual machines in a notebook that could be turned off sometimes.
These are the logs from the startup attempts
The text was updated successfully, but these errors were encountered: