Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read-only node showing failed to open store: set log info: failed to get last command index #1705

Closed
dwco-z opened this issue Feb 27, 2024 · 4 comments · Fixed by #1706
Closed

Comments

@dwco-z
Copy link

dwco-z commented Feb 27, 2024

What version are you running?
8.16.6

Are you using Docker or Kubernetes to run your system?
No

Are you running a single node or a cluster?
Cluster with three voters and two read-only nodes.

Description
I was facing an issue with rqlited v8.16.6 where I had a cluster with 3 voters and 2 non-voters and after some days working somehow the cluster ended up in a state where the non-voters could not start anymore and they were always showing "failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found" during startup. Performing a peers recovery managed to get everything back to normal. I don't have access to the database anymore and I still couldn't find a reproducible scenario. All I have are some logs from the attempts to start the read-only node while the voters were working, but I don't have the logs from when the error first started happening. Besides that, I know that the three-voters were being executed in physical machines and the 2 read-only nodes were running in virtual machines in a notebook that could be turned off sometimes.

These are the logs from the startup attempts

�
            _ _ _�
        | (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _  | | | __/ _ \   The lightweight, distributed�
| | | (_| | | | ||  __/   relational database.�
|_|  \__, |_|_|\__\___|�
        | |               www.rqlite.io�
        |_|�
�
[rqlited] 2024/02/14 19:03:22 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:03:22 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:03:22 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:03:22 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:03:22 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:03:22 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:03:22 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:03:22 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:03:22 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:03:22 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:03:22 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:03:22 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:03:22 HTTP server started�
[store] 2024/02/14 19:03:22 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:03:22 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:03:22 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:03:22 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:03:22 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:03:22 check complete�
[store] 2024/02/14 19:03:22 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:03:22 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�
�
            _ _ _�
        | (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _  | | | __/ _ \   The lightweight, distributed�
| | | (_| | | | ||  __/   relational database.�
|_|  \__, |_|_|\__\___|�
        | |               www.rqlite.io�
        |_|�
�
[rqlited] 2024/02/14 19:05:27 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:05:27 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:05:27 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:05:27 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:05:27 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:05:27 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:05:27 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:05:27 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:05:27 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:05:27 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:05:27 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:05:27 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:05:27 HTTP server started�
[store] 2024/02/14 19:05:27 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:05:27 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:05:27 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:05:27 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:05:27 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:05:27 check complete�
[store] 2024/02/14 19:05:27 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:05:27 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�
�
            _ _ _�
        | (_) |�
_ __ __ _| |_| |_ ___�
| '__/ _  | | | __/ _ \   The lightweight, distributed�
| | | (_| | | | ||  __/   relational database.�
|_|  \__, |_|_|\__\___|�
        | |               www.rqlite.io�
        |_|�
�
[rqlited] 2024/02/14 19:07:36 rqlited starting, version 8, SQLite 3.44.0, commit unknown, branch unknown, compiler gc�
[rqlited] 2024/02/14 19:07:36 go1.21.5, target architecture is amd64, operating system target is windows�
[rqlited] 2024/02/14 19:07:36 launch command: rqlited.exe -raft-log-level=DEBUG -node-id 192.168.2.61 -http-addr 192.168.2.61:8907 -raft-addr 192.168.2.61:8905 -raft-non-voter=true -node-ca-cert Database\DatabaseCertificateAuthority.cer -node-cert Database\DatabasePublicCertificate.cer -node-verify-client -node-key Database\DatabasePrivateCertificate.pem -http-ca-cert Database\DatabaseCertificateAuthority.cer -http-cert Database\DatabasePublicCertificate.cer -http-verify-client -http-key Database\DatabasePrivateCertificate.pem -raft-timeout=2s -raft-election-timeout=10s Database\Global�
[rqlited] 2024/02/14 19:07:36 enabling node-to-node encryption with cert: Database\DatabasePublicCertificate.cer, key: Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[rqlited] 2024/02/14 19:07:36 Raft TCP mux Listener registered with byte header 1�
[mux] 2024/02/14 19:07:36 TLS mux serving on 192.168.2.61:8905, advertising 192.168.2.61:8905�
[rqlited] 2024/02/14 19:07:36 preexisting node state detected in Database\Global�
[cluster] 2024/02/14 19:07:36 service listening on 192.168.2.61:8905�
[rqlited] 2024/02/14 19:07:36 cluster TCP mux Listener registered with byte header 2�
[http] 2024/02/14 19:07:36 secure HTTPS server enabled with cert Database\DatabasePublicCertificate.cer, key Database\DatabasePrivateCertificate.pem, CA cert Database\DatabaseCertificateAuthority.cer, mutual TLS enabled�
[http] 2024/02/14 19:07:36 execute queue processing started with capacity 1024, batch size 128, timeout 50ms�
[http] 2024/02/14 19:07:36 service listening on 192.168.2.61:8907�
[rqlited] 2024/02/14 19:07:36 HTTP server started�
[store] 2024/02/14 19:07:36 opening store with node ID 192.168.2.61, listening on 192.168.2.61:8905�
[store] 2024/02/14 19:07:36 ensuring data directory exists at Database\Global�
[store] 2024/02/14 19:07:36 old snapshot directory Database\Global\snapshots does not exist, nothing to upgrade�
[snapshot-store] 2024/02/14 19:07:36 store initialized using Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:07:36 checking consistency of snapshot store at Database\Global\rsnapshots�
[snapshot-store] 2024/02/14 19:07:36 check complete�
[store] 2024/02/14 19:07:36 1 preexisting snapshots present�
[rqlited] 2024/02/14 19:07:36 failed to open store: set log info: failed to get last command index: failed to get log at index 12437: log not found�

@otoolep otoolep linked a pull request Feb 27, 2024 that will close this issue
@otoolep
Copy link
Member

otoolep commented Feb 27, 2024

This is an issue in some now-obsolete code, which is preventing the Store from opening and carrying on with normal catch-up with the rest of the cluster. Instead of debugging it, it's time I just removed the code, as it serves no purpose -- and hasn't done so since the 7.x series.

We do lose some logging around start-up times, but I think there are better ways to do this that don't involve Goroutines.

@dwco-z
Copy link
Author

dwco-z commented Feb 27, 2024

Thanks for your swift action on the matter @otoolep

@otoolep
Copy link
Member

otoolep commented Feb 27, 2024

Merged.

Not necessarily the root fix, but this code is preventing the node from starting up correctly. Just nuke it, and let's see. There may actually be no issue.

@wellescastro
Copy link

Agreed. I'll let you know if anything else comes up. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants