etcd3.5.0: panic: tocommit(458) is out of range [lastIndex(3)]. Was the raft log corrupted, truncated, or lost? #13509
It's expected behavior. If an etcd node crashes and its local data is completely gone, the operator needs to remove the member from the cluster and then add it back again. The commands are roughly like below.
Finally, start the member again; note that you need to set the --initial-cluster-state flag to existing. If a member crashes but its local data is still intact, it should be fine to simply restart the member.
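The remove/re-add procedure might look roughly like the following sketch; the member name, ID, endpoints, and peer URLs are placeholders, so substitute your cluster's actual values.

```shell
# Placeholders: infra1, the member ID, and all URLs below are examples only.

# 1. Find the ID of the crashed member, then remove it from the cluster.
etcdctl member list
etcdctl member remove 8e9e05c52164694d

# 2. Wipe the crashed member's data directory, then add it back as a new member.
etcdctl member add infra1 --peer-urls=https://10.0.0.1:2380

# 3. Start the member with --initial-cluster-state=existing so it joins the
#    existing cluster instead of trying to bootstrap a new one.
etcd --name infra1 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --initial-cluster-state existing
```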
I noticed that a panic can occur when a crashed node restarts and receives a heartbeat message from the leader after having lost its Raft log. I understand that an entry cannot be considered committed if the node's last log index (lastIndex) is lower than the index to be committed (toCommit), but I am unclear why the node should panic and exit instead of rejecting the heartbeat and waiting for further heartbeats carrying lower commit indexes.
I got a panic with version 3.5.0 when I stopped a member, removed its data, and started it again.
panic: tocommit(458) is out of range [lastIndex(3)]. Was the raft log corrupted, truncated, or lost?
goroutine 163 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0001663c0, 0x0, 0x0, 0x0)
/opt/buildtools/go_workpace/pkg/mod/go.uber.org/zap@v1.17.0/zapcore/entry.go:234 +0x58d
go.uber.org/zap.(*SugaredLogger).log(0xc0004981e0, 0xc0006ba104, 0x55a3ccd20fe1, 0x5d, 0xc00007c4c0, 0x2, 0x2, 0x0, 0x0, 0x0)
/opt/buildtools/go_workpace/pkg/mod/go.uber.org/zap@v1.17.0/sugar.go:227 +0x115
go.uber.org/zap.(*SugaredLogger).Panicf(...)
/opt/buildtools/go_workpace/pkg/mod/go.uber.org/zap@v1.17.0/sugar.go:159
go.etcd.io/etcd/server/v3/etcdserver.(*zapRaftLogger).Panicf(0xc000682b80, 0x55a3ccd20fe1, 0x5d, 0xc00007c4c0, 0x2, 0x2)
/usr1/3.5.0/server/etcdserver/zap_raft.go:101 +0x7f
go.etcd.io/etcd/raft/v3.(*raftLog).commitTo(0xc0006e4310, 0x1ca)
/usr1/3.5.0/raft/log.go:237 +0x135
go.etcd.io/etcd/raft/v3.(*raft).handleHeartbeat(0xc0004bcf20, 0x8, 0x5e92d99e003cce4, 0x507df051d12df981, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/usr1/3.5.0/raft/raft.go:1513 +0x56
go.etcd.io/etcd/raft/v3.stepFollower(0xc0004bcf20, 0x8, 0x5e92d99e003cce4, 0x507df051d12df981, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/usr1/3.5.0/raft/raft.go:1439 +0x498
go.etcd.io/etcd/raft/v3.(*raft).Step(0xc0004bcf20, 0x8, 0x5e92d99e003cce4, 0x507df051d12df981, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/usr1/3.5.0/raft/raft.go:980 +0xa55
go.etcd.io/etcd/raft/v3.(*node).run(0xc00069cf00)
/usr1/3.5.0/raft/node.go:356 +0x798
created by go.etcd.io/etcd/raft/v3.RestartNode
/usr1/3.5.0/raft/node.go:244 +0x330
Reproduction procedure:
Key Configuration Items:
I think this is a normal operation for a three-node cluster. In 3.4.x and earlier versions, this method was often used to recover a node in the cluster whose data had been corrupted. However, it no longer works in version 3.5.0.