-
Hello! I have been running a Kubernetes cluster with an etcd backend for 2-3 years now. This weekend, however, I ran into what appears to be a completely unpredictable issue that has stopped me from restoring my Kubernetes cluster (and I would rather avoid having to reimport all my manifests from scratch). I run the cluster as a single node on microk8s, mostly for my personal production software. Upon initial inspection, my etcd node was down, and it was failing to start with the following logs:
I am not sure how I ran into this error, as my cluster has not been using any of the options that have a history of causing an issue here (e.g. #14025). So I decided to try deleting files from my
However, despite the node appearing healthy and running normally, any write operation I attempt on data already inside the cluster fails. For example, if I attempt to delete a rogue pod on my Kubernetes cluster:
Read operations appear to work OK (e.g. I am able to list pods, deployments, etc.). This type of error appears to happen for any type of write to existing data. As another example, I noticed that in the logs there was a message
BTW, there are many other things I have tried, but I am trying to keep this trimmed down to what is relevant to the discussion. Happy to provide the original database files/WAL to maintainers of the project if the solution turns out to be sophisticated enough to share.
Replies: 2 comments 7 replies
-
Hey @dbeal-eth - Have you tried restoring from a snapshot/db file into a new data directory? Refer: https://etcd.io/docs/v3.5/op-guide/recovery/#restoring-a-cluster
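For anyone following along, here is a rough sketch of what that restore could look like, written as a small Python helper that only builds the command line and does not run it. It assumes etcd v3.5 with `etcdutl` on the PATH. The paths `backup/db` and `restored-data` and the helper name `build_restore_command` are hypothetical, not taken from this thread. The `--skip-hash-check` flag is included on the assumption that the db file was copied straight out of the member's data directory rather than produced by `snapshot save`, since such a file carries no integrity hash.

```python
import shlex
from pathlib import Path


def build_restore_command(db_file, new_data_dir, copied_from_data_dir=False):
    """Build the argv list for `etcdutl snapshot restore` (hypothetical helper)."""
    cmd = [
        "etcdutl", "snapshot", "restore", str(db_file),
        "--data-dir", str(new_data_dir),  # must be a new, not yet existing directory
    ]
    if copied_from_data_dir:
        # A db file copied from the data directory has no integrity hash,
        # so the restore needs the hash check skipped.
        cmd.append("--skip-hash-check")
    return cmd


if __name__ == "__main__":
    # Both paths below are placeholders for illustration only.
    cmd = build_restore_command(
        Path("backup/db"),
        Path("restored-data"),
        copied_from_data_dir=True,
    )
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Working on a copy of the db file and pointing `--data-dir` at a fresh directory leaves the original data untouched, so the restore can be retried if it does not help.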