[BUG] Etcd failed to flush changes from WAL to the DB before shutting down/crashing. #99

Closed
amshuman-kr opened this issue Sep 27, 2020 · 1 comment
Labels
kind/bug Bug lifecycle/rotten Nobody worked on this for 12 months (final aging stage) status/closed Issue is closed (either delivered or triaged)

Comments

@amshuman-kr
Collaborator

Describe the bug:
After a single-node etcd instance provisioned via etcd-druid terminated abnormally (non-zero exit code), the etcd container restarted and the backup-restore sidecar container logged the following during data directory verification.

current etcd revision (2314180238) is less than latest snapshot revision (2314180239): possible data loss

On circumventing the backup restoration triggered because of this, it was found that the WAL directory (not checked by the backup-restore sidecar) contained more recent revisions, which were applied after the restart (without the backup restoration).
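
For illustration, the check that produces this log line can be sketched as a plain comparison of two revision numbers. This is only a minimal sketch, not the actual etcd-backup-restore code: the function name `checkRevisions` and its signature are hypothetical, and only the error text is taken from the log above. It shows why revisions that exist only in the WAL are invisible to such a check: the comparison sees just the revision read from the DB file.

```go
package main

import "fmt"

// checkRevisions is a hypothetical helper (not the real etcd-backup-restore API).
// It compares the revision found in the etcd DB file with the revision of the
// latest snapshot. Revisions that are present only in the WAL are NOT considered,
// which is the situation described in this issue.
func checkRevisions(etcdRev, snapshotRev int64) error {
	if etcdRev < snapshotRev {
		return fmt.Errorf(
			"current etcd revision (%d) is less than latest snapshot revision (%d): possible data loss",
			etcdRev, snapshotRev)
	}
	return nil
}

func main() {
	// Revision values taken from the log line in this issue.
	if err := checkRevisions(2314180238, 2314180239); err != nil {
		fmt.Println(err)
	}
}
```

With the values from this issue the DB is exactly one revision behind the snapshot, so the check fails even though the missing revision was still recoverable from the WAL.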

Expected behavior:
etcd-druid should try to configure etcd instances to shut down safely (and flush the WAL changes to the database), or at least mitigate the impact when that is not possible.

How To Reproduce (as minimally and precisely as possible):
Not known yet.

Logs:

current etcd revision (2314180238) is less than latest snapshot revision (2314180239): possible data loss

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

@amshuman-kr amshuman-kr added the kind/bug Bug label Sep 27, 2020
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 27, 2020
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Sep 22, 2021
@ishan16696
Member

ishan16696 commented Jan 9, 2023

This issue, where a single-node etcd fails to flush changes from the WAL to the DB before shutting down or crashing, has been fixed in etcd-backup-restore by this PR: gardener/etcd-backup-restore#275.
So, IMO, for a single-node etcd cluster etcd-druid doesn't have to take any action to fix this, as it is now handled by etcd-backup-restore itself.
I'm closing this issue as no action is required from the druid side.
/close

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jan 9, 2023