When using --unsafe-no-fsync still write out the data #12751

Merged
1 commit merged into etcd-io:release-3.4 on Mar 7, 2021

Conversation

cwedgwood
Contributor

There are situations where we don't wish to fsync but we do want to
write the data.

Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes. For Kubernetes clusters this
causes (many) elections, which have knock-on effects: the API server
transiently fails, causing other components to fail in turn.

By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync, we no longer
trigger this situation while still opportunistically writing out the data.
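
To make the intended ordering concrete, here is a minimal self-contained
sketch (hypothetical `walFile` type and field names, not etcd's actual WAL
code): buffered data is always flushed to the OS, and only the fsync step
is skipped when the unsafe option is enabled.

```go
package main

import (
	"bufio"
	"os"
)

// walFile is a hypothetical stand-in for a write-ahead-log file handle; it
// is not etcd's WAL type, only an illustration of the ordering described
// above: buffered data is always flushed, and only the fsync is optional.
type walFile struct {
	f            *os.File
	buf          *bufio.Writer
	unsafeNoSync bool // conceptually mirrors --unsafe-no-fsync
}

func (w *walFile) sync() error {
	// Always push buffered records down to the OS page cache so the data
	// is written out even when fsync is disabled; this is usually fast.
	if err := w.buf.Flush(); err != nil {
		return err
	}
	// Only the durability barrier is skipped in unsafe mode.
	if w.unsafeNoSync {
		return nil
	}
	return w.f.Sync()
}

func main() {
	f, err := os.CreateTemp("", "wal-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	w := &walFile{f: f, buf: bufio.NewWriter(f), unsafeNoSync: true}
	if _, err := w.buf.WriteString("entry\n"); err != nil {
		panic(err)
	}
	// With unsafeNoSync set, the entry still reaches the file via Flush;
	// only the fsync is skipped.
	if err := w.sync(); err != nil {
		panic(err)
	}
}
```

The key point is that Flush only hands the bytes to the kernel (cheap),
while Sync forces them to stable storage, which is the expensive,
latency-spiking step that --unsafe-no-fsync is meant to avoid.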

Anecdotally:
Because the fsync is skipped, there is an argument that certain
types of failure events could cause data corruption or loss; in
testing this wasn't seen. If this were to occur, the expectation is
that the member can be re-added to the cluster or, worst case,
restored from a robust persisted snapshot.

The etcd members are deployed across isolated racks with different
power feeds. An instantaneous failure of all of them simultaneously
is unlikely.

Testing was usually of the form:

  • create (Kubernetes) etcd write-churn by creating replicasets of
    some 1000s of pods
  • break/fail the leader

Failure testing included:

  • hard node power-off events
  • disk removal
  • orderly reboots/shutdown

In all cases when the node recovered it was able to rejoin the
cluster and synchronize.

Contributor

@ptabor ptabor left a comment

That is a good catch. Thank you.

Please convert the patch to be against the master branch.
A cherry-pick might be considered later, once it's in HEAD.

Otherwise LGTM.

@cwedgwood
Contributor Author

@ptabor this can't be cherry-picked between 3.4 and master because things moved about. For 3.4 this should suffice. For master there is #12752.

Travis CI shows an error here at present, though I don't think it's a real problem. Does that need to be addressed?

@ptabor
Contributor

ptabor commented Mar 7, 2021

Thank you.

@ptabor ptabor merged commit 2702f9e into etcd-io:release-3.4 Mar 7, 2021