*: check data corruption on boot #8554
Conversation
force-pushed 6609b77 → 6e3b11e
e2e/ctl_v3_alarm_test.go (outdated):

    func (f *fakeConsistentIndex) ConsistentIndex() uint64 { return f.rev }

    func alarmCorruptTest(cx ctlCtx) {
why isn't this an integration test? there's already a corruption test there
If we fatal on corruption, the fatal logger calls os.Exit(1), so it's not testable in an integration test?
e2e/ctl_v3_alarm_test.go (outdated):

    	cx.t.Fatalf("expected error %v after %s", rpctypes.ErrCorrupt, 5*time.Second)
    }

    // corrupt alarm should now be on
The cluster that's currently running can be expected to be OK since it's periodically checking its hash with its other members. The newly corrupted member should fatal out before joining raft / doing any damage / causing need for an alarm.
> The newly corrupted member should fatal out before joining raft
Instead, if a member can contact the client addresses of its peers, it should first fetch hashes from the other members at a known revision and compare before serving any client requests. (from original issue)
Agree.
So should the checking happen before the s.publish request, in s.start, using Raft entries?
https://github.com/coreos/etcd/blob/master/etcdserver/server.go
    func (s *EtcdServer) Start() {
    	s.start()
    	s.goAttach(func() { s.publish(s.Cfg.ReqTimeout()) })
    	s.goAttach(s.purgeFile)
    	s.goAttach(func() { monitorFileDescriptor(s.stopping) })
    	s.goAttach(s.monitorVersions)
    	s.goAttach(s.linearizableReadLoop)
    	s.goAttach(s.monitorKVHash)
    }
I am trying to figure out the best way to fetch hashes from other members without starting the gRPC server or starting Raft. Or, on restart, do we still start Raft and the gRPC server but accept no requests other than the HashKV RPC?
Probably just before s.start() entirely; load the cluster membership info and hash from the backend, issue the HashKV RPCs to the member addresses and compare with the backend hash, then call start() if there's no mismatch.
e2e/ctl_v3_alarm_test.go (outdated):

    corrupted := false
    for i := 0; i < 5; i++ {
    	presp, perr := cli0.Put(context.TODO(), "abc", "aaa")
the boot check should prevent member 0 from servicing any KV RPCs, otherwise it doesn't provide any better guarantees than the periodic corruption check
force-pushed 6e3b11e → c68824c
force-pushed c15b9f1 → 260eae2
if this PR itself is done, can we remove the WIP label?
etcdserver/corrupt.go (outdated):

    }
    plog.Infof("corruption checking on %s (%d members)", s.ID().String(), n)

    h, rev, _, err := s.kv.HashByRev(0)
what happens when this local node is slow, and the revision has been compacted on other peers?
etcdserver/corrupt.go (outdated):

    @@ -24,6 +24,69 @@ import (
    	"github.com/coreos/etcd/pkg/types"
    )

    func (s *EtcdServer) checkHashKVInit() {
    	// TODO: separate configuration for initial hash check?
    	if s.Cfg.CorruptCheckTime < 3*time.Second {
why?
etcdserver/corrupt.go (outdated):

    }
    mbs := s.cluster.Members()
    n := len(mbs)
    if n < 3 {
why?
etcdserver/corrupt.go (outdated):

    cli.Close()

    if resp == nil && cerr != nil {
    	plog.Fatal(cerr)
if all its peers are dead, then this peer cannot start. the cluster cannot be bootstrapped after a full shutdown?
force-pushed 260eae2 → 89bcaa4
force-pushed 4ab97dd → c342c4f
embed/etcd.go (outdated):

    // since this is before "EtcdServer.Start()"
    // if not nil, it will block on "EtcdServer.Close()"
    e.Server = nil
    return e, err
just return nil, err
We still need this to close peer listeners after startPeerListeners
what is the problem to return a non-nil server?
If e.Server is not nil, it will block forever inside the defer on EtcdServer.Close, since the server never got started.
etcdserver/corrupt.go (outdated):

    	}
    }
    if mismatch > 0 {
    	return fmt.Errorf("%s is corrupt", s.ID())
this is not exactly true. maybe other node is corrupted. all we know is that there is a state mismatch inside the cluster.
You are right. How about "%s found data inconsistency with peers"?
sure.
force-pushed c342c4f → 54c6784
etcdserver/config.go (outdated):

    @@ -66,7 +66,8 @@ type ServerConfig struct {

    	AuthToken string

    	CorruptCheckTime time.Duration
    	InitialCorruptCheck bool
add doc string?
Fixed.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
force-pushed 54c6784 → b58694d
LGTM
force-pushed b58694d → 63df258
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
force-pushed 63df258 → d6d2585
etcdserver/corrupt.go:

    for _, p := range peers {
    	if p.resp != nil {
    		peerID := types.ID(p.resp.Header.MemberId)
    		if h != p.resp.Hash {
I think we also need to reason about the compaction revision. Calling HashKV(100) on all members does not guarantee that the hash is the same among all members even when there is no corruption.
Suppose the local member's highest rev is 100 and it has no compaction; then HashKV hashes keys from rev 0 to rev 100. If one peer has compacted at 50, then its HashKV hashes keys from rev 50 to 100.
Yeah, just added compact revision checks to handle that case.
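The compact-revision guard the thread converges on can be modeled as a standalone sketch: count a hash mismatch only when both sides compacted at the same revision, and skip (with a warning in the real code) when they did not, since different compactions mean different key ranges were hashed. Type and function names here are illustrative, not etcd's actual types.

```go
package main

import "fmt"

// hashKVResp models the parts of a HashKV response used by the check
// (illustrative fields, not the real etcd API).
type hashKVResp struct {
	hash       uint32
	compactRev int64
}

// countMismatches compares the local hash against peer hashes, counting
// a mismatch only when the peer's compact revision equals the local one;
// peers compacted at a different revision are skipped, not counted.
func countMismatches(localHash uint32, localCompactRev int64, peers []hashKVResp) (mismatch, skipped int) {
	for _, p := range peers {
		if p.compactRev != localCompactRev {
			skipped++ // hashed a different key range: not comparable
			continue
		}
		if p.hash != localHash {
			mismatch++
		}
	}
	return mismatch, skipped
}

func main() {
	peers := []hashKVResp{
		{hash: 0xabc, compactRev: 50}, // same compaction, same hash: OK
		{hash: 0xdef, compactRev: 40}, // compacted elsewhere: skipped
		{hash: 0x123, compactRev: 50}, // same compaction, different hash
	}
	m, s := countMismatches(0xabc, 50, peers)
	fmt.Printf("mismatch=%d skipped=%d\n", m, s)
}
```

This is why a raw hash comparison alone is not enough: the second peer's differing hash is legitimate, and counting it would flag a healthy cluster as corrupt.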
Codecov Report

    @@ Coverage Diff @@
    ## master #8554 +/- ##
    =========================================
      Coverage   ?   75.96%
    =========================================
      Files      ?     359
      Lines      ?   29786
      Branches   ?       0
    =========================================
      Hits       ?   22626
      Misses     ?    5567
      Partials   ?    1593
    =========================================

Continue to review full report at Codecov.
etcdserver/corrupt.go (outdated):

    for _, p := range peers {
    	if p.resp != nil {
    		peerID := types.ID(p.resp.Header.MemberId)
    		if h != p.resp.Hash && crev == p.resp.CompactRevision {
we also need to log a warning that we skip the check when the crev does not match.
Yeah fixed.
etcdserver/corrupt.go (outdated):

    if p.err != nil {
    	switch p.err {
    	case rpctypes.ErrFutureRev:
    		plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: peer is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
warning
etcdserver/corrupt.go (outdated):

    case rpctypes.ErrFutureRev:
    	plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: peer is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
    case rpctypes.ErrCompacted:
    	plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: local node is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
warning
force-pushed 50fdf9f → e4d8791
etcdserver/corrupt.go (outdated):

    	plog.Errorf("%s's hash %d != %s's hash %d (revision %d, peer revision %d, compact revision %d)", s.ID(), h, peerID, p.resp.Hash, rev, p.resp.Header.Revision, crev)
    	mismatch++
    } else {
    	plog.Warningf("%s hash mismatch with peer %s at revision %d (compact revision %d, peer compact revision %d)", s.ID(), peerID, rev, crev, p.resp.CompactRevision)
this mismatch is expected. we should log this as "cannot check hash since the compact revision is different"
force-pushed e4d8791 → f7816fd
etcdserver/corrupt.go (outdated):

    	plog.Errorf("%s's hash %d != %s's hash %d (revision %d, peer revision %d, compact revision %d)", s.ID(), h, peerID, p.resp.Hash, rev, p.resp.Header.Revision, crev)
    	mismatch++
    } else {
    	plog.Warningf("%s cannot check hash since the compact reversion is different at revision %d (compact revision %d, peer %s compact revision %d)", s.ID(), rev, crev, peerID, p.resp.CompactRevision)
well. i mean
%s cannot check hash of peer(%s): peer has a different compact revision %d (revision:%d)
etcdserver: only compare hash values if any

It's possible that a peer has a higher revision than the local node. In that case, hashes will still be different at the requested revision, but the peer's header revision is greater.

etcdserver: count mismatch only when compact revisions are same

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
force-pushed f7816fd → 0e4e8ed
lgtm if CI passes
Fix #8313.