etcdserver: always remove member directory when bootstrap fails #4087
@@ -272,6 +272,10 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
var err error
str, err = discovery.JoinCluster(cfg.DiscoveryURL, cfg.DiscoveryProxy, m.ID, cfg.InitialPeerURLsMap.String())
if err != nil {
// It removes member directory when NewServer returns error.
// This prevents conflicts with 'proxy' directory when
etcdserver should have no idea about proxy. This is just a necessary cleanup step.
we should also remove member dir in another !haveWAL case.
Ok
basically in the two !haveWAL case, we should clean up member dir if bootstrap fails.
@xiang90 I added more checks when deleting. Please take a look again and let me know.
@@ -241,6 +250,7 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
}
existingCluster, err := GetClusterFromRemotePeers(getRemotePeerURLs(cl, cfg.Name), prt)
actually we don't need this I think. What do you mean by another !haveWAL case?
@gyuho I should have been clearer. For every error in the case where the WAL does not exist, we should remove the member dir. We should only keep the member dir if the new member is successfully bootstrapped. We know for sure it is a new member when the WAL does not previously exist.
@@ -229,6 +229,15 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
if err != nil {
return nil, err
}
I would suggest a simple way, right after line 225:
if !haveWAL {
	defer func() {
		if err != nil {
			// cleans up member directory if bootstrap fails (including forming or joining a new cluster)
			os.RemoveAll(cfg.MemberDir())
		}
	}()
}
You also need to change the func signature to
func NewServer(cfg *ServerConfig) (srv *EtcdServer, err error)
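Why the signature change matters: in Go, a `return` statement with values assigns them to the named result parameters before deferred calls run, so a deferred closure that reads a named `err` sees the error actually being returned. With unnamed results, the closure can only watch some local variable that `return nil, someErr` never touches. A minimal sketch contrasting the two; `server`, `errBootstrap`, and both function names are hypothetical, and setting a flag stands in for `os.RemoveAll(cfg.MemberDir())`.

```go
package main

import (
	"errors"
	"fmt"
)

// server is a placeholder for *EtcdServer in this sketch.
type server struct{}

var errBootstrap = errors.New("bootstrap failed")

// withUnnamedResult keeps the original signature. The deferred closure can
// only watch a local err, and `return nil, errBootstrap` never assigns to
// it, so the cleanup is skipped.
func withUnnamedResult(cleaned *bool) (*server, error) {
	var err error
	defer func() {
		if err != nil {
			*cleaned = true // would be os.RemoveAll(cfg.MemberDir())
		}
	}()
	return nil, errBootstrap
}

// withNamedResult uses the suggested signature. The return statement
// assigns errBootstrap to the named result err before deferred calls run,
// so the cleanup sees the failure.
func withNamedResult(cleaned *bool) (srv *server, err error) {
	defer func() {
		if err != nil {
			*cleaned = true
		}
	}()
	return nil, errBootstrap
}

func main() {
	var a, b bool
	withUnnamedResult(&a)
	withNamedResult(&b)
	fmt.Println("unnamed result, cleanup ran:", a) // false
	fmt.Println("named result, cleanup ran:", b)   // true
}
```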
Got it.
Force-pushed from 12570c6 to aacad34.
LGTM. Let's test it, at least manually, so we know this does fix the reported issue.
Fixing govet issues. I also found that:
is used for discovering. They just keep retrying, and the defer never gets executed.
Force-pushed from aacad34 to a996b3e.
@@ -274,8 +284,9 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
if err != nil {
return nil, &DiscoveryError{Op: "join", Err: err}
}
urlsmap, err := types.NewURLsMap(str)
if err != nil {
urlsmap, e := types.NewURLsMap(str)
e -> uerr
Force-pushed from ea19714 to 0841134.
@xiang90 I tested by manually injecting an error to make startEtcd return when
prt, err := rafthttp.NewRoundTripper(cfg.PeerTLSInfo, cfg.peerDialTimeout())
if err != nil {
return nil, err
prt, uerr := rafthttp.NewRoundTripper(cfg.PeerTLSInfo, cfg.peerDialTimeout())
We should not call everything uerr. When there is a naming conflict for an error variable, we usually add a prefix. For NewRoundTripper, probably just rterr.
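The prefix convention in context: with a named `err` result and a deferred cleanup, code inside a `switch` case that writes `x, err := ...` declares a second `err` in that block, which govet's shadowing check flags. A prefixed name keeps the named result the only `err` in scope, and the explicit `return 0, perr` still copies the value into `err` before the deferred cleanup runs. A minimal sketch; `newMember`, `perr`, and the port-parsing step are hypothetical stand-ins, and the flag stands in for removing the member directory.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// newMember is a hypothetical function shaped like NewServer after review:
// named results, a deferred cleanup keyed on err, and work inside a switch
// case where `p, err := ...` would shadow the named result.
func newMember(peerURL, port string, cleaned *bool) (id int, err error) {
	defer func() {
		if err != nil {
			*cleaned = true // stands in for os.RemoveAll(cfg.MemberDir())
		}
	}()
	switch {
	case peerURL == "":
		return 0, errors.New("no peer URL")
	default:
		// perr (prefixed, like rterr/gerr in the review) avoids declaring
		// a second err in this case block.
		p, perr := strconv.Atoi(port)
		if perr != nil {
			// The explicit return assigns perr to the named result err
			// before the deferred cleanup runs.
			return 0, perr
		}
		id = p
	}
	return id, nil
}

func main() {
	var cleaned bool
	_, err := newMember("http://10.0.0.1:2380", "not-a-port", &cleaned)
	fmt.Println(err != nil, cleaned) // true true
}
```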
Ok got it.
Force-pushed from 0841134 to 08d34ec.
existingCluster, err := GetClusterFromRemotePeers(getRemotePeerURLs(cl, cfg.Name), prt)
if err != nil {
return nil, fmt.Errorf("cannot fetch cluster info from peer urls: %v", err)
existingCluster, uerr := GetClusterFromRemotePeers(getRemotePeerURLs(cl, cfg.Name), prt)
probably gerr
Force-pushed from 08d34ec to f0c969a.
@xiang90 All variable name conflicts and govet shadowing issues are fixed. PTAL.
@gyuho Can we also update the commit message? etcdserver: always remove member directory when bootstrap fails (including joining existing cluster and forming a new cluster)
This removes member directory when bootstrap fails including joining existing cluster and forming a new cluster. This fixes etcd-io#3827.
Force-pushed from f0c969a to a7e443d.
@xiang90 Just did. Thanks!
LGTM
Thanks, will merge after CI passes.
etcdserver: always remove member directory when bootstrap fails
This removes member directory when bootstrap fails including joining existing
cluster and forming a new cluster. This fixes coreos#3827.