etcd daemon: creates member directory if cluster config is not valid #3827
+1 I'm seeing the same problem: although my extra member falls back to proxy as expected initially, on any subsequent restart I get this error.
Thanks for reporting. I will investigate and try to resolve this.
+1, all proxies failed to start after reboot. Running CoreOS 845. Removing the leftover data directory solves the problem.
@xcompass I got the service to start up after removing only the directory with nothing in it (based on your etcd settings). Removing both the member and proxy directories for some reason breaks it on the second startup/reboot.
@gyuho could a possible solution to the problem be to change https://github.com/coreos/etcd/blob/master/etcdmain/etcd.go#L488 to check for a file inside the cfg dir instead of for the directory itself?
@nikfoundas Sorry for being slack on this issue. I will look into it and give you updates.
@nikkomiu I found a workaround: setting proxy=1 forces the node to run in proxy mode, so it never creates the member directory. Then there is no problem after rebooting.
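For reference, the workaround above might look like this as a config fragment (a sketch; the exact spelling of the proxy setting on your system is an assumption, and etcd v2 documents the value as "on" rather than "1"):

```
# Force proxy mode so the node never writes a 'member' directory.
# Environment-variable form, as commonly used with etcd2 on CoreOS:
ETCD_PROXY=on

# Or as a command-line flag:
#   etcd2 --proxy on
```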
This fixes etcd-io#3827, where the member falls back to proxy successfully at first but fails on subsequent tries. It fails when 'member' and 'proxy' directories exist in the same place, one of which did not get cleaned up after a failure, causing this conflict error message: 'invalid datadir. Both member and proxy directories exist.'
My first approach can be found in #3949, which overwrites the proxy setting by deleting the member directory. If anybody has feedback, please let me know!
With CoreOS a simple workaround is to create a drop-in in the cloud-config for etcd2.service which deletes the empty directory before etcd2 starts. Afterwards etcd2 servers and proxies start as expected.
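A drop-in along those lines might look like the following sketch (the file name is an assumption; the `/var/lib/etcd2/member` path is the one reported in this issue). Using `rmdir` rather than `rm -rf` means only an *empty* leftover directory is removed, so real member data is never deleted:

```
# /etc/systemd/system/etcd2.service.d/10-clean-member.conf
# Hypothetical drop-in: remove an empty leftover member directory
# before etcd2 starts. The leading "-" tells systemd to ignore
# failure (e.g. when the directory is absent or non-empty).
[Service]
ExecStartPre=-/usr/bin/rmdir /var/lib/etcd2/member
```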
@geku Cool! And thanks for posting here.
FYI, I am working on improving our discovery function so that it does not write any conflicting directories unless the whole operation succeeds. Will keep you updated on this issue.
This is for etcd-io#3827. This removes the member directory with a defer statement, and only when etcdserver.NewServer returns an error.
When discovery JoinCluster fails with a DiscoveryError, the etcd node falls back to proxy. In order to start as a proxy, we need to make sure there are no conflicting directories: 'member' and 'proxy' directories cannot exist together. This deletes the 'member' directory only when startEtcd fails with the DiscoveryError type. This fixes etcd-io#3827.
When discovery fails, the etcd node falls back to proxy. In order to start as a proxy, we need to make sure there are no conflicting directories: 'member' and 'proxy' directories cannot exist together. This deletes the 'member' directory only when startEtcd fails (etcd-io#3827).
This removes the member directory when bootstrap fails, whether joining an existing cluster or forming a new one. This fixes etcd-io#3827.
I believe #4087 fixes this issue; it removes the member directory when bootstrap fails.
Let's simulate the situation where we bootstrap 4 etcd members but with a size=3 discovery token. The first three members bootstrap without any issue, but the last one fails. It creates /var/lib/etcd2/member even when it cannot be started (for example, the DNS record is still not updated). Here is the gist. We can see that /var/lib/etcd2/member/ was created and it is empty:

```
$ ls -la /var/lib/etcd2
total 24
drwxr-xr-x  3 etcd etcd 4096 Nov  6 16:12 .
drwxr-xr-x 21 root root 4096 Nov  6 15:58 ..
drwx------  2 root root 4096 Nov  6 16:12 member
```
And the etcd2 daemon cannot switch to proxy mode automatically even when the DNS record is already valid. In addition, even when etcd2 runs as a proxy, it still creates the /var/lib/etcd2/member directory, and if you restart, it will fail with the conflict message quoted earlier ('invalid datadir. Both member and proxy directories exist.').

Partly relates to #3713.