Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start server fails with "error validating peerURLs" (race?) #10626

Closed
purpleidea opened this issue Apr 10, 2019 · 18 comments
Closed

Start server fails with "error validating peerURLs" (race?) #10626

purpleidea opened this issue Apr 10, 2019 · 18 comments

Comments

@purpleidea
Copy link
Contributor

I'm building some etcd clusters with the embed package. I startup three hosts: h1, then h2, then h3. Occasionally when I add the third host, the server fails to start with:

error validating peerURLs {ClusterID:cdf818194e3a8c32 Members:[&{ID:8e9e05c52164694d RaftAttributes:{PeerURLs:[http://localhost:2380]} Attributes:{Name:h1 ClientURLs:[http://localhost:2379]}} &{ID:b3a536e47da68b19 RaftAttributes:{PeerURLs:[http://127.0.0.1:2382]} Attributes:{Name:h2 ClientURLs:[http://127.0.0.1:2381]}}] RemovedMemberIDs:[]}: member count is unequal

The error is here:

return nil, fmt.Errorf("error validating peerURLs %s: %v", existingCluster, err)

I think this is a race/timing bug. I believe that it happens more often if I startup H3 very quickly after H2. If I wait say 5 seconds before doing so, then I don't think it ever happens.

My suspicion is that after a new member (eg H2) is added, some async goroutine runs to update some cluster state. If this doesn't happen before H3 joins, then we hit this issue.

If I were to guess, I'd think that perhaps the client AutoSyncInterval is related, but I'm really out of my depth, in that I don't know all the etcd internals.

Thanks for reading!

@jingyih
Copy link
Contributor

jingyih commented Apr 10, 2019

How do you start the 3 members? (As member to a new cluster? or as new member to an existing cluster?) Can you share the related parameters when you start the member?

@purpleidea
Copy link
Contributor Author

@jingyih I start H1 as a new cluster. H2 and H3 are joining the existing clusters.

Happy to share any parameters you want.

@jingyih
Copy link
Contributor

jingyih commented Apr 10, 2019

Is it possible for you to configure all 3 members as new cluster? I think this is the recommended way of starting a cluster with multiple members. Here is an example:

etcd/Procfile

Lines 2 to 4 in c7c6894

etcd1: bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd2: bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
etcd3: bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

You are right about the racing part. If creating the cluster your way, the second member will take some time to catch up applying the raft log entries replicated from member 1. Therefore there is a short period of time when member 2 does not have the most recent view of the cluster.

@purpleidea
Copy link
Contributor Author

@jingyih (Please let me know if this understanding is correct...)

I can't create H2 and H3 as new cluster, because if I did, that would mean that I can't use the cluster until they all joined, because H1 would be expecting two more members to be part of their quorum.

Rephrasing the problem, imagine that I startup H1. Then I use that for ten minutes, and THEN I decide to add H2 and H3 rapidly. My understanding is H2 and H3 would need to use existing cluster flag. (cfg.ClusterState = embed.ClusterStateFlagExisting)

If there is a signal I can wait for to know when server is caught up to cluster state, that would be fine. I'm using golang embed package. Currently after I run:

server, err = embed.StartEtcd(cfg)

I wait on:

case <-server.Server.ReadyNotify():

and

case <-server.Server.StopNotify():

Is there another signal I should wait for to know cluster is caught up?

Thanks!

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

Yes, in your use case, H2 and H3 need to use existing cluster flag.

Could you share the member add commands and etcd startup parameters so that me and other contributors can try to reproduce and investigate?

@purpleidea
Copy link
Contributor Author

@jingyih Yes, you can reproduce (modulo races) very easily like this:

I'm using embed package in https://github.com/purpleidea/mgmt/ project to try and build automatic clustering... Checkout branch feat/etcd.

git clone --recursive https://github.com/purpleidea/mgmt/
git checkout feat/etcd

Run make deps or just go get ... to add the dependencies.

Run make to build.

To test:

./mgmt run --tmp-prefix --no-pgp --hostname h1 empty
./mgmt run --tmp-prefix --no-pgp --hostname h2 --seeds http://127.0.0.1:2379 --client-urls http://127.0.0.1:2381 --server-urls http://127.0.0.1:2382 empty
./mgmt run --tmp-prefix --no-pgp --hostname h3 --seeds http://127.0.0.1:2379 --client-urls http://127.0.0.1:2383 --server-urls http://127.0.0.1:2384 empty

That's how you start H1, H2 and H3 up. Do H3 very quickly after you run H2.

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

Could you provide a script that does the same procedure but using etcd directly? From the script you provided, it is not clear how the etcd is used.

@purpleidea
Copy link
Contributor Author

@jingyih A big golang program is such a script :) It would not be easy to write such a standalone script. But the basic algorithm is described at the top here: https://github.com/purpleidea/mgmt/blob/a87853508a12b478a8825c4f16ae33ddec0765e6/etcd/etcd.go#L31

And the actual server startup stuff happens here: https://github.com/purpleidea/mgmt/blob/a87853508a12b478a8825c4f16ae33ddec0765e6/etcd/server.go#L161

Let me know if you need anything else.

@purpleidea
Copy link
Contributor Author

PS: In: #10626 (comment)

Is there another signal I should wait for to know cluster is caught up?

Thanks!

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

PS: In: #10626 (comment)

Is there another signal I should wait for to know cluster is caught up?

Thanks!

I think you are looking at the correct signal. However, If I understand correctly, it is not directly related to when the config change entry gets applied.

But first I should probably verify if what I described earlier is what actually happened in your case.

@purpleidea
Copy link
Contributor Author

@jingyih I appreciate that you're looking into this, and please let me know if I can help in any way.

Ultimately I want clustering of etcd to be automatic, so hopefully this helps get us one step closer :)

@purpleidea
Copy link
Contributor Author

Possibly related, I was able to trigger a panic (just once so far) by adding a member fairly quickly:

I'm using v.3.3.12

panic: tocommit(55) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?

goroutine 197 [running]:
github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4204e6ba0, 0x1a7d620, 0x5d, 0xc4203387e0, 0x2, 0x2)
	/home/james/code/gopath/src/github.com/coreos/pkg/capnslog/pkg_logger.go:83 +0x162
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420273a40, 0x37)
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/log.go:191 +0x15c
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc42050a900, 0x8, 0x91d5d6a98a29bdec, 0x8e9e05c52164694d, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/raft.go:1194 +0x54
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.stepFollower(0xc42050a900, 0x8, 0x91d5d6a98a29bdec, 0x8e9e05c52164694d, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/raft.go:1140 +0x3ff
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc42050a900, 0x8, 0x91d5d6a98a29bdec, 0x8e9e05c52164694d, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/raft.go:868 +0x12f1
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.(*node).run(0xc420652000, 0xc42050a900)
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/node.go:323 +0x1059
created by github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft.StartNode
	/home/james/code/gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/raft/node.go:210 +0x61e

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

I am still trying to understand more about the server booting process. So far this is my understanding of the issue. Not sure if this is by design due to bootstrapping needs.

In current implementation, serving reads directly from this local member structure could include stale information.

members map[types.ID]*Member

For example, in a two node cluster where node 1 is raft leader, a member add request sent to node 1 will return when the member add raft log entry is appended to both node 1 and 2's raft log, and node 1 finished applying the member add entry to its members struct. If node 2 has slow apply, its members struct might not have information about the newly added node. If the new node (which is node 3) tries to join this cluster and queries the existing cluster information from node 2, it will cause the error message you observed.

I did not actually reproduce the issue using your software, so this is my speculation based on code reading.

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

We can get this fixed if my speculation is proved by issue reproduction. Currently my bandwidth is very limited. I may revisit this issue later. For now, adding some delay will help mitigate the issue.

@purpleidea
Copy link
Contributor Author

@jingyih If I understand this correctly, you are describing a possible bug in etcd itself, correct? If so perhaps this could be easily fixed by having a signal that an embed user can watch for to know when the started server is up-to-date?

This will be helpful for many situations I think too.

With such a signal, I could wait for that and test to see if it removes the race.

@purpleidea
Copy link
Contributor Author

I did some more extensive testing today. I think that if I run member add and then wait enough time before starting the new server, then this is not an issue. It's possible that it's a race in my code as well.

In any case I think this could also be solved by: #10537 (comment) specifically:

think the best way to model all this is to add MemberAPI operations into etcd.Op so that they can happen in a Txn.

That way the PUT operation that a new client is watching on to tell it to start up, can be done at the same time as the MemberAdd operation.

I will close this for now.

@jingyih
Copy link
Contributor

jingyih commented Apr 11, 2019

Member operations are configuration changes, whereas PUT is a mutating operation on key space. For example you cannot add or delete a etcd member by simply updating a key value. They are very very different kinds of operations. Sorry but what you suggested does not make sense to me.

@purpleidea
Copy link
Contributor Author

They are very very different kinds of operations.

I understand, I was just proposing an idea for a race-free API that would help solve the need for cfg.StrictReconfigCheck = false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants