"No node has been set up yet" during concurrent cluster join #2145
Comments
This one is hard to reproduce (I ran the upsert test 100 times but couldn't reproduce it), though I can see when this would happen. This happens because
n1/alpha.log
n3/alpha.log
All subsequent
Removing the timeout fixes this, though it doesn't fix the case where
Sometimes EntryConfChange proposals were silently dropped by raft, and without a timeout JoinCluster was stuck waiting for a response.
We could still have the timeout, but potentially put JoinCluster in a loop?
JoinCluster is already in a loop. The problem is that after the first "context deadline exceeded" error, all subsequent requests fail because there is already a
I think the solution here is simple: check with the leader whether the node being added is already part of the
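To make the idea in the comment above concrete, here is a minimal Go sketch of a join handler that treats "already a member" as success, so a retry after a timed-out first attempt can complete. It is not Dgraph's actual code and not the change made in 807976c (which removed the timeout instead). The names `leader`, `newLateApplyLeader`, and `joinWithRetry` are hypothetical, and the `propose` callback is only a stand-in for proposing a Raft conf change; here it simulates a change that is applied even though the caller's deadline fired first.

```go
package main

import (
	"context"
	"fmt"
)

// leader is a hypothetical stand-in for the leader's view of cluster membership.
type leader struct {
	members   map[uint64]bool       // node IDs that are already part of the cluster
	proposals int                   // number of conf changes actually proposed
	propose   func(id uint64) error // stand-in for proposing a conf change
}

// JoinCluster handles one join request. If the node is already a member
// (for example because an earlier, timed-out request was applied after all),
// it returns success instead of proposing the change again.
func (l *leader) JoinCluster(id uint64) error {
	if l.members[id] {
		return nil
	}
	l.proposals++
	return l.propose(id)
}

// newLateApplyLeader builds a leader whose proposals are applied, but only
// after the caller has given up, so the caller sees a deadline error.
func newLateApplyLeader() *leader {
	l := &leader{members: map[uint64]bool{}}
	l.propose = func(id uint64) error {
		l.members[id] = true            // the change does get applied...
		return context.DeadlineExceeded // ...but the caller's deadline fired first
	}
	return l
}

// joinWithRetry is the joining node's loop: it retries until a request
// succeeds or maxAttempts is reached, and reports how many attempts it made.
func joinWithRetry(l *leader, id uint64, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = l.JoinCluster(id); err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

func main() {
	l := newLateApplyLeader()
	attempts, err := joinWithRetry(l, 3, 5)
	fmt.Println(attempts, err, l.proposals) // prints "2 <nil> 1"
}
```

In this simulation the first attempt fails with a deadline error, the second attempt finds the node already in `members` and returns immediately, and only one conf change is ever proposed. A real implementation would also need locking around the membership state and a way to tell an applied change from one that raft dropped.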
This should be fixed with 807976c. I took the approach of removing the timeout as
On the current nightly (2018/02/19), concurrently launching a five-node cluster can occasionally deadlock with errors like...
... when n3, the node n1 is trying to join, is in some sort of timeout loop on its join RPC:
See full logs, attached:
20180219T173203.000-0600.zip