-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primary not being set on failover #76
Comments
More info. So it looks like by using the ip address as the hostname, it does not identify itself as the owner of the previous lease which causes it to consider itself a replica and connect to itself. It appears that store.go:monitorLease or store.go:acquireLeaseOrPrimaryInfo needs another check. mia = [fdaa:0:2fff:a7b:2cc3:1:932e:2]
|
Another amusing one. After a
|
The following patch seems to fix this particular issue. First time writing go, hope this is adequate.
|
After some more testing, it seems this patch has a problem I'm not sure how to resolve. To summarize the issue, when a primary restarts, it will connect to consul and check for a lease. The lease exists, so by default, it will connect as a replica. Unfortunately, it connects to itself which causes replication issues and transaction corruptions. So if we are the previous owner, I'm not sure how to get an actual instance of lease. Going down the function chain:
A few options:
Not sure where to go from here, but I'm hoping I missed an easy fix. For reference, this function compares the hostname to all the local interfaces to determine ownership.
|
I'm adding some fixes to prevent this:
I'd like to avoid checking the host/interface and rely on the API to determine if it should accept connections or not. |
Yes, this is way more elegant and it doesn't break the lease acquisition state machine. 🙂 The tradeoff of writes being dropped till lease expiry is well documented, so there are no surprises here. As a side note, forcing a primary change might be the safest thing to do in general as we don't know why the original crashed in the first place. |
A new primary should automatically be promoted once the key expires in Consul and the lock delay passes. We can't force a primary change in order to prevent split brain. The safest thing to do is rely on the TTL. It does introduce a little bit of downtime in the case of unclean exits but it's only about 15s. There are some tricks we can do in the future to help minimize that time as well. |
Ah, I used some confusing language there: "forcing a primary change" -> "ensuring the primary will change" may be safer than trying to re-acquire the lease since we don't know why the primary crashed. From my perspective, given the unclean exit, I do believe 15 seconds of no writes is quite acceptable. |
I added the node ID here: #79 |
It also changes the disk layout a bit. It moves the database directories from under the root data directory (e.g. |
I added context wrapping so that it will get canceled when the primary status changes. The PR still needs some tests but I figured I'd throw it up on GitHub. I'm AFK for a couples days but I'll add those tests on Monday. #80 |
Wow, thanks for the incredibly quick fix. Manually tested against dd6e73e. Both quickly restarting fly vm restart (which would simulate a reboot or OOM)
fly vm stop loses the vm completely takes about 40 seconds to come back up completely.
|
I added tests and merged #80 so I'm closing this out. Thanks for reporting it! |
When I was testing failover, I found some odd behaviour testing against two nodes (sea and mia).
In sea, the .primary contains:
And the ip address of mia is:
It looks like that sea is not acquiring the new primary information from consul?
We do not see any lease acquisition logs from sea. Both hosts believe they are replicas. Trying to insert into the database manually gives:
Also, I built a library for elixir assuming the .primary file would be deleted on after the lease was acquired on the primary. Essentially, it connects to all the nodes and checks if the .primary does not exist. This way if the user is on different platforms, etc. we don't have to parse the contents of the file to figure out the location of the primary. Can you confirm that is the expected behaviour? And if it isn't, can it be changed so only the replicas have the .primary?
Thanks.
The text was updated successfully, but these errors were encountered: