
dbchecker: don't delete database file; just back it up #3587

Closed
wants to merge 1 commit into master from dbchecker-dont-wipe-dbs

Conversation

@dcbw (Contributor) commented May 11, 2023

Deleting the database file also makes the server forget its cluster ID and server ID. This causes the DB to make a new cluster ID and server ID on restart, which actually causes split-brain: the other servers in the cluster think there's a new member at the same IP address, but the old member isn't removed. This either causes them to reject the new member or to think the cluster has more members than it should.

When trying to reset the RAFT cluster state, all DBs on all masters need to be removed at the same time, and all servers must be restarted at the same time so that they don't keep the old cluster ID and server IDs. The dbchecker cannot do that, because it only runs on a single machine and can't coordinate between nodes.

Deleting the DB just makes the problem worse, and we've seen it actually cause split-brain in a number of cases where the DBs could have recovered.

https://issues.redhat.com/browse/OCPBUGS-11705
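
A minimal Go sketch of the idea behind the change, moving the file aside instead of unlinking it. The `backupDB` helper and the DB path are illustrative assumptions, not the actual PR diff:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

// backupDB moves dbFile to a timestamped backup instead of removing it,
// preserving the cluster ID and server ID recorded in the file.
func backupDB(dbFile string) (string, error) {
	backup := fmt.Sprintf("%s.backup-%s", dbFile, time.Now().Format("20060102-150405"))
	if err := os.Rename(dbFile, backup); err != nil {
		return "", fmt.Errorf("backing up %s: %w", dbFile, err)
	}
	return backup, nil
}

func main() {
	// Illustrative path; the real location depends on the deployment.
	backup, err := backupDB("/etc/ovn/ovnnb_db.db")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("moved database to", backup)
}
```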

@dcbw force-pushed the dbchecker-dont-wipe-dbs branch from 6050c08 to df8d82b on May 11, 2023 17:17
@trozet (Contributor) commented May 18, 2023

So the code to resetRaftDB originated from:
openshift/ovn-kubernetes#406

bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1903660

@abhat wrote the code originally, so he might have a better recollection of the original issue.

going to also go look at CNO

@trozet (Contributor) commented May 18, 2023

So it looks like we check cluster status 10 times; if that fails, we then delete the db. In CNO on bringup we check if the db file exists. Assuming it doesn't exist because resetRaftDB removed it, we then get to here:

https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/self-hosted/ovnkube-master.yaml#L617

where we check once per second for 5 seconds to see whether a cluster already exists. We check if any master pod is part of a raft cluster (db_part_of_cluster) by listing its connection remotely.
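
A rough Go sketch of the two retry loops described above. The commands, ctl path, port, and host names are assumptions for illustration; the real checks live in the dbchecker and in the yaml linked above:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// clusterStatusOK is a stand-in for the dbchecker's health probe: ask
// the local ovsdb-server for its RAFT status. The ctl path is assumed.
func clusterStatusOK() bool {
	err := exec.Command("ovn-appctl", "-t", "/var/run/ovn/ovnnb_db.ctl",
		"cluster/status", "OVN_Northbound").Run()
	return err == nil
}

// partOfCluster is a stand-in for the yaml's db_part_of_cluster check:
// see whether a remote master's NB DB answers at all. Port 9641 is the
// usual NB RAFT port in this deployment, but treat it as an assumption.
func partOfCluster(host string) bool {
	err := exec.Command("ovsdb-client", "list-dbs",
		fmt.Sprintf("tcp:%s:9641", host)).Run()
	return err == nil
}

func main() {
	// dbchecker side: 10 status checks before giving up on the DB.
	healthy := false
	for i := 0; i < 10 && !healthy; i++ {
		healthy = clusterStatusOK()
		time.Sleep(time.Second)
	}

	// CNO bringup side: once per second for 5 seconds, look for an
	// existing cluster on any master (host names are illustrative).
	masters := []string{"master-0", "master-1", "master-2"}
	found := false
	for i := 0; i < 5 && !found; i++ {
		for _, m := range masters {
			if partOfCluster(m) {
				found = true
				break
			}
		}
		time.Sleep(time.Second)
	}
	fmt.Println("healthy:", healthy, "existing cluster found:", found)
}
```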

Let's assume the above fails for some reason; then only master-0 will attempt to create a new raft cluster, while master-1 and master-2 wait on master-0 to create the cluster.
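
A sketch of that bringup decision with illustrative names (not the actual yaml logic):

```go
package main

import "fmt"

// bringupMode captures the decision described above: join an existing
// cluster if one was found; otherwise only master-0 bootstraps a new
// one and the others wait for it.
func bringupMode(hostname string, existingClusterFound bool) string {
	switch {
	case existingClusterFound:
		return "join existing cluster"
	case hostname == "master-0":
		return "create new cluster"
	default:
		return "wait for master-0"
	}
}

func main() {
	fmt.Println(bringupMode("master-1", false)) // wait for master-0
}
```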

Getting into such a failure state, where cluster/status failed 10 times and we failed to query the other 2 pods 5 times for their connection, seems like a rare scenario to me. Maybe it indicates there were other networking problems occurring at the same time, preventing the node from querying the other 2 nodes.

@trozet trozet closed this Jun 7, 2023