switchover with children cluster will set the readonly flag on children cluster primary server #565

Open
rdemongeot opened this issue Apr 24, 2024 · 10 comments · Fixed by #569

@rdemongeot

rdemongeot commented Apr 24, 2024

We have two different clusters.
The first cluster, 'French' (FR1 & FR2), is a primary/replica cluster.
The second cluster, 'Europe' (EU1 & EU2), is also a primary/replica cluster, but the 'Europe' primary is additionally a replica of 'French', only for the 'French' subset of the data.

Initial_state

When we switch over the French cluster from its primary to its secondary (from FR1 to FR2, for example) in order to perform maintenance on FR1, the switchover sets the read_only flag on EU1.

replication manager logs

time="2024-04-24 07:34:46" level=info msg="Starting master switchover" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:46" level=info msg="Freezing writes set read only on FR-01" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Switching other slaves to the new master" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Waiting for slave EU-01 to sync" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:48" level=info msg="Change master on slave EU-01" cluster=cluster_FR_masterslave
time="2024-04-24 07:34:50" level=info msg="Server EU-01 disable read only as last non slave" cluster=cluster_EU_masterslave

I can't see any log line about setting the read_only flag on EU-01.

proxysql logs

2024-04-24 07:34:48 [INFO] Server 'EU-01' found with 'read_only=1', but not found as reader
2024-04-24 07:34:50 [INFO] Server 'EU-01' found with 'read_only=0', but not found as writer

final_state

The flag is removed a few seconds later, but during this window ProxySQL sees the read_only flag and moves the EU1 server out of the writer role, causing errors in the many applications that try to write to it.

On older ProxySQL versions (2.0), setting read_only and removing it this quickly also triggers a race condition that breaks the state machine and forces a restart of ProxySQL.
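The length of the read-only window can be estimated from the two ProxySQL log lines quoted above. The following is a minimal sketch, assuming the log lines keep exactly the shape shown in this issue; the names `flip`, `parseFlip` and `readOnlyWindow` are hypothetical helpers, not part of ProxySQL or replication-manager. The result is only as precise as ProxySQL's own monitoring interval.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// flip is a hypothetical record of one read_only observation
// taken from a ProxySQL log line.
type flip struct {
	At       time.Time
	Server   string
	ReadOnly bool
}

// lineRe matches lines shaped like the two ProxySQL entries quoted in this issue.
var lineRe = regexp.MustCompile(
	`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[INFO\] Server '([^']+)' found with 'read_only=([01])'`)

// parseFlip extracts the timestamp, server name and read_only value from one line.
// It returns false when the line does not match the expected shape.
func parseFlip(line string) (flip, bool) {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return flip{}, false
	}
	at, err := time.Parse("2006-01-02 15:04:05", m[1])
	if err != nil {
		return flip{}, false
	}
	return flip{At: at, Server: m[2], ReadOnly: m[3] == "1"}, true
}

// readOnlyWindow returns the time between the first read_only=1 observation
// for server and the next read_only=0 observation for the same server.
func readOnlyWindow(lines []string, server string) (time.Duration, bool) {
	var start time.Time
	open := false
	for _, l := range lines {
		f, ok := parseFlip(l)
		if !ok || f.Server != server {
			continue
		}
		if f.ReadOnly && !open {
			start, open = f.At, true
		} else if !f.ReadOnly && open {
			return f.At.Sub(start), true
		}
	}
	return 0, false
}

func main() {
	logs := []string{
		"2024-04-24 07:34:48 [INFO] Server 'EU-01' found with 'read_only=1', but not found as reader",
		"2024-04-24 07:34:50 [INFO] Server 'EU-01' found with 'read_only=0', but not found as writer",
	}
	if d, ok := readOnlyWindow(logs, "EU-01"); ok {
		fmt.Println("EU-01 read-only window:", d) // prints "EU-01 read-only window: 2s"
	}
}
```

For the log lines in this report the sketch yields a window of 2 seconds, which matches the gap between the two entries.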

@rdemongeot
Author

Perhaps the cause is in:

utils/dbhelper/dbhelper.go:

    if cluster.Conf.ReadOnly && cluster.Conf.MxsBinlogOn == false && !cluster.IsInIgnoredReadonly(sl) {
            logs, err = sl.SetReadOnly()
            cluster.LogSQL(logs, err, sl.URL, "MasterFailover", LvlErr, "Could not set slave %s as read-only, %s", sl.URL, err)
    }
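One way to express the behavior requested in this issue is an extra condition on that guard: skip the read-only step when the replica is itself the primary of another (child) cluster. The following is a minimal sketch under that assumption. The `replica` type, its fields, the `shouldSetReadOnly` function and the server name FR-03 are all hypothetical and are not replication-manager's actual types or the fix that was eventually merged.

```go
package main

import "fmt"

// replica is a hypothetical, simplified view of a server replicating from the
// cluster being switched over. It is NOT replication-manager's real server type.
type replica struct {
	URL              string
	IgnoredReadOnly  bool   // mirrors the idea behind IsInIgnoredReadonly in the snippet above
	PrimaryOfCluster string // non-empty when this server is the primary of a child cluster
}

// shouldSetReadOnly reports whether the switchover should set read_only on r.
// readOnlyConf and mxsBinlogOn stand in for the two configuration flags
// tested in the snippet above.
func shouldSetReadOnly(readOnlyConf, mxsBinlogOn bool, r replica) bool {
	if !readOnlyConf || mxsBinlogOn {
		return false
	}
	if r.IgnoredReadOnly {
		return false
	}
	// Assumed extra rule: a child-cluster primary must stay writable.
	if r.PrimaryOfCluster != "" {
		return false
	}
	return true
}

func main() {
	replicas := []replica{
		{URL: "FR-03"}, // hypothetical plain replica of the FR cluster
		{URL: "EU-01", PrimaryOfCluster: "cluster_EU_masterslave"},
	}
	for _, r := range replicas {
		fmt.Printf("%s set read-only: %v\n", r.URL, shouldSetReadOnly(true, false, r))
	}
}
```

With this rule, EU-01 would still be repointed to the new FR primary but would never see read_only=1, so ProxySQL would have nothing to react to.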

@ahfa92
Contributor

ahfa92 commented Apr 24, 2024

Can you explain your topology?

time="2024-04-24 07:34:50" level=info msg="Server EU-01 disable read only as last non slave" cluster=cluster_EU_masterslave

@ahfa92
Contributor

ahfa92 commented Apr 24, 2024

If you can, please explain your topology and your expected behavior.
Please send us your configurations and logs for further examination. It will help us in assessing the situation.
Thank you

@rdemongeot
Author

If you can, please explain your topology and your expected behavior. Please send us your configurations and logs for further examination. It will help us in assessing the situation. Thank you

The FR cluster is the first cluster in the picture;
the EU cluster is the second one.

During a switchover on the FR cluster (which is one of the replication sources for EU), Replication Manager sets the read_only flag on the primary of the EU cluster. But the EU primary needs to be read-write at all times.

After the switchover, Replication Manager (for the EU cluster) sees that EU-01 is not read-write and makes it read-write again. But in the meantime EU-01 has been forced (by mistake) to read_only for 2 seconds.

@ahfa92
Contributor

ahfa92 commented Apr 24, 2024

Can you show the GUI of the topology? What topology is it detected as? If it was detected as master-slave, then it will try to force EU to read-only.

@rdemongeot
Author

EU1 is detected as a secondary, but an ignored one.

It is not in the configuration of FR1, but all the replication flows are named, and EU1 is a replica of the FR cluster (through the flow named FR), so Replication Manager detects it as an ignored secondary.

Repointing EU1 to replicate from FR2 during the switchover is the expected behavior (the child cluster should be a replica of the FR primary).

Initial_state

The FR cluster is the cluster at the top of this picture;
the EU cluster is the cluster at the bottom.

@ahfa92
Contributor

ahfa92 commented Apr 24, 2024

Please show the screenshot of the dashboard of replication manager.

@rdemongeot
Author

Please show the screenshot of the dashboard of replication manager.

It's not easy: the names here are rewritten for obfuscation reasons, and a screenshot would not be obfuscated :'(

@ahfa92
Contributor

ahfa92 commented Apr 24, 2024

We will check on this.

Thank you for your patience.

@ahfa92
Contributor

ahfa92 commented May 2, 2024

We have already made some patches in #569 and #570.

@caffeinated92 caffeinated92 added this to the 2.3.24 milestone May 3, 2024
@caffeinated92 caffeinated92 linked a pull request May 3, 2024 that will close this issue