mysql8 gtid #434 test intermittent rejoin failure #436

nyxneuf · 2022-07-05T02:11:44Z

SRM - v2.2.24
MySQL 8 - M-S (master-slave)

Test scenario

M - S -> stop the master DB; the slave becomes the new master.
Start the old master DB -> S - M
S - M -> stop the master DB; the slave becomes the new master.
Start the old master DB -> M - S

Repeat the above process.
The rejoin fails intermittently.

nyxneuf · 2022-07-05T03:27:34Z

I modified the code below in the same way, and there was no problem:

replication-manager/cluster/srv_rejoin.go, lines 675 to 682 in 932673c:

	ss, errss := server.GetSlaveStatus(server.ReplicationSourceName)
	if errss != nil {
		server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", errss)
		return false
	}
	server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)

After the modification (errss renamed to err):

	ss, err := server.GetSlaveStatus(server.ReplicationSourceName)
	if err != nil {
		server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", err)
		return false
	}
	server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)

nyxneuf · 2022-07-05T06:43:41Z

I found it!

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L675-L682

The problem is not in the code above, but here:

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L684

[Before]
if crash.FailoverIOGtid == nil {

[After]
if crash.FailoverIOGtid != nil {

=======================

I applied and tested it and there was no problem.

What does this code mean?

And

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L376

Please also check whether the code at this line is correct.

svaroqui · 2022-07-05T07:49:58Z

Re, I think you are missing one point here. There are times when the old master cannot be rejoined, and replication-manager detects this scenario. If a rejoin is not possible, you need a full state transfer (via mysqldump, restoring a backup, or a snapshot) to bring the old master back to its state before the crash of the old leader. For this, you need to install the same server package on the replication-manager server and provide the paths to the tools needed for the state transfer:

autorejoin-mysqldump = true
backup-mysqlbinlog-path = "/Users/apple/mysql/bin/mysqlbinlog"
backup-mysqldump-path = "/Users/apple/mysql/bin/mysqldump"
backup-mysqlclient-path = "/Users/apple/mysql/bin/mysql"

nyxneuf · 2022-07-05T08:24:59Z

Rejoin fails even if there is no change in DB data.
Do I have to use mysqldump for all DB data?

svaroqui · 2022-07-05T08:41:46Z

OK, I'll give it a try. You say the rejoin fails after multiple failovers?

svaroqui · 2022-07-05T08:42:14Z

I always test with 3 nodes; maybe that makes a difference.

svaroqui · 2022-07-05T08:47:48Z

To explain the code: when the leader crashes, replication-manager records the crash position by looking at SHOW SLAVE STATUS on the candidate master, which is still a replica at that time. It records gtid_executed, translates each UUID to a hash to get something similar to a MariaDB GTID, and stores the result in the property FailoverIOGtid, which is named after its MariaDB counterpart.
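The UUID-to-hash translation described above can be sketched in Go. This is a minimal sketch under stated assumptions, not replication-manager's implementation: the FNV-1a hash, domain 0, taking the highest transaction number of each UUID as the sequence, and the names MariaDBStyleGtid, hashUUID and translateGtidSet are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
	"strings"
)

// MariaDBStyleGtid is a hypothetical type mirroring a MariaDB GTID
// (domain-server_id-sequence); it is NOT replication-manager's actual type.
type MariaDBStyleGtid struct {
	Domain   uint32
	ServerID uint32
	Seq      uint64
}

func (g MariaDBStyleGtid) String() string {
	return fmt.Sprintf("%d-%d-%d", g.Domain, g.ServerID, g.Seq)
}

// hashUUID maps a MySQL server UUID to a 32-bit number.
// Assumption: FNV-1a is used here only for illustration.
func hashUUID(uuid string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(strings.ToLower(uuid)))
	return h.Sum32()
}

// translateGtidSet converts a MySQL GTID set such as
// "uuid1:1-23,uuid2:1-5:7-9" into one MariaDB-style GTID per UUID,
// keeping the highest transaction number of each UUID as the sequence.
func translateGtidSet(set string) ([]MariaDBStyleGtid, error) {
	var out []MariaDBStyleGtid
	for _, member := range strings.Split(set, ",") {
		member = strings.TrimSpace(member) // tolerate whitespace around members
		if member == "" {
			continue
		}
		parts := strings.Split(member, ":")
		if len(parts) < 2 {
			return nil, fmt.Errorf("malformed GTID set member %q", member)
		}
		var maxSeq uint64
		for _, interval := range parts[1:] {
			bounds := strings.Split(interval, "-")
			end, err := strconv.ParseUint(bounds[len(bounds)-1], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad interval %q: %w", interval, err)
			}
			if end > maxSeq {
				maxSeq = end
			}
		}
		out = append(out, MariaDBStyleGtid{Domain: 0, ServerID: hashUUID(parts[0]), Seq: maxSeq})
	}
	return out, nil
}

func main() {
	set := "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-23,4f22ab58-82db-11e1-9e33-c80aa9429562:1-5:7-9"
	gtids, err := translateGtidSet(set)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range gtids {
		fmt.Println(g)
	}
}
```

Collapsing each UUID to a single domain-server-sequence triple loses the gaps inside a MySQL GTID set (for example the missing transaction 6 in "1-5:7-9"), which is one reason such a translation is only an approximation of the MySQL position.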

nyxneuf · 2022-07-05T09:19:34Z

Yes! The rejoin fails after multiple failovers.
I'm sorry to bother you.
Please check!

svaroqui · 2022-07-05T09:29:02Z

You don't bother me at all; your help is very valuable. I would still prefer to get issues without images but with log files, as it's easier to dig into them.

nyxneuf · 2022-07-05T09:56:25Z

Thank you.
I'm not a developer. I'm an infra engineer.
Golang is hard for me to understand, and English is difficult for me, so I use a translator.
I think replication-manager is great.
I'm glad it helped. :)

svaroqui · 2022-07-07T19:40:31Z

Found it: MySQL can print carriage returns in the GTID output of SHOW SLAVE STATUS. I've pushed a fix and will make a new release soon. Thanks again for finding this!
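Because the cause was line breaks inside the GTID set returned by SHOW SLAVE STATUS, a defensive parser can strip all whitespace before splitting or comparing GTID sets. A minimal sketch, assuming that is the essence of the fix; normalizeGtidSet and gtidMembers are hypothetical helpers, not the actual replication-manager patch:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeGtidSet removes the line breaks ("\r", "\n") and any other
// whitespace that may appear between members of a GTID set, e.g.
// "uuid1:1-10,\r\nuuid2:1-5" -> "uuid1:1-10,uuid2:1-5".
// A GTID set never contains meaningful whitespace, so dropping it is safe.
func normalizeGtidSet(raw string) string {
	return strings.Join(strings.Fields(raw), "")
}

// gtidMembers splits a normalized GTID set into its per-UUID members.
// It returns nil for an empty (or whitespace-only) set.
func gtidMembers(raw string) []string {
	norm := normalizeGtidSet(raw)
	if norm == "" {
		return nil
	}
	return strings.Split(norm, ",")
}

func main() {
	raw := "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-23,\r\n4f22ab58-82db-11e1-9e33-c80aa9429562:1-9"
	for _, m := range gtidMembers(raw) {
		fmt.Printf("%q\n", m) // each member printed without stray line breaks
	}
}
```

Without the normalization, the second member would start with "\r\n", so its UUID would no longer match the server's UUID and a position comparison during rejoin could fail, which is consistent with the intermittent behavior reported here (it only shows up when the set spans several UUIDs).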

svaroqui self-assigned this Jul 8, 2022

svaroqui added this to the 2.2.25 milestone Jul 8, 2022

svaroqui closed this as completed Jul 8, 2022
