mysql8 gtid #434 test intermittent rejoin failure #436

Closed
nyxneuf opened this issue Jul 5, 2022 · 11 comments

@nyxneuf

nyxneuf commented Jul 5, 2022

SRM - v2.2.24
MySQL8 - M-S

Test scenario

  1. M - S -> stop the master DB; the slave becomes the new master
  2. start the old master DB -> S - M
  3. S - M -> stop the master DB; the slave becomes the new master
  4. start the old master DB -> M - S

Repeating the above process leads to intermittent rejoin failures!

[Screenshot: 20220705_105822]

@nyxneuf
Author

nyxneuf commented Jul 5, 2022

Similarly, I tried the code below and there was no problem!

	ss, errss := server.GetSlaveStatus(server.ReplicationSourceName)
	if errss != nil {
		server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", errss)
		return false
	}
	server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)

After modification (errss -> err):

	ss, err := server.GetSlaveStatus(server.ReplicationSourceName)
	if err != nil {
		server.ClusterGroup.LogPrintf(LvlInfo, "Failed to check if server was using GTID %s", err)
		return false
	}
	server.ClusterGroup.LogPrintf(LvlInfo, "Rejoin server using GTID %s", ss.UsingGtid.String)

@nyxneuf
Author

nyxneuf commented Jul 5, 2022

I found it!

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L675-L682

It is not the code above!

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L684

[Before]
if crash.FailoverIOGtid == nil {

[After]
if crash.FailoverIOGtid != nil {

=======================

I applied and tested it and there was no problem.

What does this code mean?

And

https://github.com/signal18/replication-manager/blob/develop/cluster/srv_rejoin.go#L376

Please check whether that code is also correct!

@svaroqui
Collaborator

svaroqui commented Jul 5, 2022

Hi again, I think you are missing one point here. There are times when you cannot rejoin the old master, and replication-manager detects this scenario. If a rejoin is not possible, you need a full state transfer via mysqldump, or by restoring a backup or a snapshot, to put the old master back to a point in time before the crash of the old leader. You need to install the same server package on the replication-manager server and provide the paths to the tools needed for the state transfer.

autorejoin-mysqldump = true
backup-mysqlbinlog-path = "/Users/apple/mysql/bin/mysqlbinlog"
backup-mysqldump-path = "/Users/apple/mysql/bin/mysqldump"
backup-mysqlclient-path = "/Users/apple/mysql/bin/mysql"
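
Before relying on these settings, it can help to confirm that each configured path really points to a file on the replication-manager host. The following is only a minimal sketch: `missingTools` is a hypothetical helper, not part of replication-manager, and it only checks that each path exists and is not a directory.

```go
package main

import (
	"os"
	"sort"
)

// missingTools returns the sorted option names whose configured path
// does not exist or is a directory.
// Hypothetical helper for illustration; not replication-manager code.
func missingTools(paths map[string]string) []string {
	var missing []string
	for option, path := range paths {
		info, err := os.Stat(path)
		if err != nil || info.IsDir() {
			missing = append(missing, option)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(missing)
	return missing
}
```

Calling it with a map such as `{"backup-mysqldump-path": "/Users/apple/mysql/bin/mysqldump"}` returns the option names that need attention.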

@nyxneuf
Author

nyxneuf commented Jul 5, 2022

Rejoin fails even if there is no change in DB data.
Do I have to use mysqldump for all DB data?

@svaroqui
Collaborator

svaroqui commented Jul 5, 2022

OK, I'll give it a try. You say it happens after multiple failover/rejoin cycles?

@svaroqui
Collaborator

svaroqui commented Jul 5, 2022

I always test with 3 nodes; maybe that makes a difference.

@svaroqui
Collaborator

svaroqui commented Jul 5, 2022

To explain the code: when the leader crashes, replication-manager records the position of the crash by looking at SHOW SLAVE STATUS on the candidate master, which is still a replica at that time. It records gtid_executed and translates the UUID to a hash to get something similar to a MariaDB GTID, then records it in the property FailoverIOGtid, which is the name of the MariaDB counterpart.

@nyxneuf
Author

nyxneuf commented Jul 5, 2022

Yes! Repeated failover and rejoin.
I'm sorry to bother you.
Please check!

@svaroqui
Collaborator

svaroqui commented Jul 5, 2022

You don't bother me at all; your help is very valuable. I would still prefer to get issues with log files rather than images, since they are easier to dig into.

@nyxneuf
Author

nyxneuf commented Jul 5, 2022

Thank you.
I'm not a developer. I'm an infra engineer.
It's hard to understand Golang. It is difficult to use English. So I use a translator.
I think replication-manager is great.
I'm glad it helped. :)

@svaroqui
Collaborator

svaroqui commented Jul 7, 2022

Found it: MySQL can print carriage returns in the GTID output of SHOW SLAVE STATUS. I've pushed a fix and will make a new release soon. Thanks again for finding this!
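
For readers hitting the same symptom, here is a minimal sketch of the kind of normalization that avoids it, assuming the GTID set arrives as a comma-separated string with line breaks between entries. `normalizeGtidSet` is a hypothetical name and this is not the actual fix that was pushed.

```go
package main

import "strings"

// normalizeGtidSet removes carriage returns, line feeds and surrounding
// spaces from a GTID set as it may appear in SHOW SLAVE STATUS output.
// Hypothetical helper for illustration; not the replication-manager fix.
func normalizeGtidSet(raw string) string {
	parts := strings.Split(raw, ",")
	cleaned := make([]string, 0, len(parts))
	for _, p := range parts {
		// TrimSpace drops \r, \n, tabs and spaces around each entry.
		p = strings.TrimSpace(p)
		if p != "" {
			cleaned = append(cleaned, p)
		}
	}
	return strings.Join(cleaned, ",")
}
```

Comparing or parsing the normalized string instead of the raw column value keeps a multi-UUID set from being treated as different just because of embedded line breaks.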

@svaroqui svaroqui self-assigned this Jul 8, 2022
@svaroqui svaroqui added this to the 2.2.25 milestone Jul 8, 2022
@svaroqui svaroqui closed this as completed Jul 8, 2022