How to turn DeadMaster into slave when MySQL comes back online after automatic failover? #891
If you are using GTID-based replication you can easily repoint the demoted previous master to be a slave with `CHANGE MASTER TO MASTER_USER='user', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;`. You can use the `PostGracefulTakeoverProcesses` hook to accomplish what you want, e.g.
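As a minimal sketch of such a hook: orchestrator passes context to hook scripts through environment variables (e.g. `ORC_SUCCESSOR_HOST` for the newly promoted master, `ORC_FAILED_HOST` for the demoted one); the user, password, and fallback hostname below are placeholders, and the `mysql` call is left commented out so the sketch only prints the statement it would run:

```shell
#!/bin/sh
# Sketch of a PostGracefulTakeoverProcesses hook script.
# ORC_SUCCESSOR_HOST is set by orchestrator when invoking hooks; the
# fallback value here is a placeholder so the sketch runs standalone.
NEW_MASTER="${ORC_SUCCESSOR_HOST:-new-master.example.com}"

# Build the statement that repoints the demoted master at its successor
# using GTID auto-positioning.
build_change_master() {
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='repl_user', MASTER_PASSWORD='repl_password', MASTER_AUTO_POSITION=1; START SLAVE;" "$1"
}

STMT="$(build_change_master "$NEW_MASTER")"
# A real hook would execute it against the demoted master, roughly:
#   mysql -h "$ORC_FAILED_HOST" -e "$STMT"
echo "$STMT"
```

The hook is registered in orchestrator's configuration under `PostGracefulTakeoverProcesses`, pointing at the script's path.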
After a successful takeover that post-failover shell script will execute, and you can do whatever you want in it. I think it was a conservative decision by Shlomi to make this the default behaviour, to give us users time to investigate why the previous master was demoted. Imagine the scenario where orchestrator did this for you automagically, but that server was not meant to automatically rejoin the cluster. Just my $0.02.
I must be very honest with you: I've also done what you are trying to accomplish, until one day I was bitten by it and learned my lesson the hard way. You will realise this as soon as you throw ProxySQL/MaxScale into the equation :)
That's the expected behavior and you should rebuild the server.
Very non-trivial. See http://code.openark.org/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind
@shlomi-noach a very good read; I'll read it later this evening :)
Hi all, Thanks for the clarification.
In our case we use MariaDB, which doesn't support `MASTER_AUTO_POSITION`.
This is a much quicker recovery and less work than rebuilding slaves. For now we'll consider doing this manually instead of through a hook.
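For reference, MariaDB's GTID implementation has no `MASTER_AUTO_POSITION`; its equivalent auto-positioning clause is `MASTER_USE_GTID=slave_pos`. A minimal sketch of the MariaDB-flavoured repoint statement (credentials and hostname are placeholders):

```shell
#!/bin/sh
# MariaDB-flavoured equivalent of the GTID repoint.
# MASTER_USE_GTID=slave_pos tells MariaDB to resume replication from the
# slave's recorded GTID position, analogous to MASTER_AUTO_POSITION=1.
build_mariadb_change_master() {
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='repl_user', MASTER_PASSWORD='repl_password', MASTER_USE_GTID=slave_pos; START SLAVE;" "$1"
}

# Printed rather than executed so the sketch is safe to run anywhere.
echo "$(build_mariadb_change_master new-master.example.com)"
```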
Cheers, thanks for the read, I'll have a look! In our case I think it's less of an issue because we plan to use a SQL proxy to send writes only to the master and reads to the slaves, so queries shouldn't be going to the old master. But I understand it is always a risk. I'll test whether this works without weird unexpected issues. Thanks so far!
@NielsH your mileage may vary. On my end, just to make sure the newly promoted slave was indeed in sync with the master, I always run pt-table-checksum. I don't know if pt-table-checksum is compatible with MariaDB, because we are a Percona shop :) Anyway, glad that it is working for you as well. Another tip: use semi-sync replication if it is supported by MariaDB, and try to perform a graceful failover using orchestrator-cli rather than the GUI whenever you do a test, especially when you are going to integrate ProxySQL, because you will see verbose messages about what's going on in the background and get a much better understanding of what happens behind the scenes, which helps avoid surprises. You don't want the app writing data to the reader hostgroup because of a failed failover, so extra care is necessary ^_^ This is off-topic, by the way.
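An illustrative pt-table-checksum invocation for the check described above (host and credentials are placeholders; the command is printed rather than executed so the sketch is safe to run anywhere):

```shell
#!/bin/sh
# Checksums are written on the master into percona.checksums and replicate
# to the slaves, where diffs reveal any rows that drifted out of sync.
HOST="new-master.example.com"
CMD="pt-table-checksum --replicate=percona.checksums --no-check-binlog-format h=$HOST,u=checksum_user,p=checksum_password"
echo "$CMD"
```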
@hellracer , it seems that I am looking for a hook that should be executed when this old master comes to life or should i manage this case outside of orchestrator ? as i couldn't find any hook for it. I have another question that i think it was not mentioned in this issue. why the failover is creating a different alias in the dashboard ? how can i reuse the same old name ? or this is the default ? Note: I am using pseudo gtid with mariadb |
@NielsH have you since found a way to solve this, instead of doing it manually?
@mostafahussein we ended up doing it manually. However, we did create a script that hooks into the mariadb systemd unit file and runs before stopping mysql itself. The script triggers a failover before shutting down, so the node can rejoin the cluster properly without manual recovery. This is because a primary concern for us was mainly that we would have to manually restart mariadb (through systemd). The script may not work for your configuration/use case and hasn't seen a lot of testing, so you will probably have to change it to fit your needs, but if it helps:
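The script itself did not survive in this thread. A minimal sketch of the idea, assuming a hypothetical cluster alias `mycluster` and using `orchestrator-client` (the `ORC_CMD` override exists only so the sketch can be exercised without a live orchestrator):

```shell
#!/bin/sh
# Pre-stop hook: if this node is currently the cluster master, trigger a
# graceful takeover before mysqld is stopped, so the node can later rejoin
# as a slave without manual recovery.
ORC_CMD="${ORC_CMD:-orchestrator-client}"
CLUSTER="mycluster"
THIS_HOST="$(hostname -f 2>/dev/null || hostname)"

# Ask orchestrator which host is currently master of the cluster.
current_master() {
  "$ORC_CMD" -c which-cluster-master -alias "$CLUSTER"
}

# Hand over the master role only when this host actually holds it.
maybe_failover() {
  master="$(current_master)"
  if [ "$master" = "$THIS_HOST" ]; then
    "$ORC_CMD" -c graceful-master-takeover -alias "$CLUSTER"
    echo "takeover-triggered"
  else
    echo "not-master"
  fi
}

# systemd invokes this script at stop time; it would then call:
# maybe_failover
```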
And the systemd config:
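The unit snippet was also lost in this thread. A sketch of a systemd drop-in that runs such a script when MariaDB stops might look like this (path and script name are hypothetical; `ExecStop=` commands added in a drop-in run at stop time, before the service's stop signal is delivered to mysqld):

```ini
# /etc/systemd/system/mariadb.service.d/failover.conf (hypothetical path)
[Service]
# Run the graceful-takeover script while mysqld is still up.
ExecStop=/usr/local/bin/mariadb-prestop-failover.sh
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for it to take effect.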
@NielsH Thanks a lot!
Hello @NielsH, this is just a reminder, whenever you have time, to follow up on my previous question. Thanks for the assist.
Hi, sorry, I lost track of the question.
With this, and the above script, whenever I restarted a node in the cluster, as long as the failover worked in advance (so it was a slave at the time of restarting), it would show back up in the same cluster. I did have to manually click "start replication" again, but it would not be a separate cluster. If the mysql server was a master when being restarted (so no failover happened beforehand), it would show up as a separate cluster and I had to manually convert it to a slave (through #891 (comment)). Hope this helps!
Hi,
I have a setup with 1 MariaDB master and 2 slaves:
With this config:
I have noticed that when I restart or stop the MySQL master (or if it becomes unavailable in any other way), the master becomes "lost". The failover happens properly: a slave is promoted to master and the other slave now replicates from the new master.
But once the old master comes back online, it shows up in its own cluster; it is not recognized as being part of the old cluster. I'd like to have it automatically rejoin and turn into a slave instead. I thought this might be possible by having it automatically roll back any changes that were applied on the old master but not transferred to the slaves at the time of the failover, and then start replicating from the last-known "shared" change, with the old master now slaving from the newly promoted master.
These changes shouldn't even be there anyway, but I guess it can happen when slave lag was high during the failover. In any case, they can be discarded.
What currently happens if I restart the MySQL master is that this becomes the new topology:
Is there anything I can do to make it work the way I want? Or is it expected behaviour that, after restarting the master or failing over in any way, I should manually rebuild it as a slave and rejoin it into the cluster afterwards?
Thank you in advance!