
Data-loss Handling of MySQL with Apache Helix Cluster Management #5

Closed
Kuntal-G opened this issue Mar 17, 2015 · 7 comments

@Kuntal-G

We are trying to manage our MySQL cluster with Apache Helix. We will be running 1 MySQL master and 2 slaves in each Helix cluster, and we are using the MySQL Helix Fullmatix project for this purpose.

I have the following queries:

  1. All writes go to the master and are replicated to the slaves. Now suppose a write occurs on the master and has not yet been replicated to the slaves when the master goes down. Helix will choose one of the slaves as the new master. When the previous master comes back up and becomes a slave, is there a way to sync the transactions that exist only on this previous master with the current master and the other slaves? Is there a way to ensure there is no data loss in this kind of scenario?

  2. Also, my requirement is such that whenever my previous master comes back up, I always want it to become the master again, not a slave, because we are planning to use a higher-configuration machine for the master. How do I do this using Helix? Or should we keep the master and slaves on machines with the same configuration? What is the best approach?

I have seen that there might be a way to do this using a customized/user-defined rebalancing algorithm, but I have not been able to find a proper piece of code to get started with this kind of scenario.

Any help or expert opinion on the above queries would be very helpful.

@kishoreg
Owner

Hi Kuntal,

Short answer: you can achieve the goals you want, but it requires some more work :-).

There are a couple of ways to approach this, but the first step is to have a way to detect that the master and a slave are out of sync. This can be done by comparing the GTIDs of the master and the slave.

If there is a discrepancy, then there are multiple options. The recovered node can first set up replication with the current master and catch up as a slave, after which you will have to make this node the master again. When this node becomes the master, the other slaves will set up replication against it and the missing data will be propagated to them.
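For example, here is a rough sketch of that check over JDBC. This is illustrative only, not code from Fullmatix: it assumes GTID-based replication is enabled (gtid_mode=ON), and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GtidSyncChecker {

  // Returns the set of transactions this server has executed.
  static String executedGtids(Connection conn) throws SQLException {
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT @@GLOBAL.gtid_executed")) {
      rs.next();
      return rs.getString(1);
    }
  }

  // True if every transaction in 'gtids' is already contained in the
  // executed set of the server behind 'conn'.
  static boolean containsAll(Connection conn, String gtids) throws SQLException {
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT GTID_SUBSET('" + gtids + "', @@GLOBAL.gtid_executed)")) {
      rs.next();
      return rs.getBoolean(1);
    }
  }
}
```

If containsAll(currentMaster, executedGtids(recoveredNode)) returns false, the recovered node holds transactions the current master never received, and you need the catch-up step described above.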

This is one solution; I will write a detailed answer tomorrow.
MasterSlaveRebalancer is the place to customize the leader-election code.

@Kuntal-G
Author

Hi Kishore,

Thanks for providing some guidance. I will look into the MasterSlaveRebalancer code to meet our requirement for customized leader election.
Also looking forward to your detailed explanation :)

@Kuntal-G
Author

Kuntal-G commented Apr 9, 2015

Hi Kishore,

After going through your suggestion, here is how we are implementing Helix for MySQL.

  1. We are using semi-sync replication between the MySQL servers to counter data loss. For that, we have modified the setupReplication() method of the Replicator class as follows:

```java
// Add semi-sync replication: when the host being configured is the master
// itself, enable the semi-sync master role; otherwise enable the
// semi-sync slave role.
if (slaveHost.equals(masterHost)) {
  slaveStatement.execute("SET GLOBAL rpl_semi_sync_master_enabled = 1;");
  slaveStatement.execute("SET GLOBAL rpl_semi_sync_slave_enabled = 0;");
} else {
  slaveStatement.execute("SET GLOBAL rpl_semi_sync_master_enabled = 0;");
  slaveStatement.execute("SET GLOBAL rpl_semi_sync_slave_enabled = 1;");
}
```
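For completeness: the rpl_semi_sync_* variables only exist once the semi-sync plugins are installed on each server, which is a one-time step. A minimal sketch, assuming the standard Linux plugin library names:

```java
// One-time setup per server, before toggling the rpl_semi_sync_* variables.
// Plugin file names assume a Linux build of MySQL 5.5+.
statement.execute("INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'");
statement.execute("INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'");
```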

  2. For the master-election policy, we are making sure that when our ideal master comes back from OFFLINE to SLAVE (after some failure), it is re-elected as master and the semi-sync settings are adjusted accordingly. We keep the ideal-master info in our own store system, and we also record the information whenever any state change happens inside the onCallback() method of the MasterSlaveRebalancer class, before setAppPermissions() is called. Once the ideal master becomes a SLAVE, we call rebalance on the cluster, and Helix re-elects the ideal master from slave back to current master (because the IdealState stored in ZooKeeper during the initial rebalance call does not change). With these changes we are able to meet our requirement and it is working fine. Please let me know whether I am doing this the right way or whether there is any flaw in this approach.

Update to the dashboard inside the onCallback() method of MasterSlaveRebalancer:

```java
String[] instanceDets = new String[2];
LineReaderUtil.fastSplit(instanceDets, instance, '_');
try {
  DashboardFunctions.updateInstanceState("slave", instanceDets[0], instanceDets[1]);
} catch (Exception e) {
  LOG.error("Unable to update dashboard state", e);
}
```
Updated part of the setAppPermissions() method of the MasterSlaveRebalancer class:

```java
if (instanceType.equals("slave")) {
  try {
    for (String aUser : userL) {
      System.out.println("Setting Permissions for user: " + aUser + " as slave");
      hasOutput = statement.execute("REVOKE ALL PRIVILEGES ON *.* FROM '" + aUser + "'@'localhost'");
      hasOutput = statement.execute("FLUSH PRIVILEGES");
      hasOutput = statement.execute("GRANT SELECT ON *.* TO '" + aUser + "'@'localhost' IDENTIFIED BY 'drone-user'");
      hasOutput = statement.execute("FLUSH PRIVILEGES");
    }
    statement.execute("stop slave");

    // Add semi-sync replication: this node now acts as a semi-sync slave.
    statement.execute("SET GLOBAL rpl_semi_sync_master_enabled = 0;");
    statement.execute("SET GLOBAL rpl_semi_sync_slave_enabled = 1;");

    // START SLAVE
    statement.execute("start slave");
  } catch (Exception e) {
    LOG.fatal("Unable to set app permissions", e);
  } finally {
    if (null != connection) connection.close();
  }
} else if (instanceType.equals("master")) {
  try {
    for (String aUser : userL) {
      System.out.println("Setting Permissions for user: " + aUser + " as master");
      hasOutput = statement.execute("GRANT ALL ON *.* TO '" + aUser + "'@'localhost'");
      hasOutput = statement.execute("FLUSH PRIVILEGES");
    }
    statement.execute("stop slave");

    // Add semi-sync replication: this node now acts as a semi-sync master.
    statement.execute("SET GLOBAL rpl_semi_sync_master_enabled = 1;");
    statement.execute("SET GLOBAL rpl_semi_sync_slave_enabled = 0;");
  } catch (Exception e) {
    LOG.fatal("Unable to set app permissions", e);
  } finally {
    if (null != connection) connection.close();
  }
}
```

@kishoreg
Owner

kishoreg commented Apr 9, 2015

> Once the ideal master becomes a SLAVE, we call rebalance on the cluster, and Helix re-elects the ideal master from slave back to current master (because the IdealState stored in ZooKeeper during the initial rebalance call does not change).

Are you referring to admin.rebalance? Why do you need to call rebalance on the cluster? Your idea seems correct, but it looks like you might have to use different APIs to achieve it.

Semi-sync is a good idea to avoid data loss.
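One caveat to keep in mind (a general semi-sync property, not something specific to your setup): the master falls back to asynchronous replication if no slave acknowledges within rpl_semi_sync_master_timeout, which reopens a small data-loss window. You can widen the timeout to narrow that window; the value below is just illustrative:

```java
// Milliseconds the master waits for a slave ack before falling back to
// asynchronous replication (illustrative value).
statement.execute("SET GLOBAL rpl_semi_sync_master_timeout = 10000");
```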


@Kuntal-G
Author

Yes, I'm using admin.rebalance for rebalancing the cluster.
Can you please tell me which other API I should use to bring the ideal master back from the slave state?
Also, this ideal-master re-election through admin.rebalance is triggered manually; how can I automate it?

@kishoreg
Owner

MasterSlaveRebalancer is where you need to make the appropriate changes (https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MasterSlaveRebalancer.java).

This code is invoked every time any node starts or stops in the cluster. See how it is initialized in MySQLAgent (https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/MySQLAgent.java):

```java
MasterSlaveRebalancer rebalancer = new MasterSlaveRebalancer(_context);
HelixCustomCodeRunner helixCustomCodeRunner =
    new HelixCustomCodeRunner(helixManager, _zkAddress).invoke(rebalancer)
        .on(ChangeType.CONFIG, ChangeType.LIVE_INSTANCE)
        .usingLeaderStandbyModel("MasterSlave_rebalancer");
helixCustomCodeRunner.start();
```

Which node is MASTER/SLAVE is controlled via the IdealState. There are two places where this is set. The first is the creation of the IdealState via ClusterAdmin.doInitialAssignment:
https://github.com/kishoreg/fullmatix/blob/master/mysql-cluster/src/main/java/org/apache/fullmatix/mysql/tools/ClusterAdmin.java

So if you have 6 nodes (N1...N6) and, say, a replication factor of 2, it will create a mapping as follows:

```
P0
  N1: MASTER
  N2: SLAVE
P1
  N3: MASTER
  N4: SLAVE
P2
  N5: MASTER
  N6: SLAVE
```

This is done only one time when the cluster is initially created.

The second place is MasterSlaveRebalancer, which will always maintain this mapping but only updates the STATE (MASTER/SLAVE) according to which nodes are up or down.

Let's say N1 goes down while N2 is alive; MasterSlaveRebalancer will change the mapping to:

```
P0
  N1: SLAVE
  N2: MASTER
```

Now let's say N1 comes back up: MasterSlaveRebalancer will not update this mapping. This is the behavior you want to change, so here is what you need to do.

You want MasterSlaveRebalancer to be invoked whenever there is a change in the EXTERNALVIEW. You can do this by simply changing

```java
new HelixCustomCodeRunner(helixManager, _zkAddress).invoke(rebalancer)
    .on(ChangeType.CONFIG, ChangeType.LIVE_INSTANCE)
```

to

```java
new HelixCustomCodeRunner(helixManager, _zkAddress).invoke(rebalancer)
    .on(ChangeType.CONFIG, ChangeType.LIVE_INSTANCE, ChangeType.EXTERNAL_VIEW)
```

This will invoke the rebalancer whenever a node changes its state (MASTER->SLAVE, OFFLINE->SLAVE, etc.).

So, what you can do here is wait for N1 to become SLAVE for a partition. After that, you can change the IdealState to:

```
P0
  N1: MASTER
  N2: SLAVE
```
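A rough sketch of that IdealState update. This is hedged: it is not code from Fullmatix, the helper name is made up, and the exact IdealState accessors may differ across Helix versions, so treat it as a starting point:

```java
import java.util.Map;
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class IdealMasterPromoter {

  // Once N1 has re-joined as SLAVE, flip the IdealState for partition P0
  // so N1 is MASTER again; the Helix controller then issues the matching
  // SLAVE->MASTER and MASTER->SLAVE transitions to the nodes.
  public static void promoteIdealMaster(String zkAddress, String clusterName,
                                        String resourceName) {
    HelixAdmin admin = new ZKHelixAdmin(zkAddress);
    IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
    Map<String, String> stateMap = idealState.getRecord().getMapField("P0");
    stateMap.put("N1", "MASTER");
    stateMap.put("N2", "SLAVE");
    idealState.getRecord().setMapField("P0", stateMap);
    admin.setResourceIdealState(clusterName, resourceName, idealState);
  }
}
```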

That should solve your problem.

Hope this helps.

thanks,
Kishore G


@Kuntal-G
Author

Thank you very much Kishore. 👍

It's really helpful for us, and it will help us automate the master re-election process.
