Who is the new intermediate master? #46

Open
theTibi opened this issue Jan 10, 2017 · 32 comments
@theTibi

theTibi commented Jan 10, 2017

Hi,

If I have a topology like this (just an example):

               -> rep3 
rep1 --> rep2-|
               -> rep4

Rep2 is an intermediate master. If rep2 dies, Orchestrator processes a DeadIntermediateMaster failover and reorganises the topology like this (just an example):

rep1 --> rep4 --> rep3

So rep4 is going to be the new intermediate master. But based on the PostFailoverProcesses placeholders I cannot tell who the new intermediate master is.

It has the following placeholders:
{failureType}, {failureDescription}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countSlaves}, {slaveHosts}, {isDowntimed}, {isSuccessful}, {lostSlaves}
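For reference, a hook command using these placeholders might look something like this before substitution (just a sketch; orchestrator fills in the placeholders at runtime):

```bash
# As configured (unsubstituted); orchestrator replaces the placeholders when the hook runs:
echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log
```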

I am trying to call an external script when an intermediate master dies, but the script has to know who the new intermediate master is after the failover.

Are there any solutions/ideas for this?

Thanks.

@shlomi-noach
Collaborator

The new intermediate master is {successorHost}, and more specifically it's {successorHost}:{successorPort}.

BTW you can also get this value from the environment variable ORC_ORCHESTRATOR_HOST sent to your script. If you're using a shell script, just read $ORC_ORCHESTRATOR_HOST.
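For illustration, a hook script along those lines could look roughly like this (a sketch only: the script path, its registration under PostIntermediateMasterFailoverProcesses, and the argument order are hypothetical; the placeholder and environment-variable names are the ones mentioned above):

```bash
#!/bin/bash
# Hypothetical hook script; it could be registered in the orchestrator config
# roughly as (path and argument order are illustrative only):
#   "PostIntermediateMasterFailoverProcesses": [
#     "/usr/local/bin/on-im-failover.sh '{failedHost}:{failedPort}' '{successorHost}:{successorPort}'"
#   ]

failed="$1"      # expands from {failedHost}:{failedPort}
successor="$2"   # expands from {successorHost}:{successorPort}

# The successor is also exposed via the environment variable mentioned above:
echo "failed IM: ${failed}; new IM: ${successor} (env: ${ORC_ORCHESTRATOR_HOST})" >> /tmp/recovery.log
```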

@theTibi
Author

theTibi commented Jan 12, 2017

@shlomi-noach : That was my first thought also. So in this example the {successorHost} should be rep4 if I understand correctly, because rep4 was promoted.

But in my test the {successorHost} was rep1, here are some log entries:

...
2017-01-10 16:37:27 INFO ChangeMasterTo: Changed master on rep4:3306 to: rep1:3306, mysql-bin.000001:62541370. GTID: false
2017-01-10 16:37:27 INFO Started slave on rep4:3306
...
2017-01-10 16:37:28 DEBUG execCmd: echo 'Recovered from DeadIntermediateMaster on rep1:3306. Failed: rep2:3306; Successor: rep1:3306' >> /tmp/recovery.log
...

So it promoted rep4 but the {successorHost} is rep1.

Am I missing something here?

@shlomi-noach
Collaborator

Oh OK. So here's the complication:

(though, first of all, in your case it should have been rep4, not arguing)

An intermediate master failover may result in multiple successors. Perhaps the promoted replica cannot take over all of its siblings, and some will move elsewhere. There can be up to three different successors at the same time: an uncle, one of the orphaned siblings (as in your case), and the grandfather (the master, in your case).

The logic goes through all three options and apparently settled on the master as the successor, even though everything was handled by rep4.
So this is a bug to be fixed.

shlomi-noach self-assigned this Jan 12, 2017
@theTibi
Author

theTibi commented Jan 12, 2017

Thanks @shlomi-noach. I am going to run some tests again to make sure I did everything right, but I have already run it many times and the {successorHost} was always rep1.

@shlomi-noach
Collaborator

I'm now looking into this

@shlomi-noach
Collaborator

I'm actually unable to get the same results. When I failover an intermediate master X, and when all of its replicas are successfully relocated to its sibling Y, I always get Y as {successorHost}.

@theTibi
Author

theTibi commented Jan 24, 2017

I just redid my test and the {successorHost} was wrong again, but maybe I misunderstood something, so here is a test topology.
[screenshot: screen shot 2017-01-24 at 00 40 45]

If rep3 goes down, rep4 should be the {successorHost}, if I understand correctly.

But here is the log:

2017-01-24 07:41:50 DEBUG auditType:begin-maintenance instance:rep4:3306 cluster:rep1:3306 message:maintenanceToken: 427, owner: orc, reason: move below rep2:3306
2017-01-24 07:41:50 INFO Stopped slave on rep4:3306, Self:mysql-bin.000002:151, Exec:mysql-bin.000002:151
2017-01-24 07:41:50 DEBUG ChangeMasterTo: will attempt changing master on rep4:3306 to rep2:3306, mysql-bin.000002:151
2017-01-24 07:41:50 INFO ChangeMasterTo: Changed master on rep4:3306 to: rep2:3306, mysql-bin.000002:151. GTID: true
2017-01-24 07:41:50 INFO Started slave on rep4:3306
2017-01-24 07:41:51 DEBUG outdated keys: []
2017-01-24 07:41:51 DEBUG auditType:move-below-gtid instance:rep4:3306 cluster:rep1:3306 message:moved rep4:3306 below rep2:3306
2017-01-24 07:41:51 DEBUG auditType:end-maintenance instance:rep4:3306 cluster:rep1:3306 message:maintenanceToken: 427
2017-01-24 07:41:51 DEBUG auditType:move-slaves-gtid instance:rep2:3306 cluster:rep1:3306 message:moved 1/1 slaves below rep2:3306 via GTID
2017-01-24 07:41:51 DEBUG auditType:relocate-slaves instance:rep3:3306 cluster:rep1:3306 message:relocated 1 slaves of rep3:3306 below rep2:3306
2017-01-24 07:41:51 DEBUG auditType:recover-dead-intermediate-master instance:rep3:3306 cluster:rep1:3306 message:Relocated slaves under: rep2:3306 0 errors: []
2017-01-24 07:41:51 DEBUG execCmd: echo 'Recovered from DeadIntermediateMasterWithSingleSlaveFailingToConnect on rep1:3306. Failed: rep3:3306; Successor: rep2:3306' >> /tmp/recovery.log
2017-01-24 07:41:51 DEBUG [/tmp/orchestrator-process-cmd-273667872]
2017-01-24 07:41:51 INFO Executed PostIntermediateMasterFailoverProcesses command: echo 'Recovered from DeadIntermediateMasterWithSingleSlaveFailingToConnect on rep1:3306. Failed: rep3:3306; Successor: rep2:3306' >> /tmp/recovery.log
2017-01-24 07:41:51 DEBUG execCmd: echo '(for all types) Recovered from DeadIntermediateMasterWithSingleSlaveFailingToConnect on rep1:3306. Failed: rep3:3306; Successor: rep2:3306' >> /tmp/recovery.log
2017-01-24 07:41:51 DEBUG [/tmp/orchestrator-process-cmd-545199103]
2017-01-24 07:41:51 INFO Executed PostFailoverProcesses command: echo '(for all types) Recovered from DeadIntermediateMasterWithSingleSlaveFailingToConnect on rep1:3306. Failed: rep3:3306; Successor: rep2:3306' >> /tmp/recovery.log

It says the successor is rep2.

@theTibi
Author

theTibi commented Jan 24, 2017

With the following topology, the {successorHost} was rep1 instead of rep4, which became the new intermediate master when I stopped rep2.

[screenshot: screen shot 2017-01-24 at 01 23 55]

@shlomi-noach
Collaborator

@theTibi I see. Now here's the thing: in your case it can be argued who the successor is:

  • on one hand, as you suggest, rep4 takes over rep3, and becomes a new intermediate master
  • but then again, rep1 takes over rep4 and is the one to save it from being disconnected. This is the answer you get from orchestrator now.

So there are actually two servers that have a role in the recovery process. Which of them is the successor?

In your particular use case you'd like to get the new intermediate master, because in your setup the intermediate master has a special importance, being writable. However, for someone else it might make more sense to know "who ultimately took charge".

This can be argued, and I'm open to hearing a strong argument for one over the other.

@theTibi
Author

theTibi commented Jan 24, 2017

@shlomi-noach Both of them have pros and cons. But in my opinion, if Orchestrator handles intermediate masters and intermediate master failovers, the {successorHost} should be the host that can take over all the roles of the original (intermediate) master.

For example, in this case the intermediate master is writable and takes writes. After a failover, the {successorHost} should be a server that can take over all of those roles, i.e. one that has all the schema and data the original intermediate master had. Otherwise this is going to cause serious problems.

But maybe we should not have to choose between them. A new placeholder might be the best solution, so everybody can decide which {successorHost} is the best fit for their topology/application.

@shlomi-noach
Collaborator

An interesting scenario is that of a split. Say the new intermediate master could only take over a few of the boxes, and the rest were salvaged by the master. Then the problem is even stronger.

Let me look at the code and see what can make sense to change.

@shlomi-noach
Collaborator

Addressed by #61

@shlomi-noach
Collaborator

Gonna run a few experiments to see that #61 doesn't return the wrong thing, and then I'm happy.

@theTibi
Author

theTibi commented Jan 30, 2017

@shlomi-noach Thanks, I am also going to test this soon; I will let you know how it goes.

@shlomi-noach
Collaborator

shlomi-noach commented Jan 30, 2017

@theTibi if you're able to compile and test that would be awesome

@theTibi
Author

theTibi commented Jan 30, 2017

@shlomi-noach Yes, going to do it today.

@theTibi
Author

theTibi commented Jan 30, 2017

@shlomi-noach : So I did some tests but I did not get what I expected. Here are two examples:

[screenshot: screen shot 2017-01-30 at 23 27 33]

In this case I would like to see rep4 or rep5 as the {successorHost}, because they have all the same data as the intermediate master rep3. But:

[screenshot: screen shot 2017-01-30 at 23 29 37]

As we can see rep2 was the {successorHost}.

2017-01-30 23:00:50 DEBUG auditType:move-replicas-gtid instance:rep2:3306 cluster:rep1:3306 message:moved 2/2 replicas below rep2:3306 via GTID
2017-01-30 23:00:50 DEBUG auditType:relocate-replicas instance:rep3:3306 cluster:rep1:3306 message:relocated 2 replicas of rep3:3306 below rep2:3306
2017-01-30 23:00:50 DEBUG auditType:recover-dead-intermediate-master instance:rep3:3306 cluster:rep1:3306 message:Relocated 2 replicas under candidate sibling: rep2:3306; 0 errors: []
2017-01-30 23:00:50 INFO CommandRun(echo 'Recovered from DeadIntermediateMaster on rep1:3306. Failed: rep3:3306; Successor: rep2:3306' >> /tmp/recovery.log,[])

Another test:

[screenshot: screen shot 2017-01-30 at 23 30 25]

Again I was expecting rep4 or rep5 as the {successorHost}, but:
[screenshot: screen shot 2017-01-30 at 23 32 52]

2017-01-30 23:04:21 DEBUG auditType:move-replicas-gtid instance:rep3:3306 cluster:rep1:3306 message:moved 2/2 replicas below rep3:3306 via GTID
2017-01-30 23:04:21 DEBUG auditType:relocate-replicas instance:rep2:3306 cluster:rep1:3306 message:relocated 2 replicas of rep2:3306 below rep3:3306
2017-01-30 23:04:21 DEBUG auditType:recover-dead-intermediate-master instance:rep2:3306 cluster:rep1:3306 message:Relocated replicas under: rep3:3306 0 errors: []
2017-01-30 23:04:21 INFO CommandRun(echo 'Recovered from DeadIntermediateMaster on rep1:3306. Failed: rep2:3306; Successor: rep3:3306' >> /tmp/recovery.log,[])
2017-01-30 23:04:21 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-060282838
2017-01-30 23:04:21 INFO CommandRun successful. exit status 0
2017-01-30 23:04:21 INFO Executed PostIntermediateMasterFailoverProcesses command: echo 'Recovered from DeadIntermediateMaster on rep1:3306. Failed: rep2:3306; Successor: rep3:3306' >> /tmp/recovery.log
2017-01-30 23:04:21 INFO CommandRun(echo '(for all types) Recovered from DeadIntermediateMaster on rep1:3306. Failed: rep2:3306; Successor: rep3:3306' >> /tmp/recovery.log,[])

So I can see some changes in the promotion logic, but I think it still misses the point: Orchestrator is promoting a new intermediate master which might not have all the data.

@theTibi
Author

theTibi commented Jan 30, 2017

@shlomi-noach :

- relocatedReplicas, successorInstance, err, errs = inst.RelocateReplicas(failedInstanceKey, &analysisEntry.AnalyzedInstanceMasterKey, "")
+ relocatedReplicas, masterInstance, err, errs := inst.RelocateReplicas(failedInstanceKey, &analysisEntry.AnalyzedInstanceMasterKey, "")

So based on the code, the slaves are now going to be relocated under the masterInstance. That is fine for the new intermediate master itself, but the other slaves (which were replicating from the failed intermediate master) should be relocated under the new intermediate master, not under the masterInstance.

This is how the second example should look after the failover:
[screenshot: screen shot 2017-01-30 at 23 52 26]

rep4 replicates from rep3, but rep5 replicates from rep4, not from rep3.

@shlomi-noach
Collaborator

@theTibi I'm surprised and confused that you find the above logic to be incorrect, and I think it all comes down to the "have all the data" issue. To recap your example, you have:

rep1
+ rep2
+ rep3
  + rep4
  + rep5

you kill rep3 and orchestrator recovers into:

rep1
+ rep2
  + rep4
  + rep5
+ rep3 (dead)

and announces rep2 as the successor.

I find this to be perfect behavior: rep2 took over rep4 and rep5. But you expect rep4 or rep5 to be announced as successor. Why?

What makes rep4 or rep5 "have more data"? How would you communicate that to orchestrator? It sounds to me like you have a very particular setup that I just can't anticipate.

Imagine in the above I'd tell you rep5 and rep2 have more data than rep4 and rep3. What would be the failover logic now? How can we even decide what's "more data"?

You have writable intermediate masters. What if you had different kinds of writable intermediate masters? Imagine:

rep1
+ rep2
+ rep3
  + rep4
  + rep5
+ rep6
  + rep7
  + rep8

How would you react to rep4 having special data A and rep7 having special data B?
What if the same applies to rep3 and rep6? No kind of failover would make sense now.

Your setup is what it is, but I just wanted to illustrate why I don't see how orchestrator should necessarily prefer one method over the other.

In your expectation to get

rep1
+ rep3
  + rep2 (dead)
  + rep4
    +rep5

there are hidden assumptions. First and foremost, that rep4 and rep5 have a different kind of data than the master and every other server. orchestrator doesn't know that in advance. Then again, that both will always have binary logs and log_slave_updates. orchestrator does know that and acts accordingly, but do you always promise that? Then, that both will always be able to replicate from one another. What happens if you upgrade rep4 to 5.7 and at the time of the IM crash it turns out to be the most up-to-date? rep5 will not be able to replicate from rep4. What then? You may say this isn't the case right now, but orchestrator is made to be aware of such limitations, and would expect to be able to move rep5 under rep3.

If orchestrator is to support such topologies, where replicas have different data than the master, that would require some rewrites. The easiest path I can think of is to have the user tag servers with a type. So, by default, all servers would be type A. But if a bunch of servers have different data, you would tag them as type B. But then you'd also have to specify the rules. What if there's also some type C? Can type B replicate from type C? Only you know for sure, and you'd have to somehow communicate that to orchestrator. I'm not sure how this would be done.

I hope I managed to clarify the complexity of your setup and of the specific failover expectation you have for it. I'm happy to continue the discussion and to perhaps discuss simple and feasible solutions to all these questions.

@shlomi-noach
Collaborator

cc @jfg956 and @sjmudd who may have a similar use case: a writable intermediate master, where a failover should only run within the scope of replicas of the failed IM.

@theTibi
Author

theTibi commented Jan 31, 2017

@shlomi-noach Thank you for your answer.
This is how I can imagine this:

rep1
+ rep2
+ rep3
  + rep4
  + rep5

So rep3 is an intermediate master which may have different data. If rep3 dies, I would hope for the following topology:

rep1
+ rep2
+ rep3 (died)
+ rep4
  + rep5

rep4 moves under rep1 because rep3 was replicating from rep1, and rep5 should replicate from rep4 because rep4 is going to be the new intermediate master, which has all the same data as rep3.

Another example:

rep1
+ rep2
+ rep3
  + rep4
  + rep5
+ rep6
  + rep7
  + rep8

If rep4 has data B but no other server has it (because nothing replicates from rep4), then if rep4 dies that data is simply not available anymore, and Orchestrator does not have to do anything.

If rep6 dies:

rep1
+ rep2
+ rep3
  + rep4
  + rep5
+ rep6 (died)
+ rep7
  + rep8

Same as in the first case: rep7 replicates from rep1 and rep8 replicates from rep7.

Other example:

rep1
+ rep2
+ rep3
  + rep4
  + rep5
+ rep6
  + rep7
    + rep9
    + rep10
  + rep8

If rep7 dies:

rep1
+ rep2
+ rep3
  + rep4
  + rep5
+ rep6
  + rep7 (died)
  + rep9
    + rep10
  + rep8

rep9 is replicating from rep6 and rep10 is replicating from rep9.

So in my opinion, if there are intermediate masters and they have slaves, then when an intermediate master fails Orchestrator should promote one server as the new intermediate master and relocate the other slaves under that new intermediate master.

Just like when you have one master and many slaves: if the master dies, one slave is promoted and the other slaves are relocated under the new master.

But of course it has some limitations. For example, say rep7 has a different schema than rep6 and rep8; after rep6 dies, Orchestrator promotes rep7, and now rep8 can't replicate from rep7. In that case it would fail, but there is an option called RecoveryIgnoreHostnameFilters (I have not used it before); if I understand it right, adding rep7 there makes Orchestrator ignore that hostname and not promote it in case of failover.

So I understand Orchestrator cannot handle all the different topologies and does not know which server has which data, etc., but I think it would already help a lot if Orchestrator relocated the slaves under the new intermediate master in case of failover.

Opinions?

@shlomi-noach
Collaborator

shlomi-noach commented Jan 31, 2017

if there are intermediate masters and they have slaves, then when an intermediate master fails Orchestrator should promote one server as the new intermediate master and relocate the other slaves under that new intermediate master.

You must be aware that this is not always possible. Consider:

rep1
+ rep2
+ rep3
  + rep4
  + rep5

rep3 dies. You want this:

rep1
+ rep2
+ rep3 (died)
+ rep4
  + rep5

However, can you ensure that both rep4 and rep5 have log_slave_updates? What if not?
What if you've just upgraded rep4 to 5.7, and it turns out to be more up-to-date than rep5 at the time of the crash? I just want you to be aware of scenarios where one of the replicas cannot take charge of all its siblings.
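A quick way to sanity-check those prerequisites on the candidate replicas is something like this (a sketch; host names and connection options are illustrative, and it only covers log_slave_updates and the server version):

```bash
# Verify that each candidate replica logs replicated events, and note its version;
# host names and connection options are illustrative.
for host in rep4 rep5; do
  echo -n "${host}: "
  mysql -h "${host}" -N -e "SELECT @@global.log_slave_updates, @@version;"
done
```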

@theTibi
Author

theTibi commented Jan 31, 2017

@shlomi-noach I understand. Orchestrator is not a magic wand that solves all problems. Of course, if you would like to use topologies like this, you should know the requirements and limitations.

For example, you have to make sure log_slave_updates is enabled.

The 5.7 concern applies to any topology: if you have one master and many slaves and one of them is 5.7, there is a risk Orchestrator will promote it in case of failover, if I am right. So if you are testing/upgrading 5.7, you should put that node into downtime to make sure Orchestrator won't promote it.

@shlomi-noach
Collaborator

OK, I'm getting a clearer picture of how this would be implemented in code.
The user, however, would have the burden of appropriately tagging those intermediate masters (and subsequently, their nested replicas) which have "special data" that is incompatible with the rest of the topology.
In such sub-topologies the logic would be to first attempt to promote a replica from within the sub-topology, and only then try to attach it outside. Any replicas "lost in action" would connect to the master. The replica that takes over its siblings would be the successor host.

@theTibi
Author

theTibi commented Jan 31, 2017

@shlomi-noach In your previous post you mentioned the user tagging servers with a type.
I was thinking about this; it could be a bit similar to how Orchestrator handles the different DCs. With a regex we could "tag" the sub-topologies, and Orchestrator could try to promote a server from within that sub-topology and relocate the replicas under that server.

@shlomi-noach
Collaborator

@theTibi I'm moving away from hostname regexes and onto something more formal.

@sjmudd
Collaborator

sjmudd commented Feb 13, 2017

I'm in the middle of a split and looking at this now, partly to ensure an intermediate master with filters won't ever get promoted and that normal slaves won't get put under these filtered intermediate masters. I think, but need to check, that the current logic prevents this. As Shlomi says, things can get quite hairy, and my topology is already 6 layers deep.
I also agree that hostname-based management starts to get quite hairy as the number of boxes and chains grows, and I have also been bitten by incomplete regexps allowing recovery of boxes where I wanted to avoid it. A better way to handle this would certainly be good, but it's a tricky problem to solve generically.

So the trick here seems to be to make orchestrator aware of these barriers/borders and to ensure that failover never moves boxes outside of their zones.

@theTibi
Author

theTibi commented Mar 13, 2017

Hi, I was just wondering whether there are any new thoughts on this topic?

@shlomi-noach
Collaborator

Hi @theTibi I haven't made progress on this unfortunately, was not focusing on this issue.

@shlomi-noach
Collaborator

@theTibi I'm looking into this now.

@shlomi-noach
Collaborator

So the most basic question is:

We make the promotion; the new intermediate master takes over as we would expect, etc.

How do we connect the new intermediate master back into the topology? The basic assumption is that the intermediate master took writes, hence it was potentially taking writes at the time of failure, hence its binary logs, and those of its replicas, are different from those of the rest of the topology.
Pseudo-GTID will fail to find a match.

@theTibi
Author

theTibi commented May 4, 2017

It is time to bring up this old ticket again.

I think GTID won't fail. Let's say we have the following replica set:

rep1
+ rep2
+ rep3
  + rep4
  + rep5

Rep3 dies:

rep1
+ rep2
+ rep3 (died)
+ rep4
  + rep5

If we are using GTID, I think rep4 can replicate from rep1 and we can also move rep5 under rep4. But I have to test this, and I am going to do so and give you feedback soon.
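For what it's worth, the manual equivalent of that repointing, assuming GTID with auto-positioning is enabled everywhere (host names and connection options here are illustrative), would be roughly:

```bash
# Attach rep4 to rep1, making rep4 the new intermediate master:
mysql -h rep4 -e "STOP SLAVE;
  CHANGE MASTER TO MASTER_HOST='rep1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1;
  START SLAVE;"

# Move rep5 under the new intermediate master rep4:
mysql -h rep5 -e "STOP SLAVE;
  CHANGE MASTER TO MASTER_HOST='rep4', MASTER_PORT=3306, MASTER_AUTO_POSITION=1;
  START SLAVE;"
```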
