correlating relay logs #32

shlomi-noach · 2017-01-02T07:47:48Z

Following up on #1, with the objective of aligning replicas at failover.

At this time this will cooperate with orchestrator-agent at github/orchestrator-agent#13, though the same can be accomplished via remote SSH.

When the master fails, orchestrator is able to use GTID/Pseudo-GTID to match replicas. However, there are a few constraints: what if the most up-to-date replica doesn't have binary logging or log-slave-updates enabled? What if it uses ROW based replication where all others use STATEMENT based? What if it runs a higher MySQL version? orchestrator would have to lose it, even though it contains more data than the others.

This PR follows the MHA approach, which requires either remote agents on the MySQL boxes or remote SSH. The intention is to correlate relay logs between the replicas of the failed master (done, optimized for speed), then copy & apply such logs from the most up-to-date replica onto a (single) candidate replica.

Why single? Because the candidate replica would have log-slave-updates and orchestrator would be able to point all other replicas under that one. It is yet to be seen whether comparing and copying relay logs onto a single replica, then applying Pseudo-GTID/GTID logic to heal the rest of the replicas, is faster or slower than comparing and copying relay logs from the most up-to-date replica onto all other replicas.
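The single-candidate flow described above can be sketched roughly as follows. This is an illustrative sketch only, not code from this PR: `Coordinates`, `Replica`, `Plan` and `PlanAlignment` are hypothetical names, and comparing log file names as strings assumes the usual equal-width numbered naming (e.g. `mysql-bin.000007`).

```go
package main

import (
	"errors"
	"fmt"
)

// Coordinates is a position in the failed master's binary logs, as read by a replica.
type Coordinates struct {
	LogFile string
	LogPos  int64
}

// Less reports whether c is strictly behind other.
// Assumption: file names share a prefix and a fixed-width numeric suffix,
// so plain string comparison orders them correctly.
func (c Coordinates) Less(other Coordinates) bool {
	if c.LogFile != other.LogFile {
		return c.LogFile < other.LogFile
	}
	return c.LogPos < other.LogPos
}

// Replica is a hypothetical, trimmed-down view of a replica of the failed master.
type Replica struct {
	Name            string
	ReadCoordinates Coordinates // how far this replica got in reading the master's logs
	LogSlaveUpdates bool
}

// Plan names the replica to copy relay logs from, the single candidate to
// apply them onto, and the replicas to point under the candidate afterwards.
type Plan struct {
	Source    *Replica
	Candidate *Replica
	Repoint   []*Replica
}

// PlanAlignment picks the most up-to-date replica as the relay log source and
// the most up-to-date replica with log-slave-updates as the candidate.
// If Source and Candidate are the same replica, no copy is needed.
func PlanAlignment(replicas []*Replica) (*Plan, error) {
	if len(replicas) == 0 {
		return nil, errors.New("no replicas")
	}
	plan := &Plan{}
	for _, r := range replicas {
		if plan.Source == nil || plan.Source.ReadCoordinates.Less(r.ReadCoordinates) {
			plan.Source = r
		}
		if r.LogSlaveUpdates && (plan.Candidate == nil || plan.Candidate.ReadCoordinates.Less(r.ReadCoordinates)) {
			plan.Candidate = r
		}
	}
	if plan.Candidate == nil {
		return nil, errors.New("no replica with log-slave-updates")
	}
	for _, r := range replicas {
		if r != plan.Candidate {
			plan.Repoint = append(plan.Repoint, r)
		}
	}
	return plan, nil
}

func main() {
	replicas := []*Replica{
		{Name: "a", ReadCoordinates: Coordinates{LogFile: "mysql-bin.000007", LogPos: 100}, LogSlaveUpdates: true},
		{Name: "b", ReadCoordinates: Coordinates{LogFile: "mysql-bin.000007", LogPos: 900}, LogSlaveUpdates: false},
	}
	plan, err := PlanAlignment(replicas)
	if err != nil {
		fmt.Println("cannot plan:", err)
		return
	}
	fmt.Printf("copy relay logs from %s onto %s\n", plan.Source.Name, plan.Candidate.Name)
	for _, r := range plan.Repoint {
		fmt.Printf("repoint %s under %s\n", r.Name, plan.Candidate.Name)
	}
}
```

The alternative mentioned above (copying relay logs from the most up-to-date replica onto all other replicas) would drop the `Candidate` selection and treat every other replica as a copy target; which of the two is faster is the open question.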

The initial commits in this PR provide a heuristic search for relay log coordinates & entries, which turns relay-log correlation into a subsecond operation, within 1 minute from failure.
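To illustrate the idea of starting the search from the last known relay log coordinates, here is a minimal sketch. It is not the implementation in this PR: `RelayLogEntry`, `FindLastPseudoGTID` and the in-memory slice of entries are assumptions made for the example, and the real gain would come from not having to read older relay log files at all.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Coordinates is a position in a relay log.
type Coordinates struct {
	LogFile string
	LogPos  int64
}

// Less reports whether c is strictly behind other (assumes equal-width file names).
func (c Coordinates) Less(other Coordinates) bool {
	if c.LogFile != other.LogFile {
		return c.LogFile < other.LogFile
	}
	return c.LogPos < other.LogPos
}

// RelayLogEntry is a hypothetical, simplified relay log event.
type RelayLogEntry struct {
	Coordinates Coordinates
	Statement   string
}

// scanBackwards returns the last entry whose statement contains marker.
func scanBackwards(entries []RelayLogEntry, marker string) (RelayLogEntry, bool) {
	for i := len(entries) - 1; i >= 0; i-- {
		if strings.Contains(entries[i].Statement, marker) {
			return entries[i], true
		}
	}
	return RelayLogEntry{}, false
}

// FindLastPseudoGTID looks for the last marker entry at or after hint (the
// last recorded coordinates). Only if nothing is found there does it fall
// back to the older entries. entries must be sorted by Coordinates.
func FindLastPseudoGTID(entries []RelayLogEntry, hint Coordinates, marker string) (RelayLogEntry, bool) {
	start := sort.Search(len(entries), func(i int) bool {
		return !entries[i].Coordinates.Less(hint)
	})
	if entry, ok := scanBackwards(entries[start:], marker); ok {
		return entry, true
	}
	return scanBackwards(entries[:start], marker)
}

func main() {
	entries := []RelayLogEntry{
		{Coordinates{"relay.000001", 100}, "drop view if exists `_pseudo_gtid_hint__1`"},
		{Coordinates{"relay.000001", 300}, "insert into t values (1)"},
		{Coordinates{"relay.000002", 50}, "drop view if exists `_pseudo_gtid_hint__2`"},
		{Coordinates{"relay.000002", 200}, "insert into t values (2)"},
	}
	hint := Coordinates{LogFile: "relay.000002", LogPos: 4}
	if entry, ok := FindLastPseudoGTID(entries, hint, "_pseudo_gtid_hint__"); ok {
		fmt.Printf("found at %s:%d\n", entry.Coordinates.LogFile, entry.Coordinates.LogPos)
	}
}
```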

shlomi-noach · 2017-01-02T08:29:30Z

go/inst/instance_topology.go

@@ -1261,7 +1261,7 @@ func FindLastPseudoGTIDEntry(instance *Instance, recordedInstanceRelayLogCoordin
 	}

 	minBinlogCoordinates, minRelaylogCoordinates, err := GetHeuristiclyRecentCoordinatesForInstance(&instance.Key)
-	if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {
+	if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && !*config.RuntimeCLIFlags.SkipBinlogSearch && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {


This change is actually unrelated to this PR. However the need for it came at a good time and it makes sense to include it here.
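For readers skimming the diff, the guard after this change can be read as the predicate below. This is a hypothetical restatement, not code from the PR: `shouldSearchBinaryLogs` and the trimmed-down `Instance` struct are made up for illustration, and the CLI flag is passed in as a plain boolean instead of being read from `config.RuntimeCLIFlags`.

```go
package main

import "fmt"

// Instance is a hypothetical subset of the fields the guard inspects.
type Instance struct {
	LogBinEnabled          bool
	LogSlaveUpdatesEnabled bool
	BinlogFormat           string
}

// shouldSearchBinaryLogs mirrors the condition in the diff: binary logs are
// searched only when the instance writes them with log-slave-updates, the
// skip flag is not set, and the binlog format matches the expected one
// (a nil expectation matches any format).
func shouldSearchBinaryLogs(instance Instance, skipBinlogSearch bool, expectedBinlogFormat *string) bool {
	if skipBinlogSearch {
		return false
	}
	if !instance.LogBinEnabled || !instance.LogSlaveUpdatesEnabled {
		return false
	}
	return expectedBinlogFormat == nil || instance.BinlogFormat == *expectedBinlogFormat
}

func main() {
	instance := Instance{LogBinEnabled: true, LogSlaveUpdatesEnabled: true, BinlogFormat: "ROW"}
	statement := "STATEMENT"
	fmt.Println(shouldSearchBinaryLogs(instance, false, nil))
	fmt.Println(shouldSearchBinaryLogs(instance, true, nil))
	fmt.Println(shouldSearchBinaryLogs(instance, false, &statement))
}
```

When the predicate is false, the search would presumably fall through to the relay logs, which is the path the rest of this PR optimizes.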

shlomi-noach · 2017-01-03T09:36:40Z

Originally this PR was meant to go the whole way to relaylog-sync on failover.
However the functionality at this time is substantial enough, and can (and should) be tested. Once tested to work, I suggest merging this, and opening a new PR to take it from here.

shlomi-noach · 2017-01-08T05:57:46Z

merging for now, continued work in other PRs

Shlomi Noach added 3 commits January 2, 2017 09:12

WIP: applying relay logs remotely

8a5acec

initial commit: using last known relay log coordinates to begin search for last relaylog entry

highly optimized search for relay log coordinates, both in origin and…

2eed0e2

… destination servers

skip binlog search for pseudo-gtid match

be13ca3

shlomi-noach deployed to production/github-mysql6 January 2, 2017 08:04 Active

shlomi-noach had a problem deploying to production/github-mysqlutil January 2, 2017 08:04 Failure

FindLastPseudoGTIDEntry uses cli flags directly

a2d46fd

shlomi-noach deployed to production/github-mysql6 January 2, 2017 08:07 Active

shlomi-noach had a problem deploying to production/github-mysqlutil January 2, 2017 08:07 Failure

removing excessive param

ebcd92d

shlomi-noach deployed to production/github-mysql6 January 2, 2017 08:28 Active

shlomi-noach had a problem deploying to production/github-mysqlutil January 2, 2017 08:28 Failure

optimized search for last pseudo-gtid relay log

7b49ba9

shlomi-noach deployed to production/github-mysql6 January 2, 2017 08:46 Active

shlomi-noach had a problem deploying to production/github-mysqlutil January 2, 2017 08:46 Failure

correlate-relaylog-pos outputs instance's relaylog coordinates

597b26d

shlomi-noach deployed to production/github-mysql6 January 2, 2017 08:56 Active

shlomi-noach had a problem deploying to production/github-mysqlutil January 2, 2017 08:56 Failure

Shlomi Noach added 3 commits January 2, 2017 10:58

fixed naming

734348b

support for HTTP POST for agents ;

0fc7ac5

POC for AlignViaRelaylogCorrelation()

fixed AlignViaRelaylogCorrelation call

997ffc4

shlomi-noach deployed to production/github-mysql6 January 2, 2017 12:53 Active

forcefully initializing http client

cde1715

shlomi-noach deployed to production/github-mysql6 January 2, 2017 12:59 Active

unmarshalling the JSON response

8dc8635

shlomi-noach deployed to production/github-mysql6 January 3, 2017 06:10 Active

Applying ChangeMasterTo on instance after applying relaylogs

00e47c7

shlomi-noach deployed to production/github-mysql6 January 3, 2017 06:36 Active

shlomi-noach deployed to production/github-mysqlutil January 3, 2017 08:16 Active

shlomi-noach changed the title ~~WIP: correlating relay logs on failover~~ correlating relay logs on failover Jan 3, 2017

shlomi-noach changed the title ~~correlating relay logs on failover~~ correlating relay logs Jan 3, 2017

Merge branch 'master' into failover-correlate-relay-logs

02a6aec

shlomi-noach deployed to production/github-mysql6 January 4, 2017 06:35 Active

Shlomi Noach added 2 commits January 5, 2017 08:28

clearing relaylog coordinates history on CHANGE MASTER TO

95362ff

Merge branch 'failover-correlate-relay-logs' of github.com:github/orc…

a149ad7

…hestrator into failover-correlate-relay-logs

shlomi-noach deployed to production/github-mysql6 January 5, 2017 06:30 Active

Auto-merged master into failover-correlate-relay-logs on deployment

73aa12e

shlomi-noach deployed to production/github-mysql6 January 5, 2017 09:41 Active

shlomi-noach merged commit 4b07d99 into master Jan 8, 2017

shlomi-noach deleted the failover-correlate-relay-logs branch January 8, 2017 05:57

This was referenced Jan 8, 2017

Failover correlate relay logs ssh #42

Merged

Failover remote ssh relaylogs #48

Merged

Binlog saving, parsing and transfer from failed server to new master #45

Open
