correlating relay logs #32
initial commit: using last known relay log coordinates to begin search for last relaylog entry
… destination servers
@@ -1261,7 +1261,7 @@ func FindLastPseudoGTIDEntry(instance *Instance, recordedInstanceRelayLogCoordin
 	}

 	minBinlogCoordinates, minRelaylogCoordinates, err := GetHeuristiclyRecentCoordinatesForInstance(&instance.Key)
-	if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {
+	if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && !*config.RuntimeCLIFlags.SkipBinlogSearch && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {
This change is actually unrelated to this PR. However the need for it came at a good time and it makes sense to include it here.
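To make the effect of the added flag easier to see, the condition in the diff can be read as a standalone predicate. This is only a sketch: the `Instance` struct below is a hypothetical, trimmed-down stand-in for the real type, `canSearchBinlogs` is a name invented for illustration, and the `skipBinlogSearch` parameter stands in for `*config.RuntimeCLIFlags.SkipBinlogSearch`.

```go
package main

import "fmt"

// Instance is a hypothetical stand-in holding only the fields
// that the condition in the diff refers to.
type Instance struct {
	LogBinEnabled          bool
	LogSlaveUpdatesEnabled bool
	Binlog_format          string
}

// canSearchBinlogs (hypothetical helper) mirrors the new condition:
// binary logs are searched only when the instance writes them, the
// skip flag is not set, and the binlog format matches the expected
// one (a nil expectation matches any format).
func canSearchBinlogs(instance *Instance, skipBinlogSearch bool, expectedBinlogFormat *string) bool {
	if skipBinlogSearch {
		return false
	}
	if !instance.LogBinEnabled || !instance.LogSlaveUpdatesEnabled {
		return false
	}
	return expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat
}

func main() {
	row := "ROW"
	replica := &Instance{LogBinEnabled: true, LogSlaveUpdatesEnabled: true, Binlog_format: "ROW"}

	// Flag unset: the binlog search is allowed.
	fmt.Println(canSearchBinlogs(replica, false, &row))
	// Flag set: the binlog search is skipped regardless of instance settings.
	fmt.Println(canSearchBinlogs(replica, true, &row))
}
```

With the flag set, the predicate is false for every instance, so the surrounding code would presumably fall through to whatever handles instances that cannot be searched via binary logs.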
POC for AlignViaRelaylogCorrelation()
originally this PR was meant to go the whole way to relaylog-sync on failover.
…hestrator into failover-correlate-relay-logs
merging for now, continued work in other PRs
Following up on #1, with the objective of aligning replicas at failover.

At this time this will cooperate with orchestrator-agent at github/orchestrator-agent#13, though the same can be accomplished via remote SSH.

When a master fails, orchestrator is able to use GTID/Pseudo-GTID to match replicas. However, there are a few constraints: what if the most up-to-date replica doesn't have binary logging or log-slave-updates? What if it uses ROW based replication where all others use STATEMENT based? What if it's of a higher MySQL version? orchestrator would have to lose it, even though it contained more data than the others.

This PR follows the MHA approach, which requires either remote agents on the MySQL boxes or remote SSH. The intention is to correlate relay logs between the failed master's replicas (done, optimized for speed), then copy & apply such logs from the most up-to-date replica onto a (single) candidate replica.
Why single? Because the candidate replica would have log-slave-updates, and orchestrator would be able to point all other replicas under that one. It is yet to be seen whether comparing and copying relay logs onto a single replica, then applying Pseudo-GTID/GTID logic to heal the rest of the replicas, is faster or slower than comparing and copying relay logs from the most up-to-date replica onto all other replicas.

Initial commits in this PR provide heuristic search for relay log coordinates & entries, which turns relay-log correlation into a subsecond operation within 1 minute of the failure.
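
The selection step described above (find the most up-to-date replica, then a single candidate that has log-slave-updates) could look roughly like the sketch below. It is not code from this PR: `Replica`, `Coordinates`, `MasterCoordinates` and `planAlignment` are hypothetical names, and the sketch assumes that each replica reports how far into the failed master's binary logs it has read and that log file names share a fixed-width numeric suffix, so comparing them as strings orders them correctly.

```go
package main

import "fmt"

// Coordinates is a position in the failed master's binary logs
// (hypothetical type for this sketch).
type Coordinates struct {
	LogFile string
	LogPos  int64
}

// Less reports whether c is strictly behind other. File names are
// compared as strings, which assumes fixed-width numeric suffixes.
func (c Coordinates) Less(other Coordinates) bool {
	if c.LogFile != other.LogFile {
		return c.LogFile < other.LogFile
	}
	return c.LogPos < other.LogPos
}

// Replica is a hypothetical view of one replica of the failed master.
type Replica struct {
	Name                   string
	LogBinEnabled          bool
	LogSlaveUpdatesEnabled bool
	MasterCoordinates      Coordinates // how far this replica got in the master's binlogs
}

// planAlignment returns the most up-to-date replica (the source of relay
// logs) and the most advanced replica that has binary logging and
// log-slave-updates (the single candidate to promote). Either result is
// nil when no replica qualifies.
func planAlignment(replicas []Replica) (source, candidate *Replica) {
	for i := range replicas {
		r := &replicas[i]
		if source == nil || source.MasterCoordinates.Less(r.MasterCoordinates) {
			source = r
		}
		if r.LogBinEnabled && r.LogSlaveUpdatesEnabled &&
			(candidate == nil || candidate.MasterCoordinates.Less(r.MasterCoordinates)) {
			candidate = r
		}
	}
	return source, candidate
}

func main() {
	replicas := []Replica{
		{Name: "r1", MasterCoordinates: Coordinates{"mysql-bin.000007", 900}},
		{Name: "r2", LogBinEnabled: true, LogSlaveUpdatesEnabled: true, MasterCoordinates: Coordinates{"mysql-bin.000007", 500}},
		{Name: "r3", LogBinEnabled: true, LogSlaveUpdatesEnabled: true, MasterCoordinates: Coordinates{"mysql-bin.000007", 300}},
	}
	source, candidate := planAlignment(replicas)
	if source != nil && candidate != nil && candidate.MasterCoordinates.Less(source.MasterCoordinates) {
		fmt.Printf("copy relay logs from %s to %s, then repoint the others under %s\n",
			source.Name, candidate.Name, candidate.Name)
	}
}
```

In this example the most advanced replica, r1, has no binary logs, so it only serves as the source of relay logs. The candidate, r2, receives the missing entries and the remaining replicas can then be matched under it via GTID/Pseudo-GTID.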