correlating relay logs #32

Merged: 17 commits, Jan 8, 2017

shlomi-noach
Collaborator

Following up on #1, with the objective of aligning replicas at failover.

At this time this will cooperate with orchestrator-agent at github/orchestrator-agent#13, though the same can be accomplished via remote SSH.

When a master fails, orchestrator is able to use GTID/Pseudo-GTID to match replicas. However, there are a few constraints: what if the most up-to-date replica doesn't have binary logging or log-slave-updates enabled? What if it uses ROW-based replication while all others use STATEMENT-based? What if it runs a higher MySQL version? orchestrator would have to give that replica up, even though it contains more data than the others.
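The constraints above can be read as a single eligibility check. The following is a minimal sketch, not orchestrator's implementation: the `Replica` type, its fields, the numeric `Version` encoding, and `BinlogMatchBlocker` are all hypothetical names introduced for illustration, and the format and version rules are deliberately simplified.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified view of a replica's configuration.
// Field names are illustrative and are not orchestrator's own.
type Replica struct {
	Host            string
	LogBin          bool   // binary logging enabled
	LogSlaveUpdates bool   // log-slave-updates enabled
	BinlogFormat    string // "ROW", "STATEMENT" or "MIXED"
	Version         int    // hypothetical numeric encoding, e.g. 56 for 5.6
}

// BinlogMatchBlocker returns a non-empty reason when the other replicas cannot
// be repointed under r by GTID/Pseudo-GTID matching on r's binary logs.
// An empty string means none of the three constraints applies.
func BinlogMatchBlocker(r Replica, others []Replica) string {
	if !r.LogBin || !r.LogSlaveUpdates {
		return "no binary logs of replicated events"
	}
	for _, o := range others {
		if r.BinlogFormat == "ROW" && o.BinlogFormat == "STATEMENT" {
			return "ROW-based replica among STATEMENT-based replicas"
		}
		if r.Version > o.Version {
			return "higher MySQL version than " + o.Host
		}
	}
	return ""
}

func main() {
	others := []Replica{
		{Host: "r2", LogBin: true, LogSlaveUpdates: true, BinlogFormat: "STATEMENT", Version: 56},
	}
	best := Replica{Host: "r1", LogBin: true, LogSlaveUpdates: false, BinlogFormat: "STATEMENT", Version: 56}
	if reason := BinlogMatchBlocker(best, others); reason != "" {
		fmt.Printf("%s needs relay-log sync: %s\n", best.Host, reason)
	}
}
```

A replica that is blocked in this sense is exactly the one whose relay logs are worth copying rather than discarding.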

This PR follows the MHA approach, which requires either remote agents on the MySQL boxes or remote SSH. The intention is to correlate relay logs between failed replicas (done, and optimized for speed), then copy and apply such logs from the most up-to-date replica onto a (single) candidate replica.

Why single? Because the candidate replica would have log-slave-updates and orchestrator would be able to point all other replicas under that one. It is yet to be seen whether comparing and copying relay logs onto a single replica, then applying Pseudo-GTID/GTID logic to heal the rest of the replicas, is faster or slower than comparing and copying relay logs from the most up-to-date replica onto all other replicas.
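The single-candidate plan described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Coordinates`, `Replica`, `Plan`, and `PlanRelayLogSync` are hypothetical names, "most up-to-date" is taken to mean the furthest master coordinates a replica has read, picking the most advanced replica with log-slave-updates as the candidate is an assumption, and log file names are assumed to be zero-padded so that string comparison orders them correctly.

```go
package main

import (
	"errors"
	"fmt"
)

// Coordinates is a hypothetical (file, position) pair in the master's binary logs.
type Coordinates struct {
	LogFile string
	LogPos  int64
}

// Less orders coordinates by file name, then position. It assumes zero-padded
// file names of equal length (e.g. mysql-bin.000012).
func (c Coordinates) Less(o Coordinates) bool {
	if c.LogFile != o.LogFile {
		return c.LogFile < o.LogFile
	}
	return c.LogPos < o.LogPos
}

// Replica is a simplified, hypothetical view of a replica at failover time.
type Replica struct {
	Host            string
	ReadMaster      Coordinates // how far into the master's logs this replica has read
	LogSlaveUpdates bool
}

// Plan names the replica to copy relay logs from and the one to apply them on.
type Plan struct {
	Source    string
	Candidate string
	NeedsCopy bool // false when the candidate is already as advanced as the source
}

// PlanRelayLogSync picks the most up-to-date replica as the source and the most
// advanced replica with log-slave-updates as the single candidate.
func PlanRelayLogSync(replicas []Replica) (Plan, error) {
	if len(replicas) == 0 {
		return Plan{}, errors.New("no replicas")
	}
	var source, candidate *Replica
	for i := range replicas {
		r := &replicas[i]
		if source == nil || source.ReadMaster.Less(r.ReadMaster) {
			source = r
		}
		if r.LogSlaveUpdates && (candidate == nil || candidate.ReadMaster.Less(r.ReadMaster)) {
			candidate = r
		}
	}
	if candidate == nil {
		return Plan{}, errors.New("no replica with log-slave-updates")
	}
	return Plan{
		Source:    source.Host,
		Candidate: candidate.Host,
		NeedsCopy: candidate.ReadMaster.Less(source.ReadMaster),
	}, nil
}

func main() {
	plan, err := PlanRelayLogSync([]Replica{
		{Host: "r1", ReadMaster: Coordinates{"mysql-bin.000012", 900}, LogSlaveUpdates: false},
		{Host: "r2", ReadMaster: Coordinates{"mysql-bin.000012", 400}, LogSlaveUpdates: true},
	})
	fmt.Println(plan, err)
}
```

Once the candidate has applied the copied relay logs, the remaining replicas would be matched under it with the usual GTID/Pseudo-GTID logic.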

Initial commits in this PR provide a heuristic search for relay log coordinates and entries, which turns relay-log correlation into a subsecond operation when run within 1 minute of the failure.
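The idea behind the heuristic, starting the search from recently recorded coordinates instead of from the beginning of the relay logs, can be sketched as below. This is a simplified illustration and not the PR's code: `Coordinates`, `Entry`, and `FindLastEntry` are hypothetical, and an in-memory sorted slice stands in for reading relay log events sequentially from a server.

```go
package main

import (
	"fmt"
	"sort"
)

// Coordinates is a hypothetical (file, position) pair in the relay logs.
type Coordinates struct {
	LogFile string
	LogPos  int64
}

// Less orders coordinates by file name, then position (zero-padded names assumed).
func (c Coordinates) Less(o Coordinates) bool {
	if c.LogFile != o.LogFile {
		return c.LogFile < o.LogFile
	}
	return c.LogPos < o.LogPos
}

// Entry is a hypothetical relay log entry.
type Entry struct {
	Coordinates Coordinates
	Statement   string
}

// FindLastEntry returns the last entry and how many entries had to be read.
// entries must be sorted by coordinates. When lastKnown is set, reading starts
// at the first entry at or after it; a hint beyond the data falls back to a
// full scan.
func FindLastEntry(entries []Entry, lastKnown *Coordinates) (last Entry, scanned int, found bool) {
	start := 0
	if lastKnown != nil {
		start = sort.Search(len(entries), func(i int) bool {
			return !entries[i].Coordinates.Less(*lastKnown)
		})
		if start == len(entries) {
			start = 0 // stale or wrong hint: scan everything
		}
	}
	for i := start; i < len(entries); i++ {
		last = entries[i]
		scanned++
		found = true
	}
	return last, scanned, found
}

func main() {
	entries := []Entry{
		{Coordinates{"relay-bin.000001", 4}, "a"},
		{Coordinates{"relay-bin.000001", 120}, "b"},
		{Coordinates{"relay-bin.000002", 4}, "c"},
		{Coordinates{"relay-bin.000002", 310}, "d"},
		{Coordinates{"relay-bin.000002", 560}, "e"},
	}
	hint := Coordinates{"relay-bin.000002", 310}
	last, scanned, _ := FindLastEntry(entries, &hint)
	fmt.Println(last.Coordinates, scanned)
}
```

The more recent the recorded coordinates, the fewer entries remain to be read, which is consistent with the search being fast shortly after a failure.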

Shlomi Noach added 3 commits January 2, 2017 09:12
initial commit: using last known relay log coordinates to begin search for last relaylog entry
@@ -1261,7 +1261,7 @@ func FindLastPseudoGTIDEntry(instance *Instance, recordedInstanceRelayLogCoordin
}

minBinlogCoordinates, minRelaylogCoordinates, err := GetHeuristiclyRecentCoordinatesForInstance(&instance.Key)
-if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {
+if instance.LogBinEnabled && instance.LogSlaveUpdatesEnabled && !*config.RuntimeCLIFlags.SkipBinlogSearch && (expectedBinlogFormat == nil || instance.Binlog_format == *expectedBinlogFormat) {
Collaborator Author

This change is actually unrelated to this PR. However, the need for it came at a good time, and it makes sense to include it here.

@shlomi-noach changed the title from "WIP: correlating relay logs on failover" to "correlating relay logs on failover" on Jan 3, 2017
@shlomi-noach changed the title from "correlating relay logs on failover" to "correlating relay logs" on Jan 3, 2017
@shlomi-noach
Collaborator Author

Originally, this PR was meant to go the whole way to relaylog-sync on failover.
However, the functionality at this time is substantial enough, and can (and should) be tested. Once tested to work, I suggest merging this and opening a new PR to take it from here.

@shlomi-noach
Collaborator Author

Merging for now; work continues in other PRs.
