operational question: how does one decide whether a _gho or _ghc are abandoned? #99

shlomi-noach · 2016-07-22T17:47:54Z

With trigger-based tools, the absence of triggers is an indication that the operation is dead. How does one recognize that a triggerless gh-ost operation is dead?

Once closed, this question should move into the documentation.


shlomi-noach · 2016-07-22T17:51:00Z

The quick answer is to select max(last_update) from _whatever_ghc. An active gh-ost operation routinely updates that table (within the second), even while throttling. So if max(last_update) is, say, a minute old, the migration is dead. Where you check matters: on the master the result is definitive; on a replica, first make sure replication lag is OK.

jonahberquist · 2016-07-22T18:01:29Z

I think abandoned and dead are two different states. With trigger-based migration tools, if the triggers are gone, we have to give up and start over, but that's not the case here. If the gh-ost process dies, we could theoretically resume the operation, as long as we know where to resume in the table copy process, know where to resume in the binary log for incoming changes, and have the binary logs to resume from.

shlomi-noach · 2016-07-24T10:57:35Z

The official query to see if a migration is running is:

select last_update from _tbl_ghc where hint='heartbeat';

(to get last known activity), or

select last_update > now() - interval 1 minute as is_alive from _tbl_ghc where hint='heartbeat';

as a heuristic: "if it hasn't been updated in the last minute, it must be dead".
cc @tomkrouper
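
For anyone scripting this check, here is a minimal Python sketch of the same one-minute heuristic, applied to a `last_update` value that has already been fetched with the query above. The function name `is_alive` and the `replica_lag` allowance are hypothetical and not part of gh-ost; the allowance is just one way to account for the replica caveat mentioned earlier in the thread.

```python
from datetime import datetime, timedelta


def is_alive(last_update, now, threshold=timedelta(minutes=1),
             replica_lag=timedelta(0)):
    """Return True if the heartbeat is recent enough to call the migration alive.

    Mirrors `last_update > now() - interval 1 minute`.
    replica_lag is a hypothetical allowance: a heartbeat read from a lagging
    replica is at least that stale, so the window is widened by the lag.
    """
    return now - last_update < threshold + replica_lag


now = datetime(2016, 7, 24, 11, 0, 0)
print(is_alive(now - timedelta(seconds=1), now))   # fresh heartbeat
print(is_alive(now - timedelta(minutes=5), now))   # stale heartbeat
```

If the check runs against a replica, passing the measured lag as `replica_lag` avoids declaring a healthy migration dead merely because the replica is behind.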

shlomi-noach · 2016-07-24T10:59:34Z

> If the gh-ost process dies, we could theoretically resume the operation, as long as we know where to resume in the table copy process, know where to resume in the binary log for incoming changes, and have the binary logs to resume from.

omg.

So, I've been thinking about this: we actually only need to know where to resume in the table-copy process. It turns out (and I need to put this in detailed writing) that replaying RBR (row-based replication) events is idempotent! This means we can just replay them from some point in the past, but of course we must never skip entries.

But, I would suggest this is still way in the future.
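
To illustrate the idempotency claim, here is a toy Python sketch. It assumes each event carries a full row image keyed by primary key, so a write overwrites the row and a delete removes it by key. The event format and the `apply_events` helper are hypothetical and do not reflect gh-ost's actual internals.

```python
def apply_events(table, events):
    """Apply row-based events, in order, to a dict keyed by primary key."""
    for kind, pk, row in events:
        if kind == "write":       # insert or update: full row image overwrites
            table[pk] = row
        elif kind == "delete":    # delete by key; a missing key is a no-op
            table.pop(pk, None)
    return table


log = [
    ("write", 1, {"name": "a"}),
    ("write", 2, {"name": "b"}),
    ("write", 1, {"name": "a2"}),
    ("delete", 2, None),
    ("write", 3, {"name": "c"}),
]

# Reference: the whole log applied exactly once.
once = apply_events({}, log)

# Crash after four events, then replay from an earlier point (index 1).
resumed = apply_events({}, log[:4])
resumed = apply_events(resumed, log[1:])

# Crash after three events, then resume too late (index 4), skipping the delete.
skipped = apply_events({}, log[:3])
skipped = apply_events(skipped, log[4:])

print(resumed == once)
print(skipped == once)
```

In this model, replaying overlapping entries is harmless because the last event for each key decides its final value. Skipping an entry is not, which is why the replay point may be too early but never too late.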

pbitty · 2016-08-19T21:48:57Z

The above sounds great. Having the ability to resume would be a great feature.

Now that binlog events are applied in a transaction, it should be possible to confidently store and read the last-processed binlog position. The same could be done with the table copy process, if the statements are also wrapped in a transaction.

shlomi-noach added question documentation labels Jul 22, 2016

jonahberquist mentioned this issue Aug 27, 2016

Allow a migration to resurrect under a new gh-ost process #205

Open

s4mur4i mentioned this issue Sep 25, 2020

[Question] Drop of _ghc table after succesful completion #886

Open
