
--test-on-replica should not write to binlogs #646

Open
Jericon opened this issue Sep 27, 2018 · 6 comments

Jericon commented Sep 27, 2018

In our environment, we do not use GTIDs, and we run most clusters in an Active/Passive Master/Master configuration. gh-ost's --test-on-replica currently assumes the host is a leaf node with no replicas, yet it still writes the migration's changes to the binlog.

It would be beneficial to have an additional flag that suppresses writing to the binlog.

The specific situation I am in: I am compressing some large tables. Based on small tests and estimates, we should have enough disk space to complete the compression of one table. I had intended to run the migration on the passive master, which takes no traffic and where running out of disk space would have no real impact. Because the migration is written to the binlogs, though, it would cause the active master to run out of space as well.
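For illustration, the MySQL mechanism such a flag would most likely lean on is the session variable sql_log_bin. A minimal sketch of the idea, not an existing gh-ost feature; the table name and compression settings are made up for this example:

```sql
-- Sketch only: suppressing binlog writes for the current session.
-- Requires SUPER (or SESSION_VARIABLES_ADMIN on MySQL 8.0).
SET SESSION sql_log_bin = 0;

-- Hypothetical migration work on the test replica; with sql_log_bin = 0
-- these statements never reach the binary log, so they cannot replicate
-- to the active master or any downstream replicas.
ALTER TABLE mydb.big_table ROW_FORMAT = COMPRESSED, KEY_BLOCK_SIZE = 8;

SET SESSION sql_log_bin = 1;
```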

ggunson commented Sep 27, 2018

Note: @Jericon and I discussed this earlier.

From a GTID standpoint, even if --test-on-replica were run on a leaf node that was later promoted to master, its gtid_executed/gtid_purged would include the events from those local writes to tablename_gho, but its new replicas wouldn't have them (requiring some wrangling of gtid_purged to fix replication).

I can see the usefulness of being able to test on a replica both with and without writing to the binary logs, but we've listed several scenarios where it would be better to have the option not to.
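To make the promotion hazard concrete, here is a sketch of how the errant transactions would show up, assuming GTID mode. GTID_SUBTRACT returns whatever the first set contains that the second does not; the UUID and range below are placeholders:

```sql
-- Run on the leaf node used for --test-on-replica. The literal set is a
-- placeholder for the gtid_executed value copied from another server in
-- the topology. A non-empty result is the errant set: transactions this
-- node has (the local _gho writes) that its future replicas never saw.
SELECT GTID_SUBTRACT(
         @@GLOBAL.gtid_executed,
         '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5000'
       ) AS errant_transactions;
```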

@shlomi-noach

Related: #146, #149, #254

zmoazeni commented Jun 10, 2019

@shlomi-noach I read #254 (as well as the rest of the issues) and wanted to verify: is the approach we should take here to reset the GTID purged/executed sets on the replica, effectively doctoring the set of applied GTIDs as if the test never happened? (Only when the option is passed.)

For what it's worth, I ran into this same issue in production over a year ago: we promoted a replica, and the other replicas were blocked from starting replication. We realized after hours of digging that it was caused by our replica-only tests from long before.

@shlomi-noach

@zmoazeni yes, correct.
It's noteworthy that since then I've put a lot of GTID logic into orchestrator and have learned quite a few things myself. So yes, you will probably want to run a gtid_purged/RESET MASTER sequence (which is hard to like), or alternatively (which I like even less) apply those errant transactions on the master. I really can't recommend the latter for an operation as massive as a migration, given the sheer number of errant GTID transactions involved.
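For reference, a sketch of that reset sequence, run on the server holding the errant transactions. The GTID set is a placeholder; in practice it would be the server's real gtid_executed minus the errant set computed earlier:

```sql
STOP SLAVE;    -- STOP REPLICA on MySQL 8.0.22+
RESET MASTER;  -- discards local binlogs, clears gtid_executed/gtid_purged
-- Re-seed the GTID state without the errant transactions (placeholder set):
SET GLOBAL gtid_purged = '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5000';
START SLAVE;
```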

zmoazeni commented Jun 10, 2019

Yeah, we ended up doing the latter with a one-off script. But it did make us nervous.

shlomi-noach commented Jun 10, 2019

The latter (applying errant GTIDs on the master) is actually safer. However:

  • it is slower, because you may need to apply millions of transactions;
  • it bloats the gtid_executed set on all servers.
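For completeness, the usual way to "apply" an errant transaction on the master without replaying its data is to inject an empty transaction per errant GTID; a sketch with a placeholder GTID:

```sql
-- Claim one errant GTID on the master by committing an empty transaction;
-- it then replicates everywhere and joins every server's gtid_executed.
-- A migration can leave millions of these, hence the slowness and bloat.
SET GTID_NEXT = '3e11fa47-71ca-11e1-9e33-c80aa9429562:5001';
BEGIN;
COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
```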

zmoazeni mentioned this issue Jun 22, 2019