parsing mysqlbinlog output #3
Related: #4 |
I'm displeased at this time. In testing all seems to go well (see #4), but on production, parsing a true binary log file yields a different (smaller) number of statements compared to raw binlog parsing. |
Displease turns to satisfaction:
shlomi-noach@my-test-machine~$ /tmp/gh-osc --debug --mysql-basedir=/usr --mysql-datadir=/home/shlomi-noach/tmp/ --binlog-file=mysql-bin.012323 --internal-experiment=true 2> /dev/null > tmp/statements-gh4.sql
shlomi-noach@my-test-machine~$ /usr/bin/mysqlbinlog --verbose --base64-output=DECODE-ROWS --start-position=4 /home/shlomi-noach/tmp/mysql-bin.012323 | egrep "### (INSERT|UPDATE|DELETE)" | sed -e "s/### //" -e "s/INTO //" -e "s/FROM //" > tmp/statements-mbl4.sql
shlomi-noach@my-test-machine~$ wc -l tmp/statements-gh4.sql
2214601 tmp/statements-gh4.sql
shlomi-noach@my-test-machine~$ wc -l tmp/statements-mbl4.sql
2214601 tmp/statements-mbl4.sql
shlomi-noach@my-test-machine~$ diff tmp/statements-mbl4.sql tmp/statements-gh4.sql
shlomi-noach@my-test-machine~$ md5sum tmp/*4.sql
3ca149d5f53bbb83eb9467ee89adb0fe tmp/statements-gh4.sql
3ca149d5f53bbb83eb9467ee89adb0fe tmp/statements-mbl4.sql
So for now this is something we can work with, on production data |
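For anyone wanting to repeat the comparison without the shell pipeline, here is a rough Python sketch of the same egrep/sed filter. The function names and the sample lines are my own and purely illustrative; only the pattern and the three substitutions come from the commands above.

```python
import re

# Same pattern as: egrep "### (INSERT|UPDATE|DELETE)"
ROW_STATEMENT = re.compile(r"### (INSERT|UPDATE|DELETE)")


def extract_row_statements(lines):
    """Yield normalized statement headers from `mysqlbinlog --verbose` output."""
    for line in lines:
        line = line.rstrip("\n")
        if not ROW_STATEMENT.search(line):
            continue
        # Mirror the three sed substitutions; each removes the first match only.
        for token in ("### ", "INTO ", "FROM "):
            line = line.replace(token, "", 1)
        yield line


def same_statements(left, right):
    """Compare two statement streams line by line, like `diff` on the two files."""
    return list(left) == list(right)


# Hypothetical sample of decoded row events (not taken from a real binlog).
sample = [
    "# at 4",
    "### INSERT INTO `test`.`t`",
    "### SET",
    "###   @1=1",
    "### UPDATE `test`.`t`",
    "### DELETE FROM `test`.`t`",
]
statements = list(extract_row_statements(sample))
```

Feeding the lines of both output files through `same_statements` gives the same yes/no answer as the empty `diff` above, assuming both files fit in memory.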
LOL string
|
Uggggggh! A
This is no fun. This means when we wish to apply the change on the ghost table, we need to verify whether there's been a change to But if there is a change to cc @ggunson |
Huh, that's weird. What statement did you run that ended up with this in the binary log? I'd love to see the SQL somewhere to reproduce the test cases. |
That sounds really, really wrong. MySQL RBR has gone rogue and is thinning the row herd |
@jonahberquist see https://github.com/github/gh-osc/pull/4/files#diff-0d909eaf5269b40f9d9afe5ef9fae52cR2 The |
Well, that is what it's supposed to be doing, yes? I created a test
|
Yeah, I agree with @ggunson. This makes a lot more sense seeing the unique index on column 2. The row with the unique key getting updated and the row with the PK getting deleted is a bit weird to me -- we could have deleted row 4 and updated row 2 and ended up with the same table. I don't really see this as a problem. It's just a bit weird. Given that it's a correct set of actions, if we encountered this, it's basically the same as if we encountered two distinct things:
and
And updates and deletes are both things we'd need to be able to handle anyway. The delete is just a delete. The update is just an update. So, not a problem.
Oh. 😯 Wait. 😰 The problem we're talking about now isn't actually in our parsing of the logs. It's in the fact that we'll be issuing REPLACE statements, and they might modify rows differently than just a DELETE and INSERT for the PK. When we were spec'ing this out in person a few weeks ago, I don't remember talking about that aspect of REPLACE behavior.
Is there a time where our ghost table could have data that would lead to a REPLACE replacing more than the single row we want to if that table has a unique secondary key? Since we'll be applying all of the changes from the binary log in order, I don't think we could, unless the table already had data that would violate that secondary unique constraint, but this is definitely a corner case that we should make sure we understand. |
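To make the REPLACE concern concrete, here is a small sketch of a REPLACE that collides with one row on the primary key and with a different row on a unique secondary key. Big caveat: it runs on SQLite's in-memory engine as a stand-in, not MySQL, so it only illustrates the end state of the table and says nothing about what MySQL writes to the row-based binary log. The table, column names and values are hypothetical.

```python
import sqlite3


def replace_demo():
    """Show REPLACE removing two existing rows when PK and unique key both collide.

    Assumption: SQLite stands in for MySQL here; only the resulting table
    contents are illustrated, not binlog events.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, u INTEGER UNIQUE)")
    conn.executemany(
        "INSERT INTO t (id, u) VALUES (?, ?)",
        [(1, 10), (2, 20), (4, 40)],
    )
    before = conn.execute("SELECT id, u FROM t ORDER BY id").fetchall()

    # Collides with row id=2 on the primary key and with row id=4 on the unique key.
    conn.execute("REPLACE INTO t (id, u) VALUES (?, ?)", (2, 40))

    after = conn.execute("SELECT id, u FROM t ORDER BY id").fetchall()
    conn.close()
    return before, after


before, after = replace_demo()
```

Here `before` holds three rows and `after` holds two: a single REPLACE removed both conflicting rows and left one in their place, which is the "more than the single row" case asked about above.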
@ggunson the end result -- yes, that's what is supposed to happen. I was more ranting on the way it got there. It could have |
One implication of parsing the RBR is that we are unable to handle an @jonahberquist I don't think there's going to be an integrity problem. It's more that given an event such as:
We would need to be more specific about how to apply this change. It's no longer @ggunson envisioned that we would only need to parse the |
I was mostly just pointing out the reason for the behaviour, which wasn't obvious until the secondary unique key was known. I'm actually a bit weirded out in terms of the MySQL documentation, since it specifically states that a
However, in my very lame testing with REPLACE, here's what I'm seeing in the row binary logs:
Later on the docs say that, well, the storage engine could be wack, but like, who cares, it's all the same:
Which still doesn't point out all the differences we're seeing. Oddly, my quick test of MyISAM shows it gives more information on the PK plus unique collision situation than the InnoDB version of RBR does (just for testing, I know we don't care about MyISAM):
|
Yeah, this makes sense to me now. I still don't like it, but at least it makes sense. Depending on how we want our binlogs, specifically with regard to binlog_row_image, I think we have a few options for handling this. If we have With
|
Well, we control the row image. We can force configuring it to |
I'm closing this for now, having nice success with #5 |
As per #1, I can see or think of the following:
# at 123456
and the following entry has an end_log_pos
### Row event for unknown table ... at 123456
end_log_pos
from the previous entry to validate that the current entry has the same number. It's just a self test
mysqlbinlog
less ideal than the already less-than-ideal state. We'll see.
SHOW MASTER STATUS
or SHOW BINARY LOGS
, as the output of these commands does indicate a true position at which a statement/entry is complete.
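The end_log_pos self test mentioned above could look roughly like the sketch below: remember the end_log_pos of each entry and check that the next "# at N" line carries the same number. The exact layout of the event header line is an assumption on my part (a line starting with a single "#" that contains "end_log_pos N"), and the function name and sample lines are hypothetical.

```python
import re

AT_LINE = re.compile(r"^# at (\d+)")
END_LOG_POS = re.compile(r"end_log_pos (\d+)")


def check_positions(lines):
    """Return (expected, actual) pairs where "# at N" differs from the previous end_log_pos."""
    mismatches = []
    previous_end = None
    for line in lines:
        at = AT_LINE.match(line)
        if at:
            position = int(at.group(1))
            if previous_end is not None and position != previous_end:
                mismatches.append((previous_end, position))
            continue
        # Assumed header layout: "#<timestamp> server id N  end_log_pos M ...".
        # Decoded row lines start with "###" and are skipped.
        if line.startswith("#") and not line.startswith("###"):
            end = END_LOG_POS.search(line)
            if end:
                previous_end = int(end.group(1))
    return mismatches


# Hypothetical excerpts, not real mysqlbinlog output.
consistent = [
    "# at 4",
    "#700101  0:00:00 server id 1  end_log_pos 120",
    "# at 120",
    "#700101  0:00:00 server id 1  end_log_pos 200",
    "# at 200",
]
broken = [
    "# at 4",
    "#700101  0:00:00 server id 1  end_log_pos 120",
    "# at 130",
]
```

An empty result for a file means every entry started where the previous one said it would end. It is only a consistency check on the parsed text and does not prove that the last entry in the file is complete.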