Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksum mismatches when restoring between versions #860

Closed
stephanie-dba opened this issue Oct 19, 2022 · 3 comments · Fixed by #871 or #877
Closed

Checksum mismatches when restoring between versions #860

stephanie-dba opened this issue Oct 19, 2022 · 3 comments · Fixed by #871 or #877
Labels

Comments

@stephanie-dba
Copy link

Describe the Issue
I am using mydumper/loader to upgrade from mySQL 5.7 to 8.0 (our cloud hosting provider does not offer in-place upgrade). When loading into mySQL 8.0 using checksums, I receive warnings like this:
[WARNING] - Structure checksum mismatch found for database.table. Got 'a5a324c6', expecting '2C313EF' in file: database.table-schema-checksum.

Further investigation uncovered that this is because of the difference in the way our integer columns are defined. In the dump from 5.7, they are defined as tinyint(4), int(11), and bigint(20) - with precision/display width included. In 8.0, the display width is deprecated and they are loaded as simply tinyint, int, and bigint. This throws off the checksum value because the COLUMN_TYPE used in the calculation is different between versions.

To Reproduce
Commands executed:
mydumper --host hostname --user root --password pw --database dbname --outputdir dirpath/filename --ssl --compress --routines --triggers --data-checksums --schema-checksums --routine-checksums --trx-consistency-only --verbose 3 --threads 12

myloader --host hostname --user root --password pw --database dbname --directory dirpath/filename --ssl --disable-redo-log --innodb-optimize-keys --verbose 3 --threads 12 --max-threads-per-table 8 --max-threads-for-index-creation 8 --overwrite-tables

Expected behavior
This may be an edge case of how you expect this tool to be used. However, it would be nice if the checksum validation could account for the difference in definition. (See additional context for my rudimentary workaround).

How to repeat
CREATE TABLE test_table ( id int(11) NOT NULL AUTO_INCREMENT, name varchar(255) DEFAULT NULL ) ENGINE = INNODB, AUTO_INCREMENT = 1, CHARACTER SET latin1, COLLATE latin1_swedish_ci;

Dump and load that table using --schema-checksums. Upon load, warning will be thrown and checksum will not match, leading the user to believe that perhaps something went wrong during the load.

Environment:

  • OS version: [Ubuntu 20.04.5]
  • MyDumper version: [0.12.7-3]

Additional context
I was able to get around this by modifying the checksum query you posted here: (#361) to account for the difference in column definition syntax:

SELECT table_name, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, REPLACE(REPLACE(REPLACE(COLUMN_TYPE, 'int(11)', 'int'), 'bigint(20)', 'bigint'), 'tinyint(4)', 'tinyint'))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'schema_name' GROUP BY table_name ORDER BY table_name;

I ran this in source and target databases and manually validated that the checksums matched for each table when the COLUMN_TYPE value was standardized. They did.

@davidducos davidducos added this to the Release 0.13.1-1 milestone Oct 20, 2022
@davidducos davidducos added the bug label Oct 20, 2022
@davidducos
Copy link
Member

@stephanie-dba THANK YOU!!! this bug report is so complete! I will be glad to fix this for next release.

@stephanie-dba
Copy link
Author

@davidducos Thank you! This is a great tool and I am so glad I found it.

One thing to note that may be obvious but I thought I should mention: I'm not sure how the "standard" widths get assigned to INT column_types (I'm new to mySQL; have a MSSQLSERVER background), but according to the documentation https://dev.mysql.com/doc/refman/8.0/en/numeric-type-attributes.html, the width can be set to anything. So my REPLACE worked for our instance, but ideally, you'd just remove anything in () after the initial INT in the table definitions. Again, probably obvious, but... :) Thanks so much!

@stephanie-dba
Copy link
Author

@davidducos I found a couple other checksum issues while going over the rest of the loader log.

Triggers Checksum
When the triggers were loaded, some of them came in with an extra space after each statement delimiter ( ; ) in the code. I'm not sure why that happened, maybe because of the version difference, but I ended up modifying the checksum to replace space, line feed, carriage return, and tab (because I wasn't sure at first which character was the offender):
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(REPLACE(REPLACE(REPLACE(REPLACE(ACTION_STATEMENT, CHAR(32), ''), CHAR(13), ''), CHAR(10), ''), CHAR(9), '')) AS UNSIGNED)), 10, 16)), 0) AS crc

Indexes Checksum
For indexes, the checksum query in the code is:
SELECT COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(table_name,INDEX_NAME,SEQ_IN_INDEX,COLUMN_NAME)) AS UNSIGNED)), 10, 16)), 0) AS crc from information_schema.STATISTICS WHERE TABLE_SCHEMA='%s' and TABLE_NAME='%s' group by INDEX_NAME,SEQ_IN_INDEX,COLUMN_NAME
The formula itself was fine, but I got warnings for mismatches on a few tables because the checksums came out in a different order between the two instances. For example, I had a table with 5 indexes and got five identical checksums, but because they were returned in a different order, the warning was thrown in the log as if they did not match.

This could be related to collation, as 8.0 defaults to UTF and my 5.7 instance was latin1. Or it could be because 8.0 removed support for sorting by GROUP BY: https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-sql-changes. Regardless, I was able to get them to match by adding an ORDER BY clause like this:
ORDER BY CAST(INDEX_NAME AS char(64)), SEQ_IN_INDEX
Note that I also had to cast as char(64) to make it work. Without that, the sorting of index names was different between the instances - maybe also version or unicode related. This happened specifically with underscores and plurals (something like "index_names" and "index_name_types", for example, sorted opposite in each instance).

Anyway, I think that gets me through all of the issues I had during this process, but if anything else comes up, I'll report back here. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants