Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skeema 1.7.1 + MySQL 8.0 Diff verification failure #184

Closed
tspotts opened this issue May 19, 2022 · 4 comments
Closed

skeema 1.7.1 + MySQL 8.0 Diff verification failure #184

tspotts opened this issue May 19, 2022 · 4 comments
Labels

Comments

@tspotts
Copy link

tspotts commented May 19, 2022

Describe the bug
When creating a new table with varchar/text columns or adding varchar/text columns to an existing table we're experiencing skeema verification failures if we do not explicitly specify COLLATION on each column.

Environment

  • Skeema version: [skeema version 1.7.1-community, commit 4630511, released 2022-03-30T16:14:39Z]
  • Database vendor, version, OS/platform: [8.0.27-18 Percona Server (GPL), Release 18, Revision 24801e21b45 running on CentOS Linux release 7.5.1804 (Core)]

To Reproduce

$ cat why_you_so_mean_utf8.sql

CREATE TABLE `why_you_so_mean_utf8` (
  `the_id` bigint NOT NULL,
  `alpha` varchar(42) NOT NULL DEFAULT '',
  `bravo` text,
  PRIMARY KEY (`the_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;


$ skeema push
2022-05-18 09:01:50 [INFO]  Using container skeema-percona-8.0 (image=percona:8.0) for workspace operations
2022-05-18 09:01:51 [INFO]  Pushing changes from /workspace/development/database_name/*.sql to mysql-server-fqdn.xyz:3306 database_name
-- instance: mysql-server-fqdn.xyz:3306
USE `database_name`;
\! /usr/local/bin/skeema-wrappers/ddl_wrapper.py --u=skeema-user --p=XXXXX --schema=database_name --table=why_you_so_mean_utf8 --host=mysql-server-fqdn.xyz --port=3306 --ddl='CREATE TABLE `why_you_so_mean_utf8` (
  `the_id` bigint NOT NULL,
  `alpha` varchar(42) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '"'"''"'"',
  `bravo` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`the_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8' --clauses='(
  `the_id` bigint NOT NULL,
  `alpha` varchar(42) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '"'"''"'"',
  `bravo` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`the_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8' --type=CREATE
Running CREATE TABLE `why_you_so_mean_utf8` (
  `the_id` bigint NOT NULL,
  `alpha` varchar(42) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `bravo` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`the_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8...
Online ALTER applied successfully
2022-05-18 09:01:54 [INFO]  mysql-server-fqdn.xyz:3306 database_name: push complete

$ skeema diff
2022-05-18 09:02:18 [INFO]  Using container skeema-percona-8.0 (image=percona:8.0) for workspace operations
2022-05-18 09:02:19 [INFO]  Generating diff of mysql-server-fqdn.xyz:3306 database_name vs /workspace/development/database_name/*.sql
2022-05-18 09:02:20 [ERROR] Diff verification failure on table why_you_so_mean_utf8

                            EXPECTED POST-ALTER:
                            CREATE TABLE `why_you_so_mean_utf8` (
                              `the_id` bigint NOT NULL,
                              `alpha` varchar(42) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
                              `bravo` text COLLATE utf8mb4_unicode_ci,
                              PRIMARY KEY (`the_id`)
                            ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

                            ACTUAL POST-ALTER:
                            CREATE TABLE `why_you_so_mean_utf8` (
                              `the_id` bigint NOT NULL,
                              `alpha` varchar(42) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
                              `bravo` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
                              PRIMARY KEY (`the_id`)
                            ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

                            Run command again with --skip-verify if this discrepancy is safe to ignore
2022/05/18 09:02:20 socat[4857] E write(6, 0x7fdb7a697000, 55): Broken pipe

Expected behavior
On skeema 1.7.1 + MySQL 5.7 this doesn't appear to be an issue. In testing skeema 1.7.0 + MySQL 8.0 this doesn't appear to be an issue.

Additional context
Our work arounds so far have been either adding COLLATION for schema changes OR running a skeema pull after the fact to get the .sql files back into canonical format.

@tspotts tspotts added the bug label May 19, 2022
@evanelias
Copy link
Contributor

Thank you Terry (and Keith) for the excellent issue report! I can confirm this is a bug: some refactored code in Skeema 1.7.1 appears to have caused an unintentional regression.

The issue only affects MySQL 8, and only if the table's default charset has a different associated default collation than the table's default collation. In other words, this won't come up on tables using DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci, but it will happen to tables using DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci.

Unfortunately that means users who upgraded from MySQL 5.7 (or earlier) to 8.0 are more likely to be affected, since the default collation for utf8mb4 changed to utf8mb4_0900_ai_ci beginning in 8.0; that collation didn't even exist prior to 8.0.

Context on the refactored code and this bug:

Skeema uses the output of SHOW CREATE TABLE for a few different things, including any command which dumps your table definitions to the filesystem (skeema init, skeema pull, skeema format). It is also used for two different verification/safety checks in diff logic (skeema diff, skeema push):

  • Skeema confirms that it has introspected a table correctly by ensuring its in-memory representation of a table would result in the expected SHOW CREATE TABLE

  • When generating a diff, by default the verify option is enabled, which also verifies the generated ALTER TABLE is correct: in a temporary location it confirms that (previous CREATE TABLE) + (generated ALTER TABLE) = (new desired CREATE TABLE), using SHOW CREATE TABLE to get those CREATEs

MySQL versions before MySQL 8, and all versions of MariaDB, had consistent logic around when columns would include CHARACTER SET and/or COLLATE clauses in SHOW CREATE TABLE. This logic was changed substantially in MySQL 8 in strange ways. So Skeema has logic which tries to capture MySQL 8's behavior, to avoid erroneous failures in those two verification/safety checks.

  • Prior to Skeema v1.7.1, the solution for this was a bit hacky: Skeema would take the literal SHOW CREATE TABLE returned by MySQL 8, and then manipulate it so that it conforms to the pre-8 formatting rules with regards to superfluous column-level CHARACTER SET and COLLATE clauses.

  • Skeema v1.7.1 refactored this code to instead track which columns have the superfluous clauses, and use this in its verifications. The new approach turned out to be simpler, less hacky, and more efficient. However, it made the assumption that MySQL 8's SHOW CREATE TABLE was more consistent than it actually is 🙃

In MySQL 8, if your table-level defaults are (for example) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci, and you have a stringy column that doesn't specify any CHARACTER SET or COLLATE clause, then SHOW CREATE TABLE will add COLLATE clauses to those columns anyway. But then if you re-run that output as a CREATE TABLE, now the subsequent SHOW CREATE TABLE will also add CHARACTER SET clauses to those columns. It makes no sense at all 🤯

mysql> CREATE TABLE wtf (
    ->    name varchar(100)
    -> ) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.04 sec)

mysql> SHOW CREATE TABLE wtf;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                          |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| wtf   | CREATE TABLE `wtf` (
  `name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> DROP TABLE wtf;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE `wtf` (
    ->   `name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW CREATE TABLE wtf;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| wtf   | CREATE TABLE `wtf` (
  `name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

In other words: In MySQL 8, you can take a "canonical" CREATE TABLE (exactly as returned by SHOW CREATE TABLE), run it, and then get a slightly different SHOW CREATE TABLE. This breaks the assumptions made in Skeema's verification logic.

This should be fixable by reintroducing some of the hacky normalization logic from pre-1.7.1, but hopefully there's a better way. In any case, will aim to have a fix in Skeema 1.8.0, due out in a couple weeks.

@tspotts
Copy link
Author

tspotts commented May 20, 2022

Thank you for creating such excellent software! ❤️ skeema

@evanelias
Copy link
Contributor

Update: a fix will be committed soon 🥳 It will be included in Skeema v1.8.0, which is currently planned for release tomorrow, assuming no unexpected problems come up when running the full matrix of integration tests.

The fix changes the verify logic to do a structural table comparison (essentially re-using the existing table diff logic) if the simpler SHOW CREATE TABLE comparison doesn't match. The structural comparison intentionally ignores presence or lack of column-level charset/collation when they are equal to the table-level defaults, since this is just a cosmetic detail which does not have any functional impact. But it will still correctly flag any actual functional discrepancy between the tables, which is what we care about in verify.

Initially I was wary of re-using the table diff logic in this way, to essentially verify itself, but in retrospect it has enough safeguards built-in for this approach to be viable. For example, if the SHOW CREATE TABLEs don't match, but the diff logic can't figure out how/why the tables actually differ structurally or cosmetically, then it already catches this situation so verify can throw an error appropriately.

@evanelias
Copy link
Contributor

Skeema v1.8.0 has been released and contains this fix. Thanks again for the comprehensive issue report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants