PS-9219: MySQL converts collation of date data type in ibd but data dictionary (8.0) #5294

venkatesh-prasad-v · 2024-05-20T09:59:38Z

PS-9219: MySQL converts collation of date data type in ibd but data dictionary

https://perconadev.atlassian.net/browse/PS-9219

Problem

The IMPORT TABLESPACE operation fails because of an incorrect table definition
in the ibd file when the table has a column of a temporal type (date, datetime,
timestamp) and its charset-collation was changed before the backup was taken.

Analysis

In MySQL temporal types are always stored and compared using
my_charset_latin1 charset.

During the execution of ALTER TABLE CONVERT TO CHARACTER SET, MySQL
changes the charset and collation stored in data-dictionary and SDI
for temporal columns to a different collation_id.
In practice, this new collation_id is ignored when such columns
are stored/compared. Corresponding Field objects are not updated to
use this new collation (Actually, Field objects for temporal types are
hardcoded to use my_charset_latin1).

This new collation is not visible in I_S and SHOW CREATE TABLE output.
It will be ignored by CREATE TABLE LIKE and rewritten by ALTER TABLE
since both these statements use info from Field objects of source table
to produce Create_field objects describing columns of new table/new
version of the table.

For example:

create table a(dt datetime);
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 8

alter table a CONVERT TO CHARACTER SET utf8mb4 collate utf8mb4_unicode_ci;
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 224

alter table a engine = innodb;
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 8

However, this almost invisible incorrect collation_id causes problems when
we try to restore an InnoDB table with such a column from a backup created
using the Percona XtraBackup tool.

This tool uses information from DD/SDI to produce the .cfg file describing
the InnoDB table being restored. Later, information from this file is used
by ALTER TABLE IMPORT TABLESPACE, which imports the restored table.
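
Since the backup tool reads DD/SDI, it can help to check the ibd2sdi output for temporal columns whose collation_id is not 8 before taking a backup. The following is a minimal sketch, not part of the patch. It assumes that ibd2sdi prints a JSON array in which table records carry "dd_object_type": "Table" and list their columns under "dd_object" with a "column_type_utf8" field. The function name and the sample document are hypothetical.

```python
import json

LATIN1_SWEDISH_CI = 8  # collation_id shown for `dt` before the conversion
TEMPORAL_TYPES = {"date", "datetime", "timestamp"}  # types named in the Problem section


def suspicious_temporal_columns(sdi_text):
    """Return (table, column, collation_id) for temporal columns whose
    SDI collation_id differs from 8. Assumes the layout described above."""
    found = []
    for entry in json.loads(sdi_text):
        # Skip anything that is not a JSON object (for example a leading marker string).
        if not isinstance(entry, dict):
            continue
        obj = entry.get("object", {})
        if obj.get("dd_object_type") != "Table":
            continue
        table = obj.get("dd_object", {})
        for col in table.get("columns", []):
            # "datetime(6)" -> "datetime"
            base_type = col.get("column_type_utf8", "").split("(")[0].strip().lower()
            cid = col.get("collation_id")
            if base_type in TEMPORAL_TYPES and cid is not None and cid != LATIN1_SWEDISH_CI:
                found.append((table.get("name"), col.get("name"), cid))
    return found


# Hypothetical, heavily trimmed SDI document for table `a` after CONVERT TO CHARACTER SET.
sample = json.dumps([
    "ibd2sdi",
    {"type": 1, "object": {"dd_object_type": "Table", "dd_object": {
        "name": "a",
        "columns": [
            {"name": "dt", "column_type_utf8": "datetime", "collation_id": 224},
            {"name": "note", "column_type_utf8": "varchar(10)", "collation_id": 224},
        ]}}},
])

print(suspicious_temporal_columns(sample))  # [('a', 'dt', 224)]
```

In practice the input would be the real output of ./bin/ibd2sdi for the .ibd file, whose exact layout should be checked against the server version in use.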

In particular, for the new temporal types that support fractional seconds,
the collation_id from DD/SDI affects the InnoDB "precise type" that describes
the column and is stored in .cfg. Because of this, an incorrect collation_id
in DD/SDI for such columns results in an incorrect "precise type" in the .cfg
file.

As a consequence, we get a Schema mismatch (Column a precise type mismatch.)
error when ALTER TABLE IMPORT TABLESPACE compares the "precise type"
from such a .cfg file with the "precise type" for the column in the table
version being imported into. The latter is based on the collation_id which
ultimately comes from the Field object and always corresponds to
my_charset_latin1 for temporal types.
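
The mismatch can be illustrated with a simplified model of the "precise type". This is a sketch under assumptions, not the InnoDB code. It assumes the collation id is packed into the bits above bit 16 of the precise type, and it uses a placeholder value (BASE_BITS) for the remaining type and flag bits. All function names are hypothetical.

```python
# Simplified model: the collation id occupies the bits above bit 16 of the
# precise type. The shift and BASE_BITS are assumptions for illustration.
CHARSET_SHIFT = 16
BASE_BITS = 18  # hypothetical stand-in for the type/flag bits of the column

LATIN1_SWEDISH_CI = 8     # collation_id seen in the SDI before the conversion
UTF8MB4_UNICODE_CI = 224  # collation_id seen in the SDI after the conversion


def form_precise_type(base_bits, collation_id):
    """Combine the type/flag bits with the collation id."""
    return base_bits | (collation_id << CHARSET_SHIFT)


def collation_of(precise_type):
    """Recover the collation id from a precise type built above."""
    return precise_type >> CHARSET_SHIFT


def check_import(cfg_precise_type, table_precise_type, column):
    """Return an error string on mismatch, or None when the values agree."""
    if cfg_precise_type != table_precise_type:
        return f"Column {column} precise type mismatch."
    return None


# .cfg written from DD/SDI after CONVERT TO CHARACTER SET (collation_id 224)
from_cfg = form_precise_type(BASE_BITS, UTF8MB4_UNICODE_CI)
# value derived from the Field object, which always uses my_charset_latin1
from_field = form_precise_type(BASE_BITS, LATIN1_SWEDISH_CI)

print(check_import(from_cfg, from_field, "dt"))    # Column dt precise type mismatch.
print(check_import(from_field, from_field, "dt"))  # None
```

The two values differ only in the collation bits, which is why a collation_id that is otherwise ignored is enough to make the import fail.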

Note that this problem does not affect scenarios where we import a table
with a .cfg file that was generated by MySQL's FLUSH TABLE FOR EXPORT
command, as the latter doesn't use DD/SDI to calculate the "precise type"
but gets the information from the Field object in the table being exported instead.

Solution

This commit changes ALTER TABLE CONVERT TO CHARACTER SET command to
not alter the character set for temporal types stored in data-dictionary/SDI.
In other words, we now force the server to always use my_charset_latin1
in DD/SDI for temporal types.
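
The rule described above can be sketched as follows. This is an illustrative model only, not the change made in sql/sql_table.cc. The function name is hypothetical, and the branch for non-temporal columns is a simplification that only covers character columns.

```python
LATIN1_SWEDISH_CI = 8  # collation_id corresponding to my_charset_latin1
TEMPORAL_TYPES = {"date", "datetime", "timestamp"}


def collation_id_after_convert(base_type, requested_id):
    """collation_id to store in DD/SDI after CONVERT TO CHARACTER SET."""
    if base_type.lower() in TEMPORAL_TYPES:
        # Temporal columns keep the fixed collation regardless of the request.
        return LATIN1_SWEDISH_CI
    # Simplification: character columns take the requested collation.
    return requested_id


columns = {"dt": "datetime", "name": "varchar"}
stored = {col: collation_id_after_convert(t, 224) for col, t in columns.items()}
print(stored)  # {'dt': 8, 'name': 224}
```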

There will be a separate fix to the Percona XtraBackup tool which will
change the code generating .cfg to ignore the collation_id stored in DD/SDI
for temporal columns, as is already done for some other types with a fixed
collation_id.

PR for trunk: #5295

Testing Done

Jenkins: https://ps80.cd.percona.com/view/8.0%20parallel%20MTR/job/percona-server-8.0-param-parallel-mtr/46/console

dlenev

Hello Venkatesh!

The patch itself looks good to me. I only have some comments about the commit message
and code comments.

First of all, could you please fix the typo in the commit message title and replace "collection" with "collation"? Also "tpye" -> "type".
I also think that it is worth rewriting the rest of the commit message:

a) To avoid creating the false impression that temporal
columns can have a collation different from my_charset_latin1.
They really can't; it is only the collation stored in
data-dictionary/SDI for such columns that can be wrong.

b) To focus on how this wrong collation stored in DD/SDI
causes problems when restoring a backup.

For example we can say something like:

Analysis

In MySQL temporal types are always stored and compared using
my_charset_latin1 charset.

During the execution of ALTER TABLE CONVERT TO CHARACTER SET, MySQL
changes the charset and collation stored in data-dictionary and SDI
for temporal columns to a different collation_id.
In practice, this new collation_id is ignored when such columns
are stored/compared. Corresponding Field objects are not updated to
use this new collation (Actually, Field objects for temporal types are
hardcoded to use my_charset_latin1).

This new collation is not visible in I_S and SHOW CREATE TABLE output.
It will be ignored by CREATE TABLE LIKE and rewritten by ALTER TABLE
since both these statements use info from Field objects of source table
to produce Create_field objects describing columns of new table/new
version of the table.

For example:

create table a(dt datetime);
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 8
alter table a CONVERT TO CHARACTER SET utf8mb4 collate utf8mb4_unicode_ci;
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 224
alter table a engine = innodb;
$ ./bin/ibd2sdi ./var/mysqld.1/data/test/a.ibd
"name": "dt",
"collation_id": 8

However, this almost invisible incorrect collation_id causes problems when
we try to restore an InnoDB table with such a column from a backup created
using the Percona XtraBackup tool.

This tool uses information from DD/SDI to produce the .cfg file describing
the InnoDB table being restored. Later, information from this file is used
by ALTER TABLE IMPORT TABLESPACE, which imports the restored table.

In particular, for the new temporal types that support fractional seconds,
the collation_id from DD/SDI affects the InnoDB "precise type" that describes
the column and is stored in .cfg. Because of this, an incorrect collation_id
in DD/SDI for such columns results in an incorrect "precise type" in the .cfg
file.

As a consequence, we get a Schema mismatch (Column a precise type mismatch.)
error when ALTER TABLE IMPORT TABLESPACE compares the "precise type"
from such a .cfg file with the "precise type" for the column in the table
version being imported into. The latter is based on the collation_id which
ultimately comes from the Field object and always corresponds to
my_charset_latin1 for temporal types.

Note that this problem does not affect scenarios where we import a table
with a .cfg file that was generated by MySQL's FLUSH TABLE FOR EXPORT
command, as the latter doesn't use DD/SDI to calculate the "precise type"
but gets the information from the Field object in the table being exported instead.

Solution

This commit changes ALTER TABLE CONVERT TO CHARACTER SET command to
not alter the character set for temporal types stored in data-dictionary/SDI.
In other words, we now force the server to always use my_charset_latin1
in DD/SDI for temporal types.

There will be a separate fix to the Percona XtraBackup tool which will
change the code generating .cfg to ignore the collation_id stored in DD/SDI
for temporal columns, as is already done for some other types with a fixed
collation_id.

sql/sql_table.cc

venkatesh-prasad-v · 2024-06-05T07:30:59Z

@dlenev Thank you very much for the detailed and insightful review. I have updated both the commit message and the code comment as per your suggestions.

dlenev

Hello Venki!

I think it is OK to push this patch.

venkatesh-prasad-v requested review from percona-ysorokin and dlenev May 20, 2024 09:59

venkatesh-prasad-v self-assigned this May 20, 2024

venkatesh-prasad-v force-pushed the PS-9219-8.0 branch 2 times, most recently from f0ab289 to 26eb735 Compare May 20, 2024 10:49

venkatesh-prasad-v changed the title ~~PS-9219: MySQL converts collection of date data type in ibd but data dictionary~~ PS-9219: MySQL converts collection of date data type in ibd but data dictionary (8.0) May 23, 2024

dlenev requested changes Jun 4, 2024

venkatesh-prasad-v changed the title ~~PS-9219: MySQL converts collection of date data type in ibd but data dictionary (8.0)~~ PS-9219: MySQL converts collation of date data type in ibd but data dictionary (8.0) Jun 5, 2024

venkatesh-prasad-v force-pushed the PS-9219-8.0 branch 2 times, most recently from b7957cb to be9aefa Compare June 5, 2024 07:29

venkatesh-prasad-v requested a review from dlenev June 5, 2024 07:31

venkatesh-prasad-v force-pushed the PS-9219-8.0 branch from be9aefa to f3347bd Compare June 5, 2024 07:34

dlenev approved these changes Jun 5, 2024

venkatesh-prasad-v mentioned this pull request Jun 5, 2024

PS-9219: MySQL converts collation of date data type in ibd but data dictionary (trunk) #5295

Merged

venkatesh-prasad-v force-pushed the PS-9219-8.0 branch from f3347bd to 670413d Compare June 5, 2024 11:43

venkatesh-prasad-v force-pushed the PS-9219-8.0 branch from 670413d to a93dde3 Compare June 14, 2024 08:59

venkatesh-prasad-v changed the base branch from 8.0 to release-8.0.37-29 June 17, 2024 06:46

venkatesh-prasad-v merged commit 9b88474 into percona:release-8.0.37-29 Jun 17, 2024
23 of 26 checks passed
