Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data too long error on latin1/ascii collate #27676

Closed
milkice233 opened this issue Aug 30, 2021 · 2 comments
Closed

Data too long error on latin1/ascii collate #27676

milkice233 opened this issue Aug 30, 2021 · 2 comments
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@milkice233
Copy link
Contributor

milkice233 commented Aug 30, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Launch mysql client, connect to tidb and execute the following command

CREATE TABLE NT_COLLATION27788(col1 varchar(20),col2 varchar(20),col4 datetime,col3 bigint,col5 float) charset latin1 collate latin1_bin;
INSERT INTO NT_COLLATION27788(col1,col2,col4,col3,col5) values("î²>k°{ßt}ÊCID¿%", "ÎÃÊv¿~", "7534-10-07 08:19:34", -1964980308060067372, 118.66672);

2. What did you expect to see? (Required)

TiDB should output Query OK, 1 row affected (0.00 sec), just like what MySQL does.

3. What did you see instead (Required)

TiDB returns an error: ERROR 1406 (22001): Data too long for column 'col1' at row 1

4. What is your TiDB version? (Required)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v5.0.4 |
+--------------------+
1 row in set (0.00 sec)
@milkice233 milkice233 added the type/bug The issue is confirmed as a bug. label Aug 30, 2021
@zimulala
Copy link
Contributor

zimulala commented Aug 30, 2021

There should be some problems with character set processing, we can see the following behavior:

tidb>  select length("î²>k°{ßt}ÊCID¿%");
+---------------------------------+
| length("î²>k°{ßt}ÊCID¿%")       |
+---------------------------------+
|                              21 |
+---------------------------------+
1 row in set (0.00 sec)

tidb>  select length(convert("î²>k°{ßt}ÊCID¿%" using latin1));
+-------------------------------------------------------+
| length(convert("î²>k°{ßt}ÊCID¿%" using latin1))       |
+-------------------------------------------------------+
|                                                    33 |
+-------------------------------------------------------+
1 row in set (0.00 sec)




mysql> select length("î²>k°{ßt}ÊCID¿%");
+---------------------------------+
| length("î²>k°{ßt}ÊCID¿%")       |
+---------------------------------+
|                              21 |
+---------------------------------+
1 row in set (0.01 sec)

mysql> select length(convert("î²>k°{ßt}ÊCID¿%" using latin1));
+-------------------------------------------------------+
| length(convert("î²>k°{ßt}ÊCID¿%" using latin1))       |
+-------------------------------------------------------+
|                                                    15 |
+-------------------------------------------------------+
1 row in set (0.00 sec)

@bb7133
Copy link
Member

bb7133 commented Aug 30, 2021

@milkice233 @jingshanglu This is a known issue that is caused by some historical mistakes, as you can see from the official website:

TiDB incorrectly treats latin1 as a subset of utf8. This can lead to unexpected behaviors when you store characters that differ between latin1 and utf8 encodings. It is strongly recommended to the utf8mb4 character set. See TiDB #18955 for more details.

Since TiDB incorrectly treats latin1 as a subset of utf-8, the value î²>k°{ßt}ÊCID¿% requires >20 bytes when encoding in UTF8, which caused Data too long for column error.

It is possible to be fixed if we do not consider the historical burden, but for the existing users that use latin1 already, fixing them are quite difficult(and may be dangerous). Considering latin1 is not a good choice for modern systems any more, we decide to leave it as 'not fixed'.

I will mark it as 'won't fix' and close this issue. Thanks for reporting it anyway. /cc @zimulala

@bb7133 bb7133 closed this as completed Aug 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

4 participants