-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HEX + CONVERT / INSERT with utf8 returns different values with MySQL #32599
Comments
for the first case, it's may cause by this as describe in https://docs.pingcap.com/tidb/stable/character-set-and-collation#character-sets-and-collations-supported-by-tidb
but there maybe another bug here MySQL root@127.0.0.1:(none)> select char(2557 using latin1)
'utf-8' codec can't decode byte 0xfd in position 1: invalid start byte
MySQL root@127.0.0.1:(none)> use test;
Packet sequence number wrong - got 5 expected 1 case 2: mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.26 |
+-----------+
1 row in set (0.04 sec)
mysql> SELECT HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20));
+-------------------------------------------+
| HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20)) |
+-------------------------------------------+
| D18F20 |
+-------------------------------------------+
1 row in set, 1 warning (0.04 sec)
# mysql 8.0.23
MySQL root@172.16.5.103:test> SELECT HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20));
+-------------------------------------------+
| HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20)) |
+-------------------------------------------+
| D120D18E |
+-------------------------------------------+
1 row in set
Time: 0.062s
MySQL root@172.16.5.103:test>
MySQL root@172.16.5.103:test> select version();
+-----------+
| version() |
+-----------+
| 8.0.23 |
+-----------+
|
tidb v5.3.0 has a same behavior with mysql 8.0.26, but since v5.4.0 the behavior is changed. MySQL root@127.0.0.1:(none)> select tidb_version()\G
***************************[ 1. row ]***************************
tidb_version() | Release Version: v5.3.0
Edition: Community
Git Commit Hash: 4a1b2e9fe5b5afb1068c56de47adb07098d768d6
Git Branch: heads/refs/tags/v5.3.0
UTC Build Time: 2021-11-24 13:31:24
GoVersion: go1.16.5
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set
Time: 0.001s
MySQL root@127.0.0.1:(none)> SELECT HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20));
->
+-------------------------------------------+
| HEX(INSERT(_utf8 0xD18FD18E, 2, 1, 0x20)) |
+-------------------------------------------+
| D18F20 |
+-------------------------------------------+ |
Thanks for the in-depth investigation. I am working on MySQL v.8.0.28 |
For https://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ latin1 is a single-byte encoding, so each of the 256 characters are just a single byte. The UTF-8 encoding was designed to be backward-compatible with ASCII documents, for the first 128 characters. For characters above 128, a multi-byte sequence describes the character. According to http://en.wikipedia.org/wiki/UTF-8#Description. For 2557(0x09FD), whose size is 2 bytes, will be covert to utf8 code 0x09C3BD, whose size is 3 bytes. 2557 to binary But for tidb >= 5.4, result is 0x093F. Root cause is #30288 |
For According to https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_insert,
tidb >= 5.4 will get unexpected result. tidb/expression/builtin_string.go Lines 3646 to 3653 in d4dc6b5
|
Seems like some encoding/collation change broke some of our old behavior, thus this issue. Changing to sig/sql-infra. |
For Another interesting case:
It's related with the following codes: Lines 398 to 404 in aff0c36
If the third argument is binary, the return collation will be inferred as binary 🤔 |
And the insert chooses to cast charset according to the return type, but not the first argument. This change is brought by #29905 . |
The behavior of I'll close this issue as it's expected (both problems). If you need further discussion, feel free to reopen this issue 😃 . |
Bug Report
1. Minimal reproduce step
2. What did you expect to see?
3. What did you see instead
4. What is your TiDB version?
The text was updated successfully, but these errors were encountered: