LOCATE is ignoring the _latin1 literal and treating this as a utf8 string. #29741

Open
ramanich1 opened this issue Nov 13, 2021 · 6 comments
Labels: sig/sql-infra, type/compatibility, type/enhancement

Comments

@ramanich1
Collaborator

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

select locate(_latin1 0xD0B1, _latin1 0xD0B0D0B1D0B2) ;

2. What did you expect to see? (Required)

mysql> select locate(_latin1 0xD0B1, _latin1 0xD0B0D0B1D0B2) ;
+------------------------------------------------+
| locate(_latin1 0xD0B1, _latin1 0xD0B0D0B1D0B2) |
+------------------------------------------------+
|                                              3 |
+------------------------------------------------+

3. What did you see instead (Required)

mysql> select locate(_latin1 0xD0B1, _latin1 0xD0B0D0B1D0B2) ;
+------------------------------------------------+
| locate(_latin1 0xD0B1, _latin1 0xD0B0D0B1D0B2) |
+------------------------------------------------+
|                                              2 |
+------------------------------------------------+
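A minimal sketch of why the two results differ, written in Python rather than SQL. The `locate` helper is hypothetical and is not TiDB or MySQL code; it assumes Python's `latin-1` codec agrees with MySQL's latin1 for these particular bytes. Decoded as latin1, each byte is one character, so the needle starts at character 3. Decoded as utf8, the same bytes form three two-byte characters, so the needle is character 2.

```python
# Hypothetical helper: 1-based position of needle in haystack after
# decoding both byte strings with the given character set (0 if absent).
def locate(needle: bytes, haystack: bytes, charset: str) -> int:
    return haystack.decode(charset).find(needle.decode(charset)) + 1


needle = bytes.fromhex("D0B1")
haystack = bytes.fromhex("D0B0D0B1D0B2")

# latin1: six single-byte characters, match begins at character 3
print(locate(needle, haystack, "latin-1"))  # 3
# utf8: three two-byte characters, match is character 2
print(locate(needle, haystack, "utf-8"))    # 2
```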

4. What is your TiDB version? (Required)

Release Version: v5.4.0-alpha-67-g17edc5758
Edition: Community
Git Commit Hash: 17edc5758fbf865cada7e156318c41d3ee8a7501
Git Branch: master
UTC Build Time: 2021-11-10 20:45:18
GoVersion: go1.17.2
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
@ramanich1 ramanich1 added the type/bug The issue is confirmed as a bug. label Nov 13, 2021
@ChenPeng2013
Contributor

https://docs.pingcap.com/tidb/stable/character-set-and-collation

Warning:

TiDB incorrectly treats latin1 as a subset of utf8. This can lead to unexpected behaviors when you store characters that differ between latin1 and utf8 encodings. It is strongly recommended to use the utf8mb4 character set. See TiDB #18955 for more details.

@Defined2014
Contributor

/remove-type bug

@ti-chi-bot ti-chi-bot removed the type/bug The issue is confirmed as a bug. label Nov 15, 2021
@Defined2014
Contributor

/type enhancement

@ti-chi-bot ti-chi-bot added the type/enhancement The issue or PR belongs to an enhancement. label Nov 15, 2021
@Defined2014
Contributor

https://docs.pingcap.com/tidb/stable/character-set-and-collation

Warning:
TiDB incorrectly treats latin1 as a subset of utf8. This can lead to unexpected behaviors when you store characters that differ between latin1 and utf8 encodings. It is strongly recommended to use the utf8mb4 character set. See TiDB #18955 for more details.

According to the docs, this is more of an enhancement than a bug.

@Defined2014
Contributor

/type compatibility

@Alkaagr81
Collaborator

Alkaagr81 commented Dec 14, 2021

-- Works
select POSITION(_utf8mb4'B' IN _utf8mb4'abcd' COLLATE utf8mb4_general_ci);
-- Does not work
select POSITION(_utf8mb4'B' IN _utf8mb4'abcd');
select POSITION(_latin1'B' IN _latin1'aBcd');

-- In MySQL
mysql> select POSITION(_utf8mb4'B' IN _utf8mb4'abcd');
+-----------------------------------------+
| POSITION(_utf8mb4'B' IN _utf8mb4'abcd') |
+-----------------------------------------+
|                                       2 |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql> select POSITION(_latin1'b' IN _latin1'aBcd');
+---------------------------------------+
| POSITION(_latin1'b' IN _latin1'aBcd') |
+---------------------------------------+
|                                     2 |
+---------------------------------------+
1 row in set (0.00 sec)

-- In TiDB
mysql> select POSITION(_utf8mb4'B' IN _utf8mb4'abcd');
+-----------------------------------------+
| POSITION(_utf8mb4'B' IN _utf8mb4'abcd') |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql> select POSITION(_latin1'b' IN _latin1'aBcd');
+---------------------------------------+
| POSITION(_latin1'b' IN _latin1'aBcd') |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
1 row in set (0.00 sec)
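
One possible reading of these results, offered as an assumption rather than a confirmed diagnosis: MySQL's default collations for utf8mb4 and latin1 are case-insensitive, while TiDB defaults to binary collations (utf8mb4_bin, latin1_bin), which would also explain why adding COLLATE utf8mb4_general_ci makes the query work. The Python sketch below only mimics that difference; `position` is a hypothetical helper, and `casefold()` is a rough stand-in for a `_ci` collation, not an exact model of one.

```python
# Hypothetical helper: 1-based position of needle in haystack (0 if absent),
# compared either case-insensitively or byte-for-byte like a _bin collation.
def position(needle: str, haystack: str, case_insensitive: bool) -> int:
    if case_insensitive:
        needle, haystack = needle.casefold(), haystack.casefold()
    return haystack.find(needle) + 1


# Case-insensitive comparison matches the MySQL output above
print(position("B", "abcd", case_insensitive=True))   # 2
print(position("b", "aBcd", case_insensitive=True))   # 2
# Binary comparison matches the TiDB output above
print(position("B", "abcd", case_insensitive=False))  # 0
print(position("b", "aBcd", case_insensitive=False))  # 0
```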

