diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 37c66a6a0125b..4d76b05b0be09 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -644,6 +644,7 @@ - Character Set and Collation - [Overview](/character-set-and-collation.md) - [GBK](/character-set-gbk.md) + - [GB18030](/character-set-gb18030.md) - Read Historical Data - Use Stale Read (Recommended) - [Usage Scenarios of Stale Read](/stale-read.md) diff --git a/TOC.md b/TOC.md index 2f0a3ea4f7f1d..5006f35c8ec1f 100644 --- a/TOC.md +++ b/TOC.md @@ -1004,6 +1004,7 @@ - Character Set and Collation - [Overview](/character-set-and-collation.md) - [GBK](/character-set-gbk.md) + - [GB18030](/character-set-gb18030.md) - [Placement Rules in SQL](/placement-rules-in-sql.md) - System Tables - `mysql` Schema diff --git a/br/backup-and-restore-overview.md b/br/backup-and-restore-overview.md index 061d08f8ec8a5..4aed24e5c1542 100644 --- a/br/backup-and-restore-overview.md +++ b/br/backup-and-restore-overview.md @@ -112,7 +112,8 @@ Backup and restore might go wrong when some TiDB features are enabled or disable | Feature | Issue | Solution | | ---- | ---- | ----- | -|GBK charset|| BR of versions earlier than v5.4.0 does not support restoring `charset=GBK` tables. No version of BR supports recovering `charset=GBK` tables to TiDB clusters earlier than v5.4.0. | +|GBK charset|| Before v5.4.0, BR does not support restoring tables with `charset=GBK`. In addition, no version of BR supports restoring tables with `charset=GBK` to TiDB clusters earlier than v5.4.0. | +|GB18030 charset|| Before v9.0.0, BR does not support restoring tables with `charset=GB18030`. In addition, no version of BR supports restoring tables with `charset=GB18030` to TiDB clusters earlier than v9.0.0.| | Clustered index | [#565](https://github.com/pingcap/br/issues/565) | Make sure that the value of the `tidb_enable_clustered_index` global variable during restore is consistent with that during backup. 
Otherwise, data inconsistency might occur, such as `default not found` error and inconsistent data index. | | New collation | [#352](https://github.com/pingcap/br/issues/352) | Make sure that the value of the `new_collation_enabled` variable in the `mysql.tidb` table during restore is consistent with that during backup. Otherwise, inconsistent data index might occur and checksum might fail to pass. For more information, see [FAQ - Why does BR report `new_collations_enabled_on_first_bootstrap` mismatch?](/faq/backup-and-restore-faq.md#why-is-new_collation_enabled-mismatch-reported-during-restore). | | Global temporary tables | | Make sure that you are using v5.3.0 or a later version of BR to back up and restore data. Otherwise, an error occurs in the definition of the backed global temporary tables. | diff --git a/character-set-and-collation.md b/character-set-and-collation.md index 5b5b3d845826e..303b14ffcf0e3 100644 --- a/character-set-and-collation.md +++ b/character-set-and-collation.md @@ -1,6 +1,6 @@ --- title: Character Set and Collation -summary: Learn about the supported character sets and collations in TiDB. +summary: Learn about the character sets and collations supported by TiDB. 
aliases: ['/docs/dev/character-set-and-collation/','/docs/dev/reference/sql/characterset-and-collation/','/docs/dev/reference/sql/character-set/'] --- @@ -38,7 +38,7 @@ SELECT 'A' = 'a'; SET NAMES utf8mb4 COLLATE utf8mb4_general_ci; ``` -```sql +``` Query OK, 0 rows affected (0.00 sec) ``` @@ -46,7 +46,7 @@ Query OK, 0 rows affected (0.00 sec) SELECT 'A' = 'a'; ``` -```sql +``` +-----------+ | 'A' = 'a' | +-----------+ @@ -98,18 +98,19 @@ Currently, TiDB supports the following character sets: SHOW CHARACTER SET; ``` -```sql -+---------+-------------------------------------+-------------------+--------+ -| Charset | Description | Default collation | Maxlen | -+---------+-------------------------------------+-------------------+--------+ -| ascii | US ASCII | ascii_bin | 1 | -| binary | binary | binary | 1 | -| gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 | -| latin1 | Latin1 | latin1_bin | 1 | -| utf8 | UTF-8 Unicode | utf8_bin | 3 | -| utf8mb4 | UTF-8 Unicode | utf8mb4_bin | 4 | -+---------+-------------------------------------+-------------------+--------+ -6 rows in set (0.00 sec) +``` ++---------+-------------------------------------+--------------------+--------+ +| Charset | Description | Default collation | Maxlen | ++---------+-------------------------------------+--------------------+--------+ +| ascii | US ASCII | ascii_bin | 1 | +| binary | binary | binary | 1 | +| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | +| gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 | +| latin1 | Latin1 | latin1_bin | 1 | +| utf8 | UTF-8 Unicode | utf8_bin | 3 | +| utf8mb4 | UTF-8 Unicode | utf8mb4_bin | 4 | ++---------+-------------------------------------+--------------------+--------+ +7 rows in set (0.000 sec) ``` TiDB supports the following collations: @@ -118,12 +119,14 @@ TiDB supports the following collations: SHOW COLLATION; ``` -```sql +``` 
+--------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | +--------------------+---------+-----+---------+----------+---------+---------------+ | ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | | binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | +| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | | gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | | latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | @@ -136,7 +139,7 @@ SHOW COLLATION; | utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | +--------------------+---------+-----+---------+----------+---------+---------------+ -13 rows in set (0.00 sec) +15 rows in set (0.000 sec) ``` > **Warning:** @@ -158,7 +161,7 @@ You can use the following statement to view the collations (under the [new frame SHOW COLLATION WHERE Charset = 'utf8mb4'; ``` -```sql +``` +--------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | +--------------------+---------+-----+---------+----------+---------+---------------+ @@ -171,7 +174,7 @@ SHOW COLLATION WHERE Charset = 'utf8mb4'; 5 rows in set (0.001 sec) ``` -For details about the TiDB support of the GBK character set, see [GBK](/character-set-gbk.md). +For details about the GBK character set, see [The GBK Character Set](/character-set-gbk.md). For details about the GB18030 character set, see [The GB18030 Character Set](/character-set-gb18030.md). 
## `utf8` and `utf8mb4` in TiDB @@ -282,7 +285,7 @@ Database changed SELECT @@character_set_database, @@collation_database; ``` -```sql +``` +--------------------------|----------------------+ | @@character_set_database | @@collation_database | +--------------------------|----------------------+ @@ -295,7 +298,7 @@ SELECT @@character_set_database, @@collation_database; CREATE SCHEMA test2 CHARACTER SET latin1 COLLATE latin1_bin; ``` -```sql +``` Query OK, 0 rows affected (0.09 sec) ``` @@ -311,7 +314,7 @@ Database changed SELECT @@character_set_database, @@collation_database; ``` -```sql +``` +--------------------------|----------------------+ | @@character_set_database | @@collation_database | +--------------------------|----------------------+ @@ -347,7 +350,7 @@ For example: CREATE TABLE t1(a int) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; ``` -```sql +``` Query OK, 0 rows affected (0.08 sec) ``` @@ -379,7 +382,7 @@ Each string corresponds to a character set and a collation. When you use a strin Example: -```sql +``` SELECT 'string'; SELECT _utf8mb4'string'; SELECT _utf8mb4'string' COLLATE utf8mb4_general_ci; @@ -518,7 +521,7 @@ For a TiDB cluster that is already initialized, you can check whether the new co SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled'; ``` -```sql +``` +----------------+ | VARIABLE_VALUE | +----------------+ @@ -535,15 +538,15 @@ This new framework supports semantically parsing collations. TiDB enables the ne -Under the new framework, TiDB supports the `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_bin`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, and `gbk_bin` collations, which is compatible with MySQL. 
+Under the new framework, TiDB supports the `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_bin`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, `gbk_bin`, `gb18030_chinese_ci`, and `gb18030_bin` collations, which are compatible with MySQL. -When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_ai_ci` and `gbk_chinese_ci` is used, the string comparison is case-insensitive and accent-insensitive. At the same time, TiDB also corrects the collation's `PADDING` behavior: +When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, and `gb18030_chinese_ci` is used, the string comparison is case-insensitive and accent-insensitive. At the same time, TiDB also corrects the collation's `PADDING` behavior: ```sql CREATE TABLE t(a varchar(20) charset utf8mb4 collate utf8mb4_general_ci PRIMARY KEY); ``` -```sql +``` Query OK, 0 rows affected (0.00 sec) ``` @@ -551,7 +554,7 @@ Query OK, 0 rows affected (0.00 sec) INSERT INTO t VALUES ('A'); ``` -```sql +``` Query OK, 1 row affected (0.00 sec) ``` @@ -559,7 +562,7 @@ Query OK, 1 row affected (0.00 sec) INSERT INTO t VALUES ('a'); ``` -```sql +``` ERROR 1062 (23000): Duplicate entry 'a' for key 't.PRIMARY' -- TiDB is compatible with the case-insensitive collation of MySQL. ``` @@ -567,7 +570,7 @@ ERROR 1062 (23000): Duplicate entry 'a' for key 't.PRIMARY' -- TiDB is compatibl INSERT INTO t VALUES ('a '); ``` -```sql +``` ERROR 1062 (23000): Duplicate entry 'a ' for key 't.PRIMARY' -- TiDB modifies the `PADDING` behavior to be compatible with MySQL. 
``` @@ -604,7 +607,7 @@ TiDB supports using the `COLLATE` clause to specify the collation of an expressi SELECT 'a' = _utf8mb4 'A' collate utf8mb4_general_ci; ``` -```sql +``` +-----------------------------------------------+ | 'a' = _utf8mb4 'A' collate utf8mb4_general_ci | +-----------------------------------------------+ diff --git a/character-set-gb18030.md b/character-set-gb18030.md new file mode 100644 index 0000000000000..d13f5886c7474 --- /dev/null +++ b/character-set-gb18030.md @@ -0,0 +1,111 @@ +--- +title: The GB18030 Character Set +summary: Learn the details of TiDB's support for the GB18030 character set. +--- + +# The GB18030 Character Set New in v9.0.0 + +Starting from v9.0.0, TiDB supports the GB18030-2022 character set. This document describes TiDB's support for and compatibility with the GB18030 character set. + +```sql +SHOW CHARACTER SET WHERE CHARSET = 'gb18030'; +``` + +``` ++---------+---------------------------------+--------------------+--------+ +| Charset | Description | Default collation | Maxlen | ++---------+---------------------------------+--------------------+--------+ +| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | ++---------+---------------------------------+--------------------+--------+ +1 row in set (0.01 sec) +``` + +```sql +SHOW COLLATION WHERE CHARSET = 'gb18030'; +``` + +``` ++--------------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++--------------------+---------+-----+---------+----------+---------+---------------+ +| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | +| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | ++--------------------+---------+-----+---------+----------+---------+---------------+ +2 rows in set (0.001 sec) +``` + +## MySQL compatibility + +This section describes the compatibility of the GB18030 character set in TiDB with MySQL. 
+ +### Collation compatibility + +In MySQL, the default collation for the GB18030 character set is `gb18030_chinese_ci`. In TiDB, the default collation for GB18030 depends on the configuration parameter [`new_collations_enabled_on_first_bootstrap`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file/#new_collations_enabled_on_first_bootstrap): + +- By default, `new_collations_enabled_on_first_bootstrap` is set to `true`, which means enabling the [new collation framework](/character-set-and-collation.md#new-framework-for-collations). In this case, the default collation for GB18030 is `gb18030_chinese_ci`. +- If `new_collations_enabled_on_first_bootstrap` is set to `false`, the new framework for collations is disabled, and the default collation for GB18030 is `gb18030_bin`. + +Additionally, the `gb18030_bin` supported by TiDB differs from MySQL's `gb18030_bin` collation. TiDB converts GB18030 to `utf8mb4` and then performs binary sorting. + +After enabling the new framework for collations, if you check the collations for the GB18030 character set, you can see that TiDB's default collation for GB18030 is switched to `gb18030_chinese_ci`: + +```sql +SHOW CHARACTER SET WHERE CHARSET = 'gb18030'; +``` + +``` ++---------+---------------------------------+--------------------+--------+ +| Charset | Description | Default collation | Maxlen | ++---------+---------------------------------+--------------------+--------+ +| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | ++---------+---------------------------------+--------------------+--------+ +1 row in set (0.01 sec) +``` + +```sql +SHOW COLLATION WHERE CHARSET = 'gb18030'; +``` + +``` ++--------------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++--------------------+---------+-----+---------+----------+---------+---------------+ +| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | +| 
gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | ++--------------------+---------+-----+---------+----------+---------+---------------+ +2 rows in set (0.00 sec) +``` + +### Character compatibility + +- TiDB supports GB18030-2022 characters, while MySQL supports GB18030-2005 characters. As a result, the encoding and decoding results for certain characters differ between the two systems. + +- For invalid GB18030 characters, such as `0xFE39FE39`, MySQL allows writing them to the database in hexadecimal form and stores them as `?`. In TiDB, reading or writing invalid GB18030 characters in strict mode returns an error; in non-strict mode, TiDB allows reading or writing invalid GB18030 characters but returns a warning. + +### Others + +- Currently, TiDB does not support using the `ALTER TABLE` statement to convert other character sets to `gb18030`, or to convert from `gb18030` to another character set. + +- TiDB does not support using the `_gb18030` character set introducer. For example: + + ```sql + CREATE TABLE t(a CHAR(10) CHARSET BINARY); + Query OK, 0 rows affected (0.00 sec) + INSERT INTO t VALUES (_gb18030'啊'); + ERROR 1115 (42000): Unsupported character introducer: 'gb18030' + ``` + +- For binary characters in `ENUM` and `SET` types, TiDB currently treats them as using the `utf8mb4` character set. + +## Component compatibility + +- TiFlash, TiDB Data Migration (DM), and TiCDC currently do not support the GB18030 character set. + +- Before v9.0.0, Dumpling does not support exporting tables with `charset=GB18030`, and TiDB Lightning does not support importing tables with `charset=GB18030`. + +- Before v9.0.0, TiDB Backup & Restore (BR) does not support backing up or restoring tables with `charset=GB18030`. In addition, no version of BR supports restoring tables with `charset=GB18030` to TiDB clusters earlier than v9.0.0. 
+ +## See also + +* [`SHOW CHARACTER SET`](/sql-statements/sql-statement-show-character-set.md) +* [Character Set and Collation](/character-set-and-collation.md) diff --git a/character-set-gbk.md b/character-set-gbk.md index a767af5fcdbba..3f812cd7e9d0c 100644 --- a/character-set-gbk.md +++ b/character-set-gbk.md @@ -1,14 +1,12 @@ --- -title: GBK +title: The GBK Character Set summary: This document provides details about the TiDB support of the GBK character set. --- -# GBK +# The GBK Character Set Starting from v5.4.0, TiDB supports the GBK character set. This document provides the TiDB support and compatibility information of the GBK character set. -Starting from v6.0.0, TiDB enables the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) by default. The default collation for TiDB GBK character set is `gbk_chinese_ci`, which is consistent with MySQL. - ```sql SHOW CHARACTER SET WHERE CHARSET = 'gbk'; ``` @@ -17,7 +15,7 @@ SHOW CHARACTER SET WHERE CHARSET = 'gbk'; +---------+-------------------------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+-------------------------------------+-------------------+--------+ -| gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 | +| gbk | Chinese Internal Code Specification | gbk_bin | 2 | +---------+-------------------------------------+-------------------+--------+ 1 row in set (0.00 sec) ``` @@ -33,7 +31,7 @@ SHOW COLLATION WHERE CHARSET = 'gbk'; | gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | +----------------+---------+----+---------+----------+---------+---------------+ -2 rows in set (0.00 sec) +2 rows in set (0.001 sec) ``` ## MySQL compatibility @@ -57,7 +55,38 @@ By default, TiDB Cloud enables the [new framework for collations](/character-set -Additionally, because TiDB converts GBK to `utf8mb4` and then uses a binary collation, the `gbk_bin` 
collation in TiDB is not the same as the `gbk_bin` collation in MySQL. +Additionally, the `gbk_bin` supported by TiDB differs from MySQL's `gbk_bin` collation. TiDB converts GBK to `utf8mb4` and then performs binary sorting. + +After [the new framework for collations](/character-set-and-collation.md#new-framework-for-collations) is enabled, if you check the collations for the GBK character set, you can see that TiDB's default collation for GBK is switched to `gbk_chinese_ci`. + +Starting from TiDB v6.0.0, the new framework for collations is enabled by default, which sets `gbk_chinese_ci` as the default collation for the GBK character set in TiDB, consistent with MySQL. + +```sql +SHOW CHARACTER SET WHERE CHARSET = 'gbk'; +``` + +``` ++---------+-------------------------------------+-------------------+--------+ +| Charset | Description | Default collation | Maxlen | ++---------+-------------------------------------+-------------------+--------+ +| gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 | ++---------+-------------------------------------+-------------------+--------+ +1 row in set (0.00 sec) +``` + +```sql +SHOW COLLATION WHERE CHARSET = 'gbk'; +``` + +``` ++----------------+---------+----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++----------------+---------+----+---------+----------+---------+---------------+ +| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | ++----------------+---------+----+---------+----------+---------+---------------+ +2 rows in set (0.001 sec) +``` ### Illegal character compatibility diff --git a/dm/dm-overview.md b/dm/dm-overview.md index 8ecd035a17193..c9eb5c7ce1889 100644 --- a/dm/dm-overview.md +++ b/dm/dm-overview.md @@ -59,9 +59,9 @@ Before using the DM tool, note the following restrictions: - DM does not replicate view-related DDL statements and DML statements to the 
downstream TiDB cluster. It is recommended that you create the view in the downstream TiDB cluster manually. -+ GBK character set compatibility ++ GBK and GB18030 character sets compatibility - - DM does not support migrating `charset=GBK` tables to TiDB clusters earlier than v5.4.0. + - Before v5.4.0, DM does not support migrating `charset=GBK` tables to TiDB clusters. Before v9.0.0, DM does not support migrating tables with `charset=GB18030` to TiDB clusters. + Binlog compatibility diff --git a/information-schema/information-schema-character-sets.md b/information-schema/information-schema-character-sets.md index b94170bb5f8ff..197074ede79e1 100644 --- a/information-schema/information-schema-character-sets.md +++ b/information-schema/information-schema-character-sets.md @@ -40,12 +40,13 @@ The output is as follows: +--------------------+----------------------+-------------------------------------+--------+ | ascii | ascii_bin | US ASCII | 1 | | binary | binary | binary | 1 | +| gb18030 | gb18030_chinese_ci | China National Standard GB18030 | 4 | | gbk | gbk_chinese_ci | Chinese Internal Code Specification | 2 | | latin1 | latin1_bin | Latin1 | 1 | | utf8 | utf8_bin | UTF-8 Unicode | 3 | | utf8mb4 | utf8mb4_bin | UTF-8 Unicode | 4 | +--------------------+----------------------+-------------------------------------+--------+ -6 rows in set (0.00 sec) +7 rows in set (0.00 sec) ``` The description of columns in the `CHARACTER_SETS` table is as follows: diff --git a/migrate-from-mariadb.md b/migrate-from-mariadb.md index 4a670c08cceec..bf70a1b950d5f 100644 --- a/migrate-from-mariadb.md +++ b/migrate-from-mariadb.md @@ -192,12 +192,14 @@ To see what collations TiDB supports, execute this statement on TiDB: SHOW COLLATION; ``` -```sql +``` +--------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | 
+--------------------+---------+-----+---------+----------+---------+---------------+ | ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | | binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | +| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | | gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | | latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | @@ -210,7 +212,7 @@ SHOW COLLATION; | utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | +--------------------+---------+-----+---------+----------+---------+---------------+ -13 rows in set (0.00 sec) +15 rows in set (0.000 sec) ``` To check what collations the columns of your current tables are using, you can use this statement: diff --git a/mysql-compatibility.md b/mysql-compatibility.md index b8f9629b97070..deb85071aa1bb 100644 --- a/mysql-compatibility.md +++ b/mysql-compatibility.md @@ -52,7 +52,7 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u > Currently, only {{{ .starter }}} and {{{ .essential }}} clusters in certain AWS regions support [`FULLTEXT` syntax and indexes](https://docs.pingcap.com/tidbcloud/vector-search-full-text-search-sql). TiDB Self-Managed and TiDB Cloud Dedicated support parsing the `FULLTEXT` syntax but do not support using the `FULLTEXT` indexes. + `SPATIAL` (also known as `GIS`/`GEOMETRY`) functions, data types and indexes [#6347](https://github.com/pingcap/tidb/issues/6347) -+ Character sets other than `ascii`, `latin1`, `binary`, `utf8`, `utf8mb4`, and `gbk`. ++ Character sets other than `ascii`, `latin1`, `binary`, `utf8`, `utf8mb4`, `gbk`, and `gb18030`. 
+ Optimizer trace + XML Functions + X-Protocol [#1109](https://github.com/pingcap/tidb/issues/1109) @@ -210,6 +210,8 @@ For more information, see [Compatibility between TiDB local temporary tables and * For information on the MySQL compatibility of the GBK character set, refer to [GBK compatibility](/character-set-gbk.md#mysql-compatibility) . +* For information on the MySQL compatibility of the GB18030 character set, refer to [GB18030 compatibility](/character-set-gb18030.md#mysql-compatibility). + * TiDB inherits the character set used in the table as the national character set. ### Storage engines diff --git a/sql-statements/sql-statement-show-collation.md b/sql-statements/sql-statement-show-collation.md index a6c840aaa225f..16bdcb00df4fc 100644 --- a/sql-statements/sql-statement-show-collation.md +++ b/sql-statements/sql-statement-show-collation.md @@ -25,11 +25,10 @@ ShowLikeOrWhere ::= ## Examples - +When the [new collation framework](https://docs.pingcap.com/tidb/stable/tidb-configuration-file/#new_collations_enabled_on_first_bootstrap) is enabled, in addition to the binary collations, TiDB also supports the following collations: -When [the new collation framework](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is enabled (the default), the example output is as follows: - - +- Seven case- and accent-insensitive collations, ending with `_ci` +- `utf8mb4_0900_bin` ```sql SHOW COLLATION; @@ -41,6 +40,8 @@ SHOW COLLATION; +--------------------+---------+-----+---------+----------+---------+---------------+ | ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | | binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | +| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | | gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | | latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | @@ -53,33 +54,30 @@ SHOW COLLATION; | 
utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | +--------------------+---------+-----+---------+----------+---------+---------------+ -13 rows in set (0.00 sec) +15 rows in set (0.000 sec) ``` - - -When the new collation framework is disabled, only binary collations are listed. +If [the new collation framework](https://docs.pingcap.com/tidb/stable/tidb-configuration-file/#new_collations_enabled_on_first_bootstrap) is disabled, TiDB supports only binary collations. ```sql SHOW COLLATION; ``` ``` -+-------------+---------+----+---------+----------+---------+---------------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | -+-------------+---------+----+---------+----------+---------+---------------+ -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | -| binary | binary | 63 | Yes | Yes | 1 | NO PAD | -| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD SPACE | -| gbk_bin | gbk | 87 | Yes | Yes | 1 | PAD SPACE | -+-------------+---------+----+---------+----------+---------+---------------+ -6 rows in set (0.00 sec) ++-------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++-------------+---------+-----+---------+----------+---------+---------------+ +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | +| binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD SPACE | +| gbk_bin | gbk | 87 | Yes | Yes | 1 | PAD SPACE | +| gb18030_bin | gb18030 | 249 | Yes | Yes | 1 | PAD SPACE | ++-------------+---------+-----+---------+----------+---------+---------------+ +7 rows in set (0.00 sec) ``` - - To filter 
on the character set, you can add a `WHERE` clause. ```sql