Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 133 additions & 0 deletions sql-statements/sql-statement-create-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ IndexOption ::=
| 'COMMENT' stringLit
| ("VISIBLE" | "INVISIBLE")
| ("GLOBAL" | "LOCAL")
| 'WHERE' Expression

IndexTypeName ::=
'BTREE'
Expand Down Expand Up @@ -366,6 +367,138 @@ See [Index Selection - Use multi-valued indexes](/choose-index.md#use-multi-valu
- If a table uses multi-valued indexes, you cannot back up, replicate, or import the table using BR, TiCDC, or TiDB Lightning to a TiDB cluster earlier than v6.6.0.
- For a query with complex conditions, TiDB might not be able to select multi-valued indexes. For information on the condition patterns supported by multi-valued indexes, refer to [Use multi-valued indexes](/choose-index.md#use-multi-valued-indexes).

## Partial indexes

A partial index is an index built on a subset of rows in a table, defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those rows that satisfy the predicate.

### Create partial indexes

You can create partial indexes by adding a `WHERE` clause to the index definition. For example:

```sql
CREATE TABLE t1 (c1 INT, c2 INT, c3 TEXT);
CREATE INDEX idx1 ON t1 (c1) WHERE c2 > 10;
```

You can also create partial indexes using `ALTER TABLE`:

```sql
ALTER TABLE t1 ADD INDEX idx2 (c1, c2) WHERE c3 = 'abc';
```

Or specify the partial index when creating the table:

```sql
CREATE TABLE t2 (
id INT PRIMARY KEY,
status VARCHAR(20),
created_at DATETIME,
INDEX idx_active_status (status) WHERE status = 'active'
);
```

### Usage examples

The following examples demonstrate how to use partial indexes effectively:

```sql
-- Create a table with user data
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
status varchar(20),
created_at DATETIME,
score INT
);

-- Create partial indexes for common query patterns
CREATE INDEX idx_active_users ON users (name) WHERE status = 'active';
CREATE INDEX idx_high_score_users ON users (created_at) WHERE score > 1000;
CREATE INDEX idx_pending_status ON users (created_at) WHERE status = 'pending';
```

Then the following queries can use the partial index:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The current phrasing "Then the following queries can use the partial index:" is slightly misleading, as the last example (SELECT * FROM users WHERE status = 'pending';) does not use the partial index. To improve clarity, I suggest explaining that for a partial index to be used, the query must benefit from the indexed columns.

Suggested change
Then the following queries can use the partial index:
The following examples show how the optimizer uses (or does not use) partial indexes based on the query structure. For a partial index to be used, the query must not only match the `WHERE` condition of the index but also benefit from the indexed columns for filtering or ordering.


```
mysql> explain SELECT * FROM users WHERE status = 'active' AND name = 'John';
+-------------------------------+---------+-----------+-------------------------------------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+-------------------------------------------+-------------------------------------------------------+
| IndexLookUp_9 | 1.00 | root | | |
| ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:users, index:idx_active_users(name) | range:["John","John"], keep order:false, stats:pseudo |
| └─Selection_8(Probe) | 1.00 | cop[tikv] | | eq(test.users.status, "active") |
| └─TableRowIDScan_7 | 10.00 | cop[tikv] | table:users | keep order:false, stats:pseudo |
+-------------------------------+---------+-----------+-------------------------------------------+-------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> explain SELECT * FROM users WHERE status = 'active' ORDER BY name;
+-------------------------------+----------+-----------+-------------------------------------------+---------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+----------+-----------+-------------------------------------------+---------------------------------+
| IndexLookUp_18 | 10.00 | root | | |
| ├─IndexFullScan_15(Build) | 10000.00 | cop[tikv] | table:users, index:idx_active_users(name) | keep order:true, stats:pseudo |
| └─Selection_17(Probe) | 10.00 | cop[tikv] | | eq(test.users.status, "active") |
| └─TableRowIDScan_16 | 10000.00 | cop[tikv] | table:users | keep order:false, stats:pseudo |
+-------------------------------+----------+-----------+-------------------------------------------+---------------------------------+
4 rows in set (0.00 sec)

mysql> explain SELECT * FROM users WHERE score > 10000 ORDER BY created_at;
+-------------------------------+----------+-----------+-----------------------------------------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+----------+-----------+-----------------------------------------------------+--------------------------------+
| IndexLookUp_18 | 3333.33 | root | | |
| ├─IndexFullScan_15(Build) | 10000.00 | cop[tikv] | table:users, index:idx_high_score_users(created_at) | keep order:true, stats:pseudo |
| └─Selection_17(Probe) | 3333.33 | cop[tikv] | | gt(test.users.score, 10000) |
| └─TableRowIDScan_16 | 10000.00 | cop[tikv] | table:users | keep order:false, stats:pseudo |
+-------------------------------+----------+-----------+-----------------------------------------------------+--------------------------------+
4 rows in set (0.00 sec)

mysql> explain SELECT * FROM users WHERE status = 'pending';
+-------------------------+----------+-----------+---------------+----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+---------------+----------------------------------+
| TableReader_8 | 10.00 | root | | data:Selection_7 |
| └─Selection_7 | 10.00 | cop[tikv] | | eq(test.users.status, "pending") |
| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:users | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+----------------------------------+
3 rows in set (0.00 sec)
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

This example correctly shows that the optimizer might not choose the partial index. To help users understand why, I suggest adding a note explaining that since the query doesn't filter or order by created_at, a full table scan is more efficient.

Suggested change
```
> **Note:**
> In this case, although the `WHERE status = 'pending'` condition matches the partial index `idx_pending_status (created_at) WHERE status = 'pending'`, the query does not filter or order by `created_at`. Therefore, the optimizer chooses a full table scan as it is more efficient.


If the predicates in query don't meet the index definition, the index will not be chosen even with hint:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The phrase "don't meet the index definition" is a bit vague. For better clarity, I suggest rephrasing to explain that the query's WHERE clause must imply the condition of the partial index for it to be used.

Suggested change
If the predicates in query don't meet the index definition, the index will not be chosen even with hint:
If the query's `WHERE` clause does not imply the condition of the partial index, the index will not be used, even with a hint:


```
mysql> explain SELECT * FROM users use index(idx_high_score_users) WHERE score > 100 ORDER BY created_at;
+---------------------------+----------+-----------+---------------+--------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+----------+-----------+---------------+--------------------------------+
| Sort_5 | 3333.33 | root | | test.users.created_at |
| └─TableReader_11 | 3333.33 | root | | data:Selection_10 |
| └─Selection_10 | 3333.33 | cop[tikv] | | gt(test.users.score, 100) |
| └─TableFullScan_9 | 10000.00 | cop[tikv] | table:users | keep order:false, stats:pseudo |
+---------------------------+----------+-----------+---------------+--------------------------------+
```

### When to use partial indexes

Partial indexes are particularly useful in the following scenarios:

- **Selective filtering**: When you frequently query a small subset of rows based on specific conditions
- **Conditional uniqueness**: When you need unique constraints that only apply under certain conditions

### Limitations

- The `WHERE` clause in partial indexes supports basic comparison operators (`=`, `!=`, `<`, `<=`, `>`, `>=`) and `IN` predicates with constant values
- The types of the column and constant value should be the same
- The predicate can only reference columns from the same table
- Partial indexes cannot be used on expression indexes

### Performance benefits

Partial indexes offer several advantages:

1. **Reduced storage**: Only rows matching the predicate are indexed, saving storage space
2. **Faster DML**: It'll be faster to maintain the index of a subset of data during INSERT, UPDATE, and DELETE operations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The use of "It'll" is informal. I suggest rephrasing for a more professional tone and improved clarity, and adding backticks to DML operations as per the style guide.

Suggested change
2. **Faster DML**: It'll be faster to maintain the index of a subset of data during INSERT, UPDATE, and DELETE operations
2. **Faster DML**: Index maintenance during `INSERT`, `UPDATE`, and `DELETE` operations is faster, as only a subset of data is indexed.


## Invisible index

By default, invisible indexes are indexes that are ignored by the query optimizer:
Expand Down