fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

qidi1 · 2022-08-22T09:17:04Z

This is an automated cherry-pick of #2455

What problem does this PR solve?

What is changed and how it works?

Incorrect unique index key when the table is not intHandle
- Why does this problem occur
  
  In TiDB, if there are null values in two unique indexes, then these two unique indexes are not conflicting.
  In order to achieve the purpose of no conflict, TiDB encodes the rows with null in the unique index by adding the clustered index or RowID to the key of the unique index.
  But in TiSpark, when the clustered index is not Integer type, the clustered index will not be added after the encoded key. This leads to a conflict in TiSpark when inserting two rows with null values in the unique index when the clustered index is not of Integer type
- How do we solve this problem
  
  The original logic only adds handle to Rowley when any col in the unique index is if the handle is of type int. It is now added in all cases.
  The original code:
```
    if (handle.isInt()) {
      // set appendHandle=true when any col in the unique index is null.
       ....
    }
   ...
    }
    if (appendHandle) {
      Key key = TypedKey.toTypedKey(handle, IntegerType.BIGINT);
      keys[keys.length - 1] = key;
    }
```
  The changed code:
```
    // set appendHandle=true when any col in the unique index is null
     ....
    if (appendHandle) {
      Key key = TypedKey.toRawKey(handle.encodedAsKey());
      keys[keys.length - 1] = key;
    }
```

Duplicate values for unique indexes

Why does this problem occur

This problem is caused by the fact that When we insert data, we can't decode the Handle out of a row with a unique index conflict.
For example:

First, let's assume that the unique index data in TIKV is {1: “1”} and the cluster key data is {"1":1,“1“,0}.
Now we use TiSpark to insert a row of data {1,“2”,0} to TiDB.
1. determine whether there is a conflicting row, request the row with unique index data of 1 from TiKV and return the value as {1, "1"}, request the row with cluster index key of "2" from TiKV and return null.
2. resolve the cluster index key value from the conflicting row {1: "1"} and delete the row corresponding to the primary key. However, due to the error of decode function, we can't resolve the correct cluster index key, so we can't delete the row corresponding to the cluster index key,.
3. We insert the unique index data {1:"2"} and the primary key data {"2":1,“2”,0} into TiKV.
At this time, the unique index data in TiKV is {1:"2"} and the primary key data is {"1":1,“1”,0} and {"2":1,"2",0}. Two unique indexes of the same data appear in the database.

How we solve this problem

The index value layout is like that.

   // Value layout:
   //		+-- IndexValueVersion0  (with restore data, or common handle, or index is global)
   //		|
   //		|  Layout: TailLen | Options      | Padding      | [IntHandle] | [UntouchedFlag]
   //		|  Length:   1     | len(options) | len(padding) |    8        |     1
   //		|
   //		|  TailLen:       len(padding) + len(IntHandle) + len(UntouchedFlag)
   //		|  Options:       Encode some value for new features, such as common handle, new collations or global index.
   //		|                 See below for more information.
   //		|  Padding:       Ensure length of value always >= 10. (or >= 11 if UntouchedFlag exists.)
   //		|  IntHandle:     Only exists when table use int handles and index is unique.
   //		|  UntouchedFlag: Only exists when index is untouched.
   //		|
   //		+-- Old Encoding (without restore data, integer handle, local)
   //		|
   //		|  Layout: [Handle] | [UntouchedFlag]
   //		|  Length:   8      |     1
   //		|
   //		|  Handle:        Only exists in unique index.
   //		|  UntouchedFlag: Only exists when index is untouched.
   //		|
   //		|  If neither Handle nor UntouchedFlag exists, value will be one single byte '0' (i.e. []byte{'0'}).
   //		|  Length of value <= 9, use to distinguish from the new encoding.
   // 		|
   //		+-- IndexValueForClusteredIndexVersion1
   //		|
   //		|  Layout: TailLen |    VersionFlag  |    Version     ｜ Options      |   [UntouchedFlag]
   //		|  Length:   1     |        1        |      1         |  len(options) |         1
   //		|
   //		|  TailLen:       len(UntouchedFlag)
   //		|  Options:       Encode some value for new features, such as common handle, new collations or global index.
   //		|                 See below for more information.
   //		|  UntouchedFlag: Only exists when index is untouched.
   //		|
   //		|  Layout of Options:
   //		|
   //		|     Segment:             Common Handle                 |     Global Index      |   New Collation
   // 		|     Layout:  CHandle flag | CHandle Len | CHandle      | PidFlag | PartitionID |    restoreData
   //		|     Length:     1         | 2           | len(CHandle) |    1    |    8        |   len(restoreData)
   //		|

When the clustered index is not int type, the original decodeHandle cannot decode the value of unqiue index correctly, so we added a new decode method decodeIndexValueForClusteredIndexVersion1.The new decode code is same as TiDB

The logic for decoding is shown in the flowchart

 graph TD
      A[decodeHandleInUniqueIndexValue] --> B{isCommHandle}
      B -->|Yes| C{getIndexVersion}
      C-->|1|D[decode<br/>commonHandle version1]
        C-->|0|F[decode commmonHandle <br/>version0 new encoding]
      B --No-->E{encodeData <br/>length < 9}
      E --Yes--> G[decode version0<br/> old encoding]
      E --No--> H[decode version0<br/> new encoding]

Check List

Tests

Unit test
Integration test

Side effects

Possible performance regression

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

Signed-off-by: qidi1 <1083369179@qq.com>

ti-chi-bot · 2022-08-22T09:17:09Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

Daemonxiao
shiyuhang0

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

qidi1 · 2022-08-22T09:18:23Z

/run-all-tests

qidi1 · 2022-08-22T13:49:52Z

/run-all-tests

qidi1 · 2022-08-25T01:05:09Z

/run-all-tests

xuanyu66 · 2022-08-26T03:23:43Z

/merge

ti-chi-bot · 2022-08-26T03:23:46Z

This pull request has been accepted and is ready to merge.

Commit hash: c5c2546

qidi1 and others added 2 commits August 21, 2022 08:16

This is an automated cherry-pick of pingcap#2455

50a7cfc

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

resovle conflict and format code

f1154d6

Signed-off-by: qidi1 <1083369179@qq.com>

Merge branch 'release-2.5' into cherry-pick-2455-to-release-2.5

e40584d

zhangyangyu mentioned this pull request Aug 22, 2022

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2509

Closed

shiyuhang0 approved these changes Aug 23, 2022

View reviewed changes

ti-chi-bot added the status/LGT1 label Aug 23, 2022

Daemonxiao approved these changes Aug 24, 2022

View reviewed changes

ti-chi-bot added status/LGT2 and removed status/LGT1 labels Aug 24, 2022

xuanyu66 and others added 2 commits August 24, 2022 12:52

Merge branch 'release-2.5' into cherry-pick-2455-to-release-2.5

929a801

Merge branch 'release-2.5' into cherry-pick-2455-to-release-2.5

c5c2546

ti-chi-bot added the status/can-merge label Aug 26, 2022

ti-chi-bot merged commit 541f459 into pingcap:release-2.5 Aug 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

qidi1 commented Aug 22, 2022

ti-chi-bot commented Aug 22, 2022 •

edited

qidi1 commented Aug 22, 2022

qidi1 commented Aug 22, 2022

qidi1 commented Aug 25, 2022

xuanyu66 commented Aug 26, 2022

ti-chi-bot commented Aug 26, 2022

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

Conversation

qidi1 commented Aug 22, 2022

What problem does this PR solve?

What is changed and how it works?

Check List

ti-chi-bot commented Aug 22, 2022 • edited

qidi1 commented Aug 22, 2022

qidi1 commented Aug 22, 2022

qidi1 commented Aug 25, 2022

xuanyu66 commented Aug 26, 2022

ti-chi-bot commented Aug 26, 2022

ti-chi-bot commented Aug 22, 2022 •

edited