Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516

Merged
merged 5 commits into from Aug 26, 2022

Conversation

qidi1
Copy link
Contributor

@qidi1 qidi1 commented Aug 22, 2022

This is an automated cherry-pick of #2455

What problem does this PR solve?

What is changed and how it works?

  • Incorrect unique index key when the table is not intHandle

    • Why does this problem occur

      In TiDB, if there are null values in two unique indexes, then these two unique indexes are not conflicting.
      In order to achieve the purpose of no conflict, TiDB encodes the rows with null in the unique index by adding the clustered index or RowID to the key of the unique index.
      But in TiSpark, when the clustered index is not Integer type, the clustered index will not be added after the encoded key. This leads to a conflict in TiSpark when inserting two rows with null values in the unique index when the clustered index is not of Integer type

    • How do we solve this problem

      The original logic only adds handle to Rowley when any col in the unique index is if the handle is of type int. It is now added in all cases.
      The original code:

          if (handle.isInt()) {
            // set appendHandle=true when any col in the unique index is null.
             ....
          }
         ...
          }
          if (appendHandle) {
            Key key = TypedKey.toTypedKey(handle, IntegerType.BIGINT);
            keys[keys.length - 1] = key;
          }

      The changed code:

          // set appendHandle=true when any col in the unique index is null
           ....
          if (appendHandle) {
            Key key = TypedKey.toRawKey(handle.encodedAsKey());
            keys[keys.length - 1] = key;
          }
  • Duplicate values for unique indexes

    • Why does this problem occur

      This problem is caused by the fact that When we insert data, we can't decode the Handle out of a row with a unique index conflict.
      For example:

      First, let's assume that the unique index data in TIKV is {1: “1”} and the cluster key data is {"1":1,“1“,0}.
      Now we use TiSpark to insert a row of data {1,“2”,0} to TiDB.

      1. determine whether there is a conflicting row, request the row with unique index data of 1 from TiKV and return the value as {1, "1"}, request the row with cluster index key of "2" from TiKV and return null.

      2. resolve the cluster index key value from the conflicting row {1: "1"} and delete the row corresponding to the primary key. However, due to the error of decode function, we can't resolve the correct cluster index key, so we can't delete the row corresponding to the cluster index key,.

      3. We insert the unique index data {1:"2"} and the primary key data {"2":1,“2”,0} into TiKV.

      At this time, the unique index data in TiKV is {1:"2"} and the primary key data is {"1":1,“1”,0} and {"2":1,"2",0}. Two unique indexes of the same data appear in the database.

    • How we solve this problem

      The index value layout is like that.

         // Value layout:
         //		+-- IndexValueVersion0  (with restore data, or common handle, or index is global)
         //		|
         //		|  Layout: TailLen | Options      | Padding      | [IntHandle] | [UntouchedFlag]
         //		|  Length:   1     | len(options) | len(padding) |    8        |     1
         //		|
         //		|  TailLen:       len(padding) + len(IntHandle) + len(UntouchedFlag)
         //		|  Options:       Encode some value for new features, such as common handle, new collations or global index.
         //		|                 See below for more information.
         //		|  Padding:       Ensure length of value always >= 10. (or >= 11 if UntouchedFlag exists.)
         //		|  IntHandle:     Only exists when table use int handles and index is unique.
         //		|  UntouchedFlag: Only exists when index is untouched.
         //		|
         //		+-- Old Encoding (without restore data, integer handle, local)
         //		|
         //		|  Layout: [Handle] | [UntouchedFlag]
         //		|  Length:   8      |     1
         //		|
         //		|  Handle:        Only exists in unique index.
         //		|  UntouchedFlag: Only exists when index is untouched.
         //		|
         //		|  If neither Handle nor UntouchedFlag exists, value will be one single byte '0' (i.e. []byte{'0'}).
         //		|  Length of value <= 9, use to distinguish from the new encoding.
         // 		|
         //		+-- IndexValueForClusteredIndexVersion1
         //		|
         //		|  Layout: TailLen |    VersionFlag  |    Version     | Options      |   [UntouchedFlag]
         //		|  Length:   1     |        1        |      1         |  len(options) |         1
         //		|
         //		|  TailLen:       len(UntouchedFlag)
         //		|  Options:       Encode some value for new features, such as common handle, new collations or global index.
         //		|                 See below for more information.
         //		|  UntouchedFlag: Only exists when index is untouched.
         //		|
         //		|  Layout of Options:
         //		|
         //		|     Segment:             Common Handle                 |     Global Index      |   New Collation
         // 		|     Layout:  CHandle flag | CHandle Len | CHandle      | PidFlag | PartitionID |    restoreData
         //		|     Length:     1         | 2           | len(CHandle) |    1    |    8        |   len(restoreData)
         //		|
      

      When the clustered index is not int type, the original decodeHandle cannot decode the value of unqiue index correctly, so we added a new decode method decodeIndexValueForClusteredIndexVersion1.The new decode code is same as TiDB

      The logic for decoding is shown in the flowchart

       graph TD
            A[decodeHandleInUniqueIndexValue] --> B{isCommHandle}
            B -->|Yes| C{getIndexVersion}
            C-->|1|D[decode<br/>commonHandle version1]
              C-->|0|F[decode commmonHandle <br/>version0 new encoding]
            B --No-->E{encodeData <br/>length < 9}
            E --Yes--> G[decode version0<br/> old encoding]
            E --No--> H[decode version0<br/> new encoding]
      

Check List

Tests

  • Unit test
  • Integration test

Side effects

  • Possible performance regression

qidi1 and others added 2 commits August 21, 2022 08:16
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: qidi1 <1083369179@qq.com>
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 22, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Daemonxiao
  • shiyuhang0

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@qidi1
Copy link
Contributor Author

qidi1 commented Aug 22, 2022

/run-all-tests

@qidi1
Copy link
Contributor Author

qidi1 commented Aug 22, 2022

/run-all-tests

@qidi1
Copy link
Contributor Author

qidi1 commented Aug 25, 2022

/run-all-tests

@xuanyu66
Copy link
Collaborator

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: c5c2546

@ti-chi-bot ti-chi-bot merged commit 541f459 into pingcap:release-2.5 Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants