fix: Incorrect unique index key when table is not intHandle & Duplicate values for unique indexes (#2455) #2516
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: qidi1 <1083369179@qq.com>
/run-all-tests
/run-all-tests
/run-all-tests
/merge
This pull request has been accepted and is ready to merge. Commit hash: c5c2546
This is an automated cherry-pick of #2455
What problem does this PR solve?
What is changed and how it works?
Incorrect unique index key when the table is not intHandle
Why does this problem occur
In TiDB, two rows do not conflict on a unique index if the indexed columns contain null values.
To guarantee this, TiDB encodes rows that have null in the unique index by appending the clustered index or RowID to the unique index key.
In TiSpark, however, the clustered index was not appended to the encoded key when it was not of integer type. As a result, TiSpark reported a conflict when inserting two rows with null values in the unique index if the clustered index was not of integer type.
How do we solve this problem
The original logic appended the handle to the index key when any column in the unique index was null, but only if the handle was of type int. The handle is now appended in all such cases.
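The effect of this rule can be illustrated with a toy model. This is a minimal sketch in Python, not TiSpark's code: the function `encode_unique_index_key`, its parameters, and the tuple-based key are all hypothetical stand-ins for the real byte encoding.

```python
from typing import Optional, Sequence, Tuple


def encode_unique_index_key(
    index_values: Sequence[Optional[int]],
    handle: object,
    handle_is_int: bool,
    fixed: bool,
) -> Tuple:
    """Toy model of a unique index key (a tuple, not TiKV's byte layout).

    fixed=False mimics the old behavior: the handle is appended only when
    the index contains null AND the handle is an int.
    fixed=True mimics the new behavior: the handle is appended whenever
    the index contains null, whatever the handle type.
    """
    key = ("idx",) + tuple(index_values)
    has_null = any(v is None for v in index_values)
    if has_null and (fixed or handle_is_int):
        key += (handle,)
    return key


# Two rows with null in the unique index and non-int (string) handles.
old_a = encode_unique_index_key([None], "a", handle_is_int=False, fixed=False)
old_b = encode_unique_index_key([None], "b", handle_is_int=False, fixed=False)
new_a = encode_unique_index_key([None], "a", handle_is_int=False, fixed=True)
new_b = encode_unique_index_key([None], "b", handle_is_int=False, fixed=True)

print(old_a == old_b)  # keys collide under the old rule
print(new_a == new_b)  # keys are distinct under the new rule
```

Rows without null in the index still share a key in both modes, which is what makes a genuine duplicate detectable.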
The original code:
The changed code:
Duplicate values for unique indexes
Why does this problem occur
This problem occurs because, when inserting data, we cannot decode the handle from a row that conflicts on a unique index.
For example:
First, assume that the unique index data in TiKV is {1: "1"} and the clustered index (primary key) data is {"1": 1, "1", 0}.
Now we use TiSpark to insert the row {1, "2", 0} into TiDB:
1. Determine whether a conflicting row exists: requesting the unique index entry with value 1 from TiKV returns {1: "1"}, and requesting the row with clustered index key "2" returns null.
2. Resolve the clustered index key from the conflicting entry {1: "1"} and delete the row for that key. However, because of the error in the decode function, the correct clustered index key cannot be resolved, so the old row is not deleted.
3. Insert the unique index data {1: "2"} and the primary key data {"2": 1, "2", 0} into TiKV.
At this point, the unique index data in TiKV is {1: "2"} and the primary key data is {"1": 1, "1", 0} and {"2": 1, "2", 0}. Two rows with the same unique index value now exist in the database.
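The sequence above can be replayed with two in-memory dictionaries standing in for TiKV. This is a hedged sketch only: `insert_row`, `broken_decode`, `working_decode`, and `simulate` are hypothetical names, and the broken decoder is modeled simply as returning a wrong handle.

```python
def broken_decode(index_value):
    """Stands in for a decoder that mishandles non-int handles."""
    return None  # a handle that matches no stored row


def working_decode(index_value):
    """Stands in for a decoder that recovers the stored handle."""
    return index_value


def insert_row(index_kv, row_kv, row, decode_handle):
    """Insert (unique_value, handle, extra), replacing any conflicting row."""
    unique_value, handle, _ = row
    if unique_value in index_kv:
        # Conflict on the unique index: find and delete the old row.
        old_handle = decode_handle(index_kv[unique_value])
        row_kv.pop(old_handle, None)  # no-op when old_handle is wrong
    index_kv[unique_value] = handle
    row_kv[handle] = row


def simulate(decode_handle):
    index_kv = {1: "1"}          # unique index value -> handle
    row_kv = {"1": (1, "1", 0)}  # handle -> row
    insert_row(index_kv, row_kv, (1, "2", 0), decode_handle)
    return index_kv, row_kv


_, rows_broken = simulate(broken_decode)
_, rows_fixed = simulate(working_decode)
print(sorted(rows_broken))  # both the old and the new row remain
print(sorted(rows_fixed))   # only the new row remains
```

With the broken decoder the delete silently does nothing, which is why no error surfaces and the duplicate is only visible afterwards in the stored rows.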
How do we solve this problem
The index value layout is as follows.
When the clustered index is not of int type, the original decodeHandle cannot decode the unique index value correctly, so we added a new decode method, decodeIndexValueForClusteredIndexVersion1. The new decoding logic is the same as TiDB's and is shown in the flowchart.
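As a rough illustration of what such a decoder does, the sketch below extracts a non-int (common) handle from a version 1 index value. It is not the PR's code. The layout it assumes (a 1-byte tail length, a version flag byte and version byte, then a common-handle flag, a 2-byte big-endian length, and the handle bytes) and the constant values are assumptions modeled on TiDB's table codec and should be checked against the TiDB source. All function names are hypothetical.

```python
from typing import Optional

# Assumed constants; verify against TiDB's table codec before relying on them.
COMMON_HANDLE_FLAG = 127
INDEX_VERSION_FLAG = 125
MAX_OLD_ENCODE_VALUE_LEN = 9


def index_value_version(value: bytes) -> int:
    """Return the assumed index value version (0 when no version header)."""
    if len(value) <= MAX_OLD_ENCODE_VALUE_LEN:
        return 0
    tail_len = value[0]
    if tail_len in (0, 1) and value[1] == INDEX_VERSION_FLAG:
        return value[2]
    return 0


def decode_common_handle_v1(value: bytes) -> Optional[bytes]:
    """Extract the encoded common handle from an assumed version 1 value."""
    tail_len = value[0]
    # Skip tail length, version flag and version; drop the tail bytes.
    body = value[3:len(value) - tail_len]
    if body and body[0] == COMMON_HANDLE_FLAG:
        handle_len = int.from_bytes(body[1:3], "big")
        return body[3:3 + handle_len]
    return None


def build_v1_value(handle: bytes, tail: bytes = b"") -> bytes:
    """Build a value in the assumed layout, for demonstration only."""
    header = bytes([len(tail), INDEX_VERSION_FLAG, 1, COMMON_HANDLE_FLAG])
    return header + len(handle).to_bytes(2, "big") + handle + tail


sample = build_v1_value(b"abcd")
print(index_value_version(sample), decode_common_handle_v1(sample))
```

The key difference from an int-handle decoder is that the handle length is read from the value itself instead of being fixed at 8 bytes.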
Check List
Tests
Side effects