New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ddl: update the reorg range after delay for async txn #38195
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
ddl/reorg.go
Outdated
if reorgInfo.mergingTmpIdx { | ||
start, end = tablecodec.GetTableIndexKeyRange(pid, tablecodec.TempIndexPrefix|elem.ID) | ||
} else { | ||
currentVer, err := getValidCurrentVersion(dCtx.store) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the currentVer
value is changed. Before this PR, the version is got when region.first
is true. Maybe we can use job.SnapshotVer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use job.SnapshotVer
, we cannot get the latest data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it went back.
But if not, does it feel like whether it is "first" doesn't make much of a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the reorgInfo.first
is necessary right from the start... Let me refactor these code later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK~, I mean that if it isn't necessary, we can do #38195 (comment) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ddl/stat_test.go
Outdated
@@ -89,7 +89,7 @@ func TestDDLStatsInfo(t *testing.T) { | |||
varMap, err := d.Stats(nil) | |||
wg.Done() | |||
require.NoError(t, err) | |||
require.Equal(t, "1", varMap[ddlJobReorgHandle]) | |||
require.Equal(t, "", varMap[ddlJobReorgHandle]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the test is changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the channel ddl.TestCheckWorkerNumCh
sends, the transaction that updating ddl_job_reorg_handle
is not committed.
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 333e389c93b956ef54f0344c23dd9c64aa26b091
|
/hold because add index test is timed out. |
In the Cloud CI,check_2 only have 8 core cpu to run this test case. it maybe an reason。 |
c1e99f4
to
b4fadfd
Compare
/merge |
This pull request has been accepted and is ready to merge. Commit hash: b4fadfd
|
// get the current version for reorganization if we don't have | ||
if d.lease > 0 { // Only delay when it's not in test. | ||
delayForAsyncCommit() | ||
} | ||
ver, err := getValidCurrentVersion(d.store) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are the same codes. I think it can be merged into a single function.
info.first = true
...
ver, err := getValidCurrentVersion(d.store)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly speaking, getReorgInfoFromPartitions
should reuse getReorgInfo
since they are quite similar. Please note that getReorgInfoFromPartitions
is only used by the global index feature, which is experimental, unstable, and lacks maintenance.
We need to discuss how to handle this piece of code properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it may be improved after GA the global index feature.
/unhold |
In response to a cherrypick label: cannot checkout |
In response to a cherrypick label: cannot checkout |
In response to a cherrypick label: cannot checkout |
In response to a cherrypick label: cannot checkout |
TiDB MergeCI notify🔴 Bad News! New failing [1] after this pr merged.
|
What problem does this PR solve?
Issue Number: close #38165
Problem Summary:
Previously,
reorgInfo
is initialized ingetReorgInfo()
, which happens beforedelayForAsyncCommit()
. However, the reorg range may change after the delay(for example, an async commit transaction inserts a record). As a result, the backfill transaction does not add the corresponding index record.What is changed and how it works?
This PR moves the reorg info range initialization after
delayForAsyncCommit
. It makes sure no async transactions are active.Check List
Tests
It is not easy to write the unit test because client-go is involved. I follow the steps described in issue, the problem does not occur anymore.
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.