fix: allow replace duplicate source keys#24497
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
aunjgr
left a comment
There was a problem hiding this comment.
Review: allow replace duplicate source keys
Request Changes
The cross-key dedup fix still drops required delete side effects for rows that are later discarded by another keep-last pass.
keepDiscardedRowsForDelete appends delete-only rows and seeds hb.DelRows, but resetHashStateForRebuild immediately clears hb.DelRows before the recursive rebuild. The rebuild then runs with a reduced InputBatchRowCount, so the discarded rows are never reprocessed and their delete effects are lost. That is the same failure mode as the original blocker: a source row that first deletes an existing PK conflict can be dropped later by a UK conflict, leaving the old row behind.
Please preserve the delete-only rows through the rebuild path, or restructure the dedup so delete effects survive later keep-last filtering. Add a regression test for a PK-conflict row that is later discarded by a UK-conflict row.
|
Thanks for the review. I re-checked this against the current PR head The specific concern here looks stale on the current code. The current flow is:
I also re-ran the focused regression on the current head: go test ./pkg/sql/colexec/hashbuild -run TestDedupBuildKeepLastPreservesDeleteOnlyRows -count=1It passes. The SQL BVT cross-key regression is also already covered in create table t_replace_cross_key_keep_last(id int primary key, u int unique, v int);
insert into t_replace_cross_key_keep_last values (1, 100, 0);
replace into t_replace_cross_key_keep_last values (1, 200, 10), (2, 200, 20);
select * from t_replace_cross_key_keep_last order by id;I also ran that |
aunjgr
left a comment
There was a problem hiding this comment.
Re-review: REPLACE duplicate source keys
Approve.
The previous cross-key delete-side-effect concern is now covered in both implementation and tests:
- keep-last path preserves delete-only rows via
keepDiscardedRowsForDelete - dedicated unit test validates preserved delete markers after rebuild (
TestDedupBuildKeepLastPreservesDeleteOnlyRows) - BVT includes the cross-key REPLACE case (
t_replace_cross_key_keep_last) and PK-not-first / NOT NULL-after-filter cases
I did not find a remaining blocking correctness issue in this revision.
Merge Queue Status
This pull request spent 1 hour 20 minutes 7 seconds in the queue, including 1 hour 19 minutes 41 seconds running CI. Required conditions to merge
|
What type of PR is this?
Which issue(s) this PR fixes:
fixes #24473
What this PR does / why we need it:
This PR fixes REPLACE INTO when the source rows in a single statement contain duplicate primary or unique keys.
The DEDUP hash build path previously treated duplicate build-side keys as an INSERT-style duplicate-entry error. That is correct for INSERT, but REPLACE should keep the final source row for the key and continue with the replacement flow.
Changes:
Validation:
go test ./pkg/sql/colexec/hashbuild -count=1go test ./pkg/sql/colexec/dedupjoin -count=1go test ./pkg/sql/compile -run 'Test_convertToPipelineInstruction|Test_convertToVmInstruction' -count=1\n-go test ./pkg/pb/pipeline -count=1\n-git diff --check\n- Embedded SQL verification for duplicate source primary key and unique key REPLACE cases.