fix: allow replace duplicate source keys #24497

Merged
mergify[bot] merged 8 commits into matrixorigin:main from ck89119:issue-24473
May 22, 2026

Conversation

@ck89119 ck89119 commented May 20, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

fixes #24473

What this PR does / why we need it:

This PR fixes REPLACE INTO when the source rows in a single statement contain duplicate primary or unique keys.

The DEDUP hash build path previously treated duplicate build-side keys as an INSERT-style duplicate-entry error. That is correct for INSERT, but REPLACE should keep the final source row for the key and continue with the replacement flow.

Changes:

  • Add a REPLACE-only hash build policy to keep the last source row for duplicate build-side keys.
  • Preserve existing INSERT and non-REPLACE DEDUP duplicate-entry behavior.
  • Serialize the new HashBuild flag through remote pipeline execution.
  • Add hashbuild unit tests and REPLACE BVT coverage for duplicate source primary key and unique key cases.

Validation:

  • go test ./pkg/sql/colexec/hashbuild -count=1
  • go test ./pkg/sql/colexec/dedupjoin -count=1
  • go test ./pkg/sql/compile -run 'Test_convertToPipelineInstruction|Test_convertToVmInstruction' -count=1
  • go test ./pkg/pb/pipeline -count=1
  • git diff --check
  • Embedded SQL verification for duplicate source primary key and unique key REPLACE cases.

@aunjgr aunjgr left a comment

Review: allow replace duplicate source keys

Request Changes

The cross-key dedup fix still drops required delete side effects for rows that are later discarded by another keep-last pass.

keepDiscardedRowsForDelete appends delete-only rows and seeds hb.DelRows, but resetHashStateForRebuild immediately clears hb.DelRows before the recursive rebuild. The rebuild then runs with a reduced InputBatchRowCount, so the discarded rows are never reprocessed and their delete effects are lost. That is the same failure mode as the original blocker: a source row that first deletes an existing PK conflict can be dropped later by a UK conflict, leaving the old row behind.

Please preserve the delete-only rows through the rebuild path, or restructure the dedup so delete effects survive later keep-last filtering. Add a regression test for a PK-conflict row that is later discarded by a UK-conflict row.

ck89119 commented May 22, 2026

Thanks for the review. I re-checked this against the current PR head 06a38a8d0265f98fb92c2706f6801d23edec5883.

The specific concern here looks stale on the current code. resetHashStateForRebuild does not clear hb.DelRows; it frees the old hash map / sels / unique keys / current vectors and clears IgnoreRows, while preserving the delete-row bitmap created by keepDiscardedRowsForDelete.

The current flow is:

  1. keepDiscardedRowsForDelete appends delete-only rows and marks them in hb.DelRows.
  2. The rebuild temporarily uses only the active row count, so the delete-only rows do not participate in the rebuilt hash map.
  3. After rebuild, InputBatchRowCount is restored to the total batch row count.
  4. The JoinMap carries both the restored row count and DelRows; dedupjoin skips deleted rows during probe matching and emits them as unmatched build rows during finalize, preserving the delete side effects.
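The four steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the MatrixOne implementation: `buildState` and its methods are hypothetical, a Go map stands in for both the hash map and the `DelRows` bitmap, and `finalize` only returns row indexes instead of emitting batches.

```go
package main

import "fmt"

// buildState is a hypothetical, simplified stand-in for the hash-build state.
type buildState struct {
	keys     []int        // key of each build-side row; delete-only rows sit at the tail
	rowCount int          // rows visible to consumers (restored after rebuild)
	delRows  map[int]bool // simplified bitmap marking delete-only rows
	hashMap  map[int]int  // key -> row index, built over active rows only
}

// Step 1: append a discarded row as delete-only and mark it.
func (s *buildState) keepDiscardedForDelete(key int) {
	s.keys = append(s.keys, key)
	s.delRows[len(s.keys)-1] = true
}

// Steps 2 and 3: reset the hash map (but not delRows), rebuild over the
// active rows only, then restore the row count to the full batch size.
func (s *buildState) rebuild(activeRows int) {
	s.hashMap = make(map[int]int, activeRows)
	s.rowCount = activeRows
	for i := 0; i < s.rowCount; i++ {
		s.hashMap[s.keys[i]] = i
	}
	s.rowCount = len(s.keys)
}

// Step 4a: probing never matches a delete-only row.
func (s *buildState) probe(key int) (int, bool) {
	i, ok := s.hashMap[key]
	if !ok || s.delRows[i] {
		return 0, false
	}
	return i, true
}

// Step 4b: finalize returns every unmatched build row, which always
// includes the delete-only rows because they can never be matched.
func (s *buildState) finalize(matched map[int]bool) []int {
	var out []int
	for i := 0; i < s.rowCount; i++ {
		if !matched[i] {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	s := &buildState{keys: []int{10, 20}, delRows: map[int]bool{}}
	s.keepDiscardedForDelete(99) // row 2 becomes delete-only
	s.rebuild(2)                 // only rows 0 and 1 enter the hash map

	i, ok := s.probe(10)
	fmt.Println("probe(10):", i, ok)
	_, ok = s.probe(99)
	fmt.Println("probe(99) matched:", ok)
	fmt.Println("unmatched at finalize:", s.finalize(map[int]bool{i: true}))
}
```

The point of the sketch is the ordering: the delete markers are set before the rebuild, survive the reset, and become visible again once the row count is restored.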

I also re-ran the focused regression on the current head:

go test ./pkg/sql/colexec/hashbuild -run TestDedupBuildKeepLastPreservesDeleteOnlyRows -count=1

It passes. The SQL BVT cross-key regression is also already covered in test/distributed/cases/dml/replace/replace.test with:

create table t_replace_cross_key_keep_last(id int primary key, u int unique, v int);
insert into t_replace_cross_key_keep_last values (1, 100, 0);
replace into t_replace_cross_key_keep_last values (1, 200, 10), (2, 200, 20);
select * from t_replace_cross_key_keep_last order by id;

I also ran that replace.test through mo-tester locally after starting a fresh local MO service: 154/154 passed.

@aunjgr aunjgr left a comment

Re-review: REPLACE duplicate source keys

Approve.

The previous cross-key delete-side-effect concern is now covered in both implementation and tests:

  • keep-last path preserves delete-only rows via keepDiscardedRowsForDelete
  • dedicated unit test validates preserved delete markers after rebuild (TestDedupBuildKeepLastPreservesDeleteOnlyRows)
  • BVT includes the cross-key REPLACE case (t_replace_cross_key_keep_last) and PK-not-first / NOT NULL-after-filter cases

I did not find a remaining blocking correctness issue in this revision.

mergify Bot commented May 22, 2026

Merge Queue Status

  • Entered queue: 2026-05-22 10:29 UTC · Rule: main
  • Checks passed · in-place
  • Merged: 2026-05-22 11:49 UTC · at 6bff0b85848965c4eb518b0af38a0f1b1b250a9a · squash

This pull request spent 1 hour 20 minutes 7 seconds in the queue, including 1 hour 19 minutes 41 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Labels

  • kind/bug: Something isn't working
  • size/XL: Denotes a PR that changes [1000, 1999] lines

Development

Successfully merging this pull request may close these issues.

[Bug] REPLACE INTO with duplicate keys in source returns duplicate entry

5 participants