Skip to content

fix: abort explicit txn on lock bind change#24354

Merged
mergify[bot] merged 7 commits into
matrixorigin:mainfrom
LeftHandCold:fix-tpcc-bind-change-explicit-txn
May 12, 2026
Merged

fix: abort explicit txn on lock bind change#24354
mergify[bot] merged 7 commits into
matrixorigin:mainfrom
LeftHandCold:fix-tpcc-bind-change-explicit-txn

Conversation

@LeftHandCold
Copy link
Copy Markdown
Contributor

@LeftHandCold LeftHandCold commented May 12, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

refs #24346

What this PR does / why we need it:

This PR adds a necessary guard for explicit user transactions when lock retry sees ErrLockTableBindChanged after a CN restart / lockservice owner epoch change.

The existing retry path treats ErrLockTableBindChanged as retryable for all transactions. That remains acceptable for autocommit statements and internal retry paths, because the statement can be retried from the statement boundary without carrying explicit multi-statement transaction state. It is unsafe for explicit user transactions: retrying the lock inside the same transaction can continue with stale transaction / lock context across the old and new lock table binds after CN OOM/restart.

This PR changes lock retry so explicit user transactions return ErrLockTableBindChanged directly and let the frontend roll back the whole transaction. That includes:

  • explicit user transactions with autocommit=0
  • BEGIN transactions where the session autocommit variable remains enabled

Autocommit user statements and internal transactions keep the existing retry behavior, so this avoids changing the normal customer-facing autocommit retry path while still covering the TPCC explicit transaction failure mode.

This is not the full protocol-level fix for #24346. The complete fix needs lockservice old-bind active transaction fencing so CNs abort in-flight transactions that still hold lock table binds invalidated by allocator rebind/removal.

Also included:

  • unit tests covering explicit user transactions, BEGIN transactions, and preserved autocommit retry behavior

Validation

  • PROJECTROOT="/Users/shenjiangwei/Work/code/view/matrixone"
  • export CGO_CPPFLAGS="-I$PROJECTROOT/cgo -I$PROJECTROOT/thirdparties/install/include"
  • export CGO_CFLAGS="-I$PROJECTROOT/cgo -I$PROJECTROOT/thirdparties/install/include"
  • export CGO_LDFLAGS="-Wl,-w,-rpath,$PROJECTROOT/thirdparties/install/lib -L$PROJECTROOT/thirdparties/install/lib -L$PROJECTROOT/cgo -lmo"
  • go test ./pkg/sql/colexec/lockop -count=1

A lock table bind change means the lockservice owner epoch has changed. Retrying transparently inside an explicit user transaction can keep using stale transaction context across the old/new bind boundary after CN restart. Return ErrLockTableBindChanged directly for explicit user transactions so frontend rolls back the whole txn, while preserving retry for autocommit and internal transactions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 12, 2026 07:48
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 12, 2026

Merge Queue Status

  • Entered queue2026-05-12 11:20 UTC · Rule: main
  • Checks passed · in-place
  • Merged2026-05-12 12:32 UTC · at 952285f070f34d6d0055ee6a34cac1230d871da1 · squash

This pull request spent 1 hour 11 minutes 45 seconds in the queue, including 59 minutes 32 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants