fix: reject CreateIndex when concurrent RewriteColumns modified indexed fields #16

Merged
zehiko merged 1 commit into release-3.0.0 from fix/scalar-index-rewrite-columns-corruption
Apr 13, 2026

Conversation

@zehiko zehiko commented Apr 13, 2026

What

When a MergeInsert updates fewer columns than the dataset schema, Lance takes the RewriteColumns path — rewriting column data files in-place within the same fragment (same fragment ID, same row IDs, different column values). If optimize_indices concurrently builds a new index from the pre-update data and commits it, the conflict resolver in check_create_index_txn unconditionally allows it ("row ids are still valid"). The resulting index has stale values for the rewritten columns.

Since the fragment ID didn't change, effective_fragment_bitmap does not filter it out, and the stale index data is authoritative. Indexed scans then return incorrect results while full scans return correct data.

The fix: when CreateIndex encounters a concurrent Update with RewriteColumns mode, check whether the modified fields overlap with the new index's fields AND whether the rewritten fragments are in the index's fragment_bitmap. If so, return a retryable conflict, forcing optimize_indices to re-read the updated data.
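A minimal sketch of the check described above, for illustration only. The types and the function name here (`ConcurrentUpdate`, `NewIndex`, `Resolution`, `check_create_index_against_update`) are hypothetical simplifications and not Lance's actual transaction types; plain `HashSet`s stand in for field lists and fragment bitmaps.

```rust
use std::collections::HashSet;

// Hypothetical, simplified model of the update modes named in this PR.
#[derive(Debug, PartialEq)]
enum UpdateMode {
    RewriteRows,
    RewriteColumns,
}

// Hypothetical stand-in for a concurrent Update transaction.
struct ConcurrentUpdate {
    mode: UpdateMode,
    fields_modified: HashSet<u32>,      // IDs of the rewritten fields
    updated_fragment_ids: HashSet<u64>, // fragments whose column files were rewritten
}

// Hypothetical stand-in for the index being committed by CreateIndex.
struct NewIndex {
    fields: HashSet<u32>,          // IDs of the indexed fields
    fragment_bitmap: HashSet<u64>, // fragments the index was built from
}

#[derive(Debug, PartialEq)]
enum Resolution {
    Compatible,
    RetryableConflict,
}

// Conflict only if the update rewrote columns in place AND touched an
// indexed field AND touched a fragment that the new index covers.
fn check_create_index_against_update(index: &NewIndex, update: &ConcurrentUpdate) -> Resolution {
    if update.mode != UpdateMode::RewriteColumns {
        // RewriteRows stays allowed, as described in this PR.
        return Resolution::Compatible;
    }
    let fields_overlap = !index.fields.is_disjoint(&update.fields_modified);
    let fragments_overlap = !index.fragment_bitmap.is_disjoint(&update.updated_fragment_ids);
    if fields_overlap && fragments_overlap {
        Resolution::RetryableConflict
    } else {
        Resolution::Compatible
    }
}
```

Requiring both overlaps keeps the rejection narrow: an index on an untouched column, or one built only from fragments the update did not rewrite, still commits without a retry.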

RewriteRows (full-column upsert) and Delete operations remain unconditionally allowed — they remove the old fragment entirely, so effective_fragment_bitmap naturally filters out stale entries.
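To illustrate why the two paths differ, here is a rough sketch that assumes the effective bitmap is the index's fragment bitmap intersected with the fragments that still exist in the dataset. `effective_fragments_sketch` is a hypothetical helper for this explanation, not the real `effective_fragment_bitmap` implementation, and the fragment IDs are made up.

```rust
use std::collections::BTreeSet;

// Assumed model: an index is only trusted for fragments that are both in
// its bitmap and still present in the current dataset version.
fn effective_fragments_sketch(
    index_bitmap: &BTreeSet<u64>,
    live_fragments: &BTreeSet<u64>,
) -> BTreeSet<u64> {
    index_bitmap.intersection(live_fragments).copied().collect()
}
```

With an index built over fragments {0, 1}: if RewriteRows replaces fragment 1 with a new fragment 2, the live set becomes {0, 2} and the effective set shrinks to {0}, so the stale entries are ignored. If RewriteColumns rewrites fragment 1 in place, the live set stays {0, 1}, the effective set stays {0, 1}, and nothing signals that the indexed values for fragment 1 are out of date.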

Upstream PR: lance-format#6493

Testing done

  • new unit tests
  • ran a small repro test: 90% failure rate before the fix, 0 failures after
  • ran our integration test (the one that triggered the whole investigation): 15-20% failure rate before the fix, no failures after (300 runs)

@zehiko zehiko requested a review from andrea-reale April 13, 2026 07:58
@github-actions github-actions bot added the bug Something isn't working label Apr 13, 2026

@andrea-reale andrea-reale left a comment

amazing!

@zehiko zehiko merged commit e44ea68 into release-3.0.0 Apr 13, 2026
3 checks passed