fix: create index conflict with concurrent rewrite columns updates by zehiko · Pull Request #6493 · lance-format/lance

zehiko · 2026-04-13T08:26:32Z

What is this about?

While running index optimization concurrently with Dataset updates, we observed incorrect query results. After digging deeper, I believe we've identified the root cause: index corruption when optimize_indices (CreateIndex) races with a partial-column MergeInsert (Update with RewriteColumns mode).

I don't have deep knowledge of Lance internals, so I'm happy to hear that our usage of the APIs was the actual problem, or that this fix is not the right way to go about it.

Investigation

I built a small reproducer, independent of our own code base, and found that the key to reproducing the issue is making the dataset updates partial-column updates, i.e. upserting a RecordBatch that has fewer columns than the Dataset. Running such updates concurrently with optimize_indices calls left us with a corrupted (or, more precisely, stale) index.

The repro is roughly:

 // 1. Create dataset with 15 columns, scalar Bitmap index on "my_status" column
 let dataset = Dataset::write(initial_batch, uri).await;
 dataset.create_index(&["my_status"], IndexType::Bitmap, ...).await;

 // 2. Concurrently:
 //    Thread A: register rows in 3 phases
 for row in rows {
     // Phase 1: full-schema upsert (my_status="bad", all 15 columns)
     merge_insert(dataset, full_schema_batch).await;       // → RewriteRows mode

     // Phase 2: partial-schema upsert (my_status="good", only 5 columns)
     merge_insert(dataset, partial_schema_batch).await;     // → RewriteColumns mode ← TRIGGER

     create_index(&["my_status"], replace=false).await;     // no-op, index exists
 }

 //    Thread B: maintenance loop
 for _ in 0..5 {
     compact_files(&mut dataset).await;
     optimize_indices(&mut dataset, merge=200).await;       // ← builds index from stale data
 }

 // 3. Final maintenance
 compact_files(&mut dataset).await;
 optimize_indices(&mut dataset).await;

 // 4. Verify
 assert_eq!(
     dataset.scan().filter("my_status = 'done'").count(),  // indexed scan: MISSES rows
     dataset.scan().count_where("my_status = 'done'"),     // full scan: correct
 );

Root cause details

When a MergeInsert updates fewer columns than the dataset schema, Lance takes the RewriteColumns path: it rewrites column data files in place within the same fragment (same fragment ID, same row IDs, different column values). If optimize_indices concurrently builds a new index from the pre-update data and commits it, check_create_index_txn unconditionally allows it with the comment "row ids are still valid." The resulting index has stale values for the rewritten columns.

Since the fragment ID didn't change, effective_fragment_bitmap does not filter it out, and the stale index data is authoritative. Indexed scans return incorrect results while full scans return correct data.
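To make the filtering argument concrete, here is a minimal sketch of why a fragment-level bitmap cannot detect an in-place column rewrite. It is a simplified model, not Lance's implementation: it assumes the effective bitmap is the intersection of the fragments the index was built over with the fragments that still exist, and it uses a std `BTreeSet<u32>` as a stand-in for the real bitmap type. The `FragmentBitmap` alias and the `live_fragments` parameter are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Stand-in for an index's fragment coverage (assumption: a plain set of
/// fragment IDs is enough to model the behaviour discussed here).
type FragmentBitmap = BTreeSet<u32>;

/// Hypothetical model of the filtering step: the index is only trusted for
/// fragments it was built over that are still present in the dataset.
fn effective_fragment_bitmap(
    index_bitmap: &FragmentBitmap,
    live_fragments: &FragmentBitmap,
) -> FragmentBitmap {
    index_bitmap.intersection(live_fragments).copied().collect()
}
```

In this model, an update that replaces fragment 1 with a new fragment 2 drops fragment 1 from the effective bitmap, so its stale index entries are ignored. An update that rewrites columns inside fragment 1 leaves the set of live fragment IDs unchanged, so the index is still consulted for fragment 1 even though the indexed values have changed.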

Fix

When CreateIndex encounters a concurrent Update with RewriteColumns mode:

1. Check whether the modified fields overlap with the new index's fields.
2. Check whether the rewritten fragments are in the index's fragment_bitmap (treating None as "covers everything").
3. If both hold, return a retryable conflict, forcing optimize_indices to re-read the updated data.
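The check described above can be sketched as a small pure function. This is an illustrative model under stated assumptions, not the code in this PR: the types `ConcurrentUpdate`, `NewIndex`, and `Resolution`, their field names, and the function name `check_create_index_vs_update` are all hypothetical, and a `BTreeSet<u32>` stands in for the real bitmap type.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpdateMode {
    RewriteRows,
    RewriteColumns,
}

/// Hypothetical summary of the Update transaction that committed first.
struct ConcurrentUpdate {
    mode: UpdateMode,
    fields_modified: Vec<i32>,
    updated_fragment_ids: Vec<u32>,
}

/// Hypothetical summary of the index the CreateIndex transaction wants to commit.
struct NewIndex {
    fields: Vec<i32>,
    /// None means coverage is unknown, treated as "covers everything".
    fragment_bitmap: Option<BTreeSet<u32>>,
}

#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Compatible,
    RetryableConflict,
}

fn check_create_index_vs_update(index: &NewIndex, update: &ConcurrentUpdate) -> Resolution {
    // Only in-place column rewrites are checked; other modes stay allowed.
    if update.mode != UpdateMode::RewriteColumns {
        return Resolution::Compatible;
    }
    // Step 1: did the update touch any field the index is built on?
    let fields_overlap = update
        .fields_modified
        .iter()
        .any(|f| index.fields.contains(f));
    // Step 2: did the update rewrite any fragment the index covers?
    let fragments_overlap = match &index.fragment_bitmap {
        None => true,
        Some(bitmap) => update
            .updated_fragment_ids
            .iter()
            .any(|id| bitmap.contains(id)),
    };
    // Step 3: conflict only when both conditions hold.
    if fields_overlap && fragments_overlap {
        Resolution::RetryableConflict
    } else {
        Resolution::Compatible
    }
}
```

Requiring both conditions keeps the check narrow: a rewrite of unrelated columns, or of fragments the index does not cover, still commits without a retry.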

RewriteRows (full-column upsert) and Delete remain unconditionally allowed: they remove the old fragment entirely, so effective_fragment_bitmap naturally filters out the stale entries.
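A retryable conflict only helps if the caller rebuilds the index from fresh data on each attempt. The following is a generic sketch of that retry pattern, not Lance's commit loop: `CommitError`, `commit_with_retry`, and the attempt closure are hypothetical names, and the closure is assumed to re-read the latest dataset version before rebuilding and committing.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CommitError {
    /// Another transaction invalidated our inputs; rebuilding may succeed.
    RetryableConflict,
    /// Any other failure; retrying would not help.
    Fatal(String),
}

/// Calls `attempt` up to `max_attempts` times, passing the 1-based attempt
/// number. Retries only on `RetryableConflict`; any other error is returned
/// immediately.
fn commit_with_retry<T, F>(max_attempts: u32, mut attempt: F) -> Result<T, CommitError>
where
    F: FnMut(u32) -> Result<T, CommitError>,
{
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(CommitError::RetryableConflict) => continue,
            Err(fatal) => return Err(fatal),
        }
    }
    // Every attempt hit a conflict (or no attempts were allowed).
    Err(CommitError::RetryableConflict)
}
```

The important property is that the work inside the closure starts from the current dataset state each time, so a retried index build sees the rewritten column values.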

Testing done

- Added a new unit test.
- The repro example's failure rate went from 80% or more to 0.
- Our own internal integration test failure rate went from 15-20% to 0.

github-actions · 2026-04-13T08:26:47Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

zehiko changed the title ~~Fix CreateIndex conflict with concurrent RewriteColumns updates~~ fix: CreateIndex conflict with concurrent RewriteColumns updates Apr 13, 2026

github-actions bot added the bug Something isn't working label Apr 13, 2026

zehiko changed the title ~~fix: CreateIndex conflict with concurrent RewriteColumns updates~~ fix: create index conflict with concurrent rewrite columns updates Apr 13, 2026

This was referenced Apr 16, 2026

Rerun PRs rerun-io/opensource#2 (Open)

fix: reject CreateIndex when concurrent RewriteColumns modified indexed fields rerun-io/lance#16 (Merged)

zehiko wants to merge 1 commit into lance-format:main from zehiko:fix/scalar-index-rewrite-columns-corruption