fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650

Merged
jackye1995 merged 2 commits into lance-format:main from
jerryjch:update-with-rewrite-columns-fix
May 3, 2026

Conversation

@jerryjch
Contributor

Summary

  • Fixes Operation::Update with RewriteColumns: refresh per-row version metadata for matched rows #6505
  • FileFragment::update_columns returns Result<(Fragment, Vec<u32>)> (unchanged
    public shape). update_columns_with_offsets returns FragmentUpdateColumnsResult
    (fragment, fields_modified, matched_offsets: RoaringBitmap) for callers that need
    physical row indices for stable row-id metadata.
  • HashJoiner::matched_join_rows — boolean mask for hash hits; used by
    update_columns_with_offsets and covered by test_matched_join_rows.
  • Operation::Update: optional updated_fragment_offsets: Option<UpdatedFragmentOffsets>
    where UpdatedFragmentOffsets wraps HashMap<u64, RoaringBitmap> (newtype with
    Default, PartialEq, manual DeepSizeOf). None means the caller did not supply
    offsets.
  • Proto (transaction.proto): backward-compatible map<uint64, UInt32List> updated_fragment_offsets = 9
    on Update; serde round-trip preserves semantics.
  • build_manifest: when stable row IDs are enabled, update_mode == RewriteColumns,
    and Some(UpdatedFragmentOffsets(..)) includes a non-empty bitmap for a fragment,
    calls refresh_row_latest_update_meta_for_partial_frag_rewrite_cols for those offsets
    only — unmatched rows and untouched fragments are left unchanged.
  • JNI / Java: FragmentUpdateResult includes matched row offsets; the 2-arg constructor
    (FragmentMetadata, long[]) delegates to the 3-arg form with an empty offset array for
    compatibility. JNI uses update_columns_with_offsets.
  • Python: update_columns binding correctly destructures the (Fragment, Vec<u32>) tuple.
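
The mask-to-offsets step behind matched_join_rows can be illustrated with a minimal sketch. This is not the code from this PR: it assumes the join yields one boolean per physical row, uses a std `BTreeSet<u32>` as a stand-in for `RoaringBitmap`, and `mask_to_offsets` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: turn a per-row "matched" mask into the set of
/// physical row offsets. `BTreeSet<u32>` stands in for `RoaringBitmap`;
/// both iterate in ascending order without duplicates.
fn mask_to_offsets(mask: &[bool]) -> BTreeSet<u32> {
    mask.iter()
        .enumerate()
        // Keep only the positions where the hash join found a hit.
        .filter(|(_, matched)| **matched)
        .map(|(offset, _)| offset as u32)
        .collect()
}
```

For example, a mask of `[false, true, false, true, true]` yields the offsets 1, 3 and 4.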

Root cause

For Operation::Update with RewriteColumns, commits could advance the dataset version
without advancing _row_last_updated_at_version for the rows that were actually rewritten.
update_columns did not report which physical offsets matched, and build_manifest had no
per-fragment offset map to drive the partial refresh. Without that information the transaction
layer cannot distinguish which rows changed, so the version metadata is not updated.
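
A minimal sketch of the intended behaviour, under the simplifying assumption that per-row version metadata is a plain slice with one entry per physical row (the real metadata is a compressed sequence). `refresh_matched_rows` is a hypothetical name, not a Lance API.

```rust
/// Illustrative only: `last_updated` holds one version per physical row.
/// Rows at `matched_offsets` are stamped with `new_version`; all other
/// rows keep whatever version they already had.
fn refresh_matched_rows(
    last_updated: &mut [u64],
    matched_offsets: &[u32],
    new_version: u64,
) -> Result<(), String> {
    for &offset in matched_offsets {
        let slot = last_updated
            .get_mut(offset as usize)
            .ok_or_else(|| format!("offset {offset} out of range"))?;
        *slot = new_version;
    }
    Ok(())
}
```

Without the matched offsets there is nothing to pass as `matched_offsets`, which is the gap described above.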

Implementation notes

  • RoaringBitmap iteration is ascending and duplicate-free; redundant sort / dedup
    when building proto lists or offset vectors from bitmaps were removed.
  • Call sites that do not populate offsets use updated_fragment_offsets: None.

Why the protobuf field exists

lance-spark passes Transaction through JNI as a protobuf blob: Java builds a Transaction
proto, Rust deserializes it and runs build_manifest. Without updated_fragment_offsets on
the wire, the decoded Operation::Update would always have updated_fragment_offsets: None
even when matched offsets were computed on the JVM side, and the partial refresh in
build_manifest would silently do nothing.
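
The round trip can be sketched with plain std types. The assumptions here: `BTreeSet<u32>` stands in for `RoaringBitmap`, `BTreeMap<u64, Vec<u32>>` stands in for the generated `map<uint64, UInt32List>` type, and `to_wire` / `from_wire` are hypothetical names rather than the PR's conversion functions.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Stand-in for the in-memory offsets map (fragment id -> matched offsets).
type OffsetMap = HashMap<u64, BTreeSet<u32>>;

/// Stand-in for the wire shape `map<uint64, UInt32List>`.
type WireMap = BTreeMap<u64, Vec<u32>>;

/// Encode: set iteration is already ascending and duplicate-free,
/// so no extra sort or dedup is needed when building the lists.
fn to_wire(offsets: &OffsetMap) -> WireMap {
    offsets
        .iter()
        .map(|(frag_id, set)| (*frag_id, set.iter().copied().collect::<Vec<u32>>()))
        .collect()
}

/// Decode: rebuild the per-fragment offset sets from the wire lists.
fn from_wire(wire: &WireMap) -> OffsetMap {
    wire.iter()
        .map(|(frag_id, list)| (*frag_id, list.iter().copied().collect::<BTreeSet<u32>>()))
        .collect()
}
```

Encoding and then decoding returns a map equal to the original, which is the property the serde round-trip in this PR is meant to preserve.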

Test plan

  • cargo test -p lance test_matched_join_rows: covers HashJoiner::matched_join_rows.
  • cargo test -p lance test_build_manifest_partial_last_updated_rewrite_columns_stable_row_ids:
    Dataset::commit -> build_manifest with two fragments, a partial
    update_columns_with_offsets, and Operation::Update with RewriteColumns and an offset
    map; asserts matched vs unmatched vs untouched row version metadata.
  • cargo test -p lance test_fragment_update: fragment path with Operation::Update and
    offsets.
  • cargo test -p lance --tests (or at least cargo check -p lance --tests) and
    cargo check --manifest-path java/lance-jni/Cargo.toml.

The pylance crate is excluded from the root workspace; validate Python bindings in the
usual maturin / CI flow if you touch python/.

Compatibility

  • Rust: update_columns signature unchanged; update_columns_with_offsets is additive.
  • Java: 2-arg FragmentUpdateResult constructor preserved.
  • Proto: field 9; older clients ignore unknown fields.


@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug, python, and java labels Apr 30, 2026
@jerryjch jerryjch force-pushed the update-with-rewrite-columns-fix branch 2 times, most recently from e57ad14 to ac9f48f Compare April 30, 2026 22:59
@jackye1995
Contributor

@claude review

Comment on lines 1899 to +1925

final_fragments.extend(updated_frags);

if next_row_id.is_some() && matches!(update_mode, Some(RewriteColumns)) {
    if let Some(UpdatedFragmentOffsets(off_map)) = updated_fragment_offsets {
        if !off_map.is_empty() {
            let new_version = current_manifest.map(|m| m.version + 1).unwrap_or(1);
            let prev_version = current_manifest.map(|m| m.version).unwrap_or(0);
            for fragment in final_fragments.iter_mut() {
                let Some(bitmap) = off_map.get(&fragment.id) else {
                    continue;
                };
                if bitmap.is_empty() {
                    continue;
                }
                let offsets: Vec<usize> =
                    bitmap.iter().map(|o| o as usize).collect();
                lance_table::rowids::version::refresh_row_latest_update_meta_for_partial_frag_rewrite_cols(
                    fragment,
                    &offsets,
                    new_version,
                    prev_version,
                )?;
            }
        }
    }
}

🟡 Partial RewriteColumns refresh fabricates last_updated_at_version for unmatched rows when a fragment in final_fragments has last_updated_at_version_meta == None. The helper at lance-table/src/rowids/version.rs:552-564 fills every base position with prev_version (the manifest version before the new commit), then overwrites only the matched offsets — so untouched rows in the same fragment are silently restamped, contradicting the PR's stated invariant that 'unmatched rows … are left unchanged'. Easiest fix at the new call site (transaction.rs:1907-1922): skip the helper for any fragment whose last_updated_at_version_meta is None so the no-meta state is preserved.

Extended reasoning...

Where the bug lives

The new code at rust/lance/src/dataset/transaction.rs:1902-1925 (added by this PR) iterates over final_fragments and calls
refresh_row_latest_update_meta_for_partial_frag_rewrite_cols for any fragment listed in updated_fragment_offsets.
That helper at rust/lance-table/src/rowids/version.rs:551-595 is pre-existing, but it has this fallback:

if let Some(meta) = fragment.last_updated_at_version_meta.as_ref() {
    if let Ok(base_seq) = meta.load_sequence() {
        for pos in 0..(row_count_u64 as usize) {
            base_versions.push(base_seq.version_at(pos).unwrap_or(prev_version));
        }
    } else {
        base_versions.resize(row_count_u64 as usize, prev_version);
    }
} else {
    base_versions.resize(row_count_u64 as usize, prev_version);   // <-- bug source
}

When last_updated_at_version_meta is None, every position in base_versions is initialised to prev_version (the manifest version before this commit). The matched offsets are then overwritten with current_version. Net effect on unmatched rows: their last_updated_at_version is silently set to prev_version even though they were never touched.

Why this matters for the new caller

In normal mainline flow, every fragment in final_fragments does have meta populated (Append, Overwrite, Update, Compaction all call build_version_meta). But two reachable cases break that assumption — and both are reachable via the new build_manifest partial-refresh path:

  • Fragments produced by Operation::Merge are extended into final_fragments without going through build_version_meta (transaction.rs around 2065-2073). A subsequent partial RewriteColumns against such a fragment will hit the None arm.
  • Legacy stable-row-id datasets persisted before per-row version meta was emitted carry last_updated_at_version_meta == None indefinitely. Once the partial-refresh path runs against them, it backfills prev_version for every offset that wasn't matched — which then surfaces on every subsequent scan via the default at lance-table/src/utils/stream.rs:347-358.

The same shape applies inside the Some arm: base_seq.version_at(pos).unwrap_or(prev_version) will fabricate prev_version for any positional gap in an existing sparse sequence.

Why the new test doesn't catch this

test_build_manifest_partial_last_updated_rewrite_columns_stable_row_ids builds the dataset entirely via Dataset::write + Dataset::append. Both paths populate last_updated_at_version_meta = Some(...) (transaction.rs:1838-1841 and 1995-2008), so the None arm is never exercised. The assertion last_after == last_before passes only because the test never hits the buggy path — change the dataset construction to leave the fragment with last_updated_at_version_meta = None and the same assertion fails with last_after = prev_version.

Step-by-step proof

  1. Construct a stable-row-id dataset where fragment 0 has 8 rows and last_updated_at_version_meta = None (e.g. via Operation::Merge, or a legacy dataset). Suppose its rows were created at version V1 = 1 and the current manifest is at version Vn-1 = 5.
  2. Call update_columns_with_offsets against fragment 0 matching only offset 2 → matched_offsets = {2}.
  3. Build Operation::Update { update_mode: Some(RewriteColumns), updated_fragment_offsets: Some({0 → bitmap{2}}), .. }.
  4. build_manifest reaches the new block at transaction.rs:1902-1925: prev_version = 5, new_version = 6. It calls the helper.
  5. Helper sees fragment.last_updated_at_version_meta == None → resizes base_versions to [5,5,5,5,5,5,5,5].
  6. Loop overwrites offset 2 → [5,5,6,5,5,5,5,5].
  7. Helper compresses runs and stores meta back on the fragment.
  8. Scan after commit (stream.rs:347-358) reports last_updated = 5 for row offsets 0,1,3,4,5,6,7 — all rows that were not touched by this transaction. Before the commit they would have read as the default constant 1.

So the user-observable effect is (BEFORE: last_updated = 1) → (AFTER: last_updated = 5) for seven untouched rows, plus the genuine update on offset 2. That contradicts the PR description's claim that 'unmatched rows … are left unchanged'.
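
The walkthrough above can be reproduced with a simplified model of the fallback. This is a sketch, not the real helper: it assumes metadata is an optional plain slice of per-row versions, and `refresh_with_fallback` is a hypothetical name.

```rust
/// Simplified model of the fallback described above, not the real helper.
/// `existing` is the fragment's per-row last-updated versions, if any.
fn refresh_with_fallback(
    existing: Option<&[u64]>,
    row_count: usize,
    matched_offsets: &[usize],
    new_version: u64,
    prev_version: u64,
) -> Vec<u64> {
    let mut versions = match existing {
        Some(meta) => meta.to_vec(),
        // No metadata: every row is initialised to prev_version,
        // which is where untouched rows pick up a fabricated value.
        None => vec![prev_version; row_count],
    };
    for &offset in matched_offsets {
        if offset < versions.len() {
            versions[offset] = new_version;
        }
    }
    versions
}
```

With no existing metadata, 8 rows, matched offset 2, `new_version = 6` and `prev_version = 5`, the model returns `[5, 5, 6, 5, 5, 5, 5, 5]`, matching steps 5 and 6. With existing metadata of all 1s, the untouched rows stay at 1.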

Addressing the refutation

The strongest counter-argument is that the helper is pre-existing and that the only mainline path that called it (merge_insert.rs:1139) had the same fallback. Two responses:

  • The helper is unchanged, but this PR adds a new caller that re-uses the same buggy fallback in a new flow (Operation::Update { RewriteColumns } via build_manifest). 'Pre-existing in another caller' doesn't make the PR's own stated invariant correct.
  • The fix does not require touching the helper. The new call site at transaction.rs:1907-1922 can simply skip the call when fragment.last_updated_at_version_meta.is_none(), preserving the no-meta state for legacy/Merge fragments. That's a one-line guard, in scope for this PR.

Suggested fix

At transaction.rs:1907-1922, before calling the helper:

if fragment.last_updated_at_version_meta.is_none() {
    continue;  // preserve no-meta state; do not fabricate prev_version for untouched rows
}

Or, if a sparse meta covering only the matched offsets is desired, emit one directly from bitmap and new_version rather than going through the per-row vector that defaults to prev_version. Either approach honours the PR's stated invariant in the legacy/Merge path.
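
The guard variant can be sketched in isolation as follows. `FragmentSketch` and `refresh_if_meta_present` are hypothetical stand-ins that model the metadata as an optional vector of per-row versions; the real `Fragment` type and helper differ.

```rust
/// Simplified stand-in for a fragment; only the field relevant here.
struct FragmentSketch {
    last_updated_at_version_meta: Option<Vec<u64>>,
}

/// Apply the partial refresh only when metadata already exists.
/// Returns true if the fragment was refreshed, false if it was skipped.
fn refresh_if_meta_present(
    fragment: &mut FragmentSketch,
    matched_offsets: &[usize],
    new_version: u64,
) -> bool {
    let Some(versions) = fragment.last_updated_at_version_meta.as_mut() else {
        // Preserve the no-meta state instead of fabricating versions.
        return false;
    };
    for &offset in matched_offsets {
        if let Some(slot) = versions.get_mut(offset) {
            *slot = new_version;
        }
    }
    true
}
```

The trade-off of this variant is that matched rows in a no-meta fragment are not stamped either; the sparse-meta alternative mentioned above would be needed to record them.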

@codecov

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 95.01385% with 18 lines in your changes missing coverage. Please review.

Files with missing lines                 Patch %   Lines
rust/lance/src/dataset/fragment.rs       80.95%    8 Missing and 4 partials ⚠️
rust/lance/src/dataset/transaction.rs    97.96%    3 Missing and 2 partials ⚠️
rust/lance/src/dataset/hash_joiner.rs    97.22%    0 Missing and 1 partial ⚠️


@jerryjch jerryjch force-pushed the update-with-rewrite-columns-fix branch from ac9f48f to ece9ddd Compare May 2, 2026 01:33
Contributor

@jackye1995 jackye1995 left a comment


looks good to me!

@jackye1995 jackye1995 merged commit 43c3780 into lance-format:main May 3, 2026
28 checks passed

Labels

bug, java, python

Development

Successfully merging this pull request may close these issues.

Operation::Update with RewriteColumns: refresh per-row version metadata for matched rows

2 participants