fix: propagate update_columns offsets and partial last_updated for RewriteColumns#6650
Conversation
e57ad14 to
ac9f48f
Compare
|
@claude review |
|
|
||
| final_fragments.extend(updated_frags); | ||
|
|
||
| if next_row_id.is_some() && matches!(update_mode, Some(RewriteColumns)) { | ||
| if let Some(UpdatedFragmentOffsets(off_map)) = updated_fragment_offsets { | ||
| if !off_map.is_empty() { | ||
| let new_version = current_manifest.map(|m| m.version + 1).unwrap_or(1); | ||
| let prev_version = current_manifest.map(|m| m.version).unwrap_or(0); | ||
| for fragment in final_fragments.iter_mut() { | ||
| let Some(bitmap) = off_map.get(&fragment.id) else { | ||
| continue; | ||
| }; | ||
| if bitmap.is_empty() { | ||
| continue; | ||
| } | ||
| let offsets: Vec<usize> = | ||
| bitmap.iter().map(|o| o as usize).collect(); | ||
| lance_table::rowids::version::refresh_row_latest_update_meta_for_partial_frag_rewrite_cols( | ||
| fragment, | ||
| &offsets, | ||
| new_version, | ||
| prev_version, | ||
| )?; | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 Partial RewriteColumns refresh fabricates last_updated_at_version for unmatched rows when a fragment in final_fragments has last_updated_at_version_meta == None. The helper at lance-table/src/rowids/version.rs:552-564 fills every base position with prev_version (the manifest version before the new commit), then overwrites only the matched offsets — so untouched rows in the same fragment are silently restamped, contradicting the PR's stated invariant that 'unmatched rows … are left unchanged'. Easiest fix at the new call site (transaction.rs:1907-1922): skip the helper for any fragment whose last_updated_at_version_meta is None so the no-meta state is preserved.
Extended reasoning...
Where the bug lives
The new code at rust/lance/src/dataset/transaction.rs:1902-1925 (added by this PR) iterates over final_fragments and calls
refresh_row_latest_update_meta_for_partial_frag_rewrite_cols for any fragment listed in updated_fragment_offsets.
That helper at rust/lance-table/src/rowids/version.rs:551-595 is pre-existing, but it has this fallback:
if let Some(meta) = fragment.last_updated_at_version_meta.as_ref() {
if let Ok(base_seq) = meta.load_sequence() {
for pos in 0..(row_count_u64 as usize) {
base_versions.push(base_seq.version_at(pos).unwrap_or(prev_version));
}
} else {
base_versions.resize(row_count_u64 as usize, prev_version);
}
} else {
base_versions.resize(row_count_u64 as usize, prev_version); // <-- bug source
}When last_updated_at_version_meta is None, every position in base_versions is initialised to prev_version (the manifest version before this commit). The matched offsets are then overwritten with current_version. Net effect on unmatched rows: their last_updated_at_version is silently set to prev_version even though they were never touched.
Why this matters for the new caller
In normal mainline flow, every fragment in final_fragments does have meta populated (Append, Overwrite, Update, Compaction all call build_version_meta). But two reachable cases break that assumption — and both are reachable via the new build_manifest partial-refresh path:
- Fragments produced by
Operation::Mergeare extended intofinal_fragmentswithout going throughbuild_version_meta(transaction.rs around 2065-2073). A subsequent partialRewriteColumnsagainst such a fragment will hit theNonearm. - Legacy stable-row-id datasets persisted before per-row version meta was emitted carry
last_updated_at_version_meta == Noneindefinitely. Once the partial-refresh path runs against them, it backfillsprev_versionfor every offset that wasn't matched — which then surfaces on every subsequent scan via the default atlance-table/src/utils/stream.rs:347-358.
The same shape applies inside the Some arm: base_seq.version_at(pos).unwrap_or(prev_version) will fabricate prev_version for any positional gap in an existing sparse sequence.
Why the new test doesn't catch this
test_build_manifest_partial_last_updated_rewrite_columns_stable_row_ids builds the dataset entirely via Dataset::write + Dataset::append. Both paths populate last_updated_at_version_meta = Some(...) (transaction.rs:1838-1841 and 1995-2008), so the None arm is never exercised. The assertion last_after == last_before passes only because the test never hits the buggy path — change the dataset construction to leave the fragment with last_updated_at_version_meta = None and the same assertion fails with last_after = prev_version.
Step-by-step proof
- Construct a stable-row-id dataset where fragment 0 has 8 rows and
last_updated_at_version_meta = None(e.g. viaOperation::Merge, or a legacy dataset). Suppose its rows were created at version V1 = 1 and the current manifest is at version Vn-1 = 5. - Call
update_columns_with_offsetsagainst fragment 0 matching only offset 2 →matched_offsets = {2}. - Build
Operation::Update { update_mode: Some(RewriteColumns), updated_fragment_offsets: Some({0 → bitmap{2}}), .. }. build_manifestreaches the new block at transaction.rs:1902-1925:prev_version = 5,new_version = 6. It calls the helper.- Helper sees
fragment.last_updated_at_version_meta == None→ resizesbase_versionsto[5,5,5,5,5,5,5,5]. - Loop overwrites offset 2 →
[5,5,6,5,5,5,5,5]. - Helper compresses runs and stores meta back on the fragment.
- Scan after commit (
stream.rs:347-358) reportslast_updated = 5for row offsets 0,1,3,4,5,6,7 — all rows that were not touched by this transaction. Before the commit they would have read as the default constant1.
So the user-observable effect is (BEFORE: last_updated = 1) → (AFTER: last_updated = 5) for seven untouched rows, plus the genuine update on offset 2. That contradicts the PR description's claim that 'unmatched rows … are left unchanged'.
Addressing the refutation
The strongest counter-argument is that the helper is pre-existing and that the only mainline path that called it (merge_insert.rs:1139) had the same fallback. Two responses:
- The helper is unchanged, but this PR adds a new caller that re-uses the same buggy fallback in a new flow (
Operation::Update { RewriteColumns }viabuild_manifest). 'Pre-existing in another caller' doesn't make the PR's own stated invariant correct. - The fix does not require touching the helper. The new call site at transaction.rs:1907-1922 can simply skip the call when
fragment.last_updated_at_version_meta.is_none(), preserving the no-meta state for legacy/Merge fragments. That's a one-line guard, in scope for this PR.
Suggested fix
At transaction.rs:1907-1922, before calling the helper:
if fragment.last_updated_at_version_meta.is_none() {
continue; // preserve no-meta state; do not fabricate prev_version for untouched rows
}Or, if a sparse meta covering only the matched offsets is desired, emit one directly from bitmap and new_version rather than going through the per-row vector that defaults to prev_version. Either approach honours the PR's stated invariant in the legacy/Merge path.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
ac9f48f to
ece9ddd
Compare
Summary
FileFragment::update_columnsreturnsResult<(Fragment, Vec<u32>)>(unchangedpublic shape).
update_columns_with_offsetsreturnsFragmentUpdateColumnsResult(fragment,
fields_modified,matched_offsets: RoaringBitmap) for callers that needphysical row indices for stable row-id metadata.
HashJoiner::matched_join_rows— boolean mask for hash hits; used byupdate_columns_with_offsetsand covered bytest_matched_join_rows.Operation::Update: optionalupdated_fragment_offsets: Option<UpdatedFragmentOffsets>where
UpdatedFragmentOffsetswrapsHashMap<u64, RoaringBitmap>(newtype withDefault,PartialEq, manualDeepSizeOf).Nonemeans the caller did not supplyoffsets.
transaction.proto): backward-compatiblemap<uint64, UInt32List> updated_fragment_offsets = 9on
Update; serde round-trip preserves semantics.build_manifest: when stable row IDs are enabled,update_mode == RewriteColumns,and
Some(UpdatedFragmentOffsets(..))includes a non-empty bitmap for a fragment,calls
refresh_row_latest_update_meta_for_partial_frag_rewrite_colsfor those offsetsonly — unmatched rows and untouched fragments are left unchanged.
FragmentUpdateResultincludes matched row offsets; the 2-arg constructor(FragmentMetadata, long[])delegates to the 3-arg form with an empty offset array forcompatibility. JNI uses
update_columns_with_offsets.update_columnsbinding correctly destructures the(Fragment, Vec<u32>)tuple.Root cause
For
Operation::UpdatewithRewriteColumns, commits could advance the dataset versionwithout advancing
_row_last_updated_at_versionfor the rows that were actually rewritten.update_columnsdid not report which physical offsets matched, andbuild_manifesthad noper-fragment offset map to drive the partial refresh. Without that information the transaction
layer cannot distinguish which rows changed, so the version metadata is not updated.
Implementation notes
RoaringBitmapiteration is ascending and duplicate-free; redundantsort/dedupwhen building proto lists or offset vectors from bitmaps were removed.
updated_fragment_offsets: None.Why the protobuf field exists
lance-spark passes
Transactionthrough JNI as a protobuf blob: Java builds aTransactionproto, Rust deserializes it and runs
build_manifest. Withoutupdated_fragment_offsetsonthe wire, the decoded
Operation::Updatewould always haveupdated_fragment_offsets: Noneeven when matched offsets were computed on the JVM side, and the partial refresh in
build_manifestwould silently do nothing.Test plan
cargo test -p lance test_matched_join_rows—HashJoiner::matched_join_rows.cargo test -p lance test_build_manifest_partial_last_updated_rewrite_columns_stable_row_ids—
Dataset::commit->build_manifest: two fragments, partialupdate_columns_with_offsets,Operation::UpdatewithRewriteColumnsand an offsetmap; asserts matched vs unmatched vs untouched row version metadata.
cargo test -p lance test_fragment_update— fragment path withOperation::Updateandoffsets.
cargo test -p lance --tests(or at leastcargo check -p lance --tests) andcargo check --manifest-path java/lance-jni/Cargo.toml.The
pylancecrate is excluded from the root workspace; validate Python bindings in theusual
maturin/ CI flow if you touchpython/.Compatibility
update_columnssignature unchanged;update_columns_with_offsetsis additive.FragmentUpdateResultconstructor preserved.