Skip to content

fix: refresh last_updated metadata on Operation::Merge for rewritten fragments#6640

Merged
jackye1995 merged 2 commits intolance-format:mainfrom
jerryjch:fix-merge-per-row-version-metadata
May 4, 2026
Merged

fix: refresh last_updated metadata on Operation::Merge for rewritten fragments#6640
jackye1995 merged 2 commits intolance-format:mainfrom
jerryjch:fix-merge-per-row-version-metadata

Conversation

@jerryjch
Copy link
Copy Markdown
Contributor

@jerryjch jerryjch commented Apr 29, 2026

Summary

  • Fixes Operation::Merge: refresh per-row version metadata for rewritten fragments #6519
  • In build_manifest, when stable row IDs are enabled (next_row_id.is_some()), the
    Operation::Merge arm now bumps last_updated_at_version_meta for fragments whose
    on-disk data changed versus the previous manifest, using
    refresh_row_latest_update_meta_for_full_frag_rewrite_cols(fragment, new_version).
  • Rewrite vs carry-forward: refresh only when Fragment::files differs from the prior
    manifest for the same fragment id (column data rewritten); skip when files match
    (metadata-only / carry-forward). Deletion-file-only changes do not trigger refresh.
  • New fragment ids with no prior manifest row still get a full-fragment refresh so
    last_updated is not left unset.
  • Concurrent-transaction safety: build_manifest is called fresh on every commit retry
    with the latest committed manifest, so new_version is always consistent with the actual
    committed version regardless of intervening transactions.

Root cause

Operation::Merge previously appended merged fragments without updating per-row
last_updated metadata for physical rewrites. Dataset version advanced, but
_row_last_updated_at_version could stay stale for rewritten rows, breaking CDF semantics
for merges that commit through Operation::Merge (e.g. ADD COLUMNS-style materialization and
other callers sharing this arm).

Test plan

  • cargo test -p lance merge_build_manifest — covers files change -> last_updated -> new
    version, same files -> unchanged, and stable row IDs disabled -> no last_updated metadata.
  • Existing lance CI / regression suites unchanged aside from these tests.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@jerryjch jerryjch force-pushed the fix-merge-per-row-version-metadata branch from 6d332c5 to 1ca31f2 Compare April 29, 2026 05:41
@jerryjch jerryjch changed the title fix: Operation::Merge: refresh per-row version metadata for rewritten fragments fix: refresh last_updated metadata on Operation::Merge for rewritten fragments Apr 29, 2026
@jerryjch jerryjch force-pushed the fix-merge-per-row-version-metadata branch 2 times, most recently from 0f6d501 to 613a87e Compare April 30, 2026 21:01
@jackye1995
Copy link
Copy Markdown
Contributor

@claude review

Comment on lines +3520 to +3531
/// Returns true if Operation::Merge rewrote this fragment's column data files (Fragment::files
/// changed versus the previous manifest). Used to bump last_updated_at_version_meta only when
/// new column values were materialized to disk.
///
/// Deletion file changes alone are not treated as rewrites: tombstones remove rows but
/// survivors did not receive new column bytes; stamping last_updated for those rows would be
/// incorrect for CDF.
#[inline]
fn merge_fragment_physically_rewritten(prev: &Fragment, merged: &Fragment) -> bool {
debug_assert_eq!(prev.id, merged.id);
prev.files != merged.files
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 merge_fragment_physically_rewritten compares Fragment.files via Vec::PartialEq, which transitively compares CachedFileSize — an interior-mutable AtomicU64 that scheduler.rs:632 writes in place on the live manifest's atomics. A caller that constructs merged fragments by cloning from get_fragments() (which snapshots CachedFileSize) and a concurrent scan that flips the original atomic between clone-time and build_manifest can drive prev.files != merged.files to true on identical files, falsely triggering refresh_row_latest_update_meta_for_full_frag_rewrite_cols and overwriting carry-forward last_updated history. Fix: compare only the immutable identity fields (path, fields, column_indices, file_major_version, file_minor_version, base_id), or normalize CachedFileSize before comparing.

Extended reasoning...

Bug

merge_fragment_physically_rewritten in transaction.rs:3520-3531 decides whether the carry-forward path needs to refresh per-row last_updated metadata by comparing prev.files != merged.files. Vec::eq uses the derived PartialEq on DataFile (rust/lance-table/src/format/fragment.rs:27), which compares all fields including file_size_bytes: CachedFileSize.

CachedFileSize wraps an AtomicU64 (rust/lance-io/src/utils.rs:177). Critically:

  • CachedFileSize::eq loads the AtomicU64 and compares the value (utils.rs:226-231).
  • CachedFileSize::set mutates the AtomicU64 in place via Relaxed store (utils.rs:252-255), through a shared reference.
  • Clone for CachedFileSize copies a snapshot of the current value into a new AtomicU64 (utils.rs:218-224); subsequent set() on the original is not reflected in the clone.

scheduler.rs:632 calls file_size_bytes.set(size) on the caller-supplied CachedFileSize whenever a file's size becomes known. Callers in dataset/optimize.rs:516, dataset/optimize/binary_copy.rs:263, and dataset/fragment.rs:987 pass &data_file.file_size_bytes from the live dataset.manifest.fragments — i.e. they mutate the live manifest's atomic.

In build_manifest (rust/lance/src/io/commit.rs:800), current_manifest is dataset.manifest.as_ref(), so prev.files[].file_size_bytes is the live atomic. The merged fragments come from caller-side construction, which for several flows starts from a Clone of get_fragments() (a frozen snapshot).

Trigger sequence

  1. T1: caller clones the existing fragments via get_fragments() → CachedFileSize snapshot, value = 0 (unknown).
  2. T2: a concurrent scan or compaction running on the same Dataset opens those files via open_file_with_priority(..., &df.file_size_bytes) → the live atomic is set to N. The snapshot is unaffected.
  3. T3: caller commits an Operation::Merge that carries the fragment forward unchanged. build_manifest evaluates prev.files != merged.files: prev side reads N from the live atomic, merged side reads 0 from the snapshot → equality returns false despite identical path/fields/versions/base_id.
  4. T4: merge_fragment_physically_rewritten returns true → refresh_row_latest_update_meta_for_full_frag_rewrite_cols stamps every row with new_version, destroying the preserved per-row last_updated_at_version_meta history that the caller intended to carry forward.

This silently breaks CDF semantics for the carry-forward case: rows are reported as 'updated at v_new' even though no column data was actually rewritten.

Why current in-tree callers do not exercise it

The two confirmed callers — add_columns/alter_columns and drop_columns_impl — change Fragment::files length (append a new file or drop one), so Vec::eq short-circuits on length and CachedFileSize never participates. The AllNulls path does not access files, so both atomics stay at zero. The PR's own carry-forward test (merge_build_manifest_skips_refresh_when_carry_forward_stable_row_ids) constructs both DataFiles with the same Option<NonZero>=None and runs no concurrent scan, so it does not exercise the race. The vulnerability is latent: any future external caller (Python/JNI Operation::Merge construction) that submits a true carry-forward fragment whose live manifest atomic gets populated mid-flight will hit it.

Suggested fix

Compare only the immutable identity fields, e.g.:

fn data_file_identity_eq(a: &DataFile, b: &DataFile) -> bool {
    a.path == b.path
        && a.fields == b.fields
        && a.column_indices == b.column_indices
        && a.file_major_version == b.file_major_version
        && a.file_minor_version == b.file_minor_version
        && a.base_id == b.base_id
}

Then iterate prev.files / merged.files pairwise. Alternatively, normalize file_size_bytes on both sides before the Vec equality check. The current debug_assert_eq!(prev.id, merged.id) plus a length check also belong in the helper.

Comment on lines +2031 to +2039
None => {
// Fragment id not present in the previous manifest (Merge may append
// ids beyond the original list). Apply the same full-fragment
// last_updated refresh as for a rewrite.
lance_table::rowids::version::refresh_row_latest_update_meta_for_full_frag_rewrite_cols(
fragment,
new_version,
)?;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 In the new Operation::Merge None branch (lines 2031-2039), brand-new fragment IDs only have last_updated_at_version_meta initialized; created_at_version_meta is left at whatever the caller passed (typically None). This is inconsistent with Operation::Append (1810-1814) and Operation::Overwrite (1949-1953), which set both fields via build_version_meta for genuinely new fragments. No in-tree caller currently appends new fragment IDs through Merge, but the Python transaction API exposes Operation::Merge directly (python/src/transaction.rs:313), so external callers reaching this branch will see _row_created_at_version come back NULL while _row_last_updated_at_version is populated. Suggest mirroring the Append pattern in the None arm: build the version meta once and assign to both fields.

Extended reasoning...

What the bug is

The PR adds two arms inside the new Operation::Merge block in build_manifest:

  • Some(prev) branch (existing fragment id) — refresh last_updated only if Fragment::files changed. This is correct: the rows already existed in a prior version, so created_at must be preserved as-is.
  • None branch (fragment id not in previous manifest, i.e. brand-new) — unconditionally calls refresh_row_latest_update_meta_for_full_frag_rewrite_cols.

refresh_row_latest_update_meta_for_full_frag_rewrite_cols (rust/lance-table/src/rowids/version.rs:493-520) only writes fragment.last_updated_at_version_meta = Some(version_meta); at line 516. It never touches created_at_version_meta. So a brand-new fragment introduced via Merge will end up with last_updated stamped to new_version and created_at left as whatever the caller supplied — typically None.

Why this is inconsistent with the rest of the file

The established pattern for newly-introduced fragments under stable row IDs is to populate both fields:

// Operation::Append (lines 1810-1814)
let version_meta = build_version_meta(fragment, new_version);
fragment.last_updated_at_version_meta = version_meta.clone();
fragment.created_at_version_meta = version_meta;

Operation::Overwrite does the same at 1949-1953, and Update's resolve_update_version_metadata does the same for genuinely new rows. The None branch is the only place a brand-new fragment gets last_updated set without created_at.

Step-by-step proof

  1. External caller (e.g. via the Python API at python/src/transaction.rs:313) constructs an Operation::Merge with a fragment whose id is greater than any id in the previous manifest, with row_id_meta = Some(...), physical_rows = Some(N), and both *_version_meta fields left as None (the natural default for a freshly-built Fragment).
  2. merge_fragments_valid (transaction.rs:3536) accepts this — test_merge_fragments_valid Test 4 explicitly asserts that adding new fragment IDs is valid.
  3. build_manifest enters the Merge arm, prev_by_id.get(&fragment.id) returns None, the None arm runs, and refresh_row_latest_update_meta_for_full_frag_rewrite_cols populates only last_updated_at_version_meta to new_version.
  4. created_at_version_meta stays None.
  5. A subsequent CDF query against the new rows returns new_version for _row_last_updated_at_version but NULL for _row_created_at_version — for rows that demonstrably did not exist before new_version.

Why I disagree with the refutation

The refutation argues that Merge is caller-controlled and the asymmetry mirrors the Some(prev) arm (which also doesn't touch created_at). But the Some(prev) arm is preserving created_at from a row that genuinely was created in an earlier version — that is correct. The None arm by contrast is overwriting the caller's last_updated (treating it the same way Append does) while leaving created_at un-overwritten. If the contract is "caller fully owns version meta," the None branch should not be touching last_updated either; if the contract is "merge stamps brand-new fragments like Append does," it should also stamp created_at. The current half-and-half behavior is the inconsistency.

The refutation also notes (correctly) that no in-tree caller exercises this path today (schema_evolution.rs and others all derive fragments from get_fragments()), which is why I'm flagging this as a nit rather than blocking. But the None branch was deliberately added by this PR with a comment explicitly calling out that "Merge may append ids beyond the original list," and the Python API is a public surface that can hit it.

How to fix

Mirror the Append/Overwrite pattern in the None arm:

None => {
    let version_meta = build_version_meta(fragment, new_version);
    fragment.last_updated_at_version_meta = version_meta.clone();
    fragment.created_at_version_meta = version_meta;
}

Either that, or drop the None arm entirely if the case is genuinely impossible — but that contradicts merge_fragments_valid and the comment justifying the branch.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 98.84170% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/transaction.rs 98.84% 2 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@jerryjch jerryjch force-pushed the fix-merge-per-row-version-metadata branch from 613a87e to c1f662f Compare May 2, 2026 00:38
Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Operation::Merge: refresh per-row version metadata for rewritten fragments

2 participants