-
Notifications
You must be signed in to change notification settings - Fork 650
fix: refresh last_updated metadata on Operation::Merge for rewritten fragments #6640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2011,7 +2011,38 @@ impl Transaction { | |
| final_fragments.extend(maybe_existing_fragments?.clone()); | ||
| } | ||
| Operation::Merge { fragments, .. } => { | ||
| final_fragments.extend(fragments.clone()); | ||
| let mut merged_fragments = fragments.clone(); | ||
| if next_row_id.is_some() { | ||
| let new_version = current_manifest.map(|m| m.version + 1).unwrap_or(1); | ||
| let prev_by_id: HashMap<u64, &Fragment> = maybe_existing_fragments? | ||
| .iter() | ||
| .map(|f| (f.id, f)) | ||
| .collect(); | ||
| for fragment in merged_fragments.iter_mut() { | ||
| match prev_by_id.get(&fragment.id) { | ||
| Some(prev) => { | ||
| if merge_fragment_physically_rewritten(prev, fragment) { | ||
| lance_table::rowids::version::refresh_row_latest_update_meta_for_full_frag_rewrite_cols( | ||
| fragment, | ||
| new_version, | ||
| )?; | ||
| } | ||
| } | ||
| None => { | ||
| // Brand-new fragment ID not present in the previous manifest. | ||
| // Set both last_updated and created version meta, consistent | ||
| // with Append/Overwrite for genuinely new fragments. | ||
| lance_table::rowids::version::refresh_row_latest_update_meta_for_full_frag_rewrite_cols( | ||
| fragment, | ||
| new_version, | ||
| )?; | ||
| fragment.created_at_version_meta = | ||
| fragment.last_updated_at_version_meta.clone(); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| final_fragments.extend(merged_fragments); | ||
|
|
||
| // Some fields that have indices may have been removed, so we should | ||
| // remove those indices as well. | ||
|
|
@@ -3488,6 +3519,32 @@ fn schema_fragments_legacy_valid(schema: &Schema, fragments: &[Fragment]) -> Res | |
| Ok(()) | ||
| } | ||
|
|
||
| /// Returns true if Operation::Merge rewrote this fragment's column data files (Fragment::files | ||
| /// changed versus the previous manifest). Used to bump last_updated_at_version_meta only when | ||
| /// new column values were materialized to disk. | ||
| /// | ||
| /// Deletion file changes alone are not treated as rewrites: tombstones remove rows but | ||
| /// survivors did not receive new column bytes; stamping last_updated for those rows would be | ||
| /// incorrect for CDF. | ||
| #[inline] | ||
| fn merge_fragment_physically_rewritten(prev: &Fragment, merged: &Fragment) -> bool { | ||
| debug_assert_eq!(prev.id, merged.id); | ||
| if prev.files.len() != merged.files.len() { | ||
| return true; | ||
| } | ||
| // Compare identity fields only. file_size_bytes is an AtomicU64 cache that | ||
| // concurrent scans can populate in place on the manifest's DataFile, so it | ||
| // must not be part of the rewrite check. | ||
| prev.files.iter().zip(merged.files.iter()).any(|(p, m)| { | ||
| p.path != m.path | ||
| || p.fields != m.fields | ||
| || p.column_indices != m.column_indices | ||
| || p.file_major_version != m.file_major_version | ||
| || p.file_minor_version != m.file_minor_version | ||
| || p.base_id != m.base_id | ||
| }) | ||
| } | ||
|
Comment on lines
+3522
to
+3546
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 merge_fragment_physically_rewritten compares Fragment.files via Vec::PartialEq, which transitively compares CachedFileSize — an interior-mutable AtomicU64 that scheduler.rs:632 writes in place on the live manifest's atomics. A caller that constructs merged fragments by cloning from get_fragments() (which snapshots CachedFileSize) and a concurrent scan that flips the original atomic between clone-time and build_manifest can drive prev.files != merged.files to true on identical files, falsely triggering refresh_row_latest_update_meta_for_full_frag_rewrite_cols and overwriting carry-forward last_updated history. Fix: compare only the immutable identity fields (path, fields, column_indices, file_major_version, file_minor_version, base_id), or normalize CachedFileSize before comparing. Extended reasoning...Bugmerge_fragment_physically_rewritten in transaction.rs:3520-3531 decides whether the carry-forward path needs to refresh per-row last_updated metadata by comparing prev.files != merged.files. Vec::eq uses the derived PartialEq on DataFile (rust/lance-table/src/format/fragment.rs:27), which compares all fields including file_size_bytes: CachedFileSize. CachedFileSize wraps an AtomicU64 (rust/lance-io/src/utils.rs:177). Critically:
scheduler.rs:632 calls file_size_bytes.set(size) on the caller-supplied CachedFileSize whenever a file's size becomes known. Callers in dataset/optimize.rs:516, dataset/optimize/binary_copy.rs:263, and dataset/fragment.rs:987 pass &data_file.file_size_bytes from the live dataset.manifest.fragments — i.e. they mutate the live manifest's atomic. In build_manifest (rust/lance/src/io/commit.rs:800), current_manifest is dataset.manifest.as_ref(), so prev.files[].file_size_bytes is the live atomic. The merged fragments come from caller-side construction, which for several flows starts from a Clone of get_fragments() (a frozen snapshot). Trigger sequence
This silently breaks CDF semantics for the carry-forward case: rows are reported as 'updated at v_new' even though no column data was actually rewritten. Why current in-tree callers do not exercise itThe two confirmed callers — add_columns/alter_columns and drop_columns_impl — change Fragment::files length (append a new file or drop one), so Vec::eq short-circuits on length and CachedFileSize never participates. The AllNulls path does not access files, so both atomics stay at zero. The PR's own carry-forward test (merge_build_manifest_skips_refresh_when_carry_forward_stable_row_ids) constructs both DataFiles with the same Option<NonZero>=None and runs no concurrent scan, so it does not exercise the race. The vulnerability is latent: any future external caller (Python/JNI Operation::Merge construction) that submits a true carry-forward fragment whose live manifest atomic gets populated mid-flight will hit it. Suggested fixCompare only the immutable identity fields, e.g.: fn data_file_identity_eq(a: &DataFile, b: &DataFile) -> bool {
a.path == b.path
&& a.fields == b.fields
&& a.column_indices == b.column_indices
&& a.file_major_version == b.file_major_version
&& a.file_minor_version == b.file_minor_version
&& a.base_id == b.base_id
}Then iterate prev.files / merged.files pairwise. Alternatively, normalize file_size_bytes on both sides before the Vec equality check. The current debug_assert_eq!(prev.id, merged.id) plus a length check also belong in the helper. |
||
|
|
||
| /// Validate that Merge operations preserve all original fragments. | ||
| /// Merge operations should only add columns or rows, not reduce fragments. | ||
| /// This ensures fragments correspond at one-to-one with the original fragment list. | ||
|
|
@@ -4649,6 +4706,299 @@ mod tests { | |
| seq.versions().collect() | ||
| } | ||
|
|
||
| #[test] | ||
| fn merge_build_manifest_refreshes_last_updated_when_data_files_change_stable_row_ids() { | ||
| use lance_file::version::LanceFileVersion; | ||
| use lance_table::feature_flags::FLAG_STABLE_ROW_IDS; | ||
|
|
||
| let (major, minor) = LanceFileVersion::Stable.to_numbers(); | ||
| let mk_file = |path: &str| DataFile::new(path, vec![0], vec![0], major, minor, None, None); | ||
|
|
||
| let arrow_schema = ArrowSchema::new(vec![ArrowField::new("id", DataType::Int32, false)]); | ||
| let lance_schema = LanceSchema::try_from(&arrow_schema).unwrap(); | ||
|
|
||
| let row_ids = RowIdSequence::from([100u64, 101, 102, 103, 104].as_slice()); | ||
| let row_id_meta = Some(RowIdMeta::Inline(write_row_ids(&row_ids))); | ||
|
|
||
| let prev_fragment = Fragment { | ||
| id: 0, | ||
| files: vec![mk_file("before.lance")], | ||
| deletion_file: None, | ||
| row_id_meta, | ||
| physical_rows: Some(5), | ||
| last_updated_at_version_meta: None, | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let mut manifest = Manifest::new( | ||
| lance_schema.clone(), | ||
| Arc::new(vec![prev_fragment.clone()]), | ||
| DataStorageFormat::new(LanceFileVersion::V2_0), | ||
| HashMap::new(), | ||
| ); | ||
| manifest.reader_feature_flags |= FLAG_STABLE_ROW_IDS; | ||
| manifest.next_row_id = 100; | ||
|
|
||
| let merged_fragment = Fragment { | ||
| files: vec![mk_file("after.lance")], | ||
| ..prev_fragment | ||
| }; | ||
|
|
||
| let tx = Transaction::new( | ||
| manifest.version, | ||
| Operation::Merge { | ||
| fragments: vec![merged_fragment], | ||
| schema: lance_schema, | ||
| }, | ||
| None, | ||
| ); | ||
|
|
||
| let (out, _) = tx | ||
| .build_manifest( | ||
| Some(&manifest), | ||
| vec![], | ||
| "txn", | ||
| &ManifestWriteConfig::default(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| assert_eq!(out.version, 2); | ||
| let frag = &out.fragments[0]; | ||
| let seq = frag | ||
| .last_updated_at_version_meta | ||
| .as_ref() | ||
| .unwrap() | ||
| .load_sequence() | ||
| .unwrap(); | ||
| assert_eq!(seq.version_at(0).unwrap(), 2); | ||
| assert_eq!(seq.version_at(4).unwrap(), 2); | ||
| } | ||
|
|
||
| #[test] | ||
| fn merge_build_manifest_skips_refresh_when_carry_forward_stable_row_ids() { | ||
| use lance_file::version::LanceFileVersion; | ||
| use lance_table::feature_flags::FLAG_STABLE_ROW_IDS; | ||
| use lance_table::rowids::version::{RowDatasetVersionMeta, RowDatasetVersionSequence}; | ||
|
|
||
| let (major, minor) = LanceFileVersion::Stable.to_numbers(); | ||
| let data_file = DataFile::new("same.lance", vec![0], vec![0], major, minor, None, None); | ||
|
|
||
| let arrow_schema = ArrowSchema::new(vec![ArrowField::new("id", DataType::Int32, false)]); | ||
| let lance_schema = LanceSchema::try_from(&arrow_schema).unwrap(); | ||
|
|
||
| let row_ids = RowIdSequence::from([200u64, 201, 202, 203, 204].as_slice()); | ||
| let row_id_meta = Some(RowIdMeta::Inline(write_row_ids(&row_ids))); | ||
|
|
||
| let uniform_v1 = RowDatasetVersionSequence::from_uniform_row_count(5, 1); | ||
| let meta_v1 = RowDatasetVersionMeta::from_sequence(&uniform_v1).unwrap(); | ||
|
|
||
| let prev_fragment = Fragment { | ||
| id: 0, | ||
| files: vec![data_file.clone()], | ||
| deletion_file: None, | ||
| row_id_meta: row_id_meta.clone(), | ||
| physical_rows: Some(5), | ||
| last_updated_at_version_meta: Some(meta_v1.clone()), | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let mut manifest = Manifest::new( | ||
| lance_schema.clone(), | ||
| Arc::new(vec![prev_fragment]), | ||
| DataStorageFormat::new(LanceFileVersion::V2_0), | ||
| HashMap::new(), | ||
| ); | ||
| manifest.reader_feature_flags |= FLAG_STABLE_ROW_IDS; | ||
| manifest.next_row_id = 100; | ||
|
|
||
| let merged_fragment = Fragment { | ||
| id: 0, | ||
| files: vec![data_file], | ||
| deletion_file: None, | ||
| row_id_meta, | ||
| physical_rows: Some(5), | ||
| last_updated_at_version_meta: Some(meta_v1), | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let tx = Transaction::new( | ||
| manifest.version, | ||
| Operation::Merge { | ||
| fragments: vec![merged_fragment], | ||
| schema: lance_schema, | ||
| }, | ||
| None, | ||
| ); | ||
|
|
||
| let (out, _) = tx | ||
| .build_manifest( | ||
| Some(&manifest), | ||
| vec![], | ||
| "txn", | ||
| &ManifestWriteConfig::default(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let seq = out.fragments[0] | ||
| .last_updated_at_version_meta | ||
| .as_ref() | ||
| .unwrap() | ||
| .load_sequence() | ||
| .unwrap(); | ||
| assert_eq!(seq.version_at(0).unwrap(), 1); | ||
| assert_eq!(seq.version_at(4).unwrap(), 1); | ||
| } | ||
|
|
||
| #[test] | ||
| fn merge_build_manifest_no_last_updated_refresh_without_stable_row_ids() { | ||
| use lance_file::version::LanceFileVersion; | ||
| use lance_table::feature_flags::FLAG_STABLE_ROW_IDS; | ||
|
|
||
| let (major, minor) = LanceFileVersion::Stable.to_numbers(); | ||
| let mk_file = |path: &str| DataFile::new(path, vec![0], vec![0], major, minor, None, None); | ||
|
|
||
| let arrow_schema = ArrowSchema::new(vec![ArrowField::new("id", DataType::Int32, false)]); | ||
| let lance_schema = LanceSchema::try_from(&arrow_schema).unwrap(); | ||
|
|
||
| let prev_fragment = Fragment { | ||
| id: 0, | ||
| files: vec![mk_file("before.lance")], | ||
| deletion_file: None, | ||
| row_id_meta: None, | ||
| physical_rows: Some(5), | ||
| last_updated_at_version_meta: None, | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let manifest = Manifest::new( | ||
| lance_schema.clone(), | ||
| Arc::new(vec![prev_fragment.clone()]), | ||
| DataStorageFormat::new(LanceFileVersion::V2_0), | ||
| HashMap::new(), | ||
| ); | ||
| assert_eq!( | ||
| manifest.reader_feature_flags & FLAG_STABLE_ROW_IDS, | ||
| 0, | ||
| "manifest must not use stable row IDs for this guard test" | ||
| ); | ||
|
|
||
| let merged_fragment = Fragment { | ||
| files: vec![mk_file("after.lance")], | ||
| ..prev_fragment | ||
| }; | ||
|
|
||
| let tx = Transaction::new( | ||
| manifest.version, | ||
| Operation::Merge { | ||
| fragments: vec![merged_fragment], | ||
| schema: lance_schema, | ||
| }, | ||
| None, | ||
| ); | ||
|
|
||
| let (out, _) = tx | ||
| .build_manifest( | ||
| Some(&manifest), | ||
| vec![], | ||
| "txn", | ||
| &ManifestWriteConfig::default(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| assert!( | ||
| out.fragments[0].last_updated_at_version_meta.is_none(), | ||
| "without stable row IDs, Merge must not populate per-row last_updated metadata" | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn merge_build_manifest_sets_both_version_meta_for_new_fragment_id_stable_row_ids() { | ||
| use lance_file::version::LanceFileVersion; | ||
| use lance_table::feature_flags::FLAG_STABLE_ROW_IDS; | ||
|
|
||
| let (major, minor) = LanceFileVersion::Stable.to_numbers(); | ||
| let mk_file = |path: &str| DataFile::new(path, vec![0], vec![0], major, minor, None, None); | ||
|
|
||
| let arrow_schema = ArrowSchema::new(vec![ArrowField::new("id", DataType::Int32, false)]); | ||
| let lance_schema = LanceSchema::try_from(&arrow_schema).unwrap(); | ||
|
|
||
| // Existing fragment (id=0) with stable row IDs | ||
| let row_ids_0 = RowIdSequence::from([10u64, 11, 12].as_slice()); | ||
| let existing_fragment = Fragment { | ||
| id: 0, | ||
| files: vec![mk_file("existing.lance")], | ||
| deletion_file: None, | ||
| row_id_meta: Some(RowIdMeta::Inline(write_row_ids(&row_ids_0))), | ||
| physical_rows: Some(3), | ||
| last_updated_at_version_meta: None, | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let mut manifest = Manifest::new( | ||
| lance_schema.clone(), | ||
| Arc::new(vec![existing_fragment.clone()]), | ||
| DataStorageFormat::new(LanceFileVersion::V2_0), | ||
| HashMap::new(), | ||
| ); | ||
| manifest.reader_feature_flags |= FLAG_STABLE_ROW_IDS; | ||
| manifest.next_row_id = 100; | ||
| manifest.version = 1; | ||
|
|
||
| // New fragment (id=1) not present in prev manifest — exercises the None branch | ||
| let row_ids_1 = RowIdSequence::from([20u64, 21, 22, 23].as_slice()); | ||
| let new_fragment = Fragment { | ||
| id: 1, | ||
| files: vec![mk_file("new.lance")], | ||
| deletion_file: None, | ||
| row_id_meta: Some(RowIdMeta::Inline(write_row_ids(&row_ids_1))), | ||
| physical_rows: Some(4), | ||
| last_updated_at_version_meta: None, | ||
| created_at_version_meta: None, | ||
| }; | ||
|
|
||
| let tx = Transaction::new( | ||
| manifest.version, | ||
| Operation::Merge { | ||
| fragments: vec![existing_fragment, new_fragment], | ||
| schema: lance_schema, | ||
| }, | ||
| None, | ||
| ); | ||
|
|
||
| let (out, _) = tx | ||
| .build_manifest( | ||
| Some(&manifest), | ||
| vec![], | ||
| "txn", | ||
| &ManifestWriteConfig::default(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| assert_eq!(out.version, 2); | ||
|
|
||
| let new_frag = out.fragments.iter().find(|f| f.id == 1).unwrap(); | ||
|
|
||
| // last_updated_at_version must be set to the commit version | ||
| let last_updated_seq = new_frag | ||
| .last_updated_at_version_meta | ||
| .as_ref() | ||
| .expect("new fragment must have last_updated_at_version_meta") | ||
| .load_sequence() | ||
| .unwrap(); | ||
| assert_eq!(last_updated_seq.version_at(0).unwrap(), 2); | ||
| assert_eq!(last_updated_seq.version_at(3).unwrap(), 2); | ||
|
|
||
| // created_at_version must also be set — must not be None | ||
| let created_seq = new_frag | ||
| .created_at_version_meta | ||
| .as_ref() | ||
| .expect("new fragment must have created_at_version_meta") | ||
| .load_sequence() | ||
| .unwrap(); | ||
| assert_eq!(created_seq.version_at(0).unwrap(), 2); | ||
| assert_eq!(created_seq.version_at(3).unwrap(), 2); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_update_version_tracking_preserves_created_at() { | ||
| let existing_seq = RowIdSequence::from([100u64, 101, 102].as_slice()); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 In the new
Operation::MergeNone branch (lines 2031-2039), brand-new fragment IDs only havelast_updated_at_version_metainitialized;created_at_version_metais left at whatever the caller passed (typicallyNone). This is inconsistent withOperation::Append(1810-1814) andOperation::Overwrite(1949-1953), which set both fields viabuild_version_metafor genuinely new fragments. No in-tree caller currently appends new fragment IDs through Merge, but the Python transaction API exposesOperation::Mergedirectly (python/src/transaction.rs:313), so external callers reaching this branch will see_row_created_at_versioncome back NULL while_row_last_updated_at_versionis populated. Suggest mirroring the Append pattern in the None arm: build the version meta once and assign to both fields.Extended reasoning...
What the bug is
The PR adds two arms inside the new
Operation::Mergeblock inbuild_manifest:Some(prev)branch (existing fragment id) — refreshlast_updatedonly ifFragment::fileschanged. This is correct: the rows already existed in a prior version, socreated_atmust be preserved as-is.Nonebranch (fragment id not in previous manifest, i.e. brand-new) — unconditionally callsrefresh_row_latest_update_meta_for_full_frag_rewrite_cols.refresh_row_latest_update_meta_for_full_frag_rewrite_cols(rust/lance-table/src/rowids/version.rs:493-520) only writesfragment.last_updated_at_version_meta = Some(version_meta);at line 516. It never touchescreated_at_version_meta. So a brand-new fragment introduced via Merge will end up withlast_updatedstamped tonew_versionandcreated_atleft as whatever the caller supplied — typicallyNone.Why this is inconsistent with the rest of the file
The established pattern for newly-introduced fragments under stable row IDs is to populate both fields:
Operation::Overwritedoes the same at 1949-1953, andUpdate'sresolve_update_version_metadatadoes the same for genuinely new rows. The None branch is the only place a brand-new fragment getslast_updatedset withoutcreated_at.Step-by-step proof
python/src/transaction.rs:313) constructs anOperation::Mergewith a fragment whoseidis greater than any id in the previous manifest, withrow_id_meta = Some(...),physical_rows = Some(N), and both*_version_metafields left asNone(the natural default for a freshly-builtFragment).merge_fragments_valid(transaction.rs:3536) accepts this —test_merge_fragments_validTest 4 explicitly asserts that adding new fragment IDs is valid.build_manifestenters the Merge arm,prev_by_id.get(&fragment.id)returnsNone, the None arm runs, andrefresh_row_latest_update_meta_for_full_frag_rewrite_colspopulates onlylast_updated_at_version_metatonew_version.created_at_version_metastaysNone.new_versionfor_row_last_updated_at_versionbutNULLfor_row_created_at_version— for rows that demonstrably did not exist beforenew_version.Why I disagree with the refutation
The refutation argues that Merge is caller-controlled and the asymmetry mirrors the
Some(prev)arm (which also doesn't touchcreated_at). But theSome(prev)arm is preservingcreated_atfrom a row that genuinely was created in an earlier version — that is correct. The None arm by contrast is overwriting the caller'slast_updated(treating it the same way Append does) while leavingcreated_atun-overwritten. If the contract is "caller fully owns version meta," the None branch should not be touchinglast_updatedeither; if the contract is "merge stamps brand-new fragments like Append does," it should also stampcreated_at. The current half-and-half behavior is the inconsistency.The refutation also notes (correctly) that no in-tree caller exercises this path today (
schema_evolution.rsand others all derive fragments fromget_fragments()), which is why I'm flagging this as a nit rather than blocking. But the None branch was deliberately added by this PR with a comment explicitly calling out that "Merge may append ids beyond the original list," and the Python API is a public surface that can hit it.How to fix
Mirror the Append/Overwrite pattern in the None arm:
Either that, or drop the None arm entirely if the case is genuinely impossible — but that contradicts
merge_fragments_validand the comment justifying the branch.