JNI loses version metadata and row-ID lookup is incorrect for stable row IDs during updates
Problem
When a dataset has stable row IDs enabled, _row_created_at_version reports incorrect values (typically defaulting to 1) after update operations. This affects any consumer relying on CDF (Change Data Feed) version columns to track row lifecycle.
Two independent root causes were identified:
1. JNI serialization gap
FragmentMetadata round-trips through JNI during update operations (Rust → Java connector → Rust commit). The JNI layer correctly serializes row_id_meta but does not serialize created_at_version_meta or last_updated_at_version_meta. On the Java → Rust path, FromJObjectWithEnv<Fragment> hard-codes both fields to None:
Ok(Fragment {
// ...
created_at_version_meta: None, // ← always None
last_updated_at_version_meta: None, // ← always None
})
Any version metadata that lance-core attaches to fragments is silently dropped when fragments pass through the Java SDK.
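The gap can be illustrated with a minimal sketch. The types below are simplified stand-ins, not the real lance or JNI definitions: `JavaFragment`, `to_java`, `from_java`, and `round_trip` are hypothetical names, and the metadata fields are modeled as plain byte vectors purely for illustration. The point is only that a round trip through an intermediate object with no version fields cannot preserve them.

```rust
/// Simplified stand-in for the Rust-side fragment metadata (assumption:
/// the real fields use richer types than Option<Vec<u8>>).
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    id: u64,
    row_id_meta: Option<Vec<u8>>,
    created_at_version_meta: Option<Vec<u8>>,
    last_updated_at_version_meta: Option<Vec<u8>>,
}

/// Hypothetical stand-in for the Java-side object: it carries row_id_meta
/// but has no slot for either version metadata field.
struct JavaFragment {
    id: u64,
    row_id_meta: Option<Vec<u8>>,
}

/// Rust -> Java: only the fields the intermediate object knows about survive.
fn to_java(f: &Fragment) -> JavaFragment {
    JavaFragment {
        id: f.id,
        row_id_meta: f.row_id_meta.clone(),
    }
}

/// Java -> Rust: mirrors the hard-coded `None` values described above.
fn from_java(j: &JavaFragment) -> Fragment {
    Fragment {
        id: j.id,
        row_id_meta: j.row_id_meta.clone(),
        created_at_version_meta: None,
        last_updated_at_version_meta: None,
    }
}

/// Full round trip, as happens during an update that passes through Java.
fn round_trip(f: &Fragment) -> Fragment {
    from_java(&to_java(f))
}
```

A round-trip equality check of this shape (build a fragment with all three metadata fields set, pass it through `round_trip`, compare with the original) is the kind of test that would catch the regression: `row_id_meta` survives, while both version fields come back as `None`.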
2. Incorrect row-ID lookup in Operation::Update
The version-tracking logic in Transaction::build_manifest for Operation::Update derives the original fragment ID from the row ID using row_id >> 32. This assumes the upper 32 bits encode the fragment ID, which is only true for unstable row addresses. Stable row IDs are sequential integers with no fragment-encoding, so this lookup always fails and falls through to the default version of 1.
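To make the failure mode concrete, here is a small sketch of the bit arithmetic. It is not the code in Transaction::build_manifest: `row_address`, `fragment_id_from_upper_bits`, and `created_at_version` are hypothetical helpers, and the per-fragment version map is an assumed simplification of however the real code tracks versions.

```rust
use std::collections::HashMap;

/// Row address layout: fragment ID in the upper 32 bits,
/// row offset within the fragment in the lower 32 bits.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// The derivation described above: treat the upper 32 bits as a fragment ID.
/// This is only meaningful when the input really is a row address.
fn fragment_id_from_upper_bits(row_id: u64) -> u32 {
    (row_id >> 32) as u32
}

/// Hypothetical lookup: find the creation version of the fragment derived
/// from the row ID, falling back to 1 when no such fragment is known.
fn created_at_version(row_id: u64, versions_by_fragment: &HashMap<u32, u64>) -> u64 {
    let fragment_id = fragment_id_from_upper_bits(row_id);
    versions_by_fragment.get(&fragment_id).copied().unwrap_or(1)
}
```

Suppose fragment 7 was created at version 2 and is the only entry in the map. For the row address `row_address(7, 3)`, the shift recovers fragment 7 and the lookup returns 2. For a stable row ID such as 3, which is below 2^32, the shift yields 0, no matching fragment is found, and the lookup falls through to 1. A fix therefore needs to resolve stable row IDs through a mapping from row ID to location rather than through bit arithmetic.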
Expected behavior
After an update, _row_created_at_version should reflect the version at which each row was originally inserted, not a default. Untouched rows co-located in the same rewritten fragment should also retain their original creation version.
Reproduction
- Create a dataset with stable row IDs and CDF enabled.
- Insert rows (version 2).
- Update a subset of rows (version 3).
- Query _row_created_at_version: all rows show 1 instead of 2.