JNI loses version metadata and row-ID lookup is incorrect for stable row IDs during updates #6464

@ivscheianu

Problem

When a dataset has stable row IDs enabled, _row_created_at_version reports incorrect values (typically defaulting to 1) after update operations. This affects any consumer relying on CDF (Change Data Feed) version columns to track row lifecycle.

Two independent root causes were identified:

1. JNI serialization gap

FragmentMetadata round-trips through JNI during update operations (Rust → Java connector → Rust commit). The JNI layer correctly serializes row_id_meta but does not serialize created_at_version_meta or last_updated_at_version_meta. On the Java → Rust path, FromJObjectWithEnv<Fragment> hard-codes both fields to None:

Ok(Fragment {
    // ...
    created_at_version_meta: None,   // ← always None
    last_updated_at_version_meta: None, // ← always None
})

Any version metadata that lance-core attaches to fragments is silently dropped when fragments pass through the Java SDK.
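The effect of this gap can be sketched with a simplified model. Everything below is hypothetical and for illustration only: `FragmentModel`, `round_trip_lossy`, and `round_trip_preserving` are not lance or JNI APIs, and the metadata fields are reduced to `Option<String>` placeholders. The sketch only assumes what is described above, namely that `row_id_meta` survives the round trip while the two version fields come back as `None`.

```rust
/// Hypothetical stand-in for the fields of a fragment that matter here.
/// The real metadata types are richer; strings are placeholders.
#[derive(Debug, Clone, PartialEq)]
struct FragmentModel {
    id: u64,
    row_id_meta: Option<String>,
    created_at_version_meta: Option<String>,
    last_updated_at_version_meta: Option<String>,
}

/// Models the behavior reported in this issue: row_id_meta is carried
/// across the boundary, but both version fields are reset to None.
fn round_trip_lossy(f: &FragmentModel) -> FragmentModel {
    FragmentModel {
        id: f.id,
        row_id_meta: f.row_id_meta.clone(),
        created_at_version_meta: None,
        last_updated_at_version_meta: None,
    }
}

/// Models the expected behavior: every optional field is carried across.
fn round_trip_preserving(f: &FragmentModel) -> FragmentModel {
    FragmentModel {
        id: f.id,
        row_id_meta: f.row_id_meta.clone(),
        created_at_version_meta: f.created_at_version_meta.clone(),
        last_updated_at_version_meta: f.last_updated_at_version_meta.clone(),
    }
}
```

A round-trip equality check of this shape (build a fragment with all three fields set, pass it through the boundary, compare with the input) would likely catch the regression in a test.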

2. Incorrect row-ID lookup in Operation::Update

The version-tracking logic in Transaction::build_manifest for Operation::Update derives the original fragment ID from the row ID using row_id >> 32. This assumes the upper 32 bits encode the fragment ID, which holds only for row addresses (the identifiers used when stable row IDs are disabled). Stable row IDs are sequential integers that do not encode a fragment, so the lookup resolves to the wrong fragment or to none at all, and the code falls through to the default version of 1.
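A minimal sketch of why the shift fails for stable row IDs, assuming the layout described above (fragment ID in the upper 32 bits, row offset in the lower 32 bits). The function names and the `HashMap` index are hypothetical illustrations, not lance APIs; a real fix would presumably resolve stable IDs through whatever row-ID index the dataset already maintains.

```rust
use std::collections::HashMap;

/// Builds a row address: fragment ID in the upper 32 bits, offset in the lower 32.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | (offset as u64)
}

/// The lookup used today: correct for row addresses only.
fn fragment_id_from_address(row_addr: u64) -> u32 {
    (row_addr >> 32) as u32
}

/// Hypothetical alternative for stable row IDs: consult an explicit
/// stable-ID -> fragment-ID mapping instead of decoding bits.
fn fragment_id_for_stable_id(index: &HashMap<u64, u32>, stable_id: u64) -> Option<u32> {
    index.get(&stable_id).copied()
}
```

With these definitions, `fragment_id_from_address(row_address(7, 42))` recovers fragment 7, while applying the same shift to a stable row ID such as 42 yields 0 regardless of which fragment actually holds the row. Any stable ID below 2^32 decodes to fragment 0 this way.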

Expected behavior

After an update, _row_created_at_version should reflect the version at which each row was originally inserted, not a default. Untouched rows co-located in the same rewritten fragment should also retain their original creation version.

Reproduction

  1. Create a dataset with stable row IDs and CDF enabled.
  2. Insert rows (version 2).
  3. Update a subset of rows (version 3).
  4. Query _row_created_at_version: all rows report 1 instead of 2, including rows that the update did not touch.
