Skip to content

fix: sync unenforced_primary_key_position when field metadata updates#6706

Merged
jackye1995 merged 3 commits into
lance-format:mainfrom
touch-of-grey:SyncUnenforcedPrimaryKeyPositionOnMetadataUpdate
May 7, 2026
Merged

fix: sync unenforced_primary_key_position when field metadata updates#6706
jackye1995 merged 3 commits into
lance-format:mainfrom
touch-of-grey:SyncUnenforcedPrimaryKeyPositionOnMetadataUpdate

Conversation

@touch-of-grey
Copy link
Copy Markdown
Contributor

Summary

UpdateConfig's field-metadata path in Transaction::apply_update_map (rust/lance/src/dataset/transaction.rs) mutates field.metadata but does not refresh the cached Field::unenforced_primary_key_position. The protobuf encoder for Field reads the cached Option<u32> (not the metadata HashMap), so the next commit drops the PK marker even though the metadata HashMap correctly contains lance-schema:unenforced-primary-key:position. The mirror image is also broken: removing the keys from metadata leaves a stale cached Some(_) that gets re-encoded as a PK marker on the next commit.

This makes it currently impossible to install or remove an unenforced primary key on an existing dataset via any public API — update_field_metadata and the deprecated replace_field_metadata both go through this path, and the round-trip via reopen returns no PK.

Fix

After applying each field_metadata_update, re-derive field.unenforced_primary_key_position from the freshly-updated metadata using the same parsing rules the Arrow → Field decoder uses (position key first, then the legacy boolean-flag fallback). This keeps the two views in sync so the next protobuf encoding reflects the user's intent.

Also re-exports LANCE_UNENFORCED_PRIMARY_KEY from lance_core::datatypes alongside the existing LANCE_UNENFORCED_PRIMARY_KEY_POSITION re-export, so callers (and the new test) can use both constants without reaching into the private field module.

Test

test_update_field_metadata_syncs_unenforced_primary_key_position in rust/lance/src/dataset/metadata.rs covers:

  • Installing the PK on a column via the position key — cached option must be Some(0), Schema::unenforced_primary_key() must report it, and a fresh Dataset::open must round-trip the marker.
  • The legacy boolean-flag form must also sync the cached option.
  • Removing the keys must clear the cached option so the marker is not re-encoded.

The test fails on main (returns None instead of Some(0)) and passes with the fix.

UpdateConfig's field-metadata path mutates field.metadata but leaves
the cached unenforced_primary_key_position stale. The protobuf encoder
reads the cached option, so the next commit drops the PK marker even
though the metadata HashMap correctly contains the position key. The
same hole on the other direction silently keeps a stale PK marker
after the keys are removed.

Re-derive the cached option from the freshly-applied metadata after
each field-metadata update so the two views stay in sync. Export
LANCE_UNENFORCED_PRIMARY_KEY alongside the position constant from
lance_core::datatypes for the legacy boolean-flag fallback.
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 7, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix!

Comment thread rust/lance/src/dataset/transaction.rs Outdated
for (field_id, field_metadata_update) in field_metadata_updates {
if let Some(field) = manifest.schema.field_by_id_mut(*field_id) {
apply_update_map(&mut field.metadata, field_metadata_update);
// The unenforced primary key position lives in two
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be much more concise, just something like "also set unenforced primary key based on updated field metadata" is sufficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 34a0be5 — trimmed the comment to a single line.

@jackye1995
Copy link
Copy Markdown
Contributor

looks like CI failure:

failures:
    dataset::mem_wal::write::tests::test_wal_only_size_trigger_fires_repeatedly

@touch-of-grey
Copy link
Copy Markdown
Contributor Author

The failing test test_wal_only_size_trigger_fires_repeatedly is in rust/lance/src/dataset/mem_wal/write.rs, which this PR does not touch. The test was introduced by #6675 (e9d1617) and is timing-sensitive — it pushes 3 puts with a 50ms sleep between them and relies on a background tokio flush task draining the size-trigger queue between puts. On a slower Windows runner the 50ms sleep can be too short for the trigger queue to drain, allowing multiple crossings to coalesce into a single WAL entry — exactly the failure mode reported (next_position = 2 instead of >= 3).

The same test passed on Linux CI in this run, and this PR's only changes are in transaction.rs::apply_update_map (field-metadata sync) and a new unit test in metadata.rs — neither exercises the WAL-only size-trigger path.

Could you please re-run the failed windows-build job? I don't have admin rights on the upstream repo, so I can't re-run it myself.

… fallback

The 50ms inter-put sleep in `test_wal_only_size_trigger_fires_repeatedly`
was tight: on slower Windows CI runners the background flush task did
not always drain the trigger queue between puts, allowing crossings to
coalesce into a single drain and the assertion `next_position >= 3` to
fire spuriously. Bump the sleep to 250ms so it keeps a comfortable
margin without slowing the suite.

Also extend the metadata-sync regression test to cover both the
truthy spellings of the legacy `lance-schema:unenforced-primary-key`
boolean flag (`true`, `1`, `yes`, plus mixed-case) and the falsy /
non-numeric fallback paths so codecov is satisfied.
@touch-of-grey
Copy link
Copy Markdown
Contributor Author

Pushed b32468f to address both:

  1. Windows CI flake — bumped the inter-put sleep in test_wal_only_size_trigger_fires_repeatedly from 50ms to 250ms. The original 50ms was tight enough that slow Windows runners couldn't always drain the size-trigger queue between puts, causing crossings to coalesce. Verified locally on macOS: 5/5 reliable runs.

  2. Codecov 1-line partial — extended the metadata-sync regression test to cover all truthy spellings of the legacy lance-schema:unenforced-primary-key flag (true, 1, yes, plus mixed-case) and the falsy / non-numeric fallback paths.

PTAL.

Per review feedback on lance-format#6706 —
the inline rationale belongs in the commit message of the original fix,
not duplicated as a comment block above the line.
@jackye1995 jackye1995 merged commit 06f9271 into lance-format:main May 7, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants