fix: reject changing the unenforced primary key once set #6810
Merged
Validate in the UpdateConfig transaction-apply path that a commit does not change an already-set unenforced primary key. This runs on every apply including conflict-rebase, so it rejects both a direct override and the concurrent-writer race where a transaction is retried onto a base manifest that already has a primary key.
jackye1995 reviewed May 17, 2026
for falsy in ["no", "false", "0", "anything-else"] {

// Non-matching values must not be treated as a PK marker, so the
// field never becomes part of the primary key.
Contributor
Not only should we not treat it as a PK marker, we should also return an error, something like "the primary key is a reserved key".
jackye1995 reviewed May 17, 2026
// Removing the keys must clear the cached option, otherwise the
// protobuf would still encode a stale PK marker on next commit.
// Re-applying the identical primary key is a no-op and allowed; this
Contributor
This is confusing. We should just disallow setting it, consistent with the rest of the behavior, instead of allowing it to succeed.
…mary key Address review on lance-format#6810: the immutability guard now rejects any write that touches the reserved primary key metadata keys once a key is set (not just changes that alter the key), and rejects writing those keys with a value that is not a valid marker.
jackye1995 pushed a commit to lancedb/lancedb that referenced this pull request May 17, 2026
## Summary

Adds `Table::set_unenforced_primary_key`, which records a single column as the table's unenforced primary key in Lance schema field metadata. "Unenforced" means LanceDB does not check uniqueness on write; the key is metadata that `merge_insert` consumes.

- Single-column only; the column must exist and have a supported dtype (Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary). The API accepts an iterable for binding ergonomics but requires exactly one column; compound keys are rejected.
- The primary key is immutable: calling this on a table that already has an unenforced primary key is rejected. Concurrent writers racing to set the key fail at commit time rather than silently overriding it.
- `RemoteTable` returns `NotSupported`.
- Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and TypeScript (`Table.setUnenforcedPrimaryKey`).

## Context

Split out from #3354 per review feedback, so the unenforced primary key and the `merge_insert` sharding spec land as separate reviewable PRs.

No Lance dependency bump: `main` is already on v7.0.0-beta.10, which includes the field-metadata round-trip fix the API relies on.

Enforcing primary-key immutability at the Lance commit layer (so the cross-column concurrent race is also rejected) is a companion Lance change: lance-format/lance#6810.
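The validation rules listed in this commit message (exactly one column, column must exist, supported dtype, no existing key) can be sketched roughly as follows. This is an illustrative sketch only, not the LanceDB implementation: the `DataType` enum, `is_supported_key_type`, and `validate_primary_key` are hypothetical names, and the schema is modeled as a plain slice of name/type pairs instead of an Arrow schema.

```rust
/// Hypothetical stand-in for Arrow data types; only the variants needed here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    FixedSizeBinary(i32),
    Float64, // example of a type outside the supported list
}

/// Supported key dtypes, taken from the list in the commit message.
fn is_supported_key_type(dt: DataType) -> bool {
    matches!(
        dt,
        DataType::Int32
            | DataType::Int64
            | DataType::Utf8
            | DataType::LargeUtf8
            | DataType::Binary
            | DataType::LargeBinary
            | DataType::FixedSizeBinary(_)
    )
}

/// Hypothetical validation: accepts an iterable of column names but requires
/// exactly one, mirroring the described behavior.
fn validate_primary_key<'a>(
    schema: &[(&str, DataType)],
    existing_key: Option<&str>,
    columns: impl IntoIterator<Item = &'a str>,
) -> Result<&'a str, String> {
    // Immutability: a table that already has a key rejects any new one.
    if let Some(existing) = existing_key {
        return Err(format!("primary key is already set to '{existing}'"));
    }
    let mut iter = columns.into_iter();
    let column = match (iter.next(), iter.next()) {
        (Some(c), None) => c,
        (None, _) => return Err("exactly one primary key column is required".to_string()),
        (Some(_), Some(_)) => return Err("compound primary keys are not supported".to_string()),
    };
    // The column must exist in the schema.
    let dtype = schema
        .iter()
        .find(|(name, _)| *name == column)
        .map(|(_, dt)| *dt)
        .ok_or_else(|| format!("column '{column}' does not exist"))?;
    if !is_supported_key_type(dtype) {
        return Err(format!("column '{column}' has unsupported type {dtype:?}"));
    }
    Ok(column)
}
```

Under these assumptions, `validate_primary_key(&schema, None, ["id"])` succeeds for an `Int64` column named `id`, while a two-column list, a missing column, a `Float64` column, or a table with an existing key each return an error.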
Summary
The unenforced primary key is stored as schema field metadata, and the `UpdateConfig` transaction-apply path applied field metadata updates with no guard, so the key could be silently overridden, evolved, removed, or written with a junk value.

This treats the `lance-schema:unenforced-primary-key*` metadata keys as reserved, validated in the transaction-apply path:

- … primary-key metadata key at all is rejected;
- … rather than silently ignored;

The check runs on every apply, including conflict-rebase, so it also rejects the concurrent-writer race where two writers set the key on different columns (their field-metadata updates don't conflict, so the loser would otherwise rebase and corrupt the key).

The `metadata.rs` primary-key tests are updated accordingly.
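A minimal sketch of the kind of guard described in this summary, under stated assumptions: it is not the code from this PR. `FieldMetadata`, `is_valid_marker`, `schema_has_primary_key`, and `apply_field_metadata_update` are hypothetical names, the schema is modeled as a slice of per-field metadata maps, and the set of valid marker values (only `"true"` here) is an assumption, since the PR text does not list it.

```rust
use std::collections::HashMap;

/// Prefix of the reserved metadata keys named in the summary.
const PK_PREFIX: &str = "lance-schema:unenforced-primary-key";

/// Hypothetical stand-in for one field's metadata map.
type FieldMetadata = HashMap<String, String>;

/// Assumption: only "true" counts as a valid marker in this sketch.
fn is_valid_marker(value: &str) -> bool {
    value.eq_ignore_ascii_case("true")
}

/// True if any field in the base schema already carries a reserved key.
fn schema_has_primary_key(fields: &[FieldMetadata]) -> bool {
    fields
        .iter()
        .any(|meta| meta.keys().any(|k| k.starts_with(PK_PREFIX)))
}

/// Applies a metadata update to one field (`None` removes a key).
/// The guard runs against whatever base it is given, so replaying the same
/// update onto a newer base during a rebase re-runs the check.
fn apply_field_metadata_update(
    base: &mut [FieldMetadata],
    field_index: usize,
    update: &HashMap<String, Option<String>>,
) -> Result<(), String> {
    let touches_pk = update.keys().any(|k| k.starts_with(PK_PREFIX));
    if touches_pk {
        // Once a key is set, any write touching the reserved keys is rejected.
        if schema_has_primary_key(base) {
            return Err("primary key is already set; its metadata keys are reserved".to_string());
        }
        // Otherwise the reserved keys may only be written with a valid marker.
        for (key, value) in update {
            if key.starts_with(PK_PREFIX) {
                match value {
                    Some(v) if is_valid_marker(v) => {}
                    _ => return Err(format!("invalid value for reserved key '{key}'")),
                }
            }
        }
    }
    let field = base
        .get_mut(field_index)
        .ok_or_else(|| format!("no field at index {field_index}"))?;
    for (key, value) in update {
        match value {
            Some(v) => {
                field.insert(key.clone(), v.clone());
            }
            None => {
                field.remove(key);
            }
        }
    }
    Ok(())
}
```

In this model the concurrent-writer race plays out as follows: writer A's update sets the marker on field 0 and commits; writer B's update for field 1 does not overlap with A's, but when it is replayed onto the new base the guard sees the existing marker and returns an error instead of producing a schema with two marked columns.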