
fix(core): allow all-null Map columns in schema evolution #7462

Merged
westonpace merged 4 commits into lance-format:main from Ali2Arslan:feat/all-null-map-columns
Jun 30, 2026

Conversation

@Ali2Arslan

Contributor

Summary

Schema::all_fields_nullable walks every field in pre-order and requires each to be nullable. Arrow's Map<K, V> layout mandates a non-null entries struct and a non-null key (Lance enforces this when constructing a Field from an Arrow Field). As a result, any Map column fails the check, so NewColumnTransform::AllNulls rejects adding an all-null Map column with:

Invalid user input: All-null columns must be nullable.

…even though the inner entries/key fields are offset-addressed and completely inert when the whole Map value is NULL (the offsets pair collapses to a zero-length slice).
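The failure mode can be reproduced with a small stand-alone model. This is only a sketch: `Field`, `map_field`, `first_non_nullable`, and the free function `all_fields_nullable` below are hypothetical stand-ins, not Lance's types, and they assume nothing beyond what the summary states (a pre-order walk that requires every field to be nullable).

```rust
// Hypothetical, simplified stand-in for a Lance field (not the real type).
#[derive(Debug, Clone)]
struct Field {
    name: String,
    nullable: bool,
    children: Vec<Field>,
}

impl Field {
    fn new(name: &str, nullable: bool, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), nullable, children }
    }
}

/// Builds a Map field following Arrow's layout: a non-null `entries`
/// struct that holds a non-null `key` and a nullable `value`.
fn map_field(name: &str, nullable: bool) -> Field {
    let key = Field::new("key", false, vec![]);
    let value = Field::new("value", true, vec![]);
    let entries = Field::new("entries", false, vec![key, value]);
    Field::new(name, nullable, vec![entries])
}

/// Pre-order search for the first field that cannot hold a NULL.
fn first_non_nullable(field: &Field) -> Option<&str> {
    if !field.nullable {
        return Some(field.name.as_str());
    }
    field.children.iter().find_map(first_non_nullable)
}

/// The strict check: every field at every depth must be nullable.
fn all_fields_nullable(field: &Field) -> bool {
    first_non_nullable(field).is_none()
}
```

In this model, `first_non_nullable(&map_field("m", true))` returns `Some("entries")`: the outer Map is nullable, but the walk trips on the mandatory non-null `entries` struct, so the strict check rejects every Map column.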

Fix

Treat a nullable Map field as a leaf for this check — its mandatory non-null inner fields are never exercised for an all-NULL row. Behavior is otherwise unchanged:

  • a non-nullable Map outer field is still rejected (the top-level nullable flag is still checked);
  • the struct/list recursion for every other type is preserved.
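The rule above can be sketched on a toy field model. The `Field` struct and the helpers `leaf`, `struct_of`, and `map_of` are hypothetical and exist only for illustration; `field_all_null_compatible` borrows the name from the diff under review but is written here as a free function, not as the method in `schema.rs`.

```rust
// Hypothetical, simplified field model (not Lance's actual `Field`).
#[derive(Debug, Clone)]
struct Field {
    nullable: bool,
    is_map: bool,
    children: Vec<Field>,
}

fn leaf(nullable: bool) -> Field {
    Field { nullable, is_map: false, children: vec![] }
}

fn struct_of(nullable: bool, children: Vec<Field>) -> Field {
    Field { nullable, is_map: false, children }
}

/// Map with Arrow's mandatory non-null `entries` struct and non-null key.
fn map_of(nullable: bool) -> Field {
    let entries = struct_of(false, vec![leaf(false), leaf(true)]);
    Field { nullable, is_map: true, children: vec![entries] }
}

/// A nullable Map counts as a leaf; every other type is still walked.
fn field_all_null_compatible(field: &Field) -> bool {
    if !field.nullable {
        return false; // the outer nullable flag is always checked
    }
    if field.is_map {
        return true; // skip the mandatory non-null `entries`/`key`
    }
    field.children.iter().all(field_all_null_compatible)
}
```

Under these assumptions a nullable Map and a nullable struct containing a nullable Map are accepted, while a non-nullable Map and a nullable struct with a non-nullable child are still rejected.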

Test plan

  • Extended test_all_fields_nullable (lance-core) with nullable-Map (accept), non-nullable-Map (reject), and struct-containing-Map (accept) cases.
  • Added test_add_column_all_nulls_map (lance) — adds an all-null Map<Utf8, Float64> column end to end, asserts the schema/row count and that the column materializes fully null, and that a non-nullable Map is still rejected.
  • cargo fmt --all
  • cargo clippy -p lance-core -p lance --tests clean
  • Both tests pass.

Made with Cursor

`Schema::all_fields_nullable` walked every field pre-order and required
each to be nullable. Arrow's `Map<K, V>` mandates a non-null `entries`
struct and non-null `key` (Lance enforces this when building a `Field`
from an Arrow `Field`), so any Map column failed the check and
`NewColumnTransform::AllNulls` rejected it with "All-null columns must
be nullable.", even though the inner fields are offset-addressed and
inert for an all-NULL row.

Treat a nullable Map field as a leaf for this check: the mandatory
non-null inner fields are never exercised when the whole Map value is
NULL. A non-nullable Map outer field is still rejected, and the struct
recursion is otherwise unchanged.

Adds a unit case to `test_all_fields_nullable` and an end-to-end
`test_add_column_all_nulls_map` that adds an all-null `Map<Utf8, Float64>`
column and asserts it materializes as all-null.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions github-actions Bot added the bug Something isn't working label Jun 25, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
@Ali2Arslan Ali2Arslan marked this pull request as ready for review June 25, 2026 16:27

@westonpace westonpace left a comment

Member

Let's make the fix a bit more of a general fix so we don't have to make another fix soon. Or let me know if there is some case where we would want to recurse into children.

Comment thread rust/lance-core/src/datatypes/schema.rs Outdated
Comment on lines +763 to +764
/// Whether every field can hold a NULL — i.e. whether the schema is compatible
/// with columns added through `NewColumnTransform::AllNulls`.

Member

I don't love this comment because this method is in lance-core and we don't really have any concept of NewColumnTransform::AllNulls at that level (even "add columns" doesn't really make sense)

I wonder if we should move this method to schema_evolution.rs, as it's the only call site and its behavior seems tailored to schema evolution (e.g. the next caller might not expect that maps are leaves).

Contributor Author

I went ahead and deleted the method altogether from lance-core. Since the method was public, this is technically a breaking change; I'm not sure if I need to do anything to handle that.

Comment thread rust/lance-core/src/datatypes/schema.rs Outdated
if field.logical_type.is_map() {
    return true;
}
field.children.iter().all(Self::field_all_null_compatible)

Member

Why recurse into children at all? I think "nullable list with non-nullable items" and "nullable struct with non-nullable child fields" would be compatible with schema evolution too.

I think this fix is perhaps too narrow.

Contributor Author

Yeah, we don't need to recurse at all. I inlined the check directly into schema_evolution and had it verify that the new top-level field is nullable.
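A top-level-only check of the kind described in this reply might look like the sketch below. The `Field` struct, the `field` helper, and `check_all_null_columns` are hypothetical names on a toy model, not the code merged into `schema_evolution.rs`; the error text reuses the message quoted in the summary, and the appended column name is an illustrative addition.

```rust
// Hypothetical stand-in for a field; the real check operates on Lance/Arrow types.
#[allow(dead_code)] // `children` is kept only to build nested examples
#[derive(Debug, Clone)]
struct Field {
    name: String,
    nullable: bool,
    children: Vec<Field>,
}

fn field(name: &str, nullable: bool, children: Vec<Field>) -> Field {
    Field { name: name.to_string(), nullable, children }
}

/// Checks only the new top-level columns; children are never inspected.
fn check_all_null_columns(new_columns: &[Field]) -> Result<(), String> {
    match new_columns.iter().find(|f| !f.nullable) {
        Some(bad) => Err(format!(
            "All-null columns must be nullable. (column: {})",
            bad.name
        )),
        None => Ok(()),
    }
}
```

Because children are ignored, a nullable list with non-nullable items and a nullable struct with non-nullable child fields both pass, which covers the cases the reviewer raised in addition to Map.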

@Ali2Arslan

Contributor Author

have to make a

That's fair. I had AI start to port some of the standalone changes we made, which at the time were narrow on purpose (since we were only using the btree paths). I'll do a more thorough review and try to make them more general now (across the existing and new PRs).
Thank you!

@codecov

codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/schema_evolution.rs 93.02% 1 Missing and 5 partials ⚠️

@Ali2Arslan Ali2Arslan left a comment

Contributor Author

@westonpace mind taking another look? I removed the Schema::try_from altogether from the NewColumnTransform::AllNulls branch since, afaik, it was only there to check the nullability of the new columns, but let me know if you guys would want to keep it.

@wjones127 wjones127 requested a review from westonpace June 29, 2026 23:44

@westonpace westonpace left a comment

Member

I like that approach even better, thanks!

@westonpace westonpace merged commit 4d866ea into lance-format:main Jun 30, 2026
30 checks passed
@Ali2Arslan Ali2Arslan deleted the feat/all-null-map-columns branch July 1, 2026 21:24
2 participants