fix(core): allow all-null Map columns in schema evolution#7462
Conversation
`Schema::all_fields_nullable` walked every field pre-order and required each to be nullable. Arrow's `Map<K, V>` mandates a non-null `entries` struct and non-null `key` (Lance enforces this when building a `Field` from an Arrow `Field`), so any Map column failed the check and `NewColumnTransform::AllNulls` rejected it with "All-null columns must be nullable.", even though the inner fields are offset-addressed and inert for an all-NULL row. Treat a nullable Map field as a leaf for this check: the mandatory non-null inner fields are never exercised when the whole Map value is NULL. A non-nullable Map outer field is still rejected, and the struct recursion is otherwise unchanged. Adds a unit case to `test_all_fields_nullable` and an end-to-end `test_add_column_all_nulls_map` that adds an all-null `Map<Utf8, Float64>` column and asserts it materializes as all-null. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
westonpace
left a comment
There was a problem hiding this comment.
Let's make the fix a bit more of a general fix so we don't have to make another fix soon. Or let me know if there is some case where we would want to recurse into children.
| /// Whether every field can hold a NULL — i.e. whether the schema is compatible | ||
| /// with columns added through `NewColumnTransform::AllNulls`. |
There was a problem hiding this comment.
I don't love this comment because this method is in lance-core and we don't really have any concept of NewColumnTransform::AllNulls at that level (even "add columns" doesn't really make sense)
I wonder if we should move this method to schema_evolution.rs as its the only call site and its behavior seems tailored to schema evolution (e.g. the next caller might not expect that maps are leaves).
There was a problem hiding this comment.
I went a head and deleted the method all together from lance-core, since this method was public, technically it's a breaking change, not sure If I need to do anything to handle that.
| if field.logical_type.is_map() { | ||
| return true; | ||
| } | ||
| field.children.iter().all(Self::field_all_null_compatible) |
There was a problem hiding this comment.
Why recurse into children at all? I think "nullable list with non-nullable items" and "nullable struct with non-nullable child fields" would be compatible with schema evolution too.
I think this fix is perhaps too narrow.
There was a problem hiding this comment.
yeah, we don't need to recurse at all, I in-lined the checked directly into schema_evolution, and had it check the new top-level field was nullable.
That's fair, I had AI start to port some of the standalone changes we made, which at the time were narrow on purpose (since we were only using the btree paths), I'll do a more thorough review and try to make them more general now (across the existing and new PR's). |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Ali2Arslan
left a comment
There was a problem hiding this comment.
@westonpace mind taking another look? I removed the Schema::try_from all together from NewColumnTransform::AllNulls branch since afaik it was only there to check the nullability of the new columns, but let me know if you guys would want to keep it.
westonpace
left a comment
There was a problem hiding this comment.
I like that approach even better, thanks!
Summary
Schema::all_fields_nullablewalks every field in pre-order and requires each to be nullable. Arrow'sMap<K, V>layout mandates a non-nullentriesstruct and a non-nullkey(Lance enforces this when constructing aFieldfrom an ArrowField). As a result, any Map column fails the check, soNewColumnTransform::AllNullsrejects adding an all-null Map column with:…even though the inner
entries/keyfields are offset-addressed and completely inert when the whole Map value is NULL (the offsets pair collapses to a zero-length slice).Fix
Treat a nullable Map field as a leaf for this check — its mandatory non-null inner fields are never exercised for an all-NULL row. Behavior is otherwise unchanged:
nullableflag is still checked);Test plan
test_all_fields_nullable(lance-core) with nullable-Map (accept), non-nullable-Map (reject), and struct-containing-Map (accept) cases.test_add_column_all_nulls_map(lance) — adds an all-nullMap<Utf8, Float64>column end to end, asserts the schema/row count and that the column materializes fully null, and that a non-nullable Map is still rejected.cargo fmt --allcargo clippy -p lance-core -p lance --testscleanMade with Cursor