Restore DF 51 SchemaAdapter cast behaviour in ParquetOpener #45
Merged
zhuqi-lucas merged 1 commit on Apr 1, 2026
Pull request overview
Restores DataFusion 51–style schema adaptation behavior in ParquetOpener’s replace_schema step so schema-evolved Parquet files (type changes and nested field metadata differences) can be read by casting/retyping columns instead of failing strict schema validation.
Changes:
- Adds per-column adaptation in replace_schema: fast-path zero-copy when types match, otherwise attempts arrow::compute::cast, and finally rebuilds array data with the target DataType for nested metadata differences.
- Extends Arrow imports to include ArrayRef for the adapted array vector.
DF 52 removed SchemaAdapter, which handled type/field-name mismatches between file and table schemas. The replace_schema step now:
1. Casts columns via arrow::compute::cast when types differ (e.g. Utf8 → Date32 for schema evolution)
2. Rebuilds arrays with target DataType when metadata differs (e.g. List inner field name/nullability mismatch)

This does NOT force nullability; that is the caller's responsibility (e.g. atlas's adapt_table_schema_for_parquet for file columns).

Tests:
- test_utf8_to_date32_schema_evolution
- test_list_field_name_and_nullability_mismatch (quotes_v1 regression)
- test_nullability_mismatch_non_null_to_nullable
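To make the Utf8 → Date32 case concrete: a Date32 value is a 32-bit count of days since 1970-01-01, so the cast amounts to parsing a date string and counting days. The sketch below is a plain-Rust illustration of that idea, not the arrow-rs implementation. The function names are hypothetical, and it assumes a strict YYYY-MM-DD layout with years 0 to 9999, which is narrower than what a real cast kernel accepts.

```rust
/// Days in a month of the proleptic Gregorian calendar.
fn days_in_month(y: i64, m: i64) -> i64 {
    match m {
        4 | 6 | 9 | 11 => 30,
        2 => {
            if (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 {
                29
            } else {
                28
            }
        }
        _ => 31,
    }
}

/// Days since 1970-01-01 for a valid civil date
/// (the well-known "days from civil" era arithmetic).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (shifted) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: parse "YYYY-MM-DD" into a Date32-style day count.
/// Returns None for anything that is not a valid date in that layout.
fn utf8_to_date32(s: &str) -> Option<i32> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    let well_formed = parts.next().is_none()
        && (0..=9999).contains(&y)
        && (1..=12).contains(&m)
        && d >= 1
        && d <= days_in_month(y, m);
    if !well_formed {
        return None;
    }
    // With years limited to 0..=9999 the result always fits in i32.
    Some(days_from_civil(y, m, d) as i32)
}
```

Under these assumptions, utf8_to_date32("1970-01-01") yields Some(0), and an invalid string such as "2001-02-29" yields None.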
Summary
Restore DF 51's SchemaAdapter::map_batch() cast behaviour in the replace_schema step of ParquetOpener.
Problem
DF 52 removed SchemaAdapter, and the replace_schema step now does strict type validation via RecordBatch::try_new_with_options. This fails for schema-evolved files where:
- a List inner field name differs (conditions vs element)
- nullability differs (non-null Int32 vs Int32)
- the column type differs (Utf8 vs Date32)
Error:
Fix
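In replace_schema, for each column:
1. If the types already match, reuse the array as-is (zero-copy).
2. Otherwise, try arrow::compute::cast (handles Utf8→Date32 etc).
3. Finally, for nested metadata differences, rebuild the array data with into_builder().data_type(target).build().
This matches DF 51's SchemaAdapter::map_batch() + cast_column() pipeline.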
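The per-column decision can be sketched as a small model. This is not the code from the PR and does not use arrow-rs: ToyType, Adaptation, toy_can_cast, same_layout and choose_adaptation are all hypothetical stand-ins, and the cast table is an assumption kept tiny on purpose. It only illustrates the order of the checks described above.

```rust
/// Toy stand-in for an Arrow data type; NOT the real arrow-rs DataType.
#[derive(Clone, Debug, PartialEq)]
enum ToyType {
    Utf8,
    Date32,
    Int32,
    /// A list whose inner field carries a name and a nullability flag.
    List {
        item_name: String,
        item_nullable: bool,
        item: Box<ToyType>,
    },
}

/// Which of the three paths a column takes.
#[derive(Debug, PartialEq)]
enum Adaptation {
    ZeroCopy,    // types already equal: reuse the array
    Cast,        // value-level conversion
    RebuildData, // same layout: retag the data with the target type
}

/// Hypothetical cast table; a real cast kernel supports far more pairs.
fn toy_can_cast(from: &ToyType, to: &ToyType) -> bool {
    matches!(
        (from, to),
        (ToyType::Utf8, ToyType::Date32) | (ToyType::Int32, ToyType::Utf8)
    )
}

/// True when two types differ at most in nested field name or nullability.
fn same_layout(a: &ToyType, b: &ToyType) -> bool {
    match (a, b) {
        (ToyType::List { item: x, .. }, ToyType::List { item: y, .. }) => same_layout(x, y),
        _ => a == b,
    }
}

/// Pick the adaptation for one column: file type to table type.
fn choose_adaptation(file: &ToyType, table: &ToyType) -> Result<Adaptation, String> {
    if file == table {
        return Ok(Adaptation::ZeroCopy);
    }
    if toy_can_cast(file, table) {
        return Ok(Adaptation::Cast);
    }
    if same_layout(file, table) {
        return Ok(Adaptation::RebuildData);
    }
    Err(format!("cannot adapt {:?} to {:?}", file, table))
}
```

In this model, a list of Int32 whose inner field is named conditions in the file and element in the table resolves to RebuildData, while Utf8 to Date32 resolves to Cast. Retagging is only chosen when the layouts agree, since relabelling data of a different physical type would be unsound.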