Open
Labels
good first issue (Good for newcomers)
Description
Spark may write timestamps using the deprecated INT96 physical type in Parquet files. Currently, Sail cannot read such data correctly.
- Arrow reads INT96 as a timestamp with nanosecond unit, while Spark expects microsecond unit, so the valid value ranges differ: a 64-bit count of nanoseconds only covers roughly the years 1677 to 2262.
- Schema analysis requests (e.g. `printSchema()`) fail because we cannot convert the Arrow data type (nanosecond unit) back to a Spark data type.
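To make the range difference concrete, here is a minimal Rust sketch (not Sail's or arrow-rs's code). It assumes the conventional INT96 timestamp layout: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number, with the Unix epoch at Julian day 2440588. The helper names (`split_int96`, `to_unix_nanos`, `to_unix_micros`, `make_int96`) are hypothetical. Checked arithmetic returns `None` when a value does not fit in an `i64`.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;
const MICROS_PER_DAY: i64 = 86_400 * 1_000_000;

/// Splits an INT96 value into (nanoseconds within the day, Julian day).
/// Assumed layout: 8-byte LE nanos-of-day, then 4-byte LE Julian day.
fn split_int96(raw: [u8; 12]) -> (i64, i64) {
    let nanos_of_day = i64::from_le_bytes(raw[0..8].try_into().unwrap());
    let julian_day = u32::from_le_bytes(raw[8..12].try_into().unwrap()) as i64;
    (nanos_of_day, julian_day)
}

/// Nanoseconds since the Unix epoch, or None if the result overflows i64.
fn to_unix_nanos(raw: [u8; 12]) -> Option<i64> {
    let (nanos_of_day, julian_day) = split_int96(raw);
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

/// Microseconds since the Unix epoch (Spark's unit), or None on overflow.
fn to_unix_micros(raw: [u8; 12]) -> Option<i64> {
    let (nanos_of_day, julian_day) = split_int96(raw);
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH)
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(nanos_of_day / 1_000)
}

/// Hypothetical test helper that builds INT96 bytes from their two parts.
fn make_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}

fn main() {
    // Midnight on 1970-01-02, one day after the epoch: fits in both units.
    let next_day = make_int96(2_440_589, 0);
    assert_eq!(to_unix_nanos(next_day), Some(86_400_000_000_000));
    assert_eq!(to_unix_micros(next_day), Some(86_400_000_000));

    // 365,000 days (roughly 1,000 years) after the epoch.
    let far_future = make_int96(2_440_588 + 365_000, 0);
    assert_eq!(to_unix_nanos(far_future), None); // overflows i64 nanoseconds
    assert_eq!(to_unix_micros(far_future), Some(365_000 * 86_400_000_000));
    println!("ok");
}
```

A date roughly 1,000 years after 1970 overflows the nanosecond representation but fits easily in microseconds, which is why decoding INT96 straight into Spark's microsecond unit avoids the problem.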
We should respect the Spark schema (stored under a key in the Parquet file metadata) when reading the file. Casting timestamp types seems possible after the recent upstream fix (apache/arrow-rs#7285), so we should be able to handle this after the next Arrow release.
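One way to respect the Spark schema is to decide the target unit per column before decoding. The sketch below assumes the Spark schema is available as JSON in the Parquet key-value metadata (Spark conventionally uses the key `org.apache.spark.sql.parquet.row.metadata`) and that the column's type name has already been extracted from it. `TimeUnit` is a local stand-in for Arrow's enum, and `int96_target_unit` is a hypothetical helper, not an existing Sail or arrow-rs API.

```rust
/// Local stand-in for the two relevant variants of Arrow's `TimeUnit`
/// (the real enum in the `arrow` crate has four).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

/// Hypothetical helper: chooses the unit an INT96 column should be read with.
/// `spark_type` is the column's type name from the Spark schema stored in the
/// Parquet key-value metadata, or `None` if the file carries no such schema.
fn int96_target_unit(spark_type: Option<&str>) -> TimeUnit {
    match spark_type {
        // Spark's `timestamp` type has microsecond precision, so follow it.
        Some("timestamp") => TimeUnit::Microsecond,
        // No usable hint: keep Arrow's nanosecond default for INT96.
        _ => TimeUnit::Nanosecond,
    }
}

fn main() {
    assert_eq!(int96_target_unit(Some("timestamp")), TimeUnit::Microsecond);
    assert_eq!(int96_target_unit(None), TimeUnit::Nanosecond);
    println!("{:?}", int96_target_unit(Some("timestamp")));
}
```

Keeping this decision separate from decoding means files written without Spark metadata would still be read exactly as before.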
keen85