Reading int96 timestamp in Parquet

Spark may write timestamps using the deprecated int96 physical type in Parquet files. Currently, Sail cannot read such data correctly, for two reasons:
1. Arrow reads int96 as a timestamp with nanosecond unit, while Spark expects microsecond unit, so the valid value ranges differ.
2. Schema analysis requests (e.g. `printSchema()`) fail because we cannot convert the Arrow data type (nanosecond unit) back to a Spark data type.
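
To illustrate the range mismatch in point 1, here is a minimal sketch using only the Python standard library. It assumes timestamps are stored as signed 64-bit counts since the Unix epoch. The helper names (`ns_to_datetime`, `micros_since_epoch`, `fits_i64`) are made up for this example and are not part of Sail, Arrow, or Spark.

```python
from datetime import datetime, timedelta, timezone

I64_MIN = -(2**63)
I64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the epoch to a datetime (floored to microseconds)."""
    return EPOCH + timedelta(microseconds=ns // 1000)


def micros_since_epoch(dt: datetime) -> int:
    """Exact integer count of microseconds since the epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def fits_i64(value: int) -> bool:
    """Whether the value is representable as a signed 64-bit integer."""
    return I64_MIN <= value <= I64_MAX


# The window that a 64-bit nanosecond timestamp can represent.
ns_min = ns_to_datetime(I64_MIN)
ns_max = ns_to_datetime(I64_MAX)
print(ns_min.isoformat(), ns_max.isoformat())

# A timestamp that is fine in microseconds but overflows in nanoseconds.
ts = datetime(3000, 1, 1, tzinfo=timezone.utc)
us = micros_since_epoch(ts)
ns = us * 1000
print(fits_i64(us), fits_i64(ns))
```

The nanosecond window spans roughly the years 1677 to 2262, so a timestamp such as 3000-01-01 is representable at microsecond precision but not at nanosecond precision.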

We should respect the Spark schema (stored as a metadata key) when reading the Parquet file. Casting the timestamp to the expected unit seems possible after the recent upstream fix (<https://github.com/apache/arrow-rs/pull/7285>), so we should be able to handle this after the next Arrow release.
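
As a rough sketch of what "respecting the Spark schema" could look like, the snippet below finds the timestamp columns declared in the Spark schema JSON. It makes two assumptions: that the Parquet key-value metadata has already been loaded into a plain `dict`, and that the schema sits under the key `org.apache.spark.sql.parquet.row.metadata` as Spark's JSON struct representation. The function name `spark_timestamp_columns` is hypothetical, and this is not how Sail or arrow-rs implement it.

```python
import json

# Assumption: the footer metadata key under which Spark stores its schema JSON.
SPARK_ROW_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"


def spark_timestamp_columns(key_value_metadata: dict) -> list:
    """Return names of top-level columns that Spark declared as timestamps."""
    raw = key_value_metadata.get(SPARK_ROW_METADATA_KEY)
    if raw is None:
        # Not written by Spark (or the key was dropped): nothing to override.
        return []
    schema = json.loads(raw)
    return [
        field["name"]
        for field in schema.get("fields", [])
        # Nested types appear as JSON objects here and are skipped by this sketch.
        if field.get("type") in ("timestamp", "timestamp_ntz")
    ]


# Example metadata, hand-written for illustration only.
example_metadata = {
    SPARK_ROW_METADATA_KEY: json.dumps(
        {
            "type": "struct",
            "fields": [
                {"name": "id", "type": "long", "nullable": True, "metadata": {}},
                {"name": "ts", "type": "timestamp", "nullable": True, "metadata": {}},
            ],
        }
    )
}
print(spark_timestamp_columns(example_metadata))
```

A reader could then request microsecond precision for exactly those columns instead of accepting the nanosecond type inferred from int96.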

Reading int96 timestamp in Parquet #427

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reading int96 timestamp in Parquet #427

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions