Invalid point parquet file #2
Comments
I think that is a long-standing issue with the fixed-size list implementation in Parquet (apache/arrow#35692, apache/arrow#24425), at least on the Arrow C++ side. Practically it means you can't read NULL points from a Parquet file if you use the interleaved representation (although you can write them no problem).
For testing purposes I should probably render all the example files to Arrow IPC as well, since it's unlikely any fix to that will be widely available in the next few months.
IMO saving as IPC makes the most sense since this is nominally test data for geoarrow, not geoparquet. Also, IPC is able to exactly mirror every type in Arrow, whereas in the future unions won't be able to be represented in Parquet, right?
I think Parquet can model a sparse union as a struct but I don't know if that's something that is useful or not. In any case, rendering those examples to IPC is the best fit since, as you noted, it can perfectly represent an Arrow type.
I forgot to create a branch + PR (😬) but they should all be there in IPC format! (e.g., https://github.com/geoarrow/geoarrow-data/blob/main/example/README.md)
Trying to load example-point-interleaved.parquet fails in both pyarrow and Rust.
pyarrow.parquet.read_table('example-point-interleaved.parquet') gives:
Rust (arrow2/parquet2) gives: