Geoparquet format reader #37
Comments
A GeoParquet reader would certainly be welcome. It shouldn't be very difficult to implement, since there are Rust implementations for reading Parquet (https://github.com/apache/arrow-rs) and, AFAIK, geometries are stored as WKB, which is already supported in geozero.
I'm new to geospatial, but I'm gonna take a look at trying this out! (assuming that pull requests are welcome)
Just a quick note that in versions of GeoParquet so far, WKB is the standard, but in future versions we hope to also enable an Arrow-native geometry encoding; see https://github.com/geopandas/geo-arrow-spec/blob/main/format.md. The difference between the two is that a binary buffer of WKB in memory still needs to be parsed before it can be used. In contrast, an Arrow-native implementation would store geometries in a layout standardized by the Arrow project. This means that it's What this means is that using geozero on a GeoParquet dataset with geometries stored as WKB is essentially already supported by A separate
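To make the WKB versus Arrow-native distinction concrete, here is a minimal sketch in plain Rust with no crates. It is not geozero's or arrow-rs's API: every function name is hypothetical, the WKB parser handles only a 2D Point, and the "native" side assumes an interleaved `[x0, y0, x1, y1, ...]` coordinate buffer, which is just one possible Arrow-style layout.

```rust
/// Decode 4 bytes as a u32 in the byte order declared by the WKB header.
fn read_u32(bytes: &[u8], little_endian: bool) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    if little_endian { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) }
}

/// Decode 8 bytes as an f64 in the byte order declared by the WKB header.
fn read_f64(bytes: &[u8], little_endian: bool) -> f64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    if little_endian { f64::from_le_bytes(raw) } else { f64::from_be_bytes(raw) }
}

/// Parse a 2D WKB Point: one byte-order byte (0 = big, 1 = little endian),
/// a u32 geometry type (1 = Point), then x and y as f64. 21 bytes in total.
fn parse_wkb_point(buf: &[u8]) -> Option<(f64, f64)> {
    if buf.len() != 21 {
        return None;
    }
    let little_endian = match buf[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    if read_u32(&buf[1..5], little_endian) != 1 {
        return None; // not a Point
    }
    Some((
        read_f64(&buf[5..13], little_endian),
        read_f64(&buf[13..21], little_endian),
    ))
}

/// Assumed native layout: interleaved coordinates. Reading point `index`
/// is a bounds-checked slice lookup with no header or byte-order decoding.
fn native_point(coords: &[f64], index: usize) -> Option<(f64, f64)> {
    let start = index.checked_mul(2)?;
    let end = start.checked_add(2)?;
    let pair = coords.get(start..end)?;
    Some((pair[0], pair[1]))
}

/// Helper for trying the parser: encode a little-endian 2D WKB Point.
fn encode_wkb_point_le(x: f64, y: f64) -> Vec<u8> {
    let mut buf = vec![1u8]; // 1 = little endian
    buf.extend_from_slice(&1u32.to_le_bytes()); // geometry type 1 = Point
    buf.extend_from_slice(&x.to_le_bytes());
    buf.extend_from_slice(&y.to_le_bytes());
    buf
}
```

With WKB, every geometry goes through something like `parse_wkb_point` before its coordinates are usable; with a native layout, the coordinates are already typed values in the buffer and `native_point` only indexes into them.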
I am starting work on a geoparquet crate that will use geozero and could potentially be merged into geozero at some point. FYI: I will also be working on a geoarrow crate and a geodatafusion crate, so stay tuned for developments.
A few notes... I've already started planning how to implement On GeoParquet:
On GeoArrow:
I don't quite understand the problem you are referring to. I have written a The implementation I have currently uses parquet, and I was planning on having a feature to be able to select between parquet/parquet2. I do want to use parquet rather than loading it into memory, as I am targeting the use case of files that will not fit into memory (although I know that the arrow file could be memory mapped).
It's still worthwhile to have a geoparquet/geoarrow implementation in I think my discussion comment on As mentioned in that discussion, I have a WIP implementation of geoarrow here.
Yes, but if you're implementing the
This is true... but with the geozero approach you need to first load a chunk of data into memory and then copy all of it into
Do you need to actually copy the memory though? Things like moves into a new-type struct are optimized away:
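A minimal sketch of the newtype point, not the commenter's original example: `CoordBuffer` and both function names are hypothetical. It demonstrates only that moving a `Vec<f64>` into a newtype wrapper reuses the same heap allocation, which Rust's move semantics guarantee. Whether the shallow move of the `Vec` header itself (pointer, length, capacity) is elided is up to the optimizer and would need to be checked in the compiled output.

```rust
/// Hypothetical newtype wrapper around a coordinate buffer.
struct CoordBuffer(Vec<f64>);

/// Moving the Vec into the wrapper transfers ownership of the existing
/// heap allocation; the f64 values themselves are not copied.
fn wrap(coords: Vec<f64>) -> CoordBuffer {
    CoordBuffer(coords)
}

/// Return the heap address of the data before and after wrapping,
/// so the two can be compared.
fn heap_address_before_and_after(coords: Vec<f64>) -> (usize, usize) {
    let before = coords.as_ptr() as usize;
    let wrapped = wrap(coords);
    let after = wrapped.0.as_ptr() as usize;
    (before, after)
}
```

Calling `heap_address_before_and_after(vec![1.0, 2.0, 3.0])` should return two equal addresses, since the wrapper points at the same allocation the original `Vec` owned.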
That's super interesting. I've never looked into low-level compiled output before. @apps4uco Re-reading my comment I think my tone might've come off too harsh... I'm fully supportive of both approaches! I've tried to closely follow spec discussions on both GeoParquet and GeoArrow and I'd be happy to review your PRs if you'd like 🙂. Keep in mind that neither spec has reached 1.0 yet. WKB-encoded GeoParquet is the most stable (native encodings are still up in the air) so I'd start with that one. You might even be able to leverage my previous geozero PR: #39
GeoParquet is getting ready for a 1.0.0-beta.1 release soon (opengeospatial/geoparquet#147), and I'd love to see support for this in geozero. @apps4uco do you have any existing work to share? Just want to avoid duplication of work.
Would love some feedback on #80
Hi, I have been busy with other things, so I haven't really advanced on the implementation. For what it's worth, here is the git repo with my latest changes: github.com:apps4uco/geoparquet.git
The In #186 we removed the minimal, outdated geoarrow integration from the geozero crate and are now pointing users to the Feel free to create issues on the
Is it possible to incorporate into your library the new format launched by the OGC, GeoParquet? This is the link: https://github.com/opengeospatial/geoparquet