Geometry representation and zero-copy access #31
Comments
cc @stuartlynn
This all sounds good. It sounds like geo-types as traits is further off than the arrow-native geometry encoding. Given that, I wonder if we should be preemptive with the arrow-native geometry encoding and write interchangeable utility functions to convert to and from, and iterate over, geometries in either format (WKB or arrow native)? Not sure if there is a reference implementation being worked on just now for the arrow-native encoding, but it seems like developing it here might help test it out in real-world use cases. We could extract anything worthwhile into its own crate or another crate as things stabilize. We would also presumably need to figure out how this interacts with the discussion of custom extension types in polars.
I think a stopgap measure could also be to make use of the polars
There are a couple of implementations in progress, e.g. https://github.com/jorisvandenbossche/python-geoarrow, https://github.com/paleolimbot/geoarrow, https://github.com/paleolimbot/geoarrow-cpp.
I mentioned in the polars discord our use case for Arrow extension data type support, but I don't think I explained myself well, and the polars author suggested using the Polars object extension type (and I haven't responded yet).
I tried to start some hacking here: https://github.com/kylebarron/geo-traits
Also ref pola-rs/polars#4014, because currently we can't store arrow-native geometries (according to the spec) in polars.
At this point, to get things started, Well Known Binary geometries are stored in an Arrow binary array. But this is an inefficient in-memory format. For one, WKB is not zero copy; offsets (such as where rings start) are not known until the buffer is parsed.
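To make the "offsets are not known until the buffer is parsed" point concrete, here is a minimal sketch, assuming a plain 2D WKB polygon (geometry type 3, no SRID or Z/M flags). The function name `wkb_polygon_ring_offset` is hypothetical and not part of any crate. Finding where ring `k` starts means reading the point count of every ring before it:

```rust
/// Hypothetical helper: returns the byte offset at which ring `ring_index`
/// begins (pointing at its point-count field) in a plain 2D WKB polygon,
/// or None if the buffer is too short or is not a polygon.
fn wkb_polygon_ring_offset(wkb: &[u8], ring_index: usize) -> Option<usize> {
    // Byte 0 is the byte-order flag: 0 = big endian, 1 = little endian.
    let little_endian = match *wkb.first()? {
        0 => false,
        1 => true,
        _ => return None,
    };
    let read_u32 = |pos: usize| -> Option<u32> {
        let b = wkb.get(pos..pos + 4)?;
        let bytes = [b[0], b[1], b[2], b[3]];
        Some(if little_endian {
            u32::from_le_bytes(bytes)
        } else {
            u32::from_be_bytes(bytes)
        })
    };
    // Bytes 1..5 hold the geometry type; 3 means polygon in plain 2D WKB.
    if read_u32(1)? != 3 {
        return None;
    }
    // Bytes 5..9 hold the number of rings.
    let num_rings = read_u32(5)? as usize;
    if ring_index >= num_rings {
        return None;
    }
    // Each ring is a u32 point count followed by that many (x, y) f64 pairs.
    // There is no offset table, so every earlier ring header must be read.
    let mut pos = 9;
    for _ in 0..ring_index {
        let num_points = read_u32(pos)? as usize;
        pos += 4 + num_points * 16;
    }
    Some(pos)
}
```

The loop is the cost being described: reaching a ring is linear in the number of rings before it, and the same holds for reaching a geometry inside a multi-geometry.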
The ideal memory format would be an arrow-native geometry encoding. With Arrow-native arrays, any coordinate of any geometry in the column can be accessed with zero copy and in constant time.
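As a rough illustration of why that access is constant time, here is a simplified sketch of a linestring column stored as one flat coordinate buffer plus an offsets buffer. `LineStringColumn` and its fields are hypothetical, use plain `Vec`s instead of real Arrow arrays, and use `usize` offsets where Arrow would use integer offset buffers, so this is not the actual spec layout:

```rust
/// Hypothetical, simplified column of 2D linestrings.
struct LineStringColumn {
    /// Interleaved coordinates for the whole column: x0, y0, x1, y1, ...
    coords: Vec<f64>,
    /// `geom_offsets[i]..geom_offsets[i + 1]` is the range of coordinate
    /// indices that belong to geometry `i`; length is number of geometries + 1.
    geom_offsets: Vec<usize>,
}

impl LineStringColumn {
    /// Number of geometries in the column.
    fn len(&self) -> usize {
        self.geom_offsets.len().saturating_sub(1)
    }

    /// Coordinate `i` of geometry `geom`, found with two offset lookups and
    /// one slice read. Nothing is parsed and no other geometry is touched.
    fn coord(&self, geom: usize, i: usize) -> Option<(f64, f64)> {
        let start = *self.geom_offsets.get(geom)?;
        let end = *self.geom_offsets.get(geom + 1)?;
        let idx = start.checked_add(i)?;
        if idx >= end {
            return None;
        }
        let xy = self.coords.get(2 * idx..2 * idx + 2)?;
        Some((xy[0], xy[1]))
    }
}
```

The lookup cost does not depend on how many geometries precede `geom`, which is the property WKB lacks.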
But the `geo` crate revolves around its own geometry structs, defined in `geo-types`. So today, whether we store geometries in an array of WKB or in an arrow-native encoding, either way they need to be parsed and copied into `geo-types` structs.

I've been talking to the `georust` project a bit (on discord) about reviving their consideration of geometry traits. That is, instead of the `georust` algorithms requiring `geo-types` structs as input, they would instead require a trait that implements something like a `Point`, etc.

This would be super powerful here because we could implement access traits for geometries stored in arrow buffers. The best issue to follow is probably georust/geo#838
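
A minimal sketch of what the trait idea could look like, purely for illustration. `PointTrait`, `OwnedPoint`, `BufferPoint` and `euclidean_distance` are hypothetical names, not the `geo` or `geo-types` API, and the buffer is a plain `f64` slice standing in for an arrow coordinate buffer:

```rust
/// Hypothetical access trait: algorithms depend on this, not on a struct.
trait PointTrait {
    fn x(&self) -> f64;
    fn y(&self) -> f64;
}

/// An owned point, playing the role of a `geo-types`-style struct.
struct OwnedPoint {
    x: f64,
    y: f64,
}

impl PointTrait for OwnedPoint {
    fn x(&self) -> f64 {
        self.x
    }
    fn y(&self) -> f64 {
        self.y
    }
}

/// A borrowed view of point `index` inside an interleaved x, y buffer.
/// Constructing it copies no coordinates.
struct BufferPoint<'a> {
    coords: &'a [f64],
    index: usize,
}

impl PointTrait for BufferPoint<'_> {
    // These accessors panic if `index` is out of bounds for `coords`.
    fn x(&self) -> f64 {
        self.coords[2 * self.index]
    }
    fn y(&self) -> f64 {
        self.coords[2 * self.index + 1]
    }
}

/// An algorithm written once against the trait works for both storage kinds.
fn euclidean_distance(a: &impl PointTrait, b: &impl PointTrait) -> f64 {
    (a.x() - b.x()).hypot(a.y() - b.y())
}
```

With something along these lines, `euclidean_distance` could take an `OwnedPoint` and a `BufferPoint` in the same call, and the buffer-backed point would never be copied into a struct first.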