Geometry representation and zero-copy access #31

Open
kylebarron opened this issue Jun 9, 2022 · 5 comments

@kylebarron
Member

To get things started, Well-Known Binary (WKB) geometries are currently stored in an Arrow binary array. But this is an inefficient in-memory format. For one, WKB is not zero-copy: offsets (such as where each ring starts) are not known until the buffer is parsed.
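To make the parsing cost concrete, here is a rough sketch of what it takes just to find where each ring of a 2D WKB polygon starts. The helper names (`read_u32`, `wkb_polygon_ring_offsets`) are made up for illustration and are not part of any existing crate. The point is that ring N's position depends on the point counts of every ring before it, so the buffer has to be walked sequentially.

```rust
/// Read a u32 at `pos`, honoring the WKB byte-order flag.
fn read_u32(buf: &[u8], pos: usize, little_endian: bool) -> Option<u32> {
    let b = buf.get(pos..pos + 4)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(if little_endian {
        u32::from_le_bytes(bytes)
    } else {
        u32::from_be_bytes(bytes)
    })
}

/// Byte offsets at which each ring of a 2D WKB polygon begins.
/// Hypothetical helper: each offset is only known after reading the
/// point count of every preceding ring.
fn wkb_polygon_ring_offsets(buf: &[u8]) -> Option<Vec<usize>> {
    // Byte 0: byte order (1 = little-endian, 0 = big-endian).
    let little_endian = match *buf.first()? {
        1 => true,
        0 => false,
        _ => return None,
    };
    // Bytes 1..5: geometry type; 3 means Polygon.
    if read_u32(buf, 1, little_endian)? != 3 {
        return None;
    }
    // Bytes 5..9: number of rings.
    let num_rings = read_u32(buf, 5, little_endian)? as usize;

    let mut pos = 9;
    let mut offsets = Vec::new();
    for _ in 0..num_rings {
        offsets.push(pos);
        let num_points = read_u32(buf, pos, little_endian)? as usize;
        // Skip the count itself plus 16 bytes (two f64) per point.
        pos += 4 + num_points * 16;
    }
    // Reject buffers that are shorter than their own header claims.
    if pos > buf.len() {
        return None;
    }
    Some(offsets)
}
```

Every algorithm that touches a polygon either repeats this walk or caches the result somewhere outside the WKB buffer.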

The ideal memory format would be an arrow-native geometry encoding. With Arrow-native arrays, any coordinate of any geometry in the column can be accessed with zero copy and in constant time.
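As a sketch of why that works, here is a simplified polygon column loosely modeled on Arrow's nested-list layout: one flat coordinate buffer plus two offset buffers. `PolygonColumn` and its fields are hypothetical, and plain `Vec`s stand in for Arrow buffers (real Arrow offsets would typically be `i32`); this is not the actual encoding from any spec.

```rust
/// Hypothetical stand-in for an Arrow-native polygon column.
struct PolygonColumn {
    /// Interleaved coordinates: x0, y0, x1, y1, ...
    coords: Vec<f64>,
    /// Ring r covers points ring_offsets[r]..ring_offsets[r + 1].
    ring_offsets: Vec<usize>,
    /// Polygon g covers rings geom_offsets[g]..geom_offsets[g + 1].
    geom_offsets: Vec<usize>,
}

impl PolygonColumn {
    /// Global ring index for (geom, ring), or None if out of range.
    fn ring_index(&self, geom: usize, ring: usize) -> Option<usize> {
        let start = *self.geom_offsets.get(geom)?;
        let end = *self.geom_offsets.get(geom + 1)?;
        let r = start + ring;
        if r < end { Some(r) } else { None }
    }

    /// Constant-time lookup of one coordinate: two offset reads and
    /// one read from the coordinate buffer, with no parsing.
    fn coord(&self, geom: usize, ring: usize, point: usize) -> Option<(f64, f64)> {
        let r = self.ring_index(geom, ring)?;
        let start = *self.ring_offsets.get(r)?;
        let end = *self.ring_offsets.get(r + 1)?;
        let c = start + point;
        if c >= end {
            return None;
        }
        Some((*self.coords.get(2 * c)?, *self.coords.get(2 * c + 1)?))
    }

    /// Zero-copy view of a whole ring as a borrowed slice of the buffer.
    fn ring_coords(&self, geom: usize, ring: usize) -> Option<&[f64]> {
        let r = self.ring_index(geom, ring)?;
        let start = *self.ring_offsets.get(r)?;
        let end = *self.ring_offsets.get(r + 1)?;
        self.coords.get(2 * start..2 * end)
    }
}
```

Because the offsets are stored up front, looking up point 2 of ring 1 of polygon 1 costs the same as looking up the first point in the column.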

But the geo crate revolves around its own geometry structs, defined in geo-types... So today, whether we store geometries in an array of WKB or in an arrow-native encoding, either way they need to be parsed and copied into geo-types structs.

I've been talking to the georust project a bit (on discord) about reviving their consideration of geometry traits. That is, instead of the georust algorithms requiring geo-types structs as input, they would require a type that implements a trait for something like a Point, etc.

This would be super powerful here because we could implement access traits for geometries stored in arrow buffers. The best issue to follow is probably georust/geo#838
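A minimal sketch of the idea, under the assumption that the trait only needs coordinate accessors: `PointTrait`, `OwnedPoint`, and `ArrowPointView` are hypothetical names, not the API under discussion in the georust issue, and a plain `&[f64]` slice stands in for an Arrow coordinate buffer.

```rust
/// Hypothetical access trait: algorithms only need to read x and y.
trait PointTrait {
    fn x(&self) -> f64;
    fn y(&self) -> f64;
}

/// Stand-in for an owned struct such as the ones in geo-types.
struct OwnedPoint {
    x: f64,
    y: f64,
}

impl PointTrait for OwnedPoint {
    fn x(&self) -> f64 {
        self.x
    }
    fn y(&self) -> f64 {
        self.y
    }
}

/// Borrowed view of point `index` inside an interleaved x/y buffer.
/// Nothing is copied when the view is created; coordinates are read
/// from the underlying buffer on demand.
struct ArrowPointView<'a> {
    coords: &'a [f64],
    index: usize,
}

impl PointTrait for ArrowPointView<'_> {
    fn x(&self) -> f64 {
        self.coords[2 * self.index]
    }
    fn y(&self) -> f64 {
        self.coords[2 * self.index + 1]
    }
}

/// An algorithm written against the trait works with either storage.
fn euclidean_distance<A: PointTrait, B: PointTrait>(a: &A, b: &B) -> f64 {
    (a.x() - b.x()).hypot(a.y() - b.y())
}
```

With this shape, `euclidean_distance` can take an `OwnedPoint` and an `ArrowPointView` in the same call, so the Arrow-backed column never has to be materialized into owned structs first.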

@kylebarron
Member Author

cc @stuartlynn

@stuartlynn
Collaborator

This all sounds good. It sounds like the geo-types as traits is further off than the arrow-native geometry encoding. Given that, I wonder if we should be preemptive with the arrow-native geometry encoding and write interchangeable utility functions to convert to and from, and iterate over, geometries in either format (WKB or arrow-native)? Not sure if there is a reference implementation being worked on just now for the arrow native encoding, but developing it here might help test it out in real-world use cases. We could extract anything worthwhile into its own crate, or another crate, as things stabilize.
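Something like the following is what I have in mind, restricted to points to keep it short. All names here (`PointStorage`, `point_to_wkb`, `point_from_wkb`) are hypothetical, the "native" variant is just an interleaved `Vec<f64>` standing in for an Arrow buffer, and only little-endian 2D WKB points are handled.

```rust
/// A point column in either storage format (hypothetical sketch).
enum PointStorage {
    /// One WKB blob per row.
    Wkb(Vec<Vec<u8>>),
    /// Interleaved x0, y0, x1, y1, ...
    Native(Vec<f64>),
}

/// Encode one point as little-endian WKB (21 bytes).
fn point_to_wkb(x: f64, y: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(21);
    buf.push(1); // byte order: little-endian
    buf.extend_from_slice(&1u32.to_le_bytes()); // geometry type 1 = Point
    buf.extend_from_slice(&x.to_le_bytes());
    buf.extend_from_slice(&y.to_le_bytes());
    buf
}

fn read_f64_le(buf: &[u8], pos: usize) -> f64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[pos..pos + 8]);
    f64::from_le_bytes(bytes)
}

/// Decode a little-endian WKB point; None for anything else.
fn point_from_wkb(buf: &[u8]) -> Option<(f64, f64)> {
    if buf.len() != 21 || buf[0] != 1 || buf[1..5] != [1, 0, 0, 0] {
        return None;
    }
    Some((read_f64_le(buf, 5), read_f64_le(buf, 13)))
}

impl PointStorage {
    /// Iterate over points the same way regardless of storage.
    /// WKB rows that fail to decode yield None.
    fn iter(&self) -> Box<dyn Iterator<Item = Option<(f64, f64)>> + '_> {
        match self {
            PointStorage::Wkb(rows) => Box::new(rows.iter().map(|row| point_from_wkb(row))),
            PointStorage::Native(coords) => {
                Box::new(coords.chunks_exact(2).map(|c| Some((c[0], c[1]))))
            }
        }
    }

    /// Convert to the native layout; None if any row fails to decode.
    fn to_native(&self) -> Option<PointStorage> {
        let mut coords = Vec::new();
        for p in self.iter() {
            let (x, y) = p?;
            coords.push(x);
            coords.push(y);
        }
        Some(PointStorage::Native(coords))
    }

    /// Convert to one WKB blob per row; None if any row fails to decode.
    fn to_wkb(&self) -> Option<PointStorage> {
        let rows: Option<Vec<Vec<u8>>> = self
            .iter()
            .map(|p| p.map(|(x, y)| point_to_wkb(x, y)))
            .collect();
        rows.map(PointStorage::Wkb)
    }
}
```

Code written against `iter()` would not need to change if the underlying storage switches from WKB to the native encoding later.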

We would also presumably need to figure out how this interacts with the discussion of custom extension types in polars

@kylebarron
Member Author

> It sounds like the geo-types as traits is further off than the arrow-native geometry encoding

I think a geo-types trait is indeed a bit of a long-run goal here. But it's not clear to me that an arrow-native geometry encoding is that worthwhile without the geo-types traits, because you'd still need to copy the Arrow memory into a geo object for every row in every algorithm.

A stopgap measure could also be to make use of the polars ObjectArray which can wrap any Rust object. But this is just a stopgap because it doesn't store the geometries in Arrow memory, so it isn't able to be shared with e.g. pyarrow.

> Not sure if there is a reference implementation being worked on just now for the arrow native encoding

There are a couple implementations in progress e.g. https://github.com/jorisvandenbossche/python-geoarrow, https://github.com/paleolimbot/geoarrow, https://github.com/paleolimbot/geoarrow-cpp.

> We would also presumably need to figure out how this interacts with the discussion of custom extension types in polars

I mentioned in the polars discord our use case for Arrow extension data type support, but I don't think I explained myself well, and the polars author suggested using the Polars object extension type (I haven't responded yet).

@kylebarron
Member Author

I tried to start some hacking here: https://github.com/kylebarron/geo-traits

@kylebarron
Member Author

Also ref pola-rs/polars#4014, because currently we can't store arrow-native geometries (according to the spec) in polars.

2 participants