Filter geometries based on type #46

Open
RaczeQ opened this issue Apr 29, 2024 · 2 comments

Comments

@RaczeQ

RaczeQ commented Apr 29, 2024

Hi, I'm wondering if it would be possible to have a WkbType column and filter geometries based on a given type (Point, LineString, Polygon, etc.). There are some compute functions available, and there is even unique_geometry_types, but I'm not sure whether any of them would help with my use case.

@kylebarron
Member

If you can access the indices of each geometry type, then you can do something like what was done in developmentseed/lonboard#491, using pyarrow.Table.take instead of DataFrame.iloc.

@paleolimbot
Contributor

You're definitely right that something like geoarrow.pyarrow.geometry_type(x) (returning something the same length as x) would be a very helpful compute function for a lot of reasons. It's possible to do this using purely pyarrow compute. We clearly have the ability to do this more efficiently and generically (since we can already compute the unique geometry types); it's just not wired up yet.

import geoarrow.pyarrow as ga
import pyarrow as pa
import pyarrow.compute as pc

wkbs = ga.as_wkb(["POINT (0 1)", "LINESTRING Z (0 0 1, 1 1 2)", "MULTIPOINT (0 0, 1 1)"])

# Doesn't work with nulls
assert wkbs.null_count == 0

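# Only works with little-endian WKB: byte 0 of each value is the byte-order flag (1 = little-endian)
endian_byte = pc.binary_slice(wkbs.storage, 0, 1)
# Reinterpret the sliced bytes as int8, reusing the validity buffer [0] and data buffer [2]
endian = pa.Array.from_buffers(
    pa.int8(),
    len(endian_byte),
    [endian_byte.buffers()[0], endian_byte.buffers()[2]],
)
assert pc.all(pc.equal(endian, 1)).as_py()

# Bytes 1 to 4 of each value hold the geometry type code as a uint32
wkb_type_bytes = pc.binary_slice(wkbs.storage, 1, 5)
# Reinterpret each 4-byte slice as one uint32 (matches the WKB only on a little-endian machine)
geometry_type = pa.Array.from_buffers(
    pa.uint32(),
    len(wkb_type_bytes),
    [wkb_type_bytes.buffers()[0], wkb_type_bytes.buffers()[2]],
)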

# If you're expecting EWKB you have to mask off the high flag bits first,
# otherwise the modulo below gives the wrong result for flagged values
mask = pa.scalar(0x00FFFFFF, pa.uint32())
geometry_type = pc.bit_wise_and(geometry_type, mask)

# ISO WKB adds 1000/2000/3000 for Z/M/ZM, so reduce the code modulo 1000
one_thousand = pa.scalar(1000, pa.uint32())
geometry_type = pc.subtract(geometry_type, pc.multiply(pc.divide(geometry_type, one_thousand), one_thousand))

geometry_type
#> <pyarrow.lib.UInt32Array object at 0x1135c5de0>
#> [
#>   1,
#>   2,
#>   4
#> ]
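
For inputs that break the assumptions above (nulls or big-endian WKB), a slower per-element fallback in plain Python is one option. This is only a sketch: `wkb_geometry_type` is a hypothetical helper, not part of geoarrow or pyarrow, and it reads nothing beyond the 5-byte WKB header.

```python
import struct
from typing import Optional


def wkb_geometry_type(wkb: Optional[bytes]) -> Optional[int]:
    """Return the base geometry type code (1 = Point, 2 = LineString, ...).

    Hypothetical helper: accepts None (null), either byte order, ISO Z/M/ZM
    codes, and EWKB flag bits.
    """
    if wkb is None:
        return None
    if len(wkb) < 5:
        raise ValueError("a WKB header needs at least 5 bytes")

    # Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian
    byte_order = wkb[0]
    if byte_order not in (0, 1):
        raise ValueError(f"unexpected byte-order flag: {byte_order}")
    fmt = "<I" if byte_order == 1 else ">I"

    # Bytes 1 to 4 hold the geometry type code as a uint32
    (code,) = struct.unpack_from(fmt, wkb, 1)

    # Mask off EWKB flag bits first, then drop the ISO Z/M/ZM offset
    return (code & 0x00FFFFFF) % 1000


# Little-endian POINT (0 1)
point = b"\x01" + struct.pack("<I", 1) + struct.pack("<dd", 0.0, 1.0)
# Big-endian header of a LINESTRING Z (ISO code 1002); only the header is read
linestring_z_header = b"\x00" + struct.pack(">I", 1002)

types = [wkb_geometry_type(w) for w in (point, linestring_z_header, None)]
```

Something like this could presumably be applied to `wkbs.storage.to_pylist()` and the result wrapped back into a pyarrow array. That round trip through Python objects is an assumption about how one would wire it up, and it will be much slower than the buffer-based version above.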
