Function filter fragments returns the whole dataset #16
Comments
The (I'll take a look at the next issue which is almost certainly a bug!) |
Oh shoot! It would be so ridiculously cool to have predicate pushdown at the geometry level that works as an "intersects" test, even if it is just against a bounding box. I think the people developing Apache Sedona did it, but having it implemented at the function level on a PC would be wonderful. Thanks for the response! |
It could almost certainly be done as a Python user-defined function based on the functions that currently exist in this package. That's not currently a development priority but it's definitely a good idea! |
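For readers following along, here is a minimal, library-free sketch of the row-level bounding-box "intersects" test discussed above. It does not use this package's API: geometries are represented as plain coordinate lists, and `bounds`, `bbox_intersects`, and `filter_by_bbox` are hypothetical helper names chosen for illustration. In practice the bounds would come from a geometry library rather than from raw coordinates.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Coords = Sequence[Tuple[float, float]]    # one geometry as a list of (x, y)


def bounds(coords: Coords) -> BBox:
    """Return the bounding box of a geometry given as coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_bbox(features: Sequence[Coords], query: BBox) -> List[int]:
    """Return the indices of features whose bounding box intersects `query`."""
    return [
        i for i, coords in enumerate(features)
        if bbox_intersects(bounds(coords), query)
    ]


features = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],  # unit square
    [(5, 5), (6, 5), (6, 6)],          # far-away triangle
    [(0.5, 0.5), (3, 3)],              # diagonal line
]
hits = filter_by_bbox(features, (0.8, 0.8, 2.0, 2.0))
print(hits)
```

Note that this is a bounding-box test only: a geometry whose box overlaps the query but whose actual shape does not will still be returned, so an exact intersection check would be a second step.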
Note that there is |
I am not intelligent enough to know how difficult this would be to implement, but having the functionality of "select all geometries that intersect this bounding box" without reading in all the data would be a game changer for geospatial analysts working with millions of features. So I'll be watching for that PR. :D |
I was just wondering, @paleolimbot: how could I write the data so that the function works with the files that I have provided? Any ideas? |
The easiest way that I know about is a "hilbert sort" ( |
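For reference, a Hilbert sort orders features by the position of a representative point (here, the bounding-box center) along a Hilbert space-filling curve, so that rows that are close in space end up close together in the file. The sketch below is library-free and is not the snippet from the original comment; `hilbert_index` and `hilbert_sort` are hypothetical helper names, and the extent is assumed to have non-zero width and height.

```python
from typing import List, Sequence, Tuple


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance along a Hilbert curve for cell (x, y) on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(
    centers: Sequence[Tuple[float, float]],
    extent: Tuple[float, float, float, float],
    order: int = 16,
) -> List[int]:
    """Return row indices ordered along the Hilbert curve.

    `extent` is (xmin, ymin, xmax, ymax) covering all centers.
    """
    xmin, ymin, xmax, ymax = extent
    n = 1 << order

    def cell(value: float, lo: float, hi: float) -> int:
        # Scale into the integer grid and clamp to the valid range
        scaled = int((value - lo) / (hi - lo) * n)
        return max(0, min(n - 1, scaled))

    def key(i: int) -> int:
        x, y = centers[i]
        return hilbert_index(order, cell(x, xmin, xmax), cell(y, ymin, ymax))

    return sorted(range(len(centers)), key=key)


centers = [(0.9, 0.1), (0.1, 0.1), (0.9, 0.9), (0.1, 0.9)]
row_order = hilbert_sort(centers, extent=(0.0, 0.0, 1.0, 1.0))
print(row_order)
```

Writing the rows in this order, split into reasonably small files or row groups, should give each chunk a compact bounding box, which is what makes chunk-level filtering worthwhile.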
I tried so hard and got so far, but in the end... I couldn't make it work. It still returns the whole dataset. The row groups return batches of 50, which are grouped. Am I missing something? (*cries)
Length of the filtered |
Just a quick naive question: you are sure the Can you provide a reproducible example? The link for the files above doesn't work anymore. |
Hi @jorisvandenbossche. I apologize for the unclear comment. I have updated the above code snippet, and you can now access the files again at the link provided at the beginning. |
I think I have sort of made it work. Instead of reading in the data as a pyarrow dataset and converting it to a GeoDataset (using "from pyarrow.dataset import dataset as pads"), I used "from geoarrow.pyarrow.dataset import dataset".
I've used this:
Of course, the dataset in filtered_ds is much larger than the exact intersection, but that would be consistent with the documentation, which says a bounding-box intersection is applied. |
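To illustrate why the filtered result can be larger than the exact intersection, and why a dataset with a single chunk comes back whole, here is a conceptual sketch of chunk-level pruning. It assumes, as the thread suggests, that the filter keeps or drops whole fragments based on their bounding boxes. It is not this package's implementation: `Fragment`, `prune_fragments`, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Fragment:
    """A chunk of the dataset (e.g. a file) with the bounds of its rows."""
    name: str
    bbox: BBox
    n_rows: int


def bbox_intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_fragments(fragments: List[Fragment], query: BBox) -> List[Fragment]:
    """Keep every fragment whose bounding box intersects the query box.

    Rows are never inspected individually: a kept fragment is kept whole.
    """
    return [f for f in fragments if bbox_intersects(f.bbox, query)]


query = (10, 10, 20, 20)

# One chunk covering everything: nothing can be skipped.
single = [Fragment("in_memory_table", (0, 0, 100, 100), 1000)]

# The same rows, spatially sorted and split into four chunks.
many = [
    Fragment("part-0", (0, 0, 50, 50), 250),
    Fragment("part-1", (50, 0, 100, 50), 250),
    Fragment("part-2", (0, 50, 50, 100), 250),
    Fragment("part-3", (50, 50, 100, 100), 250),
]

rows_single = sum(f.n_rows for f in prune_fragments(single, query))
rows_many = sum(f.n_rows for f in prune_fragments(many, query))
print(rows_single, rows_many)
```

Under these assumptions the single-chunk case returns all 1000 rows, while the four-chunk case returns the 250 rows of the one overlapping chunk, which is still more than the rows that truly intersect the query.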
Hi! Thank you for the work you do for the community. I don't know how far along this project is, but I have installed the libraries. I imported two GeoParquet files (one generated with QGIS, the other with geopandas) and both are read in. I might have misunderstood the code, but is the geometry passed to the "filter_fragments" method of the GeoDataset class used as a filter, so that only the data intersecting the bounding box of that geometry is read in? If so, I have tried the following:
`import geopandas as gpd
import geoarrow.pyarrow as ga
from pyarrow.parquet import read_table

# Read the whole GeoParquet file into an in-memory table
tb = read_table(r"/home/parquet/buildings.parquet")
dataset = ga.dataset(tb, geometry_columns=["geometry"])

# Use the first mask geometry (as WKT) as the filter
gpdf_mask = gpd.read_file("/home/shapefiles/area_1.gpkg")
bnds = gpdf_mask.iloc[0].geometry.wkt
x = dataset.filter_fragments(bnds)`
I have tried with datasets in both EPSG:4326 and EPSG:25833. When I run len(x.to_table()), the length of the result is the same as the length of the original dataset. The geometries are saved as WKB.
I have used the following files for testing. I don't know if it is a bug or if it is something I am doing wrong.