-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add polygon extract queries to the Python API #22
Comments
+1, should be doable by:
|
I attempted a first pass at this, but got a bit stuck. From your comment, are you suggesting that this can be done entirely in Python using the existing osmx bindings, perhaps pending addressing #23? I'm using the s2sphere library as it appears to provide an s2geometry installation in an easily pip installable package so that should address getting cell ids. In attempting to unpack
Can I pass the cell id I get back from RegionCoverer directly to the |
Yes, if your coverer has both minimum and maximum cell levels set to 16, you can then query cell_node directly with this cell ID. This can be implemented as a first pass. It's not optimal though, because a huge region might need thousands or even millions of level 16 cells. A more efficient way is to let cells be anywhere from level 0 to 16 for a sparse representation of the covering; because every S2 parent cell shares a prefix with all its children, you can use LMDB cursor ranges to iterate over child cells efficiently like is done in the C++ code https://github.com/protomaps/OSMExpress/blob/master/src/storage.cpp#L171 The s2sphere package looks neat, I haven't tried it. I believe the S2 source tree itself has "official" python bindings but unsure what the workflow is to get that packaged up neatly. If the s2sphere library is easier to install and has everything we need, we should use that. |
I think
|
Looks like I wasn't setting minimum as well as maximum. Will play around with this a bit more. The s2sphere region coverer returns a list of CellId which does have the
It does here but it's not obvious to me either how to include them in a python/pip project easily.
🤦 sometimes reading source on a Monday am is hard? Thanks for the hints, I'll update with progress, no ETA though. If you find another use case for this or begin working on it let me know. |
I managed to spend some time on this today, creating a Python CLI that generates a GeoJson file of nodes and ways filtered by the requested tags and bounding box. Lots of possible future work left here, including but not limited to:
https://gist.github.com/CloudNiner/8b020f434d0509419032c6b2fe0577c5 |
Looks good so far! some random thoughts, some directly related and some more speculative:
This should be done outside of osmx, but I think for ways you need to 1) ensure the orientation is correct and 2) ensure it is not self-intersecting, which OSM ways can be, but is not allowed per the OGC Simple Features spec
the osmx API should only have one way to specify the query region, taking a covering - a list of cell IDs - as the argument. We should have some helper methods to convert from common geometry specs to S2 regions and coverings. In the C++ code I called this "region", so we should create another python module probably called "region" to handle this
unrelated to osmx but I implemented something similar in another project which you may want to look at: https://github.com/hotosm/osm-export-tool-python/blob/master/osm_export_tool/sql.py
I would avoid having to mirror the layout of OSM objects in your client code because you need to malloc for every object. Ideally the osmx bindings make it easy to take advantage of zero-copy views of OSM objects - this may mean the API provides functions that take sets of object IDs and uses
node_ids should generally be a python |
I updated my prior example to take an s2.S2Region instead of a bbox and wrote the Python code to convert a Unfortunately, the parts of this example that might be useful in the OSMExpress python bindings (polygon -> s2region and node_ids_for_region) now utilize the s2 SWIG bindings because S2Polygon isn't implemented in s2sphere. This prevented a PR because the bindings are not directly pip installable. s2-py is a potential option but installs an out of date (from 2018) version of the library. It's also troublesome (no guidance from s2 docs and a few open issues relating to the SWIG bindings) to install the SWIG bindings to anywhere other than the global default python installation. There is an open PR that adds support for building a pip wheel but its been dormant for awhile now. Regardless, I hope the example is instructive and provides a useful start to those looking to perform similar tasks or jump start a future effort to add some of these operations to the OSMX Python bindings when there's a clear path forward for inclusion of the s2 bindings as a dependency. |
I agree that the bindings not being pip installable is an issue, thanks for figuring out the workaround. Were you able to accomplish the goal of your script in an efficient manner using Python? It's useful to know if investing more in Python packaging is worth it for use cases similar to yours - If you encountered performance issues, it may be better to direct developers towards just using the C++ API |
I'd say yes in this case. There's a few different performance bottlenecks a user might run into, but once I had the efficient implementation for the function The search_by_ways code begins to slow as you increase query area or the number of results returned but as far as I can tell, the issue is just volume of operations being performed. I was still able to write 100s of mb of geojson results for areas up to 1000 sq kms returning thousands of ways, it just took 1-2mins. Whether that's satisfactory or not depends on individual use case, for us it was fine since our use case is focused on smaller areas with few results at a time. |
https://gist.github.com/CloudNiner/e5552aaae6c235193b8f196558b92e2c#file-osmx_utils-py-L23 This can be made even faster :) as is, on line 23, if you are traversing a cell at a low level, I think you are potentially issuing millions of queries to Instead, use the child_begin cell to set the lmdb cursor state and then move the cursor forward. You won't be able to use Curious to see if this improves on the slowness of big query areas you mentioned. |
I'm closing this issue because of lack of activity. The main use case for OSMExpress remains getting PBFs out of a minutely-updated planet .osmx. Analytical use cases in Python may be better served by https://docs.geodesk.com/python which looks like an excellent new project. |
We're exploring an application of OSMExpress that is interested in retrieving objects filtered by specific tags from a polygon bounding area.
In its simplest form, I'm imagining a python method that is something like:
that reproduces the functionality of https://github.com/protomaps/OSMExpress/blob/master/src/extract.cpp#L130-L134
The text was updated successfully, but these errors were encountered: