pgsql-mid fails on relations with more than 32767 members #713
pgsql-mid stores the member offsets for relations in the planet_osm_rels table as smallints (way_off, rel_off columns). That means the number of members it can safely handle is limited to 32767. However, the limit is not checked anywhere. Trying to import a larger relation leads to the error:
This happened recently with relation 7066589.
osm2pgsql should drop relations that are too large early in the processing. What would a reasonable upper limit be?
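A minimal sketch of what such an early check could look like, written in Python for brevity rather than in osm2pgsql's own C++. The function names and the `(relation_id, member_count)` input shape are hypothetical and are not taken from the osm2pgsql code base; the only fact used from this issue is the smallint ceiling of 32767.

```python
# Largest value a signed 16-bit PostgreSQL smallint can hold.
SMALLINT_MAX = 32767


def fits_member_offsets(member_count, limit=SMALLINT_MAX):
    """Return True if a relation with this many members stays within the limit."""
    return 0 <= member_count <= limit


def filter_relations(relations, limit=SMALLINT_MAX):
    """Split (relation_id, member_count) pairs into kept and dropped ids.

    Hypothetical helper: relations over the limit are dropped before any
    further processing instead of failing later at insert time.
    """
    kept, dropped = [], []
    for rel_id, member_count in relations:
        if fits_member_offsets(member_count, limit):
            kept.append(rel_id)
        else:
            dropped.append(rel_id)
    return kept, dropped
```

Passing a lower `limit` (for example 8000) would model the stricter cap discussed in this thread without changing the check itself.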
There are currently 4 multipolygons in the database with more than 6k members. The largest route relation is around 8k. I don't have the numbers for boundary relations right now but I wouldn't expect them to go much beyond those numbers. So I don't think anything with more than 8k members makes sense. I would also be okay with an even lower limit.
If you do limit the number of members to some number less than 9223372036854775807 :-), I think it would be reasonable to return such a limit in the capabilities API call (and enforce it in the API too).
I'm actually not sure if I buy it that we shouldn't handle relations with 32'767 members though (creating such a large relation is stupid, but normally we don't police such things).
@pnorman that is true, but on the other hand osm2pgsql does not live in isolation, and a limit enforced by it, particularly one in the low thousands, would definitely impact a lot of data consumers. I know it is not a particularly popular truth, but a lot of people use osm2pgsql imports for purposes other than rendering.
Well, it has an even worse impact on data consumers if their import chain breaks when osm2pgsql (or their import tool of choice) runs out of memory. So the question is: what is a reasonable limit that allows mappers to add the data they want and consumers to actually use the data on reasonably sized hardware?
Please don't do tight limits. Sometimes I use osm2pgsql with non-osm machine-generated data to have a look at it. If there's a technical issue with 32767 and going from smallint to int will be a bigger problem, then let it be 32767.
You can produce a warning on slower than usual processing of relations together with their ID - this will catch not only relations that are big in terms of numbers, but also some algorithmic corner cases that might happen in invalid geometries.
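A rough sketch of that idea, assuming a fixed time threshold stands in for "slower than usual". The function name, the `process` callback, and the threshold value are all hypothetical; osm2pgsql itself is written in C++ and this is not its API.

```python
import time


def process_with_warning(relations, process, threshold_seconds=1.0,
                         clock=time.perf_counter):
    """Process (relation_id, relation) pairs and warn about slow ones.

    Returns the ids of relations whose processing exceeded the threshold.
    The clock is injectable so the behaviour can be checked deterministically.
    """
    slow = []
    for rel_id, relation in relations:
        start = clock()
        process(relation)
        elapsed = clock() - start
        if elapsed > threshold_seconds:
            # Report the id so the offending relation can be inspected.
            print(f"warning: relation {rel_id} took {elapsed:.2f}s to process")
            slow.append(rel_id)
    return slow
```

Because the check is based on elapsed time rather than member count, it would also flag small relations that hit an algorithmic corner case.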
added a commit
Mar 13, 2017
Out of interest, I have loaded the particular changeset in JOSM.
For completeness' sake, especially in relation to the remarks about the ill-advised import you made on the mailing list (which can be read here on GIS Nabble: http://gis.19327.n8.nabble.com/osm2pgsql-update-failure-insert-rel-failed-td5892960.html), I think it is fair to the OSM community member(s) who initiated this and uploaded the offending relation to list the discussion and import proposal.
As it turned out after loading the changeset, the data was imported by (a member of) the Brazilian community, based on the following e-mail list post dated the 5th of March, and a Wiki import proposal referenced by it:
Actually, there doesn't seem to have been a proper discussion, as the original post doesn't seem to have any follow-up...
Based on my knowledge of Italian and some Spanish, I can make out that the import, which consists of the geodetic base network of Brazil imported as nodes (as also shown by the man_made=survey_point tag on all nodes), was mainly done to provide a solid reference for aligning aerial photography and the objects digitized from it, and to help identify remote locations and villages.
By itself, if the data has been released in full accordance with the OpenStreetMap license, I can imagine this being of use to a community like the one in Brazil.
The actual problem arose because the thousands of individual nodes were grouped into relations representing individual geodetic networks. The Wiki import page mentions this intention to create relations for these networks (which indeed was ill-advised with a total of 77769 nodes, and 69887 alone for the vertical reference leveling network...)
Maybe it would be good to advise the community about the import guidelines, and also, if the data is in accordance with OSM, to not create relations from these nodes but instead just tag individual nodes with the network reference, so they can be identified (and potentially selected) as belonging to a certain network in OSM tools like JOSM or Overpass Turbo.
The sheer number of nodes (tens of thousands) by itself should pose no issue for OSM or import tools. Just grouping them in a relation is problematic.
This really gives food for thought about how to handle future incidents...