pgsql-mid fails on relations with more than 32767 members #713

Closed
lonvia opened this Issue Mar 12, 2017 · 8 comments

Comments

Projects
None yet
5 participants
@lonvia
Collaborator

lonvia commented Mar 12, 2017

pgsql-mid stores the member offset for relations in the planet_osm_rels table as smallints (way_off, rell_off columns). That means that the number of members it can safely handle is restricted to 32767. However, the limit is nowhere checked. Trying to import a larger relation leads to the error:

 Osm2pgsql failed due to ERROR: insert_rel failed: ERROR:  value "37945" is out of range for type smallint

This happened recently with relation 7066589.

osm2pgsql should drop relations that are too large early in the processing. What would a reasonable upper limit be?

@pnorman

This comment has been minimized.

Show comment
Hide comment
@pnorman

pnorman Mar 12, 2017

Collaborator

osm2pgsql should drop relations that are too large early in the processing. What would a reasonable upper limit be?

32767 would work, but a relation of that size is unreasonable.

Collaborator

pnorman commented Mar 12, 2017

osm2pgsql should drop relations that are too large early in the processing. What would a reasonable upper limit be?

32767 would work, but a relation of that size is unreasonable.

@lonvia

This comment has been minimized.

Show comment
Hide comment
@lonvia

lonvia Mar 12, 2017

Collaborator

There are currently 4 multipolygons in the database with more than 6k members. The largest route relation is around 8k. I don't have the numbers for boundary relations right now but I wouldn't expect them to go much beyond those numbers. So I don't think anything with more than 8k members makes sense. I would also be okay with an even lower limit.

Collaborator

lonvia commented Mar 12, 2017

There are currently 4 multipolygons in the database with more than 6k members. The largest route relation is around 8k. I don't have the numbers for boundary relations right now but I wouldn't expect them to go much beyond those numbers. So I don't think anything with more than 8k members makes sense. I would also be okay with an even lower limit.

@simonpoole

This comment has been minimized.

Show comment
Hide comment
@simonpoole

simonpoole Mar 12, 2017

Contributor

if you do limit the number of members to some number less than 9223372036854775807 :-), I think that it would be reasonable to return such a limit in the capabilities API call (and enforce it in the API too).

I'm actually not sure if I buy it that we shouldn't handle relations with 32'767 members though (creating such a large relation is stupid, but normally we don't police such things).

Contributor

simonpoole commented Mar 12, 2017

if you do limit the number of members to some number less than 9223372036854775807 :-), I think that it would be reasonable to return such a limit in the capabilities API call (and enforce it in the API too).

I'm actually not sure if I buy it that we shouldn't handle relations with 32'767 members though (creating such a large relation is stupid, but normally we don't police such things).

@pnorman

This comment has been minimized.

Show comment
Hide comment
@pnorman

pnorman Mar 13, 2017

Collaborator

You might be thinking of the wrong issue tracker, this is the osm2pgsql one. We don't have a capabilities call and our limit isn't the limit of the OSM API.

Collaborator

pnorman commented Mar 13, 2017

You might be thinking of the wrong issue tracker, this is the osm2pgsql one. We don't have a capabilities call and our limit isn't the limit of the OSM API.

@simonpoole

This comment has been minimized.

Show comment
Hide comment
@simonpoole

simonpoole Mar 13, 2017

Contributor

@pnorman that is true, but on the other hand osm2pgsql does not live in isolation and a limit enforced by it, particularly if it would be in the few 1000, would definitely impact a lot of data consumers. I know that it is not particularly liked truth, but a lot of people use osm2pgsql imports for other purposes than just rendering.

Contributor

simonpoole commented Mar 13, 2017

@pnorman that is true, but on the other hand osm2pgsql does not live in isolation and a limit enforced by it, particularly if it would be in the few 1000, would definitely impact a lot of data consumers. I know that it is not particularly liked truth, but a lot of people use osm2pgsql imports for other purposes than just rendering.

@lonvia

This comment has been minimized.

Show comment
Hide comment
@lonvia

lonvia Mar 13, 2017

Collaborator

Well, it does have an even worse impact on data consumers if their import chain breaks when osm2pgsql (or their import tool of choice) runs out of memory. So the question is, what is a reasonable limit that allows mapper to add the data they want and consumers to actually use the data using reasonably-sized hardware.

Collaborator

lonvia commented Mar 13, 2017

Well, it does have an even worse impact on data consumers if their import chain breaks when osm2pgsql (or their import tool of choice) runs out of memory. So the question is, what is a reasonable limit that allows mapper to add the data they want and consumers to actually use the data using reasonably-sized hardware.

@Komzpa

This comment has been minimized.

Show comment
Hide comment
@Komzpa

Komzpa Mar 13, 2017

Contributor

Please don't do tight limits. Sometimes I use osm2pgsql with non-osm machine-generated data to have a look at it. If there's a technical issue with 32767 and going from smallint to int will be a bigger problem, then let it be 32767.

You can produce a warning on slower than usual processing of relations together with their ID - this will catch not only relations that are big in terms of numbers, but also some algorithmic corner cases that might happen in invalid geometries.

Contributor

Komzpa commented Mar 13, 2017

Please don't do tight limits. Sometimes I use osm2pgsql with non-osm machine-generated data to have a look at it. If there's a technical issue with 32767 and going from smallint to int will be a bigger problem, then let it be 32767.

You can produce a warning on slower than usual processing of relations together with their ID - this will catch not only relations that are big in terms of numbers, but also some algorithmic corner cases that might happen in invalid geometries.

lonvia added a commit to lonvia/osm2pgsql that referenced this issue Mar 13, 2017

ignore relations with more than 32k members
There is a hard restriction in the pgsql middle tables, where
relation members are indexed using a smallint. To remain
consistent, drop all relations which exceed this size.

Fixes #713.
@mboeringa

This comment has been minimized.

Show comment
Hide comment
@mboeringa

mboeringa Mar 13, 2017

@lonvia ,

Out of interest, I have loaded the particular changeset in JOSM.

For completeness sake, especially in relation to your remarks about the ill-adviced import you made on the mailing list (which can be read here on GIS Nabble: http://gis.19327.n8.nabble.com/osm2pgsql-update-failure-insert-rel-failed-td5892960.html), I think it fair to the OSM community member(s) who initiated this and uploaded the offending relation, to list the discussion and import proposal.

As it turned out after the loading of the changeset, the data has been imported by (a member of) the Brazilian community, based on the following e-mail list post dated the 5th of March, and a Wiki import proposal referenced by it:

https://lists.openstreetmap.org/pipermail/talk-br/2017-March/012063.html
https://wiki.openstreetmap.org/wiki/Importa%C3%A7%C3%A3o_das_Redes_Geod%C3%A9sicas_do_IBGE

Actually, there doesn't seem to have been proper discussion, as the original post doesn't seem to have any follow up...

Based on knowledge of Italian and some Spanish, I can make out the import, which constitutes the geodetic base network of Brasil imported as nodes, as also witnessed by the man_made=survey_point tag on all nodes, was mainly done to facilitate better alignment of aerial photography and objects to digitize using a solid reference, and to help identify remote locations and villages.

By itself, if the data has been released in full accordance with OpenStreetMap license, I can imagine this being of use to a community like the one in Brasil.

The actual problem arose because the thousands of individual nodes were grouped into relations representing individual geodetic networks. The Wiki import page mentions this intention to create relations for these networks (which indeed was ill-adviced with a total of 77769 nodes, and 69887 alone for the vertical reference leveling network...)

Maybe it would be good to advice the community both about import guidelines, but also, if the data is in accordance with OSM, to not create relations of them, but instead just tag individual nodes with the network reference, so they can be identified (and potentially selected) as belonging to a certain network in OSM tools like JOSM or Overpass Turbo.

The sheer number of nodes (tens of thousands) by itself should pose no issue for OSM or import tools. Just grouping them in a relation is problematic.

Lastly:
All of this also raises another more general issue: it is likely more people might consider a similar action on nodes. While it is highly unlikely relations of ways will exceed >32k members, relations for grouping of nodes, as witnessed by this particular import of a geodetic network, could easily exceed this figure.

This really gives food for thought about how to handle future incidents...

mboeringa commented Mar 13, 2017

@lonvia ,

Out of interest, I have loaded the particular changeset in JOSM.

For completeness sake, especially in relation to your remarks about the ill-adviced import you made on the mailing list (which can be read here on GIS Nabble: http://gis.19327.n8.nabble.com/osm2pgsql-update-failure-insert-rel-failed-td5892960.html), I think it fair to the OSM community member(s) who initiated this and uploaded the offending relation, to list the discussion and import proposal.

As it turned out after the loading of the changeset, the data has been imported by (a member of) the Brazilian community, based on the following e-mail list post dated the 5th of March, and a Wiki import proposal referenced by it:

https://lists.openstreetmap.org/pipermail/talk-br/2017-March/012063.html
https://wiki.openstreetmap.org/wiki/Importa%C3%A7%C3%A3o_das_Redes_Geod%C3%A9sicas_do_IBGE

Actually, there doesn't seem to have been proper discussion, as the original post doesn't seem to have any follow up...

Based on knowledge of Italian and some Spanish, I can make out the import, which constitutes the geodetic base network of Brasil imported as nodes, as also witnessed by the man_made=survey_point tag on all nodes, was mainly done to facilitate better alignment of aerial photography and objects to digitize using a solid reference, and to help identify remote locations and villages.

By itself, if the data has been released in full accordance with OpenStreetMap license, I can imagine this being of use to a community like the one in Brasil.

The actual problem arose because the thousands of individual nodes were grouped into relations representing individual geodetic networks. The Wiki import page mentions this intention to create relations for these networks (which indeed was ill-adviced with a total of 77769 nodes, and 69887 alone for the vertical reference leveling network...)

Maybe it would be good to advice the community both about import guidelines, but also, if the data is in accordance with OSM, to not create relations of them, but instead just tag individual nodes with the network reference, so they can be identified (and potentially selected) as belonging to a certain network in OSM tools like JOSM or Overpass Turbo.

The sheer number of nodes (tens of thousands) by itself should pose no issue for OSM or import tools. Just grouping them in a relation is problematic.

Lastly:
All of this also raises another more general issue: it is likely more people might consider a similar action on nodes. While it is highly unlikely relations of ways will exceed >32k members, relations for grouping of nodes, as witnessed by this particular import of a geodetic network, could easily exceed this figure.

This really gives food for thought about how to handle future incidents...

@pnorman pnorman closed this in #716 Mar 13, 2017

pnorman added a commit to pnorman/osm2pgsql that referenced this issue Mar 14, 2017

ignore relations with more than 32k members
There is a hard restriction in the pgsql middle tables, where
relation members are indexed using a smallint. To remain
consistent, drop all relations which exceed this size.

Fixes #713.

(cherry picked from commit 8763ce9)

pnorman added a commit to pnorman/osm2pgsql that referenced this issue Mar 14, 2017

ignore relations with more than 32k members
There is a hard restriction in the pgsql middle tables, where
relation members are indexed using a smallint. To remain
consistent, drop all relations which exceed this size.

Fixes #713.

(cherry picked from commit 8763ce9)

pnorman added a commit that referenced this issue Apr 13, 2017

ignore relations with more than 32k members
There is a hard restriction in the pgsql middle tables, where
relation members are indexed using a smallint. To remain
consistent, drop all relations which exceed this size.

Fixes #713.

(cherry picked from commit 8763ce9)

raspbian-autopush pushed a commit to raspbian-packages/osm2pgsql that referenced this issue Apr 26, 2017

Bas Couwenberg
osm2pgsql (0.92.0+ds-2) unstable; urgency=medium
  * Add upstream patches from 0.92.x branch to fix two important issues:
    - Ignore relations with more than 32k members.
      There is a hard restriction in the pgsql middle tables, where
      relation members are indexed using a smallint. To remain
      consistent, drop all relations which exceed this size.
      openstreetmap/osm2pgsql#713
    - Use the same logic for queuing pending ways with multi and pgsql.
      Fixes ways disappearing from the output table.
      openstreetmap/osm2pgsql#735
    (closes: #860273)

[dgit import unpatched osm2pgsql 0.92.0+ds-2]

@Hoverbear Hoverbear referenced this issue in sozialhelden/wheelmap Apr 26, 2017

Open

Node has disappeared #521

@mmd-osm mmd-osm referenced this issue in openstreetmap/openstreetmap-website Jan 11, 2018

Closed

Limits on number of tags and relation members #1711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment