Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad multipolygon tags #80

Closed
xificurk opened this issue Sep 19, 2013 · 20 comments
Closed

Bad multipolygon tags #80

xificurk opened this issue Sep 19, 2013 · 20 comments

Comments

@xificurk
Copy link

The parsing of tags for multipolygons seems broken - I'm getting man_made=pier on polygon created from relation 936128 (admin boundary for Poland). This is a very large multipolygon - out of several hundreds of way members only two (way 227911442 and 227911443) have this tag.

I'm using osm2pgsql build from git on 2013-09-16.

@pnorman
Copy link
Collaborator

pnorman commented Sep 19, 2013

Is this with lua or C processing?

@xificurk
Copy link
Author

Well, I'm not sure - I can't find any command line switch or info about what is default. How can I find out?

@pnorman
Copy link
Collaborator

pnorman commented Sep 19, 2013

If you haven't told it to do otherwise, it will use the C processing.

This probably is another case of being too aggressive in trying to form an area from the style of tagged ways and no tags other than type on the multipolygon.

@xificurk
Copy link
Author

I'll do some tests and report back... For completeness I would like to test lua parsing as well, but can't find any documentation regarding this feature - how can I switch from C to lua parsing?

@apmon
Copy link
Contributor

apmon commented Sep 19, 2013

For the lua processing, there isn't much (any) documentation yet.

You activate it with a command line parameter to osm2pgsql of "--tag-transform-script style.lua", where style.lua is the lua script doing the tag transform. The script needs to contain the following functions:

"function filter_tags_node (keyvalues, nokeys)"
"function filter_tags_way (keyvalues, nokeys)"
"function filter_basic_tags_rel (keyvalues, nokeys)"
"function filter_tags_relation_member (keyvalues, keyvaluemembers, roles, membercount)"

For the details, you can best look at the existing sample lua script at:
https://github.com/openstreetmap/osm2pgsql/blob/master/style.lua

@xificurk
Copy link
Author

I've created a simple test for the problem, see https://github.com/xificurk/osm2pgsql-multipolygon-test

Case A) If the multipolygon relation has a polygon key, then the resulting geometries are correct (both lua and C) - polygons from relations and lines from individual ways. (file with_rel_tags.osm)

Case B) If the multipolygon relation has no polygon key, then the resulting geometries are wrong, furthermore the results of lua and C parsing differ. (file without_rel_tags.osm)

Both lua and C do the same mistake - they take all the tags from ways and put them on the resulting polygon (lua goes even further and overwrites original tags that came from the relation). I think the correct behavior is to copy over only polygon tags iff they are present on all the (outer?) ways of multipolygon.

Lua parsing suffers from one more bug in case B) - seems like it imports only "random" part of the linear features that are used in multipolygon relation. See the logs in the mentioned repo.

Important note: If you use C parsing, then what is (or is not) a polygon is determined by import.style. So, If you delete this line (https://github.com/xificurk/osm2pgsql-multipolygon-test/blob/master/import.style#L4), you'll end up with the same results in both cases.

@apmon
Copy link
Contributor

apmon commented Sep 20, 2013

Thank you for your effort of creating those test cases. However, I am starting to wonder, if this might be a "won't fix" bug and the recommendation is to simply tag the multi-polygon relation correctly rather than try and infer/transfer the tagging from the member ways.

There is just too much semantic knowledge of the tags necessary to get this transfer correct in all cases with simple algorithms. The relevant code (for the c transform) is https://github.com/openstreetmap/osm2pgsql/blob/master/tagtransform.c#L523 to line 603.
There are some tags like "fixme" or "import:xyz" that should be ignored altogether, some tags that should be copied to the relation, and some tags should determine if the ways should remain as independent lines in addition to the multi-polygon. But how do you decide which tag is which for the 1000s of different tags?

At the moment it takes the hints from the import.style to decide which ones are which, but there are clearly more complex combinations that are relevant in this decision.

Nevertheless, the algorithm can probably be improved by only copying tags from outer ways that are present on all outer ways, rather than tags that are present on at least one outer way.

@pnorman pnorman mentioned this issue Sep 20, 2013
3 tasks
@xificurk
Copy link
Author

@apmon The main problem is that the current algorithm breaks multipolygon tags even if it's tagged correctly, e.g. the relation mentioned in my initial post.

At the moment we can't get rid of tag transfer completely, because there are simply too many "old-style" tagged multipolygon relations. I think, that whatever strategy for the tag transfer is applied, it should not under (almost) any circumstances break the correctly tagged multipolygons, but it's ok if it produces buggy features for the "old-style" tagged relations - the relation should be fixed anyway.

There are three important points that need to be addressed:

  1. How do we recognize if the relation is tagged old or new way?
    At the moment this is decided by checking if the relation has at least one polygon tag.
    This works in most cases (but not all, e.g. the infamous relation that started this issue report). I think we should consider a more strict approach, specifically: If the relation has only type=multipolygon tag and nothing else, then do the tag transfer (i.e. this is "old-style" tagging), otherwise let the tags be as they are.
    (I did a quick check on smallish data extract from Central Europe - roughly half of the multipolygon relations has type=multipolygon tag only, roughly half has at least one well known polygon tag, less than 1% of multipolygons does not fall into any of these two categories.)

  2. What tags should we transfer on old-style tagged relation?
    Current strategy of simply copying everything is definitely wrong. As mentioned before, it should copy over only the tags that are present on all outer ways.

  3. What to do with member ways? Should they be imported separately?
    Current code tries to mark ways as superseeded regardless of the fact whether it's "old-style" or "new-style" multipolygon, I think this is wrong as well. It should try to mark ways as superseeded only in case of "old-style" tagged multipolygon.
    The way should be marked as superseeded only if all of its tags were transferred to the multipolygon in the previous step. This would ensure that we don't throw out linear features present on the member ways.

The proposed changes will definitely break parsing of some relations and will fix parsing of others. The main theme here is that parsing of correctly tagged multipolygon relations must result in the expected features in pgsql.

@openstreetmap-tiles
Copy link

2013/9/20 Petr Morávek notifications@github.com

  1. What tags should we transfer on old-style tagged relation?
    Current strategy of simply copying everything is definitely wrong. As
    mentioned before, it should copy over only the tags that are present on all
    outer ways.

I'm completely with apmon here. Let's stop interpreting old-style
multipolygons and they will be converted in no time. Let's put editor
warnings for untagged multipolygon relations, let's discourage usage of
old-style MP relations in the wiki...

This is lowering complexity hence augmenting edit straightforwardness for
mappers, not to speak of data consumers.

Probably other lists like talk or tagging would be more appropriate for
discussing this further.

cheers,
Martin

@apmon
Copy link
Contributor

apmon commented Sep 20, 2013

The problem with relation 936128 is that according to osm2pgsql it is an "old style" untaged relation and therefore needs the information from the member ways. The relation itself only has name tags which don't say what the relation is and "land_area=administrative" which according to the default.style is not a valid polygon tag (or linear tag for that matter). So osm2pgsql treats it as an old style untagged multi-polygon and copies over polygon tags from the member ways.

So imho the relation should be fixed to have a valid polygon tag (or the style changed for land_area=administrative" to be a valid polygon tag).

I will add an additional condition though to make sure that only outer way tags present on all outer ways get copied over.

@xificurk
Copy link
Author

@apmon That's why I was suggesting change of how the "old-style" is recognized. Imho there is nothing to fix in tagging scheme of relation 936128, it's multipolygon correctly tagged in "new-style".
osm2pgsql fails in two places here - it thinks it's the "old-style" multipolygon and then transfers too many tags from ways to relation.

Personally, I would go for something like this: https://gist.github.com/xificurk/6644655

@pnorman
Copy link
Collaborator

pnorman commented Sep 20, 2013

relation 936128 (admin boundary for Poland)

This isn't the admin boundary for poland, relation 49715 is.

I'd call this a tagging error myself. @apmon's suggestion of

Nevertheless, the algorithm can probably be improved by only copying tags from outer ways that are present on all outer ways, rather than tags that are present on at least one outer way

is probably the best approach to it, but this is actually a difference that may catch some people up, particularly for large multipolygons where they do not have all member ways downloaded. I'm thinking of some of the past issues with the large lakes in Australia and Canada and people asking on the local lists for help.

We're probably best dealing with this post-0.84

@apmon
Copy link
Contributor

apmon commented Sep 20, 2013

This behaviour has already recently changed, and it is well possible that the current issue is a direct result of that change ( 236fa7e )

The question though remains what the correct behaviour is, and what is the least problematic version, as it is likely not possible to have the "perfect solution" here.

However, I think this is a wider discussion and probably should be moved to dev or perhaps even talk to identify what the most acceptable version of this is.

@pnorman
Copy link
Collaborator

pnorman commented Sep 21, 2013

This behaviour has already recently changed, and it is well possible that the current issue is a direct result of that change ( 236fa7e )

Confirmed that it's a new issue. I downloaded /relation/936128/full from my local mirror

Locally with 0.81:

poland_test=# SELECT osm_id,way_area,name,man_made FROM planet_osm_polygon WHERE osm_id=-936128;
 osm_id  |  way_area   |     name      | man_made
---------+-------------+---------------+----------
 -936128 | 8.27603e+11 | Poland (land) |
 -936128 |     19211.7 | Poland (land) |
 -936128 |     8679.37 | Poland (land) |
(3 rows)

On errol (osm2pgsql master)

osm2pgsql=> SELECT osm_id,way_area,name,man_made FROM planet_osm_polygon WHERE osm_id=-936128;
 osm_id  |  way_area   |     name      | man_made
---------+-------------+---------------+----------
 -936128 |     8675.62 | Poland (land) | pier
 -936128 |       19208 | Poland (land) | pier
 -936128 | 8.27604e+11 | Poland (land) | pier
(3 rows)

Aside: I think this MP has an inner and outer touching near way 218521264

@xificurk
Copy link
Author

@pnorman Sorry for the mistake, you're right, it's not admin boundary, but this specific example is not really important for this issue (it was just the relation that alerted me to the problem).

I still think that there is no tagging error in the mentioned relation, but let me give you another example - imagine you don't care about e.g. water related feature, so you delete the line with waterway from your import.style. Suddenly all the riverbank multipolygons are considered "old-style" and get tags from their member ways resulting in completely different features than if you wouldn't have deleted the waterway line from import.style. In this case I doubt you will argue that it is caused by wrong tagging.
The only way how to reliable force osm2pgsql to treat the multipolygon relation as "new-style" tagging is to add area=yes to the multipolygon relation, which is just silly tagging for "renderer".

I agree that this needs to be discussed more widely, but since the behavior has already changed, I would suggest to resolve this issue before 0.84 to limit the change to a single BC break between two versions.

apmon added a commit that referenced this issue Sep 22, 2013
…uter ways

Only copy over tags that are present on all outer ways to the relation.

Member ways need to have all their tags on the relation, otherwise they will
be processed independently as ways.

This is in response to issue #80
@apmon
Copy link
Contributor

apmon commented Sep 22, 2013

I have now implemented the extra check that only tags that are present on all outer ways will be copied over to the relation. This should fix the issue with the man_made=pier turning up on the relation.

Do you have examples of relations where this is not sufficient to fix the overall issue?

@xificurk
Copy link
Author

@apmon I don't have a specific example, but see my previous post for a general example, where this code still (may) fail. I think the decision condition, whether the tag transfer should even ococur, needs improvement.

btw, I've started thread on talk about this ( https://lists.openstreetmap.org/pipermail/talk/2013-September/068189.html )

@pnorman
Copy link
Collaborator

pnorman commented Sep 24, 2013

imagine you don't care about e.g. water related feature, so you delete the line with waterway from your import.style

It's worth noting that doing this will change area behaviour in non-multi polygon cases, so I don't think it changing MP area behaviour is necessarily wrong. As a somewhat silly example, consider a closed way with highway=foo waterway=bar. With default.style this will end up in the polygon table, if you remove the waterway from the .style it will end up in the line table. I believe changing waterway from polygon to phstore will work properly. In any case, it's getting somewhat astray from this specific issue which is fixed in 29fa1d0.

I'll be opening up new issues about a couple of items that have come up here

@xificurk
Copy link
Author

@pnorman I'm not really sure if there is even sensible combination of polygon tag and "might-be-polygon" tag (as your waterway and highway example). Anyway, the only thing that changes in your example is whether the way goes to line or polygon table. In case of multipolygon the difference might be much more drastic: polygon with relation tags + several lines with tags from ways vs. polygon with tags from relation and ways + zero lines.

apmon added a commit to apmon/osm2pgsql that referenced this issue Oct 22, 2013
…uter ways

Only copy over tags that are present on all outer ways to the relation.

Member ways need to have all their tags on the relation, otherwise they will
be processed independently as ways.

This is in response to issue osm2pgsql-dev#80
@lonvia
Copy link
Collaborator

lonvia commented Dec 22, 2018

Should be fixed by no longer handling old-style multipolygons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants