
Detect common import errors early #18

lxbarth opened this Issue Oct 14, 2013 · 5 comments

3 participants

OSM Lab member
lxbarth commented Oct 14, 2013

@emacsen - based on your review, what are common mistakes importers make? Looking for a list here that we can use as a basis for a daily script or something that creates a report that we can use for cleanups.

OSM Lab member
lxbarth commented Oct 14, 2013

From @emacsen's note to imports list:

  1. Users leaving T-intersection issues (found in the JOSM validator, solvable with "Add nodes at intersections", but not all have been fixed)
  2. Users leaving crossing buildings, where a node sits on the line of another way (found in the JOSM validator)
  3. Users leaving ways which are close, but do not touch
  4. People tagging building information (building type, etc.) on address nodes
  5. Hyphenated addresses have padded numbers (I think I fixed this)
  6. Buildings from previous work not conflated
  7. Multipolygons not being copied right (I think this is solved with merge layer, vs c&p, but any other ideas would be appreciated)
  8. People missing that roads are running through buildings
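
As an illustration of how item 4 might be turned into a report line, here is a minimal sketch. It assumes nodes have already been parsed into plain dicts with an `id` and a `tags` mapping; that structure and the function name `misplaced_building_tags` are hypothetical, not part of any existing tool.

```python
def misplaced_building_tags(nodes):
    """Return ids of nodes that carry both address tags and building tags.

    `nodes` is a hypothetical list of dicts: {"id": int, "tags": {key: value}}.
    """
    flagged = []
    for node in nodes:
        tags = node["tags"]
        # An address node is assumed to be any node with at least one addr:* key.
        has_address = any(key.startswith("addr:") for key in tags)
        # Building information: the building key itself or any building:* subkey.
        has_building = any(
            key == "building" or key.startswith("building:") for key in tags
        )
        if has_address and has_building:
            flagged.append(node["id"])
    return flagged


sample_nodes = [
    {"id": 1, "tags": {"addr:housenumber": "10", "building": "yes"}},
    {"id": 2, "tags": {"addr:housenumber": "12", "addr:street": "Main Street"}},
    {"id": 3, "tags": {"addr:housenumber": "14", "building:levels": "2"}},
]
flagged_ids = misplaced_building_tags(sample_nodes)
```

A node flagged this way is only a candidate for review; whether the building tags belong on the surrounding building way still needs a human look.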
OSM Lab member
pnorman commented Oct 17, 2013

1 and 2 are probably fixed by the same pre-import code fix.

4 becomes much less of an issue if merging is done; if not, it remains an issue at the same rate.

OSM Lab member
lxbarth commented Oct 18, 2013

@emacsen @pnorman - what are good existing tools to catch these errors? Keep Right? Geofabrik? @pnorman - what are you using for keeping tabs on OSM data quality?

emacsen commented Oct 18, 2013

Most of the JOSM checks are good, though I wish they went further. The intersecting ways check, for example, is accurate but could be refined with some additional filtering (roads intersecting buildings is good to key on, roads intersecting administrative boundaries is not).
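
A rough sketch of that filtering, assuming ways are available as dicts with `id`, `tags`, and a list of planar `(x, y)` coordinates (a hypothetical structure, not JOSM's internal model). It only reports crossings between `highway` ways and `building` ways, so administrative boundaries are never considered. Segments that merely touch at an endpoint are not counted as crossing.

```python
def _orient(a, b, c):
    """Signed area of triangle a, b, c: positive if c lies left of a->b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True when segment p1-p2 strictly crosses segment q1-q2."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Each segment's endpoints must lie on opposite sides of the other segment.
    return d1 * d2 < 0 and d3 * d4 < 0


def _segments(coords):
    """Consecutive coordinate pairs of a way."""
    return list(zip(coords, coords[1:]))


def road_building_crossings(ways):
    """Return (road_id, building_id) pairs where a road crosses a building outline."""
    roads = [w for w in ways if "highway" in w["tags"]]
    buildings = [w for w in ways if "building" in w["tags"]]
    hits = []
    for road in roads:
        for building in buildings:
            if any(
                segments_cross(a, b, c, d)
                for a, b in _segments(road["coords"])
                for c, d in _segments(building["coords"])
            ):
                hits.append((road["id"], building["id"]))
    return hits


sample_ways = [
    {"id": 100, "tags": {"highway": "residential"},
     "coords": [(0.0, 0.5), (2.0, 0.5)]},
    {"id": 200, "tags": {"building": "yes"},
     "coords": [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0), (0.5, 0.0)]},
    {"id": 300, "tags": {"boundary": "administrative"},
     "coords": [(1.0, -1.0), (1.0, 2.0)]},
]
crossings = road_building_crossings(sample_ways)
```

The pairwise loop is quadratic, which is fine for a small review area; a spatial index would be the obvious next step for anything larger.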

Some additional checks we should be making are checks for duplicate addresses. This is not always an error, but it can be, especially when the address is located outside the building, and especially when there are two naked addresses (addresses without an associated POI).
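
A minimal sketch of such a duplicate-address check, assuming elements are dicts with an `id` and a `tags` mapping (a hypothetical structure). It treats two elements as duplicates when `addr:housenumber` and `addr:street` match exactly, and marks an element as "naked" when it carries nothing but `addr:*` tags. Both rules are simplifying assumptions, and the check does not test whether the address sits outside its building.

```python
from collections import defaultdict


def duplicate_addresses(elements):
    """Group elements by (housenumber, street) and report groups with duplicates."""
    groups = defaultdict(list)
    for element in elements:
        tags = element["tags"]
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        if number and street:
            groups[(number, street)].append(element)

    report = []
    for address, members in groups.items():
        if len(members) > 1:
            # "Naked" here means the element has only addr:* tags and no POI tags.
            naked_ids = [
                m["id"] for m in members
                if all(key.startswith("addr:") for key in m["tags"])
            ]
            report.append({
                "address": address,
                "ids": [m["id"] for m in members],
                "naked_ids": naked_ids,
            })
    return report


sample_elements = [
    {"id": 1, "tags": {"addr:housenumber": "10", "addr:street": "Main Street"}},
    {"id": 2, "tags": {"addr:housenumber": "10", "addr:street": "Main Street"}},
    {"id": 3, "tags": {"addr:housenumber": "12", "addr:street": "Main Street",
                       "amenity": "cafe"}},
]
address_report = duplicate_addresses(sample_elements)
```

Groups where two or more ids appear in `naked_ids` match the case described above as most likely to be an error, so they are probably the ones to surface first.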

For a short time, I used changemonger to monitor data quality in an area, but gave up on it due to its verbosity. With changewithin, I've thought of tying the tools together. Changemonger's classifications and labels are nice. The labels are what I was considering adding to changewithin, and I was thinking that changemonger could have a simplified mode where it didn't contact the API for additional data about nodes, or else it could use AugmentedDiffs for the same data.

OSM Lab member
lxbarth commented Oct 22, 2013


Let's start the import slowly again and make sure we reach out to individuals in a friendly way where we find mistakes. We'll use JOSM for reviews.

@lxbarth lxbarth closed this Oct 22, 2013