Detect common import errors early #18

Closed
lxbarth opened this Issue · 5 comments

3 participants

@lxbarth
Owner

@emacsen - based on your review, what are common mistakes importers make? Looking for a list here that we can use as a basis for a daily script or something that creates a report that we can use for cleanups.

@lxbarth
Owner

From @emacsen's note to imports list:

  1. Users leaving T-intersection issues (found by the JOSM validator and solvable with "Add nodes at intersections", but not all were fixed)
  2. Users leaving crossing buildings, where a node sits on the line of another way (found by the JOSM validator)
  3. Users leaving ways which are close, but do not touch
  4. People tagging building information (building type, etc.) on address nodes
  5. Hyphenated addresses have padded numbers (I think I fixed this)
  6. Buildings from previous work not conflated
  7. Multipolygons not being copied right (I think this is solved by using merge layer instead of copy and paste, but any other ideas would be appreciated)
  8. People missing that roads are running through buildings
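As a rough illustration of what a daily report script could test for, here is a minimal Python sketch of items 3 and 4. The data shapes are assumptions made for the example (tags as a plain dict, ways as lists of `(node_id, x, y)` tuples in projected metres), and the function names are hypothetical. The near-miss check compares nodes to nodes only, which is a simplification: a real check would also measure node-to-segment distance.

```python
import math


def has_building_tags_on_address_node(tags):
    """Item 4: flag a node that has addr:* tags and also building tags.

    `tags` is assumed to be a dict of OSM key/value pairs for one node.
    """
    is_address = any(key.startswith("addr:") for key in tags)
    has_building = any(
        key == "building" or key.startswith("building:") for key in tags
    )
    return is_address and has_building


def near_miss_nodes(way_a, way_b, tolerance=1.0):
    """Item 3: find node pairs of two ways that are close but not shared.

    Each way is assumed to be a list of (node_id, x, y) tuples with
    coordinates in projected metres. Returns a list of (id_a, id_b).
    """
    shared = {node[0] for node in way_a} & {node[0] for node in way_b}
    if shared:
        # The ways already share a node, so they touch.
        return []
    pairs = []
    for id_a, xa, ya in way_a:
        for id_b, xb, yb in way_b:
            if math.hypot(xa - xb, ya - yb) <= tolerance:
                pairs.append((id_a, id_b))
    return pairs
```

The tolerance of one metre is an arbitrary default for the sketch and would need tuning against real import data.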
@pnorman
Owner

1 and 2 are probably fixed by the same pre-import code fix.

4 becomes much less of an issue if merging is done; if not, it remains an issue at the same rate.

@lxbarth
Owner

@emacsen @pnorman - what are good existing tools to catch these errors? Keep Right? Geofabrik? @pnorman - what are you using for keeping tabs on OSM data quality?

@emacsen
Owner

Most of the JOSM checks are good, though I wish they were improved. The intersecting ways check, for example, is accurate but could be improved with some additional checks (roads intersecting buildings is good to key on, roads intersecting administrative boundaries is not).
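To make the intersecting ways refinement concrete, here is a small Python sketch of a filter that could run on pairs of ways already known to cross. It is not how the JOSM validator works. The function name and the tag-dict inputs are assumptions for the example, and it only encodes the two cases mentioned here: report road/building crossings, ignore anything involving an administrative boundary.

```python
def crossing_is_suspicious(tags_a, tags_b):
    """Decide whether a crossing between two ways is worth reporting.

    `tags_a` and `tags_b` are assumed to be dicts of OSM tags for two
    ways that a geometry check has already found to intersect.
    """
    def is_road(tags):
        return "highway" in tags

    def is_building(tags):
        return "building" in tags

    def is_admin_boundary(tags):
        return tags.get("boundary") == "administrative"

    # Roads crossing administrative boundaries are expected, so skip them.
    if is_admin_boundary(tags_a) or is_admin_boundary(tags_b):
        return False

    # A road crossing a building is the case worth keying on.
    return (is_road(tags_a) and is_building(tags_b)) or (
        is_building(tags_a) and is_road(tags_b)
    )
```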

One additional check we should be making is for duplicate addresses. A duplicate is not always an error, but it can be, especially when the address is located outside the building, and especially when there are two naked addresses (addresses without an associated POI).
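A duplicate address check along these lines might look like the following Python sketch. The input shape (an iterable of `(element_id, tags)` pairs) and the function names are assumptions for the example, and matching on house number plus street alone is a simplification that ignores unit, city, and postcode tags.

```python
from collections import defaultdict


def is_naked_address(tags):
    """True when every tag is an addr:* tag, i.e. no POI or building info."""
    return bool(tags) and all(key.startswith("addr:") for key in tags)


def find_duplicate_addresses(elements):
    """Group elements by (house number, street) and keep repeated groups.

    `elements` is assumed to be an iterable of (element_id, tags) pairs.
    Returns a dict mapping each repeated address key to its element ids.
    """
    groups = defaultdict(list)
    for element_id, tags in elements:
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        if number and street:
            key = (number.strip(), street.strip().lower())
            groups[key].append(element_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A report could then rank duplicates where both elements are naked addresses above those where one sits on a POI, since the former are the more likely errors.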

For a short time, I used changemonger to monitor data quality in an area, but gave up on it due to its verbosity. With changewithin, I've thought of tying the tools together. Changemonger's classifications and labels are nice. The labels are what I was considering adding to changewithin, and I was thinking that changemonger could have a simplified mode where it didn't contact the API for additional data about nodes, or else it could use AugmentedDiffs for the same data.

@lxbarth
Owner

Ok.

Let's start the import slowly again and make sure we reach out to individuals in a friendly way where we find mistakes. We'll use JOSM for reviews.

@lxbarth closed this