@emacsen - based on your review, what are common mistakes importers make? Looking for a list here that we can use as a basis for a daily script or something that creates a report that we can use for cleanups.
From @emacsen's note to imports list:
1. Users leaving T-intersection issues (found in the JOSM validator, solvable by "Add nodes at intersections", but not all done)
2. Users leaving crossing buildings, where a node sits on the line of another way (found in the JOSM validator)
3. Users leaving ways which are close, but do not touch
4. People tagging building information (building type, etc.) on address nodes
5. Hyphenated addresses have padded numbers (I think I fixed this)
6. Buildings from previous work not conflated
7. Multipolygons not being copied right (I think this is solved with merge layer, vs c&p, but any other ideas would be appreciated)
8. People missing that roads are running through buildings
Items 1 and 2 (T-intersections and crossing buildings) are probably fixed by the same pre-import code fix.
Item 4 (building information on address nodes) becomes much less of an issue if merging is done. If it isn't, it remains an issue at the same rate.
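For the daily report idea, the check for building information tagged on address nodes could be sketched roughly like this. It is only a sketch: it assumes the data is available as an OSM XML extract, and `parse_nodes`, `address_nodes_with_building_tags` and the sample data are made-up names for illustration, not part of any existing tool. A node with both address and building tags is not always wrong, so the output is a list of candidates to review.

```python
import xml.etree.ElementTree as ET


def parse_nodes(osm_xml):
    """Yield (node_id, tags) for every node in an OSM XML string."""
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        yield node.get("id"), tags


def address_nodes_with_building_tags(nodes):
    """Return (node_id, building_keys) for nodes carrying addr:* and building tags."""
    flagged = []
    for node_id, tags in nodes:
        has_address = any(key.startswith("addr:") for key in tags)
        building_keys = sorted(
            key for key in tags if key == "building" or key.startswith("building:")
        )
        if has_address and building_keys:
            flagged.append((node_id, building_keys))
    return flagged


# Hypothetical sample extract: node 1 mixes address and building tags, node 2 does not.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="40.70" lon="-73.90">
    <tag k="addr:housenumber" v="12"/>
    <tag k="addr:street" v="Main Street"/>
    <tag k="building" v="residential"/>
  </node>
  <node id="2" lat="40.71" lon="-73.91">
    <tag k="addr:housenumber" v="14"/>
    <tag k="addr:street" v="Main Street"/>
  </node>
</osm>"""

if __name__ == "__main__":
    for node_id, keys in address_nodes_with_building_tags(parse_nodes(SAMPLE)):
        print(f"node {node_id}: building tags on address node: {', '.join(keys)}")
```

The geometry checks (T-intersections, crossing buildings, near-miss ways) are probably better left to the JOSM validator than reimplemented in a script like this.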
@emacsen @pnorman - what are good existing tools to catch these errors? keepright? geofabrik? @pnorman - what are you using for keeping tabs on OSM data quality?
Most of the JOSM checks are good, though I wish they were improved. The intersecting ways check, for example, is accurate but could be more selective about what it reports (roads intersecting buildings is good to key on, roads intersecting administrative boundaries is not).
We should also be checking for duplicate addresses. A duplicate is not always an error, but it can be, especially when the address is located outside the building, and especially when there are two naked addresses (addresses without an associated POI).
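A duplicate address check along these lines might look like the sketch below. The assumptions are mine: elements arrive as `(id, tags)` pairs, two addresses count as the same when `addr:housenumber`, `addr:street` and `addr:postcode` match after trimming and lowercasing, and "naked" means the element has nothing but `addr:*` tags. The function names are hypothetical, and the check does not look at whether the address sits inside or outside a building.

```python
from collections import defaultdict


def address_key(tags):
    """Return a normalized key for an address, or None if it is incomplete."""
    number = tags.get("addr:housenumber")
    street = tags.get("addr:street")
    if not number or not street:
        return None
    postcode = tags.get("addr:postcode", "")
    return (number.strip().lower(), street.strip().lower(), postcode.strip())


def is_naked(tags):
    """An address is 'naked' here when it carries only addr:* tags (no POI)."""
    return all(key.startswith("addr:") for key in tags)


def duplicate_addresses(elements):
    """Group elements by address and report every address used more than once."""
    groups = defaultdict(list)
    for element_id, tags in elements:
        key = address_key(tags)
        if key is not None:
            groups[key].append((element_id, is_naked(tags)))

    report = []
    for key, members in sorted(groups.items()):
        if len(members) > 1:
            report.append({
                "address": key,
                "ids": [element_id for element_id, _ in members],
                "naked": sum(1 for _, naked in members if naked),
            })
    # Addresses with the most naked duplicates are the most suspicious, so list them first.
    report.sort(key=lambda entry: -entry["naked"])
    return report


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ("n1", {"addr:housenumber": "12", "addr:street": "Main Street"}),
        ("n2", {"addr:housenumber": "12", "addr:street": "main street"}),
        ("n3", {"addr:housenumber": "14", "addr:street": "Main Street", "amenity": "cafe"}),
        ("n4", {"addr:housenumber": "14", "addr:street": "Main Street", "shop": "bakery"}),
        ("n5", {"addr:housenumber": "16", "addr:street": "Main Street"}),
    ]
    for entry in duplicate_addresses(sample):
        print(entry)
```

Two POIs sharing one address (the cafe and bakery in the sample) still get reported, but with `naked` at 0, so they sort below the pairs of naked addresses that are more likely to be import mistakes.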
For a short time, I used changemonger to monitor data quality in an area, but gave up on it because it was too verbose. Now that we have changewithin, I've thought of tying the two tools together. Changemonger's classifications and labels are nice, and the labels are what I was considering adding to changewithin. I was also thinking that changemonger could have a simplified mode where it doesn't contact the API for additional data about nodes, or else it could use AugmentedDiffs for the same data.
Let's start the import slowly again and make sure we reach out to individuals in a friendly way where we find mistakes. We'll use JOSM for reviews.