Local country code lookup #6941

Closed
quincylvania opened this issue Oct 15, 2019 · 6 comments

@quincylvania
Collaborator

iD is and will be relying more and more on location-aware behavior. See #6513, #6479, #6712, #6836, and #6713, for example.

Right now we call out to nominatim every time we need the country code for a pair of coordinates. It'd be much more efficient and reliable to do this synchronously by querying a data file bundled with iD.
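To make the idea concrete, here is a minimal sketch of what a synchronous lookup against bundled data could look like. Everything in it is an assumption for illustration: the `bundledCountries` array, the made-up codes `AA` and `BB`, and the rectangular shapes are placeholders, not real borders or an existing API. It uses a plain ray-casting point-in-polygon test.

```typescript
// Hypothetical bundled data: one outer ring per country, as [lon, lat] pairs.
// The codes and rectangles are invented for illustration only.
type Position = [number, number];
interface CountryShape { code: string; ring: Position[]; }

const bundledCountries: CountryShape[] = [
  { code: "AA", ring: [[0, 0], [10, 0], [10, 10], [0, 10]] },
  { code: "BB", ring: [[10, 0], [20, 0], [20, 10], [10, 10]] },
];

// Ray casting: count how many ring edges a horizontal ray from the point
// crosses. An odd number of crossings means the point is inside.
function pointInRing(point: Position, ring: Position[]): boolean {
  const last = ring[ring.length - 1];
  if (last === undefined) return false; // empty ring
  const [x, y] = point;
  let inside = false;
  let prev: Position = last;
  for (const curr of ring) {
    const [xi, yi] = curr;
    const [xj, yj] = prev;
    const crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
    prev = curr;
  }
  return inside;
}

// Synchronous lookup: no network request, just a scan of the bundled shapes.
function countryCodeAt(lon: number, lat: number): string | null {
  for (const country of bundledCountries) {
    if (pointInRing([lon, lat], country.ring)) return country.code;
  }
  return null;
}

console.log(countryCodeAt(5, 5));  // AA
console.log(countryCodeAt(25, 5)); // null
```

A linear scan like this is only reasonable for a small number of simple shapes; real data would likely need holes, multipolygons, and some kind of spatial index.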

The trade-off here is that the size of the file would not be trivial. But since iD doesn't require high-precision results, we could generalize the data considerably.

See previous discussion on this topic in the OSMUS Slack.

@quincylvania quincylvania added the chore Improvements to the iD development experience or codebase label Oct 15, 2019
@1ec5
Collaborator

1ec5 commented Oct 16, 2019

In addition to size, we should also keep an eye on runtime performance, considering that a single changeset can straddle a national border or jump around to different parts of the world. For example, which-polygon is very efficient for point-in-polygon lookups, but its memory usage is very sensitive to the complexity of the country polygons, and the lookup time might be too.

The discussion in Slack points to Natural Earth as a possible source for the country geometries, but I don’t think we should use it as-is. For the features listed above, iD needs relatively high resolution along land borders but very low resolution along coastlines. For example, it’d be a good idea for the local lookup to unify Canada into a single polygon that includes all its islands. However, all of Detroit needs to be on the American side of the border and all of Mexicali on the Mexican side, with a tolerance of tens of meters perhaps, but not kilometers. A simple Douglas–Peucker simplification of the entire shapefile would result in the wrong address format and wrong language being preferred in neighborhoods on either side of the border.
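As a rough illustration of the simplification concern, here is a small Douglas–Peucker sketch. It assumes planar coordinates in arbitrary units (say kilometers in a projected plane), and the `border` line with its 2-unit jog is invented for the example. With a small tolerance the jog survives; with a large one the border collapses to a straight line and everything inside the jog ends up on the wrong side.

```typescript
type Point = [number, number];

// Perpendicular distance from p to the infinite line through a and b.
function distanceToLine(p: Point, a: Point, b: Point): number {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const length = Math.sqrt(dx * dx + dy * dy);
  if (length === 0) {
    const ex = p[0] - a[0];
    const ey = p[1] - a[1];
    return Math.sqrt(ex * ex + ey * ey);
  }
  return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length;
}

// Douglas–Peucker: keep the vertex farthest from the chord between the
// endpoints if it exceeds the tolerance, then recurse on both halves.
function simplify(line: Point[], tolerance: number): Point[] {
  const first = line[0];
  const last = line[line.length - 1];
  if (first === undefined || last === undefined || line.length < 3) {
    return line.slice();
  }
  let maxDistance = 0;
  let splitIndex = 0;
  line.forEach((p, i) => {
    if (i === 0 || i === line.length - 1) return;
    const d = distanceToLine(p, first, last);
    if (d > maxDistance) {
      maxDistance = d;
      splitIndex = i;
    }
  });
  if (maxDistance <= tolerance) return [first, last];
  const left = simplify(line.slice(0, splitIndex + 1), tolerance);
  const right = simplify(line.slice(splitIndex), tolerance);
  return [...left.slice(0, -1), ...right]; // drop the duplicated split vertex
}

// Invented border with a 2-unit jog between x = 10 and x = 20.
const border: Point[] = [[0, 0], [10, 0], [10, 2], [20, 2], [20, 0], [30, 0]];

console.log(simplify(border, 0.05).length); // 6 (jog preserved)
console.log(simplify(border, 5).length);    // 2 (jog flattened away)
```

A single global tolerance cannot be both loose enough for coastlines and tight enough for urban land borders, which is why applying it uniformly to the whole shapefile is a problem.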

Geofabrik’s data extract polygons are a good example of generalizing coastlines while retaining detail in land boundaries.

@quincylvania
Collaborator Author

@1ec5 I totally agree. Thankfully file size and point-in-polygon performance correlate, so we can optimize for both. The raw Natural Earth dataset is much too detailed for this use case, even at 110m resolution. Coastline generalization should be a primary strategy, where islands like Iceland can be represented as simple rectangles or even triangles. For our purposes we don't need to know if a point is on land or not.

I was also thinking this would make for a good external module that other apps could also use.

@bhousel
Member

bhousel commented Oct 16, 2019

This is a great idea, definitely something that's been on our radar for a while, and something I'd use in a bunch of projects.

The closest thing we have right now is in the osm-community-index, which includes a bunch of country-level polygons, but also a bunch of other smaller ones. You can browse the osm-community-index data here on this nice map that @mikelmaron made: https://mikelmaron.github.io/map-demos/osm-community-index/

The polygon data by itself comes out to 238k minified. We are already using which-polygon in iD to index this data and also the editor-layer-index polygons. This approach is very fast because it precalculates bounding boxes and stores them in an rbush, so it's only really doing the point-in-polygon tests for the polygons with bounds that actually intersect the point.
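The bounding-box prefilter described above can be sketched roughly as follows. This is not which-polygon's or rbush's actual code or API: the `buildIndex` and `candidatesAt` names are hypothetical, the sample features are invented, and a plain array scan stands in for the R-tree to keep the example self-contained.

```typescript
type Position = [number, number];
interface Feature { id: string; ring: Position[]; }
interface IndexedFeature {
  feature: Feature;
  minX: number; minY: number; maxX: number; maxY: number;
}

// Precompute each feature's bounding box once, up front.
function buildIndex(features: Feature[]): IndexedFeature[] {
  return features.map((feature) => {
    const xs = feature.ring.map((p) => p[0]);
    const ys = feature.ring.map((p) => p[1]);
    return {
      feature,
      minX: Math.min(...xs), minY: Math.min(...ys),
      maxX: Math.max(...xs), maxY: Math.max(...ys),
    };
  });
}

// Cheap first pass: return only features whose box contains the point.
// An exact point-in-polygon test is still needed on these candidates.
// (A real index would use an R-tree here instead of a linear filter.)
function candidatesAt(index: IndexedFeature[], lon: number, lat: number): Feature[] {
  return index
    .filter((b) => lon >= b.minX && lon <= b.maxX && lat >= b.minY && lat <= b.maxY)
    .map((b) => b.feature);
}

// Invented sample data: a triangle and a square.
const sampleFeatures: Feature[] = [
  { id: "tri", ring: [[0, 0], [10, 0], [0, 10]] },
  { id: "sq", ring: [[20, 20], [30, 20], [30, 30], [20, 30]] },
];
const sampleIndex = buildIndex(sampleFeatures);

// (8, 8) is inside the triangle's box but outside the triangle itself,
// so it comes back as a candidate that the exact test would then reject.
console.log(candidatesAt(sampleIndex, 8, 8).map((f) => f.id));   // [ 'tri' ]
console.log(candidatesAt(sampleIndex, 15, 15).map((f) => f.id)); // []
```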

There are obviously some seams and places where we could improve a bunch on this. Part of the issue is that each geojson has been added independently by different contributors. An editor like iD, but built specifically for generating a boundary mesh, would be nice because then we could snap points together.

A handful of countries make the index much larger because of their complex borders. This is not intuitive (yes, Russia and France both have about equally complicated borders, while Canada and the US are less than half as complex). I tend to simplify a lot in sparsely populated areas. Not all of these have been hand-edited, so there is a lot of room for improvement.

There is also a stats command so I can keep track of the polygon sizes:

[Screenshot: stats command output, 2019-10-16]
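For readers without the screenshot, a tally of polygon sizes could look something like the sketch below. This is a hypothetical stand-in, not the actual stats command or its output format: `polygonStats` and the sample polygons are invented, and "size" is approximated by vertex count and minified JSON length.

```typescript
type Position = [number, number];
interface NamedPolygon { id: string; rings: Position[][]; }
interface PolygonStat { id: string; vertices: number; chars: number; }

// Count vertices and minified JSON characters per polygon,
// then list the heaviest polygons first.
function polygonStats(polygons: NamedPolygon[]): PolygonStat[] {
  return polygons
    .map((p) => ({
      id: p.id,
      vertices: p.rings.reduce((sum, ring) => sum + ring.length, 0),
      chars: JSON.stringify(p.rings).length,
    }))
    .sort((a, b) => b.vertices - a.vertices);
}

// Invented sample data: one simple polygon, one with an extra ring.
const samplePolygons: NamedPolygon[] = [
  { id: "simple", rings: [[[0, 0], [1, 0], [0, 1]]] },
  { id: "complex", rings: [[[0, 0], [4, 0], [4, 4], [0, 4]], [[1, 1], [2, 1], [2, 2]]] },
];

for (const stat of polygonStats(samplePolygons)) {
  console.log(`${stat.id}: ${stat.vertices} vertices, ${stat.chars} chars`);
}
```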

So, my approach to doing this right would be:

  1. make an iD fork that is specifically for editing GeoJSON.
  2. use that to edit and refine the country mesh.

I'm working slowly towards laying the foundation that would let us do 1.

@don-vip

don-vip commented Oct 16, 2019

You can also reuse the JOSM boundaries file: https://josm.openstreetmap.de/export/HEAD/josm/trunk/data/boundaries.osm (1.8 MB in .osm format, 5.4 MB in GeoJSON format). It contains all countries, plus subdivisions for the US, Canada, India and China.
See https://josm.openstreetmap.de/log/josm/trunk/data/boundaries.osm for the list of fixed issues since I introduced it 3 years ago.

@quincylvania
Collaborator Author

@don-vip Thanks so much for the link! That's a great help, I think we'll be able to use it as a starting point.

🏎💨

@quincylvania quincylvania self-assigned this Oct 22, 2019
@quincylvania
Collaborator Author

Update: I've been working on this for the past week or so. Check out the package repo: https://github.com/ideditor/country-coder
