Possible wrong initial import for postcodes #3180

Closed
emandtf opened this issue Aug 29, 2023 · 6 comments

emandtf commented Aug 29, 2023

Describe the bug
I'm using the latest Nominatim, 4.2.3, and chose to import the whole Italy extract from the official OSM repository, including the Wikipedia importance data.
I discovered that the "postcode" field of the "placex" table contains a lot of wrong data.

To Reproduce

  • I started by analyzing "osm_id = 76501249", which is a highway here in Italy. In my DB its "parent_place_id" is 543642.
  • Next I fetch its parent with "... WHERE place_id = 543642" from the same table and keep going up until "parent_place_id" is zero. That walks the whole address hierarchy of the street. Everything is fine so far: I get the right Street + Hamlet + City + Region in just a few records.
  • Then I check their "placex.postcode" fields and find that only two have any data, and it is wrong!
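
The parent walk described above can be sketched as a single recursive query. This is only an illustration: it uses Python's sqlite3 module as a stand-in for the PostgreSQL database, a cut-down "placex" table with just the columns mentioned in this report, and made-up place_id values (only 543642 comes from the report).

```python
import sqlite3

# Stand-in for Nominatim's PostgreSQL "placex" table, reduced to the columns
# used in this report. Every place_id except 543642 is invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE placex (place_id INTEGER, parent_place_id INTEGER,"
    " osm_type TEXT, osm_id INTEGER, postcode TEXT)"
)
conn.executemany(
    "INSERT INTO placex VALUES (?, ?, ?, ?, ?)",
    [
        (543641, 543642, "W", 76501249, "67029"),
        (543642, 543643, "N", 2282532051, "67029"),
        (543643, 543644, "R", 41833, None),
        (543644, 543645, "R", 167044, None),
        (543645, 0, "R", 53937, None),
    ],
)

# Start at one OSM object and follow parent_place_id until no parent row exists.
WALK_UP = """
WITH RECURSIVE chain(place_id, parent_place_id, osm_type, osm_id, postcode, depth) AS (
    SELECT place_id, parent_place_id, osm_type, osm_id, postcode, 0
      FROM placex WHERE osm_type = ? AND osm_id = ?
    UNION ALL
    SELECT p.place_id, p.parent_place_id, p.osm_type, p.osm_id, p.postcode, c.depth + 1
      FROM placex p JOIN chain c ON p.place_id = c.parent_place_id
)
SELECT osm_type, osm_id, postcode FROM chain ORDER BY depth
"""


def walk_up(conn, osm_type, osm_id):
    """Return (osm_type, osm_id, postcode) rows from the object up to the top."""
    return conn.execute(WALK_UP, (osm_type, osm_id)).fetchall()


chain = walk_up(conn, "W", 76501249)
for row in chain:
    print(row)
```

PostgreSQL also supports WITH RECURSIVE, so a similar query should work against a real database, but that has not been checked here.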

The osm_ids in order (from Street up to Region) are 76501249, 2282532051, 41833, 167044, 53937:

  1. https://www.openstreetmap.org/way/76501249
  2. https://www.openstreetmap.org/node/2282532051
  3. https://www.openstreetmap.org/relation/41833
  4. https://www.openstreetmap.org/relation/167044
  5. https://www.openstreetmap.org/relation/53937

As you can see by clicking on those links, none of them has a "postcode" tag or anything related to one, yet the first two have postcode "67029" associated with them, which is wrong! The right one should be 67020.
You can see the ordered "lookup" of those 5 osm_ids in the attached image, with the WRONG "postcode" field in the rightmost column.
[image: 20230829_nominatim_postcode_bug]

The Wikidata entry "Q47070" (referenced in the "extratags" field: https://www.wikidata.org/wiki/Q47070) has the correct postal code, so the wrong value must be coming from somewhere else... but I can't figure out from where.

So I suppose that the import procedure is somehow buggy in its postcode handling.

Software Environment (please complete the following information):

  • Nominatim version: 4.2.3
  • Postgresql version: 14.8
  • Postgis version: 3.4
  • OS: Debian 12

Hardware Configuration (please complete the following information):

  • RAM: 32GB
  • number of CPUs: 16
  • type and size of disks: hundred TBs on multiple SSD

mtmail (Collaborator) commented Aug 29, 2023

On the nominatim.openstreetmap.org servers, which run a version newer than 4.2.3, I see postcode 67029. Note the 'how?' help link that tries to explain how postcodes are calculated. https://nominatim.openstreetmap.org/ui/details.html?osmtype=W&osmid=76501249

Looking inside the tables can help, but there are easier ways to see the address hierarchy of a place:

  • On the command line (plus jq):
      cd $your-project-directory
      nominatim details --way 76501249 --addressdetails | jq '.'
  • Querying $your server/details.php?osmtype=W&osmid=76501249&addressdetails=1&format=json

  • You can also install the debug interface nominatim-ui and configure it to connect to your server.
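
For scripted checks, the details request above could be wrapped roughly as follows. This is a sketch under assumptions: the base URL is a hypothetical local server, the sample body is a trimmed, made-up response, and the "calculated_postcode" field name is the one quoted later in this thread.

```python
import json
from urllib.parse import urlencode


def details_url(base_url, osm_type, osm_id):
    """Build the /details.php URL with the same parameters as in the comment above."""
    query = urlencode(
        {"osmtype": osm_type, "osmid": osm_id, "addressdetails": 1, "format": "json"}
    )
    return f"{base_url.rstrip('/')}/details.php?{query}"


def postcode_from_details(body):
    """Return the calculated postcode from a details JSON body, or None if absent."""
    return json.loads(body).get("calculated_postcode")


# Hypothetical server address; replace with your own installation.
url = details_url("http://localhost:8088/", "W", 76501249)
print(url)

# Made-up, trimmed response body. With a running server the body could be
# fetched with urllib.request.urlopen(url).read() instead.
sample = '{"osm_type": "W", "osm_id": 76501249, "calculated_postcode": "67029"}'
print(postcode_from_details(sample))
```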

mtmail (Collaborator) commented Aug 29, 2023

OpenStreetMap data contains only a few postcodes in that area, none for Acciano for example. Nominatim has to guess what the postcode of that road is.

https://overpass-turbo.eu/s/1zx3

emandtf (Author) commented Aug 29, 2023

/details.php?osmtype=W&osmid=76501249&addressdetails=1&format=json

Yes, at your URL it says "calculated_postcode: 67029" because, I suppose, the "placex.postcode" field is filled in by Nominatim during the import even when no record has its own "postcode" in the extratags or address fields, and the /details URL reads from that field, which is effectively a calculated one.

emandtf (Author) commented Aug 29, 2023

Nominatim has to guess what the postcode of that road is.

Thank you for the explanation.
I was just starting to suspect that after opening this issue.

So it's not possible to rely on any postcode from Nominatim.
Is it possible to disable the "guessing" procedure during import?
I'd prefer to have no postcode at all rather than a guessed one, because otherwise any search through the Nominatim engine, or any attempt to extract data with queries, could very often give me wrong results.

Running a specific query on the Nominatim DB shows that 90% of Italian hamlets have more than one postcode associated with them, which is not possible (it is true for only a few of them, in very big cities, out of the 8000 hamlets overall), so extracting a "Hamlet - Postcode" association is not possible because of the wrong data.
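
The query itself is not shown in this comment. As a guess at what such a check looks like, the sketch below groups (hamlet, postcode) pairs and reports hamlets with more than one distinct postcode. The hamlet names and rows are invented; real rows would come from the database.

```python
from collections import defaultdict


def postcodes_per_hamlet(rows):
    """Collect the distinct non-empty postcodes seen for each hamlet."""
    seen = defaultdict(set)
    for hamlet, postcode in rows:
        if postcode:
            seen[hamlet].add(postcode)
    return dict(seen)


def ambiguous_hamlets(rows):
    """Return only the hamlets that carry more than one distinct postcode."""
    return {
        hamlet: sorted(codes)
        for hamlet, codes in postcodes_per_hamlet(rows).items()
        if len(codes) > 1
    }


# Invented sample rows: (hamlet name, postcode of one place inside it).
rows = [
    ("Hamlet A", "67029"),
    ("Hamlet A", "67020"),
    ("Hamlet B", "67020"),
    ("Hamlet B", None),
]
print(ambiguous_hamlets(rows))
```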

PS: what about using the Wikidata postcode when available? It could cover most of these issues ;)

mtmail (Collaborator) commented Aug 29, 2023

That logic is deep inside the import code and can't be disabled. The "raw" postcode data is in the placex.address column. So for example for https://www.openstreetmap.org/way/845367216 you'll see "city"=>"Civitaretenga", "street"=>"Via Risorgimento", "postcode"=>"67020"

In your database you could delete all data in placex.postcode and then fill it again from the placex.address column. About 2.3 million places in Italy have a postcode attached.
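
A minimal sketch of that reset, written over plain Python dictionaries so the logic is easy to follow. The SQL in the string is the same idea for PostgreSQL, assuming placex.address is an hstore column as the example output above suggests. It is untested against a real Nominatim database, where triggers on placex may interfere, so try it on a copy first.

```python
def refill_postcodes(rows):
    """Reset each row's postcode to the raw tagged value, or None if untagged."""
    for row in rows:
        row["postcode"] = (row.get("address") or {}).get("postcode")
    return rows


# Same idea in PostgreSQL (assumption: address is hstore; not verified here).
REFILL_SQL = """
UPDATE placex SET postcode = NULL;
UPDATE placex SET postcode = address -> 'postcode' WHERE address ? 'postcode';
"""

# Two sample rows: one with a tagged postcode, one with only a guessed value.
rows = [
    {
        "osm_id": 845367216,
        "address": {
            "city": "Civitaretenga",
            "street": "Via Risorgimento",
            "postcode": "67020",
        },
        "postcode": "67020",
    },
    {"osm_id": 76501249, "address": None, "postcode": "67029"},
]
refill_postcodes(rows)
print([(r["osm_id"], r["postcode"]) for r in rows])
```

After the reset, places without a tagged postcode end up with none at all, which is the behaviour asked for earlier in the thread.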

so extracting a "Hamlet - Postcode" association is not possible

Postcodes are for addresses (houses); trying to find one postcode for a village or city is often not precise. That's not how postal companies usually work: they assign their own boundaries based on how their couriers deliver the mail best. Those boundaries don't map directly to political or administrative boundaries.

Geocoders try to find a balance between precise data (one postcode attached to an address or building) and calculated areas (e.g. https://en.wikipedia.org/wiki/Voronoi_diagram but Nominatim doesn't do that). Only a few countries have open postal code boundaries (Germany is an example https://www.openstreetmap.org/relation/3359835). Nominatim is slowly improving https://nominatim.org/2022/06/26/state-of-postcodes.html but we still deal with incomplete data.

You might have to use government data or licensed data from the Italian postal code company. https://www.digiatlas.com/mapas/ang/italy-zip-codes-map-with-demographic-data.html

emandtf (Author) commented Aug 29, 2023

You might have to use government data or licensed data from the Italian postal code company

It's open data, so it can be used freely, but it would have to be formatted in some specific way (which I don't know) and/or processed by some software before it could be written to the right place in the Nominatim DB... and that might not be so simple.

However, thank you again for all your info.
I much appreciate it.

emandtf closed this as completed Aug 29, 2023