Convert semi-colon separated house numbers to a range #1562
min_number int;
max_number int;
BEGIN
-- Check if the input string contains a semi-colon separator
It’s very common to tag `addr:housenumber` with a range, as an alternative to enumerating each number in the range. The most common range separator is a hyphen, although some features use `to` (surrounded by spaces) to avoid the ambiguity in #1558 (comment). It would make sense for the raw range separator to become an en dash too. If that sounds too easy, consider that a tag value might contain a range followed by a semicolon followed by another value.
There is probably no perfect processing that will work in all cases. Is there logic you would suggest to handle these cases?
If you’re comfortable with ignoring Queens-style addresses, you could replace `-` with `;` upfront before the rest of the routine.
That would produce very wrong results for Hawaiian addresses.
Oh, does Hawaii use Queens-style addressing too?
I'm not familiar with Queens-style addressing, but in Hawaii (with the exception of the city of Honolulu), all addresses are prefixed with a 2-digit code indicating what sector of the island you're in. For example, in ʻAiea the code is 99, so you would have addresses in the form `99-12 <street name>`. Conceptually, if anyone's tagged semi-colon separated addresses, you could have housenumbers that look like `99-12;99-14`. Treating the hyphen/dash as a delimiter would produce either `99-14` or `12-99`, both of which would be horribly wrong. Further, on just a standard address of `99-12 Whatever Street`, if we treated the dash as a delimiter and sorted it, you'd end up with `12-99`, which would indicate a location in a different sector of the island. Hence my concern.
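To make the failure concrete, here is a small Python sketch (for illustration only, not the PR's PL/pgSQL; the helper name `naive_split` is hypothetical). It treats both `;` and `-` as delimiters and then builds the two candidate outputs: sorted numeric bounds, and first/last values.

```python
import re

def naive_split(raw):
    """Split on semicolons AND hyphens, as if the hyphen were a list delimiter."""
    return [part for part in re.split(r"[;-]", raw) if part]

parts = naive_split("99-12;99-14")           # ['99', '12', '99', '14']
numbers = [int(p) for p in parts]

min_max = f"{min(numbers)}-{max(numbers)}"   # bounds after sorting
first_last = f"{parts[0]}-{parts[-1]}"       # bounds without sorting

print(min_max)     # 12-99
print(first_last)  # 99-14
```

Neither result preserves the two sector-prefixed house numbers that were actually tagged.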
From what I've read it sounds like yes, this is the same situation as Queens-style addressing, and I don't think it would be prudent to cause bad address numbers for an entire US state.
What if this routine avoids sorting if something looking like a range appears anywhere in the raw data? Either you could look for a hyphen and call it a day, or you could check whether the hyphen separates values in ascending order.
It’s probably worth starting a broader tagging discussion about resolving this ambiguity. I could see a case for eliminating ranges from the tagging scheme and always enumerating the middle values, as long as more renderers and geocoders introduce behavior like what you’ve implemented. However, the well-established Karlsruhe schema specifically allows for ranges. There are probably also edge cases to consider: what happens if a building is signposted with its start and end numbers but nothing in between? Ironically, you’d have to tag it as not a range in order for it to look like a range in OpenMapTiles.
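The two checks suggested above (look for any hyphen, or check that a hyphen separates ascending values) could look roughly like this in Python. This is only a sketch of the idea, not code from the PR; the function names and the regular expression are assumptions made for illustration.

```python
import re

# Matches a whole value of the form "<int>-<int>", e.g. "10-20" or "99-12".
_RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")

def contains_hyphen(raw):
    """Option 1: any hyphen at all means 'do not sort this value'."""
    return "-" in raw

def contains_ascending_range(raw):
    """Option 2: some semicolon-separated part is 'a-b' with a < b."""
    for part in raw.split(";"):
        m = _RANGE.match(part)
        if m and int(m.group(1)) < int(m.group(2)):
            return True
    return False

print(contains_ascending_range("10-20;30"))     # True
print(contains_ascending_range("99-12;99-14"))  # False
```

Note that a sector-prefixed value such as `99-120` also passes the ascending check, so option 2 cannot tell it apart from a true range. In this sketch that only means sorting is skipped for it.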
At the risk of inserting yet more en_US localization, perhaps the logic can be "if the string contains a dash, convert all semi-colons to commas", which would better serve the Queens/Hawaii problem.
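As a rough Python illustration of that rule (not the PR's code; the function name is hypothetical, and joining with a comma plus a space is an assumption about presentation):

```python
def commas_if_dashed(raw):
    """If the value contains a dash, list the parts with commas instead of collapsing."""
    if "-" in raw:
        parts = [p.strip() for p in raw.split(";") if p.strip()]
        return ", ".join(parts)
    # Values without a dash are returned unchanged here; the range-collapsing
    # step would apply to them instead.
    return raw

print(commas_if_dashed("99-12;99-14"))  # 99-12, 99-14
print(commas_if_dashed("1;3;5"))        # 1;3;5
```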
Results evaluating commit 07297b0 (merged with base 48a2b1a as 6c12cd2). See run details. PostgreSQL DB size in MB: 4934 ⇒ 4934 (0.0% change)
plpgsql functions take significantly longer to call than SQL functions; in particular, if the SQL function gets inlined, there is no cost to calling it. Both functions should be declared as IMMUTABLE.
SQL looks good - no opinion on the overall effect.
Thanks a lot! It looks good. Is it ready to be merged?
Yes, ready here👍
…ptiles#1562)" This reverts commit a7a50d8.
Fixes #1558
This PR collapses housenumber values into the form min(housenumber)-max(housenumber) for cases where housenumber is a semi-colon separated list. If the list is all numbers, the bounds are the smallest and largest numbers. If the list includes non-numeric characters, it falls back to the first and last values in the list.
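The rule described above can be sketched in Python as follows. This is an illustration of the stated behavior, not the PR's PL/pgSQL implementation: the function name is hypothetical, and the handling of empty parts and single-value lists is an assumption.

```python
import re

def collapse_housenumbers(raw):
    """Collapse a semicolon-separated list into 'first-last' bounds."""
    if ";" not in raw:
        return raw                      # not a list: leave untouched
    parts = [p.strip() for p in raw.split(";") if p.strip()]
    if not parts:
        return raw                      # assumption: nothing usable, keep raw value
    if len(parts) == 1:
        return parts[0]                 # assumption: a lone value is not a range
    if all(re.fullmatch(r"[0-9]+", p) for p in parts):
        numbers = [int(p) for p in parts]
        return f"{min(numbers)}-{max(numbers)}"   # all numeric: smallest-largest
    return f"{parts[0]}-{parts[-1]}"    # otherwise: first and last as tagged

print(collapse_housenumbers("5;1;3"))    # 1-5
print(collapse_housenumbers("12B;10A"))  # 12B-10A
```

The second call shows the fallback: because the values are not purely numeric, they are not sorted, and the bounds are simply the first and last entries in the list.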
Location (localhost link):
http://localhost:8080/data/v3/#19/41.7278712/-72.2072315