Convert semi-colon separated house numbers to a range #1562

Merged (7 commits) on Sep 13, 2023
Changes from 4 commits
2 changes: 1 addition & 1 deletion layers/housenumber/housenumber.sql
@@ -14,7 +14,7 @@ SELECT
     -- etldoc: osm_housenumber_point -> layer_housenumber:z14_
     osm_id,
     geometry,
-    housenumber
+    display_housenumber(housenumber)
 FROM (
     SELECT
         osm_id,
4 changes: 3 additions & 1 deletion layers/housenumber/housenumber.yaml
@@ -7,12 +7,14 @@ layer:
   buffer_size: 8
   srs: +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0.0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs +over
   fields:
-    housenumber: Value of the [`addr:housenumber`](http://wiki.openstreetmap.org/wiki/Key:addr) tag.
+    housenumber: Value of the [`addr:housenumber`](http://wiki.openstreetmap.org/wiki/Key:addr) tag.
+      If there are multiple values separated by semi-colons, the first and last value separated by a dash.
   datasource:
     geometry_field: geometry
     srid: 900913
     query: (SELECT geometry, housenumber FROM layer_housenumber(!bbox!, z(!scale_denominator!))) AS t
 schema:
+  - ./housenumber_display.sql
   - ./housenumber_centroid.sql
   - ./housenumber.sql
 datasources:
37 changes: 37 additions & 0 deletions layers/housenumber/housenumber_display.sql
@@ -0,0 +1,37 @@
CREATE OR REPLACE FUNCTION display_housenumber_nonnumeric(raw_housenumber text)
RETURNS text AS $$
DECLARE
    arr text[];
BEGIN
    -- Convert the input string into an array of values
    arr := string_to_array(raw_housenumber, ';')::text[];

    -- Return the first and last value in the array
    RETURN arr[1] || '–' || arr[array_length(arr, 1)];
END;
$$ LANGUAGE plpgsql;
ZeLonewolf marked this conversation as resolved.


CREATE OR REPLACE FUNCTION display_housenumber(raw_housenumber text)
RETURNS text AS $$
DECLARE
    min_number int;
    max_number int;
BEGIN
    -- Check if the input string contains a semi-colon separator
1ec5 commented on Jul 7, 2023:
It’s very common to tag addr:housenumber with a range, as an alternative to enumerating each number in the range. The most common range separator is a hyphen, although some features use the word “to” (surrounded by spaces) to avoid the ambiguity in #1558 (comment). It would make sense for the raw range separator to become an en dash too. If that sounds too easy, consider that a tag value might contain a range followed by a semicolon followed by another value.

Contributor Author:
There is probably no perfect processing that will work in all cases. Is there logic you would suggest to handle these cases?

If you’re comfortable with ignoring Queens-style addresses, you could replace - with ; upfront before the rest of the routine.

Contributor Author:
That would produce very wrong results for Hawaiian addresses.

Oh, does Hawaii use Queens-style addressing too?

Contributor Author:
I'm not familiar with Queens-style addressing, but in Hawaii (with the exception of the city of Honolulu), all addresses are prefixed with a 2-digit code indicating which sector of the island you're in. For example, in ʻAiea the code is 99, so you would have addresses in the form 99-12 <street name>. Conceptually, if anyone's tagged semi-colon separated addresses, you could have housenumbers that look like 99-12;99-14. Treating the hyphen/dash as a delimiter would produce either 99-14 or 12-99, both of which would be horribly wrong. Further, on just a standard address of 99-12 Whatever Street, if we treated the dash as a delimiter and sorted it, you'd end up with 12-99, which would indicate a location in a different sector of the island. Hence my concern.
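
A minimal sketch of the failure mode described above, assuming the hyphen were replaced with a semi-colon before splitting. The input `99-12;99-14` is the hypothetical value from the comment, and the query is illustrative only; it is not part of this PR.

```sql
-- Hawaiian-style values where 99 is a sector prefix, not the start of a range
WITH parts AS (
    SELECT string_to_array(replace('99-12;99-14', '-', ';'), ';') AS arr
)
SELECT
    -- Sorting numerically mixes the prefix with the house numbers
    (SELECT MIN(v)::text || '–' || MAX(v)::text
     FROM unnest(arr::int[]) AS v) AS sorted_display,
    -- Taking the first and last element drops half of each address
    arr[1] || '–' || arr[array_length(arr, 1)] AS first_last_display
FROM parts;
-- sorted_display: 12–99, first_last_display: 99–14
```

Neither result resembles either of the tagged addresses.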

Contributor Author:
From what I've read, it sounds like yes, this is the same situation as Queens-style addressing, and I don't think it would be prudent to cause bad address numbers for an entire US state.

What if this routine avoids sorting if something looking like a range appears anywhere in the raw data? Either you could look for a hyphen and call it a day, or you could check whether the hyphen separates values in ascending order.

It’s probably worth starting a broader tagging discussion about resolving this ambiguity. I could see a case for eliminating ranges from the tagging scheme and always enumerating the middle values, as long as more renderers and geocoders introduce behavior like what you’ve implemented. However, the well-established Karlsruhe schema specifically allows for ranges. There are probably also edge cases to consider: what happens if a building is signposted with its start and end numbers but nothing in between? Ironically, you’d have to tag it as not a range in order for it to look like a range in OpenMapTiles.
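
The second option above (checking whether the hyphen separates values in ascending order) could be sketched roughly as follows. The function name `housenumber_looks_like_range` is hypothetical, the 9-digit cap is an assumption added so the `int` cast cannot overflow, and `regexp_match` assumes PostgreSQL 10 or later. It is only a heuristic: a prefixed address whose prefix happens to be smaller than its suffix would still be classified as a range.

```sql
-- Hypothetical helper, not part of this PR
CREATE OR REPLACE FUNCTION housenumber_looks_like_range(raw_value text)
RETURNS boolean AS $$
DECLARE
    parts text[];
BEGIN
    -- Capture the two numeric sides of a single hyphen, if present
    parts := regexp_match(raw_value, '^([0-9]{1,9})-([0-9]{1,9})$');
    IF parts IS NULL THEN
        RETURN false;
    END IF;

    -- Treat the value as a range only when the sides ascend
    RETURN parts[1]::int < parts[2]::int;
END;
$$ LANGUAGE plpgsql;

SELECT housenumber_looks_like_range('10-20');  -- true
SELECT housenumber_looks_like_range('99-12');  -- false
```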

Contributor Author:
At the risk of inserting yet more en_US localization, perhaps the logic can be "if the string contains a dash, convert all semi-colons to commas", which would better serve the Queens/Hawaii problem.

Example: https://www.openstreetmap.org/node/2882804955
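
The rule suggested above ("if the string contains a dash, convert all semi-colons to commas") might look something like this. This is a sketch under stated assumptions, not the code that was merged: the function name `display_housenumber_dash_aware` is hypothetical, and the stricter numeric check with a 9-digit cap is an added assumption so the `int` cast cannot fail.

```sql
-- Hypothetical variant, not part of this PR
CREATE OR REPLACE FUNCTION display_housenumber_dash_aware(raw_housenumber text)
RETURNS text AS $$
DECLARE
    arr text[];
    min_number int;
    max_number int;
BEGIN
    -- A single value needs no consolidation
    IF strpos(raw_housenumber, ';') = 0 THEN
        RETURN raw_housenumber;
    END IF;

    -- A hyphen may be a range or part of a prefixed address (99-12),
    -- so only swap the separators and leave each value untouched
    IF strpos(raw_housenumber, '-') > 0 THEN
        RETURN replace(raw_housenumber, ';', ',');
    END IF;

    arr := string_to_array(raw_housenumber, ';');

    -- Purely numeric list: consolidate to minimum and maximum
    IF raw_housenumber ~ '^[0-9]{1,9}(;[0-9]{1,9})+$' THEN
        SELECT MIN(value), MAX(value) INTO min_number, max_number
        FROM unnest(arr::int[]) AS value;
        RETURN min_number::text || '–' || max_number::text;
    END IF;

    -- Anything else: first and last value as written
    RETURN arr[1] || '–' || arr[array_length(arr, 1)];
END;
$$ LANGUAGE plpgsql;

SELECT display_housenumber_dash_aware('99-12;99-14');  -- 99-12,99-14
SELECT display_housenumber_dash_aware('7;3;5');        -- 3–7
SELECT display_housenumber_dash_aware('12a;12c');      -- 12a–12c
```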

    IF raw_housenumber !~ ';' THEN
        RETURN raw_housenumber;
    END IF;

    IF raw_housenumber ~ '[^0-9;]' THEN
        RETURN display_housenumber_nonnumeric(raw_housenumber);
    END IF;

    -- Find the minimum and maximum numbers in the list
    SELECT MIN(value), MAX(value) INTO min_number, max_number
    FROM unnest(string_to_array(raw_housenumber, ';')::int[]) AS value;

    -- Return the consolidated range string
    RETURN min_number::text || '–' || max_number::text;
END;
$$ LANGUAGE plpgsql;
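
To see what each branch of the two functions above produces, the core expressions can be run on their own. The input literals below are made up for illustration; the expressions are copied from the function bodies in this diff.

```sql
-- Numeric branch: values are cast to int, then min and max are joined
SELECT MIN(value)::text || '–' || MAX(value)::text AS display
FROM unnest(string_to_array('7;3;5', ';')::int[]) AS value;
-- display: 3–7

-- Non-numeric branch: first and last value in tagged order, no sorting
SELECT arr[1] || '–' || arr[array_length(arr, 1)] AS display
FROM (SELECT string_to_array('12c;12a;12b', ';') AS arr) AS t;
-- display: 12c–12b

-- A range followed by another value also takes the non-numeric branch
SELECT arr[1] || '–' || arr[array_length(arr, 1)] AS display
FROM (SELECT string_to_array('1-3;5', ';') AS arr) AS t;
-- display: 1-3–5
```

The last case corresponds to the "range followed by a semicolon" input raised in the review thread: the hyphenated value passes through unchanged and is joined to the final value with an en dash.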
ZeLonewolf marked this conversation as resolved.