Allow unexpected char for names in NSI #1830

Closed
Famlam opened this issue Apr 26, 2023 · 5 comments · Fixed by #1877
@Famlam
Collaborator

Famlam commented Apr 26, 2023

Plugin Name_Script (item 5070, class 50701) reports when a name contains a character that's not common in the charset of the country. For example, in the case of n5782442218, I get:

"name"="Søstrene Grene" unexpected character "ø" (LATIN SMALL LETTER O WITH STROKE, 0x00f8)

(I marked it as false positive)
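To illustrate the kind of report shown above, here is a minimal sketch of a character check. It is not the actual Name_Script implementation: the `ALLOWED` set and the `unexpected_chars` function are hypothetical, and the real plugin determines the expected characters per country in its own way.

```python
import unicodedata

# Hypothetical allowed set, for illustration only (an assumption, not the
# plugin's real per-country configuration).
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '-éèëïöü")


def unexpected_chars(name, allowed=ALLOWED):
    """Return (char, Unicode name, hex code point) for each char not in allowed."""
    found = []
    for ch in name:
        if ch not in allowed:
            # unicodedata.name falls back to "UNKNOWN" for unnamed code points.
            found.append((ch, unicodedata.name(ch, "UNKNOWN"), "0x%04x" % ord(ch)))
    return found


print(unexpected_chars("Søstrene Grene"))
```

With this assumed set, only the "ø" is flagged, which matches the character named in the report.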

This shop however is also in the NameSuggestionIndex with the same spelling.
It would be good to whitelist names that are also in the NSI. We already do this for uppercase names:

def _download_nsi(self):
    nsi_url = "https://raw.githubusercontent.com/osmlab/name-suggestion-index/main/dist/nsi.json"
    json_str = urlread(nsi_url, 30)
    results = json.loads(json_str)
    return results['nsi']

def _whitelist_from_nsi(self, nsi, nsiprefix, country):
    whitelist = set()
    for tag, details in nsi.items():
        if tag.startswith(nsiprefix) and "items" in details:
            for preset in details["items"]:
                if "locationSet" in preset:
                    if ("include" in preset["locationSet"] and
                            country not in preset["locationSet"]["include"] and
                            "001" not in preset["locationSet"]["include"]):
                        continue
                    if "exclude" in preset["locationSet"] and country in preset["locationSet"]["exclude"]:
                        continue
                if "name" in preset["tags"]:
                    for name in preset["tags"]["name"].split():
                        if self.UpperTitleCase.match(name) and not self.RomanNumber.match(name):
                            whitelist.add(name)
                for name in preset["displayName"].split():
                    if self.UpperTitleCase.match(name) and not self.RomanNumber.match(name):
                        whitelist.add(name)
    return whitelist

Since the code to do so would be (nearly) a duplicate, maybe it's an idea to move these NSI-parsing lines to a separate modules file, so that we can just call the same functions?
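As a rough sketch of what such a shared helper might look like (the names `applies_to_country` and `nsi_names` are hypothetical, and this is not necessarily the shape of the code that was eventually merged), the country filtering and name extraction could be separated from the plugin-specific filtering:

```python
def applies_to_country(preset, country):
    """True if the preset's locationSet covers the country ('001' means world)."""
    location = preset.get("locationSet", {})
    include = location.get("include")
    if include is not None and country not in include and "001" not in include:
        return False
    if country in location.get("exclude", []):
        return False
    return True


def nsi_names(nsi, nsiprefix, country):
    """Yield the full name and displayName of each matching NSI preset."""
    for tag, details in nsi.items():
        if not tag.startswith(nsiprefix) or "items" not in details:
            continue
        for preset in details["items"]:
            if not applies_to_country(preset, country):
                continue
            # Unlike the snippet above, missing keys are tolerated here.
            if "name" in preset.get("tags", {}):
                yield preset["tags"]["name"]
            if "displayName" in preset:
                yield preset["displayName"]


# Made-up sample data in the same nested shape as the NSI snippet above.
sample = {"brands/shop/houseware": {"items": [
    {"displayName": "Søstrene Grene",
     "locationSet": {"include": ["001"]},
     "tags": {"name": "Søstrene Grene"}},
    {"displayName": "Elsewhere Only",
     "locationSet": {"include": ["us"]},
     "tags": {"name": "Elsewhere Only"}},
]}}
print(set(nsi_names(sample, "brands/", "nl")))
```

Under this assumed design, the uppercase check would split each yielded name into words and apply its own regex filters, while Name_Script could use the yielded names as they are.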

@frodrigo
Member

Deployed.

@Famlam
Collaborator Author

Famlam commented May 24, 2023

I suspect that updating the plugin version doesn't clear the cached external resource (here: the NSI), correct?
Otherwise something is wrong: Søstrene Grene still appears at https://osmose.openstreetmap.fr/nl/issue/bc938e46-cf3f-c2aa-6c86-13cbc4d448da

@frodrigo
Member

Yes, cache and code update are not related.

Famlam removed the ready label Jun 1, 2023
@Famlam
Collaborator Author

Famlam commented Jun 1, 2023

Ok, found the bug. The whitelist code that I took from Name_Uppercase splits the names by spaces, but I compare against the full name, so it only worked for names without spaces.
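The mismatch can be reproduced in a few lines. This is an illustrative reconstruction with made-up variable names, not the plugin code itself:

```python
nsi_name = "Søstrene Grene"

# Whitelist built word by word, as the code copied from the uppercase check does.
word_whitelist = set(nsi_name.split())

# Whitelist holding the complete name, which is what a full-name lookup needs.
full_whitelist = {nsi_name}

print(nsi_name in word_whitelist)     # False: only "Søstrene" and "Grene" are stored
print(nsi_name in full_whitelist)     # True
print("Lidl" in set("Lidl".split()))  # True: single-word names happened to work
```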

@Famlam
Collaborator Author

Famlam commented Jun 5, 2023

Now that the issue is fixed properly, the error is gone ;)
