Multilingual Street names in Instructions depending on the 'Locale' of a query #259

Open
stephanosch opened this issue Sep 10, 2014 · 15 comments

Comments

@stephanosch

When requesting a route in Greek, is it possible to use the Greek names of the roads from OSM? More generally, could multiple names be supported depending on the 'locale'?

For example, in OSM we have:

<way id='71' visible='true' version='1'>
    …
    <tag k='name' v='Arodafnis' />
    <tag k='name:el' v='Αροδαφνης' />
</way>
@karussell
Member

Thanks! Here is the context of the discussion on the mailing list.

@karussell karussell changed the title Multilingual Street names in Instructions depending on the locale of query Multilingual Street names in Instructions depending on the 'Locale' of a query Oct 28, 2014
@devemux86
Contributor

For now, we can build multiple graphs, one per language.
Based on Peter's instructions, they can easily be produced by changing the name tag in the following line:
https://github.com/graphhopper/graphhopper/blob/master/core/src/main/java/com/graphhopper/routing/util/EncodingManager.java#L441

@devemux86
Contributor

Peter, how do you think we should handle this?

In Mapsforge we have the preferred-language parameter, which defines the primary language to use.

Should we add a similar parameter here too?

@devemux86
Contributor

I mean, for a start, let the user select a language to use during graph creation.

At a later stage we could think about multilingual graphs.
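
A single preferred language at graph creation could be sketched roughly as follows. This is only an illustration: `PreferredNameSelector` and `selectName` are hypothetical names, not GraphHopper or Mapsforge API, and the tags of a way are assumed to be available as a plain `Map`.

```java
import java.util.Map;

// Hypothetical helper, not part of GraphHopper: picks the street name for one
// preferred language from the OSM tags of a way.
final class PreferredNameSelector {

    static String selectName(Map<String, String> tags, String preferredLanguage) {
        if (preferredLanguage != null && !preferredLanguage.isEmpty()) {
            // Look for the localized tag first, e.g. "name:el".
            String localized = tags.get("name:" + preferredLanguage);
            if (localized != null && !localized.isEmpty()) {
                return localized;
            }
        }
        // Fall back to the untagged default name, or an empty string.
        return tags.getOrDefault("name", "");
    }
}
```

With the way from the first comment, `selectName(tags, "el")` would return the `name:el` value, while a language without a matching tag falls back to the plain `name`.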

@karussell
Member

I mean, for a start, let the user select a language to use during graph creation.

Yes, this should be very easy to do

At a later stage we could think about multilingual graphs.

This should also be doable once we introduce a new getName(locale) method, but the storage will be a bit more complex. Maybe we let the user decide up front which languages to parse, and then create a reference array as well as the nameindex for every language. This would avoid ending up with many languages that have only a few used values (e.g. when creating a new reference array for every newly encountered language). We will see and should think about it.
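
To make the "languages decided up front" idea more concrete, here is a minimal in-memory sketch. It is not how GraphHopper stores names: `MultiLanguageNameStore` is a hypothetical class, and plain lists stand in for the reference array and name index mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one name column per language chosen at import time.
final class MultiLanguageNameStore {
    private static final String DEFAULT = ""; // key of the untagged "name" column
    private final Map<String, List<String>> namesByLanguage = new LinkedHashMap<>();
    private int edgeCount = 0;

    MultiLanguageNameStore(List<String> languages) {
        namesByLanguage.put(DEFAULT, new ArrayList<>());
        for (String language : languages) {
            namesByLanguage.putIfAbsent(language, new ArrayList<>());
        }
    }

    // Stores the names of one edge; languages not configured up front are ignored.
    int addEdge(Map<String, String> tags) {
        for (Map.Entry<String, List<String>> column : namesByLanguage.entrySet()) {
            String key = column.getKey().equals(DEFAULT) ? "name" : "name:" + column.getKey();
            column.getValue().add(tags.get(key)); // null when the tag is absent
        }
        return edgeCount++;
    }

    // Returns the name in the requested language, else the default name, else "".
    String getName(int edgeId, String language) {
        List<String> column = namesByLanguage.get(language);
        String name = column == null ? null : column.get(edgeId);
        if (name == null) {
            name = namesByLanguage.get(DEFAULT).get(edgeId);
        }
        return name == null ? "" : name;
    }
}
```

Because the set of columns is fixed in the constructor, a rarely used language encountered during import never creates a new, mostly empty column.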

@karussell
Member

The preferred language is now changeable - thanks @devemux86 - but I will keep this open until multiple street names can be stored and retrieved.

@karussell karussell reopened this Sep 21, 2015
@devemux86
Contributor

@karussell
In a next, more advanced step, we could use a language regex pattern (see here) to match the OSM name.
That could work if we iterate through the OSMElement tags (currently protected).

@karussell
Member

Not sure what you mean. Even when we support multiple languages for import and retrieval, we will probably never be able to support an arbitrary number of languages.

@devemux86
Contributor

Right now we try to retrieve the preferred language exactly as specified by the user.
We could broaden the matching by ignoring case, accepting OSM name tags written with - or _, etc.
E.g. en, en-US, en_US, EN-us could all be combined under the same name.
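
A lenient lookup along these lines might look like the sketch below. The regex is an assumption made for illustration and is not the Mapsforge pattern; `LenientNameLookup` and `findName` are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: treats name:en, name:en-US, name:en_US and name:EN-us
// as candidates for the language "en".
final class LenientNameLookup {
    // "name:<lang>" with an optional region after '-' or '_', case-insensitive.
    private static final Pattern NAME_KEY =
            Pattern.compile("name:([a-z]{2,3})(?:[-_][a-z]{2,4})?", Pattern.CASE_INSENSITIVE);

    static String findName(Map<String, String> tags, String language) {
        String wanted = language.toLowerCase(Locale.ROOT);
        // Prefer the exact tag, e.g. "name:en".
        String exact = tags.get("name:" + wanted);
        if (exact != null) {
            return exact;
        }
        // Otherwise accept the first variant found (map iteration order).
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            Matcher m = NAME_KEY.matcher(tag.getKey());
            if (m.matches() && m.group(1).toLowerCase(Locale.ROOT).equals(wanted)) {
                return tag.getValue();
            }
        }
        // Fall back to the default name; may be null if the way has no name.
        return tags.get("name");
    }
}
```

If several variants exist for the same language, which one wins depends on the iteration order of the map, so a real implementation would need an explicit priority rule.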

@karussell
Member

Oh, I hadn't thought about these - thanks for pointing this out! (How frequent are these tags?)

And yes, that would be a useful and simple addition before the real multi-language support.

@devemux86
Contributor

We have had a long discussion in Mapsforge about the language regex pattern [1] and multilingual names [2].

E.g. some findings for name:en can be seen here.

@devemux86
Contributor

I propose to follow our Mapsforge implementation of multilingual Maps and POI.

  • We introduce a plural preferred-languages option accepting comma-separated language codes (ISO 639-1 or ISO 639-2).
  • We store a concatenation of the found OSM edge names, using a \r delimiter between names and a \b delimiter between each language and its name.
  • First comes the base name, then the localized ones (if different), e.g.: Base\ren\bEnglish\rjp\bJapan.
  • If preferred-languages is not specified, only the default language with no tag will be written to the file. If only one language is specified, it will be written if its tag is found, otherwise the default language will be written. If multiple comma-separated languages are specified, the default language will be written, followed by the specified languages (if present and if different from the default).
  • Then in the reader we split the stored edge name according to the requested language, with fallback mechanisms.

It's a solution we implemented in Mapsforge, and it is well tested with OSM data.
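
The storage format described in the list above could be sketched like this. The code follows the description in this comment only and is not copied from Mapsforge; `MultiLangName`, `encode` and `decode` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "Base\ren\bEnglish\rjp\bJapan" format.
final class MultiLangName {

    // Writer side: builds the stored string from the OSM tags of one edge.
    static String encode(Map<String, String> tags, List<String> preferredLanguages) {
        String base = tags.getOrDefault("name", "");
        if (preferredLanguages.isEmpty()) {
            return base; // only the default name is written
        }
        if (preferredLanguages.size() == 1) {
            // Single language: write it if its tag is found, otherwise the default.
            String localized = tags.get("name:" + preferredLanguages.get(0));
            return localized != null ? localized : base;
        }
        StringBuilder stored = new StringBuilder(base);
        for (String language : preferredLanguages) {
            String localized = tags.get("name:" + language);
            if (localized != null && !localized.equals(base)) {
                stored.append('\r').append(language).append('\b').append(localized);
            }
        }
        return stored.toString();
    }

    // Reader side: extracts the name for a language, falling back to the base name.
    static String decode(String stored, String language) {
        String[] parts = stored.split("\r");
        if (language != null) {
            for (int i = 1; i < parts.length; i++) {
                int sep = parts[i].indexOf('\b');
                if (sep > 0 && parts[i].substring(0, sep).equalsIgnoreCase(language)) {
                    return parts[i].substring(sep + 1);
                }
            }
        }
        return parts[0];
    }
}
```

Since the base name always comes first, a reader that knows nothing about languages can still take everything before the first \r as the street name.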

@karussell
Member

This sounds good to me. What would a new getName method look like? Should we introduce getName(String) or getName(Locale)? Or even allow for more efficient storage and use getName(int/short), where a localeToIndex method somewhere converts the Locale into a short or int value up front? Of course, no efficient storage format needs to be implemented at the moment; I am just thinking about an okayish API for this.

I would also like users to be able to merge e.g. de_DE and de_CH into one, or keep them separate when explicitly specified.
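
One possible shape for the getName(int) variant with an up-front localeToIndex conversion is sketched below. All names here (`LocaleIndex`, `register`, the static `getName` helper) are assumptions for illustration, and the sketch keys on the language only, so de_DE and de_CH share one index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps locales to small int indices once, so that name
// lookups compare ints instead of strings.
final class LocaleIndex {
    private final Map<String, Integer> indexByLanguage = new HashMap<>();

    // Called at import time for every language that should be stored.
    int register(Locale locale) {
        String key = locale.getLanguage();
        Integer existing = indexByLanguage.get(key);
        if (existing != null) {
            return existing;
        }
        int index = indexByLanguage.size() + 1; // 0 is reserved for the default name
        indexByLanguage.put(key, index);
        return index;
    }

    // Converts a Locale up front; unknown languages map to 0, the default name.
    int localeToIndex(Locale locale) {
        return indexByLanguage.getOrDefault(locale.getLanguage(), 0);
    }

    // namesOfEdge[0] is the default name, namesOfEdge[i] the name for index i.
    static String getName(String[] namesOfEdge, int localeIndex) {
        String name = localeIndex < namesOfEdge.length ? namesOfEdge[localeIndex] : null;
        return name != null ? name : namesOfEdge[0];
    }
}
```

A caller would convert the request locale once per query and then pass the resulting index to every name lookup along the route.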

@devemux86
Contributor

What would a new getName method look like?

The parsing implementation can be seen here (we use String).

I would also like users to be able to merge e.g. de_DE and de_CH into one, or keep them separate when explicitly specified.

During graph creation? That needs some thought about the API; it can probably be added afterwards.

@karussell
Member

The parsing implementation can be seen here (we use String).

OK. Still, storing should be done in a consistent way so that the input only needs to be converted once. E.g. store all keys with toLowerCase and localeStr = localeStr.replace("-", "_") (de_de, en_us, ...); then we can just use equals and don't need that many if clauses.

Another possibility I can think of is a special GHLocaleClass which could hold a short value used for the comparison instead, making storage slightly more compact and retrieval slightly faster / cleaner.
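
The normalization suggested here is small enough to show in full. `LocaleKeys` is a hypothetical holder class; the only addition beyond the text is `trim()` and the use of `Locale.ROOT`, which keeps `toLowerCase` independent of the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper: converts user input and OSM language suffixes into one
// canonical key form (de_de, en_us, ...) so that equals() is sufficient.
final class LocaleKeys {

    static String normalize(String localeStr) {
        return localeStr.trim().toLowerCase(Locale.ROOT).replace("-", "_");
    }
}
```

Both the keys written at import time and the locale of an incoming request would go through `normalize` once, after which a plain map lookup or `equals` comparison is enough.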

During graph creation? That needs some thought about the API; it can probably be added afterwards.

Yes. Merging would happen on graph creation. We need to ensure (e.g. via simple unit tests) that these language-specific codes and languages with only three-letter ISO codes work too.
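
Merging at graph creation, with an opt-out for explicitly listed locales, might be sketched as below. `LocaleMerger` and `storageKey` are hypothetical names, and the rule "drop the region unless the locale was listed explicitly" is one possible reading of the requirement, not a settled design.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decides under which key a localized name is stored.
final class LocaleMerger {
    private final Set<String> keepSeparate = new HashSet<>();

    // explicitlySeparate: locales the user wants to keep apart, e.g. "de_CH".
    LocaleMerger(Collection<String> explicitlySeparate) {
        for (String localeStr : explicitlySeparate) {
            keepSeparate.add(normalize(localeStr));
        }
    }

    static String normalize(String localeStr) {
        return localeStr.trim().toLowerCase(Locale.ROOT).replace('-', '_');
    }

    // Region-specific locales are merged into their language unless listed explicitly.
    String storageKey(String localeStr) {
        String key = normalize(localeStr);
        if (keepSeparate.contains(key)) {
            return key;
        }
        int sep = key.indexOf('_');
        return sep < 0 ? key : key.substring(0, sep);
    }
}
```

Simple unit tests for this would cover at least de_DE and de-CH merging into de, de_CH staying separate when listed, and a three-letter code such as fil passing through unchanged.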
