Multilingual vector tiles #211

Closed
klokan opened this Issue Apr 7, 2017 · 15 comments

6 participants
Member

klokan commented Apr 7, 2017

We want to add support for several languages to have them directly included in the vector tiles.

In this phase, the priority is the most used languages (maybe 30 or 40?) according to https://taginfo.openstreetmap.org/search?q=name%3A, and especially all European languages: http://publications.europa.eu/code/en/en-5000800.htm. We can add more later (in the next release).

We will use the fact that a key whose value is null is not added to the vector tiles. The attribute for each language will therefore be present in a vector tile only where OSM actually has a name in that language.
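A minimal Python sketch of this behaviour, assuming the OSM tags of a feature arrive as a plain dict. `LANGUAGES` and `language_attributes` are hypothetical names, and the language list is a small illustrative subset, not the set chosen for the release. A `name:<lg>` key only ends up in the attributes when the tag exists; untagged languages are simply absent rather than stored as null.

```python
from typing import Dict, Optional

# Hypothetical subset of language codes, for illustration only.
LANGUAGES = ["cs", "de", "en", "fr", "zh"]


def language_attributes(tags: Dict[str, str]) -> Dict[str, str]:
    """Return the name:<lg> attributes for languages actually tagged in OSM."""
    attrs: Dict[str, Optional[str]] = {
        f"name:{lg}": tags.get(f"name:{lg}") for lg in LANGUAGES
    }
    # Mirror "null values are not encoded": drop keys whose value is missing.
    return {key: value for key, value in attrs.items() if value is not None}


if __name__ == "__main__":
    tags = {"name": "Praha", "name:de": "Prag", "name:en": "Prague"}
    print(language_attributes(tags))  # {'name:de': 'Prag', 'name:en': 'Prague'}
```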

Fallback and the order of preference for the label language must be implemented in the GL style. This should be possible with existential filters (https://www.mapbox.com/mapbox-gl-js/style-spec/#types-filter), but we must test it.
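A minimal sketch of that fallback, written as Python that emits GL style layer JSON: one symbol layer labels features that have the preferred attribute (`has`), the other labels the remaining features (`!has`) with a fallback attribute. The source name `openmaptiles`, the layer ids and the helper `fallback_label_layers` are assumptions for illustration; the `{field}` token in `text-field` is the legacy token syntax of the style specification.

```python
import json
from typing import Any, Dict, List


def fallback_label_layers(
    preferred: str, fallback: str, source_layer: str = "place"
) -> List[Dict[str, Any]]:
    """Two symbol layers: one labels features that have `preferred`,
    the other labels the remaining features with `fallback`."""
    base = {
        "type": "symbol",
        "source": "openmaptiles",  # assumed source name
        "source-layer": source_layer,
    }
    return [
        {
            **base,
            "id": f"{source_layer}-label-{preferred}",
            "filter": ["has", preferred],  # attribute present in the tile
            "layout": {"text-field": "{" + preferred + "}"},
        },
        {
            **base,
            "id": f"{source_layer}-label-fallback",
            "filter": ["!has", preferred],  # attribute absent: use fallback
            "layout": {"text-field": "{" + fallback + "}"},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(fallback_label_layers("name:cs", "name:latin"), indent=2))
```

A longer preference order needs one more layer per step, each excluding the attributes handled by the layers before it.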

With this approach the tiles will contain all languages by default and will be directly usable. There is no need for new logic on the server side (like the JSON decoding required in Wikimedia maps).

If people are interested in minimizing the vector tiles for their maps or for use in mobile apps, we have developed the TileShrink utility, which removes from vector tiles all layers, features and attributes that are not used by a given JSON GL style at different zoom levels.
This utility will work very well on the proposed language attributes too.
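The core of that idea can be sketched in a few lines of Python. This is not TileShrink's code: `referenced_fields` and `shrink_properties` are hypothetical helpers that only look at legacy `{field}` tokens in `text-field` and at keys used in legacy filters, and they ignore zoom ranges, which a real tool would have to respect.

```python
import re
from typing import Any, Dict, Iterable, Set

# Matches {field} tokens in legacy text-field strings, e.g. "{name:latin}".
TOKEN = re.compile(r"\{([^{}]+)\}")


def _filter_keys(flt: Any) -> Set[str]:
    """Collect attribute names from legacy filters such as
    ["has", "name:cs"], ["==", "class", "city"] or ["all", ...]."""
    if not isinstance(flt, list) or not flt:
        return set()
    if flt[0] in ("all", "any", "none"):
        keys: Set[str] = set()
        for sub in flt[1:]:
            keys |= _filter_keys(sub)
        return keys
    if len(flt) >= 2 and isinstance(flt[1], str):
        return {flt[1]}
    return set()


def referenced_fields(layers: Iterable[Dict[str, Any]]) -> Set[str]:
    """Attribute names a style's layers refer to (text-field tokens + filters)."""
    fields: Set[str] = set()
    for layer in layers:
        text_field = layer.get("layout", {}).get("text-field")
        if isinstance(text_field, str):
            fields.update(TOKEN.findall(text_field))
        fields |= _filter_keys(layer.get("filter"))
    return fields


def shrink_properties(props: Dict[str, Any], keep: Set[str]) -> Dict[str, Any]:
    """Drop every feature attribute the style never references."""
    return {key: value for key, value in props.items() if key in keep}
```

With a style that only shows `{name:latin}` and `{name:nonlatin}`, the dozens of `name:<lg>` attributes would all be dropped from the shrunk tiles.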

Let's implement a first sample language set this way and modify a GL style to verify that the whole proposed workflow is correct.

klokan added this to the v3.6 milestone Apr 7, 2017

klokan changed the title from Multi-language support to Multilingual vector tiles Apr 7, 2017

klokan added the featured label Apr 7, 2017

klokan unassigned jirik Apr 7, 2017

go2coffee commented Apr 14, 2017

Hi, is Chinese being considered for inclusion in v3.6 as well? If not, is there an ETA we could use as a reference?

Collaborator

jirik commented Apr 18, 2017

@go2coffee We would like to, but it is not certain yet. I will start working on this issue next week. If you are willing to cover some of the costs of implementing Chinese, please contact us at info@klokantech.com

Collaborator

lukasmartinelli commented Apr 25, 2017

> Priority is the most used languages (maybe 30 or 40?) according to https://taginfo.openstreetmap.org/search?q=name%3A and especially all European languages: http://publications.europa.eu/code/en/en-5000800.htm in this phase. Later (next release) we can add more.

> If people are interested in minimizing the vector tiles for their maps or use on mobile apps, we have developed the TileShrink utility - which removes from vector tiles all layers, features and attributes which are not used by a given JSON GL style at given zoom levels.

This needs to be combined with some shrinking/shaving utility. 30 language fields in the vector tiles will make the place tiles explode in size.

BTW, the website seems to be down: https://openmaptiles.com/tileshrink/

Collaborator

stirringhalo commented Apr 25, 2017

@jirik Looks amazing! I wouldn't worry about the tileshrink utility yet. Check tile sizes with https://github.com/stirringhalo/tile_size to see how bad the situation is.
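One hypothetical way to compare the total tile size of two builds per zoom level without extra tooling, assuming both files follow the MBTiles layout, where tiles are exposed through a `tiles` table or view with `zoom_level` and `tile_data` columns. This is not the linked tile_size tool; `tile_bytes_by_zoom` and `percent_increase` are illustrative names, and sizes are measured as stored (for example, gzipped).

```python
import sqlite3
import sys
from typing import Dict


def tile_bytes_by_zoom(conn: sqlite3.Connection) -> Dict[int, int]:
    """Sum the stored size of tile_data per zoom level."""
    rows = conn.execute(
        "SELECT zoom_level, SUM(LENGTH(tile_data)) FROM tiles "
        "GROUP BY zoom_level ORDER BY zoom_level"
    )
    return {zoom: total for zoom, total in rows}


def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__" and len(sys.argv) == 3:
    old = tile_bytes_by_zoom(sqlite3.connect(sys.argv[1]))
    new = tile_bytes_by_zoom(sqlite3.connect(sys.argv[2]))
    for zoom in sorted(old.keys() & new.keys()):
        print(f"z{zoom}: {percent_increase(old[zoom], new[zoom]):+.1f}%")
    print(f"total: {percent_increase(sum(old.values()), sum(new.values())):+.1f}%")
```

Run it as, for example, `python compare_mbtiles.py before.mbtiles after.mbtiles` (hypothetical file names).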

Collaborator

jirik commented May 11, 2017

First tests with about 50 languages show an increase of about 2% in .mbtiles size (Belgium, Z0-12), which seems acceptable.

Example of both Latin and non-Latin names on one map. The first row is name:cs if available, otherwise name:latin. The second row is name:nonlatin.
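The label logic described here, evaluated in Python on a single feature's attributes, roughly mirrors what the style does at render time. `label_lines` is a hypothetical helper, and treating empty strings as missing is an assumption.

```python
from typing import Dict, Optional


def label_lines(props: Dict[str, str], lang: str = "cs") -> str:
    """First row: name:<lang>, else name:latin. Second row: name:nonlatin."""
    first: Optional[str] = props.get(f"name:{lang}") or props.get("name:latin")
    second: Optional[str] = props.get("name:nonlatin")
    # Rows whose attribute is missing are simply left out.
    return "\n".join(line for line in (first, second) if line)


if __name__ == "__main__":
    print(label_lines({"name:cs": "Moskva", "name:latin": "Moscow", "name:nonlatin": "Москва"}))
```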

[Screenshot: screenshot_20170511_130116]

Collaborator

sfkeller commented May 11, 2017

I'm looking at your nice work from an OSMNames perspective as well as a more generic one. In OSMNames, at least, we're going to add translations from Wikidata sooner or later, so in general I'd expect a larger increase.

BTW, slightly related to this issue: are you aware that the buffer for labels/names outside tiles seems to be larger than the tile itself (buffer ≈ 1.5 × tile_width)?

Collaborator

lukasmartinelli commented May 11, 2017

> First tests with about 50 languages show an increase of about 2% in .mbtiles size (Belgium, Z0-12), which seems acceptable.

It depends on whether you actually have 50 distinct values for these translations, or whether all features share the same latin/nonlatin fallback value.
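Why shared values matter comes from the Mapbox Vector Tile encoding: each layer has `keys` and `values` tables, and a feature's attributes are pairs of indices into them, so encoders typically store each distinct key and value once per layer. A simplified Python sketch of that interning step (string values only, no protobuf encoding; `encode_layer_tags` is a hypothetical helper):

```python
from typing import Dict, List, Tuple


def encode_layer_tags(
    features: List[Dict[str, str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Intern keys and values per layer; each feature keeps index pairs."""
    keys: List[str] = []
    values: List[str] = []
    key_index: Dict[str, int] = {}
    value_index: Dict[str, int] = {}
    tags: List[List[int]] = []
    for props in features:
        feature_tags: List[int] = []
        for key, value in props.items():
            if key not in key_index:
                key_index[key] = len(keys)
                keys.append(key)
            if value not in value_index:
                value_index[value] = len(values)
                values.append(value)
            # (key index, value index) pair, as in the MVT "tags" field.
            feature_tags += [key_index[key], value_index[value]]
        tags.append(feature_tags)
    return keys, values, tags


if __name__ == "__main__":
    features = [
        {"name": "Brussel", "name:nl": "Brussel", "name:fr": "Bruxelles"},
        {"name": "Gent", "name:nl": "Gent"},
    ]
    print(encode_layer_tags(features))
```

A translation identical to `name` adds only an index pair to the feature, while a genuinely distinct translation also adds a new entry to the layer's `values` table.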

Collaborator

lukasmartinelli commented May 12, 2017

> BTW, slightly related to this issue: are you aware that the buffer for labels/names outside tiles seems to be larger than the tile itself (buffer ≈ 1.5 × tile_width)?

Place labels need a very large buffer since they are the most prone to being cut off. We have tried smaller and larger values before; 128px seems to be a good balance.

Collaborator

sfkeller commented May 12, 2017

[Screenshot: omt_pfaffikon_z12_5_m100000]
At least for the place layer, the buffer has been increased to 256 (https://github.com/openmaptiles/openmaptiles/blob/master/layers/place/place.yaml). Looking at the labels, the buffer seems more like 384...
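To relate these pixel values to tile widths, here is a small arithmetic helper. It assumes the buffer is given in pixels of a 256px tile and uses the common vector tile extent of 4096 units; neither assumption is confirmed in this thread, and the helper names are hypothetical. Under those assumptions a 384px buffer is 1.5 tile widths on each side.

```python
TILE_SIZE_PX = 256  # assumed rendered tile size in pixels
EXTENT = 4096       # assumed vector tile extent (coordinate units per tile side)


def buffer_in_tile_widths(buffer_px: float, tile_size_px: int = TILE_SIZE_PX) -> float:
    """Buffer on one side of the tile, expressed in tile widths."""
    return buffer_px / tile_size_px


def buffer_in_extent_units(
    buffer_px: float, tile_size_px: int = TILE_SIZE_PX, extent: int = EXTENT
) -> float:
    """Buffer on one side of the tile, in tile coordinate units."""
    return buffer_px * extent / tile_size_px


def covered_width_in_tiles(buffer_px: float, tile_size_px: int = TILE_SIZE_PX) -> float:
    """Total width of data encoded in one tile: the tile plus both side buffers."""
    return (tile_size_px + 2 * buffer_px) / tile_size_px


if __name__ == "__main__":
    for px in (128, 256, 384):
        print(px, buffer_in_tile_widths(px), buffer_in_extent_units(px), covered_width_in_tiles(px))
```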

Collaborator

jirik commented May 16, 2017

We increased the buffers during v3.5 rendering to minimize cut-off labels at raster tile borders. I do not remember the numbers, but it did not increase the size of the final MBTiles dramatically.

Most places do not have 50 distinct values for name. Furthermore, the fallback for language attributes like name:cs lives in the style, not in the .mbtiles: if there is no name:cs tag, there is no name:cs attribute in the MBTiles. I will add documentation when this is finalized.

@sfkeller Adding translations from Wikidata actually sounds really good. It will mean larger MBTiles; the question is how big the difference will be. Do you have an idea when this is going to happen?

sfkeller commented May 16, 2017

> Do you have an idea when this is going to happen?

Joining Wikidata translations to OSMNames? I don't know. I don't expect it this year anymore.

jirik commented May 16, 2017

Some notes about languages. This change introduces:

  • names in 57 languages, covering European languages plus major world languages
  • name:latin and name:nonlatin attributes
  • name_int attribute

Language attributes come directly from the name:<lg> OSM tags (for example name:cs or name:en). For these languages there is no fallback in the data to a name in another language, even if one is available. In other words, if there is no name:cs tag, there is no name:cs attribute. However, you can create a fallback to another name in the GL style.

The name:latin attribute comes from name:en, int_name, or name if it contains at least one [a-z] character. The name:nonlatin attribute comes from name if it does not contain any [a-z] character.
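A literal Python reading of these two rules. The comment does not say whether the [a-z] check applies to every candidate or only to `name`; this sketch assumes it applies to each of `name:en`, `int_name` and `name` in turn, and it uses the lowercase ASCII class exactly as written. `is_latin`, `name_latin` and `name_nonlatin` are hypothetical helpers, not the actual OpenMapTiles implementation.

```python
import re
from typing import Dict, Optional

# Literal reading of "[a-z] character": lowercase ASCII letters only.
LATIN = re.compile(r"[a-z]")


def is_latin(text: Optional[str]) -> bool:
    """True if the text contains at least one [a-z] character."""
    return bool(text) and LATIN.search(text) is not None


def name_latin(tags: Dict[str, str]) -> Optional[str]:
    """First of name:en, int_name, name that passes the [a-z] check."""
    for key in ("name:en", "int_name", "name"):
        value = tags.get(key)
        if is_latin(value):
            return value
    return None  # attribute left out of the tile


def name_nonlatin(tags: Dict[str, str]) -> Optional[str]:
    """name, but only if it contains no [a-z] character."""
    name = tags.get("name")
    if name and not is_latin(name):
        return name
    return None
```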

The name_int attribute comes from int_name, name:en, or name.
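Likewise, a minimal sketch of the name_int rule, assuming "comes from int_name, name:en, or name" means the first non-empty value in that order (`name_int` here is a hypothetical helper):

```python
from typing import Dict, Optional


def name_int(tags: Dict[str, str]) -> Optional[str]:
    """First non-empty value of int_name, name:en, name (in that order)."""
    for key in ("int_name", "name:en", "name"):
        value = tags.get(key)
        if value:
            return value
    return None
```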

The name_int attribute is expected to be filled in most cases. On the other hand, the 57 language attributes plus name:latin and name:nonlatin are expected to be missing in many cases, so if you use them, you should create an appropriate fallback in the GL style. As fallback values you can consider name, name_int, or name_en (not to be confused with name:en).
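As a sketch of such a fallback chain: newer versions of the Mapbox GL style specification offer a `coalesce` expression that returns the first non-null value, so a single `text-field` can fall back from a language attribute to name_int and name. This is an alternative to the existential-filter approach proposed at the top of the issue; the helper name, layer id and source name below are assumptions.

```python
import json
from typing import Any, List


def coalesce_text_field(*attributes: str) -> List[Any]:
    """Build ["coalesce", ["get", a1], ["get", a2], ...]: the first attribute
    present on the feature is used as the label."""
    return ["coalesce", *(["get", attr] for attr in attributes)]


if __name__ == "__main__":
    layer = {
        "id": "place-label",          # hypothetical layer id
        "type": "symbol",
        "source": "openmaptiles",     # assumed source name
        "source-layer": "place",
        "layout": {
            "text-field": coalesce_text_field("name:cs", "name:latin", "name_int", "name")
        },
    }
    print(json.dumps(layer, indent=2))
```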

lukasmartinelli commented May 16, 2017

> Language attributes come directly from the name:<lg> OSM tags (for example name:cs or name:en). For these languages there is no fallback in the data to a name in another language, even if one is available. In other words, if there is no name:cs tag, there is no name:cs attribute. However, you can create a fallback to another name in the GL style.

Just saying: to get good enough translation coverage you really want the labels from Wikidata, since OSM is much more hesitant to add translations when there is no ground-truth evidence for them.

jirik commented May 16, 2017

@sfkeller @lukasmartinelli Thanks for mentioning Wikidata. Just created separate ticket: #251
