Support language dictionary-based line breaking #7362
Comments
It sounds like, in the absence of something like mapbox/DEPRECATED-mapbox-gl#21, you’re looking for a universal line breaking facility that is nonetheless locale-aware. Unfortunately, line breaking, like hyphenation, varies by language rather than by script.
Apparently |
It occurs to me we could also approach this on the data side instead of the client side. We could run a line breaking dictionary against all of our labels and insert zero-width spaces (or maybe some other code point; we'd have to make sure not to mess up shaping) at all potential line breaks. For ideographic text, we could still allow the current breaking behavior, but we'd give it a slight penalty so as to favor word-aligned breaks. We'd have to give customers who provide their own data a tool for inserting the same potential-break metadata, but on the other hand, the solution would be portable across gl-js and gl-native without requiring any dictionary downloads.
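A minimal sketch of what that data-side step could look like, assuming the break offsets have already been produced by some dictionary lookup. `BreakMarkers` and `insertBreakMarkers` are hypothetical names for illustration, not part of any existing Mapbox tool, and the offsets are UTF-16 indices into the label:

```java
import java.util.Arrays;

/** Hypothetical data-side helper; all names here are illustrative only. */
final class BreakMarkers {
    static final char ZWSP = '\u200B'; // ZERO WIDTH SPACE

    /**
     * Returns a copy of label with a ZWSP inserted at each interior offset.
     * Offsets at the start or end of the label, and duplicates, are ignored.
     */
    static String insertBreakMarkers(String label, int[] breakOffsets) {
        int[] sorted = breakOffsets.clone();
        Arrays.sort(sorted);
        StringBuilder out = new StringBuilder(label.length() + sorted.length);
        int prev = 0;
        for (int offset : sorted) {
            // Skip edge offsets and repeats so we never emit two markers in a row.
            if (offset <= prev || offset >= label.length()) continue;
            out.append(label, prev, offset).append(ZWSP);
            prev = offset;
        }
        out.append(label, prev, label.length());
        return out.toString();
    }
}
```

The client would then only need to treat ZWSP as a break opportunity; whether inserting it disturbs shaping for a given script is still the open question mentioned above.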
@ChrisLoer A change that involves encoding the labels like this should be part of the Mapbox Vector Tile spec. For example, geometry commands are bitwise encoded and coordinates are zigzag encoded, but all of this is clearly documented in the vector tile spec (and also implemented in the Mapnik vector tile writer, among other writers available on GitHub).
The soft hyphen was designed for this purpose, and in fact it’s fairly common in Thai and Khmer text. We implemented support for breaking at soft hyphens in #2598. I think it would be perfectly reasonable for a source such as Mapbox Streets to insert soft hyphens into names written in non-space-delimited languages. This would lessen the need for a “thin” client like GL JS to bundle a Thai or Khmer word list. I would suspect (but am uncertain) that separating multisyllabic and compound words using soft hyphens would also be acceptable in Japanese and Chinese text, respectively. On the other hand, we’ll have to decide whether it makes sense to consider the soft hyphens when filtering features or returning feature querying results. mapbox/mapbox-gl-style-spec#548 could allow the style author to choose explicitly.
The vector tile specification shouldn’t need to concern itself with the use of soft hyphens. After all, it doesn’t even specify a character set.
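If we did decide to ignore break hints when filtering or returning query results, the normalization itself would be small. This is only a sketch under that assumption; `LabelNormalizer` and `stripBreakHints` are hypothetical names, and it strips both the soft hyphen (U+00AD) and the zero-width space (U+200B):

```java
/** Hypothetical helper for comparing labels without their break hints. */
final class LabelNormalizer {
    static final char SOFT_HYPHEN = '\u00AD';
    static final char ZWSP = '\u200B';

    /** Returns label with every soft hyphen and zero-width space removed. */
    static String stripBreakHints(String label) {
        StringBuilder out = new StringBuilder(label.length());
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c != SOFT_HYPHEN && c != ZWSP) out.append(c);
        }
        return out.toString();
    }
}
```

A filter could compare `stripBreakHints(name)` against the user's query, while rendering keeps the original string with its hints intact.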
Actually, a soft hyphen isn’t quite what we want, since it’s rendered as a hyphen when taken into account. ZWSP sounds good in that case.
We don't currently support line breaking at all for languages like Khmer, Lao, or Thai that don't use spaces. We support line breaking for the CJK languages by breaking in between characters whenever we need to make a break, but this is sub-optimal if it breaks in the middle of a word made out of multiple characters.
We can do better by including line breaking dictionaries that tell us where we can best break apart words in these languages. The challenge is that these dictionaries are large (the ICU line breaking dictionaries take up a few megabytes), so we really don't want to pull them in as a dependency.
With gl-native, however, I think we can avoid pulling in the dependency by using the line breaking support built into our host platforms. For Android, we can use BreakIterator. For iOS, we can use NSAttributedString::lineBreak. For Qt, we can use QTextBoundaryFinder.
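As a rough illustration of the Android case, here is a sketch using the standard `java.text.BreakIterator` line instance to collect candidate break offsets. `PlatformBreaks` and `lineBreakOffsets` are hypothetical names, and whether a given runtime actually applies dictionary-based breaking for Thai, Lao, or Khmer depends on the platform's break data, so that part is an assumption to verify per platform:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical wrapper around the host platform's line breaker. */
final class PlatformBreaks {
    /** Returns interior UTF-16 offsets where the platform allows a line break. */
    static List<Integer> lineBreakOffsets(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            // The iterator also reports the start and end of the text; keep only interior breaks.
            if (b > 0 && b < text.length()) offsets.add(b);
        }
        return offsets;
    }
}
```

The layout code could treat these offsets the same way it treats spaces today, falling back to the current behavior when the platform returns no interior breaks.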
cc @1ec5 @nickidlugash @ansis