Base font glyph + uncommon glyph system #17
Comments
That's what I thought too initially. Upon further examination, this turns out not to be such a good idea, in particular for Asian languages: Mandarin has around 20k glyphs, and a typical tile uses ~50-200 of them. They seem to be mostly random; there is no concentration in a few Unicode ranges. This means we'd have to download most of the font anyway (I measured ~5 MB to download all required font ranges for Mandarin). |
(╯°□°)╯︵ ┻━┻ |
I'd like to get a better feel for the requirements here. Next actions for me are to set up a demo that "simulates" download requirements by inspecting the loaded vector tiles as you pan around a map. The idea is to make glyph block sharding variable and see if there is a sweet spot in shard size that optimizes the number of requests plus the amount of data to download. |
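A minimal sketch of what such a simulation could look like, assuming each loaded tile is reduced to the set of code points its labels need and that glyphs are sharded into fixed-size contiguous ranges. The function name, the sample tiles, the shard sizes and the per-glyph byte cost are all hypothetical, and it assumes every code point in a shard has a glyph, which overestimates sparse ranges.

```python
def simulate(tiles, shard_size, bytes_per_glyph):
    """Return (requests, bytes) needed to cover every code point in `tiles`.

    tiles: iterable of sets of code points, one set per loaded vector tile.
    A shard is the contiguous range [k * shard_size, (k + 1) * shard_size).
    """
    shards = set()
    for codepoints in tiles:
        for cp in codepoints:
            shards.add(cp // shard_size)
    # Each distinct shard costs one request and is downloaded whole.
    return len(shards), len(shards) * shard_size * bytes_per_glyph


# Hypothetical label text standing in for real tile contents.
tiles = [
    {ord(c) for c in "Berlin Mitte"},
    {ord(c) for c in "\u6771\u4eac\u90fd \u6e0b\u8c37\u533a"},
]

for shard_size in (64, 256, 1024):
    requests, size = simulate(tiles, shard_size, bytes_per_glyph=500)
    print(shard_size, requests, size)
```

Smaller shards cut the bytes per request but raise the request count, so plotting both against shard size for real tiles is what would show whether a sweet spot exists.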
Rough cost of each glyph
Basic idea is to put a rough heuristic cost on each glyph in the deflated PBF.
This includes not just the texture cost but the overhead of metadata in the PBF for glyph position, etc. Samples a broader range of glyphs -- tiles are from dense areas of SF, Tokyo, Shanghai, Tel Aviv and Berlin. Using 524 bytes as the average size per glyph, ballpark sizes for different glyph tile shard sizes:
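As a rough illustration of that arithmetic, here is a sketch that multiplies the 524-byte average by a few shard sizes. The shard sizes below are hypothetical examples, not the ones from the original measurements, and the helper name is made up for this sketch.

```python
AVG_BYTES_PER_GLYPH = 524  # average deflated cost per glyph quoted in the thread


def shard_size_bytes(glyphs_per_shard, bytes_per_glyph=AVG_BYTES_PER_GLYPH):
    """Estimated download size of one glyph tile shard, in bytes."""
    return glyphs_per_shard * bytes_per_glyph


# Hypothetical shard sizes, chosen only to show the scaling.
for n in (64, 128, 256, 512):
    print(f"{n:>4} glyphs/shard ~ {shard_size_bytes(n) / 1024:.1f} KiB")
```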
Next up
I'll be using these numbers to play around with simulated glyph tile download scenarios while panning around different parts of the world. cc @kkaefer let me know if I'm way off here. |
@kkaefer based on a totally rough count (20x20 glyphs, say, === 215 bytes per glyph) from https://f.cloud.github.com/assets/52399/1181283/e12a5092-2205-11e3-9ddf-e4fc20b22b39.png, it seems like the cost above is pretty large. This would mean to me that either the PNG compression is beating deflate by a lot, the overhead of the glyph position encoding is more than I would guess, or both. What is the reason this information is encoded with each feature string btw? It seems like it should be possible to look this up on the fly (but maybe I don't understand all the mechanics here): https://github.com/mapbox/fontserver/blob/master/test/expected/shape.json#L703-L708 |
I think you're in a good spot with estimates. In my experience, tiles were about 30-40% larger with glyph images than without. PNG compression is zlib, but it employs additional tricks like filtering to get better run lengths and repetition. The reason the glyph positions are included for every string rather than using the glyph advance is that for complex scripts that perform glyph substitutions, there is no 1:1 mapping from a character in the Unicode string to a glyph. While for Latin, Cyrillic and most East Asian scripts there usually /is/ a 1:1 mapping, scripts like Arabic don't have one: ا and ل in a Unicode sequence produce ﻻ when shaped, which is just one glyph (cf. http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=CmplxRndExamples for more background). After we get the shaping results, we could employ the automatic glyph-advance-based shaper and see if it produces the same result. If it does, we could drop the shaping information and use the same algorithm on the client. |
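A minimal sketch of that check, assuming the shaping result is available as a list of (glyph id, x offset) pairs. The function names, glyph ids and advances are hypothetical, and the naive layout ignores text direction and kerning.

```python
def advance_layout(text, glyph_for_char, advances):
    """Naive shaper: one glyph per character, placed by cumulative advance."""
    x = 0
    placed = []
    for ch in text:
        glyph = glyph_for_char[ch]
        placed.append((glyph, x))
        x += advances[glyph]
    return placed


def can_drop_shaping(text, shaped, glyph_for_char, advances):
    """True if the advance-based layout reproduces the real shaping result."""
    return advance_layout(text, glyph_for_char, advances) == shaped


# Latin: 1:1 mapping, so the stored positions are redundant.
latin_map = {"a": 1, "b": 2}
latin_adv = {1: 10, 2: 12}
print(can_drop_shaping("ab", [(1, 0), (2, 10)], latin_map, latin_adv))

# Arabic lam + alef: the shaper emits a single ligature glyph (id 99 here),
# so the naive layout differs and the shaping data has to be kept.
arabic_map = {"\u0644": 50, "\u0627": 51}
arabic_adv = {50: 8, 51: 4, 99: 11}
print(can_drop_shaping("\u0644\u0627", [(99, 0)], arabic_map, arabic_adv))
```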
Oh, another issue that shaping handles is bidirectional text: in RTL languages that use Arabic numerals (like Hebrew, Arabic) you have text runs that go right-to-left, then the numbers are written left-to-right, then the text continues right-to-left. Mapnik has code that handles this, and Pango handles it in the shaping as well. |
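To see where those runs come from, here is a small sketch that groups a string by each character's Unicode bidirectional class using Python's standard `unicodedata` module. It only classifies characters; it is not an implementation of the full Unicode Bidirectional Algorithm that a real shaper would apply, and the helper name is made up.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text):
    """Group consecutive characters that share a bidirectional class.

    Classes include 'R' (right-to-left letter), 'AL' (Arabic letter),
    'EN' (European number), 'L' (left-to-right letter), 'WS' (whitespace).
    """
    return [
        (cls, "".join(chars))
        for cls, chars in groupby(text, key=unicodedata.bidirectional)
    ]


# Hebrew letters followed by digits, then Hebrew again.
for cls, run in bidi_runs("\u05e8\u05d7\u05d5\u05d1 42 \u05d0"):
    print(cls, [hex(ord(c)) for c in run])
```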
Request overhead
Initial impressions/numbers from playing around with a hacked llmr to estimate glyph tile request count + download size are not good. More than anything else, viewing anywhere outside the US leads very rapidly to a huge request count overhead in addition to the tiles being downloaded (e.g. think 20+ additional GETs for an initial map load). Per @kkaefer's original notes, the way glyphs are used just doesn't lend itself well to good sharding/clustering.
Next actions
|
More notes,
No huge savings to be had here.
Reversing this thought. Assuming we provide base glyphs for
The character sets that will benefit from this most are
Conclusion: basically I did a round trip to what @kkaefer already recommended : )
Next actions: Start with just |
After spending a while getting myself oriented to the codebase and Node C++ addons in general, I managed to get an initial sketch of a base glyph pattern hacked into place. This is nowhere near being clean or good code, just a place to start from. What should next steps be in terms of testing versus optimization? Tracking in: |
Initial observations:
|
Copying even a small 0-ff glyph library into each web worker looks to be adding significant overhead, increasing the main thread to worker callback delay from single digit milliseconds to often over 100ms and in some cases over 300ms.
The stripped down version wasn't actually writing glyphs to the copied object. Doh. Next steps:
|
@mikemorris We don't need to copy the glyph images to the workers, just the position information. That should be pretty fast. |
@kkaefer Alright, main hangup was that |
Tested packing |
Okay, so I was using an ASCII hack to build the 0-FF base glyph tile. Now I'm running into all sorts of confusion trying to expand the base glyph set to a larger Unicode range. Any suggestions? |
https://github.com/mapbox/fontserver/blob/baseglyph/src/tile.cpp#L729-L732 looks very fishy to me. What is it supposed to do? |
@kkaefer It's a super hacky way of iterating over a range of characters, that really only works for 0-128 ASCII. What I'm trying to do is iterate over a Unicode character range (say 0000-06FF for Latin, Cyrillic and Arabic) to build a base glyph set for each font in the stack. Haven't quite figured out how to get a reference to the actual PangoFont object, as it looks like the regular tiles are building the font list dynamically. |
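For illustration only, a sketch of walking a code point range rather than ASCII bytes, written in Python instead of the C++ used in fontserver. The function name is hypothetical, and it filters by Unicode general category only; whether a given font actually has a glyph for each code point still has to be checked against the font itself (in FreeType, `FT_Get_Char_Index` returns 0 for a missing glyph).

```python
import unicodedata


def base_codepoints(start=0x0000, end=0x06FF):
    """Yield code points in [start, end] that are assigned and not controls."""
    for cp in range(start, end + 1):
        category = unicodedata.category(chr(cp))
        # 'Cn' = unassigned, 'Cc' = control character.
        if category not in ("Cn", "Cc"):
            yield cp


candidates = list(base_codepoints())
print(len(candidates), hex(candidates[0]), hex(candidates[-1]))
```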
Not quite sure what's causing this issue, but it looks like FreeType is returning an error when attempting to load many glyphs in the Greek+ Unicode ranges in Open Sans. The puzzling part is that the
FreeType Error Codes: http://www.freetype.org/freetype1/docs/api/freetype1.txt |
|
Why are all these extra fonts included in PangoFontset?
|
I think the eventual goal should be to move Short term goal is to fix base glyph loading in |
To be clear: The browsers may use Fribidi but Mapnik does not: we moved from fribidi to icu for bidi in 2008: http://mapnik.org/news/2008/02/20/mapnik_unicode/ |
Ah, because of the above you retained ICU bidi even after switching to HarfBuzz @springmeyer? |
We first replaced fribidi with ICU. Because ICU only supports shaping for Arabic, we then added HarfBuzz for better shaping. For now we've kept ICU for a variety of things, notably for text itemization, which uses the ICU bidi algorithm: https://github.com/mapnik/mapnik/blob/master/include/mapnik/text/itemizer.hpp |
Welp, moved the It looks like Work is in the |
As a followup, under what circumstances is that use case of |
@mikemorris There used to be a separate interface for just creating fonts and inspecting these fonts (like enumerating all glyphs in that font and getting their metrics) which I used for debugging purposes. We don't need that interface anymore. |
Thanks @kkaefer. After all this wrangling I at least have a pretty solid understanding of FreeType now to tackle the rest. |
Glyph ranges implemented, next steps for base + uncommon in #36 |
@kkaefer curious what you have in mind here. At one point I envisioned a "glyph tile" system where you'd download/cache whole chunks of Unicode ranges as you ran into them:
http://jrgraphix.net/research/unicode_blocks.php
This would separate the tiles fully from font glyphs, but not sure what other considerations should be involved here.
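A minimal sketch of the lookup such a system would need on the client, assuming fixed-size ranges of 256 code points. The range size and function names are assumptions for illustration, not something the thread specifies.

```python
def glyph_range(codepoint, size=256):
    """Return the inclusive (start, end) range that contains `codepoint`."""
    start = (codepoint // size) * size
    return start, start + size - 1


def ranges_for_text(text, size=256):
    """Sorted, de-duplicated ranges needed to render every character in `text`."""
    return sorted({glyph_range(ord(ch), size) for ch in text})


# A label mixing Latin and CJK needs two range downloads.
print(ranges_for_text("a\u00e9\u6771"))
```

Once a range has been fetched it can be cached independently of any vector tile, which is the separation described above.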