File size diet? :) #46

Open
davelab6 opened this issue Feb 25, 2020 · 31 comments
@davelab6
Member

I wonder if there are any opportunities for file size reduction by better use of components, ccmp diacritic glyph construction, etc?

@mikedug
Contributor

mikedug commented Feb 25, 2020

Hi Dave, is this question related to Roboto only? Most glyphs that can be made as composites are already built that way in Roboto.

@dberlow
Contributor

dberlow commented Feb 25, 2020 via email

@davelab6
Member Author

It would be good to ship the fonts and then ship an update with a smaller file size in the following quarter.

@dberlow
Contributor

dberlow commented Feb 26, 2020

A survey of the target glyphs (red).

And a couple of alts I think we're adding to Extremo (green).

Compositables of Classic

@dberlow
Contributor

dberlow commented Feb 26, 2020

The plan then is for Mike to finish and deliver a version of Classic.vf by the 6th of March. By mid-March, we'll have a new source with additional composites, and then, as schedule permits or as required, Mike will produce a new hinted version that is the same as the delivered one but with these 27 red glyphs composited.

@brawer

brawer commented Feb 26, 2020

Doesn’t HarfBuzz fall back to Unicode Normalization Form D (NFD) when an input code point is missing from the font’s cmap table? For example, when a font doesn’t contain Ä as U+00C4 (Latin Capital Letter A with Diaeresis), I believe HarfBuzz will retry shaping with U+0041 U+0308 (Latin Capital Letter A + Combining Diaeresis). So, if my memory of HarfBuzz is correct, and if the font handles combining accents correctly (does Roboto do that?), you could try removing all precomposed characters. There are about 13K characters in Unicode whose NFD differs from the character itself, so it might save substantial space. However, the font would only work on systems based on HarfBuzz (or a shaping engine with the same fallback logic). If you try this out, could you share your findings? Here’s a quick Python 3 snippet for printing precomposed code points:

from unicodedata import normalize
for codepoint in range(0, 0x110000):
    c = chr(codepoint)
    if c != normalize("NFD", c):
        print("U+%06X" % codepoint)
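
Building on the snippet above, here is a minimal sketch of how one might estimate which precomposed code points a particular font could drop. It assumes a code point is a candidate only if every code point in its NFD form is itself mapped by the font. The `example_cmap` set is a hypothetical stand-in for a real character map, and the function name is illustrative, not part of any library.

```python
from unicodedata import normalize

def droppable_precomposed(cmap):
    """Return code points in `cmap` whose NFD form differs and whose
    NFD components are all present in `cmap` as well."""
    droppable = []
    for codepoint in sorted(cmap):
        decomposed = normalize("NFD", chr(codepoint))
        if decomposed == chr(codepoint):
            continue  # no canonical decomposition
        if all(ord(ch) in cmap for ch in decomposed):
            droppable.append(codepoint)
    return droppable

# Hypothetical character map: a, b, a-grave, a-acute, A-diaeresis,
# plus combining grave and acute (no combining diaeresis, no capital A).
example_cmap = {0x0061, 0x0062, 0x00E0, 0x00E1, 0x00C4, 0x0300, 0x0301}
for cp in droppable_precomposed(example_cmap):
    print("U+%04X" % cp)
```

In this toy map only U+00E0 and U+00E1 qualify; U+00C4 stays because neither U+0041 nor U+0308 is mapped.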

@dberlow
Contributor

dberlow commented Feb 26, 2020

Doesn’t apply, as far as I can see.

@dberlow dberlow closed this as completed Feb 26, 2020
@davelab6 davelab6 reopened this Feb 26, 2020
@davelab6
Member Author

I think there is definitely something fishy going on here :)

In a cosmic coincidence, earlier this month I'd (a) learned that fontmake or at least vttLib doesn't build fonts that use flipped composites, and (b) requested a "Global Latin" glyph set definition from @moyogo (gftools/pull/177) to better define support for African languages that use an Extended Latin set.

For example, there are 6 characters missing from the Google Fonts "Core" Latin set latin_unique-glyphs.nam that are needed to support Yoruba:

0x0300    COMBINING GRAVE ACCENT
0x0301    COMBINING ACUTE ACCENT
0x0303    COMBINING TILDE
0x0304    COMBINING MACRON
0x0309    COMBINING HOOK ABOVE
0x0323    COMBINING DOT BELOW

@rsheeter did some research into this. After adding those 6 characters to the core subsets of both Open Sans and Roboto, which each have glyphs for 5 of the 6 chars (no combining macron), Roboto gains ~100 glyphs whereas Open Sans gains only 4!

Family      gids_latin   gids_latin_plus_six_codepoints
Roboto      248          352
Open Sans   230          234

Rod is wondering if Roboto has been built in a way that adds composed glyphs for things we nowadays expect to be done with the unicode composition that @brawer described.

I am thinking that indeed this is the case, and that while a full ttfdiet may be too aggressive, a light diet may indeed yield significant file size savings.

@rsheeter

rsheeter commented Feb 26, 2020

To give a specific example: if I subset to a, b, combining grave, and combining acute, I expected to see the combinations with grave/acute done by layout, but Roboto does some via actual glyphs.

# Subset each font to a, b, combining grave and combining acute, then dump to TTX.
for family in Roboto OpenSans; do
  pyftsubset "${family}-Regular.ttf" \
    --unicodes="U+0061-0062,U+0300-0301" \
    --output-file="${family}-Regular_combtest.ttf"
  ttx -o "${family}-Regular_combtest.ttx" "${family}-Regular_combtest.ttf"
done

    hb-shape --ned -u 'U+0061,U+0300' Roboto-Regular_combtest.ttf
    [gid7]

    hb-shape --ned -u 'U+0061,U+0301' Roboto-Regular_combtest.ttf
    [gid8]

    hb-shape --ned -u 'U+0062,U+0300' Roboto-Regular_combtest.ttf
    [gid3|gid5@1149,0]

    hb-shape --ned -u 'U+0062,U+0301' Roboto-Regular_combtest.ttf
    [gid3|gid6@1149,0]

    hb-shape --ned -u 'U+0061,U+0300' OpenSans-Regular_combtest.ttf
    [gid2|gid5@1139,0]

    hb-shape --ned -u 'U+0061,U+0301' OpenSans-Regular_combtest.ttf
    [gid2|gid6@1139,0]

    hb-shape --ned -u 'U+0062,U+0300' OpenSans-Regular_combtest.ttf
    [gid3|gid5@1255,0]
    
    hb-shape --ned -u 'U+0062,U+0301' OpenSans-Regular_combtest.ttf
    [gid3|gid6@1255,0]
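
To compare many such results at once, a small helper along these lines could classify each `hb-shape` output line. This is only a sketch: it assumes the bracketed, `|`-separated format shown above, treats whatever follows `@` as opaque position text, and the function names are illustrative.

```python
def parse_hb_shape(line):
    """Split one hb-shape output line such as '[gid3|gid5@1149,0]'
    into a list of (glyph, position_text) pairs."""
    items = line.strip().strip("[]").split("|")
    glyphs = []
    for item in items:
        glyph, _, position = item.partition("@")
        glyphs.append((glyph, position))
    return glyphs

def is_precomposed(line):
    """True when a base + mark input was shaped to a single glyph."""
    return len(parse_hb_shape(line)) == 1

# Output lines copied from the runs above.
results = {
    "Roboto a+grave": "[gid7]",
    "Roboto b+grave": "[gid3|gid5@1149,0]",
    "OpenSans a+grave": "[gid2|gid5@1139,0]",
}
for label, output in results.items():
    kind = "precomposed glyph" if is_precomposed(output) else "base + mark"
    print(label, "->", kind)
```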

@dberlow
Contributor

dberlow commented Feb 26, 2020

I see now. This should work with Roboto, and its single set of combining accents, without a problem, as long as the optical centers of glyphs and diacritics are aligned and no glyph-specific horizontal or vertical refinements are required. I think that’s the way it’s always been, but perhaps type designers have not trusted the quality compared to refined glyph construction.

So do we have refining capabilities for every single individual glyph?

Moving on to more complex requirements: in Roboto Extremo, for example, there are five sets of combining accents to compensate for the different design requirements of the cases, the above and below accents, and those involved in stacking. Once projected out into a larger design space, all of these put too much stress on one accent working for all combinations.

I'd like to know as much as I can about saving space, but on the other hand my goal is to produce the same quality across the entire design space as exists in the default.

@rsheeter

To give a little more context, if I compare subsetting to latin with/without the six added codepoints for Yoruba (aside: actually only 4 are for Yoruba) the size delta for Roboto is way out of line with the rest of our library.

+6 codepoints in latin for Open Sans adds 4 gids (230 => 234) and increases woff2 filesize by 100-200 bytes (1-2%).
+6 codepoints in latin for Roboto adds 104 gids (248 => 352) and increases woff2 filesize by 2400-2800 bytes (17-22%).

The woff2 filesize is the real concern. I'm hoping it's possible to adjust Roboto to perform closer to Open Sans when we add combining characters to the subset.
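
For reference, the relative glyph growth implied by those counts can be worked out as below. This is plain arithmetic on the gid numbers quoted in this thread; the helper name is illustrative.

```python
# Glyph counts quoted in this thread: (latin, latin + six code points).
gid_counts = {
    "Roboto": (248, 352),
    "Open Sans": (230, 234),
}

def growth(before, after):
    """Return (added glyphs, relative growth in percent)."""
    added = after - before
    return added, 100.0 * added / before

for family, (before, after) in gid_counts.items():
    added, pct = growth(before, after)
    print(f"{family}: +{added} gids ({pct:.1f}%)")
# Roboto: +104 gids (41.9%)
# Open Sans: +4 gids (1.7%)
```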

@brawer

brawer commented Feb 27, 2020

I think there is definitely something fishy going on here

Does Roboto perhaps implement combining characters with GSUB, as in sub a acutecomb by aacute? Then pyftsubset will not remove the aacute glyph from the font (which is correct, since the composed glyph might look different); that would explain the gid7 and gid8 in the output above. Instead, try writing your substitution rules the other way around, as Open Sans seems to do: sub aacute by a acutecomb. If the a and acutecomb glyphs have the right anchors, the mark placement feature should be generated automatically, and subsetting can then remove the aacute from the font. (You could also try deleting all accented glyphs from the font source and having someone write a tool that synthesizes precomposed characters for platforms that really need them; it shouldn’t be very hard.)
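
As a rough illustration of the second direction, decomposition rules of the form `sub aacute by a acutecomb;` could be generated from Unicode NFD data. This is a sketch only: the code point to glyph name mapping is hypothetical (real names depend on the font source), the function name is made up, and code points whose NFD is a single character are skipped.

```python
from unicodedata import normalize

def decomposition_rules(glyph_names):
    """Build 'sub <precomposed> by <parts>;' lines for every code point
    in `glyph_names` whose NFD parts all have glyph names too."""
    rules = []
    for codepoint in sorted(glyph_names):
        parts = normalize("NFD", chr(codepoint))
        if len(parts) < 2:
            continue  # no multi-part decomposition for this code point
        if all(ord(ch) in glyph_names for ch in parts):
            targets = " ".join(glyph_names[ord(ch)] for ch in parts)
            rules.append(f"sub {glyph_names[codepoint]} by {targets};")
    return rules

# Hypothetical code point -> glyph name mapping.
names = {0x0061: "a", 0x00E0: "agrave", 0x00E1: "aacute",
         0x0300: "gravecomb", 0x0301: "acutecomb"}
print("\n".join(decomposition_rules(names)))
```

The generated lines would still need to be wrapped in a feature block, and the base and mark glyphs need anchors for mark positioning to work.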

@dberlow
Contributor

dberlow commented Feb 27, 2020 via email

@davelab6
Member Author

I see https://benkiel.com/typeDesign/ has buildAccents.py for FontLab 5 as a source-based accent builder, and https://robofont.com/documentation/how-tos/building-accented-glyphs/ suggests this is nowadays replaced with https://github.com/typemytype/glyphconstruction for folks like you using RoboFont.

@sannorozco
Collaborator

sannorozco commented Feb 27, 2020

Yes, the latter is what we use!

@dberlow
Contributor

dberlow commented Feb 27, 2020 via email

@khaledhosny

Doesn’t HarfBuzz fall back to Unicode Normalization Form D (NFD) when an input code point is missing from the font’s cmap table?

Yes, but many applications (most, actually) will check the cmap table to decide whether or not to use a fallback font before shaping with HarfBuzz. Chrome (and to some extent LibreOffice) is the exception, as it shapes first and then uses a fallback font for unsupported characters.
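
The difference between the two strategies can be modelled very roughly as follows. This is a simplified sketch, not how any particular application is implemented: the character map is hypothetical, the function names are made up, and real shaping involves far more than a cmap lookup.

```python
from unicodedata import normalize

def fallback_cmap_first(text, cmap):
    """Characters sent to a fallback font when the application
    checks the cmap before shaping."""
    return [ch for ch in text if ord(ch) not in cmap]

def fallback_shape_first(text, cmap):
    """Characters sent to a fallback font when the shaper may first
    retry a missing character in its NFD form."""
    missing = []
    for ch in text:
        if ord(ch) in cmap:
            continue
        if all(ord(part) in cmap for part in normalize("NFD", ch)):
            continue  # covered by base + combining marks
        missing.append(ch)
    return missing

# Hypothetical dieted font: 'A', 'a' and two combining marks, no precomposed glyphs.
cmap = {0x0041, 0x0061, 0x0301, 0x0308}
text = "\u00e1\u00c4\u00df"  # a-acute, A-diaeresis, sharp s
print([f"U+{ord(c):04X}" for c in fallback_cmap_first(text, cmap)])
print([f"U+{ord(c):04X}" for c in fallback_shape_first(text, cmap)])
```

Under this model the cmap-first path sends all three characters to a fallback font, while the shape-first path only falls back for U+00DF, which has no decomposition.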

@brawer

brawer commented Feb 27, 2020

Very curious about the actual space savings on a large font like Roboto, especially in WOFF format. If the improvement were huge, it might be worth building subsetted fonts for certain environments (e.g. Google Fonts, which afaik can support browser-specific fonts); a large size difference might also be a reason for other browsers and apps to implement the same fallback logic as Chrome or LibreOffice. But if it’s only a small difference, it’s clearly not worth bothering.

@sannorozco
Collaborator

Hallo,

I split our findings into two issues, #49 and #48.

@m4rc1e
Collaborator

m4rc1e commented Mar 20, 2020

I've removed all composite glyphs which have a Unicode decomposition from Roboto Regular Hinted. The file is ~8% smaller when we factor in all other pyftsubset optimisations.

For Chrome and modern browsers, the results are ok. There are some mark positioning issues but these can be solved.

[Screenshot: Desktop_Windows_10_chrome_69 0_] Win 10 Chrome 70

Internet Explorer 11 is a mess.

[Screenshot: Desktop_Windows_7_ie_11 0_] Win 7 IE 11

If we ignore the shifts, we can see the accented glyphs are using a fallback font.

If we do go ahead with this diet idea, imo it has to be done by the gf backend as Sascha suggested and only served to particular browsers. I don't think @sannorozco and TN should be doing this manually.

The benefit of making this server side is we can apply it to other families as well.

@m4rc1e
Collaborator

m4rc1e commented Mar 20, 2020

I've made a minimal test case for those who are interested.

OS X tests:

[Screenshot: OSX_chrome_80] Chrome
[Screenshot: OSX_safari_13] Safari

Win tests:

[Screenshot: win10_chrome_80] Chrome
[Screenshot: win10_edge_80] Edge 80
[Screenshot: win10_firefox_74] Firefox
[Screenshot: win10_ie_11] IE 11

Seems only Chrome and Edge are ok.

diet_testcase.zip

@davelab6
Member Author

davelab6 commented Mar 26, 2020

@rsheeter please can you provide a way for @m4rc1e to reproduce the filesize increase you found when adding the Yoruba characters that appears unique to Roboto statics?

The work being done in this thread is in response to that concern, but since this is taking much longer to complete than anticipated, I would like to decouple the two tasks:

  • ship Roboto Classic to get the VF filesize savings, and

  • ship dieted fonts that can include Yoruba without a filesize increase, the same as all other top fonts :)

@davelab6
Member Author

On a 1:1 call just now, @m4rc1e also proposed that perhaps it's better to do the dieting, like https://github.com/twardoch/ttfdiet, as a post-processing step, which is therefore applicable to any font project, and not in source files, since pyftsubset retains hinting.

So, we probably need to checkout the current master into a holding 'diet' branch, then revert to the last commit before the diet effort started, and continue from there.

@davelab6
Member Author

a way for @m4rc1e to reproduce the filesize increase you found when adding the Yoruba characters that appears unique to Roboto statics

Ah, @rsheeter pointed out that the information required to do this is already on this thread, in 2 parts. I wrote,

For example, there are 6 characters missing from the Google Fonts "Core" Latin set latin_unique-glyphs.nam that are needed to support Yoruba:

0x0300 COMBINING GRAVE ACCENT
0x0301 COMBINING ACUTE ACCENT
0x0303 COMBINING TILDE
0x0304 COMBINING MACRON
0x0309 COMBINING HOOK ABOVE
0x0323 COMBINING DOT BELOW

and Rod wrote,

for family in Roboto OpenSans; do
  pyftsubset "${family}-Regular.ttf" \
    --unicodes="U+0061-0062,U+0300-0301" \
    --output-file="${family}-Regular_combtest.ttf"
  ttx -o "${family}-Regular_combtest.ttx" "${family}-Regular_combtest.ttf"
done

Rod also wrote earlier,

+6 codepoints in latin for Open Sans adds 4 gids (230 => 234) and increases woff2 filesize by 100-200 bytes (1-2%).

+6 codepoints in latin for Roboto adds 104 gids (248 => 352) and increases woff2 filesize by 2400-2800 bytes (17-22%).

The woff2 filesize is the real concern

So, to move forward, I propose that @sannorozco and @dberlow take a step back and research how Open Sans is constructed so that adding the 6 characters does not increase the filesize by ~20%, and apply Marc's minimal test case to Open Sans to confirm that it renders as expected.

Does that sound good?

@dberlow
Contributor

dberlow commented Mar 26, 2020 via email

@dberlow
Contributor

dberlow commented Mar 26, 2020 via email

@davelab6
Member Author

Enclosed is a compare of the two repertoires.

You cannot attach files via email, as you tried here; you must visit the #46 page and drag and drop the files into your comment.

@davelab6
Member Author

Where are the files that show Roboto increasing by 20% from the addition of 6 glyphs? Who made them? How?

The files were not shared, but the command line to reproduce them has been shared, so I am requesting that you/santiago (re)make them, so that you can be sure how they were made and fully investigate and compare.

go forward from there?

Understood. Let's roll forwards!

The good news is, we can now rearrange the glyph indexes again if we need to and that's open source.

Having looked at the code (https://github.com/TypeNetwork/Roboto/blob/delivery-review/Scripts/mapper-VTT-gids.py), I think it needs to be polished and packaged in order to be used again; I'll file a separate issue for that :)

@dberlow
Contributor

dberlow commented Mar 27, 2020

Where is the command line, please?

Glyph repertoires below? Open Sans is top, Roboto below.
[Screenshot: Screen Shot 2020-03-26 at 4 30 13 PM]

[Screenshot: Screen Shot 2020-03-26 at 4 30 36 PM]

@davelab6
Member Author

davelab6 commented Apr 1, 2020

I explain how to construct the command here, #46 (comment)

@m4rc1e
Collaborator

m4rc1e commented Apr 2, 2020

The original Roboto has a dedicated webfont family which is produced by post processing the master fonts. This is the version of Roboto we use on Google Fonts.

If I run the post-processing scripts which create this family on our VFs, we get a file size of 900 KB. The static ttfs we currently serve are 2.1 MB. Imo, this is a massive win.

The VFs also contain Mike's VTT work.

cc @rsheeter @davelab6

8 participants