Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed-up subset test suite #3089

Closed
behdad opened this issue Jul 29, 2021 · 31 comments · Fixed by #3110
Closed

Speed-up subset test suite #3089

behdad opened this issue Jul 29, 2021 · 31 comments · Fixed by #3110
Assignees
Labels

Comments

@behdad
Copy link
Member

behdad commented Jul 29, 2021

I suggest adding --batch to hb-subset, like hb-shape has. That should immensely help.

@behdad
Copy link
Member Author

behdad commented Jul 29, 2021

Currently make check -j10 in test/shaping takes 20s for me, whereas in test/subset takes 100s.

@behdad
Copy link
Member Author

behdad commented Jul 29, 2021

Tentatively assigning to @garretrieger; but @khaledhosny feel free to try since you're on a roll! Thank you.

@khaledhosny
Copy link
Collaborator

The subset tests also use FontTools to compare the fonts (saving to ttx XML first then diffing it), we can probably speed it by saving the expected output in XML format instead of generating it each time. This has the downside of having to update the expected results each time FontTools changes its output.

@behdad
Copy link
Member Author

behdad commented Jul 29, 2021

Yeah I keep forgetting what the subset test suite does. It has expected binary from fonttools saved? Then converts that and hb-subset output to ttx to compare?

So if we save XML instead, that speeds things up a bit. And I like it better honestly as is readable. The update issue is fine, we'll deal with it.

@behdad
Copy link
Member Author

behdad commented Jul 29, 2021

I think we're near-enough to a point to be able to generate ttx-like dump of font using hb itself... That would help a lot.

@behdad
Copy link
Member Author

behdad commented Jul 29, 2021

If anything, making the test-runner a bit more verbose would be great. Right now each test takes multiple seconds with no further details.

@khaledhosny
Copy link
Collaborator

It has expected binary from fonttools saved? Then converts that and hb-subset output to ttx to compare?

Yes.

khaledhosny added a commit that referenced this issue Jul 29, 2021
Speed-up subset tests by saving TTX dump of expected output instead of
generating it each time the tests are run.

Cuts down meson test --suite=subset on my system from:
real	0m38.977s
user	1m12.024s
sys	0m10.547s

to:
real	0m22.291s
user	0m44.548s
sys	0m9.221s

Part of #3089
behdad pushed a commit that referenced this issue Jul 29, 2021
Speed-up subset tests by saving TTX dump of expected output instead of
generating it each time the tests are run.

Cuts down meson test --suite=subset on my system from:
real	0m38.977s
user	1m12.024s
sys	0m10.547s

to:
real	0m22.291s
user	0m44.548s
sys	0m9.221s

Part of #3089
@behdad
Copy link
Member Author

behdad commented Jul 30, 2021

Thanks Khaled. That definitely helped.

Next: the subset tests are a matrix of cartesian product of fonts x profiles x text. For example:

FONTS:
SourceHanSans-Regular_subset.otf
PROFILES:
default.txt
drop-hints.txt
drop-hints-retain-gids.txt
retain-gids.txt
desubroutinize.txt
desubroutinize-retain-gids.txt
drop-hints-desubroutinize.txt
drop-hints-desubroutinize-retain-gids.txt
SUBSETS:
acek
明極珠度輸清
あいうえおか
あいう珠度輸

That's one font, 8 profiles, 5 texts, for a total of 40 subset tests. I think we should sparse this up. We don't need to test every combination.

khaledhosny added a commit that referenced this issue Aug 1, 2021
time meson test --suite=subset down from:
real	0m22.822s
user	0m44.561s
sys	0m9.255s

to:
real	0m19.418s
user	0m38.171s
sys	0m3.587s

Does not seem to help much, but it is something.

Part of #3089
@khaledhosny
Copy link
Collaborator

Next: the subset tests are a matrix of cartesian product of fonts x profiles x text. For example:

I’ll leave this for someone who is actually familiar with how subset and its test suite works.

@khaledhosny
Copy link
Collaborator

These two tests are currently the biggest offenders, all other tests are under 2 seconds and most under 1 second:

54/55 harfbuzz:subset+slow / cff-japanese                    OK               9.01s
55/55 harfbuzz:subset+slow / basics                          OK              20.35s

Breaking them up so they can be parallelized probably helps, even if we keep testing all the profiles.

@khaledhosny
Copy link
Collaborator

Splitting basics test into 3 tests one for each font does not seem to help either, most of the time is spent working on the 3rd font (NanumMyeongjo-Regular) anyway.

@khaledhosny
Copy link
Collaborator

Correction, most of the time is spent on Comfortaa-Regular-new.ttf.

@khaledhosny
Copy link
Collaborator

Most of the time (15 out of 17 seconds) is spent in FontTools dumping the subset font.

khaledhosny added a commit that referenced this issue Aug 1, 2021
time meson test --suite=subset down from:
real	0m22.822s
user	0m44.561s
sys	0m9.255s

to:
real	0m19.418s
user	0m38.171s
sys	0m3.587s

Does not seem to help much, but it is something.

Part of #3089
khaledhosny added a commit that referenced this issue Aug 1, 2021
time meson test --suite=subset down from:
real	0m22.822s
user	0m44.561s
sys	0m9.255s

to:
real	0m19.418s
user	0m38.171s
sys	0m3.587s

Does not seem to help much, but it is something.

Part of #3089
@behdad
Copy link
Member Author

behdad commented Aug 1, 2021

Thanks Khaled!

khaledhosny added a commit that referenced this issue Aug 1, 2021
time meson test --suite=subset down from:
real	0m22.822s
user	0m44.561s
sys	0m9.255s

to:
real	0m19.418s
user	0m38.171s
sys	0m3.587s

Does not seem to help much, but it is something.

Part of #3089
behdad pushed a commit that referenced this issue Aug 2, 2021
time meson test --suite=subset down from:
real	0m22.822s
user	0m44.561s
sys	0m9.255s

to:
real	0m19.418s
user	0m38.171s
sys	0m3.587s

Does not seem to help much, but it is something.

Part of #3089
@behdad
Copy link
Member Author

behdad commented Aug 3, 2021

Also:

$ time PYTHONPATH=~/fonttools/Lib python3 ./run-tests.py ../../util/hb-subset data/tests/basics.tests >/dev/null

real	0m20.521s
user	0m18.621s
sys	0m1.555s

$ time PYTHONPATH=~/fonttools/Lib pypy3 ./run-tests.py ../../util/hb-subset data/tests/basics.tests >/dev/null

real	0m17.811s
user	0m12.320s
sys	0m1.500s

@behdad
Copy link
Member Author

behdad commented Aug 3, 2021

Although, it slows down the entire suite. Which is expected, since only long-running processes benefit from pypy.

cpython3:
$ PYTHONPATH=~/fonttools/Lib time make check -j3 >/dev/null
       29.07 real        51.75 user         6.13 sys

pypy3:
$ PYTHONPATH=~/fonttools/Lib time make check -j3 >/dev/null
       25.47 real        49.11 user         5.61 sys

@khaledhosny
Copy link
Collaborator

What if we tried to make some shortcuts, e.g. check file checksum first and if it matches skip the ttx dump?

@garretrieger
Copy link
Collaborator

garretrieger commented Aug 3, 2021

I believe the binary output between fonttools and harfbuzz is rarely going to be exactly the same, so not sure if that will save much.

@khaledhosny
Copy link
Collaborator

But we can save the HarfBuzz output as the expected one (we don’t use FontTools subsetter when running the tests, only when adding the tests for the first time). I.e. manually verify that FontTools and HarfBuzz produce equivalent output, store HarfBuzz result in the repository and test against it moving forward.

@garretrieger
Copy link
Collaborator

good idea, yeah that should work.

@khaledhosny
Copy link
Collaborator

This will require reverting the commits that saved the XML files in the repositoiry, but with this change:

$ time python3 ./run-tests.py ../../build/util/hb-subset data/tests/basics.tests > /dev/null

real	0m1.612s
user	0m1.134s
sys	0m0.116s

@khaledhosny
Copy link
Collaborator

We can even make generate-expected-outputs.py subset with both HarfBuzz and FontTools and compare the ouput, so one does not easily miss the difference when generating tests.

@behdad
Copy link
Member Author

behdad commented Aug 3, 2021

That's BRILLIANT! Do it please.

@behdad
Copy link
Member Author

behdad commented Aug 3, 2021

We can even make generate-expected-outputs.py subset with both HarfBuzz and FontTools and compare the ouput, so one does not easily miss the difference when generating tests.

Yep.

Maybe add a --generate-expected to run-tests. (that's what the --reference was about in shape run-tests BTW.

Maybe also add a record-test like the one in test/shape, that does the comparison with FontTools as well.

@behdad
Copy link
Member Author

behdad commented Aug 3, 2021

If the two binaries differ that should already fail. Our output should be deterministic, and we'll change the expected files every time we make a change that would change them. When things fail we can ttx both and if they compare equal, fail by suggesting expected file to be regenerated...

@khaledhosny
Copy link
Collaborator

We need to documeent the version of FontTools used to generate the current expectation, I’m seeing diffs like this:

$ python3 generate-expected-outputs.py ../../build/util/hb-subset data/tests/basics.tests
Generating output files for data/expected/basics
Creating subset data/expected/basics/Roboto-Regular.abc.default.61,62,63.ttf
---
+++
@@ -152,6 +152,11 @@
       <map code="0x62" name="b"/><!-- LATIN SMALL LETTER B -->
       <map code="0x63" name="c"/><!-- LATIN SMALL LETTER C -->
     </cmap_format_4>
+    <cmap_format_12 platformID="3" platEncID="10" format="12" reserved="0" length="28" language="0" nGroups="1">
+      <map code="0x61" name="a"/><!-- LATIN SMALL LETTER A -->
+      <map code="0x62" name="b"/><!-- LATIN SMALL LETTER B -->
+      <map code="0x63" name="c"/><!-- LATIN SMALL LETTER C -->
+    </cmap_format_12>
   </cmap>

   <fpgm>
Traceback (most recent call last):
  File "/Users/khaled/Development/harfbuzz/test/subset/generate-expected-outputs.py", line 86, in <module>
    generate_expected_output(test.font_path, unicodes, test.get_profile_flags(),
  File "/Users/khaled/Development/harfbuzz/test/subset/generate-expected-outputs.py", line 63, in generate_expected_output
    raise Exception ('ttx for fonttools and harfbuzz does not match.')
Exception: ttx for fonttools and harfbuzz does not match.

@garretrieger
Copy link
Collaborator

Ah right, there was a change in fonttools recently to omit cmap12 if possible. We haven't yet brought that over to hb subset. For generating the test cases we've been using fontTools with that change (fonttools/fonttools#2146) patched out

@garretrieger
Copy link
Collaborator

I'll take a look and see if I can make that change in hb now that some of the cmap stuff has been cleaned up.

@khaledhosny
Copy link
Collaborator

Thanks, I reverted that change locally. We can update the subsetter and test results later.

@khaledhosny khaledhosny linked a pull request Aug 4, 2021 that will close this issue
@khaledhosny
Copy link
Collaborator

#3110

Next is to see why the fuzzer test takes 13 seconds by itself (almost 1.5 times the the rest of the tests).

@khaledhosny
Copy link
Collaborator

Interestingly, the dist-check CI job used to take 13 minutes in average, now it takes 7 minutes, I wasn’t expecting a change in CI times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants