[perf] hb_ot_tags_from_language() too slow #3591
Part of #3591

2. All the subtag_matches outside the switch match long strings (>= 6 or so). As such, check the tag for such length before going into any of them.

benchmark-ot, before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        172 ns    171 ns     4083155
BM_hb_ot_tags_from_script_and_language/COMMON en_US        120 ns    119 ns     5849947
BM_hb_ot_tags_from_script_and_language/LATIN en_US         113 ns    112 ns     5840326
BM_hb_ot_tags_from_script_and_language/COMMON none        4.66 ns   4.64 ns   151396224
BM_hb_ot_tags_from_script_and_language/LATIN none         4.66 ns   4.64 ns   149019593

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        112 ns    112 ns     6357763
BM_hb_ot_tags_from_script_and_language/COMMON en_US       60.5 ns   60.3 ns    11475091
BM_hb_ot_tags_from_script_and_language/LATIN en_US        54.9 ns   54.8 ns    12575690
BM_hb_ot_tags_from_script_and_language/COMMON none        4.61 ns   4.59 ns   152388450
BM_hb_ot_tags_from_script_and_language/LATIN none         4.66 ns   4.64 ns   151497600
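For readers following along, here is a minimal sketch of the length guard described in item 2. It is not the HarfBuzz code: `subtag_matches_sketch`, `matches_any_long_subtag`, `kMinLongTagLength`, and the two example subtags are hypothetical stand-ins chosen for illustration. The idea is only that a cheap length test can skip every long-subtag search for short tags such as `en-us`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for an expensive matcher: true if `subtag`
// occurs in `lang` and is followed by '-' or the end of the string.
static bool subtag_matches_sketch(const char *lang, const char *subtag)
{
  assert(lang && subtag);
  const std::size_t n = std::strlen(subtag);
  assert(n > 0);
  for (const char *p = std::strstr(lang, subtag); p; p = std::strstr(p + n, subtag))
  {
    const char next = p[n];
    if (next == '\0' || next == '-')
      return true;
  }
  return false;
}

// Illustrative threshold, taken from the ">= 6 or so" remark above.
static const std::size_t kMinLongTagLength = 6;

// Runs the long-subtag checks only when the tag is long enough to
// contain any of them; short tags return after a single strlen.
static bool matches_any_long_subtag(const char *lang)
{
  if (std::strlen(lang) < kMinLongTagLength)
    return false;
  return subtag_matches_sketch(lang, "-fonipa") ||
         subtag_matches_sketch(lang, "-polyton");
}
```

Both example subtags are longer than the threshold, so the guard cannot reject a tag that would otherwise have matched.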
I did this now. New numbers:
I’ll start working on splitting the |
Part of #3591

"After that, bulk of the time I suppose is spent in binary-searching the language table. I suggest we split the language table in 2-letter and 3-letter tags, to speed-up the vast majority of cases that are 2-letter."

benchmark-ot, before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN        112 ns    111 ns     6286271
BM_hb_ot_tags_from_script_and_language/COMMON en_US       60.6 ns   60.4 ns    11671176
BM_hb_ot_tags_from_script_and_language/LATIN en_US        61.3 ns   61.1 ns    11442645
BM_hb_ot_tags_from_script_and_language/COMMON none        4.75 ns   4.74 ns   146997235
BM_hb_ot_tags_from_script_and_language/LATIN none         4.65 ns   4.64 ns   150938747

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       89.5 ns   89.2 ns     7747649
BM_hb_ot_tags_from_script_and_language/COMMON en_US       38.5 ns   38.4 ns    18199432
BM_hb_ot_tags_from_script_and_language/LATIN en_US        39.0 ns   38.9 ns    18049238
BM_hb_ot_tags_from_script_and_language/COMMON none        4.53 ns   4.52 ns   154895110
BM_hb_ot_tags_from_script_and_language/LATIN none         4.54 ns   4.52 ns   154762105
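A rough sketch of what splitting the table by tag length could look like. The names (`LangEntry`, `kTwoLetter`, `kThreeLetter`, `lookup_ot_tag`) and the handful of entries are assumptions for illustration only; the real table is generated and much larger. The point is that a 2-letter lookup only binary-searches the 2-letter table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct LangEntry { const char *lang; const char *ot_tag; };

// Hypothetical, tiny excerpts; each table is sorted by `lang`.
static const LangEntry kTwoLetter[]   = { {"en", "ENG "}, {"fr", "FRA "}, {"zh", "ZHS "} };
static const LangEntry kThreeLetter[] = { {"ast", "AST "}, {"haw", "HAW "} };

// bsearch comparator: the key is a NUL-terminated language code.
static int compare_entry(const void *key, const void *entry)
{
  return std::strcmp(static_cast<const char *>(key),
                     static_cast<const LangEntry *>(entry)->lang);
}

// `len` is the length of the primary language subtag at the start of `lang`.
static const char *lookup_ot_tag(const char *lang, std::size_t len)
{
  assert(lang != nullptr);
  const LangEntry *table = nullptr;
  std::size_t count = 0;
  if (len == 2)      { table = kTwoLetter;   count = sizeof kTwoLetter / sizeof kTwoLetter[0]; }
  else if (len == 3) { table = kThreeLetter; count = sizeof kThreeLetter / sizeof kThreeLetter[0]; }
  else return nullptr;  // neither table can contain it

  char key[4] = {0};
  std::memcpy(key, lang, len);  // copy just the primary subtag
  const void *hit = std::bsearch(key, table, count, sizeof(LangEntry), compare_entry);
  return hit ? static_cast<const LangEntry *>(hit)->ot_tag : nullptr;
}
```

With this layout, `lookup_ot_tag("en_US", 2)` never touches the 3-letter entries, which is where the saving for the common 2-letter case would come from.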
Done.
After:
Now perhaps looking into |
Part of #3591

Before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.67 ns   8.64 ns    80324382
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       91.2 ns   90.9 ns     7674131
BM_hb_ot_tags_from_script_and_language/COMMON en_US       41.1 ns   41.0 ns    17174093
BM_hb_ot_tags_from_script_and_language/LATIN en_US        41.3 ns   41.2 ns    17000876
BM_hb_ot_tags_from_script_and_language/COMMON none        4.56 ns   4.55 ns   153914130
BM_hb_ot_tags_from_script_and_language/LATIN none         4.53 ns   4.52 ns   153830303

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.24 ns   8.21 ns    84078465
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       77.5 ns   77.2 ns     9059230
BM_hb_ot_tags_from_script_and_language/COMMON en_US       38.8 ns   38.7 ns    17790692
BM_hb_ot_tags_from_script_and_language/LATIN en_US        37.6 ns   37.5 ns    18648293
BM_hb_ot_tags_from_script_and_language/COMMON none        4.50 ns   4.49 ns   155573267
BM_hb_ot_tags_from_script_and_language/LATIN none         4.49 ns   4.47 ns   156456653
Now that we know both strings are of equal len of 2 or 3, optimize.

Part of #3591

Before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.50 ns   8.47 ns    81221549
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       79.6 ns   79.3 ns     8785804
BM_hb_ot_tags_from_script_and_language/COMMON en_US       40.0 ns   39.9 ns    17462768
BM_hb_ot_tags_from_script_and_language/LATIN en_US        39.2 ns   39.1 ns    17886793
BM_hb_ot_tags_from_script_and_language/COMMON none        4.31 ns   4.30 ns   162805417
BM_hb_ot_tags_from_script_and_language/LATIN none         4.32 ns   4.31 ns   162656688

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.27 ns   8.24 ns    81868701
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       56.1 ns   56.0 ns    12353284
BM_hb_ot_tags_from_script_and_language/COMMON en_US       24.3 ns   24.2 ns    28955030
BM_hb_ot_tags_from_script_and_language/LATIN en_US        24.5 ns   24.4 ns    28664868
BM_hb_ot_tags_from_script_and_language/COMMON none        4.35 ns   4.34 ns   161190014
BM_hb_ot_tags_from_script_and_language/LATIN none         4.36 ns   4.34 ns   161319000
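The commit's code is not shown in this thread, so the following is only a guess at the kind of comparison that becomes possible once both strings are known to have the same length of 2 or 3. `compare_short_tags` is a hypothetical name. Because the length is fixed and tiny, the comparator needs no `strlen` call and no terminator checks.

```cpp
#include <cassert>

// Three-way comparison of two tags that are both exactly `len` bytes,
// where `len` is 2 or 3. Returns <0, 0 or >0 like strcmp.
static inline int compare_short_tags(const char *a, const char *b, unsigned len)
{
  assert(len == 2 || len == 3);
  for (unsigned i = 0; i < len; i++)
  {
    const unsigned char ca = static_cast<unsigned char>(a[i]);
    const unsigned char cb = static_cast<unsigned char>(b[i]);
    if (ca != cb)
      return ca < cb ? -1 : 1;
  }
  return 0;
}
```

Only the first `len` bytes are inspected, so a caller can pass a pointer into a longer string such as `en_US` without copying the subtag out first.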
Now down to:
Calling this fixed for now.
Seems like the |
So we're A way to vastly speed up the case for 2-letter languages is to use a sparse array of 26x26 letters, indexed directly by the two letters. Currently there are at most three tags per language, so that table would need 9 bytes x 26 x 26, about 6 kB, versus around 1.6 kB currently.
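To make the size estimate concrete, here is a sketch of such a direct-indexed table. Everything in it is assumed for illustration: the names `TagSlots`, `g_table`, `slot_index`, `set_tag` and `lookup`, and the choice of storing each tag as 3 bytes without a terminator (which is what makes an entry 9 bytes). It also assumes an ASCII-compatible character set where `'a'..'z'` are contiguous.

```cpp
#include <cassert>
#include <cstring>

// Up to three 3-byte tags per 2-letter language, stored without terminators.
struct TagSlots { char tags[3][3]; };
static_assert(sizeof(TagSlots) == 9, "9 bytes per language, as estimated above");

// 26 x 26 entries, zero-initialized; an all-zero entry means "no mapping".
static TagSlots g_table[26 * 26];

// Maps the first two lowercase letters of `lang` to a slot, or -1.
static int slot_index(const char *lang)
{
  if (lang[0] < 'a' || lang[0] > 'z' || lang[1] < 'a' || lang[1] > 'z')
    return -1;
  return (lang[0] - 'a') * 26 + (lang[1] - 'a');
}

// Fills one of the three slots; a real table would be generated statically.
static bool set_tag(const char *lang, unsigned slot, const char *tag3)
{
  assert(tag3 && std::strlen(tag3) >= 3);
  const int i = slot_index(lang);
  if (i < 0 || slot >= 3)
    return false;
  std::memcpy(g_table[i].tags[slot], tag3, 3);
  return true;
}

// O(1) lookup with no searching at all.
static const TagSlots *lookup(const char *lang)
{
  const int i = slot_index(lang);
  if (i < 0 || g_table[i].tags[0][0] == '\0')
    return nullptr;
  return &g_table[i];
}
```

Here `sizeof(g_table)` is 9 x 676 = 6084 bytes, which matches the ~6 kB figure; the trade is extra memory for a lookup that is two subtractions and a multiply.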
Using an integer tag to bsearch, instead of string.

Part of: #3591

Before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.11 ns   8.08 ns    87067795
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       53.6 ns   53.5 ns    13042418
BM_hb_ot_tags_from_script_and_language/COMMON en_US       24.2 ns   24.1 ns    29052731
BM_hb_ot_tags_from_script_and_language/LATIN en_US        24.4 ns   24.3 ns    28736769
BM_hb_ot_tags_from_script_and_language/COMMON none        4.43 ns   4.41 ns   160370413
BM_hb_ot_tags_from_script_and_language/LATIN none         4.35 ns   4.34 ns   160578191

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     7.97 ns   7.95 ns    85208363
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       41.7 ns   41.6 ns    16945817
BM_hb_ot_tags_from_script_and_language/COMMON en_US       16.1 ns   16.0 ns    43613523
BM_hb_ot_tags_from_script_and_language/LATIN en_US        16.5 ns   16.4 ns    42568107
BM_hb_ot_tags_from_script_and_language/COMMON none        4.30 ns   4.29 ns   164055469
BM_hb_ot_tags_from_script_and_language/LATIN none         4.29 ns   4.27 ns   163793591
I made the bsearch use integer instead of string. Now:
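For illustration, one way an integer-keyed search can work is to pack the 2 or 3 letters into a `uint32_t` so that each probe is a single integer comparison. This is a sketch under my own assumptions, not the actual change: `PackedEntry`, `pack_lang`, `kTable`, `lookup_packed`, and the sample entries are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

struct PackedEntry { std::uint32_t lang; const char *ot_tag; };

// Packs the first 2 or 3 bytes big-endian style, so integer order
// equals lexicographic order of the (zero-padded) strings.
static constexpr std::uint32_t pack_lang(const char *s, unsigned len)
{
  return (static_cast<std::uint32_t>(static_cast<unsigned char>(s[0])) << 16) |
         (static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 8) |
         (len > 2 ? static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) : 0u);
}

// Hypothetical excerpt, sorted by packed key.
static const PackedEntry kTable[] = {
  { pack_lang("en", 2),  "ENG " },
  { pack_lang("fr", 2),  "FRA " },
  { pack_lang("haw", 3), "HAW " },
  { pack_lang("zh", 2),  "ZHS " },
};

static const char *lookup_packed(const char *lang, unsigned len)
{
  assert(len == 2 || len == 3);
  const std::uint32_t key = pack_lang(lang, len);
  const PackedEntry *end = std::end(kTable);
  // Binary search where each step compares two integers, not two strings.
  const PackedEntry *it = std::lower_bound(
      std::begin(kTable), end, key,
      [](const PackedEntry &e, std::uint32_t k) { return e.lang < k; });
  return (it != end && it->lang == key) ? it->ot_tag : nullptr;
}
```

Zero-padding the third byte keeps `en` ordered before `eng`, so 2-letter and 3-letter keys can share one sorted table if desired.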
Part of #3591

Humm. Looks like not all of the fat is bsearch overhead now. I cached the last bsearch result, but most of the time is still there. I'm baffled.

Before:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.08 ns   8.05 ns    84500482
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       42.2 ns   42.1 ns    16722006
BM_hb_ot_tags_from_script_and_language/COMMON en_US       16.1 ns   16.0 ns    43461527
BM_hb_ot_tags_from_script_and_language/LATIN en_US        16.5 ns   16.5 ns    42448505
BM_hb_ot_tags_from_script_and_language/COMMON none        4.34 ns   4.33 ns   161290530
BM_hb_ot_tags_from_script_and_language/LATIN none         4.34 ns   4.33 ns   162339799

After:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     8.13 ns   8.11 ns    80438134
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       40.0 ns   39.9 ns    17487939
BM_hb_ot_tags_from_script_and_language/COMMON en_US       12.7 ns   12.7 ns    55124394
BM_hb_ot_tags_from_script_and_language/LATIN en_US        13.1 ns   13.0 ns    53660125
BM_hb_ot_tags_from_script_and_language/COMMON none        4.61 ns   4.60 ns   151394104
BM_hb_ot_tags_from_script_and_language/LATIN none         4.70 ns   4.68 ns   150402847
Humm. Looks like not all of the fat is bsearch overhead now. I cached the last bsearch result, but most of the time is still there. I'm baffled. After:
My benchmark had a bug. Now it looks like not much time is spent in bsearch anymore...
These are the current numbers:
Part of #3591

Ouch! These are the current numbers:

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     78.0 ns   77.7 ns     8917912
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       44.9 ns   44.8 ns    15475318
BM_hb_ot_tags_from_script_and_language/COMMON en_US       17.6 ns   17.5 ns    39812340
BM_hb_ot_tags_from_script_and_language/LATIN en_US        18.2 ns   18.1 ns    38356204
BM_hb_ot_tags_from_script_and_language/COMMON none        4.76 ns   4.74 ns   148746131
BM_hb_ot_tags_from_script_and_language/LATIN none         4.73 ns   4.71 ns   148421349
Bulk of the |
Next step would be to form a |
Again, I'm happy to call this fixed at this point. :)
Part of #3591

Comparing before to after

Benchmark                                                    Time       CPU   Time Old   Time New   CPU Old   CPU New
---------------------------------------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     -0.3371   -0.3371         71         47        71        47
Part of #3591

Still 'zh-trad' is the slowest case.

---------------------------------------------------------------------------------------
Benchmark                                                    Time       CPU  Iterations
---------------------------------------------------------------------------------------
BM_hb_ot_tags_from_script_and_language/COMMON zh_trad      136 ns    136 ns     5107838
BM_hb_ot_tags_from_script_and_language/COMMON ab_abcd      115 ns    115 ns     6103104
BM_hb_ot_tags_from_script_and_language/COMMON ab_abc      25.4 ns   25.3 ns    27674482
BM_hb_ot_tags_from_script_and_language/COMMON abcdef_XY   20.2 ns   20.1 ns    34795719
BM_hb_ot_tags_from_script_and_language/COMMON abcd_XY     19.4 ns   19.3 ns    36390401
BM_hb_ot_tags_from_script_and_language/COMMON cxy_CN      33.5 ns   33.4 ns    20998939
BM_hb_ot_tags_from_script_and_language/COMMON exy_CN      25.1 ns   25.0 ns    27705832
BM_hb_ot_tags_from_script_and_language/COMMON zh_CN       34.2 ns   34.1 ns    20564356
BM_hb_ot_tags_from_script_and_language/COMMON en_US       15.5 ns   15.5 ns    45032204
BM_hb_ot_tags_from_script_and_language/LATIN en_US        15.9 ns   15.8 ns    44412379
BM_hb_ot_tags_from_script_and_language/COMMON none        4.72 ns   4.71 ns   149101665
BM_hb_ot_tags_from_script_and_language/LATIN none         4.72 ns   4.70 ns   149254498
Current numbers:
Adding a second-level |
Although, the ones above 100ns are those going into the non-switch if case.
[This was flagged by @matthiasclasen in https://gitlab.gnome.org/GNOME/gtk/-/issues/3334 https://gitlab.gnome.org/GNOME/gtk/uploads/a6da390ef2fd85d37dd594d88c4229ff/image.png]
The new perf/benchmark-ot shows this:

Half of the time is spent in hb_ot_tags_from_complex_language(); in there, most of the time is spent in the subtag_matches cases outside the switch. Ideas to speed that up:

1. Make subtag_matches use strchr for - instead of strstr as main loop. Might or might not have any effect.
2. All the subtag_matches outside the switch match long strings (>= 6 or so). As such, check the tag for such length before going into any of them.

I'm implementing the second.

After that, bulk of the time I suppose is spent in binary-searching the language table. I suggest we split the language table in 2-letter and 3-letter tags, to speed up the vast majority of cases that are 2-letter.