[libc++][format] Improves escaping performance. #88533

Merged (1 commit) on Apr 28, 2024

@mordante mordante commented Apr 12, 2024

The previous patch implemented

  • P2713R1 Escaping improvements in std::format
  • LWG3965 Incorrect example in [format.string.escaped] p3 for formatting of combining characters

These changes were correct, but they carried a size and performance penalty. This patch reduces both. Performance is still worse than it was before the paper was implemented, because escaping a code point may now require two property lookups instead of one. The changes couple the Unicode data more tightly to the algorithm. Additional tests are added to flag changes in future Unicode updates.

Before

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           110704 ns       110696 ns         6206
BM_unicode_escaped<char>         101371 ns       101374 ns         6862
BM_cyrillic_escaped<char>         63329 ns        63327 ns        11013
BM_japanese_escaped<char>         41223 ns        41225 ns        16938
BM_emoji_escaped<char>           111022 ns       111021 ns         6304
BM_ascii_escaped<wchar_t>        112441 ns       112443 ns         6231
BM_unicode_escaped<wchar_t>      102776 ns       102779 ns         6813
BM_cyrillic_escaped<wchar_t>      58977 ns        58975 ns        11868
BM_japanese_escaped<wchar_t>      36885 ns        36886 ns        18975
BM_emoji_escaped<wchar_t>        115885 ns       115881 ns         6051

The first change is to hard-code the entire last area instead of storing it in the table, with a manual exception for the 240 excluded entries. This reduced the table from 1077 to 729 entries and gave the following benchmark results.

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           104777 ns       104776 ns         6550
BM_unicode_escaped<char>          96980 ns        96982 ns         7238
BM_cyrillic_escaped<char>         60254 ns        60251 ns        11670
BM_japanese_escaped<char>         44452 ns        44452 ns        15734
BM_emoji_escaped<char>           104557 ns       104551 ns         6685
BM_ascii_escaped<wchar_t>        107456 ns       107454 ns         6505
BM_unicode_escaped<wchar_t>       96219 ns        96216 ns         7301
BM_cyrillic_escaped<wchar_t>      56921 ns        56904 ns        12288
BM_japanese_escaped<wchar_t>      39530 ns        39529 ns        17492
BM_emoji_escaped<wchar_t>        108494 ns       108496 ns         6408

An entry in the table can cover at most 2048 code points. Larger ranges are split into multiple entries of at most 2048 code points each. Encoding the entire Unicode code point range requires 21 bits. The manual part starts at 0x323B0, so the lower bound of every table entry fits in 18 bits. This frees 3 additional bits for the range size, which lets a single entry cover 16384 code points. With that size, no range needs to be split into multiple chunks.
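The bit budget described above can be sketched in a few lines of C++. This is an illustration only, not the patch's code: the names `pack`, `lower_bound_of`, `upper_bound_of` and `entries_needed` are hypothetical, and only the bit widths and the 0x323B0 threshold come from the description.

```cpp
#include <cstdint>

// Hypothetical constants; the widths follow the description above.
constexpr std::uint32_t size_bits  = 14; // was 11 before this change
constexpr std::uint32_t lower_bits = 18; // was 21 before this change
constexpr std::uint32_t size_mask  = (1u << size_bits) - 1;
constexpr std::uint32_t max_range  = 1u << size_bits; // code points per entry

// First code point of the manually handled last area.
constexpr std::uint32_t manual_start = 0x323B0;
static_assert(manual_start < (1u << lower_bits), "lower bounds must fit in 18 bits");
static_assert(size_bits + lower_bits == 32, "an entry is one 32-bit word");

// Stores a range as (lower bound, size), where upper bound = lower bound + size.
constexpr std::uint32_t pack(std::uint32_t lower, std::uint32_t size) {
  return (lower << size_bits) | size;
}

constexpr std::uint32_t lower_bound_of(std::uint32_t entry) {
  return entry >> size_bits;
}

constexpr std::uint32_t upper_bound_of(std::uint32_t entry) {
  return (entry >> size_bits) + (entry & size_mask);
}

// Number of entries needed to cover [first, last] when one entry can hold
// at most 2^bits code points (rounded-up division).
constexpr std::uint32_t entries_needed(std::uint32_t first, std::uint32_t last, std::uint32_t bits) {
  return (last - first + 1 + (1u << bits) - 1) >> bits;
}
```

As an example, the range U+D7FC..U+F8FF occupies five entries in the old table shown in the diff (4 × 2048 + 260 code points). Under the assumptions above, `entries_needed(0xd7fc, 0xf8ff, 11)` reproduces that count, while `entries_needed(0xd7fc, 0xf8ff, 14)` needs only one entry.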

This reduces the number of table elements from 729 to 711 and gives the following benchmark results.

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_ascii_escaped<char>           104289 ns       104289 ns         6619
BM_unicode_escaped<char>          96682 ns        96681 ns         7215
BM_cyrillic_escaped<char>         59673 ns        59673 ns        11732
BM_japanese_escaped<char>         41983 ns        41982 ns        16646
BM_emoji_escaped<char>           104119 ns       104120 ns         6683
BM_ascii_escaped<wchar_t>        104503 ns       104505 ns         6693
BM_unicode_escaped<wchar_t>       93426 ns        93423 ns         7489
BM_cyrillic_escaped<wchar_t>      54858 ns        54859 ns        12742
BM_japanese_escaped<wchar_t>      36385 ns        36384 ns        19259
BM_emoji_escaped<wchar_t>        105608 ns       105610 ns         6592
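Taken together, the two changes suggest a lookup of roughly the following shape. This is a hedged sketch rather than the code in the patch: `needs_escape`, `pack` and the three-entry table are hypothetical, and the location of the 240 excluded code points (U+E0100..U+E01EF, the Variation Selectors Supplement, which has exactly 240 code points) is an assumption, since the description does not name the range.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Layout per the description: bits [0, 13] size, bits [14, 31] lower bound.
constexpr std::uint32_t pack(std::uint32_t lower, std::uint32_t size) {
  return (lower << 14) | size;
}

// Hypothetical miniature table; the real table has 711 entries. Code points
// not covered here are treated as "no escape", which is not true of real data.
constexpr std::array<std::uint32_t, 3> entries = {
    pack(0x00, 0x20), // U+0000..U+0020
    pack(0x7f, 0x21), // U+007F..U+00A0
    pack(0xad, 0x00), // U+00AD
};

inline bool needs_escape(char32_t code_point) {
  // Assumed position of the 240 excluded code points inside the last area.
  if (code_point >= 0xE0100 && code_point <= 0xE01EF)
    return false;

  // The manually handled last area; everything from here on is escaped.
  if (code_point >= 0x323B0)
    return true;

  // Search for the first entry whose lower bound is above code_point.
  // Filling the size bits with ones makes the key compare greater than
  // any entry that starts at code_point itself.
  const std::uint32_t key = (static_cast<std::uint32_t>(code_point) << 14) | 0x3fffu;
  auto it = std::upper_bound(entries.begin(), entries.end(), key);
  if (it == entries.begin())
    return false;

  // The preceding entry is the only one that can contain code_point.
  --it;
  const std::uint32_t lower = *it >> 14;
  const std::uint32_t size  = *it & 0x3fffu;
  return code_point <= lower + size;
}
```

The two range checks run before the binary search, so the shift by 14 can never overflow 32 bits: every code point that reaches the search is below 0x323B0 and therefore fits in 18 bits.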

@mordante mordante requested a review from a team as a code owner April 12, 2024 16:23
@llvmbot llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Apr 12, 2024
llvmbot commented Apr 12, 2024

@llvm/pr-subscribers-libcxx

Author: Mark de Wever (mordante)


Patch is 102.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88533.diff

3 Files Affected:

  • (modified) libcxx/include/__format/escaped_output_table.h (+729-1092)
  • (added) libcxx/test/libcxx/utilities/format/format.string/format.string.std/escaped_output.pass.cpp (+100)
  • (modified) libcxx/utils/generate_escaped_output_table.py (+51-35)
diff --git a/libcxx/include/__format/escaped_output_table.h b/libcxx/include/__format/escaped_output_table.h
index 3144abdd3f9a80..6abe121661bdf6 100644
--- a/libcxx/include/__format/escaped_output_table.h
+++ b/libcxx/include/__format/escaped_output_table.h
@@ -97,7 +97,6 @@ namespace __escaped_output_table {
 /// - Unassigned.
 ///
 /// The data is generated from
-/// - https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt
 /// - https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedGeneralCategory.txt
 ///
 /// The table is similar to the table
@@ -106,1110 +105,748 @@ namespace __escaped_output_table {
 /// table lacks a property, thus having more bits available for the size.
 ///
 /// The data has 2 values:
-/// - bits [0, 10] The size of the range, allowing 2048 elements.
-/// - bits [11, 31] The lower bound code point of the range. The upper bound of
-///   the range is lower bound + size.
-_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[1077] = {
+/// - bits [0, 13] The size of the range, allowing 16384 elements.
+/// - bits [14, 31] The lower bound code point of the range. The upper bound of
+///   the range is lower bound + size. Note the code expects code units the fit
+///   into 18 bits, instead of the 21 bits needed for the full Unicode range.
+_LIBCPP_HIDE_FROM_ABI inline constexpr uint32_t __entries[711] = {
     0x00000020 /* 00000000 - 00000020 [   33] */,
-    0x0003f821 /* 0000007f - 000000a0 [   34] */,
-    0x00056800 /* 000000ad - 000000ad [    1] */,
-    0x001bc001 /* 00000378 - 00000379 [    2] */,
-    0x001c0003 /* 00000380 - 00000383 [    4] */,
-    0x001c5800 /* 0000038b - 0000038b [    1] */,
-    0x001c6800 /* 0000038d - 0000038d [    1] */,
-    0x001d1000 /* 000003a2 - 000003a2 [    1] */,
-    0x00298000 /* 00000530 - 00000530 [    1] */,
-    0x002ab801 /* 00000557 - 00000558 [    2] */,
-    0x002c5801 /* 0000058b - 0000058c [    2] */,
-    0x002c8000 /* 00000590 - 00000590 [    1] */,
-    0x002e4007 /* 000005c8 - 000005cf [    8] */,
-    0x002f5803 /* 000005eb - 000005ee [    4] */,
-    0x002fa810 /* 000005f5 - 00000605 [   17] */,
-    0x0030e000 /* 0000061c - 0000061c [    1] */,
-    0x0036e800 /* 000006dd - 000006dd [    1] */,
-    0x00387001 /* 0000070e - 0000070f [    2] */,
-    0x003a5801 /* 0000074b - 0000074c [    2] */,
-    0x003d900d /* 000007b2 - 000007bf [   14] */,
-    0x003fd801 /* 000007fb - 000007fc [    2] */,
-    0x00417001 /* 0000082e - 0000082f [    2] */,
-    0x0041f800 /* 0000083f - 0000083f [    1] */,
-    0x0042e001 /* 0000085c - 0000085d [    2] */,
-    0x0042f800 /* 0000085f - 0000085f [    1] */,
-    0x00435804 /* 0000086b - 0000086f [    5] */,
-    0x00447808 /* 0000088f - 00000897 [    9] */,
-    0x00471000 /* 000008e2 - 000008e2 [    1] */,
-    0x004c2000 /* 00000984 - 00000984 [    1] */,
-    0x004c6801 /* 0000098d - 0000098e [    2] */,
-    0x004c8801 /* 00000991 - 00000992 [    2] */,
-    0x004d4800 /* 000009a9 - 000009a9 [    1] */,
-    0x004d8800 /* 000009b1 - 000009b1 [    1] */,
-    0x004d9802 /* 000009b3 - 000009b5 [    3] */,
-    0x004dd001 /* 000009ba - 000009bb [    2] */,
-    0x004e2801 /* 000009c5 - 000009c6 [    2] */,
-    0x004e4801 /* 000009c9 - 000009ca [    2] */,
-    0x004e7807 /* 000009cf - 000009d6 [    8] */,
-    0x004ec003 /* 000009d8 - 000009db [    4] */,
-    0x004ef000 /* 000009de - 000009de [    1] */,
-    0x004f2001 /* 000009e4 - 000009e5 [    2] */,
-    0x004ff801 /* 000009ff - 00000a00 [    2] */,
-    0x00502000 /* 00000a04 - 00000a04 [    1] */,
-    0x00505803 /* 00000a0b - 00000a0e [    4] */,
-    0x00508801 /* 00000a11 - 00000a12 [    2] */,
-    0x00514800 /* 00000a29 - 00000a29 [    1] */,
-    0x00518800 /* 00000a31 - 00000a31 [    1] */,
-    0x0051a000 /* 00000a34 - 00000a34 [    1] */,
-    0x0051b800 /* 00000a37 - 00000a37 [    1] */,
-    0x0051d001 /* 00000a3a - 00000a3b [    2] */,
-    0x0051e800 /* 00000a3d - 00000a3d [    1] */,
-    0x00521803 /* 00000a43 - 00000a46 [    4] */,
-    0x00524801 /* 00000a49 - 00000a4a [    2] */,
-    0x00527002 /* 00000a4e - 00000a50 [    3] */,
-    0x00529006 /* 00000a52 - 00000a58 [    7] */,
-    0x0052e800 /* 00000a5d - 00000a5d [    1] */,
-    0x0052f806 /* 00000a5f - 00000a65 [    7] */,
-    0x0053b809 /* 00000a77 - 00000a80 [   10] */,
-    0x00542000 /* 00000a84 - 00000a84 [    1] */,
-    0x00547000 /* 00000a8e - 00000a8e [    1] */,
-    0x00549000 /* 00000a92 - 00000a92 [    1] */,
-    0x00554800 /* 00000aa9 - 00000aa9 [    1] */,
-    0x00558800 /* 00000ab1 - 00000ab1 [    1] */,
-    0x0055a000 /* 00000ab4 - 00000ab4 [    1] */,
-    0x0055d001 /* 00000aba - 00000abb [    2] */,
-    0x00563000 /* 00000ac6 - 00000ac6 [    1] */,
-    0x00565000 /* 00000aca - 00000aca [    1] */,
-    0x00567001 /* 00000ace - 00000acf [    2] */,
-    0x0056880e /* 00000ad1 - 00000adf [   15] */,
-    0x00572001 /* 00000ae4 - 00000ae5 [    2] */,
-    0x00579006 /* 00000af2 - 00000af8 [    7] */,
-    0x00580000 /* 00000b00 - 00000b00 [    1] */,
-    0x00582000 /* 00000b04 - 00000b04 [    1] */,
-    0x00586801 /* 00000b0d - 00000b0e [    2] */,
-    0x00588801 /* 00000b11 - 00000b12 [    2] */,
-    0x00594800 /* 00000b29 - 00000b29 [    1] */,
-    0x00598800 /* 00000b31 - 00000b31 [    1] */,
-    0x0059a000 /* 00000b34 - 00000b34 [    1] */,
-    0x0059d001 /* 00000b3a - 00000b3b [    2] */,
-    0x005a2801 /* 00000b45 - 00000b46 [    2] */,
-    0x005a4801 /* 00000b49 - 00000b4a [    2] */,
-    0x005a7006 /* 00000b4e - 00000b54 [    7] */,
-    0x005ac003 /* 00000b58 - 00000b5b [    4] */,
-    0x005af000 /* 00000b5e - 00000b5e [    1] */,
-    0x005b2001 /* 00000b64 - 00000b65 [    2] */,
-    0x005bc009 /* 00000b78 - 00000b81 [   10] */,
-    0x005c2000 /* 00000b84 - 00000b84 [    1] */,
-    0x005c5802 /* 00000b8b - 00000b8d [    3] */,
-    0x005c8800 /* 00000b91 - 00000b91 [    1] */,
-    0x005cb002 /* 00000b96 - 00000b98 [    3] */,
-    0x005cd800 /* 00000b9b - 00000b9b [    1] */,
-    0x005ce800 /* 00000b9d - 00000b9d [    1] */,
-    0x005d0002 /* 00000ba0 - 00000ba2 [    3] */,
-    0x005d2802 /* 00000ba5 - 00000ba7 [    3] */,
-    0x005d5802 /* 00000bab - 00000bad [    3] */,
-    0x005dd003 /* 00000bba - 00000bbd [    4] */,
-    0x005e1802 /* 00000bc3 - 00000bc5 [    3] */,
-    0x005e4800 /* 00000bc9 - 00000bc9 [    1] */,
-    0x005e7001 /* 00000bce - 00000bcf [    2] */,
-    0x005e8805 /* 00000bd1 - 00000bd6 [    6] */,
-    0x005ec00d /* 00000bd8 - 00000be5 [   14] */,
-    0x005fd804 /* 00000bfb - 00000bff [    5] */,
-    0x00606800 /* 00000c0d - 00000c0d [    1] */,
-    0x00608800 /* 00000c11 - 00000c11 [    1] */,
-    0x00614800 /* 00000c29 - 00000c29 [    1] */,
-    0x0061d001 /* 00000c3a - 00000c3b [    2] */,
-    0x00622800 /* 00000c45 - 00000c45 [    1] */,
-    0x00624800 /* 00000c49 - 00000c49 [    1] */,
-    0x00627006 /* 00000c4e - 00000c54 [    7] */,
-    0x0062b800 /* 00000c57 - 00000c57 [    1] */,
-    0x0062d801 /* 00000c5b - 00000c5c [    2] */,
-    0x0062f001 /* 00000c5e - 00000c5f [    2] */,
-    0x00632001 /* 00000c64 - 00000c65 [    2] */,
-    0x00638006 /* 00000c70 - 00000c76 [    7] */,
-    0x00646800 /* 00000c8d - 00000c8d [    1] */,
-    0x00648800 /* 00000c91 - 00000c91 [    1] */,
-    0x00654800 /* 00000ca9 - 00000ca9 [    1] */,
-    0x0065a000 /* 00000cb4 - 00000cb4 [    1] */,
-    0x0065d001 /* 00000cba - 00000cbb [    2] */,
-    0x00662800 /* 00000cc5 - 00000cc5 [    1] */,
-    0x00664800 /* 00000cc9 - 00000cc9 [    1] */,
-    0x00667006 /* 00000cce - 00000cd4 [    7] */,
-    0x0066b805 /* 00000cd7 - 00000cdc [    6] */,
-    0x0066f800 /* 00000cdf - 00000cdf [    1] */,
-    0x00672001 /* 00000ce4 - 00000ce5 [    2] */,
-    0x00678000 /* 00000cf0 - 00000cf0 [    1] */,
-    0x0067a00b /* 00000cf4 - 00000cff [   12] */,
-    0x00686800 /* 00000d0d - 00000d0d [    1] */,
-    0x00688800 /* 00000d11 - 00000d11 [    1] */,
-    0x006a2800 /* 00000d45 - 00000d45 [    1] */,
-    0x006a4800 /* 00000d49 - 00000d49 [    1] */,
-    0x006a8003 /* 00000d50 - 00000d53 [    4] */,
-    0x006b2001 /* 00000d64 - 00000d65 [    2] */,
-    0x006c0000 /* 00000d80 - 00000d80 [    1] */,
-    0x006c2000 /* 00000d84 - 00000d84 [    1] */,
-    0x006cb802 /* 00000d97 - 00000d99 [    3] */,
-    0x006d9000 /* 00000db2 - 00000db2 [    1] */,
-    0x006de000 /* 00000dbc - 00000dbc [    1] */,
-    0x006df001 /* 00000dbe - 00000dbf [    2] */,
-    0x006e3802 /* 00000dc7 - 00000dc9 [    3] */,
-    0x006e5803 /* 00000dcb - 00000dce [    4] */,
-    0x006ea800 /* 00000dd5 - 00000dd5 [    1] */,
-    0x006eb800 /* 00000dd7 - 00000dd7 [    1] */,
-    0x006f0005 /* 00000de0 - 00000de5 [    6] */,
-    0x006f8001 /* 00000df0 - 00000df1 [    2] */,
-    0x006fa80b /* 00000df5 - 00000e00 [   12] */,
-    0x0071d803 /* 00000e3b - 00000e3e [    4] */,
-    0x0072e024 /* 00000e5c - 00000e80 [   37] */,
-    0x00741800 /* 00000e83 - 00000e83 [    1] */,
-    0x00742800 /* 00000e85 - 00000e85 [    1] */,
-    0x00745800 /* 00000e8b - 00000e8b [    1] */,
-    0x00752000 /* 00000ea4 - 00000ea4 [    1] */,
-    0x00753000 /* 00000ea6 - 00000ea6 [    1] */,
-    0x0075f001 /* 00000ebe - 00000ebf [    2] */,
-    0x00762800 /* 00000ec5 - 00000ec5 [    1] */,
-    0x00763800 /* 00000ec7 - 00000ec7 [    1] */,
-    0x00767800 /* 00000ecf - 00000ecf [    1] */,
-    0x0076d001 /* 00000eda - 00000edb [    2] */,
-    0x0077001f /* 00000ee0 - 00000eff [   32] */,
-    0x007a4000 /* 00000f48 - 00000f48 [    1] */,
-    0x007b6803 /* 00000f6d - 00000f70 [    4] */,
-    0x007cc000 /* 00000f98 - 00000f98 [    1] */,
-    0x007de800 /* 00000fbd - 00000fbd [    1] */,
-    0x007e6800 /* 00000fcd - 00000fcd [    1] */,
-    0x007ed824 /* 00000fdb - 00000fff [   37] */,
-    0x00863000 /* 000010c6 - 000010c6 [    1] */,
-    0x00864004 /* 000010c8 - 000010cc [    5] */,
-    0x00867001 /* 000010ce - 000010cf [    2] */,
-    0x00924800 /* 00001249 - 00001249 [    1] */,
-    0x00927001 /* 0000124e - 0000124f [    2] */,
-    0x0092b800 /* 00001257 - 00001257 [    1] */,
-    0x0092c800 /* 00001259 - 00001259 [    1] */,
-    0x0092f001 /* 0000125e - 0000125f [    2] */,
-    0x00944800 /* 00001289 - 00001289 [    1] */,
-    0x00947001 /* 0000128e - 0000128f [    2] */,
-    0x00958800 /* 000012b1 - 000012b1 [    1] */,
-    0x0095b001 /* 000012b6 - 000012b7 [    2] */,
-    0x0095f800 /* 000012bf - 000012bf [    1] */,
-    0x00960800 /* 000012c1 - 000012c1 [    1] */,
-    0x00963001 /* 000012c6 - 000012c7 [    2] */,
-    0x0096b800 /* 000012d7 - 000012d7 [    1] */,
-    0x00988800 /* 00001311 - 00001311 [    1] */,
-    0x0098b001 /* 00001316 - 00001317 [    2] */,
-    0x009ad801 /* 0000135b - 0000135c [    2] */,
-    0x009be802 /* 0000137d - 0000137f [    3] */,
-    0x009cd005 /* 0000139a - 0000139f [    6] */,
-    0x009fb001 /* 000013f6 - 000013f7 [    2] */,
-    0x009ff001 /* 000013fe - 000013ff [    2] */,
-    0x00b40000 /* 00001680 - 00001680 [    1] */,
-    0x00b4e802 /* 0000169d - 0000169f [    3] */,
-    0x00b7c806 /* 000016f9 - 000016ff [    7] */,
-    0x00b8b008 /* 00001716 - 0000171e [    9] */,
-    0x00b9b808 /* 00001737 - 0000173f [    9] */,
-    0x00baa00b /* 00001754 - 0000175f [   12] */,
-    0x00bb6800 /* 0000176d - 0000176d [    1] */,
-    0x00bb8800 /* 00001771 - 00001771 [    1] */,
-    0x00bba00b /* 00001774 - 0000177f [   12] */,
-    0x00bef001 /* 000017de - 000017df [    2] */,
-    0x00bf5005 /* 000017ea - 000017ef [    6] */,
-    0x00bfd005 /* 000017fa - 000017ff [    6] */,
-    0x00c07000 /* 0000180e - 0000180e [    1] */,
-    0x00c0d005 /* 0000181a - 0000181f [    6] */,
-    0x00c3c806 /* 00001879 - 0000187f [    7] */,
-    0x00c55804 /* 000018ab - 000018af [    5] */,
-    0x00c7b009 /* 000018f6 - 000018ff [   10] */,
-    0x00c8f800 /* 0000191f - 0000191f [    1] */,
-    0x00c96003 /* 0000192c - 0000192f [    4] */,
-    0x00c9e003 /* 0000193c - 0000193f [    4] */,
-    0x00ca0802 /* 00001941 - 00001943 [    3] */,
-    0x00cb7001 /* 0000196e - 0000196f [    2] */,
-    0x00cba80a /* 00001975 - 0000197f [   11] */,
-    0x00cd6003 /* 000019ac - 000019af [    4] */,
-    0x00ce5005 /* 000019ca - 000019cf [    6] */,
-    0x00ced802 /* 000019db - 000019dd [    3] */,
-    0x00d0e001 /* 00001a1c - 00001a1d [    2] */,
-    0x00d2f800 /* 00001a5f - 00001a5f [    1] */,
-    0x00d3e801 /* 00001a7d - 00001a7e [    2] */,
-    0x00d45005 /* 00001a8a - 00001a8f [    6] */,
-    0x00d4d005 /* 00001a9a - 00001a9f [    6] */,
-    0x00d57001 /* 00001aae - 00001aaf [    2] */,
-    0x00d67830 /* 00001acf - 00001aff [   49] */,
-    0x00da6802 /* 00001b4d - 00001b4f [    3] */,
-    0x00dbf800 /* 00001b7f - 00001b7f [    1] */,
-    0x00dfa007 /* 00001bf4 - 00001bfb [    8] */,
-    0x00e1c002 /* 00001c38 - 00001c3a [    3] */,
-    0x00e25002 /* 00001c4a - 00001c4c [    3] */,
-    0x00e44806 /* 00001c89 - 00001c8f [    7] */,
-    0x00e5d801 /* 00001cbb - 00001cbc [    2] */,
-    0x00e64007 /* 00001cc8 - 00001ccf [    8] */,
-    0x00e7d804 /* 00001cfb - 00001cff [    5] */,
-    0x00f8b001 /* 00001f16 - 00001f17 [    2] */,
-    0x00f8f001 /* 00001f1e - 00001f1f [    2] */,
-    0x00fa3001 /* 00001f46 - 00001f47 [    2] */,
-    0x00fa7001 /* 00001f4e - 00001f4f [    2] */,
-    0x00fac000 /* 00001f58 - 00001f58 [    1] */,
-    0x00fad000 /* 00001f5a - 00001f5a [    1] */,
-    0x00fae000 /* 00001f5c - 00001f5c [    1] */,
-    0x00faf000 /* 00001f5e - 00001f5e [    1] */,
-    0x00fbf001 /* 00001f7e - 00001f7f [    2] */,
-    0x00fda800 /* 00001fb5 - 00001fb5 [    1] */,
-    0x00fe2800 /* 00001fc5 - 00001fc5 [    1] */,
-    0x00fea001 /* 00001fd4 - 00001fd5 [    2] */,
-    0x00fee000 /* 00001fdc - 00001fdc [    1] */,
-    0x00ff8001 /* 00001ff0 - 00001ff1 [    2] */,
-    0x00ffa800 /* 00001ff5 - 00001ff5 [    1] */,
-    0x00fff810 /* 00001fff - 0000200f [   17] */,
-    0x01014007 /* 00002028 - 0000202f [    8] */,
-    0x0102f810 /* 0000205f - 0000206f [   17] */,
-    0x01039001 /* 00002072 - 00002073 [    2] */,
-    0x01047800 /* 0000208f - 0000208f [    1] */,
-    0x0104e802 /* 0000209d - 0000209f [    3] */,
-    0x0106080e /* 000020c1 - 000020cf [   15] */,
-    0x0107880e /* 000020f1 - 000020ff [   15] */,
-    0x010c6003 /* 0000218c - 0000218f [    4] */,
-    0x01213818 /* 00002427 - 0000243f [   25] */,
-    0x01225814 /* 0000244b - 0000245f [   21] */,
-    0x015ba001 /* 00002b74 - 00002b75 [    2] */,
-    0x015cb000 /* 00002b96 - 00002b96 [    1] */,
-    0x0167a004 /* 00002cf4 - 00002cf8 [    5] */,
-    0x01693000 /* 00002d26 - 00002d26 [    1] */,
-    0x01694004 /* 00002d28 - 00002d2c [    5] */,
-    0x01697001 /* 00002d2e - 00002d2f [    2] */,
-    0x016b4006 /* 00002d68 - 00002d6e [    7] */,
-    0x016b880d /* 00002d71 - 00002d7e [   14] */,
-    0x016cb808 /* 00002d97 - 00002d9f [    9] */,
-    0x016d3800 /* 00002da7 - 00002da7 [    1] */,
-    0x016d7800 /* 00002daf - 00002daf [    1] */,
-    0x016db800 /* 00002db7 - 00002db7 [    1] */,
-    0x016df800 /* 00002dbf - 00002dbf [    1] */,
-    0x016e3800 /* 00002dc7 - 00002dc7 [    1] */,
-    0x016e7800 /* 00002dcf - 00002dcf [    1] */,
-    0x016eb800 /* 00002dd7 - 00002dd7 [    1] */,
-    0x016ef800 /* 00002ddf - 00002ddf [    1] */,
-    0x0172f021 /* 00002e5e - 00002e7f [   34] */,
-    0x0174d000 /* 00002e9a - 00002e9a [    1] */,
-    0x0177a00b /* 00002ef4 - 00002eff [   12] */,
-    0x017eb019 /* 00002fd6 - 00002fef [   26] */,
-    0x01800000 /* 00003000 - 00003000 [    1] */,
-    0x01820000 /* 00003040 - 00003040 [    1] */,
-    0x0184b801 /* 00003097 - 00003098 [    2] */,
-    0x01880004 /* 00003100 - 00003104 [    5] */,
-    0x01898000 /* 00003130 - 00003130 [    1] */,
-    0x018c7800 /* 0000318f - 0000318f [    1] */,
-    0x018f200a /* 000031e4 - 000031ee [   11] */,
-    0x0190f800 /* 0000321f - 0000321f [    1] */,
-    0x05246802 /* 0000a48d - 0000a48f [    3] */,
-    0x05263808 /* 0000a4c7 - 0000a4cf [    9] */,
-    0x05316013 /* 0000a62c - 0000a63f [   20] */,
-    0x0537c007 /* 0000a6f8 - 0000a6ff [    8] */,
-    0x053e5804 /* 0000a7cb - 0000a7cf [    5] */,
-    0x053e9000 /* 0000a7d2 - 0000a7d2 [    1] */,
-    0x053ea000 /* 0000a7d4 - 0000a7d4 [    1] */,
-    0x053ed017 /* 0000a7da - 0000a7f1 [   24] */,
-    0x05416802 /* 0000a82d - 0000a82f [    3] */,
-    0x0541d005 /* 0000a83a - 0000a83f [    6] */,
-    0x0543c007 /* 0000a878 - 0000a87f [    8] */,
-    0x05463007 /* 0000a8c6 - 0000a8cd [    8] */,
-    0x0546d005 /* 0000a8da - 0000a8df [    6] */,
-    0x054aa00a /* 0000a954 - 0000a95e [   11] */,
-    0x054be802 /* 0000a97d - 0000a97f [    3] */,
-    0x054e7000 /* 0000a9ce - 0000a9ce [    1] */,
-    0x054ed003 /* 0000a9da - 0000a9dd [    4] */,
-    0x054ff800 /* 0000a9ff - 0000a9ff [    1] */,
-    0x0551b808 /* 0000aa37 - 0000aa3f [    9] */,
-    0x05527001 /* 0000aa4e - 0000aa4f [    2] */,
-    0x0552d001 /* 0000aa5a - 0000aa5b [    2] */,
-    0x05561817 /* 0000aac3 - 0000aada [   24] */,
-    0x0557b809 /* 0000aaf7 - 0000ab00 [   10] */,
-    0x05583801 /* 0000ab07 - 0000ab08 [    2] */,
-    0x05587801 /* 0000ab0f - 0000ab10 [    2] */,
-    0x0558b808 /* 0000ab17 - 0000ab1f [    9] */,
-    0x05593800 /* 0000ab27 - 0000ab27 [    1] */,
-    0x05597800 /* 0000ab2f - 0000ab2f [    1] */,
-    0x055b6003 /* 0000ab6c - 0000ab6f [    4] */,
-    0x055f7001 /* 0000abee - 0000abef [    2] */,
-    0x055fd005 /* 0000abfa - 0000abff [    6] */,
-    0x06bd200b /* 0000d7a4 - 0000d7af [   12] */,
-    0x06be3803 /* 0000d7c7 - 0000d7ca [    4] */,
-    0x06bfe7ff /* 0000d7fc - 0000dffb [ 2048] */,
-    0x06ffe7ff /* 0000dffc - 0000e7fb [ 2048] */,
-    0x073fe7ff /* 0000e7fc - 0000effb [ 2048] */,
-    0x077fe7ff /* 0000effc - 0000f7fb [ 2048] */,
-    0x07bfe103 /* 0000f7fc - 0000f8ff [  260] */,
-    0x07d37001 /* 0000fa6e - 0000fa6f [    2] */,
-    0x07d6d025 /* 0000fada - 0000faff [   38] */,
-    0x07d8380b /* 0000fb07 - 0000fb12 [   12] */,
-    0x07d8c004 /* 0000fb18 - 0000fb1c [    5] */,
-    0x07d9b800 /* 0000fb37 - 0000fb37 [    1] */,
-    0x07d9e800 /* 0000fb3d - 0000fb3d [    1] */,
-    0x07d9f800 /* 0000fb3f - 0000fb3f [    1] */,
-    0x07da1000 /* 0000fb42 - 0000fb42 [    1] */,
-    0x07da2800 /* 0000fb45 - 0000fb45 [    1] */,
-    0x07de180f /* 0000fbc3 - 0000fbd2 [   16] */,
-    0x07ec8001 /* 0000fd90 - 0000fd91 [    2] */,
-    0x07ee4006 /* 0000fdc8 - 0000fdce [    7] */,
-    0x07ee801f /* 0000fdd0 - 0000fdef [   32] */,
-    0x07f0d005 /* 0000fe1a - 0000fe1f [    6] */,
-    0x07f29800 /* 0000fe53 - 0000fe53 [    1] */,
-    0x07f33800 /* 0000fe67 - 0000fe67 [    1] */,
-    0x07f36003 /* 0000fe6c - 0000fe6f [    4] */,
-    0x07f3a800 /* 0000fe75 - 0000fe75 [    1] */,
-    0x07f7e803 /* 0000fefd - 0000ff00 [    4] */,
-    0x07fdf802 /* 0000ffbf - 0000ffc1 [    3] */,
-    0x07fe4001 /* 0000ffc8 - 0000ffc9 [    2] */,
-    0x07fe8001 /* 0000ffd0 - 0000ffd1 [    2] */,
-    0x07fec001 /* 0000ffd8 - 0000ffd9 [    2] */,
-    0x07fee802 /* 0000ffdd - 0000ffdf [    3] */,
-    0x07ff3800 /* 0000ffe7 - 0000ffe7 [    1] */,
-    0x07ff780c /* 0000ffef - 0000fffb [   13] */,
-    0x07fff001 /* 0000fffe - 0000ffff [    2] */,
-    0x08006000 /* 0001000c - 0001000c [    1] */,
-    0x08013800 /* 00010027 - 00010027 [    1] */,
-    0x0801d800 /* 0001003b - 0001003b [    1] */,
-    0x0801f000 /* 0001003e - 0001003e [    1] */,
-    0x08027001 /* 0001004e - 0001004f [    2] */,
-    0x0802f021 /* 0001005e - 0001007f [   34] */,
-    0x0807d804 /* 000100fb - 000100ff [    5] */,
-    0x08081803 /* 00010103 - 00010106 [    4] */,
-    0x0809a002 /* 00010134 - 00010136 [    3] */,
-    0x080c7800 /* 0001018f - 0001018f [    1] */,
-    0x080ce802 /* 0001019d - 0001019f [    3] */,
-    0x080d082e /* 000101a1 - 000101cf [   47] */,
-    0x080ff081 /* 000101fe - 0001027f [  130] */,
-    0x0814e802 /* 0001029d - 0001029f [    3] */,
-    0x0816880e /* 000102d1 - 000102df [   15] */,
-    0x0817e003 /* 000102fc - 000102ff [    4] */,
-    0x08192008 /* 00010324 - 0001032c [    9] */,
-    0x081a5804 /* 0001034b - 0001034f [    5] */,
-    0x081bd804 /* 0001037b - 0001037f [    5] */,
-    0x081cf000 /* 0001039e - 0001039e [    1] */,
-    0x081e2003 /* 000103c4 - 000103c7 [    4] */,
-    0x081eb029 /* 000103d6 - 000103ff [   42] */,
-    0x0824f001 /* 0001049e - 0001049f [    2] */,
-    0x08255005 /* 000104aa - 000104af [    6] */,
-    0x0826a003 /* 0001...
[truncated]

@mordante mordante force-pushed the users/mordante/improves_format_escaping_performance branch from b8264d1 to e133af0 Compare April 12, 2024 16:58

@ldionne ldionne left a comment

LGTM with a few comments!

Resolved review threads (now outdated):

  • libcxx/include/__format/escaped_output_table.h (2 threads)
  • libcxx/utils/generate_escaped_output_table.py (3 threads)
@mordante mordante force-pushed the users/mordante/improves_format_escaping branch from 1f06034 to 359b961 Compare April 24, 2024 18:37
Base automatically changed from users/mordante/improves_format_escaping to main April 25, 2024 15:16
@mordante mordante force-pushed the users/mordante/improves_format_escaping_performance branch from e133af0 to 2f51510 Compare April 25, 2024 15:17
@mordante mordante merged commit e3dea5e into main Apr 28, 2024
51 checks passed
@mordante mordante deleted the users/mordante/improves_format_escaping_performance branch April 28, 2024 10:15