Conversation


@ilovepi ilovepi commented Sep 22, 2025

The naive char-by-char lookup performed OK, but we can skip ahead to the
next match, avoiding all the extra hash lookups in the key map. There is
likely a faster method than this, but it's already a 42% win in the
BM_Mustache_StringRendering/Escaped benchmark and an order-of-magnitude
improvement for BM_Mustache_LargeOutputString.

| Benchmark | Before (ns) | After (ns) | Speedup |
| :--- | ---: | ---: | ---: |
| `StringRendering/Escaped` | 29,440,922 | 16,583,603 | ~44% |
| `LargeOutputString` | 15,139,251 | 929,891 | ~94% |
| `HugeArrayIteration` | 102,148,245 | 95,943,960 | ~6% |
| `PartialsRendering` | 308,330,014 | 303,556,563 | ~1.6% |

Unreported benchmarks, like those for parsing, had no significant change.


ilovepi commented Sep 22, 2025

This stack of pull requests is managed by Graphite.


llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-llvm-support

Author: Paul Kirth (ilovepi)

Full diff: https://github.com/llvm/llvm-project/pull/160166.diff

1 file affected:

  • (modified) llvm/lib/Support/Mustache.cpp (+22-8)
diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index c7cebe6b64fae..911fd5ee7fa01 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -428,19 +428,32 @@ class EscapeStringStream : public raw_ostream {
 public:
   explicit EscapeStringStream(llvm::raw_ostream &WrappedStream,
                               EscapeMap &Escape)
-      : Escape(Escape), WrappedStream(WrappedStream) {
+      : Escape(Escape), EscapeChars(Escape.keys().begin(), Escape.keys().end()),
+        WrappedStream(WrappedStream) {
     SetUnbuffered();
   }
 
 protected:
   void write_impl(const char *Ptr, size_t Size) override {
-    llvm::StringRef Data(Ptr, Size);
-    for (char C : Data) {
-      auto It = Escape.find(C);
-      if (It != Escape.end())
-        WrappedStream << It->getSecond();
-      else
-        WrappedStream << C;
+    StringRef Data(Ptr, Size);
+    size_t Start = 0;
+    while (Start < Size) {
+      // Find the next character that needs to be escaped.
+      size_t Next = Data.find_first_of(EscapeChars.str(), Start);
+
+      // If no escapable characters are found, write the rest of the string.
+      if (Next == StringRef::npos) {
+        WrappedStream << Data.substr(Start);
+        return;
+      }
+
+      // Write the chunk of text before the escapable character.
+      if (Next > Start)
+        WrappedStream << Data.substr(Start, Next - Start);
+
+      // Look up and write the escaped version of the character.
+      WrappedStream << Escape[Data[Next]];
+      Start = Next + 1;
     }
   }
 
@@ -448,6 +461,7 @@ class EscapeStringStream : public raw_ostream {
 
 private:
   EscapeMap &Escape;
+  SmallString<8> EscapeChars;
   llvm::raw_ostream &WrappedStream;
 };
 


@nikic nikic left a comment


Can't we make EscapeMap an std::array<std::string, 256> instead of DenseMap<char, std::string>? That would make the lookup cheap.
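The array-based table suggested above could look roughly like this; a minimal sketch, assuming a 256-entry array indexed by the raw byte value with unused slots left empty. `EscapeTable`, `makeHtmlEscapes`, and `escape` are hypothetical names, not existing LLVM API:

```cpp
#include <array>
#include <string>

// Hypothetical sketch: a flat array indexed by the byte value replaces the
// DenseMap. Slots with no replacement hold an empty string.
using EscapeTable = std::array<std::string, 256>;

static EscapeTable makeHtmlEscapes() {
  EscapeTable Table{};
  Table[static_cast<unsigned char>('&')] = "&amp;";
  Table[static_cast<unsigned char>('<')] = "&lt;";
  Table[static_cast<unsigned char>('>')] = "&gt;";
  return Table;
}

static std::string escape(const std::string &Input, const EscapeTable &Table) {
  std::string Out;
  for (char C : Input) {
    const std::string &Rep = Table[static_cast<unsigned char>(C)];
    if (Rep.empty())
      Out.push_back(C); // No replacement registered: copy the byte through.
    else
      Out += Rep;
  }
  return Out;
}
```

The lookup is a single array index per character, one load plus an emptiness check, rather than a hash and probe.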


ilovepi commented Sep 22, 2025

> Can't we make `EscapeMap` an `std::array<std::string, 256>` instead of `DenseMap<char, std::string>`? That would make the lookup cheap.

Yes, that's something I'd like to do as a follow up.


ilovepi commented Sep 23, 2025

This patch:

| Benchmark | Baseline (ns) | Experiment (ns) | Speedup |
| :--- | ---: | ---: | ---: |
| `LargeOutputString` | 8,926,576 | 591,254 | ~93% |
| `StringRendering/Escaped` | 18,196,698 | 10,280,591 | ~44% |
| `DeeplyNestedRendering` | 2,799 | 2,474 | ~12% |
| `PartialsRendering` | 211,153,502 | 197,101,139 | ~7% |
| `DeepTraversal` | 4,412,011 | 4,148,482 | ~6% |
| `HugeArrayIteration` | 61,887,053 | 58,737,900 | ~5% |

`std::array<std::string, 256>`:

| Benchmark | Baseline (ns) | Experiment (ns) | Change |
| :--- | ---: | ---: | ---: |
| `StringRendering/Escaped` | 18,196,698 | 16,979,453 | ~7% faster |
| `PartialsRendering` | 211,153,502 | 198,234,189 | ~6% faster |
| `LargeOutputString` | 8,926,576 | 8,423,018 | ~6% faster |
| `HugeArrayIteration` | 61,887,053 | 63,989,131 | ~3% slower |

I didn't try combining them. It's not clear how we'd initialize the list of special escape characters in the stream, unless we assume you can't override them. Besides, we do many fewer lookups now, so I don't know how worth it it is in practice. The Mustache generation is about 20% faster w/ this patch. That part is only a small fraction of the overall execution time, but it did make a difference.

@ilovepi ilovepi changed the base branch from users/ilovepi/mustache-delimiter-find to users/ilovepi/mustache-bench September 23, 2025 02:15
@ilovepi ilovepi force-pushed the users/ilovepi/mustache-escapestream-opt branch from 29e37be to 632536e Compare September 23, 2025 02:16
@ilovepi ilovepi force-pushed the users/ilovepi/mustache-bench branch from a34af38 to 17b25b0 Compare September 23, 2025 02:16

ilovepi commented Sep 23, 2025

Combined:

| Benchmark | Baseline (ns) | Combined (ns) | Change |
| :--- | ---: | ---: | ---: |
| `LargeOutputString` | 8,926,576 | 595,732 | ~93% faster |
| `StringRendering/Escaped` | 18,196,698 | 10,167,501 | ~44% faster |
| `SmallTemplateParsing` | 3,526 | 3,965 | ~12% slower |
| `PartialsRendering` | 211,153,502 | 258,420,352 | ~22% slower |
| `DeepTraversal` | 4,412,011 | 5,847,327 | ~32% slower |
| `HugeArrayIteration` | 61,887,053 | 84,320,500 | ~36% slower |


nikic commented Sep 23, 2025

In these tests, at which point are you constructing the std::array? Is it inside each EscapeStream or once when Template is constructed?

It's possible that std::array wasn't the right suggestion -- maybe the fact that it stores std::string makes it too large. But if you check what find_first_of actually does:

```cpp
std::bitset<1 << CHAR_BIT> CharBits;
for (char C : Chars)
  CharBits.set((unsigned char)C);
```

it will just take that string of characters you pass it and convert it into a bitset. We may as well directly create the bitset instead of creating the char string and then converting it to a bitset on every call.
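Building that bitset once up front, as suggested, could be sketched like this; `makeCharBits` and `findNextEscape` are hypothetical helper names, not LLVM APIs:

```cpp
#include <bitset>
#include <climits>
#include <cstddef>
#include <string>

// Sketch: construct the escape-character bitset a single time instead of
// letting find_first_of rebuild it from the character string on every call.
static std::bitset<1 << CHAR_BIT> makeCharBits(const std::string &Chars) {
  std::bitset<1 << CHAR_BIT> Bits;
  for (char C : Chars)
    Bits.set(static_cast<unsigned char>(C));
  return Bits;
}

// Scan forward to the next escapable character, or npos if none remain.
static std::size_t findNextEscape(const std::string &Data, std::size_t Start,
                                  const std::bitset<1 << CHAR_BIT> &Bits) {
  for (std::size_t I = Start, E = Data.size(); I != E; ++I)
    if (Bits.test(static_cast<unsigned char>(Data[I])))
      return I;
  return std::string::npos;
}
```

The inner loop then tests one bit per byte, with no per-call setup cost.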


ilovepi commented Sep 23, 2025

> In these tests, at which point are you constructing the std::array? Is it inside each EscapeStream or once when Template is constructed?

Once when the template is constructed. I used a function w/ a static variable to provide the default escapes, so I think it's just once for the whole program.

> It's possible that std::array wasn't the right suggestion -- maybe the fact that it stores std::string makes it too large. But if you check what find_first_of actually does:
>
> ```cpp
> std::bitset<1 << CHAR_BIT> CharBits;
> for (char C : Chars)
>   CharBits.set((unsigned char)C);
> ```
>
> it will just take that string of characters you pass it and convert it into a bitset. We may as well directly create the bitset instead of creating the char string and then converting it to a bitset on every call.

Could be. I think the big win is that when we use find_first_of we pass a big stringref to the output stream instead of passing in one string at a time. There's no copy, and the number of iterations where we write to the stream is reduced.

@ilovepi ilovepi force-pushed the users/ilovepi/mustache-bench branch from 17b25b0 to 4404c23 Compare September 25, 2025 21:37
@ilovepi ilovepi force-pushed the users/ilovepi/mustache-escapestream-opt branch from 632536e to bba1a54 Compare September 25, 2025 21:37

nikic commented Sep 25, 2025

> Could be. I think the big win is that when we use find_first_of we pass a big stringref to the output stream instead of passing in one string at a time. There's no copy, and the number of iterations where we write to the stream is reduced.

Oh, I see. I thought the bottleneck here was the escape lookup, not the write to the stream. If the stream is the slow bit, would it make sense to write everything into a SmallString first and then write the full string to the stream?
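The buffer-first idea floated here could be sketched as follows, with `std::string` standing in for `llvm::SmallString` and a backslash marker standing in for the real escape lookup; all names are hypothetical, this is not the patch's code:

```cpp
#include <cstddef>
#include <string>

// Sketch: accumulate the escaped output in a local buffer and hand the
// stream one finished chunk, instead of issuing many small writes.
static std::string renderEscaped(const std::string &Data,
                                 const std::string &EscapeChars) {
  std::string Buffer;
  std::size_t Start = 0;
  while (Start < Data.size()) {
    std::size_t Next = Data.find_first_of(EscapeChars, Start);
    if (Next == std::string::npos) {
      // No escapable characters remain: copy the tail and stop.
      Buffer.append(Data, Start, std::string::npos);
      break;
    }
    // Copy the plain chunk before the escapable character.
    Buffer.append(Data, Start, Next - Start);
    // A real implementation would append the escape replacement here; a
    // backslash marker stands in for that lookup.
    Buffer += "\\";
    Buffer += Data[Next];
    Start = Next + 1;
  }
  return Buffer; // The caller writes Buffer to the stream in one call.
}
```

The stream then sees a single write per `write_impl` call; the trade-off is the extra growth and copy of the buffer itself, which is the question being weighed in the thread.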

@ilovepi ilovepi force-pushed the users/ilovepi/mustache-bench branch 9 times, most recently from 6c45ea6 to 1f83764 Compare September 25, 2025 23:17
@ilovepi ilovepi force-pushed the users/ilovepi/mustache-bench branch 4 times, most recently from 2d7f1b9 to 47a5f24 Compare September 25, 2025 23:36
Base automatically changed from users/ilovepi/mustache-bench to main September 26, 2025 00:09
@ilovepi ilovepi force-pushed the users/ilovepi/mustache-escapestream-opt branch from bba1a54 to 67509f6 Compare September 26, 2025 00:35

ilovepi commented Sep 26, 2025

> Oh, I see. I thought the bottleneck here was the escape lookup, not the write to the stream. If the stream is the slow bit, would it make sense to write everything into a SmallString first and then write the full string to the stream?

Hmm, I think it's sort of a combination. In the char-by-char case, we do a lookup (which may be pretty fast w/ the bitset or std::array) and then write a string into the stream. That's not slow, but if we do it char by char, there's just a bunch of overhead. find_first_of() is suboptimal in that it's recomputing the bitset, but it does let us nicely bound the StringRef to pass to the stream, which I assume is just the one copy from src -> dst. If we have an escape char, we put that in the stream and continue.

If I use a SmallString in the same way, I'll grow it as many times as I'd write escapes out. It seems more direct/efficient to just write it to the stream at that point, but 🤷 that's just my intuition.

I guess we'd save on bitset creation if I more or less inlined find_first_of, but 🤷, it seems way nicer to just use the StringRef API and not worry too much. We've sped up this bit quite a lot, and I don't think it's going to be a bottleneck anymore. I have a separate stack of other Mustache improvements that deal w/ the lack of spec compliance, so it may be worth revisiting the perf issues and any new regressions once that's done. I know a few of the things I did to make the implementation more correct also had some performance implications, like removing redundant parsing and multi-pass algorithms from the original naive implementation.


@nikic nikic left a comment


LGTM. Thanks for the detailed explanations and experiments!

@ilovepi ilovepi merged commit f9065fc into main Sep 26, 2025
7 of 9 checks passed
@ilovepi ilovepi deleted the users/ilovepi/mustache-escapestream-opt branch September 26, 2025 22:46
YixingZhang007 pushed a commit to YixingZhang007/llvm-project that referenced this pull request Sep 27, 2025
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025