[llvm][mustache] Avoid excessive hash lookups in EscapeStringStream #160166
Conversation
@llvm/pr-subscribers-llvm-support

Author: Paul Kirth (ilovepi)

Changes

The naive char-by-char lookup performed OK, but we can skip ahead to the next match, avoiding all the extra hash lookups in the key map. Likely there is a faster method than this, but it's already a 42% win in the BM_Mustache_StringRendering/Escaped benchmark, and an order of magnitude improvement for BM_Mustache_LargeOutputString.

| Benchmark | Before (ns) | After (ns) | Speedup |
| :--- | ---: | ---: | ---: |
| `StringRendering/Escaped` | 29,440,922 | 16,583,603 | ~44% |
| `LargeOutputString` | 15,139,251 | 929,891 | ~94% |
| `HugeArrayIteration` | 102,148,245 | 95,943,960 | ~6% |
| `PartialsRendering` | 308,330,014 | 303,556,563 | ~1.6% |

Unreported benchmarks, like those for parsing, had no significant change.

Full diff: https://github.com/llvm/llvm-project/pull/160166.diff

1 Files Affected:
- (modified) llvm/lib/Support/Mustache.cpp
diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index c7cebe6b64fae..911fd5ee7fa01 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -428,19 +428,32 @@ class EscapeStringStream : public raw_ostream {
public:
explicit EscapeStringStream(llvm::raw_ostream &WrappedStream,
EscapeMap &Escape)
- : Escape(Escape), WrappedStream(WrappedStream) {
+ : Escape(Escape), EscapeChars(Escape.keys().begin(), Escape.keys().end()),
+ WrappedStream(WrappedStream) {
SetUnbuffered();
}
protected:
void write_impl(const char *Ptr, size_t Size) override {
- llvm::StringRef Data(Ptr, Size);
- for (char C : Data) {
- auto It = Escape.find(C);
- if (It != Escape.end())
- WrappedStream << It->getSecond();
- else
- WrappedStream << C;
+ StringRef Data(Ptr, Size);
+ size_t Start = 0;
+ while (Start < Size) {
+ // Find the next character that needs to be escaped.
+ size_t Next = Data.find_first_of(EscapeChars.str(), Start);
+
+ // If no escapable characters are found, write the rest of the string.
+ if (Next == StringRef::npos) {
+ WrappedStream << Data.substr(Start);
+ return;
+ }
+
+ // Write the chunk of text before the escapable character.
+ if (Next > Start)
+ WrappedStream << Data.substr(Start, Next - Start);
+
+ // Look up and write the escaped version of the character.
+ WrappedStream << Escape[Data[Next]];
+ Start = Next + 1;
}
}
@@ -448,6 +461,7 @@ class EscapeStringStream : public raw_ostream {
private:
EscapeMap &Escape;
+ SmallString<8> EscapeChars;
llvm::raw_ostream &WrappedStream;
};
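For a sense of the behavior, here is a small standalone sketch of the chunking loop the patch introduces, written as a free function since EscapeStringStream itself is file-local to Mustache.cpp. The escape set and the "[escaped:X]" output are illustrative assumptions, not the PR's code.

```cpp
// Standalone sketch of the chunked-escaping loop added above. The escape set
// "&<>\"'" and the "[escaped:X]" replacement are placeholders for the map
// lookup that EscapeStringStream performs.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"

static void escapeChunked(llvm::StringRef Data, llvm::StringRef EscapeChars,
                          llvm::raw_ostream &OS) {
  size_t Start = 0;
  while (Start < Data.size()) {
    // Jump straight to the next character that needs escaping.
    size_t Next = Data.find_first_of(EscapeChars, Start);
    if (Next == llvm::StringRef::npos) {
      OS << Data.substr(Start); // tail has nothing to escape: one bulk write
      return;
    }
    if (Next > Start)
      OS << Data.substr(Start, Next - Start); // bulk write of the plain chunk
    OS << "[escaped:" << Data[Next] << "]";   // stand-in for the map lookup
    Start = Next + 1;
  }
}

int main() {
  escapeChunked("a < b && c", "&<>\"'", llvm::outs());
  // Prints: a [escaped:<] b [escaped:&][escaped:&] c
  llvm::outs() << "\n";
}
```

On input like this, the wrapped stream receives a handful of multi-character chunks instead of one write per character, which is where the reported speedup comes from.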
Can't we make EscapeMap an std::array<std::string, 256> instead of DenseMap<char, std::string>? That would make the lookup cheap.
Yes, that's something I'd like to do as a follow up.

This patch:

std::array<string,256>:

I didn't try combining them. It's not clear how we'd initialize the list of special escape characters in the stream, unless we assume you can't override them. Besides, we do many fewer lookups now, so I don't know how worthwhile it is in practice. The Mustache generation is about 20% faster w/ this patch. That part is only a small fraction of the overall execution time, but it did make a difference.
Combined:
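For illustration, the std::array-based table discussed above might look roughly like this (a hypothetical sketch of the deferred follow-up, assuming the usual HTML escape set; it is not part of this PR):

```cpp
// Hypothetical sketch of the std::array<std::string, 256> idea: index by
// unsigned char so lookup is a plain array access; an empty string means
// "no escaping needed". The escape values assume the usual HTML set.
#include <array>
#include <string>

using EscapeTable = std::array<std::string, 256>;

static EscapeTable makeHtmlEscapeTable() {
  EscapeTable Table; // every entry default-constructs to ""
  Table[static_cast<unsigned char>('&')] = "&amp;";
  Table[static_cast<unsigned char>('<')] = "&lt;";
  Table[static_cast<unsigned char>('>')] = "&gt;";
  Table[static_cast<unsigned char>('"')] = "&quot;";
  Table[static_cast<unsigned char>('\'')] = "&#39;";
  return Table;
}

// One array index instead of a DenseMap hash probe.
static const std::string &lookupEscape(const EscapeTable &Table, char C) {
  return Table[static_cast<unsigned char>(C)];
}
```

The trade-off raised in the thread is initialization: a fixed table makes it less obvious how callers would override the escape set, which is part of why it was deferred.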
In these tests, at which point are you constructing the std::array? Is it inside each EscapeStream or once when Template is constructed? It's possible that this is a factor:

llvm-project/llvm/lib/Support/StringRef.cpp, lines 240 to 242 (at ebcf1bf)
Once when the template is constructed. I used a function w/ a static variable to provide the default escapes, so I think it's just once for the whole program.
Could be. I think the big win is that when we use find_first_of we pass a big StringRef to the output stream instead of passing in one string at a time. There's no copy, and the number of iterations where we write to the stream is reduced.
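The "function w/ a static variable" pattern described above might look something like this (a hypothetical sketch; the experiment's actual code isn't shown in this thread):

```cpp
// Hypothetical sketch of providing default escapes via a function-local
// static: the table is built on first use and reused for the rest of the
// program. Names and contents are assumptions.
#include <array>
#include <string>

static const std::array<std::string, 256> &getDefaultHtmlEscapes() {
  static const std::array<std::string, 256> Table = [] {
    std::array<std::string, 256> T;
    T[static_cast<unsigned char>('&')] = "&amp;";
    T[static_cast<unsigned char>('<')] = "&lt;";
    T[static_cast<unsigned char>('>')] = "&gt;";
    T[static_cast<unsigned char>('"')] = "&quot;";
    T[static_cast<unsigned char>('\'')] = "&#39;";
    return T;
  }();
  return Table;
}
```

Because a function-local static is initialized exactly once (and thread-safely since C++11), the construction cost is paid per program rather than per template or per stream, matching the "just once for the whole program" point above.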
Oh, I see. I thought the bottleneck here was the escape lookup, not the write to the stream. If the stream is the slow bit, would it make sense to write everything into a SmallString first and then write the full string to the stream?
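For concreteness, the buffer-then-flush alternative being suggested could look roughly like this (a sketch only; the escape set and replacement are placeholders, and this is not what the PR implements):

```cpp
// Sketch of the buffer-then-flush idea: accumulate the escaped output in a
// SmallString and hand the wrapped stream a single write at the end.
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"

static void escapeBuffered(llvm::StringRef Data, llvm::StringRef EscapeChars,
                           llvm::raw_ostream &WrappedStream) {
  llvm::SmallString<256> Buffer;
  size_t Start = 0;
  while (Start < Data.size()) {
    size_t Next = Data.find_first_of(EscapeChars, Start);
    if (Next == llvm::StringRef::npos) {
      Buffer += Data.substr(Start);
      break;
    }
    Buffer += Data.substr(Start, Next - Start);
    Buffer += "?"; // placeholder for the escaped replacement string
    Start = Next + 1;
  }
  // One write to the underlying stream, at the cost of growing Buffer
  // (and copying the data an extra time) along the way.
  WrappedStream << Buffer.str();
}
```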
hmm, I think it's sort of a combo. In the char-by-char case, we do a lookup (which may be pretty fast w/ the bitset or std::array) and then write a string into the stream. That's not slow, but if we do it char by char, there's just a bunch of overhead. If I use a SmallString in the same way, I'll grow it as many times as I'd write escapes out. It seems more direct/efficient to just write it to the stream at that point, but 🤷 that's just my intuition. I guess we'd save on bitset creation if I more or less inlined find_first_of, but 🤷 it seems way nicer to just use the StringRef API and not worry too much. We've sped up this bit quite a lot, and I don't think it's going to be a bottleneck anymore. I have a separate stack of other Mustache improvements that deal w/ the lack of spec compliance, so it may be worth revisiting the perf issues and any new regressions once that's done. I know a few of the things I did to make the implementation more correct also had some performance implications, like removing redundant parsing and multi-pass algorithms from the original naive implementation.
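A rough sketch of what "inlining find_first_of" could mean, i.e. building the membership bitset once per stream rather than letting StringRef::find_first_of rebuild it on every call (hypothetical; not part of this PR):

```cpp
// Hypothetical variant: precompute the 256-entry membership bitset once
// (e.g. per EscapeStringStream) and scan with it directly.
#include "llvm/ADT/StringRef.h"
#include <bitset>
#include <cstddef>

struct EscapeScanner {
  std::bitset<256> IsEscapable;

  explicit EscapeScanner(llvm::StringRef EscapeChars) {
    for (char C : EscapeChars)
      IsEscapable.set(static_cast<unsigned char>(C));
  }

  // Index of the next escapable character at or after From, or npos.
  size_t findNext(llvm::StringRef Data, size_t From) const {
    for (size_t I = From, E = Data.size(); I != E; ++I)
      if (IsEscapable.test(static_cast<unsigned char>(Data[I])))
        return I;
    return llvm::StringRef::npos;
  }
};
```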
LGTM. Thanks for the detailed explanations and experiments!