-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Compiled" static header maps instead of big trie #33932
Changes from 6 commits
9b0d62c
2e83788
be5dd61
611e1dd
150040e
90ba622
e055146
dd3a9fa
1990767
5df1270
92625f3
23042de
0808002
b293819
9eccaa9
a3f8919
d1a8ed8
7ece7eb
57b9055
facdef8
941d3fa
d75190d
30bab4b
5469839
dc91907
4348681
9bd6190
9ee31c0
405ac4d
a743ba2
c7458f4
c4858ec
56c5e4e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
#pragma once | ||
|
||
#include <algorithm> | ||
#include <string> | ||
#include <vector> | ||
|
||
#include "absl/strings/string_view.h" | ||
|
||
namespace Envoy { | ||
|
||
/** | ||
* This is a specialized structure intended for static header maps, but | ||
* there may be other use cases. | ||
* The structure is: | ||
* 1. a length-based lookup table so only keys the same length as the | ||
* target key are considered. | ||
* 2. a trie that branches on the "most divisions" position of the key. | ||
* | ||
* For example, if we consider the case where the set of headers is | ||
* `x-prefix-banana` | ||
* `x-prefix-babana` | ||
* `x-prefix-apple` | ||
* `x-prefix-pineapple` | ||
* `x-prefix-barana` | ||
* `x-prefix-banaka` | ||
* | ||
* A standard front-first trie looking for `x-prefix-banana` would walk | ||
* 7 nodes through the tree, first for `x`, then for `-`, etc. | ||
* | ||
* This structure first jumps to matching length, eliminating in this | ||
* example case apple and pineapple. | ||
* Then the "best split" node is on | ||
* `x-prefix-banana` | ||
* ^ | ||
* so the first node has 3 non-miss branches, n, b and r for that position. | ||
* Down that n branch, the "best split" is on | ||
* `x-prefix-banana` | ||
* ^ | ||
* which has two branches, n or k. | ||
* Down the n branch is the leaf node (only `x-prefix-banana` remains) - at | ||
* this point a regular string-compare checks if the key is an exact match | ||
* for the string node. | ||
*/ | ||
ravenblackx marked this conversation as resolved.
Show resolved
Hide resolved
|
||
template <class Value> class CompiledStringMap { | ||
using FindFn = std::function<Value(const absl::string_view&)>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. standard best practice is to pass string_view as value rather than as a ref. I think it's really close but this was the decision made by smart people some time ago :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a comment about references being faster for this use-case. |
||
|
||
public: | ||
using KV = std::pair<std::string, Value>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Scanning through the code I'm wondering if there' s some string copying going on here that we might be able to avoid. Can we have a string-store held in the class, and have the KV that we manipulate in various sub-vectors have either a string_view or a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made the input a vector of string_views which eliminates the string copying during compile() except the copy taken for a leaf node (which is kept in case at some point the table wants to be initialized from a transient source rather than global singleton-constant-likes). |
||
/** | ||
* Returns the value with a matching key, or the default value | ||
* (typically nullptr) if the key was not present. | ||
* @param key the key to look up. | ||
*/ | ||
Value find(const absl::string_view& key) const { | ||
if (key.size() >= table_.size() || table_[key.size()] == nullptr) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we avoid the duplicate lookup of I think for the compiler to optimize it away it would have to see inside table_.size() to ensure it could not mutate key. That seems obvious to humans but idk how smart compilers are with that. I think if you can't see inside table_.size() impl (ie not inlined) then you can't know that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried taking it to a local variable, but that has to be after key.size() being too large has already been checked, which makes it two if-branches, which from benchmarking appears to make both the hit paths ~5% slower. It's not like these are actual table lookups, they're just offsets, so even if it wasn't optimized away it might still be faster than taking an additional copy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why does it have to be after the first check? Can't you have the first line of the function be the assignment to a temp? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
return {}; | ||
} | ||
return table_[key.size()](key); | ||
}; | ||
/** | ||
* Construct the lookup table. This is a somewhat slow multi-pass | ||
* operation - using this structure is not recommended unless the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this note be in the class comment? For a moment I thought you were saying that you could use this class, but not call 'compile' and instead fall back to an interpret-mode, but I don't think that's what you are saying :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Moved. |
||
* table is initialize-once, use-many. | ||
* @param initial a vector of key->value pairs. | ||
*/ | ||
void compile(std::vector<KV> initial) { | ||
jmarantz marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is the parameter called
ravenblackx marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (initial.empty()) { | ||
return; | ||
} | ||
size_t longest = 0; | ||
for (const KV& pair : initial) { | ||
longest = std::max(pair.first.size(), longest); | ||
} | ||
table_.resize(longest + 1); | ||
std::sort(initial.begin(), initial.end(), | ||
[](const KV& a, const KV& b) { return a.first.size() < b.first.size(); }); | ||
auto it = initial.begin(); | ||
for (size_t i = 0; i <= longest; i++) { | ||
auto start = it; | ||
while (it != initial.end() && it->first.size() == i) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am guessing we don't consider compiling to be important to be fast. But this nested loop seems like it might be slower than needed. How about this for an algo:
I realize this creates a data structure which doesn't map the ones you are using, but it seems like a good way to go get the elements organized by size. After that I'm not sure how this would fit into your algo. I'm having a little trouble following the algorithm also. What causes Maybe some more comment would help, or breaking down your compile flow into a few helper methods that might be more self-documenting. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was already not really a nested loop, in that the two indexes were moving "in parallel" (i.e. it wasn't n*m, it was n+m), but I've changed it to make that a bit clearer. Though still not much clearer since the std::find_if still looks like it might be an m-length lookup. But it's not! |
||
it++; | ||
} | ||
if (it != start) { | ||
std::vector<KV> node_contents; | ||
node_contents.reserve(it - start); | ||
std::copy(start, it, std::back_inserter(node_contents)); | ||
table_[i] = createEqualLengthNode(node_contents); | ||
} | ||
} | ||
} | ||
|
||
private: | ||
static FindFn createEqualLengthNode(std::vector<KV> node_contents) { | ||
if (node_contents.size() == 1) { | ||
return [pair = node_contents[0]](const absl::string_view& key) -> Value { | ||
if (key != pair.first) { | ||
return {}; | ||
} | ||
return pair.second; | ||
}; | ||
} | ||
struct IndexSplitInfo { | ||
uint8_t index, min, max, count; | ||
ravenblackx marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} best{0, 0, 0, 0}; | ||
for (size_t i = 0; i < node_contents[0].first.size(); i++) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would this be faster?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it would (it's just reading a value from an address), and we don't care about the microoptimization performance here anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to make this code more understandable consider naming: consider renaming: |
||
std::array<bool, 256> hits{}; | ||
IndexSplitInfo info{static_cast<uint8_t>(i), 255, 0, 0}; | ||
for (size_t j = 0; j < node_contents.size(); j++) { | ||
uint8_t v = node_contents[j].first[i]; | ||
if (!hits[v]) { | ||
hits[v] = true; | ||
info.count++; | ||
info.min = std::min(v, info.min); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering if it would be faster to write this as:
It feels to me like what you have is a little more readable, but may have an extra branch. And it will be always executing two writes into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We shouldn't really care about microoptimizing the speed of any function that isn't find(). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure. I was having trouble figuring out what's find() and what's not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hopefully that's a lot clearer now that there's no lambdas. Now only |
||
info.max = std::max(v, info.max); | ||
} | ||
} | ||
if (info.count > best.count) { | ||
best = info; | ||
} | ||
} | ||
std::vector<FindFn> nodes; | ||
nodes.resize(best.max - best.min + 1); | ||
std::sort(node_contents.begin(), node_contents.end(), [&best](const KV& a, const KV& b) { | ||
return a.first[best.index] < b.first[best.index]; | ||
}); | ||
auto it = node_contents.begin(); | ||
for (int i = best.min; i <= best.max; i++) { | ||
auto start = it; | ||
while (it != node_contents.end() && it->first[best.index] == i) { | ||
it++; | ||
} | ||
if (it != start) { | ||
// Optimization was tried here, std::array<KV, 256> rather than | ||
// a smaller-range vector with bounds, to keep locality and reduce | ||
// comparisons. It didn't help. | ||
std::vector<KV> next_contents; | ||
next_contents.reserve(it - start); | ||
std::copy(start, it, std::back_inserter(next_contents)); | ||
nodes[i - best.min] = createEqualLengthNode(next_contents); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this recursive lambda generation looks really cool but I'm wondering if it's the most efficient thing we can do. As a thought experiment I'm wondering what the code would look like if you had to write this in C. I assume it would be possilbe (if a little more verbose). Would it be faster? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I was doing it in C it would be almost the same - I don't know how much overhead there is in std::function but for the C structure of this I'd be using something like a struct containing a function pointer and a union of the captured data that each of the node-types takes, and that function pointer would take a pointer to that same structure... which seems like it's probably at least about the same as what a lambda is doing. It might be possible to optimize further with more manual constructs, but I think a 60% speedup is probably good enough, and it could have another pass later if someone wants to try to do better. |
||
} | ||
} | ||
return [nodes = std::move(nodes), min = best.min, | ||
index = best.index](const absl::string_view& key) -> Value { | ||
uint8_t k = static_cast<uint8_t>(key[index]); | ||
// Possible optimization was tried here, populating empty nodes with | ||
// a function that returns {} to reduce branching vs checking for null | ||
// nodes. Checking for null nodes benchmarked faster. | ||
if (k < min || k >= min + nodes.size() || nodes[k - min] == nullptr) { | ||
return {}; | ||
} | ||
return nodes[k - min](key); | ||
}; | ||
} | ||
std::vector<FindFn> table_; | ||
}; | ||
|
||
} // namespace Envoy |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#include "source/common/common/compiled_string_map.h" | ||
|
||
#include "gmock/gmock.h" | ||
#include "gtest/gtest.h" | ||
|
||
namespace Envoy { | ||
|
||
using testing::IsNull; | ||
|
||
TEST(CompiledStringMapTest, FindsEntriesCorrectly) { | ||
CompiledStringMap<const char*> map; | ||
map.compile({ | ||
{"key-1", "value-1"}, | ||
{"key-2", "value-2"}, | ||
{"longer-key", "value-3"}, | ||
{"bonger-key", "value-4"}, | ||
{"bonger-bey", "value-5"}, | ||
{"only-key-of-this-length", "value-6"}, | ||
}); | ||
EXPECT_EQ(map.find("key-1"), "value-1"); | ||
EXPECT_EQ(map.find("key-2"), "value-2"); | ||
EXPECT_THAT(map.find("key-0"), IsNull()); | ||
EXPECT_THAT(map.find("key-3"), IsNull()); | ||
EXPECT_EQ(map.find("longer-key"), "value-3"); | ||
EXPECT_EQ(map.find("bonger-key"), "value-4"); | ||
EXPECT_EQ(map.find("bonger-bey"), "value-5"); | ||
EXPECT_EQ(map.find("only-key-of-this-length"), "value-6"); | ||
EXPECT_THAT(map.find("songer-key"), IsNull()); | ||
EXPECT_THAT(map.find("absent-length-key"), IsNull()); | ||
} | ||
|
||
TEST(CompiledStringMapTest, EmptyMapReturnsNull) { | ||
CompiledStringMap<const char*> map; | ||
map.compile({}); | ||
EXPECT_THAT(map.find("key-1"), IsNull()); | ||
} | ||
|
||
} // namespace Envoy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the "best split" can be anywhere in the string. I think modifying the example to the following, will clarify how this should work.
then the best split would've been in the character before-last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be anywhere in the string, only at an index where there's a difference. The example is specifically showing that 3 branches would be "best" over 2 branches. I don't understand what your example clarifies that the existing example doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was looking at it from a "trie" perspective.
When I first read the comment, I thought that in each branch, the structure only keeps the "suffix" (after the branch point). I think that updating the example to a case where the branching happens due to a "best-split" that is not the first split, will reduce that confusion to future readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the md file explanation covers this sufficiently (though it doesn't do the second split earlier in the string).
If you'd like me to rearrange the strings in the new example so that it does the "last" index split first and the earlier one second I can do that, but I think the diagram having numbers in it probably makes it clear enough. (And late first would be just as potentially confusing as early first, and having three sequential splits would be getting so long as to be harder to read...)