-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make TrieLookupTable destruction non-recursive #34783
Conversation
Signed-off-by: Raven Black <ravenblack@dropbox.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Left one a high-level question.
/assign @jmarantz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick improvement!
source/common/common/utility.h
Outdated
for (uint32_t i = 0; i < flat_tree.size(); i++) { | ||
for (std::unique_ptr<TrieNode>& child : flat_tree[i]->children_) { | ||
if (child != nullptr) { | ||
flat_tree.push_back(std::move(child)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider maintaining an additional total node counter during add
, and then reserve the flat_tree
vector to avoid multiple reallocation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered doing a reserve, didn't seem worth it since the destruction of these things is infrequent. The increased complexity is not worth the performance saving IMO. If someone else wants to make that change later feel free, but I didn't even want to make this change at all, I certainly don't want to make it more complicated than it needs to be to resolve someone else's weird test scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg, and agreed this is kind of edge cases that can cause problems and pre-existing, but it not weird and a key can be large so we want to prevent it as much as possible. thanks for the improvement!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just drop a TODO to do a counting pass for reserving if this shows up in a flame-graph in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you can do this destruction explicitly with a stack whose size will be proportional to the depth of the recursion, rather than a vector whose size will be that of the whole tree. That would be a better TODO than just precalculating the amount to reserve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll wait till Boteng reviews before doing a pass
@ravenblackx I think the original code was buggy also; we shouldn't be recursing through so many levels. Envoy might need to run in an environment with smaller stacks than our test machines as well, so it's good we are sorting this out.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Yes, that's why I'm disgruntled for this to be operating on a "fix our external test or we'll revert your change even though the actual problem was pre-existing" basis. Feels like a certain company is exerting undue power here - if someone else's PR broke one of my external-to-envoy tests based on a problem the PR didn't introduce, I'm pretty sure I wouldn't be able to threaten to revert their PR if they don't immediately address a pre-existing problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, defer to @jmarantz for another pass.
source/common/common/utility.h
Outdated
for (uint32_t i = 0; i < flat_tree.size(); i++) { | ||
for (std::unique_ptr<TrieNode>& child : flat_tree[i]->children_) { | ||
if (child != nullptr) { | ||
flat_tree.push_back(std::move(child)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg, and agreed this is kind of edge cases that can cause problems and pre-existing, but it not weird and a key can be large so we want to prevent it as much as possible. thanks for the improvement!
source/common/common/utility.h
Outdated
for (uint32_t i = 0; i < flat_tree.size(); i++) { | ||
for (std::unique_ptr<TrieNode>& child : flat_tree[i]->children_) { | ||
if (child != nullptr) { | ||
flat_tree.push_back(std::move(child)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just drop a TODO to do a counting pass for reserving if this shows up in a flame-graph in the future?
@RyanTheOptimist for senior approval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
@@ -709,7 +707,7 @@ template <class Value> class TrieLookupTable { | |||
children_[child_index - min_child_] = std::move(node); | |||
} | |||
|
|||
private: | |||
Value value_{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the style guide, members should be private. Can we use friend
or accessors to expose the innards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it a struct instead. Seems daft to have a private type need to friend its parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According the to style guide, structs should be passive objects which simply carry data. They should not enforce invariants, for example. I think this is definitely a class, so friend
or accessors would work nicely. (I wonder if instead of exposing the innards to the parent, if we could perhaps do more work in this class itself. But that's likely outside the scope of this PR.)
https://google.github.io/styleguide/cppguide.html#Structs_vs._Classes
if (child != nullptr) { | ||
flat_tree.push_back(std::move(child)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand, once we finish moving flat_tree[i]->children_
into flat_tree
we are finished with flat_tree[i]
, right? At that point we could nuke flat_tree[i]
. As such it seems like flat_tree
is conceptually split into two parts:
- the head of the tree which contains nodes that can be deleted.
- the tail of the tree with contains nodes that need to be processed.
Is that right? If so, I think I would suggest a slightly different approach. Instead of putting every node into a new std::vector flat_tree
what if we instead used something like std::deque nodes_to_process
. Then we could do something like:
// To avoid stack overflow on recursive destruction if the tree is very deep,
// destroy the element iteratively.
std::deque<std::unique_ptr<TrieNode>> nodes_to_process;
for (std::unique_ptr<TrieNode>& child : root_.children_) {
if (child != nullptr) {
nodes_to_process.push_back(std::move(child));
}
}
while (!nodes_to_process.empty()) {
std::unique_ptr<TrieNode> node = nodes_to_process.pop_front();
for (std::unique_ptr<TrieNode>& child : node) {
if (child != nullptr) {
nodes_to_process.push_back(std::move(child));
}
}
}
I think this is both a bit more clear and a bit more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A deque would perform significantly worse than a vector, though it would save on flattening the entire tree. Went with jmarantz's superior suggestion of a stack - since we don't really care about the order of deletion, consuming from the back provides all the advantages of the deque without giving up the performance advantage of the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a deque perform significantly worse than a vector? It uses contiguous memory so it should be very performant? That being said, a stack is also fine!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::deque is surprisingly poor in performance in my distant-past experience, at least in some dimensions. YMMV and my measurements and impressions are more than 10 years old so maybe not to be trusted :)
Having said that, a stack is constructed using a deque by default, so I was thinking of using a vector, but using it as a stack :)
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Maybe prefer #34849 instead since it fixes the stack overflow destructor and further improves performance. |
That one looks good; should we submit this first though while the other gets reviewed? Or close this PR? |
We should close this PR if we can get agreement that that one is preferred. |
Actually, yeah, let's just close this one, because I don't want to make |
Commit Message: Another refactor of TrieLookupTable Additional Description: The previous refactor magnified the pre-existing problem of a recursive destructor. PR #34783 is being annoying in review, so here's an alternative that simplifies it, putting it in a flat structure while the trie is being filled, so there's no need for a special destructor, and also further improves performance. In addition to changing the implementation again, this PR also moves `TrieLookupTable` out of `utility.h` and into its own separate header and library, since it's used in relatively few places. Risk Level: Small risk, testing is pretty thorough. Testing: Passes all the existing tests, plus the additional new test for not inducing a stack overflow. Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Benchmarks (note that RequestHeaders and ResponseHeaders are no longer actually relevant since the static headers no longer use this trie.) `bazel run -c opt test/common/common:trie_lookup_table_speed_test -- --benchmark_repetitions=10 --benchmark_out_format=csv --benchmark_out=somefile.csv` Benchmark | MeanBefore | MeanAfter | Difference (>100% means before was faster) -- | -- | -- | -- bmTrieLookupsRequestHeaders | 45.5528ns | 50.5076ns | 111% bmTrieLookupsResponseHeaders | 41.4313ns | 47.0212ns | 113% bmTrieLookups/10/Mixed | 303.578ns | 253.114ns | 83% bmTrieLookups/100/Mixed | 351.339ns | 260.583ns | 74% bmTrieLookups/1000/Mixed | 576.878ns | 334.986ns | 58% bmTrieLookups/10000/Mixed | 768.84ns | 518.188ns | 67% bmTrieLookups/10/8 | 12.7108ns | 13.8114ns | 109% bmTrieLookups/100/8 | 17.8588ns | 14.103ns | 79% bmTrieLookups/1000/8 | 29.3718ns | 24.4657ns | 83% bmTrieLookups/10000/8 | 34.1951ns | 30.2386ns | 88% bmTrieLookups/10/128 | 1018.77ns | 467.067ns | 46% bmTrieLookups/100/128 | 749.413ns | 507.53ns | 68% bmTrieLookups/1000/128 | 1083.53ns | 658.86ns | 61% bmTrieLookups/10000/128 | 1319.94ns | 848.408ns | 64% --------- Signed-off-by: Raven Black <ravenblack@dropbox.com>
Commit Message: Make TrieLookupTable destruction non-recursive
Additional Description: Apparently someone might feed a 20000-character long key into this trie, for example in a fuzz test, but never a 40000-character string because that would be ridiculous, so I have to do this or we'd have to revert back to the version that was worse in every other way, because that absolutely makes sense, and we definitely should not have someone who is in charge of the ridiculous test fix it.