
Make TrieLookupTable destruction non-recursive #34783

Closed · wants to merge 5 commits

Conversation

ravenblackx (Contributor)

Commit Message: Make TrieLookupTable destruction non-recursive
Additional Description: Apparently someone might feed a 20000-character key into this trie, for example in a fuzz test, but never a 40000-character string, because that would be ridiculous. So I have to do this, or we'd have to revert back to the version that was worse in every other way, because that absolutely makes sense, and we definitely should not have someone who is in charge of the ridiculous test fix it.

Signed-off-by: Raven Black <ravenblack@dropbox.com>
adisuissa (Contributor) left a comment

Thanks!
Left one high-level question.

source/common/common/utility.h (resolved)
source/common/common/utility.h (outdated, resolved)
adisuissa (Contributor)

/assign @jmarantz

botengyao (Member) left a comment

Thanks for the quick improvement!

source/common/common/utility.h (outdated, resolved)
    for (uint32_t i = 0; i < flat_tree.size(); i++) {
      for (std::unique_ptr<TrieNode>& child : flat_tree[i]->children_) {
        if (child != nullptr) {
          flat_tree.push_back(std::move(child));
Member

Consider maintaining an additional total node counter during add, and then reserving the flat_tree vector, to avoid multiple reallocations here?
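
A minimal sketch of that idea, using a hypothetical simplified `Node`/`Trie` rather than the actual Envoy `TrieLookupTable`: a counter maintained in `add()` lets the destructor reserve the flattening vector once instead of growing it repeatedly.

    #include <cstdint>
    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical simplified trie, only to illustrate "count nodes on add,
    // reserve before flattening"; names and layout are assumptions.
    struct Node {
      std::unique_ptr<Node> children_[256];
    };

    class Trie {
    public:
      void add(const std::string& key) {
        Node* current = &root_;
        for (unsigned char c : key) {
          if (current->children_[c] == nullptr) {
            current->children_[c] = std::make_unique<Node>();
            ++node_count_; // maintained during add, as suggested above
          }
          current = current->children_[c].get();
        }
      }

      ~Trie() {
        std::vector<std::unique_ptr<Node>> flat_tree;
        flat_tree.reserve(node_count_); // one allocation instead of repeated growth
        for (std::unique_ptr<Node>& child : root_.children_) {
          if (child != nullptr) {
            flat_tree.push_back(std::move(child));
          }
        }
        // Each node's children are moved into flat_tree before the node itself
        // is destroyed, so no destructor call recurses.
        for (size_t i = 0; i < flat_tree.size(); i++) {
          for (std::unique_ptr<Node>& child : flat_tree[i]->children_) {
            if (child != nullptr) {
              flat_tree.push_back(std::move(child));
            }
          }
        }
      }

    private:
      Node root_;
      uint64_t node_count_{0};
    };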

ravenblackx (Contributor, Author)

I considered doing a reserve; it didn't seem worth it since the destruction of these things is infrequent. The increased complexity is not worth the performance saving IMO. If someone else wants to make that change later, feel free, but I didn't even want to make this change at all, and I certainly don't want to make it more complicated than it needs to be to resolve someone else's weird test scenario.

botengyao (Member) · Jun 18, 2024

sg, and agreed this is the kind of edge case that can cause problems and is pre-existing, but it's not weird: a key can be large, so we want to prevent the problem as much as possible. Thanks for the improvement!

Contributor

Maybe just drop a TODO to do a counting pass for reserving, if this shows up in a flame graph in the future?

Contributor

Actually, you can do this destruction explicitly with a stack whose size will be proportional to the depth of the recursion, rather than a vector whose size will be that of the whole tree. That would be a better TODO than just precalculating the amount to reserve.
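
A hedged sketch of that explicit-stack idea, again with a hypothetical simplified `Node` rather than the real TrieNode: each stack entry holds a node plus the index of its next unvisited child, so the stack grows with the depth of the tree rather than with the total number of nodes.

    #include <cstdint>
    #include <memory>
    #include <stack>
    #include <utility>

    // Hypothetical simplified node, for illustration only.
    struct Node {
      std::unique_ptr<Node> children_[256];
    };

    // Destroys a subtree iteratively. Each stack entry is one level of the
    // current path (node + index of the next child to visit), so the stack's
    // size is proportional to the tree depth, not the node count.
    void destroySubtree(std::unique_ptr<Node> root) {
      std::stack<std::pair<std::unique_ptr<Node>, uint32_t>> work;
      work.emplace(std::move(root), 0);
      while (!work.empty()) {
        auto& [node, next_child] = work.top();
        // Skip over null child slots.
        while (next_child < 256 && node->children_[next_child] == nullptr) {
          ++next_child;
        }
        if (next_child == 256) {
          // All children handled; this node's children_ are all null by now,
          // so destroying it here cannot recurse.
          work.pop();
        } else {
          std::unique_ptr<Node> child = std::move(node->children_[next_child]);
          ++next_child;
          work.emplace(std::move(child), 0);
        }
      }
    }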

jmarantz (Contributor) left a comment

I'll wait till Boteng reviews before doing a pass.

@ravenblackx I think the original code was buggy also; we shouldn't be recursing through so many levels. Envoy might need to run in an environment with smaller stacks than our test machines as well, so it's good we are sorting this out.

Signed-off-by: Raven Black <ravenblack@dropbox.com>
ravenblackx (Contributor, Author)

> @ravenblackx I think the original code was buggy also; we shouldn't be recursing through so many levels. Envoy might need to run in an environment with smaller stacks than our test machines as well, so it's good we are sorting this out.

Yes, that's why I'm disgruntled for this to be operating on a "fix our external test or we'll revert your change even though the actual problem was pre-existing" basis. Feels like a certain company is exerting undue power here - if someone else's PR broke one of my external-to-envoy tests based on a problem the PR didn't introduce, I'm pretty sure I wouldn't be able to threaten to revert their PR if they don't immediately address a pre-existing problem.

botengyao (Member) left a comment

lgtm, defer to @jmarantz for another pass.


jmarantz previously approved these changes Jun 20, 2024

jmarantz (Contributor)

@RyanTheOptimist for senior approval

RyanTheOptimist (Contributor) left a comment

Thanks for fixing this!

@@ -709,7 +707,7 @@ template <class Value> class TrieLookupTable {
    children_[child_index - min_child_] = std::move(node);
  }

private:
  Value value_{};
Contributor

According to the style guide, members should be private. Can we use friend or accessors to expose the innards?

ravenblackx (Contributor, Author)

Made it a struct instead. Seems daft to have a private type need to friend its parent.

Contributor

According to the style guide, structs should be passive objects which simply carry data. They should not enforce invariants, for example. I think this is definitely a class, so friend or accessors would work nicely. (I wonder if, instead of exposing the innards to the parent, we could perhaps do more work in this class itself. But that's likely outside the scope of this PR.)

https://google.github.io/styleguide/cppguide.html#Structs_vs._Classes
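
For illustration, one possible shape of the friend-based version being suggested here (a hypothetical sketch only, not the code in this PR): the node type stays a class with private members, and the enclosing table is befriended so its destructor can still reach `children_` directly.

    #include <cstdint>
    #include <memory>
    #include <utility>

    template <class Value> class TrieLookupTable {
    private:
      class TrieNode {
      public:
        void setChild(uint8_t index, std::unique_ptr<TrieNode> node) {
          children_[index] = std::move(node);
        }

      private:
        // Members stay private per the style guide; the enclosing table gets
        // access via a friend declaration instead of the node being a struct.
        friend class TrieLookupTable<Value>;
        Value value_{};
        std::unique_ptr<TrieNode> children_[256];
      };

      TrieNode root_;
    };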

        if (child != nullptr) {
          flat_tree.push_back(std::move(child));
        }
      }
Contributor

To make sure I understand: once we finish moving flat_tree[i]->children_ into flat_tree we are finished with flat_tree[i], right? At that point we could nuke flat_tree[i]. As such it seems like flat_tree is conceptually split into two parts:

  1. the head, which contains nodes that have already been processed and can be deleted.
  2. the tail, which contains nodes that still need to be processed.

Is that right? If so, I think I would suggest a slightly different approach. Instead of putting every node into a new std::vector flat_tree, what if we instead used something like a std::deque nodes_to_process? Then we could do something like:

    // To avoid stack overflow on recursive destruction if the tree is very deep,
    // destroy the elements iteratively.
    std::deque<std::unique_ptr<TrieNode>> nodes_to_process;
    for (std::unique_ptr<TrieNode>& child : root_.children_) {
      if (child != nullptr) {
        nodes_to_process.push_back(std::move(child));
      }
    }
    while (!nodes_to_process.empty()) {
      // std::deque::pop_front() returns void, so take the front element first.
      std::unique_ptr<TrieNode> node = std::move(nodes_to_process.front());
      nodes_to_process.pop_front();
      for (std::unique_ptr<TrieNode>& child : node->children_) {
        if (child != nullptr) {
          nodes_to_process.push_back(std::move(child));
        }
      }
    }

I think this is both a bit more clear and a bit more efficient.

Member

+1 for this.

ravenblackx (Contributor, Author) · Jun 20, 2024

A deque would perform significantly worse than a vector, though it would save on flattening the entire tree. Went with jmarantz's superior suggestion of a stack - since we don't really care about the order of deletion, consuming from the back provides all the advantages of the deque without giving up the performance advantage of the vector.

Contributor

Why would a deque perform significantly worse than a vector? It uses contiguous memory so it should be very performant? That being said, a stack is also fine!

Contributor

std::deque is surprisingly poor in performance in my distant-past experience, at least in some dimensions. YMMV; my measurements and impressions are more than 10 years old, so maybe not to be trusted :)

Having said that, a std::stack is backed by a std::deque by default, so I was thinking of using a vector, but using it as a stack :)
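
A minimal sketch of that vector-used-as-a-stack destruction, with the same hypothetical simplified `Node` as above (not the real TrieNode): since deletion order doesn't matter, popping pending nodes off the back of a plain vector keeps only the unprocessed frontier in memory while staying on the vector's contiguous fast path.

    #include <memory>
    #include <vector>

    // Hypothetical simplified node, for illustration only.
    struct Node {
      std::unique_ptr<Node> children_[256];
    };

    class Trie {
    public:
      ~Trie() {
        // A std::vector used as a stack: push pending children, pop from the
        // back. Only not-yet-processed nodes are held, and no destructor call
        // recurses because every node's children are moved out before the
        // node itself is destroyed.
        std::vector<std::unique_ptr<Node>> stack;
        for (std::unique_ptr<Node>& child : root_.children_) {
          if (child != nullptr) {
            stack.push_back(std::move(child));
          }
        }
        while (!stack.empty()) {
          std::unique_ptr<Node> node = std::move(stack.back());
          stack.pop_back();
          for (std::unique_ptr<Node>& child : node->children_) {
            if (child != nullptr) {
              stack.push_back(std::move(child));
            }
          }
          // `node` goes out of scope here with all children already null.
        }
      }

    private:
      Node root_;
    };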

Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
ravenblackx (Contributor, Author)

Maybe prefer #34849 instead, since it fixes the destructor stack overflow and further improves performance.

jmarantz (Contributor)

That one looks good; should we submit this first though while the other gets reviewed? Or close this PR?

ravenblackx (Contributor, Author)

We should close this PR if we can get agreement that that one is preferred.

ravenblackx (Contributor, Author)

Actually, yeah, let's just close this one, because I don't want to make friend accessors for members of private internal data structures, so this one is stalled anyway. The other version doesn't have that problem, and is simpler, and is more performant, and uses less RAM, and cleans up the dependency mess a bit, so I can't think of any valid reason for this PR to be preferred.

@ravenblackx ravenblackx deleted the node_depth branch June 24, 2024 15:44
ravenblackx added a commit that referenced this pull request Jul 10, 2024
Commit Message: Another refactor of TrieLookupTable
Additional Description: The previous refactor magnified the pre-existing problem of a recursive destructor. PR #34783 is being annoying in review, so here's an alternative that simplifies it by putting the trie's data into a flat structure while the trie is being filled, so there's no need for a special destructor, and it also further improves performance.
In addition to changing the implementation again, this PR also moves
`TrieLookupTable` out of `utility.h` and into its own separate header
and library, since it's used in relatively few places.
Risk Level: Small risk, testing is pretty thorough.
Testing: Passes all the existing tests, plus the additional new test for
not inducing a stack overflow.
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
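
To make the "flat structure while the trie is being filled" idea concrete, here is a hedged sketch of one way that can look (an illustration only, not the actual implementation in #34849): the table owns every node in a single vector and children are referenced by index, so tearing the table down just destroys that vector and no special destructor is needed.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical flat-storage trie, for illustration only.
    class FlatTrie {
    public:
      FlatTrie() : nodes_(1) {} // index 0 is the root

      void add(const std::string& key) {
        uint32_t current = 0;
        for (unsigned char c : key) {
          uint32_t child = nodes_[current].children_[c];
          if (child == 0) { // 0 means "no child"; the root is never a child
            nodes_.push_back(Node{});
            child = static_cast<uint32_t>(nodes_.size() - 1);
            // Re-index after push_back, which may have reallocated nodes_.
            nodes_[current].children_[c] = child;
          }
          current = child;
        }
      }

      // No user-defined destructor: destroying nodes_ is a flat, non-recursive
      // loop over plain structs, so the tree depth no longer matters.

    private:
      struct Node {
        uint32_t children_[256] = {};
      };
      std::vector<Node> nodes_;
    };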

Benchmarks (note that RequestHeaders and ResponseHeaders are no longer actually relevant, since the static headers no longer use this trie):

`bazel run -c opt test/common/common:trie_lookup_table_speed_test -- --benchmark_repetitions=10 --benchmark_out_format=csv --benchmark_out=somefile.csv`

Benchmark | MeanBefore | MeanAfter | Difference (>100% means before was faster)
-- | -- | -- | --
bmTrieLookupsRequestHeaders | 45.5528ns | 50.5076ns | 111%
bmTrieLookupsResponseHeaders | 41.4313ns | 47.0212ns | 113%
bmTrieLookups/10/Mixed | 303.578ns | 253.114ns | 83%
bmTrieLookups/100/Mixed | 351.339ns | 260.583ns | 74%
bmTrieLookups/1000/Mixed | 576.878ns | 334.986ns | 58%
bmTrieLookups/10000/Mixed | 768.84ns | 518.188ns | 67%
bmTrieLookups/10/8 | 12.7108ns | 13.8114ns | 109%
bmTrieLookups/100/8 | 17.8588ns | 14.103ns | 79%
bmTrieLookups/1000/8 | 29.3718ns | 24.4657ns | 83%
bmTrieLookups/10000/8 | 34.1951ns | 30.2386ns | 88%
bmTrieLookups/10/128 | 1018.77ns | 467.067ns | 46%
bmTrieLookups/100/128 | 749.413ns | 507.53ns | 68%
bmTrieLookups/1000/128 | 1083.53ns | 658.86ns | 61%
bmTrieLookups/10000/128 | 1319.94ns | 848.408ns | 64%

---------

Signed-off-by: Raven Black <ravenblack@dropbox.com>