Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Compiled" static header maps instead of big trie #33932

Merged
merged 33 commits into from
May 23, 2024
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
9b0d62c
Add static lookup benchmarks for header_map_impl
ravenblackx May 1, 2024
2e83788
compiled_string_map v1
ravenblackx May 2, 2024
be5dd61
optimize
ravenblackx May 2, 2024
611e1dd
Format
ravenblackx May 2, 2024
150040e
Document
ravenblackx May 2, 2024
90ba622
Format
ravenblackx May 2, 2024
e055146
Header
ravenblackx May 2, 2024
dd3a9fa
Comment and fix issue.
ravenblackx May 2, 2024
1990767
More comment
ravenblackx May 2, 2024
5df1270
Fix and simplify
ravenblackx May 3, 2024
92625f3
Merge branch 'headers' into headers_compiled
ravenblackx May 3, 2024
23042de
Split finalize and compile to support legacy host header injection
ravenblackx May 3, 2024
0808002
Merge branch 'main' into headers_compiled
ravenblackx May 7, 2024
b293819
Empty commit to retest
ravenblackx May 10, 2024
9eccaa9
Reduce string copies while populating lookup table
ravenblackx May 10, 2024
a3f8919
Rearrange node creation to be clearer that it's effectively a single …
ravenblackx May 10, 2024
d1a8ed8
More comments
ravenblackx May 10, 2024
7ece7eb
Explicit move
ravenblackx May 10, 2024
57b9055
Comment explaining string-view by reference
ravenblackx May 10, 2024
facdef8
Spelling
ravenblackx May 10, 2024
941d3fa
len
ravenblackx May 13, 2024
d75190d
key_size to avoid calling size() repeatedly (no difference)
ravenblackx May 13, 2024
30bab4b
Virtual class instead of lambdas
ravenblackx May 13, 2024
5469839
PURE
ravenblackx May 13, 2024
dc91907
Cut out a size comparison
ravenblackx May 13, 2024
4348681
move
ravenblackx May 14, 2024
9bd6190
Trailing underscores, some readability rearranging
ravenblackx May 14, 2024
9ee31c0
Comment on KV ownership.
ravenblackx May 14, 2024
405ac4d
data() not &[0]
ravenblackx May 14, 2024
a743ba2
Comments
ravenblackx May 16, 2024
c7458f4
Spelling
ravenblackx May 16, 2024
c4858ec
Fix comment
ravenblackx May 16, 2024
56c5e4e
Add compiled_string_map.md
ravenblackx May 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions source/common/common/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -473,6 +473,11 @@ envoy_cc_library(
],
)

envoy_cc_library(
name = "compiled_string_map_lib",
hdrs = ["compiled_string_map.h"],
)

envoy_cc_library(
name = "packed_struct_lib",
hdrs = ["packed_struct.h"],
Expand Down
235 changes: 235 additions & 0 deletions source/common/common/compiled_string_map.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,235 @@
#pragma once

#include <algorithm>
#include <array>
#include <string>
#include <vector>

#include "envoy/common/pure.h"

#include "absl/strings/string_view.h"

namespace Envoy {

/**
* This is a specialized structure intended for static header maps, but
* there may be other use cases.
* The structure is:
* 1. a length-based lookup table so only keys the same length as the
* target key are considered.
* 2. a trie that branches on the "most divisions" position of the key.
*
* Using this structure is not recommended unless the table is
* initialize-once, use-many, as the "compile" operation is expensive.
*
* Unlike a regular trie, this structure cannot be used for prefix-based
* matching.
*
* For example, if we consider the case where the set of headers is
* `x-prefix-banana`
* `x-prefix-babana`
* `x-prefix-apple`
* `x-prefix-pineapple`
* `x-prefix-barana`
* `x-prefix-banaka`
*
* A standard front-first trie looking for `x-prefix-banana` would walk
* 7 nodes through the tree, first for `x`, then for `-`, etc.
*
* This structure first jumps to matching length, eliminating in this
* example case apple and pineapple.
* Then the "best split" node is on
* `x-prefix-banana`
* ^
* so the first node has 3 non-miss branches, n, b and r for that position.
* Down that n branch, the "best split" is on
* `x-prefix-banana`
* ^
* which has two branches, n or k.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the "best split" can be anywhere in the string. I think modifying the example to the following, will clarify how this should work.

 * `x-prefix-banana`
 * `x-prefix-banara`
 * `x-prefix-apple`
 * `x-prefix-pineapple`
 * `x-prefix-barana`
 * `x-prefix-banaka`

then the best split would've been in the character before-last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be anywhere in the string, only at an index where there's a difference. The example is specifically showing that 3 branches would be "best" over 2 branches. I don't understand what your example clarifies that the existing example doesn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was looking at it from a "trie" perspective.
When I first read the comment, I thought that in each branch, the structure only keeps the "suffix" (after the branch point). I think that updating the example to a case where the branching happens due to a "best-split" that is not the first split, will reduce that confusion to future readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the md file explanation covers this sufficiently (though it doesn't do the second split earlier in the string).

If you'd like me to rearrange the strings in the new example so that it does the "last" index split first and the earlier one second I can do that, but I think the diagram having numbers in it probably makes it clear enough. (And late first would be just as potentially confusing as early first, and having three sequential splits would be getting so long as to be harder to read...)

* Down the n branch is the leaf node (only `x-prefix-banana` remains) - at
* this point a regular string-compare checks if the key is an exact match
* for the string node.
*/
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
template <class Value> class CompiledStringMap {
class Node {
public:
// While it is usual to take a string_view by value, in this
// performance-critical context with repeatedly passing the same
// value, passing it by reference benchmarks out slightly faster.
virtual Value find(const absl::string_view& key) PURE;
virtual ~Node() = default;
};

class LeafNode : public Node {
public:
LeafNode(absl::string_view key, Value&& value) : key_(key), value_(std::move(value)) {}
Value find(const absl::string_view& key) override {
// String comparison unnecessarily checks size equality first, we can skip
// to memcmp here because we already know the sizes are equal.
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
// Since this is a super-hot path we don't even ASSERT here, to avoid adding
// slowdown in debug builds.
if (memcmp(key.data(), key_.data(), key.size())) {
return {};
}
return value_;
}

private:
const std::string key_;
const Value value_;
};

class BranchNode : public Node {
public:
BranchNode(size_t index, uint8_t min, std::vector<std::unique_ptr<Node>>&& branches)
: index_(index), min_(min), branches_(std::move(branches)) {}
Value find(const absl::string_view& key) override {
uint8_t k = static_cast<uint8_t>(key[index_]);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
// Possible optimization was tried here, populating empty nodes with
// a function that returns {} to reduce branching vs checking for null
// nodes. Checking for null nodes benchmarked faster.
if (k < min_ || k >= min_ + branches_.size() || branches_[k - min_] == nullptr) {
return {};
}
return branches_[k - min_]->find(key);
}

private:
const size_t index_;
const uint8_t min_;
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
// Possible optimization was tried here, using std::array<std::unique_ptr<Node>, 256>
// rather than a smaller-range vector with bounds, to keep locality and reduce
// comparisons. It didn't help.
const std::vector<std::unique_ptr<Node>> branches_;
};

public:
// The caller owns the string-views during `compile`. Ownership of the passed in
// Values is transferred to the CompiledStringMap.
using KV = std::pair<absl::string_view, Value>;
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
/**
* Returns the value with a matching key, or the default value
* (typically nullptr) if the key was not present.
* @param key the key to look up.
*/
Value find(absl::string_view key) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you found passing by const ref was faster in another case, but not here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly left this one for the sake of leaving the external API unchanged. Feels more reasonable to buck the style guide for internal implementation than for API.

const size_t key_size = key.size();
if (key_size >= table_.size() || table_[key_size] == nullptr) {
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
return {};
}
return table_[key_size]->find(key);
};
/**
* Construct the lookup table. This can be a somewhat slow multi-pass
* operation if the input table is large.
* @param initial a vector of key->value pairs. This is taken by value because
* we're going to modify it. If the caller still wants the original
* then it can be copied in, if not it can be moved in.
* Note that the keys are string_views - the base string data must
* exist for the duration of compile(). The leaf nodes take copies
* of the key strings, so the string_views can be invalidated once
* compile has completed.
*/
void compile(std::vector<KV> initial) {
jmarantz marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the parameter called initial? Would contents be a better description?

ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
if (initial.empty()) {
return;
}
std::sort(initial.begin(), initial.end(),
[](const KV& a, const KV& b) { return a.first.size() < b.first.size(); });
size_t longest = initial.back().first.size();
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
table_.resize(longest + 1);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
auto range_start = initial.begin();
// Populate the sub-nodes for each length of key that exists.
while (range_start != initial.end()) {
// Find the first key whose length differs from the current key length.
// Everything in between is keys with the same length.
auto range_end =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

std::find_if(range_start, initial.end(), [len = range_start->first.size()](const KV& e) {
return e.first.size() != len;
});
std::vector<KV> node_contents;
// Populate a FindFn for the nodes in that range.
node_contents.reserve(range_end - range_start);
std::move(range_start, range_end, std::back_inserter(node_contents));
table_[range_start->first.size()] = createEqualLengthNode(node_contents);
range_start = range_end;
}
}

private:
/**
* Details of a node branch point; the index into the string at which
* characters should be looked up, the lowest valued character in the
* branch, the highest valued character in the branch, and how many
* branches there are.
*/
struct IndexSplitInfo {
uint8_t index_, min_, max_, count_;
size_t size() { return max_ - min_ + 1; }
size_t offsetOf(uint8_t c) { return c - min_; }
};

/**
* @param node_contents the key-value pairs to be branched upon.
* @return details of the index on which the node should branch
* - the index which produces the most child branches.
*/
static IndexSplitInfo findBestSplitPoint(const std::vector<KV>& node_contents) {
IndexSplitInfo best{0, 0, 0, 0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding an ASSERT(node_contents.size() > 1);

for (size_t i = 0; i < node_contents[0].first.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be faster?

for (size_t i = 0, n = node_contents[0].first.size(); i < n; i++) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would (it's just reading a value from an address), and we don't care about the microoptimization performance here anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make this code more understandable consider naming:
const size_t each_key_length = node_contents[0].first.size(); (or something similar)

consider renaming:i -> key_char_idx, j -> key_idx, v -> ch/val

std::array<bool, 256> hits{};
IndexSplitInfo info{static_cast<uint8_t>(i), 255, 0, 0};
for (size_t j = 0; j < node_contents.size(); j++) {
uint8_t v = node_contents[j].first[i];
if (!hits[v]) {
hits[v] = true;
info.count_++;
info.min_ = std::min(v, info.min_);
info.max_ = std::max(v, info.max_);
}
}
if (info.count_ > best.count_) {
best = info;
}
}
return best;
}

/*
* @param node_contents the set of key-value pairs that will be children of
* this node.
* @return the recursively generated tree node that leads to all of node_contents.
* If there is only one entry in node_contents then a LeafNode, otherwise a BranchNode.
*/
static std::unique_ptr<Node> createEqualLengthNode(std::vector<KV> node_contents) {
if (node_contents.size() == 1) {
return std::make_unique<LeafNode>(node_contents[0].first, std::move(node_contents[0].second));
}
IndexSplitInfo best = findBestSplitPoint(node_contents);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
std::vector<std::unique_ptr<Node>> nodes;
nodes.resize(best.size());
std::sort(node_contents.begin(), node_contents.end(),
[index = best.index_](const KV& a, const KV& b) {
return a.first[index] < b.first[index];
});
auto range_start = node_contents.begin();
// Populate the sub-nodes for each character-branch.
while (range_start != node_contents.end()) {
// Find the first key whose character at position [best.index_] differs from the
// character of the current range.
// Everything in the range has keys with the same character at this index.
auto range_end = std::find_if(range_start, node_contents.end(),
[index = best.index_, c = range_start->first[best.index_]](
const KV& e) { return e.first[index] != c; });
std::vector<KV> next_contents;
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
next_contents.reserve(range_end - range_start);
std::move(range_start, range_end, std::back_inserter(next_contents));
nodes[best.offsetOf(range_start->first[best.index_])] = createEqualLengthNode(next_contents);
range_start = range_end;
}
return std::make_unique<BranchNode>(best.index_, best.min_, std::move(nodes));
}
std::vector<std::unique_ptr<Node>> table_;
};

} // namespace Envoy
1 change: 1 addition & 0 deletions source/common/http/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -449,6 +449,7 @@ envoy_cc_library(
":headers_lib",
"//envoy/http:header_map_interface",
"//source/common/common:assert_lib",
"//source/common/common:compiled_string_map_lib",
"//source/common/common:dump_state_utils",
"//source/common/common:empty_string",
"//source/common/common:non_copyable",
Expand Down
16 changes: 9 additions & 7 deletions source/common/http/header_map_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -119,19 +119,21 @@ template <> HeaderMapImpl::StaticLookupTable<RequestHeaderMap>::StaticLookupTabl
INLINE_REQ_HEADERS(REGISTER_DEFAULT_REQUEST_HEADER)
INLINE_REQ_RESP_HEADERS(REGISTER_DEFAULT_REQUEST_HEADER)

finalizeTable();
auto input = finalizedTable();

// Special case where we map a legacy host header to :authority.
const auto handle =
CustomInlineHeaderRegistry::getInlineHeader<RequestHeaderMap::header_map_type>(
Headers::get().Host);
add(Headers::get().HostLegacy.get().c_str(), [handle](HeaderMapImpl& h) -> StaticLookupResponse {
return {&h.inlineHeaders()[handle.value().it_->second], &handle.value().it_->first};
});
input.emplace_back(
Headers::get().HostLegacy.get(), [handle](HeaderMapImpl& h) -> StaticLookupResponse {
return {&h.inlineHeaders()[handle.value().it_->second], &handle.value().it_->first};
});
compile(std::move(input));
}

template <> HeaderMapImpl::StaticLookupTable<RequestTrailerMap>::StaticLookupTable() {
finalizeTable();
compile(finalizedTable());
}

template <> HeaderMapImpl::StaticLookupTable<ResponseHeaderMap>::StaticLookupTable() {
Expand All @@ -142,7 +144,7 @@ template <> HeaderMapImpl::StaticLookupTable<ResponseHeaderMap>::StaticLookupTab
INLINE_REQ_RESP_HEADERS(REGISTER_RESPONSE_HEADER)
INLINE_RESP_HEADERS_TRAILERS(REGISTER_RESPONSE_HEADER)

finalizeTable();
compile(finalizedTable());
}

template <> HeaderMapImpl::StaticLookupTable<ResponseTrailerMap>::StaticLookupTable() {
Expand All @@ -151,7 +153,7 @@ template <> HeaderMapImpl::StaticLookupTable<ResponseTrailerMap>::StaticLookupTa
Headers::get().name);
INLINE_RESP_HEADERS_TRAILERS(REGISTER_RESPONSE_TRAILER)

finalizeTable();
compile(finalizedTable());
}

uint64_t HeaderMapImpl::appendToHeader(HeaderString& header, absl::string_view data,
Expand Down
13 changes: 10 additions & 3 deletions source/common/http/header_map_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include "envoy/config/core/v3/base.pb.h"
#include "envoy/http/header_map.h"

#include "source/common/common/compiled_string_map.h"
#include "source/common/common/non_copyable.h"
#include "source/common/common/utility.h"
#include "source/common/http/headers.h"
Expand Down Expand Up @@ -146,18 +147,21 @@ class HeaderMapImpl : NonCopyable {
*/
template <class Interface>
struct StaticLookupTable
: public TrieLookupTable<std::function<StaticLookupResponse(HeaderMapImpl&)>> {
: public CompiledStringMap<std::function<StaticLookupResponse(HeaderMapImpl&)>> {
StaticLookupTable();

void finalizeTable() {
std::vector<KV> finalizedTable() {
CustomInlineHeaderRegistry::finalize<Interface::header_map_type>();
auto& headers = CustomInlineHeaderRegistry::headers<Interface::header_map_type>();
size_ = headers.size();
std::vector<KV> input;
input.reserve(size_);
for (const auto& header : headers) {
this->add(header.first.get().c_str(), [&header](HeaderMapImpl& h) -> StaticLookupResponse {
input.emplace_back(header.first.get(), [&header](HeaderMapImpl& h) -> StaticLookupResponse {
return {&h.inlineHeaders()[header.second], &header.first};
});
}
return input;
}

static size_t size() {
Expand All @@ -181,6 +185,9 @@ class HeaderMapImpl : NonCopyable {
}
}

// This is the size of the number of callbacks; in the case of Requests,
// this is one smaller than the number of entries in the lookup table,
// because of legacy `host` mapping to the same thing as `:authority`.
size_t size_;
};

Expand Down
5 changes: 5 additions & 0 deletions test/common/common/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -281,6 +281,11 @@ envoy_cc_test(
],
)

envoy_cc_test(
name = "compiled_string_map_test",
srcs = ["compiled_string_map_test.cc"],
)

envoy_cc_test(
name = "packed_struct_test",
srcs = ["packed_struct_test.cc"],
Expand Down
38 changes: 38 additions & 0 deletions test/common/common/compiled_string_map_test.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
#include "source/common/common/compiled_string_map.h"

#include "gmock/gmock.h"
#include "gtest/gtest.h"

namespace Envoy {

using testing::IsNull;

TEST(CompiledStringMapTest, FindsEntriesCorrectly) {
CompiledStringMap<const char*> map;
map.compile({
{"key-1", "value-1"},
{"key-2", "value-2"},
{"longer-key", "value-3"},
{"bonger-key", "value-4"},
{"bonger-bey", "value-5"},
{"only-key-of-this-length", "value-6"},
});
EXPECT_EQ(map.find("key-1"), "value-1");
EXPECT_EQ(map.find("key-2"), "value-2");
EXPECT_THAT(map.find("key-0"), IsNull());
EXPECT_THAT(map.find("key-3"), IsNull());
EXPECT_EQ(map.find("longer-key"), "value-3");
EXPECT_EQ(map.find("bonger-key"), "value-4");
EXPECT_EQ(map.find("bonger-bey"), "value-5");
EXPECT_EQ(map.find("only-key-of-this-length"), "value-6");
EXPECT_THAT(map.find("songer-key"), IsNull());
EXPECT_THAT(map.find("absent-length-key"), IsNull());
}

TEST(CompiledStringMapTest, EmptyMapReturnsNull) {
CompiledStringMap<const char*> map;
map.compile({});
EXPECT_THAT(map.find("key-1"), IsNull());
}

} // namespace Envoy
Loading