Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reflection-based deterministic message hashing #30761

Merged
merged 44 commits into from
Dec 13, 2023
Merged
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
9c55848
Reflection-based deterministic message hashing
ravenblackx Nov 7, 2023
e9ca067
Add comment
ravenblackx Nov 7, 2023
44f7ebe
Use GetStringReference
ravenblackx Nov 7, 2023
465d833
Spelling
ravenblackx Nov 7, 2023
0bfb6f1
Debug
ravenblackx Nov 7, 2023
ef99f1e
Empty message hash must be nonzero
ravenblackx Nov 9, 2023
11aba21
Make http_cache use deterministic hash
ravenblackx Nov 9, 2023
a5b4a89
Messages of different types with identical contents should hash diffe…
ravenblackx Nov 9, 2023
6c7d090
Template map sorting
ravenblackx Nov 10, 2023
29532f6
Fewer macros
ravenblackx Nov 10, 2023
aa9e5fd
Type-typo
ravenblackx Nov 10, 2023
9e38e75
Add some tests, fix a bug they found
ravenblackx Nov 10, 2023
edcf363
Last of the tests and another fix
ravenblackx Nov 10, 2023
9b67119
Untemplate
ravenblackx Nov 13, 2023
c172e2d
No lambdas version
ravenblackx Nov 13, 2023
c686b3a
Remove nested struct
ravenblackx Nov 13, 2023
5b4e623
Satisfy Windows inferior "no return value" checks
ravenblackx Nov 13, 2023
cd2d3c1
Panic > bug for invalid enum
ravenblackx Nov 14, 2023
38c7f6d
Reduce coverage threshold
ravenblackx Nov 14, 2023
b6e84b0
Remove one uncovered line :)
ravenblackx Nov 14, 2023
3320ed0
Coverage back up by 0.1% for the one line
ravenblackx Nov 16, 2023
5afe859
Remove most auto, Comparer->Comparator
ravenblackx Nov 17, 2023
da47756
const
ravenblackx Nov 17, 2023
e128ee4
Explanatory comment
ravenblackx Nov 17, 2023
974f4f0
Verbose comment about sorting map fields, plus an ASSERT
ravenblackx Nov 17, 2023
5b0b818
Moar templates
ravenblackx Nov 28, 2023
f5393b6
Runtime guard
ravenblackx Nov 29, 2023
8830f12
Remove unused WithAny test proto
ravenblackx Nov 29, 2023
f308278
Add comment about unknown Any types and MessageDifferencer
ravenblackx Nov 29, 2023
2104c6e
Add comment about type URLs
ravenblackx Nov 29, 2023
234cbc8
Return coverage to 96.5
ravenblackx Nov 29, 2023
7b737bc
Spelling
ravenblackx Nov 29, 2023
c829856
Add benchmark
ravenblackx Nov 30, 2023
ecdd501
More benchmarks
ravenblackx Nov 30, 2023
2757c86
Unorder map hashing without sorting
ravenblackx Dec 4, 2023
3baddeb
Add relnote
ravenblackx Dec 4, 2023
40cefb2
Merge branch 'main' into hash
ravenblackx Dec 5, 2023
fffa18c
Use xxHash64Value
ravenblackx Dec 5, 2023
4255b62
Runtime guard defaulting to false
ravenblackx Dec 8, 2023
23c6aa9
Adi's comments
ravenblackx Dec 11, 2023
7feee62
Add ASSERT for GetPrototype
ravenblackx Dec 11, 2023
f6a44ef
Add tests for Any and repeated message
ravenblackx Dec 11, 2023
4455b88
Merge branch 'main' into hash
ravenblackx Dec 11, 2023
fbed541
Merge branch 'main' into hash
ravenblackx Dec 12, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
7 changes: 7 additions & 0 deletions changelogs/current.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,13 @@ minor_behavior_changes:
Added new configuration field :ref:`rate_limited_as_resource_exhausted
<envoy_v3_api_field_extensions.filters.http.local_ratelimit.v3.LocalRateLimit.rate_limited_as_resource_exhausted>`
to allow for setting if rate limit grpc response should be RESOURCE_EXHAUSTED instead of the default UNAVAILABLE.
- area: config parsing, http cache filter
change: |
replaces protobuf hashing by human-readable string with a dedicated deterministic hashing algorithm.
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
The performance of the hash operation is improved by 2-10x depending on the structure of the message,
which is expected to reduce config update time or startup time by 10-25%. The new algorithm is also
used for http_cache_filter hashing, which will effectively cause a one-time cache flush on update
for users with a persistent cache. To enable this behavior set ``envoy.restart_features.use_fast_protobuf_hash`` to true.
- area: filter state
change: |
Added config name of filter sending a local reply in filter state with key
Expand Down
12 changes: 12 additions & 0 deletions source/common/protobuf/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -199,13 +199,25 @@ envoy_cc_library(
]),
)

envoy_cc_library(
name = "deterministic_hash_lib",
srcs = ["deterministic_hash.cc"],
hdrs = ["deterministic_hash.h"],
deps = [
":protobuf",
"//source/common/common:assert_lib",
"//source/common/common:hash_lib",
],
)

envoy_cc_library(
name = "utility_lib",
srcs = ["utility.cc"],
external_deps = [
"protobuf",
],
deps = [
":deterministic_hash_lib",
":message_validator_lib",
":protobuf",
":utility_lib_header",
Expand Down
228 changes: 228 additions & 0 deletions source/common/protobuf/deterministic_hash.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
#if defined(ENVOY_ENABLE_FULL_PROTOS)
#include "source/common/protobuf/deterministic_hash.h"

#include "source/common/common/assert.h"
#include "source/common/common/hash.h"

namespace Envoy {
namespace DeterministicProtoHash {
namespace {

// Get a scalar field from protobuf reflection field definition. The return
// type must be specified by the caller. Every implementation is a specialization
// because the reflection interface did separate named functions instead of a
// template.
template <typename T>
T reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field);

template <>
uint32_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetUInt32(message, &field);
}

template <>
int32_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetInt32(message, &field);
}

template <>
uint64_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetUInt64(message, &field);
}

template <>
int64_t reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetInt64(message, &field);
}

template <>
float reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetFloat(message, &field);
}

template <>
double reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetDouble(message, &field);
}

template <>
bool reflectionGet(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field) {
return reflection.GetBool(message, &field);
}

// Takes a field of scalar type, which may be a repeated field, and hashes
// it (or each of it).
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
template <typename T, std::enable_if_t<std::is_scalar_v<T>, bool> = true>
uint64_t hashScalarField(const Protobuf::Reflection& reflection, const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
if (field.is_repeated()) {
for (const T& scalar : reflection.GetRepeatedFieldRef<T>(message, &field)) {
seed = HashUtil::xxHash64Value(scalar, seed);
}
} else {
seed = HashUtil::xxHash64Value(reflectionGet<T>(reflection, message, field), seed);
}
return seed;
}

uint64_t reflectionHashMessage(const Protobuf::Message& message, uint64_t seed = 0);
uint64_t reflectionHashField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed);

// To make a map serialize deterministically we need to ignore the order of
// the map fields. To do that, we simply combine the hashes of each entry
// using an unordered operator (addition), and then apply that combined hash to
// the seed.
uint64_t reflectionHashMapField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
const Protobuf::Reflection& reflection = *message.GetReflection();
ASSERT(field.is_map());
const auto& entries = reflection.GetRepeatedFieldRef<Protobuf::Message>(message, &field);
ASSERT(!entries.empty());
const Protobuf::Descriptor& map_descriptor = *entries.begin()->GetDescriptor();
const Protobuf::FieldDescriptor& key_field = *map_descriptor.map_key();
const Protobuf::FieldDescriptor& value_field = *map_descriptor.map_value();
uint64_t combined_hash = 0;
for (const Protobuf::Message& entry : entries) {
uint64_t entry_hash = reflectionHashField(entry, key_field, 0);
entry_hash = reflectionHashField(entry, value_field, entry_hash);
combined_hash += entry_hash;
}
return HashUtil::xxHash64Value(combined_hash, seed);
}

uint64_t reflectionHashField(const Protobuf::Message& message,
const Protobuf::FieldDescriptor& field, uint64_t seed) {
using Protobuf::FieldDescriptor;
const Protobuf::Reflection& reflection = *message.GetReflection();
seed = HashUtil::xxHash64Value(field.number(), seed);
switch (field.cpp_type()) {
case FieldDescriptor::CPPTYPE_INT32:
seed = hashScalarField<int32_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_UINT32:
seed = hashScalarField<uint32_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_INT64:
seed = hashScalarField<int64_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_UINT64:
seed = hashScalarField<uint64_t>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_DOUBLE:
seed = hashScalarField<double>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_FLOAT:
seed = hashScalarField<float>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_BOOL:
seed = hashScalarField<bool>(reflection, message, field, seed);
break;
case FieldDescriptor::CPPTYPE_ENUM:
if (field.is_repeated()) {
int c = reflection.FieldSize(message, &field);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
for (int i = 0; i < c; i++) {
seed = HashUtil::xxHash64Value(reflection.GetRepeatedEnumValue(message, &field, i), seed);
}
} else {
seed = HashUtil::xxHash64Value(reflection.GetEnumValue(message, &field), seed);
}
break;
case FieldDescriptor::CPPTYPE_STRING: {
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
if (field.is_repeated()) {
for (const std::string& str : reflection.GetRepeatedFieldRef<std::string>(message, &field)) {
seed = HashUtil::xxHash64(str, seed);
}
} else {
std::string scratch;
seed = HashUtil::xxHash64(reflection.GetStringReference(message, &field, &scratch), seed);
}
} break;
case FieldDescriptor::CPPTYPE_MESSAGE:
if (field.is_map()) {
seed = reflectionHashMapField(message, field, seed);
} else if (field.is_repeated()) {
for (const Protobuf::Message& submsg :
reflection.GetRepeatedFieldRef<Protobuf::Message>(message, &field)) {
seed = reflectionHashMessage(submsg, seed);
}
} else {
seed = reflectionHashMessage(reflection.GetMessage(message, &field), seed);
}
break;
}
return seed;
}

// Converts from type urls OR descriptor full names to descriptor full names.
// Type urls are as used in envoy yaml config, e.g.
// "type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig"
// becomes
// "envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig"
absl::string_view typeUrlToDescriptorFullName(absl::string_view url) {
jmarantz marked this conversation as resolved.
Show resolved Hide resolved
const size_t pos = url.rfind('/');
if (pos != absl::string_view::npos) {
return url.substr(pos + 1);
}
return url;
}

std::unique_ptr<Protobuf::Message> unpackAnyForReflection(const ProtobufWkt::Any& any) {
const Protobuf::Descriptor* descriptor =
Protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(
typeUrlToDescriptorFullName(any.type_url()));
// If the type name refers to an unknown type, we treat it the same as other
// unknown fields - not including its contents in the hash.
if (descriptor == nullptr) {
jmarantz marked this conversation as resolved.
Show resolved Hide resolved
return nullptr;
}
const Protobuf::Message* prototype =
Protobuf::MessageFactory::generated_factory()->GetPrototype(descriptor);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
std::unique_ptr<Protobuf::Message> msg(prototype->New());
any.UnpackTo(msg.get());
return msg;
}

// This is intentionally ignoring unknown fields.
jmarantz marked this conversation as resolved.
Show resolved Hide resolved
uint64_t reflectionHashMessage(const Protobuf::Message& message, uint64_t seed) {
using Protobuf::FieldDescriptor;
std::string scratch;
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
const Protobuf::Reflection* reflection = message.GetReflection();
const Protobuf::Descriptor* descriptor = message.GetDescriptor();
seed = HashUtil::xxHash64(descriptor->full_name(), seed);
if (descriptor->well_known_type() == Protobuf::Descriptor::WELLKNOWNTYPE_ANY) {
const ProtobufWkt::Any* any = Protobuf::DynamicCastToGenerated<ProtobufWkt::Any>(&message);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
std::unique_ptr<Protobuf::Message> submsg = unpackAnyForReflection(*any);
if (submsg == nullptr) {
// If we wanted to handle unknown types in Any, this is where we'd have to do it.
return seed;
}
return reflectionHashMessage(*submsg, seed);
}
std::vector<const FieldDescriptor*> fields;
reflection->ListFields(message, &fields);
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
// If we wanted to handle unknown fields, we'd need to also GetUnknownFields here.
for (const FieldDescriptor* field : fields) {
seed = reflectionHashField(message, *field, seed);
}
// Hash one extra character to signify end of message, so that
// msg{} field2=2
// hashes differently from
// msg{field2=2}
return HashUtil::xxHash64("\x17", seed);
}
} // namespace

uint64_t hash(const Protobuf::Message& message) { return reflectionHashMessage(message, 0); }

} // namespace DeterministicProtoHash
} // namespace Envoy
#endif
26 changes: 26 additions & 0 deletions source/common/protobuf/deterministic_hash.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#pragma once

#include "source/common/protobuf/protobuf.h"

#if defined(ENVOY_ENABLE_FULL_PROTOS)
namespace Envoy {
namespace DeterministicProtoHash {

// Note: this ignores unknown fields and unrecognized types in Any fields.
// An alternative approach might treat such fields as "raw data" and include
// them in the hash, which would risk breaking the deterministic behavior,
// versus this way risks ignoring significant data.
//
// Ignoring unknown fields was chosen as the implementation because the
// TextFormat-based hashing this replaces was explicitly ignoring unknown
// fields.
//
// If this is used as part of making a hash table, it may result in
// collisions if unknown fields are present and are not ignored by the
// corresponding comparator. A `MessageDifferencer` can be configured to
// ignore unknown fields, or not to.
uint64_t hash(const Protobuf::Message& message);

} // namespace DeterministicProtoHash
} // namespace Envoy
#endif
13 changes: 7 additions & 6 deletions source/common/protobuf/utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "source/common/common/assert.h"
#include "source/common/common/documentation_url.h"
#include "source/common/common/fmt.h"
#include "source/common/protobuf/deterministic_hash.h"
#include "source/common/protobuf/message_validator_impl.h"
#include "source/common/protobuf/protobuf.h"
#include "source/common/protobuf/visitor.h"
Expand Down Expand Up @@ -129,22 +130,22 @@ void ProtoExceptionUtil::throwProtoValidationException(const std::string& valida
}

size_t MessageUtil::hash(const Protobuf::Message& message) {
std::string text_format;

#if defined(ENVOY_ENABLE_FULL_PROTOS)
{
if (Runtime::runtimeFeatureEnabled("envoy.restart_features.use_fast_protobuf_hash")) {
return DeterministicProtoHash::hash(message);
} else {
std::string text_format;
Protobuf::TextFormat::Printer printer;
printer.SetExpandAny(true);
printer.SetUseFieldNumber(true);
printer.SetSingleLineMode(true);
printer.SetHideUnknownFields(true);
printer.PrintToString(message, &text_format);
return HashUtil::xxHash64(text_format);
}
#else
absl::StrAppend(&text_format, message.SerializeAsString());
return HashUtil::xxHash64(message.SerializeAsString());
#endif

return HashUtil::xxHash64(text_format);
}

#if !defined(ENVOY_ENABLE_FULL_PROTOS)
Expand Down
2 changes: 2 additions & 0 deletions source/common/runtime/runtime_features.cc
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,8 @@ FALSE_RUNTIME_GUARD(envoy_restart_features_use_eds_cache_for_ads);
FALSE_RUNTIME_GUARD(envoy_reloadable_features_enable_universal_header_validator);
// TODO(pksohn): enable after fixing https://github.com/envoyproxy/envoy/issues/29930
FALSE_RUNTIME_GUARD(envoy_reloadable_features_quic_defer_logging_to_ack_listener);
// TODO(ravenblack): flip this to true after some test time.
ravenblackx marked this conversation as resolved.
Show resolved Hide resolved
FALSE_RUNTIME_GUARD(envoy_restart_features_use_fast_protobuf_hash);

// Block of non-boolean flags. Use of int flags is deprecated. Do not add more.
ABSL_FLAG(uint64_t, re2_max_program_size_error_level, 100, ""); // NOLINT
Expand Down
2 changes: 1 addition & 1 deletion source/extensions/filters/http/cache/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ envoy_cc_library(
"//source/common/common:assert_lib",
"//source/common/http:header_utility_lib",
"//source/common/http:headers_lib",
"//source/common/protobuf:utility_lib",
"//source/common/protobuf:deterministic_hash_lib",
"@envoy_api//envoy/extensions/filters/http/cache/v3:pkg_cc_proto",
],
)
Expand Down
10 changes: 8 additions & 2 deletions source/extensions/filters/http/cache/http_cache.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#include "source/common/http/header_utility.h"
#include "source/common/http/headers.h"
#include "source/common/http/utility.h"
#include "source/common/protobuf/utility.h"
#include "source/common/protobuf/deterministic_hash.h"
#include "source/extensions/filters/http/cache/cache_custom_headers.h"
#include "source/extensions/filters/http/cache/cache_headers_utils.h"

Expand Down Expand Up @@ -54,7 +54,13 @@ LookupRequest::LookupRequest(const Http::RequestHeaderMap& request_headers, Syst
// Unless this API is still alpha, calls to stableHashKey() must always return
// the same result, or a way must be provided to deal with a complete cache
// flush.
size_t stableHashKey(const Key& key) { return MessageUtil::hash(key); }
size_t stableHashKey(const Key& key) {
if (Runtime::runtimeFeatureEnabled("envoy.restart_features.use_fast_protobuf_hash")) {
return DeterministicProtoHash::hash(key);
} else {
return MessageUtil::hash(key);
}
}

void LookupRequest::initializeRequestCacheControl(const Http::RequestHeaderMap& request_headers) {
const absl::string_view cache_control =
Expand Down
2 changes: 0 additions & 2 deletions source/extensions/filters/http/cache/http_cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,6 @@ using LookupResultPtr = std::unique_ptr<LookupResult>;
//
// When providing a cached response, Caches must ensure that the keys (and not
// just their hashes) match.
//
// TODO(toddmgreer): Ensure that stability guarantees above are accurate.
size_t stableHashKey(const Key& key);

// LookupRequest holds everything about a request that's needed to look for a
Expand Down