Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metadata_exchange: combine into native implementation #4789

Merged
merged 7 commits into from
Jul 18, 2023

Conversation

kyessenov
Copy link
Contributor

@kyessenov kyessenov commented Jul 7, 2023

Implements HTTP metadata exchange as a native filter on top of the existing CONNECT baggage filter.

The main benefit is a smooth migration from Istio-specific headers via fallback mechanisms.

The recommended setting for 1.19 sidecars would be:

  • for discovery: use WDS first, HTTP MX headers second.
  • for propagation: use HTTP MX headers.
  • allow turning-off WDS in case of instability (as it's coming from Ambient alpha code)

The recommended setting for 1.20 sidecars would be:

  • for discovery: use WDS.
  • for propagation: none.

The recommended setting for 1.19+ waypoints would be:

  • for discovery: use WDS, HTTP CONNECT second;
  • for propagation: none
  • turn off response CONNECT baggage since it is likely not used and is a waste

Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov kyessenov requested a review from a team as a code owner July 7, 2023 21:08
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 7, 2023
@kyessenov kyessenov added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jul 7, 2023
@zirain
Copy link
Member

zirain commented Jul 8, 2023

so this means there's connect filter will replace metadata_exchange in the future?

@kyessenov
Copy link
Contributor Author

so this means there's connect filter will replace metadata_exchange in the future?

Yeah, seems like it's best to combine all the reasonable ways into one filter (HTTP MX, baggage, and WDS). It's impossible to extend Wasm version to support WDS.

@zirain
Copy link
Member

zirain commented Jul 8, 2023

+1 for using a new native filter replace the wasm one.

will you make this landing in 1.19, it will cut branch at end of July, see https://github.com/istio/istio/wiki/Istio-Release-1.19

@zirain
Copy link
Member

zirain commented Jul 8, 2023

another thought: can we add an option to propagate metadata into filterstate, with that it will be easy to get metadata in accesslog

@kyessenov WDYT?

@kyessenov
Copy link
Contributor Author

@zirain metadata is already stored in the filter state in many cases, but it's a bit messy:

  • istio_stats can call WDS directly if it doesn't find it
  • ambient doesn't have upstream
  • TCP/HTTP MX store them in different filter states (upstream vs downstream).

Ideally, I'd like standard Otel logging work for ambient, hence we need a dedicated metadata extension that's not coupled with stats/stackdriver.

@zirain
Copy link
Member

zirain commented Jul 8, 2023

@zirain metadata is already stored in the filter state in many cases, but it's a bit messy:

  • istio_stats can call WDS directly if it doesn't find it
  • ambient doesn't have upstream
  • TCP/HTTP MX store them in different filter states (upstream vs downstream).

Ideally, I'd like standard Otel logging work for ambient, hence we need a dedicated metadata extension that's not coupled with stats/stackdriver.

still use x-envoy-peer-metadata encoding by flatbuff?

Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com>
@istio-testing istio-testing added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 13, 2023
@kyessenov
Copy link
Contributor Author

still use x-envoy-peer-metadata encoding by flatbuff?

Yes, but it doesn't matter since native Envoy filters can reflect on it with CEL.

@kyessenov
Copy link
Contributor Author

FYI @costinm . This should be ready for review.

Signed-off-by: Kuat Yessenov <kuat@google.com>
@zirain
Copy link
Member

zirain commented Jul 14, 2023

still use x-envoy-peer-metadata encoding by flatbuff?

Yes, but it doesn't matter since native Envoy filters can reflect on it with CEL.

can you add an example about how to use?

@kyessenov
Copy link
Contributor Author

/retest

@kyessenov
Copy link
Contributor Author

kyessenov commented Jul 14, 2023

@zirain

I think %CEL(filter_state["wasm.downstream_peer"].namespace)% should work.

@kyessenov
Copy link
Contributor Author

/retest

Comment on lines +45 to +47
Baggage baggage = 1;
WorkloadDiscovery workload_discovery = 2;
IstioHeaders istio_headers = 3;
Copy link
Member

@hzxuzhonghu hzxuzhonghu Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: from the pr desc, WorkloadDiscovery will be the choice in the future, why do we need to set discovery method. I mean it can do this by default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkloadDiscovery is still alpha since it's a part of ambient so we need a way to disable it if it goes wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC, thanks

@kyessenov
Copy link
Contributor Author

/retest

@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jul 14, 2023
@kyessenov
Copy link
Contributor Author

Ref: istio/istio#43937

@kyessenov
Copy link
Contributor Author

Ping for review @zirain.

@zirain zirain added do-not-merge Block automatic merging of a PR. do-not-merge/hold Block automatic merging of a PR. and removed do-not-merge Block automatic merging of a PR. labels Jul 18, 2023
Copy link
Member

@zirain zirain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, add DNM for someone want to take a review.

also, we should add more test in istio repo to make sure everything works fine.

@@ -231,7 +231,8 @@ flatbuffers::DetachedBuffer convertWorkloadMetadataToFlatNode(const WorkloadMeta
node.add_labels(labels_offset);
auto data = node.Finish();
fbb.Finish(data);
return fbb.Release();
auto fb = fbb.Release();
return std::string(reinterpret_cast<const char*>(fb.data()), fb.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, can you use bit_cast here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still in C++17 here, but we can switch to C++20 shortly.

return it->second;
}
}
const auto bytes = Base64::decodeWithoutPadding(value);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "WithoutPadding" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility with "Wasm" implementation of the same logic here. Without padding is more forgiving so we have to keep it lax like that.

if (static_cast<uint32_t>(cache_.size()) > max_peer_cache_size_) {
cache_.erase(cache_.begin(), std::next(cache_.begin(), max_peer_cache_size_ / 4));
}
cache_.emplace(id, out);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this matters or not, but if id is already in the cache, this might unnecessarily cause the cache to clear if id already exists in the cache.

Maybe:

if(!id.empty()){
  auto id_it = cache_.find(id);
  if (id_it != cache.end()){
      id_it->second = out;
      return;
  }
  if (max_peer_cache_size_ > 0 && !id.empty()) {
    // do not let the cache grow beyond max cache size.
    if (static_cast<uint32_t>(cache_.size()) > max_peer_cache_size_) {
      cache_.erase(cache_.begin(), std::next(cache_.begin(), max_peer_cache_size_ / 4));
    }
    cache_.emplace(id, out);
}
return out;

Might be bikeshedding. Up to you if you want to change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as above: backwards compatibility https://github.com/istio/proxy/blob/master/extensions/metadata_exchange/plugin.cc#L133. There's no proper automated test for this caching logic so I am hesitant to touch it in any way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weirdly, getting poisoned memory here:

==24==ERROR: AddressSanitizer: use-after-poison on address 0x60f0001ae2b8 at pc 0x55d2b6f82a0a bp 0x7f7b57fc6370 sp 0x7f7b57fc5b38
WRITE of size 24 at 0x60f0001ae2b8 thread T8
[2023-07-19T05:19:58.890Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 783 7 4 "-" "Go-http-client/1.1" "62665433-eb4d-4798-bfd0-0b5b21eb2e9b" "b.echo1-2-87467.svc.cluster.local" "[fd00:10:244::1c]:18080" inbound|18080|| [::6]:32965 [fd00:10:244::1c]:18080 [fd00:10:244::13]:60782 outbound_.80_._.b.echo1-2-87467.svc.cluster.local default
#0 0x55d2b6f82a09  (/usr/local/bin/envoy+0x143bca09) /tmp/llvm/utils/release/final/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22
#1 0x55d2b6fcb3e9  (/usr/local/bin/envoy+0x144053e9) /usr/lib/llvm/bin/../include/c++/v1/string:1975
#2 0x55d2b73e3e7c  (/usr/local/bin/envoy+0x1481de7c) /usr/lib/llvm/bin/../include/c++/v1/utility:310
#3 0x55d2b73e3e39  (/usr/local/bin/envoy+0x1481de39) /usr/lib/llvm/bin/../include/c++/v1/memory:911
#4 0x55d2b73e3d72  (/usr/local/bin/envoy+0x1481dd72) /usr/lib/llvm/bin/../include/c++/v1/__memory/allocator_traits.h:288
#5 0x55d2b73e3caa  (/usr/local/bin/envoy+0x1481dcaa) external/com_google_absl/absl/container/internal/container_memory.h:440
#6 0x55d2b73e3c58  (/usr/local/bin/envoy+0x1481dc58) external/com_google_absl/absl/container/flat_hash_map.h:580
#7 0x55d2b73e3c48  (/usr/local/bin/envoy+0x1481dc48) external/com_google_absl/absl/container/internal/common_policy_traits.h:100
#8 0x55d2b73e3528  (/usr/local/bin/envoy+0x1481d528) external/com_google_absl/absl/container/internal/common_policy_traits.h:66
#9 0x55d2b73e311e  (/usr/local/bin/envoy+0x1481d11e) external/com_google_absl/absl/container/internal/raw_hash_set.h:2449
#10 0x55d2b73e2326  (/usr/local/bin/envoy+0x1481c326) external/com_google_absl/absl/container/internal/raw_hash_set.h:2594
#11 0x55d2b73e1c77  (/usr/local/bin/envoy+0x1481bc77) external/com_google_absl/absl/container/internal/raw_hash_set.h:2581
#12 0x55d2b764386e  (/usr/local/bin/envoy+0x14a7d86e) external/com_google_absl/absl/container/internal/raw_hash_set.h:2381
#13 0x55d2b7643559  (/usr/local/bin/envoy+0x14a7d559) external/com_google_absl/absl/container/internal/container_memory.h:140
#14 0x55d2b764341d  (/usr/local/bin/envoy+0x14a7d41d) external/com_google_absl/absl/container/internal/container_memory.h:207
#15 0x55d2b76433c2  (/usr/local/bin/envoy+0x14a7d3c2) external/com_google_absl/absl/container/flat_hash_map.h:587
#16 0x55d2b7643372  (/usr/local/bin/envoy+0x14a7d372) external/com_google_absl/absl/container/internal/hash_policy_traits.h:134
#17 0x55d2b763810f  (/usr/local/bin/envoy+0x14a7210f) external/com_google_absl/absl/container/internal/raw_hash_set.h:1999
#18 0x55d2b76374eb  (/usr/local/bin/envoy+0x14a714eb) source/extensions/filters/http/peer_metadata/filter.cc:183
#19 0x55d2b7636c01  (/usr/local/bin/envoy+0x14a70c01) source/extensions/filters/http/peer_metadata/filter.cc:152
#20 0x55d2b763b4a8  (/usr/local/bin/envoy+0x14a754a8) source/extensions/filters/http/peer_metadata/filter.cc:278
#21 0x55d2b763b20a  (/usr/local/bin/envoy+0x14a7520a) source/extensions/filters/http/peer_metadata/filter.cc:267
#22 0x55d2b763d608  (/usr/local/bin/envoy+0x14a77608) source/extensions/filters/http/peer_metadata/filter.cc:318

0x60f0001ae2b8 is located 72 bytes inside of 168-byte region [0x60f0001ae270,0x60f0001ae318)
allocated by thread T8 here:
#0 0x55d2b6f8355d  (/usr/local/bin/envoy+0x143bd55d) /tmp/llvm/utils/release/final/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145
    #1 0x7f7b6154ff97  (/lib/x86_64-linux-gnu/libc++abi.so.1+0x25f97)
#2 0x55d2b6fae91a  (/usr/local/bin/envoy+0x143e891a) /usr/lib/llvm/bin/../include/c++/v1/new:261
#3 0x55d2b6ff0d53  (/usr/local/bin/envoy+0x1442ad53) /usr/lib/llvm/bin/../include/c++/v1/memory:865
#4 0x55d2b6ff0d28  (/usr/local/bin/envoy+0x1442ad28) /usr/lib/llvm/bin/../include/c++/v1/__memory/allocator_traits.h:260
#5 0x55d2b6ff087b  (/usr/local/bin/envoy+0x1442a87b) external/com_google_absl/absl/container/internal/container_memory.h:65
#6 0x55d2b73e37e2  (/usr/local/bin/envoy+0x1481d7e2) external/com_google_absl/absl/container/internal/raw_hash_set.h:1343
#7 0x55d2b73e3479  (/usr/local/bin/envoy+0x1481d479) external/com_google_absl/absl/container/internal/raw_hash_set.h:2427
#8 0x55d2b73e2f20  (/usr/local/bin/envoy+0x1481cf20) external/com_google_absl/absl/container/internal/raw_hash_set.h:2437
#9 0x55d2b73e2326  (/usr/local/bin/envoy+0x1481c326) external/com_google_absl/absl/container/internal/raw_hash_set.h:2594
#10 0x55d2b73e1c77  (/usr/local/bin/envoy+0x1481bc77) external/com_google_absl/absl/container/internal/raw_hash_set.h:2581
#11 0x55d2b764386e  (/usr/local/bin/envoy+0x14a7d86e) external/com_google_absl/absl/container/internal/raw_hash_set.h:2381
#12 0x55d2b7643559  (/usr/local/bin/envoy+0x14a7d559) external/com_google_absl/absl/container/internal/container_memory.h:140
#13 0x55d2b764341d  (/usr/local/bin/envoy+0x14a7d41d) external/com_google_absl/absl/container/internal/container_memory.h:207
#14 0x55d2b76433c2  (/usr/local/bin/envoy+0x14a7d3c2) external/com_google_absl/absl/container/flat_hash_map.h:587
#15 0x55d2b7643372  (/usr/local/bin/envoy+0x14a7d372) external/com_google_absl/absl/container/internal/hash_policy_traits.h:134
#16 0x55d2b763810f  (/usr/local/bin/envoy+0x14a7210f) external/com_google_absl/absl/container/internal/raw_hash_set.h:1999
#17 0x55d2b76374eb  (/usr/local/bin/envoy+0x14a714eb) source/extensions/filters/http/peer_metadata/filter.cc:183
#18 0x55d2b7636c01  (/usr/local/bin/envoy+0x14a70c01) source/extensions/filters/http/peer_metadata/filter.cc:152
#19 0x55d2b763b4a8  (/usr/local/bin/envoy+0x14a754a8) source/extensions/filters/http/peer_metadata/filter.cc:278
#20 0x55d2b763b20a  (/usr/local/bin/envoy+0x14a7520a) source/extensions/filters/http/peer_metadata/filter.cc:267
#21 0x55d2b763d608  (/usr/local/bin/envoy+0x14a77608) source/extensions/filters/http/peer_metadata/filter.cc:318

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is thread management, will send a follow-up.

@kyessenov
Copy link
Contributor Author

@briansonnenberg @edtanous or anyone else let me know if you have further comments, I will create a follow-up to respond. Need to have a test in another repo, so letting it merge here.

@kyessenov kyessenov removed the do-not-merge/hold Block automatic merging of a PR. label Jul 18, 2023
@kyessenov
Copy link
Contributor Author

/retest
CustomAccessLog appears to be another flake

@kyessenov
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants