dynamic_modules: add stats sink extension #44861
Conversation
Adds `envoy.stat_sinks.dynamic_modules`, a contrib stats sink that delegates Envoy's flush and histogram callbacks to a shared object via the existing dynamic modules ABI. Includes new ABI hooks and snapshot readback callbacks, Go SDK bindings (StatSink / StatSinkConfigFactory / MetricSnapshot + cgo glue), a v3 proto, and full unit + integration tests with C test modules for symbol-missing and config-new-fail paths. Alpha; security posture requires_trusted_downstream_and_upstream Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
Thank you @mathetake I will wait for this build to finish to catch any errors, also do you mean I can move this code under Also here is an E2E example of how to use the Stats Sink Dynamic Module, along with a sample envoy.yaml https://github.com/prashanthjos/go-stats-sink-dynamic-module/blob/main/http_sink.go |
yes |
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
|
CC @envoyproxy/coverage-shephards: FYI only for changes made to |
8433009 to
d8b3d69
Compare
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
d8b3d69 to
e26108c
Compare
|
@mathetake @wbpcode @agrawroh I have moved all the code to the source, can you please take a look and leave your thoughts and in the mean time I will ask Gemini to review :) |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the DynamicModuleStatsSink extension, which allows Envoy to load shared object files via dlopen to implement custom statistics sink behavior. The extension provides a C ABI for modules to receive periodic metric snapshots (counters, gauges, and text readouts) and synchronous histogram observations. It includes a Go SDK to facilitate module development in Go, along with comprehensive documentation and tests covering configuration, ABI implementation, and end-to-end integration. I have no feedback to provide as there are no review comments.
| } | ||
|
|
||
| // -------------------------------------------------------------------------- | ||
| // Stats Sink snapshot callbacks |
There was a problem hiding this comment.
Can you not have stats sink impl here like other extension points?
| development; capabilities and ABI are expected to evolve. | ||
|
|
||
| How it works | ||
| ------------ |
There was a problem hiding this comment.
Can you delete this implementation detail?
| * ``google.protobuf.Struct`` -- serialized to JSON before delivery. | ||
| * Any other message type -- raw proto bytes. | ||
|
|
||
| Go-module caveats |
| prefix: "envoy." | ||
|
|
||
| Configuration payloads | ||
| ---------------------- |
| /*/extensions/grpc_credentials/file_based_metadata @wozz @yanavlasov | ||
| /*/extensions/internal_redirect @botengyao @penguingao | ||
| /*/extensions/stat_sinks/dog_statsd @taiki45 @paul-r-gall | ||
| /*/extensions/stat_sinks/dynamic_modules @prashanthjos @mattklein123 |
| @@ -91,3 +91,4 @@ directories: | |||
| source/extensions/config_subscription/rest: 94.9 | |||
| source/extensions/matching/input_matchers/cel_matcher: 100.0 | |||
| source/extensions/dynamic_modules/sdk/cpp: 0.0 # SDK code self not directly tested | |||
| source/extensions/stat_sinks/dynamic_modules: 95.0 # Symbol-resolution error paths not fully exercised. | |||
There was a problem hiding this comment.
Can you not add this unless absolute necessary
There was a problem hiding this comment.
This is removed
| DynamicModuleStatsSink::DynamicModuleStatsSink(DynamicModuleStatsSinkConfigSharedPtr config) | ||
| : config_(std::move(config)) {} | ||
|
|
||
| void DynamicModuleStatsSink::flush(Stats::MetricSnapshot& snapshot) { |
There was a problem hiding this comment.
Would it be possible to avoid all the vector allocations in here?
There was a problem hiding this comment.
Yes this is removed
|
|
||
| void DynamicModuleStatsSink::onHistogramComplete(const Stats::Histogram& histogram, | ||
| uint64_t value) { | ||
| // Metric::name() returns std::string by value; bind it to a local so the |
There was a problem hiding this comment.
Can you remove this kind of unnecessary comments? Smells pretty much like AI
| // server start without an init manager isn't a pattern we want to introduce. | ||
| absl::StatusOr<Extensions::DynamicModules::DynamicModulePtr> dynamic_module_or_error; | ||
| if (module_config.has_module()) { | ||
| if (module_config.module().has_remote()) { |
There was a problem hiding this comment.
It might not be for this PR but can't we just reuse the existing code for loading? What makes stats sink special?
There was a problem hiding this comment.
I have corrected this code.
| * @param snapshot_envoy_ptr is the pointer to the MetricSnapshot. | ||
| * @param index is the index of the counter (0-based). | ||
| * @param name_out is the output buffer for the counter name. The buffer is owned by Envoy and | ||
| * valid only during this call. |
There was a problem hiding this comment.
What do you mean by "during this call"? What's the expected lifetime here?
There was a problem hiding this comment.
Updated, changed the wording to "guaranteed to be valid until the end of the envoy_dynamic_module_on_stat_sink_flush event hook", which matches the lifetime convention used by other ABI callbacks (e.g. HTTP filter headers/data). Applied the same fix to all three snapshot callbacks (counter, gauge, text readout).
| * @param snapshot_envoy_ptr is the pointer to the MetricSnapshot. | ||
| * @param index is the index of the text readout (0-based). | ||
| * @param name_out is the output buffer for the text readout name. The buffer is owned by Envoy | ||
| * and valid only during this call. |
| * @param name_out is the output buffer for the text readout name. The buffer is owned by Envoy | ||
| * and valid only during this call. | ||
| * @param value_out is the output buffer for the text readout value. The buffer is owned by Envoy | ||
| * and valid only during this call. |
| * | ||
| * @param snapshot_envoy_ptr is the pointer to the MetricSnapshot. | ||
| * @param index is the index of the gauge (0-based). | ||
| * @param name_out is the output buffer for the gauge name. The buffer is owned by Envoy and |
|
/wait |
@RyanTheOptimist I am going to address the comments, is the wait because of an upcoming release or you see any issue with the PR? |
|
@prashanthjos you won't need to wait for others to comment. |
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
|
/coverage |
|
Coverage for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-cncf-pr/44861/coverage/index.html For comparison, current coverage on https://storage.googleapis.com/envoy-cncf-postsubmit/main/coverage/index.html The coverage results are (re-)rendered each time the CI |
| @@ -0,0 +1,309 @@ | |||
| #include "source/extensions/dynamic_modules/abi/abi.h" | |||
There was a problem hiding this comment.
Could you keep these test files at the test/extensions/dynamic_modules like other modules? thanks.
There was a problem hiding this comment.
Thanks for the catch, I missed it, will move the test files to test/extensions/dynamic_modules/stat_sink/ to match the pattern used by the HTTP filter, bootstrap, listener, network, and UDP test directories. Will do this in the next push.
There was a problem hiding this comment.
This is done, I moved the files.
| // compiled module or plugins. | ||
| var httpFilterConfigFactoryRegistry = make(map[string]shared.HttpFilterConfigFactory) | ||
| var networkFilterConfigFactoryRegistry = make(map[string]shared.NetworkFilterConfigFactory) | ||
| var statSinkConfigFactoryRegistry = make(map[string]shared.StatSinkConfigFactory) |
There was a problem hiding this comment.
Thanks for taking Golang SDK into account 👍 But please also add the rust SDK at least. And we need to ensure our test data also cover the rust and go, cpp SDK and test are optional but if you can provided one, it would be better. (AI should be pretty helpful for this type of tasks)
There was a problem hiding this comment.
@wbpcode this is already a big PR, should we do the rust SDK in one more iteration? Also just want to let you know that I am not a rust expert :)
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
|
Thanks for adding this! Would be very useful |
Signed-off-by: Prashanth Josyula <prashanth.16@gmail.com>
|
@wbpcode @mathetake please take another look at the PR when you get a chance |
|
@mathetake @agrawroh @wbpcode gentle reminder ! |
wbpcode
left a comment
There was a problem hiding this comment.
Thanks. I want to say we should add rust SDK and cpp SDK. But this PR is there for a while. We can left that tasks to others.
But because the go SDK is there, please add go based test data and test the feature works correctly like my previous comments. Then I am good to land this first.
|
|
||
| // CounterSnapshot is one counter in a MetricSnapshot. | ||
| // The Name field shares memory with Envoy's buffer and is only valid during the | ||
| // enclosing OnFlush callback — copy it with ToString() if you need to keep it. | ||
| type CounterSnapshot struct { | ||
| Name UnsafeEnvoyBuffer | ||
| Value uint64 | ||
| Delta uint64 | ||
| } |
There was a problem hiding this comment.
Please merge main, and then, you will find we will place different module's API into different file. Like the Envoy exposed API should be stats_sink.go and the interface the users need to implement should be placed in the stats_sink_api.go.
| namespace { | ||
|
|
||
| const Envoy::Extensions::StatSinks::DynamicModules::StatSinkFlushContext* | ||
| toCtx(envoy_dynamic_module_type_stat_sink_snapshot_envoy_ptr ptr) { |
There was a problem hiding this comment.
toStatsSinkFlushContext. In the cpp, we prefer full name if it's not that long.
Commit Message: Adds
envoy.stat_sinks.dynamic_modules, a contrib stats sink that delegates Envoy's flush and histogram callbacks to a shared object via the existing dynamic modules ABI. Includes new ABI hooks and snapshot readback callbacks, Go SDK bindings (StatSink / StatSinkConfigFactory / MetricSnapshot + cgo glue), a v3 proto, and full unit + integration tests with C test modules for symbol-missing and config-new-fail paths.Alpha; security posture requires_trusted_downstream_and_upstream
Additional Description: Introduces four new event hooks and five snapshot readback callbacks in the dynamic modules ABI for stats sinks, a contrib C++ implementation that supports both name-based and explicit-path module loading (remote sources are rejected; stats sinks are a process-wide bootstrap resource), a new v3 proto under api/contrib/, and Go SDK bindings (StatSink interface, factory registry, and cgo glue that returns a C-owned handle to satisfy Go's
pointer-passing rules across the ABI boundary). Per-flush name caching is scoped to the call stack to give modules stable pointers during the callback without steady-state memory growth.
Risk Level: Low. New opt-in contrib extension; no behavioral change unless configured. Security posture is requires_trusted_downstream_and_upstream (matches access_loggers/dynamic_modules and bootstrap/dynamic_modules, since dynamic modules run in-process with full Envoy privileges).
Testing: Unit coverage under contrib/stat_sinks/dynamic_modules/test/ (config_test, sink_test, abi_impl_test) exercises factory registration, proto validation, both load paths, all four symbol-missing failures, null-return from config_new, snapshot callback correctness across counters/gauges/text readouts, and regression guards for -Wdangling-gsl on Metric::name(). Integration test boots real Envoy with a C test module that emits log markers via envoy_dynamic_module_callback_log and asserts them with EXPECT_LOG_CONTAINS. Manually validated end-to-end with a Go module posting metrics to an HTTP receiver across many flush cycles under traffic.
Docs Changes: API proto is fully commented (package-level protodoc-title, per-field docs including serialization semantics for google.protobuf.Any). No changes under docs/root/.
Release Notes: Added to changelogs/current.yaml under new_features, area: stat_sinks.
Platform Specific Features: None. Extension uses the existing dlopen-based dynamic modules infrastructure, which is already supported on Linux and macOS.
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]