Log the estimate of batch metrics memory consumption#18916
Log the estimate of batch metrics memory consumption#18916andsel wants to merge 17 commits intoelastic:mainfrom
Conversation
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label. Could you fix it @andsel? 🙏
|
| final Class<USER_METRIC> type = metricFactory.getType(); | ||
| if (!type.isAssignableFrom(result.getJavaClass())) { | ||
| LOGGER.warn("UserMetric type mismatch for %s (expected: %s, received: %s); " + | ||
| LOGGER.warn("UserMetric type mismatch for {} (expected: {}, received: {}); " + |
There was a problem hiding this comment.
Note for reviewer
I scratched this while checking for other errors, so eventually I can split in a separate PR.
| input { dummy_input {} } | ||
| filter { | ||
| #{" nil_flushing_filter {}\n" * 2000} | ||
| #{" nil_flushing_filter {}\n" * 2500} |
There was a problem hiding this comment.
Note for reviewer
2000 filters produced flaky tests, when run singularly, so raised a little bit the limit.
|
💚 CLA has been signed |
There was a problem hiding this comment.
Pull request overview
This PR adds an estimate of memory consumed by batch-structure flow-metrics histograms and logs the estimate per pipeline on startup when batch metrics are enabled, helping users understand the RAM impact of these metrics and decide whether to disable them.
Changes:
- Extend histogram/retention-policy APIs to expose datapoint counts and estimated histogram footprint.
- Add pipeline- and queue-level estimation methods and log the estimate during pipeline startup.
- Update and adjust specs/tests to account for batch-metrics initialization requirements.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| logstash-core/src/test/java/org/logstash/instrument/metrics/BatchStructureMetricTest.java | Adds a unit test for footprint estimation. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/UserMetric.java | Fixes parameterized logging placeholders. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/histogram/LifetimeHistogramMetric.java | Implements footprint estimation for lifetime histograms. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/histogram/HistogramMetric.java | Extends histogram metric interface with estimation APIs. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/FlowMetricRetentionPolicy.java | Adds datapointsCount() to support estimating retained datapoints. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/BuiltInFlowMetricRetentionPolicies.java | Implements datapointsCount() for the lifetime policy. |
| logstash-core/src/main/java/org/logstash/instrument/metrics/BatchStructureMetric.java | Adds estimation based on histogram footprint × datapoints. |
| logstash-core/src/main/java/org/logstash/execution/QueueReadClientBatchMetrics.java | Adds estimation for queue-side batch histogram collectors. |
| logstash-core/src/main/java/org/logstash/execution/QueueReadClientBase.java | Exposes estimation through the queue read client base. |
| logstash-core/src/main/java/org/logstash/execution/QueueReadClient.java | Adds estimation method to the queue client interface. |
| logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java | Adds JRuby methods to estimate/log batch metrics occupation. |
| logstash-core/spec/logstash/pipeline_reporter_spec.rb | Disables batch metrics to avoid needing full flow-metrics init in these specs. |
| logstash-core/spec/logstash/pipeline_action/stop_spec.rb | Disables batch metrics in test pipeline setup. |
| logstash-core/spec/logstash/pipeline_action/stop_and_delete_spec.rb | Disables batch metrics in test pipeline setup. |
| logstash-core/spec/logstash/pipeline_action/reload_spec.rb | Disables batch metrics in test pipeline setup. |
| logstash-core/spec/logstash/pipeline_action/delete_spec.rb | Disables batch metrics in test pipeline setup. |
| logstash-core/spec/logstash/java_pipeline_spec.rb | Adds coverage for estimation + adjusts pipeline settings setup. |
| logstash-core/lib/logstash/java_pipeline.rb | Calls batch-metrics occupation logging during pipeline start. |
Comments suppressed due to low confidence (1)
logstash-core/spec/logstash/java_pipeline_spec.rb:1259
- The comment above this config says it creates a pipeline with 2000 filters, but the generated config now includes 2500 nil_flushing_filter entries. Please update the comment to match the new value (or revert the value if 2000 is still the intended threshold).
output { dummy_output {} }
EOS
end
let(:output) { ::LogStash::Outputs::DummyOutput.new }
before do
allow(::LogStash::Outputs::DummyOutput).to receive(:new).with(any_args).and_return(output)
allow(LogStash::Plugin).to receive(:lookup).with("input", "dummy_input").and_return(LogStash::Inputs::DummyBlockingInput)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
logstash-core/src/main/java/org/logstash/instrument/metrics/FlowMetricRetentionPolicy.java
Show resolved
Hide resolved
...h-core/src/main/java/org/logstash/instrument/metrics/BuiltInFlowMetricRetentionPolicies.java
Outdated
Show resolved
Hide resolved
logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java
Outdated
Show resolved
Hide resolved
logstash-core/src/test/java/org/logstash/instrument/metrics/BatchStructureMetricTest.java
Outdated
Show resolved
Hide resolved
| # BatchStructureMetric has 4 policies | ||
| let(:last_1_minute_datapoints) { 60 / 3 } | ||
| let(:last_5_minutes_datapoints) { 5 * 60 / 15 } | ||
| let(:last_15_minutes_datapoints) { 15 * 60 / 30 } | ||
| let(:lifetime_datapoints) { 1 } | ||
| let(:single_batch_metric_datapoints) do | ||
| last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints# + lifetime_datapoints |
There was a problem hiding this comment.
In this new spec block, the comment says "BatchStructureMetric has 4 policies" but the arithmetic and the implementation only cover three. Also, the unused lifetime_datapoints variable and commented-out additions add confusion; please align the comment/variables with what the code actually does and remove the commented-out code.
| # BatchStructureMetric has 4 policies | |
| let(:last_1_minute_datapoints) { 60 / 3 } | |
| let(:last_5_minutes_datapoints) { 5 * 60 / 15 } | |
| let(:last_15_minutes_datapoints) { 15 * 60 / 30 } | |
| let(:lifetime_datapoints) { 1 } | |
| let(:single_batch_metric_datapoints) do | |
| last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints# + lifetime_datapoints | |
| # BatchStructureMetric has 3 policies | |
| let(:last_1_minute_datapoints) { 60 / 3 } | |
| let(:last_5_minutes_datapoints) { 5 * 60 / 15 } | |
| let(:last_15_minutes_datapoints) { 15 * 60 / 30 } | |
| let(:single_batch_metric_datapoints) do | |
| last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints |
934e718 to
70657a2
Compare
…nabled and when don't
…ons classes. This is needed to avoid start collecting such data on not yet fully initialized pipelines.
…ipeline report tests
… metrics. This simplifies the configuration. Instaed of setting also 'pipeline.batch.metrics.sampling_mode' to disabled when 'metric.collect' is set to false. Just set 'metric.collect' to false and batch metricsconsumption log is not printed
…ged capture histogram instance
70657a2 to
cb09bd3
Compare
estolfo
left a comment
There was a problem hiding this comment.
Should we document this as well? It looks like from the code, especially this comment, that the memory estimate is a lower bound. So in the documentation, we might want to note that to set expectations for anyone using this feature.
logstash-core/src/main/java/org/logstash/config/ir/compiler/OutputStrategyExt.java
Outdated
Show resolved
Hide resolved
logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java
Show resolved
Hide resolved
| @JRubyMethod(name = "log_batch_metrics_occupation") | ||
| public final IRubyObject logEstimatedBatchMetricOccupation(final ThreadContext context) { | ||
| if (metric.collector(context).isNil() || !getSetting(context, "metric.collect").isTrue()) { | ||
| LOGGER.debug("Metrics collection is disabled, skipping batch metrics logging"); |
There was a problem hiding this comment.
If metrics collection is disabled, should we still log that metrics logging will be skipped? Or would a user expect that if metrics collection is disabled, no logging related to metrics will be done?
There was a problem hiding this comment.
If metric collection is disabled, and can be only programmatically disabled, like in the monitoring pipeline, then no batch metrics log line is expected.
This debug log is here to recall the reason why no batch memory log is happening.
But can be eventually removed and just return nil.
logstash-core/src/test/java/org/logstash/instrument/metrics/BatchStructureMetricTest.java
Outdated
Show resolved
Hide resolved
logstash-core/src/test/java/org/logstash/instrument/metrics/BatchStructureMetricTest.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Emily S <emily.s@elastic.co>
💚 Build Succeeded
History
cc @andsel |
There was a problem hiding this comment.
The code looks good now but I just want to check that documentation is not needed?
Also, tested and see the log line:
[2026-04-10T13:08:37,377][INFO ][org.logstash.execution.AbstractPipelineExt] Pipeline `main` batch metrics estimated memory consumption: 5191680 bytes
Release notes
For each pipeline that has structured batch metrics enabled, log a line with memory consumed to collect such data.
What does this PR do?
Why is it important/What is the impact to the user?
Provides the user a direct information of how much memory the batch flow histograms consumes. With this information the user can select to disable it globally and enable only for single pipelines, if the memory consumption is too big for his setup.
Checklist
[ ] I have made corresponding changes to the documentationAuthor's Checklist
How to test this PR locally
Run Logstash and check that it prints a line like the following, for each pipeline:
Related issues
Use cases
Screenshots
Logs