
Log the estimate of batch metrics memory consumption#18916

Open
andsel wants to merge 17 commits into elastic:main from
andsel:feature/estimate_batch_metrics_memory_consumption_and_log

Conversation

@andsel
Contributor

@andsel andsel commented Mar 30, 2026

Release notes

For each pipeline that has structured batch metrics enabled, log a line reporting the memory consumed to collect that data.

What does this PR do?

  • Updates all the classes that use Histogram to expose or sum up the memory consumed by those internal structures.
  • Updates pipeline startup to print a log line with that information when batch metrics are enabled.
  • Updates all the existing classes that start a pipeline to disable batch metrics when the flow metrics are not fully initialized.
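The estimation described above can be sketched as follows. This is a hypothetical illustration, not the exact API added by the PR: the names `RetentionPolicy`, `datapointsCount`, and `estimateBytes` are illustrative stand-ins for the `datapointsCount()`/footprint methods mentioned in the file summary, and the per-histogram byte size is a made-up constant.

```java
import java.util.List;

// Hypothetical sketch: each retention policy keeps one histogram snapshot per
// datapoint, so the total estimate is (sum of datapoints) x (bytes per histogram).
public class BatchMetricsFootprint {

    /** A window of retained datapoints, one every granularitySeconds. */
    record RetentionPolicy(String name, long windowSeconds, long granularitySeconds) {
        long datapointsCount() {
            return windowSeconds / granularitySeconds;
        }
    }

    /** Lower-bound estimate; object headers and references are approximated. */
    static long estimateBytes(List<RetentionPolicy> policies, long bytesPerHistogram) {
        long datapoints = policies.stream()
                .mapToLong(RetentionPolicy::datapointsCount)
                .sum();
        return datapoints * bytesPerHistogram;
    }

    public static void main(String[] args) {
        List<RetentionPolicy> policies = List.of(
                new RetentionPolicy("last_1_minute", 60, 3),
                new RetentionPolicy("last_5_minutes", 5 * 60, 15),
                new RetentionPolicy("last_15_minutes", 15 * 60, 30));
        // 20 + 20 + 30 = 70 datapoints, times an assumed 1024 bytes each
        System.out.println(estimateBytes(policies, 1024)); // prints 71680
    }
}
```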

Why is it important/What is the impact to the user?

Gives the user direct information about how much memory the batch flow histograms consume. With this information the user can choose to disable the feature globally and enable it only for selected pipelines if the memory consumption is too high for their setup.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Run Logstash and check that it prints a line like the following, for each pipeline:

Pipeline `main` batch metrics estimated memory occupation: 4925440 bytes

Related issues

Use cases

Screenshots

Logs

[2026-03-30T17:28:45,791][INFO ][org.logstash.execution.AbstractPipelineExt] Pipeline `main` batch metrics estimated memory occupation: 4925440 bytes

@andsel andsel self-assigned this Mar 30, 2026
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Contributor

mergify bot commented Mar 30, 2026

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

    final Class<USER_METRIC> type = metricFactory.getType();
    if (!type.isAssignableFrom(result.getJavaClass())) {
-       LOGGER.warn("UserMetric type mismatch for %s (expected: %s, received: %s); " +
+       LOGGER.warn("UserMetric type mismatch for {} (expected: {}, received: {}); " +
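For context on why this placeholder change matters: Log4j2/SLF4J-style loggers substitute `{}` markers positionally and ignore printf-style `%s`, so the original message would have printed the literal `%s` tokens instead of the values. The dependency-free `format()` helper below is a sketch that mimics that substitution behavior; it is not the logging library's actual implementation.

```java
// Sketch of "{}" placeholder substitution as done by Log4j2/SLF4J-style loggers.
public class PlaceholderDemo {
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0;
        int i = 0;
        while (i < template.length()) {
            if (argIdx < args.length && template.startsWith("{}", i)) {
                out.append(args[argIdx++]); // replace "{}" with the next argument
                i += 2;
            } else {
                out.append(template.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "%s" is left untouched: the values never appear in the message.
        System.out.println(format("type mismatch for %s (expected: %s)", "uptime", "Counter"));
        // "{}" placeholders are substituted in order, as in the fixed warn() call.
        System.out.println(format("type mismatch for {} (expected: {})", "uptime", "Counter"));
    }
}
```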
Contributor Author


Note for reviewer
I caught this while checking for other errors, so I could eventually split it into a separate PR.

    input { dummy_input {} }
    filter {
-     #{" nil_flushing_filter {}\n" * 2000}
+     #{" nil_flushing_filter {}\n" * 2500}
Contributor Author


Note for reviewer
2000 filters produced flaky tests when run individually, so I raised the limit a bit.

@andsel andsel changed the title Feature/estimate batch metrics memory consumption and log Log the estimate of batch metrics memory consumption Apr 2, 2026
@andsel andsel marked this pull request as ready for review April 2, 2026 13:34
@andsel andsel requested a review from Copilot April 7, 2026 08:03
@cla-checker-service

cla-checker-service bot commented Apr 7, 2026

💚 CLA has been signed

Contributor

Copilot AI left a comment


Pull request overview

This PR adds an estimate of memory consumed by batch-structure flow-metrics histograms and logs the estimate per pipeline on startup when batch metrics are enabled, helping users understand the RAM impact of these metrics and decide whether to disable them.

Changes:

  • Extend histogram/retention-policy APIs to expose datapoint counts and estimated histogram footprint.
  • Add pipeline- and queue-level estimation methods and log the estimate during pipeline startup.
  • Update and adjust specs/tests to account for batch-metrics initialization requirements.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file

  • logstash-core/src/test/java/org/logstash/instrument/metrics/BatchStructureMetricTest.java: Adds a unit test for footprint estimation.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/UserMetric.java: Fixes parameterized logging placeholders.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/histogram/LifetimeHistogramMetric.java: Implements footprint estimation for lifetime histograms.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/histogram/HistogramMetric.java: Extends the histogram metric interface with estimation APIs.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/FlowMetricRetentionPolicy.java: Adds datapointsCount() to support estimating retained datapoints.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/BuiltInFlowMetricRetentionPolicies.java: Implements datapointsCount() for the lifetime policy.
  • logstash-core/src/main/java/org/logstash/instrument/metrics/BatchStructureMetric.java: Adds estimation based on histogram footprint × datapoints.
  • logstash-core/src/main/java/org/logstash/execution/QueueReadClientBatchMetrics.java: Adds estimation for queue-side batch histogram collectors.
  • logstash-core/src/main/java/org/logstash/execution/QueueReadClientBase.java: Exposes estimation through the queue read client base.
  • logstash-core/src/main/java/org/logstash/execution/QueueReadClient.java: Adds the estimation method to the queue client interface.
  • logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java: Adds JRuby methods to estimate/log batch metrics occupation.
  • logstash-core/spec/logstash/pipeline_reporter_spec.rb: Disables batch metrics to avoid needing full flow-metrics init in these specs.
  • logstash-core/spec/logstash/pipeline_action/stop_spec.rb: Disables batch metrics in test pipeline setup.
  • logstash-core/spec/logstash/pipeline_action/stop_and_delete_spec.rb: Disables batch metrics in test pipeline setup.
  • logstash-core/spec/logstash/pipeline_action/reload_spec.rb: Disables batch metrics in test pipeline setup.
  • logstash-core/spec/logstash/pipeline_action/delete_spec.rb: Disables batch metrics in test pipeline setup.
  • logstash-core/spec/logstash/java_pipeline_spec.rb: Adds coverage for estimation and adjusts pipeline settings setup.
  • logstash-core/lib/logstash/java_pipeline.rb: Calls batch-metrics occupation logging during pipeline start.
Comments suppressed due to low confidence (1)

logstash-core/spec/logstash/java_pipeline_spec.rb:1259

  • The comment above this config says it creates a pipeline with 2000 filters, but the generated config now includes 2500 nil_flushing_filter entries. Please update the comment to match the new value (or revert the value if 2000 is still the intended threshold).
      output { dummy_output {} }
      EOS
    end
    let(:output) { ::LogStash::Outputs::DummyOutput.new }

    before do
      allow(::LogStash::Outputs::DummyOutput).to receive(:new).with(any_args).and_return(output)
      allow(LogStash::Plugin).to receive(:lookup).with("input", "dummy_input").and_return(LogStash::Inputs::DummyBlockingInput)


Comment on lines +1155 to +1161
# BatchStructureMetric has 4 policies
let(:last_1_minute_datapoints) { 60 / 3 }
let(:last_5_minutes_datapoints) { 5 * 60 / 15 }
let(:last_15_minutes_datapoints) { 15 * 60 / 30 }
let(:lifetime_datapoints) { 1 }
let(:single_batch_metric_datapoints) do
  last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints # + lifetime_datapoints
Copy link

Copilot AI Apr 7, 2026


In this new spec block, the comment says "BatchStructureMetric has 4 policies" but the arithmetic and the implementation only cover three. Also, the unused lifetime_datapoints variable and commented-out additions add confusion; please align the comment/variables with what the code actually does and remove the commented-out code.

Suggested change
-# BatchStructureMetric has 4 policies
-let(:last_1_minute_datapoints) { 60 / 3 }
-let(:last_5_minutes_datapoints) { 5 * 60 / 15 }
-let(:last_15_minutes_datapoints) { 15 * 60 / 30 }
-let(:lifetime_datapoints) { 1 }
-let(:single_batch_metric_datapoints) do
-  last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints # + lifetime_datapoints
+# BatchStructureMetric has 3 policies
+let(:last_1_minute_datapoints) { 60 / 3 }
+let(:last_5_minutes_datapoints) { 5 * 60 / 15 }
+let(:last_15_minutes_datapoints) { 15 * 60 / 30 }
+let(:single_batch_metric_datapoints) do
+  last_1_minute_datapoints + last_5_minutes_datapoints + last_15_minutes_datapoints
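The arithmetic in the spec above can be checked directly: each retention policy retains window/granularity datapoints per histogram. A minimal Java check (the window and granularity values are taken from the spec; the variable names are illustrative):

```java
// Quick check of the retention-policy datapoint arithmetic used in the spec.
public class DatapointCounts {
    public static void main(String[] args) {
        int last1Minute = 60 / 3;          // 60s window, 3s granularity -> 20 datapoints
        int last5Minutes = 5 * 60 / 15;    // 300s window, 15s granularity -> 20 datapoints
        int last15Minutes = 15 * 60 / 30;  // 900s window, 30s granularity -> 30 datapoints
        System.out.println(last1Minute + last5Minutes + last15Minutes); // prints 70
    }
}
```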

@andsel andsel force-pushed the feature/estimate_batch_metrics_memory_consumption_and_log branch from 934e718 to 70657a2 Compare April 7, 2026 08:14
@andsel andsel force-pushed the feature/estimate_batch_metrics_memory_consumption_and_log branch from 70657a2 to cb09bd3 Compare April 7, 2026 12:56
Contributor

@estolfo estolfo left a comment


Should we document this as well? It looks like from the code, especially this comment, that the memory estimate is a lower bound. So in the documentation, we might want to note that to set expectations for anyone using this feature.

@JRubyMethod(name = "log_batch_metrics_occupation")
public final IRubyObject logEstimatedBatchMetricOccupation(final ThreadContext context) {
    if (metric.collector(context).isNil() || !getSetting(context, "metric.collect").isTrue()) {
        LOGGER.debug("Metrics collection is disabled, skipping batch metrics logging");
Contributor


If metrics collection is disabled, should we still log that metrics logging will be skipped? Or would a user expect that if metrics collection is disabled, no logging related to metrics will be done?

Contributor Author


If metric collection is disabled (it can only be disabled programmatically, as in the monitoring pipeline), then no batch metrics log line is expected.
This debug log is here to record the reason why no batch memory log is emitted.
But it can eventually be removed so the method just returns nil.

@elasticmachine

💚 Build Succeeded

History

cc @andsel

@andsel andsel requested a review from estolfo April 9, 2026 14:20
Contributor

@estolfo estolfo left a comment


The code looks good now, but I just want to check: is documentation not needed?

Also, tested and see the log line:

[2026-04-10T13:08:37,377][INFO ][org.logstash.execution.AbstractPipelineExt] Pipeline `main` batch metrics estimated memory consumption: 5191680 bytes



Development

Successfully merging this pull request may close these issues.

Expose the size of batch metrics histograms, to be aware of the memory consumption for pipeline.

4 participants