
Option for increasing BatchSpanProcessor throughput under slow network / ingestion #4245

Closed

Conversation

trask (Member) commented Mar 8, 2022

I'm guessing you would want this option to go through spec(?), but I wanted to get some initial feedback here, including on the alternate option (see #4246).

The problem: under high telemetry load and slower network / ingestion, the BatchSpanProcessor single worker thread can spend a lot of time waiting for responses.

This PR: the option tries to send as many complete batches as possible (up to a new config option, maxConcurrentExports), and then waits exporterTimeoutNanos for all of those batches to complete.
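A rough sketch of how the proposed option might be configured, assuming the new builder method is named setMaxConcurrentExports as described above (it does not exist in the released SDK); the exporter choice and values are illustrative:

import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import java.time.Duration;

class ProposedOptionSketch {
  static BatchSpanProcessor buildProcessor() {
    return BatchSpanProcessor.builder(OtlpGrpcSpanExporter.getDefault())
        // Proposed in this PR (hypothetical): allow up to 4 batches in flight at once.
        .setMaxConcurrentExports(4)
        // Existing option: how long the worker waits for outstanding exports to complete.
        .setExporterTimeout(Duration.ofSeconds(30))
        .build();
  }
}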

codecov bot commented Mar 8, 2022

Codecov Report

Merging #4245 (4dc56a3) into main (d2fbcba) will decrease coverage by 0.01%.
The diff coverage is 84.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #4245      +/-   ##
============================================
- Coverage     90.31%   90.30%   -0.02%     
  Complexity     4741     4741              
============================================
  Files           553      553              
  Lines         14584    14598      +14     
  Branches       1402     1404       +2     
============================================
+ Hits          13172    13183      +11     
- Misses          953      957       +4     
+ Partials        459      458       -1     
Impacted Files Coverage Δ
...ry/sdk/trace/export/BatchSpanProcessorBuilder.java 89.18% <25.00%> (-7.79%) ⬇️
...telemetry/sdk/trace/export/BatchSpanProcessor.java 90.19% <95.23%> (-0.02%) ⬇️
...entelemetry/sdk/logs/export/BatchLogProcessor.java 88.80% <0.00%> (+0.74%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@trask trask marked this pull request as draft March 8, 2022 06:21
@trask trask marked this pull request as ready for review March 9, 2022 22:19
anuraaga (Contributor) commented:

LGTM. Having this option for the batch processor seems like it would be nice.

concurrentExports.add(exportCurrentBatch());
}
}
if (concurrentExports.isEmpty() && System.nanoTime() >= nextExportTime) {
Contributor:
what case is this trying to catch? It would only trigger if somehow the queue were empty, but the time had expired...but the queue was just empty! Why would you expect there to be something to export at this point?

Member:

If I understand correctly, the intent is to catch the scenario where the items have been drained from the queue and added to the batch, but not enough items to trigger an export such that batch.size() > maxExportBatchSize on line 243.
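A simplified sketch of the two export triggers being discussed; this is not the actual PR diff, and the names onSpanDrained, scheduleDelayNanos, and exportCurrentBatch are illustrative:

import io.opentelemetry.sdk.common.CompletableResultCode;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

abstract class WorkerLoopSketch {
  private final List<Object> batch = new ArrayList<>();
  private final List<CompletableResultCode> concurrentExports = new ArrayList<>();
  private final int maxExportBatchSize = 512;
  private final long scheduleDelayNanos = TimeUnit.SECONDS.toNanos(5);
  private long nextExportTime = System.nanoTime() + scheduleDelayNanos;

  // Called as the worker drains an item from the queue into the batch.
  void onSpanDrained(Object span) {
    batch.add(span);
    if (batch.size() >= maxExportBatchSize) {
      // Size-based trigger: the batch is full.
      concurrentExports.add(exportCurrentBatch());
    }
    if (concurrentExports.isEmpty() && System.nanoTime() >= nextExportTime) {
      // Time-based trigger: the queue drained into a batch smaller than
      // maxExportBatchSize, so export whatever has accumulated once the
      // schedule delay elapses.
      concurrentExports.add(exportCurrentBatch());
      nextExportTime = System.nanoTime() + scheduleDelayNanos;
    }
  }

  // Exports and clears the current batch, returning the pending result.
  abstract CompletableResultCode exportCurrentBatch();
}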

Contributor:

Yeah, we talked about this last week. I asked @trask to add a comment about it, since it wasn't obvious to me what this was for.

jack-berg (Member) commented:

Seems reasonable. Would also be good for batch log processor. I believe this is already possible for PeriodicMetricReader because you can provide your own ScheduledExecutorService.
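A rough sketch of the metrics-side approach mentioned here, assuming PeriodicMetricReaderBuilder#setExecutor accepts a user-supplied ScheduledExecutorService; the exporter choice and values are illustrative:

import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

class MetricReaderExecutorSketch {
  static PeriodicMetricReader buildReader() {
    // Supplying your own executor gives control over the thread(s) driving export.
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
    return PeriodicMetricReader.builder(OtlpGrpcMetricExporter.getDefault())
        .setExecutor(executor)
        .setInterval(Duration.ofSeconds(60))
        .build();
  }
}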

trask (Member, Author) commented Mar 14, 2022

Closing.

@jkwatson had the keen recollection that the spec explicitly disallows calling the exporter concurrently:

Export() will never be called concurrently for the same exporter instance. Export() can be called again only after the current call returns.

@jkwatson also suggested a great workaround: have the exporter always return CompletableResultCode.ofSuccess(), and then flush the exporter after flushing the BatchSpanProcessor. This could even be written as a wrapper around an existing exporter, e.g.

import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSpanExporter implements SpanExporter {

  private final SpanExporter delegate;
  private final int maxConcurrentExports;

  // In-flight exports that were acknowledged to the processor as already
  // successful; flush() waits on these.
  private final Set<CompletableResultCode> activeExportResults =
      Collections.newSetFromMap(new ConcurrentHashMap<>());

  public ConcurrentSpanExporter(SpanExporter delegate, int maxConcurrentExports) {
    this.delegate = delegate;
    this.maxConcurrentExports = maxConcurrentExports;
  }

  @Override
  public CompletableResultCode export(Collection<SpanData> spans) {
    return maybeAddToActiveExportResults(delegate.export(spans));
  }

  private CompletableResultCode maybeAddToActiveExportResults(CompletableResultCode result) {
    if (activeExportResults.size() >= maxConcurrentExports) {
      // Concurrency limit reached: return the real result so the processor's
      // worker thread blocks until this export completes.
      return result;
    }
    // Track the in-flight export and immediately report success so the
    // processor can start exporting the next batch.
    activeExportResults.add(result);
    result.whenComplete(() -> activeExportResults.remove(result));
    return CompletableResultCode.ofSuccess();
  }

  @Override
  public CompletableResultCode flush() {
    // Completes once all tracked in-flight exports have completed.
    return CompletableResultCode.ofAll(activeExportResults);
  }

  @Override
  public CompletableResultCode shutdown() {
    return delegate.shutdown();
  }
}
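A rough sketch of how the wrapper above might be wired in and flushed; the OTLP exporter, concurrency of 4, and timeouts are illustrative:

import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.concurrent.TimeUnit;

class ConcurrentExportSetup {
  public static void main(String[] args) {
    // Wrap a real exporter so up to 4 exports can be in flight at once.
    SpanExporter concurrentExporter =
        new ConcurrentSpanExporter(OtlpGrpcSpanExporter.getDefault(), 4);

    SdkTracerProvider tracerProvider =
        SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(concurrentExporter).build())
            .build();

    // ... record spans ...

    // Flush the processor's queue first, then wait for the exports that the
    // wrapper acknowledged early.
    tracerProvider.forceFlush().join(10, TimeUnit.SECONDS);
    concurrentExporter.flush().join(10, TimeUnit.SECONDS);
    tracerProvider.shutdown().join(10, TimeUnit.SECONDS);
  }
}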

Btw, not sure if it makes sense for the BatchSpanProcessor to officially support span exporters that don't return a real CompletableResultCode, e.g. by calling flush on the SpanExporter after it flushes the worker. I'll open an issue to discuss/track.

@trask trask closed this Mar 14, 2022
@trask trask deleted the batch-span-processor-some-concurrency branch April 22, 2022 17:28