Jaeger Exporter Performance #1254

Merged: 8 commits merged into open-telemetry:master on Sep 11, 2020

Conversation

@CodeBlanch (Member) commented on Sep 10, 2020

Changes

  • Removed the extra batching in the Jaeger exporter. It isn't really needed now that BatchExportActivityProcessor is doing that job. When the exporter is given a batch to export, it builds one or more UDP packets depending on whether the data fits within the max packet size option (see the sketch below).
  • Fixed up the allocations so memory usage is ~flat again. The idea is for the exporter to allocate all the memory it needs for a full batch up front and then reuse that memory over and over.
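
For illustration, here is a minimal sketch of the chunk-and-reuse idea. The names (`UdpBatchWriter`, `maxPayloadSizeInBytes`) are hypothetical and this is not the actual exporter code, which serializes Thrift structs rather than raw byte arrays:

```csharp
using System;
using System.Net.Sockets;

// Hypothetical sketch only: serialize spans into one reusable buffer and flush a UDP
// payload whenever the next span would overflow the configured max packet size.
// Spans larger than a single packet are not handled here.
internal sealed class UdpBatchWriter
{
    private readonly byte[] buffer; // allocated once, reused for every batch
    private readonly UdpClient client;
    private int position;

    public UdpBatchWriter(UdpClient client, int maxPayloadSizeInBytes)
    {
        this.client = client;
        this.buffer = new byte[maxPayloadSizeInBytes];
    }

    public void Write(byte[] serializedSpan)
    {
        // If this span doesn't fit in the remaining space, send what we have first.
        if (this.position + serializedSpan.Length > this.buffer.Length)
        {
            this.Flush();
        }

        Array.Copy(serializedSpan, 0, this.buffer, this.position, serializedSpan.Length);
        this.position += serializedSpan.Length;
    }

    public void Flush()
    {
        if (this.position > 0)
        {
            this.client.Send(this.buffer, this.position); // synchronous UDP send
            this.position = 0;
        }
    }
}
```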

Performance before:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 120.9 ms | 1.78 ms | 1.58 ms | - | - | - | 512.27 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,182.4 ms | 12.02 ms | 11.25 ms | - | - | - | 2901.73 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,772.2 ms | 99.48 ms | 93.05 ms | 4000.0000 | 4000.0000 | 4000.0000 | 26792.54 KB |

Performance after:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 119.2 ms | 2.38 ms | 2.64 ms | - | - | - | 236.35 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,160.6 ms | 14.53 ms | 12.13 ms | - | - | - | 238.03 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,700.6 ms | 142.61 ms | 133.40 ms | - | - | - | 240.22 KB |
  • CHANGELOG.md updated for non-trivial changes
  • Unit tests cleanup.

```diff
-                activity.TagObjects,
-                ref jaegerTags,
-                ProcessActivityTagRef);
+            activity.EnumerateTagValues(ref jaegerTags);
```
@CodeBlanch (Member, Author) commented:

@alanwest FYI I updated this for the new extension. It is a slightly easier API to use, and a bit faster because it doesn't hit the concurrent dictionary each time.
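
Roughly the shape of that pattern, as a hypothetical sketch (made-up interface and method names, not the SDK's actual extension): the caller passes a mutable struct by ref and all tags are visited in one pass, with no per-tag dictionary lookups.

```csharp
using System.Diagnostics;

// Hypothetical sketch of a ref-based tag enumeration extension; not the SDK's API.
internal interface ITagVisitor
{
    void OnTag(System.Collections.Generic.KeyValuePair<string, object> tag);
}

internal static class ActivityEnumerationSketch
{
    public static void VisitTags<T>(this Activity activity, ref T visitor)
        where T : struct, ITagVisitor
    {
        foreach (var tag in activity.TagObjects)
        {
            visitor.OnTag(tag); // constrained call on the struct, no boxing
        }
    }
}
```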

```csharp
            {
                foreach (var batch in this.CurrentBatches)
                {
                    var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);
```
@CodeBlanch (Member, Author) commented:

Just because I know someone will ask 😄

The thriftClient instance Jaeger is using is hooked up to a UdpClient. The API is all async but ultimately writes to a socket buffer and returns completely synchronously. To save some perf I skipped awaiting the task. The #if DEBUG checks in here are just in case that ever changes.
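
In other words, the pattern looks roughly like this (a sketch with a stubbed-out client so it stands alone; the guard body is illustrative, not the exact committed code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the UdpClient-backed Thrift client so the sketch is self-contained.
internal sealed class SynchronousThriftClientStub
{
    public Task WriteBatchAsync(object batch, CancellationToken token) => Task.CompletedTask;
}

internal sealed class FireAndForgetSketch
{
    private readonly SynchronousThriftClientStub thriftClient = new SynchronousThriftClientStub();

    public void SendBatch(object batch)
    {
        // The transport writes to the socket buffer and completes synchronously,
        // so the returned task is intentionally not awaited.
        Task task = this.thriftClient.WriteBatchAsync(batch, CancellationToken.None);

#if DEBUG
        // Fail loudly in debug builds if that synchronous-completion assumption ever breaks.
        if (task.Status != TaskStatus.RanToCompletion)
        {
            throw new InvalidOperationException("WriteBatchAsync did not complete synchronously.");
        }
#endif
    }
}
```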

```diff
@@ -155,13 +119,25 @@ private Activity CreateTestActivity()
                    ActivityTraceFlags.Recorded)),
        };

        return activitySource.StartActivity(
```
@CodeBlanch (Member, Author) commented:

This was returning null before because there was nothing sampling the source, which caused the benchmarks to blow up. ActivitySource API scoffs at the pit of success! 🕳️
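
For reference, the usual fix is to register an ActivityListener that samples the source, so StartActivity returns a real Activity instead of null. A minimal sketch (the class name is made up, and this is not necessarily the exact listener added to the benchmark):

```csharp
using System.Diagnostics;

internal static class BenchmarkListenerSetup
{
    public static void EnableSampling()
    {
        ActivitySource.AddActivityListener(new ActivityListener
        {
            ShouldListenTo = source => true, // or match the benchmark's ActivitySource by name
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        });
    }
}
```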

codecov bot commented on Sep 11, 2020

Codecov Report

Merging #1254 into master will increase coverage by 0.01%.
The diff coverage is 84.90%.

Impacted file tree graph

```
@@            Coverage Diff             @@
##           master    #1254      +/-   ##
==========================================
+ Coverage   79.01%   79.02%   +0.01%
==========================================
  Files         216      215       -1
  Lines        6270     6169     -101
==========================================
- Hits         4954     4875      -79
+ Misses       1316     1294      -22
```
| Impacted Files | Coverage Δ |
|---|---|
| ...Jaeger/Implementation/JaegerExporterEventSource.cs | 50.00% <0.00%> (+6.25%) ⬆️ |
| ...Telemetry.Exporter.Jaeger/JaegerExporterOptions.cs | 100.00% <ø> (ø) |
| src/OpenTelemetry/Internal/PooledList.cs | 72.00% <50.00%> (+3.37%) ⬆️ |
| ...rc/OpenTelemetry.Exporter.Jaeger/JaegerExporter.cs | 84.04% <81.53%> (+14.27%) ⬆️ |
| ...nTelemetry.Exporter.Jaeger/Implementation/Batch.cs | 84.44% <92.85%> (+1.51%) ⬆️ |
| ...ry.Exporter.Jaeger/Implementation/EmitBatchArgs.cs | 100.00% <100.00%> (ø) |
| ....Jaeger/Implementation/JaegerActivityExtensions.cs | 95.56% <100.00%> (-0.11%) ⬇️ |
| ...porter.Jaeger/Implementation/JaegerThriftClient.cs | 100.00% <100.00%> (ø) |
| ...ZPages/Implementation/ZPagesExporterEventSource.cs | 56.25% <0.00%> (-6.25%) ⬇️ |
| ...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs | 77.77% <0.00%> (-3.18%) ⬇️ |

... and 3 more

@CodeBlanch marked this pull request as ready for review on September 11, 2020 03:08
@CodeBlanch requested a review from a team as a code owner on September 11, 2020 03:08
```csharp
                {
                    spanProcess = new Process(spanServiceName, this.Process.Tags);
                    spanProcess.Message = this.BuildThriftMessage(spanProcess).ToArray();
                    this.processCache.Add(spanServiceName, spanProcess);
```
A Member commented:

Curious: is there a need to put an upper bound on this cache (something like a splay tree with a max size), or are we pretty confident that there won't be too many unique service names?

@CodeBlanch (Member, Author) replied:

Not sure. Kind of related to #1235. I feel like it would be a special/rare case where you would be connecting to an ever-growing list of services. Some kind of gateway thing perhaps?
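
If an upper bound ever turned out to be necessary, even something much simpler than a splay tree would probably do. A hypothetical sketch (made-up names and cap) that just stops caching new service names once a limit is hit:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: cap the per-service-name cache instead of letting it grow
// without bound. Past the cap, callers simply rebuild the value on demand.
internal sealed class BoundedProcessCache<TValue>
{
    private const int MaxEntries = 1000; // made-up cap

    private readonly Dictionary<string, TValue> cache = new Dictionary<string, TValue>();

    public bool TryGet(string serviceName, out TValue value)
        => this.cache.TryGetValue(serviceName, out value);

    public void TryAdd(string serviceName, TValue value)
    {
        if (this.cache.Count < MaxEntries)
        {
            this.cache[serviceName] = value;
        }
    }
}
```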

@reyang (Member) left a comment:

Looks solid.

One idea for consideration: the batching/grouping logic saves bandwidth by sending the service name only once for a set of spans that share it, but it comes with the cost of extra dictionary maintenance, GC overhead, and more complex code. I wonder whether it is a break-even deal.

@cijothomas merged commit d8c3ea3 into open-telemetry:master on Sep 11, 2020
@eddynaka deleted the jaeger-perf branch on September 11, 2020 14:24
@CodeBlanch (Member, Author) commented:

@reyang I've been thinking about your comment. It's definitely an interesting idea. One thing to note: it's not just the service name we send in that batch header; a Process can also have tags. Here's where the message is constructed:

```csharp
internal async Task WriteAsync(TProtocol oprot, CancellationToken cancellationToken)
{
    oprot.IncrementRecursionDepth();
    try
    {
        var struc = new TStruct("Process");
        await oprot.WriteStructBeginAsync(struc, cancellationToken).ConfigureAwait(false);
        var field = new TField
        {
            Name = "serviceName",
            Type = TType.String,
            ID = 1,
        };
        await oprot.WriteFieldBeginAsync(field, cancellationToken).ConfigureAwait(false);
        await oprot.WriteStringAsync(this.ServiceName, cancellationToken).ConfigureAwait(false);
        await oprot.WriteFieldEndAsync(cancellationToken).ConfigureAwait(false);
        if (this.Tags != null)
        {
            field.Name = "tags";
            field.Type = TType.List;
            field.ID = 2;
            await oprot.WriteFieldBeginAsync(field, cancellationToken).ConfigureAwait(false);
            {
                await oprot.WriteListBeginAsync(new TList(TType.Struct, this.Tags.Count), cancellationToken).ConfigureAwait(false);
                foreach (var jt in this.Tags)
                {
                    await jt.Value.WriteAsync(oprot, cancellationToken).ConfigureAwait(false);
                }

                await oprot.WriteListEndAsync(cancellationToken).ConfigureAwait(false);
            }

            await oprot.WriteFieldEndAsync(cancellationToken).ConfigureAwait(false);
        }

        await oprot.WriteFieldStopAsync(cancellationToken).ConfigureAwait(false);
        await oprot.WriteStructEndAsync(cancellationToken).ConfigureAwait(false);
    }
    finally
    {
        oprot.DecrementRecursionDepth();
    }
}
```

There's probably a sweet spot where the process data is small enough that we should just duplicate it on every span. We cache the bytes for those messages, so it would be an easy check to add; I'm just not sure how to figure out the size for the sweet spot.
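
Since the Process message bytes are already cached, the check itself would be trivial; the open question is only the threshold. A hypothetical shape (placeholder name and value):

```csharp
// Hypothetical sketch: decide per service whether to share the cached Process bytes
// across a batch or duplicate them on every span. The threshold is a placeholder.
internal static class ProcessBatchingHeuristic
{
    private const int ProcessSizeThresholdInBytes = 256; // placeholder, not tuned

    public static bool ShouldShareAcrossSpans(byte[] cachedProcessMessage)
        => cachedProcessMessage.Length > ProcessSizeThresholdInBytes;
}
```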

I just made a version that sends a Jaeger Batch (process + span) for every Activity:

Before:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 121.4 ms | 2.33 ms | 2.29 ms | 122.1 ms | - | - | - | 236.35 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,207.3 ms | 24.08 ms | 65.52 ms | 1,179.5 ms | - | - | - | 236.61 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,968.4 ms | 230.42 ms | 256.12 ms | 11,997.0 ms | - | - | - | 252.54 KB |

After:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 129.1 ms | 2.57 ms | 5.07 ms | 126.6 ms | - | - | - | 17.27 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,243.3 ms | 9.24 ms | 7.72 ms | 1,244.6 ms | - | - | - | 17.36 KB |
| JaegerExporter_Batching | 100 | 10000 | 12,502.6 ms | 120.67 ms | 94.21 ms | 12,508.5 ms | - | - | - | 17.36 KB |

Much less memory used up in the process, but more bytes written to the socket (not tracked in the benchmark). A bit slower too, which was unexpected. Not sure if that is statistically significant. Could be the result of copying many more small buffers to the socket?

Where do you think we should go from here?

@reyang (Member) commented on Sep 14, 2020

> Much less memory used up in the process, but more bytes written to the socket (not tracked in the benchmark). A bit slower too, which was unexpected. Not sure if that is statistically significant. Could be the result of copying many more small buffers to the socket?

This could be tested by replacing the actual socket syscall with a "drop-on-the-floor" dummy function?

> Where do you think we should go from here?

The current perf results don't give a strong enough reason to make the change. Where is the perf test code (that has the "process + span" change)? I'd like to take a look.

@CodeBlanch (Member, Author) commented:

@reyang

> This could be tested by replacing the actual socket syscall with a "drop-on-the-floor" dummy function?

Sorry, my statement was misleading. The benchmark isn't actually writing to a socket; it writes to a "black hole" transport that just drops the bytes. I should have said: "Could be the result of copying many more small buffers to the stream?" That is more accurate to what is actually going on, I think.
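
For anyone curious, a "black hole" sink can be as simple as a Stream whose writes are no-ops. A sketch (not the benchmark's actual transport class):

```csharp
using System;
using System.IO;

// Write-only stream that drops every byte, similar in spirit to the transport the
// benchmark uses; not the actual benchmark code.
internal sealed class BlackHoleStream : Stream
{
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => 0;
    public override long Position { get => 0; set { } }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    // Drop the bytes on the floor.
    public override void Write(byte[] buffer, int offset, int count) { }
}
```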

> Where is the perf test code

Check out this branch: https://github.com/CodeBlanch/opentelemetry-dotnet/tree/jaeger-process-batch

Luckily I stashed the changes when I was doing this.

```csharp
                {
                    var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);
#if DEBUG
                    if (task.Status != TaskStatus.RanToCompletion)
```
A Contributor commented:

This sounds more like a Debug.Assert?
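
i.e., something along these lines, mirroring the snippet above (a sketch; the message text is made up):

```csharp
var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);

// Debug.Assert is already compiled out of release builds, so it can replace the
// explicit #if DEBUG block.
System.Diagnostics.Debug.Assert(
    task.Status == TaskStatus.RanToCompletion,
    "WriteBatchAsync was expected to complete synchronously.");
```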

@CodeBlanch (Member, Author) replied:

Updated on #1262.
