Jaeger Exporter Performance #1254

Merged: 8 commits merged into open-telemetry:master on Sep 11, 2020

Conversation

@CodeBlanch (Member) commented on Sep 10, 2020

Changes

  • Removed the extra batching in the Jaeger exporter. It isn't really needed now that BatchExportActivityProcessor is doing that job. When the exporter is given a batch to export, it builds one or more UDP packets depending on whether the data fits within the max packet size option (see the sketch below).
  • Fixed up the allocations so memory usage is ~flat again. The idea is for the exporter to allocate all the memory it needs for a full batch up front and then reuse that memory over and over.
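
For illustration, here is a minimal sketch of the chunk-and-reuse idea. The names (`UdpBatchWriter`, `maxPayloadSizeInBytes`) are hypothetical and this is not the actual exporter code, which serializes Thrift structs rather than raw byte arrays:

```csharp
using System;
using System.Net.Sockets;

// Hypothetical sketch only: serialize spans into one reusable buffer and flush a UDP
// payload whenever the next span would overflow the configured max packet size.
// Spans larger than a single packet are not handled here.
internal sealed class UdpBatchWriter
{
    private readonly byte[] buffer; // allocated once, reused for every batch
    private readonly UdpClient client;
    private int position;

    public UdpBatchWriter(UdpClient client, int maxPayloadSizeInBytes)
    {
        this.client = client;
        this.buffer = new byte[maxPayloadSizeInBytes];
    }

    public void Write(byte[] serializedSpan)
    {
        // If this span doesn't fit in the remaining space, send what we have first.
        if (this.position + serializedSpan.Length > this.buffer.Length)
        {
            this.Flush();
        }

        Array.Copy(serializedSpan, 0, this.buffer, this.position, serializedSpan.Length);
        this.position += serializedSpan.Length;
    }

    public void Flush()
    {
        if (this.position > 0)
        {
            this.client.Send(this.buffer, this.position); // synchronous UDP send
            this.position = 0;
        }
    }
}
```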

Performance before:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 120.9 ms | 1.78 ms | 1.58 ms | - | - | - | 512.27 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,182.4 ms | 12.02 ms | 11.25 ms | - | - | - | 2901.73 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,772.2 ms | 99.48 ms | 93.05 ms | 4000.0000 | 4000.0000 | 4000.0000 | 26792.54 KB |

Performance after:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 119.2 ms | 2.38 ms | 2.64 ms | - | - | - | 236.35 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,160.6 ms | 14.53 ms | 12.13 ms | - | - | - | 238.03 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,700.6 ms | 142.61 ms | 133.40 ms | - | - | - | 240.22 KB |
  • CHANGELOG.md updated for non-trivial changes
  • Unit tests cleanup.

```diff
-                activity.TagObjects,
-                ref jaegerTags,
-                ProcessActivityTagRef);
+            activity.EnumerateTagValues(ref jaegerTags);
```
@CodeBlanch (Member, Author) commented:

@alanwest FYI I updated this for the new extension. It is a slightly easier API to use, and a bit faster because it doesn't hit the concurrent dictionary each time.
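
Roughly the shape of that pattern, as a hypothetical sketch (made-up interface and method names, not the SDK's actual extension): the caller passes a mutable struct by ref and all tags are visited in one pass, with no per-tag dictionary lookups.

```csharp
using System.Diagnostics;

// Hypothetical sketch of a ref-based tag enumeration extension; not the SDK's API.
internal interface ITagVisitor
{
    void OnTag(System.Collections.Generic.KeyValuePair<string, object> tag);
}

internal static class ActivityEnumerationSketch
{
    public static void VisitTags<T>(this Activity activity, ref T visitor)
        where T : struct, ITagVisitor
    {
        foreach (var tag in activity.TagObjects)
        {
            visitor.OnTag(tag); // constrained call on the struct, no boxing
        }
    }
}
```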

```csharp
            {
                foreach (var batch in this.CurrentBatches)
                {
                    var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);
```
@CodeBlanch (Member, Author) commented:

Just because I know someone will ask 😄

The thriftClient instance Jaeger is using is hooked up to a UdpClient. The API is all async but ultimately writes to a socket buffer and returns completely synchronously. To save some perf I skipped awaiting the task. The #if DEBUG checks in here are just in case that ever changes.
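
In other words, the pattern looks roughly like this (a sketch with a stubbed-out client so it stands alone; the guard body is illustrative, not the exact committed code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the UdpClient-backed Thrift client so the sketch is self-contained.
internal sealed class SynchronousThriftClientStub
{
    public Task WriteBatchAsync(object batch, CancellationToken token) => Task.CompletedTask;
}

internal sealed class FireAndForgetSketch
{
    private readonly SynchronousThriftClientStub thriftClient = new SynchronousThriftClientStub();

    public void SendBatch(object batch)
    {
        // The transport writes to the socket buffer and completes synchronously,
        // so the returned task is intentionally not awaited.
        Task task = this.thriftClient.WriteBatchAsync(batch, CancellationToken.None);

#if DEBUG
        // Fail loudly in debug builds if that synchronous-completion assumption ever breaks.
        if (task.Status != TaskStatus.RanToCompletion)
        {
            throw new InvalidOperationException("WriteBatchAsync did not complete synchronously.");
        }
#endif
    }
}
```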

```diff
@@ -155,13 +119,25 @@ private Activity CreateTestActivity()
                    ActivityTraceFlags.Recorded)),
        };

        return activitySource.StartActivity(
```
@CodeBlanch (Member, Author) commented:

This was returning null before because there was nothing sampling the source, which caused the benchmarks to blow up. ActivitySource API scoffs at the pit of success! 🕳️
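
For reference, the usual fix is to register an ActivityListener that samples the source, so StartActivity returns a real Activity instead of null. A minimal sketch (the class name is made up, and this is not necessarily the exact listener added to the benchmark):

```csharp
using System.Diagnostics;

internal static class BenchmarkListenerSetup
{
    public static void EnableSampling()
    {
        ActivitySource.AddActivityListener(new ActivityListener
        {
            ShouldListenTo = source => true, // or match the benchmark's ActivitySource by name
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        });
    }
}
```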

codecov bot commented on Sep 11, 2020

Codecov Report

Merging #1254 into master will increase coverage by 0.01%.
The diff coverage is 84.90%.

Impacted file tree graph

```
@@            Coverage Diff             @@
##           master    #1254      +/-   ##
==========================================
+ Coverage   79.01%   79.02%   +0.01%
==========================================
  Files         216      215       -1
  Lines        6270     6169     -101
==========================================
- Hits         4954     4875      -79
+ Misses       1316     1294      -22
```
| Impacted Files | Coverage Δ |
|---|---|
| ...Jaeger/Implementation/JaegerExporterEventSource.cs | 50.00% <0.00%> (+6.25%) ⬆️ |
| ...Telemetry.Exporter.Jaeger/JaegerExporterOptions.cs | 100.00% <ø> (ø) |
| src/OpenTelemetry/Internal/PooledList.cs | 72.00% <50.00%> (+3.37%) ⬆️ |
| ...rc/OpenTelemetry.Exporter.Jaeger/JaegerExporter.cs | 84.04% <81.53%> (+14.27%) ⬆️ |
| ...nTelemetry.Exporter.Jaeger/Implementation/Batch.cs | 84.44% <92.85%> (+1.51%) ⬆️ |
| ...ry.Exporter.Jaeger/Implementation/EmitBatchArgs.cs | 100.00% <100.00%> (ø) |
| ....Jaeger/Implementation/JaegerActivityExtensions.cs | 95.56% <100.00%> (-0.11%) ⬇️ |
| ...porter.Jaeger/Implementation/JaegerThriftClient.cs | 100.00% <100.00%> (ø) |
| ...ZPages/Implementation/ZPagesExporterEventSource.cs | 56.25% <0.00%> (-6.25%) ⬇️ |
| ...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs | 77.77% <0.00%> (-3.18%) ⬇️ |

... and 3 more

@CodeBlanch marked this pull request as ready for review on September 11, 2020 03:08
@CodeBlanch requested a review from a team as a code owner on September 11, 2020 03:08
```csharp
                {
                    spanProcess = new Process(spanServiceName, this.Process.Tags);
                    spanProcess.Message = this.BuildThriftMessage(spanProcess).ToArray();
                    this.processCache.Add(spanServiceName, spanProcess);
```
A Member commented:

Curious: is there a need to put an upper bound on this cache (something like a splay tree with a max size), or are we pretty confident that there won't be too many unique service names?

@CodeBlanch (Member, Author) replied:

Not sure. Kind of related to #1235. I feel like it would be a special/rare case where you would be connecting to an ever-growing list of services. Some kind of gateway thing perhaps?
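
If an upper bound ever turned out to be necessary, even something much simpler than a splay tree would probably do. A hypothetical sketch (made-up names and cap) that just stops caching new service names once a limit is hit:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: cap the per-service-name cache instead of letting it grow
// without bound. Past the cap, callers simply rebuild the value on demand.
internal sealed class BoundedProcessCache<TValue>
{
    private const int MaxEntries = 1000; // made-up cap

    private readonly Dictionary<string, TValue> cache = new Dictionary<string, TValue>();

    public bool TryGet(string serviceName, out TValue value)
        => this.cache.TryGetValue(serviceName, out value);

    public void TryAdd(string serviceName, TValue value)
    {
        if (this.cache.Count < MaxEntries)
        {
            this.cache[serviceName] = value;
        }
    }
}
```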

@reyang (Member) left a comment:

Looks solid.

One idea for consideration: the batching/grouping logic saves bandwidth by sending the service name only once for a set of spans that share it, but it comes with the cost of extra dictionary maintenance, GC overhead, and more complex code. I wonder whether it is a break-even deal.

@cijothomas merged commit d8c3ea3 into open-telemetry:master on Sep 11, 2020
@eddynaka deleted the jaeger-perf branch on September 11, 2020 14:24
@CodeBlanch (Member, Author) commented:

@reyang I've been thinking about your comment. It's definitely an interesting idea. One thing to note: it's not just the service name we send in that batch header; a Process can also have tags. Here's where the message is constructed:

```csharp
internal async Task WriteAsync(TProtocol oprot, CancellationToken cancellationToken)
{
    oprot.IncrementRecursionDepth();
    try
    {
        var struc = new TStruct("Process");
        await oprot.WriteStructBeginAsync(struc, cancellationToken).ConfigureAwait(false);
        var field = new TField
        {
            Name = "serviceName",
            Type = TType.String,
            ID = 1,
        };
        await oprot.WriteFieldBeginAsync(field, cancellationToken).ConfigureAwait(false);
        await oprot.WriteStringAsync(this.ServiceName, cancellationToken).ConfigureAwait(false);
        await oprot.WriteFieldEndAsync(cancellationToken).ConfigureAwait(false);
        if (this.Tags != null)
        {
            field.Name = "tags";
            field.Type = TType.List;
            field.ID = 2;
            await oprot.WriteFieldBeginAsync(field, cancellationToken).ConfigureAwait(false);
            {
                await oprot.WriteListBeginAsync(new TList(TType.Struct, this.Tags.Count), cancellationToken).ConfigureAwait(false);
                foreach (var jt in this.Tags)
                {
                    await jt.Value.WriteAsync(oprot, cancellationToken).ConfigureAwait(false);
                }

                await oprot.WriteListEndAsync(cancellationToken).ConfigureAwait(false);
            }

            await oprot.WriteFieldEndAsync(cancellationToken).ConfigureAwait(false);
        }

        await oprot.WriteFieldStopAsync(cancellationToken).ConfigureAwait(false);
        await oprot.WriteStructEndAsync(cancellationToken).ConfigureAwait(false);
    }
    finally
    {
        oprot.DecrementRecursionDepth();
    }
}
```

There's probably a sweet spot where the process data is small enough that we should just duplicate it on every span. We cache the bytes for those messages, so it would be an easy check to add; I'm just not sure how to figure out the size for the sweet spot.
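
Since the Process message bytes are already cached, the check itself would be trivial; the open question is only the threshold. A hypothetical shape (placeholder name and value):

```csharp
// Hypothetical sketch: decide per service whether to share the cached Process bytes
// across a batch or duplicate them on every span. The threshold is a placeholder.
internal static class ProcessBatchingHeuristic
{
    private const int ProcessSizeThresholdInBytes = 256; // placeholder, not tuned

    public static bool ShouldShareAcrossSpans(byte[] cachedProcessMessage)
        => cachedProcessMessage.Length > ProcessSizeThresholdInBytes;
}
```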

I just made a version that sends a Jaeger Batch (process + span) for every Activity:

Before:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 121.4 ms | 2.33 ms | 2.29 ms | 122.1 ms | - | - | - | 236.35 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,207.3 ms | 24.08 ms | 65.52 ms | 1,179.5 ms | - | - | - | 236.61 KB |
| JaegerExporter_Batching | 100 | 10000 | 11,968.4 ms | 230.42 ms | 256.12 ms | 11,997.0 ms | - | - | - | 252.54 KB |

After:

| Method | NumberOfBatches | NumberOfSpans | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JaegerExporter_Batching | 1 | 10000 | 129.1 ms | 2.57 ms | 5.07 ms | 126.6 ms | - | - | - | 17.27 KB |
| JaegerExporter_Batching | 10 | 10000 | 1,243.3 ms | 9.24 ms | 7.72 ms | 1,244.6 ms | - | - | - | 17.36 KB |
| JaegerExporter_Batching | 100 | 10000 | 12,502.6 ms | 120.67 ms | 94.21 ms | 12,508.5 ms | - | - | - | 17.36 KB |

Much less memory used up in the process, but more bytes written to the socket (not tracked in the benchmark). A bit slower too, which was unexpected. Not sure if that is statistically significant. Could be the result of copying many more small buffers to the socket?

Where do you think we should go from here?

@reyang (Member) commented on Sep 14, 2020

> Much less memory used up in the process, but more bytes written to the socket (not tracked in the benchmark). A bit slower too, which was unexpected. Not sure if that is statistically significant. Could be the result of copying many more small buffers to the socket?

This could be tested by replacing the actual socket syscall with a "drop-on-the-floor" dummy function?

> Where do you think we should go from here?

The current perf results don't give a strong enough reason to make the change. Where is the perf test code (that has the "process + span" change)? I'd like to take a look.

@CodeBlanch (Member, Author) commented:

@reyang

> This could be tested by replacing the actual socket syscall with a "drop-on-the-floor" dummy function?

Sorry, my statement was misleading. The benchmark isn't actually writing to a socket; it writes to a "black hole" transport that just drops the bytes. I should have said: "Could be the result of copying many more small buffers to the stream?" That is more accurate to what is actually going on, I think.
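
For anyone curious, a "black hole" sink can be as simple as a Stream whose writes are no-ops. A sketch (not the benchmark's actual transport class):

```csharp
using System;
using System.IO;

// Write-only stream that drops every byte, similar in spirit to the transport the
// benchmark uses; not the actual benchmark code.
internal sealed class BlackHoleStream : Stream
{
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => 0;
    public override long Position { get => 0; set { } }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    // Drop the bytes on the floor.
    public override void Write(byte[] buffer, int offset, int count) { }
}
```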

> Where is the perf test code

Check out this branch: https://github.com/CodeBlanch/opentelemetry-dotnet/tree/jaeger-process-batch

Luckily I stashed the changes when I was doing this.

```csharp
                {
                    var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);
#if DEBUG
                    if (task.Status != TaskStatus.RanToCompletion)
```
A Contributor commented:

This sounds more like a Debug.Assert?
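
i.e., something along these lines, mirroring the snippet above (a sketch; the message text is made up):

```csharp
var task = this.thriftClient.WriteBatchAsync(batch.Value, CancellationToken.None);

// Debug.Assert is already compiled out of release builds, so it can replace the
// explicit #if DEBUG block.
System.Diagnostics.Debug.Assert(
    task.Status == TaskStatus.RanToCompletion,
    "WriteBatchAsync was expected to complete synchronously.");
```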

@CodeBlanch (Member, Author) replied:

Updated on #1262.
