fix for performance observed during enrichment process #100

kotharironak · 2021-01-07T19:53:29Z

Issue:
On one of the deployments, we observed that the hypertrace-trace-enricher component was lagging heavily in the ingestion pipeline and the lag was not coming down. We also observed continuous rebalancing for its consumed topic in Kafka.

Observations:
While investigating, we observed that during the enrichment process, a couple of enrichers use an intermediate data structure, StructuredTraceGraph, and that it is constructed (or processed) in proportion to the trace size. In the current set of enrichers, the following enrichers,

Our measurements suggested that this was the main cause of the issue. Our sampled trace had 447 spans, and ~12-13 secs in total were spent constructing the graph in the loops of the two enrichers above.
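
A rough back-of-the-envelope reading of these numbers, under two assumptions that the measurements above do not state directly: each of the two enrichers rebuilds the graph once per span (k = 2, n = 447), and a single build costs time linear in the number of spans. Under those assumptions the total cost grows quadratically with trace size:

```latex
\begin{aligned}
N_{\text{builds}} &\approx k \cdot n = 2 \times 447 = 894 \\
C_{\text{build}} &\approx \frac{12\ \text{to}\ 13\ \text{s}}{894} \approx 13.4\ \text{to}\ 14.5\ \text{ms per build} \\
T_{\text{total}} &= k \cdot n \cdot C_{\text{build}}(n), \qquad C_{\text{build}}(n) = O(n) \;\Rightarrow\; T_{\text{total}} = O(n^2)
\end{aligned}
```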

Solution:
As the above data structure, StructuredTraceGraph, changes while a trace is being processed, we are currently solving this by keeping it in a ThreadLocal and re-constructing it only when the trace state changes, instead of re-constructing it for every span. As a long-term solution, we are evaluating splitting this structure into mutable and immutable components, along with adding an enrichment context to the pipeline to pass such state along.
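
A minimal sketch of the caching idea, under stated assumptions: `SketchTrace`, `SketchGraph`, and `CachedGraphBuilder` are hypothetical stand-ins, not the real hypertrace `StructuredTrace`/`StructuredTraceGraph` API, and "state changed" is approximated, in the spirit of the PR's size comparison, by the trace identity plus its entity and event counts. Each thread keeps its last (trace, graph) pair, so repeated calls for the same unchanged trace reuse one graph, while a different trace or an added/removed entity or event triggers a rebuild.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for StructuredTrace: only the parts the cache inspects.
final class SketchTrace {
  final List<String> entities = new ArrayList<>();
  final List<String> events = new ArrayList<>();
}

// Hypothetical stand-in for StructuredTraceGraph; constructing it is the expensive step.
final class SketchGraph {
  final int eventCount;

  SketchGraph(SketchTrace trace) {
    this.eventCount = trace.events.size();
  }
}

final class CachedGraphBuilder {
  // What the cached graph was built from, captured at build time.
  private static final class Entry {
    final SketchTrace trace;
    final int entityCount;
    final int eventCount;
    final SketchGraph graph;

    Entry(SketchTrace trace, SketchGraph graph) {
      this.trace = trace;
      this.entityCount = trace.entities.size();
      this.eventCount = trace.events.size();
      this.graph = graph;
    }
  }

  // One cache slot per thread (e.g. per stream thread), so no locking is needed.
  private static final ThreadLocal<Entry> CACHE = new ThreadLocal<>();

  // Counts real graph constructions across all threads; for illustration only.
  static final AtomicInteger BUILDS = new AtomicInteger();

  static SketchGraph buildGraph(SketchTrace trace) {
    Entry cached = CACHE.get();
    boolean reusable =
        cached != null
            && cached.trace == trace // same trace object as last time
            && cached.entityCount == trace.entities.size() // no entity added or removed
            && cached.eventCount == trace.events.size(); // no event added or removed
    if (reusable) {
      return cached.graph;
    }
    SketchGraph graph = new SketchGraph(trace);
    BUILDS.incrementAndGet();
    CACHE.set(new Entry(trace, graph));
    return graph;
  }
}
```

Count-based detection is a heuristic: a mutation that leaves both counts unchanged (for example, re-parenting a span) would not trigger a rebuild in this sketch.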

Testing:
We verified the fix by comparing the enriched structured trace with and without this patch, using the same sample trace we had used for the investigation.

codecov · 2021-01-07T19:59:20Z

Codecov Report

Merging #100 (bd93b39) into main (c529c39) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #100      +/-   ##
============================================
- Coverage     68.62%   68.61%   -0.02%     
  Complexity      821      821              
============================================
  Files            84       84              
  Lines          3481     3486       +5     
  Branches        367      367              
============================================
+ Hits           2389     2392       +3     
- Misses          948      950       +2     
  Partials        144      144
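
The percentages above can be reproduced from the line counts: in both columns hits, misses, and partials add up to the Lines row. Assuming Codecov's usual definition, which counts partially covered lines as not covered, the results match the report when truncated to two decimals, and their difference of about 0.012 percentage points matches the 0.01% in the summary line:

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{main:}\quad \frac{2389}{2389 + 948 + 144} &= \frac{2389}{3481} \approx 68.6297\% \;\to\; 68.62\% \\
\text{PR:}\quad \frac{2392}{2392 + 950 + 144} &= \frac{2392}{3486} \approx 68.6173\% \;\to\; 68.61\%
\end{aligned}
```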

Flag	Coverage Δ	Complexity Δ
unit	`68.61% <100.00%> (-0.02%)`	`0.00 <3.00> (ø)`

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ	Complexity Δ
...raceenricher/enrichment/AbstractTraceEnricher.java	`100.00% <100.00%> (ø)`	`21.00 <1.00> (ø)`
...chment/enrichers/DefaultServiceEntityEnricher.java	`92.68% <100.00%> (ø)`	`14.00 <0.00> (ø)`
...trace/enricher/StructuredTraceEnrichProcessor.java	`89.28% <100.00%> (+2.32%)`	`7.00 <2.00> (ø)`
...er/enrichment/enrichers/BackendEntityEnricher.java	`65.90% <0.00%> (-3.41%)`	`17.00% <0.00%> (-1.00%)`
...s/resolver/backend/ClientSpanEndpointResolver.java	`100.00% <0.00%> (+8.33%)`	`6.00% <0.00%> (+1.00%)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c529c39...bd93b39.

pavan-traceable · 2021-01-08T06:58:08Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+  private static ThreadLocal<StructuredTraceGraph> cachedGraph = new ThreadLocal<>();
+  private static ThreadLocal<StructuredTrace> cachedTrace = new ThreadLocal<>();
+
+  public static StructuredTraceGraph buildGraph(StructuredTrace trace) {


@kotharironak Basic question: why is buildGraph called for every span? Can we call it only once per trace instead of for every span?

@pavan-traceable Where does the buildGraph get called for every span?

That's my understanding from this Slack chat: https://hypertrace.slack.com/archives/G01BLU6M52N/p1609913769019700?thread_ts=1609841154.015200&cid=G01BLU6M52N. I may be wrong.

@pavan-traceable buildGraph is called in 3 places:
ApiBoundaryTypeAttributeEnricher
DefaultServiceEntityEnricher
BackendEntityEnricher
Also, this service works with the complete trace, so buildGraph is invoked at the trace level.

pavan-traceable · 2021-01-08T06:58:33Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+    }
+
+    // trace internally changed
+    if(cachedTrace.get().getEntityList().size()  != trace.getEntityList().size() ||


When will this scenario occur?

Can we pull this out into a separate boolean function, say hasTraceChanged?

What about the set of attributes and enriched attributes? If the attributes and enriched attributes change, the graph structure will stay the same, but will we get the latest set of attributes from the graph nodes?

Basically, does it create a copy of the Event, or does it just keep a reference to it?

It keeps a reference. ApiTraceGraph uses getRootEvents, but that is not part of the enrichment process. @skjindal93 Can you also re-check?
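
To illustrate the reference point with hypothetical `SketchEvent`/`EventNode` types (not the real Avro `Event` or graph classes): because a node stores a reference to the event rather than a copy, attributes added to the event after the graph is built are visible through the node, so in this model only structural changes would require a rebuild.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event with mutable enriched attributes.
final class SketchEvent {
  final String id;
  final Map<String, String> enrichedAttributes = new HashMap<>();

  SketchEvent(String id) {
    this.id = id;
  }
}

// Hypothetical graph node: holds a reference to the event, not a copy of it.
final class EventNode {
  final SketchEvent event;

  EventNode(SketchEvent event) {
    this.event = event;
  }
}

final class NodeBuilder {
  static List<EventNode> buildNodes(List<SketchEvent> events) {
    List<EventNode> nodes = new ArrayList<>();
    for (SketchEvent e : events) {
      nodes.add(new EventNode(e)); // shares the same SketchEvent instance
    }
    return nodes;
  }
}
```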

Regarding hasTraceChanged: currently, the remaining conditions in the functions isTraceSame and doeTraceExists work together with this check. If we move it out, we have to move those conditions too, so I'm thinking of keeping this as-is for now.

skjindal93 · 2021-01-08T07:12:02Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+public class StructuredTraceGraphBuilder {
+  private static final Logger LOG = LoggerFactory.getLogger(StructuredTraceGraphBuilder.class);
+
+  private static ThreadLocal<StructuredTraceGraph> cachedGraph = new ThreadLocal<>();


Why do we want it to be thread-specific? Can we use a Supplier instead?

Currently, StructuredTraceGraph's state changes while a trace is being processed, and we want to re-construct it whenever a state change affects it. So, we chose a ThreadLocal to share the state within a given KStream thread.
@s What do you mean by Supplier here? Are you referring to supplying the StructuredTraceGraph at the start of processing in the topology?
cc: @laxmanchekka

@kotharironak We use Suppliers in a similar way to cache the results https://mkyong.com/java8/java-8-supplier-examples/

Though I am not sure whether you can change the entry in a Supplier based on a certain condition, given that the graph structure can potentially change.
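
For comparison, a sketch of Supplier-style caching. Note that `java.util.function.Supplier` does not cache anything by itself; the memoizing wrapper here is hand-written for illustration (Guava offers a similar `Suppliers.memoize`). It also shows the limitation raised in the last comment: once computed, the value cannot be refreshed when the trace changes, so a new wrapper would be needed per trace or per structural change.

```java
import java.util.function.Supplier;

final class Memo {
  private Memo() {}

  // Returns a Supplier that calls the delegate at most once and then keeps
  // returning that first result; there is no way to invalidate it.
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override
      public synchronized T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }
}
```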

findingrish · 2021-01-08T08:47:35Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+  private static ThreadLocal<StructuredTraceGraph> cachedGraph = new ThreadLocal<>();
+  private static ThreadLocal<StructuredTrace> cachedTrace = new ThreadLocal<>();
+
+  public static StructuredTraceGraph buildGraph(StructuredTrace trace) {


@pavan-traceable buildGraph is called in 3 places:
ApiBoundaryTypeAttributeEnricher
DefaultServiceEntityEnricher
BackendEntityEnricher
Also, this service works with the complete trace, so buildGraph is invoked at the trace level.

findingrish · 2021-01-08T08:52:41Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+public class StructuredTraceGraphBuilder {
+  private static final Logger LOG = LoggerFactory.getLogger(StructuredTraceGraphBuilder.class);
+
+  private static ThreadLocal<StructuredTraceGraph> cachedGraph = new ThreadLocal<>();


I understand that this is a quick optimisation. My question is: eventually, instead of caching the StructuredTraceGraph at the thread level, should we pass the instance of this graph around between enrichers? The graph changes for each trace anyway. We can create this graph once per trace at the entry point, and the other enrichers can use it. Any enrichment task can then update the StructuredTraceGraph if needed.
That way we also get rid of the need to compare the graphs and build the graph multiple times.

Currently, it is the other way around: the trace changes during the enrichment process as each span is processed, so we have to re-construct the StructuredTraceGraph every time the trace state has been modified. Yes, we have to rethink the current processing: whether this is the right data structure, or whether we can do it differently.
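
The reviewer's alternative, and the long-term "enrichment context" idea from the description, might look roughly like this sketch. `EnrichmentContext`, `Enricher`, `Pipeline`, `CtxTrace`, and `CtxGraph` are hypothetical names, not existing hypertrace classes. The graph is built lazily once per trace, shared by all enrichers through the context, and rebuilt only after an enricher declares that it changed the trace structure, so no trace comparison is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal trace and graph types.
final class CtxTrace {
  final List<String> events = new ArrayList<>();
}

final class CtxGraph {
  final int eventCount;

  CtxGraph(CtxTrace trace) {
    this.eventCount = trace.events.size();
  }
}

// Per-trace state handed from enricher to enricher.
final class EnrichmentContext {
  private final CtxTrace trace;
  private CtxGraph graph; // null until first requested, or after invalidation
  int builds; // number of graph constructions, for illustration

  EnrichmentContext(CtxTrace trace) {
    this.trace = trace;
  }

  CtxTrace trace() {
    return trace;
  }

  // Builds the graph on first use and reuses it until the structure changes.
  CtxGraph graph() {
    if (graph == null) {
      graph = new CtxGraph(trace);
      builds++;
    }
    return graph;
  }

  // Called by an enricher after it adds or removes events or entities.
  void structureChanged() {
    graph = null;
  }
}

interface Enricher {
  void enrich(EnrichmentContext ctx);
}

final class Pipeline {
  // Creates one context per trace and runs every enricher against it.
  static EnrichmentContext run(CtxTrace trace, List<Enricher> enrichers) {
    EnrichmentContext ctx = new EnrichmentContext(trace);
    for (Enricher enricher : enrichers) {
      enricher.enrich(ctx);
    }
    return ctx;
  }
}
```

Compared with the ThreadLocal cache, correctness here depends on every structure-changing enricher calling `structureChanged()`; that explicit contract is the trade-off for avoiding repeated comparisons.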

findingrish · 2021-01-11T06:32:05Z

...r-api/src/main/java/org/hypertrace/traceenricher/trace/util/StructuredTraceGraphBuilder.java

+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class StructuredTraceGraphBuilder {


Can we add unit test for this class?

This is a utility wrapper method with a set of ordered conditions. I think you are referring here to tests around the StructuredTraceGraph itself; we can handle that as part of #105. Can you add more detail to that ticket too?

buchi-busireddy · 2021-01-11T07:35:01Z

@kotharironak I think this code needs to be microbenchmarked at some point because it's so critical.

github-actions · 2021-01-11T08:01:48Z

Unit Test Results

  47 files ±0   47 suites ±0 48s ⏱️ -5s
241 tests ±0 241 ✔️ ±0 0 💤 ±0 0 ❌ ±0

Results for commit 792f8b2. ± Comparison against base commit c529c39.

* fix for performance
* functional validation
* changes logs
* cleaning PR by removing measuing logging codes
* Removes un-used logger
* Adds comments for redability

fix for performance

f0d87b2

surajpuvvada requested review from buchi-busireddy, skjindal93 and tim-mwangi January 7, 2021 21:17

pavan-traceable reviewed Jan 8, 2021

skjindal93 reviewed Jan 8, 2021

functional validation

c6a8118

findingrish reviewed Jan 8, 2021

kotharironak and others added 2 commits January 8, 2021 16:13

changes logs

4a07319

Merge branch 'main' into fix-for-perf-issue

62d9fb0

kotharironak added 2 commits January 11, 2021 09:44

cleaning PR by removing measuing logging codes

499c1aa

Merge remote-tracking branch 'origin/fix-for-perf-issue' into fix-for-perf-issue

d2fd150

Removes un-used logger

6ae6b39

kotharironak marked this pull request as ready for review January 11, 2021 04:19

kotharironak requested a review from a team January 11, 2021 04:19

kotharironak changed the title from "fix for performance" to "fix for performance observed during enrichment process" Jan 11, 2021

kotharironak requested review from laxmanchekka and ravisingal January 11, 2021 04:24

Adds comments for redability

f2831e1

kotharironak requested review from findingrish, pavan-traceable and skjindal93 January 11, 2021 06:27

findingrish reviewed Jan 11, 2021

laxmanchekka approved these changes Jan 11, 2021

Merge branch 'main' into fix-for-perf-issue

bd93b39

kotharironak mentioned this pull request Jan 11, 2021

optimising usage of StructureTraceGraph #105

Closed

kotharironak merged commit 792f8b2 into main Jan 11, 2021

kotharironak deleted the fix-for-perf-issue branch January 11, 2021 07:59

7 participants
