
Breakdown metrics #564

Merged — axw merged 2 commits into elastic:master from breakdown-metrics on Jul 17, 2019

Conversation

@axw (Member) commented Jun 17, 2019

Implementation of agent breakdown metrics (elastic/apm#78)

Checklist/TL;DR version of the design:

Closes #528

@codecov-io commented Jun 17, 2019

Codecov Report

Merging #564 into master will increase coverage by 0.48%.
The diff coverage is 90.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #564      +/-   ##
==========================================
+ Coverage   84.16%   84.64%   +0.48%     
==========================================
  Files         116      120       +4     
  Lines        6894     7190     +296     
==========================================
+ Hits         5802     6086     +284     
- Misses        783      789       +6     
- Partials      309      315       +6
| Impacted Files | Coverage Δ |
|---|---|
| env.go | 77.77% <ø> (ø) ⬆️ |
| internal/apmlog/logger.go | 82.66% <0%> (-2.27%) ⬇️ |
| transaction.go | 95% <100%> (+0.21%) ⬆️ |
| builtin_metrics.go | 83.33% <100%> (+0.2%) ⬆️ |
| fnv.go | 100% <100%> (ø) |
| metrics.go | 90.47% <100%> (+0.15%) ⬆️ |
| modelwriter.go | 96.15% <100%> (+0.18%) ⬆️ |
| span.go | 89.82% <100%> (+2.6%) ⬆️ |
| utils.go | 80.85% <100%> (+0.85%) ⬆️ |
| model/marshal.go | 78.37% <100%> (+0.21%) ⬆️ |
| ... and 10 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14219ba...98eaf6d. Read the comment docs.

@axw (Member, Author) commented Jun 18, 2019

Here's a description/explanation of some of the more interesting parts.

  1. The Tracer has a "breakdownMetrics" field which has an active/inactive pair of metrics to avoid contention with the metrics gatherer.
  2. Each Transaction and Span has a "childrenTimer" which tracks the duration of its direct children, used for computing self-time (a sketch of such a timer follows this list). Transactions also have a "spanTimings" map, much like in the Java agent.
  3. When a span ends, it notifies its parent's childrenTimer and adds itself to its transaction's spanTimings if the transaction hasn't already ended.
  4. When a transaction ends, it is enqueued to the tracer, and the tracer updates the "active" breakdown metrics. If the queue is full, the goroutine that ended the transaction updates the breakdown metrics directly. In either case a lock is held, like the writer critical section in the Java agent implementation. The space for span timings is part of the recyclable object, and so can be reused to minimise allocations.
  5. Just before the metrics gatherers run, the tracer swaps the active/inactive breakdown metrics. The metrics gatherer then translates the breakdown metrics into the metrics structures for serialisation.
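
To make (2) and (3) concrete, here is a minimal, self-contained sketch of how a children timer can avoid double-counting overlapping direct children when computing self-time. The method names (childStarted, childEnded, selfTime) are illustrative rather than the agent's exact API:

```go
package main

import (
	"fmt"
	"time"
)

// childrenTimer tracks how long a transaction or span has at least one
// direct child in flight, so overlapping children are not counted twice
// when computing self-time.
type childrenTimer struct {
	active        int           // number of direct children currently running
	start         time.Time     // when active last went from 0 to 1
	totalDuration time.Duration // accumulated time with >= 1 child running
}

func (t *childrenTimer) childStarted(now time.Time) {
	t.active++
	if t.active == 1 {
		t.start = now
	}
}

func (t *childrenTimer) childEnded(now time.Time) {
	t.active--
	if t.active == 0 {
		t.totalDuration += now.Sub(t.start)
	}
}

// selfTime is the parent's duration minus the time spent with at least one
// direct child running.
func selfTime(parentDuration time.Duration, t *childrenTimer) time.Duration {
	return parentDuration - t.totalDuration
}

func main() {
	var timer childrenTimer
	t0 := time.Now()
	// Two overlapping children, [0ms, 30ms] and [10ms, 40ms], cover 40ms in total.
	timer.childStarted(t0)
	timer.childStarted(t0.Add(10 * time.Millisecond))
	timer.childEnded(t0.Add(30 * time.Millisecond))
	timer.childEnded(t0.Add(40 * time.Millisecond))
	// A 100ms parent therefore has 60ms of self-time.
	fmt.Println(selfTime(100*time.Millisecond, &timer)) // 60ms
}
```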

Due to the last two points, we do not have long-lived timer objects: we still limit the breakdown metrics to 1000 {transactionType, transactionName, spanType, spanSubtype} tuples, but they may differ between metrics reporting periods.
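
As a rough illustration of that limit (all names here, e.g. groupKey and maxBreakdownGroups, are hypothetical, not the agent's identifiers), the cardinality guard amounts to refusing to create new groups once the cap is hit and discarding the map each reporting period:

```go
package breakdownsketch

import "time"

// Hypothetical sketch of the kind of key and cardinality guard a
// "1000 tuples per reporting period" limit implies; not the agent's code.

const maxBreakdownGroups = 1000

type groupKey struct {
	transactionType string
	transactionName string
	spanType        string
	spanSubtype     string
}

type groupTiming struct {
	count    uint64
	duration time.Duration
}

// add folds one timing into the map, refusing to create new groups once the
// cap is reached so the map (and agent memory) stays bounded. The map is
// discarded after each reporting period, so the set of groups may differ
// between periods.
func add(groups map[groupKey]groupTiming, k groupKey, d time.Duration) bool {
	t, ok := groups[k]
	if !ok && len(groups) >= maxBreakdownGroups {
		return false // dropped: over the cardinality limit for this period
	}
	t.count++
	t.duration += d
	groups[k] = t
	return true
}
```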

The locking involved in (4) concerns me. Under the "happy path" where transactions get enqueued to the tracer there will be no contention between transactions, but when the load is high enough that the tracer cannot process transactions quickly enough, we'll introduce contention between transaction goroutines.
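
A loose sketch of that enqueue-or-update-directly path, with hypothetical type and method names (tracer, transactionEnded, recordTransaction), to show where the contention arises:

```go
package tracersketch

import "sync"

// Illustrative types only; fields and methods are assumptions, not the
// agent's actual structures.

type transactionData struct {
	// ... span timings, transaction type/name, etc. ...
}

type breakdownMetrics struct {
	mu sync.Mutex
	// ... active metrics keyed by {transactionType, transactionName, spanType, spanSubtype} ...
}

func (b *breakdownMetrics) recordTransaction(tx *transactionData) {
	b.mu.Lock()
	defer b.mu.Unlock()
	// ... fold tx's span timings into the active metrics ...
}

type tracer struct {
	events    chan *transactionData // drained by the tracer's event loop
	breakdown breakdownMetrics
}

// transactionEnded shows the enqueue-or-update-directly behaviour: on the
// happy path the ended transaction is handed to the tracer loop and ending
// goroutines do not contend with one another; only when the queue is full
// does the ending goroutine take the breakdown lock itself, which is where
// contention between transaction goroutines appears under high load.
func (t *tracer) transactionEnded(tx *transactionData) {
	select {
	case t.events <- tx: // happy path: the tracer loop records it for us
	default:
		t.breakdown.recordTransaction(tx) // queue full: record directly
	}
}
```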

@felixbarny (Member) commented:
> we still limit the breakdown metrics to 1000

The idea was to limit the number of unique metricsets to 1000.

> The locking involved in (4) concerns me.

Maybe you should consider the writer-reader phaser the Java agent uses as well. It does not block writers at any time.
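
For context, one way a phase-swapping scheme along those lines might look in Go. This is a loose illustration built on assumptions (names like half, record, and gather are made up), not the Java agent's phaser, and it stands in for per-entry atomics with a writer-writer mutex; the point is only that the gatherer never blocks writers:

```go
package phasersketch

// Writers pin the current half with a counter and retry if a swap races
// with them; the gatherer swaps halves and then waits for in-flight writers
// on the old half to drain before reading it exclusively.

import (
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

type key struct{ spanType, spanSubtype string }

type half struct {
	writers atomic.Int64 // writers currently recording into this half
	mu      sync.Mutex   // writer-writer only; stands in for per-entry atomics
	timings map[key]time.Duration
}

type breakdown struct {
	active atomic.Pointer[half]
	spare  *half // owned by the single gatherer goroutine between gathers
}

func newBreakdown() *breakdown {
	b := &breakdown{spare: &half{timings: map[key]time.Duration{}}}
	b.active.Store(&half{timings: map[key]time.Duration{}})
	return b
}

// record never waits for the gatherer; at worst it retries after a swap.
func (b *breakdown) record(k key, d time.Duration) {
	for {
		h := b.active.Load()
		h.writers.Add(1)
		if b.active.Load() == h {
			h.mu.Lock()
			h.timings[k] += d
			h.mu.Unlock()
			h.writers.Add(-1)
			return
		}
		h.writers.Add(-1) // a swap raced with us; record into the new half
	}
}

// gather (called from a single goroutine) swaps the halves, waits for the
// old half's in-flight writers to finish, then reads it exclusively.
func (b *breakdown) gather() map[key]time.Duration {
	old := b.active.Swap(b.spare)
	for old.writers.Load() != 0 {
		runtime.Gosched() // short grace period while writers drain
	}
	out := old.timings
	old.timings = map[key]time.Duration{}
	b.spare = old
	return out
}
```

The property being illustrated is that record only ever waits on other writers (or retries once after a swap), never on the gatherer.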

@axw (Member, Author) commented Jun 18, 2019

> The idea was to limit the number of unique metricsets to 1000.

The important thing is to have a bound on agent memory? Does it matter if it's 1000 unique metrics for the process's lifetime vs. at any one time?

> Maybe you should consider the writer-reader phaser the Java agent uses as well. It does not block writers at any time.

Indeed. I misunderstood the guarantees of the phaser. I'll take a closer look at that.

@axw axw force-pushed the breakdown-metrics branch 2 times, most recently from 5e1d0cc to 5c067d7 Compare June 19, 2019 06:42
@axw axw marked this pull request as ready for review July 1, 2019 08:29
@axw axw merged commit be583fe into elastic:master Jul 17, 2019
@axw axw deleted the breakdown-metrics branch July 17, 2019 01:44