Definition of Metrics

The following twenty metrics are grouped into four areas. The first area concerns the basic setup of distributed tracing. When a service fulfills these Boolean metrics, RPCs made to and by the service are successfully represented in the resulting traces. The second area covers best practices that have been proposed by practitioners (see Mastering Distributed Tracing and Distributed Tracing in Practice). The third area contains different kinds of coverage measurements. Note that some of them, such as SpansPerService or AttributeFunctionRatio, may only be informative in specific use cases.

These first three areas are kept general, meaning that they can be applied to several instrumentation APIs. The last area, on the other hand, covers recommended features offered by OpenTelemetry, although other instrumentation APIs may provide them as well. Whether to use them depends on the application at hand and needs to be evaluated by the developer or DevOps engineer.

Setup

| Metric (output primitive) | Rationale |
| --- | --- |
| HasUpToDateVersions (boolean) | Software evolves over time and undergoes version updates. To run properly, library versions should always be kept up to date. |
| HasGeneralConfiguration (boolean) | The exporter, span processor, and tracer provider are common components of every instrumentation API and the cornerstones for starting a trace and generating spans, which the tracing backend then processes. Hence, they need to be configured. |
| HasResources (boolean) | Resources are a special type of attribute that applies to all spans generated by a service. They include essential information such as the service name. |
| EnablesContextPropagation (boolean) | Context propagation is the fundamental feature that separates distributed tracing from traditional logging and metrics, since it allows values (including a trace ID) to be shared across multiple services. Therefore, the tracer provider needs to be globally available, and the propagator needs to be configured. |
| WrapsEndpoints (boolean) | Wrapper and helper functions, which are already available for common frameworks and libraries, allow endpoints to be instrumented automatically. Depending on the endpoint design, those plugins use the middleware pattern or interceptors. |
| ShutdownsTracerProvider (boolean) | Since spans might be batched, an intentional or unintentional process shutdown can cause the loss of spans that have not yet been exported. A proper shutdown of the tracer provider flushes the remaining spans automatically. |
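
The rationale for ShutdownsTracerProvider can be illustrated with a small simulation. The sketch below is not the OpenTelemetry API: `ToyBatchProcessor` and `run_service` are hypothetical names, and the processor only mimics the general idea of buffering finished spans and exporting them in batches.

```python
class ToyBatchProcessor:
    """Hypothetical stand-in for a batching span processor (not a real SDK class)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []    # finished spans waiting to be exported
        self.exported = []  # spans that reached the (pretend) backend

    def on_end(self, span_name):
        # Buffer the finished span and export once a full batch is collected.
        self.buffer.append(span_name)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Move everything that is buffered to the backend.
        self.exported.extend(self.buffer)
        self.buffer.clear()

    def shutdown(self):
        # A proper shutdown exports whatever is still buffered.
        self.flush()


def run_service(span_names, batch_size, shutdown_properly):
    """Simulate a service run and return the spans that were exported."""
    processor = ToyBatchProcessor(batch_size)
    for name in span_names:
        processor.on_end(name)
    if shutdown_properly:
        processor.shutdown()
    return processor.exported
```

With five spans and a batch size of two, a run without shutdown exports only the first four spans, and the fifth stays in the buffer and is lost. Calling `shutdown()` before the process exits exports all five.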

Best Practice

| Metric (output primitive) | Rationale |
| --- | --- |
| SetsErrorState (boolean) | Every error condition that occurs under a given span sets the span status to an error state. |
| EndsSpans (boolean) | Every span that is started is also ended. |
| UsesSemanticConventions (boolean) | Semantic conventions provide a consistent presentation of data. |
| UsesNamespaces (boolean) | Attributes are meant for filtering and should therefore be namespaced (e.g. with the service name) to prevent collisions between key names. |
| IncludesDependencyVersion (boolean) | Spans that represent work done by a dependency have an attribute for that dependency and its version. |
| IncludesUnits (boolean) | Attributes with numerical values should include the unit of measurement in the key name (e.g. `payload_size_kb`). |
| ExcludesPersonalInformation (boolean) | Attributes should not contain any personally identifiable information. |
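
Three of the attribute-related practices above (UsesNamespaces, IncludesUnits, ExcludesPersonalInformation) lend themselves to a simple automated check. The following is a minimal sketch under stated assumptions: namespaces are taken to be dot-separated key prefixes, the list of unit suffixes is illustrative, and an e-mail pattern stands in for personal information in general. All function names are hypothetical.

```python
import re

# Assumption: units appear as a suffix of the key name; this list is illustrative.
UNIT_SUFFIXES = ("_ms", "_s", "_bytes", "_kb", "_mb")
# Assumption: an e-mail address is used as a simple proxy for personal information.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def uses_namespace(key):
    """True if the key has a non-empty dot-separated prefix, e.g. 'checkout.items'."""
    prefix, separator, rest = key.partition(".")
    return bool(prefix and separator and rest)


def includes_unit(key, value):
    """Numeric values need a unit suffix in the key; other values pass trivially."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return True
    return key.endswith(UNIT_SUFFIXES)


def excludes_personal_information(value):
    """True unless a string value contains something that looks like an e-mail."""
    return not (isinstance(value, str) and EMAIL_PATTERN.search(value))


def check_attributes(attributes):
    """Evaluate the three boolean metrics for one span's attribute dictionary."""
    return {
        "UsesNamespaces": all(uses_namespace(k) for k in attributes),
        "IncludesUnits": all(includes_unit(k, v) for k, v in attributes.items()),
        "ExcludesPersonalInformation": all(
            excludes_personal_information(v) for v in attributes.values()
        ),
    }
```

A real evaluation would need a broader notion of personal information and of valid units. The sketch only shows that each metric reduces to a predicate over attribute keys and values.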

Coverage

| Metric (output primitive) | Rationale |
| --- | --- |
| InstrumentedServices (float) | To exploit the full potential of distributed tracing, as many services as possible need to participate, meaning they need to be instrumented. This metric reports the share of services that are currently instrumented. |
| SpansPerService (int) | This metric can be used to compare different services and to question whether the number of spans created per service is too high or too low. |
| AttributeFunctionRatio (float) | In general, attributes are critical for the ability to filter, search through, visualize, or otherwise analyze traces in the aggregate. To provide more clarity about the system, it might help to see what share of functions record their results in an attribute. |
| ErrorCoverage (float) | To understand the root cause of a system failure, error events in a trace can be insightful. This metric reports the coverage of instrumented error events. |
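
As a rough illustration of how two of the coverage metrics could be computed, the sketch below derives SpansPerService and InstrumentedServices from a list of collected spans. It assumes that each span is a dictionary with a `service` key and that InstrumentedServices is the fraction of known services that emitted at least one span. Neither assumption is fixed by the definitions above, and the function names are hypothetical.

```python
from collections import Counter


def spans_per_service(spans):
    """Count the spans emitted by each service."""
    return Counter(span["service"] for span in spans)


def instrumented_services(all_services, spans):
    """Fraction of known services that produced at least one span."""
    known = set(all_services)
    if not known:
        return 0.0
    seen = {span["service"] for span in spans}
    return len(seen & known) / len(known)
```

For example, if four services are known and the collected spans come from only two of them, `instrumented_services` returns `0.5`, and `spans_per_service` reports a count of zero for the two silent services.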

Recommendation

| Metric (output primitive) | Rationale |
| --- | --- |
| UsesBatchProcessing (boolean) | OpenTelemetry allows completed spans to be sent to the exporter immediately, which might have a negative impact on service performance. The recommended approach is to export spans in batches. |
| UsesSampling (boolean) | OpenTelemetry exports all spans by default. The resulting large amount of data might have a negative impact on network performance and storage. The recommended approach is to consider a sampling strategy that limits the number of generated traces. |
| UsesResourceDetector (boolean) | OpenTelemetry can automatically scan the system for relevant data and attach it to the spans created. It is recommended to at least consider using this feature. |
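
To make the idea behind UsesSampling concrete, the sketch below shows a simplified ratio-based sampling decision that depends only on the trace ID. It is an illustration in the spirit of ratio-based samplers, not OpenTelemetry's implementation. The function name `should_sample` and the choice of comparing the lower 64 bits of the trace ID against a threshold are assumptions made for this example.

```python
def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide whether to keep a trace, given the desired sampling ratio.

    The decision is deterministic per trace ID, so every service that sees
    the same trace ID and ratio reaches the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Threshold in the 64-bit space: a ratio of 0.5 keeps the lower half.
    bound = round(ratio * (1 << 64))
    # Only the lower 64 bits of the trace ID take part in the comparison.
    lower_bits = trace_id & ((1 << 64) - 1)
    return lower_bits < bound
```

With a ratio of `1.0` every trace is kept, with `0.0` none is, and values in between keep roughly that share of traces, provided trace IDs are uniformly distributed.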