Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic convention for timing durations #1142

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions specification/metrics/semantic_conventions/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

The following semantic conventions surrounding metrics are defined:

* [Timing](timing-metrics.md): General semantic conventions for timed operations.
* [HTTP Metrics](http-metrics.md): Semantic conventions and instruments for HTTP metrics.
* [System Metrics](system-metrics.md): Semantic conventions and instruments for standard system metrics.
* [Process Metrics](process-metrics.md): Semantic conventions and instruments for standard process metrics.
Expand Down
66 changes: 66 additions & 0 deletions specification/metrics/semantic_conventions/timing-metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# General Semantic Conventions for Timed Operations

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Overview](#overview)
- [Duration Instrument](#duration-instrument)
* [Value](#value)
* [Metric Labels](#metric-labels)

<!-- tocstop -->

## Overview

This document describes the metric instruments and labels to use when measuring
the timing of any operation.

This is the generic specification; there are context-specific specifications
that follow from it for the following:

* [HTTP operations](./http-metrics.md)
* TODO: add the rest of these when they're written
MrAlias marked this conversation as resolved.
Show resolved Hide resolved

## Duration Instrument

For each category of operation, a single `ValueRecorder` metric instrument
should be created.
Its name should include the name of the category, followed by the kind of
operation being timed, like this:

```
{category}.{subcategory}.duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the span name as prefix here?

Suggested change
{category}.{subcategory}.duration
{spanname}.duration

It is supposed to have low cardinality.

Copy link
Member Author

@justinfoote justinfoote Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have low cardinality, and it would support correlation between spans and metrics, which is certainly a goal of mine. But I see two reasons not to use the span name:

First, there may not be a span for the operation being timed here (though really, in this case, I think we should recommend both a span and a metric for an operation whose duration is important).

Second, I don't think the most meaningful aggregation is per-span-name. That is, I'd like to see http.server.duration to aggregate response time across all of my service's endpoints, and I'd like to be able to facet that graph by the HTTP method and path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about a convention to include span.name as a label when there is a matching span?

Copy link
Member

@Oberon00 Oberon00 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, there may not be a span for the operation being timed here

That is not really relevant. You can still use the name that a span tracing that operation would have.

Second, I don't think the most meaningful aggregation is per-span-name.

This is something that was discussed a bit in #557, but I guess is still an open point. However, if you come up with some categorization scheme here, it would certainly also be desirable as span attribute (span.category?) and warrant broader discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we agree about the first point, but in practice, an instrumentation writer may find a scenario in which there is no span, and the spec doesn't include any semantics for such a span, but they still want to create a timing metric for the operation. I think the correct approach would be to encourage that instrumentation author to provide semantics for the sort of operation they are timing.

I do indeed think it would be useful to have a span.category attribute. We have an implied category already, with separate semantic conventions for database, rpc, faas, etc... I think it would be worthwhile to add that explicitly to spans.
...but, I don't want to tackle that discussion in this PR. This is about generic timing metrics. It's intended to be guidance to spec writers about how to write category-specific specs, and guidance to instrumentation authors about how to construct a timing metric for something that isn't explicitly called out in one of our category-specific specs.

How about if I create an issue to have the span.category discussion separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #1161

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a situation where the subcategory would be omitted? Should it be optional?

```

If the duration is derived from spans that follow one of the common
semantic-conventional _areas_, the category should be the label prefix used in
that semantic convention, and the subcategory should be the span kind.

Example names:

* `http.server.duration`
* `db.client.duration`
* `messaging.producer.duration`
* `rpc.client.duration`

### Value

The duration of each operation **in seconds** should be recorded onto this value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why seconds?

Why not milliseconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the precedent set in our system metrics like system.cpu.time.

I see this as a placeholder until a PR lands for standardization of units, which I expect will define a units label with one of the UCUM standard unit and prefix abbreviations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @justinfoote

I was hoping that with standardization of units we would not have to specify which timing unit will be used. The semantics are the unchanged by the units, for data carried over OpenTelemetry or any protocol that can preserve units.

Exporters may want to convert time units, of course. Statsd exporters would use milliseconds, conventionally. If the stopwatch measures nanoseconds, it should be OK to record that precision too. @kenfinnigan does this address your concern? If so, could we add a TODO about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding standardization of units correctly, there would be a "units" label that has, for instance, "seconds", and the meter would just have a value of "5"?

For me, that aligns with how Micrometer does units, and I'd be good with that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a new label is needed because the unit is already a part of the instrument definition. This could effect the value type (float or int) of the instruments though, should we recommend using an int of the original precision?

If the stopwatch measures nanoseconds, it should be OK to record that precision too.

Like in this case, prefer int nanoseconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. It's part of the instrument, and it's a separate field in the proto.

I would support the idea of allowing int nanoseconds. But I think this is part of the larger can of worms that is #705.

recorder.

### Metric Labels

It is up to the implementor to annotate the duration instruments with labels
specific to the represented operation, but care should be taken when adding
labels to avoid excessive cardinality.

If there is a corresponding span for the operation with defined trace semantic
conventions, the labels should follow the attribute guidelines found in those
semantic conventions, with the following caveats:

1. Avoid including any high-cardinality attributes, or replace them with a
low-cardinality substitution.
2. Replace non-string values with reasonable string representations when
possible, and do not include attributes that cannot be easily represented as a
string.