Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic convention for timing durations #1142

Closed

Conversation

justinfoote
Copy link
Member

Fixes #1035

Changes

This PR adds semantic conventions for metrics that represent how long an operation takes to run. It includes guidance about how to incorporate tracing semantic conventions, but only in the case that the operation is already described there.

In addition to guiding the creation of metric instruments, I'm hoping that this is sort of the base class for all of the other duration metric semantic conventions, and it can serve as a guide to future spec writers about our general conventions around this sort of metric.

Question: Does this warrant a changelog update?

Related oteps #129


### Value

The duration of each operation **in seconds** should be recorded onto this value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why seconds?

Why not milliseconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the precedent set in our system metrics like system.cpu.time.

I see this as a placeholder until a PR lands for standardization of units, which I expect will define a units label with one of the UCUM standard unit and prefix abbreviations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @justinfoote

I was hoping that with standardization of units we would not have to specify which timing unit will be used. The semantics are the unchanged by the units, for data carried over OpenTelemetry or any protocol that can preserve units.

Exporters may want to convert time units, of course. Statsd exporters would use milliseconds, conventionally. If the stopwatch measures nanoseconds, it should be OK to record that precision too. @kenfinnigan does this address your concern? If so, could we add a TODO about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding standardization of units correctly, there would be a "units" label that has, for instance, "seconds", and the meter would just have a value of "5"?

For me, that aligns with how Micrometer does units, and I'd be good with that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a new label is needed because the unit is already a part of the instrument definition. This could effect the value type (float or int) of the instruments though, should we recommend using an int of the original precision?

If the stopwatch measures nanoseconds, it should be OK to record that precision too.

Like in this case, prefer int nanoseconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. It's part of the instrument, and it's a separate field in the proto.

I would support the idea of allowing int nanoseconds. But I think this is part of the larger can of worms that is #705.

@jmacd jmacd added the spec:metrics Related to the specification/metrics directory label Oct 27, 2020
operation being timed, like this:

```
{category}.{subcategory}.duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the span name as prefix here?

Suggested change
{category}.{subcategory}.duration
{spanname}.duration

It is supposed to have low cardinality.

Copy link
Member Author

@justinfoote justinfoote Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have low cardinality, and it would support correlation between spans and metrics, which is certainly a goal of mine. But I see two reasons not to use the span name:

First, there may not be a span for the operation being timed here (though really, in this case, I think we should recommend both a span and a metric for an operation whose duration is important).

Second, I don't think the most meaningful aggregation is per-span-name. That is, I'd like to see http.server.duration to aggregate response time across all of my service's endpoints, and I'd like to be able to facet that graph by the HTTP method and path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about a convention to include span.name as a label when there is a matching span?

Copy link
Member

@Oberon00 Oberon00 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, there may not be a span for the operation being timed here

That is not really relevant. You can still use the name that a span tracing that operation would have.

Second, I don't think the most meaningful aggregation is per-span-name.

This is something that was discussed a bit in #557, but I guess is still an open point. However, if you come up with some categorization scheme here, it would certainly also be desirable as span attribute (span.category?) and warrant broader discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we agree about the first point, but in practice, an instrumentation writer may find a scenario in which there is no span, and the spec doesn't include any semantics for such a span, but they still want to create a timing metric for the operation. I think the correct approach would be to encourage that instrumentation author to provide semantics for the sort of operation they are timing.

I do indeed think it would be useful to have a span.category attribute. We have an implied category already, with separate semantic conventions for database, rpc, faas, etc... I think it would be worthwhile to add that explicitly to spans.
...but, I don't want to tackle that discussion in this PR. This is about generic timing metrics. It's intended to be guidance to spec writers about how to write category-specific specs, and guidance to instrumentation authors about how to construct a timing metric for something that isn't explicitly called out in one of our category-specific specs.

How about if I create an issue to have the span.category discussion separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #1161


### Value

The duration of each operation **in seconds** should be recorded onto this value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a new label is needed because the unit is already a part of the instrument definition. This could effect the value type (float or int) of the instruments though, should we recommend using an int of the original precision?

If the stopwatch measures nanoseconds, it should be OK to record that precision too.

Like in this case, prefer int nanoseconds?

Co-authored-by: Aaron Abbott <aaronabbott@google.com>
operation being timed, like this:

```
{category}.{subcategory}.duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a situation where the subcategory would be omitted? Should it be optional?

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I'd like to refer any remaining questions over units to #1177.

@jmacd
Copy link
Contributor

jmacd commented Nov 12, 2020

There are two remaining questions associated with the use of "seconds" here that are partly addressed in #1177:

  • Should it be OK to mix units with different order-of-magnitude prefixes? (Yes, but they can't be aggregated together.)
  • Should it be OK to mix integer and floating-point values for the same metric name? (Open question? I'd say: Yes, but they can't be aggregated together.)

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 20, 2020
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 27, 2020
@jgals
Copy link

jgals commented Dec 4, 2020

@justinfoote and @jmacd, would one of you be able to reopen this PR? I don't think it should have been closed as inactive. 🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define semantic conventions for timing metrics
8 participants