Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use total instead of count in the metrics semantic conventions #3457

Closed
emschwartz opened this issue Apr 28, 2023 · 19 comments · Fixed by open-telemetry/semantic-conventions#107
Assignees
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory

Comments

@emschwartz
Copy link

emschwartz commented Apr 28, 2023

What are you trying to achieve?

I am working on a project built on OTel and Prometheus (overview blog post here) and am trying to make the metrics I generate fit both OpenTelemetry and OpenMetrics as best as I can. However, there is an unfortunate and seemingly unnecessary divergence between these two standards with regard to the naming of counters.

The OTel semantic conventions say that count should be used as a suffix while OpenMetrics mandates that counters end with a total suffix. It seems silly that metrics produced with OTel and then exported to Prometheus would end up with a _count_total suffix.

I completely understand if this part of the spec is too far along to change it now. But, if not, I'd like to propose changing the recommendation to say that the word total should be used instead. Those two words are pretty close to synonymous so it doesn't seem like it would be that painful to go with the one chosen by OpenMetrics for that extra little bit of compatibility.

For what it's worth, I personally prefer the word count so I can see why OTel chose it, but I'd opt for the substantive benefit of better compatibility between the standards over that instinct.

Additional context.

The Metrics Semantic Conventions say:

Use count Instead of Pluralization

If the value being recorded represents the count of concepts signified by the namespace then the metric should be named count (within its namespace). The pluralization rule does not apply in this case.

For example if we have a namespace system.processes which contains all metrics related to the processes then to represent the count of the processes we can have a metric named system.processes.count. The suffix count here indicates that it is the count of system.processes.

OpenMetrics says:

Counter

The MetricPoint's Total Value Sample MetricName MUST have the suffix "_total".

@emschwartz emschwartz added the spec:metrics Related to the specification/metrics directory label Apr 28, 2023
@trask
Copy link
Member

trask commented Apr 28, 2023

adding this as a JVM runtime metrics stability blocker (to decide one way or another) because of process.runtime.jvm.threads.count

@arminru arminru added the area:semantic-conventions Related to semantic conventions label Apr 28, 2023
@jack-berg
Copy link
Member

@trask what is the stability status of the General Guidelines section? The document is marked "mixed", and other sections are indicated as "stable", but General Guidelines is missing a designation. Presumably its experimental?

@jack-berg
Copy link
Member

jack-berg commented Apr 28, 2023

I'm confused by the "use count instead of pluralization" rule:

If the value being recorded represents the count of concepts signified by the namespace then the metric should be named count (within its namespace). The pluralization rule does not apply in this case.

The definition of "concepts signified by the namespace" seems ambiguous. Examples are given:

  • system.paging.faults - doesn't need count because presumably paging is an acceptable namespace
  • system.processes.count - needs the count because somehow processes is the namespace here instead of system?

When is a namespace different that the concept being counted? Why is processes a namespace but faults is not?

Extended to this java example, why can we not think of jvm as the namespace in process.runtime.jvm.threads, and drop the count suffix?

@trask
Copy link
Member

trask commented Apr 28, 2023

@trask what is the stability status of the General Guidelines section? The document is marked "mixed", and other sections are indicated as "stable", but General Guidelines is missing a designation. Presumably its experimental?

ya, I believe it should be experimental, i'll send a PR to clarify

@trask
Copy link
Member

trask commented Apr 28, 2023

Extended to this java example, why can we not think of jvm as the namespace in process.runtime.jvm.threads, and drop the count suffix?

I think the difference is that if we use process.runtime.jvm.threads for the counter, then it cannot later be used as a namespace, and we cannot add something like process.runtime.jvm.threads.created (say if we wanted to track number of created threads over time).

@jack-berg
Copy link
Member

I think the difference is that if we use process.runtime.jvm.threads for the counter, then it cannot later be used as a namespace, and we cannot add something like process.runtime.jvm.threads.created

Hmm... that could be the intent. I think I disagree. That policy encourages being conservative and including .count to preserve the ability to add additional metrics in the namespace later, even if none are currently anticipated.

I don't think it's so bad to start with process.runtime.jvm.threads, and later add process.runtime.jvm.threads.created or process.runtime.jvm.threads_created. In systems like prometheus where . is replaced with _, the distinction of a namespace would go away anyway as both *.threads.created and *.threads_created result in *_threads_created.

I propose we get rid of the "use count instead of pluralization" rule.

@emschwartz
Copy link
Author

@reyang I think it would make sense to keep this issue open. Even if that section is marked as experimental, I'd still like to propose changing the recommendation or taking it out entirely as @jack-berg suggested.

@reyang reyang reopened this May 1, 2023
@trask
Copy link
Member

trask commented May 1, 2023

(oops my bad, I didn't realize Fixes #3457 (comment) would close this issue)

@jsuereth
Copy link
Contributor

jsuereth commented May 1, 2023

So in terms of impact, this would only change 3 metric semantic conventions, two JVM + one host/system metric.

I think in terms of impact vs. gain here, moving to prometheus conventions makes a lot of sense here.

@pichlermarc
Copy link
Member

I think one thing to consider is that total may not accurately describe delta temporality metrics, as they are not really a "total". So I agree with removing the recommendation and including neither count nor total as suggested above (if I understood correctly). 🙂

@dyladan
Copy link
Member

dyladan commented May 2, 2023

I think one thing to consider is that total may not accurately describe delta temporality metrics, as they are not really a "total". So I agree with removing the recommendation and including neither count nor total as suggested above (if I understood correctly). 🙂

_total comes from prometheus where cumulative counters are the only choice. I agree it is not a good fit for delta counters.

@trask
Copy link
Member

trask commented May 4, 2023

I think one thing to consider is that total may not accurately describe delta temporality metrics, as they are not really a "total". So I agree with removing the recommendation and including neither count nor total as suggested above (if I understood correctly). 🙂

_total comes from prometheus where cumulative counters are the only choice. I agree it is not a good fit for delta counters.

good point.

that would leave us with .count and adding a prometheus mapping rule that converts .count to _total

@jsuereth preferred not to add any more prometheus mapping rules, but maybe that's the lesser evil in this case?

@emschwartz
Copy link
Author

emschwartz commented May 4, 2023

To @jack-berg's suggestion above, is there a downside of having a namespace that is also a metric in and of itself? (Maybe a bad analogy but this makes me think of how REST APIs allow you to interact directly with a collection as well as resources nested underneath it.)

I understand the point about wanting to minimize Prometheus mapping rules.

@trask
Copy link
Member

trask commented May 4, 2023

is there a downside of having a namespace that is also a metric in and of itself?

oh, I think I was getting attribute namespacing rules and metric namespacing rules mixed up.

attribute namespaces have this restriction in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/attribute-naming.md

Names SHOULD NOT coincide with namespaces. For example if service.instance.id is an attribute name then it is no longer valid to have an attribute named service.instance because service.instance is already a namespace. Because of this rule be careful when choosing names: every existing name prohibits existence of an equally named namespace in the future, and vice versa: any existing namespace prohibits existence of an equally named attribute key in the future.

but I don't see any similar restriction on metric namespaces in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md

@pirgeo
Copy link
Member

pirgeo commented May 4, 2023

The .count suffix is basically redundant information at this point, is it not? If we are sure that we don't need to "keep the metric namespaces clean" by not using the metric name without the count, we could just drop the .count. The type of the instrument (and potentially the unit, although one should not rely on it) should give away that this instrument is a counter, and will be exported with the _total suffix for Prometheus.

@trask
Copy link
Member

trask commented May 5, 2023

I've sent #3476 and #3477 so we can get a definitive answer on whether metric namespaces can also be metric names, please vote with your approvals/comments there.

@joaopgrassi
Copy link
Member

joaopgrassi commented May 9, 2023

Just to have the impact recorded, here's the metrics we have today that would need adapting:

system-metrics

  • system.processes.count

runtime-environment-metrics

  • process.runtime.jvm.threads.count
  • process.runtime.jvm.buffer.count

@jmacd
Copy link
Contributor

jmacd commented May 9, 2023

Related: In Golang's runtime/metrics package, a similar scenario came up. The input consists of two metrics, both cumulative counters, one unitless (objects) and one in bytes.

/gc/heap/allocs:bytes
	Cumulative sum of memory allocated to the heap by the application.

/gc/heap/allocs:objects
	Cumulative count of heap allocations triggered by the application.
	Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects,
	only tiny blocks.

I've mapped these into:

  process.runtime.go.gc.heap.allocs (unit: bytes)
  process.runtime.go.gc.heap.allocs.objects (unitless)

When these are converted to Prometheus, I get

process_runtime_go_gc_heap_allocs_objects_total
process_runtime_go_gc_heap_allocs_bytes_total

To get this outcome, I used a namespace as a metric name.

@trask
Copy link
Member

trask commented May 9, 2023

this was discussed in the specification meeting today, and I've posted a PR with the proposal discussed: #3493

Please add any comments about the PR directly on this issue for now as that PR will be closed and moved to the new semconv repository shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory
Projects
None yet