Messaging-System Metric semantic conventions #1077

cwildman · 2020-10-08T22:48:12Z

Changes

Add semantic conventions for messaging system metrics.

Related issues:
Fixes #1014

PR Details

The general approach I took here was to take the messaging operations referenced in the tracing semantic conventions (sending, receiving, processing) and break them into 3 distinct metric groups. I created metrics for monitoring throughput and latency within these operations.

I've left individual commits to highlight some open questions/points of interest that I will elaborate on here.

Metric naming with Span Kind

@justinfoote mentioned that the preference, when namespacing labels and metrics, is for the second field to be the "span kind". The second commit attempts to follow this convention. However, the span kind as defined in the tracing semantic conventions for messaging suggests that producer/consumer should be used for async message processing and server/client for synchronous message processing. Should we change the metric names based on whether the application is sync or async? I assumed no. There's also a bit of ambiguity here because the tracing semantic conventions yaml refers to things like messaging.consumer.synchronous instead of messaging.client.
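
To make the naming question concrete, here is a minimal Python sketch. It assumes names take the shape messaging.<span kind>.<suffix> and that producer/consumer is used whether the application is sync or async (the "I assumed no" above). The helper metric_name is hypothetical and not part of any OpenTelemetry API.

```python
# Hypothetical helper, for illustration only: builds metric names of the
# form messaging.<span kind>.<suffix>, with the span kind as second field.
SPAN_KINDS = {"producer", "consumer"}


def metric_name(span_kind: str, *suffix: str) -> str:
    """Join the messaging namespace, the span kind, and the remaining fields."""
    if span_kind not in SPAN_KINDS:
        raise ValueError(f"unsupported span kind: {span_kind!r}")
    if not suffix:
        raise ValueError("at least one suffix field is required")
    return ".".join(("messaging", span_kind, *suffix))


print(metric_name("producer", "duration"))
print(metric_name("consumer", "received", "bytes"))
```

Under this assumption a synchronous client would still report messaging.producer.* metrics; using server/client instead would mean adding those kinds to SPAN_KINDS.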

Superfluous Counters vs. ease of use

I created a specific Counter to track the messages sent, received and processed. This information is already captured in the count of the duration metric ValueRecorder. Personally I feel like the extra metric is worth the ease of use, but I don't know where the OTEL community stands on this.
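
A rough Python sketch of why the Counter is redundant: any recorder that aggregates durations already tracks how many values it has seen. DurationRecorder below is a toy stand-in written for this illustration, not the OpenTelemetry ValueRecorder API, and the sample durations are made up.

```python
from dataclasses import dataclass


@dataclass
class DurationRecorder:
    """Toy stand-in for a ValueRecorder aggregate (assumption, not the real API)."""

    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, value: float) -> None:
        # Each recorded duration also increments the count, so the number
        # of messages is available without a separate Counter.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


send_duration = DurationRecorder()
for millis in (12.0, 7.5, 30.0):  # made-up send latencies in milliseconds
    send_duration.record(millis)

print(send_duration.count)  # 3 messages sent
print(send_duration.total)  # 49.5 ms spent sending in total
```

The trade-off is that a user has to know to read the count off the duration metric, which is the ease-of-use concern raised above.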

Semconv format and generator

I went ahead and attempted to use the semconv yaml format and generator. Many of these labels are identical to tracing messaging attributes but I went ahead and copied them into a separate yaml for messaging metrics. I could not get the generator to actually work though. Here's an example:

docker run --rm otel/semconvgen --yaml-root ../../opentelemetry-specification/semantic_conventions markdown --markdown-root ../../opentelemetry-specification/specification

Batch metrics

The last commit adds metrics around batching because I see that as a common feature that needs to be monitored. Introducing metrics around batching does create confusion around whether some of the non-batch metrics still exist or what they mean. I'm interested to hear if others feel the behavior is sufficiently explained or if batch metrics should be dropped entirely.

Other metrics

Here's a list of other potential metric categories that I omitted. I'm assuming these are too specific to only a subset of messaging systems to be included here:

Lag
I/O time
Connection pooling
Errors
Retries

@jmacd @bogdandrutu mentioning you as potential reviewers.

linux-foundation-easycla · 2020-10-08T22:48:16Z

The committers are authorized under a signed CLA.

✅ cwildman (53fd5a5, 9adc80e, deb28ab, 9f4a7b5, 6dbd507, ba169fe, a60bc2d, 612115e, 03cb521, 8c0d5b2, 183816a)

arminru · 2020-10-09T11:49:09Z

Please add an entry in CHANGELOG.md

kenfinnigan · 2020-10-09T17:33:36Z

semantic_conventions/metrics/messaging.yaml

+          brief: 'Connection string.'
+          examples: ['tibjmsnaming://localhost:7222', 'https://queue.amazonaws.com/80398EXAMPLE/MyQueue']
+      constraints:
+        - any_of:


Based on the work on #1027, the net.* names should be removed as constraints and added as references like for tracing

kenfinnigan · 2020-10-09T17:37:32Z

specification/metrics/semantic_conventions/messaging.md

+
+| Name                 | Instrument    | Units        | Description |
+|----------------------|---------------|--------------|-------------|
+| `messaging.consumer.processed.messages` | Counter | messages | Sum of messages processed. |


The metric name might need altering, or messaging.consumer.messages needs altering to include "receive" in the name.

Otherwise, this metric could be implied to be a child of the other metric by virtue of the naming.

That makes sense, I'll think about how to reword these. My instinct was to have the metric name be messaging.processed.messages or messaging.processor.messages; however, the span kind naming convention says that the processor should use "consumer" as its span kind.

cwildman · 2020-10-13T05:03:15Z

I was able to resolve some of my issues with the semconv generator. There were a few things going on:

My /Users bind mount in docker for mac doesn't work even though it is automatically included.
The semconv generator doesn't allow for multiple instances of the same id, e.g. messaging. I just used a different id for the metrics messaging labels named metrics-messaging.
I used a reference to point to all the labels that would have the same name as an existing tracing attribute.

While the generator works correctly for me, it seems I'm failing the md-check with "File /spec/metrics/semantic_conventions/messaging.md contains a table that would be reformatted." even though I'm using the same generator to create the table. Does anyone have suggestions for that?

jmacd · 2020-10-15T18:52:34Z

Discussed in the Metrics SIG today: we think it would be best to eliminate the superfluous counters. This may be problematic in the short term, but in the long term we can address these with a Views API, we think.

cwildman · 2020-10-15T21:20:01Z

@jmacd thanks for the heads up. Removed those.

cwildman · 2020-10-19T23:32:15Z

@open-telemetry/specs-metrics-approvers this is ready for review!

justinfoote · 2020-10-19T23:40:27Z

@open-telemetry/specs-metrics-approvers this is ready for review!

MrAlias

The structure, labels, and instrument definitions look good. Just issues with some of the terminology and formatting. Good job 👍

andrewhsu · 2020-11-06T00:21:32Z

At the Metrics SIG meeting today, we talked about this issue's staleness.

cwildman · 2020-11-06T18:59:26Z

@MrAlias I think I addressed almost all of your feedback. I had questions/comments on a couple of them. Thanks!

jmacd

I re-read this PR and wonder if the recently-added guidelines on metrics named .io should be applied here. Instead of having two metrics <something>.bytes for each direction of travel, we can use <something>.io with a label to indicate whether it was produced or consumed.

cwildman · 2020-11-12T18:09:50Z

@jmacd interesting. To be a little more concrete you're imagining renaming these metrics:

messaging.producer.bytes -> messaging.io
messaging.consumer.received.bytes -> messaging.io

And then we'd have some labels like direction/operation which can be one of ['produce', 'consume']?

Some of my initial reactions to this:

What happens to the other metrics like messaging.producer.duration? It seems confusing as a user that there's not a consistent naming scheme for my producer metrics.
I'm surprised the semantic conventions recommend using .io for any bidirectional data flow. For me I/O is specific to data flowing in or out of process from external devices. This is in contrast to data flow in process that is abstracted from the raw I/O that may have been involved or may not have come directly from I/O at all.
It seems like the .io convention should have some guidance on what the direction label should be called.

aabmass · 2020-11-13T20:20:35Z

@cwildman maybe this could be helpful to decide:

"As a rule of thumb, aggregations over all the dimensions of a given metric SHOULD be meaningful," as Prometheus recommends.

Would a user ever want to aggregate together the produced/consumed bytes into one timeseries (or would it be meaningful)? If not, then I suppose they should be separate.

"It seems like the .io convention should have some guidance on what the direction label should be called."
That should probably be specified. For system metrics, it's the label direction with values read | write.
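
The aggregation question can be sketched in Python. This is a minimal illustration, assuming a single messaging.io metric with a direction label whose values are produce and consume, as floated earlier in the thread. LabeledCounter is a hypothetical in-memory class, not an OpenTelemetry instrument, and the byte counts are made up.

```python
from collections import defaultdict


class LabeledCounter:
    """Hypothetical in-memory counter keyed by label set (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.values = defaultdict(int)

    def add(self, amount: int, **labels: str) -> None:
        # A sorted tuple of label pairs gives a hashable, order-independent key.
        self.values[tuple(sorted(labels.items()))] += amount

    def total(self) -> int:
        """Aggregate over all label dimensions."""
        return sum(self.values.values())

    def by_label(self, key: str) -> dict:
        """Aggregate while keeping one label dimension."""
        grouped = defaultdict(int)
        for labels, value in self.values.items():
            grouped[dict(labels).get(key)] += value
        return dict(grouped)


io = LabeledCounter("messaging.io")
io.add(1024, direction="produce")
io.add(256, direction="consume")
io.add(512, direction="produce")

print(io.by_label("direction"))  # {'produce': 1536, 'consume': 256}
print(io.total())  # 1792
```

Whether the 1792 figure means anything to a user is exactly the rule-of-thumb test quoted above: if summing produced and consumed bytes is never useful, separate metrics may be the better fit.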

github-actions · 2020-11-21T03:20:51Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2020-11-28T03:23:18Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

jgals · 2020-12-04T16:15:59Z

Hey @cwildman and @jmacd, I'm commenting here to avoid having this closed by the bot. 🤖

cwildman · 2021-05-03T21:00:27Z

I'd like to reopen this PR and hopefully discuss at the spec meeting this week.

sergeykonkin · 2022-04-15T20:38:25Z

@cwildman Hi!
Any chance of reviving this PR and finishing the work on it? We are looking forward to this specification.

cwildman requested review from a team as code owners October 8, 2020 22:48

github-actions bot assigned yurishkuro Oct 8, 2020

arminru requested a review from thisthat October 9, 2020 11:48

kenfinnigan reviewed Oct 9, 2020

justinfoote mentioned this pull request Oct 9, 2020

Update Metrics Semantic Conventions README #1084

Open

cwildman force-pushed the metric-semantic-conventions-messaging branch 6 times, most recently from 9e89bdf to 2cd7b4c October 13, 2020 04:59

justinfoote reviewed Oct 14, 2020

cwildman force-pushed the metric-semantic-conventions-messaging branch from b8c94dc to 8c0d5b2 October 15, 2020 17:44

jmacd approved these changes Oct 15, 2020

andrewhsu mentioned this pull request Oct 23, 2020

Define metric semantic conventions for messaging systems #1014

Closed

cwildman force-pushed the metric-semantic-conventions-messaging branch from ca01fa0 to cf9ba47 October 23, 2020 00:28

cwildman changed the title ~~Metric semantic conventions messaging~~ Metric semantic conventions messaging fixes #1014 Oct 23, 2020

cwildman force-pushed the metric-semantic-conventions-messaging branch from cf9ba47 to 4988c5e October 23, 2020 00:33

jmacd changed the title ~~Metric semantic conventions messaging fixes #1014~~ Messaging-System Metric semantic conventions Oct 23, 2020

jmacd reviewed Oct 23, 2020

justinfoote mentioned this pull request Nov 3, 2020

SMI Metrics and OpenTelemetry servicemeshinterface/smi-spec#199

Closed

MrAlias reviewed Nov 5, 2020

github-actions bot removed the Stale label Nov 6, 2020

MrAlias approved these changes Nov 10, 2020

cwildman added 14 commits November 10, 2020 11:24

Initial semantic conventions for messaging metrics. open-telemetry#1014

828c7d5

Use span kind in metric naming.

edf4012

Attempt to use the semconv yaml format and generator.

aa439d0

Add metrics for batching.

56cdbcb

Update changelog.

78e7612

Make duration metrics more clear.

19f9623

Switch to use references instead of constraints for net labels.

b21f5a7

Change wording of description on Counters.

3353e68

Add received to consumer metrics to differentiate from processing.

182f261

Use references for metrics-messaging labels.

683200f

Remove redundant counters.

6041dc6

Fix lint errors.

c573be5

Fix table formatting.

dc5c766

PR feedback.

8683a2a

cwildman force-pushed the metric-semantic-conventions-messaging branch from aa5f02c to 8683a2a November 10, 2020 19:25

jmacd reviewed Nov 10, 2020

github-actions bot added the Stale label Nov 21, 2020

github-actions bot closed this Nov 28, 2020
