Is direction dimension a mistake in semantic conventions? #2589

tigrannajaryan · 2022-05-30T13:19:05Z

We have metrics that have a "direction" attribute, for example disk metrics.

This seems to be in contradiction to the general guidelines about attributes

"As a rule of thumb, aggregations over all the attributes of a given metric SHOULD be meaningful," as Prometheus recommends.

I am not sure what aggregation is meaningful for "direction" attribute.

Should the attribute be removed and the metrics split into 2, one for read and one for write?

It does look pretty natural to express this dimension as an attribute, it is low cardinality, it groups 2 very related measurements. Is this an indication that the guidelines need to be adjusted or this is an exception from the rule? Or perhaps there is a meaningful aggregation of "direction" that I just don't see?

reyang · 2022-05-31T16:51:04Z

+1, also in most cases the removal of this dimension would result in meaningless value (e.g. what does a "total read/write bytes" mean for a disk?)

In addition, many disk/NICs have extra operations beyond read and write (I just took a screenshot from my laptop):

bogdandrutu · 2022-05-31T19:16:08Z

@tigrannajaryan does the aggregation not represent "total" bytes send/received over the network? Is that not meaningful?

tigrannajaryan · 2022-05-31T19:37:45Z

@tigrannajaryan does the aggregation not represent "total" bytes send/received over the network? Is that not meaningful?

Yes, probably meaningful for network (e.g. total bytes/throughput can be interesting to know as a percentage of bidirectional bandwidth available for half-duplex physical connections).

Is it also meaningful for disks? I am not sure what's the meaning of total combined bytes written and read for disk. Sure, both are bytes flowing from/to the same device, maybe that's useful somehow?

I am not against keeping the direction if we think it is meaningful (but see also @reyang comment which shows we do not fully understand the allowed values for the dimension).

codeboten · 2022-06-01T15:09:29Z

Speaking specifically to @reyang's example, it would appear that aggregating read/writes/other would be meaningful as per the following description of "I/O other bytes" :

"I/O Other Bytes" is "The number of bytes transferred in input/output operations generated by a process that are neither reads nor writes, including file, network, and device I/Os

jmacd · 2022-06-01T16:14:54Z

I believe direction was a mistake in our semantic conventions. I support undo-ing it.
Prometheus documentation addresses this specifically:

Read/write and send/receive are best as separate metrics, rather than as a label. This is usually because you care about only one of them at a time, and it is easier to use them that way.

I consider this a good exception to the rule otherwise given by Prometheus quoted above:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics.

The issue about directional transfers tests this rule: we are able to find ways that the sum() of network traffic is meaningful and even useful. The usefulness/meaningfulness needs to be weighed against potential harm due to "uselessness". If we believe "usually you care about only one of them at a time", and we believe the common mode of viewing metrics is to aggregate them, there's real harm in having a direction label as it will obscure two useful things in a commonly-useless way.

Is there an underlying logic we can use for similar situations? Here's one answer: in what sense are network bytes read and written the same? In what sense are they different? It makes sense to sum() or rate(sum()) these counters when they are the same, and it does not make to sum() or rate(sum()) these counters when they are not the same. Of course, a byte is a byte, so the sum is meaningful (units are "By"). However, if a metric is being used as a proxy variable for load -- and the load induced by a byte read is different than the load induced by a byte written, then it is harmful/useless to combine them (units are "{bytes_read}" and "{bytes_written}").

carlosalberto · 2022-06-01T19:33:36Z

I believe direction was a mistake in our semantic conventions. I support undo-ing it.

At least we should start by addressing this with upstream metrics addition (and yes, later on update the existing ones ;) )

This PR looks to address the concerns voiced in open-telemetry#2589. It adds new metrics that include the direction in their names, and marks existing metrics as deprecated. Fix open-telemetry#2589

bertysentry · 2022-06-23T10:57:55Z

I have mixed feelings about this. If you take the network traffic example, you can calculate the bandwidth utilization with something like: max(rate(system.network.io)) / hw.network.speed, where the max() function applies to each direction of the system.network.io metric, for a full-duplex link.

I don't see how we could calculate the same bandwidth utilization metric from system.network.io.receive and system.network.io.transmit.

Also, as noted by @reyang, having separate metrics is reasonable when you have a strictly defined set of dimensions. It does fit well with network (receive vs transmit), but doesn't even for a simple metric like system.disk.io.

Last point: What are the drawbacks of using a direction attribute versus a separate metric? As explained above and by others, there are a few minor benefits of using the direction attribute, but I can't see anything in the "cons" column. Performance? Readability? Consistency?

bertysentry · 2022-06-23T11:08:32Z

This change will require changes in 7 receivers for consistency:

activedirectorydsreceiver
elasticsearchreceiver
hostmetricsreceiver
iisreceiver
kubeletstatsreceiver
mongodbatlasreceiver
nsxtreceiver

reyang · 2022-08-16T15:46:57Z

During the Aug. 16th, 2022 Specification SIG meeting, there were discussions about network metrics - based on the comment here folks believe that Disk I/O should NOT have a dimension/attribute called direction, the question is whether Network I/O should do the same or not, and issue #2726 suggested Network I/O should keep direction.

Here are what I've found so far about Windows operating systems (NT kernel based):

"Network" is a vague term, NT models network performance counters as several things:
- Physical Network Interface Card (NIC) Activity - at such low level, the NIC doesn't necessarily connect to the actual network (e.g. one can have an NIC that is NOT corrected connected, but still valid from the OS kernel / device driver perspective). So it only has device level counters such as "Device Power State" and "Low Power Transitions (Lifetime)".
- Network Adapter https://docs.microsoft.com/en-us/windows/win32/iphlp/managing-network-adapters. On the adapter level, the Rx and Tx value added up to Total https://social.technet.microsoft.com/Forums/en-US/5f8f9a16-3488-4c73-868e-121885702fd8/understanding-the-performance-counters-network-adapter-bytes-totalsec-microsoft-network?forum=operationsmanagergeneral.
- Network Interface https://docs.microsoft.com/en-us/windows/win32/network-interfaces
Process only has I/O Read/Write/Other Bytes/Operations, it doesn't have direct connection to the underlying Network Interface nor Network Adapter. And it does have "Other Bytes" (I can imagine a process trying to listen on a TCP port, which is considered as control bytes but not Rx/Tx).

sebastien-rosset · 2023-01-19T23:53:22Z

Shouldn't direction be renamed to something like hw.io_direction?

bertysentry · 2023-01-20T11:07:21Z

@sebastien-rosset I don't see a reason to use a specific attribute for hardware metrics. direction is specified at the general level for IO metrics so it shall be used in specific semantic conventions that cover IOs as well.

sebastien-rosset · 2023-01-20T14:54:43Z

The spec states the instrument should have attributes for direction; oddly, plural is used, and it's not stating the attribute should be named direction.

io - an instrument that measures bidirectional data flow should be called entity.io and have attributes for direction. For example, system.network.io.

That would also break the semantic rules:

Metric names and attributes exist within a single universe and a single hierarchy. Metric names and attributes MUST be considered within the universe of all existing metric names. When defining new metric names and attributes, consider the prior art of existing standard metrics and metrics from frameworks/libraries.

Perhaps it should be hw.io_direction. Another namespace could have a different concept of "direction" such as up, down, or East, West, left, right, etc. For example a wind sensor could have a direction attribute.

In addition, the spec states:

Names SHOULD NOT coincide with namespaces. For example if service.instance.id is an attribute name then it is no longer valid to have an attribute named service.instance because service.instance is already a namespace. Because of this rule be careful when choosing names: every existing name prohibits existence of an equally named namespace in the future, and vice versa: any existing namespace prohibits existence of an equally named attribute key in the future.

Defining lots of attributes with no name hierarchy has the side effect of preventing any future use of that word as a namespace. Currently, I can see these attributes are declared with a flat naming structure: direction, vendor, model, serial_number, vendor, firmware_version, bios_version, driver_version, firmware_version, id, name, parent, chemistry, capacity, limit_type, state, sensor_location, type, task, raid_level, logical_addresses, physical_addresses, smart_attribute.

capacity is defined as the capacity in Watts-hours or Amper-hours, which is highly specific to batteries. However, the word capacity is contextual, it could mean something entirely different in another namespace. It seems that by allowing names to be flat, more flat-names will be created in short order, which inevitably will lead to collisions or other naming problems. At some point, shouldn't there be a registry of namespaces, similar to what was done at the IETF or IEEE?

Under the semantic guidelines, HTTP, RPC, Database, System, Process, Runtime Environment, FaaS are consistently using hierarchical names (with state noted as an exception). Only "Hardware" uses flat names liberally.

Out of curiosity, I looked at all the possible values for state:

state: idle | user | system | ...
state: used | cached | free | ...
state: idle, used (for database connection pool)
state: idle, user, system, interrupt, etc
state: used, free, cached, etc.
state: used, free, reserved
state: close, close_wait, closing, delete, established, fin_wait_1, fin_wait_2, last_ack, listen, syn_recv, syn_sent, time_wait.
state: system, user, wait
state: ok, degraded, failed
state: charging, discharging
state: ok, degraded, failed, charging, discharging
state: ok, degraded, failed, predicted_failure
state: ok, degraded, failed, open
state: remaining
state: ok, degraded, failed, needs_cleaning

sebastien-rosset · 2023-01-23T18:31:08Z

I created #3129 since it's a different issue

tigrannajaryan added spec:metrics Related to the specification/metrics directory area:semantic-conventions Related to semantic conventions labels May 30, 2022

github-actions bot assigned arminru May 30, 2022

tigrannajaryan mentioned this issue May 30, 2022

Add http metrics for content length #2567

Closed

tigrannajaryan mentioned this issue May 31, 2022

Add network metrics to process semantic conventions #2556

Merged

codeboten mentioned this issue Jun 15, 2022

[receiver/hostmetrics] add udp connections open-telemetry/opentelemetry-collector-contrib#11046

Closed

codeboten mentioned this issue Jun 15, 2022

add metrics to replace metrics with direction #2617

Merged

reyang closed this as completed in #2617 Jun 28, 2022

djaglowski mentioned this issue Jul 11, 2022

[receiver/elasticsearch] Add additional metrics open-telemetry/opentelemetry-collector-contrib#12176

Merged

1 task

codeboten mentioned this issue Jul 22, 2022

add schema definition for direction change #2688

Merged

This was referenced Aug 12, 2022

Add additional process metrics to the metrics semantic conventions #2706

Merged

Mistake while applying the logic in #2617 from #2589: process.network.io and system.network.* should not have been split #2726

Closed

andrzej-stencel mentioned this issue Sep 19, 2022

Add process page fault metrics #2809

Closed

andrzej-stencel mentioned this issue Nov 7, 2022

Add process.disk.operations metric to semantic conventions #2845

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is direction dimension a mistake in semantic conventions? #2589

Is direction dimension a mistake in semantic conventions? #2589

tigrannajaryan commented May 30, 2022

reyang commented May 31, 2022

bogdandrutu commented May 31, 2022

tigrannajaryan commented May 31, 2022

codeboten commented Jun 1, 2022 •

edited

jmacd commented Jun 1, 2022

carlosalberto commented Jun 1, 2022

bertysentry commented Jun 23, 2022

bertysentry commented Jun 23, 2022

reyang commented Aug 16, 2022

sebastien-rosset commented Jan 19, 2023

bertysentry commented Jan 20, 2023

sebastien-rosset commented Jan 20, 2023 •

edited

sebastien-rosset commented Jan 23, 2023

Is direction dimension a mistake in semantic conventions? #2589

Is direction dimension a mistake in semantic conventions? #2589

Comments

tigrannajaryan commented May 30, 2022

reyang commented May 31, 2022

bogdandrutu commented May 31, 2022

tigrannajaryan commented May 31, 2022

codeboten commented Jun 1, 2022 • edited

jmacd commented Jun 1, 2022

carlosalberto commented Jun 1, 2022

bertysentry commented Jun 23, 2022

bertysentry commented Jun 23, 2022

reyang commented Aug 16, 2022

sebastien-rosset commented Jan 19, 2023

bertysentry commented Jan 20, 2023

sebastien-rosset commented Jan 20, 2023 • edited

sebastien-rosset commented Jan 23, 2023

codeboten commented Jun 1, 2022 •

edited

sebastien-rosset commented Jan 20, 2023 •

edited