Dual placement of Labels makes certain translations expensive #141

tigrannajaryan · 2020-04-25T16:30:29Z

Problem

Metric proto currently allows Labels in 2 places: in the MetricDescriptor and in each DataPoint.

The semantics is that Labels in MetricDescriptor apply to all DataPoints. This requires complicated and expensive merging of Labels maps (which possibly is made even more expensive since they are stored as lists, requiring either intermediary data structures or O(nm) merging) when performing translations from OTLP to other formats that have no equivalent notion of having label keys per data point and per group of data points (e.g. OpenCensus).

Approach 1

I suggest that we eliminate one or the other and have Labels in one place only.

Approach 2

Alternatively, we can require that Labels in MetricDescriptor and in DataPoints cannot overlap, i.e. there do not exists any entries that have the same key in both places. This way the Labels in MetricDescriptor allow to avoid repeating common labels for all data points. In this case merging is possible by plain concatenation and there is no need to expensive merging that ensures uniqueness of the keys.

This approach is probably less desirable since it makes impossible to translate series of data points from OTLP to OpenCensus TimeSeries because each DataPoint can have a different set of keys (not possible in OpenCensus).

However, this would still be an improvement over current state of things if we really want to keep dual Labels placement.

tigrannajaryan · 2020-04-25T16:30:54Z

@bogdandrutu what do you think?

bogdandrutu · 2020-04-26T18:19:41Z

This approach is probably less desirable since it makes impossible to translate series of data points from OTLP to OpenCensus TimeSeries because each DataPoint can have a different set of keys (not possible in OpenCensus).

This is already the case unfortunately (I tried my best to fight for having this as a requirement that a metric always produces the same set of label keys, @jmacd pushed to not have this requirement). So in open-telemetry we don't know the set of label-keys that we produce and the default aggregation for all the instruments will produce unbounded number of label-keys. I know this is not easily compatible with OpenCensus but I don't think we can do anything at this moment, unless we add the constraint to predefined the keys. There was a tentative way to solve this by using the "recommended label keys", but that was not enforced as requirement to produce aggregation only using these keys, so the problem was not resolved.

I suggest that we eliminate one or the other and have LabelKeys in one place only.

We can remove only the labels from the MetriDescriptor if we want to remove any of them.

So overall I think we will have to pay a large cost of translating from OTLP to OpenCensus anyway because of the fact that the set of available label keys is not known in OTLP. So you need to iterate over all the points and find the set of label keys anyway. So I am not sure how does the Approach 2 help with this problem.

I do agree that if we keep both we need to define a mechanism to remove duplicates, and maybe the easiest way for the moment may be to remove the MetricDescriptor labels (we will need them soon) but we can discuss that when we need them. But keep in mind that if you remove these labels you still don't achieve what you want - faster conversion between OTLP and OpenCensus.

bogdandrutu · 2020-04-26T18:22:44Z

@tigrannajaryan I think the main cause why the translation is expensive is not the dual labels is actually the fact that OpenTelemetry do not enforce the set of label-keys for an instrument, so you may want to clarify the issue title.

bogdandrutu · 2020-04-26T21:30:13Z

FYI: I can see a lot of the protocols not requiring the label keys to be predefined so we have to do a trade-off, if we relax the protocol we can have a more optimal conversion for other protocols, and with the idea that OpenCensus will be deprecated is probably better long term to not require that.

tigrannajaryan · 2020-04-27T14:49:56Z

@bogdandrutu I do not quite understand what does "predefined" label mean. Perhaps I was not clear in my description, let me try to clarify.

The problem that I see with current protocol definition is shown by the following example. Let's assume I have the following OTLP data (pardon the data syntax):

"metric": {
  "descriptor": {"labelkeys": {"a": "1", "b": "2"}},
  "int64_data_points": [{"value":100,"labelkeys": {"a": "5", "c": "10"}}]
}

If I need to translate this to OpenCensus I have to merge labelkeys in a way that "a" is only listed once and the data point overrides:

"metric": {
  "descriptor": {"labelkeys": {"a": "5", "b": "2", "c": "10"}},
  "timeseries": [{"value":100}]
}

To do this merging we need expensive per-key merging over lists.

This is not only a problem with OpenCensus format but with any other format that does not have OTLP-like dual placement of labels with implied overridinig of values (I am not aware of any metric formats that allow this, which means many other format translations will be slow too).

If it is necessary to have the labelkeys in 2 places, because it makes client libraries simpler (is this true?) then at least I think we should require that the 2 labelkeys maps do not have overlapping keys (so that "a" cannot exist in 2 places in the above example). In that case labelkeys can be merged by concatenation, which is faster.

@jmacd can you clarify what is your requirement and why we need this?

Resolves: open-telemetry#141 Metric proto previously allowed Labels in 2 places: in the MetricDescriptor and in each DataPoint. The semantics is that Labels in MetricDescriptor apply to all DataPoints. This requires complicated and expensive merging of Labels maps (which possibly is made even more expensive since they are stored as lists, requiring either intermediary data structures or O(nm) merging) when performing translations from OTLP to other formats that have no equivalent notion of having label keys per data point and per group of data points (e.g. OpenCensus). This commit removes Labels from MetricDescriptor. My understanding is that there is currently no client library that emits MetricDescriptor with labels so this is not used and can be safely removed. This will simplify and make faster translation code in the Collector. If this is needed we can think carefully about adding it back after there is a clear justification and tradeoffs are well understood.

Resolves: #141 Metric proto previously allowed Labels in 2 places: in the MetricDescriptor and in each DataPoint. The semantics is that Labels in MetricDescriptor apply to all DataPoints. This requires complicated and expensive merging of Labels maps (which possibly is made even more expensive since they are stored as lists, requiring either intermediary data structures or O(nm) merging) when performing translations from OTLP to other formats that have no equivalent notion of having label keys per data point and per group of data points (e.g. OpenCensus). This commit removes Labels from MetricDescriptor. My understanding is that there is currently no client library that emits MetricDescriptor with labels so this is not used and can be safely removed. This will simplify and make faster translation code in the Collector. If this is needed we can think carefully about adding it back after there is a clear justification and tradeoffs are well understood. Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>

tigrannajaryan changed the title ~~Dual placement of LabelKeys makes translations expensive~~ Dual placement of Labels makes certain translations expensive Apr 27, 2020

tigrannajaryan mentioned this issue Apr 27, 2020

Remove common labels from MetricDescriptor #144

Merged

bogdandrutu closed this as completed in #144 Apr 27, 2020

tigrannajaryan mentioned this issue May 7, 2020

Clarify and potentially remove InstrumentationLibraryMetrics #148

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dual placement of Labels makes certain translations expensive #141

Dual placement of Labels makes certain translations expensive #141

tigrannajaryan commented Apr 25, 2020 •

edited

Loading

tigrannajaryan commented Apr 25, 2020

bogdandrutu commented Apr 26, 2020

bogdandrutu commented Apr 26, 2020

bogdandrutu commented Apr 26, 2020

tigrannajaryan commented Apr 27, 2020

Dual placement of Labels makes certain translations expensive #141

Dual placement of Labels makes certain translations expensive #141

Comments

tigrannajaryan commented Apr 25, 2020 • edited Loading

Problem

Approach 1

Approach 2

tigrannajaryan commented Apr 25, 2020

bogdandrutu commented Apr 26, 2020

bogdandrutu commented Apr 26, 2020

bogdandrutu commented Apr 26, 2020

tigrannajaryan commented Apr 27, 2020

tigrannajaryan commented Apr 25, 2020 •

edited

Loading