Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dual placement of Labels makes certain translations expensive #141

Closed
tigrannajaryan opened this issue Apr 25, 2020 · 5 comments · Fixed by #144
Closed

Dual placement of Labels makes certain translations expensive #141

tigrannajaryan opened this issue Apr 25, 2020 · 5 comments · Fixed by #144

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Apr 25, 2020

Problem

Metric proto currently allows Labels in 2 places: in the MetricDescriptor and in each DataPoint.

The semantics is that Labels in MetricDescriptor apply to all DataPoints. This requires complicated and expensive merging of Labels maps (which possibly is made even more expensive since they are stored as lists, requiring either intermediary data structures or O(nm) merging) when performing translations from OTLP to other formats that have no equivalent notion of having label keys per data point and per group of data points (e.g. OpenCensus).

Approach 1

I suggest that we eliminate one or the other and have Labels in one place only.

Approach 2

Alternatively, we can require that Labels in MetricDescriptor and in DataPoints cannot overlap, i.e. there do not exists any entries that have the same key in both places. This way the Labels in MetricDescriptor allow to avoid repeating common labels for all data points. In this case merging is possible by plain concatenation and there is no need to expensive merging that ensures uniqueness of the keys.

This approach is probably less desirable since it makes impossible to translate series of data points from OTLP to OpenCensus TimeSeries because each DataPoint can have a different set of keys (not possible in OpenCensus).

However, this would still be an improvement over current state of things if we really want to keep dual Labels placement.

@tigrannajaryan
Copy link
Member Author

@bogdandrutu what do you think?

@bogdandrutu
Copy link
Member

This approach is probably less desirable since it makes impossible to translate series of data points from OTLP to OpenCensus TimeSeries because each DataPoint can have a different set of keys (not possible in OpenCensus).

This is already the case unfortunately (I tried my best to fight for having this as a requirement that a metric always produces the same set of label keys, @jmacd pushed to not have this requirement). So in open-telemetry we don't know the set of label-keys that we produce and the default aggregation for all the instruments will produce unbounded number of label-keys. I know this is not easily compatible with OpenCensus but I don't think we can do anything at this moment, unless we add the constraint to predefined the keys. There was a tentative way to solve this by using the "recommended label keys", but that was not enforced as requirement to produce aggregation only using these keys, so the problem was not resolved.

I suggest that we eliminate one or the other and have LabelKeys in one place only.

We can remove only the labels from the MetriDescriptor if we want to remove any of them.

So overall I think we will have to pay a large cost of translating from OTLP to OpenCensus anyway because of the fact that the set of available label keys is not known in OTLP. So you need to iterate over all the points and find the set of label keys anyway. So I am not sure how does the Approach 2 help with this problem.

I do agree that if we keep both we need to define a mechanism to remove duplicates, and maybe the easiest way for the moment may be to remove the MetricDescriptor labels (we will need them soon) but we can discuss that when we need them. But keep in mind that if you remove these labels you still don't achieve what you want - faster conversion between OTLP and OpenCensus.

@bogdandrutu
Copy link
Member

@tigrannajaryan I think the main cause why the translation is expensive is not the dual labels is actually the fact that OpenTelemetry do not enforce the set of label-keys for an instrument, so you may want to clarify the issue title.

@bogdandrutu
Copy link
Member

FYI: I can see a lot of the protocols not requiring the label keys to be predefined so we have to do a trade-off, if we relax the protocol we can have a more optimal conversion for other protocols, and with the idea that OpenCensus will be deprecated is probably better long term to not require that.

@tigrannajaryan
Copy link
Member Author

@bogdandrutu I do not quite understand what does "predefined" label mean. Perhaps I was not clear in my description, let me try to clarify.

The problem that I see with current protocol definition is shown by the following example. Let's assume I have the following OTLP data (pardon the data syntax):

"metric": {
  "descriptor": {"labelkeys": {"a": "1", "b": "2"}},
  "int64_data_points": [{"value":100,"labelkeys": {"a": "5", "c": "10"}}]
}

If I need to translate this to OpenCensus I have to merge labelkeys in a way that "a" is only listed once and the data point overrides:

"metric": {
  "descriptor": {"labelkeys": {"a": "5", "b": "2", "c": "10"}},
  "timeseries": [{"value":100}]
}

To do this merging we need expensive per-key merging over lists.

This is not only a problem with OpenCensus format but with any other format that does not have OTLP-like dual placement of labels with implied overridinig of values (I am not aware of any metric formats that allow this, which means many other format translations will be slow too).

If it is necessary to have the labelkeys in 2 places, because it makes client libraries simpler (is this true?) then at least I think we should require that the 2 labelkeys maps do not have overlapping keys (so that "a" cannot exist in 2 places in the above example). In that case labelkeys can be merged by concatenation, which is faster.

@jmacd can you clarify what is your requirement and why we need this?

@tigrannajaryan tigrannajaryan changed the title Dual placement of LabelKeys makes translations expensive Dual placement of Labels makes certain translations expensive Apr 27, 2020
tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Apr 27, 2020
Resolves: open-telemetry#141

Metric proto previously allowed Labels in 2 places: in the MetricDescriptor and in each DataPoint.

The semantics is that Labels in MetricDescriptor apply to all DataPoints.
This requires complicated and expensive merging of Labels maps (which
possibly is made even more expensive since they are stored as lists,
requiring either intermediary data structures or O(nm) merging) when performing
translations from OTLP to other formats that have no equivalent notion of
having label keys per data point and per group of data points (e.g. OpenCensus).

This commit removes Labels from MetricDescriptor. My understanding is that
there is currently no client library that emits MetricDescriptor with labels
so this is not used and can be safely removed.

This will simplify and make faster translation code in the Collector.

If this is needed we can think carefully about adding it back after there
is a clear justification and tradeoffs are well understood.
bogdandrutu added a commit that referenced this issue Apr 27, 2020
Resolves: #141

Metric proto previously allowed Labels in 2 places: in the MetricDescriptor and in each DataPoint.

The semantics is that Labels in MetricDescriptor apply to all DataPoints.
This requires complicated and expensive merging of Labels maps (which
possibly is made even more expensive since they are stored as lists,
requiring either intermediary data structures or O(nm) merging) when performing
translations from OTLP to other formats that have no equivalent notion of
having label keys per data point and per group of data points (e.g. OpenCensus).

This commit removes Labels from MetricDescriptor. My understanding is that
there is currently no client library that emits MetricDescriptor with labels
so this is not used and can be safely removed.

This will simplify and make faster translation code in the Collector.

If this is needed we can think carefully about adding it back after there
is a clear justification and tradeoffs are well understood.

Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants