Decouple Number Type from Point Kind in OTLP protocol #264

Closed
victlu opened this issue Feb 17, 2021 · 25 comments · Fixed by #278
Comments

@victlu
Contributor

victlu commented Feb 17, 2021

The OTLP message Metric has the following definition for instrument types...

spec

  oneof data {
    IntGauge int_gauge = 4;
    DoubleGauge double_gauge = 5;
    IntSum int_sum = 6;
    DoubleSum double_sum = 7;
    IntHistogram int_histogram = 8;
    DoubleHistogram double_histogram = 9;
    DoubleSummary double_summary = 11;
  }

I propose we only define Instrument Types (i.e. Gauge, Sum, etc...) at the message Metric level.

The data type (i.e. int, double, etc...) should be lower level, maybe at the message DataPoint level.

i.e.

message DataPoint {
  // ...

  // The value itself.
  // sfixed64 value = 4;     <= replaced by the oneof below

  oneof value {
     sfixed64 int64_value = 4;
     double double_value = 5;
  }
}
@rakyll

rakyll commented Feb 22, 2021

This would require at least one data point to arrive in order for exporters to create metric descriptors in the backend. What are the use cases where the current data model is not working for you?

@Oberon00 Oberon00 transferred this issue from open-telemetry/opentelemetry-specification Feb 23, 2021
@jmacd
Contributor

jmacd commented Feb 23, 2021

I've proposed something similar, #172, but at this point I'm happy with what we have.

@jmacd jmacd changed the title Decouple Data type from Instrument Kind in OTLP protocol Decouple Number Type from Point Kind in OTLP protocol Feb 23, 2021
@bogdandrutu
Member

@victlu the problem with this approach is that systems that consider the data point's number type part of a metric's identity will no longer be able to express that. For example, Stackdriver in GCP does this: it considers an int sum different from a double sum, and it needs to know up front which one a metric will contain.

@victlu
Contributor Author

victlu commented Feb 24, 2021

@bogdandrutu, In your case, I would think it serves them better to explicitly name their metrics accordingly (since they want two time series) rather than relying on the data type.

I think this question is related to #1366: how do we consider the "name" to be unique in identifying a time series?

@victlu
Contributor Author

victlu commented Feb 24, 2021

This would require at least one data point to arrive in order for exporters to create metric descriptors in the backend. What are the use cases where the current data model is not working for you?

@rakyll Why can't metric descriptors be created without the data type? Or are you saying a descriptor MUST include the data type? In which case, that's the argument I'm making here...

IMHO, metric backend systems ultimately show/graph/alert on/store numeric time series. For performance and efficiency, they want to use the most efficient representation of the "number", so promotion of data types is common, especially for storage considerations. The important thing is that it's just a number, regardless of whether it's transmitted as a byte, short, ushort, int, uint, long, ulong, float, or double.

@bogdandrutu
Member

Or are you saying a descriptor MUST include the data type? In which case, that's the argument I'm making here...

If the type is included in the metric descriptor/name, what is the benefit of what you suggested? Your suggestion also likely adds an extra check: you need to verify that the type in the descriptor matches the type in the points.

On top of everything, it adds extra allocations when serializing/deserializing from proto bytes.

@rakyll

rakyll commented Feb 24, 2021

@victlu Some backends, like Stackdriver, see the type as an integral part of the metric. See the value_type at https://cloud.google.com/monitoring/api/ref_v3/rpc/google.api#google.api.MetricDescriptor. This model gives us an easy, cost-effective way of identifying the data type at metric creation. These backends will reject the write if you send a floating-point number to an integer metric, so it would require validation of the data points in the exporter, as @bogdandrutu suggests.
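For reference, the relevant part of that descriptor looks roughly like this (an abridged sketch from memory of the linked google.api.MetricDescriptor reference; treat field numbers and enum values as illustrative, and consult the linked docs for the authoritative definition):

```proto
// Abridged sketch of google.api.MetricDescriptor.
message MetricDescriptor {
  enum ValueType {
    VALUE_TYPE_UNSPECIFIED = 0;
    BOOL = 1;
    INT64 = 2;
    DOUBLE = 3;
    STRING = 4;
    DISTRIBUTION = 5;
    MONEY = 6;
  }

  // The number type is part of the descriptor, and therefore part of the
  // metric's identity in this backend.
  ValueType value_type = 4;
  // ...
}
```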

@victlu
Contributor Author

victlu commented Mar 2, 2021

I am trying to understand, but I have a few questions...

From the last few responses, it seems we are specializing the OTLP protocol around a specific vendor's backend implementation. I assume we would do this for performance? We should call out which backend systems we are optimizing for in this case.

What happens if a vendor does not need to separate ints and doubles? It would then need to merge them back together, which does not seem performant either.

I would think the spec should be "higher-level", conveying the meaning and semantics of metrics. The data type of data points seems more like an implementation detail.

Looking at a few existing client APIs (e.g. Prometheus, Micrometer), I don't see Counters/Gauges/Histograms/etc. decorated with a data type (i.e. IntCounter/DoubleCounter/etc.), or they just take a double to cover both int and double. What should we do here?

How will this work if I'm using a dynamically typed language (e.g. JavaScript)? I assume I would need to separate out ints/doubles/etc. into different metrics?

I'm also not suggesting every data point needs to be type-less. I just don't see the data type belonging at the topmost level alongside semantics like Sum/Gauge/Histogram/etc. Maybe we have a oneof of IntTimeSeries/DoubleTimeSeries under each of Sum/Gauge/Histogram/etc.
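That last alternative might look something like this (a hypothetical sketch: IntTimeSeries and DoubleTimeSeries are illustrative names, not existing OTLP messages, and the field numbers are arbitrary):

```proto
// Hypothetical: instrument kind stays at the top level, and the number
// type is pushed down into a nested oneof per instrument.
message Sum {
  AggregationTemporality aggregation_temporality = 1;
  bool is_monotonic = 2;

  oneof points {
    IntTimeSeries int_time_series = 3;
    DoubleTimeSeries double_time_series = 4;
  }
}
```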

@victlu
Contributor Author

victlu commented Mar 2, 2021

I propose we change to the following...

message Metric {
  // ...

  oneof data {
    Gauge gauge = 4;
    Sum sum = 5;
    Histogram histogram = 6;
    Summary summary = 7;
  }
}

message Gauge {
  repeated IntDataPoint int_data_points = 1;
  repeated DoubleDataPoint double_data_points = 2;
}

message Sum {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  // If "true" means that the sum is monotonic.
  bool is_monotonic = 3;

  repeated IntDataPoint int_data_points = 4;
  repeated DoubleDataPoint double_data_points = 5;
}

message Histogram {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  repeated IntDataPoint int_data_points = 3;
  repeated DoubleDataPoint double_data_points = 4;
}

message Summary {
  repeated DoubleDataPoint double_data_points = 1;
}

@bogdandrutu
Member

For Summary see #269 and for Histogram see #270

@bogdandrutu
Member

bogdandrutu commented Mar 3, 2021

@victlu I kind of like your proposal, but I would like to understand how you see this fixing an issue. Is this just a nice cosmetic improvement, or does it solve a real problem? :)

Update:
@victlu how do we deal with the case where both repeated fields contain values?

@bogdandrutu
Member

@jmacd what do you think about this readability improvement, #264 (comment)?

@victlu
Contributor Author

victlu commented Mar 4, 2021

The change I'm trying to effect is to decouple the semantics of an instrument (i.e. Counter vs. Gauge vs. Histogram) from the details of the data type (byte/int/long/double/float/etc.). I think it may have these beneficial consequences:

  • Helps clarify the "identity" of a metric/data collection: it is based on semantics, not on data type.
  • Can translate to a simpler API surface (and resolves the conflict for dynamically typed languages).
  • Allows easier future growth to new data-type time series (e.g. bytes, BigInts, or sketches) while retaining the scope and semantics of the instrument, potentially allowing data to map to a more optimal encoding.
  • Allows vendors the freedom to merge or keep separate these time series (answering @bogdandrutu's question about both repeated fields containing values).
  • Minimizes code/performance changes.

@victlu
Contributor Author

victlu commented Mar 4, 2021

Separately, I think we should group by labels before the data points to minimize buffer size. I'll file a separate issue for this.

@jmacd
Contributor

jmacd commented Mar 4, 2021

@jmacd what do you think about this improvement in readability?

I like it, but it raises the question you asked.

@victlu how do we deal with a case where both repeated fields contain values?

That's a tough one. We discussed in yesterday's data model meeting that SDKs should not generate conflicting types (loosely defined), and we discussed that we would like the collector to pass-through conflicting types (again loosely defined) as if they were separate metrics. This appearance of both double and integer repeated fields makes it possible to have a single Metric that looks "not separate", in the sense that it contains multiple number types. I think if we allow what @victlu proposes, we should firmly state that OTel welcomes you to mix integer and double metrics--that this is not a semantic conflict--just that we would like you not to do so in the same SDK.

Naturally we could use a oneof to prevent mixing integer and double values. We did a lot of benchmarking last summer to talk ourselves out of oneof messages as much as possible.

Even when we allow mixing repeated integer- with repeated double-values, it will be difficult to work with this mixture of values without combining and interleaving them (sort by time) into an intermediate representation, which probably brings up the problem discussed next.
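A point-level oneof along those lines might look like this (a hypothetical sketch: the message and field names here are illustrative, not a committed design):

```proto
// Hypothetical: a single data point message whose number type is a oneof,
// so integer and double values cannot be mixed within one point.
message NumberDataPoint {
  // ... timestamps, labels, exemplars ...

  oneof value {
    sfixed64 as_int = 4;
    double as_double = 5;
  }
}
```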

Separately, I think we should group by Labels first before we have DataPoints to minimize on buffer size. I'll file a separate issue for this.

Yes, let's keep that separate. I recall a historical conversation, but would prefer not to dig up old threads. @bogdandrutu, do you have thoughts? I believe we were avoiding additional message layers where possible. The best argument in favor of the current design that I remember is that an SDK should never output more than one point per metric and label set per collection period. So, under a simple collection model, the extra layer for the label set will always have exactly one entry in the encoding @victlu proposes. I haven't thought through the implications of this in the collector.

I think the greater win would be to move labels and label sets into separate regions in the Export request, with a more-compact encoding.
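Such a compact encoding might, for example, intern label strings once per request (purely a hypothetical sketch; none of these message or field names exist in OTLP):

```proto
// Hypothetical: deduplicate label keys and values into a per-request
// string table and reference them by index from each series.
message ExportMetricsServiceRequest {
  repeated string string_table = 1;  // interned label keys and values
  // ... resource metrics referencing the table ...
}

message LabeledSeries {
  // Pairs of (key index, value index) into string_table.
  repeated uint32 label_refs = 1;
  repeated DataPoint points = 2;
}
```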

@jsuereth
Contributor

jsuereth commented Mar 4, 2021

@victlu When you first mentioned this I was skeptical, but I do like the simplicity proposed here. My only concern is the amount of churn this could cause right now. Ideally this should be done via a "deprecation" and "transition" period, as we DO have implementations that folks are trying to use with metrics.

Also, to @jmacd and @bogdandrutu: IIUC, "oneof" is slow in the Go implementation, right? That's not a fundamental limitation of protocol buffers AFAIK; it's basically the same as "optional" but with some runtime checks.

@jmacd
Contributor

jmacd commented Mar 5, 2021

Like @jsuereth I'm worried about churn. I'd like to connect the question posed here, about reducing the number of types inside the Metric oneof with the question @bogdandrutu poses about histograms in #259. I propose we adopt the same approach for both of these, whichever it is.

On the one hand, we can have just the four semantically distinct types (Counter, Gauge, Histogram, Summary) and push all their variation into the types themselves. This is Victor's proposal for Gauge/Sum, and this is my proposal for Histogram in #272.

On the other hand, we can go with a distinct type for each variation in the Metric oneof. Right now, we have a distinct type for IntGauge and DoubleGauge, for example, and the question is whether we should keep that approach or not. If we keep that approach, then @bogdandrutu's point in #259 should be addressed and each distinct form of Histogram buckets gets its own type.

@jsuereth
Contributor

jsuereth commented Mar 5, 2021

Agree w/ @jmacd on consistency between histogram and metric types. If we go with #272 then we should also take this proposal/direction (albeit I'd like to see a smooth transition accompany the proposal).

@victlu If you want to take a crack at what migration from existing OTLP to this proposal looks like, that'd be good. This will disrupt a lot of the ecosystem in OTel and I know there's already some users of metrics where such a large change could break them if it's not done smoothly. I realize metrics isn't marked "stable" but users are users. If you want help thinking through a smooth-transition plan, I'm happy to take time to brainstorm with you.

@victlu
Contributor Author

victlu commented Mar 5, 2021

I'll take a crack at it. @jsuereth I'll reach out on slack as I am sure I'm not up-to-speed on all the intricacies of this task.

@hdost
Contributor

hdost commented Mar 5, 2021

I'd agree with an approach similar to #272: add the new form with simpler names.
I assume there would need to be changes in the OTLP exporters. Should there be a related ticket for the "supported" languages? Which of those are there?
** Maybe a config option added to the exporter in order to switch them in the short term, dual write, etc.
How long of a transition period would be needed? If there's supposed to be a stable data model (https://github.com/open-telemetry/opentelemetry-specification/milestone/16), 1 month seems a little tight given the number of open areas.

Edit:
I have created issue #275 to discuss this further since it seems to span a few issues.

@victlu
Contributor Author

victlu commented Mar 9, 2021

Proposal for smooth transition for this (and other) protocol changes.

Migrating OTLP protocol

Strategy

To deliver OTLP protocol changes, the following proposed steps should be followed
to achieve a smooth transition/migration for all affected components.

  1. Apply changes to OTLP protocol

    • Include new OTLP messages (must be "additive"; expect no incompatibilities
      with existing components)
    • Comment/mark old OTLP messages "deprecated" with an expected transition date
    • Release changes
      • Coordinate multiple changes into appropriate batches as necessary
  2. Update and release all dependent components

    • Components remove support for old messages/protocol
    • Components support new messages/protocol
  3. Wait until after the transition date

    • Review that all affected components are updated
    • Remove old OTLP messages marked "deprecated"
    • Release final changes
  4. Repeat as necessary

For issue #264

Mark old messages as "deprecated" and add new messages...

oneof data {
    // DEPRECATED BY 4/15/2021
    IntGauge int_gauge = 4;
    DoubleGauge double_gauge = 5;
    IntSum int_sum = 6;
    DoubleSum double_sum = 7;
    IntHistogram int_histogram = 8;
    DoubleHistogram double_histogram = 9;
    DoubleSummary double_summary = 11;

    // New messages
    Gauge gauge = 12;
    Sum sum = 13;
    Summary summary = 14;
    Histogram histogram = 15;
}

Add new messages...

message Gauge {
  repeated IntDataPoint int_data_points = 1;
  repeated DoubleDataPoint double_data_points = 2;
}

message Sum {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  // If "true" means that the sum is monotonic.
  bool is_monotonic = 3;

  repeated IntDataPoint int_data_points = 4;
  repeated DoubleDataPoint double_data_points = 5;
}

message Histogram {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  repeated IntDataPoint int_data_points = 3;
  repeated DoubleDataPoint double_data_points = 4;
}

message Summary {
  repeated DoubleDataPoint double_data_points = 1;
}

@jsuereth
Copy link
Contributor

jsuereth commented Mar 9, 2021

@victlu I think the transition is a bit more dramatic here. Remember that OTLP is an in-flight protocol and deprecations have a slow rollout between producers and consumers, so you need to call that out in the transition. I.e., what you defined may be a little too aggressive/churny for end users.

Here's an abridged update to what you had. I think maybe this deserves a full OTEP, but for the purposes of your change we just need to agree on the general outline.

Migrating OTLP protocol

Strategy

To deliver OTLP protocol changes, the following proposed steps should be followed
to achieve a smooth transition/migration for all affected components.

  1. Apply changes to OTLP protocol
    • Include new OTLP messages (must be "additive"; expect no incompatibilities
      with existing components)
      • For the Protocol Buffer protocol this means message names can change, but not numbers
      • For JSON protocol, definition is TBD. (Note: JSON is not marked stable yet)
    • Denote all deprecations semantics
      • Mark fields/messages as deprecated
        • for PB, use [deprecated=true]; optionally rename the field with a deprecated_ prefix
        • for JSON, TBD.
      • Annotate interpretation semantics for deprecated fields:
        • If necessary describe how to interpret from deprecated field to new field
        • Define who is allowed to use deprecated semantics, e.g. "As of release X, new
          clients must not use this field"
    • Release changes
      • Coordinate multiple changes into appropriate batches as necessary
  2. Update all consumer components of OTLP
    • Update "core" code to use "new" fields/messages.
    • Ensure components appropriately transition from deprecated messages/fields to "new" or "replacement" fields/messages.
    • Consumers MUST NOT remove deprecated field handling until EOL of the protocol they were introduced in.
  3. Update all producers of OTLP
    • Update all producers of OTLP (SDKs/Exporters) to only produce new/replacement fields/messages
    • All producers MUST NOT use deprecated messages to declare OTLP version compliance
  4. Wait until the deprecation EOL date
    • Remove handling from consumer components of OTLP and release them.
    • Remove old OTLP messages marked "deprecated"
    • Release final changes
  5. Repeat as necessary
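Step 1's deprecation marking in Protocol Buffers might look like this (an illustrative sketch: the rename and field numbers follow the proposal in this thread, not a merged change):

```proto
// Illustrative: the old field keeps its number, gains the deprecated
// option, and is optionally renamed with a deprecated_ prefix.
message Metric {
  // ...
  IntGauge deprecated_int_gauge = 4 [deprecated = true];

  // Replacement field introduced alongside the deprecated one.
  Gauge gauge = 12;
}
```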

@jmacd
Contributor

jmacd commented Mar 15, 2021

Please review #278.

@victlu
Contributor Author

victlu commented Mar 15, 2021

With #278, my concern has been resolved! Thank you @jmacd .

@victlu victlu closed this as completed Mar 15, 2021
@jmacd
Contributor

jmacd commented Mar 15, 2021

Let's leave this open until the PR merges 😀

@jmacd jmacd reopened this Mar 15, 2021
@jsuereth jsuereth added this to Types, Schema and Metadata in Spec - Metrics Data Model and Protocol Mar 16, 2021
@jsuereth jsuereth moved this from Types, Schema and Metadata to Done in Spec - Metrics Data Model and Protocol Mar 22, 2021

7 participants