Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OTLP Trace Data Format specification #59

Merged

Conversation

tigrannajaryan
Copy link
Member

@tigrannajaryan tigrannajaryan commented Oct 11, 2019

This is a continuation of OTLP RFC proposal #35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use make benchmark-encoding target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

  • Attribute key/value pairs are represented as a list rather than as a map.
    This results in significant performance gains and at the same time changes
    the semantic of attributes because now it is
    makes possible to have multiple attributes
    with the same key. This is also in-line with Jaeger's tags representation.

  • Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
    significant performance improvements for certain payload compositions (e.g. lots of
    TimedEvents).

  • Resource labels use the same data type as Span attributes which now allows
    to have labels with other data types (OpenCensus only allowed strings).

@z1c0
Copy link
Contributor

z1c0 commented Oct 14, 2019

... at the same time changes the semantic of attributes ...

This requires a dedicated PR against the specification repo.

@Oberon00
Copy link
Member

Oberon00 commented Oct 14, 2019

Or a dedicated OTEP, with a title like "Change SetAttribute to record all values instead of just the last one."

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious about one of the stated requirements of OTLP, namely the augmenting and partial inspection. How are those achieved with proto format?

Ideally it must also support very fast pass-through mode (when no modifications to the data are needed), fast “augmenting” or “tagging” of data and partial inspection of data (e.g. check for presence of specific tag). These requirements help to create fast Agents and Collectors.

text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0000-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Show resolved Hide resolved
Copy link
Member Author

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... at the same time changes the semantic of attributes ...

This requires a dedicated PR against the specification repo.

@z1c0 @Oberon00

I think this part of my PR description is poorly worded. The attribute semantic is defined by the API/Specification. The SDK Exporter is then tasked with translating the semantics to a correct implementation when implementing a particular protocol. OTLP and this PR allows to have multiple attributes of the same value but nothing requires that the semantics are different from what is defined by OpenTelemetry API (the implementations are free to enforce uniqueness of the attributes).

I am not proposing a semantic change of the API in this PR and agree such a change definitely requires a dedicated PR.

@tigrannajaryan
Copy link
Member Author

I am curious about one of the stated requirements of OTLP, namely the augmenting and partial inspection. How are those achieved with proto format?

@yurishkuro

There is no in-principle improvement in this area compared to OpenCensus. However because encoding and decoding are significantly faster one can argue that we made progress in this area.

I had more ambitious goals when I wrote the requirement and it was the reason I explored Capnproto and other encodings but ended up not going in that direction due to reasons outlined in #35.

I think this goal still holds and we can come up with a better protocol. What we have here is a good starting point for the first release though.

@yurishkuro
Copy link
Member

If you still think those additional goals are achievable in the future, we may want to list them in the otep as still-to-be-met.

@tigrannajaryan
Copy link
Member Author

If you still think those additional goals are achievable in the future, we may want to list them in the otep as still-to-be-met.

@yurishkuro I added a note.

@tigrannajaryan
Copy link
Member Author

I changed the status to approved, so that this gets merged in approved status (when it gets merged). If I understand correctly this is the right flow for RFCs.

@Oberon00
Copy link
Member

Actually, in the current flow you can remove the "Status" field entirely in the OTEP and PR merged=accepted.

tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this pull request Oct 24, 2019
This change applies the refinement approach that is already performed on Traces
Protobuf definitions as part of open-telemetry/oteps#59 and
which proved to yield significant performance improvements.

Notable changes are:
- Replace google.protobuf.Timestamp by int64 time in unix epoch nanoseconds.
- Eliminate unnecessary messages, move the fields into the containing message.
- Replace oneof by a set of fields.

Simple benchmark in Go demonstrates the following improvement compared to OpenCensus
Metrics encoding and decoding:

```
BenchmarkEncode/OpenCensus/Metrics-8         	       2	 645252504 ns/op
BenchmarkEncode/OTLP/Metrics-8               	       4	 288457433 ns/op
BenchmarkDecode/OpenCensus/Metrics-8                   1	1154650804 ns/op
BenchmarkDecode/OTLP/Metrics-8                         3	 475370617 ns/op
```

Encoding is about 2.2 times faster, decoding is about 2.4 times faster.

Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series
and one Histogram of doubles with 1 time series and single bucket. Each time series for
both metrics contains 5 data points. Both metrics have 2 labels.

Benchmark source code is available at:
https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go
tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this pull request Oct 30, 2019
This change applies the refinement approach that is already performed on Traces
Protobuf definitions as part of open-telemetry/oteps#59 and
which proved to yield significant performance improvements.

I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds.

Simple benchmark in Go demonstrates the following improvement of encoding and decoding
compared to the current state:

```
===== Encoded sizes
Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/MetricOne              20000 bytes  [1.000], gziped 1506 bytes  [1.000]
Proposed/MetricOne              18250 bytes  [1.096], gziped 1433 bytes  [1.051]

Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/MetricSeries           51797 bytes  [1.000], gziped 6455 bytes  [1.000]
Proposed/MetricSeries           43047 bytes  [1.203], gziped 6093 bytes  [1.059]

goos: darwin
goarch: amd64
pkg: github.com/tigrannajaryan/exp-otelproto/encodings
BenchmarkEncode/Baseline/MetricOne-8         	      30	 186998840 ns/op
BenchmarkEncode/Proposed/MetricOne-8         	      36	 166668705 ns/op

BenchmarkEncode/Baseline/MetricSeries-8      	       8	 632391842 ns/op
BenchmarkEncode/Proposed/MetricSeries-8      	      10	 537384515 ns/op

BenchmarkDecode/Baseline/MetricOne-8         	      16	 348156010 ns/op	171896049 B/op	 4974000 allocs/op
BenchmarkDecode/Proposed/MetricOne-8         	      19	 314727259 ns/op	155096036 B/op	 4624000 allocs/op

BenchmarkDecode/Baseline/MetricSeries-8      	       5	1013035422 ns/op	440696048 B/op	11874000 allocs/op
BenchmarkDecode/Proposed/MetricSeries-8      	       6	 846887981 ns/op	356696040 B/op	10124000 allocs/op
```

It is 10-15% faster and is 10-20% smaller on the wire and in memory.

Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series
and one Histogram of doubles with 1 time series and single bucket. Each time series for
both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries).
Both metrics have 2 labels.

Benchmark source code is available at:
https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go
bogdandrutu pushed a commit to open-telemetry/opentelemetry-proto that referenced this pull request Oct 30, 2019
#33)

* Change timestamp data type to unixnano in Metrics Protobuf definitions

This change applies the refinement approach that is already performed on Traces
Protobuf definitions as part of open-telemetry/oteps#59 and
which proved to yield significant performance improvements.

I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds.

Simple benchmark in Go demonstrates the following improvement of encoding and decoding
compared to the current state:

```
===== Encoded sizes
Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/MetricOne              20000 bytes  [1.000], gziped 1506 bytes  [1.000]
Proposed/MetricOne              18250 bytes  [1.096], gziped 1433 bytes  [1.051]

Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/MetricSeries           51797 bytes  [1.000], gziped 6455 bytes  [1.000]
Proposed/MetricSeries           43047 bytes  [1.203], gziped 6093 bytes  [1.059]

goos: darwin
goarch: amd64
pkg: github.com/tigrannajaryan/exp-otelproto/encodings
BenchmarkEncode/Baseline/MetricOne-8         	      30	 186998840 ns/op
BenchmarkEncode/Proposed/MetricOne-8         	      36	 166668705 ns/op

BenchmarkEncode/Baseline/MetricSeries-8      	       8	 632391842 ns/op
BenchmarkEncode/Proposed/MetricSeries-8      	      10	 537384515 ns/op

BenchmarkDecode/Baseline/MetricOne-8         	      16	 348156010 ns/op	171896049 B/op	 4974000 allocs/op
BenchmarkDecode/Proposed/MetricOne-8         	      19	 314727259 ns/op	155096036 B/op	 4624000 allocs/op

BenchmarkDecode/Baseline/MetricSeries-8      	       5	1013035422 ns/op	440696048 B/op	11874000 allocs/op
BenchmarkDecode/Proposed/MetricSeries-8      	       6	 846887981 ns/op	356696040 B/op	10124000 allocs/op
```

It is 10-15% faster and is 10-20% smaller on the wire and in memory.

Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series
and one Histogram of doubles with 1 time series and single bucket. Each time series for
both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries).
Both metrics have 2 labels.

Benchmark source code is available at:
https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go

* Change timestamp from int64 to sfixed64

* Change timestamp interval from closed to open
@tmc
Copy link

tmc commented Nov 1, 2019

I think this discussion from the other repository is relevant: open-telemetry/opentelemetry-proto#11 (comment)

@tigrannajaryan
Copy link
Member Author

I think this discussion from the other repository is relevant: open-telemetry/opentelemetry-proto#11 (comment)

@tmc adding support for nested attribute values is easy. As soon as API/Spec decides to have such semantics the AttributeKeyValue in this proposal can be extended in backward compatible manner.

@tigrannajaryan
Copy link
Member Author

@open-telemetry/specs-approvers please review. I believe there are no more outstanding issues and this is ready to be merged.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving as I think it's ok to merge, but raised a couple questions.

text/0059-otlp-trace-data-format.md Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Show resolved Hide resolved
text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

@open-telemetry/specs-approvers do I need more approvals to get this merged? There are no outstanding issues or comments in this PR anymore. I believe it is ready to be merged.

@bogdandrutu bogdandrutu merged commit 75ee7ba into open-telemetry:master Nov 7, 2019
@tigrannajaryan
Copy link
Member Author

@bogdandrutu thank you.

@Oberon00
Copy link
Member

Oberon00 commented Nov 8, 2019

@tigrannajaryan It seems you are currently working on applying the OTEP bit by bit to opentelemetry-proto. Is there an overview somewhere which parts are applied and which are missing?

@tigrannajaryan
Copy link
Member Author

@Oberon00 yes, I am doing it now. No, there is no overview, I don't have a formal list of changes anywhere, however the target is well defined, if there is an interest on how much progress is made you can diff OTEP and proto repo content.

@tigrannajaryan tigrannajaryan deleted the feature/tigran/dataschema branch November 12, 2019 20:32
tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this pull request Nov 22, 2019
This brings the improved comments from open-telemetry/oteps#59
bogdandrutu pushed a commit to open-telemetry/opentelemetry-proto that referenced this pull request Nov 22, 2019
This brings the improved comments from open-telemetry/oteps#59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants