Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A72: OpenTelemetry Tracing #389

Merged
merged 36 commits into from
Jun 20, 2024
Merged

A72: OpenTelemetry Tracing #389

merged 36 commits into from
Jun 20, 2024

Conversation

YifeiZhuang
Copy link
Contributor

@YifeiZhuang YifeiZhuang commented Aug 22, 2023

Design doc: go/grpc-open-telemetry-tracing-design

cc. @yashykt @zasweq @XuanWang-Amos

@yashykt
Copy link
Member

yashykt commented Sep 13, 2023

I don't see the span name mentioned here. I was going to ask whether we can be consistent on the naming. @dfawley had suggested we "rcvd" instead of "recv" as the shortform for received in the OTel metrics #380

@YifeiZhuang
Copy link
Contributor Author

I don't see the span name mentioned here. I was going to ask whether we can be consistent on the naming. @dfawley had suggested we "rcvd" instead of "recv" as the shortform for received in the OTel metrics #380

The span name will inherit what census uses, like in server Recv.<service name>.<method name>, client side Sent.<service name>.<method name> (A45), as mentioned in New Function in OpenTelemetry Plugin

In the new function we will produce the same tracing information as we produce for Census

But it looks A45 does not mention Recv at all.
rcvd seems fine. When there is decision there, we can certainly make the name consistent here.

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
YifeiZhuang and others added 5 commits December 19, 2023 15:05
ejona's language suggestion

Co-authored-by: Eric Anderson <ejona@google.com>
1. [done] Add that OpenTelemetry module should document that things/configurations will change as OT changes.
2. [done] Move C++ fast path to rationale. Capture two alternatives about the fast path that may have in the future.
3. [done] Update C++ API about adding a new method in OpenTelemetryPluginBuilder, and reference to the current code and the gRFC.
4. [todo] complete the C++ GrpcTraceBinPropagator and GrpcTextMapCarrier
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few details left here!

PLMK if you have any questions. Thanks!

A72-open-telemetry-tracing.md Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
Copy link
Member

@yashykt yashykt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comments are mostly minor assuming we can easily change the formatting of the event messages and attributes

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Show resolved Hide resolved
events for compressed/uncompressed message sizes, the same below)
with name "Outbound message sent" and the following attributes:
* key `message.event.type` with string value "SENT".
* key `message.message.id` with integer value of the seq no. The seq no. indicates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how was the namespacing for "message.message.id" decided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was from the Otel java opencensus shim. Languages can be different.
I think whenever possible, we can make the name generated from the shim and the one generated directly from otel be consistent. This is not happening because the naming in our census needs improvements. And I am fine with that. @markdroth @ejona86

(https://github.com/open-telemetry/opentelemetry-java/blob/main/opencensus-shim/src/main/java/io/opentelemetry/opencensusshim/SpanConverter.java#L24-L27)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to copy the shim's behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed new names. Let's move forward and fix/vote in API stabilization.

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one issue remaining here!

PLMK if you have any questions. Thanks!

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the edits! This looks great!

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending what I have. I think yashykt had some good comments.

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
* key `message.message.id` with integer value of the seq no. The seq no. indicates
the order of the sent messages on the attempt (i.e., it starts at 0 and is
incremented by 1 for each message sent), the same below.
* key named `message.event.size.uncompressed` if the message needs compression,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"needs compression" is a bit difficult. I assume that is "if this message will be compressed." But I'm not certain all languages can do this (know before the message is compressed whether the message will be compressed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashykt could you help confirm whether C++ is able to tell the message needs compression or not when it is reporting the first Event?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eric is right. Till we actually go and compress, we won't know whether we will compress it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is illegal to have an attribute named message.event.size if there is another attribute message.event.size.uncompressed where message.event.size is the namespace.

Refer https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is illegal to have an attribute named message.event.size if there is another attribute message.event.size.uncompressed where message.event.size is the namespace.

Good point. I have two naming proposals:

  1. Extending from opencensus and use "-" to form the message: e.g. "message-sequence-no", "message-size","message-size-compressed".
  2. Follow ot spec and use ".", then we may end up with "grpc.previous.rpc.attempts", "grpc.transparent.retry", "grpc.message.sequence.no", "grpc.message.size", "grpc.compressed.message.size", or similar.

I'm fine either way. Does anyone have any other naming suggestions?

A72-open-telemetry-tracing.md Show resolved Hide resolved
A72-open-telemetry-tracing.md Show resolved Hide resolved
events for compressed/uncompressed message sizes, the same below)
with name "Outbound message sent" and the following attributes:
* key `message.event.type` with string value "SENT".
* key `message.message.id` with integer value of the seq no. The seq no. indicates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to copy the shim's behavior.

* key `message.message.id` with integer value of the seq no. The seq no. indicates
the order of the sent messages on the attempt (i.e., it starts at 0 and is
incremented by 1 for each message sent), the same below.
* key named `message.event.size.uncompressed` if the message needs compression,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashykt could you help confirm whether C++ is able to tell the message needs compression or not when it is reporting the first Event?

* key `message.message.id` with integer value of the seq no. The seq no. indicates
the order of the received messages on the attempt (i.e., it starts at 0 and is
incremented by 1 for each message received), the same below.
* key named `message.event.size.compressed` if the message needs decompression,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yashykt same here for inbound.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the inbound message, we know from the wire format whether the message was compressed, so this is fine.

But, it is illegal to have an attribute named message.event.size if there is another attribute message.event.size.uncompressed where message.event.size is the namespace.

* When the application sends an outbound message, add Event(s)
(it depends on implementation whether there is a single event or an additional
separate event with name "Outbound message compressed" for compressed message size)
with name "Outbound message sent" and the following attributes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about this earlier, but "sent" in "Outbound message sent" is confusing. Can we drop it?
For C++, this would otherwise mean that we would first see -
"Outbound message sent" and then "Outbound message compressed"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Outbound message sent" and then "Outbound message compressed"

It looks fine. "sent" means this trace happens somewhere from application to the wire, and we made that vague and general(decision made from a discussion). Are you confused because compression should not happen after "sent"?

* key `message.message.id` with integer value of the seq no. The seq no. indicates
the order of the received messages on the attempt (i.e., it starts at 0 and is
incremented by 1 for each message received), the same below.
* key named `message.event.size.compressed` if the message needs decompression,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the inbound message, we know from the wire format whether the message was compressed, so this is fine.

But, it is illegal to have an attribute named message.event.size if there is another attribute message.event.size.uncompressed where message.event.size is the namespace.

* key `message.message.id` with integer value of the seq no. The seq no. indicates
the order of the sent messages on the attempt (i.e., it starts at 0 and is
incremented by 1 for each message sent), the same below.
* key named `message.event.size.uncompressed` if the message needs compression,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is illegal to have an attribute named message.event.size if there is another attribute message.event.size.uncompressed where message.event.size is the namespace.

Refer https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/

@YifeiZhuang
Copy link
Contributor Author

@yashykt Thanks for the review. Eric and I made improvements on the otel message size reporting details (mainly 8afa317), please take a look and comment whether it is feasible in C++.

cc. @ejona86 @dfawley

A72-open-telemetry-tracing.md Outdated Show resolved Hide resolved
@YifeiZhuang YifeiZhuang merged commit 596b629 into grpc:master Jun 20, 2024
1 check passed
@YifeiZhuang YifeiZhuang deleted the ot-tracing branch June 20, 2024 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants