Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support W3C - no relationship between parentid and spanid #1945

Closed
varania opened this issue May 11, 2021 · 17 comments
Closed

Support W3C - no relationship between parentid and spanid #1945

varania opened this issue May 11, 2021 · 17 comments

Comments

@varania
Copy link

varania commented May 11, 2021

Good afternoon,

In our project we have enabled W3C support for Sleuth. We have done some tests and we are surprised by the way the trace-id field of the traceparent is being generated.

Let's assume we have micro 1 and it receives the header:

  • traceparent: 00-609a534e112b3e7b920264c8e933aae5-2f644cc2fbb7d434-01

The TraceContext is initialized with the following values:

  • traceId: 609a534e112b3e7b920264c8e933aae5
  • spanId: 26e3ebadac5aeecc
  • parentId: 2f644cc2fbb7d434

Then micro1 invokes micro2 and receives:

  • traceparent: 00-609a534e112b3e7b920264c8e933aae5-2d179eaf853ae86c-01

The TraceContext associated with micro 2 will be:

  • traceId: 609a534e112b3e7b920264c8e933aae5
  • spanId: 9ffb18ac465f1ec1
  • parentId: 2d179eaf853ae86c

It is observed that during the execution of micro 1, Sleuth generates (for the outgoing request) a completely new parentId, disregarding the current spainId.

Why is the parentId not generated based on the spainId in a similar way as B3 behaves?... because with B3 it is easy to trace the relationship between micro1 and micro2 because the parentId of micro2 is the same as the spainId of micro1, but with the current W3C implementation how can I establish this relationship?

Thank you very much.

@marcingrzejszczak
Copy link
Contributor

marcingrzejszczak commented May 12, 2021

I adopted this class https://github.com/spring-cloud/spring-cloud-sleuth/blob/main/spring-cloud-sleuth-brave/src/main/java/org/springframework/cloud/sleuth/brave/bridge/W3CPropagation.java from https://github.com/open-telemetry/opentelemetry-java/blob/main/api/all/src/main/java/io/opentelemetry/api/trace/propagation/W3CTraceContextPropagator.java (I'm linking the current version, I have copied it a couple of months ago). I wonder if it has changed in the meantime, cause if it has then I have to adopt the changes too. Before we do a more thorough analysis of the issue. are you willing to do a short analysis of whether there are changes between the two?

@varania
Copy link
Author

varania commented May 13, 2021

Hi Marcin,

I have been checking the last couple of months and the only changes they seem to have introduced are related to some validations on the format of traceparent and tracestate and null control so they don't seem to affect the overall behavior.

To try to clarify further, what I expected to receive on micro 2 was a traceparent header with the following value: 00-609a534e112b3e7b920264c8e933aae5-26e3ebadac5aeecc-01

Honestly, I don't know if this is an issue or not, in which case I have to understand that it has been implemented this way for some reason. The specification seems somewhat ambiguous on this point but there are references describing the expected behavior, e.g. -> https://sgryphon.wordpress.com/2020/11/16/a-guide-to-w3c-trace-context/

Thank you very much.

@varania varania changed the title Support W3C - Always generate new parent-id Support W3C - no relationship between parentid and spanid May 13, 2021
@marcingrzejszczak
Copy link
Contributor

Hi Varania, do you have a small sample that replicates this issue? I don't want to guess what the problem might be (what comes to my mind is that a client span for e.g. a RestTemplate call was made so as a parent id you will not have the HTTP controller span but the RestTemplate one), I'd prefer to work on sth that has a confirmed issue.

@sr325
Copy link

sr325 commented Oct 4, 2021

Hi, I am getting the same error.

I am using spring-cloud-starter-sleuth : 3.0.3.

My scenario:
Microservice (MS)1 is a producer and sends the message to topic1.
MS2 picks up the message and sends to topic2.
MS3 picks up the message.
Here, MS2's parentID should be MS1's spanID and MS3's parentID should be MS2's spanID.
TraceID are same. spanID and parentID are all different therefore I cannot track the message in sequence.

To capture traceId, spanId and parentId, I am using Sleuth's @Autowired Tracer then tracer.currentSpan().context().traceId(), tracer.currentSpan().context().spanId() and tracer.currentSpan().context().parentId() respectively.

Is this issue fixed or are there any alternatives?

@marcingrzejszczak
Copy link
Contributor

Nothing is fixed cause I don't understand what the problem is. Can someone actually create a simple sample that reproduces the problem?

@varania
Copy link
Author

varania commented Oct 4, 2021

Hi Marcin, I will upload a simple sample this week.

@sr325
Copy link

sr325 commented Oct 4, 2021

I have uploaded a sample here, https://github.com/sr325/kafka-tracer.

Producer log
2021-10-04 21:02:48.686 INFO [producer,e2868d30510862fa,e2868d30510862fa] 27152 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.7.1
2021-10-04 21:02:48.693 INFO [producer,e2868d30510862fa,e2868d30510862fa] 27152 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 61dbce85d0d41457
2021-10-04 21:02:48.694 INFO [producer,e2868d30510862fa,e2868d30510862fa] 27152 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1633377768677

Consumer log
2021-10-04 21:02:50.611 INFO [consumer,e2868d30510862fa,4dd5d0ab97c76699] 5260 --- [ntainer#0-0-C-1] com.demo.consumer_one.KafkaConsumer : Tracer id is e2868d30510862fa
2021-10-04 21:02:50.612 INFO [consumer,e2868d30510862fa,4dd5d0ab97c76699] 5260 --- [ntainer#0-0-C-1] com.demo.consumer_one.KafkaConsumer : Span id is 4dd5d0ab97c76699
2021-10-04 21:02:50.612 INFO [consumer,e2868d30510862fa,4dd5d0ab97c76699] 5260 --- [ntainer#0-0-C-1] com.demo.consumer_one.KafkaConsumer : Parent id is cdb59ff94a162b3c
2021-10-04 21:02:50.612 INFO [consumer,e2868d30510862fa,4dd5d0ab97c76699] 5260 --- [ntainer#0-0-C-1] com.demo.consumer_one.KafkaConsumer : Consumed message hello world

Use case
In the above log, producer's traceId/SpanId is e2868d30510862fa therefore Consumer's parentId should be e2868d30510862fa but it is cdb59ff94a162b3c.

I have simply used spring-cloud-sleuth to generate traces and @Autowired Sleuth's tracer to generate traceId, spanId and parentId. Thanks.

@marcingrzejszczak
Copy link
Contributor

I don't think I follow. I added zipkin to your sample and see the following:

image
The rest template span Span ID: e64b42279a413017 Parent ID: None

image
The kafka template span Span ID: 3fab62ba62027fd7 Parent ID: e64b42279a413017 . So this is the child of the rest template span.

image
The listener container span Span ID: 98271bd42f0a4c2c Parent ID: 3fab62ba62027fd7. So this is the child of the kafka template span.

image
The actual on message call on the consumer side Span ID: bad331619fe58973 Parent ID: 98271bd42f0a4c2c. So this is the child of the listener container span.

Can you explain what is wrong in your opinion?

@sr325
Copy link

sr325 commented Oct 6, 2021

I am communicating via Kafka 2.7.0 (kafka_2.12-2.7.0.tgz).

Problem statement:
Producer log, traceId is 3a105242f3aa175d and spanId is 3a105242f3aa175d

image

Consumer log, traceId is 3a105242f3aa175d and parentId is 654c6c202519f923. Here parentId is not the spanId from producer as expected. I am using org.springframework.cloud.sleuth.Tracer to trace these ids.

image

image

@marcingrzejszczak
Copy link
Contributor

So parent 654... is from the poll span that you can see in my message. You're pointing to a span that in my example is the last one. Please check my message again where you have full explanation of all 4 spans.

@varania
Copy link
Author

varania commented Oct 7, 2021

Hello again Marcin,

I apologize for not being able to respond until now. I have done a test with spring-boot 2.5.5 and spring-cloud 2020.0.4 and the W3C support seems to work correctly, the problem I see is that the behavior seems different from B3.

When you enable B3 support:

curl -si -H "x-b3-traceid: 765eee11cf9ac83a" -H "x-b3-spanid: 22658a5fa2373b1e" -H "x-b3-parentspanid: 33f77175916ae247" -H "x-b3-sampled: 1" -X GET http://localhost:8080/trace1

image

image

image

The same test with W3C support:

curl -si -H "traceparent: 00-609a534e112b3e7b920264c8e933aae5-2f644cc2fbb7d434-01" -X GET http://localhost:8080/trace1

image

image

image

image

It seems that when W3C is enabled an additional span is added that with B3 is not. is this the expected behavior?
I attach the code I used to perform the test. For the invocation I use WebClient instead of RestTemplate.

Thank you very much.
w3c.zip

@subhajitgoswami
Copy link

So parent 654... is from the poll span that you can see in my message. You're pointing to a span that in my example is the last one. Please check my message again where you have full explanation of all 4 spans.

@marcingrzejszczak : could you please elaborate 'poll' span . Also as you can see w.r.t @sr325 comment, 654c6c202519f923( consumer parent id) != 3a105242f3aa175d(producer spanId). Want to understand how 654c6c202519f923 is pointing to span id that in your example is last one.

Your last component has parent Id=98271bd42f0a4c2c which looks fine as it follows up with the listener container span id. But in case of @sr325, consumer parentId(654c6c202519f923) != producer spanId (3a105242f3aa175d).

If possible can you the github link for the change you made.

@marcingrzejszczak
Copy link
Contributor

Your last component has parent Id=98271bd42f0a4c2c which looks fine as it follows up with the listener container span id. But in case of @sr325, consumer parentId(654c6c202519f923) != producer spanId (3a105242f3aa175d).
If possible can you the github link for the change you made.

I haven't done additional changes. I've just used the latest versions. I'll look into the W3C example soon.

@marcingrzejszczak
Copy link
Contributor

It's been some time now but I think I remember what the problem was @subhajitgoswami . B3 always having a shared span. That means that you have the same span on the client and on the server side. With W3C the shared span is not allowed. That means that always a new span will be created on the server side. Here we're talking about Kafka but have you observed a similar "problem" with HTTP?

@spring-cloud-issues
Copy link

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-cloud-issues
Copy link

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@lpozo16
Copy link

lpozo16 commented Oct 20, 2022

Hi guys. As @marcingrzejszczak said, W3C does not consider share de spanId. In my opinion, this is a really annoying limitation, because it makes it difficult or even impossible to correlate logs if an APM agent is tracking your service.

The reason is that you don't have access to the context in which the spanId was created, so if you want to generate a correlated log to send to another log analysis platform than your APM, you can't.

And I think this puts at risk one of the reasons why the W3C Trace Context was created: to open up traceability and avoid vendors locking.

What do you think? Is there any way to raise this thought to the owners of the W3C Trace Context specification?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants