Kafka plugin, multiple consumer issue, UI/spanId related. #7212

Closed
1 task done
yjqg6666 opened this issue Sep 3, 2020 · 31 comments

@yjqg6666
Contributor

yjqg6666 commented Sep 3, 2020

Prerequisites

  • I have checked the FAQ, and issues and found no answer.

What version of pinpoint are you using?

2.0.4

Describe the bug

One app sends a Kafka message in response to an HTTP request. Multiple apps consume the same message, process it, and then each makes another HTTP request to an HTTP service. The whole flow is shown in the following server map (two outbounds).
image

issue 1:
The main server map is not drawn correctly (the consumer node may be omitted or linked) when one outbound is chosen, as in the screenshot below.

image

issue 2:
The apps consumer-verify1, consumer-verify2 and consumer-verify3 each called verify-http-server-dev. All three should be linked to the verify-http-server-dev node, with a call count of 1 each.
image

The server map for this transaction should look like the following:

image

image

The request headers:

verify1 -> http-server

GET /api/verify1?seq=msg_1599102848086 HTTP/1.0
Host: verify.example.com
Connection: close
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
Accept-Encoding: gzip,deflate
Pinpoint-TraceID: pub-loc1^1599102239212^2
Pinpoint-SpanID: 109811431385031189
Pinpoint-pSpanID: 6902915841584694206
Pinpoint-Flags: 0
Pinpoint-pAppName: consumer-verify1
Pinpoint-pAppType: 1210
Pinpoint-Host: verify.example.com

verify2 -> http-server

GET /api/verify2?seq=msg_1599102848086 HTTP/1.0
Host: verify.example.com
Connection: close
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
Accept-Encoding: gzip,deflate
Pinpoint-TraceID: pub-loc1^1599102239212^2
Pinpoint-SpanID: 4440757486945348215
Pinpoint-pSpanID: 6902915841584694206
Pinpoint-Flags: 0
Pinpoint-pAppName: consumer-verify2
Pinpoint-pAppType: 1210
Pinpoint-Host: verify.example.com

verify3 -> http-server

GET /api/verify3?seq=msg_1599102848086 HTTP/1.0
Host: verify.example.com
Connection: close
User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
Accept-Encoding: gzip,deflate
Pinpoint-TraceID: pub-loc1^1599102239212^2
Pinpoint-SpanID: -1471803627387986967
Pinpoint-pSpanID: 6902915841584694206
Pinpoint-Flags: 0
Pinpoint-pAppName: consumer-verify3
Pinpoint-pAppType: 1210
Pinpoint-Host: verify.example.com

All three requests carry the same Pinpoint-pSpanID but different Pinpoint-pAppName values. The Pinpoint-pSpanID values should be different.
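A small sketch of the check described here, using only the header values listed above. It groups the requests by Pinpoint-pSpanID and reports any parent span ID claimed by more than one parent application. The function name and the data layout are illustrative assumptions, not part of Pinpoint.

```python
from collections import defaultdict

# Header values copied from the three requests above.
REQUESTS = [
    {"Pinpoint-SpanID": "109811431385031189",
     "Pinpoint-pSpanID": "6902915841584694206",
     "Pinpoint-pAppName": "consumer-verify1"},
    {"Pinpoint-SpanID": "4440757486945348215",
     "Pinpoint-pSpanID": "6902915841584694206",
     "Pinpoint-pAppName": "consumer-verify2"},
    {"Pinpoint-SpanID": "-1471803627387986967",
     "Pinpoint-pSpanID": "6902915841584694206",
     "Pinpoint-pAppName": "consumer-verify3"},
]


def parents_shared_by_multiple_apps(requests):
    """Return parent span IDs that appear with more than one parent app name.

    Hypothetical helper for illustration only.
    """
    apps_by_parent = defaultdict(set)
    for headers in requests:
        apps_by_parent[headers["Pinpoint-pSpanID"]].add(headers["Pinpoint-pAppName"])
    return {pid: sorted(apps) for pid, apps in apps_by_parent.items() if len(apps) > 1}


conflicts = parents_shared_by_multiple_apps(REQUESTS)
print(conflicts)
```

With the headers from this report, the single parent span ID is reported together with all three consumer app names.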

@yjqg6666 yjqg6666 changed the title Kafka plugin, multiple consumer issue, UI related. Kafka plugin, multiple consumer issue, UI/spanId related. Sep 3, 2020
@yjqg6666
Contributor Author

yjqg6666 commented Sep 3, 2020

The main server map shows this issue even when there is only one consumer.
image

@yjqg6666
Contributor Author

yjqg6666 commented Sep 3, 2020

The web side logged the following:

2020-09-03 13:29:40 WARN one to N replaced. node:verify-pub(STAND_ALONE:1000)->host:kafka.example.com:9092 accept:[AcceptApplication{host='kafka.example.com:9092', application=consumer-verify1(SPRING_BOOT:1210)}, AcceptApplication{host='kafka.example.com:9092', application=kafka.example.com:9092(KAFKA_CLIENT:8660)}]
2020-09-03 13:29:40 ERROR targetLinkData not found findLinkKey:LinkKey{fromApplication='verify-pub', fromServiceType=STAND_ALONE, toApplication='consumer-verify1', toServiceType=SPRING_BOOT, hash=870137464}

@yjqg6666
Contributor Author

yjqg6666 commented Sep 3, 2020

The config for the kafka plugin:

profiler.kafka.producer.enable=true
profiler.kafka.consumer.enable=true
profiler.springkafka.consumer.enable=true
profiler.kafka.consumer.entryPoint=

@koo-taejin
Member

Hi @yjqg6666

My guess is that this happens because the queue is a virtual node.
When the server map is drawn, the application after the queue is expected to be included: because the queue is a virtual node, it is not counted as depth.
In this case, since the links are limited by the depth, I assume they are not drawn together and only the nodes without connections are shown.

This is only my guess; the exact problem needs analysis.

thanks :)

@yjqg6666
Contributor Author

yjqg6666 commented Sep 3, 2020

@koo-taejin Thanks for the quick reply. How about the second one? The pSpanID probably comes from the Kafka header and is used by three different applications. I think it's a hard issue to solve. How about replacing them with new span IDs on the web side?

@koo-taejin
Member

@yjqg6666
The queue type is a little different from the original design, and there are some problems in processing it.
I will look into this one as well.

Thanks for reporting.

@yjqg6666
Contributor Author

yjqg6666 commented Sep 7, 2020

@koo-taejin Should I keep this issue open or close it?

@koo-taejin
Member

@yjqg6666
Please keep this issue open.

thanks :)

@yjqg6666
Contributor Author

yjqg6666 commented Sep 7, 2020

@koo-taejin
I just found a workaround. In the Kafka plugin, when it consumes a record, I save the Pinpoint-SpanID from the ConsumerRecord header as an annotation and generate a new span ID for the consumer. On the web side, when the call tree is constructed and the service type of a span is a queue (I only check whether it's Kafka), I restore the span ID from the previously set annotation value. It works in the call tree and the server map. It's not a good way, but hopefully it helps.

Hopefully there will be a better official solution for the queue type (one-to-many).
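The workaround described above can be sketched as a toy model. This is not the code from the PR and not Pinpoint's API: the `Span` class, the `QUEUE_SERVICE_TYPE` label, the `ORIGINAL_SPAN_ID` annotation key, and both function names are hypothetical, and the header span ID is assumed to be the value that later shows up as Pinpoint-pSpanID in the report.

```python
import random
from dataclasses import dataclass, field, replace

QUEUE_SERVICE_TYPE = "KAFKA"         # hypothetical service type label
ORIGINAL_SPAN_ID = "originalSpanId"  # hypothetical annotation key


@dataclass(frozen=True)
class Span:
    span_id: int
    app_name: str
    service_type: str
    annotations: dict = field(default_factory=dict)


def consume(header_span_id, app_name, rng):
    """Agent side: keep the header's span ID as an annotation and use a fresh ID."""
    fresh_id = rng.randrange(-(2 ** 63), 2 ** 63)  # signed 64-bit range
    return Span(fresh_id, app_name, QUEUE_SERVICE_TYPE,
                {ORIGINAL_SPAN_ID: header_span_id})


def restore_for_call_tree(span):
    """Web side: for queue spans, restore the span ID saved in the annotation."""
    original = span.annotations.get(ORIGINAL_SPAN_ID)
    if span.service_type == QUEUE_SERVICE_TYPE and original is not None:
        return replace(span, span_id=original)
    return span


# Assumed to be the span ID carried in the Kafka record header.
HEADER_SPAN_ID = 6902915841584694206

rng = random.Random(7212)
consumers = [consume(HEADER_SPAN_ID, f"consumer-verify{i}", rng) for i in (1, 2, 3)]
restored = [restore_for_call_tree(s) for s in consumers]
```

In this model each consumer span gets its own ID, so requests sent from different consumers no longer share a parent span ID, while the restored view still lines up with the ID the producer put in the header.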

@koo-taejin
Member

@yjqg6666

Thank you so much for the great idea.

If possible, could you send your work as a PR?
I will test it and see whether there is anything I need to take care of.

thanks :)

@yjqg6666
Contributor Author

yjqg6666 commented Sep 8, 2020

@koo-taejin I just sent a PR.

@yjqg6666
Contributor Author

yjqg6666 commented Sep 8, 2020

In the PR, the web side contains some Kafka-specific code, so it's not a good solution. There should be a better, more general solution for queues.

@koo-taejin
Member

@yjqg6666

Here is the progress on this issue.

I am working on problem 1.

The cause seems to be a wrong EndPoint.
The Kafka client acts like a server, but it is a client.
Besides, the Kafka client keeps everything isolated, which makes this information difficult to obtain.
So it is difficult to get the EndPoint.

The problem seems to come from trying to handle things without the EndPoint, because getting the correct EndPoint reliably was too difficult.
I have been digging into the Kafka structure to solve it.

I am thinking of getting the EndPoint from the Kafka Selector.

If I find a good solution, I will commit it.

thanks :)

@koo-taejin
Member

@yjqg6666

I expect issue 1 to be solved by #7283.

Could you check whether it is solved?

thanks :)

@yjqg6666
Contributor Author

@koo-taejin I am building the master now and will check and update soon.

@yjqg6666
Contributor Author

@koo-taejin Checked and confirmed that the first issue is fixed. Thumbs up!

The topic-publish app:

image

The topic-consume app:

image

The topic-publish app(inbound-1 and outbound-2):

image

@koo-taejin
Member

@yjqg6666

I am glad to hear that.
I will start working on issue 2 now.

If I make any progress, I will share it with you.

thanks :)

@koo-taejin
Member

@yjqg6666

Can you check issue 2 using the master branch?

Based on my analysis, the callerHost information should have changed because of the changed endPoint, so it is expected to display correctly.

Please share the results when you have them.

thanks :)

@yjqg6666
Contributor Author

checking.

@yjqg6666
Contributor Author

@koo-taejin It's not working.

It should be:

image

With master I still get this one:
image

@koo-taejin
Member

@yjqg6666
Does the above appear in the main server map rather than the server map in CallStack?
I thought the main server map would be fixed, because it is drawn based on the caller and callee tables.

@koo-taejin
Member

@yjqg6666
When I reproduce the situation, it seems to work well.
Is there anything in your setup that differs from what I built?

  • servermap in main
    my2

  • servermap in callstack
    my1

@yjqg6666
Contributor Author

yjqg6666 commented Sep 23, 2020

@koo-taejin It's fixed in the main server map. Now the issue only exists in the server map for a single transaction, and only if the consumer apps both/all call the same other app.
In your case, tj-testweb2 and tj-testweb3 should both call tj-testweb4.

@yjqg6666
Contributor Author

This is the main server map:
image

@yjqg6666
Contributor Author

In your case tj-testweb2 does not call tj-testweb4, so the issue is not visible. If it did, the call count between tj-testweb2/tj-testweb3 and tj-testweb4 should be 2 instead of 1.
image
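To make the expected numbers concrete, here is a toy count for one transaction in which both consumers call the same app. The call list and the parent span ID `1001` are made up for illustration. The last function shows one conceivable way a count of 1 could arise (collapsing calls that share a parent span ID); the thread does not state that this is the actual cause in Pinpoint.

```python
from collections import Counter

# Hypothetical calls in one transaction: (caller app, callee app, parent span ID).
CALLS = [
    ("tj-testweb2", "tj-testweb4", 1001),
    ("tj-testweb3", "tj-testweb4", 1001),
]


def link_counts(calls):
    """Expected per-link counts: one count per (caller, callee) pair."""
    return Counter((caller, callee) for caller, callee, _ in calls)


def calls_into(calls, callee):
    """Expected total number of calls arriving at the callee."""
    return sum(1 for _, target, _ in calls if target == callee)


def calls_into_dedup_by_parent(calls, callee):
    """Illustrative only: counting distinct parent span IDs undercounts."""
    return len({parent for _, target, parent in calls if target == callee})


expected_links = link_counts(CALLS)
expected_total = calls_into(CALLS, "tj-testweb4")
collapsed_total = calls_into_dedup_by_parent(CALLS, "tj-testweb4")
print(expected_links, expected_total, collapsed_total)
```

Each consumer link carries a count of 1 and the callee receives 2 calls in total, whereas the collapsed count is 1.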

@koo-taejin
Member

@yjqg6666

I understand what you mean now.
Thanks for your detailed explanation.

@koo-taejin
Member

koo-taejin commented Sep 24, 2020

@yjqg6666

Here is the progress on issue 2.

I have found the cause of this issue and a way to solve it.
I am checking whether this solution causes any other problems.
Once I am sure that it does not, I will commit it.

thanks :)

@koo-taejin
Member

@yjqg6666

I expect issue 2 to be solved by #7290.

Could you check whether it is solved?

thanks :)

@yjqg6666
Contributor Author

checking.

@yjqg6666
Contributor Author

@koo-taejin Now it works like a charm.

@koo-taejin koo-taejin self-assigned this Sep 25, 2020
@koo-taejin koo-taejin added this to the 2.1.1 milestone Sep 25, 2020
@koo-taejin
Member

@yjqg6666
I am glad to hear that.

Please close this issue if you have no further problems.

thanks :)
