Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vertx 4.1 & Tracing rework #900

Merged
merged 16 commits into from
Jun 4, 2021

Conversation

slinkydeveloper
Copy link
Contributor

@slinkydeveloper slinkydeveloper commented May 17, 2021

Fixes #784

We're still missing eclipse-vertx/vertx-tracing#28, which is blocking us from using the new OpenTelemetry integration. We might also get rid of even more code if eclipse-vertx/vert.x#3903 gets resolved before 4.1

Proposed Changes

  • 🎁 Vert.x 4.1
  • 🧹 Removed our opentel integration and replaced with the brand new one provided by vert.x
  • 🎁 Added a generic tracing test to check if tracecontext is propagated properly, both in ordered and unordered case

Release Note

Bumped vert.x to 4.1

@slinkydeveloper slinkydeveloper requested review from a team as code owners May 17, 2021 10:17
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 17, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label May 17, 2021
@knative-prow-robot knative-prow-robot added area/data-plane needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 17, 2021
@slinkydeveloper slinkydeveloper requested a review from a team as a code owner May 18, 2021 12:55
@knative-prow-robot knative-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 18, 2021
@slinkydeveloper slinkydeveloper changed the title [WIP] Vertx 4.1 Vertx 4.1 May 24, 2021
@knative-prow-robot knative-prow-robot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 24, 2021
@codecov
Copy link

codecov bot commented May 24, 2021

Codecov Report

Merging #900 (f18c47c) into main (add346e) will decrease coverage by 0.24%.
The diff coverage is 69.44%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main     #900      +/-   ##
============================================
- Coverage     76.07%   75.82%   -0.25%     
+ Complexity      441      419      -22     
============================================
  Files            78       74       -4     
  Lines          3013     2924      -89     
  Branches        133      124       -9     
============================================
- Hits           2292     2217      -75     
+ Misses          559      555       -4     
+ Partials        162      152      -10     
Flag Coverage Δ
java-unittests 79.79% <69.44%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...knative/eventing/kafka/broker/dispatcher/Main.java 0.00% <0.00%> (ø)
...v/knative/eventing/kafka/broker/receiver/Main.java 0.00% <0.00%> (ø)
...ve/eventing/kafka/broker/core/tracing/Tracing.java 60.46% <52.17%> (+1.09%) ⬆️
...venting/kafka/broker/core/tracing/TracingSpan.java 90.90% <83.33%> (+18.68%) ⬆️
...oker/receiver/CloudEventRequestToRecordMapper.java 70.58% <83.33%> (+9.47%) ⬆️
...ting/kafka/broker/dispatcher/RecordDispatcher.java 93.33% <91.66%> (-0.34%) ⬇️
...ve/eventing/kafka/broker/core/metrics/Metrics.java 90.32% <100.00%> (+0.66%) ⬆️
...nting/kafka/broker/core/tracing/TracingConfig.java 86.95% <100.00%> (ø)
...oker/dispatcher/http/HttpConsumerRecordSender.java 60.00% <100.00%> (ø)
...r/dispatcher/http/HttpConsumerVerticleFactory.java 86.00% <100.00%> (+0.89%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add346e...f18c47c. Read the comment docs.

@slinkydeveloper
Copy link
Contributor Author

I'm gonna test locally, with the new amazing dev scripts 😍 , if tracing works properly

@slinkydeveloper
Copy link
Contributor Author

/hold

Tracing still doesn't work 😢

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 25, 2021
@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2021
@slinkydeveloper
Copy link
Contributor Author

Tracing fixed!

From up to bottom, traces matches the tests TracingHeadersUsingUnorderedDeliveryWithMultipleTriggers, TracingHeadersUsingUnorderedDelivery and TracingHeadersUsingOrderedDelivery

Screenshot_2021-05-26 Zipkin

Probably we should investigate why in the unordered case, the span is so long (sometimes it's not!), but I guess we can tackle it in a separate issue? I have the feeling this has something to do with the offset commit...

Fix #929

cc @pierDipi @devguyio @matzew

@slinkydeveloper
Copy link
Contributor Author

We still need to wait for the new vert.x 4.1 release before merging this one, so i keep the hold

@slinkydeveloper slinkydeveloper changed the title Vertx 4.1 Vertx 4.1 & Tracing rework May 26, 2021
@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented May 26, 2021

Still some spans, sometimes, does not appear in zipkin in some traces, but that includes also spans from source as well... I guess that's my zipkin env? Or some race?

Screenshot_2021-05-26 Zipkin(1)
Screenshot_2021-05-26 Zipkin(2)

Copy link
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

/lgtm
/hold for the question, unhold if that's not possible.

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
A new simple test for tracing
Updated THIRD-PARTY.txt

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
s
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
@slinkydeveloper
Copy link
Contributor Author

/unhold

Rebased, this is ready for merging. @matzew @pierDipi wanna check it out?

I plan to do a followup to further improve the tracing code with some refactoring/cleanup, add b3 headers support, but I want to keep it in another PR to avoid making this one way too big.

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2021
@slinkydeveloper slinkydeveloper requested review from pierDipi and removed request for fallback-notification-blocker June 3, 2021 12:23
@@ -52,19 +50,18 @@ public CloudEventRequestToRecordMapper(final Vertx vertx) {
}

if (logger.isDebugEnabled()) {
final var span = TracingSpan.getCurrent(vertx);
final var span = Span.fromContextOrNull(Context.current());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What current means here? Shouldn't we pull the span from the Vertx context?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a utility to get the current span? The idea being isolating OTEL code in one module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What current means here? Shouldn't we pull the span from the Vertx context?

Nope, I've integrated the vert.x context with the otel context apis https://github.com/eclipse-vertx/vertx-tracing/blob/master/vertx-opentelemetry/src/main/java/io/vertx/tracing/opentelemetry/VertxContextStorageProvider.java, so to retrieve spans we don't need to access the vert.x context manually.

Can we use a utility to get the current span? The idea being isolating OTEL code in one module.

TBH I think we don't need to hide the otel apis, they are straightforward and using them directly sounds reasonable to me...


final var span = getCurrent(vertx);
public static void decorateCurrentWithEvent(final CloudEvent event) {
Span span = Span.fromContextOrNull(Context.current());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Comment on lines +171 to +177
getOffsetManager(deliveryOrder, consumer, eventsSentCounter::increment),
ConsumerTracer.create(
((VertxInternal) vertx).tracer(),
new KafkaClientOptions()
.setConfig(consumerConfigs)
.setTracingPolicy(TracingPolicy.PROPAGATE)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we need this tracer, vertx Kafka already creates a tracer when we create a Kafka client at line 200, and it does a similar thing of https://github.com/knative-sandbox/eventing-kafka-broker/pull/900/files#diff-384d435208823cfbd29027fa5bba51ebbc6c3671ffecb12f0af13405ed9c11b1R138.
Can you add a comment on why we set tracing policy to ignore when we create a consumer and we create our own tracer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because it doesn't wait for our handler to finish before finishing the span and our handler is async?

Copy link
Contributor Author

@slinkydeveloper slinkydeveloper Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on why we set tracing policy to ignore when we create a consumer and we create our own tracer?

I was planning to clean up a bit that code in the next PR and explain why we do that and also "hide" that detail a bit, maybe behind an helper. In short, the reason is both the one you sad, which makes spans looks weird and sometimes even no-sense, and this one: #929

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried, there is not much I can hide, it will look weird anyway... I'll check out what can be done on vertx-kafka-client side to make this look better...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neverthless, I've made a PR to add some comments: #964

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2021
@knative-prow-robot knative-prow-robot merged commit 9b66aaa into knative-extensions:main Jun 4, 2021
@slinkydeveloper slinkydeveloper deleted the vertx_4_1 branch June 4, 2021 08:22
Leo6Leo pushed a commit to Leo6Leo/eventing-kafka-broker that referenced this pull request Nov 29, 2023
* Update reconciler-test dependency

* Increase poll.timeout from 4 to 8 minutes

Updating the dependency on reconciler-test brought some changes which
decreased the default timeout to 2 minutes. However, in CI sometimes the
timeout needs to be longer to properly delete resources after test.
Setting this explicitly to 8 minutes.

* Run make generate-release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/data-plane area/test cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vert.x 4.1 update
4 participants