Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Used BatchSpansProcessor instead of SimpleSpansProcessor #393

Merged
merged 12 commits into from
May 16, 2020

Conversation

RashmiRam
Copy link
Contributor

BatchSpansProcessor won't get blocked by the exporter.
Fixes #368

@trask This is the quick fix. Are we working on introducing the disruptor in the future? If that's the case, Can this just be a stop gap for now?

BatchSpansProcessor won't get blocked by the exporter.

Fixes open-telemetry#368

Signed-off-by: RashmiRam <rashmi.ramanathan@freshworks.com>
@RashmiRam
Copy link
Contributor Author

I signed it

@johnbley
Copy link
Member

johnbley commented May 8, 2020

Hi Rashmi -
Thanks for the patch! It looks like this is causing the smoke-tests to fail (you can click through the "details" to see the build logs) . You can reproduce this locally by running (e.g.) ./gradlew :smoke-tests:springboot:test. In SpringBootSmokeTest.groovy we use a logging exporter and a simple log reader to check that the spans get logged. Perhaps there's a clean way to explicitly forceFlush the batch or to special-case the logging processor to not be batched?

@RashmiRam
Copy link
Contributor Author

I was trying to run failure test alone as entire instrumentation test takes hours to run. I'll check that. Thanks for the help @johnbley

@RashmiRam
Copy link
Contributor Author

I fixed the springboot smoke tests. Now, I see random failures in instrumentation tests(cassandra, akka-hhtp) which are passing in my local. Any help would be much appreciated.

@iNikem
Copy link
Contributor

iNikem commented May 8, 2020

Yeah, our tests are quite flaky right now. We are working on it :) @trask can you please rerun the pipeline?

@iNikem
Copy link
Contributor

iNikem commented May 9, 2020

@RashmiRam I see that check job has failed in CI with code formatting violation. Can you please run ./gradlew googleJavaFormat as suggested by CONTRIBUTING.md?

@RashmiRam
Copy link
Contributor Author

RashmiRam commented May 9, 2020

I did @iNikem and ran ./gradlew auto-tooling:verifyGoogleJavaFormat to find any pending violations. It is passing. I am not sure why the tests in CI are failing

@iNikem
Copy link
Contributor

iNikem commented May 9, 2020

I have checked out your repo and tried to run ./gradlew auto-tooling:verifyGoogleJavaFormat:

The following files are not formatted properly:

/Users/nikem/Documents/workspace/mtp/opentelemetry-auto-instr-java/agent-tooling/src/main/java/io/opentelemetry/auto/tooling/TracerInstaller.java

Did you commit your changes?

@RashmiRam
Copy link
Contributor Author

Yes. I did This is the commit. c4383ba

@johnbley
Copy link
Member

Yes. I did This is the commit. c4383ba

I've seen googleJavaFormat's cache get confused at times; try adding some whitespace, saving, and then re-running it.

@RashmiRam
Copy link
Contributor Author

That helped @johnbley check job is passing now in CircleCI. Thanks a ton.

@@ -70,6 +70,11 @@ abstract class AbstractSmokeTest extends Specification {
processBuilder.environment().put("JAVA_HOME", System.getProperty("java.home"))
processBuilder.environment().put("DD_API_KEY", API_KEY)

// // Setting configuration variables of batch span processor through env vars
// // This config is to immediately flush a batch of 1 span with delay of 10ms
processBuilder.environment().put("OTEL_BSP_MAX_EXPORT_BATCH", "1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution a lot. Much cleaner than anything I had suggested.

@trask
Copy link
Member

trask commented May 11, 2020

@RashmiRam
Copy link
Contributor Author

@trask I have updated the timeout in both play & springbok smoke tests to 20s as 15s fails at times even in my local.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RashmiRam!

@trask
Copy link
Member

trask commented May 16, 2020

Hey @RashmiRam, sorry we haven't merged this yet. The play smoke test is still failing in CI even after I bumped the timeout to 30 seconds. I'm looking into it now.

@trask
Copy link
Member

trask commented May 16, 2020

The problem is that our smoke tests use the logging exporter and parse the log file which is not reliable under concurrent logging, which can happen now with the background BatchSpanProcessor thread.

Here's an example of the problem, where LOGGED_SPAN didn't end up on it's own line:

[opentelemetry.auto.trace 2020-05-14 22:26:38:428 +0000] [application-akka.actor.default-dispatcher-4] DEBUG io.opentelemetry.auto.bootstrap.instrumentation.java.concurrent.State - Failed to set parent span because another parent span is already set io.opentelemetry.auto.bootstrap.instrumentation.java.concurrent.State@448e2484: new: io.opentelemetry.sdk.trace.RecordEventsReadableSpan@3dc393cf, old: io.opentelemetry.sdk.trace.RecordEventsReadableSpan@3dc393cfLOGGED_SPAN GET /welcome 8c4cd5a61ad2794e

The real solution here is to improve our test harness, I made a note here: #298 (comment)

For now though, the debug logging Failed to set parent span that is causing the interference is unnecessarily verbose, so I fixed it. I had fixed it in #318, but I had reverted my change in #362 because of DataDog/dd-trace-java#1403 (review). I think re-applying #318 is good anyways, and it resolves this issue for now, until we improve the test harness.

@trask trask merged commit 29a18bd into open-telemetry:master May 16, 2020
@RashmiRam
Copy link
Contributor Author

RashmiRam commented May 17, 2020

Thank you so much for the detailed explanation on the test failures @trask and Thanks for the merge too. It would be great if you let me know about the release date for this too.

@trask
Copy link
Member

trask commented May 17, 2020

Releasing v0.3.0 tomorrow (Mon, May 18) 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't use synchronous span processor
4 participants