Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix OTel bridge behavior with multiple classloaders + improved activation #2735

Merged
merged 22 commits into from
Sep 13, 2022

Conversation

SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Aug 8, 2022

Summary

When there are multiple external plugins using OTel API, or there are multiple OTel APIs that are instrumented in distinct classloaders (CL), each copy is instrumented within an isolated CL, which makes all those calls to OTel APIs unable to properly work together in the OTel bridge as they can't rely on type equality due to having isolated CLs.

For example, as it was reported in the forum, creating a transaction with an external plugin and creating spans with another external plugin would make using both not work as expected and unable to properly create child spans when the transaction is active.

On top of that, the "span upgrade" process that is currently used when wrapping Elastic spans to make them visible to OTel is flawed:

  • The active Elastic span is deactivated, then wrapped for OTel, wrapped is then activated in place
  • as a result the original span activation is not preserved and may lead to activation leaks: the original span is no longer active which is the opposite of what the plugin that created it might expect, and the wrapped span won't be deactivated.

Solution proposal

  • Make Elastic spans (or rather implementations of ElasticContext) able to store wrapped objects with a Map<Class<?>,ElasticContet<?>> with wrapped object type as the key, and wrapped object as value.
  • Make the "upgrade process" only create a wrapper when none is registered, this should preserve the Elastic span activation workflow.
  • For allocation, we expect at most one entry per copy of the OTel API, which would be at least the number of external plugins so only a few entries in most cases.

Checklist

  • validate implementation manually with test application
  • adding integration tests to cover this properly (might be optional depending on complexity/cost), ideally testing that all OTel APIs are instrumented and allow to see each-other spans transparently.
  • This is a bugfix

@apmmachine
Copy link
Contributor

apmmachine commented Aug 8, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-09-13T08:02:26.140+0000

  • Duration: 57 min 23 sec

Test stats 🧪

Test Results
Failed 0
Passed 3151
Skipped 36
Total 3187

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@SylvainJuge SylvainJuge added the bug Bugs label Aug 29, 2022
@SylvainJuge SylvainJuge self-assigned this Aug 29, 2022
@SylvainJuge SylvainJuge changed the title [debug] investigate otel plugin issue Fix OTel bridge behavior with multiple classloaders + improved activation Aug 29, 2022
@SylvainJuge SylvainJuge added the ci:agent-integration Enables agent integration tests in build pipeline label Aug 30, 2022
List<String> logLines = Arrays.asList(app.getLogs().split("\n"));
assertThat(logLines).isNotEmpty();

assertThat(logLines).containsExactly(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for review] this is a very simple way to validate proper plugin execution by relying on the fact that instrumentation plugin will print to output, proper parsing of the generated event could added later with a mock server.

@SylvainJuge SylvainJuge marked this pull request as ready for review September 9, 2022 11:42
@github-actions
Copy link

github-actions bot commented Sep 9, 2022

/test

Copy link
Member

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if one OTel plugin wants to attach a context that was created by another plugin?
I think the way co.elastic.apm.agent.opentelemetry.context.OTelContextStorage#attach is implemented doesn't support that atm.

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for walking me through it @SylvainJuge 🙏

@SylvainJuge SylvainJuge enabled auto-merge (squash) September 13, 2022 08:02
@SylvainJuge SylvainJuge merged commit d9370ea into elastic:main Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java bug Bugs ci:agent-integration Enables agent integration tests in build pipeline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants