Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hypertrace config is being loaded twice #223

Closed
pavolloffay opened this issue Jan 11, 2021 · 4 comments · Fixed by #293
Closed

Hypertrace config is being loaded twice #223

pavolloffay opened this issue Jan 11, 2021 · 4 comments · Fixed by #293
Labels
area:config bug Something isn't working

Comments

@pavolloffay
Copy link
Member

The config file is being loaded twice. The config should live in the bootstrap classloader and be loaded only once.

~/projects/demo-apps/vertx-examples/web-examples(4.x) » JAVA_TOOL_OPTIONS=-javaagent:/Users/ploffay/Downloads/hypertrace-agent-all.jar HT_SERVICE_NAME=vertx-rest-demo vertx run src/main/java/io/vertx/example/web/rest/SimpleREST.java                        130 ↵ ploffay@Pavols-MBP
Picked up JAVA_TOOL_OPTIONS: -javaagent:/Users/ploffay/Downloads/hypertrace-agent-all.jar
[main] INFO org.hypertrace.agent.core.config.HypertraceConfig - Config loaded: {"serviceName":"vertx-rest-demo","reporting":{"endpoint":"http://localhost:9411/api/v2/spans","opa":{"endpoint":"http://opa.traceableai:8181/","pollPeriodSeconds":30}},"dataCapture":{"httpHeaders":{"request":true,"response":true},"httpBody":{"request":true,"response":true},"rpcMetadata":{"request":true,"response":true},"rpcBody":{"request":true,"response":true},"bodyMaxSizeBytes":131072},"propagationFormats":["TRACECONTEXT"]}
[opentelemetry.auto.trace 2021-01-11 11:07:42:881 +0100] [main] INFO org.hypertrace.agent.core.config.HypertraceConfig - Config loaded: {"serviceName":"vertx-rest-demo","reporting":{"endpoint":"http://localhost:9411/api/v2/spans","opa":{"endpoint":"http://opa.traceableai:8181/","pollPeriodSeconds":30}},"dataCapture":{"httpHeaders":{"request":true,"response":true},"httpBody":{"request":true,"response":true},"rpcMetadata":{"request":true,"response":true},"rpcBody":{"request":true,"response":true},"bodyMaxSizeBytes":131072},"propagationFormats":["TRACECONTEXT"]}
@pavolloffay pavolloffay added bug Something isn't working area:config labels Jan 11, 2021
@pavolloffay pavolloffay mentioned this issue Jan 11, 2021
15 tasks
@ryandens
Copy link
Member

Hey @pavolloffay 👋 If you're looking for someone to take on this issue, I'd be happy to help!

It looks to me like the bug is with the implementation of double-checked locking in the HypertraceConfig.get() method. The implementation here reminded me of an example described in this blog post, The "Double-Checked Locking Is Broken" Declaration.

The example they give in that article is very similar to the HypertraceConfig.get() implementation

// (Still) Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo { 
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) 
      synchronized(this) {
        if (helper == null) 
          helper = new Helper();
      }    
    return helper;
    }
  // other functions and members...
  }

The above case is not a valid implementation of double-checking locking because:

...reason it doesn't work it that the writes that initialize the Helper object and the write to the helper field can be done or perceived out of order. Thus, a thread which invokes getHelper() could see a non-null reference to a helper object, but see the default values for fields of the helper object, rather than the values set in the constructor.

Solutions

There's a number of solutions to this problem, I'm not positive which would be the best fit that lines up with the goals for the internal architecture of this project

  1. Avoid this problem by explicitly initializing the AgentConfig once and use dependency injection to provide it to the previous callers of HypertraceConfig.get(). This is probably the most "noisy" solution, and maybe not desirable depending on your long-term goals for the project
  2. Create a separate class called HypertraceConfigSingleton modeled off of the HelperSingleton from the double-checking locking blog
  3. Leverage the volatile keyword. Given the read-heavy nature of config objects, making the static AgentConfig instance a volatile field is probably not a great solution. However, we can leverage a trick to make the static field behave as volatile by updating a volatile boolean field whenever we write the static field as described here
  4. Leverage an existing implementation of double-checked locking, like Guava's NonSerializableMemoizingSupplier. Note that guava takes the approach described in solution 3 where a volatile boolean field is updated after the non-volatile value is written to. This guarantees that the readers and writers agree on whether or not the value is null. I noticed that guava is used in one other place in javaagent-core and it's the VisibleForTesting annotation which is only retained at the class-level by the compiler but not by the VM at runtime. Using Guava in the fashion I'm describing here would require making this class available at runtime, which would increase the size of the shaded JAR produced by this project.

Based on my understanding of the solutions and this project's constraints, I think the best solution would be to improve the double-checking locking implementation with the volatile keyword as described in solution 3, but curious to hear your thoughts! If you're not looking for contributions for this issue right now, I hope the analysis is helpful to whoever takes this on in the future!

@ryandens
Copy link
Member

I was curious to learn more about the internals of the Hypertrace javaagent, so I forked this repository and tried out a few of the approaches. I'll hold off on opening a PR until if/when a consensus is reached on the desired solution, but if it's helpful I have a branch leveraging the volatile approach described in solution 3: sample diff

@pavolloffay
Copy link
Member Author

pavolloffay commented Mar 22, 2021

In our case, I think the main issue was that the config object was loaded in app/system classloader, then classes were added into bootstrap classloader and config was loaded again from agent classloader.

The issue seems to be fixed now, at least I am not able to reproduce it again

JAVA_TOOL_OPTIONS=-javaagent:/Users/ploffay/Downloads/hypertrace-agent-all.jar HT_SERVICE_NAME=vertx-rest-demo vertx run src/main/java/io/vertx/example/web/rest/SimpleREST.java                              ploffay@Pavols-MBP
Picked up JAVA_TOOL_OPTIONS: -javaagent:/Users/ploffay/Downloads/hypertrace-agent-all.jar
[opentelemetry.auto.trace 2021-03-22 08:23:14:630 +0100] [main] INFO org.hypertrace.agent.core.config.HypertraceConfig - Config loaded: {"serviceName":"vertx-rest-demo","reporting":{"endpoint":"http://localhost:9411/api/v2/spans","opa":{"endpoint":"http://opa.traceableai:8181/","pollPeriodSeconds":30}},"dataCapture":{"httpHeaders":{"request":true,"response":true},"httpBody":{"request":true,"response":true},"rpcMetadata":{"request":true,"response":true},"rpcBody":{"request":true,"response":true},"bodyMaxSizeBytes":131072},"propagationFormats":["TRACECONTEXT"],"enabled":true,"javaagent":{}}
[opentelemetry.auto.trace 2021-03-22 08:23:14:650 +0100] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 0.10.3
[opentelemetry.auto.trace 2021-03-22 08:23:14:682 +0100] [main] WARN org.hypertrace.agent.otel.extensions.HypertraceResourceProvider - Failed to read container id, cgroups file does not exist: /proc/self/cgroup (No such file or directory)
[opentelemetry.auto.trace 2021-03-22 08:23:14:862 +0100] [main] WARN org.hypertrace.agent.otel.extensions.HypertraceResourceProvider - Failed to read container id, cgroups file does not exist: /proc/self/cgroup (No such file or directory)
Hypertrace agent started, version: 0.10.3
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/Users/ploffay/.sdkman/candidates/vertx/3.6.0/lib/groovy-all-2.4.15.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
warning: Implicitly compiled files were not subject to annotation processing.
  Use -proc:none to disable annotation processing or -implicit to specify a policy for implicit compilation.
Succeeded in deploying verticle
^C[opentelemetry.auto.trace 2021-03-22 08:23:38:106 +0100] [opentelemetry-exec-shutdown-hook] WARN io.opentelemetry.javaagent.tooling.CommonTaskExecutor - Periodic task scheduler is shutdown. Will not run: cleaner for {}

But I agree that we should use volatile. Would you like to submit a PR?

@ryandens
Copy link
Member

I also tried to reproduce with the Vertx sample app and wasn't able to, I wasn't sure if it was just a rare race condition or if it was architecture dependent. Either way, good chance that this was fixed upstream somewhere! Happy to open a PR for the small thread-safety issue 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:config bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants