Java Instrumentation Techniques

Most users of tracing systems won't code their own instrumentation; rather, they employ frameworks that are already traced, or libraries that can configure tracing. This is an advanced guide on how to code instrumentation for the various practices you'll encounter, including application frameworks. It covers span lifecycle (timing and data collection) and scoping (making the current span visible to downstream code, or across threads) concerns, highlighted by the nuances of Java and how it is used in practice.

Those using Brave instrumentation will inherit the practices mentioned here. However, there are other instrumentation libraries, and Brave itself needs maintenance. We've written this document to reduce reliance on individuals' experience, and to help others who want to become instrumentation authors.

This doc is never complete. Click Watch for updates! If you like this doc, star our repository, as the volunteers behind Zipkin love seeing stars.

Tracing multiple versions of an interface with one instrumentation class

Libraries that use semantic versioning may add functionality over time. This gets tricky for interfaces defined before Java 8, as method additions can break compilation. One might be tempted to make an instrumentation library for each deviation, but that adds a lot of packaging complexity. You might also be tempted to use implicit instrumentation instead. That may be the right choice, but first consider simply not using the Override annotation. Here's the idea in practice.

For example, Kafka added a method in a later version. You might think to just override that method to create a tracing decorator.

final class TracingConsumer<K, V> implements Consumer<K, V> {
 --snip--

  // This method existed for a long time
  @Override
  public ConsumerRecords<K, V> poll(long timeout) {
 --snip--

  // This method was recently added
  @Override
  public void subscribe(Pattern pattern) {

While your instrumentation project might compile against the latest version, compiling against a lower version of the library will fail, even if you don't use that method.

instrumentation-kafka-clients: Compilation failure
[ERROR] /Users/acole/oss/brave/instrumentation/kafka-clients/src/main/java/brave/kafka/clients/TracingConsumer.java:[32,7] brave.kafka.clients.TracingConsumer is not abstract and does not override abstract method subscribe(java.util.regex.Pattern) in org.apache.kafka.clients.consumer.Consumer

The trick here is to write instrumentation against the latest version of the library, but not add Override annotations to methods added later. This allows things to compile and also work against lower versions of the library (which will simply ignore that "extra" method).

final class TracingConsumer<K, V> implements Consumer<K, V> {
 --snip--

  @Override
  public ConsumerRecords<K, V> poll(long timeout) {
 --snip--

  // No @Override annotation, to avoid a compilation error against versions < 1.0
  public void subscribe(Pattern pattern) {

When to use a forwarding type vs a proxy?

A forwarding type seems like a great way to keep strong types and avoid the expense of reflective calls. For example, a tracing decorator forwards to a delegate, and can be substituted for it at call sites.

Implementing a completion handler explicitly might look like this:

final class SpanCompletionListener implements CompletionListener {
 --snip--

  @Override
  public void onCompletion(Message message) {
    try (Scope ws = current.maybeScope(span.context())) {
      delegate.onCompletion(message);
    } catch (RuntimeException | Error e) {
      span.error(e); // always catch unhandled errors
      throw e;
    } finally {
      span.finish();
    }
  }
}

The following considerations complicate this:

  • When libraries you need to intercept employ multiple inheritance.
    • Ex: JMS implementations routinely implement several interfaces on a single type. Unless you also implement all of those interfaces, you can mask functionality.
  • Call sites that accept a general type, but also handle specific concrete types.
    • Ex: Spring's BeanPostProcessor allows code to modify or replace a provided bean. Some call sites expect the bean to be a concrete type and result in a ClassCastException if it is not. One example is DataSourceJmxConfiguration, which special-cases Hikari's DataSource.
  • Supporting multiple incompatible major versions
    • It can seem tempting to address all versions with a single type. However, if a type changes incompatibly due to a major version change, explicit trace handlers may need to be in separate artifacts.
  • Libraries that break API routinely
    • Some libraries don't make API compatibility guarantees, or break those guarantees. For example, a Java interface could remove, add, or rename a method or protected field in a minor or even patch release.

An indirect approach is to use proxies, AOP or bytecode manipulation to achieve the goal. Explaining the pros and cons between these approaches is better for a different document. Here's an example of using AOP for the same operation as above:

final class SpanCompletionMethodInterceptor implements MethodInterceptor {
 --snip--

  @Override
  public Object invoke(MethodInvocation methodInvocation) throws Throwable {
    try (Scope ws = current.maybeScope(span.context())) {
      return methodInvocation.proceed();
    } catch (Throwable t) {
      span.error(t); // always catch unhandled errors
      throw t;
    } finally {
      span.finish();
    }
  }
}

The indirect approach is by definition more insulated from library code, and has some general complications:

  • Usually implies adding a library (such as AspectJ) or changing the bootstrap (like adding a bytecode manipulation agent)
    • A typical instrumentation goal is to add as few extra dependencies to an app as possible. Also, some AOP libraries can conflict with user code.
  • While it is possible to use reflection as opposed to AOP or a bytecode manipulation agent, overhead is often a problem.
    • Reflective calls are inherently more expensive, and efficient use of reflection is more advanced.
  • Indirect approaches can often "miss" silently, especially when applied to libraries that change.
    • Indirect code is often applied at explicit join points. This implies maintenance to ensure these join points are not invalidated by changes in the subject library.
  • Indirect approaches can be harder to understand, and may be dismissed as magic.
    • IDE mechanisms such as reference searching often don't work with tools like method interceptors. Also, stack traces can be less intuitive, as call stacks for some styles of indirect instrumentation can be complex.

With the above in mind, there are still certain patterns or call sites which lead you towards one approach or another. The sections below inventory some considerations which might steer you towards a solution.

Tracing Multiple Inheritance

If a type you need to wrap has multiple inheritance, it can still be OK for the wrapper to have the same multiple inheritance. This can work in JMS, for example, as there are a limited number of combinations. The tracing wrapper will need to do some extra checking at runtime. For example, if your tracing wrapper implements both the queue and topic connection factory interfaces, it can be substituted for anything that is either type or both. However, you should check at runtime whether or not the delegate can accept the forwarded methods. When it can't, you might throw an IllegalStateException.
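
For illustration, here is a minimal sketch of such a wrapper for the JMS connection factory interfaces. The class name and the wrap(...) helper are illustrative, not any particular library's implementation:

final class TracingConnectionFactory implements QueueConnectionFactory, TopicConnectionFactory {
  final ConnectionFactory delegate; // may implement one of the interfaces, or both

 --snip--

  @Override
  public QueueConnection createQueueConnection() throws JMSException {
    // Runtime check: the delegate might only be a TopicConnectionFactory
    if (!(delegate instanceof QueueConnectionFactory)) {
      throw new IllegalStateException(delegate + " is not a QueueConnectionFactory");
    }
    // wrap(...) stands in for returning a tracing decorator of the connection
    return wrap(((QueueConnectionFactory) delegate).createQueueConnection());
  }

  @Override
  public TopicConnection createTopicConnection() throws JMSException {
    if (!(delegate instanceof TopicConnectionFactory)) {
      throw new IllegalStateException(delegate + " is not a TopicConnectionFactory");
    }
    return wrap(((TopicConnectionFactory) delegate).createTopicConnection());
  }

 --snip--
}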

Tracing Concrete Types

If a type, such as a DataSource, needs to be traced and needs to match the same concrete type as the delegate, there are a few options: bytecode manipulation, AOP and dynamic proxies. For example, you can make a tracing HikariDataSource by wrapping it with an interceptor using Spring's ProxyFactory. This approach is used in https://github.com/ttddyy/datasource-proxy
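
As a sketch of the proxy route (assuming spring-aop and HikariCP are on the classpath), Spring's ProxyFactory can produce a subclass proxy that is still a HikariDataSource, so call sites that cast to the concrete type keep working:

HikariDataSource dataSource = new HikariDataSource();

ProxyFactory proxyFactory = new ProxyFactory(dataSource);
proxyFactory.setProxyTargetClass(true); // subclass proxy: the result is still a HikariDataSource
proxyFactory.addAdvice((MethodInterceptor) invocation -> {
  // tracing logic goes here, like the SpanCompletionMethodInterceptor example earlier
  return invocation.proceed();
});

HikariDataSource traced = (HikariDataSource) proxyFactory.getProxy();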

Tracing Unstable Libraries

Libraries that break API can be risky to trace. For example, they might expose a forwarding helper, but rename a protected field supplied for that helper (even for good reason). If you used that field in your tracing decorator, it would then be pinned to the last version in which the field had that name. Sometimes libraries do this by mistake and will revert things like field renames. Sometimes these changes are intentional, correcting typos or similar. It is important to let projects know when their API drift impacts you, as sometimes they aren't aware. From an instrumentation code point of view, the impact can range from reduced use of library helpers to avoiding the helpers completely in favor of other approaches such as proxies.
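
Here is a purely hypothetical illustration (none of these types come from a real library) of how a field rename pins instrumentation that depends on a forwarding helper:

// The library's forwarding helper, as published in version 1.2
public abstract class ForwardingClient implements Client {
  protected final Client delegate; // version 1.3 renames this field to "target"
 --snip--
}

// Instrumentation that reads the protected "delegate" field compiles against 1.2,
// but breaks against 1.3 unless it stops relying on the helper (for example, by proxying).
final class TracingClient extends ForwardingClient {
 --snip--
}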

Source compatibility hazards when methods accept functional interfaces

When creating instrumentation code, it is important to be aware of source compatibility when incrementally adding functionality. This issue can be complex with methods that accept functional interfaces.

For example, consider this functional interface:

public interface ProbabilityOfMethod<M> {
  /** Returns null if there's no configured sample probability of this method */
  @Nullable Float get(M method);
}

You design a sampler that uses this, and accepts it in a factory method.

public static <M> DeclarativeSampler<M> create(ProbabilityOfMethod<M> probabilityOfMethod) {

Then, someone adds an annotation that uses it.

@Retention(RetentionPolicy.RUNTIME) public @interface Traced {
  float sampleProbability() default 1.0f;
}

Neat, as they can implement that sampler with a method reference like this:

DeclarativeSampler<Traced> declarativeSampler = DeclarativeSampler.create(Traced::sampleProbability);

Later, they notice they want to add a condition to it. After updating the annotation, they change that call site to use a lambda instead:

DeclarativeSampler<Traced> declarativeSampler =
    DeclarativeSampler.create(t -> t.enabled() ? t.sampleProbability() : null);

Now, let's say the instrumentation authors want to support a different function, for example an integer-based rate. They add an interface similar to the probability one:

public interface ProbabilityOfMethod<M> {
  /** Returns null if there's no configured sample probability of this method */
  @Nullable Float get(M method);
}

public interface RateOfMethod<M> {
  /** Returns null if there's no configured sample rate (in traces per second) of this method */
  @Nullable Integer get(M method);
}

At first thought, since these are different types, it may seem ok to just overload the factory method like so:

public static <M> DeclarativeSampler<M> create(ProbabilityOfMethod<M> probabilityOfMethod) {
--snip--
public static <M> DeclarativeSampler<M> create(RateOfMethod<M> rateOfMethod) {

Everything would compile fine in the instrumentation codebase, API compatibility trackers would show no problem, and this would probably make it into a release. Sometime later, though, the end user would notice their code doesn't compile anymore. The lambda below creates an ambiguity the compiler can't resolve, as it could target either overload.

DeclarativeSampler<Traced> declarativeSampler =
    DeclarativeSampler.create(t -> t.enabled() ? t.sampleProbability() : null);

So, regardless of what API compatibility reports say, the user experiences a source compatibility break due to the constraints of compile-time overload resolution.

How to prevent this

When designing APIs, think of functional interfaces like you do generic collections, even if the compiler won't help you. Basically, you need to differentiate on method name. This means avoiding functional interfaces in overloaded factory methods unless you are certain a different choice won't be needed later. The most defensive code uses method names that describe the functional interface.

In the above example, instead of:

public static <M> Builder<M> create(ProbabilityOfMethod<M> probabilityOfMethod) {

Use a builder, with a method named after the parameter:

public Builder probabilityOfMethod(ProbabilityOfMethod<M> probabilityOfMethod) {

This will make such a source compatibility problem improbable.
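
For illustration, usage might look like the following. The newBuilder and build entry points here are assumptions about the builder's shape, not the exact API:

DeclarativeSampler<Traced> declarativeSampler = DeclarativeSampler.<Traced>newBuilder()
    .probabilityOfMethod(t -> t.enabled() ? t.sampleProbability() : null)
    .build();

Because the functional interface is now selected by the builder method's name, adding rateOfMethod later cannot make this call site ambiguous.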
