Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-15156 Avoid to spam DEBUG log when OpenTelemetry is not found on the client #11272

Merged
merged 1 commit into from
Oct 2, 2023

Conversation

fax4ever
Copy link
Contributor

@fax4ever fax4ever commented Sep 7, 2023

@fax4ever fax4ever requested a review from pruivo September 7, 2023 09:32
@fax4ever fax4ever added the 14.0.x Annotate a PR with this label if you want it to be backported to the 14.0.x branch label Sep 7, 2023
@pruivo
Copy link
Member

pruivo commented Sep 7, 2023

I would probably move all the checks to TelemetryService and cache the instance to avoid instantiating a new one every time. And no more catches or null checks in the code.
Note: create() can be removed since the instance can be accessed directly.

public interface TelemetryService {

   TelemetryService INSTANCE = createInstance();

   /**
    * Try to create a {@link TelemetryService} instance.
    *
    * @throws ClassCastException if the OpenTelemetry API module is not in the classpath
    * @return a {@link TelemetryService} instance
    */
   static TelemetryService create() {
      return INSTANCE;
   }

   void injectSpanContext(HeaderParams header);

   private static TelemetryService createInstance() {
      try {
         Class.forName("io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator", false,
               TelemetryService.class.getClassLoader());
         return new TelemetryServiceImpl();
      } catch (ClassNotFoundException e) {
         // log warning that opentelemetry is not in the classpath
      }
      // return no-op implementation
      return header -> {
         //no-op
      };
   }

}

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 7, 2023

I would probably move all the checks to TelemetryService and cache the instance to avoid instantiating a new one every time. And no more catches or null checks in the code. Note: create() can be removed since the instance can be accessed directly.

public interface TelemetryService {

   TelemetryService INSTANCE = createInstance();

   /**
    * Try to create a {@link TelemetryService} instance.
    *
    * @throws ClassCastException if the OpenTelemetry API module is not in the classpath
    * @return a {@link TelemetryService} instance
    */
   static TelemetryService create() {
      return INSTANCE;
   }

   void injectSpanContext(HeaderParams header);

   private static TelemetryService createInstance() {
      try {
         Class.forName("io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator", false,
               TelemetryService.class.getClassLoader());
         return new TelemetryServiceImpl();
      } catch (ClassNotFoundException e) {
         // log warning that opentelemetry is not in the classpath
      }
      // return no-op implementation
      return header -> {
         //no-op
      };
   }

}

I don't know, since there is the case of cfg.tracingPropagationEnabled() == false.
In this case I think we can avoid to create it...

Copy link
Member

@wburns wburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some comments on the levels.

@@ -387,9 +387,9 @@ public interface Log extends BasicLogger {
@Message(value = "Native IOUring transport not available, using NIO instead: %s", id = 4108)
void ioUringNotAvailable(String cause);

@LogMessage(level = DEBUG)
@LogMessage(level = TRACE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEBUG for this is okay now that it is only logged once and without a stack trace.

@@ -399,4 +399,8 @@ public interface Log extends BasicLogger {
@Message(value = "OpenTelemetry API is present in the classpath, but the tracing propagation is not enabled. Client context tracing will not be propagated.", id = 4111)
void openTelemetryPropagationDisabled();

@LogMessage(level = DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be ERROR now? If this exception is encountered the user has open telemetry on the classpath and has explicitly configured trace propagation.

Copy link
Contributor Author

@fax4ever fax4ever Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right this error shouldn't happen, since we've already checked that the OpenTelemetry is present on the classpath.
It looks like a very runtime exception...

Looking at the code that is executed by the current OpenTelemetry version, it should be safe to call W3CTraceContextPropagator.getInstance().

Anyway we don't have control over this library, that's the reason I introduced the check.
In general implementing tracing I usually write code that is usually more defensive, since in any case tracing should not affect the main behavior of the application.

Anyway it is OK for me to remove this check.

Copy link
Member

@wburns wburns Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I am not saying to remove it. Whether we let it propagate or log the exception is up to you, you know more issues that may cause this to happen. I was only proposing it be higher priority log level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense

@@ -399,4 +399,8 @@ public interface Log extends BasicLogger {
@Message(value = "OpenTelemetry API is present in the classpath, but the tracing propagation is not enabled. Client context tracing will not be propagated.", id = 4111)
void openTelemetryPropagationDisabled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and openTelemetryPropagationEnabled may want to be TRACE since they are logged for every cache that is used when telemetry is on the classpath. DEBUG may be fine, just want to make sure this spot is considered for evaluation.

@pruivo
Copy link
Member

pruivo commented Sep 7, 2023

I don't know, since there is the case of cfg.tracingPropagationEnabled() == false. In this case I think we can avoid to create it...

What is your concern exactly?
You replace the TelemetryService field being allocated per operation (every Hot Rod operation has a field that is not required anymore) with a single lambda instantiation at startup.

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 7, 2023

I don't know, since there is the case of cfg.tracingPropagationEnabled() == false. In this case I think we can avoid to create it...

What is your concern exactly? You replace the TelemetryService field being allocated per operation (every Hot Rod operation has a field that is not required anymore) with a single lambda instantiation at startup.

I wanted to make the TelemetryService creation lazy. But since this instance is very simple, we can have it eager as well.

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 7, 2023

Thank you @pruivo and @wburns for the review.
I tried to apply your ideas here.
Let me know if it is fine.

@fax4ever
Copy link
Contributor Author

Hi @pruivo @wburns,

anything else we can improve/fix here?
Thanks

Copy link
Member

@pruivo pruivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Personally, if we trying to avoid spam debug messages, I would just remove them instead of changing them to trace. Just my 2cents.

@fax4ever
Copy link
Contributor Author

LGTM. Personally, if we trying to avoid spam debug messages, I would just remove them instead of changing them to trace. Just my 2cents.

You're right. We can expect we don't have the OpenTelemetry library on the client VM most of the time. With the last push I set the last debug log to trace.

When OpenTelemetry is not found on the client
Copy link
Member

@wburns wburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one last question though. Is there a reason the client/hotrod-client module logs if the telemetry API is present or not and client/hotrod doesn't? It looks like there is no way to disable or enable it per cache with the new module?

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 13, 2023

Thank for the review @wburns

Is there a reason the client/hotrod-client module logs if the telemetry API is present or not and client/hotrod doesn't?

I'm not really sure that the latter doesn't.

It looks like there is no way to disable or enable it per cache with the new module?

Enabling / disabling tracing on cache basis is part of Tracing 2.0 targeting Infinispan 15. I'm finishing these days https://github.com/fax4ever/infinispan/tree/ISPN-14974.

@wburns
Copy link
Member

wburns commented Sep 13, 2023

I'm not really sure that the latter doesn't.

I was taking that from the diff in that the following code only exists in client/hotrod-client module.

      if (cfg.tracingPropagationEnabled()) {
         telemetryService = TelemetryService.INSTANCE;
         log.openTelemetryPropagationEnabled();
      } else {
         log.openTelemetryPropagationDisabled();
      }

Enabling / disabling tracing on cache basis is part of Tracing 2.0 targeting Infinispan 15. I'm finishing these days https://github.com/fax4ever/infinispan/tree/ISPN-14974.

Sorry, didn't realize it was a global configuration not per cache.

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 13, 2023

On the client/hotrod in the init of CacheOperationsFactory I think we have:

TelemetryService telemetryService = TelemetryService.INSTANCE;

the first access to the interface should initialize the singleton if i don't get wrong:

TelemetryService INSTANCE = createInstance();

static TelemetryService createInstance() {
   try {
      Class.forName("io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator", false,
            TelemetryService.class.getClassLoader());
   } catch (ClassNotFoundException e) {
      log.noOpenTelemetryAPI();
      return null;
   }

   try {
      return new TelemetryServiceImpl();
   } catch (Throwable e) {
      log.errorCreatingPropagationContext(e);
      return null;
   }
}

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 13, 2023

Sorry @wburns I just realized that you asked for the enable / disable propagation log and not for the main log.
The property tracingPropagationEnabled is not currently handled by the new module.

@wburns wburns merged commit 024bcf2 into infinispan:main Oct 2, 2023
3 of 4 checks passed
@wburns
Copy link
Member

wburns commented Oct 2, 2023

Integrated into main, thanks @fax4ever !

@fax4ever
Copy link
Contributor Author

fax4ever commented Oct 2, 2023

Thank you @wburns!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
14.0.x Annotate a PR with this label if you want it to be backported to the 14.0.x branch
Projects
None yet
3 participants