
traceId, spanId not set in logs when using quartz and opentelemetry #28552

Closed
syr opened this issue Oct 12, 2022 · 45 comments · Fixed by #35989

Comments

@syr

syr commented Oct 12, 2022

Describe the bug

I use quarkus-quartz and quarkus-opentelemetry

I have defined a custom log format as in https://quarkus.io/guides/opentelemetry#create-the-configuration:

quarkus.log.console.format=%d{HH:mm:ss} %-5p traceId=%X{traceId}, parentId=%X{parentId}, spanId=%X{spanId}, sampled=%X{sampled} [%c{2.}] (%t) %s%e%n

Expected behavior

I expected traceId etc. to be filled (this only works for REST resources), see:

  00:16:43 INFO  traceId=7d79594584b49a7db88f47c75a0cd185, parentId=, spanId=cb9d17786f3ed0c0, sampled=true [or.ac.qu.TaskResource] (executor-thread-0) log test

Actual behavior

But for log output in a Quartz scheduled method, everything is empty:

00:16:45 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-3) log test

How to Reproduce?

https://github.com/syr/quartz-otel-logs is based on the latest quartz-quickstart code.

It only needs to be started; the scheduler runs every 10 seconds, printing log lines without traceId etc.

Calling http://localhost:8080/tasks shows that it works generally, but only for REST resources.

Output of uname -a or ver

No response

Output of java -version

openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8)
OpenJDK 64-Bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.13.2.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Maven 3.8.1

Additional information

No response

@syr syr added the kind/bug Something isn't working label Oct 12, 2022
@mkouba
Contributor

mkouba commented Oct 13, 2022

I believe that this works as designed. You can't expect a traceId to be set during execution of a scheduled method that has no HTTP request associated. In fact, a scheduled method cannot have an HTTP request associated because it represents a kind of entry point in your application - there is no external event that would trigger the execution of a scheduled method.

I'm going to close this issue. Feel free to reopen if the paragraph above does not make any sense.

@mkouba mkouba closed this as not planned Oct 13, 2022
@Ladicek
Contributor

Ladicek commented Oct 13, 2022

Technically, a scheduled method can call other external services etc., and so tracing may make sense. As you mention, a scheduled method is an entrypoint, so it deserves a fresh new trace/span, I'd say.

@mkouba
Contributor

mkouba commented Oct 13, 2022

Technically, a scheduled method can call other external services etc., and so tracing may make sense. As you mention, a scheduled method is an entrypoint, so it deserves a fresh new trace/span, I'd say.

I see. My point was that the tracing info should not be propagated but that was probably just a misunderstanding of the problem.

@mkouba mkouba reopened this Oct 13, 2022
@brunobat
Contributor

brunobat commented Oct 13, 2022

Tracing does make sense for Quartz jobs; this is not just for HTTP.
It would be great to understand the outcome of each job.
This requires two changes, though: one is to instrument jobs, the other is curating the MDC context with the right data.

@brunobat brunobat added this to Backlog in Observability roadmap [Deprecated] via automation Oct 13, 2022
@syr
Author

syr commented Oct 14, 2022

Thanks for adding the issue to the roadmap.
As a workaround I use the otel-agent (via JAVA_TOOL_OPTIONS) and set the MDC fields myself by putting MDC entries for traceId, spanId, and parentId from Span.current().getSpanContext().
Thus, all logs have the same traceId for each scheduled run.
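The workaround described above might look roughly like this (a minimal sketch, assuming the agent or quarkus-opentelemetry makes the current span available via `Span.current()`; it uses the `io.opentelemetry.api` and `org.jboss.logging.MDC` APIs, and the helper name is made up):

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import org.jboss.logging.MDC;

public class TraceMdcWorkaround {

    /**
     * Copy the current span's IDs into the MDC so the log format's
     * %X{traceId}/%X{spanId} placeholders are filled. parentId is not
     * exposed by the public SpanContext API, hence it stays empty.
     */
    public static void fillMdcFromCurrentSpan() {
        SpanContext ctx = Span.current().getSpanContext();
        if (ctx.isValid()) { // skip the all-zero invalid context
            MDC.put("traceId", ctx.getTraceId());
            MDC.put("spanId", ctx.getSpanId());
            MDC.put("sampled", String.valueOf(ctx.isSampled()));
        }
    }
}
```

Called at the top of the scheduled method, this makes the log lines of that run carry the span's IDs; without an active span it is a no-op.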

However, I get two problems:

  1. No idea where to get parentId from. As far as I have read, it is not possible to get it manually.
  2. Manual instrumentation with @WithSpan is not working/ignored. Should it generally be possible to mix auto-instrumentation (by the otel agent) and manual instrumentation? Updated reproducer project: https://github.com/syr/quartz-otel-logs/commit/4ccd464e8e216576070acfd63ff6bdf3084d1019

@brunobat
Contributor

  1. You don't really need the parent. The SpanId and the TraceId are enough to search on Jaeger or an equivalent tool.
  2. In Quarkus we don't recommend using the agent because of context propagation problems. The '@WithSpan' on a CDI bean should work with imperative code. If you can provide a reproducer it would help.

@syr
Author

syr commented Oct 14, 2022

https://github.com/syr/quartz-otel-logs/commit/4ccd464e8e216576070acfd63ff6bdf3084d1019

Without the agent I don't get a valid traceId, only initial values 0000000000000. So atm it's also only meant as a temporary workaround.

@brunobat
Contributor

We need to instrument Quartz with OpenTelemetry. Not done yet.
But if you call a CDI bean from quartz and use '@WithSpan' it should work.

@syr
Author

syr commented Oct 14, 2022

I moved the method with @WithSpan to a separate service class in https://github.com/syr/quartz-otel-logs/commit/57aef0684ec3cd944d3067b4d245203b2b299e67, but still the same spanId:

```
12:39:39 INFO  traceId=1646cfc30a48e24dfaa8a5e3f4cc4a75, parentId=unknown, spanId=5058966601badfb7, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-5) log test
12:39:39 INFO  traceId=1646cfc30a48e24dfaa8a5e3f4cc4a75, parentId=unknown, spanId=5058966601badfb7, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-5) log test from testMethod
```

@brunobat
Contributor

I think we'll need a reproducer project

@syr
Author

syr commented Oct 14, 2022

See link in my previous reply.

I also sent all traces to a local jaeger-aio where I saw in the jaeger UI that actually all spans were created. So it seems to me that maybe only MDC is not kept consistent. Will extend the reproducer project to verify..likely tomorrow.

@syr
Author

syr commented Oct 15, 2022

I added some log output that shows that spanId from spanContext vs. MDC is not consistent
https://github.com/syr/quartz-otel-logs

Correct: logs from the scheduled method (the spanId in the log entry is correct, but it is also set in the MDC directly beforehand)

14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) traceId: 59ec1645b4243fdf69e4dc8a61377bf3
14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) spanId: 46e90f4f12994226
14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) log test

IMO not correct: logs from the service method called by the scheduled method (the spanId in the log entries should be b6b70be685472933 as logged, because the method is annotated with @WithSpan)

14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) traceId in testMethod: 59ec1645b4243fdf69e4dc8a61377bf3
14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) spanId in testMethod: b6b70be685472933
14:01:26 INFO  traceId=59ec1645b4243fdf69e4dc8a61377bf3, parentId=unknown, spanId=46e90f4f12994226, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) log test in testMethod

Maybe also another issue due to using quarkus-opentelemetry together with opentelemetry agent.

@brunobat
Contributor

The MDC context needs to be cleared after the job is done, with MDC.remove(...) after:
https://github.com/syr/quartz-otel-logs/blob/2ad12e7efbbb1557a17c57dc305280a50a92a3d0/src/main/java/org/acme/quartz/TaskBean.java#L34

@syr
Author

syr commented Oct 17, 2022

The spanId in the log from testMethod remains unchanged when I do it like this:

  18:22:48 INFO  traceId=cbd5541755282dcac116f8dec98ac033, parentId=unknown, spanId=84d275bf0774f598, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) spanId: 84d275bf0774f598
  18:22:48 INFO  traceId=cbd5541755282dcac116f8dec98ac033, parentId=unknown, spanId=84d275bf0774f598, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) log test
  18:22:48 INFO  traceId=cbd5541755282dcac116f8dec98ac033, parentId=unknown, spanId=84d275bf0774f598, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) traceId in testMethod: cbd5541755282dcac116f8dec98ac033
  18:22:48 INFO  traceId=cbd5541755282dcac116f8dec98ac033, parentId=unknown, spanId=84d275bf0774f598, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) spanId in testMethod: f43a3652290c1da4

The only thing that worked, but is cumbersome, is to also set the MDC explicitly in org.acme.quartz.TaskService#testMethod, so again like:

```java
@WithSpan("custom span")
public void testMethod() {
    SpanContext spanContext = Span.current().getSpanContext();
    MDC.put("traceId", spanContext.getTraceId());
    MDC.put("spanId", spanContext.getSpanId());
    // ...
}
```

This leads to the correct/expected log fields with a new spanId and the same traceId:

  18:13:19 INFO  traceId=c99870cee4e1da346d9d73535db2ec5a, parentId=unknown, spanId=58e43c8f29b1f67b, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) spanId: 58e43c8f29b1f67b
  18:13:19 INFO  traceId=c99870cee4e1da346d9d73535db2ec5a, parentId=unknown, spanId=58e43c8f29b1f67b, sampled= [or.ac.qu.TaskBean] (QuarkusQuartzScheduler_Worker-2) log test
  18:13:19 INFO  traceId=c99870cee4e1da346d9d73535db2ec5a, parentId=unknown, spanId=8524d787f125a66b, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) traceId in testMethod: c99870cee4e1da346d9d73535db2ec5a
  18:13:19 INFO  traceId=c99870cee4e1da346d9d73535db2ec5a, parentId=unknown, spanId=8524d787f125a66b, sampled= [or.ac.qu.TaskService] (QuarkusQuartzScheduler_Worker-2) spanId in testMethod: 8524d787f125a66b

Any idea why OpenTelemetry is not doing this automatically? Or is it not its responsibility?
Sorry, maybe I should ask this on the OpenTelemetry GitHub or SO.

@brunobat
Contributor

There is no connection between the MDC context and OpenTelemetry.
In Quarkus we curate the MDC context with that data (it's a Quarkus-specific feature); however, we haven't instrumented Quartz yet, which is why the info is not shown in the logs either.
Setting the MDC context manually in the method works, even if it's not clean. Please don't forget to call MDC.remove at the end of the method, otherwise you will get wrong log entries afterwards.
Once we implement the OTel Quartz instrumentation in Quarkus, you will be able to remove that code.
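The put/work/remove lifecycle described here can be sketched like this (the MDC is a tiny map stand-in so the pattern is self-contained and runnable; real code would use org.jboss.logging.MDC and IDs from Span.current().getSpanContext()):

```java
import java.util.HashMap;
import java.util.Map;

public class MdcHygiene {
    // Tiny stand-in for org.jboss.logging.MDC so the pattern runs standalone.
    static final Map<String, String> MDC = new HashMap<>();

    // Populate the MDC for one job run and always clean up afterwards, so the
    // pooled Quartz worker thread does not leak stale IDs into later jobs.
    static void runJob(String traceId, String spanId, Runnable work) {
        MDC.put("traceId", traceId);
        MDC.put("spanId", spanId);
        try {
            work.run();
        } finally {
            // Without this, the next job on the same worker thread would log
            // the previous run's traceId/spanId.
            MDC.remove("traceId");
            MDC.remove("spanId");
        }
    }
}
```

The try/finally is the important part: cleanup must happen even when the job throws.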

@syr
Author

syr commented Oct 18, 2022

@brunobat thank you for the clarification, which brings some light into the dark, and for the hint about MDC.remove. From my basic understanding it would actually be easier if OTel also set the MDC, maybe configurable by a flag.
Is there an overview of Quarkus extensions that are OTel-ready (apart from REST resources)?
I was working through https://quarkus.io/guides/opentelemetry, which is good to start with, but it does not list the OTel-ready Quarkus extensions.

@syr
Author

syr commented Oct 18, 2022

Or is it maybe possible to plug JBoss Logging into one of the logging frameworks (or vice versa) that support automatic MDC configuration (Log4j, Logback)? See https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/logger-mdc-instrumentation.md

@syr
Author

syr commented Nov 13, 2022

@Ladicek @brunobat @radcortez
Is there some list of libraries that are instrumented by quarkus-opentelemetry?
I guess it must be a subset of https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md; e.g., Quartz is not yet instrumented.

I'm also having problems, as context propagation with the AWS SDK does not seem to be working, neither with quarkus-opentelemetry nor with the opentelemetry agent. I created an issue with a reproducer on the opentelemetry GitHub: open-telemetry/opentelemetry-java-instrumentation#7120, without any feedback so far.

In the meantime I wonder if I should really follow OpenTelemetry atm; with so many issues, maybe it is too new/experimental to be used.

@brunobat
Contributor

Hi Syr,
There is no such list for the time being, and we currently don't follow that list.
We prefer to instrument Quarkus ourselves and make sure context propagation and native mode behave correctly.
I can tell you that inside Quarkus, today we instrument:

  • Vert.x, including the HTTP server
  • REST clients
  • gRPC server

But also, if you use them:

  • Reactive messaging from SmallRye
  • SQL queries, if you use the dedicated driver

We also provide CDI-aware beans where you can create your own spans.

OpenTelemetry is constantly evolving, but it also needs community contributions...

@syr
Author

syr commented Nov 16, 2022

Hi @brunobat, thanks for the info. Could you please add this list to the OpenTelemetry guide? IMO it's crucial for deciding whether the extension fits or whether the otel agent must be used to get more auto-instrumentation, as in our case with the Quartz and AWS SDK libs.

Might be a silly question, but is there a recommended way atm for production applications?
OpenTracing is deprecated and OpenTelemetry experimental, so we're sitting between two chairs atm, using custom stuff.
I would love to contribute but need to fight to get some time slots, which is super hard during a dev project, while OTel is the most complex/foggy lib, especially when it comes to understanding responsibilities, IMO.

@brunobat
Contributor

OpenTelemetry tracing is not experimental anymore. We are preparing a few changes, coming soon, that will widen the instrumentation support.
The OTel agent instrumentation is too generic. It seems to work but, as I mentioned, it has context propagation issues within the service, especially if you mix imperative and reactive code. The built-in Quarkus instrumentation fixes that.

@syr
Author

syr commented Nov 17, 2022

OpenTelemetry tracing is not experimental anymore. We are preparing a few changes, coming soon, that will widen the instrumentation support.

OK, sounds good. I was referring to the status shown at code.quarkus.io.
Is there a rough roadmap (weeks/months/2023) for when to expect this extended OTel support to be released?
This info would be very helpful for deciding how much effort we put into workarounds using the otel agent.

The OTel agent instrumentation is too generic. It seems to work but, as I mentioned, it has context propagation issues within the service, especially if you mix imperative and reactive code. The built-in Quarkus instrumentation fixes that.

Yes, I noticed quite a few issues (mainly three points in open-telemetry/opentelemetry-java-instrumentation#7120 (comment)); however, it was the only way I saw to get Quartz and the AWS SDK auto-instrumented atm.

We have quite a bit of reactive code in the project; I would rather consider changing it back to imperative, because there are also issues when Mutiny is combined with @Transactional, for example. Likely mixing imperative and reactive is not a good idea. But that's another topic...

@syr
Author

syr commented Nov 28, 2022

@brunobat @Ladicek @mkouba
Can you please have a look at the last comment in open-telemetry/opentelemetry-java-instrumentation#7120 (comment)?
Maybe there is not so much to change to get the MDC updated when using the JBoss logger via io.quarkus.logging.Log.

@famod
Member

famod commented Nov 28, 2022

I ran into the same problem, but with the simple scheduler (not quartz).
IMO, #28552 (comment) also applies to the simple scheduler.

Shall I open a separate issue?

@brunobat
Contributor

@famod yes please.

@famod
Member

famod commented Nov 30, 2022

@brunobat will do.

For the simple scheduler, I've realized that a workaround seems as easy as adding @WithSpan to my @Scheduled method. Am I missing something?

@syr
Author

syr commented Nov 30, 2022

@famod do you use the otel agent, or is this already working with just the quarkus-opentelemetry + quarkus-scheduler dependencies?

Btw, I would expect this to work with the simple scheduler by default, in contrast to the issue described here, which is scoped to Quartz not being instrumented atm.
I would advise you to add another @WithSpan on a subsequent method and log something, just to check that the MDC is correctly updated from the OTel span context (traceId should stay the same, spanId should change).

@famod
Member

famod commented Dec 2, 2022

@syr I'm not using the otel agent explicitly, only quarkus-opentelemetry + quarkus-scheduler dependencies and additionally io.opentelemetry.instrumentation:opentelemetry-jdbc.

@WithSpan just works, I see JDBC sub-spans for my job executions.

I created a separate enhancement issue: #29647

@brunobat brunobat moved this from In progress to Backlog in Observability roadmap [Deprecated] Dec 12, 2022
@Kasokaso

Is there any plan to do something similar when consuming events from Kafka? I believe it would be useful to understand the outcome of processed events.

@brunobat
Contributor

Not sure when this is going to be scheduled.

@geoand
Contributor

geoand commented Jul 4, 2023

IIUC, everything just works with @WithSpan, correct?

If so, maybe we just need to add a note in the scheduler documentation instead of doing all the work to enable tracing by default for scheduled methods?

@Ladicek
Contributor

Ladicek commented Jul 4, 2023

If everything works with @WithSpan, which we treat as an interceptor binding, enabling tracing by default for scheduled methods would amount to a simple annotation transformation, I'd say.

@geoand
Contributor

geoand commented Jul 4, 2023

Good point

syr pushed a commit to syr/quarkus-resteasy-postgres that referenced this issue Jul 4, 2023
syr added a commit to syr/quarkus-resteasy-postgres that referenced this issue Jul 4, 2023
syr added a commit to syr/quarkus-resteasy-postgres that referenced this issue Jul 4, 2023
syr added a commit to syr/quarkus-resteasy-postgres that referenced this issue Jul 4, 2023
@syr
Author

syr commented Jul 4, 2023

IIUC, everything just works with @WithSpan, correct?

If so, maybe we just need to add a note in the scheduler documentation instead of doing all the work to enable tracing by default for scheduled methods?

AFAIU it does not work when using custom job classes (similar to https://quarkus.io/guides/quartz#scheduling-jobs-programmatically).
Just tested with version 2.16.7.Final: https://github.com/syr/quarkus-resteasy-postgres/tree/otel_quartz_2.16

Prints from job execute method (org.acme.quartz.TaskCreateJob#execute):

18:12:16 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.qu.TaskCreateJob] (DefaultQuartzScheduler_Worker-1) created task
18:12:17 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.qu.TaskCreateJob] (DefaultQuartzScheduler_Worker-2) created task
18:12:18 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.qu.TaskCreateJob] (DefaultQuartzScheduler_Worker-3) created task
18:12:19 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.qu.TaskCreateJob] (DefaultQuartzScheduler_Worker-4) created task

However, it's interesting that it works (even without @WithSpan) when using @Scheduled with the quarkus-quartz extension, e.g.:

```java
@Scheduled(every = "10s", identity = "task-job")
void schedule() {
    Log.info("created task");
}
```

However, we need the custom job classes, so we cannot use @Scheduled on methods :/

@brunobat
Contributor

brunobat commented Jul 4, 2023

@syr I see that you are implementing org.quartz.Job and annotating the execute method with @WithSpan.
I don't think this will be in a CDI context, therefore the interceptor will not kick in.
If you implement the job in a bean and then call it from inside execute, it should work.

@syr
Author

syr commented Jul 4, 2023

@brunobat Yes, like this it's working:

```java
public class TaskCreateJob implements Job {
    @Override
    public void execute(JobExecutionContext jobExecutionContext) {
        Arc.container().select(TaskCreateJobBean.class).get().execute(jobExecutionContext);
    }
}

@ApplicationScoped
@Unremovable
public class TaskCreateJobBean {
    @WithSpan
    public void execute(JobExecutionContext jobExecutionContext) {
        Log.info("created task");
    }
}
```

...but it's damn ugly. I was hoping there is a more elegant way.
Please let me know if you see a better/less redundant way.
I would not like to create bean classes for every job class we have.

@radcortez
Member

It would require adding proper OTel integration in a JobListener. This can be used as a basis: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/quartz-2.0/library/src/main/java/io/opentelemetry/instrumentation/quartz/v2_0

Would you like to try submitting a PR with the proper support? :)
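For reference, the shape such a JobListener integration could take — one fresh span per job run, opened in jobToBeExecuted and closed in jobWasExecuted — might look like this (a sketch against the public Quartz and OpenTelemetry APIs, not the eventual Quarkus implementation; the class name and tracer name are made up):

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.listeners.JobListenerSupport;

/** Sketch of a tracing JobListener: a span per job run, ended when the
 *  job finishes, with the error recorded on failure. */
public class TracingJobListener extends JobListenerSupport {
    private static final String SPAN_KEY = "otel.span";
    private static final String SCOPE_KEY = "otel.scope";
    private final Tracer tracer;

    public TracingJobListener(OpenTelemetry otel) {
        this.tracer = otel.getTracer("quartz-sketch");
    }

    @Override
    public String getName() {
        return "tracing-job-listener";
    }

    @Override
    public void jobToBeExecuted(JobExecutionContext ctx) {
        Span span = tracer.spanBuilder(ctx.getJobDetail().getKey().toString())
                .setSpanKind(SpanKind.INTERNAL)
                .startSpan();
        // Making the span current is what lets Span.current() (and any MDC
        // curation hooked into it) see the span inside the job.
        Scope scope = span.makeCurrent();
        ctx.put(SPAN_KEY, span);
        ctx.put(SCOPE_KEY, scope);
    }

    @Override
    public void jobWasExecuted(JobExecutionContext ctx, JobExecutionException e) {
        Scope scope = (Scope) ctx.get(SCOPE_KEY);
        Span span = (Span) ctx.get(SPAN_KEY);
        if (scope != null) scope.close();
        if (span != null) {
            if (e != null) {
                span.recordException(e);
                span.setStatus(StatusCode.ERROR);
            }
            span.end();
        }
    }
}
```

Registered via the scheduler's ListenerManager, this would remove the need for per-job @WithSpan beans.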

@syr
Author

syr commented Jul 10, 2023

It would require adding proper OTel integration in a JobListener. This can be used as a basis: https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/quartz-2.0/library/src/main/java/io/opentelemetry/instrumentation/quartz/v2_0

Would you like to try submitting a PR with the proper support? :)

In which repo would this have to be added? opentelemetry-java-instrumentation or quarkus-opentelemetry?

@geoand
Contributor

geoand commented Jul 10, 2023

quarkus-opentelemetry

@mskacelik
Contributor

Gonna have a look at this
