Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TraceId does not propagate between threads #1748

Closed
jacekmg opened this issue Oct 1, 2020 · 18 comments
Closed

TraceId does not propagate between threads #1748

jacekmg opened this issue Oct 1, 2020 · 18 comments

Comments

@jacekmg
Copy link

jacekmg commented Oct 1, 2020

Describe the bug
TraceId does not propagate between threads since version 2.2.2.RELESE of sleuth (Spring Cloud Hoxton.SR3)
I've tested with verison 2.2.1.RELEASE (Hoxton.SR2) - works correctly
On newest version - 2.2.5.RELEASE issue still persists.

Sample
Here's test Controller:

`
package app.testing;

import brave.Span;
import brave.Tracer;
import brave.propagation.TraceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {
    private Tracer tracer;

    @Autowired
    public TraceIdController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.just(getTraceId()))
                .subscribeOn(Schedulers.elastic())
                .map(elasticThreadId -> new TraceIdDto(traceId, elasticThreadId.orElse(null)));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

package app.testing;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class TraceIdDto {
    private String nativeThreadTraceId;
    private String elasticThreadTraceId;
}

`

And junit test that shows the problem:
` package app.testing;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.test.web.reactive.server.WebTestClient;

import static org.assertj.core.api.Assertions.assertThat;

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class TraceIdControllerTest {

    @Autowired
    private WebTestClient webTestClient;

    @Test
    public void testTraceId() {

        TraceIdDto traceIdDto = webTestClient.get().uri("/api/v1/traceId/test")
                .exchange().expectStatus().isOk()
                .expectBody(TraceIdDto.class)
                .returnResult().getResponseBody();

        assertThat(traceIdDto.getNativeThreadTraceId()).isEqualTo(traceIdDto.getElasticThreadTraceId());
    }

}

`

Proposed solution
I've investiagted sleuth code a little bit, and I think the problem is in SleuthContextListener. I've compered this class with ContextRefreshedListener that existed in 2.2.1 version.
It seems like SleuthContextListener sets refreshed and closed flags to false, when appropriate event comes, instead to true.
... listener.refreshed.compareAndSet(false, event instanceof ContextRefreshedEvent); listener.closed.compareAndSet(false, event instanceof ContextClosedEvent); ...

@marcingrzejszczak
Copy link
Contributor

Are you willing to create a failing test and a PR for this?

@jacekmg
Copy link
Author

jacekmg commented Oct 1, 2020

I'd rather not. I think, it would be better if someone who knows sleuth code better, did it.

@marcingrzejszczak
Copy link
Contributor

No problem. What you could do is change the way SleuthContextListener works locally and check your code against snapshots to confirm that this is a problem.

@jacekmg
Copy link
Author

jacekmg commented Oct 1, 2020

I've tested a little bit more, and I was wrong - SleuthContextListener sets the refreshed flag correctly. However ContextUtil retrieves different instance of SleuthContextListener than the one that handles ContextRefreshedEvent. So I'm not sure how to fix the issue at the moment.

@marcingrzejszczak
Copy link
Contributor

That's useful, thanks. I'll try to look at it.

@marcingrzejszczak
Copy link
Contributor

I can't replicate this. It will be difficult to fix this without a sample. Can you please provide one?

@jacekmg
Copy link
Author

jacekmg commented Dec 1, 2020

So the sample that I provided up above works for you without a problem?

@marcingrzejszczak
Copy link
Contributor

Those are two snippets, can you create a sample as a Java maven project that actually replicates this? That would be super helpful.

@jacekmg
Copy link
Author

jacekmg commented Dec 1, 2020

Allright, I will try to create a project for you. Gradle instead of maven won't be problem for you?

@marcingrzejszczak
Copy link
Contributor

I'm not a fan of Gradle but sure ;)

@jacekmg
Copy link
Author

jacekmg commented Dec 3, 2020

@marcingrzejszczak here you go: https://github.com/jacekmg/sleuth-trace-id-propagation-test
Let me know if that's sufficient for you.

@marcingrzejszczak
Copy link
Contributor

marcingrzejszczak commented Dec 4, 2020

Thanks for the sample - it was super useful.

So unfortunately Sleuth and Reactor are a problematic couple from the very beginning. That's mainly Sleuth's fault - we're trying to make things super easy for the user without the user's need to interact with any of the Sleuth's internals. With Reactor we're going away of such an approach (that's why in Sleuth 3.0.0 you'll have the utility operators fro messaging, webflux etc.).

Coming back to your case - the trace id is properly propagated via the reactor context. For some reason it's just not set as a thread local.

What you should be doing is the following (for Sleuth 2.x)

@RestController
public class TraceIdController {

    private Tracer tracer;

    @Autowired
    public TraceIdController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.subscriberContext()
                    .map(ctx -> Optional.of(ctx.get(TraceContext.class)).map(TraceContext::traceIdString))
                .subscribeOn(Schedulers.elastic())
                .map(traceIdFromNewThread -> new TraceIdDto(traceId, traceIdFromNewThread.orElse(null))));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

and example for Sleuth 3.0.0

.application.yaml - disables the default on each operator wrapping

spring:
  sleuth:
    reactor:
      instrumentation-type: manual

.TraceIdController

package app.controllers;

import app.dtos.TraceIdDto;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.CurrentTraceContext;  // (1)
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.TraceContext;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.cloud.sleuth.instrument.web.WebFluxSleuthOperators;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get(ServerWebExchange serverWebExchange) {
        String traceId = WebFluxSleuthOperators.withSpanInScope(tracer, currentTraceContext, 
            serverWebExchange, () -> getTraceId().orElse(null));  // (2)
        return Mono.defer(Mono::subscriberContext)
                .subscribeOn(Schedulers.elastic())
                .map(context -> WebFluxSleuthOperators.withSpanInScope(context, 
                    () -> new TraceIdDto(traceId, getTraceId().orElse(null)))); // (3)
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceId);
    }

}
  • (1) - new imports to the Sleuth's API
  • (2) - you can retrieve trace information from ServerWebExchange; the lambda will contain the trace information set in thread local
  • (3) - you can retrieve trace information from Reactor's Context; the lambda will contain the trace information set in thread local

That way things will not only just work but actually work much faster.

@jacekmg
Copy link
Author

jacekmg commented Dec 4, 2020

To clarify, the issue won't be fixed for webflux?
Because it also affects slf4j Logger. I've added logger to the sample project, and we can clearly see that logger in elastic thread do not have access to traceId:

2020-12-04 16:27:45.634  INFO [,32334ea865a2891f,32334ea865a2891f,true] 3747 --- [or-http-epoll-3] app.controllers.TraceIdController        : Traceid retrieval
2020-12-04 16:27:45.679  INFO [,,,] 3747 --- [      elastic-2] app.controllers.TraceIdController        : Traceid retrieval

So it makes sleuth pretty much useless when we are using slf4j logger and webflux.
In other words: In order to use sleuth in logging we would have to implement custom logger that would have subscriberContext as an argument, everytime that we want to log something?

@marcingrzejszczak
Copy link
Contributor

marcingrzejszczak commented Dec 4, 2020

The issue will be fixed for Webflux and Slf4j logger will work. That means that the fix is to use manual operators.

In Sleuth 2.x you'll have to put the span in scope manually. Example

package app.controllers;

import app.dtos.TraceIdDto;
import brave.Span;
import brave.Tracer;
import brave.propagation.CurrentTraceContext;
import brave.propagation.TraceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private static final Logger log = LoggerFactory.getLogger(TraceIdController.class);

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    @Autowired
    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.subscriberContext()
                .map(ctx -> Optional.of(ctx.get(TraceContext.class)))
                .subscribeOn(Schedulers.elastic())
                .map(traceIdFromNewThread -> {
                    try (CurrentTraceContext.Scope scope = this.currentTraceContext.maybeScope(traceIdFromNewThread.orElse(null))) {
                        log.info("Hello");
                        return new TraceIdDto(traceId, getTraceId().orElse(null));
                    }
                }));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

Result:

2020-12-04 18:02:16.733  INFO [,cfaa55d92a45e52e,cfaa55d92a45e52e,true] 448629 --- [      elastic-2] app.controllers.TraceIdController        : Hello

In Sleuth 3.0.0 within the lambda we put the span in scope which means that we put it into MDC so you can log directly in lamba.

package app.controllers;

import app.dtos.TraceIdDto;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.CurrentTraceContext;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.TraceContext;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.cloud.sleuth.instrument.web.WebFluxSleuthOperators;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private static final Logger log = LoggerFactory.getLogger(TraceIdController.class);

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get(ServerWebExchange serverWebExchange) {
        String traceId = WebFluxSleuthOperators.withSpanInScope(tracer, currentTraceContext, serverWebExchange, () -> {
            log.info("Hello1");
            return getTraceId().orElse(null);
        });
        return Mono.defer(Mono::subscriberContext)
                .subscribeOn(Schedulers.elastic())
                .map(context -> WebFluxSleuthOperators.withSpanInScope(context, () -> {
                    log.info("Hello2");
                    return new TraceIdDto(traceId, getTraceId().orElse(null));
                }));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceId);
    }

}

and the logs

2020-12-04 18:05:02.936  INFO [,bcc5da9deecfc003,bcc5da9deecfc003] 449021 --- [or-http-epoll-4] app.controllers.TraceIdController        : Hello1
2020-12-04 18:05:02.955  INFO [,bcc5da9deecfc003,bcc5da9deecfc003] 449021 --- [      elastic-2] app.controllers.TraceIdController        : Hello2

@jacekmg
Copy link
Author

jacekmg commented Dec 7, 2020

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.
Anyway, thanks for fast and clear responses.

@marcingrzejszczak
Copy link
Contributor

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.

You'll have to do that manually, sorry.

@vinodnholkar
Copy link

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.
Anyway, thanks for fast and clear responses.

@jacekmg Did you find any solution on this. I am facing same issue

@jacekmg
Copy link
Author

jacekmg commented Feb 22, 2021

@vinodnholkar I didn't find any new solutions, besides manual scope update that @marcingrzejszczak proposed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants