Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using weblux, different requests get the same traceid #2070

Closed
LeePui opened this issue Aug 20, 2021 · 7 comments
Closed

When using weblux, different requests get the same traceid #2070

LeePui opened this issue Aug 20, 2021 · 7 comments
Assignees
Labels
8.3-candidate agent-java bug Bugs community Issues and PRs created by the community

Comments

@LeePui
Copy link

LeePui commented Aug 20, 2021

apm-agent version:1.25.0
VM options:-javaagent:elastic-apm-agent-1.25.0.jar -Delastic.apm.enable_experimental_instrumentations=true -Delastic.apm.use_path_as_transaction_name=true

i use apm-agent-api in spring gateway,i want to get the traceid and transmit it to downstream applications,i use WebFilter :

`public class MyWebFilter implements WebFilter, Ordered {

public Mono<Void> filter(ServerWebExchange ctx, WebFilterChain chain) {
	String traceId = null;
	Transaction transaction = ElasticApm.currentTransaction();
	if (!StringUtils.isBlank(transaction.getTraceId())) {
		traceId = transaction.getTraceId();
	} else {
		transaction = ElasticApm.startTransaction();
		try (final Scope ignored = transaction.activate()) {
			transaction.setName(CLASS_NAME + "#" + "filter");
			transaction.setType(Transaction.TYPE_REQUEST);
			traceId = transaction.getTraceId();
		} catch (Exception e) {
			transaction.captureException(e);
			throw e;
		} finally {
			transaction.end();
		}
	}
	//set traceId in httpHeader......
}

}`

when i sent some request, i found the same traceid:
image

@github-actions github-actions bot added agent-java community Issues and PRs created by the community labels Aug 20, 2021
@SylvainJuge
Copy link
Member

SylvainJuge commented Aug 20, 2021

Hi @LeePui ,

We don´t have automatic instrumentation available for Spring Gateway, thus I assume that you are trying to work-around that limitation by propagating the headers yourself to the Webflux client, is that right ?

First, we currently have a PR (#1948) that changes a bit Webflux instrumentation that is about to be merged, would you mind testing with this snapshot to be sure that the observed behavior is still present ?

From what I understand, the Spring Gateway provides Webflux server endpoints and delegates execution to Webflux client.
Because we don't have automatic instrumentation of Webflux client (#1306), that means in your case:

  1. you should always have a transaction active in the filter (if not, then it means we should fix that first)
  2. you will have to propagate the active transaction to the client part, using either Reactor context or Webflux ServerExchange attributes are good options.
  3. you will have to write an interceptor on the client to retrieve the transaction and create a client span that wraps the HTTP request execution (I'm not yet familiar with the client, thus I don't know exactly how that would work).
  4. you will have to write the client span ID into HTTP headers for context propagation, there should be a single span per downstream HTTP request.

All the points 2 to 4 should be handled by the automatic instrumentation of Webflux client.

What likely happens in your case is that you have a single transaction that does multiple downstream HTTP requests, which explains why you see many times the same transaction ID.

@LeePui
Copy link
Author

LeePui commented Aug 23, 2021

Hi @LeePui ,

We don´t have automatic instrumentation available for Spring Gateway, thus I assume that you are trying to work-around that limitation by propagating the headers yourself to the Webflux client, is that right ?

First, we currently have a PR (#1948) that changes a bit Webflux instrumentation that is about to be merged, would you mind testing with this snapshot to be sure that the observed behavior is still present ?

From what I understand, the Spring Gateway provides Webflux server endpoints and delegates execution to Webflux client.
Because we don't have automatic instrumentation of Webflux client (#1306), that means in your case:

  1. you should always have a transaction active in the filter (if not, then it means we should fix that first)
  2. you will have to propagate the active transaction to the client part, using either Reactor context or Webflux ServerExchange attributes are good options.
  3. you will have to write an interceptor on the client to retrieve the transaction and create a client span that wraps the HTTP request execution (I'm not yet familiar with the client, thus I don't know exactly how that would work).
  4. you will have to write the client span ID into HTTP headers for context propagation, there should be a single span per downstream HTTP request.

All the points 2 to 4 should be handled by the automatic instrumentation of Webflux client.

What likely happens in your case is that you have a single transaction that does multiple downstream HTTP requests, which explains why you see many times the same transaction ID.

Hi @SylvainJuge ,
In fact, i'm trying to get traceId by apm-agent-api in spring gateway, i want to combining application log and apm tracing by traceId.
So i use the code above to get traceId and set it in httpHeader. I will get the traceid from the request header and put it into the log in downstream applications.

@SylvainJuge
Copy link
Member

So, if I understand correctly here, the end-to-end request workflow is the following:

  1. an HTTP client makes a request to the Spring-Gateway service
  2. this gateway is instrumented with APM agent and you retrieve a transaction ID using the API
  3. your code manually injects this transaction ID into downstream HTTP requests using a custom header
  4. the gateway delegates request processing to downstream services with the extra HTTP header
  5. the downstream service re-injects the ID provided through headers into logs, probably through manual calls to the MDC.
  6. you should (in theory) be able to correlate the initial HTTP call on the gateway and the logs on the downstream service(s).

Can you confirm that properly describes your case here ?

What is not really clear to me are the following points:

  • does your gateway perform multiple downstream HTTP calls ?
  • which custom header do you use in the Webflux HTTP client ?
  • is the downstream service also instrumented with APM agent ?

@LeePui
Copy link
Author

LeePui commented Aug 30, 2021

So, if I understand correctly here, the end-to-end request workflow is the following:

  1. an HTTP client makes a request to the Spring-Gateway service
  2. this gateway is instrumented with APM agent and you retrieve a transaction ID using the API
  3. your code manually injects this transaction ID into downstream HTTP requests using a custom header
  4. the gateway delegates request processing to downstream services with the extra HTTP header
  5. the downstream service re-injects the ID provided through headers into logs, probably through manual calls to the MDC.
  6. you should (in theory) be able to correlate the initial HTTP call on the gateway and the logs on the downstream service(s).

Can you confirm that properly describes your case here ?

What is not really clear to me are the following points:

  • does your gateway perform multiple downstream HTTP calls ?
  • which custom header do you use in the Webflux HTTP client ?
  • is the downstream service also instrumented with APM agent ?

yes, the request workflow is right.

  • does your gateway perform multiple downstream HTTP calls ?
    just one downstream.

  • which custom header do you use in the Webflux HTTP client ?
    i use elastic-apm-traceparent

  • is the downstream service also instrumented with APM agent ?
    yes

and i use API in gateway like this:

public class MyWebFilter implements WebFilter, Ordered {

public Mono<Void> filter(ServerWebExchange ctx, WebFilterChain chain) {
	String traceId = null;
	Transaction transaction = ElasticApm.startTransaction();
        try (final Scope ignored = transaction.activate()) {
	        transaction.setName(CLASS_NAME + "#" + "filter");
	        transaction.setType(Transaction.TYPE_REQUEST);
	        traceId = ElasticApm.currentTransaction().getTraceId();
                System.out.println(Thread.currentThread().getName() + "__" + traceId);
        } catch (Exception e) {
	        transaction.captureException(e);
	        throw e;
        } finally {
	        transaction.end();
        }
	//set traceId in httpHeader......
}
}

i do stress tests in 1.24.0 and 1.25.1-SNAPSHOT.jar, use 20 thread, each thread loop 10 times。
i found 1.24.0 will get the same traceId in the same thread, it look like this:

reactor-http-nio-1__9ffe164b152f884ef070a30a10b0d5fe
reactor-http-nio-2__47767cb81dfdc7c037086f2f49a157e1
reactor-http-nio-3__9a3bfbe74ead54a222bbcaba8381d6e3
reactor-http-nio-4__0f0d8346645cf853ab1d5c610f72d570
reactor-http-nio-1__9ffe164b152f884ef070a30a10b0d5fe
reactor-http-nio-2__47767cb81dfdc7c037086f2f49a157e1
.....

and 1.25.1-SNAPSHOT.jar will get a empty traceId.

if i replace traceId = ElasticApm.currentTransaction().getTraceId(); with traceId = transaction.getTraceId(); , it will get different traceid.

@SylvainJuge SylvainJuge added the bug Bugs label Aug 30, 2021
@SylvainJuge
Copy link
Member

There are multiple issues here, thus we will proceed step by step to make that more clear, starting from the upstream gateway then going to the downstream service.

  1. On the gateway, we should have a transaction created for every request, thus ElasticApm.currentTransaction().getTraceId() should not be empty 100% of the time.

  2. Still on the gateway, we should manually create a span for every downstream HTTP request because we don´t provide automatic webflux client instrumentation, pseudo-code:

// should start the span when the HTTP request is started
Span span = ElasticApm.currentTransaction().startSpan();

// manually propagate the context header (don't try to do this yourself)
span.injectTraceHeaders(...); // see method javadoc for an example here

// ... execute the HTTP request

// ending the span when the HTTP request ends, as it may not be in the same method or even thread
// storing span reference within reactor context could be required as most processing is asynchronous
// unlike with blocking HTTP clients

span.end();

  1. On the downstream service, the HTTP context propagation header should be visible, and there should be an active transaction. This transaction should have its parent context ID that refers to the HTTP client span.

@LeePui
Copy link
Author

LeePui commented Sep 3, 2021

There are multiple issues here, thus we will proceed step by step to make that more clear, starting from the upstream gateway then going to the downstream service.

  1. On the gateway, we should have a transaction created for every request, thus ElasticApm.currentTransaction().getTraceId() should not be empty 100% of the time.
  2. Still on the gateway, we should manually create a span for every downstream HTTP request because we don´t provide automatic webflux client instrumentation, pseudo-code:
// should start the span when the HTTP request is started
Span span = ElasticApm.currentTransaction().startSpan();

// manually propagate the context header (don't try to do this yourself)
span.injectTraceHeaders(...); // see method javadoc for an example here

// ... execute the HTTP request

// ending the span when the HTTP request ends, as it may not be in the same method or even thread
// storing span reference within reactor context could be required as most processing is asynchronous
// unlike with blocking HTTP clients

span.end();
  1. On the downstream service, the HTTP context propagation header should be visible, and there should be an active transaction. This transaction should have its parent context ID that refers to the HTTP client span.

ok,thank you

@SylvainJuge
Copy link
Member

Hi @LeePui, this issue hadn't had any update in while, thus I suggest that we revisit it once we have implemented Webflux Client support (#1306). In the mean time I will close this and we can re-open later if required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.3-candidate agent-java bug Bugs community Issues and PRs created by the community
Projects
None yet
Development

No branches or pull requests

3 participants