Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace information lost after spring cloud 2021.0.6 update #2270

Closed
oemergenc opened this issue Mar 7, 2023 · 9 comments · Fixed by #2275
Closed

trace information lost after spring cloud 2021.0.6 update #2270

oemergenc opened this issue Mar 7, 2023 · 9 comments · Fixed by #2275
Labels
Milestone

Comments

@oemergenc
Copy link

oemergenc commented Mar 7, 2023

Describe the bug
We experience a problem with our trace information after updating to spring cloud 2021.0.6.
We are using spring-cloud-gcp-logging for trace information and after the update to spring cloud 2021.0.6 we are losing the trace information for some logs.

Sample
I have created a very simple test application, see here:
https://github.com/oemergenc/spring-cloud-tracing-problem

There are two branches: main branch uses spring cloud 2021.0.5 and the broken branch uses 2021.0.6. There is also a very simple test which runs successfully on the main branch and fails on the broken branch. It would be nice if you could have look at this, since I have no idea where trace information is lost. The test is here:

https://github.com/oemergenc/spring-cloud-tracing-problem/blob/main/src/test/java/com/example/tracing/problem/tracing/ReactiveLoggingIntegrationTest.java

If you need more infos about the sample project, let me know. It includes a simple webfilter which logs some statements.
Thanks in advance

@marcingrzejszczak
Copy link
Contributor

Hey! I've asked to you create this issue in spring cloud gcp. Why have you created it here?

@oemergenc
Copy link
Author

Hi, because I don't think that the problem is within spring cloud gcp. In my sample project we have the following dependencies:

        implementation(platform("com.google.cloud:spring-cloud-gcp-dependencies:3.4.5"))
	implementation(platform("org.springframework.cloud:spring-cloud-dependencies:2021.0.5"))

	implementation("org.springframework.boot:spring-boot-starter-webflux")
	implementation("com.google.cloud:spring-cloud-gcp-logging")
	implementation("org.springframework.cloud:spring-cloud-starter-sleuth")
	implementation("io.projectreactor.netty:reactor-netty-http-brave")
	implementation("io.zipkin.gcp:brave-propagation-stackdriver:1.0.4")

With this version everything works. When update to the following sping cloud version, which includes an update of spring-cloud-sleuth from version 3.1.5 to 3.1.7, we run into the problem I have described.

	implementation(platform("com.google.cloud:spring-cloud-gcp-dependencies:3.4.5"))
	implementation(platform("org.springframework.cloud:spring-cloud-dependencies:2021.0.6")) // here is the update

	implementation("org.springframework.boot:spring-boot-starter-webflux")
	implementation("com.google.cloud:spring-cloud-gcp-logging")
	implementation("org.springframework.cloud:spring-cloud-starter-sleuth")
	implementation("io.projectreactor.netty:reactor-netty-http-brave")
	implementation("io.zipkin.gcp:brave-propagation-stackdriver:1.0.4")

As you can see we did not update spring-cloud-gcp at all and only updated spring-cloud-dependencies, which in turn includes an update of sleuth. We think something changed in the tracing of spring cloud sleuth.

@oemergenc
Copy link
Author

Hi,
i was able to boil it further down. It seems like something changed in the artifact spring-cloud-sleuth-instrumentation, which causes the change in behavior. When I pin it to version 3.1.5 it seems to work again.

implementation("org.springframework.cloud:spring-cloud-sleuth-instrumentation:3.1.5")

As far as i can see there were some commits added in 3.1.7 regarding cleaning up of Threadlocal. Anyone an idea what exactly changed there and might this be the reason we are seeing the change in behavior?

@oemergenc
Copy link
Author

Hi, anyone an idea here, this somehow block us. The example I have provided demonstrates the problem in a very simple way, our real world problem is of more importance. We are losing access logs in our services because of this, which is a serious problem for us, any help would be appreciated.

@marcingrzejszczak
Copy link
Contributor

If we have time to look into that and idea how to solve this then we'll update the issue.

@sankarg931190
Copy link

Hi @OlegDokuka ,
I am not sure, if the fix #2275 actually propagating the spanId to OnComplete (or OnError).
I tried to replicate with a simple Spring webflux project, and I can't see the traceId and spanId propagated during completion and during onError.
Here is the sample code for your reference,
My Pom has :
org.springframework.boot : spring-boot-starter-parent : 2.7.12
org.springframework.boot : spring-boot-starter-webflux
org.springframework.boot : spring-boot-starter-test
org.springframework.cloud : spring-cloud-starter-sleuth : 3.1.8
io.projectreactor:reactor-test

Route class:
return RouterFunctions.route(GET("/helloWorld/{one}")
.and(RequestPredicates.accept(MediaType.TEXT_PLAIN)), helloWorldHandler::helloWorld);

And a handler class

public Mono helloWorld(ServerRequest request) {
String value = request.pathVariable("one");
if(value.equals("1")) {
throw new RuntimeException("path should contain some value");
}
return ServerResponse.ok().contentType(MediaType.TEXT_PLAIN)
.body(BodyInserters.fromObject("Hello "+value));
}

If I run the application, and hit the following url "http://localhost:8080/helloWorld/1", here is the application log,

2023-05-23 18:11:41.911 TRACE [,,] 12100 --- [ctor-http-nio-2] o.s.w.s.adapter.HttpWebHandlerAdapter : [219e5b8d-1] HTTP GET "/helloWorld/start", headers={masked}
2023-05-23 18:11:41.961 DEBUG [,9dffbac718e4a18b,9dffbac718e4a18b] 12100 --- [ctor-http-nio-2] o.s.w.r.f.s.s.RouterFunctionMapping : [219e5b8d-1] Mapped to com.traceid.traceidIssue.controller.HelloWorldRouter$$Lambda$488/0x0000000800456040@34b085f5
2023-05-23 18:11:41.998 DEBUG [,9dffbac718e4a18b,9dffbac718e4a18b] 12100 --- [ctor-http-nio-2] org.springframework.web.HttpLogging : [219e5b8d-1] Writing "Hello start"
2023-05-23 18:11:42.035 TRACE [,,] 12100 --- [ctor-http-nio-2] o.s.w.s.adapter.HttpWebHandlerAdapter : [219e5b8d-1] Completed 200 OK, headers={masked}

The last log line is missing the traceId. This is a scenario of OnComplete.
Similarly, when run an error scenario , for example accessing "http://localhost:8080/helloWorld/1", here are the error logs.

2023-05-23 18:12:03.045 TRACE [,,] 12100 --- [ctor-http-nio-4] o.s.w.s.adapter.HttpWebHandlerAdapter : [e3f64717-2] HTTP GET "/helloWorld/1", headers={masked}
2023-05-23 18:12:03.046 DEBUG [,94da05e6b3c9fa90,94da05e6b3c9fa90] 12100 --- [ctor-http-nio-4] o.s.w.r.f.s.s.RouterFunctionMapping : [e3f64717-2] Mapped to com.traceid.traceidIssue.controller.HelloWorldRouter$$Lambda$488/0x0000000800456040@34b085f5
### 2023-05-23 18:12:03.070 DEBUG [,,] 12100 --- [ctor-http-nio-4] org.springframework.web.HttpLogging : [e3f64717-2] Resolved [RuntimeException: path should contain some value] for HTTP GET /helloWorld/1
2023-05-23 18:12:03.078 ERROR [,,] 12100 --- [ctor-http-nio-4] a.w.r.e.AbstractErrorWebExceptionHandler : [e3f64717-2] 500 Server Error for HTTP GET "/helloWorld/1"

java.lang.RuntimeException: path should contain some value
at com.traceid.traceidIssue.controller.HelloWorldHandler.helloWorld(HelloWorldHandler.java:17) ~[classes/:na]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
2023-05-23 18:12:03.080 DEBUG [,,] 12100 --- [ctor-http-nio-4] org.springframework.web.HttpLogging : [e3f64717-2] Writing
2023-05-23 18:12:03.083 TRACE [,,] 12100 --- [ctor-http-nio-4] o.s.w.s.adapter.HttpWebHandlerAdapter : [e3f64717-2] Completed 500 INTERNAL_SERVER_ERROR, headers={masked}

As you can see above (highlighted in Italics ), the traceId is missing, when the exception is thrown.

If I change the following in pom file and restart the application,
org.springframework.cloud : spring-cloud-starter-sleuth : 3.1.5

then for positive scenario, I could see the traceId propagated during onComplete. (TraceId marked in In Bold)

2023-05-23 18:09:37.110 TRACE [,,] 19444 --- [ctor-http-nio-2] o.s.w.s.adapter.HttpWebHandlerAdapter : [fd841721-5] HTTP GET "/helloWorld/test", headers={masked}
2023-05-23 18:09:37.111 DEBUG [,3eeeb7e23074ad47,3eeeb7e23074ad47] 19444 --- [ctor-http-nio-2] o.s.w.r.f.s.s.RouterFunctionMapping : [fd841721-5] Mapped to com.traceid.traceidIssue.controller.HelloWorldRouter$$Lambda$488/0x0000000800456040@21c4efeb
2023-05-23 18:09:37.111 DEBUG [,3eeeb7e23074ad47,3eeeb7e23074ad47] 19444 --- [ctor-http-nio-2] org.springframework.web.HttpLogging : [fd841721-5] Writing "Hello test"
2023-05-23 18:09:37.113 TRACE [,3eeeb7e23074ad47,3eeeb7e23074ad47] 19444 --- [ctor-http-nio-2] o.s.w.s.adapter.HttpWebHandlerAdapter : [fd841721-5] Completed 200 OK, headers={masked}

Similarly for the error scenario,

2023-05-23 18:10:15.946 TRACE [,,] 19444 --- [ctor-http-nio-4] o.s.w.s.adapter.HttpWebHandlerAdapter : [d738a83e-6] HTTP GET "/helloWorld/1", headers={masked}
2023-05-23 18:10:15.946 DEBUG [,cafe29d9a35fd3bc,cafe29d9a35fd3bc] 19444 --- [ctor-http-nio-4] o.s.w.r.f.s.s.RouterFunctionMapping : [d738a83e-6] Mapped to com.traceid.traceidIssue.controller.HelloWorldRouter$$Lambda$488/0x0000000800456040@21c4efeb
_2023-05-23 18:10:15.948 DEBUG [,cafe29d9a35fd3bc,cafe29d9a35fd3bc] 19444 --- [ctor-http-nio-4] org.springframework.web.HttpLogging : [d738a83e-6] Resolved [RuntimeException: path should contain some value] for HTTP GET /helloWorld/1
2023-05-23 18:10:15.949 ERROR [,cafe29d9a35fd3bc,cafe29d9a35fd3bc] 19444 --- [ctor-http-nio-4] a.w.r.e.AbstractErrorWebExceptionHandler : [d738a83e-6] 500 Server Error for HTTP GET "/helloWorld/1"
java.lang.RuntimeException: path should contain some value
at com.traceid.traceidIssue.controller.HelloWorldHandler.helloWorld(HelloWorldHandler.java:17) ~[classes/:na]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:

2023-05-23 18:10:15.951 DEBUG [,cafe29d9a35fd3bc,cafe29d9a35fd3bc] 19444 --- [ctor-http-nio-4] org.springframework.web.HttpLogging : [d738a83e-6] Writing "
2023-05-23 18:10:15.952 TRACE [,cafe29d9a35fd3bc,cafe29d9a35fd3bc] 19444 --- [ctor-http-nio-4] o.s.w.s.adapter.HttpWebHandlerAdapter : [d738a83e-6] Completed 500 INTERNAL_SERVER_ERROR, headers={masked}_

Clearly, here as well, the traceId and spanId were propagated to OnError condition, while using spring-cloud-starter-sleuth:3.1.5

The concern here is, the feature which was working in version 3.1.5 is broken in subsequent versions (3.1.6 and above).

Even in the git code shared by @oemergenc 's still throws error when sleuth is pointed to version 3.1.6 and above.

Appreciate, if you look into the issue and let us know if you require any further details.

@oemergenc
Copy link
Author

Hi,

indeed it seems the problem is still there. I have updated my Repo to include a test scenario where an error occurs in a rest endpoint. While the test a running fine on the main branch, both tests are failing on the broken branch. The only difference of the branches is the version of this dependency:

main branch
implementation(platform("org.springframework.cloud:spring-cloud-dependencies:2021.0.5"))

broken branch
implementation(platform("org.springframework.cloud:spring-cloud-dependencies:2021.0.7"))

@sankarg931190
Copy link

Can we mark the issue as "reopened"?

@sankarg931190
Copy link

Created a new issue to track, #2292

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Status: Done
5 participants