Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New SPAN created for handling on /error when request does not have span header #859

Closed
brenuart opened this issue Feb 14, 2018 · 11 comments
Closed
Labels
Milestone

Comments

@brenuart
Copy link

Uncaught exceptions are catched by Tomcat and forwarded to /error for handling by Spring's default BasicErrorController.

It appears the span logged from within the BasicErrorHandler (and anywhere in the handling chain after the TraceFilter) may not refer to the same span as from within the handler that threw the uncaught exception. It actually depends on whether the original request had span headers on it or not:

  • if span headers are present and the request is parent of an ongoing trace, span stays the same in /error
  • if no span header is present, everything appears as if two spans were created: one first the original attempt at invoking the handler, a second for the forward to /error.

This behaviour is observed with SpringCloud Edgware.SR2.
Things are different however with Finchley.M6: the same span is reported in both cases.

I have attached two sample applications to illustrate the case. They are built from Spring Initializr with Web and Sleuth features only. Both are the same except one is built on Edgware.SR2 and the other on Finchley.SR6.
The application contains a simple RestController that logs a small message and throws a RuntimeException whenever it receives a request (mapped on /**).
The context also contains a custom DefaultErrorAttributes with the same behaviour as the original except it logs a message when the errorAttributes() method is invoked. This method is invoked after Tomcat has forwarded the request to the ErrorDispatcher after reception of an uncaught exception.
The application starts on port 8080 by default.

Scenario 1
Invoke the service - an uncaught RuntimeException is thrown

curl -v http://localhost:8080

Application logs:

 INFO [-,e7f50e4c43a76f7c,e7f50e4c43a76f7c,false] 41701 --- [nio-8080-exec-2] com.example.demo.RestService             : Request received - about to throw an exception
ERROR [-,e7f50e4c43a76f7c,e7f50e4c43a76f7c,false] 41701 --- [nio-8080-exec-2] o.s.c.sleuth.instrument.web.TraceFilter  : Uncaught exception thrown

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.RuntimeException: boom
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:982) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:635) ~[tomcat-embed-core-8.5.27.jar:8.5.27]
....
ERROR [-,,,] 41701 --- [nio-8080-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: boom] with root cause

java.lang.RuntimeException: boom
	at com.example.demo.RestService.custom(RestService.java:30) ~[classes/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_151]
....
 INFO [-,045aa25bab393b07,045aa25bab393b07,false] 41701 --- [nio-8080-exec-2] com.example.demo.RestService             : getErrorAttributes() invoked for /error

The first two lines are caused by the request hitting the RestController handler. They both refer to the same Span (e7f50e4c43a76f7c).
The third line is logged by Tomcat because of the uncaught exception - it does not contain any Span (as expected, see #714).
The last line is caused by the handling on /error.
--> Logs generated by /error refer to a DIFFERENT span than when the request originally hit the rest controller.

Scenario 2
Invoke the service with Span headers:

curl -v -H "X-B3-TraceId: 97952cdf1b39993f" -H"X-B3-SpanId: 97952cdf1b39993f" http://localhost:8080

Application logs:
Note: exception logged by Tomcat is omitted for brevity.

 INFO [-,97952cdf1b39993f,97952cdf1b39993f,true] 41701 --- [nio-8080-exec-3] com.example.demo.RestService             : Request received - about to throw an exception
ERROR [-,97952cdf1b39993f,97952cdf1b39993f,true] 41701 --- [nio-8080-exec-3] o.s.c.sleuth.instrument.web.TraceFilter  : Uncaught exception thrown

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.RuntimeException: boom
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:982) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861) ~[spring-webmvc-4.3.14.RELEASE.jar:4.3.14.RELEASE]
....
 INFO [-,97952cdf1b39993f,97952cdf1b39993f,true] 41701 --- [nio-8080-exec-3] com.example.demo.RestService             : getErrorAttributes() invoked for /error

--> All logs now refer to the SAME span, including those generated by /error.

@marcingrzejszczak
Copy link
Contributor

marcingrzejszczak commented Feb 14, 2018

It's a known issue that I haven't filed. Thanks for doing that for me ;)

@marcingrzejszczak marcingrzejszczak added this to the 2.0.0.M7 milestone Feb 14, 2018
@brenuart
Copy link
Author

sample-projects.zip

@brenuart
Copy link
Author

Well... according to me, everything is fine in 2.0.0... but 1.3.x behaviour is odd (and annoying).

@marcingrzejszczak
Copy link
Contributor

Seriously? Interesting, I'll double check that then. Thanks for the sample!

@marcingrzejszczak marcingrzejszczak removed this from the 2.0.0.M7 milestone Feb 14, 2018
@brenuart
Copy link
Author

brenuart commented Feb 14, 2018

I suspect a new span is actually created for /error in both cases but they reuse the ids embedded in the request - so they look the same. However, we may have two server received events and it will look as if the server received two consecutive requests: one for the original URI, the second for /error.

@marcingrzejszczak
Copy link
Contributor

I think that the problem might be present in both cases but Brave & Zipkin combination is handled in a more intelligent way.

@brenuart
Copy link
Author

To give you some background: we have a custom handler behind /error that includes the span ids in the error response. For us it is important they are the same as from within the rest handler.

@brenuart
Copy link
Author

I wonder if the difference in behaviour comes from the following changes in TraceWebAspect:

1.3.x marks the span for close after the HandlerExceptionResolver is invoked, 2.0.x does not (afaik). Why should it be closed anyway? The handler may not have handled the exception and it will bubble up to Tomcat...

What's the purpose of instrumenting the HandlerExceptionResolver by the way - besides maybe capturing details about the exception... ?

@marcingrzejszczak
Copy link
Contributor

In 2.0.x the problem was already fixed in TraceFilter. I've forgotten to backport it to 1.3.x.

The problem wasn't with the aspect, it was in TraceFilter. I haven't checked if there's an error controller present when an exception was thrown. Cause if there is one then it means that it will be called and it will execute the logic of closing the span. In 2.0.x there's a problem with multiple tags, but I guess that's another problem to solve

@marcingrzejszczak marcingrzejszczak added this to the 1.3.3.RELEASE milestone Feb 15, 2018
@brenuart
Copy link
Author

Nice - I'm gonna test latest SNAPSHOT asap. I'll keep you informed.

Your comment let me think the TraceFilter expects Tomcat to forward the request to /error in case of uncaught exception. Correct?
As already mentioned, this is not always the case. For instance, Tomcat silently ignore uncaught org.apache.catalina.connector.ClientAbortException (subtype of IOException) exceptions: they are not logged and not forwarded to the ErrorDispatcher. Is this causing issues to the TraceFilter?

@marcingrzejszczak
Copy link
Contributor

Your comment let me think the TraceFilter expects Tomcat to forward the request to /error in case of uncaught exception. Correct?

Correct

As already mentioned, this is not always the case. For instance, Tomcat silently ignore uncaught org.apache.catalina.connector.ClientAbortException (subtype of IOException) exceptions: they are not logged and not forwarded to the ErrorDispatcher. Is this causing issues to the TraceFilter?

I haven't tested that at all so we can actually give it a go and check it out.

codefromthecrypt pushed a commit that referenced this issue May 15, 2020
`ExceptionLoggingFilter` logs "Uncaught exception thrown" to error level
when there is a synchronous exception not otherwise swallowed. This is a
cure worse than the disease. This issue disables by default and the 3.x
should end the years of problems it caused.

Fixes #1347

This was done IIUC to help @jfuerth who had log messages with trace IDs,
except the uncaught one, in #714 (Sept 2017). They were formerly told to
use `ExceptionHandler` and noted that didn't work in practice as it
subverted their status code logic.

Concretely, 4203566 "solved" the issue
by having sleuth log all uncaught exceptions with the message
"Uncaught exception thrown". This code was initially embedded in
sleuth's and later pulled out to `ExceptionLoggingFilter` in 2.0.

A few months later @brenuart noticed this "solution" had problems. For
example, it caused the same exception to be logged twice. Bertrand also
mentioned some faults in the implementation including edge cases like
`ClientAbortException` not addressed by this. (In fact, there are even
more problems: the implementation assumes async isn't in use!)
Bertands concerns about this having surprising logging effects were
dismissed as a non-issue. Bertrand raised #859 about glitches this
caused, the status_code related ones being if statemented around, and
it went quiet a while.

A month later @oburgosm opened #902 (currently 5 thumbs up) asking to
remove the "Uncaught exception thrown" or at least set it to debug.
The response was if they don't like it, control their own tracing
filter and the issue was closed.

A month later, @nickcodefresh opened #966 confused about "Uncaught
exception thrown" messages raised by sleuth, incidentally the
`ClientAbortException` mentioned by @brenuart before. Because the
logging came from Sleuth, it appeared they were zipkin errors, and was
explained they weren't.

A year later, @T3rm1 opened #1347 to remove the filter, or at least
disable it by default. The response was first to set the Logger to OFF.
A few days later the issue was closed as the submitter was told they
were the first person to complain about this. 5 months later, another
user rallied for support.

In April 2020, #902 was re-touched by @kosciej who asked to please
reconsider, mentioning it was not typical even in Spring to do this,
for example 'org.springframework.web.filter' package. The impact of
needing to specifically change apps to OFF the sleuth logger seemed
harsh. The suggestion was instead of setting log config, to set a
property instead `spring.sleuth.web.exception-logging-filter-enabled`,
to `false` as that could also accomplish the goal of stopping
"Uncaught exception thrown".

A month later (yesterday) @MrHurniak commented again on #1347 with the
usual complaint that it is redundant and confusing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants