
Make DelegatingAsyncDisruptorAppender more resilient to exceptions thrown by child appenders #456

Closed
brenuart opened this issue Dec 28, 2020 · 6 comments

@brenuart (Collaborator)

The EventHandler used by the DelegatingAsyncDisruptorAppender does the following:

  1. Invoke AppenderAttachable#appendLoopOnAppenders(event). This method loops through the attached appenders and calls Appender#doAppend(event) on each. If an appender throws an exception, the loop is aborted and the subsequent appenders are not called.
    Is this a problem? Wouldn't it be safer to try/catch exceptions so that every appender gets a chance to process the event? Looking at Logback's AsyncAppenderBase, I noticed it behaves the same way and doesn't care about exceptions... Perhaps the rationale is that appenders are not supposed to throw exceptions?

  2. Flush the output stream of the attached OutputStreamAppenders. Same remark here: if an exception is thrown when flushing the first output stream, the remaining appenders won't be flushed at all.
    This case may be slightly different from doAppend() in that appenders like the OutputStreamAppender won't throw an exception when they encounter IO issues while processing an event; they log an error status instead and proceed.
    In my opinion, the DelegatingAsyncDisruptorAppender should do the same: wrap each OutputStream#flush() call in a try/catch and log an error status when an IOException is thrown (see the sketch below).
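
For illustration only, here is a minimal sketch of the kind of per-appender try/catch described in points (1) and (2). The class and method names are hypothetical; this is not the actual DelegatingAsyncDisruptorAppender code:

```java
import java.io.IOException;
import java.util.Iterator;

import ch.qos.logback.core.Appender;
import ch.qos.logback.core.OutputStreamAppender;
import ch.qos.logback.core.spi.AppenderAttachableImpl;
import ch.qos.logback.core.spi.ContextAwareBase;

/*
 * Hypothetical helper illustrating the per-appender try/catch discussed above.
 * Not the actual DelegatingAsyncDisruptorAppender implementation.
 */
class ResilientDelegator<E> extends ContextAwareBase {

    private final AppenderAttachableImpl<E> appenders = new AppenderAttachableImpl<>();

    /** Appends the event to every attached appender, even if one of them throws. */
    void append(E event) {
        for (Iterator<Appender<E>> it = appenders.iteratorForAppenders(); it.hasNext(); ) {
            Appender<E> appender = it.next();
            try {
                appender.doAppend(event);
            } catch (RuntimeException e) {
                // Point (1): log an error status and keep going with the remaining appenders.
                addError("Unable to append event to appender [" + appender.getName() + "]", e);
            }
        }
    }

    /** Flushes every attached OutputStreamAppender, even if one of them fails. */
    void flush() {
        for (Iterator<Appender<E>> it = appenders.iteratorForAppenders(); it.hasNext(); ) {
            Appender<E> appender = it.next();
            if (appender instanceof OutputStreamAppender) {
                try {
                    ((OutputStreamAppender<E>) appender).getOutputStream().flush();
                } catch (IOException e) {
                    // Point (2): an IOException on one stream must not prevent the others from being flushed.
                    addError("Unable to flush appender [" + appender.getName() + "]", e);
                }
            }
        }
    }
}
```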

What do you think?
If you agree with point (2), I can include the modifications in the PR I'm about to submit for #454.

@philsttr (Collaborator) commented Dec 28, 2020

Sounds good to me.

One thing to consider is that if an exception occurs while appending/flushing on one event, it seems there will be a high probability that the same exception will occur while appending/flushing the next event on the next call to onEvent. This will result in an error status being logged pretty frequently. Perhaps those error status messages should be throttled (for example, log the first one, and then only log a percentage of the subsequent errors until the next success)? Or only log on state changes? What are your thoughts?
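
To illustrate the idea, a "log only on state changes" guard might look something like the following sketch (a hypothetical helper, not part of the library):

```java
import ch.qos.logback.core.spi.ContextAware;

/*
 * Hypothetical guard that reports a status only when the outcome changes:
 * the first failure after a success, and the first success after a failure.
 * A persistent IO problem therefore produces one error status, not one per event.
 */
class StateChangeErrorReporter {

    private boolean failing;

    /** Call after a successful append/flush attempt. */
    void onSuccess(ContextAware origin, String operation) {
        if (failing) {
            origin.addInfo(operation + " is working again");
            failing = false;
        }
    }

    /** Call after a failed append/flush attempt. */
    void onFailure(ContextAware origin, String operation, Throwable t) {
        if (!failing) {
            origin.addError(operation + " failed (further failures will not be reported until it recovers)", t);
            failing = true;
        }
    }
}
```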

@brenuart (Collaborator, Author) commented Dec 28, 2020

Indeed... The easiest is probably to log an error status the first time flush throws an exception and stay silent until it succeeds again. We could do more, like throttling the error statuses, logging for how long flushing has been failing, etc. However, I'm not sure it is worth the effort: if flush fails, appending subsequent events is likely to fail soon as well.

Looking at LogEventExceptionHandler, I would say this is true for any exception thrown by an EventHandler. Maybe we should look for a more generic solution?

On the other hand, I noticed OutputStreamAppender#subAppend(event) catches the IOException, logs an error status, then stops the appender. There is indeed no "error status" storm, but nothing will go through the appender anymore (and it won't recover automatically).
Note: maybe we should also test Appender#isStarted before calling flush on the delegate appenders?

@philsttr (Collaborator)

Hmm. I'm thinking the DelegatingAsyncDisruptorAppender shouldn't make any assumptions about how the delegate appenders will behave if they throw exceptions. As you discovered, OutputStreamAppender swallows the exception and stops itself when it encounters one. Other appenders might behave differently and be able to recover.

Therefore, I think the DelegatingAsyncDisruptorAppender should continue to attempt to append/flush on each event (and not check for started).

> The easiest is probably to log an error status the first time flush throws an exception and stay silent until it succeeds again. We could do more, like throttling the error statuses, logging for how long flushing has been failing, etc. However, I'm not sure it is worth the effort.

Agreed. Let's keep it simple for now.

@brenuart (Collaborator, Author)

> Therefore, I think the DelegatingAsyncDisruptorAppender should continue to attempt to append/flush on each event (and not check for started).

Both AppenderBase and UnsynchronizedAppenderBase check whether the appender is started in their doAppend(event) method. If the appender is not started, they log an error status and return immediately without appending the event.

This behaviour is implemented in the Logback base classes from which most appenders are likely to inherit. Because of that, there is indeed no need for the DelegatingAsyncDisruptorAppender to check isStarted when forwarding events to the registered delegate appenders (it can safely call doAppend on each appender - each one will take the appropriate decision itself).
However, this guard is not in place for the flush of the output stream (flushing is not a "standard" operation exposed by any of the existing appender interfaces or base classes).

I still believe we should implement the same logic as for doAppend(), i.e. flush an appender only when it is started. This would be consistent with how doAppend() behaves. I don't see this as "making (unreasonable) assumptions about how the appender behaves". The fact that some appenders, like the FileAppender, are not able to recover after they have stopped themselves because of an exception is a separate issue and should not affect "our" decision to check isStarted before flushing.
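
For illustration, reusing the imports and fields from the sketch in the issue description, the flush loop could be adjusted along these lines (again only a sketch, not necessarily what the actual change does):

```java
/** Flushes only the started OutputStreamAppenders, mirroring the isStarted() guard of doAppend(). */
void flush() {
    for (Iterator<Appender<E>> it = appenders.iteratorForAppenders(); it.hasNext(); ) {
        Appender<E> appender = it.next();
        // Skip appenders that are stopped (e.g. an OutputStreamAppender that stopped
        // itself after an IOException), just as the Logback base classes skip stopped
        // appenders in doAppend().
        if (appender.isStarted() && appender instanceof OutputStreamAppender) {
            try {
                ((OutputStreamAppender<E>) appender).getOutputStream().flush();
            } catch (IOException e) {
                addError("Unable to flush appender [" + appender.getName() + "]", e);
            }
        }
    }
}
```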

@brenuart (Collaborator, Author)

Please check PR #457. It addresses multiple issues at once because they all relate to the same code. Tell me what you think.

@philsttr (Collaborator)

Closed by #457

philsttr added this to the 6.6 milestone on Dec 30, 2020