Fix retry indefinitely in termination process #129
Conversation
This commit checks the pipeline shutdown request to quit the retry loop. This feature requires a logstash-core that exposes `shutdown_requested`. Fixed: logstash-plugins#123
👍🏼 to enabling graceful shutdowns in plugins.
If we are already requiring changes in LS core to make this work, I would rather we implement all of the complex bits of this functionality over in LS core and merely hook into it from this plugin in as lightweight a manner as possible. That way other plugins that need this functionality don't also have to implement the reach into the execution context.
If we were to implement `LogStash::Outputs::Base#pipeline_shutdown_requested?` (note the leading `pipeline_` and trailing `?`) in LS core, we could hook into it here, with support for older LS versions, by defining:

```ruby
def pipeline_shutdown_requested?
  return super if defined?(super) # since LS 8.1.0
  nil # falsy, unknown
end
```
We should also discuss whether 2 attempts during a shutdown is sufficient for this particular plugin, and whether that should be configurable.
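To make the attempt-cap discussion concrete, here is a minimal, self-contained sketch of a retry loop that consults a `pipeline_shutdown_requested?` hook of the kind proposed above. The class name, the `deliver`/`shutdown_flag` stubs, and `MAX_SHUTDOWN_ATTEMPTS` are all hypothetical illustrations, not the plugin's actual API; the real transport call and the randomized backoff sleep are stubbed out.

```ruby
# Hypothetical sketch of a shutdown-aware retry loop. The transport and
# the shutdown signal are injected as callables so the loop itself stays
# self-contained; the real plugin would call its HTTP client and the
# proposed LS-core `pipeline_shutdown_requested?` hook instead.
class RetryLoopSketch
  MAX_SHUTDOWN_ATTEMPTS = 2 # illustrative cap on attempts once shutdown begins

  def initialize(deliver:, shutdown_flag:)
    @deliver = deliver             # callable returning true on successful send
    @shutdown_flag = shutdown_flag # callable returning true once shutdown is requested
  end

  def pipeline_shutdown_requested?
    @shutdown_flag.call
  end

  # Returns the number of attempts made before delivering or giving up.
  def run
    attempts = 0
    loop do
      attempts += 1
      return attempts if @deliver.call # delivered successfully
      # Without a shutdown request this would retry indefinitely (the bug
      # this PR addresses); with one, we stop after the capped attempts.
      return attempts if pipeline_shutdown_requested? && attempts >= MAX_SHUTDOWN_ATTEMPTS
      # the real plugin sleeps here with randomized exponential backoff
    end
  end
end
```

With delivery always failing and shutdown requested, the sketch stops after `MAX_SHUTDOWN_ATTEMPTS` attempts instead of looping forever; a successful delivery returns immediately.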
@yaauie thanks for the review. Moving the status check to the output base class makes a lot of sense, especially given all the nil checking involved.
I tried to find a balance between a reasonable shutdown time and the number of retries. The first try is 0, so "2" actually means a total of 3 attempts, which takes around 15 seconds in the connection-refused case. I take k8s […]. Regarding making it configurable: users who have enabled PQ would have less concern, as the events will be retried on the next start, so memory-queue users may want a config option. This config limits retries of retryable errors in the shutdown scenario. One concern is […]
@kaisecheng with the current implementation, it is going to be very difficult to ensure that we bail before some arbitrary cutoff, especially when using the default non-batched behaviour and the cleverly randomized exponential backoff. A batch with the default 125 events, for example, can take 125 × timeout to burn through its first attempt of each event, and an output that is already in a retry loop before the shutdown is requested can block on […]
I think that this PR prevents us from blocking indefinitely, which is a good starting point and worth merging. I don't see a trivial way to change the existing implementation into one that is easily interruptible for shutdowns (e.g., finish or fail within X seconds of a shutdown being requested) without re-implementing a thread-safe queue whose elements retain a quarantine timestamp and whose pop method has new, less blocking behaviour. But alas, that is much larger in scope and would certainly add risk.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Indeed, the duration can be affected by too many factors: the type of failure, the timeout setting, the size of events… and this PR is not aiming to bound the shutdown process to X seconds. Mostly I am thinking of the connection-refused case, when the config points to a wrong URL: how many times it should retry, how many times it has retried before a complete shutdown, and how long the shutdown takes before the user considers it a "bug" because it is too long. I think trying two times is too short, since it could finish in 2 seconds, so three attempts seem to be a reasonable compromise.
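The attempt-count trade-off above can be put in rough numbers. A back-of-envelope sketch, assuming a plain `2**n`-second exponential backoff starting at 1 second (the plugin's real backoff is randomized, so these figures are illustrative only):

```ruby
# Cumulative wait in seconds for the sleeps between `attempts` retries,
# assuming an idealized 2**n-second exponential backoff: 1 + 2 + 4 + ...
# The plugin's actual backoff adds jitter, so treat these as rough bounds.
def cumulative_backoff(attempts)
  (0...attempts).sum { |n| 2**n }
end
```

Under this assumed schedule, two extra attempts wait about 3 seconds in total and three wait about 7, which illustrates why two attempts can feel too short while three remain well inside a typical termination grace period.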
The plugin is blocking the shutdown process when `url` points to an invalid port, which is a retryable error that retries indefinitely. This commit adds a check of the pipeline shutdown state to quit the retry loop.
This feature requires a logstash-core that exposes `shutdown_requested` (elastic/logstash#13811).
Fixed: #123