
Fix retry indefinitely in termination process #129

Merged
merged 7 commits into from
Mar 4, 2022

Conversation

kaisecheng
Contributor

@kaisecheng kaisecheng commented Feb 22, 2022

The plugin blocks the shutdown process when the URL points to an invalid port: a retryable error that is retried indefinitely.

This commit adds a check of the pipeline shutdown state in order to quit the retry loop.
This feature requires a logstash-core that exposes shutdown_requested: elastic/logstash#13811

Fixed: #123

Contributor

@yaauie yaauie left a comment


👍🏼 to enabling graceful shutdowns in plugins.

If we are already requiring changes in LS core to make this work, I would rather we implement all of the complex bits of this functionality over in LS core and merely hook into it from this plugin in as lightweight a manner as possible. That way other plugins that need this functionality don't also have to implement the reach into the execution context.

If we were to implement LogStash::Outputs::Base#pipeline_shutdown_requested? (note the leading pipeline_ and trailing ?) in LS core, we could hook into it here, with support for older Logstash versions, by defining:

  def pipeline_shutdown_requested?
    return super if defined?(super) # since LS 8.1.0
    nil # falsy, unknown
  end

We should also discuss whether 2 attempts during a shutdown is sufficient for this particular plugin, and whether that should be configurable.
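A minimal sketch of how an output's retry loop could consult such a hook (the class, the flag, and the attempt limit below are hypothetical stand-ins for illustration, not the plugin's actual implementation):

```ruby
# Hypothetical sketch of a retry loop that bails out once a pipeline
# shutdown is requested. Nothing here is the plugin's real code.
class SketchOutput
  def initialize(shutdown_flag)
    @shutdown_flag = shutdown_flag # callable standing in for the LS-core state
    @attempts = 0
  end

  # Stand-in for LogStash::Outputs::Base#pipeline_shutdown_requested?;
  # would return nil (unknown) when the hook is unavailable.
  def pipeline_shutdown_requested?
    @shutdown_flag.call
  end

  # Retries indefinitely under normal operation, but stops after
  # `max_attempts_during_shutdown` once a shutdown has been requested.
  def send_with_retries(max_attempts_during_shutdown: 2)
    loop do
      @attempts += 1
      return :sent if try_send
      return :aborted if pipeline_shutdown_requested? &&
                         @attempts >= max_attempts_during_shutdown
    end
  end

  def try_send
    false # simulate a persistently refused connection (e.g. wrong port)
  end
end
```

With the flag returning true, send_with_retries gives up after the attempt limit instead of looping forever.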

lib/logstash/outputs/http.rb: review comments (outdated, resolved)
@kaisecheng
Contributor Author

@yaauie thanks for the review. Moving the status check to the output base class makes a lot of sense, especially as it involves a lot of nil checking.

We should also discuss whether 2 attempts during a shutdown is sufficient for this particular plugin, and whether that should be configurable.

I tried to find a balance between a reasonable shutdown time and the number of retries. The first try is counted as 0, so "2" actually means a total of three attempts, which takes around 15 seconds in the connection-refused case. I took the Kubernetes terminationGracePeriodSeconds as a reference: it kills the pod after 30 seconds (the default value) if the shutdown process has not finished by then. Changing the attempts to "3" would come pretty close to 30 seconds.
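For reference, the back-of-envelope arithmetic behind such an estimate can be sketched as follows; the per-attempt cost and the backoff intervals are illustrative assumptions, not measured values from the plugin:

```ruby
# Illustrative shutdown-delay estimate: each attempt burns some time on the
# refused connection, and a backoff sleep separates consecutive attempts.
# All numbers are assumptions chosen for illustration only.
def shutdown_delay(attempts, attempt_cost, backoffs)
  attempts * attempt_cost + backoffs.first(attempts - 1).sum
end

shutdown_delay(3, 3.0, [2.0, 4.0]) # => 15.0 seconds, near the figure above
shutdown_delay(2, 3.0, [2.0, 4.0]) # => 8.0 seconds
```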

Regarding making it configurable: users who have enabled the persistent queue have less of a concern, as the events will be retried on the next start, so it is memory-queue users who may want such a config. It would limit retries of retryable errors in the shutdown scenario. One concern is that automatic_retries could be confused with shutdown_retries; we would need better docs to explain the difference and this particular scenario. Overall it is a nice-to-have feature for memory-queue users.

@yaauie
Contributor

yaauie commented Feb 28, 2022

@kaisecheng with the current implementation, it is going to be very difficult to ensure that we bail before some arbitrary cutoff, especially when using the default non-batched behaviour and the cleverly randomized exponential backoff. A batch with the default 125 events, for example, can take 125*timeout to burn through its first attempt of each event, and an output that is already in a retry loop before the shutdown is requested can block on Queue#pop for up to 60s before we even get an event with which to run this new logic.

I think that this PR prevents us from blocking indefinitely, which is a good starting point and worth merging. I don't see a trivial way to change the existing implementation into one that is easily interruptible for shutdowns (e.g., finish or fail within X seconds of a shutdown being requested), without re-implementing a threadsafe queue whose elements retain a quarantine timestamp and whose pop method had new behaviour that is less blocking. But alas, that is much larger in scope and would certainly add risk.
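A less-blocking pop of the kind described above could be approximated by polling with a short non-blocking pop, so a shutdown request is observed within the poll interval. This is a rough sketch, not the quarantine-timestamp queue yaauie describes, and the method and flag names are illustrative:

```ruby
# Sketch: instead of blocking on Queue#pop for up to 60s, poll with a
# non-blocking pop so a shutdown flag is noticed within `poll_interval`.
def pop_or_shutdown(queue, shutdown_requested, poll_interval: 0.05)
  loop do
    return :shutdown if shutdown_requested.call
    begin
      return queue.pop(true) # non-blocking; raises ThreadError when empty
    rescue ThreadError
      sleep poll_interval
    end
  end
end
```

The trade-off is busy-waiting at `poll_interval` granularity in exchange for bounded shutdown latency.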

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
@kaisecheng
Contributor Author

Indeed, the duration is affected by many factors: the type of failure, timeout settings, the number of events... and this PR is not aiming to guarantee a shutdown within X seconds. Mostly I am thinking of the connection-refused case, when the config points to a wrong URL: how many times it should retry, how many times it has retried before a complete shutdown, and how long the shutdown takes before users consider it a "bug" for taking too long. I think two attempts is too short, as it could finish in about 2 seconds, so three attempts seem to be a reasonable compromise.

…tp into fix_stall_termination

# Conflicts:
#	CHANGELOG.md
@kaisecheng kaisecheng requested a review from yaauie March 3, 2022 14:53
@kaisecheng kaisecheng merged commit 6dca44e into logstash-plugins:main Mar 4, 2022
Successfully merging this pull request may close these issues.

http output plugin stalls Logstash pipeline termination