Block excessive gracefulReload requests #3396

kenhys · 2021-05-26T07:15:38Z

Which issue(s) this PR fixes:

What this PR does / why we need it:

In previous versions, the /api/config.gracefulReload endpoint does not
restrict excessive API calls. Calling it while a gracefulReload is
already executing causes the following error:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates the problem by limiting how often the API can be called
(by default, it enforces an interval of 60 seconds between calls).
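
A minimal sketch of this kind of interval-based guard is shown below. It is not the code from this PR: the class name `ReloadGuard`, the method `try_acquire`, and the injectable `clock` are hypothetical names chosen for illustration. The only detail taken from the description is the 60-second default.

```ruby
# Hypothetical sketch: ReloadGuard and try_acquire are illustrative names,
# not identifiers from lib/fluent/supervisor.rb.
class ReloadGuard
  DEFAULT_INTERVAL = 60 # seconds, the default mentioned in the description

  # clock is injectable so the guard can be exercised without sleeping.
  def initialize(interval: DEFAULT_INTERVAL,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval
    @clock = clock
    @blocked_until = nil
    @mutex = Mutex.new
  end

  # Returns true and opens a new blocking window if a reload may proceed.
  # Returns false while the previous window has not elapsed yet.
  def try_acquire
    @mutex.synchronize do
      now = @clock.call
      return false if @blocked_until && now < @blocked_until

      @blocked_until = now + @interval
      true
    end
  end
end

guard = ReloadGuard.new(interval: 60)
guard.try_acquire # the first request is accepted
guard.try_acquire # a request inside the interval is rejected
```

Note that a guard like this only spaces requests out. It does not know whether the previous reload has actually finished.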

Docs Changes:

https://docs.fluentd.org/deployment/system-config
Documentation for @blocking_reload_interval should be added in a separate PR.

Release Note:

N/A

NOTE: Ideally, it should wait for and detect the completion of the graceful
reload, but there is no easy way to synchronize internal state between
ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

kenhys · 2021-05-26T08:24:22Z

rerun windows CI again.

lib/fluent/supervisor.rb

kenhys · 2021-05-27T07:45:45Z

TODO: @system_config.blocking_reload_interval returns the default value instead of the customized one. 🤔

kenhys · 2021-05-28T00:58:26Z

This PR should be ready for v1.13.1 or a later release (not for v1.13.0).

cosmo0920 · 2021-05-28T01:36:19Z

@kenhys All of the tests running on GitHub Actions have failed. Could you check them?

kenhys · 2021-05-28T04:25:11Z

The problem with the current approach to customizing blocking_reload_interval:

config[:blocking_reload_interval] is nil in Fluent::ServerModule#before_run
Because of the above, @block_reload_until is not overridden in Fluent::ServerModule#run

kenhys · 2021-06-01T09:17:09Z

Broken test cases were fixed.

kenhys · 2021-06-01T09:21:36Z

Rebased with recent master.

lib/fluent/supervisor.rb

kenhys · 2021-06-02T09:29:32Z

Removed the if @blocking_reload_interval check and tweaked the test case a bit.

lib/fluent/supervisor.rb

kenhys · 2021-06-03T02:59:22Z

Changed to use @blocking_reload_interval instead of config[:blocking_reload_interval] and removed a redundant conditional assignment.

ashie · 2021-06-07T02:12:42Z

This can block excessive RPC calls, but it can't block excessive SIGUSR2 signals.
Could you also consider the SIGUSR2 case?

kenhys · 2021-06-07T03:03:29Z

I'll fix it, too.

In previous versions, RPC::Server#mount_proc assumes that the content of the API response is either {'ok':true} or {'ok':false}. The response must also be an instance of HTTPResponse or a similar object (one that responds to #body).

This commit accepts a simpler hash object so that the API response can be customized.

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
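
The idea in this commit message can be sketched roughly as follows. This is not the actual RPC::Server#mount_proc code: the helper name `normalize_rpc_response` and the exact fallback behavior are assumptions made for illustration only.

```ruby
require 'json'

# Hypothetical helper: normalize_rpc_response is an illustrative name and is
# not the real RPC::Server#mount_proc implementation.
def normalize_rpc_response(result)
  if result.respond_to?(:body)
    # HTTPResponse-like object: pass its body through unchanged.
    result.body
  elsif result.is_a?(Hash)
    # Plain hash: serialize it, so a handler can return a custom payload.
    JSON.generate(result)
  else
    # Anything else collapses to the boolean-style {'ok': ...} response.
    JSON.generate({ 'ok' => result ? true : false })
  end
end

# A handler could then report why a reload was rejected:
normalize_rpc_response({ 'ok' => false, 'message' => 'reload is blocked' })
```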

kenhys · 2021-06-07T05:26:01Z

SIGUSR2 is also supported.

ashie · 2021-06-07T06:11:02Z

CI failed on Windows:

2021-06-07T05:42:25.0561362Z Error: test_blocking_signal_handler(SupervisorTest::gracefulReload): NoMethodError: undefined method `stop_rpc_server' for nil:NilClass
2021-06-07T05:42:25.0565425Z D:/a/fluentd/fluentd/test/test_supervisor.rb:299:in `teardown'

Closes: fluent#3341

In previous versions, the /api/config.gracefulReload endpoint does not restrict excessive API calls. Calling it while a gracefulReload is already executing causes the following error:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates the problem by restricting the API call (it enforces an interval, which is customizable through the blocking_reload_interval parameter in the system configuration).

NOTE: Ideally, it should wait for and detect the completion of the graceful reload, but there is no easy way to synchronize internal state between ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>

The reload is triggered not only by RPC calls but also by SIGUSR2.

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>

Cryptophobia · 2021-06-08T18:06:30Z

When gracefulReload is triggered, a new thread for reloading is created in both the supervisor process and the worker process.

If the threads run sequentially, would that be enough to keep consistency, no matter how many times gracefulReload is called? Would this be similar to implementing a queue for the threads?

I have no objection to solving this, as long as there is a flag to keep the current behavior. In projects like KFO, we do not wait for or check the status returned from the gracefulReload endpoint when issuing multiple requests sequentially. We hope that fluentd will be able to handle the logic of what to do when the configuration changes several times within a short period.

The other important consideration: if a gracefulReload worker thread fails, how do we make sure that the unfinished configuration change is still gracefully reloaded? If it fails once and blocks, the subsequent configuration change cannot be reloaded. That would be bad design, because subsequent calls to an API (gracefulReload in this case) would then depend on the processing of previous API calls. Shouldn't the API calls be independent, and the config reload processing idempotent?

By idempotent, I just mean that if we send 2 calls to gracefulReload, fluentd should finish processing both sequentially. Any blocking would change the behavior of the API: configuration changes could be missed and API calls ignored.
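
The sequential processing described in this comment could look roughly like the sketch below. It is not Fluentd code: `SequentialReloader`, `request_reload`, and `shutdown` are hypothetical names, and the block passed to `new` stands in for the real reload work.

```ruby
# Hypothetical sketch: SequentialReloader is illustrative and not part of
# Fluentd. Request ids must be truthy, because nil marks the end of the queue.
class SequentialReloader
  def initialize(&reload)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a request arrives and returns nil once the
      # queue is closed and drained, which ends the loop.
      while (request = @queue.pop)
        begin
          reload.call(request)
        rescue StandardError => e
          # A failed reload must not prevent later requests from running.
          warn "reload #{request} failed: #{e.message}"
        end
      end
    end
  end

  # Enqueue a reload request and return immediately.
  def request_reload(id)
    @queue << id
  end

  # Stop accepting requests and wait for the pending ones to finish.
  def shutdown
    @queue.close
    @worker.join
  end
end

reloader = SequentialReloader.new { |id| puts "reloading for request #{id}" }
3.times { |i| reloader.request_reload(i + 1) }
reloader.shutdown
```

With a single worker thread, requests run one at a time in arrival order, and a failure in one request does not block the next.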

kenhys · 2021-06-09T04:41:31Z

Thanks @alex-vmw, @Cryptophobia, @ashie
The current approach is problematic, so I am withdrawing this PR.

kenhys marked this pull request as ready for review May 26, 2021 08:53

ashie requested changes May 27, 2021

kenhys force-pushed the guard-reload branch from cc3c413 to 3235e34 Compare May 27, 2021 07:43

kenhys changed the title ~~Block excessive gracefulReload requests~~ WIP: Block excessive gracefulReload requests May 27, 2021

kenhys mentioned this pull request May 28, 2021

Support pretty print Fluent::Config::Section for debugging #3398

Merged

kenhys force-pushed the guard-reload branch from 3235e34 to 5c32c6e Compare May 28, 2021 05:50

kenhys changed the title ~~WIP: Block excessive gracefulReload requests~~ Block excessive gracefulReload requests Jun 1, 2021

kenhys force-pushed the guard-reload branch from 5c32c6e to b05698f Compare June 1, 2021 09:19

kenhys requested a review from ashie June 2, 2021 02:32

ashie reviewed Jun 2, 2021

kenhys force-pushed the guard-reload branch 3 times, most recently from ad6e5fd to 98e4ce9 Compare June 2, 2021 07:18

ashie reviewed Jun 3, 2021

kenhys force-pushed the guard-reload branch from 98e4ce9 to d500c23 Compare June 3, 2021 02:57

kenhys force-pushed the guard-reload branch from d500c23 to 64dbd22 Compare June 7, 2021 05:25

kenhys force-pushed the guard-reload branch from 64dbd22 to 7fc202f Compare June 7, 2021 09:48

Block excessive SIGUSR2, too

dac2d45

The reload is triggered not only by RPC calls but also by SIGUSR2.

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>

kenhys force-pushed the guard-reload branch from 7fc202f to dac2d45 Compare June 8, 2021 01:23

kenhys mentioned this pull request Jun 8, 2021

1.0: add blocking_reload_interval explanation fluent/fluentd-docs-gitbook#328

Closed

kenhys mentioned this pull request Jun 9, 2021

Consistent gracefulReload RPC #3415

Open

kenhys closed this Jun 9, 2021
