Block excessive gracefulReload requests #3396
Conversation
Rerun Windows CI again.
TODO:
This PR should be ready for v1.13.1 or a later release (not for v1.13.0).
@kenhys All of the tests running on GitHub Actions have failed. Could you check them?
The problem of the current approach to customize
Broken test cases were fixed.
Rebased onto recent master.
Force-pushed from ad6e5fd to 98e4ce9.
Removed if
Changed to use
It can block excessive RPC calls but can't block excessive
I'll fix it, too.
In the previous versions, RPC::Server#mount_proc assumes that the content of the API response is either {'ok':true} or {'ok':false}, and that the response is an instance of HTTPResponse or a similar object (one that responds to #body). This commit accepts a simpler hash object to customize the API response. Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
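For context, a minimal sketch of the relaxed contract (illustrative only; the dispatch helper below is not fluentd's actual RPC::Server code):

```ruby
# Illustrative sketch only -- not fluentd's actual RPC::Server code.
# Before: a handler had to return an HTTPResponse-like object
# (anything responding to #body). After this change, a plain Hash
# such as {'ok' => false, 'message' => '...'} is also accepted.
require 'json'

def render_rpc_result(result)
  if result.respond_to?(:body)   # HTTPResponse-like object: pass through
    result.body
  elsif result.is_a?(Hash)       # plain Hash: serialize it as JSON
    result.to_json
  else
    {'ok' => true}.to_json       # default success response
  end
end

puts render_rpc_result({'ok' => false, 'message' => 'reload is in progress'})
# => {"ok":false,"message":"reload is in progress"}
```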
SIGUSR2 is also supported.
CI failed on Windows:
Closes: fluent#3341

In the previous versions, the /api/config.gracefulReload call doesn't restrict excessive API calls. It causes the following error when a gracefulReload is already executing:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates such a situation by restricting the API call (it enforces an interval, customizable in the system configuration via the blocking_reload_interval parameter).

NOTE: Ideally it should wait and detect graceful reload completion, but there is no easy way to synchronize internal state between ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
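For illustration, a minimal sketch of the interval guard this commit describes (a sketch under assumptions: only blocking_reload_interval comes from the commit message; the method and instance-variable names here are hypothetical):

```ruby
# Illustrative sketch of the interval guard, not fluentd's actual code.
# Only blocking_reload_interval is from the commit; other names are made up.
def graceful_reload_allowed?(blocking_reload_interval)
  now = Time.now
  if @last_graceful_reload && (now - @last_graceful_reload) < blocking_reload_interval
    false                         # too soon after the last reload: reject
  else
    @last_graceful_reload = now   # record the time and allow the reload
    true
  end
end

puts graceful_reload_allowed?(60) # => true  (first call is accepted)
puts graceful_reload_allowed?(60) # => false (blocked within the interval)
```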
The reload is kicked not only by an RPC call but also by SIGUSR2. Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
If the threads run sequentially, would that be enough to keep consistency no matter how many times gracefulReload is called? Would this be similar to implementing a queue for the threads?

I have no objections to trying to solve this as long as there is a flag to keep the current behavior, because in projects like KFO we do not wait for or check the status returned from the gracefulReload endpoint when issuing multiple requests sequentially. We hope that fluentd will handle the logic of what to do when the configuration changes several times in a short amount of time.

The other important consideration is: if a gracefulReload worker thread fails, how do we make sure that the unfinished configuration change is still gracefully reloaded? If it fails once and blocks, subsequent configuration changes will not be able to gracefulReload, and that would be bad design, because then subsequent calls to an API (gracefulReload in this case) would depend on the processing of previous API calls. Shouldn't the API be independent, and the processing of the config reload idempotent? By idempotent, I just mean that if we send two calls to gracefulReload, fluentd should finish processing both sequentially. Any blocking would change the behavior of the API; configuration changes could be missed and API calls ignored.
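To make the queue idea concrete, here is a minimal sketch (an assumed alternative design, not code from this PR) that serializes reload requests through a single worker thread so bursts are processed in order and one failure does not wedge later requests:

```ruby
# Hypothetical sketch: queue gracefulReload requests instead of blocking
# them, so no configuration change is silently dropped.
require 'thread'

reload_queue = Queue.new

worker = Thread.new do
  while (request = reload_queue.pop)          # nil sentinel stops the loop
    begin
      puts "reloading config for #{request}"  # stand-in for reload_config
    rescue => e
      warn "reload failed: #{e}"              # a failure must not stop the queue
    end
  end
end

3.times { |i| reload_queue.push("request-#{i}") }  # a burst of API calls
reload_queue.push(nil)                             # demo only: drain and stop
worker.join
```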
Thanks @alex-vmw, @Cryptophobia, @ashie
Which issue(s) this PR fixes:
Fixes #3341
What this PR does / why we need it:
In the previous versions, the /api/config.gracefulReload call doesn't
restrict excessive API calls. It causes the following error when a
gracefulReload is already executing:
Worker 0 finished unexpectedly with signal SIGKILL
This commit mitigates such a situation by limiting the API call
(it enforces an interval of 60 seconds by default).
Docs Changes:
https://docs.fluentd.org/deployment/system-config
@blocking_reload_interval should be added in another PR (a configuration sketch is shown below).
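For illustration, a sketch of how this might look in the system configuration (rpc_endpoint is fluentd's existing RPC setting; blocking_reload_interval is the parameter proposed by this PR, so treat the exact name and default as tentative):

```
<system>
  # enable the built-in HTTP RPC server (127.0.0.1:24444 is the default)
  rpc_endpoint 127.0.0.1:24444
  # proposed by this PR: seconds during which repeated
  # /api/config.gracefulReload calls are rejected (default: 60)
  blocking_reload_interval 90
</system>
```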
Release Note:
N/A
NOTE: Ideally it should wait and detect graceful reload completion, but
there is no easy way to synchronize internal state between
ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).