worker-0 is getting SIGKILLed instead of nicely reloading upon issuing gracefulReload #3341
Comments
NOTE: It seems that …
It may be better to guard access to gracefulReload until reload_config finishes.
Does mount_proc need to be modified to support a custom response?
Closes: fluent#3341

In previous versions, the /api/config.gracefulReload call did not restrict excessive API calls. When a gracefulReload was already executing, another call caused the following error:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates that situation by rate-limiting the API call (it enforces a 60-second interval between calls).

NOTE: Ideally it should wait for and detect the end of the graceful reload, but there is no easy way to synchronize internal state between ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
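The rate-limiting idea in the commit above can be sketched as a small thread-safe guard. This is an illustrative sketch, not fluentd's actual implementation; the class and method names are hypothetical, and only the "reject calls arriving within N seconds of the last accepted one" behavior mirrors the commit's description.

```ruby
require 'monitor'

# Hypothetical guard that accepts a reload request only if the previous
# accepted request is at least `interval` seconds in the past.
class ReloadGuard
  include MonitorMixin

  def initialize(interval = 60)
    super()
    @interval = interval
    @last_reload = nil
  end

  # Returns true and records the time if a reload may proceed now,
  # false if we are still inside the blocking interval.
  def try_acquire(now = Time.now)
    synchronize do
      if @last_reload.nil? || now - @last_reload >= @interval
        @last_reload = now
        true
      else
        false
      end
    end
  end
end
```

An RPC handler would then return an error response instead of starting a second reload whenever `try_acquire` returns false.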
Updated commit message: Closes: fluent#3341

In previous versions, the /api/config.gracefulReload call did not restrict excessive API calls. When a gracefulReload was already executing, another call caused the following error:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates that situation by restricting the API call (it enforces an interval, customizable in the system configuration via the blocking_reload_interval parameter).

NOTE: Ideally it should wait for and detect the end of the graceful reload, but there is no easy way to synchronize internal state between ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
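Assuming the blocking_reload_interval parameter proposed in the commit above, the configuration would presumably look like the sketch below. The parameter name and default come from the commit message only; released fluentd versions may use a different name or not ship this parameter at all. rpc_endpoint is an existing fluentd system parameter.

```
<system>
  rpc_endpoint 127.0.0.1:24444
  # Hypothetical: minimum seconds between accepted gracefulReload calls,
  # per the proposed blocking_reload_interval parameter.
  blocking_reload_interval 60
</system>
```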
AFAIK there is only one case where fluentd (via serverengine) itself sends SIGKILL: the graceful-kill escalation at https://github.com/treasure-data/serverengine/blob/master/lib/serverengine/process_manager.rb#L80

graceful_kill_timeout: 600, # escalation

@graceful_kill_start_time &&
  @graceful_kill_timeout >= 0 &&
  @graceful_kill_start_time < now - @graceful_kill_timeout

But it won't be triggered in this case, because this is a graceful reload, not a graceful stop. In addition, since fluentd doesn't enable serverengine's heartbeat feature, that path won't be triggered either. I don't know of any other case where fluentd itself triggers SIGKILL.
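The escalation condition quoted above reads: escalate to SIGKILL only if a graceful kill has been started, a non-negative timeout is configured, and that timeout has already elapsed. A standalone restatement (illustrative only, not serverengine's actual code):

```ruby
# Restates serverengine's escalation check as a pure function.
# start_time: when the graceful kill began (nil if none is in progress)
# timeout:    graceful_kill_timeout in seconds (negative disables escalation)
# now:        current time
def escalate_to_sigkill?(start_time, timeout, now)
  !start_time.nil? &&
    timeout >= 0 &&
    start_time < now - timeout
end
```

Since a graceful reload never sets the graceful-kill start time, the first clause is false and escalation cannot fire, which is the point being made above.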
There is also a smaller possibility: workers could be SIGKILLed following a SIGSEGV. (I don't think this case is caused by SIGSEGV, though.)
I got the following output by causing an intentional SEGV:
So we can exclude this case.
I checked the implementation again, but I can't find a case where fluentd sends SIGKILL. As I mentioned at #3413 (comment), gracefulReload needs more memory, which might be the cause of this issue.
In a Kubernetes environment, users sometimes set CPU and memory restrictions to keep Pods from eating up CPU and memory resources. When users set such restrictions for Pods/DaemonSets, they frequently run into OOM-killer issues (the kernel OOM killer terminates the process with SIGKILL).
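For illustration, a container spec with such restrictions might look like the fragment below. The values are examples only, not a recommendation; the point is that if fluentd's peak memory (e.g. during a gracefulReload, which temporarily needs extra memory) exceeds the memory limit, the OOM killer SIGKILLs the worker.

```yaml
# Example resource restrictions on a fluentd container (values illustrative).
resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 500m
    memory: 400Mi   # exceeding this during reload triggers the OOM killer
```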
Describe the bug
We were performing tests that were issuing multiple /api/config.gracefulReload calls to fluentd. Sometimes after issuing a /api/config.gracefulReload we see that worker-0 is getting SIGKILLed instead of nicely reloading.
When the issue happens, we see the following message in the logs:
When the issue does NOT happen, we see the following message in the logs instead:
To Reproduce
Reload the fluentd config multiple times via the /api/config.gracefulReload call and look for the message below in the log:
Expected behavior
Fluentd worker-0 should always nicely reload when /api/config.gracefulReload is issued.
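The reproduction above can be scripted as a loop of RPC calls. This assumes the RPC server is enabled at the default 127.0.0.1:24444 (adjust to your rpc_endpoint); while it runs, watch the fluentd log for the SIGKILL message.

```shell
#!/bin/sh
# Issue several gracefulReload calls in quick succession.
# Assumes fluentd's RPC endpoint is enabled at 127.0.0.1:24444.
RPC="http://127.0.0.1:24444"
for i in 1 2 3 4 5; do
  curl -s --max-time 2 "$RPC/api/config.gracefulReload"
  echo "issued gracefulReload call $i"
done
```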
Your Environment
- Fluentd or td-agent version: fluentd 1.12.2 (we also tested with 1.9.1 and 1.11.2 and saw the same issue)
- Operating system: Ubuntu 20.04.1 LTS
- Kernel version: 5.4.0-42-generic
Your Configuration
Your Error Log
Additional context