supervisor could die on some signals #1723

Closed
abicky opened this Issue Oct 22, 2017 · 1 comment

abicky commented Oct 22, 2017

I found that the supervisor dies when it receives a signal if its worker has previously died unexpectedly, for example after being killed by the OOM Killer or crashing with SEGV.
The cause seems to be that config[:worker_pid] still includes the pid of the dead worker.
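The failure mode described above can be reproduced outside fluentd. The following is a minimal sketch, not fluentd code: `worker_pid` is a hypothetical stand-in for an entry in config[:worker_pid], and the forked child stands in for a worker. It only shows that `Process.kill` raises `Errno::ESRCH` for a pid whose process has already died and been reaped.

```ruby
# Hypothetical stand-in for a fluentd worker: a child that just sleeps.
worker_pid = fork { sleep 60 }

# Simulate an unexpected death (e.g. the OOM Killer) and reap the child,
# after which the pid no longer refers to any process.
Process.kill(:KILL, worker_pid)
Process.wait(worker_pid)

# Signalling the stale pid raises Errno::ESRCH. It is captured here only so
# that the script can print it instead of terminating.
error = begin
  Process.kill(:USR1, worker_pid)
  nil
rescue Errno::ESRCH => e
  e
end

puts "#{error.class}: #{error.message}"
```

On Linux the printed message should correspond to the "No such process" text in the trace below. In the supervisor the exception is not rescued, which would explain why the process exits.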

Steps to reproduce

  1. Run fluentd
% docker run -it --rm --name fluentd fluent/fluentd:v0.14.21
  2. Kill the worker
% docker exec -it fluentd sh
/ # ps
PID   USER     TIME   COMMAND
    1 root       0:00 {entrypoint.sh} /usr/bin/dumb-init /bin/sh /bin/entrypoint.sh /bin/sh -c exec fluentd -c /fluentd/etc/${FLUENTD_CONF} -p /fluentd/plugins $FLUENTD_OPT
    5 fluent     0:00 {fluentd} /usr/bin/ruby /usr/bin/fluentd -c /fluentd/etc/fluent.conf -p /fluentd/plugins
   15 fluent     0:00 /usr/bin/ruby -Eascii-8bit:ascii-8bit /usr/bin/fluentd -c /fluentd/etc/fluent.conf -p /fluentd/plugins --under-supervisor
   23 root       0:00 sh
   27 root       0:00 ps
/ # kill -KILL 15
  3. Send a signal to the supervisor
/ # kill -USR1 1
  4. The supervisor dies
Unexpected error No such process
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:168:in `kill'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:168:in `block in supervisor_sigusr1_handler'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:167:in `each'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:167:in `supervisor_sigusr1_handler'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:137:in `block in install_supervisor_signal_handlers'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/process_manager.rb:252:in `select'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/process_manager.rb:252:in `tick'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/multi_spawn_server.rb:91:in `wait_tick'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/multi_worker_server.rb:60:in `run'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/multi_spawn_server.rb:57:in `run'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/server.rb:123:in `main'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:119:in `main'
  /usr/lib/ruby/gems/2.3.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:68:in `run'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:606:in `supervise'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/supervisor.rb:476:in `run_supervisor'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/command/fluentd.rb:310:in `<top (required)>'
  /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.21/bin/fluentd:5:in `<top (required)>'
  /usr/bin/fluentd:22:in `load'
  /usr/bin/fluentd:22:in `<main>'
repeatedly commented Oct 31, 2017

This problem happens with the restart/reload signals, so ignoring this error should fix the problem.
I will fix it soon.
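As an illustration of what "ignoring this error" could look like, here is a minimal sketch. It is not the fluentd handler: `signal_workers` is a hypothetical helper, and the forked children stand in for workers. The idea is simply to rescue `Errno::ESRCH` per pid so that one stale entry does not abort the whole loop.

```ruby
# Hypothetical helper: forward a signal to every pid, skipping pids whose
# process no longer exists. Returns the pids that actually received it.
def signal_workers(pids, signal)
  pids.select do |pid|
    begin
      Process.kill(signal, pid)
      true
    rescue Errno::ESRCH
      # The worker is already gone; ignore it and carry on.
      false
    end
  end
end

# One dead (already reaped) child and one live child.
dead_pid = fork { exit }
Process.wait(dead_pid)
live_pid = fork { sleep 60 }

delivered = signal_workers([dead_pid, live_pid], :TERM)
Process.wait(live_pid) # reap the child terminated by TERM

puts "signalled #{delivered.size} of 2 workers"
```

Rescuing only `Errno::ESRCH` keeps other failures, such as a permission error, visible instead of silently swallowing them.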

@repeatedly repeatedly self-assigned this Oct 31, 2017

@repeatedly repeatedly closed this in f51b609 Nov 9, 2017

repeatedly added a commit that referenced this issue Nov 9, 2017

Merge pull request #1739 from fluent/fix-worker_pid-handling
Fix worker_pid handling by replacing old pid with new one. fix #1723
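The commit message describes replacing the old pid with the new one. The sketch below shows that idea in isolation and makes no claim about how the pull request implements it: `WorkerPidTable`, `record`, and the worker ids are hypothetical names. Keying pids by worker id means a restarted worker overwrites its stale entry rather than adding a second one.

```ruby
# Hypothetical registry: one pid per worker id, so a restart replaces the
# previous pid instead of accumulating dead ones.
class WorkerPidTable
  def initialize
    @pids = {}
  end

  # Store (or overwrite) the pid for the given worker id.
  def record(worker_id, pid)
    @pids[worker_id] = pid
  end

  # Pids that signals should currently be forwarded to.
  def pids
    @pids.values
  end
end

table = WorkerPidTable.new
table.record(0, 15) # initial worker
table.record(0, 42) # the same worker after a restart, with a new pid

puts table.pids.inspect
```

With an append-only list, the stale pid 15 would remain and the next signal would be sent to a process that no longer exists.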