Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syswrite': Broken pipe (Errno::EPIPE) on unicorn (not continues or reproducible) #49

Closed
innovia opened this issue Mar 25, 2014 · 6 comments

Comments

@innovia
Copy link

innovia commented Mar 25, 2014

We get every now and then this:

/unicorn-4.6.2/lib/unicorn/http_server.rb:263:in `syswrite': Broken pipe (Errno::EPIPE)

and unicorn fails to start

doing eye stop unicorn
and then eye start unicorn fixes it

here's the info

RUBY = '/usr/bin/ruby'

rails_env = 'production'

Eye.application('rails_unicorn') do
  process('unicorn') do
    working_dir '/var/www/projects/current'

    if File.exist? File.join(working_dir, 'Gemfile')
      clear_bundler_env
      env 'BUNDLE_GEMFILE' => File.join(working_dir, 'Gemfile')
    end

    env "RAILS_ENV" => rails_env
    # unicorn requires to be `ruby` in path (for soft restart)
    env "PATH" => "#{File.dirname(RUBY)}:#{ENV['PATH']}"

    pid_file 'tmp/pids/unicorn.pid'
    stdall 'log/eye-unicorn.log'

    start_command  "/usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/#{rails_env}.rb -E #{rails_env} -D"
    stop_command  'kill -QUIT {PID}'
    restart_command  'kill -USR2 {PID}'

    # stop signals: http://unicorn.bogomips.org/SIGNALS.html
    stop_signals [:TERM, 10.seconds]

    start_timeout 30.seconds
    restart_grace 30.seconds

    monitor_children do
      stop_command "kill -QUIT {PID}"
      check :cpu, :every => 30, :below => 75, :times => 3
      check :memory, :every => 30, :below => 500.megabytes, :times => [3,5]
    end
  end
end

user is deploy and permissions are ok to unicorn socket

this process worked with bluepill without this error message
any idea?

@kostya
Copy link
Owner

kostya commented Mar 25, 2014

I am using eye with unicorn, all works just fine.
Try to execute all commands without eye. (cd working_dir && RAILS_ENV=... start_command)
Show unicorn config and version.

@innovia
Copy link
Author

innovia commented Mar 31, 2014

the commands works fine without eye

/var/www/projects/current/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:262:in syswrite': Broken pipe (Errno::EPIPE) from /var/www/projects/current/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/lib/unicorn/http_server.rb:262:injoin'
from /var/www/projects/gettaxi/current/vendor/bundle/ruby/1.9.1/gems/unicorn-4.6.2/bin/unicorn_rails:209:in <top (required)>' from /var/www/projects/current/vendor/bundle/ruby/1.9.1/bin/unicorn_rails:23:inload'
from /var/www/projects/current/vendor/bundle/ruby/1.9.1/bin/unicorn_rails:23:in `

'

we are using it in production, and we need to stablize it

@innovia
Copy link
Author

innovia commented Mar 31, 2014

i tried that too

what i did is a hack
worth to say the version of unicorn is 4.6.2 when the latest is 4.8.2 so im about to try that too

the hack i did

around line 262 i added rescue (in the unicorn gem)

proc_name 'master'
    logger.info "master process ready" # test_exec.rb relies on this message
    logger.info "@ready_pipe: #{@ready_pipe.inspect}"
    begin
    if @ready_pipe
      @ready_pipe.syswrite($$.to_s)
      @ready_pipe = @ready_pipe.close rescue nil
    end
    rescue Exception => e
      logger.info("exception: #{e}")
    end

unicorn goes past that exception
and i can see from the log:

I, [2014-03-31T09:15:11.706748 #12984]  INFO -- : unlinking existing socket=/var/run/unicorn/unicorn.sock
I, [2014-03-31T09:15:11.706923 #12984]  INFO -- : listening on addr=/var/run/unicorn/unicorn.sock fd=16
I, [2014-03-31T09:15:11.871138 #13191]  INFO -- : worker=0 ready
I, [2014-03-31T09:15:11.958364 #13195]  INFO -- : worker=1 ready
I, [2014-03-31T09:15:12.074725 #12984]  INFO -- : master process ready
I, [2014-03-31T09:15:12.075097 #12984]  INFO -- : @ready_pipe: #<IO:fd 8>

looks like its looking at the wrong socket file? the file descriptors are not the same?

i tried with daemonize true and without and run the command manually and it works 100% of the time

we installed it on all our production servers and some are acting up

@kostya
Copy link
Owner

kostya commented Mar 31, 2014

i am successfully using eye with unicorn 4.8.2 (i am not running eye with bundler, system wide setup)
your issue seems similar to https://github.com/gitlabhq/gitlabhq/issues/5945 (no idea how it related to eye).

my config is:

def unicorn_watch(proxy)
  proxy.process "unicorn" do
    pid_file "tmp/pids/unicorn.pid"
    start_command "./bin/unicorn -Dc ./config/unicorn.rb -E #{RAILS_ENV}"
    stop_command "kill -TERM {{PID}}"
    restart_command "kill -USR2 {{PID}}"
    stdall "log/unicorn.log"

    start_timeout 50.seconds
    stop_grace 5.seconds
    restart_grace 10.seconds

    checks :memory, :every => 30, :below => 250.megabytes, :times => [3,5]

    monitor_children do
      stop_command "kill -QUIT {{PID}}"
      checks :cpu, :every => 30, :below => 80, :times => 3
      checks :memory, :every => 30, :below => 250.megabytes, :times => [3,5]
    end
  end
end

RAILS_ENV = 'production'
Eye.app 'xxxx' do
  working_dir '/xxxx/current'
  env "RAILS_ENV" => RAILS_ENV, "RBENV_ROOT" => "/usr/local/rbenv", "PATH" => "/usr/local/rbenv/shims:/usr/local/rbenv/bin:#{ENV['PATH']}"
  unicorn_watch(self)
end
rails_env = ENV['RAILS_ENV'] || 'production'
worker_processes (rails_env == 'production') ? 24 : 1
working_directory "/xxxx/current"
shared_directory = "/xxxx/shared"
listen "#{shared_directory}/sockets/unicorn.sock", :backlog => 64
timeout 120
pid "#{shared_directory}/pids/unicorn.pid"
stderr_path "#{shared_directory}/log/unicorn.stderr.log"
stdout_path "#{shared_directory}/log/unicorn.stdout.log"
preload_app true
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true
before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "/xxxx/current/Gemfile"
end

before_fork do |server, worker|
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!

  old_pid = "#{server.config[:pid]}.oldbin"
  if old_pid != server.pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
end

after_fork do |server, worker|
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
  Rails.cache.reconnect
end

@innovia
Copy link
Author

innovia commented Mar 31, 2014

Well,

it looks like Eye does not run the start command with the users env

if we append sudo -u deploy to the start command it works fine

start_command  "sudo -u deploy RAILS_ENV=stage /usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/#{rails_env}.rb -E #{rails_env} -D"

Why does eye not inherit the env vars

here's the trace
before the change:

31.03.2014 12:43:40 ERROR -- [rails_unicorn:unicorn] process <> failed to start (:pid_not_found)
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] switch :crashed [:starting => :down] (reason: crashed)
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] schedule :check_crash (reason: crashed)
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] <= restore
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] => restore  (reason: crashed)
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] pid_file not found, starting...
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] switch :starting [:down => :starting] (reason: crashed)
31.03.2014 12:43:40 INFO  -- [rails_unicorn:unicorn] executing: `/usr/local/bin/bundle exec RAILS_ENV=stage unicorn_rails -c config/unicorn/stage.rb -E stage -D` with start_timeout: 30.0s, start_grace: 2.5s, env: {"GEM_PATH"=>nil, "GEM_HOME"=>nil, "RUBYOPT"=>nil, "BUNDLE_BIN_PATH"=>nil, "BUNDLE_GEMFILE"=>"/var/www/projects/current/Gemfile", "RAILS_ENV"=>"stage", "PATH"=>"/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"}, working_dir: /var/www/projects/current
31.03.2014 12:43:41 INFO  -- [rails_unicorn:unicorn] sleeping for :start_grace 2.5
31.03.2014 12:43:41 INFO  -- [Eye] client command: logger_dev  (0.010937986s)

after the change

31.03.2014 12:50:42 INFO  -- [rails_unicorn:__default__] => update_config {:name=>"__default__", :application=>"rails_unicorn", :processes=>{"unicorn"=>{:name=>"unicorn", :application=>"rails_unicorn", :group=>"__default__", :working_dir=>"/var/www/projects/current", :environment=>{"GEM_PATH"=>nil, "GEM_HOME"=>nil, "RUBYOPT"=>nil, "BUNDLE_BIN_PATH"=>nil, "BUNDLE_GEMFILE"=>"/var/www/projects/current/Gemfile", "RAILS_ENV"=>"stage", "PATH"=>"/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"}, :pid_file=>"tmp/pids/unicorn.pid", :stdall=>"log/eye-unicorn.log", :stdout=>"log/eye-unicorn.log", :stderr=>"log/eye-unicorn.log", :start_command=>"sudo -u deploy RAILS_ENV=stage /usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/stage.rb -E stage -D", :stop_command=>"kill -QUIT {PID}", :restart_command=>"kill -USR2 {PID}", :stop_signals=>[:TERM, 10], :start_timeout=>30, :restart_grace=>30, :monitor_children=>{:stop_command=>"kill -QUIT {PID}", :checks=>{:cpu=>{:every=>30, :below=>75, :times=>3, :type=>:cpu}, :memory=>{:every=>30, :below=>524288000, :times=>[3, 5], :type=>:memory}}}}}} (reason: load config by user)
31.03.2014 12:50:42 INFO  -- [rails_unicorn:__default__] <= update_config
31.03.2014 12:50:42 INFO  -- [rails_unicorn:unicorn] schedule :update_config (reason: load config by user)
31.03.2014 12:50:42 INFO  -- [rails_unicorn:unicorn] => update_config {:name=>"unicorn", :application=>"rails_unicorn", :group=>"__default__", :working_dir=>"/var/www/projects/current", :environment=>{"GEM_PATH"=>nil, "GEM_HOME"=>nil, "RUBYOPT"=>nil, "BUNDLE_BIN_PATH"=>nil, "BUNDLE_GEMFILE"=>"/var/www/projects/current/Gemfile", "RAILS_ENV"=>"stage", "PATH"=>"/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"}, :pid_file=>"tmp/pids/unicorn.pid", :stdall=>"log/eye-unicorn.log", :stdout=>"log/eye-unicorn.log", :stderr=>"log/eye-unicorn.log", :start_command=>"sudo -u deploy RAILS_ENV=stage /usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/stage.rb -E stage -D", :stop_command=>"kill -QUIT {PID}", :restart_command=>"kill -USR2 {PID}", :stop_signals=>[:TERM, 10], :start_timeout=>30, :restart_grace=>30, :monitor_children=>{:stop_command=>"kill -QUIT {PID}", :checks=>{:cpu=>{:every=>30, :below=>75, :times=>3, :type=>:cpu}, :memory=>{:every=>30, :below=>524288000, :times=>[3, 5], :type=>:memory}}}} (reason: load config by user)
31.03.2014 12:50:42 INFO  -- [rails_unicorn:unicorn] <= update_config
31.03.2014 12:50:42 INFO  -- [Eye] loaded: ["/etc/eye/unicorn.eye"], selfpid <6401>
31.03.2014 12:50:42 INFO  -- [Eye] client command: load /etc/eye/unicorn.eye (0.016453655s)
31.03.2014 12:50:48 INFO  -- [rails_unicorn:unicorn] schedule :start (reason: start by user)
31.03.2014 12:50:48 INFO  -- [rails_unicorn:unicorn] => start  (reason: start by user)
31.03.2014 12:50:48 INFO  -- [rails_unicorn:unicorn] pid_file not found, starting...
31.03.2014 12:50:48 INFO  -- [rails_unicorn:unicorn] switch :starting [:unmonitored => :starting] (reason: start by user)
31.03.2014 12:50:48 INFO  -- [rails_unicorn:unicorn] executing: `sudo -u deploy RAILS_ENV=stage /usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/stage.rb -E stage -D` with start_timeout: 30.0s, start_grace: 2.5s, env: {"GEM_PATH"=>nil, "GEM_HOME"=>nil, "RUBYOPT"=>nil, "BUNDLE_BIN_PATH"=>nil, "BUNDLE_GEMFILE"=>"/var/www/projects/current/Gemfile", "RAILS_ENV"=>"stage", "PATH"=>"/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"}, working_dir: /var/www/projects/current
31.03.2014 12:50:48 INFO  -- [Eye] client command: start unicorn (0.033338443s)
31.03.2014 12:51:18 WARN  -- [Eye::System] [unicorn] sending :KILL signal to <8964> due to timeout (30s)
31.03.2014 12:51:19 ERROR -- [rails_unicorn:unicorn] execution failed with #<Timeout::Error: execution expired>; try increasing the start_timeout value (the current value of 30s seems too short)
31.03.2014 12:51:19 ERROR -- [rails_unicorn:unicorn] process <> failed to start ("#<Timeout::Error: execution expired>")
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] switch :crashed [:starting => :down] (reason: start by user)
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] schedule :check_crash (reason: crashed)
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] <= start
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] => check_crash  (reason: crashed)
31.03.2014 12:51:19 WARN  -- [rails_unicorn:unicorn] check crashed: process is down
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] schedule :restore (reason: crashed)
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] <= check_crash
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] => restore  (reason: crashed)
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] process <8973> from pid_file is already running
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] switch :already_running [:down => :up] (reason: crashed)
31.03.2014 12:51:19 INFO  -- [rails_unicorn:unicorn] <= restore
31.03.2014 12:51:47 INFO  -- [Eye] client command: info_data  (0.010877146s)
31.03.2014 12:51:49 INFO  -- [Eye] client command: info_data  (0.063672129s)
31.03.2014 12:51:50 INFO  -- [Eye] client command: info_data  (0.014524825s)
31.03.2014 12:51:51 INFO  -- [Eye] client command: info_data  (0.011760108s)
31.03.2014 12:51:52 INFO  -- [Eye] client command: info_data  (0.011946618s)
31.03.2014 12:51:54 INFO  -- [Eye] client command: info_data  (0.012172548s)
31.03.2014 12:51:55 INFO  -- [Eye] client command: info_data  (0.011876044s)
31.03.2014 12:51:56 INFO  -- [Eye] client command: info_data  (0.011770813s)
31.03.2014 12:51:57 INFO  -- [Eye] client command: info_data  (0.01164495s)
31.03.2014 12:51:59 INFO  -- [Eye] client command: info_data  (0.072339509s)
31.03.2014 12:52:00 INFO  -- [Eye] client command: info_data  (0.014547108s)
31.03.2014 12:52:02 INFO  -- [Eye] client command: info_data  (0.148811209s)
31.03.2014 12:52:03 INFO  -- [Eye] client command: info_data  (0.02290131s)
31.03.2014 12:52:05 INFO  -- [Eye] client command: info_data  (0.017454679s)
31.03.2014 12:52:06 INFO  -- [Eye] client command: info_data  (0.013057565s)
31.03.2014 12:52:07 INFO  -- [Eye] client command: info_data  (0.011842256s)
31.03.2014 12:52:08 INFO  -- [Eye] client command: info_data  (0.072820972s)
31.03.2014 12:52:10 INFO  -- [Eye] client command: info_data  (0.011882142s)
31.03.2014 12:52:11 INFO  -- [Eye] client command: info_data  (0.011859982s)
31.03.2014 12:52:12 INFO  -- [Eye] client command: info_data  (0.01176355s)
31.03.2014 12:52:13 INFO  -- [Eye] client command: info_data  (0.011899301s)
31.03.2014 12:52:14 INFO  -- [Eye] client command: info_data  (0.072242753s)
31.03.2014 12:52:16 INFO  -- [Eye] client command: info_data  (0.011703209s)
31.03.2014 12:52:17 INFO  -- [Eye] client command: info_data  (0.012107479s)
31.03.2014 12:52:18 INFO  -- [Eye] client command: info_data  (0.014616754s)
31.03.2014 12:52:19 INFO  -- [rails_unicorn:unicorn:child-9210] check:cpu(<75%) [0%] => OK
31.03.2014 12:52:19 INFO  -- [rails_unicorn:unicorn:child-9210] check:memory(<500Mb) [227Mb] => OK
31.03.2014 12:52:19 INFO  -- [rails_unicorn:unicorn:child-9213] check:cpu(<75%) [0%] => OK
31.03.2014 12:52:19 INFO  -- [rails_unicorn:unicorn:child-9213] check:memory(<500Mb) [224Mb] => OK

@kostya
Copy link
Owner

kostya commented May 3, 2014

I reproduce and understand the bug. Thanks for report.
This lines in log shows the bug:

03.05.2014 16:19:21 WARN  -- [Eye::System] [unicorn] sending :KILL signal to <9697> due to timeout (30s)
03.05.2014 16:19:21 ERROR -- [rails_unicorn:unicorn] execution failed with #<Timeout::Error: execution expired>; try increasing the start_timeout value (the current value of 30s seems too short)
03.05.2014 16:19:21 ERROR -- [rails_unicorn:unicorn] process <9684> failed to start ("#<Timeout::Error: execution expired>")

When grandparent die, unicorn cant write it pidfile and raising with such error.

so fix is start_timeout 100.seconds or more.

@kostya kostya closed this as completed May 3, 2014
ngpestelos pushed a commit to ngpestelos/unicorn that referenced this issue Aug 21, 2014
When daemonizing, it is possible for the grandparent to be
terminated by another process before the master can notify
it.  Do not abort the master in this case.

This may fix the following issue:

	kostya/eye#49

(which I was notified of privately via email)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants