capistrano timeouts in whenever task under 0.8.1, not 0.8.0 #288

Closed
wlipa opened this Issue Dec 27, 2012 · 16 comments

Comments

Projects
None yet
4 participants

wlipa commented Dec 27, 2012

I'm consistently getting this timeout using the capistrano task under 0.8.1, but not under 0.8.0.

  * 2012-12-27 15:09:28 executing `whenever:update_crontab'
  * executing "cd /home/ops/apps/example.com/releases/20121227230906 && whenever --update-crontab example --set environment=production --roles db"
    servers: ["1.example.com"]
*** [deploy:update_code] rolling back
  * executing "rm -rf /home/ops/apps/example.com/releases/20121227230906; true"
    servers: ["1.example.com"]
    [1.example.com:25552] executing command
    command finished in 206ms
connection failed for: 1.example.com (Errno::ETIMEDOUT: Operation timed out - connect(2))

swombat commented Dec 28, 2012

I'm getting the same issue. The problem is fixed by switching back to 0.8.0 (gem 'whenever', '0.8.0'
in the Gemfile, then bundle update) - so this is what I've done now.

This is the full gemset that I have installed that causes the problem to occur:

Using rake (10.0.3) Using aasm (3.0.15) Using i18n (0.6.1) Using multi_json (1.5.0) Using activesupport (3.2.9) Using builder (3.0.4) Using activemodel (3.2.9) Using erubis (2.7.0) Using journey (1.0.4) Using rack (1.4.1) Using rack-cache (1.2) Using rack-test (0.6.2) Using hike (1.2.1) Using tilt (1.3.3) Using sprockets (2.2.2) Using actionpack (3.2.9) Using mime-types (1.19) Using polyglot (0.3.3) Using treetop (1.4.12) Using mail (2.4.4) Using actionmailer (3.2.9) Using arel (3.0.2) Using tzinfo (0.3.35) Using activerecord (3.2.9) Using mysql (2.9.0) Using activerecord-mysql-adapter (0.0.1) Using activeresource (3.2.9) Using connection_pool (0.9.3) Using rubinius-core-api (0.0.1) Using rubinius-actor (0.0.2) Using girl_friday (0.11.1) Using airbrake (3.1.6) Using xml-simple (1.1.2) Using aws-s3 (0.6.3) Using bcrypt-ruby (3.0.1) Using bundler (1.2.3) Using highline (1.6.15) Using net-ssh (2.6.2) Using net-scp (1.0.4) Using net-sftp (2.0.5) Using net-ssh-gateway (1.1.0) Using capistrano (2.13.5) Using chronic (0.9.0) Using coffee-script-source (1.4.0) Using execjs (1.4.0) Using coffee-script (2.2.0) Using rack-ssl (1.3.2) Using json (1.7.5) Using rdoc (3.12) Using thor (0.16.0) Using railties (3.2.9) Using coffee-rails (3.2.2) Using commonjs (0.2.6) Using multipart-post (1.1.5) Using faraday (0.8.4) Using httpauth (0.2.0) Using rails (3.2.9) Using sass (3.2.4) Using sass-rails (3.2.5) Using ios-checkboxes (0.2.1) Using jquery-rails (2.1.4) Using kgio (2.7.4) Using less (2.2.2) Using less-rails (2.2.6) Using less-rails-bootstrap (2.0.13) Using libv8 (3.3.10.4) Using mysql2 (0.3.11) Using oauth (0.4.7) Using oauth2 (0.6.1) Using rack-protection (1.3.2) Using raindrops (0.10.0) Using rdiscount (1.6.8) Using redis (3.0.2) Using redis-namespace (1.2.1) Using sinatra (1.3.3) Using vegas (0.1.11) Using resque (1.23.0) Using rvm-capistrano (1.2.7) Using sorcery (0.7.13) Using therubyracer (0.10.2) Using uglifier (1.3.0) Using unicorn (4.5.0) Using whenever (0.8.1)

Whenever is invoked in Capistrano with:

set :whenever_command, "bundle exec whenever" require "whenever/capistrano"
Owner

javan commented Dec 28, 2012

@cap10morgan, any idea what might be up here?

Contributor

cap10morgan commented Dec 28, 2012

Hmm, no I can't imagine what could be causing connect timeouts in 0.8.1. @wlipa and/or @swombat, can you see if you can pare down the circumstances under which the timeout happens in 0.8.1 to a reproducible set of steps?

Contributor

cap10morgan commented Dec 28, 2012

The error does make sense, of course, if the first example was really trying to connect to "1.example.com", but I'm assuming that the server names were changed to protect the innocent.

swombat commented Dec 28, 2012

Difficult... I'm under a pretty tight deadline so can't play around with making test projects to test this out... I know that for me this happened after I nuked my RVM/Brew, reinstalled my environment, and, as part of this, had to upgrade to rails 3.2.9. I did a couple of "bundle update"s in there, which might have pushed whenever up a version - and then this started happening whenever I did the "cap deploy" step.

Didn't happen a day earlier, when I was on Rails 3.2.3 and whatever cap version went with it.

If you can suggest a monkey-patch I can use after re-upgrading to whenever 0.8.1, to get better debug info, happy to put that in and see what happens...

wlipa commented Dec 29, 2012

I haven't narrowed it down yet, but this command is enough to show the problem, which goes away with the only change being whenever 0.8.0 vs 0.8.1 in Gemfile.lock.

bundle exec cap whenever:update_crontab

Contributor

cap10morgan commented Dec 29, 2012

I can't reproduce a timeout on my end.

Can you post your deploy config (deploy.rb and any relevant stage configurations under config/deploy/) and your schedule.rb? Feel free to change server names and anything else you'd like to keep private.

If you have a test / staging server against which the timeout happens that you don't mind me using to debug this, that would help immensely. Feel free to communicate with me privately if that's something you'd be willing to provide: cap10morgan@gmail.com.

wlipa commented Dec 30, 2012

I think I'm getting a little closer.

Under 0.8.0, doing "cap whenever:update_crontab" executes 2 remote commands. The first lists out the releases, and then one of those releases is used as the release directory for running whenever.

* executing "ls -x /home/ops/apps/example.com/releases"
* executing "cd /home/ops/apps/example.com/releases/20121229231656 && whenever --update-crontab example --set environment=production --roles db"

Under 0.8.1, there's just one command - the listing out of the releases on the server doesn't happen. It looks like the release directory that's used in the whenever command is based on the current client timestamp, and it doesn't actually exist on the server.

* executing "cd /home/ops/apps/example.com/releases/20121230063404 && whenever --update-crontab example --set environment=production --roles db"

(The above times out, and releases/20121230063404 doesn't exist on the server.)

Here are the actual releases on the server:

ops@example2:releases$ ls -l
total 12
drwxrwxr-x 12 ops ops 4096 Nov  5 16:35 20121106003535
drwxrwxr-x 12 ops ops 4096 Nov 21 14:01 20121121220106
drwxrwxr-x 12 ops ops 4096 Dec 29 15:17 20121229231656

@cap10morgan cap10morgan added a commit to cap10morgan/whenever that referenced this issue Dec 30, 2012

@cap10morgan cap10morgan use latest_release path in update_crontab; fixes issue #288 8280bd0
Contributor

cap10morgan commented Dec 30, 2012

@wlipa Can you see if the patch referenced above fixes your issue?

wlipa commented Dec 30, 2012

It does fix the directory, but unfortunately, the timeout is still there. Drat.

Contributor

cap10morgan commented Dec 30, 2012

@wlipa Can you do another comparison between the commands run by 0.8.0 and 0.8.1+patch to see what the difference is?

wlipa commented Dec 30, 2012

Sure. Here's the next thing I'm finding.... I think the timeout is legit, and the reason it's happening is that the ssh port is getting lost when running the whenever command. So a second, bogus connect is done to a dead port. I run ssh on a non default port on the server.

I added some logging to the run_tree and DefaultConnectionFactory.connect_to methods in capistrano. Here are the differences:

0.8.0

run_tree {:except=>{:no_release=>true}, :once=>true, :eof=>true}
* executing "ls -x /home/ops/apps/example.com/releases"
establish_connection_to 2.example.com:25552
DefaultConnectionFactory.connect_to 2.example.com:25552

run_tree {:roles=>:db, :eof=>true}
* executing "cd /home/ops/apps/example.com/releases/20121229231656 && whenever --update-crontab example --set environment=production --roles db"
establish_connection_to 2.example.com:25552
# Note - the prior connection is reused at this point.


0.8.1

run_tree {:except=>{:no_release=>true}, :once=>true, :eof=>true}
* executing "ls -x /home/ops/apps/example.com/releases"
establish_connection_to 2.example.com:25552
DefaultConnectionFactory.connect_to 2.example.com:25552

run_tree {:roles=>:db, :hosts=>"2.example.com", :eof=>true}
* executing "cd /home/ops/apps/example.com/releases/20121229231656 && whenever --update-crontab example --set environment=production --roles db"
# Next line is the first bogus looking one - port is missing
establish_connection_to 2.example.com
# Note we're reconnecting on the next line when we should be reusing the prior connection.
DefaultConnectionFactory.connect_to 2.example.com
connection failed for: 2.example.com (Errno::ETIMEDOUT: Operation timed out - connect(2))

Uninformed speculation - something to do with the :hosts argument that gets down to run_tree?

Contributor

cap10morgan commented Dec 30, 2012

@wlipa This is very helpful, thank you. I'm traveling for the next couple of days. I'll look into this as soon as I can, but I think you've provided enough info to find and correct the problem.

@cap10morgan cap10morgan added a commit to cap10morgan/whenever that referenced this issue Dec 30, 2012

@cap10morgan cap10morgan set the :hosts option to the ServerDefinition instance
This should bring in config options like the SSH port, which should fix
issue GitHub issue #288.
55a1666
Contributor

cap10morgan commented Dec 30, 2012

@wlipa OK, I found a little time to work on it after all. Can you give that pull request a try too?

Owner

javan commented Dec 30, 2012

Thanks guys. Heads up, I'm out of town until Wednesday.

On Dec 30, 2012, at 3:21 PM, Wes Morgan notifications@github.com wrote:

@wlipa OK, I found a little time to work on it after all. Can you give that pull request a try too?


Reply to this email directly or view it on GitHub.

wlipa commented Dec 31, 2012

With the pull request #291, no more timeout. Yay.

javan closed this Jan 10, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment