Workless scaling up but not scaling down! #19

Closed
DanielleSucher opened this Issue Mar 15, 2012 · 22 comments

10 participants

@DanielleSucher

Hey! I'm trying to get Workless running, but it only partially works for me. It scales my app up to 1 worker dyno when a delayed_job background job starts, but it does not scale back down to 0 workers after the job finishes and is deleted, even though no jobs remain in the queue.

Any ideas on why that might be happening? Has anyone else mentioned having this problem?

I've been poking around trying to solve the mystery myself, but no luck so far.

My setup:
Heroku Cedar
Ruby 1.9.3
Rails 3.2.1
delayed_job_active_record 0.3.2
workless 1.0.1

@lostboy
Owner
@DanielleSucher

I think my fork is working for me, finally. Probably worth taking a glance at it when you look into this issue.

@davidakachaos

Could you verify if this issue still needs your fix on the current version? If not, we (you) can close this issue.

@ambethia

I'm having this problem right now. It may or may not be related to me trying out Rails 4 beta1, and edge DelayedJob. Any insight before I dive in and try to figure it out?

@pedrokost

@ambethia, have you discovered what your issue was? I am trying to upgrade an app to Rails 4 rc1, and I have the same problem. Workers aren't scaling down.

@ambethia

So, from what I could tell the after_commit hook didn't seem to be working right, so I manually put the hooks in an initializer:

Delayed::Backend::ActiveRecord::Job.class_eval do
  # Re-check scaling on every job row change instead of relying on after_commit.
  after_destroy "self.class.scaler.down"
  after_create "self.class.scaler.up"
  after_update "self.class.scaler.down"
end

It's working fine for now, I haven't had the time to dig any deeper.

@davidakachaos

Is this a bug in the Rails 4.0 code? Or was after_commit removed from Rails 4.0?

@ambethia

after_commit is in Rails 4, and the scaling up works. The workless hook even seems to work sometimes.

@davidakachaos

I can now confirm this issue with the released 1.1.2 version of workless. I haven't tried my own fork yet because of time constraints. This is a nasty error that needs to be resolved. With the local scaler I have had no problems.

@mattholtom

Is it only Rails 4 that is causing this issue? I'm currently troubleshooting periodic failures of a similar scale-down issue with the local scaler, workless 1.1.2, rails 3.2.13.

@davidakachaos

Okay, I might have a solution. But I'm not the one who found it...

See: davidakachaos#2 (comment)

As @cornflakesuperstar says:

[snip] that when setting WORKLESS_MAX_WORKERS to a value greater than one for heroku cedar, Delayed::Job needs to be configured with:

Delayed::Worker.raise_signal_exceptions = :term

otherwise, a race condition occurs whereby if you have multiple workers running and the first worker finishes but a second worker is still going, the call to ::Heroku::API.post_ps_scale is always incorrectly killing the last worker.

This results in a Delayed::Job record that still thinks it's locked to a worker process, however that worker process has been SIGKILL'ed by heroku.

Because this hanging job remains locked (even though it's not being processed), workless doesn't spin down the remaining worker (which isn't doing anything) and you get a race condition with a job that doesn't finish and a worker that doesn't spin down.

This requires delayed job ~> 3.0.5 afaik!!
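
To make the stale-lock scenario described above concrete, here is a minimal in-memory sketch. It is not workless or delayed_job code: `Job`, `target_workers`, `runnable?`, and `stop_worker` are hypothetical names, and the scale-down rule (keep a worker while any job row remains) is an assumption made for illustration.

```ruby
# Hypothetical model of the stale-lock scenario; none of these names
# come from workless or delayed_job.
Job = Struct.new(:id, :locked_by)

# Assumed scale-down rule: one worker per remaining job row,
# clamped between min_workers and max_workers.
def target_workers(jobs, min_workers: 0, max_workers: 1)
  [[jobs.size, max_workers].min, min_workers].max
end

# A job can be picked up only if no worker holds its lock.
def runnable?(job)
  job.locked_by.nil?
end

# Simulates stopping a worker. With release_lock: true the worker gets a
# chance to clean up before exiting (the effect the comment attributes to
# raise_signal_exceptions = :term). With false, the lock is left behind,
# as when the process is killed outright.
def stop_worker(jobs, worker_name, release_lock:)
  return jobs unless release_lock
  jobs.each { |job| job.locked_by = nil if job.locked_by == worker_name }
end

# Worker killed without cleanup: the job is still locked, so nobody can
# run it, yet its row keeps the worker count above zero.
jobs = [Job.new(1, "worker.2")]
stop_worker(jobs, "worker.2", release_lock: false)
stuck = jobs.none? { |job| runnable?(job) } && target_workers(jobs) > 0

# Worker stopped with cleanup: the lock is released and the job can be
# picked up again by a remaining worker.
jobs_with_cleanup = [Job.new(1, "worker.2")]
stop_worker(jobs_with_cleanup, "worker.2", release_lock: true)
recoverable = jobs_with_cleanup.all? { |job| runnable?(job) }
```

In the first case `stuck` is true: the job never finishes and the worker never spins down, which matches the symptom reported in this thread. In the second case the job becomes runnable again, so it can complete and be deleted.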

@lostboy
Owner

can i close?

@davidakachaos

It's related to PR #46, so we should first discuss that, I guess.

@dtuite

I'm experiencing this issue now on Heroku and locally. Workers scale up but don't scale down.

I experienced the issue first when I had no max/min worker config vars. I tried setting

WORKLESS_MAX_WORKERS: 1
WORKLESS_MIN_WORKERS: 0

but the problem persists.

I'm running rails 4.0.0, delayed_job 4.0.0 and delayed_job_active_record 4.0.0 on ruby 2.0.0p247.

The issue seems to be happening with both the local and heroku_cedar scalers (I didn't try any other ones).

I'm not sure how to debug exactly but if you need more info just let me know.

@mike-north

I'm seeing the same thing @dtuite is, and just like him, I'm running delayed_job (4.0.0), delayed_job_active_record (4.0.0), rails (4.0.0), daemons (1.1.9), and ruby 2.0.0p247

I've tried setting WORKLESS_MAX_WORKERS: 1, WORKLESS_MIN_WORKERS: 0

@winston

Hi,

I have the same problem:
delayed_job_active_record (4.0.0)
delayed_job (4.0.0)
workless (1.2.1)
rails 4.0
heroku cedar
ruby 2.0.0-p247

Is this a problem specific to cedar stack? Thank you!

@nikue

Same issue

@davidakachaos

Well, #64 was merged into master. Could some of you perhaps test version 1.2.2 to see if it solves this issue?

@dtuite

Yep, #64 fixed it. Thank you.

@davidakachaos

👍 @lostboy this can be considered done 😄

@lostboy
Owner

lovely :)

@lostboy closed this Sep 23, 2013