Old workers expiration issue #1704

hackhowtofaq · 2020-02-10T14:43:37Z

Line 486 in adb633a

# Returns a list of workers that have sent a heartbeat in the past, but which

Since worker.to_s includes the pid, older workers with different pid but seconds_since_heartbeat > prune_interval are not expired.
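To illustrate why a restarted worker leaves a separate registry entry behind, here is a minimal sketch in plain Ruby. It is a toy model, not Resque's code. It assumes the worker id has a `hostname:pid:queues` shape; the `worker_id` helper, the `registry` hash, and the 300-second `prune_interval` are hypothetical names and values chosen for illustration.

```ruby
# Hypothetical helper mimicking a "host:pid:queues" worker id (assumed shape).
def worker_id(hostname, pid, queues)
  "#{hostname}:#{pid}:#{queues.join(',')}"
end

# Toy registry: worker id => seconds since the last heartbeat.
registry = {}
registry[worker_id("app-1", 1201, %w[default mailers])] = 900 # old process, killed during a deploy
registry[worker_id("app-1", 3477, %w[default mailers])] = 5   # new process, same host and queues

prune_interval = 300 # seconds; assumed value for illustration

# Because the pid is part of the key, the old entry is never overwritten by
# the new process. It lingers until something removes it by heartbeat age.
stale = registry.select { |_id, age| age > prune_interval }.keys

puts "registered: #{registry.size}"
puts "stale:      #{stale.inspect}"
```

In this model the worker count stays at two after the deploy even though only one process is alive, which matches the symptom described in this issue.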

Are you aware of this? Is this intended?

josh-m-sharpe · 2020-04-05T14:42:46Z

I frequently notice that after a deploy, when old workers are killed and new ones started, the worker count in resque web is inaccurately elevated. Do you think this is related?

hackhowtofaq · 2020-04-05T20:24:33Z

I've also noticed this. Yep, I think it has to do with the above issue.

josh-m-sharpe · 2020-04-05T22:37:45Z

Seems like there's additional information here:

resque/lib/resque/worker.rb

Lines 605 to 610 in adb633a

    
    # If the worker hasn't sent a heartbeat, remove it from the registry.
    #
    # If the worker hasn't ever sent a heartbeat, we won't remove it since
    # the first heartbeat is sent before the worker is registred it means
    # that this is a worker that doesn't support heartbeats, e.g., another
    # client library or an older version of Resque. We won't touch these.

iloveitaly · 2021-06-10T13:01:01Z

I've run into this as well. I wonder if we could add a configuration option to prune these workers automatically. If a user knows they aren't using an old version of resque or other client library, they could set the configuration option to auto-prune these workers.

If someone could whip up a PR I can help review and get it merged!

jrochkind · 2021-06-10T15:10:49Z

In my experience, simply running Resque::Worker.all_workers_with_expired_heartbeats.each(&:unregister_worker) does successfully prune these.

It's the more complicated logic in #prune_dead_workers that is maybe deciding they are not actually prunable, perhaps because of what @hackhowtofaq explains. It seems like a bug to me rather than something that should have a config variable. You don't have config variables for "run with this bug fixed", right?

It's just unclear to me what this more complicated logic is intended to do. Certainly one PR that would fix this problem would be just replacing #prune_dead_workers with the simple implementation Resque::Worker.all_workers_with_expired_heartbeats.each(&:unregister_worker). But what would that lose? What is the more complicated logic meant to do?

If there aren't docs or tests about this, that is what makes it challenging to work on an under-maintained "legacy" project like this: there's nobody around with the context to understand what's up, so someone has to spend time reconstructing it archaeologically to make changes safely.

Or we could just replace the implementation with `Resque::Worker.all_workers_with_expired_heartbeats.each(&:unregister_worker)` and if no tests fail say that's good enough? Seems risky.
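As a rough sketch of what the simple approach amounts to, the following plain-Ruby toy model selects workers by heartbeat age and drops them. It is not Resque's implementation: `ToyWorker`, `expired`, and the 300-second interval are hypothetical. The only behavior it borrows from the thread is the rule in the quoted comment that workers which never sent a heartbeat are left alone.

```ruby
# Toy stand-in for a registry entry; not Resque's actual worker class.
ToyWorker = Struct.new(:id, :last_heartbeat)

# Return workers whose last heartbeat is older than prune_interval seconds.
# Workers with no heartbeat at all (nil) are deliberately skipped, mirroring
# the "we won't touch these" comment quoted earlier in the thread.
def expired(workers, now:, prune_interval:)
  workers.select do |w|
    w.last_heartbeat && (now - w.last_heartbeat) > prune_interval
  end
end

now = Time.at(1_000_000)
workers = [
  ToyWorker.new("app-1:1201:default", now - 900), # stale heartbeat
  ToyWorker.new("app-1:3477:default", now - 5),   # recently alive
  ToyWorker.new("legacy:42:default", nil)         # never sent a heartbeat
]

to_prune  = expired(workers, now: now, prune_interval: 300)
remaining = workers - to_prune

puts "pruned:    #{to_prune.map(&:id).inspect}"
puts "remaining: #{remaining.map(&:id).inspect}"
```

Note that this model ignores hostname, pid, and queue checks entirely, which is exactly the part of #prune_dead_workers whose purpose is in question here.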

iloveitaly · 2021-07-09T13:50:13Z

Pretty sure this is related to #1750 which has a PR associated with it. Hoping we can get that PR merged soon.

iloveitaly · 2021-08-02T15:46:40Z

#1750 has been merged and should fix this issue! Re-open this issue if it's still a problem.

hackhowtofaq mentioned this issue Feb 18, 2020

Old workers expiration issue hackhowtofaq/resquex#3

Open

iloveitaly mentioned this issue Jun 10, 2021

requires resque EXACTLY 2.0.0 resque/resque-heroku-signals#5

Closed

iloveitaly closed this as completed Aug 2, 2021
