Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up after hard shutdowns #2071

Closed
michaeldiscala opened this issue Dec 3, 2014 · 6 comments
Closed

Cleaning up after hard shutdowns #2071

michaeldiscala opened this issue Dec 3, 2014 · 6 comments

Comments

@michaeldiscala
Copy link
Contributor

Hi there,

We're running Sidekiq v3.3.0 with sidekiq-pro v1.9.1 on Heroku and are having some issues with workers that don't get a chance to cleanup after themselves. Our workers get hard-shutdown if either 1) the dyno shuts down abruptly without sending the SIGTERM or 2) the worker fails to exit within the 10s period. If jobs are actively running during one of these hard shutdowns, they remain permanently on the busy page until we manually clear the data structures in Redis. Aside from being annoying in the interface, these phantom jobs are also problematic because they interfere with our processes for auto-scaling dynos.

Two questions:

  1. Is this the expected behavior in the face of a hard shutdown?
  2. If so, what would your thoughts be on incorporating some sort of cleaning process that eliminates jobs associated with workers who haven't sent a heartbeat for a long time?

We're working to make these hard shutdowns less common (by talking with heroku and by getting our app to shutdown faster). Since I doubt that we'll ever eliminate them completely, however, we'd definitely appreciate thoughts on how to tackle them from the Sidekiq perspective as well.

Thanks so much!

@mperham
Copy link
Collaborator

mperham commented Dec 3, 2014

Heroku is special: I can't tell the PID of the process so I have no way to accurately heartbeating. Normally the heartbeat is associated with the PID: process crashes, another process starts with another PID, old heartbeat ceases and job data expires in Redis. With heroku, dyno 1 crashes, it's restarted so the heartbeat immediately restarts for dyno 1, data never expires.

One possible fix: have each process reset its job state when it starts up so it can't inherit old job data.

@michaeldiscala
Copy link
Contributor Author

Could we also just associate the heartbeat with a UUID that's generated when the process first starts up?

@mperham
Copy link
Collaborator

mperham commented Dec 4, 2014

Solid idea. Let me think about that.

@michaeldiscala
Copy link
Contributor Author

Would you be open to me putting together a preliminary PR for this? Given that the bug interferes with our autoscaling and can impact our Heroku costs significantly, we're definitely open to spending some time on it ourselves.

@mperham
Copy link
Collaborator

mperham commented Dec 7, 2014

Go for it.

On Dec 7, 2014, at 13:52, Michael DiScala notifications@github.com wrote:

Would you be open to me putting together a preliminary PR for this? Given that the bug interferes with our autoscaling and can impact our Heroku costs significantly, we're definitely open to spending some time on it ourselves.


Reply to this email directly or view it on GitHub.

michaeldiscala added a commit to michaeldiscala/sidekiq that referenced this issue Dec 31, 2014
In environments with deterministic hostnames and PIDs (i.e. Heroku), the
current system for constructing a process's identity string produces
collisions (e.g. a new process can assume the identity of one that has
died). In order to address this, we add a UUID to the identity string.
Issue sidekiq#2071.
michaeldiscala added a commit to michaeldiscala/sidekiq that referenced this issue Dec 31, 2014
In environments with deterministic hostnames and PIDs (i.e. Heroku), the
current system for constructing a process's identity string produces
collisions (e.g. a new process can assume the identity of one that has
died). In order to address this, we add a random number to the identity string.
Issue sidekiq#2071.
michaeldiscala added a commit to michaeldiscala/sidekiq that referenced this issue Dec 31, 2014
In environments with deterministic hostnames and PIDs (i.e. Heroku), the
current system for constructing a process's identity string produces
collisions (e.g. a new process can assume the identity of one that has
died). In order to address this, we add a random string to the identity string.
Issue sidekiq#2071.
michaeldiscala added a commit to michaeldiscala/sidekiq that referenced this issue Dec 31, 2014
In environments with deterministic hostnames and PIDs (i.e. Heroku), the
current system for constructing a process's identity string produces
collisions (e.g. a new process can assume the identity of one that has
died). In order to address this, we add a random string to the identity string.
Issue sidekiq#2071.
@michaeldiscala
Copy link
Contributor Author

Addressed in #2113

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants