worker count in process title not matching web ui #211
I should also mention this is a Rails app on MRI 1.9.3-p194. Trying it with JRuby, it just hangs without ever starting.
The procline only updates every 5 seconds. I assume that's not the issue, because these are large files, so the workers should stay busy for a good amount of time. The web UI can display old workers if you've kill -9'd Sidekiq in the past. Does the log output actually reflect ~50 busy workers?
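For context, here is a hedged sketch of how a 5-second procline heartbeat might work; the method name and loop shape are illustrative assumptions, not Sidekiq's actual source:

```ruby
# Hypothetical sketch: a heartbeat rewrites the process title ($0) with the
# current busy count. Because it fires on a timer rather than on every job
# change, the title can lag behind reality by up to the heartbeat interval.
def update_procline(busy, total, version: '1.2.1')
  $0 = "sidekiq #{version} [#{busy} of #{total} busy]"
end

# In a real server this would run in a timer loop, e.g.:
#   loop { update_procline(busy_count, concurrency); sleep 5 }
update_procline(13, 15)
```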
The log output and the process title seem in line. Is there a way to clear out entries left behind by a kill -9?
Change to suit your Redis location and namespace. And do your best to avoid kill -9.
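The snippet this comment refers to did not survive here. As a hedged reconstruction, clearing kill -9 leftovers might look like the sketch below. It assumes busy workers of this Sidekiq era are tracked in a Redis set named 'workers' and that entries begin with a hostname; the FakeRedis stub stands in for a real connection so the sketch is self-contained, where a real app would use Sidekiq.redis { |conn| ... }.

```ruby
require 'set'

# Stand-in for a Redis connection so this runs without a server.
class FakeRedis
  def initialize(entries)
    @workers = Set.new(entries)
  end

  def smembers(_key)
    @workers.to_a
  end

  def srem(_key, member)
    @workers.delete(member)
  end

  def scard(_key)
    @workers.size
  end
end

# Remove every entry from the 'workers' set whose host is no longer
# running a Sidekiq process. The "host:pid:..." entry format is an
# assumption for illustration.
def clear_stale_workers(conn, live_hosts)
  conn.smembers('workers').each do |entry|
    host = entry.split(':').first
    conn.srem('workers', entry) unless live_hosts.include?(host)
  end
end

conn = FakeRedis.new(['web1:1234:abc', 'dead-host:999:def'])
clear_stale_workers(conn, ['web1'])
```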
Sorry to bump here again, but now I see one process like this: sidekiq 1.2.1 [13 of 15 busy]. But there's no work going on in the logs or the web UI. These tend to be long-running jobs (large file uploads). When this happens, the queues fill up, no workers take jobs, and the process has to be restarted. Could this be an issue with blocking I/O? I'm using Net::HTTP for uploads and downloads.
Are you using the :timeout option to kill lingering workers? That won't fix the underlying problem, but it should keep the queues processing if a job hangs.
The problem is there's no way to determine a good timeout, since some files are large and the remote bandwidth is occasionally slow. I don't want to kill off workers that are doing their job.
You can dynamically set the timeout based on the size of the file, but that's ignoring the real problem: why are your jobs hanging? Use the TTIN signal to get a thread dump of the process and see where all the threads are stuck. If it's in net/http, maybe Net::HTTP's open_timeout and read_timeout would help?
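For reference, setting those timeouts on Net::HTTP looks like this. The URL, file size, and bandwidth-floor numbers are placeholders; the last two lines show one way to derive the timeout from the file size instead of picking a flat value:

```ruby
require 'net/http'
require 'uri'

uri  = URI('https://example.com/large_file')   # placeholder URL
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 10    # seconds to wait for the TCP connection to open

# Derive the read timeout from the file size and a conservative bandwidth
# floor rather than guessing one flat value (numbers are illustrative).
file_size_bytes   = 50 * 1024 * 1024
min_bytes_per_sec = 50_000
http.read_timeout = 30 + file_size_bytes / min_bytes_per_sec
```

With this, a transfer that genuinely stalls raises Net::OpenTimeout or Net::ReadTimeout instead of blocking the worker thread forever, while a slow-but-moving transfer still gets a budget proportional to its size.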
@mperham I have a similar problem. Some of my jobs can take a really long time, and they disappear from the web UI while still running. I don't use a :timeout here, since I want the job to be processed until it ends. My issue is that the UI loses track of the job being processed. Should I open a new issue? It seems like the same problem to me...
This happens with a large number of workers doing simultaneous I/O (downloading and uploading large files over HTTP). I should mention these workers tend to just stop doing I/O at some stage without raising errors.
Right now, the web UI says 53 busy workers, but the process title totals across three processes add up to only around 2-4 workers.