
worker count in process title not matching web ui #211

Closed
jsierles opened this issue May 25, 2012 · 10 comments
@jsierles

This happens with a large number of workers doing simultaneous I/O (downloading and uploading large files over HTTP). I should mention these workers tend to just stop doing I/O at some point without throwing errors.

Right now, the web UI says 53 workers, but the process title total across three processes is around 2-4 workers.

@jsierles (Author)

Also, I should mention this is a Rails app on MRI 1.9.3-p194. When trying it with JRuby, it just hangs without ever starting.

@mperham (Collaborator) commented May 25, 2012

The procline only updates every 5 seconds. I assume that is not the issue because these are large files and so the workers should be busy for a good amount of time.

The web UI can display old workers if you've kill -9'd sidekiq in the past. Does the log output actually reflect ~50 workers?

@jsierles (Author)

The log output and the proc title seem consistent with each other. Is there a way to clear out entries left behind by a kill -9?

@mperham (Collaborator) commented May 25, 2012

> redis-cli
redis 127.0.0.1:6379> del 'workers'
(integer) 1

Change to suit your redis location and namespace. And do your best to avoid kill -9.
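
If Sidekiq runs under a Redis namespace, the key to delete carries the namespace prefix. A small sketch of building the right key name (the helper name and the 'myapp' namespace below are hypothetical, chosen for illustration):

```ruby
# Hypothetical helper: the un-namespaced key is plain 'workers',
# as in the redis-cli session above; a namespace is joined with ':'.
def workers_key(namespace = nil)
  namespace ? "#{namespace}:workers" : 'workers'
end

workers_key           # => "workers"
workers_key('myapp')  # => "myapp:workers"
```

Pass the resulting key to `del` in redis-cli.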

mperham closed this as completed May 25, 2012
@jsierles (Author)

Sorry to bump here again, but now I see one process like this:

sidekiq 1.2.1 [13 of 15 busy]

But there's no work going on in the logs or the web UI. These tend to be long-running jobs (large file uploads). When this happens, the queues fill up, no workers take jobs, and the process has to be restarted. Could this be an issue with blocking I/O? I'm using Net::HTTP for uploads and downloads.

@mperham (Collaborator) commented May 30, 2012

Are you using the :timeout option to kill lingering workers? That won't fix the problem but it should keep the queues processing if a job hangs.
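
For reference, this kind of hard deadline can be sketched with Ruby's stdlib Timeout module (a sketch only; `perform_with_deadline` is a hypothetical helper, not a Sidekiq API, and whatever the :timeout option does internally may differ):

```ruby
require 'timeout'

# Wrap a job body in a hard deadline so a hung job can't occupy a
# worker forever; returns :timed_out instead of raising.
# Hypothetical helper for illustration, not part of Sidekiq.
def perform_with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

perform_with_deadline(5) { 2 + 2 }     # => 4
perform_with_deadline(0.1) { sleep 5 } # => :timed_out
```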

@jsierles (Author)

The problem is there's no way to determine a good timeout, since some files are large and the remote bandwidth occasionally slow. I don't want to kill off workers that are doing their job.

@jsierles (Author)

I don't know a good timeout since some jobs can take a really long time. I don't want to kill workers that are doing their job.

@mperham (Collaborator) commented May 30, 2012

You can dynamically set the timeout based on the size of the file but that's ignoring the real problem: why are your jobs hanging?
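
A back-of-the-envelope version of that dynamic timeout, assuming a pessimistic minimum transfer rate (the rate and overhead constants here are made-up illustrations, not Sidekiq settings):

```ruby
# Hypothetical helper: derive a per-job timeout from the file size.
MIN_BYTES_PER_SEC = 50_000 # assumed pessimistic floor: ~50 KB/s
BASE_SECONDS      = 60     # assumed fixed allowance for connect/retries

def upload_timeout(file_size_bytes)
  BASE_SECONDS + (file_size_bytes / MIN_BYTES_PER_SEC)
end

upload_timeout(5_000_000) # => 160 (seconds for a 5 MB file)
```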

Use the TTIN signal to get a thread dump of the process and see where all the threads are stuck. If it's in net/http, maybe the open_timeout and read_timeout would help?
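
If the threads do turn out to be stuck in net/http, setting both timeouts looks like this (the endpoint and the timeout values below are placeholders):

```ruby
require 'net/http'
require 'uri'

uri = URI('http://example.com/upload') # placeholder endpoint

http = Net::HTTP.new(uri.host, uri.port)
http.open_timeout = 5    # seconds allowed for opening the connection
http.read_timeout = 300  # seconds allowed per read; generous for big files
# http.start { |h| ... perform the upload/download here ... }
```

With these set, a stalled connection raises an error instead of blocking the worker thread indefinitely.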

@brutuscat

@mperham I have a similar problem. Some of my jobs can take a really long time, and they are disappearing from the web UI, even though if I run ps axf I can see Sidekiq is processing them: sidekiq 2.0.0 [1 of 10 busy]

I don't need a timeout here, since I want the job to run until it finishes. My issue is that the UI loses track of the job being processed.

Should I open a new issue? Seems the same to me...
