worker count in process title not matching web ui #211
I should also mention this is a Rails app on MRI 1.9.3-p194. Trying it with JRuby, it just hangs without ever starting.
The procline only updates every 5 seconds. I assume that's not the issue, because these are large files, so the workers should stay busy for a good amount of time. The web UI can display old workers if you've kill -9'd Sidekiq in the past. Does the log output actually reflect ~50 busy workers?
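For context, here is a hedged sketch of how a 5-second procline heartbeat might work; the method name and loop shape are illustrative assumptions, not Sidekiq's actual source:

```ruby
# Hypothetical sketch: a heartbeat rewrites the process title ($0) with the
# current busy count. Because it fires on a timer rather than on every job
# change, the title can lag behind reality by up to the heartbeat interval.
def update_procline(busy, total, version: '1.2.1')
  $0 = "sidekiq #{version} [#{busy} of #{total} busy]"
end

# In a real server this would run in a timer loop, e.g.:
#   loop { update_procline(busy_count, concurrency); sleep 5 }
update_procline(13, 15)
```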
The log output and the process title seem in line. Is there a way to clear out entries left behind by a kill -9?
Change to suit your Redis location and namespace. And do your best to avoid kill -9.
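The snippet this comment refers to did not survive here. As a hedged reconstruction, clearing kill -9 leftovers might look like the sketch below. It assumes busy workers of this Sidekiq era are tracked in a Redis set named 'workers' and that entries begin with a hostname; the FakeRedis stub stands in for a real connection so the sketch is self-contained, where a real app would use Sidekiq.redis { |conn| ... }.

```ruby
require 'set'

# Stand-in for a Redis connection so this runs without a server.
class FakeRedis
  def initialize(entries)
    @workers = Set.new(entries)
  end

  def smembers(_key)
    @workers.to_a
  end

  def srem(_key, member)
    @workers.delete(member)
  end

  def scard(_key)
    @workers.size
  end
end

# Remove every entry from the 'workers' set whose host is no longer
# running a Sidekiq process. The "host:pid:..." entry format is an
# assumption for illustration.
def clear_stale_workers(conn, live_hosts)
  conn.smembers('workers').each do |entry|
    host = entry.split(':').first
    conn.srem('workers', entry) unless live_hosts.include?(host)
  end
end

conn = FakeRedis.new(['web1:1234:abc', 'dead-host:999:def'])
clear_stale_workers(conn, ['web1'])
```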
Sorry to bump here again, but now I see one process like this: sidekiq 1.2.1 [13 of 15 busy]. But there's no work going on in the logs or the web UI. These tend to be long-running jobs (large file uploads). When this happens, the queues fill up, no workers take jobs, and the process has to be restarted. Could this be an issue with blocking I/O? I'm using Net::HTTP for uploads and downloads.
Are you using the :timeout option to kill lingering workers? That won't fix the underlying problem, but it should keep the queues processing if a job hangs.
The problem is there's no way to determine a good timeout, since some files are large and the remote bandwidth is occasionally slow. I don't want to kill off workers that are doing their job.
You can dynamically set the timeout based on the size of the file, but that's ignoring the real problem: why are your jobs hanging? Use the TTIN signal to get a thread dump of the process and see where all the threads are stuck. If it's in net/http, maybe Net::HTTP's open_timeout and read_timeout would help?
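For reference, setting those timeouts on Net::HTTP looks like this. The URL, file size, and bandwidth-floor numbers are placeholders; the last two lines show one way to derive the timeout from the file size instead of picking a flat value:

```ruby
require 'net/http'
require 'uri'

uri  = URI('https://example.com/large_file')   # placeholder URL
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 10    # seconds to wait for the TCP connection to open

# Derive the read timeout from the file size and a conservative bandwidth
# floor rather than guessing one flat value (numbers are illustrative).
file_size_bytes   = 50 * 1024 * 1024
min_bytes_per_sec = 50_000
http.read_timeout = 30 + file_size_bytes / min_bytes_per_sec
```

With this, a transfer that genuinely stalls raises Net::OpenTimeout or Net::ReadTimeout instead of blocking the worker thread forever, while a slow-but-moving transfer still gets a budget proportional to its size.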
@mperham I have a similar problem. Some of my jobs can take a really long time, and they disappear from the web UI while still running. I don't use a :timeout here, since I want the job to be processed until it ends. My issue is that the UI loses track of the job being processed. Should I open a new issue? It seems like the same problem to me...
This happens with a large number of workers doing simultaneous I/O (downloading and uploading large files over HTTP). I should mention these workers tend to just stop doing I/O at some stage without raising errors.
Right now, the web UI says 53 busy workers, but the process title totals across three processes add up to only around 2-4 workers.