-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide an optional threaded background worker #318
Conversation
Due to the three background workers (scheduler, twitter stream and delayed job) huginn needs a lot of memory to run (about 520MB on my dev machine). This PR introduces an optional threaded background worker which combines the three current separated processed into one (reducing the memory footprint to ~260MB). Since just one instance of the of every background processor is running at a time there should not be any threading related issues. The main gotcha of this is, that it's most likely not possible to run multiple delayed job workers concurrently. The ultimate solution would probably be switching to sidekiq with sidetiq as scheduler, but that is a different task :) When running on MRI the GIL should not be an issue because it is released for most IO bound operations (waiting for the database/website/sleeping).
I've actually been pondering (and a couple of quick googles) on whether there was a 'better' scheduling system that could be used. I was thinking along the lines of resque (partly because it's designed for distributed stuff, partly because it's designed to handle a lot of volume, mostly because I fanboi github) |
@alias1 Resque would also be a good replacement for delayed job. The benefit of sidekiq is that it is also threaded, that way multiple workers consume consume less RAM |
One of the downsides I see with switching away from delayed_job is that jobs stop being transactional. Right now, when a job is enqueued in DJ by a ActiveRecord hook, that job won't run until the surrounding transaction is committed. If an error causes the transaction to roll back, the DJ will correctly never be scheduled. With other systems, such as sidekiq and resque, these transactional semantics are not present. How important do we feel this is? |
Oh, I was not aware of the transactions. Actually i just mentioned sidekiq because it fit the threading context of the PR. The polling architecture of DJ might be a bit out of date, but other then that it does it's job well. That is also why I just merged the three background processes we have now into one and kept DJ as it is. |
Where there any changes in scheduler or twitter_stream besides pulling into libraries? |
Also, the twitter_stream is using eventmachine. I guess that's okay in a thread? |
Definitely agree. Also, on the resque side of things, it isn't really super relevant (and might be overkill given I believe it requires redis and stuff) for a self deployed/single user type Huginn setup. Where it would become really interesting is with massive scale, and/or huge numbers of distributed agents. |
def stop | ||
@rufus_scheduler.stop | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cantino Here I added the stop
method to be able to stop the scheduler from the threaded script
I do not think running eventmachine within a thread will cause any issues. The twitter stream is running in its own thread, so it should really be just like running in a process. |
Do you think we should just switch to this entirely, or leave it commented out by default? I'm tempted to switch. |
end | ||
|
||
# We need to wait a bit to let delayed_job set it's traps so we can override them | ||
sleep 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it could race on a slow machine, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the problem is that DJ sets it's traps in the start method https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/worker.rb#L136-L146
I will check if there is a way to ensure our traps are set after DJs
I think once a few people tried this out and report no issues we could be safe to switch entirely. |
What's the benefit/risk of switching entirely? I'm not fully up on what's happening, but in layman a what's the gain from it?— On Thu, May 15, 2014 at 11:26 PM, Dominik Sander notifications@github.com
|
The risk is that we can not be 100% sure that everything will work in the threaded context. But that can be tested by just letting the background workers run for a few days. |
Cool :) |
@alias1 I don't think so because this does not change the way foreman starts or stops unicorn |
Oh ok. Figured it might have smooshed all the workers into the main thread. |
@cantino The threaded worker is now the default, are we confident to enough to merge this? |
Well, it's been working fine for me. Have you had any issues? |
Nope it worked without issues so far. |
Let's go for it. First step in getting to an easy Heroku free deploy! On Monday, June 2, 2014, Dominik Sander notifications@github.com wrote:
|
Ok, merging now 😄 |
Provide a new threaded background worker
Provide a new threaded background worker
Due to the three background workers (scheduler, twitter stream and delayed job) huginn needs a lot of memory to run (about 520MB on my dev machine). This PR introduces an optional threaded background worker which combines the three current separated processed into one (reducing the memory footprint to ~260MB).
Since just one instance of the of every background processor is running at a time there should not be any threading related issues.
The main gotcha of this is, that it's most likely not possible to run multiple delayed job workers concurrently. The ultimate solution would probably be switching to sidekiq with sidetiq as scheduler, but that is a different task :)
When running on MRI the GIL should not be an issue because it is released for most IO
bound operations (waiting for the database/website/sleeping).