
Provide an optional threaded background worker #318

Merged
merged 4 commits into from
Jun 2, 2014

Conversation

dsander
Collaborator

@dsander dsander commented May 11, 2014

Due to the three background workers (scheduler, twitter stream and delayed job), huginn needs a lot of memory to run (about 520MB on my dev machine). This PR introduces an optional threaded background worker which combines the three currently separate processes into one, reducing the memory footprint to ~260MB.

Since just one instance of every background processor is running at a time, there should not be any threading-related issues.

The main gotcha is that it's most likely not possible to run multiple delayed job workers concurrently. The ultimate solution would probably be switching to sidekiq with sidetiq as scheduler, but that is a different task :)

When running on MRI the GIL should not be an issue, because it is released for most IO-bound operations (waiting for the database/website/sleeping).

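The combined-worker idea can be sketched in plain Ruby. This is illustrative only, not Huginn's actual code: each background processor runs in its own thread inside a single process, and here a simple heartbeat stands in for each worker's real loop.

```ruby
# Minimal sketch of the threaded-worker idea: each background processor
# gets its own thread inside one process. Worker names are illustrative.
require "thread"

results = Queue.new

workers = %w[scheduler twitter_stream delayed_job].map do |name|
  Thread.new(name) do |worker_name|
    # A real worker would run its own loop here; we just record a start.
    results << worker_name
  end
end

workers.each(&:join)
started = Array.new(3) { results.pop }.sort
started # => ["delayed_job", "scheduler", "twitter_stream"]
```

Because each worker spends most of its time waiting on IO, MRI releases the GIL during those waits and the threads can genuinely overlap.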
@0xdevalias
Member

@dsander The ultimate solution would probably be switching to sidekiq with sidetiq as scheduler, but that is a different task

I've actually been pondering (after a couple of quick googles) whether there is a 'better' scheduling system that could be used. I was thinking along the lines of resque (partly because it's designed for distributed stuff, partly because it's designed to handle a lot of volume, mostly because I fanboi github)

@dsander
Collaborator Author

dsander commented May 12, 2014

@alias1 Resque would also be a good replacement for delayed job. The benefit of sidekiq is that it is also threaded; that way multiple workers consume less RAM.

@cantino
Member

cantino commented May 12, 2014

One of the downsides I see with switching away from delayed_job is that jobs stop being transactional. Right now, when a job is enqueued in DJ by an ActiveRecord hook, that job won't run until the surrounding transaction is committed. If an error causes the transaction to roll back, the job will correctly never be scheduled. With other systems, such as sidekiq and resque, these transactional semantics are not present.

How important do we feel this is?
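The transactional behaviour described here comes from delayed_job storing jobs in the same database as the application data, so an enqueued job commits or rolls back with the surrounding transaction. A minimal stand-in (not real delayed_job or ActiveRecord code) illustrates the semantics:

```ruby
# Toy model of a database-backed job queue: job rows only become visible
# when the enclosing transaction commits; a rollback discards them.
class FakeDatabase
  attr_reader :committed_jobs

  def initialize
    @committed_jobs = []
  end

  def transaction
    pending = []
    yield pending
    @committed_jobs.concat(pending) # commit: job rows become visible
  rescue StandardError
    # rollback: pending job rows vanish with the rest of the writes
  end
end

db = FakeDatabase.new
db.transaction { |jobs| jobs << :send_email }           # commits
db.transaction { |jobs| jobs << :oops; raise "boom" }   # rolls back
db.committed_jobs # => [:send_email]
```

A Redis-backed queue like sidekiq or resque enqueues outside the database transaction, so the second job would have been pushed even though the surrounding writes rolled back.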

@dsander
Collaborator Author

dsander commented May 12, 2014

Oh, I was not aware of the transactions. Actually I just mentioned sidekiq because it fit the threading context of the PR. The polling architecture of DJ might be a bit out of date, but other than that it does its job well. That is also why I just merged the three background processes we have now into one and kept DJ as it is.
I think that right now there are more important tasks, like adding an OAuth consumer, agent scenarios, or moving agents to gems.

@cantino
Member

cantino commented May 13, 2014

Were there any changes in scheduler or twitter_stream besides pulling them into libraries?

@cantino
Member

cantino commented May 13, 2014

Also, the twitter_stream is using eventmachine. I guess that's okay in a thread?

@0xdevalias
Member

@dsander I think that right now there are more important tasks like adding OAuth consumer, agent scenarios or moving agents to gems.

Definitely agree.

Also, on the resque side of things, it isn't really super relevant (and might be overkill, given I believe it requires redis and stuff) for a self-deployed/single-user type Huginn setup. Where it would become really interesting is with massive scale, and/or huge numbers of distributed agents.

def stop
  @rufus_scheduler.stop
end

Collaborator Author


@cantino Here I added the stop method to be able to stop the scheduler from the threaded script.

@dsander
Collaborator Author

dsander commented May 14, 2014

I do not think running eventmachine within a thread will cause any issues. The twitter stream is running in its own thread, so it should really be just like running in a process.
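The pattern being described, a blocking reactor loop (like EventMachine's EM.run) confined to its own thread, can be sketched without the eventmachine gem; a queue-driven loop stands in for the real reactor:

```ruby
# Sketch of running a blocking event loop in its own thread so the rest
# of the process keeps working. The queue-driven loop below is a
# stand-in for a real reactor such as EventMachine's EM.run.
require "thread"

events  = Queue.new
handled = []

reactor = Thread.new do
  loop do
    event = events.pop        # blocks, like a reactor waiting on IO
    break if event == :stop
    handled << event
  end
end

events << :tweet
events << :stop
reactor.join
handled # => [:tweet]
```

Since the loop blocks only its own thread, the scheduler and delayed_job threads are unaffected, which is why it behaves just like running in a separate process.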

@cantino
Member

cantino commented May 15, 2014

Do you think we should just switch to this entirely, or leave it commented out by default? I'm tempted to switch.

end

# We need to wait a bit to let delayed_job set its traps so we can override them
sleep 0.5
Member


This feels like it could race on a slow machine, unfortunately.

Collaborator Author


True, the problem is that DJ sets its traps in the start method: https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/worker.rb#L136-L146
I will check if there is a way to ensure our traps are set after DJ's.
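One way to avoid a sleep-based race (an assumption on my part, not necessarily what the PR ends up doing) is to chain handlers: Signal.trap returns the handler it replaces, so our trap can call the previously installed one regardless of installation order.

```ruby
# Sketch of chaining signal handlers instead of racing to install last.
# Signal.trap returns the previous handler, which we invoke after ours.
handled = []

# A previously installed handler (delayed_job's, in the real case).
Signal.trap("USR1") { handled << :delayed_job }

previous = Signal.trap("USR1") do
  handled << :ours
  previous.call if previous.respond_to?(:call)
end

Process.kill("USR1", Process.pid)
sleep 0.1 # let the main thread hit an interrupt checkpoint
handled # => [:ours, :delayed_job]
```

With chaining, the 0.5 second sleep becomes unnecessary: even if DJ installs its trap later or earlier, both handlers still run.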

@dsander
Collaborator Author

dsander commented May 15, 2014

Do you think we should just switch to this entirely, or leave it commented out by default? I'm tempted to switch.

I think once a few people have tried this out and reported no issues, it should be safe to switch entirely.

@0xdevalias
Member

What's the benefit/risk of switching entirely? I'm not fully up on what's happening, but in layman's terms, what's the gain from it?


@dsander
Collaborator Author

dsander commented May 16, 2014

The risk is that we cannot be 100% sure that everything will work in the threaded context. But that can be tested by just letting the background workers run for a few days.
The benefit is that the memory usage is cut in half.

@0xdevalias
Member

Cool :)

@0xdevalias
Member

@dsander Does this change how things would be started up from foreman export? I have a feeling this might make my issue on #339 (unicorns not dying properly) easier?

@dsander
Collaborator Author

dsander commented May 26, 2014

@alias1 I don't think so, because this does not change the way foreman starts or stops unicorn.

@0xdevalias
Member

Oh ok. Figured it might have smooshed all the workers into the main thread.

@coveralls

Coverage Status

Coverage remained the same when pulling df90322 on dsander:threaded-background-workers into 44853e4 on cantino:master.

@dsander
Collaborator Author

dsander commented Jun 1, 2014

@cantino The threaded worker is now the default, are we confident enough to merge this?

@cantino
Member

cantino commented Jun 2, 2014

Well, it's been working fine for me. Have you had any issues?

@dsander
Collaborator Author

dsander commented Jun 2, 2014

Nope, it has worked without issues so far.

@cantino
Member

cantino commented Jun 2, 2014

Let's go for it. First step in getting to an easy Heroku free deploy!


@dsander dsander mentioned this pull request Jun 2, 2014
@dsander
Collaborator Author

dsander commented Jun 2, 2014

Ok, merging now 😄

dsander added a commit that referenced this pull request Jun 2, 2014
Provide a new threaded background worker
@dsander dsander merged commit 577b95f into huginn:master Jun 2, 2014
@dsander dsander deleted the threaded-background-workers branch June 2, 2014 17:22
DataMinerUK pushed a commit to DataMinerUK/huginn that referenced this pull request Oct 6, 2014
4 participants