# Add `perform_all_later` to enqueue multiple jobs at once (#46603)
Conversation
*Force-pushed from 0f8ae98 to 3f9262c.*
Thanks for working on this! I deleted two review comments because I realized after posting that you addressed them in the PR description. I will rephrase one of them here: I think it would make sense to skip
You said:

> A similar argument could be made for Active Record’s bulk APIs, but to my knowledge, this hasn’t been a major source of confusion in practice. These APIs exist to accommodate intentional performance optimization, and it’s enough that they’re clearly documented not to run callbacks.

Good points. I'm worried that not running callbacks will make them far less usable for many apps, though. As an example, our main monolith does a lot of validation in
*Force-pushed from 3f9262c to 2f56ba8.*
Hello! Thanks for opening this; a similar contribution has been on my TODO list for some time already. I can share our experience on this. We have implemented something similar (called `perform_async_multi` in the pseudo code below). I tried to extract pseudo code of our approach:

```ruby
class UserUpdate < ApplicationJob
  before_enqueue :update_user_state

  def perform(user_id)
    # ... some logic
  end

  def update_user_state
    User.find(user_id).queue_update # state machine change
  end
end

class UserGroupUpdate < ApplicationJob
  before_enqueue :update_users_states

  def perform(user_group_id)
    User.where(group_id: user_group_id).waiting_for_update.find_in_batches do |batch|
      UserUpdate.perform_async_multi(*batch.map(&:id))
    end
  end

  def update_users_states
    # state machine bulk change
    # we need this since the enqueue callback is skipped when
    # perform_async_multi is used for maximum performance
    User.where(group_id: user_group_id).idle.queue_update!
  end
end
```

Thanks to
@simi Thanks for that example! So it looks like, with the current implementation in this PR that does run enqueue callbacks, your example could be written as:

```ruby
class UserGroupUpdate < ApplicationJob
  def perform(user_group_id)
    User.where(group_id: user_group_id).waiting_for_update.find_in_batches do |batch|
      UserUpdate.perform_all_later(*batch.map(&:id))
    end
  end
end
```

Does that meet your needs, or do you think it would be better not to run the callbacks?
In our case we would like to at least be able to opt out of the callbacks, since we would like to run all callbacks in a "bulk" way in the "group" update wrapper. Since
That makes sense. I'll change the proposed implementation in this PR to not run callbacks. We can always add an option to run callbacks later, but the default will be not to. As @georgeclaghorn pointed out, that is in line with the Active Record bulk methods, so it won't be too unexpected for users.
*Force-pushed from 31b4a0a to baaecb9.*
I've thought a bit about this last night, and I'm afraid the only way to handle keyword arguments properly would be to expose some helper to process arguments:

```ruby
MyJob.perform_all_later([
  MyJob.arguments(1, foo: 42),
  MyJob.arguments(1, { hello: "world" }, foo: 42),
])
```

But then at this stage I wonder if it may make sense to just instantiate the job instances.
Something like this (maybe covered with some high level API on the job level as well)?

```ruby
batch = ActiveJob::Batch.new
users.each do |user|
  batch.jobs << UserJob.new(id: user.id)
end
batch.perform_later
```
More like:

```ruby
ActiveJob.perform_all_later(users.map { |u| UserJob.new(id: u.id) })
```

Also, instantiating the jobs means running the callbacks is less of a worry. Now of course that means many more allocations, but for most job classes that will be one or two allocs per job, which I think is negligible compared to what is needed to serialize the arguments anyway.
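To illustrate why instantiating jobs sidesteps the keyword-argument problem, here is a toy sketch (not Active Job itself; `FakeJob` is a made-up class): Ruby resolves positionals vs. keywords at the call site, and the instance just stores the result.

```ruby
# Toy sketch, not Active Job: shows that instantiation lets Ruby
# disambiguate positional hashes from keyword arguments for us.
class FakeJob
  attr_reader :arguments

  def initialize(*args, **kwargs)
    # Mimic Active Job's convention of appending keywords as a final hash.
    @arguments = kwargs.empty? ? args : args + [kwargs]
  end
end

FakeJob.new(1, 2).arguments
# => [1, 2]
FakeJob.new(1, { hello: "world" }, foo: 42).arguments
# => [1, { hello: "world" }, { foo: 42 }]
```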
Another option would be to treat the last hash in the args array (if any) as keywords. Jobs that take a hash as their last positional argument would need to pass an additional empty hash of keyword arguments, but would otherwise work.

```ruby
MyJob.perform_all_later([[first_arg1, first_arg2], [second_arg1, second_arg2]]) # Positional only
MyJob.perform_all_later([[first_arg, { key: first_key }], [second_arg, { key: second_key }]]) # Positional and keyword
MyJob.perform_all_later([[first_arg, { hello: "world" }, {}], [second_arg, { hello: "world" }, {}]]) # Positional hash
MyJob.perform_all_later([[{ key: first_key }], [{ key: second_key }]]) # Keywords only (could support omitting the wrapping array here)
```
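A minimal sketch of that convention, as a hypothetical helper (`split_job_arguments` is illustrative, not part of Active Job or this PR):

```ruby
# Hypothetical helper illustrating the "last hash is keywords" convention
# discussed above; the name and behavior are assumptions, not part of this PR.
def split_job_arguments(args)
  if args.last.is_a?(Hash)
    [args[0...-1], args.last] # trailing hash becomes keyword arguments
  else
    [args, {}]                # no keywords
  end
end

split_job_arguments([1, 2])              # => [[1, 2], {}]
split_job_arguments([1, { key: "k" }])   # => [[1], { key: "k" }]
split_job_arguments([1, { a: 1 }, {}])   # => [[1, { a: 1 }], {}]
```

The third call shows the cost of the convention: a job whose last positional argument really is a hash must append an explicit empty keywords hash.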
I'm the maintainer of GoodJob. I wanted to offer my strong support of this proposal 👍🏻 From my perspective of what's possible with GoodJob, I'd simply recommend to my users that they pass in job instances like @casperisfine suggested:

```ruby
jobs = users.map { |u| UserJob.new(id: u.id).set(wait: 10.minutes) }
ApplicationJob.perform_all_later(jobs)
```

That also makes it possible to determine which jobs failed to enqueue by expecting/hoping the Adapter is able to set

Re: delays, I think the Adapter should just be expected to grab
I'd be OK with that, but as an explicitly lower-level API that doesn't instantiate the job instances and doesn't run callbacks. To me that's a more niche feature that can only be used in specific cases, like
*Force-pushed from dd32881 to c9a5eff.*
I updated the API to take an array of jobs, as suggested. I'm not sure if the code is in the right place now. I left it in the modules (grouped by functionality), but as the API is now top-level, it may make more sense in a separate file, or just the top-level

I made no changes to callbacks (they are not run), as they still suffer from the same problems as before (detailed in the description and earlier discussion).
```ruby
private
  def instrument_enqueue_all(queue_adapter, jobs)
    payload = { adapter: queue_adapter, jobs: jobs }
    ActiveSupport::Notifications.instrument("enqueue_all.active_job", payload) do
```
Having a different event for mass enqueue might be a bit of a problem, e.g. a tracing framework might not handle both.

I don't have a solution in mind right now, but making a note to put more thought into this.
The advantage of a separate event is that the semantics of the current event remain the same, and frameworks can add support for the new event over time. If there is a clean way of plugging in the existing monitoring, I'd be all for that, but I'm not sure there is.
That new event is really the only thing that bothers me here. @rafaelfranca I'd love to hear your thoughts on it?
@Mangara ok, sorry for the delay, I just had a chat with @rafaelfranca and introducing that new event is fine, we'll just have to make sure it's properly pointed out in the changelog etc.
So no more blocker from me. I'll merge this next week.
We might want to add this to the instrumentation guide:
https://edgeguides.rubyonrails.org/active_support_instrumentation.html#active-job
I'm happy to do this after this PR lands though, not a blocker!
Good point! It should definitely be documented.
Done in 3748385 🙏
*Force-pushed from 49ef697 to a047397.*
fyi'ing @mperham for visibility on Sidekiq.

This is a totally fair criticism and poor API design on my part. If someone wants to open up a new issue for Sidekiq, we can discuss changing the return value for 7.1.
*Force-pushed from a047397 to e3fabdc.*
Sidekiq has a useful optimisation called `push_bulk` that enqueues many jobs at once, eliminating the repeated Redis roundtrips. However, this feature is not exposed through Active Job, so it only works for `Sidekiq::Worker` jobs. This adds a barrier to Active Job adoption for apps that rely on this feature. It also makes it harder for other queue adapters to implement similar functionality, as they then have to take care of serialization, callbacks, etc. themselves.

This commit adds `ActiveJob.perform_all_later(<job1>, <job2>)`, backed by Sidekiq's `push_bulk` and with a fallback to enqueuing serially if the queue adapter does not support bulk enqueue.

The performance benefit for 1000 jobs can be more than an order of magnitude:

| Enqueue type       | Serial time (ms) | Bulk time (ms) | Speedup |
| ------------------ | ---------------- | -------------- | ------- |
| Raw Sidekiq        | 2661             | 119            | 22x     |
| Active Job Sidekiq | 2853             | 208            | 14x     |

(Measured in a simple test app in our production environment.)

Instrumentation for `perform_all_later` uses a new event `enqueue_all.active_job`.
*Force-pushed from e3fabdc to 9b62f88.*
I changed the return value to
### Motivation / Background

Sidekiq has a useful optimisation called `push_bulk` that enqueues many jobs at once, eliminating the repeated Redis roundtrips. However, this feature is not exposed through Active Job, so it only works for `Sidekiq::Worker` jobs. This adds a barrier to Active Job adoption for apps that rely on this feature. It also makes it harder for other queue adapters to implement similar functionality, as they then have to take care of serialization, callbacks, etc. themselves.

### Detail
This PR adds `ActiveJob.perform_all_later([<job 1>, <job 2>])`, backed by Sidekiq's `push_bulk` and with a fallback to enqueuing serially if the queue adapter does not support bulk enqueue.

### Arguments
It's hard to support the full diversity of ways a Ruby method can be called with a simple array. In particular, jobs with a mix of positional and keyword arguments are difficult to distinguish from jobs with positional arguments that include a hash. Sidekiq gets around this by not supporting keyword arguments.
While it is possible to support everything by treating the last hash in the array of arguments as keywords and requiring jobs with a hash as last positional argument to include an additional empty hash of keyword arguments, based on feedback on the PR, we chose to pass instantiated jobs, so we can let Ruby handle all these complexities.
### Delay

Passing in instantiated jobs also makes it easy for the user to specify delays for each job:
### Return value

The return value of `perform_all_later` is limited by Sidekiq's current behaviour. `push_bulk` returns the (provider) ids of all jobs that were successfully enqueued. This means that if we try to enqueue 2 jobs, we may get back a single id with no way of knowing which job it belongs to, so we can't even map it back to Active Job's job ids. I chose to return the number of successfully enqueued jobs, but it may be better to always return `true` for now, so that we can more easily change it later. In an ideal world, I think the return value would be an array of either the job or `false`, to mirror the return value of `perform_later`.

### Callbacks
Based on feedback in this PR, `perform_all_later` does not run any callbacks. This is in line with the Active Record bulk methods, so it shouldn't be too surprising to users, and it is clearly stated in the documentation.

There are several issues with running callbacks: with bulk enqueuing there is no way to run `around_enqueue` callbacks for each job such that the callback begins before the job is enqueued and ends after. Running them another way breaks these assumptions and gives strange results, for instance for Active Job's own enqueue instrumentation.

### Batching
Sidekiq recommends batches of no more than 1000 jobs, and their newer bulk API `perform_bulk` will automatically break a larger array up into batches of that size. As recommended batch sizes will vary between back-ends, I'm not sure if this should be something that Active Job handles as opposed to the adapter, although we could make it configurable. This is also easy to add later.

### Additional information
The performance benefit for 1000 jobs can be more than an order of magnitude:

| Enqueue type       | Serial time (ms) | Bulk time (ms) | Speedup |
| ------------------ | ---------------- | -------------- | ------- |
| Raw Sidekiq        | 2661             | 119            | 22x     |
| Active Job Sidekiq | 2853             | 208            | 14x     |

(Measured in a simple test app in our production environment.)
See also #39499 which was a previous stab at this and where I stole the name from (🙇 @vinistock).