What is Batching?
Sometimes, a task involves working on multiple units. Instead of performing these tasks sequentially, executing them in parallel can speed up the process.
Let's say the task is to send emails to N Customers, N being a large number. Instead of 1 Job sending emails to N Customers sequentially, the task can be sped up by enqueuing N Jobs each sending the email to 1 Customer. There is 1 problem with this approach: You cannot track the status of the task and be notified of its completion.
Batch Processing is the process of executing N Jobs under 1 umbrella, tracking their status collectively and being notified when all the Jobs are completed.
Batching reduces the time it takes to complete a large Task, and its Callbacks can trigger other Tasks upon completion, which helps build complex workflows.
Feature Specs
A Batch must have a one-way workflow of states. Something like created -> ready-for-execution -> in-progress -> executed-at-least-once -> successful/partially-successful/failed -> completed
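The one-way workflow above could be enforced with a small state machine; a minimal sketch (all names hypothetical, not part of any existing API):

```python
from enum import Enum

class BatchState(str, Enum):
    CREATED = "created"
    READY = "ready-for-execution"
    IN_PROGRESS = "in-progress"
    EXECUTED_ONCE = "executed-at-least-once"
    SUCCESSFUL = "successful"
    PARTIALLY_SUCCESSFUL = "partially-successful"
    FAILED = "failed"
    COMPLETED = "completed"

# One-way transitions: each state may only move forward, never back.
_ALLOWED = {
    BatchState.CREATED: {BatchState.READY},
    BatchState.READY: {BatchState.IN_PROGRESS},
    BatchState.IN_PROGRESS: {BatchState.EXECUTED_ONCE},
    BatchState.EXECUTED_ONCE: {
        BatchState.SUCCESSFUL,
        BatchState.PARTIALLY_SUCCESSFUL,
        BatchState.FAILED,
    },
    BatchState.SUCCESSFUL: {BatchState.COMPLETED},
    BatchState.PARTIALLY_SUCCESSFUL: {BatchState.COMPLETED},
    BatchState.FAILED: {BatchState.COMPLETED},
    BatchState.COMPLETED: set(),  # terminal: nothing may follow
}

def transition(current: BatchState, nxt: BatchState) -> BatchState:
    """Move to `nxt`, rejecting any backward or skipped transition."""
    if nxt not in _ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```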
Once a Batch is marked as ready-for-execution, no more Jobs must be allowed to be added to it. Otherwise, if all existing Jobs complete and the Batch is marked completed, adding a new Job would force it back from the completed state to the in-progress state, violating the one-way workflow.
A Batch's status should be trackable via API. The status should return all metadata, e.g.: {state: "in-progress", total: 100, executing: 30, successful: 45, retrying: 15, died: 10, created_at: "2023-02-01 11:00", updated_at: "2023-02-01 12:00"}
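The status payload could be modeled as a plain record whose counters must reconcile with the total; a sketch under the assumption that the API serializes it to JSON (field names taken from the example above):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class BatchStatus:
    state: str
    total: int
    executing: int
    successful: int
    retrying: int
    died: int
    created_at: str
    updated_at: str

    @property
    def pending(self) -> int:
        # Jobs not yet picked up by any worker; the counters must
        # never exceed `total`.
        return self.total - (self.executing + self.successful
                             + self.retrying + self.died)

def status_json(s: BatchStatus) -> str:
    """Serialize the status record to the API's JSON shape."""
    return json.dumps(asdict(s))
```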
If a batch is deleted, all Jobs within the batch should be deleted.
A Batch must have a callback to mark its completion. For simplicity's sake, there should be only 1 callback, which marks that all Jobs have reached a terminal state (succeeded, or exhausted retries & died). Callbacks need to be implemented as per details here: Client-side Callbacks when a Job executes #54
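The single completion callback amounts to firing once when every Job is terminal (succeeded or died). A minimal sketch, assuming batch counters live in some shared store (here just a dict; the function name is hypothetical):

```python
def record_job_result(batch: dict, succeeded: bool, on_complete) -> None:
    """Update batch counters for one finished Job; fire the single
    completion callback exactly when every Job has reached a
    terminal state (succeeded, or exhausted retries & died)."""
    if succeeded:
        batch["successful"] += 1
    else:
        batch["died"] += 1  # retries exhausted
    if batch["successful"] + batch["died"] == batch["total"]:
        on_complete(batch)
```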
A Batch shouldn't have a deadline/timeout. That is best taken care of by retry-settings at an individual Job-level.
Scheduling a Job within a batch shouldn't be allowed. If this functionality is needed, it can be achieved by a scheduled Job that creates a Batch of Jobs.
Nuances
Ordering of Job-execution within a batch cannot be guaranteed, since Jobs execute in parallel on different workers.
When a Batch is deleted, all its Jobs still in the queue will be deleted. However, some Jobs might already be executing, and their deletion cannot be guaranteed: they might fail and continue to be retried until they are dead. To address this, a Job can check the status of its Batch before executing, but performing this check before every execution hurts performance. Hence, it's not advisable to enqueue batches that might need to be deleted mid-way.
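The pre-execution check described above could look like this sketch, where `batch_store` is a hypothetical lookup against the persistent backend (the extra read per Job is exactly the performance cost noted):

```python
def run_job_if_batch_alive(job: dict, batch_store) -> str:
    """Skip execution when the parent Batch was deleted mid-flight.
    `batch_store` is any mapping-like lookup (e.g. backed by Redis
    or Postgres); a missing entry means the Batch is gone."""
    if batch_store.get(job["batch_id"]) is None:
        # Batch deleted: drop the Job instead of retrying it to death.
        return "skipped"
    return job["fn"]()
```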
Implementation Details
This is a complex feature to build. Some ideas after initial investigation:
A persistent store will be required to hold the count of executing Jobs. Hence, this feature can exist for a backend like Redis or Postgres, but not for RabbitMQ, which offers no such shared mutable store.
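The core of the idea is an atomic, shared countdown of unfinished Jobs. A sketch using an in-memory stand-in (in a real deployment this would be a Redis DECR or a Postgres UPDATE inside a transaction, neither of which RabbitMQ can provide):

```python
import threading

class ExecutingCounter:
    """In-memory stand-in for the persistent counter a real backend
    would keep. The decrement must be atomic so that exactly one
    worker observes the count hitting zero and fires completion."""

    def __init__(self, total: int) -> None:
        self._remaining = total
        self._lock = threading.Lock()

    def job_done(self) -> bool:
        """Atomically decrement; returns True exactly once, for the
        worker that finishes the last Job of the Batch."""
        with self._lock:
            self._remaining -= 1
            return self._remaining == 0
```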