A job queue that scales, has ability to recover jobs. Uses Redis


nmjmdr/jobber


Jobber

Jobber is a job queue service implemented in Go and Redis. It is scalable and resilient: each component can be scaled horizontally, and it can recover jobs from workers that fail to execute them and assign those jobs to other workers.

Version 2.0 of Walrus

Jobber is version 2 of the Walrus project (https://github.com/nmjmdr/walrus).

Jobber simplifies the design and improves the project structure. Unlike Walrus, it does not have a built-in scheduler (a scheduler schedules jobs for execution at a given point in time). It implements a dispatcher, a worker and a recoverer.

Jobber can easily be extended to provide the functionality of Walrus by adding a scheduler component.

Job Type

Jobber has the concept of a job type. Each job type gets its own worker queue, and any number of worker instances can run against that queue depending on its load. A future enhancement would be to auto-scale the number of workers based on the number of jobs waiting in the worker queue.

Recovering a job

Jobber supports recovering a job. If a worker fails while executing a job, the job can be recovered and executed by another worker. Recovery is built on the concept of a visibility timeout.

Visibility timeout

When a worker picks up a job, it has a fixed amount of time in which to complete it. Within this window the job is not visible to other workers; this period is called the visibility timeout. If the worker fails to complete the job within it, a recoverer process recovers the job and pushes it back onto the worker queue.

Jobber API

The API supports an operation to queue a job.

Request

POST: https://<host>/jobber/queue
Body:
{
    "type": "job-type",
    "payload": { 
        /* free-form json payload */
    }
}

Response

201 Created
{
    "job-id": "uuid"
}

Design details

I have discussed the design details in this 4-part video series


Design

Sequence

  1. The API accepts a new job and queues it onto a queue named job_queue_job-type (one queue per job type).
  2. When an instance of the worker is ready to take a new job, it tries to acquire a lock on the job (job-id). The lock is set to auto-expire after the visibility_timeout period.
  3. If successful, it then does RPOPLPUSH, popping the job from the worker queue and pushing it onto in_process_queue.
  4. The worker then finishes the job and deletes it from in_process_queue.
  5. If the worker is unable to finish the job and delete it from in_process_queue, the recoverer process pushes it back onto the worker queue.
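The sequence relies on RPOPLPUSH moving a job between two lists as one atomic operation, so a job is always in exactly one of them. The sketch below models that with a hypothetical in-memory store whose methods are made atomic by a single mutex; it illustrates the Redis semantics and is not Jobber's code. (Since Redis 6.2, RPOPLPUSH is deprecated in favour of LMOVE with the RIGHT and LEFT arguments.)

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory model of a few Redis list commands.
// One mutex makes each method atomic, as each Redis command is.
type store struct {
	mu    sync.Mutex
	lists map[string][]string // index 0 is the left end of each list
}

func newStore() *store { return &store{lists: map[string][]string{}} }

// lpush prepends v to the list, like Redis LPUSH.
func (s *store) lpush(key, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lists[key] = append([]string{v}, s.lists[key]...)
}

// rpoplpush removes the rightmost element of src and prepends it to dst in
// one atomic step, like Redis RPOPLPUSH. ok is false if src is empty.
func (s *store) rpoplpush(src, dst string) (v string, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	l := s.lists[src]
	if len(l) == 0 {
		return "", false
	}
	v = l[len(l)-1]
	s.lists[src] = l[:len(l)-1]
	s.lists[dst] = append([]string{v}, s.lists[dst]...)
	return v, true
}

// lrem removes every occurrence of v from the list, like Redis LREM key 0 v.
func (s *store) lrem(key, v string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	kept := s.lists[key][:0]
	removed := 0
	for _, x := range s.lists[key] {
		if x == v {
			removed++
			continue
		}
		kept = append(kept, x)
	}
	s.lists[key] = kept
	return removed
}

func main() {
	s := newStore()
	s.lpush("job_queue_email", "job-1") // step 1: the API queues two jobs
	s.lpush("job_queue_email", "job-2")

	// Step 3: the oldest job moves to in_process_queue in one step.
	job, _ := s.rpoplpush("job_queue_email", "in_process_queue")
	fmt.Println(job, s.lists["job_queue_email"], s.lists["in_process_queue"]) // job-1 [job-2] [job-1]

	// Step 4: the finished job is deleted from in_process_queue.
	s.lrem("in_process_queue", job)
	fmt.Println(len(s.lists["in_process_queue"])) // 0
}
```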

How does the recoverer recover jobs?

  1. The recoverer regularly scans the in_process_queue for jobs that do not have an active lock
  2. If it finds a job in the in_process_queue without an active lock, it pops that job and pushes it back onto the worker queue

Steps followed by the worker

The worker follows these steps:

  1. Read the head of the queue
  2. Try to lock the job
  3. If it can't, return and go back to waiting for the next job
  4. If the job is locked, pop it from the worker queue and push it onto in_process_queue (using RPOPLPUSH, so that the pop and push happen in a single atomic step)
  5. Meanwhile, if the recoverer tries to recover the job, it finds an active lock on it and returns
  6. Process the job
  7. Delete the job from in_process_queue
  8. Delete the lock

If the worker fails while processing the job, the job remains in in_process_queue and the lock expires. The recoverer can then recover the job.

Currently the recoverer attempts to recover only the job at the head of the in_process_queue; it does not look further down the queue, so a job further down waits until every job ahead of it has been deleted or recovered. This should not be a problem as long as visibility timeouts are small and recovering jobs promptly is not critical.

Implementation of Visibility timeout

Visibility timeout is implemented using SETNX with an expiry. SETNX sets a key only if it does not already exist. Locking attempts to create a new key for the given job id using SETNX; if the key is created, the lock has been placed on the job. The key is set to expire after the visibility timeout period.

Note that I have not currently used the Redlock mechanism (https://redis.io/topics/distlock); the lock is only a SETNX without a random value. The drawback is that in a master-slave Redis setup, if the master goes down, there is a chance that a valid lock could be removed by another process (in our case, the recoverer). This is not handled yet, but the lock can easily be enhanced to handle it.

Future enhancements

  1. Make the recoverer look through the whole in_process_queue to recover jobs
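A possible shape for this enhancement, sketched under the assumption that the lock check can be passed in as a function (isLocked, a stand-in for an EXISTS on the job's lock key): walk the whole in_process_queue and move every job without an active lock back onto the worker queue. In Redis, moving a job out of the middle of a list would need something like LREM plus LPUSH, ideally inside a MULTI/EXEC transaction or a Lua script so the move stays atomic. recoverAll is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// recoverAll scans every job in inProcess rather than only the head. Jobs
// with an active lock stay; the others are appended to the worker queue.
func recoverAll(inProcess, worker []string, isLocked func(string) bool) (stillInProcess, newWorker []string) {
	for _, job := range inProcess {
		if isLocked(job) {
			stillInProcess = append(stillInProcess, job)
		} else {
			worker = append(worker, job)
		}
	}
	return stillInProcess, worker
}

func main() {
	now := time.Now()
	expiry := map[string]time.Time{
		"job-1": now.Add(time.Minute),  // active lock at the head
		"job-2": now.Add(-time.Second), // expired
		"job-3": now.Add(-time.Second), // expired
	}
	isLocked := func(id string) bool {
		exp, ok := expiry[id]
		return ok && now.Before(exp)
	}

	// A head-only recoverer would stop at job-1; the full scan recovers the rest.
	inProcess, worker := recoverAll([]string{"job-1", "job-2", "job-3"}, nil, isLocked)
	fmt.Println(inProcess, worker) // [job-1] [job-2 job-3]
}
```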
