jp (Job Pool) is a producer/consumer job pooling system; I’m calling it a job pool rather than a job queue, because it *does not enforce strict ordering*.
A few of the basic principles behind this project:
- A producer/consumer system
- Easily usable from multiple languages
- If a consumer starts working on a message in the pool and then crashes, the message will eventually be given to another consumer without human interaction - jp uses a lock-work-purge system, with a timeout on the lock
- All messages must be persistent
- It’s not required that messages are consumed in *exactly* the order they are added to the pool
- It must be easy to set up an HA installation
- It must be horizontally scalable
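The lock-work-purge cycle mentioned above can be pictured with a small in-memory model. This is only a sketch of the idea, not jp's implementation or API: the `ToyPool` class, its method names, and the explicit `now` argument are all hypothetical, and real jp keeps its messages in mongodb.

```python
import time


class ToyPool:
    """In-memory stand-in for one pool; NOT jp's implementation or API."""

    def __init__(self, lock_timeout):
        self.lock_timeout = lock_timeout  # seconds a lock stays valid
        self.messages = {}                # id -> [payload, locked_until]
        self.next_id = 0

    def add(self, payload):
        """Producer side: store a message, initially unlocked."""
        self.next_id += 1
        self.messages[self.next_id] = [payload, 0.0]
        return self.next_id

    def acquire(self, now=None):
        """Lock and return (id, payload) for any unlocked message, else None."""
        now = time.time() if now is None else now
        for msg_id, entry in self.messages.items():
            if entry[1] <= now:           # never locked, or the lock has expired
                entry[1] = now + self.lock_timeout
                return msg_id, entry[0]
        return None

    def purge(self, msg_id):
        """Consumer side: remove a message once its work is done."""
        self.messages.pop(msg_id, None)


pool = ToyPool(lock_timeout=30)
pool.add(b"resize image 42")

first = pool.acquire(now=100)    # consumer A locks the message, then crashes
blocked = pool.acquire(now=110)  # lock still held: nothing to hand out
retry = pool.acquire(now=131)    # lock expired: consumer B gets the same message
pool.purge(retry[0])             # B finishes the work and purges it
```

Because consumer A never purged the message, the expiry of its lock is what makes the message available again; no operator has to intervene.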
jp uses Thrift for communication with consumers and producers, and should work with any of the languages Thrift supports. At the time of writing, this includes:
- C++
- Java
- Python
- PHP
- Ruby
- Erlang
- Perl
- Haskell
- C#
- Cocoa
- Smalltalk
- OCaml
Consumers and producers can be written in different languages.
Additionally, as an alternative to using the Thrift interface directly, simplified client libraries are currently provided for:
- C++
- Ruby
Requirements:
- Ruby 1.9
- Thrift (with Ruby support)
- mongo rubygem
- bson rubygem
- rev rubygem
- A mongodb server/cluster to connect to
- Ant (simple java client build only)
You might additionally want the bson_ext rubygem, which normally increases performance - however, it appears to be incompatible with Ruby 1.9.0 on Lenny.
We strongly recommend using Ruby 1.9.1 or later - this is also available for Lenny via backports.
- libevent (and development headers - normally libevent-dev or similar)
- ant
- You can prevent the locked -> unlocked transition from occurring by using a consumer-side timeout
- Without a consumer-side timeout, there is a possibility that multiple consumers will be processing the same message at the same time
- Messages must be idempotent (i.e. it doesn’t matter how many times they are consumed, as long as it’s at least once - it may be twice, it may be hundreds of times). jp will run them only once in the normal case, but there are other possibilities.
- Unless you only run one consumer per pool, or you implement your own locking, it must be safe for multiple consumers to process the same message at the same time.
- Message delivery order can’t matter. In some circumstances, you might be able to work around this with a ‘version’ field serialized into the message (if later messages do not depend upon previous messages having been acted on).
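One way to meet the idempotency and ordering requirements above is to have each message carry the full new state plus a version number, and to have the consumer ignore anything that is not newer than what it has already applied. The sketch below assumes UTF-8 JSON messages with hypothetical `key`, `version` and `value` fields; `ProfileStore` and `encode` are illustrative names, not part of jp.

```python
import json


class ProfileStore:
    """Hypothetical application state updated by consumers."""

    def __init__(self):
        self.records = {}  # key -> (version, value)

    def apply(self, raw_message):
        """Apply one message; return True if it changed anything."""
        msg = json.loads(raw_message.decode("utf-8"))
        key, version, value = msg["key"], msg["version"], msg["value"]
        current = self.records.get(key)
        if current is not None and current[0] >= version:
            return False  # duplicate or stale delivery: safe to ignore
        self.records[key] = (version, value)
        return True


def encode(key, version, value):
    """Serialize a message body as UTF-8 JSON bytes."""
    return json.dumps({"key": key, "version": version, "value": value}).encode("utf-8")


store = ProfileStore()
v1 = encode("user:7", 1, "old@example.com")
v2 = encode("user:7", 2, "new@example.com")

# Delivered out of order, and with a repeat: only the first one has an effect.
results = [store.apply(m) for m in (v2, v1, v2)]
```

This toy store is not safe for concurrent use. With several consumers, the check and the write would need to be a single atomic operation in whatever storage the application really uses.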
jp simply passes around BLOBs as messages; it’s entirely up to your application what serialization format you use. Here are a few suggestions that are fairly portable between different languages, though you can use any other method you like:
- Thrift via memory buffer, and your choice of protocol - the Ruby library provides a Thrift::Serialize class using the memory buffer transport and the binary protocol
- UTF-8 if it’s simple character data
- UTF-8 JSON
- XML
- UTF-8 YAML
- Google’s Protocol Buffers
If you’re willing to be locked into using the same language for producers and consumers, you can, of course, use platform-specific functionality such as:
- serialize/unserialize in PHP
- Marshal in Ruby
- Data::Dumper in Perl
- writeObject/readObject (from Serializable) in Java
It is, however, generally preferable to have more flexibility for the future by using a serialization method from the first list, or similar.
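As an illustration of the portable option, a UTF-8 JSON round trip needs only a few lines. The helper names `to_blob` and `from_blob` and the job fields are made up for this sketch; the resulting bytes are what a producer would hand to jp as the message body.

```python
import json


def to_blob(job):
    """Serialize a job description to UTF-8 JSON bytes."""
    return json.dumps(job, ensure_ascii=False, sort_keys=True).encode("utf-8")


def from_blob(blob):
    """Decode UTF-8 JSON bytes back into a Python object."""
    return json.loads(blob.decode("utf-8"))


job = {"action": "send_email", "to": "zoë@example.com", "attempt": 1}
blob = to_blob(job)         # bytes, readable from any language with a JSON parser
restored = from_blob(blob)  # what a consumer would see after decoding
```

A consumer written in Ruby, Java or C++ could decode the same bytes with its own JSON library, which is the flexibility a language-specific format gives up.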
There are (at least) three ways messages can be consumed out of order:
- A consumer fails to complete processing within a per-pool time limit, so the message is re-queued, and may then be executed after other items that were queued while it was being processed
- Two items end up being inserted on different mongodb servers, in the same second (then, within that second, the ordering is decided by a byte-by-byte comparison of the mongodb-generated ‘_id’ field)
- Two consumers are running against the same pool, and one runs somewhat faster than another; depending on what your consumers do, this may lead to out-of-order execution
The first of these is unavoidable if multiple concurrent consumers are allowed per pool.
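The second case can be illustrated with plain byte strings. A mongodb ObjectId is 12 bytes and begins with a 4-byte big-endian timestamp in seconds; the remaining 8 bytes below are arbitrary values chosen for the example, and `toy_object_id` is a hypothetical helper, not a mongodb or jp function.

```python
import struct


def toy_object_id(timestamp, rest):
    """Build a 12-byte id: 4-byte big-endian seconds followed by 8 other bytes."""
    assert len(rest) == 8
    return struct.pack(">I", timestamp) + rest


# Insertion order: "first" is added before "second", on different servers,
# within the same second; "later" is added one second afterwards.
first = toy_object_id(1_300_000_000, b"\xbb" * 5 + b"\x00\x00\x01")
second = toy_object_id(1_300_000_000, b"\xaa" * 5 + b"\x00\x00\x01")
later = toy_object_id(1_300_000_001, b"\x00" * 8)

# Python compares bytes objects byte by byte, like the comparison described above.
by_id = sorted([first, second, later])
```

Here `second` sorts ahead of `first` even though it was inserted afterwards, because the tie within the shared second is broken by the non-timestamp bytes. Across different seconds the timestamp prefix still decides the order.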
Follow the instructions on the mongodb website, or, for a quick start, download one of the tarballs from www.mongodb.org/downloads, then:
tar xfv /path/to/mongodb-OS-arch-version.tgz
cd mongodb-OS-arch-version
mkdir /var/tmp/mongodata
bin/mongod --dbpath /var/tmp/mongodata
There are also packages for various distributions linked from the bottom of that page; use them in preference to the version included in your distribution. For example, the ‘mongodb’ version in Ubuntu at the time of writing is too old to support the find_and_modify operation, whereas mongodb-stable from the repositories listed on the mongodb site does support it.
To build all the generated components, run:
make
./jp examples/jp-config.rb
This starts jp using the configuration file in the examples directory. This:
- connects to a mongodb server on localhost
- sets up ‘text’, ‘json’, and ‘thrift’ pools for the examples
Look at the contents of examples/ for an idea of how to use this - to build them, and their dependencies, run ‘make examples’ from the top-most directory.
Producer and consumer examples using the Thrift API are available for:
- C++
- Java
- Ruby
Producer and consumer examples using the simplified libraries are available for:
- C++
- Ruby
- Java