Design of the Haskell driver for MongoDB
by Tony Hannan, July 2011
The Haskell driver is a production-quality MongoDB driver for the Haskell language. This article highlights the design of the driver. For detailed documentation with coding examples see the driver package and homepage (follow driver link above).
BSON is a binary format for documents used by MongoDB but defined independently at bsonspec.org. Each language has its own representation for documents but they all serialize to this format. In the Haskell bson package, I chose to represent a Document as a list of Fields, where each Field has a Label and a Value. This is isomorphic to an association list, but I chose a custom pair (Field) over the standard pair to make printing of documents nicer.
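To make the trade-off concrete, here is a minimal sketch of a document as a list of custom pairs. It is a simplified model, not the bson package's source: only three basic types are covered, labels are plain Strings, and valueOf is a hypothetical helper added for illustration.

```haskell
-- Simplified model of the document representation; not the bson package's source.
type Label = String

-- Only three of the BSON basic types, to keep the sketch short.
data Value = Str String | Int32 Int | Bool Bool
  deriving Eq

instance Show Value where
  show (Str s)   = show s
  show (Int32 n) = show n
  show (Bool b)  = show b

-- A custom pair instead of (Label, Value), so that Show can be tailored.
data Field = Label := Value
  deriving Eq

instance Show Field where
  show (k := v) = k ++ ": " ++ show v

type Document = [Field]

-- Hypothetical helper: the first value stored under a label, if any.
valueOf :: Label -> Document -> Maybe Value
valueOf k fields = case [v | (k' := v) <- fields, k' == k] of
  (v : _) -> Just v
  []      -> Nothing

doc :: Document
doc = ["name" := Str "Tony", "score" := Int32 42]

main :: IO ()
main = do
  print doc                                          -- [name: "Tony",score: 42]
  print [("name", Str "Tony"), ("score", Int32 42)]  -- [("name","Tony"),("score",42)]
  print (valueOf "score" doc)                        -- Just 42
```

The second print shows the same data as a standard association list, which is noisier to read than the custom Field rendering.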
The Val class converts basic values to and from a Value: use val "hello" or val 42 to get a Value, and cast aValue :: Maybe Bool to extract the Bool, or Nothing if it is not a Bool. The function (=:) constructs a Field but converts its second argument using val, so you can construct fields directly from basic values, as in ["name" =: "Tony", "score" =: 42]. Types that are not technically a BSON basic type but are compatible with one are also instances of Val, so they can be used as if they were. Integer, Float, and String are examples of this; their Val instances convert to and from their compatible BSON basic type.
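The conversion scheme can be sketched with a small type class. This is a toy model under stated assumptions, not the bson package's definitions: Value has only three constructors, labels are plain Strings, and Integer stands in for a "compatible" type that converts through the Int32 constructor.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Toy model: three basic types only; not the bson package's definitions.
data Value = Str String | Int32 Int | Bool Bool
  deriving (Eq, Show)

data Field = String := Value
  deriving (Eq, Show)

-- Convert a Haskell value to a Value and back.
class Val a where
  val  :: a -> Value
  cast :: Value -> Maybe a

instance Val String where
  val = Str
  cast (Str s) = Just s
  cast _       = Nothing

instance Val Int where
  val = Int32
  cast (Int32 n) = Just n
  cast _         = Nothing

instance Val Bool where
  val = Bool
  cast (Bool b) = Just b
  cast _        = Nothing

-- A compatible type: not a constructor of Value, but convertible to one.
instance Val Integer where
  val = Int32 . fromInteger
  cast (Int32 n) = Just (toInteger n)
  cast _         = Nothing

-- Build a Field, converting the right-hand side with val.
infix 0 =:
(=:) :: Val a => String -> a -> Field
k =: v = k := val v

-- The annotation on 42 picks the Val instance in this standalone sketch.
doc :: [Field]
doc = ["name" =: "Tony", "score" =: (42 :: Int)]

main :: IO ()
main = do
  print (doc == ["name" := Str "Tony", "score" := Int32 42])
  print (cast (val True) :: Maybe Bool)       -- Just True
  print (cast (val "hello") :: Maybe Bool)    -- Nothing
  print (cast (val (7 :: Int)) :: Maybe Integer)  -- Just 7
```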
UString is a type synonym for CompactString which is a UTF-8 encoded string from the compact-string package. I chose this package over the text package because its native format is UTF-8 while text's native format is UTF-16 and thus would spend more time serializing to BSON which requires UTF-8. If and when the text package changes its native format to UTF-8 I will switch to it. In the meantime, you can make Text an instance of Val to automatically convert it to/from UString.
UString is an instance of IsString, so literal strings can be interpreted as UStrings. Use the language extension OverloadedStrings to enable this. If you don't use this extension, use the u function to convert a String to a UString. Field labels are also UStrings.
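The mechanism is the standard IsString class from base. The sketch below uses a stand-in newtype for UString (the real one is a CompactString), so the wrapper type and this definition of u are assumptions made only to keep the example self-contained.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Stand-in for UString; the real type wraps a UTF-8 CompactString.
newtype UString = UString String
  deriving (Eq, Show)

-- With OverloadedStrings, a string literal is passed through fromString.
instance IsString UString where
  fromString = UString

-- Explicit conversion for code that does not enable the extension.
u :: String -> UString
u = UString

-- The literal is typed as UString because of the signature.
greeting :: UString
greeting = "hello"

main :: IO ()
main = print (greeting == u "hello")  -- True
```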
You may want to define fields ahead of time to help catch typos. For example, you can define name = ("name" =:) :: UString -> Field and score = ("score" =:) :: Int -> Field, and then construct a document as [name "Tony", score 42]. This ensures your fields have the correct label and type, and is more succinct.
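A short sketch of why predefined fields help: the label is written once, and the argument type is fixed by the signature, so a mistyped label or a wrongly typed value is caught by the compiler. The Value, Field, and Val definitions here are a cut-down model with String labels, assumed only for this example.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Cut-down model of the types involved; labels are plain Strings here.
data Value = Str String | Int32 Int
  deriving (Eq, Show)

data Field = String := Value
  deriving (Eq, Show)

class Val a where
  val :: a -> Value

instance Val String where val = Str
instance Val Int    where val = Int32

(=:) :: Val a => String -> a -> Field
k =: v = k := val v

-- Each label is spelled exactly once, and its value type is fixed.
name :: String -> Field
name = ("name" =:)

score :: Int -> Field
score = ("score" =:)

player :: [Field]
player = [name "Tony", score 42]

main :: IO ()
main = print (player == ["name" := Str "Tony", "score" := Int32 42])  -- True
```

With these definitions, score "42" or a misspelled nmae "Tony" is rejected at compile time, whereas a misspelled label in "nmae" =: "Tony" would compile silently.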
To increase concurrency on a server connection and thus speed up threads sharing it, I pipeline requests over a connection, à la HTTP pipelining. Pipelining means sending multiple requests over the socket and receiving the responses later in the same order. This is faster than sending one request, waiting for the response, then sending the next request, and so on. The pipelining implementation uses futures/promises, which are simply implemented as IO actions. You are not exposed to the pipelining, because it is internal to Cursor, which iterates over the results of a query. Internally, a query returns its cursor right away, locking the socket only briefly to write the request (allowing other threads to issue their queries). When a cursor is asked for its first result, it waits for the query response from the server. Also, when a cursor returns the last result of the current batch, it asynchronously requests the next batch from the server. This asynchronicity is automatic because the request returns a promise right away that the cursor will wait on when asked for the next result.
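The idea of a promise as a plain IO action can be sketched with MVars from base. This is an illustrative model, not the driver's implementation: Pipe, newPipe, and call are hypothetical names, the "server" is a pure function run on a background thread, and each request carries its own reply slot rather than matching replies to requests by arrival order on a socket.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, readMVar, withMVar)
import Control.Monad (forever)

-- A promise is just an IO action that blocks until the reply has arrived.
type Promise a = IO a

-- Hypothetical stand-in for a pipelined connection.
data Pipe = Pipe
  { pipeLock :: MVar ()                     -- held only while writing a request
  , pipeWire :: Chan (String, MVar String)  -- requests, each with a slot for its reply
  }

-- Start a simulated server that answers requests in the order they were sent.
newPipe :: (String -> String) -> IO Pipe
newPipe server = do
  lock <- newMVar ()
  wire <- newChan
  _ <- forkIO $ forever $ do
    (request, slot) <- readChan wire
    putMVar slot (server request)
  return (Pipe lock wire)

-- Send a request and return immediately with a promise for the reply.
call :: Pipe -> String -> IO (Promise String)
call pipe request = withMVar (pipeLock pipe) $ \_ -> do
  slot <- newEmptyMVar
  writeChan (pipeWire pipe) (request, slot)
  return (readMVar slot)

main :: IO ()
main = do
  pipe <- newPipe (\q -> "reply to " ++ q)
  p1 <- call pipe "query 1"  -- both requests are sent ...
  p2 <- call pipe "query 2"
  r1 <- p1                   -- ... before either reply is awaited
  r2 <- p2
  mapM_ putStrLn [r1, r2]
```

A cursor built on such a primitive would hold the promise for the next batch and run it only when the caller asks for a result beyond the current batch.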
Every database read and write operation requires a connection to access, a database to affect, and an access mode to use. Furthermore, every operation may fail because of a connection failure or an invalid operation. This context and failure are captured in a Reader and Error monad stacked on top of IO, called the Action monad. To access MongoDB, you sequence together several operations/actions that together accomplish a high-level task, and execute that task against a connection, database, and access mode. The execution will return Left if an operation failed or Right if all operations succeeded. The access mode indicates whether stale reads (from slaves) are OK, and whether and how writes should be ensured.
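The shape of such a monad can be sketched by hand with base only: a function from a context to an IO computation that yields Either a failure or a result. This is a model of the idea, not the driver's Action type; Context, AccessMode, Failure, and the helper names currentDatabase and failWith are all hypothetical.

```haskell
-- Hypothetical access modes, for illustration only.
data AccessMode = StaleReadsOk | ConfirmedWrites
  deriving (Eq, Show)

-- The read-only context every operation needs.
data Context = Context
  { ctxConnection :: String  -- placeholder for a real connection handle
  , ctxDatabase   :: String
  , ctxMode       :: AccessMode
  }

data Failure = ConnectionFailure String | InvalidOperation String
  deriving (Eq, Show)

-- Reader (Context ->) and Error (Either Failure) layered over IO.
newtype Action a = Action { runAction :: Context -> IO (Either Failure a) }

instance Functor Action where
  fmap f (Action g) = Action $ \ctx -> fmap (fmap f) (g ctx)

instance Applicative Action where
  pure x = Action $ \_ -> return (Right x)
  Action f <*> Action g = Action $ \ctx -> do
    ef <- f ctx
    case ef of
      Left e   -> return (Left e)
      Right f' -> fmap (fmap f') (g ctx)

instance Monad Action where
  Action g >>= k = Action $ \ctx -> do
    r <- g ctx
    case r of
      Left e  -> return (Left e)       -- abort: later operations never run
      Right x -> runAction (k x) ctx   -- pass the same context along

-- Read the database name from the context.
currentDatabase :: Action String
currentDatabase = Action $ \ctx -> return (Right (ctxDatabase ctx))

-- Abort the whole task with a failure.
failWith :: Failure -> Action a
failWith e = Action $ \_ -> return (Left e)

-- A task whose second operation fails.
task :: Action String
task = do
  db <- currentDatabase
  _  <- failWith (InvalidOperation "bad query") :: Action ()
  return ("finished in " ++ db)  -- never reached

main :: IO ()
main = do
  let ctx = Context "conn-1" "test" StaleReadsOk
  runAction currentDatabase ctx >>= print  -- Right "test"
  runAction task ctx >>= print             -- Left (InvalidOperation "bad query")
```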
You may notice that a DB action/task is analogous to a DB transaction in that the action aborts when one of its operations fails. However, for scalability reasons, MongoDB does not support ACID across multiple read/write operations, so the operations before the failed operation remain in effect. Your failure handler must be prepared to recover from this intermediate state. If your DB action is conceptually a single high-level task, then it should not be too hard to undo and redo that task even from an intermediate state.
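The intermediate state can be illustrated with an in-memory store standing in for the database. Everything here is a hypothetical model (Store, insertDoc, failingOp, runSteps); it only shows that effects before the failing step persist and that the failure handler has to clean them up.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- In-memory stand-in for a collection of (key, value) documents.
type Store = IORef [(String, Int)]

-- A write that always succeeds and takes effect immediately.
insertDoc :: Store -> String -> Int -> IO (Either String ())
insertDoc store k v = do
  modifyIORef store ((k, v) :)
  return (Right ())

-- An operation that always fails.
failingOp :: IO (Either String ())
failingOp = return (Left "connection lost")

-- Run steps in order and stop at the first failure; nothing is rolled back.
runSteps :: [IO (Either String ())] -> IO (Either String ())
runSteps []       = return (Right ())
runSteps (s : ss) = do
  r <- s
  case r of
    Left e   -> return (Left e)
    Right () -> runSteps ss

main :: IO ()
main = do
  store  <- newIORef []
  result <- runSteps [insertDoc store "a" 1, failingOp, insertDoc store "b" 2]
  print result                -- Left "connection lost"
  readIORef store >>= print   -- [("a",1)]
  -- Failure handler: undo the partial work so the task can be retried.
  case result of
    Left _   -> modifyIORef store (filter ((/= "a") . fst))
    Right () -> return ()
  readIORef store >>= print   -- []
```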