Collaborative Document Review Web Application README
The original Web application was part of a Docbook-based toolchain that was used to produce Real World Haskell, and was written in Python using Django. This implementation is intended to be as independent as possible from the authoring system that is used to produce a document, and to run in a wide variety of environments.
If you have feature requests, bug reports or other feedback, please let me know!
Most users can be up and running by following these steps:
Prepare your documents:
- Add a wrapper element with class="chapter" around the content that you will want comments on.
- Add an id attribute to all of the paragraphs that you wish to enable comments on.
- Load the comment script in the <head> of each document, as described under "Marking up documents".
Build and install using cabal install to obtain a doc-review executable:
$ cabal install doc-review
Test your documents:
$ doc-review run --content-dir=$PATH_TO_YOUR_DOCUMENTS
Comments you leave when testing will not be saved when the server is restarted.
Select your backend. Right now, that probably means the SQLite backend:
$ doc-review run --content-dir=$PATH_TO_YOUR_DOCUMENTS --store=sqlite:comments.db
This command will create comments.db if necessary, and store all comments in that SQLite database.
Decide how you are going to run the server. Running this program as a daemon and configuring your Web server to use reverse proxying is the most straightforward solution.
Marking up documents
doc-review will recursively traverse the directory specified by the
--content-dir parameter looking for files with the extension
.xhtml. It will parse those files as HTML,
looking for paragraphs marked as commentable, and will store those
chapter definitions in the data store.
In order for a document to be commentable, you must load the comment
script in the <head> of the document.
You will likely want to (but by no means have to) reuse the CSS for the comments:
<link rel="stylesheet" type="text/css" href="/comments.css"/>
For a paragraph to be commentable, it must have an id attribute and be
inside of an element with class="chapter". The choice of
class="chapter" is for compatibility with the Real World Haskell
implementation.
The current implementation (like the Real World Haskell
implementation) depends on the id being unique across the full set
of documents that you want comments on. That means that if you have
two documents with paragraphs that share the same id, comments left
on either of those paragraphs will be visible in both documents. This
can be useful if you have duplicated content, but if the content is
different, user (or author!) confusion can result.
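The uniqueness requirement above can be checked before publishing. The following is a minimal sketch, not part of doc-review: it assumes the documents are well-formed XHTML, that only <p> elements count as paragraphs, and that a commentable paragraph is one with an id nested inside an element whose class includes "chapter". The names commentable_ids and find_shared_ids are hypothetical.

```python
import os
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class CommentableIds(HTMLParser):
    """Collect ids of <p> elements nested inside a class="chapter" element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: is it a chapter wrapper?
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p" and any(self.stack) and attrs.get("id"):
            self.ids.append(attrs["id"])
        if tag not in VOID:
            classes = (attrs.get("class") or "").split()
            self.stack.append("chapter" in classes)

    def handle_endtag(self, tag):
        # Assumes well-formed input, so each end tag closes the innermost element.
        if tag not in VOID and self.stack:
            self.stack.pop()


def commentable_ids(xhtml_text):
    """Return the commentable paragraph ids found in one document."""
    parser = CommentableIds()
    parser.feed(xhtml_text)
    parser.close()
    return parser.ids


def find_shared_ids(content_dir):
    """Map each id used in more than one .xhtml file to the sorted list of those files."""
    seen = defaultdict(set)
    for root, _dirs, files in os.walk(content_dir):
        for name in files:
            if name.endswith(".xhtml"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as handle:
                    for para_id in commentable_ids(handle.read()):
                        seen[para_id].add(path)
    return {i: sorted(paths) for i, paths in seen.items() if len(paths) > 1}
```

Calling find_shared_ids on the directory you pass to --content-dir returns every id whose comments would be shared between files, which lets you decide whether the sharing is intended.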
Running the server
The server is a plain HTTP server that serves three kinds of resources:
1. URLs under /comments/, which serve the comments API. This API is
used to load the comments for the document and to save user comments.
These are the only dynamic URLs that will be served.
2. Files in the directory specified by --content-dir=. The URL will be
checked against the files in that directory, and the static file will
be served if there is a match.
3. Files in the directory specified by --static-dir=. If no matching
file was found in the content directory, the server will look for a
matching file here and serve that.
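The lookup described in the list above can be sketched as a small function. This is an illustration only, not the server's code: the function name resolve, its return values, and the check that rejects paths escaping the base directory are all assumptions, and checking /comments/ before the static directories is assumed rather than stated above.

```python
import os


def resolve(url_path, content_dir, static_dir):
    """Classify a request path as ("api", rest), ("file", path) or ("not_found", None)."""
    # Dynamic URLs: everything under /comments/ belongs to the comments API.
    if url_path.startswith("/comments/"):
        return ("api", url_path[len("/comments/"):])

    relative = url_path.lstrip("/")
    # The content directory is consulted first, then the static directory.
    for base in (content_dir, static_dir):
        base_abs = os.path.abspath(base)
        candidate = os.path.abspath(os.path.join(base_abs, relative))
        # Assumption: refuse paths such as "/../x" that point outside the base.
        if candidate != base_abs and not candidate.startswith(base_abs + os.sep):
            continue
        if os.path.isfile(candidate):
            return ("file", candidate)
    return ("not_found", None)
```

Note that a path such as /comments.css does not start with /comments/, so in this sketch it is treated as a static file rather than an API request.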
It is likely that you will be integrating this server into a larger Web site. In that case, you will likely want to reverse proxy requests from your main Web server to this server (using e.g. <http://httpd.apache.org/docs/2.2/mod/mod_proxy.html>). If you are running this server proxied by another Web server, you can serve the content and other static files from that server to no ill effect (those files are served by default for convenience).
There are three kinds of storage backend implemented at this time:
- In memory: Store the comments in the server process's memory (no persistent storage). This means that all comments will be lost when the server is restarted. This is useful for testing and as a reference implementation of the storage API.
- Flat file: Store the comments in flat files in the filesystem. The comments and other data are stored in a custom binary format. This backend is known to have race conditions and other non-ACID properties. It is not recommended that you use this store.
- SQLite: Store the comments in a SQLite database. This is the best option for production use at this time. That being said, this is not really a production-ready solution, because database errors (e.g. a SQLITE_BUSY timeout) will result in a 500 error being returned to the client.
The output of doc-review --help will indicate how to specify each backend.
The store API is well defined and easy to implement and test. Patches for data storage improvements are welcome!
In addition to the storage options specified above, there is an experimental binary logging option that will append a binary log record of each store operation to a file in addition to applying it to the store. This was implemented as a backup mechanism should the primary store be corrupted, as replaying the operations from the log should restore the contents of the data store.
Note that this option is not well tested, and may disappear in future releases.
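To make the idea of logging and replaying store operations concrete, here is a minimal sketch. It does not reproduce doc-review's log format or store API: the record encoding (a 4-byte length prefix followed by JSON), the single add_comment operation, the class names, and the choice to write the log record before applying the operation are all assumptions made for illustration.

```python
import json
import struct


class MemoryStore:
    """Toy store: maps a paragraph id to its list of comments."""

    def __init__(self):
        self.comments = {}

    def add_comment(self, para_id, text):
        self.comments.setdefault(para_id, []).append(text)


class LoggingStore:
    """Append a record of each operation to a log file, then apply it to the store."""

    def __init__(self, store, log_path):
        self.store = store
        self.log = open(log_path, "ab")

    def add_comment(self, para_id, text):
        payload = json.dumps(["add_comment", para_id, text]).encode("utf-8")
        self.log.write(struct.pack(">I", len(payload)) + payload)
        self.log.flush()
        self.store.add_comment(para_id, text)

    def close(self):
        self.log.close()


def replay(log_path, store):
    """Rebuild a store by applying every logged operation in order."""
    with open(log_path, "rb") as log:
        while True:
            header = log.read(4)
            if len(header) < 4:
                break  # end of log, or a truncated final record
            (length,) = struct.unpack(">I", header)
            payload = log.read(length)
            if len(payload) < length:
                break  # truncated final record: ignore it
            op, *args = json.loads(payload.decode("utf-8"))
            if op == "add_comment":
                store.add_comment(*args)
    return store
```

Replaying the log into an empty store should yield the same contents as the store that was being logged, which is the property that makes the log usable as a backup.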
This section discusses some implementation details that may be useful for examining the data in the database or implementing your own storage backend. As always, the code is the best reference, but this discussion should help you get started and serve as a rough specification for what the code ought to do when it's not inherently clear.
This server stores a session cookie for each browsing session that is renewed on each request. The session cookie is used to look up the user information to prefill when showing the add comment form. It is also stored in the database so that the author/administrator can see which comments came from which browser. It is a rather imprecise mechanism, and easy to spoof (just send whatever session cookie you want), but it is helpful for the user not to have to re-fill the form fields. The session cookie expires after 11 days without visiting any page on the site.
There is a test suite, which will be built when the parameter
--flags=test is supplied to cabal-install. The test suite only
tests the storage backends. The remainder of the code currently has no
automated tests. The backends are tested using randomized testing for
consistency with each other, as well as for some relatively trivial
properties.
The tests do not test concurrent access to the stores. There is no specification of the behavior of the stores under concurrent access. The SQLite and in-memory stores serialize access to the backend between threads, so concurrency should not be an issue, but the file-based backend may cause data loss under concurrent use. Tests welcome.
To test the stores for consistency, the test suite creates two empty stores of different types and then randomly generates store operations. Each operation is applied to both stores in turn, checking that it returns the same result for both. This does not show that the stores behave correctly, but it does provide evidence that the implementations are consistent with each other.
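The consistency check described above can be sketched in a few lines. The real test suite is written against doc-review's own store API; the two toy stores, their add_comment and get_comments operations, and the function first_disagreement below are hypothetical stand-ins that only illustrate the technique.

```python
import random


class DictStore:
    """Toy store keeping a list of comments per paragraph id."""

    def __init__(self):
        self._by_id = {}

    def add_comment(self, para_id, text):
        self._by_id.setdefault(para_id, []).append(text)
        return len(self._by_id[para_id])

    def get_comments(self, para_id):
        return list(self._by_id.get(para_id, []))


class ListStore:
    """Toy store keeping one flat list of (paragraph id, text) rows."""

    def __init__(self):
        self._rows = []

    def add_comment(self, para_id, text):
        self._rows.append((para_id, text))
        return sum(1 for row_id, _ in self._rows if row_id == para_id)

    def get_comments(self, para_id):
        return [text for row_id, text in self._rows if row_id == para_id]


def random_operation(rng):
    """Generate one store operation as (method name, *arguments)."""
    para_id = rng.choice(["p1", "p2", "p3"])
    if rng.random() < 0.5:
        return ("add_comment", para_id, "comment %d" % rng.randrange(1000))
    return ("get_comments", para_id)


def first_disagreement(store_a, store_b, steps=200, seed=0):
    """Apply the same random operations to both stores; report the first mismatch."""
    rng = random.Random(seed)
    for _ in range(steps):
        name, *args = random_operation(rng)
        result_a = getattr(store_a, name)(*args)
        result_b = getattr(store_b, name)(*args)
        if result_a != result_b:
            return (name, args, result_a, result_b)
    return None  # no difference observed in this run
```

A result of None only means that no difference was observed for that seed and number of steps. As noted above, this is evidence of consistency between the two stores, not proof that either one is correct.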
There are not many tests for correctness, but there are a few tests that perform an operation with a specified effect on the backend and then make observations that the desired effect has occurred. These tests are run with each store in an empty state, and then a sequence of randomized operations that perturb the store's state are performed. The properties are once again checked. This process is repeated. This should provide evidence that the specified properties hold for the store without depending on it being in a particular state.
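The check-perturb-repeat cycle described above might look like the following sketch. The property tested here ("a comment that was just added is the last one returned for its paragraph"), the toy store, and all function names are assumptions chosen for illustration and are not taken from the test suite.

```python
import random


class ToyStore:
    """Toy store keeping a list of comments per paragraph id."""

    def __init__(self):
        self._by_id = {}

    def add_comment(self, para_id, text):
        self._by_id.setdefault(para_id, []).append(text)

    def get_comments(self, para_id):
        return list(self._by_id.get(para_id, []))


def added_comment_is_visible(store, rng):
    """Property: adding a comment appends exactly that comment to its paragraph."""
    para_id = "p%d" % rng.randrange(3)
    text = "probe %d" % rng.randrange(10 ** 6)
    before = store.get_comments(para_id)
    store.add_comment(para_id, text)
    return store.get_comments(para_id) == before + [text]


def perturb(store, rng, steps=20):
    """Move the store into a different state with random operations."""
    for _ in range(steps):
        store.add_comment("p%d" % rng.randrange(3), "noise %d" % rng.randrange(1000))


def property_holds_in_many_states(make_store, rounds=10, seed=0):
    """Check the property on an empty store, then again after each perturbation."""
    rng = random.Random(seed)
    store = make_store()
    for _ in range(rounds):
        if not added_comment_is_visible(store, rng):
            return False
        perturb(store, rng)
    return True
```

Because the property is checked first on an empty store and then after each round of random operations, a failure that only shows up in a populated store has a chance of being caught.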
As usual, there is a whole list of features and changes that I'd like to make to this program. See TODO for this list. If a feature is important to you, or if you have an idea for a new feature, please let me know. The best way is to submit a patch!