CouchDB-Lucene

This is a preliminary implementation of a Lucene indexer for CouchDB.

Dependencies

  1. Java - probably 1.5.x (exact minimum version not verified)
  2. CouchDB - trunk
  3. Git - version not pinned; needed to clone the repository

Installation

This assumes you have a working Java runtime environment and a working Ant installation.

$ git clone git://github.com/davisp/couchdb-lucene.git couchdb-lucene
$ cd couchdb-lucene
$ ant

Configuration

Assuming the build worked, edit CouchDB's local.ini config file to set up the external process:

[external]
fti = /path/to/java -jar /path/to/couchdb-lucene/build/couchdb-lucene-0.1-dev.jar

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}

A few things to note:

  1. Replace /path/to/java and /path/to/couchdb-lucene with paths appropriate to your system.
  2. The <<"fti">> specification in the [httpd_db_handlers] section must match the entry name in the [external] section.
  3. By default, index changes are committed only every 500 updates or every 60 seconds, whichever comes first.
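Note 2 is easy to get wrong, so here is a minimal sketch of a check for it. It is not part of couchdb-lucene: the function name `unmatched_handlers` is made up for this example, and it assumes your local.ini is simple enough for Python's `configparser` to read.

```python
import configparser
import re

# Sample config copied from the section above.
LOCAL_INI = '''
[external]
fti = /path/to/java -jar /path/to/couchdb-lucene/build/couchdb-lucene-0.1-dev.jar

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
'''


def unmatched_handlers(ini_text):
    """Return handler names whose <<"name">> has no matching [external] entry."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    externals = set()
    if parser.has_section("external"):
        externals = set(parser["external"])

    missing = []
    if parser.has_section("httpd_db_handlers"):
        for handler, spec in parser["httpd_db_handlers"].items():
            # Pull the name out of the Erlang binary literal <<"...">>.
            match = re.search(r'<<"([^"]+)">>', spec)
            if match and match.group(1) not in externals:
                missing.append(handler)
    return missing


print(unmatched_handlers(LOCAL_INI))
```

An empty list means every handler points at an existing [external] entry.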

Config options can be added to the command line in the [external] section. These are just a few of the possible settings. For a full list see org.apache.couchdb.lucene.Config.

-Xcouchdb.index.bulk=500        # Number of documents to request at once
-Xcouchdb.index.rambuf=128      # RAM buffer size in MiB
-Xcouchdb.commit.interval=60    # Commit changes at least every N seconds
-Xcouchdb.commit.updates=500    # Commit changes at least every N updates (node-wide)
-Xcouchdb.debug.enabled="true"  # Enable DEBUG mode, which adds a few query string parameters
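To make the "whichever comes first" commit rule concrete, here is a rough sketch of the decision in Python. This is an illustration only, not the indexer's actual code: the function `should_commit` and its parameters are hypothetical, with defaults mirroring the commit.updates and commit.interval values above.

```python
def should_commit(pending_updates, seconds_since_commit,
                  max_updates=500, max_interval=60):
    """Return True once either threshold is reached, whichever comes first."""
    if pending_updates <= 0:
        # Assumption made for this sketch: with nothing pending, skip the commit.
        return False
    return (pending_updates >= max_updates
            or seconds_since_commit >= max_interval)


print(should_commit(500, 5))   # update threshold reached
print(should_commit(10, 60))   # time threshold reached
print(should_commit(10, 59))   # neither threshold reached yet
```

Lowering either value makes new documents searchable sooner, at the cost of more frequent commits.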

Indexing

The basic idea for indexing is that you create a _design/lucene document that in turn specifies a set of views to be indexed by couchdb-lucene. You can specify as many views for indexing as you like.

Example _design document:

{
    "_id": "_design/lucene",
    "_rev": "232924",
    "views": {
        "foo": {
            "map": "function(doc) {if(doc.foo) emit(doc._id, doc.foo);}"
        },
        "bar": {
            "map": "function(doc) {if(doc.bar) emit(doc._id, doc.bar);}"
        }
    }
}

IMPORTANT: You must call emit(doc._id, value_to_index). If the key you emit is not the docid, nothing will get indexed.
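If you index several fields, it may be easier to generate the design document than to write it by hand. The sketch below is one way to do that in Python. The helper `lucene_design_doc` is hypothetical, and it assumes each view indexes one top-level field using the same map function shape as the example above.

```python
import json

# Same map function shape as the example above; the key is always doc._id.
MAP_TEMPLATE = "function(doc) {if(doc.%s) emit(doc._id, doc.%s);}"


def lucene_design_doc(fields):
    """Build a _design/lucene document with one view per field name."""
    views = {field: {"map": MAP_TEMPLATE % (field, field)} for field in fields}
    return {"_id": "_design/lucene", "views": views}


doc = lucene_design_doc(["foo", "bar"])
print(json.dumps(doc, indent=4))
```

The _rev field is left out, which is right for a brand-new document. To update an existing design document, you would need to include its current _rev.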

Querying

Parameters:

  1. q: A query string, processed by Lucene's QueryParser.
  2. limit: Limit the number of results returned.
  3. skip: Skip over the first N results.

Debug Parameters:

  1. debug=true: Wait until indexing catches up with the database update_seq that was passed with this request. If -Xcouchdb.debug.enabled="true" is specified on the command line, this defaults to true. If debug mode is not enabled, this parameter has no effect.
  2. destroy=true: Remove the entire database from the Lucene index. Only available when debug mode is enabled.

Example URLs:

  • http://127.0.0.1:5984/db_name/_fti?q=foo:query
  • http://127.0.0.1:5984/db_name/_fti?q=foo:plankton+bar:goat&limit=1&skip=1
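URLs like these can be assembled with Python's standard library, which takes care of the escaping. This is only a sketch: the helper `fti_url` is made up for this example, and the default host and port are simply the ones used in the URLs above.

```python
from urllib.parse import urlencode


def fti_url(db_name, query, limit=None, skip=None,
            base="http://127.0.0.1:5984"):
    """Build a _fti query URL from the q, limit and skip parameters."""
    params = {"q": query}
    if limit is not None:
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    # safe=":" keeps the field:term separator readable in the URL.
    return "%s/%s/_fti?%s" % (base, db_name, urlencode(params, safe=":"))


print(fti_url("db_name", "foo:plankton bar:goat", limit=1, skip=1))
# prints http://127.0.0.1:5984/db_name/_fti?q=foo:plankton+bar:goat&limit=1&skip=1
```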

Example Results:

{
    "total_rows":2,
    "offset": 0,
    "rows": [
        {"id":"test","score":0.26010897755622864},
        {"id":"test2","score":0.2229505479335785}
    ]
}
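A client would typically turn this response into a list of document IDs ordered by relevance. Here is a small Python sketch that parses the example above. The function name `top_hits` is made up, and the code sorts by score itself instead of assuming the server returns rows in any particular order.

```python
import json

# The example response from above.
RESPONSE = '''
{
    "total_rows":2,
    "offset": 0,
    "rows": [
        {"id":"test","score":0.26010897755622864},
        {"id":"test2","score":0.2229505479335785}
    ]
}
'''


def top_hits(body):
    """Return (docid, score) pairs from a _fti response, best score first."""
    result = json.loads(body)
    rows = sorted(result["rows"], key=lambda row: row["score"], reverse=True)
    return [(row["id"], row["score"]) for row in rows]


for docid, score in top_hits(RESPONSE):
    print(docid, score)
```

Each id can then be fetched from CouchDB as a normal document.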

Feedback

I'm looking for feedback on this whole Lucene business. Please use it and report back with any errors, tracebacks, or other unexpected behavior.