Distributed Directory

Matt Davis edited this page Sep 13, 2015 · 4 revisions

DistributedDirectory is a Lucene Directory that writes binary index files to a NoSQL database and works with the standard Lucene IndexWriter and IndexReader. Currently only a MongoDB implementation exists. It is used by LuMongo but can be used independently through the lumongo-storage Maven artifact.

MongoDB Implementation

Files are chunked into blocks and stored in MongoDB. The blocks are cached in memory for quick retrieval. For a given Lucene index there are three collections in MongoDB: a counter collection, a files collection, and a blocks collection. The counter collection serves simply as an auto-increment counter for the file number. The files collection stores the fileNumber, the fileName, the length of the file, and the last modified time of the file as a long. The blocks collection stores the fileNumber, the blockNumber, and the bytes for that block. Currently the implementation is not safe for writes across multiple machines or JVMs.
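The chunking described above can be illustrated with a small, self-contained sketch. This is not the LuMongo implementation: the class and method names are hypothetical, the block size is assumed to be the 32k default mentioned on this page, and the sketch assumes the final block of a file may be shorter than a full block. It only shows how a byte position in a file could map to a blockNumber and an offset within that block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fixed-size block chunking; not LuMongo code.
public class BlockLayoutSketch {
    // Assumed block size: the 32k default mentioned on this page.
    static final int BLOCK_SIZE = 32 * 1024;

    // Index of the block that holds the given byte position of a file.
    static int blockNumber(long position) {
        return (int) (position / BLOCK_SIZE);
    }

    // Offset of the given byte position inside its block.
    static int blockOffset(long position) {
        return (int) (position % BLOCK_SIZE);
    }

    // Number of blocks needed to store a file of the given length.
    static int blockCount(long fileLength) {
        return (int) ((fileLength + BLOCK_SIZE - 1) / BLOCK_SIZE);
    }

    // Split file content into fixed-size blocks; the last block may be shorter (assumption).
    static List<byte[]> chunk(byte[] content) {
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < content.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, content.length);
            blocks.add(Arrays.copyOfRange(content, start, end));
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] content = new byte[70_000];
        List<byte[]> blocks = chunk(content);
        System.out.println(blocks.size() + " blocks for " + content.length + " bytes");
        System.out.println("position 40000 is in block " + blockNumber(40_000)
                + " at offset " + blockOffset(40_000));
    }
}
```

Under these assumptions, reading a byte range only requires fetching the blocks whose numbers fall between blockNumber(start) and blockNumber(end), which is what makes an in-memory block cache effective.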

Download

See the download page for Maven information.

Usage

Using with an IndexReader

MongoClient mongo = new MongoClient(hostName);
Directory d = new DistributedDirectory(new MongoDirectory(mongo, databaseName, "someIndex"));
IndexReader indexReader = DirectoryReader.open(d);

Using with an IndexWriter

MongoClient mongo = new MongoClient(hostName);
Directory directory = new DistributedDirectory(new MongoDirectory(mongo, databaseName, STORAGE_TEST_INDEX));

StandardAnalyzer analyzer = new StandardAnalyzer(LuceneConstants.VERSION);
IndexWriterConfig config = new IndexWriterConfig(LuceneConstants.VERSION, analyzer);
IndexWriter w = new IndexWriter(directory, config);

boolean applyDeletes = true;

// real-time reader off the index writer
IndexReader ir = DirectoryReader.open(w, applyDeletes);

Set global MongoDirectory Block Cache (number of blocks per JVM)

MongoDirectory.setMaxIndexBlocks(numberOfBlocks);

The in-memory cache size is approximately the block size (32k by default) multiplied by the number of blocks.
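As a rough worked example of the estimate above, the following sketch computes the approximate cache footprint for a given number of blocks, and the number of blocks that fits in a memory budget. The class and method names are hypothetical, the block size is assumed to be the 32k default, and per-object overhead in the JVM is ignored.

```java
// Hypothetical helper for sizing the block cache; not part of LuMongo.
public class BlockCacheEstimate {
    // Assumed block size: the 32k default mentioned on this page.
    static final long DEFAULT_BLOCK_SIZE_BYTES = 32L * 1024;

    // Approximate memory used by the cache, ignoring per-object overhead.
    static long approximateCacheBytes(long blockSizeBytes, long numberOfBlocks) {
        return blockSizeBytes * numberOfBlocks;
    }

    // Largest number of blocks whose cache fits within the given memory budget.
    static long blocksForBudget(long budgetBytes, long blockSizeBytes) {
        return budgetBytes / blockSizeBytes;
    }

    public static void main(String[] args) {
        long blocks = 16_384;
        long bytes = approximateCacheBytes(DEFAULT_BLOCK_SIZE_BYTES, blocks);
        System.out.println(blocks + " blocks is roughly " + bytes / (1024 * 1024) + " MiB");

        long budget = 256L * 1024 * 1024; // 256 MiB budget, chosen for illustration
        System.out.println("256 MiB holds about "
                + blocksForBudget(budget, DEFAULT_BLOCK_SIZE_BYTES) + " blocks");
    }
}
```

A value computed this way could then be passed to MongoDirectory.setMaxIndexBlocks, keeping in mind that the setting is global to the JVM rather than per index.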

Convert to normal file system directory

MongoClient mongo = new MongoClient(hostName);
DistributedDirectory d = new DistributedDirectory(new MongoDirectory(mongo, databaseName, STORAGE_TEST_INDEX));

d.copyToFSDirectory(new File("/tmp/fsdirectory"));

d.close();

Similar Projects

The storage of binary index files was heavily influenced by lucene-on-cassandra.
