DistributedDirectory is a Lucene Directory that writes binary index files into a NoSQL database and works with the standard Lucene IndexWriter and IndexReader. Currently only a MongoDB implementation exists. It is used by LuMongo but can also be used independently through the lumongo-storage Maven artifact.
Files are chunked into blocks and stored in MongoDB. The blocks are cached in memory for quick retrieval. For a given Lucene index there are three collections in MongoDB: a counter collection, a files collection, and a blocks collection. The counter collection serves simply as an auto-increment counter for the file number. The files collection stores the fileNumber, the fileName, the length of the file, and the last modified time of the file as a long. The blocks collection stores the fileNumber, the blockNumber, and the block of bytes for that chunk. Currently the implementation is not safe for writes across multiple machines/JVMs.
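The chunking described above can be illustrated with a small, self-contained sketch. It is not the LuMongo implementation: the class `BlockChunkingSketch`, the nested `Block` type, and the helper methods are hypothetical names chosen for illustration, and the block size is assumed to be 32 * 1024 bytes (the "32k default" mentioned in this document). It only shows how a file's bytes might map to (fileNumber, blockNumber, bytes) records and how a byte position maps to a block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and layout are assumptions, not the library's API.
final class BlockChunkingSketch {

    // Assumed block size: 32k, read here as 32 * 1024 bytes.
    static final int BLOCK_SIZE = 32 * 1024;

    // Hypothetical stand-in for one record in the blocks collection.
    static final class Block {
        final int fileNumber;
        final int blockNumber;
        final byte[] bytes;

        Block(int fileNumber, int blockNumber, byte[] bytes) {
            this.fileNumber = fileNumber;
            this.blockNumber = blockNumber;
            this.bytes = bytes;
        }
    }

    // Splits a file's contents into consecutive blocks; the last block may be shorter.
    static List<Block> chunk(int fileNumber, byte[] fileContents) {
        List<Block> blocks = new ArrayList<>();
        int blockNumber = 0;
        for (int offset = 0; offset < fileContents.length; offset += BLOCK_SIZE) {
            int end = Math.min(offset + BLOCK_SIZE, fileContents.length);
            blocks.add(new Block(fileNumber, blockNumber, Arrays.copyOfRange(fileContents, offset, end)));
            blockNumber++;
        }
        return blocks;
    }

    // Block that contains the given byte position of the file.
    static int blockNumberFor(long position) {
        return (int) (position / BLOCK_SIZE);
    }

    // Offset of the given byte position inside its block.
    static int offsetInBlock(long position) {
        return (int) (position % BLOCK_SIZE);
    }
}
```

Under these assumptions, a 70,000-byte file yields three blocks (two full blocks and one of 4,464 bytes), and a read at position 32,769 lands in block 1 at offset 1. Whether the real implementation pads the final block or stores it short is not stated here.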
See the download page for Maven information.
##Using with an IndexReader

    MongoClient mongo = new MongoClient(hostName);
    Directory d = new DistributedDirectory(new MongoDirectory(mongo, databaseName, "someIndex"));
    IndexReader indexReader = DirectoryReader.open(d);

##Using with an IndexWriter

    MongoClient mongo = new MongoClient(hostName);
    Directory directory = new DistributedDirectory(new MongoDirectory(mongo, databaseName, STORAGE_TEST_INDEX));
    StandardAnalyzer analyzer = new StandardAnalyzer(LuceneConstants.VERSION);
    IndexWriterConfig config = new IndexWriterConfig(LuceneConstants.VERSION, analyzer);
    IndexWriter w = new IndexWriter(directory, config);
    boolean applyDeletes = true;
    // realtime reader off the index writer
    IndexReader ir = DirectoryReader.open(w, applyDeletes);

##Set global MongoDirectory Block Cache (number per JVM)
The in-memory cache size is approximately the block size (32k by default) multiplied by the number of blocks.
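As a rough worked example of that estimate, the sketch below multiplies a block size by a block count. The class and method names are hypothetical and are not part of MongoDirectory; "32k" is assumed to mean 32 * 1024 bytes, and the figure ignores any per-block object overhead in the JVM.

```java
// Illustrative arithmetic only; not part of the lumongo-storage API.
final class BlockCacheSizeEstimate {

    // Assumed default block size: 32k, read here as 32 * 1024 bytes.
    static final long DEFAULT_BLOCK_SIZE_BYTES = 32L * 1024;

    // Approximate cache footprint in bytes: block size times number of cached blocks.
    // Math.multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long approximateCacheBytes(long blockSizeBytes, long maxBlocks) {
        return Math.multiplyExact(blockSizeBytes, maxBlocks);
    }
}
```

With these assumptions, allowing 10,000 cached blocks at the default block size comes to 327,680,000 bytes, or about 312.5 MiB, shared by all MongoDirectory instances in the JVM.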
#Convert to normal file system directory

    MongoClient mongo = new MongoClient(hostName);
    DistributedDirectory d = new DistributedDirectory(new MongoDirectory(mongo, databaseName, STORAGE_TEST_INDEX));
    d.copyToFSDirectory(new File("/tmp/fsdirectory"));
    d.close();

#Similar Projects
The approach to storing binary index files was heavily influenced by lucene-on-cassandra.