A simple Redis storage engine for Lucene
This repo is a simple implementation that stores Lucene's index files in Redis. I started the project with the aim of making it usable in production. It is a complete, concise implementation, and you can switch between Jedis client types (Jedis, JedisPool, ShardedJedisPool, JedisCluster) without modifying the code. It supports index file slicing, compression of index file contents, a mutex lock, and a Redis file cache. The Redis file cache helps improve the performance of writing index files. The lock is implemented with a Java NIO file lock, which is released when the JVM exits abnormally. A singleton lock cannot provide mutual exclusion across multiple processes. Storing a flag in Redis as the lock has a different problem: the flag remains in Redis when the JVM exits abnormally, so the next run cannot obtain the lock until you delete the flag from Redis manually.
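To illustrate why a file lock behaves differently from the other two options, here is a minimal sketch of a non-blocking Java NIO file lock. The class name FileLockSketch and the temporary lock file are assumptions made for this example; they are not the lock classes shipped in this repo. The operating system holds the lock on behalf of the whole JVM process, so the lock disappears when the process dies and no flag is left behind to clean up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not the lock class shipped in this repo.
class FileLockSketch {

    /** Tries to take an exclusive lock without blocking; returns null if it is unavailable. */
    static FileLock tryAcquire(FileChannel channel) {
        try {
            return channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null; // this JVM already holds a lock on the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Acquires, re-attempts, releases, and re-acquires; returns the outcome of each attempt. */
    static boolean[] demo() {
        try {
            Path lockFile = Files.createTempFile("redis-directory", ".lock");
            try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                FileLock first = tryAcquire(channel);
                FileLock second = tryAcquire(channel); // attempted while the first lock is held
                if (first != null) {
                    first.release();
                }
                FileLock third = tryAcquire(channel); // attempted after the release
                if (third != null) {
                    third.release();
                }
                return new boolean[] {first != null, second != null, third != null};
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("first attempt acquired:  " + results[0]);
        System.out.println("second attempt acquired: " + results[1]);
        System.out.println("after release acquired:  " + results[2]);
    }
}
```

A second JVM running the same code against the same lock file would get null from tryLock() for as long as the first process holds the lock.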
- JDK 1.8+
- Lucene 5.5.0+
- Jedis 2.9.0+
- Lombok 1.16.12+
- Log4j 2.6.2+
- Guava 20.0+
- Snappy-java 18.104.22.168+
- Clone the repo: git clone git@github.com:shijiebei2009/RedisDirectory.git RedisDirectory
- cd RedisDirectory
- Use Maven or Gradle commands to build the project
- Supports pool
- Supports sharding
- Supports cluster (not tested)
- Supports Maven or Gradle Compile
- Supports storage level distribution
Make sure RedisDirectory.jar is on your classpath (Gradle or Maven can help with this), then use it as shown in the examples that follow. If Redis reports "MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.", you can change the corresponding setting in redis.windows.conf.
With JedisPool:

    // Create (or overwrite) an index stored in Redis through a connection pool
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new WhitespaceAnalyzer())
            .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    JedisPool jedisPool = new JedisPool(new JedisPoolConfig(), "localhost", 6379);
    RedisDirectory redisDirectory = new RedisDirectory(new JedisPoolStream(jedisPool));
    IndexWriter indexWriter = new IndexWriter(redisDirectory, indexWriterConfig);
    indexWriter.addDocument(...);
    indexWriter.close();
    redisDirectory.close();
With a single Jedis connection:

    // Create (or overwrite) an index stored in Redis through one direct connection
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new WhitespaceAnalyzer())
            .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    RedisDirectory redisDirectory = new RedisDirectory(new JedisStream("localhost", 6379));
    IndexWriter indexWriter = new IndexWriter(redisDirectory, indexWriterConfig);
    indexWriter.addDocument(...);
    indexWriter.close();
    redisDirectory.close();
With ShardedJedisPool:

    // Create (or overwrite) an index distributed over the listed shards
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new WhitespaceAnalyzer())
            .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    List<JedisShardInfo> shards = new ArrayList<>();
    JedisShardInfo si = new JedisShardInfo("localhost", 6379);
    shards.add(si);
    JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
    ShardedJedisPool shardedJedisPool = new ShardedJedisPool(jedisPoolConfig, shards);
    RedisDirectory redisDirectory = new RedisDirectory(new ShardedJedisPoolStream(shardedJedisPool));
    IndexWriter indexWriter = new IndexWriter(redisDirectory, indexWriterConfig);
    indexWriter.addDocument(...);
    indexWriter.close();
    redisDirectory.close();
Each file is divided into blocks and stored in binary format as a Redis HASH, so blocks can be loaded on demand. You can customise the block size by modifying DEFAULT_BUFFER_SIZE in the config file. Remember that this is a one-time initialization: once an index has been created with a particular block size, the size cannot be changed. A larger block size causes less fragmentation.
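The reason the block size is fixed after index creation can be seen in a small sketch of the position arithmetic. This is an illustration under assumptions, not the repo's code: the class BlockSlicer, its methods, and the tiny block size of 8 bytes are all hypothetical. A reader locates a byte by dividing its file position by the block size, so reading with a different size than the one used for writing lands on the wrong block.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fixed-size block slicing; not the repo's implementation.
class BlockSlicer {
    // Stand-in for the DEFAULT_BUFFER_SIZE setting; the value is illustrative only.
    static final int BLOCK_SIZE = 8;

    /** Splits file contents into fixed-size blocks; the last block may be shorter. */
    static List<byte[]> slice(byte[] content, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < content.length; start += blockSize) {
            int end = Math.min(start + blockSize, content.length);
            blocks.add(Arrays.copyOfRange(content, start, end));
        }
        return blocks;
    }

    /** Reads one byte at an absolute file position, touching only the block that holds it. */
    static byte readByte(List<byte[]> blocks, long position, int blockSize) {
        int blockNumber = (int) (position / blockSize);
        int offset = (int) (position % blockSize);
        return blocks.get(blockNumber)[offset];
    }

    public static void main(String[] args) {
        byte[] content = "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> blocks = slice(content, BLOCK_SIZE);
        System.out.println("blocks: " + blocks.size());
        // Same block size as at write time: position 10 maps to block 1, offset 2.
        System.out.println("size 8: " + (char) readByte(blocks, 10, BLOCK_SIZE));
        // Different block size: position 10 maps to block 2, offset 2, the wrong byte.
        System.out.println("size 4: " + (char) readByte(blocks, 10, 4));
    }
}
```

In this sketch, only the block containing the requested position is touched, which is what makes on-demand loading possible.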
The index files are stored in Redis as follows:
directory metadata (user definition) => index file name => index file length
file metadata (user definition) => @index file name:block number => the block values
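The two-hash layout above can be sketched with nested in-memory maps standing in for Redis. Everything here is an assumption made for illustration: the class RedisLayoutSketch, the key names "directory" and "file", block numbers starting at 0, and the file length encoded as an 8-byte long. The repo may encode these details differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two Redis hashes; all names here are hypothetical.
class RedisLayoutSketch {
    // Redis key -> (hash field -> binary value)
    final Map<String, Map<String, byte[]>> redis = new HashMap<>();
    final String directoryKey; // user-defined key of the directory metadata hash
    final String fileKey;      // user-defined key of the file data hash
    final int blockSize;

    RedisLayoutSketch(String directoryKey, String fileKey, int blockSize) {
        this.directoryKey = directoryKey;
        this.fileKey = fileKey;
        this.blockSize = blockSize;
    }

    // Plays the role of the Redis HSET command.
    void hset(String key, String field, byte[] value) {
        redis.computeIfAbsent(key, k -> new TreeMap<>()).put(field, value);
    }

    // Plays the role of the Redis HGET command.
    byte[] hget(String key, String field) {
        Map<String, byte[]> hash = redis.get(key);
        return hash == null ? null : hash.get(field);
    }

    static String blockField(String fileName, int blockNumber) {
        return "@" + fileName + ":" + blockNumber;
    }

    void writeFile(String fileName, byte[] content) {
        int blockNumber = 0;
        for (int start = 0; start < content.length; start += blockSize) {
            int end = Math.min(start + blockSize, content.length);
            hset(fileKey, blockField(fileName, blockNumber++), Arrays.copyOfRange(content, start, end));
        }
        // Assumption: the length is stored as an 8-byte big-endian long.
        hset(directoryKey, fileName, ByteBuffer.allocate(Long.BYTES).putLong(content.length).array());
    }

    byte[] readFile(String fileName) {
        int length = (int) ByteBuffer.wrap(hget(directoryKey, fileName)).getLong();
        byte[] content = new byte[length];
        int blockNumber = 0;
        for (int start = 0; start < length; start += blockSize) {
            byte[] block = hget(fileKey, blockField(fileName, blockNumber++));
            System.arraycopy(block, 0, content, start, block.length);
        }
        return content;
    }

    public static void main(String[] args) {
        RedisLayoutSketch store = new RedisLayoutSketch("directory", "file", 4);
        store.writeFile("segments_1", "hello lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.redis.get("directory").keySet()); // [segments_1]
        System.out.println(store.redis.get("file").keySet()); // [@segments_1:0, @segments_1:1, @segments_1:2]
        System.out.println(new String(store.readFile("segments_1"), StandardCharsets.UTF_8));
    }
}
```

Keeping the length in a separate hash means a directory listing only needs the metadata hash and never has to touch the block data.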
I've just started. Still to do:
- Include support for Snappy compression to compress file blocks.
- Rock-solid JUnit test cases for each class.
- Enable atomic operations on RedisFile; this will allow multiple connections to manipulate a single file.
- Redundancy support: maintain multiple copies of a file (or its blocks).
Simple Performance Test (Windows 7, i7 4790 CPU, 8GB, Redis-x64-3.2)
I ran the tests on my computer against the Windows build of Redis developed by MSOpenTech. From the command line, I ran the RedisDirectory jar file with arguments like java -Xms1024m -Xmx5120m -jar RedisDirectory-0.0.1.jar, and the performance test results are as below. When Redis is the storage engine, I run flushall in Redis before the program starts and, after the program finishes, read the index size from the output of the info command in the Redis command line.
| Type | Documents | Fields | Write Time | Search Time (10 million) | Index Size |
| --- | --- | --- | --- | --- | --- |
| RedisDirectory (JedisPool) | 10 million | 15 | 423s | 632s | used_memory_human:2.67G |
| RedisDirectory (Jedis) | 10 million | 15 | 452s | 536s | used_memory_human:2.67G |
| RedisDirectory (ShardedJedisPool) | 10 million | 15 | 477s | 790s | used_memory_human:2.67G |
The index files were not compressed in the tests above. You can choose whether to compress index files by modifying COMPRESS_FILE=false in the config file. Under normal circumstances, in a local machine test, writing compressed files is slower than writing uncompressed files. In the 10 million document test, writing the compressed index took about 680s.
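The trade-off behind this setting can be sketched as a compress-before-store step applied to each block. Note the swap: this example uses java.util.zip.Deflater from the JDK as a stand-in so that it runs without extra dependencies, whereas the repo lists Snappy-java as its compression library. The class BlockCompressionSketch and the encode method are hypothetical names, and the compressFile parameter only mirrors the role of COMPRESS_FILE.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; Deflater stands in for Snappy to keep the example dependency-free.
class BlockCompressionSketch {

    /** Returns the bytes that would be stored for a block, depending on the toggle. */
    static byte[] encode(byte[] block, boolean compressFile) {
        return compressFile ? compress(block) : block;
    }

    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid compressed block");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] block = new byte[8192];
        Arrays.fill(block, (byte) 'x'); // highly repetitive data compresses well
        byte[] stored = encode(block, true);
        System.out.println("raw bytes:    " + block.length);
        System.out.println("stored bytes: " + stored.length);
        System.out.println("round trip ok: " + Arrays.equals(block, decompress(stored)));
    }
}
```

Every block pays the compression cost on write and the decompression cost on read, which is consistent with the slower write time reported for the compressed run; the benefit is less memory used in Redis.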
Copyright 2016 Xu Wang
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.