Index Batch Insertion

For general notes on batch insertion, see [batchinsert].

Indexing during batch insertion is done with BatchInserterIndex instances, which are obtained from a BatchInserterIndexProvider. An example:

(Example source: examples/ImdbDocTest.java, tag batchInsert, in the neo4j-lucene-index-docs component.)

The configuration parameters are the same as those described in [indexing-create-advanced].

Best practices

Here are some pointers for getting the best performance out of BatchInserterIndex:

Avoid flushing too often: each flush makes all additions since the last flush visible to the querying methods, and publishing those changes carries a performance penalty.
Organize the work into phases that are as large as possible, where each phase consists of only writes or only reads. Remember to flush after a write phase so that its changes become visible to the querying methods.
Enable caching for keys that you will later use for lookups. This can significantly increase lookup performance, though insertion performance may degrade slightly.
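To make the first two pointers concrete, here is a minimal sketch that compares interleaving writes and reads with a single write phase followed by a single read phase. It does not use Neo4j at all: ToyBatchIndex, interleaved() and phased() are hypothetical names, and the toy index only models the rule that additions become visible after a flush, counting each flush as one publishing step. It also simplifies entries to a single value per node, whereas the real index stores key/value pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhasedBatchDemo {

    // Hypothetical toy index, NOT Neo4j's BatchInserterIndex. It models only
    // two rules: additions are invisible to get() until flush(), and every
    // flush() is a publishing step, which we simply count.
    static class ToyBatchIndex {
        private final Map<String, List<Long>> published = new HashMap<>();
        private final Map<String, List<Long>> pending = new HashMap<>();
        int flushCount = 0;

        void add(long nodeId, String value) {
            pending.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeId);
        }

        void flush() {
            // Publish everything added since the last flush.
            pending.forEach((value, ids) ->
                    published.computeIfAbsent(value, k -> new ArrayList<>()).addAll(ids));
            pending.clear();
            flushCount++;
        }

        List<Long> get(String value) {
            return published.getOrDefault(value, List.of());
        }
    }

    // Anti-pattern: a read after every write forces a flush after every write.
    static int interleaved(int n) {
        ToyBatchIndex index = new ToyBatchIndex();
        int found = 0;
        for (long id = 0; id < n; id++) {
            index.add(id, "name-" + id);
            index.flush(); // without this, the get() below would miss the addition
            found += index.get("name-" + id).size();
        }
        if (found != n) throw new IllegalStateException("lookups missed entries");
        return index.flushCount;
    }

    // Recommended: one write phase, one flush, then one read phase.
    static int phased(int n) {
        ToyBatchIndex index = new ToyBatchIndex();
        for (long id = 0; id < n; id++) {
            index.add(id, "name-" + id);
        }
        index.flush(); // make the whole write phase visible at once
        int found = 0;
        for (long id = 0; id < n; id++) {
            found += index.get("name-" + id).size();
        }
        if (found != n) throw new IllegalStateException("lookups missed entries");
        return index.flushCount;
    }

    public static void main(String[] args) {
        System.out.println("interleaved flushes: " + interleaved(1000)); // 1000
        System.out.println("phased flushes: " + phased(1000));           // 1
    }
}
```

Both variants find every entry, but the phased one publishes once instead of once per node, which is the cost the first pointer warns about.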

Note: Changes to the index are only available for reading after they have been flushed to disk. Thus, for optimal performance, keep read and lookup operations to a minimum during batch insertion, since they involve IO and slow the insertion down.
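The visibility rule in the note above can be illustrated with a small in-memory sketch. As with any toy model, this is an assumption-laden stand-in rather than Neo4j code: ToyBatchIndex and lookupAfterAdd are hypothetical names, the "disk" is just a second map, and only the behavior "a lookup sees an addition only after flush()" is modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlushVisibilityDemo {

    // Hypothetical toy index, NOT Neo4j's BatchInserterIndex: add() only
    // buffers an entry, and get() reads only what flush() has published.
    static class ToyBatchIndex {
        private final Map<String, List<Long>> published = new HashMap<>();
        private final Map<String, List<Long>> pending = new HashMap<>();

        void add(long nodeId, String value) {
            pending.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeId);
        }

        void flush() {
            pending.forEach((value, ids) ->
                    published.computeIfAbsent(value, k -> new ArrayList<>()).addAll(ids));
            pending.clear();
        }

        List<Long> get(String value) {
            return published.getOrDefault(value, List.of());
        }
    }

    // Adds one entry, optionally flushes, then looks the entry up.
    static List<Long> lookupAfterAdd(boolean flushFirst) {
        ToyBatchIndex actors = new ToyBatchIndex();
        actors.add(1L, "Keanu Reeves");
        if (flushFirst) {
            actors.flush();
        }
        return actors.get("Keanu Reeves");
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + lookupAfterAdd(false)); // []
        System.out.println("with flush: " + lookupAfterAdd(true));     // [1]
    }
}
```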