For general notes on batch insertion, see [batchinsert].
Indexing during batch insertion is done using BatchInserterIndex instances, which are provided by a BatchInserterIndexProvider. An example:
component=neo4j-lucene-index-docs source=examples/ImdbDocTest.java tag=batchInsert
The configuration parameters are the same as those described in [indexing-create-advanced].
Here are some pointers to get the most performance out of BatchInserterIndex:
- Avoid flushing too often: each flush makes all additions since the last flush visible to the querying methods, and publishing those changes carries a performance penalty.
- Work in phases that are as long as possible, where each phase consists of only writes or only reads, and don’t forget to flush after a write phase so that those changes become visible to the querying methods.
- Enable caching for keys that you will later look up. This can significantly increase lookup performance (though insertion performance may degrade slightly).
Note: Changes to the index become available for reading only after they are flushed to disk. Thus, for optimal performance, read and lookup operations should be kept to a minimum during batch insertion, since they involve IO and slow insertion down.
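The flush-visibility rule and the write-phase/read-phase advice can be illustrated with a small, self-contained model. This is only a sketch: `ToyBatchIndex` and its methods are hypothetical stand-ins written for this illustration, not the Neo4j API. Its `flush()` plays the role that flushing plays for a BatchInserterIndex, and an in-memory map stands in for the on-disk index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a batch index; NOT the Neo4j API. */
class ToyBatchIndex {
    // Entries that lookups can see (stands in for the flushed, on-disk index).
    private final Map<String, List<Long>> visible = new HashMap<>();
    // Entries added since the last flush; invisible to lookups.
    private final Map<String, List<Long>> pending = new HashMap<>();
    private int flushCount = 0;

    /** Buffers an entry; it is not visible to get() until flush() is called. */
    void add(long id, String key, Object value) {
        pending.computeIfAbsent(key + "=" + value, k -> new ArrayList<>()).add(id);
    }

    /** Publishes all buffered entries. In a real index this step costs IO. */
    void flush() {
        for (Map.Entry<String, List<Long>> e : pending.entrySet()) {
            visible.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        pending.clear();
        flushCount++;
    }

    /** Looks up ids for an exact key/value pair among flushed entries only. */
    List<Long> get(String key, Object value) {
        return visible.getOrDefault(key + "=" + value, new ArrayList<>());
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        ToyBatchIndex actors = new ToyBatchIndex();

        // Write phase: only additions, no lookups, no intermediate flushes.
        String[] names = { "Keanu Reeves", "Carrie-Anne Moss", "Laurence Fishburne" };
        for (int i = 0; i < names.length; i++) {
            actors.add(i + 1, "name", names[i]);
        }

        // A lookup before the flush finds nothing.
        System.out.println("before flush: " + actors.get("name", "Keanu Reeves"));

        // One flush at the end of the write phase publishes everything at once.
        actors.flush();

        // Read phase: only lookups.
        System.out.println("after flush:  " + actors.get("name", "Keanu Reeves"));
        System.out.println("flushes used: " + actors.flushCount());
    }
}
```

In this model, three additions followed by a single flush pay the publishing cost once. Flushing after every addition would pay it three times for the same final state, which is the cost the pointers above advise against.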