My project uses the bulk insert interface to create an embedded Neo4j database in which some nodes have many properties indexed by a legacy full-text index. I have found that upgrading to a Neo4j version that contains #8462 causes our database import process to stall when inserting these nodes with many properties.
I have included demo code that takes many hours to insert a single node.
Neo4j Version: 3.2.0-alpha04 (but any release with #8462 should exhibit this bug)
Operating System: CentOS 7.1
API: Embedded Java API
Looking at jstack, I can see that the program spends a lot of time in Document.getFields:
"main" #1 prio=5 os_prio=0 tid=0x00002b241c00c000 nid=0xa6f runnable [0x00002b2419403000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.document.Document.getFields(Document.java:176)
        at org.neo4j.index.impl.lucene.legacy.IndexType.restoreSortFields(IndexType.java:397)
        at org.neo4j.index.impl.lucene.legacy.IndexType.addToDocument(IndexType.java:231)
        at org.neo4j.index.impl.lucene.legacy.LuceneBatchInserterIndex.addSingleProperty(LuceneBatchInserterIndex.java:126)
        at org.neo4j.index.impl.lucene.legacy.LuceneBatchInserterIndex.add(LuceneBatchInserterIndex.java:96)
        at neo4j_test.neo4j_test.Neo4jImport.main(Neo4jImport.java:43)
Looking through the source code, it looks like:
1. Each property is inserted one at a time into LuceneBatchInserterIndex, which calls restoreSortFields for every insert.
2. restoreSortFields iterates through every field that has already been inserted.
3. For each of those fields, it calls Lucene's getFields method, which again iterates through every already inserted field.
In the end, inserting this one node with n properties causes O(n^3) field visits on the fields array in Lucene's document (n inserts, each scanning up to n fields, each scan triggering another pass over up to n fields), which ends up taking a very long time.
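To make the growth concrete, here is a minimal sketch of the nested iteration described above. It does not use Neo4j or Lucene: the class `CubicInsertSketch` and its methods are hypothetical stand-ins that only borrow the names from the stack trace, and it assumes one field per property and a full linear scan in both loops.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the call pattern; not the actual Neo4j or Lucene code.
class CubicInsertSketch {
    // Stand-in for the document's fields array: a flat list of field names.
    private final List<String> fields = new ArrayList<>();
    // Number of individual field entries visited inside getFields.
    private long visits = 0;

    // Models a by-name lookup that scans the whole fields list.
    private List<String> getFields(String name) {
        List<String> matches = new ArrayList<>();
        for (String field : fields) {
            visits++;
            if (field.equals(name)) {
                matches.add(field);
            }
        }
        return matches;
    }

    // Models the step that walks every existing field and looks each one up by name.
    private void restoreSortFields() {
        for (String field : new ArrayList<>(fields)) {
            getFields(field);
        }
    }

    // Models inserting one property: add a field, then run the restore step.
    private void addSingleProperty(String key) {
        fields.add(key);
        restoreSortFields();
    }

    // Inserts n properties into a fresh document and returns the visit count.
    static long countFieldVisits(int n) {
        CubicInsertSketch doc = new CubicInsertSketch();
        for (int i = 0; i < n; i++) {
            doc.addSingleProperty("prop" + i);
        }
        return doc.visits;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " properties -> " + countFieldVisits(n) + " field visits");
        }
    }
}
```

Under these assumptions the count is 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6, so multiplying the number of properties by 10 multiplies the work by roughly 1000.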
Expected behavior
The properties should be bulk inserted into the index quickly.
Actual behavior
A single large node can take hours to insert, causing database creation to essentially never complete.