
open file descriptors leak (somehow related to lucene schema-index) #1799

Closed
tbaum opened this issue Jan 8, 2014 · 18 comments

@tbaum
Contributor

tbaum commented Jan 8, 2014

During load (a mixture of Cypher and Java API calls) the number of open file handles increases drastically and eventually stops the process (limit: 10k).

Running the same code on a 64-bit Linux box works; the open file handle count does not exceed 1k.

  • Neo4j 2.0.0, embedded
  • Mac OS X 10.9.1
  • JDK 1.7 or JDK 1.8

The application creates a database '/tmp/db3'.

# lsof -p *jvm-pid* | grep /tmp/db3 | tr -s ' ' ' ' | cut -d' ' -f9 | cut -c 1-37 | sort | uniq -c | sort -r
6519 /private/tmp/db3/schema/index/lucene/
  43 /private/tmp/db3/schema/label/lucene/
  12 /private/tmp/db3/neostore.propertysto
  12 /private/tmp/db3/index/lucene/node/fu
   7 /private/tmp/db3/neostore.relationshi
   5 /private/tmp/db3/neostore.nodestore.d
   4 /private/tmp/db3/neostore.labeltokens
   2 /private/tmp/db3/neostore.schemastore
   1 /private/tmp/db3/tm_tx_log.2
   1 /private/tmp/db3/store_lock
   1 /private/tmp/db3/nioneo_logical.log.2
   1 /private/tmp/db3/neostore.id
   1 /private/tmp/db3/neostore
   1 /private/tmp/db3/messages.log
   1 /private/tmp/db3/lock
   1 /private/tmp/db3/index/lucene.log.2
   1 /private/tmp/db3/index/lucene-store.d

messages.log: https://gist.github.com/tbaum/e746a3cf83ba4ddbc806
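
The same number that lsof reports can also be watched from inside the JVM while the load runs. Here is a minimal sketch, assuming a HotSpot/OpenJDK JVM whose OperatingSystemMXBean implements com.sun.management.UnixOperatingSystemMXBean (the class name OpenFileCount is made up for illustration):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    import com.sun.management.UnixOperatingSystemMXBean;

    public class OpenFileCount
    {
        public static void main( String[] args )
        {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if ( os instanceof UnixOperatingSystemMXBean )
            {
                UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
                // Roughly the figure that `lsof -p <jvm-pid> | wc -l` shows for this process.
                System.out.println( "open files: " + unixOs.getOpenFileDescriptorCount()
                        + " / limit: " + unixOs.getMaxFileDescriptorCount() );
            }
        }
    }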

@tbaum
Contributor Author

tbaum commented Jan 8, 2014

#1437 might refer to the same issue

@apcj
Contributor

apcj commented Jan 22, 2014

Hi Thomas, can you give us a hand narrowing this down? Is it reproducible? If so, can you create a smallest-possible set of operations that causes the problem?
Also, if you suspect schema indexes, can you share the output of the :schema command and some node counts for each indexed label?

@tbaum
Contributor Author

tbaum commented Feb 21, 2014

Hi Alistair,
I tried to reproduce this issue, but couldn't make it fail again, sorry.
Maybe it was caused by a JDK bug.

  • Closing this issue

@tbaum tbaum closed this as completed Feb 21, 2014
@tbaum tbaum reopened this Mar 17, 2014
@tbaum
Contributor Author

tbaum commented Mar 17, 2014

Reopening: I found, by accident, an easy way to reproduce it:
https://github.com/tbaum/neo4j-fd-leak
It should fail after a few seconds.

@apcj can you verify this problem on your machine?

@bytetwister

@tbaum I too am experiencing the same problem and can reproduce it using the neo4j-fd-leak test case you created.

I am running embedded Neo4j on Mac OS X 10.9.2 with the latest JDK 7 and Neo4j 2.0.2 (I tried 2.0.1 as well and experienced the same problem). With Neo4j 1.9.* I did not encounter this issue; I only discovered it when attempting to upgrade to 2.*.

lsof shows the problem as noted in the first comment.

Any workarounds? This is obviously a show-stopper for us until we can get it resolved.

If it helps, here is the stack trace I get when running neo4j-fd-leak:

open files: 142
open files: 718
open files: 2003
open files: 3572
open files: 5224
open files: 6933
open files: 8597
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
at main.FileDescriptorLeak.loop(FileDescriptorLeak.java:53)
at main.FileDescriptorLeak.main(FileDescriptorLeak.java:37)
... 6 more
Caused by: org.neo4j.graphdb.TransactionFailureException: commit threw exception
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:502)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:403)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 8 more
Caused by: javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:552)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:464)
... 11 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /private/tmp/neo4j-fd-leak-master/target/schema/label/lucene/_4tf.fdt (Too many open files in system)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:908)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:812)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:752)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:579)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:490)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:544)
... 12 more
Caused by: java.io.FileNotFoundException: /private/tmp/neo4j-fd-leak-master/target/schema/label/lucene/_4tf.fdt (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:83)
at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:64)
at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:107)
at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:151)
at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2300)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.updateDocument(LuceneLabelScanStore.java:147)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.flush(LuceneLabelScanWriter.java:135)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:80)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:905)
... 19 more
open files: 9869
open files: 9869
open files: 9869
open files: 9869
open files: 9869

@chrisvest
Contributor

It looks like this is a problem specifically when you have two or more MERGE clauses in the same transaction (just breaking them apart from one query into several is not enough) and they don't match existing nodes, so nodes will have to be created. So the test can be simplified to that. Interesting and annoying. Well, it's being looked at. Also, turns out OS X really does not like it when you use up all of its file descriptors :P
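
For anyone trying to reproduce this, a minimal sketch of that pattern against the 2.0-era embedded API could look like the following. This is not the actual neo4j-fd-leak test; the class name, database path, labels and loop count are invented for illustration:

    import java.util.Collections;
    import java.util.Map;

    import org.neo4j.cypher.javacompat.ExecutionEngine;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class MergeLeakSketch
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/merge-leak-db" );
            ExecutionEngine engine = new ExecutionEngine( db );

            for ( int i = 0; i < 100000; i++ )
            {
                Map<String, Object> params = Collections.<String, Object>singletonMap( "id", i );

                // Two MERGE clauses in the same transaction, neither matching an existing
                // node, so both have to create one -- the pattern described above.
                try ( Transaction tx = db.beginTx() )
                {
                    engine.execute( "MERGE (a:Foo {id: {id}})", params );
                    engine.execute( "MERGE (b:Bar {id: {id}})", params );
                    tx.success();
                }
            }
            db.shutdown();
        }
    }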

@chrisvest
Contributor

Thanks for writing that test program, @tbaum, it was very helpful. The fix is in the pipeline, so I'm closing the issue.

@danielcalencar

Was this fix integrated into 2.1.2?

@chrisvest
Contributor

@danielcalencar Yes.

@wendelicious

I'm still seeing this in 2.1.4. My transaction does:

  • create a new node with a label and a property
  • create a second new node with a label and a property
  • relate the first to the second; the relationship has a type and two properties

This happens if I do it all as a single transaction. It also happens if I do it in 3 separate transactions. I see this on Mac OS X 10.9.4. I haven't attempted it on Linux; I'm working on a repro there.

Any suggestions?
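
For what it's worth, a transaction of that shape written against the core API is roughly the following sketch; the labels, relationship type and property names are placeholders, not the actual application code:

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class TwoNodesOneRelationship
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/two-nodes-db" );

            try ( Transaction tx = db.beginTx() )
            {
                // First node with a label and a property.
                Node first = db.createNode( DynamicLabel.label( "First" ) );
                first.setProperty( "name", "a" );

                // Second node with a label and a property.
                Node second = db.createNode( DynamicLabel.label( "Second" ) );
                second.setProperty( "name", "b" );

                // Relationship with a type and two properties.
                Relationship rel = first.createRelationshipTo( second,
                        DynamicRelationshipType.withName( "RELATES_TO" ) );
                rel.setProperty( "since", 2014 );
                rel.setProperty( "weight", 1.0 );

                tx.success();
            }
            db.shutdown();
        }
    }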

@monowai

monowai commented Oct 1, 2014

I'm seeing this issue in 2.1.4 as well, but haven't had the time to track down what is going wrong. I'm creating 3-6 new nodes with ~10 relationships per operation, with 10 operations per transaction. It falls over at around ~20k operations. This is on OS X with the file handle limit raised to 1,000,000. I've tried processing the batch sequentially and multi-threaded, but the result is the same.

messages.log does not have much more than the slightly edited stack trace below.

Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/93/_l6b.fdx (Too many open files)
    at org.neo4j.kernel.impl.api.index.IndexingService.processUpdateIfIndexExists(IndexingService.java:447)
    at org.neo4j.kernel.impl.api.index.IndexingService.applyUpdates(IndexingService.java:390)
    at org.neo4j.kernel.impl.api.index.IndexingService.updateIndexes(IndexingService.java:337)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:705)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
    at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
....
    Suppressed: org.neo4j.kernel.impl.nioneo.store.MultipleUnderlyingStorageExceptions: Errors when closing (flushing) index updaters: (:label[6](property[4])) java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files) (:label[75](property[6])) java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/89/_lbi.fdx (Too many open files)
        at org.neo4j.kernel.impl.api.index.IndexUpdaterMap.close(IndexUpdaterMap.java:100)
        at org.neo4j.kernel.impl.api.index.IndexingService.updateIndexes(IndexingService.java:338)
        ... 37 more
        Suppressed: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files)
            at org.neo4j.kernel.impl.api.index.IndexUpdaterMap.close(IndexUpdaterMap.java:92)
            ... 38 more
        Caused by: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files)
            at java.io.RandomAccessFile.open(Native Method)
            at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
            at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
            at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
            at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:84)
            at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:64)
            at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:107)
            at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:151)
            at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
            at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
            at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
            at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
            at org.neo4j.kernel.api.impl.index.LuceneIndexAccessor.add(LuceneIndexAccessor.java:151)
            at org.neo4j.kernel.api.impl.index.LuceneIndexAccessor$LuceneIndexUpdater.process(LuceneIndexAccessor.java:186)

@jakewins
Contributor

jakewins commented Oct 2, 2014

Hey guys, sorry about that; it looks like the fix in 2.1.4 didn't resolve all the code paths triggering this. We're working on reproducing it locally. Any additional details about the type of load or the operations you're running to trigger this, beyond what's already in this ticket, are very welcome.

@jakewins jakewins reopened this Oct 2, 2014
@monowai

monowai commented Oct 3, 2014

~500k nodes, in transacted batches of ~100 nodes.

It crashes when the file count from the command below reaches ~10k, but that's OS dependent.

lsof | grep neo4j | wc -l

@lutovich
Contributor

lutovich commented Oct 8, 2014

@monowai can you please provide some info about the indexes in your setup? How many indexes exist? Are any of them touched during the bulk insert? Thanks in advance!

@monowai

monowai commented Oct 8, 2014

Approximately 5 different classes of node are in play. Each has at least one unique index; one of them has 3.

@lutovich
Contributor

lutovich commented Oct 8, 2014

@monowai I'm not able to reproduce the problem :(
My setup is like this:

  1. 10 labels (label set), 10 property keys (prop set), 10 relationship types (rel set)
  2. 100 indexes / 100 unique constraints
  3. every transaction consists of 10 operations
  4. every operation is:
    a) create 100 nodes; each has a random label from the label set and 10 properties with random values
    b) create 1000 relationships between the 100 nodes; each has a random type from the rel set and 10 properties with random values

This works fine for 100 transactions in a row, though occasionally I see peaks of file descriptor usage up to 6k.
Can you please suggest what I'm missing?
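
For comparison, the "100 indexes / 100 unique constraints" part of that setup could be declared with the 2.x schema API roughly as below. This is only a sketch, assuming every constraint is a uniqueness constraint (which also creates a backing schema index); the label and property names are invented:

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class SchemaSetupSketch
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/schema-setup-db" );

            // Schema changes have to go in their own transaction, separate from data writes.
            try ( Transaction tx = db.beginTx() )
            {
                for ( int l = 0; l < 10; l++ )
                {
                    Label label = DynamicLabel.label( "Label" + l );
                    for ( int p = 0; p < 10; p++ )
                    {
                        // Each uniqueness constraint is backed by a schema index,
                        // giving 10 x 10 = 100 indexes / constraints in total.
                        db.schema().constraintFor( label ).assertPropertyIsUnique( "prop" + p ).create();
                    }
                }
                tx.success();
            }
            db.shutdown();
        }
    }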

@monowai

monowai commented Oct 8, 2014

My setup:

  1. Each operation generates or updates ~10 nodes and ~10 relationships:
    ~6 different node types, each with a label; each node generally has a unique constraint
    1 node has 3 indexes and two labels
    (your scenario looks fine, but maybe add multiple labels to the nodes?)
  2. Each transaction executes 10 operations
  3. There are 230k operations

Maybe the volume you are executing is insufficient to see the file resource leak on the hardware you have. The thing to note is that the file handles are never released, hence the limit is eventually exhausted.

Feel free to reach me by direct email and I'll share the email I sent PhilipR.

@lutovich
Contributor

#3302 makes the number of files required for Lucene indexes much lower. General file descriptor usage should be more stable with this fix.
