
open file descriptors leak (somehow related to lucene schema-index) #1799

Closed
tbaum opened this issue Jan 8, 2014 · 18 comments

@tbaum
Contributor

tbaum commented Jan 8, 2014

During load (a mixture of Cypher and Java API calls) the number of open file handles increases drastically and eventually stops the process (limit: 10k).

Running the same code on a 64-bit Linux box works; the open file handle count does not exceed 1k.

  • Neo4j 2.0.0, embedded
  • Mac OS X 10.9.1
  • JDK 1.7 or JDK 1.8

The application creates a database '/tmp/db3'.

# lsof -p *jvm-pid* | grep /tmp/db3 | tr -s ' ' ' ' | cut -d' ' -f9 | cut -c 1-37 | sort | uniq -c | sort -r
6519 /private/tmp/db3/schema/index/lucene/
  43 /private/tmp/db3/schema/label/lucene/
  12 /private/tmp/db3/neostore.propertysto
  12 /private/tmp/db3/index/lucene/node/fu
   7 /private/tmp/db3/neostore.relationshi
   5 /private/tmp/db3/neostore.nodestore.d
   4 /private/tmp/db3/neostore.labeltokens
   2 /private/tmp/db3/neostore.schemastore
   1 /private/tmp/db3/tm_tx_log.2
   1 /private/tmp/db3/store_lock
   1 /private/tmp/db3/nioneo_logical.log.2
   1 /private/tmp/db3/neostore.id
   1 /private/tmp/db3/neostore
   1 /private/tmp/db3/messages.log
   1 /private/tmp/db3/lock
   1 /private/tmp/db3/index/lucene.log.2
   1 /private/tmp/db3/index/lucene-store.d

messages.log: https://gist.github.com/tbaum/e746a3cf83ba4ddbc806
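
The same number that lsof reports can also be watched from inside the JVM while the load runs. Here is a minimal sketch, assuming a HotSpot/OpenJDK JVM whose OperatingSystemMXBean implements com.sun.management.UnixOperatingSystemMXBean (the class name OpenFileCount is made up for illustration):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    import com.sun.management.UnixOperatingSystemMXBean;

    public class OpenFileCount
    {
        public static void main( String[] args )
        {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if ( os instanceof UnixOperatingSystemMXBean )
            {
                UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
                // Roughly the figure that `lsof -p <jvm-pid> | wc -l` shows for this process.
                System.out.println( "open files: " + unixOs.getOpenFileDescriptorCount()
                        + " / limit: " + unixOs.getMaxFileDescriptorCount() );
            }
        }
    }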

@tbaum
Contributor Author

tbaum commented Jan 8, 2014

#1437 might refer to the same issue

@apcj
Contributor

apcj commented Jan 22, 2014

Hi Thomas, can you give us a hand narrowing this down? Is it reproducible? If so, can you create a smallest-possible set of operations that causes the problem?
Also, if you suspect schema indexes, can you share the output of the :schema command and some node counts for each indexed label?

@tbaum
Contributor Author

tbaum commented Feb 21, 2014

Hi Alistair,
I tried to reproduce this issue, but couldn't make it fail again, sorry.
Maybe it was caused by a JDK bug.

  • Closing this issue

@tbaum tbaum closed this as completed Feb 21, 2014
@tbaum tbaum reopened this Mar 17, 2014
@tbaum
Contributor Author

tbaum commented Mar 17, 2014

Reopening: I found, by accident, an easy way to reproduce it:
https://github.com/tbaum/neo4j-fd-leak
It should fail after a few seconds.

@apcj can you verify this problem on your machine?

@bytetwister

@tbaum I too am experiencing the same problem and can reproduce it using the neo4j-fd-leak test case you created.

I am running embedded Neo4j on Mac OS X 10.9.2 with the latest JDK 7 and Neo4j 2.0.2 (I tried 2.0.1 as well and experienced the same problem). With Neo4j 1.9.* I did not encounter this issue; I only discovered it when attempting to upgrade to 2.*.

lsof shows the problem as noted in the first comment.

Any workarounds? This is obviously a show-stopper for us until we can get it resolved.

If it helps, here is the stack trace I get when running neo4j-fd-leak:

open files: 142
open files: 718
open files: 2003
open files: 3572
open files: 5224
open files: 6933
open files: 8597
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
at main.FileDescriptorLeak.loop(FileDescriptorLeak.java:53)
at main.FileDescriptorLeak.main(FileDescriptorLeak.java:37)
... 6 more
Caused by: org.neo4j.graphdb.TransactionFailureException: commit threw exception
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:502)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:403)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 8 more
Caused by: javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:552)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:464)
... 11 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /private/tmp/neo4j-fd-leak-master/target/schema/label/lucene/_4tf.fdt (Too many open files in system)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:908)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:812)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:752)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:579)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:490)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:544)
... 12 more
Caused by: java.io.FileNotFoundException: /private/tmp/neo4j-fd-leak-master/target/schema/label/lucene/_4tf.fdt (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:83)
at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:64)
at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:107)
at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:151)
at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2300)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.updateDocument(LuceneLabelScanStore.java:147)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.flush(LuceneLabelScanWriter.java:135)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:80)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:905)
... 19 more
open files: 9869
open files: 9869
open files: 9869
open files: 9869
open files: 9869

@chrisvest
Contributor

It looks like this is a problem specifically when you have two or more MERGE clauses in the same transaction (just breaking them apart from one query into several is not enough) and they don't match existing nodes, so nodes will have to be created. So the test can be simplified to that. Interesting and annoying. Well, it's being looked at. Also, turns out OS X really does not like it when you use up all of its file descriptors :P
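
For anyone trying to reproduce this, a minimal sketch of that pattern against the 2.0-era embedded API could look like the following. This is not the actual neo4j-fd-leak test; the class name, database path, labels and loop count are invented for illustration:

    import java.util.Collections;
    import java.util.Map;

    import org.neo4j.cypher.javacompat.ExecutionEngine;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class MergeLeakSketch
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/merge-leak-db" );
            ExecutionEngine engine = new ExecutionEngine( db );

            for ( int i = 0; i < 100000; i++ )
            {
                Map<String, Object> params = Collections.<String, Object>singletonMap( "id", i );

                // Two MERGE clauses in the same transaction, neither matching an existing
                // node, so both have to create one -- the pattern described above.
                try ( Transaction tx = db.beginTx() )
                {
                    engine.execute( "MERGE (a:Foo {id: {id}})", params );
                    engine.execute( "MERGE (b:Bar {id: {id}})", params );
                    tx.success();
                }
            }
            db.shutdown();
        }
    }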

@chrisvest
Contributor

Thanks for writing that test program, @tbaum, it was very helpful. The fix is in the pipeline, so I'm closing the issue.

@danielcalencar

Was this fix integrated into 2.1.2?

@chrisvest
Contributor

@danielcalencar Yes.

@wendelicious

I'm still seeing this in 2.1.4. My transaction does:

  • create a new node with a label and a property
  • create a second new node with a label and a property
  • relate the first to the second; the relationship has a type and two properties

This happens if I do it all as a single transaction. It also happens if I do it in 3 separate transactions. I see this on Mac OS X 10.9.4. I haven't attempted it on Linux; I'm working on a repro there.

Any suggestions?
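
For what it's worth, a transaction of that shape written against the core API is roughly the following sketch; the labels, relationship type and property names are placeholders, not the actual application code:

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class TwoNodesOneRelationship
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/two-nodes-db" );

            try ( Transaction tx = db.beginTx() )
            {
                // First node with a label and a property.
                Node first = db.createNode( DynamicLabel.label( "First" ) );
                first.setProperty( "name", "a" );

                // Second node with a label and a property.
                Node second = db.createNode( DynamicLabel.label( "Second" ) );
                second.setProperty( "name", "b" );

                // Relationship with a type and two properties.
                Relationship rel = first.createRelationshipTo( second,
                        DynamicRelationshipType.withName( "RELATES_TO" ) );
                rel.setProperty( "since", 2014 );
                rel.setProperty( "weight", 1.0 );

                tx.success();
            }
            db.shutdown();
        }
    }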

@monowai

monowai commented Oct 1, 2014

I'm seeing this issue in 2.1.4 as well, but haven't had the time to track down what is going wrong. I'm creating 3-6 new nodes with ~10 relationships per operation, with 10 operations per transaction. It falls over at around ~20k operations. This is on OS X with the file handle limit raised to 1,000,000. I've tried processing the batch sequentially and multi-threaded, but the result is the same.

messages.log does not have much more than the slightly edited stack trace below.

Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/93/_l6b.fdx (Too many open files)
    at org.neo4j.kernel.impl.api.index.IndexingService.processUpdateIfIndexExists(IndexingService.java:447)
    at org.neo4j.kernel.impl.api.index.IndexingService.applyUpdates(IndexingService.java:390)
    at org.neo4j.kernel.impl.api.index.IndexingService.updateIndexes(IndexingService.java:337)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:705)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
    at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
....
    Suppressed: org.neo4j.kernel.impl.nioneo.store.MultipleUnderlyingStorageExceptions: Errors when closing (flushing) index updaters: (:label[6](property[4])) java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files) (:label[75](property[6])) java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/89/_lbi.fdx (Too many open files)
        at org.neo4j.kernel.impl.api.index.IndexUpdaterMap.close(IndexUpdaterMap.java:100)
        at org.neo4j.kernel.impl.api.index.IndexingService.updateIndexes(IndexingService.java:338)
        ... 37 more
        Suppressed: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files)
            at org.neo4j.kernel.impl.api.index.IndexUpdaterMap.close(IndexUpdaterMap.java:92)
            ... 38 more
        Caused by: java.io.FileNotFoundException: /usr/local/Cellar/tomcat/8.0.9/libexec/bin/data/neo4j/ab/schema/index/lucene/14/_an2.fdx (Too many open files)
            at java.io.RandomAccessFile.open(Native Method)
            at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
            at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
            at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
            at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:84)
            at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:64)
            at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:107)
            at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:151)
            at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
            at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
            at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
            at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
            at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
            at org.neo4j.kernel.api.impl.index.LuceneIndexAccessor.add(LuceneIndexAccessor.java:151)
            at org.neo4j.kernel.api.impl.index.LuceneIndexAccessor$LuceneIndexUpdater.process(LuceneIndexAccessor.java:186)

@jakewins
Contributor

jakewins commented Oct 2, 2014

Hey guys, sorry about that; it looks like the fix in 2.1.4 didn't resolve all the code paths triggering this. We're working on reproducing it locally. Any additional details about the type of load or the operations you're running to trigger this, beyond what's already in this ticket, are very welcome.

@jakewins jakewins reopened this Oct 2, 2014
@monowai

monowai commented Oct 3, 2014

~500k nodes, in transacted batches of ~100 nodes.

It crashes when the file count from the command below reaches ~10k, but that's OS dependent.

lsof | grep neo4j | wc -l

@lutovich
Contributor

lutovich commented Oct 8, 2014

@monowai can you please provide some info about the indexes in your setup? How many indexes exist? Are any of them touched during the bulk insert? Thanks in advance!

@monowai

monowai commented Oct 8, 2014

Approximately 5 different classes of node are in play. Each has at least one unique index; one of them has 3.

@lutovich
Contributor

lutovich commented Oct 8, 2014

@monowai I'm not able to reproduce the problem :(
My setup is like this:

  1. 10 labels (label set), 10 property keys (prop set), 10 relationship types (rel set)
  2. 100 indexes / 100 unique constraints
  3. every transaction consists of 10 operations
  4. every operation is:
    a) create 100 nodes; each has a random label from the label set and 10 properties with random values
    b) create 1000 relationships between the 100 nodes; each has a random type from the rel set and 10 properties with random values

This works fine for 100 transactions in a row, though occasionally I see peaks of file descriptor usage up to 6k.
Can you please suggest what I'm missing?
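
For comparison, the "100 indexes / 100 unique constraints" part of that setup could be declared with the 2.x schema API roughly as below. This is only a sketch, assuming every constraint is a uniqueness constraint (which also creates a backing schema index); the label and property names are invented:

    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class SchemaSetupSketch
    {
        public static void main( String[] args )
        {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( "/tmp/schema-setup-db" );

            // Schema changes have to go in their own transaction, separate from data writes.
            try ( Transaction tx = db.beginTx() )
            {
                for ( int l = 0; l < 10; l++ )
                {
                    Label label = DynamicLabel.label( "Label" + l );
                    for ( int p = 0; p < 10; p++ )
                    {
                        // Each uniqueness constraint is backed by a schema index,
                        // giving 10 x 10 = 100 indexes / constraints in total.
                        db.schema().constraintFor( label ).assertPropertyIsUnique( "prop" + p ).create();
                    }
                }
                tx.success();
            }
            db.shutdown();
        }
    }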

@monowai

monowai commented Oct 8, 2014

My setup:

  1. Each operation generates or updates ~10 nodes and ~10 relationships:
    ~6 different node types, each with a label; each node generally has a unique constraint
    1 node has 3 indexes and two labels
    (your scenario looks fine, but maybe add multiple labels to the nodes?)
  2. Each transaction executes 10 operations
  3. There are 230k operations

Maybe the volume you are executing is insufficient to see the file resource leak on the hardware you have. The thing to note is that the file handles are never released, hence the limit is eventually exhausted.

Feel free to reach me by direct email and I'll share the email I sent PhilipR.

@lutovich
Contributor

#3302 makes the number of files required for Lucene indexes much lower. General file descriptor usage should be more stable with this fix.
