
Too many open files #731

Closed
fxprunayre opened this issue Jan 25, 2015 · 8 comments

@fxprunayre
Member

I quite often get "too many open files" exceptions, e.g. while indexing a catalog with 700 records:


     Marking the metadata as _indexingError=1 in index
2015-01-25 12:30:13,138 ERROR [geonetwork.index] - Failed to properly index geometry of metadata 400. Error: /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/index/spatialindex.dbf (Too many open files)
java.io.FileNotFoundException: /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/index/spatialindex.dbf (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
    at org.geotools.data.shapefile.ShpFiles.getReadChannel(ShpFiles.java:831)
    at org.geotools.data.shapefile.dbf.DbaseFileReader.<init>(DbaseFileReader.java:139)
    at org.geotools.data.shapefile.dbf.IndexedDbaseFileReader.<init>(IndexedDbaseFileReader.java:149)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.openDbfReader(IndexedShapefileDataStore.java:859)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.getAttributesReader(IndexedShapefileDataStore.java:561)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.createFeatureWriter(IndexedShapefileDataStore.java:965)
    at org.geotools.data.AbstractDataStore.getFeatureWriter(AbstractDataStore.java:470)
    at org.geotools.data.AbstractFeatureStore.removeFeatures(AbstractFeatureStore.java:361)
    at org.fao.geonet.ker

It also happens when using the catalog, e.g. when viewing records.

@fxprunayre fxprunayre added the bug label Jan 25, 2015
@fxprunayre fxprunayre added this to the 3.0.0 milestone Jan 25, 2015
@jesseeichar

How do you get the metadata? Do you harvest it from a server?

@fxprunayre
Member Author

No, it was on an existing DB. But some of the records may trigger exceptions on indexing; could that be related? I will do more testing to identify the issue. I have observed this error 7 times this week.

@jesseeichar

I index 6500 metadata records on the geocat test and integration servers, so your system is somehow different; we need to isolate what is different.

One thing we need to do is see which files are open. I suspect they are the schema files, but I would like verification. Can you run:

jps
lsof -p <pid> > openfiles.txt

And quickly check which files are open multiple times?
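To aggregate that output and spot files opened many times, a one-liner along these lines can help (a sketch; the `jps` pattern is an assumption about your servlet container's process name):

```shell
# Illustrative: find the Java process, then count duplicate open-file paths.
# Adjust the awk pattern to match your container (Jetty, Tomcat, ...).
pid=$(jps | awk '/Jetty|Bootstrap/ {print $1; exit}')
lsof -p "$pid" | awk '{print $NF}' | sort | uniq -c | sort -rn | head -20
```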

Then I will instrument the places I suspect might cause leaks and commit the instrumented code. Once done, you can reindex your metadata and hopefully the instrumentation will lead us to the problem code.

Jesse

@fxprunayre
Member Author

Not sure if this can be related, but I probably have something wrong with the Requests table, which triggers this exception.


2015-01-25 16:39:16,362 DEBUG [jeeves.config.springutil.JeevesDispatcherServlet] - Successfully completed request
org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [insert into Requests (autogenerated, hits, ip, lang, query, type, requestdate, service, simple, sortby, spatialfilter, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [null]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch
    at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:643)
    at org.springframework.orm.jpa.vendor.HibernateJpaDialect.translateExceptionIfPossible(HibernateJpaDialect.java:106)
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:517)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:755)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:724)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:475)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:270)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:155)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.data.jpa.repository.support.LockModeRepositoryPostProcessor$LockModePopulatingMethodIntercceptor.invoke(LockModeRepositoryPostProcessor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy99.save(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor313.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.zeroturnaround.javarebel.integration.util.ReloadingProxyFactory$ReloadingMethodHandler.invoke(JRebel:74)
    at com.sun.proxy.$Proxy99.save(Unknown Source)
    at org.fao.geonet.kernel.search.log.QueryRequest.storeToDb(QueryRequest.java:268)
    at org.fao.geonet.kernel.search.log.SearcherLogger.logSearch(SearcherLogger.java:98)
    at org.fao.geonet.kernel.search.SearchLoggerTask.run(SearchLoggerTask.java:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.hibernate.exception.ConstraintViolationException: could not execute batch
    at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:129)
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)
    at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution(BatchingBatch.java:132)
    at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.doExecuteBatch(BatchingBatch.java:111)
    at org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.execute(AbstractBatchImpl.java:163)
    at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.getBatch(JdbcCoordinatorImpl.java:215)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3102)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3581)
    at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:104)
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:463)
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:349)
    at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:350)
    at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:56)
    at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1222)
    at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:425)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.beforeTransactionCommit(JdbcTransaction.java:101)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:177)
    at org.hibernate.jpa.internal.TransactionImpl.commit(TransactionImpl.java:77)
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:513)
    ... 25 more
Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into Requests (autogenerated, hits, ip, lang, query, type, requestdate, service, simple, sortby, spatialfilter, id) values ('0', '974', '127.0.0.1', 'fre', '+(_op0:2 _op2:2 _op0:1 _op2:1 _op0:0 _op2:0 _op0:-1 _op2:-1 _owner:22 _dummy:0) +(_isTemplate:y _isTemplate:n _isTemplate:s)', 'TERM', '2015-01-25T16:39:16', 'qi', '0', '_docLocale ASC,null ASC', NULL, '9232') was aborted.  Call getNextException to see the cause.
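As the log suggests ("Call getNextException to see the cause"), JDBC batch errors hide the driver's root cause behind `SQLException.getNextException()`. A hypothetical helper (not part of GeoNetwork) to dump the whole chain:

```java
import java.sql.SQLException;

// Illustrative helper: walks the SQLException chain so the driver's
// underlying constraint-violation message becomes visible in the log.
final class SqlExceptionChain {
    static void print(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            System.err.println(cur.getSQLState() + ": " + cur.getMessage());
        }
    }
}
```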

It looks like the index XSLTs are opened a lot:

 ls -l /proc/24823/fd | cut -c55-200 | sort | uniq -c

This returns:

    688 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/codelist/external/thesauri/theme/inspire-theme.rdf
   1391 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/convert/functions.xsl
      2 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/index-fields/index-subtemplate-fields.xsl
    696 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/index-fields/index-subtemplate-fields.xsl
     43 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-fn.xsl
     41 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-fn.xsl
     42 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl-multilingual.xsl
     42 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl-multilingual.xsl
     42 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl.xsl
     42 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl.xsl
     49 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/base-variables.xsl
     48 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/base-variables.xsl
     44 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/functions-core.xsl
     40 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/functions-core.xsl
     39 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-relations.xsl
     45 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-relations.xsl
     44 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-tpl-brief.xsl
     40 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-tpl-brief.xsl
      2 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/eea-layout.xsl


So maybe this is related to the OASIS catalog resolver?

@jesseeichar

I have added two methods to IO (newInputStream and newBufferedReader) and changed all calls of Files.newInputStream or Files.newBufferedReader to use those methods instead. In those methods I wrap the input stream/reader with the debugging equivalents (DebuggingReader and DebuggingInputStream).

These classes register an exception with the OpenResourceTracker class on creation and remove it on closing. Thus OpenResourceTracker maintains a list of all the open resources (along with the file name), and when there are 1000 open files it writes all of the open files to the log, together with the stack trace captured at the point where each was opened.

Now that I think about it, that will kill the system at that point, because each time a resource is opened after the 1000th, all open files will be logged again, essentially crashing the system. Maybe it should be changed to only print once.
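The tracking idea described above can be sketched as follows. This is an illustrative reconstruction, not the committed GeoNetwork code; the class names mirror the comment, but the bodies are assumptions:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: keeps an exception (capturing the opening stack trace) per open resource.
final class OpenResourceTracker {
    static final Map<Object, Exception> openExceptions = new ConcurrentHashMap<>();

    static void opened(Object resource, String name) {
        openExceptions.put(resource, new Exception("opened: " + name));
        if (openExceptions.size() >= 1000) {
            // Dump every open site once the threshold is crossed.
            openExceptions.values().forEach(Throwable::printStackTrace);
        }
    }

    static void closed(Object resource) {
        openExceptions.remove(resource);
    }
}

// Wrapper that registers itself on creation and deregisters on close().
final class DebuggingInputStream extends FilterInputStream {
    DebuggingInputStream(InputStream in, String name) {
        super(in);
        OpenResourceTracker.opened(this, name);
    }

    @Override
    public void close() throws IOException {
        OpenResourceTracker.closed(this);
        super.close();
    }
}
```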

If you still hit the error on Linux, you can:

  • put a break point in ... ServiceManager for example.
  • open the expression evaluator dialog
  • execute: System.out.println(org.fao.geonet.utils.debug.OpenResourceTracker.openExceptions.size())
  • execute: org.fao.geonet.utils.debug.OpenResourceTracker.printExceptions()

@jesseeichar

Just committed a fix for this issue. Please verify that it fixes it. The fix is 7257438.

It turned out that the parsing framework was calling both getInputStream and getReader on PathStreamSource, which resulted in two streams being opened; only one was used, and thus only one was closed. I wrote a test for this and verified that it now passes.

I also made changes to the open resource tracking. It will now track resources in development mode but not in production mode. In addition, it will print an error when 1000, 2000, 3000, and 4000 resources are open. The error will consist of the first 100 open files and their stack traces.
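The leak pattern described here is generic: when a source object opens a fresh handle in each accessor, a consumer that probes both accessors but uses only one leaks the other. A minimal sketch of the bug class (a hypothetical `LeakyPathSource`, not the actual PathStreamSource code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative: each accessor opens its own handle, so calling both
// getInputStream() and getReader() opens two file descriptors, and a
// caller that closes only the one it uses leaks the other.
class LeakyPathSource {
    private final Path path;

    LeakyPathSource(Path path) { this.path = path; }

    InputStream getInputStream() throws IOException {
        return Files.newInputStream(path);                      // handle #1
    }

    Reader getReader() throws IOException {
        return new InputStreamReader(
                Files.newInputStream(path), StandardCharsets.UTF_8); // handle #2
    }
}
```

A fix along the lines described in the comment is to have the source open at most one underlying stream and hand out a single view of it, so that closing the returned object always releases the descriptor.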

@jesseeichar

Reopening so that @fxprunayre can verify.

@jesseeichar jesseeichar reopened this Jan 26, 2015
@fxprunayre
Member Author

Tested and it does not happen anymore. Thanks Jesse. Closing.
