
[CI] FrozenSearchableSnapshotsIntegTests classMethod failing #75686

Closed
valeriy42 opened this issue Jul 26, 2021 · 2 comments · Fixed by #76070
Assignees
original-brownbear
Labels
:Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
Team:Distributed (Meta label for distributed team)
>test-failure (Triaged test failures from CI)

Comments

@valeriy42
Contributor

It may be related to #74372

Build scan:
https://gradle-enterprise.elastic.co/s/72dm3rzywepgo/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests/classMethod

Reproduction line:
null

Applicable branches:
master

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests&tests.test=classMethod

Failure excerpt:

java.lang.RuntimeException: file handle leaks: [FileChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob+platform-support-unix/os/debian-9&&immutable/x-pack/plugin/searchable-snapshots/build/testrun/internalClusterTest/temp/org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests_7EAC4EFD7E9BAE8F-001/tempDir-002/node_s1/indices/GgGGxyiqTp6hubUY4smlEg/7/snapshot_cache/e0CRTwPxRfWYDpryprgKUA/lSh6MQ6ETIKYawrp-oBFqw)]

  at __randomizedtesting.SeedInfo.seed([7EAC4EFD7E9BAE8F]:0)
  at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
  at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
  at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
  at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:228)
  at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:834)

  Caused by: java.lang.Exception: (No message provided)

    at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
    at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
    at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:197)
    at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:166)
    at java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile$FileChannelReference.<init>(CacheFile.java:119)
    at org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile.acquire(CacheFile.java:200)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput$CacheFileReference.get(MetadataCachingIndexInput.java:466)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput.readWithBlobCache(MetadataCachingIndexInput.java:124)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput.doReadInternal(MetadataCachingIndexInput.java:106)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.BaseSearchableSnapshotIndexInput.readInternal(BaseSearchableSnapshotIndexInput.java:112)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:315)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:56)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:102)
    at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:173)
    at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
    at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:79)
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:83)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:668)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:496)
    at org.elasticsearch.index.store.Store.checkIndex(Store.java:332)
    at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:2647)
    at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:2591)
    at org.elasticsearch.index.shard.IndexShard.maybeCheckIndex(IndexShard.java:2581)
    at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1654)
    at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$7(StoreRecovery.java:455)
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134)
    at org.elasticsearch.action.ActionListener$DelegatingActionListener.onResponse(ActionListener.java:184)
    at org.elasticsearch.repositories.blobstore.FileRestoreContext.lambda$restore$0(FileRestoreContext.java:161)
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$11.restoreFiles(BlobStoreRepository.java:2888)
    at org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:157)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$restoreShard$80(BlobStoreRepository.java:2985)
    at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(Thread.java:834)

valeriy42 added the :Distributed/Snapshot/Restore and >test-failure labels Jul 26, 2021
elasticmachine added the Team:Distributed label Jul 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear self-assigned this Jul 28, 2021
@original-brownbear
Member

This is reproducible by slowing down org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile#populateAndRead before it submits tasks for each gap to the executor. Apparently we never notice that the executor has been shut down, so we quietly fail to close the file that the executor tasks would have closed. Looking into a fix.
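For illustration, here is a minimal, self-contained Java sketch of that failure mode. It is not the actual Elasticsearch code: the class name, temp file, and executor configuration are made up, and plain java.util.concurrent types stand in for the node's thread pools. The point is that a pool which silently discards work once shut down never runs the task that was supposed to close the FileChannel, so the handle leaks exactly as LeakFS reports.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SilentRejectionLeak {
    public static void main(String[] args) throws IOException {
        // A pool whose rejection policy drops tasks without any error, mimicking a pool
        // that quietly accepts but never rejects/fails tasks after shutdown.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.SECONDS,
            new SynchronousQueue<>(),
            new ThreadPoolExecutor.DiscardPolicy());

        Path tmp = Files.createTempFile("cache-file-", ".tmp");
        FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE);

        // Shutdown races ahead of the code that hands the "close the channel" work to the pool.
        executor.shutdown();

        // The close task is silently discarded, so the channel is never closed --
        // the leaked FileChannel that the LeakFS check flags at suite teardown.
        executor.execute(() -> {
            try {
                channel.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        System.out.println("channel still open after submitting close task: " + channel.isOpen());
    }
}
```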

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Aug 4, 2021
We must wait for ongoing restores to complete before shutting down the repositories
service. Otherwise we may leak file descriptors because tasks for releasing the store
are submitted to the `SNAPSHOT` or some searchable snapshot pools that quietly accept
but never reject/fail tasks after shutdown.

same as elastic#46178 where we had the same bug in recoveries

closes elastic#75686
original-brownbear added a commit that referenced this issue Aug 4, 2021, with the same commit message as above.
original-brownbear added two more commits to original-brownbear/elasticsearch that referenced this issue Aug 4, 2021 (titles truncated to "…76070)"), each repeating the same commit message.
original-brownbear added two commits that referenced this issue Aug 4, 2021 (titles truncated to "…76095)" and "…76092)"), each repeating the same commit message.
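For context, the shutdown-ordering pattern the commit message above describes (wait for in-flight restores before tearing down the executors that release the store) can be sketched roughly as follows. All names here (RestoreTracker, tryStartRestore, and so on) are invented for illustration and are not the actual RepositoriesService API.

```java
// Hypothetical sketch of "wait for ongoing restores before shutting down" -- not the
// real Elasticsearch code, just the general pattern the fix describes.
public class RestoreTracker {
    private int ongoingRestores = 0;
    private boolean shuttingDown = false;

    /** Called when a shard restore starts; refused once shutdown has begun. */
    public synchronized boolean tryStartRestore() {
        if (shuttingDown) {
            return false;
        }
        ongoingRestores++;
        return true;
    }

    /** Called when a shard restore (and its store-release task) has finished. */
    public synchronized void onRestoreCompleted() {
        if (--ongoingRestores == 0) {
            notifyAll();
        }
    }

    /**
     * Blocks until every in-flight restore has completed, so the executors that close
     * store/file handles are only shut down once no task can still be submitted to them.
     */
    public synchronized void awaitRestoresBeforeShutdown() throws InterruptedException {
        shuttingDown = true;
        while (ongoingRestores > 0) {
            wait();
        }
    }
}
```

The key point mirrors the fix description: new restores are refused once shutdown begins, and the `SNAPSHOT` and searchable-snapshot pools are only shut down after the last restore has released its file handles, so no close task can be silently dropped.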