
[CI] FrozenSearchableSnapshotsIntegTests classMethod failing #75686

Closed
valeriy42 opened this issue Jul 26, 2021 · 2 comments · Fixed by #76070
Assignees
original-brownbear
Labels
:Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
Team:Distributed (Meta label for distributed team)
>test-failure (Triaged test failures from CI)

Comments

@valeriy42
Contributor

It may be related to #74372

Build scan:
https://gradle-enterprise.elastic.co/s/72dm3rzywepgo/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests/classMethod

Reproduction line:
null

Applicable branches:
master

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests&tests.test=classMethod

Failure excerpt:

java.lang.RuntimeException: file handle leaks: [FileChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob+platform-support-unix/os/debian-9&&immutable/x-pack/plugin/searchable-snapshots/build/testrun/internalClusterTest/temp/org.elasticsearch.xpack.searchablesnapshots.FrozenSearchableSnapshotsIntegTests_7EAC4EFD7E9BAE8F-001/tempDir-002/node_s1/indices/GgGGxyiqTp6hubUY4smlEg/7/snapshot_cache/e0CRTwPxRfWYDpryprgKUA/lSh6MQ6ETIKYawrp-oBFqw)]

  at __randomizedtesting.SeedInfo.seed([7EAC4EFD7E9BAE8F]:0)
  at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
  at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
  at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
  at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:228)
  at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:834)

  Caused by: java.lang.Exception: (No message provided)

    at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
    at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
    at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:197)
    at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:166)
    at java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile$FileChannelReference.<init>(CacheFile.java:119)
    at org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile.acquire(CacheFile.java:200)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput$CacheFileReference.get(MetadataCachingIndexInput.java:466)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput.readWithBlobCache(MetadataCachingIndexInput.java:124)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.MetadataCachingIndexInput.doReadInternal(MetadataCachingIndexInput.java:106)
    at org.elasticsearch.xpack.searchablesnapshots.store.input.BaseSearchableSnapshotIndexInput.readInternal(BaseSearchableSnapshotIndexInput.java:112)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:315)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:56)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:102)
    at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:173)
    at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
    at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:79)
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:83)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:668)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:496)
    at org.elasticsearch.index.store.Store.checkIndex(Store.java:332)
    at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:2647)
    at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:2591)
    at org.elasticsearch.index.shard.IndexShard.maybeCheckIndex(IndexShard.java:2581)
    at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1654)
    at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$7(StoreRecovery.java:455)
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134)
    at org.elasticsearch.action.ActionListener$DelegatingActionListener.onResponse(ActionListener.java:184)
    at org.elasticsearch.repositories.blobstore.FileRestoreContext.lambda$restore$0(FileRestoreContext.java:161)
    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$11.restoreFiles(BlobStoreRepository.java:2888)
    at org.elasticsearch.repositories.blobstore.FileRestoreContext.restore(FileRestoreContext.java:157)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$restoreShard$80(BlobStoreRepository.java:2985)
    at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(Thread.java:834)

valeriy42 added the :Distributed/Snapshot/Restore and >test-failure labels Jul 26, 2021
elasticmachine added the Team:Distributed label Jul 26, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear self-assigned this Jul 28, 2021
@original-brownbear
Member

This is reproducible by slowing down org.elasticsearch.xpack.searchablesnapshots.cache.common.CacheFile#populateAndRead before it submits tasks for each gap to the executor. Apparently we never notice that the executor has been shut down, so we quietly fail to close the file that the executor tasks would have closed. Looking into a fix.
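For illustration, here is a minimal, self-contained Java sketch of that failure mode. It is not the actual Elasticsearch code: the class name, temp file, and executor configuration are made up, and plain java.util.concurrent types stand in for the node's thread pools. The point is that a pool which silently discards work once shut down never runs the task that was supposed to close the FileChannel, so the handle leaks exactly as LeakFS reports.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SilentRejectionLeak {
    public static void main(String[] args) throws IOException {
        // A pool whose rejection policy drops tasks without any error, mimicking a pool
        // that quietly accepts but never rejects/fails tasks after shutdown.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.SECONDS,
            new SynchronousQueue<>(),
            new ThreadPoolExecutor.DiscardPolicy());

        Path tmp = Files.createTempFile("cache-file-", ".tmp");
        FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE);

        // Shutdown races ahead of the code that hands the "close the channel" work to the pool.
        executor.shutdown();

        // The close task is silently discarded, so the channel is never closed --
        // the leaked FileChannel that the LeakFS check flags at suite teardown.
        executor.execute(() -> {
            try {
                channel.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        System.out.println("channel still open after submitting close task: " + channel.isOpen());
    }
}
```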

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Aug 4, 2021
We must wait for ongoing restores to complete before shutting down the repositories
service. Otherwise we may leak file descriptors because tasks for releasing the store
are submitted to the `SNAPSHOT` or some searchable snapshot pools that quietly accept
but never reject/fail tasks after shutdown.

same as elastic#46178 where we had the same bug in recoveries

closes elastic#75686
original-brownbear added a commit that referenced this issue Aug 4, 2021, with the same commit message as above.
original-brownbear added two more commits to original-brownbear/elasticsearch that referenced this issue Aug 4, 2021 (titles truncated to "…76070)"), each repeating the same commit message.
original-brownbear added two commits that referenced this issue Aug 4, 2021 (titles truncated to "…76095)" and "…76092)"), each repeating the same commit message.
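For context, the shutdown-ordering pattern the commit message above describes (wait for in-flight restores before tearing down the executors that release the store) can be sketched roughly as follows. All names here (RestoreTracker, tryStartRestore, and so on) are invented for illustration and are not the actual RepositoriesService API.

```java
// Hypothetical sketch of "wait for ongoing restores before shutting down" -- not the
// real Elasticsearch code, just the general pattern the fix describes.
public class RestoreTracker {
    private int ongoingRestores = 0;
    private boolean shuttingDown = false;

    /** Called when a shard restore starts; refused once shutdown has begun. */
    public synchronized boolean tryStartRestore() {
        if (shuttingDown) {
            return false;
        }
        ongoingRestores++;
        return true;
    }

    /** Called when a shard restore (and its store-release task) has finished. */
    public synchronized void onRestoreCompleted() {
        if (--ongoingRestores == 0) {
            notifyAll();
        }
    }

    /**
     * Blocks until every in-flight restore has completed, so the executors that close
     * store/file handles are only shut down once no task can still be submitted to them.
     */
    public synchronized void awaitRestoresBeforeShutdown() throws InterruptedException {
        shuttingDown = true;
        while (ongoingRestores > 0) {
            wait();
        }
    }
}
```

The key point mirrors the fix description: new restores are refused once shutdown begins, and the `SNAPSHOT` and searchable-snapshot pools are only shut down after the last restore has released its file handles, so no close task can be silently dropped.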