[CI] IndexShardIT.testShardHasMemoryBufferOnTranslogRecover failure #37111

Closed
romseygeek opened this issue Jan 3, 2019 · 1 comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI

@romseygeek

This reproduces on master:

REPRODUCE WITH: ./gradlew :server:integTest \
  -Dtests.seed=9EC7D98FCA15BCE6 \
  -Dtests.class=org.elasticsearch.index.shard.IndexShardIT \
  -Dtests.method="testShardHasMemoryBufferOnTranslogRecover" \
  -Dtests.security.manager=true \
  -Dtests.locale=sv \
  -Dtests.timezone=Pacific/Auckland \
  -Dcompiler.java=11 \
  -Druntime.java=8

The resulting error is:

ERROR   4.25s | IndexShardIT.testShardHasMemoryBufferOnTranslogRecover <<< FAILURES!
   > Throwable #1: [test/rCQ777wBRcmIfm1spp2jiA][[test][0]] IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[file "_0.cfs" is already pending delete];
   >    at __randomizedtesting.SeedInfo.seed([9EC7D98FCA15BCE6:42E398DA205C6200]:0)
   >    at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:429)
   >    at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95)
   >    at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302)
   >    at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93)
   >    at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1657)
   >    at org.elasticsearch.index.shard.IndexShardIT.recoverShard(IndexShardIT.java:636)
   >    at org.elasticsearch.index.shard.IndexShardIT.testShardHasMemoryBufferOnTranslogRecover(IndexShardIT.java:560)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   >    at java.base/java.lang.Thread.run(Thread.java:834)
   > Caused by: [test/rCQ777wBRcmIfm1spp2jiA][[test][0]] EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[file "_0.cfs" is already pending delete];
   >    at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:204)
   >    at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:165)
   >    at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
   >    at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1418)
   >    at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1372)
   >    at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:424)
   >    ... 42 more
   > Caused by: java.nio.file.NoSuchFileException: file "_0.cfs" is already pending delete
   >    at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:338)
   >    at org.apache.lucene.store.FileSwitchDirectory.deleteFile(FileSwitchDirectory.java:151)
   >    at org.apache.lucene.store.FilterDirectory.deleteFile(FilterDirectory.java:63)
   >    at org.elasticsearch.index.store.ByteSizeCachingDirectory.deleteFile(ByteSizeCachingDirectory.java:175)
   >    at org.apache.lucene.store.FilterDirectory.deleteFile(FilterDirectory.java:63)
   >    at org.elasticsearch.index.store.Store$StoreDirectory.deleteFile(Store.java:721)
   >    at org.elasticsearch.index.store.Store$StoreDirectory.deleteFile(Store.java:726)
   >    at org.apache.lucene.store.LockValidatingDirectoryWrapper.deleteFile(LockValidatingDirectoryWrapper.java:38)
   >    at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:696)
   >    at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:690)
   >    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:238)
   >    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:898)
   >    at org.elasticsearch.index.engine.InternalEngine$AssertingIndexWriter.<init>(InternalEngine.java:2580)
   >    at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2107)
   >    at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2097)
   >    at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:199)
   >    ... 47 more

The `FileSwitchDirectory` frame in the stack trace makes me think this is related to the addition of hybridfs, especially since this only started failing yesterday afternoon, after #36668 was merged.
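To make the suspected interaction concrete, here is a small stdlib-only Java model. It is not Lucene code: `ModelDisk`, `ModelFsDirectory` and `ModelSwitchDirectory` are hypothetical stand-ins. The model assumes two behaviours consistent with the trace above: an FSDirectory's `deleteFile` refuses a file it already tracks as pending delete, and `FileSwitchDirectory.listAll` returns the union of both delegates' listings. If a delete fails (for example because the file is still open) and the file is parked in one delegate's pending set, the other delegate still lists it, so a later deleter tries again.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not Lucene code: two FSDirectory-like views over one path.
class DoublePendingDeleteDemo {
    public static void main(String[] args) throws Exception {
        ModelDisk disk = new ModelDisk();
        disk.files.add("_0.cfs");
        disk.undeletable.add("_0.cfs"); // e.g. a reader still holds the file open

        ModelFsDirectory mmap = new ModelFsDirectory(disk);
        ModelFsDirectory nio = new ModelFsDirectory(disk);
        ModelSwitchDirectory dir = new ModelSwitchDirectory(mmap, nio, Set.of("nvd", "dvd", "tim"));

        dir.deleteFile("_0.cfs");          // routed to nio, which parks the file as pending
        System.out.println(dir.listAll()); // mmap still lists it, so the union shows [_0.cfs]
        try {
            dir.deleteFile("_0.cfs");      // a second deleter sees the file and tries again
        } catch (NoSuchFileException e) {
            System.out.println(e.getMessage());
        }
    }
}

// The files physically present in one filesystem directory.
final class ModelDisk {
    final Set<String> files = new TreeSet<>();
    final Set<String> undeletable = new HashSet<>(); // deleting these fails for now
}

// One FSDirectory-like view with its OWN pending-delete set.
final class ModelFsDirectory {
    private final ModelDisk disk;
    final Set<String> pendingDeletes = new HashSet<>();

    ModelFsDirectory(ModelDisk disk) {
        this.disk = disk;
    }

    Set<String> listAll() {
        Set<String> result = new TreeSet<>(disk.files);
        result.removeAll(pendingDeletes); // hides only what THIS view is waiting to delete
        return result;
    }

    void deleteFile(String name) throws NoSuchFileException {
        if (pendingDeletes.contains(name)) {
            throw new NoSuchFileException("file \"" + name + "\" is already pending delete");
        }
        if (disk.undeletable.contains(name)) {
            pendingDeletes.add(name); // park it; the model never retries
        } else {
            disk.files.remove(name);
        }
    }
}

// Routes deletes by extension but lists the union of both views.
final class ModelSwitchDirectory {
    private final ModelFsDirectory primary;
    private final ModelFsDirectory secondary;
    private final Set<String> primaryExtensions;

    ModelSwitchDirectory(ModelFsDirectory primary, ModelFsDirectory secondary, Set<String> primaryExtensions) {
        this.primary = primary;
        this.secondary = secondary;
        this.primaryExtensions = primaryExtensions;
    }

    private ModelFsDirectory route(String name) {
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        return primaryExtensions.contains(ext) ? primary : secondary;
    }

    Set<String> listAll() {
        Set<String> all = new TreeSet<>(primary.listAll());
        all.addAll(secondary.listAll());
        return all;
    }

    void deleteFile(String name) throws NoSuchFileException {
        route(name).deleteFile(name);
    }
}
```

Running `main` prints `[_0.cfs]` and then `file "_0.cfs" is already pending delete`, the same message as in the failure above.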

@romseygeek romseygeek added >test-failure Triaged test failures from CI :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels Jan 3, 2019
@elasticmachine

Pinging @elastic/es-distributed

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 4, 2019
We don't want two FSDirectories to each manage pending deletes and optimize
file listing separately. This confuses IndexWriter and causes exceptions
when files are deleted twice but are pending for deletion. This change
moves to using a NIOFS subclass that only delegates to MMAP for opening files;
all metadata and pending deletes are managed on top.

Closes elastic#37111
Relates to elastic#36668
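The shape of this fix can be sketched with the same kind of model. Again this is hypothetical, not the actual Elasticsearch class: one directory owns the file listing and a single pending-delete set, and the file extension only decides how a file is opened (the `mmap:`/`nio:` prefix stands in for returning an input from an mmap or NIO implementation). The `nvd`, `dvd`, `tim` extension list is an assumption made for illustration.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the fix's shape, not Elasticsearch code.
class HybridDirectoryDemo {
    public static void main(String[] args) throws Exception {
        ModelHybridDirectory dir = new ModelHybridDirectory(Set.of("nvd", "dvd", "tim"));
        dir.createFile("_0.cfs");
        dir.createFile("_0.dvd");
        dir.markOpen("_0.cfs"); // deleting it will fail for now

        System.out.println(dir.openInput("_0.dvd")); // mmap:_0.dvd
        System.out.println(dir.openInput("_0.cfs")); // nio:_0.cfs

        dir.deleteFile("_0.cfs");          // parked in the single pending-delete set
        System.out.println(dir.listAll()); // [_0.dvd]: no other view can resurrect _0.cfs
    }
}

final class ModelHybridDirectory {
    private final Set<String> disk = new TreeSet<>();           // files physically present
    private final Set<String> openFiles = new HashSet<>();      // files that cannot be deleted yet
    private final Set<String> pendingDeletes = new HashSet<>(); // the ONLY pending-delete bookkeeping
    private final Set<String> mmapExtensions;

    ModelHybridDirectory(Set<String> mmapExtensions) {
        this.mmapExtensions = mmapExtensions;
    }

    void createFile(String name) { disk.add(name); }

    void markOpen(String name) { openFiles.add(name); }

    Set<String> listAll() {
        Set<String> result = new TreeSet<>(disk);
        result.removeAll(pendingDeletes);
        return result;
    }

    void deleteFile(String name) throws NoSuchFileException {
        if (pendingDeletes.contains(name)) {
            throw new NoSuchFileException("file \"" + name + "\" is already pending delete");
        }
        if (openFiles.contains(name)) {
            pendingDeletes.add(name);
        } else {
            disk.remove(name);
        }
    }

    // Only this decision depends on the extension; listing and deletes do not.
    String openInput(String name) throws NoSuchFileException {
        if (!disk.contains(name) || pendingDeletes.contains(name)) {
            throw new NoSuchFileException(name);
        }
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        return (mmapExtensions.contains(ext) ? "mmap:" : "nio:") + name;
    }
}
```

Because a pending file is hidden from the one and only listing, nothing sees it as an unreferenced file to delete a second time, which is the property the two-FSDirectory setup lost.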
@s1monw s1monw self-assigned this Jan 4, 2019
s1monw added a commit that referenced this issue Jan 5, 2019
We don't want two FSDirectories to each manage pending deletes and optimize
file listing separately. This confuses IndexWriter and causes exceptions
when files are deleted twice but are pending for deletion. This change
moves to using a NIOFS subclass that only delegates to MMAP for opening files;
all metadata and pending deletes are managed on top.

Closes #37111
Relates to #36668
s1monw added a commit that referenced this issue Jun 12, 2019
We are still using `FileSwitchDirectory` when a user configures file-based pre-loading of mmaps. This is trappy for multiple reasons if both directories used by `FileSwitchDirectory` point to the same filesystem directory. One issue is LUCENE-8835, which causes failures like #37111; until LUCENE-8835 is fixed we should not use `FileSwitchDirectory` in Elasticsearch. Instead we use a trick similar to the one we use for HybridFS and subclass the mmap directory directly.
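For the preload case, the same idea means keeping one mmap-based directory and deciding per file whether to preload it, rather than switching between two directories on the same path. Lucene's `MMapDirectory.setPreload(boolean)` applies to every file a given instance opens, which is presumably why the old setup needed two instances. The sketch below only models the per-file decision; `ModelPreloadDirectory` is hypothetical, and treating `*` as "preload everything" mirrors how the `index.store.preload` setting is documented, so check the docs for your version.

```java
import java.util.Set;

// Hypothetical model, not Elasticsearch code.
class PreloadDemo {
    public static void main(String[] args) {
        ModelPreloadDirectory dir = new ModelPreloadDirectory(Set.of("nvd", "dvd"));
        System.out.println(dir.openInput("_0.dvd")); // mmap+preload:_0.dvd
        System.out.println(dir.openInput("_0.cfs")); // mmap:_0.cfs
    }
}

// One mmap-style directory that decides per file whether to preload,
// so listing and pending deletes stay in a single instance.
final class ModelPreloadDirectory {
    private final Set<String> preloadExtensions;

    ModelPreloadDirectory(Set<String> preloadExtensions) {
        this.preloadExtensions = preloadExtensions;
    }

    static String extension(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // "*" preloads every file; otherwise only the configured extensions.
    boolean shouldPreload(String name) {
        return preloadExtensions.contains("*") || preloadExtensions.contains(extension(name));
    }

    // Stand-in for opening an input with or without preloading.
    String openInput(String name) {
        return (shouldPreload(name) ? "mmap+preload:" : "mmap:") + name;
    }
}
```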