Reproducible SearchWithRandomIOExceptionsIT failure #15754

Closed
jasontedor opened this Issue Jan 4, 2016 · 2 comments

@jasontedor
Member

jasontedor commented Jan 4, 2016

This test fails rarely and the failures are often not reproducible, but I hit a new failure on my local CI, on a feature branch, that reliably reproduces on master too:

$ gradle \
> :core:clean \
> :core:integTest \
> -Dtests.seed=7B8A12D17560A5D \
> -Dtests.class=org.elasticsearch.search.basic.SearchWithRandomIOExceptionsIT \
> -Dtests.method=testRandomDirectoryIOExceptions

Take note that the output logs approach 45 MB in size.

I ran git bisect, and it looks like this failure was introduced by fcfd98e.

@bleskes

Member

bleskes commented Jan 4, 2016

@jasontedor Was this a CI run? If so, can you add a link? Also, a stack trace would be good, just to know at a glance what failed.

@jasontedor

Member

jasontedor commented Jan 4, 2016

@bleskes It failed on my local CI, not the public CI, and the entire log output is 45 MB. Here's a relevant snippet of the logs:

[2016-01-03 14:26:38,400][WARN ][org.elasticsearch.index.engine] [node_s0] [test][1] failed engine [index]
java.io.IOException: a random IOException (_0.fdx)
    at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOException(MockDirectoryWrapper.java:445)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:151)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:127)
    at org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
    at org.apache.lucene.codecs.CodecUtil.writeHeader(CodecUtil.java:91)
    at org.apache.lucene.codecs.CodecUtil.writeIndexHeader(CodecUtil.java:134)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
    at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
    at org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:49)
    at org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:81)
    at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:258)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:295)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1256)
    at org.elasticsearch.index.engine.InternalEngine.innerIndex(InternalEngine.java:407)
    at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:358)
    at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:503)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnReplica(TransportIndexAction.java:187)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:169)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:64)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:380)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:286)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:283)
    at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:296)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-01-03 14:26:38,403][WARN ][org.elasticsearch.action.index] [node_s1] [test][1] failed to perform indices:data/write/index[r] on node {node_s0}{TCOt3YwhSvefWkM7AUfErQ}{local}{local[59]}[mode=>local]
RemoteTransportException[[node_s1][local[60]][indices:data/write/index[r]]]; nested: IndexFailedEngineException[Index failed for [type#5]]; nested: NotSerializableExceptionWrapper[a random IOException (_0.fdx)];
Caused by: [test][[test][1]] IndexFailedEngineException[Index failed for [type#5]]; nested: NotSerializableExceptionWrapper[a random IOException (_0.fdx)];
    at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:363)
    at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:503)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnReplica(TransportIndexAction.java:187)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:169)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:64)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:380)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:286)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:283)
    at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:296)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: NotSerializableExceptionWrapper[a random IOException (_0.fdx)]
    at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOException(MockDirectoryWrapper.java:445)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:151)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:127)
    at org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
    at org.apache.lucene.codecs.CodecUtil.writeHeader(CodecUtil.java:91)
    at org.apache.lucene.codecs.CodecUtil.writeIndexHeader(CodecUtil.java:134)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
    at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
    at org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:49)
    at org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:81)
    at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:258)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:295)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1256)
    at org.elasticsearch.index.engine.InternalEngine.innerIndex(InternalEngine.java:407)
    at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:358)
    ... 13 more

@bleskes Let me know if you want me to share the entire log output somewhere, but since this seems to reproduce 100% of the time with this seed, I don't think that will be needed.
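
For readers unfamiliar with seeded fault injection, here is a minimal sketch of why a fixed seed can make a "random" IOException repeatable. It is not Lucene's `MockDirectoryWrapper`; `SeededFailureSketch`, `FlakyOutputStream`, and `firstFailure` are hypothetical names made up for this illustration, and the real test also depends on factors such as thread scheduling that this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

// Hypothetical illustration only; not the Lucene or Elasticsearch test framework.
final class SeededFailureSketch {

    /** Output stream that fails each single-byte write with a fixed probability. */
    static final class FlakyOutputStream extends FilterOutputStream {
        private final Random random;
        private final double failureRate;

        FlakyOutputStream(OutputStream delegate, long seed, double failureRate) {
            super(delegate);
            this.random = new Random(seed); // all "randomness" derives from the seed
            this.failureRate = failureRate;
        }

        @Override
        public void write(int b) throws IOException {
            if (random.nextDouble() < failureRate) {
                throw new IOException("a random IOException (simulated)");
            }
            out.write(b);
        }
    }

    /** Returns the index of the first write that fails, or -1 if none fails. */
    static int firstFailure(long seed, double failureRate, int maxBytes) {
        OutputStream out = new FlakyOutputStream(new ByteArrayOutputStream(), seed, failureRate);
        for (int i = 0; i < maxBytes; i++) {
            try {
                out.write(i);
            } catch (IOException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long seed = 0x7B8A12D17560A5DL; // the seed from the command above, reused as a plain long
        int first = firstFailure(seed, 0.01, 10_000);
        int second = firstFailure(seed, 0.01, 10_000);
        // Both runs consume the same pseudo-random sequence, so they fail at the same write.
        System.out.println("same failure point: " + (first == second));
    }
}
```

Because every decision to fail is drawn from a generator initialised with the seed, rerunning with the same seed walks the same sequence and fails at the same write, which is the property that makes a command line with `-Dtests.seed=...` worth sharing.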

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 4, 2016

Close recovered translog readers if createWriter fails
If we fail to create a writer, none of the recovered translog readers are
closed today, which leaks all of their open files.

Closes elastic#15754
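
The pattern the commit message describes can be sketched roughly as follows. This is not the actual Elasticsearch translog code: `TranslogSketch`, `Reader`, `WriterFactory`, and `open` are hypothetical stand-ins, and the sketch only assumes that readers are opened first and the writer is created afterwards.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real Elasticsearch translog classes.
final class TranslogSketch {

    /** Fake reader that only records whether it has been closed. */
    static final class Reader implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Creates the writer; may throw to simulate a failing createWriter. */
    interface WriterFactory {
        Closeable create() throws IOException;
    }

    final List<Reader> readers;
    final Closeable writer;

    private TranslogSketch(List<Reader> readers, Closeable writer) {
        this.readers = readers;
        this.writer = writer;
    }

    /** Takes ownership of the recovered readers and closes them if the writer cannot be created. */
    static TranslogSketch open(List<Reader> recovered, WriterFactory factory) throws IOException {
        final Closeable writer;
        try {
            writer = factory.create();
        } catch (IOException | RuntimeException e) {
            // Without this cleanup the already-opened readers would leak their file handles.
            for (Reader reader : recovered) {
                reader.close();
            }
            throw e;
        }
        return new TranslogSketch(recovered, writer);
    }

    public static void main(String[] args) {
        List<Reader> recovered = Arrays.asList(new Reader(), new Reader());
        try {
            open(recovered, () -> {
                throw new IOException("simulated createWriter failure");
            });
        } catch (IOException e) {
            System.out.println("open failed: " + e.getMessage());
        }
        for (Reader reader : recovered) {
            System.out.println("reader closed: " + reader.closed);
        }
    }
}
```

The point of the sketch is ownership: once the readers have been handed to `open`, any failure before the object is fully constructed has to release them, since no caller holds a reference that could close them later.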

@s1monw s1monw closed this in #15762 Jan 5, 2016
