[CI] CorruptedFileIT.testCorruptFileThenSnapshotAndRestore test failure #41201

Closed
jakelandis opened this issue Apr 15, 2019 · 12 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

@jakelandis
Contributor

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/1053/testReport/junit/org.elasticsearch.index.store/CorruptedFileIT/testCorruptFileThenSnapshotAndRestore/

REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore" -Dtests.seed=F06B870781995DCA -Dtests.security.manager=true -Dtests.locale=vi-VN -Dtests.timezone=Africa/Brazzaville -Dcompiler.java=12 -Druntime.java=8

java.lang.AssertionError: 
Expected: not null
     but: was null
	at __randomizedtesting.SeedInfo.seed([F06B870781995DCA:3DBF3D86138517A9]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore(CorruptedFileIT.java:535)

Does not reproduce locally.

Related: #30577 #36526 #26773 #19591 #8516

@jakelandis jakelandis added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Apr 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear original-brownbear self-assigned this Apr 15, 2019
@bizybot
Contributor

bizybot commented Apr 29, 2019

It failed again:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=sles-12/362/console

I could not reproduce this locally:

REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore" \
  -Dtests.seed=3E7D80459EDC1BBD \
  -Dtests.security.manager=true \
  -Dtests.locale=bs-Latn-BA \
  -Dtests.timezone=Europe/Malta \
  -Dcompiler.java=12 \
  -Druntime.java=11
  2> java.lang.AssertionError:
    Expected: not null
         but: was null
        at __randomizedtesting.SeedInfo.seed([3E7D80459EDC1BBD:F3A93AC40CC051DE]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore(CorruptedFileIT.java:535)

consoleText.txt

@ywelsch
Contributor

ywelsch commented May 7, 2019

The problem is #24800: the corruption is not properly detected because Lucene suppresses the underlying org.apache.lucene.index.CorruptIndexException (it is attached as a suppressed exception rather than thrown), so the shard is never marked as corrupted.

CodecUtil.checkFooter is the problematic method, hiding the corruption. One solution would be to do even more ambitious scanning for corruptions in ExceptionsHelper.unwrapCorruption, also checking suppressed exceptions. Ideally, Lucene would properly bubble things up.
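To make the masking concrete, here is a simplified Java model of the pattern the trace shows: the reader decodes a field, records any failure as a "prior" exception, and only then verifies the checksum. On a mismatch, the CorruptIndexException is attached to the prior exception via addSuppressed, and the prior exception (here an EOFException) is what gets rethrown. This is a sketch of the behavior described above, not Lucene's CodecUtil code. FooterCheckSketch, readString and the nested CorruptIndexException stand-in are hypothetical names, and the JDK's CRC32 stands in for Lucene's footer checksum.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class FooterCheckSketch {

    /** Hypothetical stand-in for org.apache.lucene.index.CorruptIndexException. */
    public static class CorruptIndexException extends IOException {
        public CorruptIndexException(String message) { super(message); }
    }

    /**
     * Simplified model of "check footer with a prior exception": if decoding already
     * failed, a checksum mismatch is only attached as a suppressed exception and the
     * original decoding error is rethrown.
     */
    public static void checkFooter(byte[] body, long expectedCrc, Throwable prior) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        long actual = crc.getValue();
        CorruptIndexException corruption = actual == expectedCrc ? null
            : new CorruptIndexException("checksum failed: expected=" + Long.toHexString(expectedCrc)
                + " actual=" + Long.toHexString(actual));
        if (prior == null) {
            if (corruption != null) throw corruption; // no decoding error: corruption is visible
            return;
        }
        if (corruption != null) prior.addSuppressed(corruption); // corruption is now hidden
        if (prior instanceof IOException) throw (IOException) prior;
        if (prior instanceof RuntimeException) throw (RuntimeException) prior;
        if (prior instanceof Error) throw (Error) prior;
        throw new IOException(prior);
    }

    /** Hypothetical reader: decodes a length-prefixed string, then verifies the checksum. */
    public static String readString(byte[] body, long expectedCrc) throws IOException {
        Throwable prior = null;
        String result = null;
        try {
            ByteBuffer buf = ByteBuffer.wrap(body);
            int len = buf.get() & 0xFF;
            if (len > buf.remaining()) throw new EOFException("read past EOF");
            byte[] chars = new byte[len];
            buf.get(chars);
            result = new String(chars, StandardCharsets.UTF_8);
        } catch (Throwable t) {
            prior = t; // remember the decoding failure instead of throwing immediately
        }
        checkFooter(body, expectedCrc, prior);
        return result;
    }
}
```

With a corrupted length byte, readString throws the EOFException, and the checksum failure is reachable only through getSuppressed(), which is why a search over the cause chain alone does not see it.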

Relevant log line:

[2019-04-29T01:58:22,186][WARN ][o.e.s.SnapshotShardsService] [node_td2] [[test][0]][test-repo:test-snap/rPECoZ4kTR2HVeuYfIX4Xw] failed to snapshot shard
  1> org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to snapshot
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:378) ~[main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:314) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:834) [?:?]
  1> Caused by: org.elasticsearch.index.engine.FlushFailedEngineException: Flush failed
  1> 	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1738) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1961) ~[main/:?]
  1> 	at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1145) ~[main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) ~[main/:?]
  1> 	... 6 more
  1> Caused by: org.elasticsearch.index.engine.RefreshFailedEngineException: Refresh failed
  1> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1574) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1733) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1961) ~[main/:?]
  1> 	at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1145) ~[main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) ~[main/:?]
  1> 	... 6 more
  1> Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/sles-12/server/build/testrun/integTest/temp/org.elasticsearch.index.store.CorruptedFileIT_3E7D80459EDC1BBD-001/tempDir-004/data/nodes/2/indices/7oAHZOOITV-g1okW9tqj6g/0/index/_1_2.fnm")
  1> 	at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:144) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.store.MockIndexInputWrapper.readBytes(MockIndexInputWrapper.java:146) ~[lucene-test-framework-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:49) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.store.DataInput.readString(DataInput.java:237) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:130) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:190) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:138) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:106) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:526) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1551) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1733) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1961) ~[main/:?]
  1> 	at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1145) ~[main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) ~[main/:?]
  1> 	... 6 more
  1> 	Suppressed: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=4ea0fff3 actual=8506c533 (resource=BufferedChecksumIndexInput(MockIndexInputWrapper(NIOFSIndexInput(path="/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/sles-12/server/build/testrun/integTest/temp/org.elasticsearch.index.store.CorruptedFileIT_3E7D80459EDC1BBD-001/tempDir-004/data/nodes/2/indices/7oAHZOOITV-g1okW9tqj6g/0/index/_1_2.fnm"))))
  1> 		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:190) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:138) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:106) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:526) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1551) ~[main/:?]
  1> 		at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1733) ~[main/:?]
  1> 		at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1961) ~[main/:?]
  1> 		at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1145) ~[main/:?]
  1> 		at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) ~[main/:?]
  1> 		at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:314) [main/:?]
  1> 		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [main/:?]
  1> 		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 		at java.lang.Thread.run(Thread.java:834) [?:?]

@original-brownbear can you extend corruption detection so that it handles suppressed corruption exceptions as well?
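A minimal sketch of the requested extension, assuming the goal is only to find a corruption exception anywhere in the cause chain or among suppressed exceptions: it walks both breadth-first, with an identity set to guard against cycles. CorruptionScan and its nested CorruptIndexException stand-in are hypothetical, and this is not the actual ExceptionsHelper.unwrapCorruption change that fixed the issue. A real version would match Lucene's CorruptIndexException (and possibly the index-format exceptions) instead of the stand-in.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

final class CorruptionScan {

    /** Hypothetical stand-in for org.apache.lucene.index.CorruptIndexException. */
    public static class CorruptIndexException extends IOException {
        public CorruptIndexException(String message) { super(message); }
    }

    private CorruptionScan() {}

    /**
     * Breadth-first search through the cause chain AND the suppressed exceptions of
     * every throwable reached; returns the first corruption exception found, or null.
     */
    public static IOException unwrapCorruption(Throwable root) {
        if (root == null) return null;
        Deque<Throwable> queue = new ArrayDeque<>();
        // Identity-based set so a cause/suppressed cycle cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        queue.add(root);
        while (!queue.isEmpty()) {
            Throwable t = queue.poll();
            if (!seen.add(t)) continue;
            if (t instanceof CorruptIndexException) return (IOException) t;
            for (Throwable s : t.getSuppressed()) queue.add(s); // the part a cause-only walk misses
            if (t.getCause() != null) queue.add(t.getCause());
        }
        return null;
    }
}
```

In the failure above, a cause-only walk starting at the FlushFailedEngineException stops at the EOFException. Adding getSuppressed() to the walk is what reaches the checksum failure.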

@original-brownbear
Member

@ywelsch sure on it :) Thanks for the input!

@benwtrent
Member

Another build failed due to this issue:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+artifactory/720/console

Reproduce:

./gradlew :server:integTest --tests "org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore" -Dtests.seed=8B88A8ECA7AD4D75 -Dtests.security.manager=true -Dtests.locale=hsb-DE -Dtests.timezone=America/Rankin_Inlet -Dcompiler.java=12 -Druntime.java=11

Trace:

org.elasticsearch.index.store.CorruptedFileIT > testCorruptFileThenSnapshotAndRestore FAILED
    java.lang.AssertionError: 
    Expected: not null
         but: was null
        at __randomizedtesting.SeedInfo.seed([8B88A8ECA7AD4D75:465C126D35B10716]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore(CorruptedFileIT.java:535)

Further in the logs, I see errors similar to the ones pointed out by @ywelsch:

  1> java.io.IOException: Invalid vInt detected (too many bits)
  1> 	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:141) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:157) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:190) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:138) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:106) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:526) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1552) [main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1734) [main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1962) [main/:?]
  1> 	at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1152) [main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) [main/:?]
  1> 	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:314) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:834) [?:?]
  1> 	Suppressed: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=eb80550 actual=b46576eb (resource=BufferedChecksumIndexInput(MockIndexInputWrapper(SimpleFSIndexInput(path="/var/lib/jenkins/workspace/elastic+elasticsearch+master+artifactory/server/build/testrun/integTest/temp/org.elasticsearch.index.store.CorruptedFileIT_8B88A8ECA7AD4D75-001/tempDir-003/data/nodes/4/indices/r_TsujHzT6q0Pz8OTddEAw/0/index/_1_2.fnm"))))
  1> 		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:462) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:190) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:138) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:106) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:526) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.1.0-snapshot-e460356abe.jar:8.1.0-snapshot-e460356abe e460356abeb1bd075a885d905a1d0873469bbd43 - jimczi - 2019-04-08 13:32:47]
  1> 		at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1552) [main/:?]
  1> 		at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1734) [main/:?]
  1> 		at org.elasticsearch.index.engine.InternalEngine.acquireLastIndexCommit(InternalEngine.java:1962) [main/:?]
  1> 		at org.elasticsearch.index.shard.IndexShard.acquireLastIndexCommit(IndexShard.java:1152) [main/:?]
  1> 		at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:367) [main/:?]
  1> 		at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:314) [main/:?]
  1> 		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [main/:?]
  1> 		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 		at java.lang.Thread.run(Thread.java:834) [?:?]
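The primary exception in this second log is a decoding error rather than a checksum error: a corrupted byte in the .fnm file makes a vInt look longer than 32 bits. The sketch below models vInt decoding (7 payload bits per byte, high bit meaning "more bytes follow", at most 5 bytes for an int) to show how a single bad byte produces "Invalid vInt detected (too many bits)" before the footer checksum is compared. VIntSketch is a hypothetical class written for illustration, not Lucene's DataInput.

```java
import java.io.EOFException;
import java.io.IOException;

final class VIntSketch {
    private final byte[] data;
    private int pos;

    public VIntSketch(byte[] data) { this.data = data; }

    private byte readByte() throws EOFException {
        if (pos >= data.length) throw new EOFException("read past EOF");
        return data[pos++];
    }

    /** Decodes a variable-length int: 7 payload bits per byte, high bit set means more bytes follow. */
    public int readVInt() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 28; shift += 7) {
            byte b = readByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result; // continuation bit clear: done
        }
        // Fifth byte: only its low 4 bits still fit into a 32-bit int.
        byte b = readByte();
        result |= (b & 0x0F) << 28;
        if ((b & 0xF0) == 0) return result;
        // Any higher bit here means the stream is not a valid int, e.g. a corrupted byte.
        throw new IOException("Invalid vInt detected (too many bits)");
    }
}
```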

@original-brownbear
Member

Fixed in #41889.

@danielmitterdorfer
Member

We had another test failure in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.1+multijob-unix-compatibility/os=amazon/14, i.e. on the 7.1 branch.

@original-brownbear do you intend to backport #41889 to that branch as well?

For reference:

  • Reproduction line: ./gradlew :server:integTest --tests "org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore" -Dtests.seed=92FC281AD50DD500 -Dtests.security.manager=true -Dtests.locale=sq -Dtests.timezone=Iran -Dcompiler.java=12 -Druntime.java=8
  • Failure output

@original-brownbear
Member

@danielmitterdorfer yea I guess I should, unless @ywelsch has any objections to backporting it further?

@ywelsch
Contributor

ywelsch commented May 28, 2019

++ to backport

@dimitris-athanasiou
Contributor

@original-brownbear
Member

Ah oops, backporting it now :)

@original-brownbear
Member

Backported to 7.1 now :)

8 participants