Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] InternalEngineTests#testInternalVersioningOnPrimary failure #37684

Closed
costin opened this issue Jan 22, 2019 · 3 comments
Closed

[CI] InternalEngineTests#testInternalVersioningOnPrimary failure #37684

costin opened this issue Jan 22, 2019 · 3 comments
Assignees
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI

Comments

@costin
Copy link
Member

costin commented Jan 22, 2019

Occurs in the 6.x line
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=opensuse/195/console

Expected: <4L>
     but: was <5L>
	at __randomizedtesting.SeedInfo.seed([E35F24F40F87E9B9:DB326DAF801E8211]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.index.engine.InternalEngineTests.assertOpsOnPrimary(InternalEngineTests.java:1908)
	at org.elasticsearch.index.engine.InternalEngineTests.testInternalVersioningOnPrimary(InternalEngineTests.java:1802)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

Might be affected by an index refresh:

  1> [2019-01-22T08:33:28,902][WARN ][o.e.i.e.Engine           ] [[Thread-98]] [index][0] failed engine [refresh failed source[test]]
  1> org.apache.lucene.store.MockDirectoryWrapper$FakeIOException: null
  1> 	at org.elasticsearch.index.engine.InternalEngineTests$11.afterRefresh(InternalEngineTests.java:3782) ~[test/:?]
  1> 	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:285) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
  1> 	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:190) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
  1> 	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
  1> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1621) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1602) ~[main/:?]
  1> 	at org.elasticsearch.index.engine.InternalEngineTests.lambda$testFailEngineOnRandomIO$35(InternalEngineTests.java:3807) ~[test/:?]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
@costin costin added >test-failure Triaged test failures from CI :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels Jan 22, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@albertzaharovits
Copy link
Contributor

Might be related ?

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/5401/console

10:08:39   1> 	at org.elasticsearch.index.engine.InternalEngine.forceMerge(InternalEngine.java:1866) ~[main/:?]
10:08:39   1> 	at org.elasticsearch.index.engine.InternalEngineTests$6.run(InternalEngineTests.java:3036) ~[test/:?]
10:08:39   1> [2019-01-22T16:38:30,288][INFO ][o.e.i.e.InternalEngineTests] [testShardNotAvailableExceptionWhenEngineClosedConcurrently] after test
10:08:39   1> [2019-01-22T16:38:30,292][INFO ][o.e.i.e.InternalEngineTests] [testVersionMapAfterAutoIDDocument] before test
10:08:39   1> [2019-01-22T16:38:30,603][INFO ][o.e.i.e.InternalEngineTests] [testVersionMapAfterAutoIDDocument] after test
10:08:39   1> [2019-01-22T16:38:30,607][INFO ][o.e.i.e.InternalEngineTests] [testDoubleDeliveryReplicaAppendingOnly] before test
10:08:39   1> [2019-01-22T16:38:30,793][INFO ][o.e.i.e.InternalEngineTests] [testDoubleDeliveryReplicaAppendingOnly] after test
10:08:39   1> [2019-01-22T16:38:30,796][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] before test
10:08:39   1> [2019-01-22T16:38:30,858][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] performing [D]
10:08:39   1> [2019-01-22T16:38:30,859][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] performing [D]
10:08:39   1> [2019-01-22T16:38:30,859][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] performing [D] (versioned -4, seqNo -2, term 0 )
10:08:39   1> [2019-01-22T16:38:30,859][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] performing [D] (versioned -4, seqNo -2, term 0 )
10:08:39   1> [2019-01-22T16:38:30,859][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] performing [D] (conflict 3) (versioned -4, seqNo -2, term 0 )
10:08:39   1> [2019-01-22T16:38:31,112][INFO ][o.e.i.e.InternalEngineTests] [testVersionOnPrimaryWithConcurrentRefresh] after test
10:08:39 FAILURE 0.33s J1 | InternalEngineTests.testVersionOnPrimaryWithConcurrentRefresh <<< FAILURES!
10:08:39    > Throwable #1: java.lang.AssertionError: 
10:08:39    > Expected: <4L>
10:08:39    >      but: was <5L>
10:08:39    > 	at __randomizedtesting.SeedInfo.seed([10ECBEF210621EE7:D83EB03EE0D7155A]:0)
10:08:39    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
10:08:39    > 	at org.elasticsearch.index.engine.InternalEngineTests.assertOpsOnPrimary(InternalEngineTests.java:1887)
10:08:39    > 	at org.elasticsearch.index.engine.InternalEngineTests.testVersionOnPrimaryWithConcurrentRefresh(InternalEngineTests.java:1796)
10:08:39    > 	at java.lang.Thread.run(Thread.java:748)
10:08:39   1> [2019-01-22T16:38:31,152][INFO ][o.e.i.e.InternalEngineTests] [testEngineMaxTimestampIsInitialized] before test
10:08:39   1> [2019-01-22T16:38:31,902][INFO ][o.e.i.e.InternalEngineTests] [testEngineMaxTimestampIsInitialized] after test
10:08:39 ERROR   0.78s J1 | InternalEngineTests.testEngineMaxTimestampIsInitialized <<< FAILURES!
10:08:39    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=342, name=Thread-85, state=RUNNABLE, group=TGRP-InternalEngineTests]
10:08:39    > 	at __randomizedtesting.SeedInfo.seed([10ECBEF210621EE7:D30C9F7E415913F]:0)
10:08:39    > Caused by: org.apache.lucene.store.AlreadyClosedException: [index][0] engine is closed
10:08:39    > 	at __randomizedtesting.SeedInfo.seed([10ECBEF210621EE7]:0)
10:08:39    > 	at org.elasticsearch.index.engine.Engine.ensureOpen(Engine.java:761)
10:08:39    > 	at org.elasticsearch.index.engine.Engine.ensureOpen(Engine.java:770)
10:08:39    > 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1532)
10:08:39    > 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1521)
10:08:39    > 	at org.elasticsearch.index.engine.InternalEngineTests.lambda$testVersionOnPrimaryWithConcurrentRefresh$11(InternalEngineTests.java:1791)
10:08:39    > 	at java.lang.Thread.run(Thread.java:748)
10:08:39   1> [2019-01-22T16:38:31,917][INFO ][o.e.i.e.InternalEngineTests] [testLuceneSnapshotRefreshesOnlyOnce] before test
10:08:39   1> [2019-01-22T16:38:32,262][INFO ][o.e.i.e.InternalEngineTests] [testLuceneSnapshotRefreshesOnlyOnce] after test
10:08:39   1> [2019-01-22T16:38:32,269][INFO ][o.e.i.e.InternalEngineTests] [testRecoveryFromTranslogUpToSeqNo] before test
10:08:39   1> [2019-01-22T16:38:32,450][INFO ][o.e.i.e.InternalEngineTests] [testRecoveryFromTranslogUpToSeqNo] after test
10:08:39   1> [2019-01-22T16:38:32,453][INFO ][o.e.i.e.InternalEngineTests] [testSegments] before test
10:08:39   1> [2019-01-22T16:38:32,637][INFO ][o.e.i.e.InternalEngineTests] [testSegments] after test
10:08:39   1> [2019-01-22T16:38:32,640][INFO ][o.e.i.e.InternalEngineTests] [testConcurrentExternalVersioningOnPrimary] before test
10:08:39   1> [2019-01-22T16:38:32,827][INFO ][o.e.i.e.InternalEngineTests] [testConcurrentExternalVersioningOnPrimary] after test
10:08:39   1> [2019-01-22T16:38:32,829][INFO ][o.e.i.e.InternalEngineTests] [testCurrentTranslogIDisCommitted] before test
10:08:39   1> [2019-01-22T16:38:33,112][INFO ][o.e.i.e.InternalEngineTests] [testCurrentTranslogIDisCommitted] after test
10:08:39   1> [2019-01-22T16:38:33,115][INFO ][o.e.i.e.InternalEngineTests] [testSegmentsStatsIncludingFileSizes] before test
10:08:39   1> [2019-01-22T16:38:33,363][INFO ][o.e.i.e.InternalEngineTests] [testSegmentsStatsIncludingFileSizes] after test
10:08:39   1> [2019-01-22T16:38:33,366][INFO ][o.e.i.e.InternalEngineTests] [testNoOps] before test
10:08:39   1> [2019-01-22T16:38:33,479][INFO ][o.e.i.e.InternalEngineTests] [testNoOps] after test
10:08:39   1> [2019-01-22T16:38:33,482][INFO ][o.e.i.e.InternalEngineTests] [testSequenceNumberAdvancesToMaxSeqOnEngineOpenOnPrimary] before test
10:08:39   1> [2019-01-22T16:38:33,624][INFO ][o.e.i.e.InternalEngineTests] [testSequenceNumberAdvancesToMaxSeqOnEngineOpenOnPrimary] after test
10:08:39   1> [2019-01-22T16:38:33,627][INFO ][o.e.i.e.InternalEngineTests] [testSequenceIDs] before test
10:08:39   1> [2019-01-22T16:38:33,665][INFO ][o.e.i.e.InternalEngineTests] [testSequenceIDs] --> got seqID: Tuple [v1=0, v2=4794505393173414857]
10:08:39   1> [2019-01-22T16:38:33,668][INFO ][o.e.i.e.InternalEngineTests] [testSequenceIDs] --> got seqID: Tuple [v1=1, v2=4794505393173414857]
10:08:39   1> [2019-01-22T16:38:33,679][INFO ][o.e.i.e.InternalEngineTests] [testSequenceIDs] --> got seqID: Tuple [v1=2, v2=3]
10:08:39   1> [2019-01-22T16:38:33,774][INFO ][o.e.i.e.InternalEngineTests] [testSequenceIDs] after test
10:08:39   1> [2019-01-22T16:38:33,777][INFO ][o.e.i.e.InternalEngineTests] [testDoubleDeliveryReplicaAppendingAndDeleteOnly] before test
10:08:39   1> [2019-01-22T16:38:33,943][INFO ][o.e.i.e.InternalEngineTests] [testDoubleDeliveryReplicaAppendingAndDeleteOnly] after test
10:08:39   1> [2019-01-22T16:38:33,946][INFO ][o.e.i.e.InternalEngineTests] [testMinGenerationForSeqNo] before test
10:08:39   1> [2019-01-22T16:38:34,362][INFO ][o.e.i.e.InternalEngineTests] [testMinGenerationForSeqNo] after test
10:08:39   1> [2019-01-22T16:38:34,365][INFO ][o.e.i.e.InternalEngineTests] [testLuceneHistoryOnReplica] before test
10:08:39   1> [2019-01-22T16:38:34,651][INFO ][o.e.i.e.InternalEngineTests] [testLuceneHistoryOnReplica] after test
10:08:39   1> [2019-01-22T16:38:34,655][INFO ][o.e.i.e.InternalEngineTests] [testKeepTranslogAfterGlobalCheckpoint] before test
10:08:39   1> [2019-01-22T16:38:34,703][INFO ][o.e.c.s.IndexScopedSettings] [testKeepTranslogAfterGlobalCheckpoint] updating [index.translog.retention.age] from [12h] to [-1]
10:08:39   1> [2019-01-22T16:38:34,703][INFO ][o.e.c.s.IndexScopedSettings] [testKeepTranslogAfterGlobalCheckpoint] updating [index.translog.retention.size] from [512mb] to [512b]
10:08:39   1> [2019-01-22T16:38:34,704][INFO ][o.e.c.s.IndexScopedSettings] [testKeepTranslogAfterGlobalCheckpoint] updating [index.translog.retention.age] from [12h] to [-1]
10:08:39   1> [2019-01-22T16:38:34,704][INFO ][o.e.c.s.IndexScopedSettings] [testKeepTranslogAfterGlobalCheckpoint] updating [index.translog.retention.size] from [512mb] to [512b]
10:08:39   2> janv. 22, 2019 4:38:31 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
10:08:39   2> AVERTISSEMENT: Uncaught exception in thread: Thread[Thread-85,5,TGRP-InternalEngineTests]
10:08:39   2> org.apache.lucene.store.AlreadyClosedException: [index][0] engine is closed
10:08:39   2> 	at __randomizedtesting.SeedInfo.seed([10ECBEF210621EE7]:0)
10:08:39   2> 	at org.elasticsearch.index.engine.Engine.ensureOpen(Engine.java:761)
10:08:39   2> 	at org.elasticsearch.index.engine.Engine.ensureOpen(Engine.java:770)
10:08:39   2> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1532)
10:08:39   2> 	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1521)
10:08:39   2> 	at org.elasticsearch.index.engine.InternalEngineTests.lambda$testVersionOnPrimaryWithConcurrentRefresh$11(InternalEngineTests.java:1791)
10:08:39   2> 	at java.lang.Thread.run(Thread.java:748)
10:08:39   2> REPRODUCE WITH: ./gradlew :server:unitTest -Dtests.seed=10ECBEF210621EE7 -Dtests.class=org.elasticsearch.index.engine.InternalEngineTests -Dtests.method="testEngineMaxTimestampIsInitialized" -Dtests.security.manager=true -Dtests.locale=fr-FR -Dtests.timezone=Asia/Yangon -Dcompiler.java=11 -Druntime.java=8

@dnhatn
Copy link
Member

dnhatn commented Jan 22, 2019

This is reproducible. I am on it.

dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Jan 23, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates elastic#36467
Closes elastic#37684
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Jan 23, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates elastic#36467
Closes elastic#37684
dnhatn added a commit that referenced this issue Jan 24, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates #36467
Closes #37684
dnhatn added a commit that referenced this issue Jan 28, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates #36467
Closes #37684
talevy pushed a commit to talevy/elasticsearch that referenced this issue Feb 15, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates elastic#36467
Closes elastic#37684
talevy added a commit that referenced this issue Feb 16, 2019
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates #36467
Closes #37684
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

4 participants