Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Failure in SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot #63586

Closed
original-brownbear opened this issue Oct 13, 2020 · 1 comment · Fixed by #63911
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team (obsolete) >test-failure Triaged test failures from CI

Comments

@original-brownbear
Copy link
Member

This failed both on 7.x and master.

Given enough iterations (I'm able to reproduce this locally):


REPRODUCE WITH: ./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot" -Dtests.seed=15BC13A4BD1F1DBF -Dtests.security.manager=true -Dtests.locale=ga-IE -Dtests.timezone=Africa/Monrovia -Druntime.java=11

relatively reliably fails with:

java.lang.AssertionError: Expected no direct read for _1_6.fnm of shard [khpgkyjndq][3], node[Ey_vMbkqSiOr6XnG0FovcQ], [P], s[STARTED], a[id=wolGObVwS0eqfqJGMi_rRg]
Expected: <0L>
     but: was <1L>
	at __randomizedtesting.SeedInfo.seed([15BC13A4BD1F1DBF:9FDF64DF2E884C0C]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.assertSearchableSnapshotStats(SearchableSnapshotsIntegTests.java:810)
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot(SearchableSnapshotsIntegTests.java:262)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda

as it did in https://gradle-enterprise.elastic.co/s/4vrinfmyhhgyo

@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Oct 13, 2020
@original-brownbear original-brownbear self-assigned this Oct 13, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team (obsolete) label Oct 13, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 20, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure elastic#63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes elastic#63586
original-brownbear added a commit that referenced this issue Oct 23, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure #63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes #63586
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 26, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure elastic#63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes elastic#63586
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 26, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure elastic#63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes elastic#63586
original-brownbear added a commit that referenced this issue Oct 26, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure #63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes #63586
original-brownbear added a commit that referenced this issue Oct 26, 2020
Replacing the mechanism for eviction and listener references via a read-write lock by
a reference counting implementation.
This fixes a bug that caused test failure #63586 in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes lead to throwing an exception
since the `tryLock` call on the read lock would fail in this case.
Also this removes the possibility of blocking cluster state updates as a result of them waiting
on the write-lock which might take a long time if a slow read operation executes concurrently.

Closes #63586
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team (obsolete) >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants