[CI] FlushIT testFlushOnInactive failing #97154

Open
tlrx opened this issue Jun 27, 2023 · 3 comments · Fixed by #97332
Assignees
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. low-risk An open issue or test failure that is a low risk to future releases Team:Distributed Meta label for distributed team >test-failure Triaged test failures from CI

Comments

@tlrx
Member

tlrx commented Jun 27, 2023

Looks similar to #87888 fixed a year ago.

The test failed today https://gradle-enterprise.elastic.co/s/bcbi54ntydxas and a month ago https://gradle-enterprise.elastic.co/s/22czbdpytknvi

Build scan:
https://gradle-enterprise.elastic.co/s/bcbi54ntydxas/tests/:server:internalClusterTest/org.elasticsearch.indices.flush.FlushIT/testFlushOnInactive

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.indices.flush.FlushIT.testFlushOnInactive" -Dtests.seed=58067D81DCE0CA07 -Dtests.locale=es-CU -Dtests.timezone=America/Bahia -Druntime.java=20

Applicable branches:
8.9, main

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.indices.flush.FlushIT&tests.test=testFlushOnInactive

Failure excerpt:

java.lang.AssertionError: 
Expected: <0>
     but: was <4>

  at __randomizedtesting.SeedInfo.seed([58067D81DCE0CA07:BF63E3C83F4FC9CB]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.indices.flush.FlushIT.lambda$testFlushOnInactive$1(FlushIT.java:136)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1172)
  at org.elasticsearch.indices.flush.FlushIT.testFlushOnInactive(FlushIT.java:134)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)
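
For readers unfamiliar with the failing frame: the trace shows the assertion being thrown from inside `ESTestCase.assertBusy`, which keeps re-running an assertion until it passes or a time limit is reached. The following is only a minimal standalone sketch of that retry pattern, not the Elasticsearch implementation; the class name `BusyAssert`, the back-off values, and the signature are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper illustrating a retry-until-timeout assertion. */
final class BusyAssert {
    private BusyAssert() {}

    /**
     * Re-runs the assertion until it passes or maxWait has elapsed.
     * On timeout the most recent AssertionError is rethrown, which is why
     * a report would show the last observed value (e.g. "was <4>").
     */
    static void assertBusy(Runnable assertion, long maxWait, TimeUnit unit) {
        final long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1; // assumed starting back-off
        while (true) {
            try {
                assertion.run();
                return; // assertion passed
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential back-off
        }
    }
}
```

With a helper like this, a condition that never becomes true (for example, a count of uncommitted operations that stays at 4 because no further flush is triggered) fails only after the full wait, carrying the last mismatch.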

@tlrx tlrx added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI labels Jun 27, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jun 27, 2023
@kingherc kingherc self-assigned this Jun 28, 2023
@kingherc
Contributor

This is reaaaally hard to reproduce. I had one local reproduction in I do not know how many hours of running, with no meaningful logs however to understand what's going on :/ I then turned on trace logging, and have never replicated it again.

I went back in time since Aug 22 (when #87888 was fixed) in gradle history to find failures and found the following:

So my impression is that the fix reduced the failures a lot, but did not cover all edge cases.

kingherc added a commit to kingherc/elasticsearch that referenced this issue Jul 3, 2023
Basically the active flag needs to be set to true after the op
has been processed by the engine. Else there is an edge case,
which may leave unprocessed ops and active false. Introducing
a test for this edge case.

Fixes elastic#97154
kingherc added a commit to kingherc/elasticsearch that referenced this issue Jul 3, 2023
Basically the active flag needs to be set to true after the op
has been processed by the engine. Else there is an edge case,
which may leave unprocessed ops and active false, without a next
flush on idle scheduled. Introducing a test for this edge case.

Fixes elastic#97154
kingherc added a commit that referenced this issue Jul 4, 2023
Basically the active flag needs to be set to true after the op
has been processed by the engine. Else there is an edge case,
which may leave unprocessed ops and active false, without a next
flush on idle scheduled. Introducing a test for this edge case.

Fixes #97154
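
The ordering problem described in the commit message can be illustrated with a small model. This is a deliberately simplified sketch and not the actual engine or shard code: the class `IdleFlushModel`, its method names, and the idea that the idle check flushes only when it flips the flag from true to false are all assumptions chosen to mirror the description above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a shard with an "active" flag and a flush-on-idle check. */
final class IdleFlushModel {
    private final AtomicBoolean active = new AtomicBoolean(false);
    private final AtomicInteger unflushedOps = new AtomicInteger();

    /** Problematic ordering: the flag is set before the op is processed. */
    void indexFlagFirst(Runnable betweenSteps) {
        active.set(true);
        betweenSteps.run();             // window in which the idle check may run
        unflushedOps.incrementAndGet(); // op processed after the flag was already reset
    }

    /** Ordering from the commit message: process the op, then set the flag. */
    void indexOpFirst(Runnable betweenSteps) {
        unflushedOps.incrementAndGet();
        betweenSteps.run();
        active.set(true);
    }

    /** Assumed idle check: flush only if the shard was active since the last check. */
    void checkIdle() {
        if (active.compareAndSet(true, false)) {
            unflushedOps.set(0); // stands in for a flush
        }
    }

    int unflushedOps() {
        return unflushedOps.get();
    }

    boolean isActive() {
        return active.get();
    }
}
```

In this model, calling `indexFlagFirst(model::checkIdle)` followed by another `checkIdle()` leaves one unflushed op with the flag false, so no later check will flush it. With `indexOpFirst`, the flag is still true after the interleaved check, so the next `checkIdle()` flushes the op.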
@volodk85
Contributor

volodk85 commented Feb 8, 2024

Unfortunately it's failing again :(

https://gradle-enterprise.elastic.co/s/kandwegr56fsy/console-log

REPRODUCE WITH: gradlew ':server:internalClusterTest' --tests "org.elasticsearch.indices.flush.FlushIT.testFlushOnInactive" -Dtests.seed=4441F78B2F9D464C -Dtests.locale=en-ZA -Dtests.timezone=Europe/Sofia -Druntime.java=21

org.elasticsearch.indices.flush.FlushIT > testFlushOnInactive FAILED
    java.lang.AssertionError: 
    Expected: <0>
         but: was <4>
        at __randomizedtesting.SeedInfo.seed([4441F78B2F9D464C:A32469C2CC324580]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.elasticsearch.indices.flush.FlushIT.lambda$testFlushOnInactive$7(FlushIT.java:491)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1143)
        at org.elasticsearch.indices.flush.FlushIT.testFlushOnInactive(FlushIT.java:489)

@volodk85 volodk85 reopened this Feb 8, 2024
@kingherc kingherc added the low-risk An open issue or test failure that is a low risk to future releases label Feb 15, 2024

4 participants