
IndexStatsIT#testFilterCacheStats fails #32506

Closed
dnhatn opened this issue Jul 31, 2018 · 19 comments · Fixed by #42091
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI

Comments

dnhatn (Member) commented Jul 31, 2018

I think this failure relates to Lucene-7.5.

I can't reproduce this locally with this command:

./gradlew :server:integTest -Dtests.seed=528FC320ADC916D8 \
-Dtests.class=org.elasticsearch.indices.stats.IndexStatsIT \
-Dtests.method="testFilterCacheStats" \
-Dtests.security.manager=true \
-Dtests.locale=kk-KZ \
-Dtests.timezone=America/Scoresbysund

FAILURE 0.63s J0 | IndexStatsIT.testFilterCacheStats <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: a value greater than <0L>
   >      but: <0L> was equal to <0L>
   > 	at __randomizedtesting.SeedInfo.seed([528FC320ADC916D8:37B03E651DC111A1]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats(IndexStatsIT.java:1046)
   > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   > 	at java.base/java.lang.Thread.run(Thread.java:834)

CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java10,ES_RUNTIME_JAVA=java11,nodes=virtual&&linux/210/consoleText

Log: testFilterCacheStats.log
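
For readers without the source handy, the failing assertion is over the query-cache ("filter cache") section of the index stats. Below is a minimal sketch of that kind of check, assuming an ESIntegTestCase-style internal-cluster test; the index name "index", the field, and the assertion shape are illustrative only, not the actual IndexStatsIT code.

import static org.hamcrest.Matchers.greaterThan;

import org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse;
import org.elasticsearch.index.cache.query.QueryCacheStats;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.test.ESIntegTestCase;

public class FilterCacheStatsSketchIT extends ESIntegTestCase {

    public void testQueryCacheStatsAreReported() {
        // Run a search with a cacheable filter so the query cache gets populated.
        client().prepareSearch("index")
                .setQuery(QueryBuilders.boolQuery()
                        .filter(QueryBuilders.termQuery("field", "value")))
                .get();

        // Read the query-cache section of the index stats.
        IndicesStatsResponse stats = client().admin().indices()
                .prepareStats("index")
                .setQueryCache(true)
                .get();
        QueryCacheStats queryCache = stats.getTotal().getQueryCache();

        // The CI failure above is an assertion of this shape reporting 0L.
        assertThat(queryCache.getMemorySizeInBytes(), greaterThan(0L));
    }
}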

dnhatn added the >test-failure (Triaged test failures from CI) and :Distributed/Engine (Anything around managing Lucene and the Translog in an open shard) labels Jul 31, 2018
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

dnhatn added a commit that referenced this issue Jul 31, 2018
dnhatn added a commit that referenced this issue Jul 31, 2018
dnhatn (Member, Author) commented Jul 31, 2018

I muted this test on master and 6.x.

original-brownbear (Member) commented Aug 1, 2018

I was able to reproduce this locally by simply running the test in a loop with @Repeat (it fails in roughly 1% of runs).
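
A minimal sketch of that setup, assuming the @Repeat annotation from the randomizedtesting framework the Elasticsearch test suite is built on (the iteration count is arbitrary):

import com.carrotsearch.randomizedtesting.annotations.Repeat;

import org.elasticsearch.test.ESIntegTestCase;

public class IndexStatsRepeatIT extends ESIntegTestCase {

    // Re-run the flaky method many times in one JVM to surface a ~1% failure rate.
    @Repeat(iterations = 200)
    public void testFilterCacheStats() throws Exception {
        // ... existing test body ...
    }
}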

dnhatn (Member, Author) commented Aug 1, 2018

@original-brownbear I haven't looked at this yet, but I think it is likely caused by a background merge.

original-brownbear self-assigned this Dec 2, 2018
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Dec 2, 2018
* Test randomly failed because of background merges
   * Fixed by force merging down to a single segment
* Closes elastic#32506
original-brownbear (Member) commented:

@dnhatn I spent a little time today reading up on the background here, and using trace logging I could verify that this is caused by background merges.
I tried a fix in #36143 that empirically works fine, but admittedly I'm not 100% sure that's the best/right approach to fixing this.
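
The #36143 change is described here as force-merging down to a single segment before the stats assertions. A hedged sketch of that idea, as it would sit inside an ESIntegTestCase test body (not the exact patch; the index name "index" is illustrative):

// Force-merge down to one segment so later background merges can't drop
// per-segment query-cache entries while the test is asserting on the stats.
ForceMergeResponse forceMerge = client().admin().indices()
        .prepareForceMerge("index")
        .setMaxNumSegments(1)
        .get();
assertEquals(0, forceMerge.getFailedShards());

// Make sure searches (and the stats they populate) see the merged reader.
refresh("index");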

original-brownbear added a commit that referenced this issue Dec 3, 2018
* Test randomly failed because of background merges
   * Fixed by force merging down to a single segment
* Closes #32506
original-brownbear added a commit that referenced this issue Dec 3, 2018
* Test randomly failed because of background merges
   * Fixed by force merging down to a single segment
* Closes #32506
pgomulka (Contributor) commented:

Seems like this happened again:
https://mail.google.com/mail/u/1/#search/build-elasticsearch%40elastic.co/FMfcgxwBTjvMmrGnFCTxMhzjZhTCmRJK

REPRODUCE WITH: ./gradlew :server:integTest \
  -Dtests.seed=8B6BB66DA52AD454 \
  -Dtests.class=org.elasticsearch.indices.stats.IndexStatsIT \
  -Dtests.method="testFilterCacheStats" \
  -Dtests.security.manager=true \
  -Dbuild.snapshot=false \
  -Dtests.jvm.argline="-Dbuild.snapshot=false" \
  -Dtests.locale=ar-SY \
  -Dtests.timezone=America/Ojinaga \
  -Dcompiler.java=11 \
  -Druntime.java=8

pgomulka reopened this Dec 17, 2018
original-brownbear (Member) commented:

Link for the above: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/221/

I'm not sure this is the same issue that led to the initial problem. The failure now happens in a busy-polling section of the test. I'll look into this.

original-brownbear (Member) commented:

I added debug logging for this test in 75bfbe9; I couldn't even reproduce a single failing assertion inside the busy assert that fails now. Maybe we can learn more from the log.
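
For context, per-test logging in this suite is usually raised with the test framework's @TestLogging annotation. A sketch follows; the logger names are assumptions for illustration, not necessarily what 75bfbe9 enabled:

import org.elasticsearch.test.ESIntegTestCase;
import org.elasticsearch.test.junit.annotations.TestLogging;

public class IndexStatsLoggingIT extends ESIntegTestCase {

    // Comma-separated "logger:level" pairs are applied only while this test runs.
    @TestLogging("org.elasticsearch.indices.IndicesQueryCache:TRACE,org.elasticsearch.index.engine:DEBUG")
    public void testFilterCacheStats() throws Exception {
        // ... existing test body ...
    }
}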

original-brownbear (Member) commented:

This hasn't happened during the last 17 days, it seems. It is conceivable that this was some freak timeout on very slow CI: the logging suggests that the assertBusy loop that failed above saw the error change on a scale of seconds, while the same loop takes < 50ms locally. Maybe we should close this?
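
For reference, the busy assert in question is ESTestCase.assertBusy, which retries the enclosed assertions with backoff until they pass or a timeout (10 seconds by default) expires. A sketch of the pattern, with an inner assertion that is illustrative only, not the real check:

// On a very slow CI worker an assertBusy block like this can keep failing for the
// whole timeout even though the same loop finishes in < 50ms locally.
assertBusy(() -> {
    IndicesStatsResponse stats = client().admin().indices()
            .prepareStats("index").setQueryCache(true).get();
    // Illustrative assertion; the real test checks several query-cache counters.
    assertEquals(0L, stats.getTotal().getQueryCache().getMissCount());
}, 10, TimeUnit.SECONDS);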

ywelsch (Contributor) commented Jan 11, 2019

There was another failure of this on January 4th, but the logs are missing:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1075/console

original-brownbear (Member) commented Jan 11, 2019

Right ... one day after I posted the above. I'm adding the logs of that failure here for reference and will look into it (now we have debug logging active for this one, at least).
20190104112126-D62D8056-log-981F-639551784945182935.txt

original-brownbear removed their assignment Jan 31, 2019
ywelsch (Contributor) commented Feb 1, 2019

No recent failures -> closing

ywelsch closed this as completed Feb 1, 2019
gwbrown (Contributor) commented Mar 22, 2019

This failed again on master:

Does not reproduce locally for me with:

./gradlew :server:integTest \
  -Dtests.seed=8574F392212DDB87 \
  -Dtests.class=org.elasticsearch.indices.stats.IndexStatsIT \
  -Dtests.method="testFilterCacheStats" \
  -Dtests.security.manager=true \
  -Dtests.locale=ga \
  -Dtests.timezone=SystemV/CST6CDT \
  -Dcompiler.java=12 \
  -Druntime.java=11

Stack trace:

Expected: <0L>
   >      but: was <1L>
   > 	at __randomizedtesting.SeedInfo.seed([8574F392212DDB87:E04B0ED79125DCFE]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.indices.stats.IndexStatsIT.lambda$testFilterCacheStats$2(IndexStatsIT.java:1047)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:862)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
   > 	at org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats(IndexStatsIT.java:1041)
   > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   > 	at java.base/java.lang.Thread.run(Thread.java:834)

Full build log:
consoleText.txt

gwbrown reopened this Mar 22, 2019
original-brownbear self-assigned this Mar 23, 2019
bizybot (Contributor) commented Apr 5, 2019

This failed again here:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/4457/console

Did not reproduce locally:

./gradlew :server:integTest \
  -Dtests.seed=E5247230732850E0 \
  -Dtests.class=org.elasticsearch.indices.stats.IndexStatsIT \
  -Dtests.method="testFilterCacheStats" \
  -Dtests.security.manager=true \
  -Dtests.locale=vi \
  -Dtests.timezone=America/Paramaribo \
  -Dcompiler.java=12 \
  -Druntime.java=8

Stack trace:

   > Throwable #1: java.lang.AssertionError: 
   > Expected: <0L>
   >      but: was <25L>
   > 	at __randomizedtesting.SeedInfo.seed([E5247230732850E0:801B8F75C3205799]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.indices.stats.IndexStatsIT.lambda$testFilterCacheStats$1(IndexStatsIT.java:1035)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:862)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
   > 	at org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats(IndexStatsIT.java:1030)
   > 	at java.lang.Thread.run(Thread.java:748)
   > 	Suppressed: java.lang.AssertionError: 

consoleText.txt

droberts195 (Contributor) commented:

This failed again in 7.0 in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/128/console

The error is:

  2> java.lang.AssertionError: 
    Expected: <0L>
         but: was <1L>
        at __randomizedtesting.SeedInfo.seed([DB1044CE003D83F0:BE2FB98BB0358489]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.indices.stats.IndexStatsIT.lambda$testFilterCacheStats$2(IndexStatsIT.java:1047)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:868)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:842)
        at org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats(IndexStatsIT.java:1041)

I got a failure the first time I tried to reproduce locally using:

./gradlew :server:integTest --tests "org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats" \
  -Dtests.seed=DB1044CE003D83F0 \
  -Dtests.security.manager=true \
  -Dtests.locale=lb \
  -Dtests.timezone=Asia/Aqtobe \
  -Dcompiler.java=12 \
  -Druntime.java=11

The error I got was:

  2> java.lang.AssertionError: 
    Expected: <0L>
         but: was <12L>
        at __randomizedtesting.SeedInfo.seed([DB1044CE003D83F0:BE2FB98BB0358489]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.indices.stats.IndexStatsIT.lambda$testFilterCacheStats$1(IndexStatsIT.java:1035)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:868)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:842)
        at org.elasticsearch.indices.stats.IndexStatsIT.testFilterCacheStats(IndexStatsIT.java:1030)

But it's not easily reproducible: I tried 5 more times and didn't see another failure.

ywelsch (Contributor) commented May 7, 2019

This continues to fail, e.g. here. @dnhatn what's the issue with this test? Any ideas on why it's failing?

davidkyle added a commit that referenced this issue May 9, 2019
davidkyle added a commit that referenced this issue May 9, 2019
davidkyle (Member) commented:

Another failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=oraclelinux-7/382/console

This has failed PR builds and fails frequently enough to warrant muting.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 9, 2019
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce elastic#32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
original-brownbear added a commit that referenced this issue May 9, 2019
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce #32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 10, 2019
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce elastic#32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
original-brownbear added a commit that referenced this issue May 10, 2019
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce #32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
DaveCTurner pushed a commit that referenced this issue May 15, 2019
DaveCTurner pushed a commit that referenced this issue May 15, 2019
DaveCTurner (Contributor) commented:

I have also muted this in 7.1 and 7.0:

dnhatn added a commit that referenced this issue May 15, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
dnhatn added a commit that referenced this issue May 15, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
dnhatn added a commit that referenced this issue May 15, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
dnhatn added a commit that referenced this issue May 16, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
dnhatn added a commit that referenced this issue May 16, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce elastic#32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes elastic#32506
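
The commit message repeated in the timeline above states the root cause: Lucene's query cache holds entries per segment, so a background merge plus refresh between the search and the stats call drops the old segments' entries, and that shows up in the stats as evictions. A small illustration of reading those counters (the index name and logging are illustrative; this is not the change that closed the issue):

IndicesStatsResponse stats = client().admin().indices()
        .prepareStats("index").setQueryCache(true).get();
QueryCacheStats queryCache = stats.getTotal().getQueryCache();

// After search -> background merge -> refresh -> stats, evictions (roughly
// cacheCount minus cacheSize) can legitimately be non-zero, which is what broke
// the strict "evictions == 0" expectation in the test.
logger.info("cacheCount={} cacheSize={} evictions={}",
        queryCache.getCacheCount(), queryCache.getCacheSize(), queryCache.getEvictions());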