[BUG] org.opensearch.search.fetch.subphase.highlight.HighlighterSearchIT.testBoostingQueryTermVector is flaky #12119

Closed
bowenlan-amzn opened this issue Feb 1, 2024 · 6 comments · Fixed by #12512
Labels: flaky-test (Random test failure that succeeds on second run), Search (Search query, autocomplete ...etc)

Comments

@bowenlan-amzn
Member

Describe the bug

https://build.ci.opensearch.org/job/gradle-check/32933/testReport/junit/org.opensearch.search.fetch.subphase.highlight/HighlighterSearchIT/testBoostingQueryTermVector__p0___search_concurrent_segment_search_enabled___true___/

Error Message
java.lang.Exception: Test abandoned because suite timeout was reached.
Stacktrace
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([FD0BF0F6E14F1BF1]:0)

Related component

Search

To Reproduce

I tried running this test 500 times and every run passed, so I'm using this issue to track it for now.

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.fetch.subphase.highlight.HighlighterSearchIT" -Dtests.method="testBoostingQueryTermVector {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=FD0BF0F6E14F1BF1 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-KW -Dtests.timezone=Pacific/Enderbury -Druntime.java=21

Expected behavior

This test should always pass, but it seems there is a slight chance of it failing.

Additional Details

No response

@bowenlan-amzn bowenlan-amzn added untriaged flaky-test Random test failure that succeeds on second run labels Feb 1, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Feb 1, 2024
@jed326
Collaborator

jed326 commented Feb 1, 2024

I saw a similar failure here too: https://build.ci.opensearch.org/job/gradle-check/32902/

From the test logs it looks like all the test assertions are passing but there's some issue in the test cleanup. For reproduction we probably need to try re-running the entire test suite instead of just that specific test.

Something that stands out is that this test class has 174 tests being run in the same suite scoped cluster.

@jed326
Collaborator

jed326 commented Feb 1, 2024

Looking at the test logs https://build.ci.opensearch.org/job/gradle-check/32902/consoleText it looks like the test just runs out of time.

The concurrent segment search disabled tests take from 2024-01-30T16:53:56,964 -> 2024-01-30T17:07:48,861, which is already 14 of the 20 allotted minutes for the test class.
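
As a quick check of that arithmetic, here is a minimal sketch that parses the two log timestamps quoted above and compares the gap to the 20-minute budget. The class and method names are made up for illustration, and the timestamp pattern is an assumption based on how the two values are written in this comment.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the OpenSearch code base.
class SuiteElapsed {
    // Assumed log timestamp layout: ISO date and time with a comma before the milliseconds.
    private static final DateTimeFormatter LOG_FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss,SSS");

    // Returns the wall-clock time between two log timestamps.
    static Duration between(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_FORMAT);
        LocalDateTime to = LocalDateTime.parse(end, LOG_FORMAT);
        return Duration.between(from, to);
    }

    public static void main(String[] args) {
        Duration elapsed = between("2024-01-30T16:53:56,964", "2024-01-30T17:07:48,861");
        Duration budget = Duration.ofMinutes(20); // suite timeout mentioned in the thread
        Duration remaining = budget.minus(elapsed);
        System.out.printf("elapsed: %d min %d s%n", elapsed.toMinutes(), elapsed.toSecondsPart());
        System.out.printf("remaining: %d min %d s%n", remaining.toMinutes(), remaining.toSecondsPart());
    }
}
```

The gap works out to 13 minutes 51.897 seconds, which rounds to the 14 minutes quoted above and leaves roughly 6 minutes for the concurrent segment search enabled half of the suite.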

@jed326
Collaborator

jed326 commented Feb 1, 2024

I ran the tests locally a few times and they averaged only around 4 minutes to run all 174 test cases, much faster than in the posted gradle check. The hardware running the gradle check should be much more powerful than my laptop, though. Perhaps this points to some regression from the JDK update?

@peternied
Member

@yigithub @Pallavi-AWS Over the past 30 days, this test has adversely affected multiple pull requests (PRs), including #12464, #12436, #12383, #12382 (repeated), #12380, #12376, #12305, #12293, #12278, #12163, and #12143.

Please prioritize fixing this test or disabling the test case until it can be fixed.

@jed326
Collaborator

jed326 commented Feb 29, 2024

@peternied I can take this up either later today or tomorrow; the solution should just be to separate out the tests into 2 classes. I was under the impression that this test only failed occasionally since it wasn't being reported, so I didn't prioritize it.

@jed326 jed326 self-assigned this Feb 29, 2024
@jed326
Collaborator

jed326 commented Mar 1, 2024

Looking at the test logs https://build.ci.opensearch.org/job/gradle-check/32902/consoleText it looks like the test just runs out of time.

The concurrent segment search disabled tests take from 2024-01-30T16:53:56,964 -> 2024-01-30T17:07:48,861, which is already 14 of the 20 allotted minutes for the test class.

It seems like the full suite can take around 28 minutes, so I'm going to bump the suite timeout to 35 minutes to give it some breathing room. PR incoming...

See: #12512 (comment)
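
For readers unfamiliar with suite-level timeouts, the bump described above can be sketched roughly as follows. This is only an illustration: `SuiteTimeoutSketch`, `HighlighterSuiteSketch`, and `SuiteTimeoutDemo` are hypothetical stand-ins declared locally so the example compiles on its own. They are not the test framework's actual annotation or the change made in #12512, which is not shown in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.time.Duration;

// Hypothetical stand-in for a suite-level timeout annotation (assumption, not a real framework type).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SuiteTimeoutSketch {
    long millis();
}

// Hypothetical suite class carrying the 35-minute timeout proposed in the thread.
@SuiteTimeoutSketch(millis = 35 * 60 * 1000L)
class HighlighterSuiteSketch {}

class SuiteTimeoutDemo {
    // Reads the timeout declared on a suite class via reflection.
    static Duration timeoutOf(Class<?> suite) {
        SuiteTimeoutSketch declared = suite.getAnnotation(SuiteTimeoutSketch.class);
        if (declared == null) {
            throw new IllegalArgumentException("no suite timeout declared on " + suite.getName());
        }
        return Duration.ofMillis(declared.millis());
    }

    // True when an observed suite run time stays under the given timeout.
    static boolean fits(Duration observed, Duration timeout) {
        return observed.compareTo(timeout) < 0;
    }

    public static void main(String[] args) {
        Duration observed = Duration.ofMinutes(28);   // full-suite run time reported in the thread
        Duration oldTimeout = Duration.ofMinutes(20); // previous budget reported in the thread
        Duration newTimeout = timeoutOf(HighlighterSuiteSketch.class);
        System.out.println("fits old timeout: " + fits(observed, oldTimeout));
        System.out.println("fits new timeout: " + fits(observed, newTimeout));
    }
}
```

With the numbers from this thread, a 28-minute run exceeds the 20-minute budget but fits inside 35 minutes with about 7 minutes to spare.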
