
Test failure: org.elasticsearch.grok.GrokTests.testExponentialExpressions #58946

Status: Open
jakelandis opened this issue Jul 2, 2020 · 10 comments
Labels: :Data Management/Ingest Node, low-risk, Team:Data Management, >test-failure

jakelandis (Contributor) commented Jul 2, 2020:

Given the rarity of this failure, I am not muting it. However, if occurrences increase, this test should be muted.

Build scan:

https://gradle-enterprise.elastic.co/s/iak3abjnc4huu

Repro line:

REPRODUCE WITH: ./gradlew ':libs:elasticsearch-grok:test' --tests "org.elasticsearch.grok.GrokTests.testExponentialExpressions" \
  -Dtests.seed=2012EF738111106E \
  -Dtests.security.manager=true \
  -Dtests.locale=es-VE \
  -Dtests.timezone=Africa/Niamey \
  -Druntime.java=11

Reproduces locally?:
No. Ran a few thousand iterations (without the extra -D arguments) and they all passed.

Applicable branches:

This failure is on master, but history shows it is not specific to master.

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?list.size=50&list.sortColumn=startTime&list.sortOrder=desc&search.buildToolType=gradle&search.buildToolType=maven&search.relativeStartTime=P28D&search.timeZoneId=America/Chicago&tests.container=org.elasticsearch.grok.GrokTests&tests.sortField=FAILED&tests.test=testExponentialExpressions&tests.unstableOnly=true

This is the only failure in the past 28 days.

However, searching my email archives, I see roughly 15 failures over the past 1.5 years across all branches. So this is pretty rare, but also not likely caused by cosmic rays. I suspect there is a subtle race condition in the test itself.

Failure excerpt:

org.elasticsearch.grok.GrokTests > classMethod FAILED
    java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
        at __randomizedtesting.SeedInfo.seed([2012EF738111106E]:0)

and

2> WARNING: Suite execution timed out: org.elasticsearch.grok.GrokTests 

I believe that for this particular test, this can occur if the watchdog thread does not interrupt the Grok match as it should.
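For context, the watchdog pattern at play can be sketched roughly like this. This is a hypothetical simplified version for illustration, not the actual `MatcherWatchdog` implementation: threads register before a potentially pathological match, and a background task interrupts any registration that outlives a deadline.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of a matcher watchdog (NOT the real MatcherWatchdog):
// callers register before a long-running match and unregister afterwards;
// a background task periodically interrupts any thread registered for
// longer than maxExecutionTimeMillis.
public class SimpleWatchdog {
    private final long maxExecutionTimeMillis;
    private final Map<Thread, Long> registry = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "watchdog");
                t.setDaemon(true);
                return t;
            });

    public SimpleWatchdog(long intervalMillis, long maxExecutionTimeMillis) {
        this.maxExecutionTimeMillis = maxExecutionTimeMillis;
        scheduler.scheduleAtFixedRate(this::interruptLongRunningExecutions,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public void register() {
        registry.put(Thread.currentThread(), System.currentTimeMillis());
    }

    public void unregister() {
        registry.remove(Thread.currentThread());
    }

    private void interruptLongRunningExecutions() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Thread, Long> e : registry.entrySet()) {
            if (now - e.getValue() > maxExecutionTimeMillis) {
                e.getKey().interrupt(); // the "kill" the test relies on
            }
        }
    }

    // Demonstrates the watchdog interrupting a runaway "match".
    public static boolean demo() throws InterruptedException {
        SimpleWatchdog watchdog = new SimpleWatchdog(10, 100);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            watchdog.register();
            try {
                Thread.sleep(10_000); // stands in for a runaway regex match
            } catch (InterruptedException ex) {
                interrupted.set(true);
            } finally {
                watchdog.unregister();
            }
        });
        worker.start();
        worker.join(5_000);
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted=" + demo());
    }
}
```

If the periodic interrupter never runs (or dies), the worker sleeps its full duration and nothing unblocks the test, which is the suite-timeout shape seen above.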

@jakelandis jakelandis added >test-failure Triaged test failures from CI :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Jul 2, 2020
elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jul 2, 2020
dakrone (Member) commented Mar 3, 2021:

Seeing as we haven't had more occurrences of this since last July, I'm going to close this for now; we can reopen if needed.

@dakrone dakrone closed this as completed Mar 3, 2021

fcofdez (Contributor) commented May 12, 2023:

Looks like this is happening again: https://gradle-enterprise.elastic.co/s/biozovvlmi3dk

@fcofdez fcofdez reopened this May 12, 2023
joegallo (Contributor) commented May 18, 2023:

"Thread-11715" ID=11760 TIMED_WAITING
	at java.base@20.0.1/java.lang.Thread.sleep0(Native Method)
	at java.base@20.0.1/java.lang.Thread.sleep(Thread.java:484)
	at app//org.elasticsearch.grok.GrokTests.lambda$testExponentialExpressions$2(GrokTests.java:776)
	at app//org.elasticsearch.grok.GrokTests$$Lambda$523/0x000000080132fd40.accept(Unknown Source)
	at app//org.elasticsearch.grok.MatcherWatchdog$Default.interruptLongRunningExecutions(MatcherWatchdog.java:142)
	at app//org.elasticsearch.grok.MatcherWatchdog$Default$$Lambda$528/0x000000080133f618.run(Unknown Source)
	at app//org.elasticsearch.grok.GrokTests.lambda$testExponentialExpressions$1(GrokTests.java:783)
	at app//org.elasticsearch.grok.GrokTests$$Lambda$527/0x000000080133f400.run(Unknown Source)
	at java.base@20.0.1/java.lang.Thread.runWith(Thread.java:1636)
	at java.base@20.0.1/java.lang.Thread.run(Thread.java:1623)

It's interesting to me that in the failures I don't see a thread like the above (in the output of the threads that are running when the timeout is reached). That makes me think that for some reason the watchdog didn't execute at all.

edit: Or it executed and died in some interesting way that we're not capturing.
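One plausible way a periodic task can "execute and die" without leaving a trace: if a watchdog is scheduled via `ScheduledExecutorService.scheduleAtFixedRate`, any uncaught exception in the task silently suppresses all subsequent executions (per the JDK spec), with no output anywhere. Whether `MatcherWatchdog`'s scheduler actually behaves this way is an assumption; here is a minimal standalone demonstration of the JDK behavior:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that an uncaught exception in a scheduleAtFixedRate task
// silently cancels all subsequent executions of that task.
public class DyingTaskDemo {
    public static int runsBeforeDeath() throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            // uncaught -> the periodic task is suppressed from here on
            throw new RuntimeException("boom");
        }, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(200); // time for ~20 runs if the task had survived
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs=" + runsBeforeDeath()); // prints runs=1
    }
}
```

If something analogous happened to the watchdog's periodic interrupter, the runaway match would never be interrupted and the suite would hit its timeout exactly as in the failure excerpt.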

albertzaharovits (Contributor) commented:

Here's another case of this: https://gradle-enterprise.elastic.co/s/eiooe5kakotua

@dakrone dakrone added the low-risk An open issue or test failure that is a low risk to future releases label Oct 12, 2023
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-data-management (Team:Data Management)

joegallo (Contributor) commented Nov 2, 2023:

Still hoping for a new reproduction now that #96230 has been merged. No luck yet.

joegallo (Contributor) commented:

Ahhhh! How exciting! Thanks, @mark-vieira!

@joegallo joegallo self-assigned this Feb 1, 2024

9 participants