
[CI] IngestGeoIpClientYamlTestSuiteIT tests failing #106737

Open
alex-spies opened this issue Mar 26, 2024 · 4 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP medium-risk An open issue or test failure that is a medium risk to future releases Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI

Comments

@alex-spies
Contributor

Many tests fail in the @Before setup method, specifically at:

org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.lambda$waitForDatabases$3(IngestGeoIpClientYamlTestSuiteIT.java:78)

All of them fail because databases_count is smaller than the expected 4.

This might be related to #101418 or #95496, which failed in the same setup method.
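The failing check retries (via assertBusy, per the stack trace) until the GeoIP stats report the expected number of databases, and fails with the last observed count once the timeout expires. Below is a minimal sketch of that polling pattern, assuming a hypothetical IntSupplier that reads databases_count from the stats response. It is not the actual test code, and the method name and the 10 ms poll interval are illustrative choices.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical helper illustrating the assertBusy-style wait; not Elasticsearch code.
class WaitForDatabasesSketch {
    // Re-reads the count until it equals `expected`, or fails after `timeout`
    // with the last value observed (e.g. "was <2>" when only 2 of 4 databases are present).
    static void waitForDatabasesCount(IntSupplier databasesCount, int expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int observed = databasesCount.getAsInt();
        while (observed != expected) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Expected: <" + expected + "> but: was <" + observed + ">");
            }
            try {
                Thread.sleep(10); // arbitrary poll interval for the sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
            observed = databasesCount.getAsInt();
        }
    }
}
```

If the downloader never manages to store all four databases, the count stays at 2 and the wait times out exactly as in the failure excerpt, so the assertion itself is only reporting a problem that happens earlier, during the download.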

Test failures:

  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/10_basic/ingest-geoip installed}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with geopoint mapping (both missing and including location)}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with different database file - GeoLite2-ASN}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/30_geoip_stats/Test geoip stats}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with lists, first only}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with list}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with defaults}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with different database file - GeoLite2-Country}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with fields}
  • org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/20_geoip_processor/Test simulate with Geoip Processor}

Build scan:
https://gradle-enterprise.elastic.co/s/ctt7b3ramfg5y/tests/:modules:ingest-geoip:yamlRestTest/org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT/test%20%7Byaml=ingest_geoip%2F10_basic%2Fingest-geoip%20installed%7D

Reproduction line:

./gradlew ':modules:ingest-geoip:yamlRestTest' --tests "org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/10_basic/ingest-geoip installed}" -Dtests.seed=47EF93F612EF4AEB -Dtests.locale=en -Dtests.timezone=UCT -Druntime.java=21

Applicable branches:
8.12

Reproduces locally?:
No

Failure history:
Failure dashboard for org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT#test {yaml=ingest_geoip/10_basic/ingest-geoip installed}

Failure excerpt:

java.lang.AssertionError: 
Expected: <4>
     but: was <2>

  at __randomizedtesting.SeedInfo.seed([47EF93F612EF4AEB:CFBBAC2CBC132713]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.lambda$waitForDatabases$3(IngestGeoIpClientYamlTestSuiteIT.java:78)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1278)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1251)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.waitForDatabases(IngestGeoIpClientYamlTestSuiteIT.java:73)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:47)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@alex-spies alex-spies added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >test-failure Triaged test failures from CI labels Mar 26, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:Data Management Meta label for data/management team labels Mar 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke
Member

From what I can tell, we're blowing up while indexing the geoip data here:

[2024-03-25T04:38:17,628][ERROR][o.e.i.g.GeoIpDownloader  ] [test-cluster-0] error downloading geoip database [MyCustomGeoLite2-City.mmdb] [.geoip_databases] org.elasticsearch.index.IndexNotFoundException: no such index [.geoip_databases]
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:473)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1603)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.filterUnavailable(IndexNameExpressionResolver.java:1583)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.resolveExpressions(IndexNameExpressionResolver.java:265)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:340)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:331)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:90)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction.shards(TransportBroadcastReplicationAction.java:183)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction$1.accept(TransportBroadcastReplicationAction.java:94)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction$1.accept(TransportBroadcastReplicationAction.java:83)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
	at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

That error is logged from the try/catch in GeoIpDownloader::processDatabase. From what I can tell, the exception is thrown by either the flush or the refresh request in indexChunks. But immediately before the flush/refresh, we send index requests into this same index, so I have no idea how we'd get no such index [.geoip_databases].

@masseyke
Member

Oh, I missed this in the log:

[2024-03-25T04:38:17,241][INFO ][o.e.c.m.MetadataDeleteIndexService] [test-cluster-0] [.geoip_databases/eSscKA11TjCN3mvQhDl9bw] deleting index

This is starting to look like one of the GeoIP downloader race conditions we see often.
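The two log excerpts fit a sequence like this: the downloader indexes chunks into .geoip_databases, the index is deleted (the MetadataDeleteIndexService line at 04:38:17,241), and the later flush/refresh fails at 04:38:17,628 when it resolves the index name (the IndexNameExpressionResolver.notFoundException frame under TransportBroadcastReplicationAction.shards). The toy model below is hypothetical and not Elasticsearch code; it only shows why successful index requests do not guarantee that a subsequent refresh still finds the index, since the broadcast request looks the name up again against the current metadata. The `interleaved` callback stands in for whatever deletes the index concurrently; what triggers that deletion in the real cluster is not established here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the race; names and behavior are illustrative only.
class GeoIpRaceSketch {
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    // Index name -> documents; stands in for the cluster metadata.
    private final Map<String, List<String>> indices = new HashMap<>();

    void index(String index, String doc) {
        // In this toy model, indexing creates the index if it does not exist.
        indices.computeIfAbsent(index, k -> new ArrayList<>()).add(doc);
    }

    void refresh(String index) {
        // Re-resolves the name against the *current* metadata, like a broadcast action.
        if (!indices.containsKey(index)) {
            throw new IndexNotFoundException(index);
        }
    }

    void delete(String index) {
        indices.remove(index);
    }

    // Indexes all chunks, runs the interleaved action, then refreshes.
    // Returns null on success, or the error message (mirroring a catch-and-log).
    String indexChunksThenRefresh(String index, List<String> chunks, Runnable interleaved) {
        try {
            for (String chunk : chunks) {
                index(index, chunk);
            }
            interleaved.run(); // e.g. a concurrent delete of the index
            refresh(index);
            return null;
        } catch (IndexNotFoundException e) {
            return e.getMessage();
        }
    }
}
```

With a no-op interleaved action the refresh succeeds; if the delete lands between the last index request and the refresh, the call reports no such index [.geoip_databases] even though every index request succeeded.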

@masseyke
Member

This looks like the first issue described in #92888.

@dakrone dakrone added medium-risk An open issue or test failure that is a medium risk to future releases and removed blocker labels Apr 23, 2024
Projects
None yet
Development

No branches or pull requests

4 participants