[CI] FeatureUpgradeIT testGetFeatureUpgradeStatus failing with IllegalArgumentException: mapping with hash [...] not found #101331

droberts195 · 2023-10-25T15:54:53Z

Build scan:
https://gradle-enterprise.elastic.co/s/ibqgql7fylvvw/tests/:qa:rolling-upgrade:v7.17.15%23bwcTest/org.elasticsearch.upgrades.FeatureUpgradeIT/testGetFeatureUpgradeStatus%20%7BupgradedNodes=3%7D
Reproduction line:

./gradlew ':qa:rolling-upgrade:v7.17.15#bwcTest' -Dtests.class="org.elasticsearch.upgrades.FeatureUpgradeIT" -Dtests.method="testGetFeatureUpgradeStatus {upgradedNodes=3}" -Dtests.seed=85641CEF59F01F1F -Dtests.bwc=true -Dtests.locale=ja-JP-u-ca-japanese-x-lvariant-JP -Dtests.timezone=Europe/Bratislava -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.FeatureUpgradeIT&tests.test=testGetFeatureUpgradeStatus%20%7BupgradedNodes%3D3%7D

Failure excerpt:

java.lang.RuntimeException: An error occurred while checking cluster 'test-cluster' status.

  at __randomizedtesting.SeedInfo.seed([85641CEF59F01F1F:4A3B8B4A3EE04BD8]:0)
  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.waitUntilReady(DefaultLocalClusterHandle.java:188)
  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.upgradeNodeToVersion(DefaultLocalClusterHandle.java:151)
  at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster.upgradeNodeToVersion(DefaultLocalElasticsearchCluster.java:134)
  at org.elasticsearch.upgrades.ParameterizedRollingUpgradeTestCase.upgradeNode(ParameterizedRollingUpgradeTestCase.java:126)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:39)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: java.io.IOException: 408 Request Timeout

    at org.elasticsearch.test.cluster.local.WaitForHttpResource.checkResource(WaitForHttpResource.java:129)
    at org.elasticsearch.test.cluster.local.WaitForHttpResource.waitFor(WaitForHttpResource.java:107)
    at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.waitUntilReady(DefaultLocalClusterHandle.java:186)
    at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.upgradeNodeToVersion(DefaultLocalClusterHandle.java:151)
    at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster.upgradeNodeToVersion(DefaultLocalElasticsearchCluster.java:134)
    at org.elasticsearch.upgrades.ParameterizedRollingUpgradeTestCase.upgradeNode(ParameterizedRollingUpgradeTestCase.java:126)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.lang.reflect.Method.invoke(Method.java:580)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:39)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:1583)


elasticsearchmachine · 2023-10-25T15:55:16Z

Pinging @elastic/es-search (Team:Search)

droberts195 · 2023-10-25T15:56:24Z

The important stack trace is not the one HOMER put in the issue description, but this one:

[2023-10-25T17:08:36,592][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-2] fatal error in thread [elasticsearch[test-cluster-2][cluster_coordination][T#1]], exiting java.lang.AssertionError: org.elasticsearch.gateway.CorruptStateException: org.elasticsearch.gateway.CorruptStateException: java.lang.IllegalArgumentException: mapping with hash [nta1u3NgXPKcAhx4AGeV0OW7hlyKINbUL0KI7DNvFwk=] not found
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService$Writer.assertOnCommit(PersistedClusterStateService.java:1245)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService$Writer.commit(PersistedClusterStateService.java:1235)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService$Writer.writeIncrementalStateAndCommit(PersistedClusterStateService.java:977)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.writeClusterStateToDisk(GatewayMetaState.java:600)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.setLastAcceptedState(GatewayMetaState.java:583)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.coordination.CoordinationState.handlePublishRequest(CoordinationState.java:392)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.coordination.Coordinator.handlePublishRequest(Coordinator.java:476)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.coordination.PublicationTransportHandler.acceptState(PublicationTransportHandler.java:214)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.coordination.PublicationTransportHandler.handleIncomingPublishRequest(PublicationTransportHandler.java:201)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.coordination.PublicationTransportHandler.lambda$new$0(PublicationTransportHandler.java:113)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:288)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:301)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.elasticsearch.gateway.CorruptStateException: org.elasticsearch.gateway.CorruptStateException: java.lang.IllegalArgumentException: mapping with hash [nta1u3NgXPKcAhx4AGeV0OW7hlyKINbUL0KI7DNvFwk=] not found
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.readXContent(PersistedClusterStateService.java:675)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.lambda$loadOnDiskState$12(PersistedClusterStateService.java:623)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.consumeFromType(PersistedClusterStateService.java:717)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.loadOnDiskState(PersistedClusterStateService.java:622)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService$Writer.assertOnCommit(PersistedClusterStateService.java:1243)
	... 17 more
Caused by: org.elasticsearch.gateway.CorruptStateException: java.lang.IllegalArgumentException: mapping with hash [nta1u3NgXPKcAhx4AGeV0OW7hlyKINbUL0KI7DNvFwk=] not found
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.lambda$loadOnDiskState$11(PersistedClusterStateService.java:627)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.readXContent(PersistedClusterStateService.java:673)
	... 21 more
Caused by: java.lang.IllegalArgumentException: mapping with hash [nta1u3NgXPKcAhx4AGeV0OW7hlyKINbUL0KI7DNvFwk=] not found
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexMetadata$Builder.fromXContent(IndexMetadata.java:2509)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexMetadata.fromXContent(IndexMetadata.java:1442)
	at org.elasticsearch.server@8.12.0-SNAPSHOT/org.elasticsearch.gateway.PersistedClusterStateService.lambda$loadOnDiskState$11(PersistedClusterStateService.java:625)
	... 22 more

benwtrent · 2023-10-27T11:24:06Z

This failure occurs during a cluster state update, so I'm switching the label to Distributed.

elasticsearchmachine · 2023-10-27T11:25:03Z

Pinging @elastic/es-distributed (Team:Distributed)


ywangd · 2023-10-30T04:31:30Z

It did not reproduce locally in about 30 runs. The underlying failure is

java.lang.IllegalArgumentException: mapping with hash [nta1u3NgXPKcAhx4AGeV0OW7hlyKINbUL0KI7DNvFwk=] not found

triggered by PersistedClusterStateService#assertOnCommit, which is the same as #99778. Both are upgrade tests.
@williamrandolph thinks it is due to the change in #99668 and is currently investigating it with #101016. Since the other issue is flagged as medium-risk, I am flagging this one as medium-risk too.

I also raised #101499 to log the index name when a mapping hash is not found.
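
For readers unfamiliar with the error, the lookup that fails can be pictured roughly as follows. This is a minimal sketch, not the actual IndexMetadata code: it assumes the persisted state keeps each mapping in a map keyed by its hash and that index metadata refers to a mapping by that hash. The class MappingLookupSketch, the method resolveMapping, and the sample hashes are hypothetical; only the message format that includes the index name is taken from the logs in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a hash-keyed mapping lookup; not Elasticsearch's code. */
final class MappingLookupSketch {

    /**
     * Returns the mapping stored under the given hash, or throws an
     * IllegalArgumentException that names the index when the hash is unknown.
     */
    static String resolveMapping(Map<String, String> mappingsByHash, String indexName, String hash) {
        String mapping = mappingsByHash.get(hash);
        if (mapping == null) {
            // Including the index name makes the failure easier to trace.
            throw new IllegalArgumentException(
                "mapping of index [" + indexName + "] with hash [" + hash + "] not found");
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mappingsByHash = new HashMap<>();
        mappingsByHash.put("hash-a", "{\"task\":{\"dynamic\":\"strict\"}}");

        // A known hash resolves to its mapping source.
        System.out.println(resolveMapping(mappingsByHash, ".tasks", "hash-a"));

        // An unknown hash fails with a message that names the index.
        try {
            resolveMapping(mappingsByHash, ".tasks", "hash-b");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If index metadata on disk points at a hash whose mapping document was never written, a lookup of this shape is what would fail when the state is read back.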


idegtiarenko · 2023-11-01T11:02:40Z

Should we consider this a higher priority, as it seems to prevent a node from starting after the upgrade?


williamrandolph · 2023-11-02T20:14:30Z

I've got a timeline from a local reproduction:

Mapping failure investigation
Nodes 0, 1, 2

15:10:12 - tests begin
[2023-11-02T15:10:17,474][INFO ][o.e.n.Node               ] [v7.17.13-0] started
[2023-11-02T15:10:17,692][INFO ][o.e.n.Node               ] [v7.17.13-1] started
[2023-11-02T15:10:18,421][INFO ][o.e.n.Node               ] [v7.17.13-2] started

15:10:19 - node 1 is master
[2023-11-02T15:10:19,668][DEBUG][o.e.c.s.MasterService    ] [v7.17.13-1] took [6ms] to compute cluster state update for [elected-as-master ([2] nodes joined)

15:11:55 - tasks index is created
[2023-11-02T15:11:55,881][TRACE][o.e.c.s.MasterService    ] [v7.17.13-1] will process [auto create [.tasks]]

15:11:58 - Node 0 is stopped and restarted
[2023-11-02T19:11:58.918622Z] [BUILD] Stopping node

15:12:40 - Node 1 is stopped and restarted
[2023-11-02T19:12:40.987057Z] [BUILD] Stopping node

15:12:41 - Node 0 elected master, .tasks index update triggered
[2023-11-02T15:12:41,148][DEBUG][o.e.c.s.MasterService    ] [v7.17.13-0] took [8ms] to compute cluster state update for [elected-as-master ([2] nodes joined in term 7)
[2023-11-02T15:12:41,254][INFO ][o.e.i.SystemIndexMappingUpdateService] [v7.17.13-0] Index [.tasks] (alias [null]) mappings are not up-to-date and will be updated

15:12:42 - Node 0 updates system index mappings
[2023-11-02T15:12:42,200][DEBUG][o.e.c.s.MasterService    ] [v7.17.13-0] executing cluster state update for [put-mapping [.tasks/5sma6F-MR0mKEnpoRN8Sag][PutMappingClusterStateUpdateTask[request=org.elasticsearch.action.admin.indices.mapping.put.PutMappingClusterStateUpdateRequest@22736966, listener=org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener/org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{SafelyWrappedActionListener[listener=org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.indices.SystemIndexMappingUpdateService$1@2659a320]}{Task{id=1346, type='transport', action='indices:admin/mapping/put', description='', parentTask=unset, startTime=1698952361254, startTimeNanos=139856818146125}}/org.elasticsearch.action.support.master.TransportMasterNodeAction$$Lambda$6496/0x00000003021058a8@7402b8ea/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda$8336/0x00000003023b3380@6cde9b59/org.elasticsearch.action.admin.indices.mapping.put.TransportPutMappingAction$$Lambda$8341/0x00000003023b3be8@55dfe7fd]]]
[2023-11-02T15:12:42,205][INFO ][o.e.c.m.MetadataMappingService] [v7.17.13-0] [.tasks/5sma6F-MR0mKEnpoRN8Sag] update_mapping [task]
cluster uuid: Se98XR3wQkWurCBCQOrMEA [committed: true]
version: 246
state uuid: dbF4YgZpQfmJMZqEQfX0Kg

15:12:50 - Node 2 gets cluster state version 261
[2023-11-02T15:12:50,380][INFO ][o.e.c.s.ClusterApplierService] [v7.17.13-2] added {{v7.17.13-1}{LxBO0YqDQSSfFXxAaoVQBQ}{dyxRhb0NQq-2PgMYFlydpw}{127.0.0.1}{127.0.0.1:59892}{cdfhilmrstw}}, term: 7, version: 261, reason: ApplyCommitRequest{term=7, version=261, sourceNode={v7.17.13-0}{V-MIT5iuTOyvNLS0Z5GD4g}{j9ScnXQSQn2CHNPjyshTUg}{127.0.0.1}{127.0.0.1:59743}{cdfhilmrstw}{ml.allocated_processors_double=12.0, upgraded=true, ml.machine_memory=34359738368, xpack.installed=true, transform.config_version=10.0.0, testattr=test, ml.config_version=11.0.0, ml.max_jvm_size=536870912, ml.allocated_processors=12}}

15:12:51 - Node 1 adds updated mapping for .tasks
[2023-11-02T15:12:51,085][DEBUG][o.e.i.m.MapperService    ] [v7.17.13-1] [.tasks] [[.tasks/5sma6F-MR0mKEnpoRN8Sag]] added mapping, source [{"task":{"dynamic":"strict","_meta":{"version":"8.12.0","managed_index_mappings_version":0}

There is no similar log message for Node 2.

15:13:21 - Node 2 stops and restarts
[2023-11-02T19:13:21.925557Z] [BUILD] Stopping node

15:13:33 - During startup, node 2 hits the error
[2023-11-02T15:13:33,624][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.17.13-2] fatal error in thread [elasticsearch[v7.17.13-2][cluster_coordination][T#1]], exiting java.lang.AssertionError: org.elasticsearch.gateway.CorruptStateException: org.elasticsearch.gateway.CorruptStateException: java.lang.IllegalArgumentException: mapping of index [.tasks] with hash [07t3iDH1U18mgWco/xRfWlHPzxMIaLwTRh/T7NQ1ONk=] not found

Logs for all three nodes are here:
logs-101331.tar.gz

I was running ./gradlew ':qa:rolling-upgrade-legacy:v7.17.13#upgradedClusterTest' -Dtests.seed=DD0C8780632E495C -Dtests.bwc=true -Dtests.locale=sq -Dtests.timezone=Europe/Kiev -Druntime.java=20 against my branch here: #101016

I don't know why node 2 doesn't receive the mapping update from node 0. But it makes sense that #99668 would have caused this. Previously, the system index metadata upgrade service waited until all nodes in the cluster had the same version before running an update, so that mapping update never would have run before all three nodes were upgraded.

Come to think of it, the system index mapping update shouldn't be running this early either. So I think there's a bug in the system index mapping update code, and I'll look for it now.

elastic#99668 seems to have introduced a bug where SystemIndexMappingUpdateService updates system index mappings even in mixed clusters. This PR restores the old version-based check in order to be sure that there's no update until the cluster is fully upgraded. Fixes elastic#99778, elastic#101331
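
The version-based check described above can be sketched roughly like this. It is an illustration under assumptions, not the SystemIndexMappingUpdateService code: node versions are modeled as plain strings, and MixedClusterCheckSketch and isFullyUpgraded are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of a "no updates in mixed clusters" guard. */
final class MixedClusterCheckSketch {

    /**
     * Returns true only when the cluster has at least one node and every
     * node reports the same version as the local node.
     */
    static boolean isFullyUpgraded(List<String> nodeVersions, String localVersion) {
        return !nodeVersions.isEmpty()
            && nodeVersions.stream().allMatch(localVersion::equals);
    }

    public static void main(String[] args) {
        // Two upgraded nodes and one old node: skip the mapping update.
        System.out.println(isFullyUpgraded(List.of("8.12.0", "8.12.0", "7.17.13"), "8.12.0"));

        // All three nodes upgraded: the mapping update may proceed.
        System.out.println(isFullyUpgraded(List.of("8.12.0", "8.12.0", "8.12.0"), "8.12.0"));
    }
}
```

With a guard like this in front of the system index mapping update, the put-mapping for .tasks in the timeline above would not have been submitted while one node was still on the old version.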

elasticsearchmachine · 2023-11-03T16:11:00Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

* Don't update system index mappings in mixed clusters
  #99668 seems to have introduced a bug where SystemIndexMappingUpdateService updates system index mappings even in mixed clusters. This PR restores the old version-based check in order to be sure that there's no update until the cluster is fully upgraded. The timing of the mapping update seems to be causing worse problems, corrupting persisted cluster state. Fixes #99778, #101331
* Remove broken assertion
  The compatibility versions objects are not showing up correctly, so we shouldn't assert on them.


volodk85 · 2023-12-12T22:22:38Z

Another failure: https://gradle-enterprise.elastic.co/s/heiqnz57frdhe

rjernst · 2023-12-13T05:15:47Z

@williamrandolph Can you please take another look?

williamrandolph · 2023-12-13T20:50:17Z

I'm not seeing the java.lang.IllegalArgumentException: mapping with hash [...] not found message in this most recent failure, but it's not yet clear to me what the real problem is. I'll keep looking.

benwtrent · 2023-12-13T21:02:57Z

@volodk85 unless the cause is confirmed to be the same, we shouldn't reopen test failure issues after they have been closed for a while.

williamrandolph · 2023-12-13T22:04:33Z

On the second node, a CorruptStateException:

[2023-12-12T17:17:00,062][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-2] fatal error in thread [elasticsearch[test-cluster-2][cluster_coordination][T#1]], exiting
java.lang.AssertionError: org.elasticsearch.gateway.CorruptStateException: org.elasticsearch.xcontent.XContentParseException: [-1:38420] [index_template] failed to parse field [index_template]
        at org.elasticsearch.gateway.PersistedClusterStateService$Writer.assertOnCommit(PersistedClusterStateService.java:1244) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.gateway.PersistedClusterStateService$Writer.commit(PersistedClusterStateService.java:1234) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.gateway.PersistedClusterStateService$Writer.writeIncrementalStateAndCommit(PersistedClusterStateService.java:976) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.writeClusterStateToDisk(GatewayMetaState.java:581) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.setLastAcceptedState(GatewayMetaState.java:564) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.cluster.coordination.CoordinationState.handlePublishRequest(CoordinationState.java:392) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.cluster.coordination.Coordinator.handlePublishRequest(Coordinator.java:463) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.cluster.coordination.PublicationTransportHandler.acceptState(PublicationTransportHandler.java:210) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.cluster.coordination.PublicationTransportHandler.handleIncomingPublishRequest(PublicationTransportHandler.java:197) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.cluster.coordination.PublicationTransportHandler.lambda$new$0(PublicationTransportHandler.java:109) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:74) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:315) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.9.2.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.9.2.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1623) ~[?:?]

Looks like the same thing as here: #103285

Just from what Gradle reported, I think it made sense to re-open this issue. It's just that digging deeper, it turned out to be something else.

ldematte · 2023-12-14T14:52:43Z

This is also already tracked in #103358 (same issue)

williamrandolph · 2023-12-14T19:35:20Z

Since the underlying issue is tracked in a few places already, I'm going to close this issue again. We should save this one for failures with the message

java.lang.IllegalArgumentException: mapping with hash [...] not found

I'll update the title to reflect that.

jbaiera · 2023-12-14T22:56:28Z

This might need to have the Failure Store feature flag enabled for the test clusters in the rolling upgrade. I think that #103358 is the same symptom but the solution lives in a different place than this test.

droberts195 added :Search/Mapping Index mappings, including merging and defining field types >test-failure Triaged test failures from CI labels Oct 25, 2023

elasticsearchmachine added blocker Team:Search Meta label for search team labels Oct 25, 2023

benwtrent added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team labels Oct 27, 2023

elasticsearchmachine added the Team:Distributed Meta label for distributed team label Oct 27, 2023

ywangd added a commit to ywangd/elasticsearch that referenced this issue Oct 30, 2023

Logging index name on unfound mapping hash

3388e0c

Relates: elastic#101331

ywangd mentioned this issue Oct 30, 2023

Logging index name on unfound mapping hash #101499

Merged

ywangd added medium-risk An open issue or test failure that is a medium risk to future releases and removed blocker labels Oct 30, 2023

ywangd added a commit that referenced this issue Oct 30, 2023

Logging index name on unfound mapping hash (#101499)

43ed676

Relates: #101331

idegtiarenko mentioned this issue Nov 1, 2023

[CI] UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/10_basic/Find a task result record from the old cluster} failing #101575

Closed

idegtiarenko added blocker and removed medium-risk An open issue or test failure that is a medium risk to future releases labels Nov 1, 2023

mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this issue Nov 2, 2023

Logging index name on unfound mapping hash (elastic#101499)

f381714

Relates: elastic#101331

williamrandolph mentioned this issue Nov 3, 2023

Don't update system index mappings in mixed clusters #101778

Merged

rjernst added :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. Team:Distributed Meta label for distributed team labels Nov 3, 2023

elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Nov 3, 2023

williamrandolph closed this as completed in #101778 Nov 3, 2023

volodk85 reopened this Dec 12, 2023

williamrandolph added low-risk An open issue or test failure that is a low risk to future releases and removed blocker labels Dec 14, 2023

williamrandolph changed the title ~~[CI] FeatureUpgradeIT testGetFeatureUpgradeStatus {upgradedNodes=3} failing~~ [CI] FeatureUpgradeIT testGetFeatureUpgradeStatus failing with IllegalArgumentException: mapping with hash [...] not found Dec 14, 2023

williamrandolph closed this as completed Dec 14, 2023
