[CI] FeatureUpgradeIT testGetFeatureUpgradeStatus failing with IllegalArgumentException: mapping with hash [...] not found #101331
Pinging @elastic/es-search (Team:Search)
The important stack trace is not the one HOMER put in the issue description, but this one:
This failure is on a cluster state update; switching this to distributed.
Pinging @elastic/es-distributed (Team:Distributed)
It does not reproduce locally after about 30 runs. The underlying failure is triggered by
Also raised #101499 to log the index name on the unfound mapping hash.
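The point of #101499 is that the `IllegalArgumentException` in the issue title only reports the hash, not which index referenced it. A minimal sketch of that idea, assuming a hypothetical `resolveMapping` helper and a plain `Map` keyed by hash (the names and shapes here are illustrative, not the actual Elasticsearch code):

```java
import java.util.Map;

public class MappingHashLookup {
    // Hypothetical helper: look up a deduplicated mapping by its hash and,
    // when the hash is missing, include the index name in the error so the
    // failing index can be identified from the log line alone.
    public static String resolveMapping(String indexName, String hash, Map<String, String> mappingsByHash) {
        String mapping = mappingsByHash.get(hash);
        if (mapping == null) {
            // Without the index name, the original message ("mapping with
            // hash [...] not found") gives no clue which index is affected.
            throw new IllegalArgumentException(
                "mapping of index [" + indexName + "] with hash [" + hash + "] not found"
            );
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> byHash = Map.of("abc123", "{\"properties\":{}}");
        System.out.println(resolveMapping(".tasks", "abc123", byHash));
        try {
            resolveMapping(".security-7", "deadbeef", byHash);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```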
Should we consider this a higher priority, since it seems to prevent nodes from starting after the upgrade?
I've got a timeline from a local reproduction:
Logs for all three nodes are here:
I was running
I don't know why node 2 doesn't receive the mapping update from node 0, but it makes sense that #99668 would have caused this. Previously, the system index metadata upgrade service waited until all nodes in the cluster were on the same version before running an update, so that mapping update never would have run before all three nodes were upgraded. Come to think of it, the system index mapping update shouldn't be running this early either. So I think there's a bug in the system index mapping update code, and I'll look for it now.
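The version-based guard described above can be sketched very simply: before running the system index mapping update, check that every node in the cluster reports the same version. This is an illustrative stand-in (method and class names are hypothetical, and the real service inspects `ClusterState` rather than a list of version strings):

```java
import java.util.List;

public class SystemIndexUpgradeGate {
    // Hypothetical guard: the mapping update may only proceed when all
    // node versions in the cluster are identical, i.e. the rolling
    // upgrade has finished and the cluster is no longer mixed-version.
    public static boolean allNodesOnSameVersion(List<String> nodeVersions) {
        return nodeVersions.stream().distinct().count() == 1;
    }

    public static void main(String[] args) {
        // Mixed cluster mid rolling upgrade: one node still on 7.17.x,
        // so the update must be deferred.
        System.out.println(allNodesOnSameVersion(List.of("8.12.0", "8.12.0", "7.17.15")));
        // Fully upgraded cluster: the update may run.
        System.out.println(allNodesOnSameVersion(List.of("8.12.0", "8.12.0", "8.12.0")));
    }
}
```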
elastic#99668 seems to have introduced a bug where SystemIndexMappingUpdateService updates system index mappings even in mixed clusters. This PR restores the old version-based check in order to be sure that there's no update until the cluster is fully upgraded. Fixes elastic#99778, elastic#101331
Pinging @elastic/es-core-infra (Team:Core/Infra)
* Don't update system index mappings in mixed clusters
  #99668 seems to have introduced a bug where SystemIndexMappingUpdateService updates system index mappings even in mixed clusters. This PR restores the old version-based check in order to be sure that there's no update until the cluster is fully upgraded. The timing of the mapping update seems to be causing worse problems, corrupting persisted cluster state. Fixes #99778, #101331
* Remove broken assertion
  The compatibility versions objects are not showing up correctly, so we shouldn't assert on them.
Another failure: https://gradle-enterprise.elastic.co/s/heiqnz57frdhe
@williamrandolph Can you please take another look?
I'm not seeing the
@volodk85 unless the cause is confirmed to be the same, we shouldn't reopen test failure issues after they have been closed for a while.
On the second node, a
Looks like the same thing as here: #103285. Just from what Gradle reported, I think it made sense to re-open this issue; it's just that, digging deeper, it turned out to be something else.
This is also already tracked in #103358 (same issue).
Since the underlying issue is tracked in a few places already, I'm going to close this issue again. We should save this one for failures with the message
I'll update the title to reflect that. |
This might need to have the Failure Store feature flag enabled for the test clusters in the rolling upgrade. I think #103358 is the same symptom, but the solution lives in a different place than this test.
Build scan:
https://gradle-enterprise.elastic.co/s/ibqgql7fylvvw/tests/:qa:rolling-upgrade:v7.17.15%23bwcTest/org.elasticsearch.upgrades.FeatureUpgradeIT/testGetFeatureUpgradeStatus%20%7BupgradedNodes=3%7D
Reproduction line:
Applicable branches:
main
Reproduces locally?:
Didn't try
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.FeatureUpgradeIT&tests.test=testGetFeatureUpgradeStatus%20%7BupgradedNodes%3D3%7D
Failure excerpt: