8.8.0 migrations temporary ES failures could cause permanent migration failure #158733

Closed
rudolf opened this issue May 31, 2023 · 6 comments · Fixed by #158940
Labels

  • blocker
  • bug: Fixes for quality problems that affect the customer experience
  • Epic:ScaleMigrations: Scale upgrade migrations to millions of saved objects
  • Team:Core: Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
  • v8.8.1

Comments

@rudolf
Contributor

rudolf commented May 31, 2023

Usually, when Kibana migrations fail due to a temporary problem in Elasticsearch, Kibana automatically completes the migration once the failure condition is resolved.

When upgrading to 8.8.0, a temporary Elasticsearch error (such as a high disk watermark or circuit breaker exception) can cause Kibana to fail permanently with an error like:

  {
    "index": ".kibana_alerting_cases_8.8.0_001",
    "id": "action:90cb3e60-3fdb-11ed-b808-09c70a0298e6",
    "cause": {
      "type": "cluster_block_exception",
      "reason": "index [.kibana_alerting_cases_8.8.0_001] blocked by: [FORBIDDEN/8/index write (api)];"
    },
    "status": 403
  }

This can happen if:

  1. the [.kibana] migrator finishes its migration and completes the UPDATE_TARGET_MAPPINGS_META step
  2. one or more of the other index migrators, e.g. [.kibana_alerting_cases], are unable to complete the CLONE_TEMP_TO_TARGET step

This leaves an inconsistent state: the metadata in .kibana suggests that the split migration completed when in fact it did not.
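
The inconsistent state can be illustrated with a small check. This is a minimal sketch, not Kibana's actual logic: the function name, the assumed shape of the type-to-index metadata (index alias mapped to a list of saved object types) and the `existing_aliases` input are all hypothetical and chosen for illustration.

```python
def find_unfinished_indices(index_types_map, existing_aliases):
    """Return index aliases that the .kibana metadata lists as split targets
    but that do not exist yet in Elasticsearch.

    index_types_map: dict mapping an index alias to the saved object types
        stored there (assumed shape, for illustration only).
    existing_aliases: set of alias names that currently exist.
    """
    return sorted(
        alias for alias in index_types_map if alias not in existing_aliases
    )


# Hypothetical metadata written by the .kibana migrator after it succeeded.
meta = {
    ".kibana": ["dashboard", "visualization"],
    ".kibana_alerting_cases": ["action", "alert"],
}
# Hypothetical cluster state: the other migrator never created its aliases.
aliases = {".kibana"}

print(find_unfinished_indices(meta, aliases))
```

A non-empty result corresponds to the situation described in this issue: the metadata claims the split is done while at least one target index was never finalized.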

Mitigation

  1. Ensure the underlying Elasticsearch health condition has been resolved
  2. Stop all Kibana instances
  3. Restore the Kibana feature state from the last snapshot before the upgrade: https://www.elastic.co/guide/en/kibana/8.8/upgrade-migrations-rolling-back.html#_roll_back_by_restoring_the_kibana_feature_state_from_a_snapshot
  4. Start Kibana.

Potential data loss

Failed 8.8.0 migrations could lead to data loss under some very specific circumstances.

Detection

If all of the following criteria apply, your cluster might have lost data. Please contact support or revert to a snapshot from before the upgrade:

  1. You're upgrading an existing Kibana cluster
  2. The upgrade to 8.8.0 failed and then eventually succeeded
  3. The Kibana server logs during the upgrade contain INIT -> CREATE_NEW_TARGET entries.
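
The third criterion can be checked by scanning the Kibana server logs for the transition. The sketch below is only an illustration: the sample log lines are made up, and the exact layout of real Kibana log entries is an assumption. Only the `INIT -> CREATE_NEW_TARGET` text comes from this issue.

```python
import re

# Matches the state transition named in the detection criteria.
TRANSITION = re.compile(r"\bINIT\s*->\s*CREATE_NEW_TARGET\b")


def find_suspect_transitions(log_lines):
    """Return the log lines that contain an INIT -> CREATE_NEW_TARGET transition."""
    return [line.rstrip("\n") for line in log_lines if TRANSITION.search(line)]


# Hypothetical log excerpt; real entries carry timestamps and logger names.
sample = [
    "[.kibana] INIT -> WAIT_FOR_YELLOW_SOURCE. took: 12ms.",
    "[.kibana_alerting_cases] INIT -> CREATE_NEW_TARGET. took: 8ms.",
]

for line in find_suspect_transitions(sample):
    print(line)
```

A match alone is not proof of data loss; it only matters in combination with the other two criteria, since the criteria are scoped to upgrades of an existing cluster.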

Overview

When upgrading to a new stack version, Kibana runs migration logic to upgrade saved object documents. Eventually, these objects are copied over to new version indices.

As part of the dot kibana split, when upgrading to 8.8.0 (or later), the migration logic creates several new indices and distributes the saved objects stored in .kibana across them. If the migration succeeds only partially (e.g. some indices are completely migrated and others aren't), some of the saved objects from the .kibana index might not be properly copied over to the new version indices.

Scenario A. The write_blocked indices

  • First migration attempt:
    • One of the newer indices, e.g. .kibana_alerting_cases, is correctly created and contains all the saved objects that are intended to go into that index.
    • The migration attempt then fails due to external factors (e.g. shard_limit_exceptions from ES), and the .kibana index is NOT migrated.
  • Subsequent attempts:
    • The .kibana index is in a "before split" state, so Kibana determines it needs to do the split.
    • The .kibana_alerting_cases migrator runs a reindex flow, which write-blocks the existing .kibana_alerting_cases_8.8.0_001 index.
    • That very same migrator attempts to update documents in the same index in a later step and fails because of the write_block.

The Kibana upgrade process gets stuck in a boot loop.
Removing the write_block is pointless, as it will be re-created at each restart.
The boot loop is not completely hard-locked, though. If the .kibana migrator manages to complete its migration before the .kibana_alerting_cases migrator fails, Kibana will believe it is in an "after split" state, and the other indices won't be write_blocked again. Manually removing the write_block at this point:

  • Will allow Kibana to start normally.
    • However, if any of the other indices' migrators did not complete the migration process, their corresponding saved objects will be missing. ⚠️ This is why removing these write_blocks might be a bad idea.
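
To see which indices currently carry a write block without changing anything, the index settings can be inspected. The sketch below assumes the parsed JSON body of a `GET <index-pattern>/_settings` request in its default nested form; the helper name and the sample response are hypothetical.

```python
def write_blocked_indices(settings_response):
    """Return names of indices whose `index.blocks.write` setting is enabled.

    settings_response: parsed JSON body of an index settings request, shaped
    like {index_name: {"settings": {"index": {"blocks": {"write": "true"}}}}}.
    """
    blocked = []
    for name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Setting values are returned as strings, so compare as text.
        if str(blocks.get("write", "false")).lower() == "true":
            blocked.append(name)
    return sorted(blocked)


# Hypothetical response for two indices, only one of which is write-blocked.
sample_settings = {
    ".kibana_alerting_cases_8.8.0_001": {
        "settings": {"index": {"blocks": {"write": "true"}}}
    },
    ".kibana_8.8.0_001": {"settings": {"index": {"number_of_shards": "1"}}},
}

print(write_blocked_indices(sample_settings))
```

This only reports the block. As explained above, removing it by hand can hide missing saved objects.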

Scenario B. The silent data loss

  • First migration attempt:
    • The .kibana migrator finishes dispatching all the saved objects to their corresponding indices, including the newer version of the .kibana index itself, e.g. .kibana_8.8.0_001.
    • Another index migrator, e.g. .kibana_alerting_cases, fails to clone from the temporary index .kibana_alerting_cases_8.8.0_reindex_temp into the target index .kibana_alerting_cases_8.8.0_001.
    • The migration attempt fails, but the saved object indices are left in a "post split" state.
  • Next migration attempt:
    • Kibana sees that the .kibana index is aligned with the current stack version and determines that there is no need to split.
    • The .kibana_alerting_cases migrator does not see its own index (it did not get to create the entry point aliases, i.e. .kibana_alerting_cases and .kibana_alerting_cases_8.8.0), so it assumes it is in a fresh deployment scenario. The outcome depends on when the first attempt failed:
      • Before the clone operation. It creates an empty .kibana_alerting_cases_8.8.0_001 and completes the migration.
        • ⚠️ All saved objects intended to go to that index are not copied over from .kibana_<previousVersion>_001, so from the SavedObjects API standpoint, they are effectively lost.
      • After the clone operation. It will attempt to create a .kibana_alerting_cases_8.8.0_001 which already exists (no-op), perform a few updates, and finally create the entrypoint aliases. This scenario has 2 possible sub-branches:
        • Before the _mappings are updated (most likely). The index created in the previous attempt won't have any mappings on it. The stored saved objects won't be indexed by any fields, and thus they won't be searchable ⚠️. Restarting Kibana again in this scenario should trigger a "compatible mappings" migration, which should then update ALL the documents in the index, so that ES can properly index them and they become searchable.
        • After the _mappings are updated (less likely). In this scenario, the documents should already be searchable, without any impact ✅.
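
The branches of Scenario B can be summarized as a lookup from the point where the first attempt failed to the state left after the next attempt. This is only a restatement of the list above in code form; the enum and function names are hypothetical.

```python
from enum import Enum


class FailurePoint(Enum):
    """Where the first attempt of the failing migrator stopped."""
    BEFORE_CLONE = "before_clone"
    AFTER_CLONE_BEFORE_MAPPINGS = "after_clone_before_mappings"
    AFTER_MAPPINGS = "after_mappings"


# Outcome of the next migration attempt, as described in Scenario B.
OUTCOMES = {
    FailurePoint.BEFORE_CLONE:
        "empty target index created; saved objects are not copied over",
    FailurePoint.AFTER_CLONE_BEFORE_MAPPINGS:
        "documents present but not searchable until mappings are updated",
    FailurePoint.AFTER_MAPPINGS:
        "documents present and searchable",
}


def retry_outcome(failure_point):
    """Return the expected outcome of the retry for a given failure point."""
    return OUTCOMES[failure_point]


for point in FailurePoint:
    print(f"{point.name}: {retry_outcome(point)}")
```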

Technical details

  • The saved object relocation (into multiple indices) requires reindexing, and it is performed in multiple steps:
    • First, SO documents are transformed and added to temporary indices (e.g. .kibana_analytics_8.8.0_reindex_temp).
    • Then, these indices are cloned to the target indices (e.g. .kibana_analytics_8.8.0_001).
    • Finally, 2 aliases are created for each new index (e.g. .kibana_analytics and .kibana_analytics_8.8.0).
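
The naming pattern used in these steps can be written down as a small helper. This is a sketch derived from the example names in this issue; the function itself is hypothetical and is not part of Kibana.

```python
def relocation_names(index_prefix, version):
    """Derive the index and alias names used when relocating saved objects.

    index_prefix: base index name, e.g. ".kibana_analytics".
    version: stack version, e.g. "8.8.0".
    """
    return {
        # Temporary index that receives the transformed documents.
        "temp_index": f"{index_prefix}_{version}_reindex_temp",
        # Target index that the temporary index is cloned into.
        "target_index": f"{index_prefix}_{version}_001",
        # The two aliases created in the final step.
        "aliases": [index_prefix, f"{index_prefix}_{version}"],
    }


names = relocation_names(".kibana_analytics", "8.8.0")
print(names["temp_index"])
print(names["target_index"])
print(names["aliases"])
```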

In order to perform migrations, Kibana launches a "migrator" instance for each of the SO indices. These migrators run in parallel, and they handle the upgrade process described above.

Since the migrators run "independently", one migrator might succeed during an upgrade while another does not. This was acceptable up until 8.8.0, because the indices were truly independent of each other. If a migrator failed to migrate a specific index, it would simply retry on the next start.

In 8.8.0, we introduce dependencies between migrators:

  • The .kibana migrator must dispatch SO documents to other indices, and to do so, it must wait for other migrators to create their corresponding .kibana_<domain>_8.8.0_reindex_temp temporary indices.
  • Once all migrators have finished transforming and adding documents to the temporary indices, they can all proceed to clone and update aliases.
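
The second dependency, where nobody clones until everyone has filled its temporary index, behaves like a barrier. The sketch below models it with threads and `threading.Barrier`. It is an analogy only: Kibana's migrators are not implemented this way, and every name here is hypothetical.

```python
import threading


def run_migrators(names):
    """Run one thread per migrator and record the order of their steps."""
    events = []
    lock = threading.Lock()
    # All migrators must reach the barrier before any of them may clone.
    barrier = threading.Barrier(len(names))

    def migrate(name):
        with lock:
            events.append(("temp_ready", name))
        barrier.wait()
        with lock:
            events.append(("clone", name))

    threads = [threading.Thread(target=migrate, args=(n,)) for n in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


migrators = [".kibana", ".kibana_alerting_cases", ".kibana_analytics"]
events = run_migrators(migrators)
for step, name in events:
    print(step, name)
```

Whatever the thread scheduling, every `temp_ready` entry is recorded before the first `clone` entry, which is the ordering the dependency requires.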

There's one particularity that makes .kibana index special: it stores information about the type => index breakdown in the .kibana.mapping._meta.indexTypesMap property. At startup, this information allows Kibana to determine if some types must be relocated into other indices during an upgrade.
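
As an illustration of how such a type-to-index breakdown can drive the relocation decision, the sketch below compares a stored map with the desired one. The map shape (index name mapped to a list of types), the sample types and the function name are assumptions made for this example, not Kibana's implementation.

```python
def types_to_relocate(current_map, desired_map):
    """Return {type: (current_index, desired_index)} for types that must move.

    Both arguments map an index name to the list of saved object types
    stored in it (assumed shape).
    """
    current_index_of = {
        so_type: index
        for index, so_types in current_map.items()
        for so_type in so_types
    }
    moves = {}
    for index, so_types in desired_map.items():
        for so_type in so_types:
            source = current_index_of.get(so_type)
            # Types that are new, or already in the right index, do not move.
            if source is not None and source != index:
                moves[so_type] = (source, index)
    return moves


# Hypothetical "before split" and "after split" breakdowns.
before = {".kibana": ["dashboard", "action", "alert"]}
after = {
    ".kibana": ["dashboard"],
    ".kibana_alerting_cases": ["action", "alert"],
}

print(types_to_relocate(before, after))
```

If the stored map already matches the desired one, the result is empty and no split is attempted, which is why storing the map too early changes the behaviour of later attempts.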

This is where the current issue lies: the .kibana migrator currently dispatches SO documents to the other indices and then completes the rest of the migration process for its own index. However, once its own migration succeeds, it stores _meta.indexTypesMap, which conditions the behaviour of the migration logic on subsequent attempts. Thus, it should make sure that all migrators have finished cloning and migrating their own indices before considering itself successful.

@rudolf rudolf added bug Fixes for quality problems that affect the customer experience Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects labels May 31, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label May 31, 2023
@rudolf rudolf added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label May 31, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Contributor Author

rudolf commented Jun 1, 2023

Updated the issue to include the log entries that, if present, would indicate potential data loss.

@Bamieh
Member

Bamieh commented Jun 1, 2023

@rudolf: @gsoldevila and I discussed that the count might be inaccurate if some documents are deleted during the migration (disabled plugins, unused, etc.)

@jrodewig
Contributor

jrodewig commented Jun 2, 2023

@rudolf Should we add this to the 8.8 known issues? [1] If so, do you mind drafting an update for @elastic/kibana-docs to review?

Footnotes

  1. https://www.elastic.co/guide/en/kibana/8.8/release-notes-8.8.0.html

@lukeelmers
Member

Should we add this to the 8.8 known issues?

I think we should, but we might want to wait a few more days until we know with 100% certainty that the fix will land in 8.8.1, that way we can give clear direction in the docs.

rudolf pushed a commit that referenced this issue Jun 5, 2023
…locating SO documents (#158940)

Fixes #158733

The goal of this modification is to force the migrators of all indices
involved in a relocation (e.g. as part of the [dot kibana
split](#104081)) to create the
index aliases in the same `updateAliases()` call.

This way, either:
* all the indices involved in the [dot kibana
split](#104081) relocation will
be completely upgraded (with the appropriate aliases).
* or none of them will.
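
An all-or-nothing alias update of this kind can be sketched as a single request body for the Elasticsearch aliases API, which applies all actions of one request atomically. This is an illustration only and not the code from #158940: `build_alias_actions` and its input shape are hypothetical.

```python
def build_alias_actions(targets):
    """Build one aliases API request body covering every relocated index.

    targets: dict mapping a target index name to the aliases it should get.
    """
    actions = []
    for index in sorted(targets):
        for alias in targets[index]:
            actions.append({"add": {"index": index, "alias": alias}})
    # Sending all actions in one body means they are applied together.
    return {"actions": actions}


# Hypothetical targets for two of the indices involved in the split.
targets = {
    ".kibana_8.8.0_001": [".kibana", ".kibana_8.8.0"],
    ".kibana_alerting_cases_8.8.0_001": [
        ".kibana_alerting_cases",
        ".kibana_alerting_cases_8.8.0",
    ],
}

body = build_alias_actions(targets)
print(len(body["actions"]))
```

Collecting the actions from every migrator before sending anything is what removes the window in which one index has its aliases and another does not.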
rudolf added a commit that referenced this issue Jun 5, 2023
## Summary

Adds a test for #158733. This is based on the un-merged #158940, so see
the last commit
[#6eafe910424414b5670e5f325accc59d87dd6dc4](6eafe91)
for the actual changes proposed by this PR

Co-authored-by: Gerard Soldevila <gerard.soldevila@elastic.co>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Jun 5, 2023
(cherry picked from commit 75ec1ec)
kibanamachine added a commit that referenced this issue Jun 5, 2023
# Backport

This will backport the following commits from `main` to `8.8`:
- [Test for a failed clone during split migration
(#158998)](#158998)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Rudolf Meijering <skaapgif@gmail.com>
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue Jun 5, 2023
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue Jun 5, 2023
## Summary

Adds a test for elastic#158733. This is based on the un-merged elastic#158940, so see
the last commit
[#6eafe910424414b5670e5f325accc59d87dd6dc4](elastic@6eafe91)
for the actual changes proposed by this PR



Co-authored-by: Gerard Soldevila <gerard.soldevila@elastic.co>
sloanelybutsurely pushed a commit to sloanelybutsurely/kibana that referenced this issue Jun 6, 2023
…locating SO documents (elastic#158940)

sloanelybutsurely pushed a commit to sloanelybutsurely/kibana that referenced this issue Jun 6, 2023
@gsoldevila
Contributor

@rudolf Should we add this to the 8.8 known issues? If so, do you mind drafting an update for @elastic/kibana-docs to review?

@jrodewig @elastic/kibana-docs I've created a PR against the 8.8 branch that adds a description of this issue to the Known issues:

#159197

gsoldevila added a commit that referenced this issue Jun 7, 2023
This PR adds #158733 to the list
of known issues:
* issue: #158733
* pull: #158940

---------

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
jrodewig added a commit that referenced this issue Jun 7, 2023
#159221)

# Backport

This will backport the following commits from `8.8` to `main`:
- [[DOCS+] Add #158940 to the list of 8.8.0 known issues
(#159197)](#159197)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Gerard Soldevila <gerard.soldevila@elastic.co>