Allow snapshot restore after write alias has been moved by ILM #73934

matschaffer · 2021-06-09T02:52:47Z

I've seen some cases where a snapshot restore has failed with an error like this:

[illegal_state_exception] alias [matschaffer-filebeat-7.7.1] has more than one write index [matschaffer-filebeat-7.7.1-2021.03.22-000096,matschaffer-filebeat-7.7.1-2021.03.21-000095]

The sequence of events is roughly:

1. Data is being written to matschaffer-filebeat-7.7.1-2021.03.21-000095 via the matschaffer-filebeat-7.7.1 write alias.
2. A snapshot is taken which backs up matschaffer-filebeat-7.7.1-2021.03.21-000095 along with its alias information.
3. ILM rolls over matschaffer-filebeat-7.7.1-2021.03.21-000095 to matschaffer-filebeat-7.7.1-2021.03.21-000096 and updates the write alias.
4. A failure occurs and matschaffer-filebeat-7.7.1-2021.03.21-000095 is lost.
5. Restore of matschaffer-filebeat-7.7.1-2021.03.21-000095 fails because it also attempts to claim the matschaffer-filebeat-7.7.1 write alias, which is currently backed by matschaffer-filebeat-7.7.1-2021.03.21-000096.
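
The failing step can be illustrated with a toy model of the write-index check. This is a sketch in plain Python, not Elasticsearch's implementation; `restore_index`, `AliasConflict`, and the dictionary layout are hypothetical names chosen for illustration.

```python
class AliasConflict(Exception):
    """Raised when an alias would end up with more than one write index."""


def restore_index(cluster_aliases, index, snapshot_aliases, include_aliases=True):
    """Return the alias table after restoring `index`.

    cluster_aliases:  {alias: {index_name: is_write_index}} for the live cluster
    snapshot_aliases: {alias: is_write_index} as recorded in the snapshot
    (Both layouts are assumptions made for this sketch.)
    """
    # Work on a copy so a failed restore leaves the input untouched.
    updated = {alias: dict(members) for alias, members in cluster_aliases.items()}

    if include_aliases:
        # The restored index brings back the alias flags it had at snapshot time.
        for alias, is_write in snapshot_aliases.items():
            updated.setdefault(alias, {})[index] = is_write

    # An alias may have at most one write index.
    for alias, members in updated.items():
        writers = sorted(name for name, is_write in members.items() if is_write)
        if len(writers) > 1:
            joined = ",".join(writers)
            raise AliasConflict(
                f"alias [{alias}] has more than one write index [{joined}]"
            )
    return updated


if __name__ == "__main__":
    alias = "matschaffer-filebeat-7.7.1"
    # After rollover, only -000096 is the write index.
    live = {alias: {alias + "-2021.03.21-000096": True}}
    try:
        # The snapshot still records -000095 as the write index.
        restore_index(live, alias + "-2021.03.21-000095", {alias: True})
    except AliasConflict as exc:
        print(exc)
```

With `include_aliases=False` the same call returns the alias table unchanged, which matches the workaround described in this issue.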

To work around this I had to perform the restore manually without aliases:

POST _snapshot/found-snapshots/cloud-snapshot-2021.03.22-UUID/_restore
{
    "indices": [
        "matschaffer-filebeat-7.7.1-2021.03.21-000095"
    ],
    "include_aliases": false
}

Then replace the read alias so the restored data would be available via normal query load:

POST _aliases
{
    "actions" : [
        { "add" : { "index" : "matschaffer-filebeat-7.7.1-2021.03.21-000095", "alias" : "matschaffer-filebeat-7.7.1", "is_write_index": false } }
    ]
}
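
For anyone scripting this workaround, the two requests above could be generated along these lines. A minimal sketch only: `build_restore_requests` is a hypothetical helper, it does not send anything, and the repository and snapshot names are the ones from this issue.

```python
import json


def build_restore_requests(repository, snapshot, index, alias):
    """Return the two workaround requests as (method, path, body) tuples."""
    restore = (
        "POST",
        f"_snapshot/{repository}/{snapshot}/_restore",
        # include_aliases=False keeps the snapshot's alias metadata out of the restore.
        {"indices": [index], "include_aliases": False},
    )
    add_read_alias = (
        "POST",
        "_aliases",
        # Re-attach the alias for reads only, so the current write index is not disturbed.
        {"actions": [{"add": {"index": index, "alias": alias, "is_write_index": False}}]},
    )
    return [restore, add_read_alias]


if __name__ == "__main__":
    requests_to_send = build_restore_requests(
        "found-snapshots",
        "cloud-snapshot-2021.03.22-UUID",
        "matschaffer-filebeat-7.7.1-2021.03.21-000095",
        "matschaffer-filebeat-7.7.1",
    )
    for method, path, body in requests_to_send:
        print(method, path)
        print(json.dumps(body, indent=4))
```

The second request should only be sent once the restore has completed, since the alias action needs the index to exist.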

It'd be great if restore could be more ILM-aware such that it won't try to re-claim write indices already backed by a more-current index.

elasticmachine · 2021-06-15T16:47:52Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2021-06-16T13:59:36Z

We (the @elastic/es-distributed team) discussed possible solutions in our team meeting today. Our favourite idea was to introduce a new option that would let you preserve the aliases of an existing index rather than overwriting them or clearing them as we do today. The reasoning was that when restoring an index like this you're really trying to put its data back without changing its place in the cluster, so the aliases of the existing index are likely more useful than the aliases in the snapshot.

We discussed changing the default behaviour but decided it'd be surprising for the API to behave differently from today by default. Instead we would expect tooling that restores indices like this to use this new option explicitly.

We also discussed whether to preserve any other metadata (mappings, settings, ...) rather than overwriting them from those in the snapshot but decided that there are too many ways that such a mechanism might lead to operational surprises.

How does that sound @matschaffer?

matschaffer · 2021-06-17T02:58:34Z

Hard to say without a little more detail.

My expectation would be that you have some ability to restore matschaffer-filebeat-7.7.1-2021.03.21-000095 with only the read alias, leaving the write alias pointed to matschaffer-filebeat-7.7.1-2021.03.21-000096. This is in contrast to today, where you get either read+write or nothing (via include_aliases: false).

If the new option would do this, then that's probably fine. It'd be good if we made this the default in Kibana's restore UI, or maybe even in Elasticsearch itself.

We see this with some frequency when orchestrating snapshot restore after VM failure on non-HA indices.

DaveCTurner · 2021-06-17T15:50:44Z

On closer inspection it seems that include_aliases: false already does what we propose, preserving the aliases of the existing closed index over the top of which we're doing the restore, but the orchestration tooling isn't setting this option so its restores will often fail as described. I believe we should always use include_aliases: false when restoring an index to recover it from some misadventure that left it in red health.

matschaffer · 2021-06-21T01:40:32Z

cc @elastic/cloud-orchestration for comment/prioritization

ean5533 · 2021-06-21T15:16:13Z

I don't have a strong understanding of all the implications here, but if the recommendation from ES is to just set include_aliases: false on all snapshot restores (no conditional logic) then we can do that very easily. cc @anyasabo

anyasabo · 2021-06-21T18:19:46Z

Yep, +1 here, though Dave, your wording here has me a little concerned:

"I believe we should always use include_aliases: false when restoring an index to recover it from some misadventure that left it in red health."

Should we just always be setting include_aliases: false?

deckkh · 2021-07-17T09:26:44Z

One additional thing that happens to us after a snapshot restore: by default, it restores the ILM policy, which means that ILM usually kicks in and removes the restored index shortly after the restore has completed, which is very annoying.

We opened a support case on this and pretty much arrived at the conclusion that the snapshot web interface can't be used. Since then we have used Dev Tools for this, which is kind of sad.
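
One option that may help when restoring through the API is the restore request's `ignore_index_settings` parameter, which skips the listed settings from the snapshot. The sketch below builds such a body so that the restored index comes back without its `index.lifecycle.name` setting. `restore_body_without_ilm` is a hypothetical helper, and whether this fully avoids the deletion described above is an assumption that has not been confirmed in this thread.

```python
import json


def restore_body_without_ilm(index, include_aliases=False):
    """Build a restore body that drops the ILM policy assignment (sketch)."""
    return {
        "indices": [index],
        "include_aliases": include_aliases,
        # Do not restore the policy name, so ILM should not pick the index up again.
        "ignore_index_settings": ["index.lifecycle.name"],
    }


if __name__ == "__main__":
    body = restore_body_without_ilm("matschaffer-filebeat-7.7.1-2021.03.21-000095")
    print(json.dumps(body, indent=4))
```

A policy can be re-attached later by updating the index settings once the restored data has been checked.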

matschaffer added the >enhancement and needs:triage labels on Jun 9, 2021

nik9000 added the :Distributed/Snapshot/Restore and team-discuss labels and removed the needs:triage label on Jun 15, 2021

elasticmachine added the Team:Distributed label on Jun 15, 2021

DaveCTurner added the feedback_needed label and removed the team-discuss label on Jun 16, 2021
