Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot restore requests allow you to restore alias with write data stream, even if the alias already has a write data stream #80976

Closed
jrodewig opened this issue Nov 23, 2021 · 5 comments · Fixed by #81217
Assignees
Labels
>bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team

Comments

@jrodewig
Copy link
Contributor

jrodewig commented Nov 23, 2021

Elasticsearch version (bin/elasticsearch --version):
Version: 8.1.0-SNAPSHOT, Build: default/tar/e38cadb5283feeea06f09d555dbe47fb1d5e8b39/2021-11-22T22:35:22.825882Z, JVM: 17.0.1

OS version (uname -a if on a Unix-like system):
Darwin 20.6.0 Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

I expect ES to reject snapshot restore requests that restore an alias with a write data stream if that alias already exists and has another write data stream. However, ES currently accepts these requests. It appears that the alias uses the restored write data stream, which may be unexpected and confusing to users.

This is likely related to #80972.

Steps to reproduce:

  1. Register a snapshot repository
PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
  1. Create the logs-my_app-default data stream.
POST logs-my_app-default/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z"
}
  1. Add the logs alias to the data stream and set it as the write data stream.
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-my_app-default",
        "alias": "logs",
        "is_write_index": true
      }
    }
  ]
}
  1. Create a snapshot containing the data stream. This also backs up its alias.
PUT _snapshot/my_repository/my_snapshot?wait_for_completion=true
  1. Delete the data stream.
DELETE _data_stream/logs-my_app-default
  1. Create another data stream, logs-my_other_app-default, and add it as the write data stream to the logs alias.
POST logs-my_other_app-default/_doc
{
  "@timestamp": "2099-05-07T16:21:15.000Z"
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-my_other_app-default",
        "alias": "logs",
        "is_write_index": true
      }
    }
  ]
}
  1. Restore the logs-my_app-default data stream and its aliases from the snapshot.
POST /_snapshot/my_repository/my_snapshot/_restore?wait_for_completion=true
{
  "indices": "logs-my_app-default",
  "include_aliases": true
}

I expected this to return an error like alias [logs] has more than one write index [logs-my_other_app-default,logs-my_app-default]. This occurs for index aliases in a similar situation. However, the request was accepted.

When you check the logs alias, the restored logs-my_app-default data stream
is the current write data stream:

GET _alias/logs

Returns:

{
  "logs-my_other_app-default" : {
    "aliases" : {
      "logs" : { }
    }
  },
  "logs-my_app-default" : {
    "aliases" : {
      "logs" : {
        "is_write_index" : true
      }
    }
  }
}
@jrodewig jrodewig added >bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Data Management/Data streams Data streams and their lifecycles labels Nov 23, 2021
@elasticmachine elasticmachine added Team:Distributed Meta label for distributed team Team:Data Management Meta label for data/management team labels Nov 23, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@martijnvg
Copy link
Member

martijnvg commented Nov 24, 2021

If an alias with same name does exist in the cluster then restore logic tries to merge
the alias from the snapshot with the alias in the cluster.

This behaviour already existed when restoring indices aliases from a snapshot:

PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}

POST index1/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z"
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index1",
        "alias": "logs"
      }
    }
  ]
}

PUT _snapshot/my_repository/my_snapshot?wait_for_completion=true

DELETE index1

POST index2/_doc
{
  "@timestamp": "2099-05-07T16:21:15.000Z"
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index2",
        "alias": "logs"
      }
    }
  ]
}

POST /_snapshot/my_repository/my_snapshot/_restore?wait_for_completion=true
{
  "indices": "index1",
  "include_aliases": true
}

After thinking about this more about this, I tend to agree that throwing an error is better.
However I do think there may be cases where this current behaviour may be desired.
Looking back at this, I think it would have made sense to include some kind of a merge parameter,
that allows to control this. And then perhaps defaulting to fail with an error in the case that
an alias is being restored that also exists in the cluster.

@jrodewig
Copy link
Contributor Author

Thanks for looking into this @martijnvg.

If an alias with same name does exist in the cluster then restore logic tries to merge
the alias from the snapshot with the alias in the cluster.

This behaviour already existed when restoring indices aliases from a snapshot

The key differentiator here is the presence of a write index or write data stream. A more comparable example for an index alias would be:

  1. Register the repository.
PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
  1. Create index1.
POST index1/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z"
}
  1. Add index1 as the write index for the logs alias.
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index1",
        "alias": "logs",
        "is_write_index": true
      }
    }
  ]
}
  1. Create a snapshot containing index1 and the logs alias.
PUT _snapshot/my_repository/my_snapshot?wait_for_completion=true
  1. Delete index1.
DELETE index1
  1. Create index2.
POST index2/_doc
{
"@timestamp": "2099-05-07T16:21:15.000Z"
}
  1. Add index2 as the write index for the logs alias.
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index2",
        "alias": "logs",
        "is_write_index": true
      }
    }
  ]
}
  1. Restore index1 and its aliases from the snapshot.
POST /_snapshot/my_repository/my_snapshot/_restore?wait_for_completion=true
{
  "indices": "index1",
  "include_aliases": true
}

This returns:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "alias [logs] has more than one write index [index2,index1]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "alias [logs] has more than one write index [index2,index1]"
  },
  "status" : 500
}

I'd expect ES to return a similar error for data stream aliases with conflicting write data streams.

@martijnvg
Copy link
Member

The key differentiator here is the presence of a write index or write data stream.

Good point. I started comparing the behaviour with aliases and overlooked that.

I'd expect ES to return a similar error for data stream aliases with conflicting write data streams.

Yes, I think in this case of a write data stream we always want to return an error. No need for a parameter that controls the merging behaviour of data stream aliases or something like that.

@martijnvg martijnvg removed the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Nov 25, 2021
@elasticmachine elasticmachine removed the Team:Distributed Meta label for distributed team label Nov 25, 2021
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Dec 1, 2021
…stream

Prohibit restoring a data stream alias if its write data stream conflicts with
the write data stream of an existing data stream alias.

Currently, the write data stream of data stream alias in the snapshot is ignored and
the write data stream of the data stream alias currently active in the cluster
is always picked.

Closes elastic#80976
martijnvg added a commit that referenced this issue Dec 17, 2021
…stream (#81217)

Prohibit restoring a data stream alias if its write data stream conflicts with
the write data stream of an existing data stream alias.

Currently, the write data stream of data stream alias in the snapshot is ignored and
the write data stream of the data stream alias currently active in the cluster
is always picked.

In order to resolve this properly, I've merged the merge(...) and renameDataStreams(...)
methods into a restore(...) method, which merges a data stream and does renaming at
the same time. This should be ok, since renaming can only be done as part of restoring.
More importantly, this allows to validate whether an existing alias and
an alias from a snapshot to not have conflicting write data streams while at the same time
allowing that a write data stream to be renamed (because data streams got renamed).

Closes #80976
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Dec 17, 2021
…stream (elastic#81217)

Prohibit restoring a data stream alias if its write data stream conflicts with
the write data stream of an existing data stream alias.

Currently, the write data stream of data stream alias in the snapshot is ignored and
the write data stream of the data stream alias currently active in the cluster
is always picked.

In order to resolve this properly, I've merged the merge(...) and renameDataStreams(...)
methods into a restore(...) method, which merges a data stream and does renaming at
the same time. This should be ok, since renaming can only be done as part of restoring.
More importantly, this allows to validate whether an existing alias and
an alias from a snapshot to not have conflicting write data streams while at the same time
allowing that a write data stream to be renamed (because data streams got renamed).

Closes elastic#80976
elasticsearchmachine pushed a commit that referenced this issue Dec 17, 2021
…stream (#81217) (#81843)

Prohibit restoring a data stream alias if its write data stream conflicts with
the write data stream of an existing data stream alias.

Currently, the write data stream of data stream alias in the snapshot is ignored and
the write data stream of the data stream alias currently active in the cluster
is always picked.

In order to resolve this properly, I've merged the merge(...) and renameDataStreams(...)
methods into a restore(...) method, which merges a data stream and does renaming at
the same time. This should be ok, since renaming can only be done as part of restoring.
More importantly, this allows to validate whether an existing alias and
an alias from a snapshot to not have conflicting write data streams while at the same time
allowing that a write data stream to be renamed (because data streams got renamed).

Closes #80976
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants