Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix renaming data streams with CCR replication #88875

Merged
merged 8 commits into from Aug 1, 2022

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Jul 27, 2022

This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:

PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}

And then the data stream logs-mysql-error was created, creating the backing index
.ds-logs-mysql-error-2022-07-29-000001.

Prior to this commit, replicating this data stream means that the backing index would be renamed to
.ds-logs-mysql-error-2022-07-29-000001_copy and the data stream would not be renamed. This
caused a check to trip in TransportPutLifecycleAction asserting that a backing index was not
renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that the logs-mysql-error becomes
logs-mysql-error_copy when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: data_stream_name. It works like this:

PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}

Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern, whereas previously the index
.ds-logs-mysql-error-2022-07-29-000001 would be renamed to
.ds-logs-mysql-error-2022-07-29-000001_copy (an illegal name since it doesn't end with the
rollover digit), after this commit it will be renamed to
.ds-logs-mysql-error_copy-2022-07-29-000001 to match the renamed data stream. This means that for
the given follow_index_pattern of {{leader_index}}_copy the index changes look like:

Leader Cluster Follower Cluster
logs-mysql-error (data stream) logs-mysql-error_copy (data stream)
.ds-logs-mysql-error-2022-07-29-000001 .ds-logs-mysql-error_copy-2022-07-29-000001

Which internally means the auto-follow request turned into the create follower request of:

PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}

Relates to #84940 (cherry-picked the commit for a test)
Relates to #61993 (where data stream support was first introduced for CCR)
Resolves #81751

@dakrone dakrone marked this pull request as draft July 27, 2022 22:00
@dakrone dakrone force-pushed the fix-ccr-ds-rename branch 2 times, most recently from 840d570 to 32e82d7 Compare July 28, 2022 17:51
@dakrone
Copy link
Member Author

dakrone commented Jul 28, 2022

@elasticsearchmachine test this please

martijnvg and others added 2 commits July 29, 2022 10:45
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:

```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```

And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.

Prior to this commit, replicating this data stream means that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream would *not* be renamed. This
caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not
renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that the `logs-mysql-error` becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:

```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```

Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern, whereas previously the index
`.ds-logs-mysql-error-2022-07-29-000001` would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the
rollover digit), after this commit it will be renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:

| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001`      | `.ds-logs-mysql-error_copy-2022-07-29-000001` |

Which internally means the auto-follow request turned into the create follower request of:

```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```

Relates to elastic#84940 (cherry-picked the commit for a test)
Relates to elastic#61993 (where data stream support was first introduced for CCR)
Resolves elastic#81751
@dakrone dakrone changed the title WIP For CCR Fix Fix renaming data streams with CCR replication Jul 29, 2022
@dakrone dakrone marked this pull request as ready for review July 29, 2022 17:00
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jul 29, 2022
@dakrone dakrone added >bug :Distributed/CCR Issues around the Cross Cluster State Replication features :Data Management/Data streams Data streams and their lifecycles v8.4.0 and removed needs:triage Requires assignment of a team area label labels Jul 29, 2022
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Distributed Meta label for distributed team labels Jul 29, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @dakrone, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@dakrone dakrone added :Distributed/CCR Issues around the Cross Cluster State Replication features and removed :Distributed/CCR Issues around the Cross Cluster State Replication features labels Jul 29, 2022
@dakrone
Copy link
Member Author

dakrone commented Jul 29, 2022

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I had some ideas/concerns shared, but I see that I am probably my understanding of the data stream is not complete :) . I would be happy to discuss those but I do not want to block this PR for this.

@@ -76,6 +76,26 @@ referenced leader index. When this API returns, the follower index exists, and
(Required, string) The <<remote-clusters,remote cluster>> containing
the leader index.

[[ccr-put-follow-request-body-data_stream_name]]`data_stream_name`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if I understand correctly, CCR works on index level. With this API what we do is to follow a single leader index that belongs to a data stream and we provide also a data stream name. I do not see from this doc, any guidance on naming the follower index or the data stream of the follower index so it could be anything right?

Furthermore, I was wondering, if CCR is defined on an index level, this means that after the leader index will be rolled over the leader data stream would not be followed anymore. Right?

So, I was wondering. Is this API maybe too "free" and error prone, users might put invalid names and potentially create weird bugs?

Should we instead, verify the name of the follower index that it follows the data stream pattern and extract the data stream name from it?

So, if a user tries to follow an data stream backing index, for example, .ds-logs-mysql-default_copy-2022-01-01-000001 we will infer logs-mysql-default_copy as the data stream name. While in the case they try to create a follower index like .ds-logs-mysql-default-2022-01-01-000001_copy, we will fail and let them now that this is an invalid name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead, verify the name of the follower index that it follows the data stream pattern and extract the data stream name from it?

It's worse than that, a data stream can have any arbitrary backing index name, because we have the migrate to data stream API that converts an alias to a data stream. So a data stream named logs-mysql-info could have backing indices named myfirstindex and other-index following no pattern. Rather than enforce that there is a pattern with the backing index names, I opted to go with a check for something that looks like our regular pattern and if that doesn't work, then we treat it as an "I don't know" pattern and just do the regular name replacement.

If for some reason we have something that looks like a good pattern name, but is not, then we throw an exception and stop the auto-following. I'll add a test for this specific scenario (in e3001d5).

Furthermore, I was wondering, if CCR is defined on an index level, this means that after the leader index will be rolled over the leader data stream would not be followed anymore. Right?

If the leader is rolled over then the auto-follow pattern would pick up the next index in the data stream (presumably <thing>-000002), the data stream is not really "followed" because CCR doesn't really follow data streams, but the next index would be replicated, only the data stream would already exist because the first index would have created it already.

Things are a little different in ILM, because ILM auto-injects some unfollow actions around rollover, but regardless the naming thing would be the same.

assertThat(request.getRemoteCluster(), equalTo("remote"));
assertThat(request.getFollowerIndex(), equalTo("my-backing-index_copy"));
assertThat(request.getLeaderIndex(), equalTo("my-backing-index"));
assertThat(request.getDataStreamName(), equalTo("logs-foo-bar_copy"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is nicely representing the concern I raised earlier. I didn't know that it was possible and it's hard for me to assess the consequences.

@dakrone
Copy link
Member Author

dakrone commented Aug 1, 2022

💚 All backports created successfully

Status Branch Result
8.4

Questions ?

Please refer to the Backport tool documentation

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 1, 2022
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:

```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```

And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.

Prior to this commit, replicating this data stream means that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream would *not* be renamed. This
caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not
renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that the `logs-mysql-error` becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:

```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```

Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern, whereas previously the index
`.ds-logs-mysql-error-2022-07-29-000001` would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the
rollover digit), after this commit it will be renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:

| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001`      | `.ds-logs-mysql-error_copy-2022-07-29-000001` |

Which internally means the auto-follow request turned into the create follower request of:

```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```

Relates to elastic#84940 (cherry-picked the commit for a test)
Relates to elastic#61993 (where data stream support was first introduced for CCR)
Resolves elastic#81751

(cherry picked from commit 3420be0)
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 1, 2022
This disabled backwards compatibility tests so the CCR bugfix can be backported, since it includes
some version-specific serialization.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 1, 2022
Updates versions for backport (elastic#88995) of original fix (elastic#88875)

Relates to elastic#88997
dakrone added a commit that referenced this pull request Aug 1, 2022
dakrone added a commit that referenced this pull request Aug 1, 2022
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:

```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```

And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.

Prior to this commit, replicating this data stream means that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream would *not* be renamed. This
caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not
renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that the `logs-mysql-error` becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:

```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```

Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern, whereas previously the index
`.ds-logs-mysql-error-2022-07-29-000001` would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the
rollover digit), after this commit it will be renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:

| Leader Cluster | Follower Cluster |
|--------------|-----------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001`      | `.ds-logs-mysql-error_copy-2022-07-29-000001` |

Which internally means the auto-follow request turned into the create follower request of:

```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```

Relates to #84940 (cherry-picked the commit for a test)
Relates to #61993 (where data stream support was first introduced for CCR)
Resolves #81751
elasticsearchmachine pushed a commit that referenced this pull request Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Data streams Data streams and their lifecycles :Distributed/CCR Issues around the Cross Cluster State Replication features Team:Data Management Meta label for data/management team Team:Distributed Meta label for distributed team v8.4.0 v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auto follow patterns with 'follow_index_pattern' fail to replicate data streams
5 participants