Unfollowing a follower index that has had its remote cluster deleted throws 500 error #71885

Closed
cjcenizal opened this issue Apr 19, 2021 · 1 comment · Fixed by #71928

Labels
>bug :Distributed/CCR Issues around the Cross Cluster State Replication features Team:Distributed Meta label for distributed team

Comments

@cjcenizal
Contributor

Discovered via elastic/kibana#68711. To reproduce, create a follower index, remove its remote cluster, and then try to unfollow the follower index.

# Delete remote
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "test3": { 
          "mode": null,
          "seeds": null,
          "skip_unavailable": null,
          "transport": {
            "compress": null
          }
        }
      }
    }
  }
}

# Pause, close, and unfollow the follower index
POST /my-index/_ccr/pause_follow
POST /my-index/_close
# This request will time out
POST /my-index/_ccr/unfollow

The ES logs will show this type of error and stack trace:

   │ info [o.e.c.s.MasterService] [CJs-MacBook-Pro] exception thrown by listener while notifying of cluster state processed from [unfollow_action], old cluster state:
   │      cluster uuid: 7r3ijP6HT-2QY81Ne2x4kQ [committed: true]
   │      version: 82
   │      state uuid: 5BkNXu6gQD6GZaqRHwv26Q
   │      from_diff: false
   │      meta data version: 79

<snip>

   │      customs:
   │         security_tokens: TokenMetadata{ everything is secret }   restore: RestoreInProgress]
   │      org.elasticsearch.transport.NoSuchRemoteClusterException: no such remote cluster: [test2]
   │      	at org.elasticsearch.transport.RemoteClusterService.getRemoteClusterClient(RemoteClusterService.java:388) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.client.node.NodeClient.getRemoteClusterClient(NodeClient.java:156) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.xpack.ccr.action.TransportUnfollowAction$1.clusterStateProcessed(TransportUnfollowAction.java:101) ~[?:?]
   │      	at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:518) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:405) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
   │      	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:405) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:265) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:257) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:234) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:669) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
   │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
   │      	at java.lang.Thread.run(Thread.java:831) [?:?]
@cjcenizal cjcenizal added >bug :Distributed/CCR Issues around the Cross Cluster State Replication features labels Apr 19, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Apr 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner self-assigned this Apr 20, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Apr 20, 2021
We remove the retention leases from the leader cluster after processing
the cluster state update in the unfollow action, but today we assume
that we're still connected to the leader cluster when doing so. If the
leader cluster has been removed then `Client#getRemoteClusterClient`
throws an exception, which means the listener is never notified of the
failure.

This commit addresses this by catching the exception, logging a warning,
and routing the exception back to the client.

Closes elastic#71885
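
For illustration, here is a minimal sketch of the pattern the commit message above describes, written against hypothetical stand-in interfaces rather than the real Elasticsearch classes (the actual change is in TransportUnfollowAction, see the linked PR): wrap the remote-cluster client lookup in a try/catch so that, when the remote cluster has already been removed, the failure is logged and passed to the listener instead of being thrown out of the cluster-state-processed callback and lost.

import java.util.function.Consumer;

// Hypothetical stand-ins for the Elasticsearch types involved; only the shape
// of the fix is shown here, not the production code.
interface ActionListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

interface Client {
    // Mirrors Client#getRemoteClusterClient, which throws NoSuchRemoteClusterException
    // when the remote cluster alias is no longer configured.
    Client getRemoteClusterClient(String clusterAlias);

    // Placeholder for the retention-lease removal call made against the leader index.
    void removeRetentionLease(String leaderIndex, ActionListener<Void> listener);
}

class UnfollowRetentionLeaseCleanup {

    // Runs after the unfollow cluster state update has been processed. Before the
    // fix, an exception thrown by getRemoteClusterClient escaped this callback and
    // the listener was never notified, so the unfollow request hung until it timed out.
    static void removeRetentionLeases(Client client,
                                      String remoteClusterAlias,
                                      String leaderIndex,
                                      ActionListener<Void> listener,
                                      Consumer<String> logWarning) {
        final Client remoteClient;
        try {
            remoteClient = client.getRemoteClusterClient(remoteClusterAlias);
        } catch (Exception e) { // e.g. NoSuchRemoteClusterException
            logWarning.accept("failed to remove retention leases for [" + leaderIndex
                    + "]: remote cluster [" + remoteClusterAlias + "] is not available");
            listener.onFailure(e); // route the failure back to the client instead of losing it
            return;
        }
        remoteClient.removeRetentionLease(leaderIndex, listener);
    }
}

Under this pattern the unfollow request receives an error response promptly instead of hanging until it times out.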
DaveCTurner added a commit that referenced this issue Apr 20, 2021
DaveCTurner added a commit that referenced this issue Apr 20, 2021
DaveCTurner added a commit that referenced this issue Apr 20, 2021