Unfollowing a follower index that has had its remote cluster deleted throws 500 error #71885

Closed
cjcenizal opened this issue Apr 19, 2021 · 1 comment · Fixed by #71928

Labels
>bug :Distributed/CCR Issues around the Cross Cluster State Replication features Team:Distributed Meta label for distributed team

Comments

@cjcenizal
Contributor

Discovered via elastic/kibana#68711. To reproduce, create a follower index, remove its remote cluster, and then try to unfollow the follower index.

# Delete remote
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "test3": { 
          "mode": null,
          "seeds": null,
          "skip_unavailable": null,
          "transport": {
            "compress": null
          }
        }
      }
    }
  }
}

# Pause, close, and unfollow the follower index
POST /my-index/_ccr/pause_follow
POST /my-index/_close
# This request will time out
POST /my-index/_ccr/unfollow

The ES logs will show this type of error and stack trace:

   │ info [o.e.c.s.MasterService] [CJs-MacBook-Pro] exception thrown by listener while notifying of cluster state processed from [unfollow_action], old cluster state:
   │      cluster uuid: 7r3ijP6HT-2QY81Ne2x4kQ [committed: true]
   │      version: 82
   │      state uuid: 5BkNXu6gQD6GZaqRHwv26Q
   │      from_diff: false
   │      meta data version: 79

<snip>

   │      customs:
   │         security_tokens: TokenMetadata{ everything is secret }   restore: RestoreInProgress]
   │      org.elasticsearch.transport.NoSuchRemoteClusterException: no such remote cluster: [test2]
   │      	at org.elasticsearch.transport.RemoteClusterService.getRemoteClusterClient(RemoteClusterService.java:388) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.client.node.NodeClient.getRemoteClusterClient(NodeClient.java:156) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.xpack.ccr.action.TransportUnfollowAction$1.clusterStateProcessed(TransportUnfollowAction.java:101) ~[?:?]
   │      	at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:518) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:405) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
   │      	at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:405) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:265) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:257) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:234) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:669) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
   │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
   │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
   │      	at java.lang.Thread.run(Thread.java:831) [?:?]
@cjcenizal cjcenizal added >bug :Distributed/CCR Issues around the Cross Cluster State Replication features labels Apr 19, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Apr 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner self-assigned this Apr 20, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Apr 20, 2021
We remove the retention leases from the leader cluster after processing
the cluster state update in the unfollow action, but today we assume
that we're still connected to the leader cluster when doing so. If the
leader cluster has been removed then `Client#getRemoteClusterClient`
throws an exception, which means the listener is never notified of the
failure.

This commit addresses this by catching the exception, logging a warning,
and routing the exception back to the client.

Closes elastic#71885
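
For illustration, here is a minimal sketch of the pattern the commit message above describes, written against hypothetical stand-in interfaces rather than the real Elasticsearch classes (the actual change is in TransportUnfollowAction, see the linked PR): wrap the remote-cluster client lookup in a try/catch so that, when the remote cluster has already been removed, the failure is logged and passed to the listener instead of being thrown out of the cluster-state-processed callback and lost.

import java.util.function.Consumer;

// Hypothetical stand-ins for the Elasticsearch types involved; only the shape
// of the fix is shown here, not the production code.
interface ActionListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

interface Client {
    // Mirrors Client#getRemoteClusterClient, which throws NoSuchRemoteClusterException
    // when the remote cluster alias is no longer configured.
    Client getRemoteClusterClient(String clusterAlias);

    // Placeholder for the retention-lease removal call made against the leader index.
    void removeRetentionLease(String leaderIndex, ActionListener<Void> listener);
}

class UnfollowRetentionLeaseCleanup {

    // Runs after the unfollow cluster state update has been processed. Before the
    // fix, an exception thrown by getRemoteClusterClient escaped this callback and
    // the listener was never notified, so the unfollow request hung until it timed out.
    static void removeRetentionLeases(Client client,
                                      String remoteClusterAlias,
                                      String leaderIndex,
                                      ActionListener<Void> listener,
                                      Consumer<String> logWarning) {
        final Client remoteClient;
        try {
            remoteClient = client.getRemoteClusterClient(remoteClusterAlias);
        } catch (Exception e) { // e.g. NoSuchRemoteClusterException
            logWarning.accept("failed to remove retention leases for [" + leaderIndex
                    + "]: remote cluster [" + remoteClusterAlias + "] is not available");
            listener.onFailure(e); // route the failure back to the client instead of losing it
            return;
        }
        remoteClient.removeRetentionLease(leaderIndex, listener);
    }
}

Under this pattern the unfollow request receives an error response promptly instead of hanging until it times out.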
DaveCTurner added a commit that referenced this issue Apr 20, 2021
DaveCTurner added a commit that referenced this issue Apr 20, 2021
DaveCTurner added a commit that referenced this issue Apr 20, 2021