
Improve handling of CCR threadpool rejections #92449

Open
DaveCTurner opened this issue Dec 19, 2022 · 1 comment
Labels
>bug, :Distributed/CCR, Team:Distributed

Comments

@DaveCTurner (Contributor)

The CCR threadpool uses a fixed executor with a default size of 32 and a default queue length of 100, which means it rejects work when overloaded. However, we do not appear to handle these rejections very gracefully in several spots, even though the overload might be transient (see the sketch after this list):

  • ShardChangesAction.TransportAction#asyncShardOperation adds a global-checkpoint (GCP) listener that runs on the ccr pool; if that listener is rejected, the rejection looks like it might suppress some other notifications and propagate up into the ReplicationTracker.
  • ShardFollowTasksExecutor is a PersistentTasksExecutor that executes the task on the ccr pool, and on rejection the task is simply marked as failed.
  • ShardFollowTasksExecutor#nodeOperation also just fails the task on rejection.
  • ShardFollowNodeTask#scheduleBackgroundRetentionLeaseRenewal uses scheduleWithFixedDelay, which simply stops running the scheduled task on rejection.
  • AutoFollower#finalise looks like it might call itself ad infinitum on rejection?
  • CcrRepository#restoreShard uses scheduleWithFixedDelay, which simply stops running the scheduled task on rejection. Possibly this is OK? If the restore fails, I expect we will retry.
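
For illustration, here is a minimal, self-contained sketch of the failure mode, using plain `java.util.concurrent` rather than Elasticsearch's internal `EsExecutors` (so the pool construction, the 500ms backoff, and the `submitWithRetry` helper below are illustrative assumptions, not the actual CCR code): a fixed pool with a bounded queue throws on overflow, and a naive caller turns that transient overload into a permanent failure, whereas re-submitting after a delay would ride it out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CcrRejectionSketch {

    public static void main(String[] args) {
        // Same shape as the CCR pool described above: fixed size 32, bounded
        // queue of 100. AbortPolicy throws RejectedExecutionException when the
        // queue is full, analogous to EsRejectedExecutionException.
        ThreadPoolExecutor ccrLikePool = new ThreadPoolExecutor(
                32, 32, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.AbortPolicy());

        // Timer used only to back off before re-submitting rejected work.
        ScheduledExecutorService retryTimer = Executors.newSingleThreadScheduledExecutor();

        Runnable work = () -> { /* e.g. renew a retention lease */ };

        // Naive submission: a transient overload becomes a permanent failure.
        // This is roughly what several of the spots listed above do today.
        try {
            ccrLikePool.execute(work);
        } catch (RejectedExecutionException e) {
            System.err.println("task failed: " + e); // task marked failed / scheduler stops
        }

        // More forgiving submission: treat the rejection as transient and
        // retry a bounded number of times with a delay before giving up.
        submitWithRetry(ccrLikePool, retryTimer, work, 3);

        ccrLikePool.shutdown();
        retryTimer.shutdown();
    }

    static void submitWithRetry(ThreadPoolExecutor pool, ScheduledExecutorService timer,
                                Runnable work, int retriesLeft) {
        try {
            pool.execute(work);
        } catch (RejectedExecutionException e) {
            if (retriesLeft > 0) {
                // Back off briefly and try again rather than failing immediately.
                timer.schedule(() -> submitWithRetry(pool, timer, work, retriesLeft - 1),
                        500, TimeUnit.MILLISECONDS);
            } else {
                System.err.println("giving up after retries: " + e);
            }
        }
    }
}
```

Something along these lines (or a rejection hook on the submitted runnable) would let the spots above tolerate a momentarily full queue instead of failing the task outright.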
@DaveCTurner added the >bug and :Distributed/CCR labels on Dec 19, 2022
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)
