
Add wait_if_ongoing for refresh API increasing refresh reliability #91578

Open · wants to merge 2 commits into base: main
Conversation

@luyuncheng (Contributor) commented Nov 15, 2022

Problem Statement

ISSUE: #91579

When we have many frequent update and refresh requests across many indices (I know refresh is a resource-intensive API, as the docs say), the refresh queue can back up with hundreds of thousands of queued requests, even on a 3-node cluster with 16G heap, 96 CPU cores, 10 primary shards, 20 replica shards, and 400G storage, on ES version 8.4.3.
[screenshot: refresh thread pool queue backlog]

I think this happens because:

  1. The REFRESH thread pool type is ThreadPoolType.SCALING.
  2. TransportShardRefreshAction extends TransportReplicationAction, where forceExecution defaults to true for replication (see TransportReplicationAction.java#L200).
  3. The refresh API calls InternalEngine refresh with block = true (see InternalEngine.java#L1795), and the hot threads output shows it blocked acquiring a lock.
So a single refresh request expands into indices * shards * replications shard-level executions (30 in our test case), each of which can block.
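To make the amplification above concrete, here is a back-of-the-envelope sketch (the class and method names are illustrative, not Elasticsearch code): each client-level refresh fans out into one shard-level task per shard copy, i.e. every primary plus each of its replicas.

```java
// Illustrative fan-out arithmetic only; not Elasticsearch code.
public class RefreshFanOut {

    // One client-level refresh per index becomes one shard-level task
    // per shard copy: every primary plus each of its replicas.
    static int shardLevelTasks(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    public static void main(String[] args) {
        // The cluster above: 10 primaries, 20 replica shards (2 per primary).
        System.out.println(shardLevelTasks(10, 2)); // 30 blocking executions
    }
}
```

With a scaling thread pool and force-executed replication tasks, all 30 of these can sit blocked on the same engine-level lock at once, which is how the queue grows unbounded under frequent refreshes.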

Proposal

Maybe we can add a wait_if_ongoing parameter to the refresh API, like the flush API has, which would make refresh requests nonblocking by calling InternalEngine#maybeRefresh instead. When maybeRefresh cannot acquire the lock, an in-flight refresh task must already be running, so the request can return without queueing.
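A minimal sketch of the nonblocking idea, assuming the refresh lock behaves like a plain mutex (this is a simplified illustration, not the actual InternalEngine implementation, which coordinates with Lucene's ReferenceManager): refresh() blocks until it holds the lock, while maybeRefresh() uses tryLock and returns false when another refresh is already in flight.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of blocking vs. nonblocking refresh; not the real
// InternalEngine. Names other than maybeRefresh are illustrative.
public class SketchEngine {
    private final ReentrantLock refreshLock = new ReentrantLock();

    // Blocking variant: every caller queues up behind the in-flight refresh,
    // pinning a thread in the (scaling) REFRESH pool while it waits.
    public void refresh() {
        refreshLock.lock();
        try {
            doRefresh();
        } finally {
            refreshLock.unlock();
        }
    }

    // Nonblocking variant: if the lock is already held, a refresh is in
    // flight and will make recent writes visible anyway, so skip instead
    // of blocking.
    public boolean maybeRefresh() {
        if (refreshLock.tryLock() == false) {
            return false; // in-flight refresh; nothing to do
        }
        try {
            doRefresh();
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    private void doRefresh() {
        // make recent writes searchable (elided)
    }
}
```

Under the proposed wait_if_ongoing=false semantics, the shard-level refresh action would take the maybeRefresh path, so piled-up requests collapse onto whichever refresh is already running rather than queueing behind it.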

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 15, 2022
@nik9000 nik9000 added the :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label Nov 15, 2022
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team and removed needs:triage Requires assignment of a team area label labels Nov 15, 2022
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

Labels
:Distributed/CRUD · external-contributor · Team:Distributed · v8.15.0
10 participants