Add wait_if_ongoing for refresh API increasing refresh reliability #91579

luyuncheng · 2022-11-15T07:17:38Z

Description

Problem Statement

We have many indices that receive frequent updates and refresh requests (for example, one refresh for every 4 documents written, with 1,000 documents written per second on each shard; I know refresh is a resource-intensive API, as the docs say: ).

In this setup the refresh queue backs up with hundreds of thousands of queued requests. This happens even on a cluster of 3 nodes (16 GB heap, 96-core CPU, 400 GB storage) with 10 primary shards and 20 replica shards, running Elasticsearch 8.4.3.

I think this is because:

The REFRESH thread pool type is ThreadPoolType.SCALING.
TransportShardRefreshAction extends TransportReplicationAction, in which forceExecution defaults to true for replication (see TransportReplicationAction.java#L200).
The refresh API calls InternalEngine#refresh with block = true (see InternalEngine.java#L1795), and hot threads show threads blocked while acquiring the lock.

So a single refresh request fans out to indices * shards * replicas shard-level requests (30 in our test case), and each of them can block while executing.
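As a rough illustration of the fan-out arithmetic, the sketch below counts the shard-level refresh operations produced by one refresh request. It assumes the test cluster holds a single index whose 10 primaries each have 2 replicas (giving the 20 replica shards mentioned above), and it uses a hypothetical client rate of 100 refresh requests per second. RefreshFanOut and its methods are illustrative names, not Elasticsearch APIs.

```java
/**
 * Hypothetical helper (not an Elasticsearch API): counts how many shard-level
 * refresh operations one index-level refresh request fans out to.
 */
public final class RefreshFanOut {

    private RefreshFanOut() {}

    /**
     * @param indices            number of indices targeted by the refresh request
     * @param primariesPerIndex  number of primary shards per index
     * @param replicasPerPrimary number of replica copies per primary
     * @return total shard copies that each receive a refresh
     */
    public static long shardLevelRequests(int indices, int primariesPerIndex, int replicasPerPrimary) {
        // Each primary and each of its replicas is refreshed once.
        long copiesPerIndex = (long) primariesPerIndex * (1 + replicasPerPrimary);
        return indices * copiesPerIndex;
    }

    /** Shard-level refresh operations per second for a given client refresh rate. */
    public static long shardLevelRequestsPerSecond(long refreshRequestsPerSecond, long fanOut) {
        return refreshRequestsPerSecond * fanOut;
    }

    public static void main(String[] args) {
        // Assumed layout: 1 index, 10 primaries, 2 replicas each (10 primary + 20 replica shards).
        long fanOut = shardLevelRequests(1, 10, 2);
        System.out.println("shard-level refreshes per request: " + fanOut); // 30
        // Hypothetical client rate of 100 refresh requests per second.
        System.out.println("shard-level refreshes per second: " + shardLevelRequestsPerSecond(100, fanOut)); // 3000
    }
}
```

Because every shard-level operation is queued on the REFRESH pool of the node holding that copy, the per-second figure grows linearly with both the client refresh rate and the number of shard copies.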

Proposal

Maybe we can add a wait_if_ongoing parameter to the refresh API, as the flush API has. It would make refresh requests non-blocking by calling InternalEngine#maybeRefresh instead: when the call cannot acquire the lock, there must already be an in-flight refresh task running.
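A minimal sketch of the difference the proposal relies on, assuming a single per-shard lock guards refresh: a blocking refresh waits with lock(), while a non-blocking one uses tryLock() and returns immediately when another refresh already holds the lock. ShardRefresher and its methods are hypothetical; this is a simplified model of the idea, not the InternalEngine implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical, simplified model of a shard's refresh path (not Elasticsearch code).
 * refresh(true) waits for the lock, like today's blocking refresh;
 * refresh(false) gives up at once if another refresh holds the lock.
 */
public final class ShardRefresher {

    private final ReentrantLock refreshLock = new ReentrantLock();
    private final AtomicInteger refreshCount = new AtomicInteger();

    /**
     * @param waitIfOngoing if true, block until the lock is free; if false, skip when busy
     * @return true if this call performed a refresh, false if it was skipped
     */
    public boolean refresh(boolean waitIfOngoing) {
        if (waitIfOngoing) {
            refreshLock.lock();                // parks this thread behind any in-flight refresh
        } else if (!refreshLock.tryLock()) {
            return false;                      // a refresh is already in flight; skip instead of queueing
        }
        try {
            refreshCount.incrementAndGet();    // stand-in for the expensive searcher reopen
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    public int refreshCount() {
        return refreshCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ShardRefresher shard = new ShardRefresher();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulate an in-flight refresh by holding the lock on another thread.
        Thread inFlight = new Thread(() -> {
            shard.refreshLock.lock();
            try {
                lockHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                shard.refreshLock.unlock();
            }
        });
        inFlight.start();
        lockHeld.await();

        // The non-blocking call returns immediately instead of waiting for the lock.
        System.out.println("skipped while busy: " + !shard.refresh(false)); // true

        release.countDown();
        inFlight.join();

        // With the lock free again, the blocking call performs the refresh.
        System.out.println("refreshed: " + shard.refresh(true)); // true
        System.out.println("refresh count: " + shard.refreshCount()); // 1
    }
}
```

The trade-off in this model is that a skipped call does not by itself guarantee that writes acknowledged just before it become visible; it only avoids parking another thread behind a refresh that is already running.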

PR #91578


nik9000 · 2022-11-15T19:22:12Z

Indexing already has a wait_for_refresh option, which is pretty similar. Is that useful for you?

elasticsearchmachine · 2022-11-15T19:22:49Z

Pinging @elastic/es-distributed (Team:Distributed)

luyuncheng · 2022-11-16T06:05:18Z

Indexing already has a wait_for_refresh option, which is pretty similar. Is that useful for you?

@nik9000 thanks for the reply.
I think this is a good way to wait until documents are refreshed when there are only a few writes per second. But when refresh_interval is 10s or more AND there are many writes per second, the requests waiting for refresh occupy a lot of memory.
I tried this parameter. Because the HTTP client uses a pipelined model over HTTP/1.1, the client must wait for each response, so GC frequency goes up on both the client and the coordinating node, and write performance drops.

Meanwhile, I think the wait_if_ongoing parameter would make the refresh API more robust, so that it would not queue 200,000 pending requests for only 30 shards.

ywangd · 2024-05-31T12:03:56Z

I am closing this as a duplicate of #87936.

luyuncheng added the >enhancement and needs:triage labels Nov 15, 2022

luyuncheng mentioned this issue Nov 15, 2022 in Add wait_if_ongoing for refresh API increasing refresh reliability #91578 (Open)

nik9000 added the :Distributed/CRUD label Nov 15, 2022

elasticsearchmachine added the Team:Distributed label Nov 15, 2022

elasticsearchmachine removed the needs:triage label Nov 15, 2022

ywangd closed this as completed May 31, 2024
