[Performance] Batch shard requests lying in the queue #4763

itiyama · 2022-10-13T05:13:23Z

The optimal bulk size is a function of memory, shard count, thread count, number of coordinating nodes, and so on. Many of these factors change over time for a given customer, and customers may not revisit their bulk size settings after a while. For example, a cluster with 10 nodes and 10 shards may have an optimal bulk size of 2000, which gives a shard-level bulk size of 200. Once the customer scales to 100 nodes and 100 shards, a bulk size of 2000 is no longer optimal: the coordination overhead is much higher, and the shard-level bulk size drops to just 20, which means more fsync calls. One could argue that customers could tune the bulk size themselves, but a higher bulk size means that larger requests wait in the coordinator queue, which increases memory overhead.
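
The arithmetic behind the example can be sketched as follows. This is only an illustration, assuming documents in a bulk request are routed uniformly across shards; the function names are hypothetical and not part of any OpenSearch API.

```python
def per_shard_batch_size(bulk_size: int, shard_count: int) -> float:
    """Average number of documents each shard receives from one bulk request.

    Assumption: documents are routed uniformly across all shards.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return bulk_size / shard_count


def max_shard_requests(bulk_size: int, shard_count: int) -> int:
    """Upper bound on shard-level requests one bulk request fans out to."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return min(bulk_size, shard_count)


# Same client-side bulk size, before and after scaling the cluster.
small_cluster = per_shard_batch_size(2000, 10)
large_cluster = per_shard_batch_size(2000, 100)
fan_out_small = max_shard_requests(2000, 10)
fan_out_large = max_shard_requests(2000, 100)
```

With an unchanged bulk size of 2000, the per-shard batch shrinks by 10x while the fan-out from the coordinator grows by 10x, which is the combination of more fsync calls and more coordination overhead described above.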

itiyama added the enhancement and untriaged labels Oct 13, 2022

mch2 added the distributed framework label Oct 18, 2022

anasalkouz added the discuss label and removed the untriaged label Oct 18, 2022

adnapibar added the Indexing label Nov 17, 2022
