[ML] Bulk request/inference performance: don't wait for slowest node #138766

@jan-elastic

Description

For bulk index requests with inference, each inference request is randomly sent to an inference allocation. The bulk request then waits until all inference requests have finished.

In practice this means that towards the end of the bulk request, many nodes are idle, waiting for the last node(s) to finish. This is an inefficient use of resources. Due to fluctuations in request count (requests are randomly assigned) and in document sizes, there will always be one node taking longer than the others. Note that this problem worsens as the cluster grows: there are more nodes that can be the slowest, and more nodes left waiting. If a node is genuinely slow (e.g. due to hardware issues), the problem is amplified.
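A minimal standalone simulation (not Elasticsearch code; node counts and task costs are arbitrary assumptions) illustrating the effect: with random assignment, completion time is the maximum per-node load, and the gap to a perfectly balanced schedule grows with the node count.

```java
import java.util.Random;

public class StragglerSimulation {
    public static void main(String[] args) {
        Random random = new Random(42);
        int tasks = 1_000;
        for (int nodes : new int[] { 2, 8, 32 }) {
            double[] load = new double[nodes];
            for (int t = 0; t < tasks; t++) {
                // Random assignment; per-task cost fluctuates with document size.
                load[random.nextInt(nodes)] += 1.0 + random.nextDouble();
            }
            double total = 0, max = 0;
            for (double l : load) {
                total += l;
                max = Math.max(max, l);
            }
            double ideal = total / nodes; // perfectly balanced completion time
            System.out.printf("nodes=%2d ideal=%7.1f actual=%7.1f waste=%4.1f%%%n",
                nodes, ideal, max, 100 * (max - ideal) / ideal);
        }
    }
}
```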

It would be better to address this, e.g. by letting idle nodes pick up work from the slowest node (see the sketch below). Exact details to be discussed.
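A minimal sketch of one possible approach, assuming a pull model: instead of pre-assigning each inference request to a fixed allocation, requests sit in a shared queue and every node pulls the next one as soon as it becomes idle, so no node waits while work remains. Class and method names here are hypothetical, not the actual Elasticsearch inference APIs.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedQueueInference {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 100; i++) {
            pending.add("doc-" + i); // inference requests from one bulk request
        }
        int nodes = 4;
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        for (int n = 0; n < nodes; n++) {
            final int node = n;
            pool.submit(() -> {
                String doc;
                // Each "node" keeps pulling until the queue is drained,
                // so a slow node simply ends up processing fewer documents.
                while ((doc = pending.poll()) != null) {
                    runInference(node, doc);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Hypothetical stand-in for sending one inference request to a node.
    private static void runInference(int node, String doc) {
        System.out.println("node " + node + " processed " + doc);
    }
}
```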


Labels: :ml (Machine learning), Team:ML (Meta label for the ML team)
