[ML] Bulk request/inference performance: don't wait for slowest node #138766

@jan-elastic

Description

For bulk index requests with inference, each inference request is randomly sent to an inference allocation. The bulk request then waits until all inference requests have finished.

In practice this means that towards the end of the bulk request, many nodes are idle, waiting for the last node(s) to finish. This is an inefficient use of resources. Due to fluctuations in request count (requests are randomly assigned) and in document sizes, there will always be one node taking longer than the others. Note that this problem worsens as the cluster grows: there are more nodes that can be the slowest, and more nodes left waiting. If a node is genuinely slow (e.g. due to hardware issues), the problem is amplified.
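A minimal standalone simulation (not Elasticsearch code; node counts and task costs are arbitrary assumptions) illustrating the effect: with random assignment, completion time is the maximum per-node load, and the gap to a perfectly balanced schedule grows with the node count.

```java
import java.util.Random;

public class StragglerSimulation {
    public static void main(String[] args) {
        Random random = new Random(42);
        int tasks = 1_000;
        for (int nodes : new int[] { 2, 8, 32 }) {
            double[] load = new double[nodes];
            for (int t = 0; t < tasks; t++) {
                // Random assignment; per-task cost fluctuates with document size.
                load[random.nextInt(nodes)] += 1.0 + random.nextDouble();
            }
            double total = 0, max = 0;
            for (double l : load) {
                total += l;
                max = Math.max(max, l);
            }
            double ideal = total / nodes; // perfectly balanced completion time
            System.out.printf("nodes=%2d ideal=%7.1f actual=%7.1f waste=%4.1f%%%n",
                nodes, ideal, max, 100 * (max - ideal) / ideal);
        }
    }
}
```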

It would be better to address this, e.g. by letting idle nodes pick up work from the slowest node (see the sketch below). Exact details to be discussed.
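A minimal sketch of one possible approach, assuming a pull model: instead of pre-assigning each inference request to a fixed allocation, requests sit in a shared queue and every node pulls the next one as soon as it becomes idle, so no node waits while work remains. Class and method names here are hypothetical, not the actual Elasticsearch inference APIs.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedQueueInference {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 100; i++) {
            pending.add("doc-" + i); // inference requests from one bulk request
        }
        int nodes = 4;
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        for (int n = 0; n < nodes; n++) {
            final int node = n;
            pool.submit(() -> {
                String doc;
                // Each "node" keeps pulling until the queue is drained,
                // so a slow node simply ends up processing fewer documents.
                while ((doc = pending.poll()) != null) {
                    runInference(node, doc);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Hypothetical stand-in for sending one inference request to a node.
    private static void runInference(int node, String doc) {
        System.out.println("node " + node + " processed " + doc);
    }
}
```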


Labels: :ml (Machine learning), Team:ML (Meta label for the ML team)
