
[INFERENCE] Batch size for chunked text should be dynamically calculated from the chunk size #135015

@davidkyle

Description

Elasticsearch Version

9.1

Installed Plugins

No response

Java Version

bundled

OS Version

Any

Problem Description

Chunked inputs from semantic_text fields are automatically batched into a single request. If a bulk ingest request contains multiple semantic_text fields, they are batched together up to a certain batch size. The OpenAI embeddings API has a maximum batch size of 2048 inputs, and 2048 is the value used to control the batch size in the OpenAI integration.

The OpenAI embeddings API is also limited to 300,000 tokens per request. If a request contains 2048 inputs, that works out to roughly 146 tokens per input (300,000 / 2048), which is a very small document.

The maximum number of items in a single embedding request needs to respect the 300,000-token limit. In practice this means that a batch size of 2048 will rarely be appropriate and the chunk size should be taken into account when choosing the batch size.
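
As an illustration, the batch size could be derived from the token limit and the chunk size along these lines (a minimal sketch; the function name, default limits, and example chunk size are placeholders, not the actual implementation):

```python
def batch_size_for_chunks(chunk_size_tokens: int,
                          max_inputs_per_request: int = 2048,
                          max_tokens_per_request: int = 300_000) -> int:
    # Cap the number of inputs so that batch_size * chunk_size stays
    # within the provider's per-request token limit.
    by_tokens = max_tokens_per_request // max(chunk_size_tokens, 1)
    return max(1, min(max_inputs_per_request, by_tokens))

# With 250-token chunks the batch shrinks from 2048 to 1200 inputs.
print(batch_size_for_chunks(250))  # -> 1200
```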

Steps to Reproduce

Create an index with a semantic_text field and bulk upload 2000 long documents.
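
A minimal reproduction sketch using the Python client (the index name, field name, and inference endpoint id are placeholders; the endpoint is assumed to point at OpenAI embeddings):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Index with a semantic_text field backed by an OpenAI inference endpoint.
es.indices.create(
    index="repro-batching",
    mappings={
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": "my-openai-embeddings",
            }
        }
    },
)

# Bulk-index 2000 long documents; each document chunks into many inputs,
# so at a batch size of 2048 the combined request easily exceeds 300,000 tokens.
long_text = "lorem ipsum dolor sit amet " * 1000
helpers.bulk(
    es,
    (
        {"_index": "repro-batching", "_source": {"content": long_text}}
        for _ in range(2000)
    ),
)
```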

Logs (if relevant)

No response

Labels

:ml Machine learning, >bug, Team:ML (Meta label for the ML team)
