Document Request #1900

oroojlooy · 2024-05-15T14:37:32Z

Feature request

Text-Generation-Inference (TGI) has a --max-batch-total-tokens parameter, which suggests that TGI supports batched requests. However, I cannot find anything about batching in the API guide. For example, for /generate, the input is

{
  "inputs": "My name is Olivier and I",
  "parameters": {
    "best_of": 1,
    "decoder_input_details": false,
    "details": true,
    "do_sample": true,
    "frequency_penalty": 0.1,
    "grammar": null,
    "max_new_tokens": 20,
    "repetition_penalty": 1.03,
    "return_full_text": false,
    "seed": null,
    "stop": [
      "photographer"
    ],
    "temperature": 0.5,
    "top_k": 10,
    "top_n_tokens": 5,
    "top_p": 0.95,
    "truncate": null,
    "typical_p": 0.95,
    "watermark": true
  }
}

which cannot handle batch requests. Is there batch-request support? If so, it would be great to add some API guidelines/documentation for it.

Motivation

N/A

Your contribution

N/A

ktrapeznikov · 2024-05-15T22:54:19Z

You can just send multiple simultaneous requests, and TGI will automatically batch them.
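As a rough illustration of the suggestion above, here is a minimal client-side sketch that sends several /generate requests concurrently so the server has the chance to batch them. The URL, the helper names (build_payload, post_generate, generate_many), and the worker count are assumptions made for this example, not part of TGI; it uses only the Python standard library, and it assumes the response JSON carries a generated_text field.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a TGI server is reachable at this address; adjust as needed.
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt, max_new_tokens=20):
    """Build one /generate request body for a single prompt."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def post_generate(payload, url=TGI_URL):
    """POST one JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_many(prompts, send=post_generate, max_workers=8):
    """Send one request per prompt in parallel; results keep prompt order.

    Each prompt is still its own HTTP request. Any batching happens on the
    server side, which is why the requests need to be in flight together.
    """
    payloads = [build_payload(prompt) for prompt in prompts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))


if __name__ == "__main__":
    prompts = ["My name is Olivier and I", "The capital of France is"]
    for prompt, result in zip(prompts, generate_many(prompts)):
        # Assumption: the response contains a "generated_text" field.
        print(prompt, "->", result.get("generated_text"))
```

The send argument exists only so the fan-out logic can be exercised without a running server; in normal use the default post_generate is enough.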

oroojlooy · 2024-05-16T13:19:00Z

> you can just send multiple simultaneous requests and the TGI will automatically batch them up

Oh OK. That makes sense.

oroojlooy closed this as completed May 16, 2024
