Infinite Loop When max-batch-tokens < model max_input_length #723

@andrey-chernykh

Description

System Info

cargo 1.85.1 (d73d2caf9 2024-12-31)

{
  "model_id": "Qwen/Qwen3-Embedding-0.6B",
  "model_sha": null,
  "model_dtype": "float32",
  "model_type": {
    "embedding": {
      "pooling": "last_token"
    }
  },
  "max_concurrent_requests": 512,
  "max_input_length": 32768,
  "max_batch_tokens": 1000,
  "max_batch_requests": 4,
  "max_client_batch_size": 32,
  "auto_truncate": false,
  "tokenization_workers": 32,
  "version": "1.8.2",
  "sha": "ff3969a9e55405dda42b6dd167bd0c5c6900c2b0",
  "docker_label": null
}

Linux hostname 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Hardware: CPU-only run

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. cargo run --release --features candle,ort,http --no-default-features -- --model-id Qwen/Qwen3-Embedding-0.6B --max-batch-tokens 1000
  2. curl -vvvv -H "Content-Type: application/json" -d @tei_qwen_embed_0.6b_broken_input.json http://localhost:3000/embed

Attachment: tei_qwen_embed_0.6b_broken_input.json (the request payload passed to curl above)

  3. Observe: the request never completes, and inference never actually starts (0% CPU usage).

Relevant source code (the batching check):

if total_tokens > max_batch_tokens {
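
For context, here is a minimal sketch of how a batching loop built around this check can strand a request. This is not the actual text-embeddings-inference queue code; the types and function below are illustrative assumptions. The point is that an entry whose token count exceeds max_batch_tokens never fits into any batch, so it sits at the head of the queue forever while every batch comes back empty:

```rust
use std::collections::VecDeque;

struct Entry {
    id: usize,
    tokens: usize,
}

// Pops as many queued entries as fit into the `max_batch_tokens` budget.
// If the entry at the head is already larger than the budget, the loop
// breaks immediately, the batch stays empty, and the entry is never
// removed from the queue.
fn next_batch(queue: &mut VecDeque<Entry>, max_batch_tokens: usize) -> Vec<Entry> {
    let mut batch = Vec::new();
    let mut total_tokens = 0;

    while let Some(entry) = queue.front() {
        if total_tokens + entry.tokens > max_batch_tokens {
            break; // oversized head entry: nothing is ever scheduled
        }
        total_tokens += entry.tokens;
        batch.push(queue.pop_front().unwrap());
    }
    batch
}

fn main() {
    // max_input_length (32768) admits a prompt far larger than
    // max_batch_tokens (1000), matching the reported configuration.
    let mut queue = VecDeque::from([Entry { id: 0, tokens: 5000 }]);

    let batch = next_batch(&mut queue, 1000);
    // The batch is empty and the entry is still queued; a real scheduler
    // would retry forever, which matches the observed hang (0% CPU).
    assert!(batch.is_empty());
    println!("request {} is stuck in the queue", queue[0].id);
}
```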

Expected behavior

Depends on configuration. If auto_truncate is set, truncate to min(max_batch_tokens, max_input_length). If not, reply with an error.
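
A hedged sketch of that expected behavior as an admission-time check. The function itself is hypothetical; only the configuration field names (auto_truncate, max_batch_tokens, max_input_length) come from the /info output above:

```rust
/// Hypothetical admission-time validation sketching the expected behavior.
/// Only the configuration field names mirror the /info output above.
fn validate_input(
    n_tokens: usize,
    max_batch_tokens: usize,
    max_input_length: usize,
    auto_truncate: bool,
) -> Result<usize, String> {
    // An input can never be scheduled if it exceeds either limit.
    let limit = max_batch_tokens.min(max_input_length);
    if n_tokens <= limit {
        Ok(n_tokens)
    } else if auto_truncate {
        // Truncate to min(max_batch_tokens, max_input_length).
        Ok(limit)
    } else {
        // Reject up front instead of queueing an unschedulable entry.
        Err(format!(
            "input has {n_tokens} tokens, which exceeds the schedulable limit of {limit}"
        ))
    }
}

fn main() {
    // Reported config: max_batch_tokens = 1000, max_input_length = 32768.
    assert_eq!(validate_input(500, 1000, 32768, false), Ok(500));
    assert_eq!(validate_input(5000, 1000, 32768, true), Ok(1000)); // truncated
    assert!(validate_input(5000, 1000, 32768, false).is_err()); // rejected
}
```

Validating before enqueueing keeps entries that can never fit into any batch out of the queue entirely, so the scheduler never spins on them.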
