Conversation

@kozistr (Contributor) commented Sep 20, 2025

What does this PR do?

Fixes #723
Fixes #694

Changes

  • Raise an error when max_input_length is greater than max_batch_tokens and auto-truncate is disabled.
  • Reduce max_input_length to max_batch_tokens when auto-truncate is enabled.

Feel free to let me know whether or not this approach seems appropriate 🤗
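
For illustration, here is a minimal self-contained Rust sketch of that validation logic. resolve_max_input_length is a hypothetical helper name; the actual implementation in router/src/lib.rs may be structured differently.

fn resolve_max_input_length(
    max_input_length: usize,
    max_batch_tokens: usize,
    auto_truncate: bool,
) -> Result<usize, String> {
    if max_input_length <= max_batch_tokens {
        // Nothing to do: a single request can never exceed the batch budget.
        return Ok(max_input_length);
    }
    if !auto_truncate {
        // Fail fast at startup instead of letting oversized requests
        // spin forever in the batching queue.
        return Err(format!(
            "`max_input_length` must be smaller than `max_batch_tokens` when \
             `auto_truncate` is disabled ({max_input_length} > {max_batch_tokens})"
        ));
    }
    // Auto-truncate is enabled: cap the limit instead of failing
    // (the real router emits this message via tracing::warn!).
    eprintln!("Reduce `max_input_length` to `max_batch_tokens` (from {max_input_length} to {max_batch_tokens})");
    Ok(max_batch_tokens)
}

fn main() {
    // Mirrors the two runs in the log below (model limit 32768, --max-batch-tokens 1024).
    assert!(resolve_max_input_length(32768, 1024, false).is_err());
    assert_eq!(resolve_max_input_length(32768, 1024, true), Ok(1024));
}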

Log

The first run below (auto-truncate disabled) now fails fast with a clear error; the second run (with --auto-truncate) caps the limit and starts normally.

./target/release/text-embeddings-router --model-id ../Qwen3-Embedding-0.6B/ --pooling last-token --port 8080 --dtype float32 --max-batch-tokens 1024
2025-09-20T10:05:20.198290Z  INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "../Qwe**-*********-0.6B/", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: Some(LastToken), max_concurrent_requests: 512, max_batch_tokens: 1024, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-09-20T10:05:20.516435Z  WARN text_embeddings_router: router/src/lib.rs:191: Could not find a Sentence Transformers config
Error: `max_input_length` must be smaller than `max_batch_tokens` when `auto_truncate` is disabled (32768 > 1024)
./target/release/text-embeddings-router --model-id ../Qwen3-Embedding-0.6B/ --pooling last-token --port 8080 --dtype float32 --max-batch-tokens 1024 --auto-truncate
2025-09-20T09:59:09.902213Z  INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "../Qwe**-*********-0.6B/", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: Some(LastToken), max_concurrent_requests: 512, max_batch_tokens: 1024, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-09-20T09:59:10.231469Z  WARN text_embeddings_router: router/src/lib.rs:191: Could not find a Sentence Transformers config
2025-09-20T09:59:10.231513Z  WARN text_embeddings_router: router/src/lib.rs:205: Reduce `max_input_length` to `max_batch_tokens` (from 32768 to 1024)
2025-09-20T09:59:10.231517Z  INFO text_embeddings_router: router/src/lib.rs:215: Maximum number of tokens per request: 1024
2025-09-20T09:59:10.231673Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
2025-09-20T09:59:10.534633Z  INFO text_embeddings_router: router/src/lib.rs:263: Starting model backend
2025-09-20T09:59:10.539197Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:305: Starting Qwen3 model on Cpu
2025-09-20T09:59:13.086429Z  INFO text_embeddings_router: router/src/lib.rs:281: Warming up model
2025-09-20T09:59:25.175351Z  WARN text_embeddings_router: router/src/lib.rs:290: Backend does not support a batch size > 4
2025-09-20T09:59:25.175381Z  WARN text_embeddings_router: router/src/lib.rs:291: forcing `max_batch_requests=4`
2025-09-20T09:59:25.176762Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:8080
2025-09-20T09:59:25.176786Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready
2025-09-20T10:00:00.953783Z  INFO embed{total_time="11.896093678s" tokenization_time="31.117792ms" queue_time="407.883µs" inference_time="11.864449086s"}: text_embeddings_router::http::server: router/src/http/server.rs:733: Success

curl -vvvv -H "Content-Type: application/json" -d @tei_qwen_embed_0.6b_broken_input.json http://localhost:8080/embed
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /embed HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 55126
> 

* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json
< x-compute-type: gpu+optimized
< x-compute-time: 11896
< x-compute-characters: 55060
< x-compute-tokens: 1024
< x-total-time: 11896
< x-tokenization-time: 31
< x-queue-time: 0
< x-inference-time: 11864
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: *
< content-length: 12759
< date: Sat, 20 Sep 2025 10:00:00 GMT

Note that despite the 55,126-byte payload, x-compute-tokens is capped at 1024: the input was truncated to max_batch_tokens instead of hanging the service.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil @alvarobartt

@kozistr changed the title from "Fix the infinite loop when max_input_length is bigger than max-batch-tokens ." to "Fix the infinite loop when max_input_length is bigger than max-batch-tokens" on Sep 20, 2025
@alvarobartt (Member) left a comment

Thanks @kozistr, I've included some wording suggestions whilst I review the rest and make sure it works as expected! 🤗

@alvarobartt (Member) commented

P.S. The cargo tests are failing with HTTP 401 Unauthorized, which is most likely related to the recently added HF_TOKEN requirement for the EmbeddingGemma tests. I'll look into that separately since it's unrelated to this PR per se, apologies for the inconvenience 🤗

kozistr and others added 2 commits September 26, 2025 01:40
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
@alvarobartt alvarobartt merged commit a593f66 into huggingface:main Sep 25, 2025
@kozistr kozistr deleted the fix/infinite-loop branch September 25, 2025 17:50
Successfully merging this pull request may close these issues:

  • Infinite Loop When max-batch-tokens < model max_input_length
  • Some inputs hang the whole embedding service on Qwen3-Embedding-0.6B