Run a container using the text-embeddings-inference:cpu-1.5 image so that cpuset.cpus is limited in cgroups. This can be done using docker run --cpuset-cpus ..., Kubernetes NRI resource policies, or the Kubernetes CPU manager.
For instance, on a system with 128 vCPUs / 64 physical CPU cores, the output of text-embeddings-router shows:
(The following clip is from the ChatQnA example application, kubectl logs chatqna-teirerank-...)
2024-09-09T11:54:19.994401Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "chatqna-teirerank-7fd4d88d85-z2nzh", port: 2082, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
---8<--- snip --->8---
2024-09-09T11:54:34.747212Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-09T11:54:34.758273Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758288Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 84, index: 4, mask: {5, 69, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758307Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 81, index: 1, mask: {2, 66, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758353Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 83, index: 3, mask: {4, 68, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758355Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 82, index: 2, mask: {3, 67, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758391Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 85, index: 5, mask: {6, 70, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
...
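That is, the model backend launches the wrong number of threads and tries to pin each thread to CPUs that are not allowed for this container. The requested masks ({1, 65}, {2, 66}, ...) appear to follow the host's 64-core / 128-vCPU topology rather than the container's cpuset.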
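To check which of the requested CPUs fall outside the container's cpuset, the masks can be pulled out of the warnings and compared with the allowed set. The following is only a minimal diagnostic sketch in Python, not part of text-embeddings-inference or ort; the helper names parse_mask and cpus_outside_cpuset are hypothetical, and the regular expression assumes the "mask: {1, 65, }" format seen in the log above.

```python
import re

# Matches the "mask: {1, 65, }" fragment printed in the ort warnings.
MASK_RE = re.compile(r"mask: \{([^}]*)\}")


def parse_mask(line):
    """Return the set of CPU ids named in one warning line (empty if none)."""
    match = MASK_RE.search(line)
    if match is None:
        return set()
    # The log prints a trailing comma, so skip empty tokens.
    return {int(tok) for tok in match.group(1).split(",") if tok.strip()}


def cpus_outside_cpuset(line, allowed):
    """CPU ids the backend asked for that are not in the allowed set."""
    return parse_mask(line) - set(allowed)
```

Inside the container on Linux, os.sched_getaffinity(0) from the Python standard library can supply the allowed set, for example cpus_outside_cpuset(line, os.sched_getaffinity(0)) for each line of the kubectl logs output.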
Expected behavior
The model backend should align the number of threads with the number of CPUs available to it, and it should set the CPU affinity of its threads only to available CPUs.
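The expected behavior can be illustrated with a short sketch. This is Python for illustration only (the log paths indicate the router itself is Rust), and it is not how ort or text-embeddings-inference is implemented; allowed_cpus and plan_threads are hypothetical names. It assumes Linux, where os.sched_getaffinity(0) returns the CPUs the process is actually allowed to run on, which reflects the cpuset restriction.

```python
import os


def allowed_cpus():
    """CPUs this process may run on; falls back to a CPU count off Linux."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def plan_threads(cpus, requested=None):
    """Map thread index -> CPU id, using only CPUs from the allowed list.

    The thread count never exceeds the number of allowed CPUs.
    """
    count = len(cpus) if requested is None else min(requested, len(cpus))
    return {index: cpus[index] for index in range(count)}
```

With a cpuset of three CPUs, plan_threads(allowed_cpus()) would plan three threads, each pinned to a CPU inside the cpuset, instead of one thread per host core.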