Too many model backend threads destroy performance when running on CPU #405

Closed
askervin opened this issue Sep 11, 2024 · 0 comments · Fixed by #410

System Info

text-embeddings-inference:cpu-1.5

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run a container using the text-embeddings-inference:cpu-1.5 image so that cpuset.cpus is limited in cgroups. This can be done with docker run --cpuset-cpus ..., or in Kubernetes with NRI resource policies or the CPU manager.
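
For example, a minimal Docker reproduction restricted to four CPUs (the model id is a placeholder and the port mapping is arbitrary):

docker run --cpuset-cpus 0-3 -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id <model-id>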

For instance, in a system with 128 vCPUs / 64 physical CPU cores, the output of text-embeddings-router shows:
(The following excerpt is from the ChatQnA example application, kubectl logs chatqna-teirerank-...)

2024-09-09T11:54:19.994401Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "chatqna-teirerank-7fd4d88d85-z2nzh", port: 2082, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }

---8<--- snip --->8---

2024-09-09T11:54:34.747212Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-09-09T11:54:34.758273Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 80, index: 0, mask: {1, 65, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758288Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 84, index: 4, mask: {5, 69, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758307Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 81, index: 1, mask: {2, 66, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758353Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 83, index: 3, mask: {4, 68, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758355Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 82, index: 2, mask: {3, 67, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-09-09T11:54:34.758391Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/ort-2.0.0-rc.2/src/environment.rs:266: pthread_setaffinity_np failed for thread: 85, index: 5, mask: {6, 70, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
...

That is, the model backend launches the wrong number of threads and tries to pin each thread to CPUs that are not allowed for this container.

Expected behavior

The model backend should align its number of threads with the number of CPUs actually available to it, and it should set the CPU affinity of its threads only to those available CPUs.
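
A minimal sketch of the expected behavior, assuming the backend derives its intra-op thread count from std::thread::available_parallelism() and passes it to the ONNX Runtime session explicitly (the actual fix in #410 may differ):

use std::thread::available_parallelism;

/// Number of backend threads to use: one per CPU the container is actually
/// allowed to run on, not one per host CPU.
fn backend_thread_count() -> usize {
    // On Linux, available_parallelism() reads the process affinity mask,
    // which is constrained by the container's cpuset, so it returns the
    // allowed CPU count instead of the host's 128 vCPUs.
    available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let threads = backend_thread_count();
    // Hypothetical use: pass this count to the ONNX Runtime session builder,
    // e.g. ort's SessionBuilder::with_intra_threads(threads); specifying the
    // thread count explicitly also keeps ort from pinning threads to CPUs
    // outside the allowed cpuset.
    println!("model backend will use {threads} threads");
}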
